How to create a web scraper with Python (Selenium)



There are many ways to create a web scraper with Python, and in my opinion the best one is Selenium. Selenium lets you open a browser page from Python and perform tasks such as pressing keys or scraping part of the page. The best thing about Selenium is that it drives a real browser, so it behaves much more like a human visitor than simple HTTP-based scrapers, which are easy to detect (websites can still detect automated browsers, but it is harder).

Selenium is not only used to scrape data. It can be used for many other things, such as automated buy orders or …

Today we will create a basic Python program that opens Google in Microsoft Edge, selects the search bar (currently, the Google search bar's class name is “gLFyf”), types “Hi mom”, presses ENTER to run the search, and writes the source code of the resulting page to a text file called “page_source_of_google_after_typing_hi_mom.txt” in the folder you run the program from (normally the same folder as the program).

This program demonstrates how the work is done. Once you have the new page's source code, you can do anything with it. Be creative: there are many projects like this on freelancing sites, so play around with the code. For example, you could create a web page and put the scraped information in it so that it becomes user-friendly.

The first thing you should do is download Python from the official website:

https://www.python.org/

In this tutorial we will be using Windows.

After installing Python, you need to install selenium and webdriver-manager. You can do that by typing the commands below in CMD or PowerShell on Windows:

pip install webdriver-manager

pip install selenium
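If you want to confirm that both packages were installed before going further, a small check like the one below can help. It is a sketch that uses only the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is just for illustration and is not part of the tutorial's program.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version string, or None if missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)  # reads the installed package metadata
        except PackageNotFoundError:
            result[name] = None  # pip has not installed this package
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["selenium", "webdriver-manager"]).items():
        print(f"{name}: {ver or 'NOT installed'}")
```

Save it as a .py file and run it with python; a "NOT installed" line means the matching pip command above did not succeed.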

Now you are ready to go. I wrote the program and put it on GitHub; you can download it and play with it:

https://github.com/sinas12/blue_scrape

Now I want to briefly explain what each line does.

First, we create a function called “scrape(url)” and pass the “url” variable to it. Inside the function, we first open the Edge browser with driver = webdriver.Edge() (you can use driver = webdriver.Firefox() to open Firefox instead). Then we open the URL with driver.get(url). If you run the program at this stage, you will see that it only opens Edge and goes to the URL.

Now we need to select Google's search textarea (class name “gLFyf”), type the text into it, and press Enter: element.send_keys('Hi mom !' + Keys.RETURN).

The line html_content = driver.page_source gets the source code of the page and stores it in the html_content variable.

The last three lines of the program write the html_content variable to a file called page_source_of_google_after_typing_hi_mom.txt.
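Writing the file needs only the standard library. In the sketch below (the helper name save_page_source is hypothetical) the encoding is set to UTF-8 explicitly: on Windows, Python's default file encoding is often a legacy code page, and writing a page that contains characters outside it can raise a UnicodeEncodeError.

```python
from pathlib import Path


def save_page_source(html_content, filename="page_source_of_google_after_typing_hi_mom.txt"):
    """Write `html_content` to `filename` and return the file's absolute path.

    A relative filename is resolved against the current working directory.
    """
    path = Path(filename)
    path.write_text(html_content, encoding="utf-8")  # explicit UTF-8 avoids code-page errors
    return path.resolve()
```

Calling save_page_source(html_content) with the variable from the previous step creates the same file name the tutorial uses.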


My Website (currently under construction): http://www.alphageek.ir

My Youtube Channel : https://www.youtube.com/@alphageeks

My Odysee Channel : https://odysee.com/@alphageek

Twitter : https://twitter.com/alphageek11

Read.cash : https://read.cash/@alphageek

Hive.blog : https://hive.blog/@alfageek/
