Sometimes we need to extract information from websites. When a website offers an API, we can use it to get the data. But many websites do not provide an API.
This is where web scraping comes into play!
Python is widely used for web scraping because it makes the core logic easy to write. In this tutorial, we will look at scraping with some very powerful Python-based libraries such as BeautifulSoup and Selenium.
BeautifulSoup and urllib
BeautifulSoup is a Python library for pulling data out of HTML and XML files. However, it does not fetch a webpage itself, so here we will use the urllib library to download the page.
First, we need to install BeautifulSoup4 and the lxml parser on our system using the following commands:
$ sudo pip install beautifulsoup4
$ pip install lxml
OR
$ sudo apt-get install python3-bs4
$ sudo apt-get install python3-lxml
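To confirm that both packages are visible to the Python interpreter you plan to use, a quick check along these lines can help. This is only a minimal sketch: the helper name `is_installed` is made up for illustration, while `bs4` and `lxml` are the names under which the two packages are imported.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages used in this tutorial.
for name in ("bs4", "lxml"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", the install command was probably run for a different Python installation than the one executing the script.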
Here I am going to extract a page from the website https://www.botreetechnologies.com:
from urllib.request import urlopen
from bs4 import BeautifulSoup
We import the packages that we are going to use in our program. Now we will fetch our webpage using the following call:
response = urlopen('https://www.botreetechnologies.com/case-studies')
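The object returned by urlopen still has to be read and decoded before it can be parsed, and the request itself can fail. The sketch below shows one way this step might look, using only urllib. The function name `fetch_html`, the 10-second timeout, and the User-Agent string are assumptions made for illustration and are not part of the original program.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, or None if the request fails."""
    # Some servers reject urllib's default User-Agent, so send an explicit one.
    # The value used here is only a placeholder.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (tutorial example)"})
    try:
        with urlopen(request, timeout=timeout) as response:
            # Use the charset declared in the response, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        print(f"Server returned HTTP {error.code} for {url}")
    except URLError as error:
        # The request never got a valid answer (DNS failure, refused connection, ...).
        print(f"Could not reach {url}: {error.reason}")
    return None
```

Calling `fetch_html('https://www.botreetechnologies.com/case-studies')` would return the page markup as a string, which is the kind of input that `BeautifulSoup(html, 'lxml')` accepts for parsing.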
Continue Reading: https://www.botreetechnologies.com/blog/web-scrapping-using-python/