Building Your First Python Web Scraper: A Step-by-Step Guide
Learn how to build a simple web scraper in Python to extract information from web pages using libraries like requests and BeautifulSoup.
Web scraping lets you collect data from websites automatically instead of copying and pasting it by hand. In this tutorial, we'll build a beginner-friendly web scraper in Python that pulls quotes and their authors from a sample website, using two popular libraries: requests to download the page and BeautifulSoup to parse its HTML.
Step 1: Setting up your environment. First, make sure you have Python installed on your computer. Then, install the required libraries by running the following command in your terminal or command prompt:
pip install requests beautifulsoup4
Step 2: Import the libraries in your Python script.
import requests
from bs4 import BeautifulSoup
Step 3: Fetch the webpage content. We'll use requests to download the HTML content of the page.
url = 'https://quotes.toscrape.com/'
response = requests.get(url)
print(response.status_code) # Should print 200 if successful
html_content = response.text
Step 4: Parse the HTML content using BeautifulSoup.
soup = BeautifulSoup(html_content, 'html.parser')
Step 5: Extract information. For example, let's scrape all the quotes and their authors from the page.
quotes = soup.find_all('div', class_='quote')
for quote in quotes:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    print(f'"{text}" — {author}')
Step 6: Putting it all together, here is the complete script:
import requests
from bs4 import BeautifulSoup

url = 'https://quotes.toscrape.com/'
response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    quotes = soup.find_all('div', class_='quote')
    for quote in quotes:
        text = quote.find('span', class_='text').get_text()
        author = quote.find('small', class_='author').get_text()
        print(f'"{text}" — {author}')
else:
    print('Failed to retrieve the webpage')
Congratulations! You just built your first web scraper in Python. This scraper gets quotes and authors from a webpage. With similar steps, you can scrape other data from different websites, but always remember to check the website's terms of service and robots.txt to avoid violating any rules.
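As a rough illustration of the robots.txt check mentioned above, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed and the sample rules in SAMPLE_ROBOTS_TXT are assumptions made up for this example, and example.com is a placeholder. They are not the actual rules of any real site, and they are parsed from a string so the snippet runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines, not one big string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Check two placeholder URLs against the sample rules.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/page/1/"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data.html"))
```

Against a live site, you would likely point the parser at the site's own file with set_url() and read() instead of parsing a string. Keep in mind that robots.txt does not replace reading the terms of service.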