Building Your First Python Web Scraper: A Step-by-Step Guide

Learn how to build a simple web scraper in Python to extract information from web pages using libraries like requests and BeautifulSoup.

Web scraping lets you collect data from websites automatically instead of copying and pasting it by hand. In this tutorial, we'll build a beginner-friendly scraper in Python that pulls quotes and their authors from a sample website. We'll use two popular libraries: requests to download the page and BeautifulSoup to parse its HTML.

Step 1: Setting up your environment. First, make sure Python 3 is installed on your computer (the f-strings used later require Python 3.6 or newer). Then install the required libraries by running the following command in your terminal or command prompt:

bash
pip install requests beautifulsoup4
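
To confirm the install worked before writing any scraping code, a quick check like the one below can help. It is a minimal sketch that relies on the standard library's importlib.metadata module (available from Python 3.8); the helper name installed_version is made up for this example and is not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Use the names given to pip (the distribution names), not the import names.
for name in ("requests", "beautifulsoup4"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", rerun the pip command above, making sure it targets the same Python interpreter you use to run your script.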

Step 2: Import the libraries in your Python script. Note that the beautifulsoup4 package is imported under the name bs4.

python
import requests
from bs4 import BeautifulSoup

Step 3: Fetch the webpage content. We'll use requests to download the HTML content of the page.

python
url = 'https://quotes.toscrape.com/'
response = requests.get(url, timeout=10)  # give up after 10 seconds instead of waiting indefinitely
print(response.status_code)  # Should print 200 if successful
html_content = response.text
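
Network requests can fail in ways a status code alone does not reveal, such as timeouts, DNS errors, or malformed URLs. The sketch below wraps the fetch in a small function that handles those cases. The function name fetch_html, the 10-second default, and the User-Agent string are assumptions chosen for illustration; requests.get, raise_for_status, and requests.exceptions.RequestException are part of the requests library.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page's HTML as a string, or None if the request fails."""
    # A descriptive User-Agent is a courtesy to site owners; this value is made up.
    headers = {"User-Agent": "my-first-scraper/0.1 (learning project)"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions instead of checking status_code by hand.
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        print(f"Request failed: {error}")
        return None
    return response.text
```

With this helper, the fetch step could become html_content = fetch_html('https://quotes.toscrape.com/'), followed by a check that html_content is not None before parsing.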

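Step 4: Parse the HTML content using BeautifulSoup. The 'html.parser' argument selects Python's built-in HTML parser, so nothing extra needs to be installed.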

python
soup = BeautifulSoup(html_content, 'html.parser')

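Step 5: Extract information. For example, let's scrape all the quotes and their authors from the page. On this site, each quote sits in a div with the class quote; inside it, the quote text is in a span with the class text and the author's name is in a small tag with the class author.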

python
quotes = soup.find_all('div', class_='quote')

for quote in quotes:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    # The site's quote text already includes its own quotation marks
    print(f'{text} - {author}')
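
The loop above assumes every quote block contains both tags. If one is missing, find() returns None and the call to get_text() raises an AttributeError. Below is a more defensive sketch that also collects the results into a list of dictionaries rather than printing them. The function name extract_quotes and the sample HTML are made up for this example; the sample is hand-written to mimic the page's structure and is not fetched from the site.

```python
from bs4 import BeautifulSoup

# A tiny hand-written sample that mimics the page's structure (not fetched from the site).
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First sample quote.</span>
  <small class="author">Author One</small>
</div>
<div class="quote">
  <span class="text">Second sample quote, author tag missing.</span>
</div>
"""


def extract_quotes(html):
    """Return a list of {'text': ..., 'author': ...} dicts, skipping incomplete blocks."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="quote"):
        text_tag = block.find("span", class_="text")
        author_tag = block.find("small", class_="author")
        # find() returns None when nothing matches, so check before calling get_text().
        if text_tag is None or author_tag is None:
            continue
        results.append({
            "text": text_tag.get_text(strip=True),
            "author": author_tag.get_text(strip=True),
        })
    return results


print(extract_quotes(SAMPLE_HTML))
```

Only the first sample block ends up in the result, because the second has no author tag. Returning structured data like this also makes it easier to sort, filter, or save the quotes later.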

Step 6: Putting it all together, here is the complete script:

python
import requests
from bs4 import BeautifulSoup

url = 'https://quotes.toscrape.com/'
response = requests.get(url, timeout=10)  # give up after 10 seconds instead of waiting indefinitely

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    quotes = soup.find_all('div', class_='quote')

    for quote in quotes:
        text = quote.find('span', class_='text').get_text()
        author = quote.find('small', class_='author').get_text()
        # The site's quote text already includes its own quotation marks
        print(f'{text} - {author}')
else:
    print(f'Failed to retrieve the webpage (status code {response.status_code})')

Congratulations! You've just built your first web scraper in Python: it downloads a page, parses the HTML, and prints each quote with its author. The same steps work for other websites once you adjust the tag names and classes to match their HTML. Before scraping any site, check its terms of service and robots.txt so you stay within its rules.
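
The robots.txt check can be automated with Python's standard library. The sketch below uses urllib.robotparser on a hand-written set of rules so that it runs offline. The helper name is_allowed, the user agent string, the example.com URLs, and the sample rules are all made up for illustration and do not reflect any real site's policy.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Made-up rules for a placeholder domain, used so the example runs offline.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(SAMPLE_RULES, "my-first-scraper", "https://example.com/page/1/"))
print(is_allowed(SAMPLE_RULES, "my-first-scraper", "https://example.com/private/data"))
```

The first call prints True and the second prints False, since only paths under /private/ are disallowed in the sample rules. To check a live site instead, RobotFileParser can download the file itself: call set_url() with the address of the site's robots.txt, then read(), then can_fetch() as above. Keep in mind that robots.txt covers crawling rules only, so the terms of service still need to be read separately.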