How to Build a Python Web Scraper for Live Stock Price Monitoring and Analysis

Automate Stock Market Tracking with Python: Create a Real-Time Web Scraper for Data, Analysis, and Visuals

Tracking stock prices is crucial for making informed investment decisions. While numerous platforms provide this service, building a custom web scraper in Python offers the flexibility to extract and analyze data according to your specific needs. In this guide, we’ll walk you through creating an advanced Python web scraper that not only tracks real-time stock prices but also gathers additional financial data, stores it in a database, and visualizes trends over time.

What You’ll Learn:

  • Web scraping using requests and BeautifulSoup.

  • Storing scraped data in a SQLite database for easy access and analysis.

  • Automating the scraping process with scheduling tools.

  • Visualizing stock trends using matplotlib or plotly.

  • Handling common challenges such as IP blocking, CAPTCHAs, and dynamic content.

1. Setting Up the Environment

Before we start, make sure Python is installed on your machine. We'll need the following libraries (note that sqlite3 ships with Python's standard library, so it does not need to be installed separately):

pip install requests beautifulsoup4 matplotlib plotly schedule

Library Overview:

  • requests: Handles HTTP requests to retrieve web content.

  • BeautifulSoup: Parses HTML content to extract data.

  • sqlite3: Manages a SQLite database for storing scraped data (included in Python's standard library).

  • matplotlib and plotly: Visualize data trends.

  • schedule: Automates the scraping process at regular intervals.

2. Selecting the Target Website

For this tutorial, we’ll use Yahoo Finance as our data source. Before scraping, make sure you comply with Yahoo’s terms of service and robots.txt rules; scraping pages that disallow it can get your IP blocked.
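If you want to check programmatically what a site’s robots.txt permits, Python’s built-in urllib.robotparser can do it. A minimal sketch (the user agent and path are illustrative; consult the live robots.txt for the current rules):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://finance.yahoo.com/robots.txt")
rp.read()

# True only if this user agent is allowed to fetch the given URL
print(rp.can_fetch("Mozilla/5.0", "https://finance.yahoo.com/quote/AAPL"))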

3. Building the Web Scraper

Step 1: Fetching and Parsing the Stock Page

We start by fetching the HTML content of the stock page with requests and then parsing it with BeautifulSoup. One caution: the fin-streamer tags and data-test attributes used below match Yahoo Finance’s markup at the time of writing, so if the page structure changes, the selectors will need updating.

import requests
from bs4 import BeautifulSoup

def fetch_stock_page(stock_symbol):
    """Fetch the raw HTML of a Yahoo Finance quote page."""
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    # A browser-like User-Agent makes the request less likely to be rejected
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    print(f"Failed to retrieve data for {stock_symbol} (HTTP {response.status_code})")
    return None

def parse_stock_price(html_content, stock_symbol):
    """Extract the current market price for one symbol from the page HTML."""
    soup = BeautifulSoup(html_content, "html.parser")
    # Yahoo renders live quote fields in <fin-streamer> tags keyed by data-field
    price_tag = soup.find("fin-streamer", {"data-symbol": stock_symbol, "data-field": "regularMarketPrice"})
    if price_tag:
        return float(price_tag.text.replace(',', ''))
    print(f"Failed to find the stock price for {stock_symbol}")
    return None

Step 2: Extracting Comprehensive Stock Data

In addition to the current price, we’ll extract the open price, daily high/low, and volume.

def parse_stock_details(html_content):
    """Extract current price, open, daily high/low, and volume from the page."""
    soup = BeautifulSoup(html_content, "html.parser")

    def text_of(tag_name, attrs):
        # Fail with a clear message if Yahoo's markup has changed
        tag = soup.find(tag_name, attrs)
        if tag is None:
            raise ValueError(f"Could not find element {attrs}; the page layout may have changed")
        return tag.text

    day_low, day_high = text_of("td", {"data-test": "DAYS_RANGE-value"}).split(" - ")
    details = {
        "current_price": float(text_of("fin-streamer", {"data-field": "regularMarketPrice"}).replace(',', '')),
        "open_price": float(text_of("td", {"data-test": "OPEN-value"}).replace(',', '')),
        "day_high": float(day_high.replace(',', '')),
        "day_low": float(day_low.replace(',', '')),
        "volume": int(text_of("td", {"data-test": "TD_VOLUME-value"}).replace(',', ''))
    }
    return details
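Putting the two pieces together, a quick sanity check might look like this:

# Example usage: fetch one page and print the parsed details
html = fetch_stock_page("AAPL")
if html:
    print(parse_stock_details(html))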

4. Storing Data in a SQLite Database

Step 3: Setting Up the Database

We’ll use SQLite to store the stock data. This allows us to query and analyze the data later.

import sqlite3

def create_database(db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS stocks
                      (timestamp TEXT, symbol TEXT, current_price REAL, open_price REAL, 
                      day_high REAL, day_low REAL, volume INTEGER)''')
    conn.commit()
    conn.close()

def save_to_database(stock_symbol, stock_details, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''INSERT INTO stocks (timestamp, symbol, current_price, open_price, 
                      day_high, day_low, volume) VALUES (datetime('now'), ?, ?, ?, ?, ?, ?)''',
                   (stock_symbol, stock_details["current_price"], stock_details["open_price"],
                    stock_details["day_high"], stock_details["day_low"], stock_details["volume"]))
    conn.commit()
    conn.close()

# Example usage
create_database()
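With rows accumulating, you can query the table back at any time. The helper below, fetch_recent_rows, is introduced here purely for illustration; it returns the newest entries for a symbol:

def fetch_recent_rows(stock_symbol, limit=5, db_name="stock_data.db"):
    # Pull the most recent rows for one symbol, newest first
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, current_price, volume FROM stocks
                      WHERE symbol=? ORDER BY timestamp DESC LIMIT ?''',
                   (stock_symbol, limit))
    rows = cursor.fetchall()
    conn.close()
    return rows

# Example usage
for row in fetch_recent_rows("AAPL"):
    print(row)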

5. Automating the Scraper

Step 4: Scheduling Regular Scrapes

To continuously monitor stock prices, we’ll schedule the scraper to run at specified intervals.

import schedule
import time

def job(stock_symbol):
    html_content = fetch_stock_page(stock_symbol)
    if html_content:
        stock_details = parse_stock_details(html_content)
        save_to_database(stock_symbol, stock_details)
        print(f"Saved {stock_symbol} stock details to database.")

# Run the scraper for one symbol every 30 minutes
schedule.every(30).minutes.do(job, stock_symbol="AAPL")

# Keep the script alive; run_pending() fires any jobs that are due
while True:
    schedule.run_pending()
    time.sleep(1)
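One caveat: if a scrape raises an exception (a network error, or parse_stock_details failing on changed markup), the bare loop above will crash. A minimal sketch of a wrapper, introduced here for illustration, that logs the failure and keeps the scheduler alive:

def safe_job(stock_symbol):
    # Catch any error from a single scrape so the scheduler loop survives
    try:
        job(stock_symbol)
    except Exception as exc:
        print(f"Scrape for {stock_symbol} failed: {exc}")

# Schedule the wrapper instead of job itself
schedule.every(30).minutes.do(safe_job, stock_symbol="AAPL")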

6. Visualizing the Data

Step 5: Plotting Stock Prices

We can use matplotlib or plotly to visualize the stock price trends over time.

import matplotlib.pyplot as plt

def plot_stock_prices(stock_symbol, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, current_price FROM stocks WHERE symbol=?''', (stock_symbol,))
    data = cursor.fetchall()
    conn.close()

    timestamps = [row[0] for row in data]
    prices = [row[1] for row in data]

    plt.plot(timestamps, prices, label=f"{stock_symbol} Price")
    plt.xlabel('Time')
    plt.ylabel('Price ($)')
    plt.title(f"{stock_symbol} Stock Price Over Time")
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

# Example usage
plot_stock_prices("AAPL")
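One refinement: the query returns timestamps as plain text, so matplotlib spaces them as evenly distributed labels rather than real times. A small sketch for converting them first, assuming the 'YYYY-MM-DD HH:MM:SS' format that SQLite's datetime('now') writes (to_datetimes is a helper introduced here for illustration):

from datetime import datetime

def to_datetimes(rows):
    # rows are (timestamp_text, price) tuples from the SELECT above;
    # fromisoformat parses SQLite's 'YYYY-MM-DD HH:MM:SS' text directly
    return [datetime.fromisoformat(row[0]) for row in rows]

# Inside plot_stock_prices, build the x-axis with:
# timestamps = to_datetimes(data)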

Step 6: Advanced Visualizations with Plotly

For more interactive visualizations, use Plotly:

import plotly.express as px

def plot_stock_prices_plotly(stock_symbol, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, current_price FROM stocks WHERE symbol=?''', (stock_symbol,))
    data = cursor.fetchall()
    conn.close()

    timestamps = [row[0] for row in data]
    prices = [row[1] for row in data]

    fig = px.line(x=timestamps, y=prices, title=f"{stock_symbol} Stock Price Over Time", labels={'x': 'Time', 'y': 'Price ($)'})
    fig.show()

# Example usage
plot_stock_prices_plotly("AAPL")
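Because the table also stores the open, high, and low, you can go a step further with a candlestick-style chart. A sketch using plotly's graph_objects (note that each "candle" here is one scrape snapshot, with the current price standing in for the close, so it is coarser than true OHLC bars):

import sqlite3
import plotly.graph_objects as go

def plot_candles_plotly(stock_symbol, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, open_price, day_high, day_low, current_price
                      FROM stocks WHERE symbol=?''', (stock_symbol,))
    rows = cursor.fetchall()
    conn.close()

    # One candle per scrape: open/high/low from the page, current price as close
    fig = go.Figure(go.Candlestick(
        x=[r[0] for r in rows],
        open=[r[1] for r in rows],
        high=[r[2] for r in rows],
        low=[r[3] for r in rows],
        close=[r[4] for r in rows]))
    fig.update_layout(title=f"{stock_symbol} Sampled OHLC")
    fig.show()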

7. Handling Challenges

Step 7: Dealing with IP Blocking, CAPTCHAs, and Dynamic Content

Some websites implement security measures such as CAPTCHAs and IP blocking, and some render their data with JavaScript that requests alone never sees (in that case a headless browser tool such as Selenium or Playwright is the usual workaround). To reduce the risk of being blocked:

  • Use rotating proxies (see the proxy sketch after the code below).

  • Implement user-agent rotation.

  • Respect the website’s robots.txt file.

  • Introduce random delays between requests.

import random
import time

# Define the pool of user agents before the function that samples from it
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    # Add more user agents here
]

def fetch_stock_page_with_delay(stock_symbol):
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    # Rotate user agents and pause a random 1-3 seconds between requests
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(random.uniform(1, 3))
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    print(f"Failed to retrieve data for {stock_symbol}")
    return None
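For the proxy rotation itself, requests accepts a proxies mapping on each call. A minimal sketch (the proxy URLs below are placeholders; substitute addresses from your own proxy provider):

# Placeholder proxy pool; replace with real proxies from your provider
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

def fetch_with_proxy(stock_symbol):
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    proxy = random.choice(PROXIES)
    # Route both HTTP and HTTPS traffic through the chosen proxy
    response = requests.get(url,
                            headers={"User-Agent": random.choice(USER_AGENTS)},
                            proxies={"http": proxy, "https": proxy},
                            timeout=10)
    return response.text if response.status_code == 200 else None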

8. Scaling the Scraper

Step 8: Tracking Multiple Stocks

To track multiple stocks, modify the job function to accept a list of stock symbols.

def job(stock_symbols):
    for symbol in stock_symbols:
        html_content = fetch_stock_page(symbol)
        if html_content:
            stock_details = parse_stock_details(html_content)
            save_to_database(symbol, stock_details)
            print(f"Saved {symbol} stock details to database.")
        # Pause briefly between symbols to avoid hammering the server
        time.sleep(random.uniform(1, 3))

# Schedule for multiple stocks
schedule.every(30).minutes.do(job, stock_symbols=["AAPL", "GOOGL", "MSFT"])

Disclaimer: This script is for educational purposes only. We are not responsible for any financial decisions made based on this script. Always conduct your own research or consult with a professional before making any financial decisions.

Conclusion

In this tutorial, we’ve built a Python web scraper that tracks stock prices, stores the data in a SQLite database, and visualizes the trends over time. By automating the scraping process and handling common challenges like IP blocking, this project serves as a practical foundation for financial data analysis.

Whether you’re an investor looking to keep a close eye on your portfolio or a developer interested in exploring data scraping and visualization, this project provides a solid foundation. Consider expanding it further by adding features like alerts for price changes, integrating with trading APIs, or scaling up to handle large datasets using cloud databases and distributed scraping techniques.