How to Build a Python Web Scraper for Live Stock Price Monitoring and Analysis
Automate Stock Market Tracking with Python: Create a Real-Time Web Scraper for Data, Analysis, and Visuals
Tracking stock prices is crucial for making informed investment decisions. While numerous platforms provide this service, building a custom web scraper in Python offers the flexibility to extract and analyze data according to your specific needs. In this guide, we’ll walk you through creating an advanced Python web scraper that not only tracks real-time stock prices but also gathers additional financial data, stores it in a database, and visualizes trends over time.
What You’ll Learn:
Web scraping using requests and BeautifulSoup.
Storing scraped data in a SQLite database for easy access and analysis.
Automating the scraping process with scheduling tools.
Visualizing stock trends using matplotlib or plotly.
Handling common challenges such as IP blocking, CAPTCHAs, and dynamic content.
1. Setting Up the Environment
Before we start, make sure Python is installed on your machine. We'll need the following libraries (sqlite3 ships with Python's standard library, so it doesn't need to be installed separately):
pip install requests beautifulsoup4 matplotlib plotly schedule
Library Overview:
requests: Handles HTTP requests to retrieve web content.
BeautifulSoup: Parses HTML content to extract data.
sqlite3: Manages a SQLite database for storing scraped data.
matplotlib and plotly: Visualize data trends.
schedule: Automates the scraping process at regular intervals.
2. Selecting the Target Website
For this tutorial, we’ll use Yahoo Finance as our data source. Ensure you are compliant with their terms of service regarding web scraping.
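As a quick sanity check, you can look at what the site's robots.txt allows before scraping. Below is a minimal sketch using Python's standard-library urllib.robotparser; the URL and user-agent string are illustrative assumptions, not part of the original script.

from urllib import robotparser

def is_allowed(url, user_agent="Mozilla/5.0"):
    # Load and parse the site's robots.txt
    parser = robotparser.RobotFileParser()
    parser.set_url("https://finance.yahoo.com/robots.txt")
    parser.read()
    # Ask whether this user agent may fetch the given URL
    return parser.can_fetch(user_agent, url)

print(is_allowed("https://finance.yahoo.com/quote/AAPL"))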
3. Building the Web Scraper
Step 1: Fetching and Parsing the Stock Page
We start by fetching the HTML content of the stock page using requests and then parsing it with BeautifulSoup.
import requests
from bs4 import BeautifulSoup

def fetch_stock_page(stock_symbol):
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to retrieve data for {stock_symbol}")
        return None

def parse_stock_price(html_content, stock_symbol):
    soup = BeautifulSoup(html_content, "html.parser")
    price_tag = soup.find("fin-streamer", {"data-symbol": stock_symbol, "data-field": "regularMarketPrice"})
    if price_tag:
        return float(price_tag.text.replace(',', ''))
    else:
        print(f"Failed to find the stock price for {stock_symbol}")
        return None
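Putting these two functions together, a quick test run might look like this (assuming the request succeeds and Yahoo's markup still includes the fin-streamer element):

html = fetch_stock_page("AAPL")
if html:
    price = parse_stock_price(html, "AAPL")
    print(f"Current AAPL price: {price}")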
Step 2: Extracting Comprehensive Stock Data
In addition to the current price, we’ll extract the open price, daily high/low, and volume.
def parse_stock_details(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    details = {
        "current_price": float(soup.find("fin-streamer", {"data-field": "regularMarketPrice"}).text.replace(',', '')),
        "open_price": float(soup.find("td", {"data-test": "OPEN-value"}).text.replace(',', '')),
        "day_high": float(soup.find("td", {"data-test": "DAYS_RANGE-value"}).text.split(" - ")[1].replace(',', '')),
        "day_low": float(soup.find("td", {"data-test": "DAYS_RANGE-value"}).text.split(" - ")[0].replace(',', '')),
        "volume": int(soup.find("td", {"data-test": "TD_VOLUME-value"}).text.replace(',', ''))
    }
    return details
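Keep in mind that these data-test selectors depend on Yahoo Finance's markup at the time of writing; if the layout changes, soup.find returns None and the float()/int() calls will raise. A more defensive option is a small helper like the one sketched below (the safe_float name is our own, not part of the original script):

def safe_float(soup, tag, attrs):
    # Return the element's text as a float, or None if the tag is missing
    element = soup.find(tag, attrs)
    return float(element.text.replace(',', '')) if element else None

# Example: tolerate a missing open-price cell instead of crashing
# open_price = safe_float(soup, "td", {"data-test": "OPEN-value"})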
4. Storing Data in a SQLite Database
Step 3: Setting Up the Database
We’ll use SQLite to store the stock data. This allows us to query and analyze the data later.
import sqlite3

def create_database(db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS stocks
                      (timestamp TEXT, symbol TEXT, current_price REAL, open_price REAL,
                       day_high REAL, day_low REAL, volume INTEGER)''')
    conn.commit()
    conn.close()

def save_to_database(stock_symbol, stock_details, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''INSERT INTO stocks (timestamp, symbol, current_price, open_price,
                      day_high, day_low, volume) VALUES (datetime('now'), ?, ?, ?, ?, ?, ?)''',
                   (stock_symbol, stock_details["current_price"], stock_details["open_price"],
                    stock_details["day_high"], stock_details["day_low"], stock_details["volume"]))
    conn.commit()
    conn.close()

# Example usage
create_database()
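After a few scrapes have been saved, a quick query confirms the rows are landing in the table. A small sanity-check sketch, assuming the default database name:

conn = sqlite3.connect("stock_data.db")
cursor = conn.cursor()
# Show the five most recently stored rows
cursor.execute("SELECT * FROM stocks ORDER BY timestamp DESC LIMIT 5")
for row in cursor.fetchall():
    print(row)
conn.close()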
5. Automating the Scraper
Step 4: Scheduling Regular Scrapes
To continuously monitor stock prices, we’ll schedule the scraper to run at specified intervals.
import schedule
import time

def job(stock_symbol):
    html_content = fetch_stock_page(stock_symbol)
    if html_content:
        stock_details = parse_stock_details(html_content)
        save_to_database(stock_symbol, stock_details)
        print(f"Saved {stock_symbol} stock details to database.")

schedule.every(30).minutes.do(job, stock_symbol="AAPL")

while True:
    schedule.run_pending()
    time.sleep(1)
6. Visualizing the Data
Step 5: Plotting Stock Prices
We can use matplotlib or plotly to visualize the stock price trends over time.
import matplotlib.pyplot as plt

def plot_stock_prices(stock_symbol, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, current_price FROM stocks WHERE symbol=?''', (stock_symbol,))
    data = cursor.fetchall()
    conn.close()

    timestamps = [row[0] for row in data]
    prices = [row[1] for row in data]

    plt.plot(timestamps, prices, label=f"{stock_symbol} Price")
    plt.xlabel('Time')
    plt.ylabel('Price ($)')
    plt.title(f"{stock_symbol} Stock Price Over Time")
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

# Example usage
plot_stock_prices("AAPL")
Step 6: Advanced Visualizations with Plotly
For more interactive visualizations, use Plotly:
import plotly.express as px

def plot_stock_prices_plotly(stock_symbol, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''SELECT timestamp, current_price FROM stocks WHERE symbol=?''', (stock_symbol,))
    data = cursor.fetchall()
    conn.close()

    timestamps = [row[0] for row in data]
    prices = [row[1] for row in data]

    fig = px.line(x=timestamps, y=prices, title=f"{stock_symbol} Stock Price Over Time",
                  labels={'x': 'Time', 'y': 'Price ($)'})
    fig.show()

# Example usage
plot_stock_prices_plotly("AAPL")
7. Handling Challenges
Step 7: Dealing with IP Blocking and CAPTCHAs
Some websites implement security measures like CAPTCHAs or IP blocking. To mitigate these:
Use rotating proxies.
Implement user-agent rotation.
Respect the website’s robots.txt file.
Introduce random delays between requests.
import random
import time

# A small pool of user-agent strings to rotate through
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    # Add more user agents here
]

def fetch_stock_page_with_delay(stock_symbol):
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(random.uniform(1, 3))  # random delay between requests
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to retrieve data for {stock_symbol}")
        return None
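The code above covers random delays and user-agent rotation; for proxy rotation, requests accepts a proxies argument. Here is a minimal sketch that reuses the requests, random, and USER_AGENTS definitions from earlier and assumes you have your own list of proxy URLs (the addresses below are placeholders, not working proxies):

# Placeholder proxy URLs; replace with proxies you actually have access to
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

def fetch_with_proxy(stock_symbol):
    url = f"https://finance.yahoo.com/quote/{stock_symbol}"
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    proxy = random.choice(PROXIES)
    # Route both HTTP and HTTPS traffic through the chosen proxy
    response = requests.get(url, headers=headers, proxies={"http": proxy, "https": proxy}, timeout=10)
    return response.text if response.status_code == 200 else None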
8. Scaling the Scraper
Step 8: Tracking Multiple Stocks
To track multiple stocks, modify the job function to accept a list of stock symbols.
def job(stock_symbols):
    for symbol in stock_symbols:
        html_content = fetch_stock_page(symbol)
        if html_content:
            stock_details = parse_stock_details(html_content)
            save_to_database(symbol, stock_details)
            print(f"Saved {symbol} stock details to database.")

# Schedule for multiple stocks
schedule.every(30).minutes.do(job, stock_symbols=["AAPL", "GOOGL", "MSFT"])
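With several symbols accumulating in the same table, you can also compare them on a single chart by looping over the query used earlier. A small sketch that reuses the sqlite3 and matplotlib imports from above (the plot_multiple helper name is our own addition):

def plot_multiple(symbols, db_name="stock_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    for symbol in symbols:
        cursor.execute("SELECT timestamp, current_price FROM stocks WHERE symbol=?", (symbol,))
        rows = cursor.fetchall()
        if rows:
            # One line per symbol, labelled for the legend
            plt.plot([r[0] for r in rows], [r[1] for r in rows], label=symbol)
    conn.close()
    plt.xlabel('Time')
    plt.ylabel('Price ($)')
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

plot_multiple(["AAPL", "GOOGL", "MSFT"])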
Disclaimer: This script is for educational purposes only. We are not responsible for any financial decisions made based on this script. Always conduct your own research or consult with a professional before making any financial decisions.
Conclusion
Whether you’re an investor looking to keep a close eye on your portfolio or a developer interested in exploring data scraping and visualization, this project provides a solid foundation. Consider expanding it further by adding features like alerts for price changes, integrating with trading APIs, or scaling up to handle large datasets using cloud databases and distributed scraping techniques.