Scraping Stock Data from Yahoo Finance with Python and Selenium

Yahoo Finance is one of the most popular platforms for market information. With the right tools, you can extract financial data for major tech companies without using paid services. This guide demonstrates how to scrape stock data from Yahoo Finance using the Python libraries Selenium, fake-useragent, and pandas.

Required Libraries

To begin scraping Yahoo Finance, you’ll need several key libraries:

Selenium: Provides tools to control a browser programmatically
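By class: Helps locate elements on web pages using XPath expressions, class names, or tag names
Options: Configures the Chrome browser, for example by setting a custom user agent so it looks like a regular user's browser
Fake User Agent (the fake-useragent package): Generates realistic user agent strings to reduce the chance of being blocked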
Pandas: Organizes scraped data into clean tables for export
Time module: Allows pausing so pages have time to load before scraping

Setting Up the Stock List

Define a variable containing a list of stock symbols to scrape:

AAPL (Apple)
GOOGL (Alphabet, Google's parent company)
MSFT (Microsoft)
TSLA (Tesla)
AMZN (Amazon)

You can expand this list with additional stock symbols according to your needs.
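
As a minimal sketch, the list can live in a module-level constant. The name TICKERS is an assumption, since the article does not name the variable.

```python
# Ticker symbols to scrape. TICKERS is a hypothetical name for this variable.
TICKERS = ["AAPL", "GOOGL", "MSFT", "TSLA", "AMZN"]
```

Adding another company only means appending its symbol to the list.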

Configuring the Browser

Create a function that configures and returns a Chrome browser instance by:

Generating a random user agent string so the browser identifies itself like an ordinary one
Creating an Options object and attaching the random user agent
Adding arguments to disable features that would reveal automation tools
Adding flags to improve compatibility, especially on Linux or headless servers
Launching Chrome with these options and returning the driver instance
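
The steps above could be sketched as follows. The article does not list the exact flags, so the specific Chrome arguments here are assumptions, as are the function names build_chrome_arguments and get_driver. The sketch assumes Selenium 4 and the fake-useragent package are installed.

```python
def build_chrome_arguments(user_agent: str) -> list[str]:
    """Return the command-line arguments passed to Chrome (assumed set of flags)."""
    return [
        f"user-agent={user_agent}",
        # Hides one of the signals that reveals browser automation.
        "--disable-blink-features=AutomationControlled",
        # Commonly needed on Linux containers and headless servers.
        "--no-sandbox",
        "--disable-dev-shm-usage",
    ]


def get_driver():
    """Configure Chrome with a random user agent and return the driver."""
    # Imported here so build_chrome_arguments stays usable without Selenium.
    from fake_useragent import UserAgent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in build_chrome_arguments(UserAgent().random):
        options.add_argument(argument)
    # Removes the "controlled by automated test software" switch.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return webdriver.Chrome(options=options)
```

Keeping the flags in a separate helper makes it easy to add or remove one, for example a headless flag on a server, without touching the launch code.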

Creating the Scraping Function

The scraping function takes two parameters: the browser driver and the stock ticker (e.g., AAPL). It then:

Creates a dynamically generated URL for the specific ticker
Opens the URL in the browser
Pauses briefly to give the page time to load
Initializes a dictionary to store the scraped data
Uses try/except blocks to handle potential errors
Locates the main table containing financial data using XPath
Finds all list items containing data points
Extracts label-value pairs for each data point (like market cap, dividends, etc.)
Stores the extracted data in the dictionary
Returns the completed dictionary if successful
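
A minimal sketch of such a function is shown below. The article does not give the URL pattern or the XPath, so QUOTE_URL and STATS_XPATH are assumptions that should be checked against the live page, which changes its markup from time to time. Returning None on failure, and the names build_quote_url and scrape_stock, are also assumptions.

```python
import time

# Assumed URL pattern for a ticker's quote page.
QUOTE_URL = "https://finance.yahoo.com/quote/{ticker}"
# Assumed XPath for the container holding the summary statistics.
STATS_XPATH = '//div[@data-testid="quote-statistics"]'


def build_quote_url(ticker):
    """Build the page URL for one ticker symbol."""
    return QUOTE_URL.format(ticker=ticker)


def scrape_stock(driver, ticker, wait_seconds=3):
    """Return a dict of label-value pairs for one ticker, or None on failure."""
    # Imported here so build_quote_url stays usable without Selenium.
    from selenium.webdriver.common.by import By

    driver.get(build_quote_url(ticker))
    time.sleep(wait_seconds)  # crude pause so the page can finish loading

    data = {"Ticker": ticker}
    try:
        table = driver.find_element(By.XPATH, STATS_XPATH)
        for item in table.find_elements(By.TAG_NAME, "li"):
            spans = item.find_elements(By.TAG_NAME, "span")
            # Assumes the first span holds the label and the second the value.
            if len(spans) >= 2:
                data[spans[0].text] = spans[1].text
        return data
    except Exception as exc:
        print(f"Could not scrape {ticker}: {exc}")
        return None
```

A fixed pause is the simplest option and matches the description above; Selenium's WebDriverWait is a more robust alternative when load times vary.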

Implementing the Main Function

The main function orchestrates the entire scraping process:

Gets the configured browser driver
Prepares an empty list to collect all scraped data
Loops through each ticker in the list
Calls the scraping function for each ticker
Adds the returned data to the master list
Closes the browser when finished to free resources
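
The loop can be sketched as below. To keep the example self-contained, the driver factory and the scraping function are passed in as parameters; collect_all, make_driver, and scrape are hypothetical names. Skipping empty results assumes the scraping function returns None when it fails.

```python
def collect_all(tickers, make_driver, scrape):
    """Scrape every ticker with one shared browser and return the results."""
    driver = make_driver()
    all_data = []
    try:
        for ticker in tickers:
            row = scrape(driver, ticker)
            if row:  # skip tickers whose scrape failed (assumed to return None)
                all_data.append(row)
    finally:
        # Close the browser even if a scrape raises, to free resources.
        driver.quit()
    return all_data
```

The try/finally block is a design choice rather than something the article specifies: it ensures the browser is closed even when one ticker raises an unexpected error.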

Saving the Data

After scraping all tickers, the program:

Converts the collected data into a pandas DataFrame
Saves the data to an Excel file named ‘Yahoo_finance_data.xlsx’
Confirms successful data storage with a message
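
A sketch of the export step, assuming the scraped results are a list of dictionaries, one per ticker. The function names records_to_frame and save_to_excel are hypothetical. Writing .xlsx files with pandas also requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def records_to_frame(records):
    """Turn a list of per-ticker dictionaries into a DataFrame."""
    # Keys become columns; a label missing for one ticker is left empty (NaN).
    return pd.DataFrame(records)


def save_to_excel(records, path="Yahoo_finance_data.xlsx"):
    """Write the scraped records to an Excel file and confirm on success."""
    frame = records_to_frame(records)
    frame.to_excel(path, index=False)  # index=False drops the row-number column
    print(f"Data saved to {path}")
    return frame
```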

Running the Script

When executed, the script opens a browser window, scrapes data for each ticker in the list, and saves the information to an Excel file with ticker symbols and corresponding data points.

This approach provides a foundation for scraping stock data that you can expand to include more tickers or adapt to scrape other sections of the Yahoo Finance website.