Scraping Stock Data from Yahoo Finance with Python and Selenium
Yahoo Finance is one of the most widely used sources of market data. With the right tools, you can extract financial data for major tech companies without relying on paid services. This guide demonstrates how to scrape stock data from Yahoo Finance using Python with Selenium, the fake-useragent package, and pandas.
Required Libraries
To begin scraping Yahoo Finance, you’ll need several libraries and a few specific components from them:
- Selenium: Provides tools to control a browser programmatically
- By class (part of Selenium): Helps locate elements on web pages using XPath expressions, class names, or other locator strategies
- Options (part of Selenium): Customizes the Chrome browser so that it looks more like a regular user's browser
- Fake User Agent (the fake-useragent package): Generates realistic User-Agent strings to reduce the chance of being blocked
- Pandas: Organizes scraped data into clean tables for export
- Time module (Python standard library): Allows pausing so pages can finish loading before scraping
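As a rough sketch, the imports for these components usually look like the commented lines below. The `missing_packages` helper underneath is a hypothetical convenience, not part of the original script, that reports which packages still need installing. `openpyxl` is listed on the assumption that pandas will use it to write the `.xlsx` file.

```python
from importlib.util import find_spec

# Typical imports for the components listed above:
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.chrome.options import Options
#   from fake_useragent import UserAgent
#   import pandas as pd
#   import time

# Maps the importable module name to the name used with `pip install`.
REQUIRED_PACKAGES = {
    "selenium": "selenium",
    "fake_useragent": "fake-useragent",
    "pandas": "pandas",
    "openpyxl": "openpyxl",  # assumption: Excel writer backend used by pandas
}


def missing_packages(required=REQUIRED_PACKAGES):
    """Return the pip names of required packages that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

Running this before the scraper gives a clear message about missing dependencies instead of an import error partway through the script.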
Setting Up the Stock List
Define a variable containing a list of stock symbols to scrape:
- AAPL (Apple)
- GOOGL (Alphabet, Google's parent company)
- MSFT (Microsoft)
- TSLA (Tesla)
- AMZN (Amazon)
You can expand this list with additional stock symbols according to your needs.
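A minimal sketch of that list is shown here. The name `TICKERS` and the `add_tickers` helper are illustrative choices rather than names taken from the original script, and `NVDA` in the usage note is only an example of an extra symbol.

```python
# Ticker symbols to scrape; extend this list as needed.
TICKERS = ["AAPL", "GOOGL", "MSFT", "TSLA", "AMZN"]


def add_tickers(tickers, extra):
    """Return a new list with extra symbols appended, upper-cased and de-duplicated."""
    combined = list(tickers)
    for symbol in extra:
        symbol = symbol.strip().upper()
        if symbol and symbol not in combined:
            combined.append(symbol)
    return combined
```

For example, `add_tickers(TICKERS, ["nvda", "AAPL"])` appends `NVDA` once and leaves the existing `AAPL` entry alone.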
Configuring the Browser
Create a function that configures and returns a Chrome browser instance by:
- Generating a random User-Agent string so that requests look like they come from an ordinary browser
- Creating an Options object and attaching the random User-Agent to it
- Adding arguments to disable features that would reveal automation tools
- Adding flags to improve compatibility, especially on Linux or headless servers
- Launching Chrome with these options and returning the driver instance
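The steps above could be sketched roughly as follows. The article does not name the exact flags, so the ones used here (`--disable-blink-features=AutomationControlled`, `--no-sandbox`, `--disable-dev-shm-usage`, and the `excludeSwitches` option) are assumptions based on common Selenium setups. The function names `build_chrome_arguments` and `get_driver` are also illustrative.

```python
def build_chrome_arguments(user_agent):
    """Return the command-line arguments passed to Chrome."""
    return [
        f"--user-agent={user_agent}",
        # Assumed flag: hides the Blink feature that marks the browser as automated.
        "--disable-blink-features=AutomationControlled",
        # Assumed flags: commonly needed on Linux or headless servers.
        "--no-sandbox",
        "--disable-dev-shm-usage",
    ]


def get_driver():
    """Configure and return a Chrome driver with a random User-Agent."""
    # Imported here so build_chrome_arguments() works without these packages.
    from fake_useragent import UserAgent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in build_chrome_arguments(UserAgent().random):
        options.add_argument(argument)
    # Assumed option: removes the "enable-automation" switch Chrome adds by default.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return webdriver.Chrome(options=options)
```

Keeping the argument list in its own function makes it easy to inspect or adjust the flags without launching a browser.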
Creating the Scraping Function
The scraping function takes two parameters: the browser driver and the stock ticker (e.g., AAPL). It then:
- Creates a dynamically generated URL for the specific ticker
- Opens the URL in the browser
- Pauses briefly to ensure the page loads completely
- Initializes a dictionary to store the scraped data
- Uses try/except blocks to handle potential errors
- Locates the main table containing financial data using XPath
- Finds all list items containing data points
- Extracts label-value pairs for each data point (like market cap, dividends, etc.)
- Stores the extracted data in the dictionary
- Returns the completed dictionary if successful
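One possible shape for this function is sketched below. The XPath in `STATS_XPATH` and the span-based label/value layout are assumptions about Yahoo Finance's page markup, which changes over time, so verify them in the browser's developer tools before relying on them. Returning `None` on failure and storing the symbol under a `"Ticker"` key are likewise choices made for this sketch.

```python
import time

QUOTE_URL = "https://finance.yahoo.com/quote/{ticker}"
# Assumed locator for the statistics block; check it against the live page.
STATS_XPATH = '//div[@data-testid="quote-statistics"]'


def build_quote_url(ticker):
    """Build the quote page URL for one ticker symbol."""
    return QUOTE_URL.format(ticker=ticker.strip().upper())


def scrape_stock(driver, ticker, wait_seconds=5):
    """Return a dict of label/value pairs for one ticker, or None on failure."""
    # Imported here so build_quote_url() works without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(build_quote_url(ticker))
    time.sleep(wait_seconds)  # fixed pause so the page can finish loading

    data = {"Ticker": ticker}
    try:
        container = driver.find_element(By.XPATH, STATS_XPATH)
        for item in container.find_elements(By.TAG_NAME, "li"):
            spans = item.find_elements(By.TAG_NAME, "span")
            # Assumption: the first span holds the label, the second the value.
            if len(spans) >= 2:
                data[spans[0].text.strip()] = spans[1].text.strip()
        return data
    except Exception as exc:
        print(f"Could not scrape {ticker}: {exc}")
        return None
```

A fixed `time.sleep` mirrors the pause described above. Selenium's `WebDriverWait` is a common alternative when load times vary.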
Implementing the Main Function
The main function orchestrates the entire scraping process:
- Gets the configured browser driver
- Prepares an empty list to collect all scraped data
- Loops through each ticker in the list
- Calls the scraping function for each ticker
- Adds the returned data to the master list
- Closes the browser when finished to free resources
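The loop above might look something like the following sketch. To keep the example self-contained, the driver factory and the scraping function are passed in as parameters. The names `run_scraper`, `get_driver`, and `scrape_stock` are illustrative, and skipping empty results is an assumption about how failed tickers should be handled.

```python
def run_scraper(tickers, get_driver, scrape_stock):
    """Scrape every ticker with one shared driver and return the collected records."""
    driver = get_driver()
    results = []
    try:
        for ticker in tickers:
            record = scrape_stock(driver, ticker)
            # Assumption: a failed scrape returns None and is skipped.
            if record:
                results.append(record)
    finally:
        # Always close the browser, even if a ticker raises an error.
        driver.quit()
    return results
```

Wrapping the loop in `try`/`finally` means the browser is closed even when one ticker fails unexpectedly, so no stray Chrome processes are left behind.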
Saving the Data
After scraping all tickers, the program:
- Converts the collected data into a pandas DataFrame
- Saves the data to an Excel file named ‘Yahoo_finance_data.xlsx’
- Confirms successful data storage with a message
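A minimal sketch of the export step follows, assuming each record is a dictionary with a `"Ticker"` key plus whatever labels were scraped. The helper names are illustrative. Writing `.xlsx` files with pandas also requires an Excel engine such as `openpyxl` to be installed.

```python
def column_order(records, first="Ticker"):
    """Return column names with `first` leading, then labels in first-seen order."""
    columns = [first]
    for record in records:
        for label in record:
            if label not in columns:
                columns.append(label)
    return columns


def save_to_excel(records, path="Yahoo_finance_data.xlsx"):
    """Write the scraped records to an Excel file and return the DataFrame."""
    # Imported here so column_order() works without pandas installed.
    import pandas as pd

    frame = pd.DataFrame(records, columns=column_order(records))
    frame.to_excel(path, index=False)  # index=False omits the row numbers
    print(f"Data saved to {path}")
    return frame
```

Because different tickers can expose different data points, building the column list explicitly keeps the ticker symbol in the first column and leaves blanks where a value is missing.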
Running the Script
When executed, the script opens a browser window, scrapes data for each ticker in the list, and saves the information to an Excel file with ticker symbols and corresponding data points.
This approach provides a foundation for scraping stock data that you can expand to include more tickers or adapt to scrape other sections of the Yahoo Finance website.