How to Extract Website Data Without APIs Using Python and Selenium

Need real-time data from a website that doesn’t offer an API? Many sites charge for data access or simply don’t provide an API at all. Fortunately, there’s still a reliable way to extract up-to-date information directly from the pages themselves.

When evaluating different extraction methods, several approaches present themselves, each with limitations:

  • PHP with cURL – cannot execute JavaScript, so content rendered in the browser is invisible to it
  • PHP with NodeJS – a complex two-runtime setup with performance issues
  • NodeJS with Puppeteer – resource-intensive and tied primarily to Chromium-based browsers

The optimal solution is Python with Selenium: it drives a real browser, so JavaScript-generated content renders fully, structured data is easy to extract, and everything runs in a single environment.

Setting Up Your Web Scraping Project

To begin extracting data using Python and Selenium, you’ll need:

  1. Python installed on your system (available from the official website)
  2. The target webpage URL
  3. CSS selectors for the data elements you need

Create a file named serve.py containing code to navigate to your target URL and extract data based on the CSS selectors you’ve identified. First, install the required packages:

pip install selenium flask

Running the Python script with python serve.py will start a web server that displays the extracted data, refreshing with the latest information each time you reload the page.
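The original article references serve.py without listing it, so here is a minimal sketch of what it might contain. It assumes Selenium 4 (which fetches a matching chromedriver automatically) and a local Chrome install; TARGET_URL and TABLE_SELECTOR are placeholders for your own page and selector:

from flask import Flask
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

TARGET_URL = "https://example.com/prices"  # placeholder – your target page
TABLE_SELECTOR = "table.data"              # placeholder – your CSS selector

app = Flask(__name__)

PAGE_TEMPLATE = """<!doctype html>
<html><head><title>Extracted Data</title></head>
<body><h1>Extracted Data</h1>{table}</body></html>"""

def fetch_table_html():
    # Launch Chrome in headless mode (no visible browser window)
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(TARGET_URL)
        # Wait up to 10 seconds for the JavaScript-rendered table to appear
        table = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, TABLE_SELECTOR))
        )
        return table.get_attribute("outerHTML")
    finally:
        driver.quit()

@app.route("/")
def index():
    # The page is scraped afresh on every request
    return PAGE_TEMPLATE.format(table=fetch_table_html())

if __name__ == "__main__":
    app.run()

Run it with python serve.py and open http://127.0.0.1:5000/ – each reload scrapes the target page again and shows the latest table.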

Creating Downloadable Data

To save the extracted data for later use, create a file named create-file.py with code that:

  1. Navigates to the target webpage
  2. Extracts the required data
  3. Saves it as a structured JSON file

When executed with python create-file.py, this script generates a clean JSON file containing all the extracted information, ready to be used in any application.
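The exact script isn’t shown in the original either, so the following is one plausible version. The URL and selectors are again placeholders, and the output shape (a list of cell-text lists, one per table row) is an assumption you can adapt to your data:

import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

TARGET_URL = "https://example.com/prices"  # placeholder
ROW_SELECTOR = "table.data tr"             # placeholder

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get(TARGET_URL)
    # Wait until at least one matching row has been rendered
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ROW_SELECTOR))
    )
    # Collect the visible text of every cell, row by row
    rows = [
        [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "td, th")]
        for row in driver.find_elements(By.CSS_SELECTOR, ROW_SELECTOR)
    ]
finally:
    driver.quit()

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2, ensure_ascii=False)

print(f"Saved {len(rows)} rows to data.json")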

Displaying the Extracted Data

The JSON data can be easily incorporated into web applications by creating an HTML file that uses JavaScript’s fetch API to load and display the information. This approach allows you to create:

  • Information portals
  • Online tools
  • Custom dashboards

For automated updates, you can set up a cron job on your web host to periodically fetch fresh data, ensuring your records remain current.
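On a typical Linux host, that cron job could look like the following crontab entry; the interpreter and script paths are illustrative, and headless Chrome must be available on the host:

# m h dom mon dow  command – refresh the data once an hour
0 * * * * /usr/bin/python3 /path/to/create-file.py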

Understanding the Code

The server script (serve.py, sketched above) combines Flask and Selenium to:

  1. Import necessary libraries for web serving and browser automation
  2. Set up a Flask application to serve web pages
  3. Define an HTML template for data presentation
  4. Launch Chrome in headless mode (without a visible browser window)
  5. Navigate to the target page and wait for dynamic content to load
  6. Extract the HTML of the required table elements
  7. Return the data through the Flask web server

The data extraction script (create-file.py):

  1. Uses Selenium WebDriver to control Chrome
  2. Locates elements on the page using CSS selectors
  3. Extracts structured data from table rows
  4. Converts the data to formatted JSON
  5. Saves the results to a file for future use
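Once the file exists, any Python program can reload it; data.json is the placeholder filename used in the sketch above:

import json

with open("data.json", encoding="utf-8") as f:
    rows = json.load(f)

for row in rows:
    print(row)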

With this approach, you can reliably extract data from virtually any website, even without official API access, creating powerful web tools and information resources with minimal overhead.
