Automating Web Scraping with Firecrawl and Make.com: A Step-by-Step Guide
Web scraping has become an essential tool for data-driven applications, and new AI-powered solutions are making it more accessible than ever. This article explores how to use Firecrawl’s Extract function alongside Make.com (formerly known as Integromat) to collect cryptocurrency data from CoinGecko and store it in a database.
Understanding Firecrawl’s Extract Function
Firecrawl is a platform that simplifies web scraping by using AI to pull structured data out of websites. Its recently launched Extract function lets users provide a natural-language prompt describing the data they want to collect; the AI then automatically generates a schema and extracts the data accordingly.
The platform supports various output formats including Markdown, JSON, HTML, and CSV, making it versatile for different applications.
Setting Up a Cryptocurrency Data Scraper
For this demonstration, we’ll extract the top 10 cryptocurrencies from CoinGecko.com, including their names, current prices, price variations, trading volumes, and market capitalizations.
Step 1: Testing in the FireCrow Playground
Before implementing the full automation, it’s wise to test your scraping setup in Firecrawl’s playground:
- Navigate to the Firecrawl playground
- Provide a prompt like: “Extract the top 10 cryptocurrencies from CoinGecko.com with their corresponding values: name, price, price variation, volume, market cap”
- Click ‘Generate’ to create the schema
- Specify you only want to scrape the main page
- Run the scraper to see the extracted data in JSON format
The output should provide a structured dataset containing the top 10 cryptocurrencies with all requested information organized in a clean format.
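The playground steps above boil down to a prompt plus a generated JSON schema. As a sketch, the schema and one extracted record might look something like this (the field names and values are illustrative assumptions, not the playground’s exact output):

```python
# Hypothetical sketch of the schema the playground might generate from the
# prompt. Field names are assumptions for illustration only.
schema = {
    "type": "object",
    "properties": {
        "cryptocurrencies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "price_variation": {"type": "number"},
                    "volume": {"type": "number"},
                    "market_cap": {"type": "number"},
                },
            },
        }
    },
}

# One record shaped like the schema above (values are made up for illustration).
example_record = {
    "name": "Bitcoin",
    "price": 65000.0,
    "price_variation": -1.2,
    "volume": 35_000_000_000,
    "market_cap": 1_280_000_000_000,
}
```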
Building the Automation in Make.com
Once we’ve confirmed our scraping setup works in the playground, we can implement it as an automated workflow in Make.com:
Step 2: Setting Up HTTP Requests
Create a new scenario in Make.com with these components:
- Add an HTTP Request module configured as a POST request to the Firecrawl API endpoint
- Set up authorization using your Firecrawl API token (found in Dashboard > API Tokens)
- Configure the request body in JSON format, including:
  - The data schema (matching what was generated in the playground)
  - The target URL (CoinGecko.com)
  - The natural language prompt
  - Whether to scrape specific pages or all pages
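In Make.com the request body is plain JSON, so the configuration above amounts to assembling a payload like the following. This is a sketch only: the endpoint URL is a placeholder, and the header layout and body field names are assumptions to be checked against the scraping API’s documentation.

```python
import json

def build_extract_request(api_token: str, schema: dict, prompt: str) -> dict:
    """Assemble a hypothetical extract request. The endpoint URL, headers,
    and body field names below are illustrative assumptions."""
    return {
        "url": "https://api.example.com/v1/extract",  # placeholder endpoint
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "urls": ["https://www.coingecko.com"],  # target page only
            "prompt": prompt,
            "schema": schema,  # schema generated in the playground
        }),
    }

request = build_extract_request(
    api_token="YOUR_API_TOKEN",
    schema={"type": "object"},  # substitute the generated schema
    prompt="Extract the top 10 cryptocurrencies with name, price, "
           "price variation, volume, and market cap",
)
```

In Make.com you would paste the equivalent JSON directly into the HTTP module’s request-body field rather than building it in code.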
Step 3: Handling Asynchronous Processing
Since Firecrawl processes scraping jobs asynchronously, we need to poll for results:
- Add a delay module (30 seconds)
- Add another HTTP Request module (GET) that checks the job status using the job ID returned by the first request
- Implement a loop to check periodically until the job completes
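The delay-and-check loop above can be sketched in code. Here a stubbed `check_status` callable stands in for the GET request to the job-status endpoint; the status values and response shape are assumptions for illustration.

```python
import time

def poll_until_complete(check_status, job_id: str,
                        interval: float = 30.0, max_attempts: int = 20):
    """Poll a job until it reports completion, mirroring the Make.com
    delay + HTTP GET loop. `check_status` is any callable returning a dict
    like {"status": "processing"} or {"status": "completed", "data": ...}."""
    for _ in range(max_attempts):
        result = check_status(job_id)
        if result.get("status") == "completed":
            return result.get("data")
        time.sleep(interval)  # the 30-second delay module
    raise TimeoutError(f"Job {job_id} did not complete in time")

# Usage with a stub that completes on the third check (no real API calls):
calls = {"n": 0}
def fake_check(job_id):
    calls["n"] += 1
    if calls["n"] < 3:
        return {"status": "processing"}
    return {"status": "completed", "data": [{"name": "Bitcoin"}]}

data = poll_until_complete(fake_check, "job-123", interval=0)
# data == [{"name": "Bitcoin"}]
```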
Step 4: Processing and Storing Data
Once the scraping job completes:
- Use a Split module to extract just the cryptocurrency data from the response
- Create a loop that processes each cryptocurrency entry
- Use a database module (in this case, Airtable) to store each entry with fields for name, price, variation, volume, and market cap
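The per-entry loop amounts to mapping each scraped record onto the Airtable columns. A minimal sketch, where the input keys, column names, and the `create_record` callable are all hypothetical (in Make.com this mapping is done visually in the Airtable module):

```python
def to_airtable_fields(entry: dict) -> dict:
    """Map one scraped cryptocurrency entry onto Airtable-style columns.
    Both the input keys and the output column names are illustrative."""
    return {
        "Name": entry["name"],
        "Price": entry["price"],
        "Variation": entry["price_variation"],
        "Volume": entry["volume"],
        "Market Cap": entry["market_cap"],
    }

def store_all(entries, create_record):
    # create_record stands in for the Airtable "Create a Record" call
    for entry in entries:
        create_record(to_airtable_fields(entry))

# Usage with an in-memory stand-in for Airtable:
saved = []
store_all(
    [{"name": "Bitcoin", "price": 65000.0, "price_variation": -1.2,
      "volume": 3.5e10, "market_cap": 1.28e12}],
    saved.append,
)
```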
This automation can be triggered manually or set up on a schedule to regularly update your cryptocurrency database with fresh data.
Extending the Solution
This basic implementation can be enhanced in several ways:
- Set up a scheduled trigger to run the scraper at regular intervals (hourly, daily, etc.)
- Create a webhook that triggers the scenario when called from another application
- Build a front-end interface to display the collected data
- Set up email notifications with price alerts based on the scraped data
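As one example of the extensions above, the price-alert check behind an email notification could look like this sketch (the threshold configuration and field names are hypothetical):

```python
def find_alerts(entries, thresholds):
    """Return (name, price) pairs for coins whose scraped price meets or
    exceeds its configured alert threshold. `thresholds` maps name -> price."""
    alerts = []
    for entry in entries:
        limit = thresholds.get(entry["name"])
        if limit is not None and entry["price"] >= limit:
            alerts.append((entry["name"], entry["price"]))
    return alerts

# Usage with made-up prices and one configured threshold:
alerts = find_alerts(
    [{"name": "Bitcoin", "price": 65000.0},
     {"name": "Ethereum", "price": 3000.0}],
    {"Bitcoin": 60000.0},
)
# alerts == [("Bitcoin", 65000.0)]
```

Each alert would then feed an email module in Make.com rather than Python code.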
Conclusion
The combination of Firecrawl’s AI-powered scraping capabilities and Make.com’s automation platform creates a powerful solution for collecting, processing, and storing web data without writing complex code. This approach can be adapted to scrape virtually any website and integrate the data into your applications or workflows.
As web scraping technology continues to evolve with AI assistance, building data-driven applications becomes increasingly accessible to users without extensive programming knowledge.