How to Automate Web Scraping for Contact Information Using Fire Troll
Web scraping is a useful skill for business intelligence, lead generation, and data analysis. This step-by-step guide demonstrates how to extract contact information from websites automatically using Fire Troll and store the results in Notion.
Setting Up the Workflow
The process starts with a simple workflow that extracts contact details, including email addresses, phone numbers, and social media links, from a website. The workflow accepts a URL as input and processes it in several stages.
Core Components of the System
The workflow includes several important validation steps:
- Input validation to check if the provided URL is empty
- Wait steps that pause the workflow so each stage can finish before the next one starts
- Dynamic URL handling to accommodate various websites
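As a rough illustration of the input validation and URL handling described above, here is a minimal Python sketch. The function name `normalize_url` and the choice to default scheme-less input to https are assumptions made for this example; they are not taken from the workflow itself.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Return a cleaned-up URL, or raise ValueError if the input is unusable."""
    url = (raw or "").strip()
    if not url:
        # Mirrors the "is the URL empty?" check in the workflow.
        raise ValueError("URL is empty")
    # Assumption: input without a scheme (e.g. "example.com") defaults to https.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {raw!r}")
    return url
```

Rejecting bad input before any request is made keeps the later stages from spending API calls on a URL that cannot be scraped.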
Configuring Fire Troll API
To set up the Fire Troll API connection:
- Send a POST request to the endpoint: ap.firetroll.del/t1/scrape
- For authentication, select the generic credential type and choose header authentication
- Create a credential that uses the Authorization header
- Copy your API key from Fire Troll and paste it into the Authorization field
Once configured, the workflow sends a scrape request for the target website and receives the page content in Markdown format.
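Outside a visual workflow tool, the same request could be assembled roughly as follows. This is a sketch under stated assumptions: the endpoint is copied from the text above with an https:// scheme added, while the `Bearer` token prefix and the JSON body fields (`url`, `formats`) are guesses at a typical scrape API and should be checked against the Fire Troll documentation.

```python
import json
import urllib.request

# Endpoint as written in the text; the https:// scheme is an assumption.
SCRAPE_ENDPOINT = "https://ap.firetroll.del/t1/scrape"


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for Markdown output."""
    # Assumption: the API accepts a JSON body with these two fields.
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=body,
        method="POST",
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; building it separately makes the headers and body easy to inspect first.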
Processing the Data with Gemini
The workflow continues by:
- Converting the scraped content to a markdown file
- Uploading the file to Gemini for structured data extraction
- Checking the file's processing state (processing or active) and waiting until it is active
- Prompting Gemini to extract the data in the required format
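The state check in the steps above amounts to a polling loop. The sketch below shows one way to write it, assuming a caller-supplied `get_state` function (hypothetical) that returns the uploaded file's current state as a string such as "PROCESSING" or "ACTIVE". The interval and attempt limit are illustrative values, not settings from the workflow.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it reports ACTIVE; raise on failure or timeout."""
    for _ in range(max_attempts):
        state = get_state().upper()
        if state == "ACTIVE":
            return  # The file is ready to be referenced in a prompt.
        if state != "PROCESSING":
            # Any other state (for example a failure) stops the loop early.
            raise RuntimeError(f"Unexpected file state: {state}")
        sleep(interval_s)
    raise TimeoutError("File was still processing after the last attempt")
```

Only after this returns would the extraction prompt be sent, which is the role the wait steps play in the workflow.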
Data Parsing and Notion Integration
The final steps involve:
- Cleaning the JSON response
- Parsing essential information (company name, website URL, contact details)
- Updating the Notion database with the structured data
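Cleaning the JSON response usually means removing any Markdown code fence the model wrapped around its answer before parsing it. The following is a minimal sketch of that step; the function name and the sample field `company_name` are hypothetical, and the exact cleanup the workflow performs may differ.

```python
import json
import re

FENCE = "`" * 3  # Three backticks, built this way to keep the example readable.


def clean_json_response(text: str) -> dict:
    """Strip an optional Markdown code fence and parse the remaining JSON object."""
    stripped = text.strip()
    # Drop a leading fence (with an optional language tag) and a trailing fence.
    stripped = re.sub(r"^" + FENCE + r"[a-zA-Z]*\s*", "", stripped)
    stripped = re.sub(r"\s*" + FENCE + r"$", "", stripped)
    data = json.loads(stripped)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    return data
```

For example, a reply consisting of a fenced block containing `{"company_name": "Acme"}` parses to a dictionary with that single key, and a reply that is already bare JSON passes through unchanged.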
An important note: the workflow writes empty strings instead of null values, because sending null values to Notion’s API can cause errors.
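The null handling can be sketched as a small recursive helper that runs on the parsed data before it is sent to Notion. The function name `nulls_to_empty` and the field names in the usage example are hypothetical.

```python
from typing import Any


def nulls_to_empty(value: Any) -> Any:
    """Recursively replace None with an empty string in dicts and lists."""
    if value is None:
        return ""
    if isinstance(value, dict):
        return {key: nulls_to_empty(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nulls_to_empty(item) for item in value]
    return value  # Strings, numbers, and booleans pass through unchanged.
```

Calling `nulls_to_empty({"email": None, "phones": ["555-0100", None]})` returns `{"email": "", "phones": ["555-0100", ""]}`, so every field reaches Notion as a string even when the site did not list that detail.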
Getting Started
The complete workflow is available in the community, so anyone can implement this web scraping solution for contact information extraction. The automated approach saves significant time compared with manual data gathering and collects the contact details that a website publishes.