Building Custom Financial Data Sets with AI Agents: No APIs Required

Creating custom financial datasets has traditionally required complex web scraping or expensive API subscriptions. A new approach leverages AI agents to gather company-level financial data without relying on either method, potentially giving analysts and investors a competitive edge.

This innovative workflow uses AI to collect unstructured data points that aren’t typically available in standard financial datasets – such as executive departures, board compositions, employee sentiment, and open job postings.

The Technology Stack

The implementation is surprisingly straightforward, using:

  • OpenAI’s Agents SDK library – a Python framework for building agent-based AI applications
  • Pydantic – for defining and enforcing structured outputs with clearly typed fields
  • Pandas – for exporting collected data into a dataframe for analysis

Model Options and Cost Considerations

Two AI model options were tested for this workflow:

  1. OpenAI’s GPT-4o Mini
  2. Perplexity’s Sonar Pro

A cost comparison shows a significant difference: for 1,000 web searches at medium context size, OpenAI charges more than 2.5 times what Perplexity does, making Perplexity’s Sonar models substantially more cost-effective.

Implementation Process

The implementation follows these key steps:

  1. Setting up the environment with necessary libraries
  2. Defining a structured data model using Pydantic
  3. Creating an agent with web search capabilities
  4. Processing a list of stock tickers to gather company information
  5. Converting the results into a pandas dataframe
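
Steps 3 and 4 might look roughly like the sketch below. It assumes the `openai-agents` package (imported as `agents`) and an `OPENAI_API_KEY` set in the environment. The `CompanyFacts` schema, the agent name, the prompt wording, and the helper function names are illustrative assumptions, not the original code. The SDK imports are deferred into the functions that need them, so the prompt helper can be exercised without the SDK installed.

```python
from pydantic import BaseModel


class CompanyFacts(BaseModel):
    """Deliberately small, hypothetical output schema for the agent."""
    ticker: str
    sector: str
    founding_year: int


def build_prompt(ticker: str) -> str:
    """Hypothetical prompt template; the wording is an assumption."""
    return (
        f"Research the public company with stock ticker {ticker}. "
        "Use web search and report its sector and founding year."
    )


def make_agent():
    """Create an agent with web search and a structured output type."""
    # Deferred import: requires `pip install openai-agents`.
    from agents import Agent, WebSearchTool

    return Agent(
        name="Company researcher",  # illustrative name
        instructions="Answer only with facts you found via web search.",
        model="gpt-4o-mini",  # one of the two models named in the article
        tools=[WebSearchTool(search_context_size="medium")],
        output_type=CompanyFacts,
    )


def collect(tickers: list[str]) -> list[CompanyFacts]:
    """Run the agent once per ticker. Each run makes paid API calls."""
    from agents import Runner

    agent = make_agent()
    results = []
    for ticker in tickers:
        run = Runner.run_sync(agent, build_prompt(ticker))
        results.append(run.final_output_as(CompanyFacts))
    return results
```

Calling `collect(["AAA", "BBB"])` would return one validated `CompanyFacts` object per ticker. Running the tickers sequentially keeps the sketch simple; a larger list would probably benefit from the SDK's async runner.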

The Pydantic model ensures that each data point collected (founding year, CEO tenure, number of employees, etc.) has a specific data type, making the output ready for immediate analysis.
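
A minimal sketch of such a typed model, and of the final conversion to a dataframe, is shown below. It assumes Pydantic v2 (for `model_dump`). The class name, field names, and bounds are hypothetical, and the records are placeholders standing in for agent output rather than real company data.

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel, Field, ValidationError


class CompanyInfo(BaseModel):
    """Hypothetical schema; field names and bounds are assumptions."""
    ticker: str
    sector: str
    founding_year: int = Field(ge=1600, le=2100)
    ceo_tenure_years: float = Field(ge=0)
    num_employees: int = Field(ge=0)
    ceos_since_2010: int = Field(ge=0)
    open_job_postings: Optional[int] = None  # may be unavailable


# Placeholder records standing in for agent output (not real data).
records = [
    CompanyInfo(ticker="AAA", sector="Industrials", founding_year="1950",
                ceo_tenure_years=3.5, num_employees=12000,
                ceos_since_2010=2, open_job_postings=40),
    CompanyInfo(ticker="BBB", sector="Software", founding_year=2004,
                ceo_tenure_years=10, num_employees=800,
                ceos_since_2010=1),
]

# One row per company, one column per model field.
df = pd.DataFrame([r.model_dump() for r in records])

# Values that cannot be coerced to the declared type are rejected.
try:
    CompanyInfo(ticker="CCC", sector="Retail", founding_year="unknown",
                ceo_tenure_years=1, num_employees=10, ceos_since_2010=1)
    rejected = False
except ValidationError:
    rejected = True
```

Note that the numeric string `"1950"` is coerced to an integer by Pydantic's default (lax) validation, while `"unknown"` raises a `ValidationError`. That is the property that keeps free-text model answers out of numeric columns.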

Performance Comparison

When comparing the performance of OpenAI and Perplexity models:

  • Both performed well on easily accessible data (sector, founding year, current CEO tenure)
  • Perplexity’s Sonar Pro demonstrated superior accuracy, particularly with more complex queries like the number of CEOs since 2010 and current job openings
  • GPT-4o Mini significantly underperformed on these more challenging data points

The accuracy gap is noteworthy considering Perplexity’s lower cost structure.
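
One way to quantify a comparison like this, though not necessarily how it was scored here, is per-field accuracy against hand-verified reference values. The sketch below uses only the standard library. The function name, field names, and all values are hypothetical placeholders, not results from the comparison described above.

```python
def field_accuracy(predicted, reference, fields):
    """Share of tickers, per field, where the prediction matches the reference.

    `predicted` and `reference` map ticker -> {field: value}.
    A ticker or field missing from `predicted` counts as a miss.
    """
    if not reference:
        raise ValueError("reference must contain at least one ticker")
    scores = {}
    for field in fields:
        hits = sum(
            1
            for ticker, truth in reference.items()
            if predicted.get(ticker, {}).get(field) == truth[field]
        )
        scores[field] = hits / len(reference)
    return scores


# Placeholder values only: these are not measured results.
reference = {
    "AAA": {"founding_year": 1950, "ceos_since_2010": 2},
    "BBB": {"founding_year": 2004, "ceos_since_2010": 1},
}
model_output = {
    "AAA": {"founding_year": 1950, "ceos_since_2010": 3},
    "BBB": {"founding_year": 2004, "ceos_since_2010": 1},
}

scores = field_accuracy(model_output, reference,
                        ["founding_year", "ceos_since_2010"])
```

Scoring each field separately makes it visible when a model handles easy lookups well but struggles with harder questions, which an overall average would hide.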

Practical Applications and Future Potential

This approach to data collection offers several advantages:

  • Flexibility to gather information based on specific investment theses
  • Access to unstructured data points not available in standard financial APIs
  • Ability to create proprietary datasets that may provide market insights

While validation remains important, this technology is rapidly improving. The current transition period – as these tools evolve from somewhat unreliable to highly accurate – presents a prime opportunity for early adopters to gain advantages before such methods become optimized and widely adopted.

Accuracy can be improved by including specific time references in prompts (for example, an explicit "as of" date) and by validating critical data points against external sources.
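
Both ideas can be sketched in a few lines. The example below pins a prompt to an explicit date and applies simple plausibility checks that flag records for external verification. The external lookup itself is not shown, and the function names, record keys, and prompt wording are all hypothetical.

```python
from datetime import date


def dated_prompt(ticker: str, as_of: date) -> str:
    """Pin the question to an explicit date so 'current' is unambiguous."""
    return (
        f"As of {as_of.isoformat()}, who is the CEO of the company with "
        f"ticker {ticker}, and in what year did their tenure begin?"
    )


def needs_review(record: dict, as_of: date) -> list[str]:
    """Return the reasons a record should be checked against an external source."""
    reasons = []
    start = record.get("ceo_start_year")
    founded = record.get("founding_year")
    if start is None or founded is None:
        reasons.append("missing value")
    else:
        if start > as_of.year:
            reasons.append("CEO start year is in the future")
        if start < founded:
            reasons.append("CEO start year precedes founding year")
    return reasons


# Placeholder record, not real data.
as_of = date(2025, 1, 31)
prompt = dated_prompt("AAA", as_of)
flags = needs_review({"founding_year": 2004, "ceo_start_year": 1999}, as_of)
```

An empty list from `needs_review` only means the record passed these internal checks, not that it is correct, so critical values would still need to be confirmed independently.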
