Building a Bus Ticket Price Comparison App with Web Scraping
Web scraping can be incredibly powerful when applied to practical problems. A recent project demonstrates how scraping multiple bus service websites can be transformed into a useful mobile application that saves users money.
The Get Me Home App
The ‘Get Me Home’ app was developed to help users find the cheapest bus tickets between locations in the Northeastern United States. The application scraped data from three different bus services: Megabus, Flixbus, and Arabus.
Though it’s no longer maintained or available on the App Store, the app achieved over 100 downloads during its lifetime. The project serves as an excellent case study of web scraping techniques in a real-world application.
Technical Architecture
The project consists of two main components:
- Backend: Written in Python using the Flask framework
- Frontend: Developed with Swift and SwiftUI for iOS
Backend Implementation
The backend architecture is relatively straightforward, consisting of four main files with a single API endpoint:
The primary route, get_trips, determines whether to retrieve data from all bus services or from a single specified service, then calls the appropriate helper functions.
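The dispatch described above can be sketched in plain Python. The service names come from the article, but the helper function names and signatures here are assumptions (the Flask routing decorator is omitted to keep the sketch self-contained):

```python
# Minimal sketch of the get_trips dispatch logic. Helper names and
# signatures are assumptions; the real scrapers are described below.

def get_megabus_trips(origin, destination, date):
    # Stand-in for the Megabus scraper.
    return [{"service": "Megabus", "origin": origin, "destination": destination}]

def get_flixbus_trips(origin, destination, date):
    # Stand-in for the Flixbus scraper.
    return [{"service": "Flixbus", "origin": origin, "destination": destination}]

def get_arabus_trips(origin, destination, date):
    # Stand-in for the Arabus scraper.
    return [{"service": "Arabus", "origin": origin, "destination": destination}]

SCRAPERS = {
    "megabus": get_megabus_trips,
    "flixbus": get_flixbus_trips,
    "arabus": get_arabus_trips,
}

def get_trips(origin, destination, date, service=None):
    """Return trips from one service if requested, otherwise from all of them."""
    if service is not None:
        return SCRAPERS[service](origin, destination, date)
    trips = []
    for scraper in SCRAPERS.values():
        trips.extend(scraper(origin, destination, date))
    return trips
```

In the real backend this function would sit behind the Flask endpoint, with the service name arriving as a query parameter.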
Asynchronous Processing
One of the most significant optimizations was implementing asynchronous requests. This change reduced the response time from 30 seconds to just 2 seconds when retrieving data from all three services simultaneously.
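The speedup comes from the three service requests overlapping rather than running back to back. A minimal sketch with asyncio.gather, where real HTTP calls (e.g. via an async HTTP client) are replaced by asyncio.sleep so the example is self-contained:

```python
# Sketch of the asynchronous fan-out. Real network calls are simulated
# with asyncio.sleep; the point is that total time is roughly the
# slowest single request, not the sum of all three.
import asyncio

async def fetch_service(name: str, delay: float) -> dict:
    # Stand-in for an HTTP request to one bus service.
    await asyncio.sleep(delay)
    return {"service": name, "trips": []}

async def fetch_all():
    # All three requests run concurrently.
    return await asyncio.gather(
        fetch_service("Megabus", 0.1),
        fetch_service("Flixbus", 0.1),
        fetch_service("Arabus", 0.1),
    )

results = asyncio.run(fetch_all())
```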
Scraping Different APIs
Each bus service required a different approach to data extraction:
Flixbus
The simplest implementation used formatted requests to the service’s existing API endpoints. After creating the appropriate URL with location and date parameters, the code makes requests to get trip data and intermediate stations, then formats everything into a standardized trip object.
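Building such a formatted request URL might look like the following. The endpoint path and parameter names are illustrative placeholders, not Flixbus's actual API contract:

```python
# Sketch of assembling a search URL from location and date parameters.
# The base URL and parameter names are made-up placeholders.
from urllib.parse import urlencode

BASE_URL = "https://example-bus-api.com/search"  # placeholder endpoint

def build_search_url(origin_id: str, destination_id: str, date: str) -> str:
    params = {"from": origin_id, "to": destination_id, "departure_date": date}
    return f"{BASE_URL}?{urlencode(params)}"

url = build_search_url("1374", "1436", "2024-05-01")
# The scraper would then GET this URL, read the JSON response, and map
# each result into the standardized trip object described later.
```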
Megabus
Similar to Flixbus, the Megabus implementation required crafting the appropriate API URL and a separate link that would allow users to purchase tickets directly. The response data is parsed and organized into the common trip object format.
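The two pieces the Megabus scraper produces, an API-style search URL plus a separate purchase link for the user, can be sketched as below. Both URL shapes are placeholders, not Megabus's real routes:

```python
# Sketch: one query string feeds both the scraped API URL and the
# user-facing purchase link. All URL shapes here are placeholders.
from urllib.parse import urlencode

def build_megabus_links(origin_id: str, destination_id: str, date: str) -> tuple[str, str]:
    query = urlencode({"originId": origin_id, "destinationId": destination_id, "departureDate": date})
    api_url = f"https://example-megabus-api.com/journeys?{query}"      # scraped for trip data
    buy_url = f"https://example-megabus.com/journey-planner?{query}"   # handed to the user
    return api_url, buy_url

api_url, buy_url = build_megabus_links("123", "142", "2024-05-01")
```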
Arabus – HTML Scraping Challenge
Unlike the other services, Arabus doesn’t expose a public API that returns JSON data. Instead, the developer had to extract information from the HTML response:
- The code searches for a specific variable called ‘default_search’ in the HTML
- This variable contains a large string with all the necessary trip data
- Beautiful Soup is used to parse this information
- The extracted data is then converted to a JSON object
- Finally, the JSON data is formatted into the standardized trip object
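The extraction steps above can be sketched as follows. The article used Beautiful Soup to locate the data; this self-contained version uses a regex over a made-up HTML snippet to show the same idea of finding the ‘default_search’ variable and decoding its value as JSON:

```python
# Sketch of the Arabus HTML extraction. The HTML snippet and field
# names are invented; the original code used Beautiful Soup where this
# sketch uses a regex.
import json
import re

html = """
<html><script>
var default_search = '{"trips": [{"id": "a1", "departure": "Ithaca", "arrival": "New York"}]}';
</script></html>
"""

# Locate the 'default_search' variable and capture its string value.
match = re.search(r"default_search\s*=\s*'(.*?)';", html, re.DOTALL)
data = json.loads(match.group(1))  # the large string becomes a JSON object
trips = data["trips"]              # ready to map into standardized trip objects
```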
Arabus also has an interesting feature that automatically suggests alternative nearby routes if no direct trips are available for the requested locations.
Trip Object Structure
Each trip, regardless of the source, is standardized into a common format containing:
- Intermediate stations
- Direct link to purchase tickets
- Trip date
- Trip ID
- Arrival location
- Departure time
- Departure location
- Coordinates (for map display)
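A standardized trip object with these fields might be modeled as a dataclass. The field names and example values below are assumptions based on the list above, not the app's actual schema:

```python
# Sketch of the common trip object. Field names are assumptions drawn
# from the list of contents above.
from dataclasses import dataclass, field

@dataclass
class Trip:
    trip_id: str
    date: str
    departure_location: str
    departure_time: str
    arrival_location: str
    purchase_link: str                 # direct link to buy tickets
    coordinates: tuple[float, float]   # for map display
    intermediate_stations: list[str] = field(default_factory=list)

trip = Trip(
    trip_id="mb-001",
    date="2024-05-01",
    departure_location="Ithaca, NY",
    departure_time="08:30",
    arrival_location="New York, NY",
    purchase_link="https://example.com/buy/mb-001",
    coordinates=(42.4440, -76.5019),
    intermediate_stations=["Binghamton, NY"],
)
```

Normalizing every service's response into one shape like this is what lets the iOS frontend render all three sources in a single list.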
Hosting
The backend was hosted on Render, a cloud hosting service that offers both free and paid plans. The developer initially used the free plan for development, then upgraded to a paid plan when the app was published, and finally reverted to the free tier when the app was no longer being maintained.
Render provided valuable metrics, logs, and insights that helped monitor the application’s performance.
Conclusion
This project demonstrates how web scraping can be leveraged to create practical applications that provide real value to users. By combining data from multiple sources and presenting it in a unified, user-friendly interface, the developer was able to create a tool that helped people find the most affordable transportation options.
The combination of asynchronous processing, different scraping techniques, and mobile app development showcases the versatility and power of web scraping beyond simple data collection.