How to View and Manage Data Sets in Crawler Manager
Crawler Manager provides tools for viewing, analyzing, and managing the data sets produced by your web crawls. This guide covers how to open a data set, customize the table view, switch between runs, download data, and review reports and logs.
Accessing Your Data Sets
To begin working with your data sets, go to the dashboard and click “Manage Project,” then select your project from the list. Click the menu icon and choose “Crawlers.” Then click the crawler you want to examine to open its details page.
The details page summarizes your crawler’s status and activity history. From here, you can open the data set page, which shows statistics for a specific run, including the total number of records pulled, along with the records themselves in tabular format.
Working with Data Set Tables
You can work with the data set table in several ways:
- Adjust column widths by dragging the borders
- Copy text or URLs directly from the table
- Scroll horizontally and vertically to explore the entire data set
- Use the search bar to locate specific terms or values within the data
Customizing Column Display
To filter and display only relevant columns, click the list icon to open a dialog showing all column headers with checkboxes. Select only the columns you wish to view, then click “Save” to apply your selection and update the data set view.
Switching Between Runs
To compare data sets from different crawler runs, click the date and timestamp to open a calendar. Dates with crawl activity are highlighted with pink circles, and color-coded dots indicate run status: green for successful runs and red for failed ones.
Select a date to view available run times, then click on a specific time to switch to that data set. The view will refresh with records from your selected run, allowing you to compare statistics like record count across different crawling sessions.
Downloading Data Sets
To export your data for external analysis, locate the download icon on the right side of the interface, below the date and timestamp. Click this icon and select your preferred format to initiate the download.
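Once you have downloaded the data sets for two runs, you can also compare their record counts outside the interface. The following is a minimal sketch, assuming the export format you chose is CSV with a header row; the function names and file names are hypothetical and are not part of Crawler Manager.

```python
import csv
from pathlib import Path


def count_records(csv_path):
    """Return the number of data rows (header excluded) in an exported CSV."""
    # Assumption: the export is a CSV file whose first line is a header row.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def compare_runs(previous_path, current_path):
    """Return the record counts of two exported runs and the change between them."""
    previous = count_records(previous_path)
    current = count_records(current_path)
    return {"previous": previous, "current": current, "change": current - previous}
```

For example, `compare_runs("run_monday.csv", "run_tuesday.csv")` (hypothetical file names) returns a dictionary with both counts and their difference. A large drop between runs may be worth checking against the crawler logs.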
Analyzing Performance and Data Quality
Crawler Manager’s reports describe your data quality and crawler performance. Open the “Reports” section to view metrics such as:
- Overall fill rate
- Total row count
- Field completion rates
- Unique values per field
- Data distribution across the data set
Switch between report tabs to explore different aspects of your data’s completeness and structure. The tabs include heat maps and crawler performance reports.
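To make these metrics concrete, here is a rough sketch of how similar numbers could be computed from a downloaded data set. It assumes a CSV export with a header row and uses simple definitions chosen for illustration: a field counts as filled when its value is non-empty after trimming whitespace, and the overall fill rate is filled cells divided by total cells. Crawler Manager may define or calculate its report values differently, and the `summarize` function is hypothetical.

```python
import csv
from pathlib import Path


def summarize(csv_path):
    """Compute simple completeness metrics for an exported CSV data set."""
    # Assumption: CSV export with a header row naming each field.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fields = reader.fieldnames or []
        rows = list(reader)

    filled = {name: 0 for name in fields}      # non-empty cells per field
    unique = {name: set() for name in fields}  # distinct non-empty values per field
    for row in rows:
        for name in fields:
            value = (row.get(name) or "").strip()
            if value:
                filled[name] += 1
                unique[name].add(value)

    row_count = len(rows)
    total_cells = row_count * len(fields)
    return {
        "row_count": row_count,
        "overall_fill_rate": sum(filled.values()) / total_cells if total_cells else 0.0,
        "field_completion": {
            name: filled[name] / row_count if row_count else 0.0 for name in fields
        },
        "unique_values": {name: len(unique[name]) for name in fields},
    }
```

Calling `summarize("export.csv")` (a hypothetical file name) returns the row count, an overall fill rate between 0 and 1, a completion rate per field, and the number of distinct values per field.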
Reviewing Crawler Logs
For troubleshooting or analyzing crawler behavior, click on “Logs” to review detailed crawl logs. This feature is particularly helpful for debugging failed runs. You can scroll through all logs and download them in .txt format by clicking the download icon below the timestamp.
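After downloading a log file, you can filter it for lines of interest instead of scrolling through the whole file. This is only a sketch: the log format is not described in this guide, so the default keywords below are guesses that you should adjust to match what your logs actually contain, and `find_matching_lines` is a hypothetical helper.

```python
from pathlib import Path


def find_matching_lines(log_path, keywords=("error", "failed")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    # Assumption: the keywords are placeholders; matching is case-insensitive.
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(keyword in line.lower() for keyword in lowered):
                matches.append((number, line.rstrip("\n")))
    return matches
```

The line numbers in the result make it easier to jump back to the surrounding context in the full log.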
With these tools at your disposal, you can effectively manage, analyze, and extract value from the data collected by your web crawlers.