Extracting Web Data in Power BI: A Step-by-Step Guide
Power BI can pull data from many kinds of sources, including web pages. With the web connector, you can read tables directly from a website and turn them into datasets ready for analysis.
Connecting to Web Data Sources
Connecting to web data follows the same pattern as other data connections in Power BI. Click the “Get Data” button, then select “More” to see the full list of connectors. Find the web connector by scrolling through the list or by typing its name in the search box.
After selecting the web connector and clicking “Connect,” enter the URL of the web page that contains the data you want to extract. The example in this guide uses a page listing FIFA World Cup winners.
Automatic Table Identification
Power BI can automatically identify and extract tables from web pages. After you provide the URL, it analyzes the page’s HTML structure and lists the tables it finds.
The system offers two viewing options for each table:
- Tabular format – Displays the data in a structured table format ready for analysis
- Web view – Shows how the data appears on the original web page
This dual-view capability allows users to verify that the correct data has been extracted without having to switch between applications.
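To make the idea of table identification concrete, here is a minimal Python sketch of how tables can be located in a page's HTML. It is an illustration only, not Power BI's actual implementation: the class name `TableCollector`, the helper `extract_tables`, and the inline `SAMPLE_PAGE` markup are all assumptions made for this example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._rows = None   # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def extract_tables(html):
    """Return all tables found in the HTML string (hypothetical helper)."""
    parser = TableCollector()
    parser.feed(html)
    return parser.tables


# Hand-written sample markup standing in for a real web page.
SAMPLE_PAGE = """
<html><body>
<table>
  <tr><th>Year</th><th>Winner</th></tr>
  <tr><td>2014</td><td>Germany</td></tr>
  <tr><td>2018</td><td>France</td></tr>
</table>
<table>
  <tr><th>Team</th><th>Titles</th></tr>
  <tr><td>Brazil</td><td>5</td></tr>
</table>
</body></html>
"""

tables = extract_tables(SAMPLE_PAGE)
print(len(tables), "tables found")
for row in tables[0]:
    print(row)
```

Each entry in `tables` is one candidate table, much like the list of tables Power BI offers for selection after it has scanned the page.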
Transforming Web Data in Power Query Editor
After selecting the desired table, click “Transform Data” to open the Power Query Editor, where the data can be refined. Power BI automatically applies several steps to prepare the data:
- Extracts tables from HTML
- Promotes the first row to headers
- Sets each column’s data type based on its contents
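The last two automatic steps can be sketched in Python to show what they do to the data. This is a simplified illustration under stated assumptions: the function names `promote_headers` and `change_types` are hypothetical, the sample rows are hand-written, and the type rule (convert a column to integers only when every value is a whole number) is far cruder than Power BI's own type detection.

```python
def promote_headers(rows):
    """Use the first row as column names and return a list of dict records."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def change_types(records):
    """Convert a column to int only when every value in it is a whole number."""
    if not records:
        return records
    numeric = [
        name for name in records[0]
        if all(rec[name].strip().isdecimal() for rec in records)
    ]
    return [
        {name: int(value) if name in numeric else value
         for name, value in rec.items()}
        for rec in records
    ]


# Raw rows as they might come out of an HTML table: everything is text,
# and the header is just the first row.
raw_rows = [
    ["Year", "Winner"],
    ["2014", "Germany"],
    ["2018", "France"],
]

records = change_types(promote_headers(raw_rows))
print(records)
```

After both steps, the `Year` values are integers and the `Winner` values remain text, which mirrors the effect of promoting headers and then changing column types.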
From here, users can apply further transformations as needed. Two tabs in the editor do most of this work: Transform and Add Column.
Transform vs. Add Column
The Transform tab modifies existing columns in place, while the Add Column tab creates new columns derived from existing data. For example, with the FIFA World Cup data, you can pull just the final score out of a column that also contains other information.
To extract specific characters from a text column:
- Select the column containing the data
- Click on the Add Column tab
- Choose “Extract” and then “First Characters”
- Specify the number of characters to extract (in this case, 3, which covers the score)
- Rename the new column to something meaningful like “Final Score”
Each of these actions appears as a step in the Applied Steps pane, allowing users to track and modify their data transformation process.
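The “First Characters” extraction described above amounts to taking a fixed-length prefix of each value and storing it in a new column. A minimal Python sketch of that idea follows; the function name `add_first_characters`, the `Result` column name, and the layout of the sample values are assumptions made for illustration, not taken from the page used in the guide.

```python
def add_first_characters(records, source, new_name, count):
    """Return new records with an extra column holding the first `count`
    characters of the `source` column. The input records are not modified."""
    return [{**rec, new_name: rec[source][:count]} for rec in records]


# Illustrative rows: the score comes first, followed by extra information.
finals = [
    {"Year": 2014, "Result": "1-0 (a.e.t.)"},
    {"Year": 2018, "Result": "4-2"},
]

with_score = add_first_characters(finals, "Result", "Final Score", 3)
for rec in with_score:
    print(rec["Year"], rec["Final Score"])
```

Like the Add Column tab, this leaves the source column untouched and adds a new one. A fixed length of 3 only works while every score has single-digit values on both sides; a value such as "10-1" would be cut short.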
Finalizing Your Data
Once all necessary transformations are complete, click “Close & Apply” to return to Power BI Desktop. The extracted and transformed web data is then available for visualization and analysis through the Report, Data, and Model views.
Web data extraction widens the range of sources available for business intelligence analysis, letting organizations bring publicly available web data into their decision-making.