Web Scraping Stock Data Using Regular Expressions and Node.js

Web scraping is a technique for extracting data from websites. In Node.js, regular expressions offer a lightweight way to pull specific values out of HTML or XML content without a full parser. This article shows how to scrape stock prices from a financial website.

Setting Up the Scraper

The foundation of our web scraper is an asynchronous JavaScript function that fetches HTML content from a target website. To begin, we create a basic lookup function and a self-executing anonymous async function that serves as the entry point for our application:

<code>const lookupStock = async (stock) => {
  // Function code will go here
}

// Main function (self-executing async function)
(async () => {
  // This is where we'll run everything
})();</code>

Fetching Stock Data

To retrieve stock information, we need to identify the URL structure for the financial data we want to access. For this example, we’ll use Google Finance to get stock quotes from NASDAQ.

First, we define the URL for fetching stock data:

<code>const url = `https://www.google.com/finance/quote/${stock}:NASDAQ`;</code>
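The URL above interpolates the symbol directly and fixes the exchange to NASDAQ. As a minimal sketch, a small helper can normalize the symbol and make the exchange configurable. The name buildQuoteUrl and the exchange parameter are assumptions for illustration and are not part of the original code:

```javascript
// Hypothetical helper: builds the quote URL from a symbol and an exchange code.
// The exchange defaults to "NASDAQ", matching the URL used in this article.
const buildQuoteUrl = (symbol, exchange = "NASDAQ") => {
  // Trim and upper-case the symbol, then encode both parts so that
  // unexpected characters cannot alter the URL path.
  const ticker = encodeURIComponent(symbol.trim().toUpperCase());
  const market = encodeURIComponent(exchange.trim().toUpperCase());
  return `https://www.google.com/finance/quote/${ticker}:${market}`;
};

console.log(buildQuoteUrl(" nvda ")); // https://www.google.com/finance/quote/NVDA:NASDAQ
```

Whether a given exchange code resolves to a real page depends on the site, so any value other than NASDAQ should be checked by hand first.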

Next, we use the fetch API (available as a global in Node.js 18 and later) to retrieve the HTML content from the webpage:

<code>const response = await fetch(url);
const html = await response.text();</code>
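The two lines above assume the request succeeds. A slightly more defensive sketch checks the HTTP status before reading the body. The helper name fetchHtml and its injectable fetchImpl parameter are assumptions added here so the function can be tried without network access:

```javascript
// Hypothetical helper: fetches a page and returns its HTML as a string.
// fetchImpl defaults to the global fetch but can be swapped for a stub.
const fetchHtml = async (url, fetchImpl = fetch) => {
  const response = await fetchImpl(url);
  // response.ok is true only for 2xx status codes.
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  return response.text();
};
```

Inside lookupStock, this could stand in for the two fetch lines, for example `const html = await fetchHtml(url);`.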

Extracting Price Information with Regular Expressions

The key to effective web scraping is identifying the patterns in the HTML that contain the data we want. Regular expressions provide a powerful way to target specific elements within the page.

To extract the stock price, we use a combination of lookbehind and lookahead assertions:

<code>const priceMatch = html.match(/(?<=data-last-price=").*?(?=" data-normal-market-price-stamp)/) || ["Not found"];
return priceMatch[0];</code>

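This regular expression matches the text between data-last-price=" and " data-normal-market-price-stamp, which contains the current stock price in Google Finance’s HTML structure. Because the pattern depends on these two attributes appearing next to each other, it stops matching if the page's markup changes, in which case the function returns "Not found".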
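To see the pattern at work without hitting the live site, it can be run against a small sample string. The markup below is invented for illustration and only mimics the two attributes named above. This sketch also converts the result to a number and returns null instead of "Not found", which is a design choice made here rather than part of the original code:

```javascript
// Invented sample markup; the real page's HTML may differ.
const sampleHtml =
  '<div data-last-price="123.45" data-normal-market-price-stamp="1700000000">$123.45</div>';

// Same lookbehind/lookahead pattern as in the article.
const extractPrice = (html) => {
  const match = html.match(
    /(?<=data-last-price=").*?(?=" data-normal-market-price-stamp)/
  );
  // Return a number when the pattern matches, null otherwise.
  return match ? Number(match[0]) : null;
};

console.log(extractPrice(sampleHtml)); // 123.45
console.log(extractPrice("<div>no price here</div>")); // null
```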

Processing Multiple Stocks

To scale our scraper for multiple stocks, we can create an array of stock symbols inside the main function and map over them:

<code>const stocks = ["NVDA", "AAPL", "MSFT"];

const stockPrices = stocks.map(stock => lookupStock(stock));

// lookupStock is async, so map() yields an array of promises; wait for all of them
const prices = await Promise.all(stockPrices);
console.log(prices);</code>

By using Promise.all(), we start all stock lookups concurrently and wait for every result before proceeding. If any single lookup rejects, Promise.all() rejects as well and the other results are not returned.
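When one failed lookup should not discard the others, Promise.allSettled() is an alternative to Promise.all(). The sketch below swaps in that technique. The names lookupAll and fakeLookup are hypothetical, and fakeLookup is only a stand-in for lookupStock so the example runs offline:

```javascript
// Stand-in for lookupStock; the symbol "FAIL" simulates a failed request.
const fakeLookup = async (symbol) => {
  if (symbol === "FAIL") throw new Error(`lookup failed for ${symbol}`);
  return `${symbol}-price`;
};

// Runs all lookups concurrently and reports each outcome separately.
const lookupAll = async (symbols, lookup) => {
  const settled = await Promise.allSettled(symbols.map((s) => lookup(s)));
  return settled.map((result, i) => ({
    symbol: symbols[i],
    price: result.status === "fulfilled" ? result.value : null,
    error: result.status === "rejected" ? result.reason.message : null,
  }));
};

lookupAll(["NVDA", "FAIL"], fakeLookup).then((rows) => console.log(rows));
```

Each entry in the returned array carries either a price or an error message, so the caller can decide how to handle partial results.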

Understanding the Regular Expression Components

Let’s break down the regular expression used:

Lookbehind assertion: (?<=data-last-price=") requires the match to be immediately preceded by this text, without including it in the result
.*?: Matches any character except a line break (.) any number of times (*), but as few as possible (? makes it non-greedy)
Lookahead assertion: (?=" data-normal-market-price-stamp) requires the match to be immediately followed by this text, without including it in the result

This technique ensures we extract only the price value, excluding the surrounding HTML structure.
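For comparison, the same value can be captured with a plain capture group instead of lookaround assertions. This is an alternative sketch, not the approach used above. It reads everything up to the closing quote of data-last-price, so it does not depend on which attribute follows:

```javascript
// Alternative pattern: capture the attribute value up to its closing quote.
const PRICE_ATTRIBUTE = /data-last-price="([^"]*)"/;

const extractWithCaptureGroup = (html) => {
  const match = PRICE_ATTRIBUTE.exec(html);
  // match[0] is the whole attribute; match[1] is just the captured value.
  return match ? match[1] : "Not found";
};

// Invented sample markup with the attributes in a different order.
const reordered = '<div data-other="x" data-last-price="98.76" class="quote"></div>';
console.log(extractWithCaptureGroup(reordered)); // 98.76
```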

Practical Applications

This web scraping approach can be extended to create various useful tools:

Daily stock price notifications via email or SMS
Price movement alerts when stocks reach certain thresholds
Historical price tracking for analysis
Portfolio monitoring across multiple exchanges

With minor modifications to the regular expressions, you can extract additional data points such as market cap, trading volume, or price changes.
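One way to generalize the extraction is a helper that takes the attribute name as a parameter. This is a sketch under assumptions: extractAttribute is a hypothetical name, and data-currency-code in the sample is an invented attribute used only to show the idea. The attributes that actually carry market cap or volume would need to be found by inspecting the page source:

```javascript
// Escape characters that have special meaning in a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: returns the value of the named attribute, or null.
const extractAttribute = (html, name) => {
  // Require whitespace before the name so "x-data-foo" does not match "data-foo".
  const pattern = new RegExp(`\\s${escapeRegExp(name)}="([^"]*)"`);
  const match = pattern.exec(html);
  return match ? match[1] : null;
};

// Invented sample markup; data-currency-code is an illustrative attribute.
const sample = '<div class="quote" data-last-price="123.45" data-currency-code="USD"></div>';
console.log(extractAttribute(sample, "data-last-price")); // 123.45
console.log(extractAttribute(sample, "data-currency-code")); // USD
```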

Conclusion

Web scraping with regular expressions and Node.js provides a flexible and efficient method for extracting specific data from websites. By understanding the structure of the target HTML and crafting appropriate regular expressions, you can build powerful tools to gather and analyze financial information or any other web-based data.