How to Scrape Instagram Profiles and Posts: A Comprehensive Guide

Instagram data scraping is a valuable skill for data analysis and research. This article outlines two methods for extracting profile information, a standard GraphQL approach and an optimized one, followed by a method for extracting post data.

Method 1: Scraping Instagram Profiles Using GraphQL

The first approach queries Instagram’s GraphQL endpoint using values that must first be extracted from the profile page’s HTML. The steps are:

  1. Obtain the LSD token (a Facebook security token) from the HTML
  2. Extract the user ID from the page data structure
  3. Make a request to the GraphQL endpoint using these values

This approach works reliably but requires parsing complex nested JSON structures in the page source. The key challenge is finding the user ID, which is deeply embedded in the page’s JavaScript objects.

Finding the LSD Token

The LSD token acts as an authentication mechanism and can be found in the HTML source:

  • Use a library like Cheerio to parse the HTML
  • Target the script tag containing the token
  • Extract its value, which must be sent with each API request
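As a rough illustration, the Python sketch below pulls the token out of raw page HTML. It swaps Cheerio for a plain regular expression, and it assumes the token appears in the page source as a fragment like "LSD",[],{"token":"..."}. That shape is an assumption about how the page embeds the token, not a documented format, so check it against a page you have actually fetched.

```python
import re
from typing import Optional

# Assumed token shape: "LSD",[],{"token":"<value>"} somewhere in an inline script.
# Instagram does not document this, so treat the pattern as something to re-check.
_LSD_PATTERN = re.compile(r'"LSD"\s*,\s*\[\]\s*,\s*\{\s*"token"\s*:\s*"([^"]+)"')


def extract_lsd_token(html: str) -> Optional[str]:
    """Return the first LSD token found in the page HTML, or None if none matches."""
    match = _LSD_PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = '<script>requireLazy(["LSD",[],{"token":"AVqExample"},323])</script>'
    print(extract_lsd_token(page))  # AVqExample
```

If you already parse the page with an HTML library, the same pattern can be applied to the text of the script tags alone instead of the whole document.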

While this token eventually expires (typically after several hours), a single token can be used for multiple requests before needing renewal.
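Because one token serves many requests, it is worth caching. A minimal sketch, assuming a caller-supplied fetch function (for example, one that downloads a page and extracts the token) and a hypothetical three-hour lifetime chosen to sit inside the "several hours" window described above:

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse one token for many requests and refetch it after a fixed lifetime."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 3 * 60 * 60,  # hypothetical lifetime; tune to what you observe
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if none exists or it has aged out."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._ttl:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def invalidate(self) -> None:
        """Force a refetch on the next get(), e.g. after a request is rejected."""
        self._token = None
```

The injectable clock exists only to make the expiry logic easy to test; in normal use the default time.monotonic is enough.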

Extracting the User ID

The user ID is necessary for the GraphQL request but isn’t directly visible in the URL. To find it:

  • Parse the JavaScript data embedded in the page
  • Use recursive functions to navigate the nested JSON structure
  • Look for patterns like “Polaris profile nested content route”
  • Extract the ID from the props object
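The search can be written as a generic recursive walk. This is a sketch under two assumptions: that the page data lives in script tags of type "application/json", and that the entry you want is a list or object that mentions the route marker and also contains an object with a "props" key holding an "id". The marker string is left as a parameter, since the article names it only loosely.

```python
import json
import re
from typing import Any, Iterator, Optional

# Assumption: embedded page data sits in <script type="application/json"> tags.
_JSON_SCRIPT = re.compile(r'<script type="application/json"[^>]*>(.*?)</script>', re.DOTALL)


def iter_dicts(node: Any) -> Iterator[dict]:
    """Yield every dict nested anywhere inside parsed JSON."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def _mentions(node: Any, marker: str) -> bool:
    """True if a direct string key or value of this container contains the marker."""
    if isinstance(node, dict):
        items = [*node.keys(), *node.values()]
    elif isinstance(node, list):
        items = node
    else:
        return False
    return any(isinstance(item, str) and marker in item for item in items)


def find_user_id(node: Any, marker: str) -> Optional[str]:
    """Depth-first search for a container that mentions the marker and holds props.id."""
    if not isinstance(node, (dict, list)):
        return None
    if _mentions(node, marker):
        for candidate in iter_dicts(node):
            props = candidate.get("props")
            if isinstance(props, dict) and "id" in props:
                return str(props["id"])
    children = node.values() if isinstance(node, dict) else node
    for child in children:
        found = find_user_id(child, marker)
        if found is not None:
            return found
    return None


def find_user_id_in_html(html: str, marker: str) -> Optional[str]:
    """Parse each JSON script block and return the first user ID found."""
    for raw in _JSON_SCRIPT.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        found = find_user_id(data, marker)
        if found is not None:
            return found
    return None
```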

Once you have both the LSD token and user ID, you can make successful GraphQL requests to retrieve detailed profile information.
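A hedged sketch of assembling that request with Python's standard library follows. The endpoint URL, the form field names ("lsd", "doc_id", "variables") and the "X-FB-LSD" header are assumptions based on what Instagram's web client is commonly seen sending. The doc_id that identifies the query is a hypothetical input you would have to capture from the web client yourself, and the contents of "variables" (for example, the user ID) depend on that query. The function only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint path, form fields and the X-FB-LSD header reflect what the
# Instagram web client is commonly observed to send; none of this is documented.
GRAPHQL_URL = "https://www.instagram.com/api/graphql"


def build_graphql_request(lsd: str, doc_id: str, variables: dict) -> urllib.request.Request:
    """Build, without sending, a form-encoded POST to the GraphQL endpoint."""
    body = urllib.parse.urlencode(
        {
            "lsd": lsd,
            "doc_id": doc_id,  # hypothetical: ID of the stored query, captured from the web client
            "variables": json.dumps(variables, separators=(",", ":")),
        }
    ).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "X-FB-LSD": lsd,  # assumed: the token is sent in a header as well as the body
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_graphql_request("LSD_TOKEN", "DOC_ID", {"id": "USER_ID"})
    print(req.get_method(), req.full_url)
```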

Method 2: The Optimized Approach (i.instagram.com)

A significantly more efficient method uses Instagram’s internal API endpoint:

i.instagram.com/api/v1/users/web_profile_info/?username=[handle]

This approach requires:

  • An X-IG-App-ID header
  • A mobile/iOS user agent
  • Ideally, a residential proxy IP
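With Python's standard library, the request described above can be prepared roughly as follows. This is a sketch: the App ID and user-agent string are deliberately left as parameters rather than filled in, so copy current values from a real session. Proxy configuration is omitted.

```python
import json
import urllib.parse
import urllib.request

PROFILE_INFO_URL = "https://i.instagram.com/api/v1/users/web_profile_info/"


def build_profile_request(username: str, app_id: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET request for one profile; app_id and user_agent come from the caller."""
    query = urllib.parse.urlencode({"username": username})
    return urllib.request.Request(
        f"{PROFILE_INFO_URL}?{query}",
        headers={
            "X-IG-App-ID": app_id,     # required header described above
            "User-Agent": user_agent,  # a mobile/iOS user-agent string
        },
    )


def fetch_profile(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (raises urllib.error.HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```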

The advantages of this method are substantial:

  • Much faster response times
  • Less likely to be rate-limited
  • Returns additional data including recent posts and related accounts
  • No need to extract complex tokens from HTML

The response includes comprehensive profile information, the user’s most recent posts (including media URLs), and related accounts, which makes it well suited to larger data collection projects.
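To show how such a response might be reduced to the fields mentioned above, here is a sketch that assumes the JSON has the shape {"data": {"user": {...}}} with GraphQL-style "edge_..." containers for followers, timeline media and related profiles. Those key names are assumptions about an undocumented response and should be checked against a real payload. Every lookup falls back to an empty value so that a missing key does not raise.

```python
from typing import Any


def _obj(value: Any) -> dict:
    """Treat anything that is not a dict (None, missing) as an empty dict."""
    return value if isinstance(value, dict) else {}


def summarize_profile(payload: dict) -> dict:
    """Extract profile basics, recent posts and related accounts from a response."""
    user = _obj(_obj(payload.get("data")).get("user"))
    posts = _obj(user.get("edge_owner_to_timeline_media")).get("edges") or []
    related = _obj(user.get("edge_related_profiles")).get("edges") or []
    return {
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "followers": _obj(user.get("edge_followed_by")).get("count"),
        "recent_posts": [
            {
                "shortcode": _obj(edge.get("node")).get("shortcode"),
                "media_url": _obj(edge.get("node")).get("display_url"),
            }
            for edge in posts
            if isinstance(edge, dict)
        ],
        "related_accounts": [
            _obj(edge.get("node")).get("username") for edge in related if isinstance(edge, dict)
        ],
    }
```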

Scraping Instagram Posts

For posts, the process mirrors the GraphQL profile method (Method 1):

  1. Extract the shortcode from the post URL
  2. Obtain an LSD token (can be reused from profile scraping)
  3. Make a request to the GraphQL endpoint with these values

The shortcode is visible in the post URL (e.g., instagram.com/p/[shortcode]) and serves as the post’s unique identifier.
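Pulling the shortcode out of a URL is a small parsing step. Below is a sketch using only the standard library. Accepting /reel/ and /tv/ paths alongside /p/ is an assumption that those URL forms place the shortcode in the same position.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# /p/<shortcode> is the form described above; /reel/ and /tv/ are assumed to follow it.
_SHORTCODE_PATH = re.compile(r"^/(?:p|reel|tv)/([A-Za-z0-9_-]+)")


def extract_shortcode(url: str) -> Optional[str]:
    """Return the shortcode from an Instagram post URL, or None if the path has none."""
    if "://" not in url:
        url = "https://" + url  # accept bare "instagram.com/p/..." forms
    match = _SHORTCODE_PATH.match(urlparse(url).path)
    return match.group(1) if match else None
```

Query strings such as ?img_index=1 are ignored because only the URL path is matched.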

Best Practices for Instagram Scraping

To maintain consistent access and avoid blocks:

  • Use residential proxies rather than data center IPs
  • Implement rate limiting in your scraping code
  • Cache the LSD token and refresh it every few hours, for example with a cron job
  • Use mobile user agents when possible
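For the rate-limiting point, one simple option is to enforce a minimum gap between requests. The sketch below is a generic limiter rather than anything Instagram-specific, and the interval is up to you. The clock and sleep functions are injectable only so the timing can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block until at least `interval` seconds have passed since the previous call."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to respect the interval, then record this call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```

Create one limiter and call limiter.wait() immediately before each request.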

By following these methods, you can extract profile information and post data from Instagram for analysis, research, or integration with other systems.
