Understanding HTML Basics: Structure, Tags, and Elements

HTML (HyperText Markup Language) is the foundation of web development and essential knowledge for web scraping projects. This article explores the fundamentals of HTML, breaking down its core components and structure to help you better understand how web pages are constructed.

The Origins of HTML

The internet already carried services such as email and Usenet news when the World Wide Web emerged as a way to share linked documents. In 1990, English physicist Tim Berners-Lee and Belgian computer scientist Robert Cailliau, both working at CERN, proposed the World Wide Web project. HTML, which Berners-Lee designed as the Web's document format, went on to revolutionize how information is shared online.

HTML Tags Explained

HTML tags are the building blocks of web pages. A tag is a name enclosed in angle brackets, such as <p>. Most tags come in pairs, an opening tag and a closing tag that adds a forward slash (<p> and </p>), though some stand alone as single tags. HTML tag names are not case-sensitive, meaning <head>, <HEAD>, and <Head> are all equivalent, although lowercase is the common convention.

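Tags tell the browser how to structure and display content, and they come in two main types:

Empty (single) tags, such as <br> and <img>, which have no content and no closing tag
Container tags, such as <p> and <a>, which wrap content between an opening and a closing tag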
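Because scrapers usually read markup through a parser rather than a browser, it can help to watch these rules in action. The following is a minimal sketch using Python's standard-library html.parser module (a choice of tool assumed here, not something the article requires); the class name TagLogger and the sample markup are illustrative. It logs every start and end tag, showing that mixed-case tag names arrive lowercased and that the empty <br> tag produces no closing event.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every start and end tag the parser encounters, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already converted to lowercase
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
# Mixed-case tags, plus an empty <br> tag inside a container <p> tag
logger.feed("<HEAD></Head><p>Line one<br>Line two</p>")
logger.close()
print(logger.events)
# [('start', 'head'), ('end', 'head'), ('start', 'p'), ('start', 'br'), ('end', 'p')]
```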

HTML Attributes

HTML attributes provide additional information about elements and modify their behavior or appearance. They appear within the opening tag and are usually written as name="value": an attribute name, an equals sign, and a quoted attribute value.

For example, in an image tag: <img src="image.jpg">

img is the tag
src is the attribute name
“image.jpg” is the attribute value

Attributes allow for customization, such as linking to resources (href, src) or changing text color and font size through the style attribute.
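Attributes can be read programmatically in a similar way. This sketch assumes Python's standard-library html.parser and parses the image tag from the example above; the alt attribute and the AttributeCollector name are illustrative additions. The parser hands each tag's attributes over as a list of (name, value) pairs with the quotes already removed, so converting them to a dictionary makes lookups such as attributes["src"] straightforward.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Stores each tag together with its attributes as a dictionary."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; surrounding quotes are removed
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed('<img src="image.jpg" alt="Company logo">')
collector.close()

tag, attributes = collector.found[0]
print(tag)                # img
print(attributes["src"])  # image.jpg
```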

HTML Elements

An HTML element spans everything from its opening tag to its closing tag, including the content between them (an empty element such as <br> consists of the opening tag alone). The complete structure of a container element consists of:

Opening tag (with optional attributes)
Content
Closing tag

For example: <a href="contact.html">Contact Us</a>

In this element, <a href="contact.html"> is the opening tag with an attribute, “Contact Us” is the content, and </a> is the closing tag.
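Since an element is an opening tag, its content, and a closing tag, a scraper can rebuild it by reacting to those three parts in order. The sketch below assumes Python's standard-library html.parser; the LinkExtractor class and the surrounding <p> markup are hypothetical. It captures the href attribute when <a> opens, gathers text while the element is open, and records the pair when </a> closes it.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Pairs each <a> element's href attribute with the text between its tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # Opening tag: remember the attribute and start collecting content
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Content: only kept while an <a href=...> element is open
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Closing tag: the element is complete, so store it
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


parser = LinkExtractor()
parser.feed('<p>Questions? <a href="contact.html">Contact Us</a></p>')
parser.close()
print(parser.links)  # [('contact.html', 'Contact Us')]
```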

The Structure of HTML Documents

HTML documents follow a clear, nested hierarchical structure:

<!DOCTYPE html> declaration (tells the browser the document is HTML; this short form is the HTML5 doctype)
<html> (the root element, which contains everything else)
  <head> (contains metadata such as the title)
    <title> (defines the page title shown in browser tabs)
  <body> (contains the visible content of the page)

Within the body, you can include various elements such as paragraphs, headings, lists, images, and tables.
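The nesting is easier to see with a complete page. The sketch below assumes Python's standard-library html.parser and a made-up minimal document, and prints an indented outline of the element tree: <title> sits inside <head>, while <head> and <body> are siblings inside <html>. The VOID_ELEMENTS set lists the HTML empty elements, which never take a closing tag, so they do not deepen the indentation; OutlinePrinter is an illustrative name.

```python
from html.parser import HTMLParser

# A minimal document following the structure above (content is illustrative)
DOCUMENT = """<!DOCTYPE html>
<html>
  <head>
    <title>My First Page</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>Hello, world!</p>
  </body>
</html>"""

# Empty (void) elements have no closing tag, so they must not increase depth
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class OutlinePrinter(HTMLParser):
    """Builds an indented outline of the element tree, one tag per line."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_decl(self, decl):
        # decl is the text inside <!...>, e.g. "DOCTYPE html"
        self.lines.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1


outline = OutlinePrinter()
outline.feed(DOCUMENT)
outline.close()
print("\n".join(outline.lines))
```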

Common HTML Elements

Text Formatting

<p> – Paragraph
<h1> to <h6> – Headings (<h1> is the top-level heading and is displayed largest by default; <h6> is the lowest level and smallest)
<b> – Bold text
<i> – Italic text
<u> – Underlined text
<br> – Line break
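Headings are a common scraping target because their levels mirror a page's outline. This sketch assumes Python's standard-library html.parser; HeadingCollector and the sample markup are illustrative. It records each heading's level (the digit in h1 through h6) together with its text, and it keeps text inside inline formatting tags such as <i>, since those tags only change how the text looks.

```python
from html.parser import HTMLParser

HEADING_TAGS = {f"h{n}" for n in range(1, 7)}  # h1 ... h6


class HeadingCollector(HTMLParser):
    """Collects (level, text) for every heading from <h1> to <h6>."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])  # "h3" -> 3
            self._text = []

    def handle_data(self, data):
        # Text inside <b>, <i>, etc. still arrives here while a heading is open
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


collector = HeadingCollector()
collector.feed("<h1>Guide</h1><p>Intro with <b>bold</b> text.<br>More.</p>"
               "<h2>Setup</h2><h3>Install <i>Python</i></h3>")
collector.close()
print(collector.headings)  # [(1, 'Guide'), (2, 'Setup'), (3, 'Install Python')]
```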

Lists

<ol> – Ordered list (numbered)
<ul> – Unordered list (bulleted)
<li> – List item
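List items can be collected by noting which kind of list is open when each <li> starts. The sketch assumes Python's standard-library html.parser; ListItemCollector and the recipe-style markup are hypothetical. It handles flat lists only: nested lists, or pages that omit the optional </li> closing tag, would need extra bookkeeping.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text of each <li>, tagged with its list type ("ol" or "ul")."""

    def __init__(self):
        super().__init__()
        self.items = []         # (list_type, text) pairs
        self._list_type = None  # "ol" or "ul" for the list currently open
        self._text = None       # text fragments of the open <li>, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self._list_type = tag
        elif tag == "li":
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._text is not None:
            self.items.append((self._list_type, "".join(self._text).strip()))
            self._text = None


collector = ListItemCollector()
collector.feed("<ol><li>Preheat oven</li><li>Bake</li></ol>"
               "<ul><li>Flour</li><li>Sugar</li></ul>")
collector.close()
print(collector.items)
# [('ol', 'Preheat oven'), ('ol', 'Bake'), ('ul', 'Flour'), ('ul', 'Sugar')]
```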

Tables

HTML tables are structured with these elements:

<table> – Defines the table
<tr> – Table row
<td> – Table data (cell)

Tables allow for organizing content in rows and columns, making them useful for displaying structured data.
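A table's row-and-column layout maps naturally onto a list of lists. The sketch below assumes Python's standard-library html.parser; TableParser and the price data are illustrative, not taken from a real page. It starts a new row at each <tr>, collects the text of each <td>, and closes the row at </tr>. Header cells written with <th> and nested tables are outside the scope of this sketch.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Turns <tr>/<td> markup into a list of rows, each a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags is ignored because no cell is open then
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


table_html = """
<table>
  <tr><td>Name</td><td>Price</td></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""
parser = TableParser()
parser.feed(table_html)
parser.close()
print(parser.rows)
# [['Name', 'Price'], ['Widget', '9.99'], ['Gadget', '24.50']]
```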

Conclusion

Understanding HTML basics is crucial for anyone involved in web development or web scraping. By grasping the concepts of tags, attributes, elements, and document structure, you can better navigate and extract information from websites, making your web scraping projects more efficient and effective.