Part I Chapter 3

Markup

Introduction

As the 2020 chapter said, without HTML there are no web pages, no web sites, no web apps. You can say that without HTML, there’s no Web. That makes HTML one of the most important web standards, if not the most important web standard.

Accordingly, like every year, we used the millions of pages in our data set—7.9 million in the mobile set, 5.4 million in the desktop set, with overlap—to also look at HTML. This chapter doesn’t cover “everything” there is about HTML, so we explicitly encourage you to also analyze the data we gathered and to share your own conclusions—and when you do, tag them: #htmlalmanac.

Document data

There’s much to be curious about when it comes to how we write HTML. Before we get into the contents of the markup itself, let’s have a look at how our HTML documents are declared and sent to our browsers.

Doctypes

Doctype | Desktop | Mobile
html | 88.1% | 90.0%
html -//w3c//dtd xhtml 1.0 transitional//en http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd | 4.7% | 3.9%
No doctype | 3.0% | 2.7%
html -//w3c//dtd xhtml 1.0 strict//en http://www.w3.org/tr/xhtml1/dtd/xhtml1-strict.dtd | 1.2% | 1.1%
html -//w3c//dtd html 4.01 transitional//en http://www.w3.org/tr/html4/loose.dtd | 0.9% | 0.6%
html -//w3c//dtd html 4.01 transitional//en | 0.4% | 0.4%
Figure 3.1. Doctype usage.

Let’s start with doctypes—which one is the most popular? But you know the answer to this one: It’s the short, simple, boring standard HTML doctype, that is, <!DOCTYPE html>.

Figure 3.2. Mobile pages using the standard HTML doctype: 90%.

90% of all mobile pages use it. As the mobile data set is the larger one, this chapter will usually work with that data. Next most popular is XHTML 1.0 Transitional (3.9%, down from 4.6% in 2021). After that come pages with no doctype set at all, at 2.7%, up from 2.5% in 2021.
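To check a single document of your own, a minimal sketch using Python’s standard html.parser module might look like the following. The DoctypeFinder class and the find_doctype function are hypothetical names, and the normalization (lowercasing, dropping quotes and the PUBLIC/SYSTEM keywords) is an assumption meant to resemble the table above, not necessarily how the Web Almanac data was prepared.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Records the first doctype declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", for example "DOCTYPE html".
        tokens = decl.replace('"', " ").replace("'", " ").lower().split()
        if self.doctype is None and tokens and tokens[0] == "doctype":
            # Assumption: drop the PUBLIC/SYSTEM keywords to get a compact label.
            kept = [t for t in tokens[1:] if t not in ("public", "system")]
            self.doctype = " ".join(kept)


def find_doctype(markup):
    """Returns a normalized doctype label, or "No doctype" if none is declared."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.doctype or "No doctype"
```

Calling find_doctype("<!DOCTYPE html><title>Hi</title>") returns "html", the short label used for the standard doctype in the table.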

Compression

Figure 3.3. HTML document content encoding.

Are HTML documents being compressed? How many? How? 86% of them are—with 58% (down 5.8% since last year) overall being gzip-compressed, and 28% (up 6.1%) being compressed using Brotli. Overall, slightly more documents are being compressed, and compressed more effectively.
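To get a feel for what compression does to markup, here is a small sketch that gzip-compresses a string with Python’s standard gzip module and reports the share of bytes saved. The gzip_savings function and the sample markup are made up for illustration. Brotli is not part of the Python standard library, so it is left out here.

```python
import gzip


def gzip_savings(markup):
    """Returns the fraction of bytes saved by gzip-compressing the markup."""
    raw = markup.encode("utf-8")
    if not raw:
        return 0.0
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)


# Hypothetical sample: repetitive markup, which compresses very well.
sample = "<ul>" + '<li><a href="/item">Item</a></li>' * 200 + "</ul>"
print(f"Saved {gzip_savings(sample):.0%} of the bytes")
```

Real pages are less repetitive than this sample, so the savings you see on your own documents will differ.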

Languages

Figure 3.4. Most popular regional HTML lang values.

What about languages? In our data set, 35% of pages used a lang attribute mapping to English, and 17% had no language set. You can already see the difficulties: the sample is likely biased and not big enough to reflect all of the world, and a missing lang attribute is not the same as no language being set. So this isn’t a question our data is well suited to answer.
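“Mapping to English” means that values such as en, en-US, and en-GB are grouped together. A sketch of one way to do that, assuming we only look at the lang attribute of the html start tag and keep the part before the first hyphen (LangFinder and primary_language are hypothetical names, and this is not necessarily the mapping used for the chart):

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Records the lang attribute of the first html start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            value = dict(attrs).get("lang")
            if value and value.strip():
                self.lang = value.strip()


def primary_language(markup):
    """Returns the lowercased primary language subtag, or None if lang is missing."""
    finder = LangFinder()
    finder.feed(markup)
    finder.close()
    if finder.lang is None:
        return None
    return finder.lang.split("-")[0].lower()
```

A result of None only says that the attribute is missing or empty, not that the document has no language.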

Conformance

Do documents conform with the HTML specification—i.e., are they valid? A quick way for you to tell is by using a tool like the W3C markup validation service.

We didn’t and we couldn’t check this yet. So why include this section?

The reason to at least mention conformance is that if you don’t check on conformance, if you don’t validate, there’s a good chance—in practice, effectively a 100% chance—you end up writing at least some fictitious and fantasy (and therefore wrong) HTML. But HTML isn’t fiction or fantasy—it’s a hard technical standard with clear rules on what works and what doesn’t.

For a professional, it’s good to know these rules. It’s also good work to produce code that works and that doesn’t contain anything superfluous. Both of these, learning the rules and not shipping anything non-working or superfluous, are why conformance matters, and why validation matters.

We don’t have conformance data to share in the Web Almanac yet, but that doesn’t mean the point is any less important. And if you haven’t focused on conformance yet—start validating your HTML output. Maybe one of the next editions of the Web Almanac will have some positive news to share because of you.

Document size

HTML payload and document size are a staple in this series—we’ve looked at this information since 2019. But the trend is clear, and while it follows a common theme that other chapters will confirm, too, it’s not a great one:

Figure 3.5. Median transfer size of HTML document.

After some brief relief in 2020, document size has continued growing in 2021, and again in 2022, with a median transfer size of 30 kB in our mobile data set.

One way to counter this trend is to write HTML, the HTML way (and not the XHTML way), as that would already result in smaller HTML transfer size. Disclosure: Your author here likes to come up with HTML writing classifications, and enjoys promoting minimal HTML.
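As a rough illustration of the difference, the sketch below compares the byte size of two made-up snippets: one written the XHTML way (all end tags, quoted attribute values, self-closing syntax) and one written the HTML way (optional end tags omitted, attribute values unquoted where allowed). The snippets and the transfer_bytes function are hypothetical, and the comparison is of uncompressed size only.

```python
XHTML_STYLE = """\
<ul class="nav">
  <li><a href="about.html">About</a></li>
  <li><a href="contact.html">Contact</a></li>
</ul>
<p>First line<br />second line</p>
<p>Another paragraph.</p>
"""

HTML_STYLE = """\
<ul class=nav>
  <li><a href=about.html>About</a>
  <li><a href=contact.html>Contact</a>
</ul>
<p>First line<br>second line
<p>Another paragraph.
"""


def transfer_bytes(markup):
    """Returns the size of the markup in bytes when encoded as UTF-8."""
    return len(markup.encode("utf-8"))


saved = transfer_bytes(XHTML_STYLE) - transfer_bytes(HTML_STYLE)
print(f"The HTML way saves {saved} bytes on this snippet")
```

Compression narrows the gap, so how much this saves on the wire depends on the document.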

Elements

If you’re not including the svg and math elements (because they’re specified outside of HTML), the current HTML specification consists of 111 elements.

Elements, not tags, because we’re not referring to mere start or end tags, like <li> or </ins>. And some people count HTML elements differently, but most important is to be clear about how you’re counting.

What can we observe?

Element diversity

Figure 3.6. Distribution of distinct elements per page.

The first thing we can note is that developers use slightly more different elements per page now, with a median of 32 different elements per document.

The median is up from 31 elements in 2021, and 30 elements in 2020. As this is a trend throughout, it may be a tentative sign that developers put HTML elements to better use, by using more of them for what they’re there for.

Alas, there’s another trend which aligns with an increasing document size, and that’s a growing number of elements per page in total:

Figure 3.7. Distribution of elements per page.

The median is currently at 653 elements per page, up from 616 in 2021, and 587 in 2020, all per the respective mobile data set. Do we publish more content, requiring more elements to hold it (something like more paragraphs per text, and so more p elements)? Or is this just another sign of an unchecked div pandemic? Our data doesn’t answer this, but both reasons, and more, are probably at play.
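Both metrics, distinct elements and total elements per page, are easy to approximate for a single document. The following sketch counts start tags with Python’s html.parser; ElementCounter and element_stats are hypothetical names. Note the assumption: this counts tags as written in the source, not elements in the DOM, so elements that a browser would insert automatically are not included.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # tag is already lowercased by the parser.
        self.counts[tag] += 1


def element_stats(markup):
    """Returns the number of distinct elements and the total number of elements."""
    counter = ElementCounter()
    counter.feed(markup)
    counter.close()
    return {
        "distinct": len(counter.counts),
        "total": sum(counter.counts.values()),
    }
```

For example, element_stats("<div><p>a</p><p>b</p><div><span>c</span></div></div>") returns {"distinct": 3, "total": 5}.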

Top elements

The following elements are used most frequently:

2019 2020 2021 2022
div div div div
a a a a
span span span span
li li li li
img img img img
script script script script
p p p p
option link link link
i meta i
option i meta
Figure 3.8. Most used elements.

The div element is—by far—the most popular element: We found 2,123,819,193 occurrences in the mobile data set, and 1,522,017,185 of them in our desktop data set.

Figure 3.9. Percentage of elements which are div elements: 29%.

Divitis is real.
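If you want to know how your own pages compare, a sketch like the following lists the most frequent elements in a document together with their share of all elements. TagTally and top_elements are hypothetical names, and the count is based on start tags in the source rather than on the DOM.

```python
from collections import Counter
from html.parser import HTMLParser


class TagTally(HTMLParser):
    """Tallies start tags by element name."""

    def __init__(self):
        super().__init__()
        self.tally = Counter()

    def handle_starttag(self, tag, attrs):
        self.tally[tag] += 1


def top_elements(markup, limit=3):
    """Returns (element, count, share of all elements) for the most used elements."""
    parser = TagTally()
    parser.feed(markup)
    parser.close()
    total = sum(parser.tally.values())
    return [
        (tag, count, count / total)
        for tag, count in parser.tally.most_common(limit)
    ]
```

A div share far above the 29% reported here could be a hint to look for elements that say more about the content.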

If you wonder about the odd one out, the i element, it stands to reason that this is still largely due to Font Awesome and its arguable misuse of this element. The element also has a bad reputation because during XHTML times, everyone suggested using em instead. But that advice wasn’t sound, and i elements have their use cases.

When it comes to what elements are being used on the most documents, the list looks a little different:

Figure 3.10. Adoption of top HTML elements.

It’s not a surprise that nearly every document uses html, head, and body elements: they are automatically inserted in the DOM, and the DOM is what is being counted here. That the numbers are slightly less than 100% is due to a small number of pages that break detection by overriding the JavaScript APIs we use (for example, MooTools overriding the JSON.stringify() API).

It’s a lot more surprising to find title missing on 1% of all sampled documents. This element is not optional and is not automatically inserted in the DOM, so its omission is an indicator of a lack of conformance checking.

The elements that then follow are old friends—especially a, img, and meta have been popular elements ever since Ian Hickson’s seminal HTML study back in 2005.

What’s the least used HTML element that’s part of the current standard, you ask? That’s samp, with a mere 2,002 findings in our mobile set.

Custom elements

Custom elements, which we can loosely identify by the hyphen in their name, also made it into our samples again. This year, however, the Top 10 is entirely dominated by Slider Revolution:

Custom element Desktop Mobile
rs-module-wrap 2.1% 2.3%
rs-module 2.1% 2.3%
rs-slides 2.1% 2.3%
rs-slide 2.1% 2.3%
rs-sbg-wrap 2.0% 2.2%
rs-sbg-px 2.0% 2.2%
rs-sbg 2.0% 2.2%
rs-progress 2.0% 2.2%
rs-layer 1.8% 2.0%
rs-mask-wrap 1.8% 2.0%
Figure 3.11. Most used custom elements.

That’s impressive—but gives us little to work with other than saying that Slider Revolution is used on roughly 2% of all sampled pages.

What are the next most popular custom elements that are not part of Slider Revolution?

Custom element Desktop Mobile
pages-css 1.1% 2.0%
wix-image 1.1% 2.0%
router-outlet 0.7% 0.5%
wix-iframe 0.4% 0.7%
ss3-loader 0.5% 0.5%
Figure 3.12. Most used custom elements not starting with rs-.

This is more diverse: pages-css, wix-image and wix-iframe come from the Wix website builder. router-outlet originates in Angular. And ss3-loader seems to be related to Smart Slider.
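The loose identification by hyphen can be sketched in a few lines. CustomElementFinder and custom_elements are hypothetical names, and the check is deliberately as loose as described above: any element name containing a hyphen is treated as a custom element.

```python
from collections import Counter
from html.parser import HTMLParser


class CustomElementFinder(HTMLParser):
    """Counts elements whose name contains a hyphen."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Loose heuristic: a hyphen in the name suggests a custom element.
        if "-" in tag:
            self.counts[tag] += 1


def custom_elements(markup):
    """Returns a dict mapping custom element names to their number of uses."""
    finder = CustomElementFinder()
    finder.feed(markup)
    finder.close()
    return dict(finder.counts)
```

Grouping the resulting names by prefix (rs-, wix-, and so on) is one way to spot which tool or framework they come from.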

Obsolete elements

Are obsolete elements still a thing? Given that not-validating is still a thing, yes.

Figure 3.13. Obsolete elements.

On 6.1% of pages, you still find center elements (hi Google homepage), and on 5.4% of pages, you find font elements. Fortunately, use of both elements went down (by 0.5% in each case), while marquee, nobr, and big didn’t see significant changes.

center and font make up the lion’s share (81.2%) of all obsolete elements, per our analysis:

Figure 3.14. Obsolete elements relative use.
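A validator is the right tool to catch obsolete elements, but a quick check for the five elements named in this section can be sketched as follows. The OBSOLETE_ELEMENTS set is intentionally limited to those five and is not the full list from the HTML specification; ObsoleteElementFinder and obsolete_elements are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Only the obsolete elements mentioned in this section, not a complete list.
OBSOLETE_ELEMENTS = {"center", "font", "marquee", "nobr", "big"}


class ObsoleteElementFinder(HTMLParser):
    """Counts uses of the obsolete elements listed above."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in OBSOLETE_ELEMENTS:
            self.counts[tag] += 1


def obsolete_elements(markup):
    """Returns a dict mapping obsolete element names to their number of uses."""
    finder = ObsoleteElementFinder()
    finder.feed(markup)
    finder.close()
    return dict(finder.counts)
```

An empty result only means that none of these five elements were found.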

Attributes

If elements are the bread of HTML, then attributes are the butter. What can we learn here?

Top attributes

The most popular attribute, by far, was and still is class:

Figure 3.15. Attribute usage.

This order isn’t any different from what we saw last year, but there are some changes:

  • class (▼0.3%), href (▼0.9%), style (▼0.6%), id (▼0.2%), type (▼0.1%), title (▼0.3%), and value (▼0.5%) are all used a little less than before.
  • src (▲0.3%) and alt (▲0.1%) are used more than before—tentatively good news for accessibility!
  • rel usage hasn’t changed significantly.

Are there attributes we find on (nearly) every document? Yes:

Figure 3.16. Attribute usage by page.

href, src, content (metadata), and name (metadata, form identifiers) are present on nearly every document in our sample.
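The two views used here, how often an attribute occurs and whether it occurs on a page at all, can both be derived from a simple tally. The sketch below counts attribute names in one document; AttributeCounter and attribute_counts are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeCounter(HTMLParser):
    """Counts attribute names across all start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, _value in attrs:
            self.counts[name] += 1


def attribute_counts(markup):
    """Returns a dict mapping attribute names to their number of occurrences."""
    counter = AttributeCounter()
    counter.feed(markup)
    counter.close()
    return dict(counter.counts)
```

The keys of the result show which attributes are present on the page, and the values show how often each one is used.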

data-* attributes

For data-* attributes—which allow authors to embed their own custom metadata—we also pulled new information.

This changed only a little compared to last year’s data-* attributes stats. Here are some changes to call out:

  • data-id is still the most popular data-* attribute, with a 0.7% increase compared to 2021.
  • data-element_type, though its position stayed the same, gained 0.7% as well.
  • data-testid ranked #6 before, gained 0.3%, and jumped to #4.
  • data-widget_type ranked #8, gained 0.4% popularity, and also gained two spots, taking #6 in 2022.

data-element_type and data-widget_type relate to Elementor, while data-testid is coming from Testing Library.

Let’s have a look at how often we find data-* attributes on our pages:

Figure 3.17. Data attribute popularity.

Their popularity is high! Per the chart above, close to every fourth document uses data-* attributes. But the overall data show that 88% of documents use at least one data-* attribute. That’s quite some adoption.
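Whether a document uses at least one data-* attribute, and which ones, can be checked with a short sketch. DataAttributeFinder and data_attributes are hypothetical names.

```python
from html.parser import HTMLParser


class DataAttributeFinder(HTMLParser):
    """Collects the names of all data-* attributes in a document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if name.startswith("data-"):
                self.names.add(name)


def data_attributes(markup):
    """Returns the sorted list of distinct data-* attribute names."""
    finder = DataAttributeFinder()
    finder.feed(markup)
    finder.close()
    return sorted(finder.names)
```

A non-empty result means that the document would count towards the share of documents using at least one data-* attribute.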

Social markup

Last year’s edition introduced a section on social markup, special markup which makes it easier for social platforms to identify and display the respective metadata. Here’s the 2022 update:

Figure 3.18. Social meta nodes usage.

Do you need all of this metadata? That depends on your requirements. But if these requirements are about showing title, description, and image, you don’t seem to need nearly as much. You may be able to make do with twitter:card, og:title, og:description (hooked up to standard description metadata), and og:image. The author and many others have described options for minimal social markup.
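To see which of the four pieces of metadata named above a page is missing, a sketch like this one can help. It assumes that the metadata sits on meta elements and is identified through either the property or the name attribute. SocialMetaCollector, missing_social_markup, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# The minimal set named in the text above.
MINIMAL_SOCIAL = ("twitter:card", "og:title", "og:description", "og:image")


class SocialMetaCollector(HTMLParser):
    """Collects meta elements by their property or name attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumption: prefer property over name when both are present.
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content:
            self.found.setdefault(key.lower(), content)


def missing_social_markup(markup):
    """Returns the entries of MINIMAL_SOCIAL that the document does not provide."""
    collector = SocialMetaCollector()
    collector.feed(markup)
    collector.close()
    return [key for key in MINIMAL_SOCIAL if key not in collector.found]


page = (
    '<meta name="twitter:card" content="summary_large_image">'
    '<meta property="og:title" content="Markup">'
    '<meta name="description" property="og:description" content="A look at HTML.">'
)
print(missing_social_markup(page))
```

The third meta element in the sample shows the “hooked up” variant, where one element serves as both the standard description and og:description.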

Conclusion

This was a glance at HTML in 2022.

The conclusion is brief: Going from year to year, it’s hard to say what important trends were started or reversed. Document size seems to keep growing—at least from 2020 to 2021 to 2022. The number of elements per page goes up every year too. There may be slightly more alt attributes now, but that’s relative to itself and we can’t tell whether more images now do have an appropriate alt attribute set—nor whether its text is really meaningful.

But with all of this, the Web Almanac will help. We’re going to look at HTML again—next year, the year after next, and the year after that. And we’ll go into more detail again and we’ll look back at more years.

Perhaps we’ll also be able to look at conformance. Not everyone in our field may care about this at this time. But we’re all professionals, and it seems at least relevant to know whether overall, we produce work that corresponds to the underlying standard(s). After all, this shouldn’t be a chapter about fantasy HTML; it should be one about HTML that actually works. It’s one of the most important web standards.
