Part II Chapter 8

SEO

Introduction

SEO (Search Engine Optimization) is the practice of optimizing a website or web page to increase the quantity and quality of its traffic from a search engine’s organic results.

SEO is more popular than ever and has seen huge growth over the last couple of years as companies sought new ways to reach customers. Its popularity has far outpaced other digital marketing channels.

Screenshot showing the interest over time in Google Trends comparing four digital marketing channels; search engine optimization, pay-per-click, social media marketing, and email marketing. SEO remained the most popular digital marketing channel, with interest far outpacing the other channels in recent years. SEO continued to be an evolving field with a growing community around the world.
Figure 8.1. Google Trends comparison of SEO versus pay-per-click, social media marketing, and email marketing.

The purpose of the SEO chapter of the Web Almanac is to analyze various elements related to optimizing a website. In this chapter, we’ll check if websites are providing a great experience for users and search engines.

Many sources of data were used for our analysis, including Lighthouse, the Chrome User Experience Report (CrUX), and raw and rendered HTML elements from the HTTP Archive on mobile and desktop. In the case of the HTTP Archive and Lighthouse, the data is limited to websites’ homepages only, not site-wide crawls. Keep that in mind when drawing conclusions from our results. You can learn more about the analysis on our Methodology page.

Read on to find out more about the current state of the web and its search engine friendliness.

Crawlability and Indexability

To return relevant results to user queries, search engines have to create an index of the web. The process involves:

  1. Crawling - search engines use web crawlers, or spiders, to visit pages on the internet. They find new pages through sources such as sitemaps or links between pages.
  2. Processing - in this step search engines may render the content of the pages. They will extract information they need like content and links that they will use to build and update their index, rank pages, and discover new content.
  3. Indexing - Pages that meet certain indexability requirements around content quality and uniqueness will typically be indexed. These indexed pages are eligible to be returned for user queries.

Let’s look at some issues that may impact crawlability and indexability.

robots.txt

robots.txt is a file located in the root folder of each subdomain on a website that tells robots such as search engine crawlers where they can and can’t go.

81.9% of websites make use of the robots.txt file (mobile). Compared with previous years (72.2% in 2019 and 80.5% in 2020), that’s a slight improvement.

Having a robots.txt file is not a requirement. If it returns a 404 Not Found, Google assumes that every page on the website can be crawled. Other search engines may treat this differently.

Bar chart showing percent of pages with a valid robots.txt file. Status code 200 was present on 81.9% of mobile sites, status code 404 was present on 16.5% of mobile sites. The other status codes are barely used, and desktop numbers are almost identical to mobile.
Figure 8.2. Breakdown of robots.txt status codes.

Using robots.txt allows website owners to control search engine robots. However, the data showed that as many as 16.5% of websites have no robots.txt file.

Websites may have misconfigured robots.txt files. For example, some popular websites were (presumably mistakenly) blocking search engines. Google may keep these websites indexed for a period of time, but eventually their visibility in search results will be lessened.

Another category of errors related to robots.txt is accessibility and/or network errors, meaning the robots.txt exists but cannot be accessed. If Google requests a robots.txt file and gets such an error, the bot may stop requesting pages for a while. The logic behind this is that the search engine is unsure whether a given page can or cannot be crawled, so it waits until the robots.txt becomes accessible.

~0.3% of websites in our dataset returned either a 403 Forbidden or a 5xx error. Different bots may handle these errors differently, so we don’t know exactly what Googlebot may have seen.

The latest information available from Google, from 2019, is that as many as 5% of websites were temporarily returning 5xx errors for robots.txt, while as many as 26% were unreachable.

Screenshot showing the percentage of robots.txt status codes encountered by Googlebot. Taken from 2019 data, 69% of sites were Good and utilized status code 200 or returned 404 for open access. As many as 5% were Temporarily OK returning 5xx on robots.txt. As much as 26% of sites were Unreachable.
Figure 8.3. Breakdown of robots.txt status codes Googlebot encountered.

Two things may cause the discrepancy between the HTTP Archive and Google data:

  1. Google presents data from 2 years back while the HTTP Archive is based on recent information, or

  2. The HTTP Archive focuses on websites that are popular enough to be included in the CrUX data, while Google tries to visit all known websites.

robots.txt size

Bar chart showing robots.txt size distribution. Nearly all robots.txt files are small and weighed between 0-100 kb. We found that 96.72% of robots.txt files on mobile pages weighed between 0-100 kb (similar results for desktop). Virtually no web pages (desktop or mobile) had robots.txt files greater than 100 kb, and 1.58% were missing.
Figure 8.4. robots.txt size distribution.

Most robots.txt files are fairly small, weighing between 0 and 100 KB. However, we did find over 3,000 domains with a robots.txt file larger than 500 KiB, which is beyond Google’s size limit. Rules appearing after that limit are ignored.

Bar chart showing the ten most common robots.txt user-agent usage. Results were nearly identical for desktop and mobile, with 75.2% of domains not indicating a specific user-agent. We found 6.3% for adsbot-google, 5.6% for mj12bot, 5.0% for ahrefsbot, 4.9% for mediapartners-google, 3.4% for googlebot, 3.3% for nutch, 3.1% for yandex, 2.9% for pinterest, 2.7% for ahrefssiteaudit.
Figure 8.5. robots.txt user-agent usage.

You can declare a rule for all robots or specify a rule for specific robots. Bots usually try to follow the most specific rule for their user-agents. User-agent: Googlebot will refer to Googlebot only, while User-agent: * will refer to all bots that don’t have a more specific rule.
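
For illustration, a minimal robots.txt might combine a bot-specific rule with a catch-all rule. This is only a sketch; the paths and sitemap URL below are hypothetical examples, not values from our dataset.

```
# Applies only to Googlebot
User-agent: Googlebot
Disallow: /search/

# Applies to every bot without a more specific rule
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```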

Two popular SEO-related robots, mj12bot (Majestic) and ahrefsbot (Ahrefs), appeared among the top five most-specified user-agents.

robots.txt search engine breakdown

User-agent  | Desktop | Mobile
----------- | ------- | ------
Googlebot   | 3.3%    | 3.4%
Bingbot     | 2.5%    | 3.4%
Baiduspider | 1.9%    | 1.9%
Yandexbot   | 0.5%    | 0.5%
Figure 8.6. robots.txt search engine breakdown.

When looking at rules applying to particular search engines, Googlebot was the most referenced, appearing on 3.3% of crawled websites.

Robots rules related to other search engines, such as Bing, Baidu, and Yandex, are less popular (respectively 2.5%, 1.9%, and 0.5%). We did not look at what rules were applied to these bots.

Canonical tags

The web is a massive set of documents, some of which are duplicates. To prevent duplicate content issues, webmasters can use canonical tags to tell search engines which version they prefer to be indexed. Canonicals also help to consolidate signals such as links to the ranking page.

Bar chart showing canonical tag usage. We found most web pages used canonical tags (58.5% of mobile pages and 56.6% of desktop pages). The percentage of canonicalized web pages was higher on mobile (8.3%) compared to desktop (4.3%).
Figure 8.7. Canonical tag usage.

The data shows increased adoption of canonical tags over the years. For example, 2019’s edition shows that 48.3% of mobile pages were using a canonical tag. In 2020’s edition, the percentage grew to 53.6%, and in 2021 we see 58.5%.

More mobile pages have canonicals set than their desktop counterparts. In addition, 8.3% of mobile pages and 4.3% of desktop pages are canonicalized to another page, providing a clear hint to Google and other search engines that the page indicated in the canonical tag is the one that should be indexed.

A higher number of canonicalized pages on mobile seems to be related to websites using separate mobile URLs. In these cases, Google recommends placing a rel="canonical" tag pointing to the corresponding desktop URLs.

Our dataset and analysis are limited to homepages of websites; the data is likely to be different when considering all URLs on the tested websites.

Two methods of implementing canonical tags

There are two methods for specifying canonical tags (both are sketched below):

  1. In the <head> section of the page’s HTML
  2. In the HTTP headers (via the Link HTTP header)
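
A minimal sketch of both methods, assuming a hypothetical canonical URL:

```html
<!-- Method 1: a <link> element in the HTML <head> -->
<link rel="canonical" href="https://www.example.com/">
```

```
# Method 2: the equivalent Link HTTP response header
Link: <https://www.example.com/>; rel="canonical"
```
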
Bar chart showing the presence of raw vs rendered canonical tags (raw, rendered, rendered but not raw, rendering changed, http header changed). Our data found that raw canonical tags were found on 55.9% of desktop and 57.7% of mobile pages. Rendered canonical tags were found on 56.5% of desktop and 58.4% of mobile pages. Other tags were found on less than 1.5% of desktop or mobile pages.
Figure 8.8. Canonical raw versus rendered usage.

Implementing canonical tags in the <head> of an HTML page is much more popular than using the Link header method. Implementing the tag in the head section is generally considered easier, which is why its usage is so much higher.

We also saw a slight change (less than 1%) in canonicals between the raw HTML delivered and the rendered HTML after JavaScript has been applied.

Conflicting canonical tags

Sometimes pages contain more than one canonical tag. When there are conflicting signals like this, search engines will have to figure it out. One of Google’s Search Advocates, Martin Splitt, once said it causes undefined behavior on Google’s end.

The previous figure shows as many as 1.3% of mobile pages have different canonical tags in the initial HTML and the rendered version.

Last year’s chapter noted that “A similar conflict can be found with the different implementation methods, with 0.15% of the mobile pages and 0.17% of the desktop ones showing conflicts between the canonical tags implemented via their HTTP headers and HTML head.”

This year’s data on that conflict is even more worrisome. Pages are sending conflicting signals in 0.4% of cases on desktop and 0.3% of cases on mobile.

As the Web Almanac data only looks at homepages, there may be additional problems with pages located deeper in the site architecture, which are the pages more likely to need canonical signals.

Page Experience

2021 saw an increased focus on user experience. Google launched the Page Experience Update which included existing signals, such as HTTPS and mobile-friendliness, and new speed metrics called Core Web Vitals.

HTTPS

Bar chart showing the percentage of HTTPS. We found that 81.2% of mobile pages were HTTPS and 84.3% of desktop pages were HTTPS.
Figure 8.9. Percentage of Desktop and Mobile pages served with HTTPS.

Adoption of HTTPS is still increasing. HTTPS was the default on 81.2% of mobile pages and 84.3% of desktop pages. That’s up nearly 8% on mobile websites and 7% on desktop websites year over year.

Mobile-friendliness

There’s a slight uptick in mobile-friendliness this year. Responsive design implementations have increased while dynamic serving has remained relatively flat.

Responsive design sends the same code and adjusts how the website is displayed based on the screen size, while dynamic serving sends different code depending on the device. We used the viewport meta tag to identify responsive websites and the Vary: User-Agent header to identify websites using dynamic serving.
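
As a sketch, the two footprints we looked for are:

```html
<!-- Responsive design: the viewport meta tag in the HTML <head> -->
<meta name="viewport" content="width=device-width, initial-scale=1">
```

```
# Dynamic serving: an HTTP response header indicating the HTML varies by device
Vary: User-Agent
```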

91.1%
Figure 8.10. Percent of mobile pages using the viewport meta tag—a signal of mobile friendliness.

91.1% of mobile pages include the viewport meta tag, up from 89.2% in 2020. 86.4% of desktop pages also included the viewport meta tag, up from 83.8% in 2020.

Bar chart showing the Vary header used to identify mobile-friendliness. We found that most web pages utilized responsive design (87.4% for desktop and 86.6% for mobile), compared to pages that used dynamic serving (12.6% for desktop and 13.4% for mobile).
Figure 8.11. Vary: User-Agent header usage.

For the Vary: User-Agent header, the numbers were pretty much unchanged with 12.6% of desktop pages and 13.4% of mobile pages with this footprint.

13.5%
Figure 8.12. Percent of mobile pages not using legible font sizes.

One of the biggest reasons for failing mobile-friendliness was that 13.5% of pages did not use a legible font size, meaning 60% or more of their text had a font size smaller than 12px, which can be hard to read on mobile.

Core Web Vitals

Core Web Vitals are the new speed metrics that are part of Google’s Page Experience signals. The metrics measure visual load with Largest Contentful Paint (LCP), visual stability with Cumulative Layout Shift (CLS), and interactivity with First Input Delay (FID).

The data comes from the Chrome User Experience Report (CrUX), which records real-world data from opted-in Chrome users.

Line chart showing the percentage of good CWV experiences on mobile. In 2021, Good LCP increased from 42% to 45%, Good FID increased from 81% to 90%, Good CLS increased from 55% to 62%, and Good CWV increased from 23% to 29%. Our findings suggest that percentage of mobile websites that offer good CWV experiences will continue to increase each year.
Figure 8.13. Core web vitals metrics trend.

29% of mobile websites are now passing Core Web Vitals thresholds, up from 20% last year. Most websites are passing FID, but website owners seem to be struggling to improve CLS and LCP. See the Performance chapter for more on this topic.

On-Page

Search engines look at your page’s content to determine whether it’s a relevant result for the search query. Other on-page elements may also impact rankings or appearance on the search engines.

Metadata

Metadata includes <title> elements and <meta name="description"> tags. Metadata can directly and/or indirectly affect SEO performance.

Bar chart showing the percentage of pages that have metadata. We found that 98.8% of mobile and desktop pages had a title element and 71.1% of mobile and desktop pages had a meta description.
Figure 8.14. Breakdown of title and meta description usage.

In 2021, 98.8% of desktop and mobile pages had <title> elements. 71.1% of desktop and mobile homepages had <meta name="description"> tags.
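
As a sketch, both tags live in the page’s <head>; the site name and copy below are invented for illustration:

```html
<head>
  <title>Example Store – Handmade Ceramics</title>
  <meta name="description" content="A short summary of the page that may be shown as the snippet in search results.">
</head>
```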

<title> Element

The <title> element is an on-page ranking factor that provides a strong hint regarding page relevance and may appear on the Search Engine Results Page (SERP). In August 2021 Google started re-writing more titles in their search results.

Bar chart showing the number of words in the title tag percentile (10, 25, 50, 75, and 90). The median page contained a title that was 6 words, and 50% of all titles contained 3-9 words. There was no difference in the word count between desktop and mobile within our dataset.
Figure 8.15. Number of words used in title elements.
Bar chart showing the number of characters in the title tag per percentile (10, 25, 50, 75, and 90). The median page contained a title of 39 characters on desktop and 40 on mobile. There was little difference in the character count between desktop and mobile in our dataset.
Figure 8.16. Number of characters used in title elements.

In 2021:

  • The median page <title> contained 6 words.
  • The median page <title> contained 39 and 40 characters on desktop and mobile, respectively.
  • 10% of pages had <title> elements containing 12 words.
  • 10% of desktop and mobile pages had <title> elements containing 74 and 75 characters, respectively.

Most of these stats are relatively unchanged since last year. Keep in mind that these are titles on homepages, which tend to be shorter than those used on deeper pages.

Meta description tag

The <meta name="description"> tag does not directly impact rankings. However, it may appear as the page description on the SERP.

Bar chart showing the number of words in the meta description per percentile (10, 25, 50, 75, and 90). The median page contained a meta description with 20 words for desktop and 19 words for mobile. There was little difference in the word count between desktop and mobile in our dataset.
Figure 8.17. Number of words used in meta descriptions.
Bar chart showing the number of characters in the meta description tag per percentile (10, 25, 50, 75, and 90). The median page contained a meta description with 138 characters on desktop pages and 137 characters on mobile pages. There was little difference in the character count between desktop and mobile in our dataset.
Figure 8.18. Number of characters used in meta descriptions.

In 2021:

  • The median desktop and mobile page <meta name="description"> tag contained 20 and 19 words, respectively.
  • The median desktop and mobile page <meta name="description"> tag contained 138 and 137 characters, respectively.
  • 10% of desktop and mobile pages had <meta name="description"> tags containing 35 words.
  • 10% of desktop and mobile pages had <meta name="description"> tags containing 232 and 231 characters, respectively.

These numbers are relatively unchanged from last year.

Images

Bar chart showing the number of <img> elements per page per percentile (10, 25, 50, 75, and 90). The median desktop page featured 21 <img> elements and the median mobile page featured 19 <img> elements.
Figure 8.19. Number of images on each page.

Images can directly and indirectly impact SEO as they impact image search rankings and page performance.

  • 10% of pages have two or fewer <img> tags. That’s true of both desktop and mobile.
  • The median desktop page has 21 <img> tags while the median mobile page has 19 <img> tags.
  • 10% of desktop pages have 83 or more <img> tags. 10% of mobile pages have 73 or more <img> tags.

These numbers have changed very little since 2020.

Image alt attributes

The alt attribute on the <img> element helps explain image content and impacts accessibility.

Note that missing alt attributes may not indicate a problem. Pages may include extremely small or blank images which don’t require an alt attribute for SEO (nor accessibility) reasons.
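
A quick sketch of the difference (file names and text below are hypothetical): a descriptive alt attribute for a meaningful image, and an intentionally blank one for a decorative image.

```html
<!-- Informative image: descriptive alt text -->
<img src="/images/blue-ceramic-mug.jpg" alt="Hand-thrown blue ceramic mug with a wooden handle">

<!-- Purely decorative image: an intentionally blank alt attribute -->
<img src="/images/divider.png" alt="">
```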

Bar chart showing the number of images with alt attributes present per percentile (10, 25, 50, 75, and 90). Our data found the median web page contained 54.6% of images with an alt attribute on mobile pages and 56.5% on desktop pages.
Figure 8.20. Percentage of images that contain alt attributes.
Bar chart showing the percent of featured alt blank attributes per percentile (25, 50, 75, and 90). The median web page featured 10.5% blank alt attributes on desktop and 11.8% blank alt attributes on mobile.
Figure 8.21. Percentage of alt attributes that were blank.
Bar chart showing the percent of images with alt attributes missing per percentile (10, 25, 50, 75, and 90). The median web page had 1.4% of images with alt attributes missing on desktop and zero alt attributes missing on mobile.
Figure 8.22. Percentage of images missing alt attributes.

We found that:

  • On the median desktop page, 56.5% of <img> tags have an alt attribute. This is a slight increase versus 2020.
  • On the median mobile page, 54.6% of <img> tags have an alt attribute. This is a slight increase versus 2020.
  • However, on the median desktop and mobile pages 10.5% and 11.8% of <img> tags have blank alt attributes (respectively). This is effectively the same as 2020.
  • On the median desktop and mobile pages there are zero or close to zero <img> tags missing alt attributes. This is an improvement over 2020, when 2-3% of <img> tags on median pages were missing alt attributes.

Image loading attributes

The loading attribute on <img> elements affects how user agents prioritize rendering and display of images on the page. It may impact user experience and page load performance, both of which impact SEO success.

Bar chart showing percent of pages and image loading property usage (missing, lazy, eager, invalid, auto, and blank). Our data found that 83.3% of desktop and 83.5% of mobile pages were missing an image loading property. We found that 15.6% of desktop and mobile pages use loading="lazy" while only 0.8% of desktop and mobile pages use loading="eager". The number of other cases is less than 1% on desktop and mobile pages, this includes cases with an invalid or blank property or with loading="auto".
Figure 8.23. Image loading property usage.

We saw that:

  • 85.5% of pages don’t use any image loading property.
  • 15.6% of pages use loading="lazy" which delays loading an image until it is close to being in the viewport.
  • 0.8% of pages use loading="eager" which loads the image as soon as the browser loads the code.
  • 0.1% of pages use invalid loading properties.
  • 0.1% of pages use loading="auto" which uses the default browser loading method.
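
A sketch of how these values appear in markup (image paths and alt text are hypothetical):

```html
<!-- Deferred until the image nears the viewport -->
<img src="/images/footer-banner.jpg" alt="Seasonal sale banner" loading="lazy">

<!-- Loaded immediately, regardless of position on the page -->
<img src="/images/hero.jpg" alt="Store front" loading="eager">
```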

Word count

The number of words on a page isn’t a ranking factor, but the way pages deliver words can profoundly impact rankings. Words can be in the raw page code or the rendered content.

Rendered word count

First, we look at rendered page content. Rendered content is the content of the page after the browser has executed all JavaScript and any other code that modifies the DOM or CSSOM.

Bar chart showing the number of visible words rendered by percentile (10, 25, 50, 75, 90). The median rendered desktop page contained 425 words and the median mobile page contained 367. Mobile pages contained fewer rendered words at every percentile.
Figure 8.24. Visible words rendered by percentile.
  • The median rendered desktop page contains 425 words, versus 402 words in 2020.
  • The median rendered mobile page contains 367 words, versus 348 words in 2020.
  • Rendered mobile pages contain 13.6% fewer words than rendered desktop pages. Note that Google is a mobile-only index. Content not on the mobile version may not get indexed.

Raw word count

Next, we look at the raw page content. Raw content is the content of the page before the browser has executed JavaScript or any other code that modifies the DOM or CSSOM. It’s the “raw” content delivered and visible in the source code.

Bar chart showing the number of raw visible words by percentile (10, 25, 50, 75, 90). The median raw desktop page contained 369 words and the median mobile page contained 321. Mobile pages contained fewer raw words at every percentile.
Figure 8.25. Visible words raw by percentile.
  • The median raw desktop page contains 369 words, versus 360 words in 2020.
  • The median raw mobile page contains 321 words, versus 312 words in 2020.
  • Raw mobile pages contain 13.1% fewer words than raw desktop pages. Note that Google is a mobile-only index. Content not on the mobile HTML version may not get indexed.

Overall, 15% of written content on desktop pages and 14.3% on mobile pages is generated by JavaScript.

Structured Data

Historically, search engines have worked with unstructured data: the piles of words, paragraphs and other content that comprise the text on a page.

Schema markup and other types of structured data provide search engines another way to parse and organize content. Structured data powers many of Google’s search features.

Like words on the page, structured data can be modified with JavaScript.

Bar chart showing the number of pages with raw vs rendered structured data. 41.8% of desktop and 42.5% of mobile pages had raw structured data. The number of pages that had rendered structured data was 43.2% for desktop and 44.2% for mobile. Few pages had only rendered structured data: 1.4% of desktop pages and 1.7% of mobile pages. Lastly, 4.5% of desktop pages and 4.7% of mobile pages had structured data rendering changes.
Figure 8.26. Structured data usage.

42.5% of mobile pages and 41.8% of desktop pages have structured data in the HTML. JavaScript modifies the structured data on 4.7% of mobile pages and 4.5% of desktop pages.

On 1.7% of mobile pages and 1.4% of desktop pages structured data is added by JavaScript where it didn’t exist in the initial HTML response.

Bar chart showing the number of pages with structured data formats (JSON-LD, microdata, RDFa, microformats2). The JSON-LD structured data format was implemented on 62.4% of desktop and 60.5% of mobile sites. The microdata format was implemented on 34.6% of desktop and 36.9% of mobile sites. The RDFa format was implemented on 2.9% of desktop and 2.4% of mobile sites. The microformats2 format was used on only 0.2% of desktop and mobile sites in our dataset.
Figure 8.27. Breakdown of structured data formats.

There are several ways to include structured data on a page: JSON-LD, microdata, RDFa, and microformats2. JSON-LD is the most popular implementation method. Over 60% of desktop and mobile pages that have structured data implement it with JSON-LD.

Among websites implementing structured data, over 36% of desktop and mobile pages use microdata and less than 3% of pages use RDFa or microformats2.

Structured data adoption is up a bit since last year. It’s used on 33.2% of pages in 2021 vs 30.6% in 2020.

Bar chart showing the most popular schema types found on homepages. Results were nearly identical for desktop and mobile homepages. The most popular schema types were WebSite, SearchAction, WebPage, UnknownType and Organization.
Figure 8.28. Most popular schema types.

The most popular schema types found on homepages are WebSite, SearchAction, and WebPage. SearchAction powers the Sitelinks Search Box, which Google can choose to show on the search results page.
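
For example, a WebSite entity with a nested SearchAction, expressed as JSON-LD, is the pattern Google documented for the Sitelinks Search Box. The URLs below are placeholders for illustration only.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://www.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
```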

<h> elements (headings)

Heading elements (<h1>, <h2>, etc.) are an important structural element. While they don’t directly impact rankings, they do help Google to better understand the content on the page.

Bar chart showing the percent of pages with the presence of H elements by heading tag (level 1, 2, 3, 4). There was little to no difference between desktop and mobile results. h1 headings were found on 65.4% of pages, h2s were found the most frequently on 71.9% of pages, h3s were found on 61.8% of pages and h4 headings were found on 37.6% of pages.
Figure 8.29. Heading element usage.

For main headings, more pages (71.9%) have h2s than have h1s (65.4%). There’s no obvious explanation for the discrepancy. 61.4% of desktop and mobile pages use h3s and less than 39% use h4s.

There was very little difference between desktop and mobile heading usage, nor was there a major change versus 2020.

Bar chart showing the percent of pages with the presence of non-empty <h> elements by heading tag (level 1, 2, 3, 4). There was little to no difference between desktop and mobile results. h1 headings were found on 58.1% of pages, h2s were found the most frequently on 70.5% of pages, h3s on 60.3% of pages and h4 headings were found on 36.5% of pages.
Figure 8.30. Non-empty heading element usage.

However, a lower percentage of pages include non-empty <h> elements, particularly h1. Websites often wrap logo images in <h1> elements on homepages, which may explain the discrepancy.
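
For example, a homepage header like the sketch below produces an <h1> with no text content, which is likely to be counted as an empty heading (the logo path and site name are hypothetical):

```html
<!-- An <h1> that wraps only the site logo contains no heading text -->
<h1>
  <a href="/"><img src="/logo.svg" alt="Example Co. logo"></a>
</h1>
```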

Links

Search engines use links to discover new pages and to pass PageRank, which helps determine the importance of pages.

16.0%
Figure 8.31. Pages using non-descriptive link texts.

On top of PageRank, the text used as a link anchor helps search engines understand what a linked page is about. Lighthouse has a test to check whether the anchor text is useful or whether it’s generic anchor text like “learn more” or “click here”, which isn’t very descriptive. 16% of the tested links did not have descriptive anchor text, which is a missed opportunity from an SEO perspective and also bad for accessibility.
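
A brief sketch of the difference (the URL and wording are hypothetical):

```html
<!-- Generic anchor text: gives little context about the destination -->
<a href="/guides/robots-txt/">Click here</a>

<!-- Descriptive anchor text: tells users and search engines what to expect -->
<a href="/guides/robots-txt/">Read our guide to robots.txt</a>
```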

Bar chart showing the number of internal links on homepages by percentile (10, 25, 50, 75, 90). The median desktop homepage had 64 internal links compared to 55 internal links on mobile. There were more internal links on desktop homepages at every percentile.
Figure 8.32. Internal links from homepages.

Internal links are links to other pages on the same site. Pages had fewer links on their mobile versions compared to their desktop versions.

The data shows that the median number of internal links on desktop is 16% higher than mobile, 64 vs 55 respectively. It’s likely this is because developers tend to minimize the navigation menus and footers on mobile to make them easier to use on smaller screens.

The most popular websites (the top 1,000 according to CrUX data) have more outgoing internal links than less popular websites: 144 on desktop vs. 110 on mobile, more than twice the median! This may be because of the use of mega-menus on larger sites that generally have more pages.

Bar chart showing the number of external links on homepages by percentile (10, 25, 50, 75, 90). The median desktop homepage had 7 external links compared to 6 external links on mobile. There were more external links on desktop homepages at every percentile.
Figure 8.33. External links from homepages.

External links are links from one website to a different site. The data again shows fewer external links on the mobile versions of the pages.

The numbers are nearly identical to 2020. Despite Google rolling out mobile-first indexing this year, websites have not brought their mobile versions to parity with their desktop versions.

Bar chart showing the number of text links per percentile (10, 25, 50, 75, and 90). The median page contained 69 text links on desktop and 63 text links on mobile.
Figure 8.34. Text links from homepages.
Bar chart showing the number of image links per percentile (10, 25, 50, 75, and 90). The median web page contained 7 image links on desktop and 6 image links on mobile.
Figure 8.35. Image links from homepages.

While a significant portion of links on the web are text-based, some links wrap images that point to other pages: 9.2% of links on desktop pages and 8.7% of links on mobile pages are image links. With image links, the alt attribute set on the image acts as anchor text, providing additional context about what the linked pages are about.

In September 2019, Google introduced attributes that allow publishers to classify links as sponsored or user-generated content. These attributes are in addition to rel=nofollow, which was introduced in 2005. The new attributes, rel=ugc and rel=sponsored, add additional information to the links.
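
A sketch of how these attributes qualify outbound links (the destinations below are hypothetical):

```html
<!-- A paid or affiliate link -->
<a href="https://example.com/product" rel="sponsored">Partner offer</a>

<!-- A link added by users, e.g. in comments or forum posts -->
<a href="https://example.com/blog-post" rel="ugc">Shared by a commenter</a>

<!-- A link the site doesn't want to vouch for -->
<a href="https://example.com/unvetted" rel="nofollow">Unverified source</a>
```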

Bar chart showing the usage (in percent) of rel attributes on desktop and mobile. Our data found that 29.2% of homepages featured nofollow attributes on their desktop version and 30.7% on mobile. rel="noopener" was featured on 31.6% of desktop pages and 30.1% of mobile pages. rel="noreferrer" was featured on 15.8% of desktop pages and 14.8% of mobile pages. rel="dofollow", rel="ugc", rel="sponsored", and rel="follow" were all featured on fewer than 1% of desktop and mobile pages.
Figure 8.36. Rel attribute usage.

The new attributes are still fairly rare, at least on homepages, with rel="ugc" appearing on 0.4% of mobile pages and rel="sponsored" appearing on 0.3% of mobile pages. It’s likely these attributes are seeing more adoption on pages that aren’t homepages.

rel="follow" and rel=dofollow appear on more pages than rel="ugc" and rel="sponsored". While this is not a problem, Google ignores rel="follow" and rel="dofollow" because they aren’t official attributes.

rel="nofollow" was found on 30.7% of mobile pages, similar to last year. With the attribute used so much, it’s no surprise that Google has changed nofollow to a hint—which means they can choose whether or not they respect it.

Accelerated Mobile Pages (AMP)

2021 saw major changes in the Accelerated Mobile Pages (AMP) ecosystem. AMP is no longer required for the Top Stories carousel, no longer required for the Google News app, and Google no longer shows the AMP logo next to AMP results in the SERP.

Bar chart showing the percent of pages with AMP markup types. The AMP attribute was present on 0.09% of desktop and 0.22% of mobile pages. The AMP and emoji attribute was present on 0.02% of desktop and 0.04% of mobile pages. The AMP or emoji attribute was used on 0.10% of desktop and 0.26% of mobile pages. Lastly, the rel amp tag was used on 0.82% of desktop pages and 0.75% of mobile pages.
Figure 8.37. AMP attribute usage.

However, AMP adoption continued to increase in 2021. 0.09% of desktop pages now include the AMP attribute vs 0.22% for mobile pages. This is up from 0.06% on desktop pages and 0.15% on mobile pages in 2020.
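
For reference, the two footprints measured here look roughly like this (the URL is a placeholder, and only the opening <html> tag of the AMP page is shown):

```html
<!-- On the canonical page: a pointer to the AMP version (the "rel amp" tag) -->
<link rel="amphtml" href="https://www.example.com/amp/">

<!-- On the AMP page itself: the amp (or ⚡) attribute on the <html> element -->
<html amp lang="en">
```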

Internationalization

If you have multiple versions of a page for different languages or regions, tell Google about these different variations. Doing so will help Google Search point users to the most appropriate version of your page by language or region.
Google SEO documentation

To let search engines know about localized versions of your pages, use hreflang tags. hreflang attributes are also used by Yandex and Bing (to some extent).

Horizontal bar chart showing hreflang usage. The most popular hreflang attribute was en (English version) and hreflang attributes (across all languages) were implemented on less than 5% of desktop and mobile pages.
Figure 8.38. Top hreflang tag attributes chart.

9.0% of desktop pages and 8.4% of mobile pages use the hreflang attribute.

There are three ways of implementing hreflang information: <link> elements in the HTML <head>, Link HTTP headers, and XML sitemaps. Our analysis does not include data from XML sitemaps.
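
A sketch of the HTML <head> method for a site with English and Spanish versions plus a fallback (the URLs are placeholders):

```html
<link rel="alternate" hreflang="en" href="https://www.example.com/">
<link rel="alternate" hreflang="es" href="https://www.example.com/es/">
<link rel="alternate" hreflang="x-default" href="https://www.example.com/">
```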

The most popular hreflang attribute is "en" (English version). 4.75% of mobile homepages use it and 5.32% of desktop homepages.

x-default (also called the fallback version) is used in 2.56% of cases on mobile. Other popular languages addressed by hreflang attributes are French and Spanish.

For Bing, hreflang is a “far weaker signal” than the content-language header.

As with many other SEO parameters, content-language has multiple implementation methods including:

  1. HTTP server response
  2. HTML tag
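
A sketch of both methods, assuming an English-language page:

```
# Method 1: HTTP response header
Content-Language: en
```

```html
<!-- Method 2: the equivalent meta tag in the HTML <head> -->
<meta http-equiv="content-language" content="en">
```
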
Horizontal bar chart showing percent of pages with language usage (HTML and HTTP header).
Figure 8.39. Language usage (HTML and HTTP header).

Using an HTTP server response is the most popular way of implementing content-language: 8.7% of desktop websites and 9.3% of mobile websites use it.

Using the HTML tag is less popular, with content-language appearing on just 3.3% of mobile websites.

Conclusion

Websites are slowly improving from an SEO perspective, likely due to a combination of websites improving their SEO and the platforms hosting them improving as well. The web is a big and messy place, so there’s still a lot to do, but it’s nice to see consistent progress.

Authors