Part I Chapter 8

Third Parties

Introduction

Website developers can use third parties to implement certain features such as advertising, analytics, social media integration, payment processing, and content delivery. A web page typically comprises resources served by the first party and various third parties. Using third parties to compose a web page allows for modular development, which enables efficient and rapid deployment of rich features but can also pose potential privacy, security, and performance issues.

In this chapter, we conduct an empirical analysis to shed light on the practice of using third parties on the web. We find that nearly all websites contain one or more third parties. We provide a breakdown of the types of resources served by these third parties, such as images, JavaScript, fonts, etc. We provide a breakdown of different categories of third parties on the Web, such as ad, analytics, CDN, video, tag manager, etc. We also provide a breakdown of how different third parties are included—directly or indirectly—on web pages.

Definitions

Before we start on our analysis, it helps to have some common definitions of what we will cover in this chapter.

Sites and pages

In this chapter, we use the term site to denote the registrable part of a given domain, often referred to as the effective Top Level Domain plus one (eTLD+1). For example, given the URL https://www.bar.com/ the eTLD+1 is bar.com, and for the URL https://foo.co.uk the eTLD+1 is foo.co.uk. By page (or web page), we mean a unique URL or, more specifically, the document (for example HTML or JavaScript) located at that URL.
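As a rough illustration of the eTLD+1 idea, the sketch below derives the site from a URL. It assumes a tiny, hard-coded set of public suffixes (the hypothetical PUBLIC_SUFFIXES) in place of the full Public Suffix List, and the function name etld_plus_one is ours, not part of the chapter's methodology.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List. A real analysis
# would load the full list rather than hard-code a handful of suffixes.
PUBLIC_SUFFIXES = {"com", "edu", "net", "org", "uk", "co.uk"}


def etld_plus_one(url: str) -> str:
    """Return the registrable domain (eTLD+1) of a URL under the toy suffix set."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Candidates are tried from longest to shortest, so the first match
    # is the longest public suffix that the host ends with.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return host  # the host is itself a public suffix
            # Keep exactly one label in front of the suffix.
            return ".".join(labels[i - 1:])
    return host  # unknown suffix: fall back to the full host
```

With this toy suffix set, the two URLs from the text map to bar.com and foo.co.uk respectively.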

What is a third party?

We stick to the definition of a third party used in previous editions of the Web Almanac to allow for comparison between this edition and earlier ones.

A third party is an entity different from the site owner (aka first party). It involves the aspects of the site not directly implemented and served by the site owner. More precisely, third-party content is loaded from a different site (i.e., the third party) rather than the one originally visited by the user. Assume that the user visits example.com (the first party) and example.com includes silly cat images from awesome-cats.edu (for example using an <img> tag). In that scenario, awesome-cats.edu is the third party, as it was not originally visited by the user. However, if the user directly visits awesome-cats.edu, awesome-cats.edu is the first party.

To match that definition, only third parties originating from a domain whose resources can be found on at least 50 unique pages in the HTTP Archive dataset were included. When third-party content is served directly from a first-party domain, it is counted as first-party content. For example, self-hosted CSS or fonts are counted as first-party content. Similarly, first-party content served from a third-party domain is counted as third-party content, assuming it meets the 50-page criterion. Some third parties serve content from different subdomains. However, regardless of the number of subdomains, they are counted as a single third party. Further, it is becoming increasingly common for third parties to masquerade as a first party, for example, through techniques like CNAME cloaking. We consider them a first party in this analysis. Thus, our results present a lower bound on the prevalence of third parties on the web.
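The page-count criterion can be sketched as a simple filter. This is a minimal illustration, not the query used for the chapter: it assumes the input is a list of (page URL, third-party site) pairs, and the names qualifying_third_parties and MIN_PAGES are hypothetical.

```python
from collections import defaultdict

MIN_PAGES = 50  # threshold stated in the text: at least 50 unique pages


def qualifying_third_parties(observations, min_pages=MIN_PAGES):
    """Return the third-party sites seen on at least `min_pages` unique pages.

    `observations` is an iterable of (page_url, third_party_site) pairs,
    where the site is already reduced to its eTLD+1 so that all subdomains
    of one third party are counted together.
    """
    pages_by_site = defaultdict(set)
    for page_url, site in observations:
        # A set ignores repeated requests to the same site from one page.
        pages_by_site[site].add(page_url)
    return {site for site, pages in pages_by_site.items() if len(pages) >= min_pages}
```

Grouping by eTLD+1 before counting is what makes several subdomains of one provider count as a single third party.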

Categories

As previously indicated, third parties can be used for various use cases, for example, to include videos, to serve ads, or to include content from social media sites. To categorize the observed third parties in our dataset, we rely on the third-party-web repository from Patrick Hulce. The repository breaks down third parties into the following categories:

  • Ad: These scripts are part of advertising networks, either serving or measuring.
  • Analytics: These scripts measure or track users and their actions. There’s a wide range of impact here, depending on what’s being tracked.
  • CDN: These are a mixture of publicly hosted open source libraries (for example jQuery) served over different public CDNs and private CDN usage.
  • Content: These scripts are from content providers or publishing-specific affiliate tracking.
  • Customer Success: These scripts are from customer support/marketing providers that offer chat and contact solutions. These scripts are generally heavier in weight.
  • Hosting*: These scripts are from web hosting platforms (WordPress, Wix, Squarespace, etc.).
  • Marketing: These scripts are from marketing tools that add popups/newsletters/etc.
  • Social: These scripts enable social features.
  • Tag Manager: These scripts tend to load many other scripts and initiate many tasks.
  • Utility: These scripts are developer utilities (API clients, site monitoring, fraud detection, etc.).
  • Video: These scripts enable video player and streaming functionality.
  • Consent provider: These scripts allow sites to manage user consent (e.g., for General Data Protection Regulation compliance). They are also known as "Cookie Consent" popups and are usually loaded on the critical path.
  • Other: These are miscellaneous scripts delivered via a shared origin with no precise category or attribution.

Note: The CDN category here includes providers that provide resources on public CDN domains (for example bootstrapcdn.com, cdnjs.cloudflare.com, etc.) and does not include resources that are simply served over a CDN. For example, putting Cloudflare in front of a page would not influence its first-party designation according to our criteria.

Similar to previous years, the Hosting category is removed from our analysis. For example, if you happen to use WordPress.com for your blog, or Shopify for your e-commerce platform, then we’re going to ignore other requests for those domains by that site as not truly “third-party” as they are, in many ways, part of hosting on those platforms.

Content Type

We use the Content-Type HTTP header to determine the type of the third party resources. The values of Content-Type include text/javascript or application/javascript (for scripts), text/html (for HTML content), application/json (for JSON data), text/plain (for plain text), image/png (for PNG images), image/jpeg (for JPEG images), image/gif (for GIF images), etc.
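A mapping from the Content-Type header to a coarse resource type might look like the sketch below. The function name resource_type and the returned labels ("script", "html", "json", "text", "image", "other") are illustrative assumptions; the chapter does not specify the exact buckets used.

```python
def resource_type(content_type: str) -> str:
    """Map a Content-Type header value to a coarse, illustrative resource type."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("text/javascript", "application/javascript"):
        return "script"
    if mime == "text/html":
        return "html"
    if mime == "application/json":
        return "json"
    if mime == "text/plain":
        return "text"
    if mime.startswith("image/"):
        return "image"  # covers image/png, image/jpeg, image/gif, ...
    return "other"
```

Stripping the parameters first matters because servers commonly append a charset to the media type.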

Prevalence

Figure 8.1. Percentage of pages that use one or more third parties.

There is a slight decrease in the percentage of pages that use one or more third parties for low-ranked websites. Similar to 2021 and 2022, the percentage of pages with one or more third parties remains high at 92%.

Figure 8.2. Distribution of the number of third parties by rank.

We note a considerable decrease in the number of third parties for lower-ranked websites. The median number of third parties is 66 for the top thousand websites and 27 for the top million websites. The number of third parties on desktop pages is higher than on mobile pages. The contrast between desktop and mobile is greater for higher-ranked websites.

Figure 8.3. Distribution of the number of third party requests per page by rank.

We note that the number of third-party requests is higher for higher-ranked websites than for lower-ranked websites. When looking at requests, the difference between higher- and lower-ranked websites is less skewed than when looking at the number of third parties in Figure 8.2.

Figure 8.4. Distribution of the third party request categories by rank.

Excluding unknown, the top categories include consent provider, video, and customer success. The most popular consent provider domain is fundingchoicesmessages.google.com, the most popular video domain is www.youtube.com, and the most popular customer success domain is embed.tawk.to.

Figure 8.5. Distribution of the third party request types by rank.

The top 3 types include script, image, and other. The most popular domain under these content-types is fonts.googleapis.com.

Figure 8.6. Top third parties by the number of pages.

The top 10 third-party domains include several Google-owned domains such as googleapis.com, googletagmanager.com, google-analytics.com, google.com, and youtube.com. Meta’s facebook.com is the only non-Google domain in the top 5.

Inclusion

Recall from our earlier example that example.com (a first party) can include an image from awesome-cats.edu (a third party) via an <img> tag. This would be considered direct inclusion. However, if the image were loaded by a third-party script on the site via XMLHttpRequest, the inclusion of the image would be considered indirect. Indirectly included third parties can in turn include additional third parties. For example, a third-party script that is directly included on the site may further include another third-party script.

Such indirect inclusion of third parties on a page can be represented as a third-party inclusion chain. The inclusion chain can be constructed using the initiator information, which identifies what triggered a particular request. We use the eTLD+1 of a third party as the node identifier in the inclusion chain. An inclusion chain might include multiple domains operated by the same company (for example: example.com → googletagmanager.com → google-analytics.com → doubleclick.net) or different companies (for example: example.com → googletagmanager.com → facebook.com).
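To make the construction concrete, the sketch below rebuilds a chain by following initiator links back to the first party. It assumes the initiator information has already been reduced to a mapping from each site (eTLD+1) to the site that triggered it; the name inclusion_chain and this input shape are hypothetical, not the chapter's actual pipeline.

```python
def inclusion_chain(site, initiator_of):
    """Return the chain from the first party down to `site`.

    `initiator_of` maps a site (eTLD+1) to the site that triggered its
    request. The first party has no entry, which ends the walk.
    """
    chain = [site]
    seen = {site}
    while chain[-1] in initiator_of:
        parent = initiator_of[chain[-1]]
        if parent in seen:
            break  # guard against cyclic initiator data
        chain.append(parent)
        seen.add(parent)
    chain.reverse()  # order the chain from first party to the leaf
    return chain


# Initiator links matching the googletagmanager.com examples in the text.
initiators = {
    "googletagmanager.com": "example.com",
    "google-analytics.com": "googletagmanager.com",
    "doubleclick.net": "google-analytics.com",
    "facebook.com": "googletagmanager.com",
}
```

Under these assumed links, inclusion_chain("doubleclick.net", initiators) yields the four-node chain that starts at example.com, while googletagmanager.com itself is directly included because its initiator is the first party.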

Figure 8.7. Median depth of third-party inclusion chains.

The median depth of the inclusion chains is 3.4 of the inclusion chains are of length > 1, which means that they indirectly include at least one third party on the page. Notably, 14% of the inclusion chains are of length > 5. The inclusion chain with the highest depth has a length of 2,930.

Figure 8.8. Median depth of different categories of websites.

Across all categories, desktop pages have longer inclusion chains than mobile pages. We observe substantial differences across different website categories. The website category with the longest inclusion chains is /Games.

Figure 8.9. Google Tag Manager inclusion chain URLs.

When we look specifically at googletagmanager.com, one of the top third-party domains, we note that it includes a number of other Google domains such as googleapis.com, google-analytics.com, google.com, gstatic.com, youtube.com, googlesyndication.com, and googleadservices.com. Only three of the top 10 third-party domains included by googletagmanager.com are non-Google domains: facebook.com and facebook.net from Meta, and shopify.com from Shopify.

Conclusion

Our findings show the ubiquitous and complex nature of third parties on the web. We find that the use of third parties on the web is more common than ever before. More than nine in ten web pages include one or more third parties, often indirectly.

We find that third parties are often not directly included by the first party. Nearly one-third of third parties on all web pages are used for advertising, analytics, and consent management. Google is the most popular third party on the web, with five of the top ten third-party domains being Google domains: googleapis.com, googletagmanager.com, google.com, google-analytics.com, and youtube.com.

The inclusion of third parties has privacy, security, and performance implications that web developers should consider.

Authors

Citation

BibTeX
@inbook{WebAlmanac.2024.ThirdParties,
author = "Urban, Tobias and Vekaria, Yash and Shafiq, Zubair and Böttger, Chris and Pollard, Barry",
title = "Third Parties",
booktitle = "The 2024 Web Almanac",
chapter = 8,
publisher = "HTTP Archive",
year = "2024",
language = "English",
doi = "10.5281/zenodo.14193384",
url = "https://almanac.httparchive.org/en/2024/third-parties"
}