The 2021 Web Almanac: Security

Saptak Sengupta; Tom Van Goethem; Nurullah Demir

Part II Chapter 12

Security

Date published: 2021/12/01

Last updated: 2025/01/02

Written by Saptak Sengupta, Tom Van Goethem, and Nurullah Demir

Reviewed by Caleb Queern, Edmond W. W. Chan, and Matteo Große-Kampmann

Analyzed by Gertjan Franken

Edited by Barry Pollard

Introduction

Our lives are becoming increasingly digital, in business and in private. We contact people online, send messages, share moments with friends, do our business, and organize our daily routines. This shift means that more and more critical data is being digitized and processed, both privately and commercially. In this context, cybersecurity is becoming ever more important, as its goal is to safeguard users by ensuring the availability, integrity, and confidentiality of their data. Much of today’s technology relies on web resources to deliver digital solutions, so there is a strong link between modern life and the security of web applications.

This chapter analyzes the current state of security on the web and gives an overview of methods that the web community uses (and misses) to protect their environment. More specifically, we analyze different metrics on Transport Layer Security (HTTPS), such as general implementation, protocol versions, and cipher suites. We also give an overview of the techniques used to protect cookies. You will then find a comprehensive analysis of content inclusion and of methods for thwarting attacks (e.g., use of specific security headers). We look at how the security mechanisms are adopted (e.g., by country or specific technology). Finally, we discuss malpractices on the web, such as cryptojacking, and look at the usage of security.txt URLs.

We crawl the analyzed pages in both desktop and mobile mode, but for a lot of the data they give similar results, so unless otherwise noted, stats presented in this chapter refer to the set of mobile pages. For more information on how the data has been collected, refer to the Methodology page.

Transport security

Following the recent trend, the number of websites adopting HTTPS has continued to grow this year. Transport Layer Security is important for secure browsing because it ensures that the resources served to you and the data you send to the website are not tampered with in transit. Almost all major browsers now come with an HTTPS-only setting and show increasingly prominent warnings when a website uses HTTP instead of HTTPS, thus pushing broader adoption forward.

91.1%

Figure 12.1. The percentage of requests that use HTTPS on mobile.

Currently, we see that 91.9% of total requests for websites on desktop and 91.1% for mobile are being served using HTTPS. We see an increasing number of certificates being issued every day thanks to non-profit certificate authorities like Let’s Encrypt.

Currently, 84.3% of website home pages on desktop and 81.2% of website home pages on mobile are served over HTTPS, so we still see a gap between websites using HTTPS and requests using HTTPS. This is because the impressive percentage of HTTPS requests is often dominated by third-party services like fonts, analytics, and CDNs, rather than by the initial web page itself.

We do see a continuous improvement in sites using HTTPS (approximately 7-8% increase since last year), but soon a lot of unmaintained websites might start seeing warnings once browsers start adopting HTTPS-only mode by default.

Protocol versions

Transport Layer Security (TLS) is the protocol that helps make HTTP requests secure and private. With time, new vulnerabilities are discovered and fixed in TLS. Hence, it’s not just important to serve a website over HTTPS but also to ensure that modern, up-to-date TLS configuration is being used to avoid such vulnerabilities.

As part of this effort to improve security and reliability by adopting modern versions, TLS 1.0 and 1.1 have been deprecated by the Internet Engineering Task Force (IETF) as of March 25, 2021. All upstream browsers have also either completely removed support for or deprecated TLS 1.0 and 1.1. For example, Firefox has deprecated TLS 1.0 and 1.1 but has not completely removed them, because during the pandemic users might need to access government websites that often still run on TLS 1.0. Users can still change security.tls.version.min in the browser config to set the lowest TLS version they want the browser to allow.
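The same idea of a minimum accepted protocol version applies to any TLS client, not only browsers. As a minimal sketch, assuming a client written with Python's standard `ssl` module (the function name `make_client_context` is hypothetical), TLS 1.0 and 1.1 can be refused like this:

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a client-side TLS context that refuses TLS 1.0 and 1.1."""
    context = ssl.create_default_context()
    # Make TLS 1.2 the lowest version this client is willing to negotiate.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version.name)
```

Any connection wrapped with this context fails the handshake if the server only offers an older protocol version.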

Figure 12.3. TLS versions usage for sites.

60.4% of pages on desktop and 62.1% of pages on mobile now use TLSv1.3, making it the majority protocol version ahead of TLSv1.2. The share of pages using TLSv1.3 has grown by roughly 17 percentage points since last year, when we saw 43.2% and 45.4% respectively.

Cipher suites

A cipher suite is a set of algorithms used with TLS to help make secure connections. Modern Galois/Counter Mode (GCM) cipher modes are considered much more secure than the older Cipher Block Chaining (CBC) mode ciphers, which have been shown to be vulnerable to padding attacks. While TLSv1.2 supports both newer and older cipher suites, TLSv1.3 does not support any of the older cipher suites. This is one reason TLSv1.3 is the more secure option for connections.

96.8%

Figure 12.4. Mobile sites using forward secrecy.

Almost all modern cipher suites support Forward Secrecy key exchange, meaning in the case that the server’s keys are compromised, old traffic that used those keys cannot be decrypted. 96.6% in desktop and 96.8% in mobile use forward secrecy. TLSv1.3 has made forward secrecy compulsory though it is optional in TLSv1.2—yet another reason it is more secure.

The other consideration, apart from the cipher mode, is the key size of the authenticated encryption and decryption algorithm. A larger key takes far longer to compromise, and the more intensive computations needed to encrypt and decrypt the connection have little to no perceptible impact on site performance.

Figure 12.5. Distribution of cipher suites.

AES_128_GCM is still the most widely used cipher suite, by a long way, with 79.4% usage on desktop and 78.9% on mobile. The name AES_128_GCM indicates that it uses the GCM cipher mode with the Advanced Encryption Standard (AES) and a 128-bit key for encryption and decryption. A 128-bit key is still considered secure, but 256-bit keys are slowly becoming the industry standard to better resist brute force attacks for a longer time.
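To make the naming convention concrete, here is a small sketch that splits a short cipher name of the kind shown in the figure into algorithm, key size, and mode. The `CipherInfo` type and `describe_cipher` function are hypothetical helpers, and the set of recognized names (including `CHACHA20_POLY1305`) is an assumption for illustration, not the list used in the analysis.

```python
import re
from typing import NamedTuple, Optional


class CipherInfo(NamedTuple):
    algorithm: str
    key_bits: int
    mode: Optional[str]
    is_aead: bool  # True for authenticated-encryption modes such as GCM


def describe_cipher(name: str) -> CipherInfo:
    """Decode short cipher names such as 'AES_128_GCM'."""
    match = re.fullmatch(r"(AES)_(128|256)_(GCM|CCM|CBC)", name)
    if match:
        algorithm, bits, mode = match.groups()
        # GCM and CCM are AEAD modes; CBC is the older, non-AEAD mode.
        return CipherInfo(algorithm, int(bits), mode, mode in {"GCM", "CCM"})
    if name == "CHACHA20_POLY1305":
        # ChaCha20 always uses a 256-bit key and has no block-cipher mode.
        return CipherInfo("CHACHA20", 256, None, True)
    raise ValueError(f"unrecognized cipher name: {name}")


print(describe_cipher("AES_128_GCM"))
```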

Certificate Authorities

A Certificate Authority (CA) is a company or organization that issues digital certificates, which help validate the ownership and identity of entities on the web, like websites. A Certificate Authority is needed to issue a TLS certificate recognized by browsers so that the website can be served over HTTPS. Like the previous year, we will again look into the CAs used by websites themselves rather than third-party services and resources.

Issuer	Algorithm	Desktop	Mobile
R3	RSA	46.9%	49.2%
Cloudflare Inc ECC CA-3	ECDSA	11.7%	11.5%
Sectigo RSA Domain Validation Secure Server CA	RSA	8.3%	8.2%
cPanel, Inc. Certification Authority	RSA	5.0%	5.5%
Go Daddy Secure Certificate Authority - G2	RSA	3.6%	3.0%
Amazon	RSA	3.4%	3.0%
Encryption Everywhere DV TLS CA - G1	RSA	1.3%	1.6%
AlphaSSL CA - SHA256 - G2	RSA	1.2%	1.2%
RapidSSL TLS DV RSA Mixed SHA256 2020 CA-1	RSA	1.2%	1.1%
DigiCert SHA2 Secure Server CA	RSA	1.1%	0.9%

Figure 12.6. Top 10 certificate issuers for websites.

Let’s Encrypt has changed their subject common name from “Let’s Encrypt Authority X3” to just “R3” to save bytes in new certificates. So, any SSL certificates signed by R3 are issued by Let’s Encrypt. Thus, like previous years, we see Let’s Encrypt continue to lead the charts with 46.9% of desktop websites and 49.2% of mobile sites using certificates issued by them. This is up 2-3% from last year. Its free, automated certificate generation has played a game-changing role in making it easier for everyone to serve their websites over HTTPS.

Cloudflare continues to be in second position with its similarly free certificates for its customers. Also, Cloudflare CDNs increase the usage of Elliptic Curve Cryptography (ECC) certificates which are smaller and more efficient than RSA certificates but are often difficult to deploy, due to the need to also continue to serve non-ECC certificates to older clients. Using a CDN like Cloudflare takes care of that complexity for you. All the latest browsers are compatible with ECC certificates, though some browsers like Chrome depend on the OS. So, if someone uses Chrome in an old OS like Windows XP, then they need to fall back to non-ECC certificates.

HTTP Strict Transport Security

HTTP Strict Transport Security (HSTS) is a response header that tells the browser that it should always use secure HTTPS connections to communicate with the website.

22.2%

Figure 12.7. The percentage of requests that have HSTS header on mobile.

The Strict-Transport-Security header lets the browser convert an http:// URL to an https:// URL before a request is made for that site. 22.2% of mobile responses and 23.9% of desktop responses have an HSTS header.

HSTS Directive	Desktop	Mobile
Valid `max-age`	92.7%	93.4%
`includeSubdomains`	34.5%	33.3%
`preload`	17.6%	18.0%

Figure 12.8. Usage of HSTS directives.

Of the sites with an HSTS header, 92.7% on desktop and 93.4% on mobile have a valid max-age (that is, the value is non-zero and non-empty), which determines for how many seconds the browser should visit the website only over HTTPS.

33.3% of responses on mobile and 34.5% on desktop include includeSubDomains in the HSTS settings. The number of responses with the preload directive is lower because it is not part of the HSTS specification, and because preloading requires a minimum max-age of 31,536,000 seconds (1 year) as well as the includeSubDomains directive.

Figure 12.9. HSTS `max-age` values for all requests (in days).

The median value for max-age attribute in HSTS headers over all requests is 365 days in both mobile and desktop. https://hstspreload.org/ recommends a max-age of 2 years once the HSTS header is set up properly and verified to not cause any issues.
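The checks described in this section can be sketched as a small parser. This is an illustration only, not the query used for the analysis; `HstsPolicy`, `parse_hsts`, `has_valid_max_age`, and `meets_preload_requirements` are hypothetical names, and the one-year threshold is the minimum stated above.

```python
from typing import NamedTuple, Optional

ONE_YEAR_SECONDS = 31_536_000


class HstsPolicy(NamedTuple):
    max_age: Optional[int]
    include_subdomains: bool
    preload: bool


def parse_hsts(header_value: str) -> HstsPolicy:
    """Parse a Strict-Transport-Security header value into its directives."""
    max_age: Optional[int] = None
    include_subdomains = False
    preload = False
    for part in header_value.split(";"):
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        value = value.strip().strip('"')
        if name == "max-age" and value.isascii() and value.isdigit():
            max_age = int(value)
        elif name == "includesubdomains":
            include_subdomains = True
        elif name == "preload":
            preload = True
    return HstsPolicy(max_age, include_subdomains, preload)


def has_valid_max_age(policy: HstsPolicy) -> bool:
    """Valid here means present and non-zero, as in the definition above."""
    return policy.max_age is not None and policy.max_age > 0


def meets_preload_requirements(policy: HstsPolicy) -> bool:
    """Check the preload prerequisites described in the text."""
    return (
        policy.preload
        and policy.include_subdomains
        and policy.max_age is not None
        and policy.max_age >= ONE_YEAR_SECONDS
    )


policy = parse_hsts("max-age=63072000; includeSubDomains; preload")
print(policy, meets_preload_requirements(policy))
```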

Cookies

An HTTP cookie is a small piece of information about the user accessing the website that the server sends to the web browser. Browsers store this information and send it back with subsequent requests to the server. Cookies help in session management to maintain state information of the user, such as if the user is currently logged in.

Without properly securing cookies, an attacker can hijack a session and send unwanted changes to the server by impersonating the user. It can also lead to Cross-Site Request Forgery attacks, whereby the user’s browser inadvertently sends a request, including the cookies, unbeknownst to the user.

Several other types of attacks rely on the inclusion of cookies in cross-site requests, such as Cross-Site Script Inclusion (XSSI) and various techniques in the XS-Leaks vulnerability class.

You can ensure that cookies are sent securely and aren’t accessed by unintended parties or scripts by adding certain attributes or prefixes.

Figure 12.10. Cookie attributes (desktop).

`Secure`

Cookies that have the Secure attribute set will only be sent over a secure HTTPS connection, preventing them from being stolen in a Manipulator-in-the-middle attack. Similar to HSTS, this also helps enhance the security provided by TLS protocols. For first-party cookies, just over 30% of the cookies on both desktop and mobile have the Secure attribute set. However, we do see a significant increase in the percentage of third-party cookies on desktop that have the Secure attribute, from 35.2% last year to 67.0% this year. This increase is likely due to the Secure attribute being a requirement for SameSite=None cookies, which we discuss below.

`HttpOnly`

A cookie that has the HttpOnly attribute set cannot be accessed through the document.cookie API in JavaScript. Such cookies can only be sent to the server, which helps mitigate client-side Cross-Site Scripting (XSS) attacks that misuse the cookie. The attribute is used for cookies that are only needed for server-side sessions. The gap between first-party and third-party cookies is smaller for HttpOnly than for the other cookie attributes: it is set on 32.7% and 20.0% of them respectively.

`SameSite`

The SameSite attribute allows websites to inform the browser when and whether to send a cookie with cross-site requests. This is used to prevent cross-site request forgery attacks. SameSite=Strict allows the cookie to be sent only to the site where it originated. With SameSite=Lax, cookies are not sent with cross-site requests unless a user is navigating to the origin site by following a link. SameSite=None means cookies are sent in both originating and cross-site requests.
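The three modes can be summarized as a small decision function. This is a deliberately simplified sketch: the function `cookie_is_sent` and its parameters are hypothetical, and real browsers apply further conditions (for example, the request method for Lax and the Secure requirement for None) that are not modeled here.

```python
def cookie_is_sent(same_site: str, cross_site: bool, top_level_navigation: bool) -> bool:
    """Decide whether a cookie accompanies a request under a SameSite mode."""
    mode = same_site.lower()
    if mode not in {"strict", "lax", "none"}:
        raise ValueError(f"unknown SameSite value: {same_site}")
    if not cross_site:
        # Same-site requests always carry the cookie.
        return True
    if mode == "none":
        return True
    if mode == "lax":
        # Cross-site, but the user is navigating to the site (e.g. via a link).
        return top_level_navigation
    return False  # strict: never sent with cross-site requests


print(cookie_is_sent("Lax", cross_site=True, top_level_navigation=True))
```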

We see that 58.5% of all first-party cookies with a SameSite attribute have it set to Lax, while a still pretty daunting 39.1% have it set to None, although that number is steadily decreasing. Almost all current browsers now default to SameSite=Lax if no SameSite attribute is set. Approximately 65% of all first-party cookies have no SameSite attribute.
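As an illustration of how the Secure, HttpOnly, and SameSite attributes are combined on the server side, the following sketch builds a Set-Cookie value with Python's standard `http.cookies` module (the `samesite` key requires Python 3.8 or later). The cookie name `session` and the function `build_session_cookie` are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie value with Secure, HttpOnly and SameSite=Lax set."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True      # only sent over HTTPS
    cookie["session"]["httponly"] = True    # hidden from document.cookie
    cookie["session"]["samesite"] = "Lax"   # limited cross-site sending
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header_value = build_session_cookie("abc123")
print("Set-Cookie:", header_value)
```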

Prefixes

Cookie prefixes __Host- and __Secure- help mitigate attacks that override session cookie information, such as session fixation attacks. __Host- locks a cookie to a domain by requiring that the cookie has the Secure attribute, has the Path attribute set to /, has no Domain attribute, and is sent from a secure origin. __Secure-, on the other hand, requires only that the cookie has the Secure attribute and is sent from a secure origin.

Type of cookie	`__Secure`	`__Host`
First-party	0.02%	0.01%
Third-party	< 0.01%	0.03%

Figure 12.11. Usage of __Secure and __Host cookie prefixes in mobile.

Though both prefixes are used in only a tiny percentage of cookies, __Secure- is more commonly found in first-party cookies due to its lower prerequisites.
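The prefix prerequisites described above can be written down as a simple check. This is a sketch of the rules as stated in this section, not a browser implementation; the function `prefix_requirements_met` and its parameters are hypothetical.

```python
from typing import Optional


def prefix_requirements_met(
    name: str,
    secure: bool,
    path: Optional[str],
    domain: Optional[str],
    from_secure_origin: bool,
) -> bool:
    """Check whether a cookie satisfies the rules implied by its name prefix."""
    if name.startswith("__Host-"):
        # Secure, Path=/, no Domain attribute, and set from a secure origin.
        return secure and from_secure_origin and path == "/" and domain is None
    if name.startswith("__Secure-"):
        # Only Secure and a secure origin are required.
        return secure and from_secure_origin
    return True  # unprefixed cookies carry no extra requirements


print(prefix_requirements_met("__Host-id", True, "/", None, True))
```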

Permanent cookies are deleted at a date specified by the Expires attribute, or after a period of time specified by the Max-Age attribute. If both Expires and Max-Age are set, Max-Age has precedence.

Figure 12.12. Cookie age usage in days (mobile).

The median Max-Age is 365 days: about 20.5% of the cookies with Max-Age have the value 31,536,000. However, 64.2% of first-party cookies have Expires and only 23.3% have Max-Age. Since Expires is much more dominant among cookies, the median real maximum age matches that of Expires (180 days) rather than the Max-Age median you might otherwise expect.
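The precedence rule between the two attributes can be sketched as follows. The function `effective_lifetime` is a hypothetical helper that computes how long a cookie lives from the moment it is set, assuming timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_lifetime(
    now: datetime, max_age: Optional[int], expires: Optional[datetime]
) -> Optional[timedelta]:
    """Return the cookie lifetime, or None for a session cookie."""
    if max_age is not None:
        # Max-Age takes precedence when both attributes are present.
        return timedelta(seconds=max(max_age, 0))
    if expires is not None:
        return max(expires - now, timedelta(0))
    return None


now = datetime(2021, 12, 1, tzinfo=timezone.utc)
expires = now + timedelta(days=180)
print(effective_lifetime(now, 31_536_000, expires))
```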

Content inclusion

Most websites include a lot of media and CSS or JavaScript libraries that, more often than not, are loaded from various external sources such as CDNs or cloud storage services. For the security of both the website and its users, it is important to determine which sources of content can be trusted. Otherwise, the website is vulnerable to cross-site scripting attacks if untrusted content gets loaded.

Content Security Policy

Content Security Policy (CSP) is the predominant method used to mitigate cross-site scripting and data injection attacks by restricting the origins allowed to load various content. There are numerous directives that can be used by the website to specify sources for different kinds of content. For instance, script-src is used to specify origins or domains from which scripts can be loaded. It also has other values to define if inline scripts and eval() functions are allowed.

Figure 12.13. Most common directives used in CSP.

We see more and more websites starting to use CSP, with 9.3% of home pages on mobile using CSP now compared to 7.2% last year. upgrade-insecure-requests continues to be the most frequently used CSP directive. The high adoption rate for this policy is likely due to the same reasons mentioned last year: it is an easy, low-risk policy that upgrades all HTTP requests to HTTPS and also helps to block mixed content on the page. frame-ancestors, which lets one define the valid parents that may embed a page, is a close second.

The adoption of policies defining the sources from which content can be loaded continues to be low. Most of these policies are more difficult to implement, as they can cause breakages. Implementing them takes effort, as nonces, hashes, or domains must be defined to allow external content.

While a strict CSP is a strong defense against attacks, it can lead to undesirable effects and prevent valid content from loading if the policy is incorrectly defined. Libraries and APIs that load further content make this even more difficult.
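To give a sense of what a nonce-based policy involves, here is a minimal sketch that generates a fresh nonce per response and builds a matching header value. The directive set shown (`script-src` with a nonce and strict-dynamic, plus `object-src` and `base-uri`) is one commonly recommended shape and is an assumption for this example; the function `build_strict_csp` and the `/app.js` path are hypothetical.

```python
import secrets


def build_strict_csp(nonce: str) -> str:
    """Build a nonce-based Content-Security-Policy header value."""
    directives = [
        f"script-src 'nonce-{nonce}' 'strict-dynamic'",
        "object-src 'none'",
        "base-uri 'none'",
    ]
    return "; ".join(directives)


# A new, unpredictable nonce must be generated for every response.
nonce = secrets.token_urlsafe(16)
header_value = build_strict_csp(nonce)
# Only script elements carrying the same nonce are allowed to run.
script_tag = f'<script nonce="{nonce}" src="/app.js"></script>'
print(header_value)
print(script_tag)
```

The breakage risk mentioned above shows up here directly: any script element that the server does not stamp with the current nonce is blocked.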

Lighthouse recently started flagging severity warnings when such directives are missing from CSP, encouraging people to adopt a stricter CSP to prevent XSS attacks. We will discuss more about how CSP helps in stopping XSS attacks in the thwarting attacks section of this chapter.

To allow web developers to evaluate the correctness of their CSP policy, there is also a non-enforcing alternative, which can be enabled by defining the policy in the Content-Security-Policy-Report-Only response header. The prevalence of this header is still fairly small: 0.9% in mobile. However, most of the time this header is added in the testing phase and later is replaced by the enforcing CSP, so the low usage is not unexpected.

Sites can also use the report-uri directive to report any CSP violations to a particular link that is able to parse the CSP errors. These can help after a CSP directive has been added to check if any valid content is accidentally being blocked by the new directive. The drawback of this powerful feedback mechanism is that CSP reporting can be noisy due to browser extensions and other technology outside of the website owner’s control.

The median length of CSP headers continues to be pretty low: 75 bytes. Most websites still use single directives for specific purposes, instead of long strict CSPs. For instance, 24.2% of websites only have the upgrade-insecure-requests directive.

43,488

Figure 12.15. Bytes in the longest CSP observed.

On the other side of the spectrum, the longest CSP header is almost twice as long as last year’s longest CSP header: 43,488 bytes.

Origin	Desktop	Mobile
https://www.google-analytics.com	0.29%	0.22%
https://www.googletagmanager.com	0.26%	0.22%
https://fonts.googleapis.com	0.22%	0.16%
https://fonts.gstatic.com	0.20%	0.15%
https://www.google.com	0.19%	0.14%
https://www.youtube.com	0.19%	0.13%
https://connect.facebook.net	0.16%	0.11%
https://stats.g.doubleclick.net	0.15%	0.11%
https://www.gstatic.com	0.14%	0.11%
https://cdnjs.cloudflare.com	0.12%	0.10%

Figure 12.16. Most frequently allowed hosts in CSP policies.

The most common origins used in *-src directives continue to be heavily dominated by Google (fonts, ads, analytics). We also see Cloudflare’s popular library CDN showing up in the 10th position this year.

Subresource Integrity

A lot of websites load JavaScript and CSS libraries from external CDNs. This can have security implications if the CDN is compromised, or if an attacker finds some other way to replace the frequently used libraries. Subresource Integrity (SRI) helps avoid such consequences, though it introduces another risk: the website may stop functioning if the resource changes for a non-malicious reason. Self-hosting instead of loading from a third party is usually a safer option where possible.

66.2%

Figure 12.17. Usage of SHA384 hash function for SRI in mobile.

Web developers can add the integrity attribute to <script> and <link> tags which are used to include JavaScript and CSS code to the website. The integrity attribute consists of a hash of the expected content of the resource. The browser can then compare the hash of the fetched content and hash mentioned in the integrity attribute to check its validity and only render the resource if they match.

<script src="https://code.jquery.com/jquery-3.6.0.min.js"
  integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4="
  crossorigin="anonymous"></script>

The hash can be computed with three different algorithms: SHA256, SHA384, and SHA512. SHA384 (66.2% in mobile) is currently the most used, followed by SHA256 (31.1% in mobile). Currently, all three hashing algorithms are considered safe to use.
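As a sketch of how such an integrity value is produced and checked, the following uses Python's standard `hashlib` and `base64` modules. The functions `sri_hash` and `matches_integrity` are hypothetical helpers, the sample script content is made up, and only a single hash per attribute is handled.

```python
import base64
import hashlib

SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, integrity: str) -> bool:
    """Mimic the browser check: recompute the hash and compare."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in SUPPORTED_ALGORITHMS:
        return False
    return sri_hash(content, algorithm) == integrity


sample = b"console.log('hello');"
integrity = sri_hash(sample)
print(integrity)
print(matches_integrity(sample, integrity))
```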

82.6%

Figure 12.18. Percentage of SRI in <script> elements for mobile.

There has been some increase in the usage of SRI over the past couple of years, with 17.5% of elements on desktop and 16.1% of elements on mobile containing the integrity attribute. On mobile, 82.6% of those were <script> elements.

Figure 12.19. Subresource integrity: coverage per page.

However, it still is a minority option for <script> elements. The median percentage of <script> elements on websites which have an integrity attribute is 3.3%.

Host	Desktop	Mobile
www.gstatic.com	44.3%	44.1%
cdn.shopify.com	23.4%	23.9%
code.jquery.com	7.5%	7.5%
cdnjs.cloudflare.com	7.2%	6.9%
stackpath.bootstrapcdn.com	2.7%	2.7%
maxcdn.bootstrapcdn.com	2.2%	2.3%
cdn.jsdelivr.net	2.1%	2.1%

Figure 12.20. Most common hosts from which SRI-protected scripts are included.

Among the common hosts from which SRI-protected scripts are included, we see most of them are made up of CDNs. We see that there are three very common CDNs that are used by multiple websites when using different libraries: jQuery, cdnjs, and Bootstrap. It is probably not coincidental that all three of these CDNs have the integrity attribute in their example HTML code, so when developers use the examples to embed these libraries, they are ensuring that SRI-protected scripts are being loaded.

Permissions Policy

All browsers these days provide a myriad of APIs and functionalities, which can be used for tracking and malicious purposes, thus proving detrimental to the privacy of the users. Permissions Policy is a web platform API that gives a website the ability to allow or block the use of browser features in its own frame or in iframes that it embeds.

The Permissions-Policy response header allows websites to decide which features they want to use and which powerful features they want to disallow on the website to limit misuse. A Permissions Policy can be used to control APIs like Geolocation, User media, Video autoplay, Encrypted media decoding, and many more. While some of these APIs do require browser permission from the user (a malicious script can’t turn on the microphone without the user getting a permission pop-up), it’s still good practice to use Permissions Policy to restrict usage of certain features completely if they are not required by the website.

This API specification was previously known as Feature Policy, but besides the rename there have been many other updates. Though the Feature-Policy response header is still in use, its adoption is pretty low, with only 0.6% of websites on mobile using it. The Permissions-Policy response header contains an allow list for different APIs. For example, Permissions-Policy: geolocation=(self "https://example.com") means that the website disallows the use of the Geolocation API except for its own origin and the origin “https://example.com”. One can disable the use of an API entirely in a website by specifying an empty list, e.g., Permissions-Policy: geolocation=().
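The allow-list syntax shown above can be generated from a simple mapping. This sketch only covers the two forms mentioned in the text (an allow list of `self` and quoted origins, and an empty list); the function `build_permissions_policy` is a hypothetical helper.

```python
from typing import Dict, List


def build_permissions_policy(features: Dict[str, List[str]]) -> str:
    """Build a Permissions-Policy header value from feature allow lists."""
    parts = []
    for feature, allowlist in features.items():
        # 'self' is a bare token; every other entry is a quoted origin.
        tokens = [
            origin if origin == "self" else f'"{origin}"' for origin in allowlist
        ]
        parts.append(f"{feature}=({' '.join(tokens)})")
    return ", ".join(parts)


header_value = build_permissions_policy(
    {"geolocation": ["self", "https://example.com"], "camera": []}
)
print("Permissions-Policy:", header_value)
```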

We see 1.3% of websites on mobile already using the Permissions-Policy header. A possible reason for this higher than expected usage of the new header could be that some website admins are choosing to opt out of Federated Learning of Cohorts, or FLoC (which was experimentally implemented in Chrome), to protect users’ privacy. The privacy chapter has a detailed analysis of this.

Directive	Desktop	Mobile
`encrypted-media`	46.8%	45.0%
`conversion-measurement`	39.5%	36.1%
`autoplay`	30.5%	30.1%
`picture-in-picture`	17.8%	17.2%
`accelerometer`	16.4%	16.0%
`gyroscope`	16.4%	16.0%
`clipboard-write`	11.2%	10.9%
`microphone`	4.3%	4.5%
`camera`	4.2%	4.4%
`geolocation`	4.0%	4.3%

Figure 12.21. Prevalence of allow directives on frames.

One can also use the allow attribute in <iframe> elements to enable or disable features allowed to be used in the embedded frame. 18.3% of 16.8 million frames in mobile contained the allow attribute to enable permission or feature policies.

An earlier version of this chapter reported incorrect values for the total number of frames and the percentage of frames with the allow attribute. These errors have now been corrected. More information can be found in this GitHub PR.

As in previous years, the most used directives in allow attributes on iframes are still related to controls for embedded videos and media. The most used directive continues to be encrypted-media which is used to control access to the Encrypted Media Extensions API.

Iframe sandbox

An untrusted third-party in an iframe could launch a number of attacks on the page. For instance, it could navigate the top page to a phishing page, launch popups with fake anti-virus advertisements and other cross-frame scripting attacks.

The sandbox attribute on iframes applies restrictions to the content, and therefore reduces the opportunities for launching attacks from the embedded web page. The value of the attribute can either be empty to apply all restrictions (the embedded page cannot execute any JavaScript code, no forms can be submitted, and no popups can be created, to name a few restrictions), or space-separated tokens to lift particular restrictions. As embedding third-party content such as advertisements or videos via iframes is common practice on the web, it is not surprising that many of these are restricted via the sandbox attribute: 19.7% of the iframes on desktop pages have a sandbox attribute while on mobile pages this is 21.0%.

An earlier version of this chapter reported incorrect values for the percentage of frames with the sandbox attribute. More information can be found in this GitHub PR.

Figure 12.22. Prevalence of sandbox directives on frames.

The most commonly used directive, allow-scripts, which is present in 99.98% of all sandbox policies on desktop pages, allows the embedded page to execute JavaScript code. The other directive that is present on virtually all sandbox policies, allow-same-origin, allows the embedded page to retain its origin and, for example, access cookies that were set on that origin.

Thwarting attacks

Web applications can be vulnerable to multiple attacks. Fortunately, there exist several mechanisms that can either prevent certain classes of vulnerabilities (e.g., framing protection through X-Frame-Options or CSP’s frame-ancestors directive is necessary to combat clickjacking attacks), or limit the consequences of an attack. As most of these protections are opt-in, they still need to be enabled by the web developers—typically by setting the correct response header. At large scale, the presence of the headers can tell us something about the security hygiene of websites and the incentives of the developers to protect their users.

Security feature adoption

Figure 12.23. Adoption of security headers for site requests in mobile pages.

Perhaps the most promising and uplifting finding of this chapter is that the general adoption of security mechanisms continues to grow. Not only does this mean that attackers will have a more difficult time exploiting certain websites, but it is also indicative that more and more developers value the security of the web products they build. Overall, we can see a relative increase in the adoption of security features of 10-30% compared to last year. The security-related mechanism with the most uptake is the Report-To header of the Reporting API, with almost a 4x increased adoption rate, from 2.6% to 12.2%.

Although this continued increase in the adoption rate of security mechanisms is certainly outstanding, there still remains quite some room for improvement. The most widely used security mechanism is still the X-Content-Type-Options header, which is used on 36.6% of the websites we crawled on mobile, to protect against MIME-sniffing attacks. This header is followed by the X-Frame-Options header, which is enabled on 29.4% of all sites. Interestingly, only 5.6% of websites use the more flexible frame-ancestors directive of CSP.
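Since framing protection can come from either X-Frame-Options or CSP's frame-ancestors directive, a check for it has to look at both headers. The following is a rough sketch of such a check, not the methodology used for the figures above; the function `framing_protection` is hypothetical.

```python
from typing import List, Mapping


def framing_protection(headers: Mapping[str, str]) -> List[str]:
    """List the framing-protection mechanisms present in response headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    mechanisms: List[str] = []

    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in {"deny", "sameorigin"}:
        mechanisms.append("X-Frame-Options")

    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            mechanisms.append("frame-ancestors")
            break
    return mechanisms


print(framing_protection({"X-Frame-Options": "DENY"}))
```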

Another interesting evolution is that of the X-XSS-Protection header. The feature is used to control the XSS filter of legacy browsers: Edge and Chrome retired their XSS filter in July 2018 and August 2019 respectively as it could introduce new unintended vulnerabilities. Yet, we found that the X-XSS-Protection header was 8.5% more prevalent than last year.

Features enabled in `<meta>` element

In addition to sending a response header, some security features can be enabled in the HTML response body by including a <meta> element with the http-equiv attribute set to the header name (Referrer Policy instead uses a <meta> element with the name attribute set to referrer). For security purposes, only a limited number of policies can be enabled this way. More precisely, only a Content Security Policy and a Referrer Policy can be set via the <meta> tag. We found that 0.4% and 2.6% of mobile sites, respectively, enabled these mechanisms this way.

3,410

Figure 12.24. Number of sites with X-Frame-Options in the <meta> tag, which is actually ignored by the browser.

When any of the other security mechanisms are set via the <meta> tag, the browser will actually ignore this. Interestingly, we found 3,410 sites that tried to enable X-Frame-Options via a <meta> tag, and thus were wrongly under the impression that they were protected from clickjacking attacks. Similarly, several hundred websites failed to deploy a security feature by placing it in a <meta> tag instead of a response header (X-Content-Type-Options: 357, X-XSS-Protection: 331, Strict-Transport-Security: 183).
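A page can be scanned for this mistake with a few lines of code. The sketch below uses Python's standard `html.parser` to list security headers placed in `<meta http-equiv>` elements that the browser will ignore; the class `MetaHeaderScanner`, the function `ignored_meta_headers`, and the exact set of headers checked are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import List

# Of the headers checked here, only CSP is honored via <meta http-equiv>.
HONORED_IN_META = {"content-security-policy"}
SECURITY_HEADERS = {
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
    "x-xss-protection",
    "strict-transport-security",
}


class MetaHeaderScanner(HTMLParser):
    """Collect security headers in <meta http-equiv> that browsers ignore."""

    def __init__(self) -> None:
        super().__init__()
        self.ignored: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        http_equiv = (dict(attrs).get("http-equiv") or "").strip().lower()
        if http_equiv in SECURITY_HEADERS and http_equiv not in HONORED_IN_META:
            self.ignored.append(http_equiv)


def ignored_meta_headers(html: str) -> List[str]:
    scanner = MetaHeaderScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.ignored


page = (
    '<head><meta http-equiv="X-Frame-Options" content="DENY">'
    '<meta http-equiv="Content-Security-Policy" content="default-src \'self\'"></head>'
)
print(ignored_meta_headers(page))
```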

Stopping XSS attacks via CSP

CSP can be used for a multitude of things: protecting against clickjacking attacks, preventing mixed-content inclusion, and determining the trusted sources from which content may be included (as discussed above).

Additionally, it is an essential mechanism to defend against XSS attacks. For instance, by setting a restrictive script-src directive, a web developer can ensure that only the application’s JavaScript code is executed (and not the attacker’s). Moreover, to defend against DOM-based cross-site scripting, it is possible to use Trusted Types, which can be enabled by using CSP’s require-trusted-types-for directive.

Keyword	Desktop	Mobile
`strict-dynamic`	5.2%	4.5%
`nonce-`	12.1%	17.6%
`unsafe-inline`	96.2%	96.5%
`unsafe-eval`	82.9%	77.2%

Figure 12.25. Prevalence of CSP keywords based on policies that define a default-src or script-src directive.

Although we saw an overall moderate increase (17%) in the adoption of CSP, what is perhaps even more exciting is that the usage of strict-dynamic and nonces is either holding steady or increasing. For instance, for desktop sites the use of strict-dynamic grew from 2.4% last year to 5.2% this year. Similarly, the use of nonces grew from 8.7% to 12.1%.

On the other hand, we find that the usage of the troubling keywords unsafe-inline and unsafe-eval is still fairly high. However, it should be noted that if unsafe-inline is used in conjunction with strict-dynamic, modern browsers will ignore it, while older browsers without strict-dynamic support can still continue to use the website.
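Whether unsafe-inline actually takes effect in a modern browser therefore depends on what else the policy contains. The following sketch parses a policy and checks this for scripts; the functions `parse_csp` and `unsafe_inline_is_effective` are hypothetical, and treating nonces and hashes as also neutralizing unsafe-inline is an addition based on how current browsers behave rather than something stated in this chapter.

```python
from typing import Dict, List


def parse_csp(policy: str) -> Dict[str, List[str]]:
    """Split a CSP header value into a directive -> source list mapping."""
    directives: Dict[str, List[str]] = {}
    for raw in policy.split(";"):
        tokens = raw.split()
        if tokens:
            # Keep the first occurrence if a directive is repeated.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def unsafe_inline_is_effective(policy: str) -> bool:
    """Return True if 'unsafe-inline' would still allow inline scripts."""
    directives = parse_csp(policy)
    sources = directives.get("script-src", directives.get("default-src", []))
    lowered = [source.lower() for source in sources]
    if "'unsafe-inline'" not in lowered:
        return False
    neutralizing_prefixes = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")
    neutralized = any(
        source == "'strict-dynamic'" or source.startswith(neutralizing_prefixes)
        for source in lowered
    )
    return not neutralized


print(unsafe_inline_is_effective("script-src 'unsafe-inline' https:"))
```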

Defending against XS-Leaks

Various new security features have been introduced to allow web developers to defend their websites against micro-architectural attacks, such as Spectre, and other attacks that are typically referred to as XS-Leaks. Given that many of these attacks were only discovered in the last few years, the mechanisms used to tackle them obviously are very recent as well, which might explain the relatively low adoption rate. Nevertheless, compared to last year, the cross-origin policies have significantly increased in adoption.

The Cross-Origin-Resource-Policy, which is used to indicate to the browser how a resource should be included (cross-origin, same-site or same-origin), is now present on 106,443 (1.5%) sites, up from 1,712 sites last year. The most likely explanation for this is that cross-origin isolation is a requirement for using features such as SharedArrayBuffer and high-resolution timers and that requires setting the site’s Cross-Origin-Embedder-Policy to require-corp. In essence, this requires all loaded subresources to set the Cross-Origin-Resource-Policy response header for those sites wishing to use those features.

Consequently, several CDNs now set the header with a value of cross-origin (as CDN resources are typically meant to be included in a cross-site context). We can see that this is indeed the case, as 96.8% of sites set the CORP header value to cross-origin, compared to 2.9% that set it to same-site and 0.3% that use the more restrictive same-origin.

With this change, it is no surprise that the adoption of Cross-Origin-Embedder-Policy is also steadily increasing: in 2021, 911 sites enabled this header—significantly more than the 6 sites of last year. It will be interesting to see how this will further develop next year!

Finally, another anti-XS-Leak header, Cross-Origin-Opener-Policy, has also seen a significant boost compared to last year. We found 15,727 sites that now enable this security mechanism, which is a significant increase compared to last year when only 31 sites were protected from certain XS-Leak attacks.
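How the three cross-origin headers fit together can be sketched as below. The function names are hypothetical, and the pairing of `Cross-Origin-Opener-Policy: same-origin` with `Cross-Origin-Embedder-Policy: require-corp` for the top-level document is stated here as an assumption about how browsers grant cross-origin isolation; the chapter itself only mentions the `require-corp` requirement.

```python
from typing import Dict

# The three values the chapter lists for Cross-Origin-Resource-Policy.
CORP_VALUES = ("same-origin", "same-site", "cross-origin")


def document_headers() -> Dict[str, str]:
    """Headers for a top-level page that wants to be cross-origin isolated."""
    return {
        "Cross-Origin-Opener-Policy": "same-origin",
        "Cross-Origin-Embedder-Policy": "require-corp",
    }


def resource_headers(policy: str = "same-origin") -> Dict[str, str]:
    """Header a subresource sends to state who may include it."""
    if policy not in CORP_VALUES:
        raise ValueError(f"unknown CORP value: {policy!r}")
    return {"Cross-Origin-Resource-Policy": policy}


# A CDN asset meant to be embedded by other sites opts in explicitly.
print(document_headers())
print(resource_headers("cross-origin"))
```

Defaulting `resource_headers` to the most restrictive value mirrors the idea that a resource should only be opened up to cross-site inclusion deliberately, as CDNs do.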

Web Cryptography API

Security has become one of the central issues in web development. The Web Cryptography API W3C recommendation was introduced in 2017 to perform basic cryptographic operations (e.g., hashing, signature generation and verification, and encryption and decryption) on the client-side, without any third-party library. We analyzed the usage of this JavaScript API.

Cryptography API	Desktop	Mobile
`CryptoGetRandomValues`	70.4%	67.4%
`SubtleCryptoDigest`	0.4%	0.5%
`SubtleCryptoEncrypt`	0.4%	0.3%
`CryptoAlgorithmSha256`	0.3%	0.3%
`SubtleCryptoGenerateKey`	0.3%	0.2%
`CryptoAlgorithmAesGcm`	0.2%	0.2%
`SubtleCryptoImportKey`	0.2%	0.2%
`CryptoAlgorithmAesCtr`	0.1%	< 0.1%
`CryptoAlgorithmSha1`	0.1%	0.1%
`CryptoAlgorithmSha384`	0.1%	0.2%

Figure 12.26. Top used cryptography APIs.

The popularity of the functions remains almost the same as the previous year: we record only a slight increase of 0.7 percentage points (from 71.8% to 72.5%). Again, this year Crypto.getRandomValues is the most popular cryptography API. It allows developers to generate strong pseudo-random numbers. We still believe that Google Analytics has a major effect on its popularity since the Google Analytics script utilizes this function.

It should be noted that since we perform passive crawling, our results in this section will be limited by not being able to identify cases where any interaction is required before the functions are executed.

Utilizing bot protection services

Many cyberattacks rely on automated bots, and interest in them seems to have increased. According to the Bad Bot Report 2021 by Imperva, the number of bad bots has increased this year by 25.6%. Note that, according to the previous report, the increase from 2019 to 2020 was 24.1%. In the following table, we present our results on the measures websites use to protect themselves from malicious bots.

Service provider	Desktop	Mobile
reCAPTCHA	10.2%	9.4%
Imperva	0.3%	0.3%
Sift	0.1%	0.1%
Signifyd	0.03%	0.03%
hCaptcha	0.03%	0.02%
Forter	0.03%	0.03%
TruValidate	0.03%	0.02%
Akamai Web Application Protector	0.02%	0.02%
Kount	0.02%	0.02%
Konduto	0.02%	0.02%
PerimeterX	0.02%	0.01%
Tencent Waterproof Wall	0.01%	0.01%
Others	0.03%	0.04%

Figure 12.27. Usage of bot protection services by provider.

Our analysis shows that under 10.7% of desktop websites and 9.9% of mobile websites use a mechanism to fight malicious bots. Last year those numbers were 8.3% and 7.3%, so this is approximately a 30% increase compared to the previous year. This year, too, we identified more bot protection mechanisms on desktop sites than on mobile sites (10.8% vs. 9.9%).

We also see new popular players as bot protection providers in our dataset (e.g., hCaptcha).

Drivers of security mechanism adoption

There are many different influences that might cause a website to invest more in their security posture. Examples of such factors are societal (e.g., more security-oriented education in certain countries, or laws that take more punitive measures in case of a data breach), technological (e.g., it might be easier to adopt security features in certain technology stacks, or certain vendors might enable security features by default), or threat-based (e.g., widely popular websites may face more targeted attacks than a website that is little known). In this section, we try to assess to what extent these factors influence the adoption of security features.

Where a website’s visitors connect from

Figure 12.28. Adoption of HTTPS per country.

Although we can see that the adoption of HTTPS-by-default is generally increasing, there is still a discrepancy in adoption rate between sites depending on the country most of the visitors originate from.

We find that compared to last year, the Netherlands has now made it into the top 5, which means that the Dutch are relatively more protected against transport layer attacks: 95.1% of the sites frequently visited by people in the Netherlands have HTTPS enabled (compared to 93.0% last year). In fact, the Netherlands is not the only country that improved in the adoption of HTTPS; we find that virtually every country improved in that regard.

It is also very encouraging to see that several of the countries that performed worst last year made a big leap. For instance, 13.4% more sites visited by people from Iran (the strongest riser with regard to HTTPS adoption) are now HTTPS-enabled compared to last year (from 74.3% to 84.3%). Although the gap between the best-performing and least-performing countries is becoming smaller, there are still significant efforts to be made.
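The Iran figure is a relative increase, which is easy to confuse with the change in percentage points. A small worked calculation, using only the two adoption rates quoted above (the function names are illustrative), shows the difference:

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute difference between two percentages."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Growth relative to the starting value, expressed in percent."""
    return (after - before) / before * 100


iran_before, iran_after = 74.3, 84.3

points = percentage_point_change(iran_before, iran_after)
relative = relative_change(iran_before, iran_after)
print(f"{points:.1f} percentage points")
print(f"{relative:.1f}% relative increase")
```

With the rounded inputs this gives a 10.0 percentage point rise and a relative increase of roughly 13.5%, in line with the quoted figure up to rounding of the underlying rates.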

Figure 12.29. Adoption of CSP and XFO per country.

When looking at the adoption of certain security features such as CSP and X-Frame-Options, we can see an even more pronounced difference between the different countries, where the sites from top-scoring countries are 2-4 times more likely to adopt these security features compared to the least-performing countries. We also find that countries that perform well on HTTPS adoption tend to also perform well on the adoption of other security mechanisms. This indicates that security is often thought of holistically, where all different angles need to be covered. And rightfully so: an attacker just needs to find a single exploitable vulnerability whereas developers need to ensure that every aspect is tightly protected.

Technology stack

Technology	Security features enabled by default
Automattic (PaaS)	Strict-Transport-Security (97.8%)
Blogger (Blogs)	X-Content-Type-Options (99.6%), X-XSS-Protection (99.6%)
Cloudflare (CDN)	Expect-CT (93.1%), Report-To (84.1%)
Drupal (CMS)	X-Content-Type-Options (77.9%), X-Frame-Options (83.1%)
Magento (E-commerce)	X-Frame-Options (85.4%)
Shopify (E-commerce)	Content-Security-Policy (96.4%), Expect-CT (95.5%), Report-To (95.5%), Strict-Transport-Security (98.2%), X-Content-Type-Options (98.3%), X-Frame-Options (95.2%), X-XSS-Protection (98.2%)
Squarespace (CMS)	Strict-Transport-Security (87.9%), X-Content-Type-Options (98.7%)
Sucuri (CDN)	Content-Security-Policy (84.0%), X-Content-Type-Options (88.8%), X-Frame-Options (88.8%), X-XSS-Protection (88.7%)
Wix (Blogs)	Strict-Transport-Security (98.8%), X-Content-Type-Options (99.4%)

Figure 12.30. Security feature adoption by technology.

Another factor that can strongly influence the adoption of certain security mechanisms is the technology stack that’s being used to build a website. In some cases, security features may be enabled by default, or for some blogging systems the control over the response headers may be out of the hands of the website owner and a platform-wide security setting may be in place.

Alternatively, CDNs may add additional security features, especially when these concern the transport security. In the above table, we’ve listed the nine technologies that are used by at least 25,000 sites, and that have a significantly higher adoption rate of specific security mechanisms. For instance, we can see that sites that are built with the Shopify e-commerce system have a very high (over 95%) adoption rate for seven security-relevant headers: Content-Security-Policy, Expect-CT, Report-To, Strict-Transport-Security, X-Content-Type-Options, X-Frame-Options, and X-XSS-Protection.

7

Figure 12.31. The number of security features with over 95% adoption rate on Shopify sites.

It is great to see that despite the variability in the content of the sites that use these technologies, it is still possible to uniformly adopt these security mechanisms.

83.1%

Figure 12.32. The percentage of Drupal sites that keep the default XFO header.

Another interesting entry in this list is Drupal, whose websites have an adoption rate of 83.1% for the X-Frame-Options header (a slight improvement compared to last year’s 81.8%). As this header is enabled by default, it is clear that the majority of Drupal sites stick with it, protecting them from clickjacking attacks. Note that, while it makes sense to keep the X-Frame-Options header for compatibility with older browsers in the near term, site owners should consider transitioning to the recommended Content-Security-Policy directive frame-ancestors for the same functionality.
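During such a transition, both headers can be sent side by side. The sketch below maps the two common X-Frame-Options values to their `frame-ancestors` equivalents; the helper name `framing_headers` is hypothetical, and the example assumes the site has no other CSP directives to merge with.

```python
from typing import Dict

# X-Frame-Options value -> equivalent CSP frame-ancestors source
XFO_TO_FRAME_ANCESTORS = {
    "DENY": "'none'",        # no site may frame the page
    "SAMEORIGIN": "'self'",  # only the page's own origin may frame it
}


def framing_headers(xfo_value: str) -> Dict[str, str]:
    """Return both the legacy header and its CSP counterpart."""
    key = xfo_value.strip().upper()
    if key not in XFO_TO_FRAME_ANCESTORS:
        raise ValueError(f"unsupported X-Frame-Options value: {xfo_value!r}")
    return {
        "X-Frame-Options": key,
        "Content-Security-Policy": f"frame-ancestors {XFO_TO_FRAME_ANCESTORS[key]}",
    }


print(framing_headers("SAMEORIGIN"))
```

Older browsers act on X-Frame-Options, while browsers that support `frame-ancestors` give the CSP directive precedence.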

An important aspect to explore in the context of the adoption of security features is diversity. For instance, as Cloudflare is the largest CDN provider, powering millions of websites (see the CDN chapter for further analysis on this), any feature that Cloudflare enables by default will result in a large overall adoption rate. In fact, 98.2% of the sites that employ the Expect-CT feature are powered by Cloudflare, indicating a fairly limited distribution in the adoption of this mechanism.

However, overall, we find that this phenomenon of a single actor like a Drupal or Cloudflare being a top technological driver of a security feature’s adoption is an outlier and appears less common over time. This means that an increasingly diverse set of websites is adopting security mechanisms, and that more and more web developers are becoming aware of their benefits. For example, last year 44.3% of the sites that set a Content Security Policy were powered by Shopify, whereas this year, Shopify is only responsible for 32.9% of all sites that enable CSP. Combined with the generally growing adoption rate, this is great news!

Website popularity

Websites that have many visitors may be more prone to targeted attacks given that there are more users with potentially sensitive data to attract attackers. Therefore, it can be expected that widely visited websites invest more in security in order to safeguard their users. To evaluate whether this hypothesis is valid, we used the ranking provided by the Chrome User Experience Report, which uses real-world user data to determine which websites are visited the most (ranked by top 1k, 10k, 100k, 1M and all sites in our dataset).

Figure 12.33. Prevalence of security headers set in a first-party context by rank.

We can see that the adoption of certain security features, X-Frame-Options (XFO), Content Security Policy (CSP), and Strict Transport Security (HSTS), is highly related to the ranking of sites. For instance, the 1,000 top visited sites are almost twice as likely to adopt a certain security header compared to the overall adoption. We can also see that the adoption rate for each feature is higher for higher-ranked websites.

We can draw two conclusions from this: on the one hand, having better “security hygiene” on sites that attract more visitors benefits a larger fraction of users (who might be more inclined to share their personal data with well-known trusted sites). On the other hand, the lower adoption rate of security features on less-visited sites could be indicative that it still requires a substantial investment to (correctly) implement these features. This investment may not always be feasible for smaller websites. Hopefully, we will see a further increase in security features that are enabled by default in certain technology stacks, which could further enhance the security of many sites without requiring too much effort from web developers.

Malpractices on the web

Cryptocurrencies have become an increasingly familiar part of our modern community. Global cryptocurrency adoption has been skyrocketing since the beginning of the pandemic. Due to their economic efficiency, cybercriminals have also become more interested in cryptocurrencies. That has led to the creation of a new attack vector: cryptojacking. Attackers have discovered the power of WebAssembly and exploited it to mine cryptocurrencies while visitors browse a website.

We now show our findings in the following figure regarding cryptominer usage on the web.

According to our dataset, until recently, we found a very stable decrease in the number of websites with cryptominers. However, we are now seeing that the number of such websites has increased more than tenfold in the past two months. Such peaks are very typical, for example, when widespread cryptojacking attacks take place or when a popular JS library has been infected.

We now turn to cryptominer market share in the following figure.

Figure 12.35. Cryptominer market share (mobile).

We see that Coinhive has been surpassed by CoinImp as the dominant cryptomining service. One of the main reasons for this is that Coinhive was shut down in March 2019. Interestingly, the domain is now owned by Troy Hunt, who displays aggressive banners on the website in an effort to make those sites still hosting the Coinhive script (desktop: 5.7%, mobile: 9.0%) aware that they are doing so, often without their knowledge. This reflects both the prevalence of Coinhive scripts more than two years after the service ceased to operate, and the risks of hosting third-party resources that can be taken over should that third party cease to operate. With Coinhive’s demise, CoinImp has clearly become the market leader (84.9% share).

Our results suggest that cryptojacking is still a serious attack vector, and that appropriate countermeasures should be taken against it.

Note that not all of these websites are infected. Website operators may also deploy this technique (instead of showing ads) to finance their website. But the use of this technique is also heavily discussed technically, legally, and ethically.

Please also note that our results may not show the actual state of the websites infected with cryptojacking. Since we run our crawler once a month, not all websites that run a cryptominer can be discovered. This is the case, for example, if a website remains infected for only X days and is not infected on the day our crawler runs.

`security.txt`

security.txt is a file format for websites to provide a standard for vulnerability reporting. Website providers can provide contact details, PGP key, policy, and other information in this file. White hat hackers can then use this information to conduct security analyses on these websites or report a vulnerability.

We see that just under 5% of the websites return a response when asked for the /.well-known/security.txt URL. However, investigating many of these shows that they are basically 404 pages that incorrectly return a 200 status code, so usage is likely much lower.
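One way to separate such "soft 404" responses from genuine files is to look at the response itself rather than only the status code. The heuristic below is a sketch and not the method used for this chapter: it assumes a genuine file is served as `text/plain` and contains at least one `Contact` field, and the function name is hypothetical.

```python
def looks_like_security_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristically decide whether a response is a real security.txt file."""
    if status != 200:
        return False
    # HTML error pages served with a 200 status typically fail this check.
    if not content_type.strip().lower().startswith("text/plain"):
        return False
    # Field names are matched case-insensitively.
    for line in body.splitlines():
        if line.strip().lower().startswith("contact:"):
            return True
    return False


real = "Contact: mailto:security@example.com\nPolicy: https://example.com/policy\n"
soft_404 = "<html><body><h1>Page not found</h1></body></html>"

print(looks_like_security_txt(200, "text/plain; charset=utf-8", real))
print(looks_like_security_txt(200, "text/html", soft_404))
```

The first call reports a plausible file and the second rejects the HTML error page, even though both responses carry a 200 status code.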

Figure 12.37. Use of security.txt properties.

We see that Policy is the most used property in the security.txt files, but even then it’s only used in 6.4% of sites with a security.txt URL. This property includes a link to the vulnerability disclosure policy for the website that helps researchers understand the reporting practices they need to follow. This is therefore likely a better indicator of the real usage of security.txt, since most files are expected to have a Policy value. It means that likely closer to 0.3% of all sites (6.4% of the roughly 5% that respond) have a “real” security.txt file, rather than the 5% measured above.

Another interesting point is that when we look at just this subset of “real” security.txt URLs, Tumblr makes up 63%-65% of the usage. It looks like this is set by default for these domains to the Tumblr contact details. This is great on one hand to show how a single platform can drive adoption of these new security features, but on the other hand indicates a further reduction in actual site usage.

The other most used properties include Canonical and Encryption. Canonical is used to indicate where the security.txt file is located. If the URI used to retrieve the security.txt file doesn’t match any of the URIs listed in the Canonical fields, then the contents of the file should not be trusted. Encryption provides security researchers with an encryption key that they can use for encrypted communication.
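The Canonical check described above can be sketched as a small parser. This is a simplified illustration with hypothetical function names and an `example.com` placeholder: it reads `Name: value` lines, skips comments, and does not handle signed files or any other validation a real client might perform.

```python
from typing import Dict, List


def parse_fields(body: str) -> Dict[str, List[str]]:
    """Collect 'Name: value' lines, keyed by lower-cased field name."""
    fields: Dict[str, List[str]] = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a field line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def canonical_matches(retrieved_uri: str, body: str) -> bool:
    """True if there are no Canonical fields or one of them matches the URI."""
    canonical = parse_fields(body).get("canonical", [])
    return not canonical or retrieved_uri in canonical


example = """\
# Hypothetical file for illustration
Contact: mailto:security@example.com
Canonical: https://example.com/.well-known/security.txt
Encryption: https://example.com/pgp-key.txt
"""

print(canonical_matches("https://example.com/.well-known/security.txt", example))
print(canonical_matches("https://mirror.example.net/.well-known/security.txt", example))
```

The second lookup fails because the file was fetched from a location it does not list, which is the situation in which its contents should not be trusted.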

Conclusion

Our analysis shows that the situation of web security concerning the provider side is improving compared to previous years. For example, we see that the use of HTTPS has increased by almost 10% in the last 12 months. We also find an increase in the protection of cookies and the use of security headers.

These increases indicate that we are moving towards a safer web environment, but they do not mean our web is secure enough today. We still have to improve our situation. For example, we believe that the web community should value security headers more. These are very effective extensions to protect web environments and web users from possible attacks.

Bot protection mechanisms could also be adopted more widely to protect platforms from malicious bots. Furthermore, our analysis from last year and another study using the HTTP Archive dataset about the update behavior of websites showed that website components are not diligently maintained, which increases the attack surface of web environments.

We should not forget that attackers are also working diligently to develop new techniques to bypass the security mechanisms we adopt.

With our analysis, we have tried to crystallize an overview of the security of our web. As extensive as our investigation is, our methodology only allows us to see a subset of all aspects of modern web security. For example, we do not know what additional measures a site may employ to mitigate or prevent attacks such as Cross-Site-Request-Forgery (CSRF) or certain types of Cross-Site-Scripting (XSS). As such, the picture portrayed in this chapter is incomplete yet a solid directional signal of the status of web security today.

The takeaway from our analysis is that we, the web community, must continue to invest more interest and resources in making our web environments much safer, in the hope of a better and safer tomorrow for all.