DOM-based XSS


Document Object Model-based Cross-site Scripting (DOM-based XSS) is a lesser-known form of XSS. It differs from reflected and stored XSS in that the exploit happens entirely on the client side and does not require a server-side vulnerability.

What is the DOM?

The Document Object Model (DOM) is the data representation of the objects that comprise the structure and content of a document on the web.

A Web page is a document. This document can be either displayed in the browser window or as the HTML source. But it is the same document in both cases. The Document Object Model (DOM) represents that same document so it can be manipulated. The DOM is an object-oriented representation of the web page, which can be modified with a scripting language such as JavaScript.

Exploitation

Overview

The exploit relies on client-side JavaScript code that inserts untrusted data into an HTML document via the DOM API, hence the term “DOM-based XSS”. The source of the untrusted data could be the page URL, the HTTP referrer, or any similar value that another web page can set.

In the following examples, the source of the data is the hash component of the URL, document.location.hash.

Caveats

In modern browsers, DOM-based XSS has lost some of its relevance because the most obvious data sources (URL, referrer) are automatically escaped by the browser before a script reads them. For example, the hash component of the URL is automatically URL-encoded, so characters such as < and > arrive as %3C and %3E.
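The automatic encoding can be observed with the standard URL class, which is available in current browsers and in Node.js. This is a minimal sketch that assumes a parser following the WHATWG URL rules; older browsers may behave differently, and the address is only a placeholder.

```javascript
// Placeholder link whose fragment contains raw markup.
const payload = "<script>alert(1)</script>";
const url = new URL("http://website.com/foo?bar#" + payload);

// The parser percent-encodes "<" and ">" in the fragment,
// so the raw markup does not appear in url.hash.
const rawHash = url.hash;

// Explicit decoding brings the original markup back.
const decoded = decodeURIComponent(rawHash.substring(1));

console.log(rawHash);
console.log(decoded);
```

The encoded form is harmless when written into a document as-is; the risk only returns once a script decodes it.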

However, if a web page does something less trivial than simply reading a URL and inserting it into a document, it might still be possible to trick scripts into interpreting the content in an unsafe way. For example, a web application might semantically extract and decode data out of the URL and use that.

Example

Let’s say we have the following script snippet:

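<script>
// Everything after the leading "#" of the URL, still URL-encoded.
var hashData = document.location.hash.substring(1);
// Decoding restores characters such as "<" and ">".
var data = decodeURIComponent(hashData);
// Vulnerable: the decoded data is written into the page as HTML.
document.write(data);
</script>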

The script takes everything after the # character in the URI (the hash fragment), decodes it and writes the result to the document.

In that case, we can URL-encode any payload and execute it by placing it in the hash part of the URI, like this:

http://website.com/foo?bar#urlencoded-payload-goes-here

For the payload, a simple alert call works:

<script>alert(1)</script>

The final URI will look like this:

http://website.com/foo?bar#%3Cscript%3Ealert%281%29%3C%2Fscript%3E
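The encoding step can be reproduced with the built-in encodeURIComponent function. This is only a sketch: buildExampleUrl is a hypothetical helper name, and encodeURIComponent leaves parentheses unencoded, so its output differs slightly from the URI shown here while decoding to the same payload.

```javascript
// Hypothetical helper: appends a URL-encoded payload as the hash fragment.
function buildExampleUrl(base, payload) {
  return base + "#" + encodeURIComponent(payload);
}

const payload = "<script>alert(1)</script>";
const exampleUrl = buildExampleUrl("http://website.com/foo?bar", payload);

// The fully encoded fragment used in the article.
const articleFragment = "%3Cscript%3Ealert%281%29%3C%2Fscript%3E";

console.log(exampleUrl);
// Both fragments decode to the same payload.
console.log(decodeURIComponent(articleFragment) === payload);
```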

If anyone clicks on the link and the vulnerable script snippet is present in that web page, an alert box will pop up in their browser.

Mitigation

As with any XSS, it is important to escape or whitelist all untrusted data.

HTML encoding

When inserting untrusted data into the HTML body, it is important to HTML encode the data. An encoding library should be used for this.

For JavaScript, a simple-to-use library is Mathias Bynens’ he library. Using the library, we can HTML encode the data, and the payload will not execute:

<script src="he.js"></script>
<script>
var hashData = document.location.hash.substring(1);
var data = he.encode(decodeURIComponent(hashData));
document.write(data);
</script>

For other languages, there are other encoders. For Java, you can use the OWASP Java Encoder.
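To make concrete what such an encoder does, here is a minimal hand-written sketch. The escapeHtml function and the HTML_ESCAPES table are hypothetical names used for illustration; they are not part of the he library or the OWASP Java Encoder, and a maintained library remains the better choice in real code.

```javascript
// Characters that are significant in HTML, mapped to their entities.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
};

// Illustrative encoder: replaces each significant character with its entity.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// The markup is turned into inert text.
console.log(escapeHtml("<script>alert(1)</script>"));
```

Once encoded, the browser renders the payload as visible text instead of parsing it as a script element.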

Modern frontend frameworks

Most modern frontend frameworks (e.g. Angular, Vue, React) have these mitigations built in. It is still possible to introduce XSS, but the APIs that bypass the built-in escaping usually have names that make the danger obvious. For example, dangerouslySetInnerHTML is the React prop that inserts raw HTML and can therefore allow XSS.

Using these frameworks therefore mitigates many XSS vectors.

Caveats

HTML attributes

Note that when you add encoded data to an HTML attribute, the encoded value must be enclosed in quotes. The following script snippet is vulnerable:

document.write('<input type="text" value=' + he.encode(data) + ' />');

If the user inputs the following hash fragment:

#foo%20onclick=alert(1);

Then another attribute, onclick=alert(1), will be added to the input HTML element. When the user clicks on the input box, the JavaScript will be executed and an alert box will pop up.

Therefore it is imperative to make sure every attribute value built from user input is enclosed in quotes.
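The difference between the unquoted and the quoted form can be sketched with plain string building. The helper names below are hypothetical, and encodeForHtml is a small stand-in for a real encoder such as he.encode, assumed here only so the example runs on its own.

```javascript
// Stand-in encoder for illustration; a real library should be used in practice.
function encodeForHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Vulnerable: a space in the value ends the attribute and starts a new one.
function buildUnquotedInput(value) {
  return '<input type="text" value=' + encodeForHtml(value) + " />";
}

// Safer: the value stays inside the quotes, and any quote in it is encoded.
function buildQuotedInput(value) {
  return '<input type="text" value="' + encodeForHtml(value) + '" />';
}

const data = "foo onclick=alert(1);";
console.log(buildUnquotedInput(data));
console.log(buildQuotedInput(data));
```

In the quoted form the whole string, including onclick=alert(1);, is treated as the value of the input rather than as a separate attribute.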

Whitelisting

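Although escaping is usually preferred, it is also possible to whitelist untrusted data with a regular expression. The problem with this approach is that it is easy to make mistakes, for example by forgetting to anchor the pattern or by allowing too broad a set of characters.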
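A whitelist check might look like the following sketch. The pattern, the length limit and the safeValueFromHash name are assumptions chosen for illustration; the right pattern depends on what the application actually expects in the fragment.

```javascript
// Assumed policy: 1 to 32 letters, digits, underscores or hyphens.
// The ^ and $ anchors matter: without them, a partial match would pass.
const ALLOWED_VALUE = /^[A-Za-z0-9_-]{1,32}$/;

// Returns the decoded fragment if it matches the whitelist, otherwise null.
function safeValueFromHash(hash) {
  let decoded;
  try {
    decoded = decodeURIComponent(hash.replace(/^#/, ""));
  } catch (e) {
    // Malformed percent-encoding is rejected as well.
    return null;
  }
  return ALLOWED_VALUE.test(decoded) ? decoded : null;
}

console.log(safeValueFromHash("#section-2"));
console.log(safeValueFromHash("#%3Cscript%3Ealert%281%29%3C%2Fscript%3E"));
```

Rejecting everything that does not match, rather than trying to strip out bad characters, keeps the check easier to reason about.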

Conclusion

XSS comes in many shapes and sizes. The important thing is to be wary of all user-controlled data, even if it comes from an inconspicuous place, like the hash fragment. Any unescaped data can easily become an XSS vulnerability.

Extra reading

OWASP XSS Prevention Cheat Sheet

OWASP DOM-Based XSS Prevention Cheat Sheet

Heino Sass Hallik