Why `\Magento\Framework\Xml\Security` Is Not Secure, and How to Bypass It

The \Magento\Framework\Xml\Security class in the Magento 2 framework is intended to prevent XXE: by design, its scan method detects entities in XML input. However, we found a way to bypass scan, because the implementation is not encoding-aware at all. Let's see how to break it.

Simulate the input XML

<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE foo [
    <!ENTITY bar "baz">
]>
<a>anything</a>

Save the content above to a file, encoded as UTF-16 LE with a BOM. The test below reads it from utf-16-le-bom.xml.
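If your editor cannot save in that encoding, the file can also be generated from PHP. This is a minimal sketch that assumes the mbstring extension is available; the file name is simply the one used by the test code.

```php
<?php
// Sketch: write the sample XML as UTF-16 LE with a BOM.
// Assumes the mbstring extension is installed.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE foo [
    <!ENTITY bar "baz">
]>
<a>anything</a>
XML;

// "\xFF\xFE" is the UTF-16 LE byte order mark;
// mb_convert_encoding() itself does not emit one for 'UTF-16LE'.
$utf16 = "\xFF\xFE" . mb_convert_encoding($xml, 'UTF-16LE', 'UTF-8');

file_put_contents('utf-16-le-bom.xml', $utf16);
```

Because the sample is plain ASCII, the resulting file is two bytes per character plus the two-byte BOM.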

Testing

// Load the simulated XML input from the file saved earlier
$xmlString = file_get_contents('utf-16-le-bom.xml');

// Verify that the XML string is well-formed; no exception is thrown here
new \SimpleXMLElement($xmlString);

/** @var \Magento\Framework\Xml\Security $xmlSecurity */
var_dump($xmlSecurity->scan($xmlString)); // bool(true)

Note that scan returns true, i.e. the input is judged safe even though it declares an entity. The bypass succeeds.
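To see why encoding matters, recall that in UTF-16 LE every ASCII character is followed by a NUL byte, so the contiguous byte sequence `<!ENTITY` never appears in the raw string. The sketch below uses a hypothetical function, naiveHasEntity(), as a stand-in for any encoding-unaware check. It is an illustration only and is not Magento's actual code.

```php
<?php
// Hypothetical encoding-unaware check, for illustration only.
// This is NOT the code of \Magento\Framework\Xml\Security.
function naiveHasEntity(string $xml): bool
{
    // Byte-level search for the ASCII/UTF-8 form of the keyword
    return strpos($xml, '<!ENTITY') !== false;
}

$utf8 = '<!DOCTYPE foo [<!ENTITY bar "baz">]><a>anything</a>';

// Same document as UTF-16 LE with a BOM (assumes the mbstring extension)
$utf16 = "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

var_dump(naiveHasEntity($utf8));  // bool(true)
var_dump(naiveHasEntity($utf16)); // bool(false)
```

An XML parser that honors the BOM still sees the entity declaration, which is exactly the gap between what is scanned and what is parsed.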

Usage in Magento

Magento 2.4.7-p3

Security.php

Carrier.php#L658

This example shows that Magento uses the class to scan an external XML response.

Adobe Analyst Response

Won't fix, because no practical exploitation scenario was discovered.

Need A One-size-fits-all Solution

TODO

We plan to provide a secure version of this class within 2024, since not all Magento developers are security specialists.

DONE

We released the module wubinworks/module-xml-security. Once it is installed, a preference replaces \Magento\Framework\Xml\Security, and object type compatibility is kept.
You can now safely do the following without worrying about the XML encoding.

/** @var \Magento\Framework\Xml\Security $xmlSecurity */
$xmlSecurity->scan($xmlString);

Conclusion

Two important things:

  • Don't use \Magento\Framework\Xml\Security to scan WebAPI input, responses, or user input data.

  • If you don't have a replacement and have to use it, at least detect the XML encoding first.
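As a rough idea of what "detect the XML encoding first" could look like, here is a minimal sketch that inspects the byte order mark and normalizes the input to UTF-8 before any byte-level check. The functions detectBom() and normalizeToUtf8() are hypothetical names introduced for this example, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: return [encoding, BOM length] or null if no BOM.
function detectBom(string $xml): ?array
{
    // The 4-byte UTF-32 marks come first, because the UTF-32 LE BOM
    // begins with the same two bytes as the UTF-16 LE BOM.
    $boms = [
        ["\x00\x00\xFE\xFF", 'UTF-32BE'],
        ["\xFF\xFE\x00\x00", 'UTF-32LE'],
        ["\xEF\xBB\xBF", 'UTF-8'],
        ["\xFE\xFF", 'UTF-16BE'],
        ["\xFF\xFE", 'UTF-16LE'],
    ];
    foreach ($boms as [$bom, $encoding]) {
        if (strncmp($xml, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return null;
}

// Hypothetical helper: strip the BOM and convert to UTF-8.
// Input without a BOM is returned unchanged (a simplification).
function normalizeToUtf8(string $xml): string
{
    $detected = detectBom($xml);
    if ($detected === null) {
        return $xml;
    }
    [$encoding, $bomLength] = $detected;
    return mb_convert_encoding(substr($xml, $bomLength), 'UTF-8', $encoding);
}

$doc = '<!DOCTYPE foo [<!ENTITY bar "baz">]><a/>';
$utf16 = "\xFF\xFE" . mb_convert_encoding($doc, 'UTF-16LE', 'UTF-8');

// After normalization the keyword is visible to a byte-level search
var_dump(strpos(normalizeToUtf8($utf16), '<!ENTITY') !== false); // bool(true)
```

This only covers input that starts with a BOM. UTF-16 without a BOM and the encoding named in the XML declaration are not handled, and after conversion the declaration no longer matches the actual bytes, so treat it as a starting point rather than a complete fix. A stricter alternative is to reject anything that is not plain UTF-8.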

We hope this blog post helps with your development.