Defusedxml – defusing XML bombs and other exploits
“It’s just XML, what could probably go wrong?”
Christian Heimes <christian@python.org>
The results of an attack on a vulnerable XML library can be fairly
dramatic. With just a few hundred bytes of XML data an attacker can
occupy several gigabytes of memory within seconds. An attacker
can also keep CPUs busy for a long time with a small to medium size
request. Under some circumstances it is even possible to access local
files on your server, to circumvent a firewall, or to abuse services to
rebound attacks to third parties.
The attacks use and abuse less common features of XML and its parsers.
The majority of developers are unacquainted with features such as
processing instructions and entity expansions that XML inherited from
SGML. At best they know about <!DOCTYPE> from experience with HTML but
they are not aware that a document type definition (DTD) can generate an
HTTP request or load a file from the file system.
None of the issues is new. They have been known for a long time. Billion
laughs was first reported in 2003. Nevertheless some XML libraries and
applications are still vulnerable and even heavy users of XML are
surprised by these features. It’s hard to say whom to blame for the
situation. It’s too short-sighted to shift all blame onto XML parsers and
XML libraries for using insecure default settings. After all, they
properly implement XML specifications. Application developers must not
rely on a library always being configured for security and potentially
harmful data by default.
The Billion Laughs
attack, also known as exponential entity expansion, uses multiple
levels of nested entities. The original example uses 9 levels of 10
expansions in each level to expand the string lol
to a string of 3 * 10^9 bytes, hence the name “billion laughs”. The resulting
string occupies 3 GB (2.79 GiB) of memory; intermediate strings require
additional memory. Because most parsers don’t cache the intermediate
step for every expansion, it is repeated over and over again, which
increases the CPU load even more.
An XML document of just a few hundred bytes can disrupt all services on
a machine within seconds.
Example XML:
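<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;">
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;">
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;">
]>
<bomb>&d;</bomb>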
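The arithmetic behind the name can be checked without expanding anything. The following is a minimal sketch; the helper names expanded_size and build_nested_entities are made up for this illustration and are not part of defusedxml. The document is only assembled and measured as text, it is never handed to a parser.

```python
def expanded_size(base: str, levels: int, fanout: int) -> int:
    """Size in bytes of the fully expanded top-level entity (ASCII text)."""
    return len(base) * fanout ** levels


def build_nested_entities(base: str, levels: int, fanout: int) -> str:
    """Return the text of a document with `levels` layers of nested entities.

    The document is only assembled as a string; it is never parsed here.
    """
    lines = ["<!DOCTYPE bomb [", f'<!ENTITY e0 "{base}">']
    for level in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{level - 1};" * fanout
        lines.append(f'<!ENTITY e{level} "{refs}">')
    lines.append("]>")
    lines.append(f"<bomb>&e{levels};</bomb>")
    return "\n".join(lines)


document = build_nested_entities("lol", levels=9, fanout=10)
total = expanded_size("lol", levels=9, fanout=10)
print(len(document), "bytes of XML would expand to", total, "bytes")
print(f"{total / 10**9:.2f} GB = {total / 2**30:.2f} GiB")
```

The ratio between the two numbers is what makes the attack cheap for the sender: the input stays in the range of a few hundred bytes while the expanded text grows by a factor of ten per level.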
A quadratic blowup attack is similar to a Billion
Laughs attack; it abuses
entity expansion, too. Instead of nested entities it repeats one large
entity with a couple of thousand chars over and over again. The attack
isn’t as efficient as the exponential case but it avoids triggering
countermeasures of parsers against heavily nested entities. Some parsers
limit the depth and breadth of a single entity but not the total amount
of expanded text throughout an entire XML document.
A medium-sized XML document with a couple of hundred kilobytes can
require a couple of hundred MB to several GB of memory. When the attack
is combined with some level of nested expansion an attacker is able to
achieve a higher ratio of success.
<!DOCTYPE bomb [
<!ENTITY a "xxxxxxx... a couple of ten thousand chars">
]>
<bomb>&a;&a;&a;... repeat</bomb>
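The size relationship described above can be estimated with simple arithmetic. This is a rough sketch under the assumption of ASCII content and a single entity named a; the function name quadratic_blowup_sizes and the chosen numbers are illustrative only.

```python
def quadratic_blowup_sizes(entity_len, references):
    """Return (document size, expanded size) in bytes for ASCII content."""
    # DTD with one entity of `entity_len` characters.
    prolog = len('<!DOCTYPE bomb [\n<!ENTITY a "') + entity_len + len('">\n]>\n')
    # Body that references the entity `references` times.
    body = len("<bomb>") + references * len("&a;") + len("</bomb>")
    return prolog + body, entity_len * references


doc_size, expanded = quadratic_blowup_sizes(entity_len=50_000, references=50_000)
print(f"document: {doc_size / 1000:.0f} kB, expanded: {expanded / 10**9:.1f} GB")
```

Both the entity length and the number of references grow linearly with the document, so the expanded text grows with their product. That is where the name "quadratic" comes from.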
Entity declarations can contain more than just text for replacement.
They can also point to external resources by public identifiers or
system identifiers. System identifiers are standard URIs. When the URI
is a URL (e.g. a http:// locator) some parsers download the resource
from the remote location and embed it into the XML document verbatim.
Simple example of a parsed external entity:
<!DOCTYPE external [
<!ENTITY ee SYSTEM "http://www.python.org/some.xml">
]>
<root>&ee;</root>
The case of parsed external entities works only for valid XML content.
The XML standard also supports unparsed external entities with a
NData declaration.
External entity expansion opens the door to plenty of exploits. An
attacker can abuse a vulnerable XML library and application to rebound
and forward network requests with the IP address of the server. It
highly depends on the parser and the application what kind of exploit is
possible. For example:
- An attacker can circumvent firewalls and gain access to restricted
resources as all the requests are made from an internal and
trustworthy IP address, not from the outside.
- An attacker can abuse a service to attack, spy on or DoS your
servers but also third party services. The attack is disguised with
the IP address of the server and the attacker is able to utilize the
high bandwidth of a big machine.
- An attacker can exhaust additional resources on the machine, e.g.
with requests to a service that doesn’t respond or responds with
very large files.
- An attacker may gain knowledge of when, how often and from which IP
address an XML document is accessed.
- An attacker could send mail from inside your network if the URL
handler supports smtp:// URIs.
External entities with references to local files are a sub-case of
external entity expansion. It’s listed as an extra attack because it
deserves extra attention. Some XML libraries such as lxml disable
network access by default but still allow entity expansion with local
file access by default. Local files are either referenced with a
file:// URL or by a file path (either relative or absolute).
Additionally, lxml’s libxml2
has catalog support. XML catalogs like
/etc/xml/catalog
are XML files, which map schema URIs to local files.
An attacker may be able to access and download all files that can be
read by the application process. This may include critical configuration
files, too.
<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
]>
<root>&ee;</root>
This case is similar to external entity expansion, too. Some XML
libraries like Python’s xml.dom.pulldom retrieve document type
definitions from remote or local locations. Several attack scenarios
from the external entity case apply to this issue as well.
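<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
    <head/>
    <body>text</body>
</html>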
kind | sax | etree | minidom | pulldom | xmlrpc |
---|---|---|---|---|---|
billion laughs | Maybe (1) | Maybe (1) | Maybe (1) | Maybe (1) | Maybe (1) |
quadratic blowup | Maybe (1) | Maybe (1) | Maybe (1) | Maybe (1) | Maybe (1) |
external entity expansion (remote) | False (2) | False (3) | False (4) | False (2) | False |
external entity expansion (local file) | False (2) | False (3) | False (4) | False (2) | False |
DTD retrieval | False (2) | False | False | False (2) | False |
gzip bomb | False | False | False | False | True |
xpath support (6) | False | False | False | False | False |
xsl(t) support (6) | False | False | False | False | False |
xinclude support (6) | False | True (5) | False | False | False |
C library | expat | expat | expat | expat | expat |
vulnerabilities and features
1. expat parser >= 2.4.0 has billion laughs protection against XML bombs
(CVE-2013-0340). The parser has sensible defaults for
XML_SetBillionLaughsAttackProtectionMaximumAmplification
and
XML_SetBillionLaughsAttackProtectionActivationThreshold.
2. Python >= 3.6.8, >= 3.7.1, and >= 3.8 no longer retrieve local and
remote resources with urllib, see bpo-17239.
3. xml.etree doesn’t expand entities and raises a ParseError when an
entity occurs.
4. minidom doesn’t expand entities and simply returns the unexpanded
entity verbatim.
5. Library has (limited) XInclude support but requires an additional
step to process inclusion.
6. These are features but they may introduce exploitable holes, see
Other things to consider
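The xml.etree behaviour from the notes above can be observed with a short experiment. This is a sketch, not a guarantee: the path in the entity declaration is a placeholder, and the exact error message may differ between Python and expat versions.

```python
import xml.etree.ElementTree as ET

# Placeholder path; the file is never opened because xml.etree does not
# expand the external entity.
XML = """<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
]>
<root>&ee;</root>"""

try:
    ET.fromstring(XML)
except ET.ParseError as exc:
    # The reference is reported as an undefined entity instead of being loaded.
    outcome = f"rejected: {exc}"
else:
    outcome = "parsed"

print(outcome)
```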
xml.sax.handler features:
feature_external_ges (http://xml.org/sax/features/external-general-entities)
disables external entity expansion
feature_external_pes (http://xml.org/sax/features/external-parameter-entities)
the option is ignored and doesn’t modify any functionality
xml.dom.xmlbuilder.Options:
external_parameter_entities
ignored
external_general_entities
ignored
external_dtd_subset
ignored
entities
unsure
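As a hedged illustration of the SAX settings listed above, the features can be switched off explicitly on a parser created with the standard library. Recent Python versions already default to this state, so the calls mainly document the intent.

```python
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes

parser = xml.sax.make_parser()

# Do not load external general entities (files or network resources).
parser.setFeature(feature_external_ges, False)
# expat never reads external parameter entities; setting the flag to False
# is accepted and has no further effect.
parser.setFeature(feature_external_pes, False)

print("external general entities:", bool(parser.getFeature(feature_external_ges)))
print("external parameter entities:", bool(parser.getFeature(feature_external_pes)))
```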
The defusedxml package
(defusedxml on PyPI) contains
several Python-only workarounds and fixes for denial of service and
other vulnerabilities in Python’s XML libraries. In order to benefit
from the protection you just have to import and use the listed functions
/ classes from the right defusedxml module instead of the original
module. Only defusedxml.xmlrpc is implemented as a
monkey patch.
Instead of importing parse() from xml.etree.ElementTree, import parse()
from defusedxml.ElementTree and call it the same way.
Note
The defusedxml modules are not drop-in replacements of their stdlib
counterparts. The modules only provide functions and classes related to
parsing and loading of XML. For all other features, use the classes,
functions, and constants from the stdlib modules. For example, parse a
document with defusedxml.ElementTree.fromstring() and create or serialize
elements with xml.etree.ElementTree.
Additionally the package has an untested function to monkey patch
all stdlib modules with defusedxml.defuse_stdlib().
Warning
defuse_stdlib()
should be avoided. It can break third party packages or
cause surprising side effects. Instead you should use the parsing
features of defusedxml explicitly.
All functions and parser classes accept three additional keyword
arguments. They return either the same objects as the original functions
or compatible subclasses.
forbid_dtd (default: False)
disallow XML with a <!DOCTYPE> processing instruction and raise a
DTDForbidden exception when a DTD processing instruction is found.
forbid_entities (default: True)
disallow XML with <!ENTITY> declarations inside the DTD and raise an
EntitiesForbidden exception when an entity is declared.
forbid_external (default: True)
disallow any access to remote or local resources in external entities or
DTD and raise an ExternalReferenceForbidden exception when a DTD or
entity references an external resource.
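The idea behind forbid_dtd and forbid_entities can be sketched with plain expat handlers from the standard library. This is not defusedxml’s implementation: the function check_xml and the exception Forbidden are hypothetical names for this illustration, and the sketch only validates input without building a tree.

```python
import xml.parsers.expat as expat


class Forbidden(ValueError):
    """Hypothetical exception for this sketch (defusedxml has its own classes)."""


def check_xml(data: bytes, forbid_dtd: bool = False, forbid_entities: bool = True) -> None:
    """Parse `data` once and raise Forbidden on a DTD or entity declaration."""
    parser = expat.ParserCreate()

    def start_doctype(name, sysid, pubid, has_internal_subset):
        # Called when expat sees <!DOCTYPE ...>.
        if forbid_dtd:
            raise Forbidden(f"DTD forbidden: {name!r}")

    def entity_decl(name, is_parameter_entity, value, base, sysid, pubid, notation_name):
        # Called for every <!ENTITY ...> declaration, internal or external.
        if forbid_entities:
            raise Forbidden(f"entity declaration forbidden: {name!r}")

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EntityDeclHandler = entity_decl
    # Exceptions raised inside the handlers propagate out of Parse().
    parser.Parse(data, True)


check_xml(b"<root/>")  # no DTD, passes silently
try:
    check_xml(b'<!DOCTYPE r [<!ENTITY a "x">]><r>&a;</r>')
except Forbidden as exc:
    print(exc)
```

Raising from the declaration handler stops the parser before any entity reference in the document body is expanded, which is why the check is cheap even for hostile input.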
defusedxml (package):
DefusedXmlException, DTDForbidden, EntitiesForbidden,
ExternalReferenceForbidden, NotSupportedError
defuse_stdlib() (experimental)
defusedxml.cElementTree:
NOTE defusedxml.cElementTree
is deprecated and will be removed in
a future release. Import from defusedxml.ElementTree
instead.
parse(), iterparse(), fromstring(), XMLParser
defusedxml.ElementTree:
parse(), iterparse(), fromstring(), XMLParser
defusedxml.expatreader:
create_parser(), DefusedExpatParser
defusedxml.sax:
parse(), parseString(), make_parser()
defusedxml.expatbuilder:
parse(), parseString(), DefusedExpatBuilder, DefusedExpatBuilderNS
defusedxml.minidom:
parse(), parseString()
defusedxml.pulldom:
parse(), parseString()
defusedxml.xmlrpc:
The fix is implemented as a monkey patch for the stdlib’s xmlrpc package
(3.x) or xmlrpclib module (2.x). The function monkey_patch() enables the
fixes, unmonkey_patch() removes the patch and puts the
code in its former state.
The monkey patch protects against XML related attacks as well as
decompression bombs and excessively large requests or responses. The
default setting is 30 MB for requests, responses and gzip decompression.
You can modify the default by changing the module variable MAX_DATA. A value of -1 disables the limit.
DEPRECATED The defusedxml.lxml module is deprecated and will be removed in a future
release.
lxml is safe against most attack scenarios. lxml uses libxml2
for
parsing XML. The library has built-in mitigations against billion laughs
and quadratic blowup attacks. The parser allows a limited amount of entity
expansions, then fails. lxml also disables network access by default.
The libxml2 lxml
FAQ
lists additional recommendations for safe parsing, for example
countermeasures against compression bombs.
The default parser resolves entities and protects against huge trees and
deeply nested entities. To disable entity expansion, use a custom
parser object:
from lxml import etree
parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring("<example/>", parser=parser)
The module acts as an example of how you could protect code that uses
lxml.etree. It implements a custom Element class that filters out Entity
instances, a custom parser factory and a thread local storage for parser
instances. It also has a check_docinfo() function which inspects a tree
for internal or external DTDs and entity declarations. In order to check
for entities lxml > 3.0 is required.
parse(), fromstring(), RestrictedElement, GlobalParserTLS,
getDefaultParser(), check_docinfo()
The defusedexpat package
(defusedexpat on PyPI) is
no longer supported. expat parser 2.4.0
and newer come with billion laughs
protection
against XML bombs.
Update to Python 3.6.8, 3.7.1, or newer. The SAX and DOM parsers do not
load external entities from files or network resources.
Update expat to 2.4.0 or newer. It has billion laughs
protection
with sensible default limits to mitigate billion laughs and quadratic
blowup.
Official binaries from python.org use libexpat 2.4.0 since 3.7.12,
3.8.12, 3.9.7, and 3.10.0 (August 2021). Third party vendors may use
older or newer versions of expat. pyexpat.version_info
contains the
current runtime version of libexpat. Vendors may have backported fixes
to older versions without bumping the version number.
For example, an application can compare pyexpat.version_info against
(2, 4, 0) at startup and refuse to run when the check fails.
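A minimal sketch of such a check follows. The helper name has_billion_laughs_protection is made up for this illustration; as noted above, a vendor may have backported the fix to an older version number, so a failed check is a hint rather than proof.

```python
import pyexpat

REQUIRED = (2, 4, 0)


def has_billion_laughs_protection() -> bool:
    """True when the libexpat loaded at runtime reports version 2.4.0 or newer."""
    return pyexpat.version_info >= REQUIRED


if not has_billion_laughs_protection():
    # An application may prefer sys.exit() here instead of a warning.
    print("warning: libexpat >= 2.4.0 is required for built-in protection")
```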
XPath queries
(per Brad Hill’s Attacking XML
Security)
XML, XML parsers and processing libraries have more features and
possible issues that could lead to DoS vulnerabilities or security
exploits in applications. I have compiled an incomplete list of
theoretical issues that need further research and more attention. The
list is deliberately pessimistic and a bit paranoid, too. It contains
things that might go wrong under daffy circumstances.
XML parsers may use an algorithm with quadratic runtime O(n^2) to
handle attributes and namespaces. If it uses hash
tables (dictionaries) to store attributes and namespaces the
implementation may be vulnerable to hash collision attacks, thus
reducing the performance to O(n^2) again. In either case an
attacker is able to forge a denial of service attack with an XML
document that contains thousands upon thousands of attributes in a
single node.
I haven’t researched yet if expat, pyexpat or libxml2 are vulnerable.
The issue of decompression bombs (aka ZIP
bomb) applies to all XML
libraries that can parse compressed XML streams like gzipped HTTP streams
or LZMA-ed files. For an attacker it can reduce the amount of
transmitted data by three magnitudes or more. Gzip is able to compress 1
GiB of zeros to roughly 1 MB; lzma is even better.
None of Python’s standard XML libraries decompress streams except for
xmlrpclib. The module is vulnerable <https://bugs.python.org/issue16043> to decompression bombs.
lxml can load and process compressed data through libxml2 transparently.
libxml2 can handle even very large blobs of compressed data efficiently
without using too much memory. But it doesn’t protect applications from
decompression bombs. A carefully written SAX or iterparse-like approach
can be safe.
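One way to guard against decompression bombs is to cap the amount of decompressed output before any XML parsing starts. The sketch below uses only the standard library; the function name bounded_gunzip and the limits are assumptions for this illustration and are unrelated to the MAX_DATA setting of defusedxml.xmlrpc.

```python
import gzip
import zlib


def bounded_gunzip(data: bytes, max_size: int) -> bytes:
    """Decompress gzip data, refusing to produce more than max_size bytes."""
    # Adding 16 to the window bits tells zlib to expect a gzip container.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Ask for one byte more than allowed so that an overrun is detectable.
    out = decompressor.decompress(data, max_size + 1)
    if len(out) > max_size:
        raise ValueError(f"decompressed data exceeds {max_size} bytes")
    return out


# 10 MiB of zeros shrink to a tiny payload.
bomb = gzip.compress(b"\0" * (10 * 1024 * 1024))
print(len(bomb), "compressed bytes")

try:
    bounded_gunzip(bomb, max_size=1024 * 1024)
except ValueError as exc:
    print("rejected:", exc)
```

Because the decompressor stops after max_size + 1 bytes, memory use stays bounded no matter how large the fully expanded stream would be.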
Processing instructions (PIs) like
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
may impose more threats for XML processing. It depends on if and how a
processor handles processing instructions. The issue of URL retrieval
with network or local file access applies to processing instructions, too.
DTD has more
features like <!NOTATION>. I haven’t researched how these features may
be a security threat.
XPath statements may introduce DoS vulnerabilities. Code should never
execute queries from untrusted sources. An attacker may also be able to
create an XML document that makes certain XPath queries costly or
resource hungry.
XPath injection attacks pretty much work like SQL injection attacks.
Arguments to XPath queries must be quoted and validated properly,
especially when they are taken from the user. The page Avoid the
dangers of XPath
injection
lists some ramifications of XPath injections.
Python’s standard library doesn’t have XPath support. lxml supports
parameterized XPath queries which do proper quoting. You just have to
use its xpath() method correctly: pass user-supplied values as XPath
variables (for example $tagid in the expression and tagid=... as a
keyword argument) instead of interpolating them into the expression
string.
XML Inclusion is
another way to load and include external files:
<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="filename.txt" parse="text" />
</root>
This feature should be disabled when XML files from an untrusted source
are processed. Some Python XML libraries and libxml2 support XInclude
but don’t have an option to sandbox inclusion and limit it to allowed
directories.
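With xml.etree, inclusion only happens when xml.etree.ElementInclude.include() is called explicitly, and that call accepts a custom loader. The following sketch uses the loader as a simple allow-list; the names ALLOWED_TEXT, allowlist_loader and parse_with_includes are hypothetical, and the included text comes from a dictionary rather than the file system to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical allow-list mapping permitted hrefs to their text content.
ALLOWED_TEXT = {"filename.txt": "included text"}


def allowlist_loader(href, parse, encoding=None):
    """Loader for ElementInclude that only serves allow-listed text resources."""
    if parse != "text" or href not in ALLOWED_TEXT:
        raise PermissionError(f"XInclude of {href!r} is not allowed")
    return ALLOWED_TEXT[href]


def parse_with_includes(xml_text):
    # Parsing alone leaves xi:include elements untouched.
    root = ET.fromstring(xml_text)
    # Inclusion is the explicit extra step; the loader decides what may be read.
    ElementInclude.include(root, loader=allowlist_loader)
    return root


DOC = """<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="filename.txt" parse="text" />
</root>"""

root = parse_with_includes(DOC)
print(repr(root.text))
```

Code that never calls ElementInclude.include() on untrusted input avoids the feature entirely, which is the simpler option when inclusion is not needed.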
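A validating XML parser may download schema files from the information
in a xsi:schemaLocation
attribute.
<ead xmlns="urn:isbn:1-931666-22-9"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd">
</ead>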
You should keep in mind that XSLT is a Turing complete language. Never
process XSLT code from an unknown or untrusted source! XSLT processors may
allow you to interact with external resources in ways you can’t even
imagine. Some processors even support extensions that allow read/write
access to the file system, access to JRE objects or scripting with Jython.
Attacking XML
Security
contains an example for Xalan-J.
CVE-2013-1664
Unrestricted entity expansion induces DoS vulnerabilities in Python XML
libraries (XML bomb)
CVE-2013-1665
External entity expansion in Python XML libraries inflicts potential
security flaws and DoS vulnerabilities
Several other programming languages and frameworks are vulnerable as
well. A couple of them are affected by the fact that libxml2 up to 2.9.0
has no protection against quadratic blowup attacks. Most of them have
potentially dangerous default settings for entity expansion and external
entities, too.
Perl’s XML::Simple is vulnerable to quadratic entity expansion and
external entity expansion (both local and remote).
Ruby’s REXML document parser is vulnerable to entity expansion attacks
(both quadratic and exponential) but it doesn’t do external entity
expansion by default. In order to counteract entity expansion you have
to disable the feature:
REXML::Document.entity_expansion_limit = 0
libxml-ruby and hpricot don’t expand entities in their default
configuration.
PHP’s SimpleXML API is vulnerable to quadratic entity expansion and
loads entities from local and remote resources. The option
LIBXML_NONET
disables network access but still allows local file
access. LIBXML_NOENT
seems to have no effect on entity expansion in
PHP 5.4.6.
Information in XML DoS and Defenses
(MSDN) suggests
that .NET is vulnerable with its default settings. The article contains
code snippets that show how to create a secure XML reader:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
settings.MaxCharactersFromEntities = 1024;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);
Untested. The documentation of Xerces and its Xerces
SecurityManager
sounds like Xerces is also vulnerable to billion laughs attacks with its
default settings. It also does entity resolving when an
org.xml.sax.EntityResolver
is configured. I’m not yet sure about the
default setting here.
Java specialists suggest using a custom builder factory:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setXIncludeAware(false);
builderFactory.setExpandEntityReferences(false);
builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
// either
builderFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// or if you need DTDs
builderFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
builderFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
- DOM: Use xml.dom.xmlbuilder options for entity handling
- SAX: take feature_external_ges and feature_external_pes (?) into
account
- test experimental monkey patching of stdlib modules
- improve documentation
Copyright (c) 2013-2023 by Christian Heimes <christian@python.org>
Licensed to PSF under a Contributor Agreement.
See https://www.python.org/psf/license for licensing details.
Brett Cannon (Python Core developer)
review and code cleanup
Antoine Pitrou (Python Core developer)
code review
Aaron Patterson, Ben Murphy and Michael Koziarski (Ruby team)
Many thanks to Aaron, Ben and Michael from the Ruby team for their
report and help.
Thierry Carrez (OpenStack)
Many thanks to Thierry for his report to the Python Security Response
Team on behalf of the OpenStack security team.
Carl Meyer (Django)
Many thanks to Carl for his report to PSRT on behalf of the Django
security team.
Daniel Veillard (libxml2)
Many thanks to Daniel for his insight and help with libxml2.
semantics GmbH (https://www.semantics.de/)
Many thanks to my employer semantics for letting me work on the issue
during working hours as part of semantics’s open source initiative.
- XML DoS and Defenses (MSDN)
- Billion Laughs on Wikipedia
- ZIP bomb on Wikipedia
- Configure SAX parsers for secure processing
- Testing for XML Injection
Release date: 2023
- Fix testing without lxml
- Test on 3.13-dev and PyPy 3.9
Release date: 29-Sep-2023
- Silence deprecation warning in defuse_stdlib.
- Update lxml safety information
Release date: 26-Sep-2023
- Drop support for Python 2.7, 3.4, and 3.5.
- Test on 3.10, 3.11, and 3.12.
- Add defusedxml.ElementTree.fromstringlist()
- Update vulnerabilities and features table in README.
- Pending removal: The defusedxml.lxml module has been
unmaintained and deprecated since 2019. The module will be removed
in the next version.
- Pending removal: The defusedxml.cElementTree module will be removed in
the next version. Please use defusedxml.ElementTree instead.
Release date: 08-Mar-2021
- Fix regression in defusedxml.ElementTree.ParseError (#63). The
ParseError exception is now the same class object as
xml.etree.ElementTree.ParseError again.
Release date: 4-Mar-2021
- No changes
Release date: 12-Jan-2021
- Re-add and deprecate defusedxml.cElementTree
- Use GitHub Actions instead of TravisCI
- Restore ElementTree attribute of xml.etree module after patching
Release date: 04-May-2020
- Add support for Python 3.9
- defusedxml.cElementTree is not available with Python 3.9.
- Python 2 is deprecated. Support for Python 2 will be removed in
0.8.0.
Release date: 17-Apr-2019
- Increase test coverage.
- Add badges to README.
Release date: 14-Apr-2019
- Test on Python 3.7 stable and 3.8-dev
- Drop support for Python 3.4
- No longer pass html argument to XMLParse. It has been deprecated
and ignored for a long time. The DefusedXMLParser still takes a html
argument. A deprecation warning is issued when the argument is False
and a TypeError when it’s True.
- defusedxml now fails early when pyexpat stdlib module is not
available or broken.
- defusedxml.ElementTree.__all__ now lists ParseError as public
attribute.
- The defusedxml.ElementTree and defusedxml.cElementTree modules had a
typo and used XMLParse instead of XMLParser as an alias for
DefusedXMLParser. Both the old and fixed name are now available.
Release date: 07-Feb-2017
- No changes
Release date: 28-Jan-2017
- Add compatibility with Python 3.6
- Drop support for Python 2.6, 3.1, 3.2, 3.3
- Fix lxml tests (XMLSyntaxError: Detected an entity reference loop)
Release date: 28-Mar-2013
- Add more demo exploits, e.g. python_external.py and Xalan XSLT
demos.
- Improved documentation.
Release date: 25-Feb-2013
- As per http://seclists.org/oss-sec/2013/q1/340 please REJECT
CVE-2013-0278, CVE-2013-0279 and CVE-2013-0280 and use
CVE-2013-1664, CVE-2013-1665 for OpenStack/etc.
- Add missing parser_list argument to sax.make_parser(). The argument
is ignored, though. (thanks to Florian Apolloner)
- Add demo exploit for external entity attack on Python’s SAX parser,
XML-RPC and WebDAV.
Release date: 19-Feb-2013
- Improve documentation
Release date: 15-Feb-2013
- Rename ExternalEntitiesForbidden to ExternalReferenceForbidden
- Rename defusedxml.lxml.check_dtd() to check_docinfo()
- Unify argument names in callbacks
- Add arguments and formatted representation to exceptions
- Add forbid_external argument to all functions and classes
- More tests
- LOTS of documentation
- Add example code for other languages (Ruby, Perl, PHP) and parsers
(Genshi)
- Add protection against XML and gzip attacks to xmlrpclib
Release date: 08-Feb-2013
- Initial and internal release for PSRT review