python-xmlsec currently relies on passing raw xmlNodePtr objects between lxml (which builds on libxml2) and xmlsec1 (which also uses libxml2). This creates a fragile situation where different versions of libxml2 may be loaded into the same process, leading to:
- Segfaults or memory corruption due to incompatible struct layouts
- Invalid memory free errors (e.g., double-free or mismatched allocators)
- Signature verification failures caused by inconsistent parser state
- Undefined behavior from mismatched libxml2 global configuration
This occurs because:
- lxml bundles its own libxml2 and libxslt (especially in binary wheels) to ease installation for users on Windows, macOS, and some Linux platforms.
- python-xmlsec binds to xmlsec1, which in turn links to the system's libxml2.
- Pointers like xmlNodePtr created by lxml are then passed to python-xmlsec functions like tree.find_node() or SignatureContext.sign().
If the libxml2 versions are not ABI-compatible, this can easily lead to crashes, unpredictable behavior, or memory corruption.
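To see whether a given environment is exposed to this, the libxml2 versions on each side can be compared. The following is a minimal sketch, not a definitive check. lxml's `etree.LIBXML_VERSION` and `etree.LIBXML_COMPILED_VERSION` are real attributes. The `xmlsec.get_libxml_version()` lookup is an assumption: it may be missing in older python-xmlsec releases, and it is assumed here to return a version tuple like lxml's, so it is read defensively. The helper names `versions_differ` and `report` are made up for illustration.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def versions_differ(lxml_version: Version, other_version: Optional[Version]) -> bool:
    """Return True when both versions are known and are not identical."""
    return other_version is not None and tuple(lxml_version) != tuple(other_version)


def report() -> str:
    """Describe the libxml2 versions visible to lxml and, if possible, to xmlsec."""
    try:
        from lxml import etree
    except ImportError:
        return "lxml is not installed"

    runtime = etree.LIBXML_VERSION             # libxml2 that lxml is running against
    compiled = etree.LIBXML_COMPILED_VERSION   # libxml2 that lxml was built against
    lines = [f"lxml libxml2: runtime={runtime} compiled={compiled}"]

    xmlsec_version = None
    try:
        import xmlsec
        # Assumption: this helper may not exist in every python-xmlsec release.
        getter = getattr(xmlsec, "get_libxml_version", None)
        if callable(getter):
            xmlsec_version = tuple(getter())
    except Exception as exc:  # importing xmlsec can itself fail on a mismatch
        lines.append(f"xmlsec could not be inspected: {exc!r}")

    if xmlsec_version is not None:
        lines.append(f"xmlsec libxml2: {xmlsec_version}")
    if versions_differ(runtime, xmlsec_version):
        lines.append("libxml2 versions differ between lxml and xmlsec")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Identical version tuples do not prove that both libraries share one copy of libxml2, so this only flags the obvious mismatch case.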
Proposed Solution: Decoupling via Canonicalized XML
Instead of passing xmlNodePtr from lxml to python-xmlsec, we should support passing serialized XML (as bytes), ideally using Canonical XML (C14N) where appropriate. This isolates the XML parsing and memory management between the two libraries.
Example Usage
from lxml import etree
import xmlsec
doc = etree.fromstring("<Root><Signature/></Root>")
c14n_bytes = etree.tostring(doc, method="c14n", exclusive=True)
# Proposed new API:
signed_bytes = xmlsec.sign_serialized(c14n_bytes, key_file="key.pem")
# Parse back with lxml if needed
signed_doc = etree.fromstring(signed_bytes)
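
The byte-level boundary itself can be illustrated without either library. The sketch below uses only the standard library (`xml.etree.ElementTree` instead of lxml) and a placeholder signer instead of xmlsec, so it performs no real signing. `sign_via_bytes` and `fake_signer` are hypothetical names, not part of python-xmlsec. The point is only that each side parses its own copy of the document and nothing but bytes crosses between them.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def sign_via_bytes(xml_bytes: bytes, signer: Callable[[bytes], bytes]) -> bytes:
    """Pass serialized XML to a signer and return serialized XML.

    Only bytes cross the boundary, so no parser-owned node objects are shared.
    """
    if not isinstance(xml_bytes, (bytes, bytearray)):
        raise TypeError("expected serialized XML as bytes")
    return bytes(signer(bytes(xml_bytes)))


def fake_signer(data: bytes) -> bytes:
    """Placeholder for the signing side: parse independently, fill in the
    Signature element with dummy text, and serialize again."""
    root = ET.fromstring(data)
    signature = root.find("Signature")
    if signature is None:
        raise ValueError("no Signature element found")
    signature.text = "PLACEHOLDER"  # a real implementation would compute a signature
    return ET.tostring(root)


# Caller side: canonicalize (C14N 2.0 in the standard library), then hand over bytes.
canonical = ET.canonicalize("<Root><Signature/></Root>").encode("utf-8")
signed = sign_via_bytes(canonical, fake_signer)

# Parse the result back with the caller's own parser.
signed_root = ET.fromstring(signed)
```

The cost of this design is an extra serialize and parse step on each call, traded for independence from the other library's libxml2 build.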