Extract XML Element
Tuesday 23 July 2002
I've been puzzling over an XML problem using .NET code. I want to extract an element from an XML document and create a new document with this element as the root element. The other part of the spec is that this must be done as efficiently as possible.
Take the following trivial example:
<?xml version="1.0" encoding="utf-8" ?>
<multistatus xmlns="DAV:">
  <response>
    <href>http://www.foo.bar/container/</href>
    <propstat>
      <prop xmlns:R="http://www.foo.bar/boxschema/">
        <R:bigbox/>
      </prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
</multistatus>
I want to be able to extract each element under the prop element(s), e.g. the <R:bigbox/> element, and end up with documents like this:
<?xml version="1.0" encoding="utf-8" ?>
<R:bigbox xmlns:R="http://www.foo.bar/boxschema/"/>
The efficiency requirement led me to try an XmlReader approach: traverse to the first child element of each prop element and use ReadOuterXml to extract the node. Unfortunately this does not handle the namespace properly because ReadOuterXml returns:
<R:bigbox/>
If there were some way of determining the active namespaces at the current reader position (or simply pushing them onto a stack as the document is traversed), there would still be the problem of working out which of those namespaces the extracted elements actually require. All feasible, but a lot more work than I had expected.
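One possible way to avoid that bookkeeping, sketched here rather than tested against the original setup, is to let an XmlWriter do the namespace work. The reader already resolves each element's prefix to a namespace URI, and XmlWriter.WriteNode copies the current element to a writer, which declares any namespace it has not yet seen in scope. The class and method names below are hypothetical, and the sketch assumes a framework version that provides XmlReader.Create and XmlWriter.Create, which are later than the 2002 release.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: streams through the input and returns each child
// element of a DAV: prop element as a standalone XML string.
static class StreamingPropExtractor
{
    const string DavNamespace = "DAV:";

    public static List<string> ExtractPropChildren(TextReader input)
    {
        var fragments = new List<string>();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(input))
        {
            int propDepth = -1;          // depth of the prop element we are inside, or -1
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (propDepth >= 0 && reader.Depth == propDepth + 1)
                    {
                        var sb = new StringBuilder();
                        using (XmlWriter writer = XmlWriter.Create(sb, settings))
                        {
                            // Copies the whole element; the writer emits the
                            // xmlns declarations the element names need.
                            writer.WriteNode(reader, false);
                        }
                        fragments.Add(sb.ToString());

                        // WriteNode has already moved the reader past the
                        // element, so do not call Read() again here.
                        more = !reader.EOF;
                        continue;
                    }

                    if (reader.LocalName == "prop"
                        && reader.NamespaceURI == DavNamespace
                        && !reader.IsEmptyElement)
                    {
                        propDepth = reader.Depth;
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement
                         && reader.Depth == propDepth)
                {
                    propDepth = -1;      // left the prop element
                }

                more = reader.Read();
            }
        }
        return fragments;
    }
}
```

Calling StreamingPropExtractor.ExtractPropChildren(new StringReader(xml)) on the example above should yield one string holding the bigbox element. One caveat: the writer only sees namespaces used by element and attribute names, so a prefix that appears solely inside text or an attribute value would still be lost.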
So for the time being I ended up using an XmlDocument approach: copy the node and insert it into a new instance of XmlDocument. This does handle the namespace problem, but it is likely to be much slower when large documents are involved, since the whole source document has to be loaded into memory first.
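A minimal sketch of what that XmlDocument approach might look like follows. It is not the code I actually used: the class and method names are hypothetical, the XPath expression is just one way of reaching the children of prop, and it assumes XmlDocument.ImportNode as the copying mechanism.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: loads the whole source document, then copies each
// child element of a DAV: prop element into its own new XmlDocument.
static class DomPropExtractor
{
    public static List<XmlDocument> ExtractPropChildren(string xml)
    {
        var source = new XmlDocument();
        source.LoadXml(xml);

        // XPath needs a prefix for the default DAV: namespace; "D" is an
        // arbitrary choice made for this sketch.
        var ns = new XmlNamespaceManager(source.NameTable);
        ns.AddNamespace("D", "DAV:");

        var results = new List<XmlDocument>();
        foreach (XmlNode child in source.SelectNodes("//D:prop/*", ns))
        {
            var target = new XmlDocument();
            // ImportNode makes a deep copy owned by the target document;
            // the copy keeps its namespace URI even though the xmlns:R
            // attribute lives on the parent prop element in the source.
            target.AppendChild(target.ImportNode(child, true));
            results.Add(target);
        }
        return results;
    }
}
```

Each returned document has the extracted element as its root, and saving it or reading its OuterXml writes out the namespace declaration along with the element.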