A Python XML Handling Gotcha – Namespaces

Just a quick note to self from a matter arising at the #IIP11 hackday earlier today – parsing XML from the PatientOpinion API using xml.etree.ElementTree library in Python. The issue – as Dan Hagon (aka @axiomsofchoice) discovered, and as I couldn’t help on out given my appalling lack of skills in using Python, was that the namespace the XML results file used needed handling explicitly. (I wonder if this is also why Yahoo Pipes choked on the XML?).

Anyway, here’s an example of the XML returned from Patient Choices:

<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Opinion>
    <Author>*****</Author>
    <Body>My mother who is ...
      ...
    </Body>
    <PostingID>27290</PostingID>
    <dtSubmitted>2010-01-05T12:57:19.667</dtSubmitted>
    <syndicOriginalID/><syndicSourceID>po</syndicSourceID>
    <HealthServices>
      <HealthService>
        <NACS>RAJ01_430</NACS>
        <Name>Geriatric medicine</Name>
        <OrganisationNACS>RAJ</OrganisationNACS>
        <Postcode>SS0 0RY</Postcode>
        <SiteNACS>RAJ01</SiteNACS>
        <Town/>
        <Type>service</Type>
      </HealthService>
    </HealthServices>
    <Period>Today</Period>
    <PostingAs>a relative</PostingAs>
    <Responses/>
    <Tags>
      <Tag>
        <TagGroup>Condition</TagGroup>
        <TagName>confused</TagName>
      </Tag>
    </Tags>
    <Title>My darling dad</Title>
    <Type>Story</Type>
  </Opinion>
</Opinions>

And here’s a snippet for how to handle it, as gleaned from the ever helpful Stack Overflow…

import urllib2
from xml.etree.ElementTree import *

req = urllib2.Request(url='http://www.patientopinion.org.uk/api/rest.svc/v1/postings/search?tag=dirty&take=20&apikey=******')
f = urllib2.urlopen(req)

tree = ElementTree()
tree.parse(f)
doc = tree.getroot()

#http://stackoverflow.com/questions/1319385/need-help-using-xpath-in-elementtree
namespace = "{http://www.patientopinion.org.uk/api/rest/v1}"
t= doc.find("{0}Opinion/{0}HealthServices/{0}HealthService/{0}Postcode".format(namespace))
print t.text

It also seems as if the full path from the root is required?

PS are there any Python libraries out there that would have been able to handle the namespaced XML automagically….?

2 comments

  1. mnkassier

    If you are struggling with xml namespaces, there is a great tutorial on xpath namespaces at xml reports. It walks you through it in very simple steps.
    xml reports

  2. Pingback: Processing XML in Python with ElementTree « ly2xxx