OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

A Python XML Handling Gotcha – Namespaces

with 2 comments

Just a quick note to self on a matter arising at the #IIP11 hackday earlier today: parsing XML from the PatientOpinion API using the xml.etree.ElementTree library in Python. The issue, as Dan Hagon (aka @axiomsofchoice) discovered, and which I couldn’t help out with given my appalling lack of Python skills, was that the namespace used by the XML results file needed handling explicitly. (I wonder if this is also why Yahoo Pipes choked on the XML?)

Anyway, here’s an example of the XML returned from PatientOpinion:

<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Opinion>
    <Author>*****</Author>
    <Body>My mother who is ...
      ...
    </Body>
    <PostingID>27290</PostingID>
    <dtSubmitted>2010-01-05T12:57:19.667</dtSubmitted>
    <syndicOriginalID/><syndicSourceID>po</syndicSourceID>
    <HealthServices>
      <HealthService>
        <NACS>RAJ01_430</NACS>
        <Name>Geriatric medicine</Name>
        <OrganisationNACS>RAJ</OrganisationNACS>
        <Postcode>SS0 0RY</Postcode>
        <SiteNACS>RAJ01</SiteNACS>
        <Town/>
        <Type>service</Type>
      </HealthService>
    </HealthServices>
    <Period>Today</Period>
    <PostingAs>a relative</PostingAs>
    <Responses/>
    <Tags>
      <Tag>
        <TagGroup>Condition</TagGroup>
        <TagName>confused</TagName>
      </Tag>
    </Tags>
    <Title>My darling dad</Title>
    <Type>Story</Type>
  </Opinion>
</Opinions>
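To see the gotcha in isolation, here is a minimal sketch that parses a cut-down copy of the response above from a string rather than from the live API. The SAMPLE text and the variable names are made up for illustration; the point is only that ElementTree folds the default namespace into every tag name, so a bare tag name finds nothing.

```python
import xml.etree.ElementTree as ET

# Cut-down, hand-written sample modelled on the response above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <PostingID>27290</PostingID>
    <Title>My darling dad</Title>
  </Opinion>
</Opinions>"""

root = ET.fromstring(SAMPLE)

# The default namespace becomes part of the tag name, in "{uri}tag" form
root_tag = root.tag

# A bare tag name does not match a namespaced element, so find() returns None
bare = root.find("Opinion")

# The fully qualified name does match
qualified = root.find("{http://www.patientopinion.org.uk/api/rest/v1}Opinion")

print(root_tag)
print(bare)
print(qualified is not None)
```

That silent None is the thing to watch for: nothing raises an error until you try to read .text from it.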

And here’s a snippet for how to handle it, as gleaned from the ever helpful Stack Overflow…

# Python 3 version: urllib2 became urllib.request, and print is a function
from urllib.request import urlopen
import xml.etree.ElementTree as ET

url = 'http://www.patientopinion.org.uk/api/rest.svc/v1/postings/search?tag=dirty&take=20&apikey=******'

# Fetch the search results and parse them into an element tree
with urlopen(url) as f:
    tree = ET.parse(f)
doc = tree.getroot()

# http://stackoverflow.com/questions/1319385/need-help-using-xpath-in-elementtree
# ElementTree stores each tag as "{namespace-uri}tag", so every step in the
# path needs the namespace prefix
namespace = "{http://www.patientopinion.org.uk/api/rest/v1}"
t = doc.find("{0}Opinion/{0}HealthServices/{0}HealthService/{0}Postcode".format(namespace))
print(t.text)

It also seems as if the full path from the root is required?
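On the full path question: as far as I can tell, a path is evaluated relative to the element find() is called on, and a plain tag name only matches direct children, which is why a short path from the root fails. A hedged sketch of two ways round that, again using a hand-written SAMPLE string instead of the live API: the ".//" prefix searches all descendants, and iter() walks the whole tree for a given tag.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.patientopinion.org.uk/api/rest/v1}"

# Cut-down, hand-written sample modelled on the response above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <HealthServices>
      <HealthService>
        <Name>Geriatric medicine</Name>
        <Postcode>SS0 0RY</Postcode>
      </HealthService>
    </HealthServices>
  </Opinion>
</Opinions>"""

doc = ET.fromstring(SAMPLE)

# A namespaced but short path still fails: Postcode is not a direct child of the root
direct = doc.find(NS + "Postcode")

# ".//" matches the first Postcode anywhere below the root
postcode = doc.find(".//{0}Postcode".format(NS))

# iter() yields every matching element in document order
names = [el.text for el in doc.iter(NS + "Name")]

print(direct)
print(postcode.text)
print(names)
```

The namespace prefix is still needed on each tag; only the intermediate steps of the path can be dropped.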

PS are there any Python libraries out there that would have been able to handle the namespaced XML automagically….?
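Not quite automagic, but ElementTree itself can take some of the pain away. The sketch below assumes a reasonably recent Python 3: find() and findall() accept a prefix-to-URI mapping, and from Python 3.8 the "{*}" wildcard matches a tag in any namespace. The "po" prefix is an arbitrary name chosen here, not something the API defines, and SAMPLE is again a hand-written cut-down copy of the response.

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix mapped to the namespace URI from the response
ns = {"po": "http://www.patientopinion.org.uk/api/rest/v1"}

# Cut-down, hand-written sample modelled on the response above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <Tags>
      <Tag>
        <TagGroup>Condition</TagGroup>
        <TagName>confused</TagName>
      </Tag>
    </Tags>
    <Title>My darling dad</Title>
  </Opinion>
</Opinions>"""

doc = ET.fromstring(SAMPLE)

# Prefixed paths are expanded using the mapping passed as the second argument
title = doc.find("po:Opinion/po:Title", ns)
tag_names = [el.text for el in doc.findall(".//po:Tag/po:TagName", ns)]

# Python 3.8 and later: "{*}" matches the tag whatever its namespace
wildcard_title = doc.find(".//{*}Title")

print(title.text)
print(tag_names)
print(wildcard_title.text)
```

The mapping keeps the paths readable; the wildcard is handy for quick exploration, but it will also match same-named tags from other namespaces.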

Written by Tony Hirst

March 28, 2011 at 5:02 pm

Posted in Anything you want

2 Responses

  1. If you are struggling with xml namespaces, there is a great tutorial on xpath namespaces at xml reports. It walks you through it in very simple steps.
    mnkassier

    June 1, 2011 at 1:12 am

