OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education


A Python XML Handling Gotcha – Namespaces


Just a quick note to self from a matter arising at the #IIP11 hackday earlier today: parsing XML from the PatientOpinion API using the xml.etree.ElementTree library in Python. The issue, as Dan Hagon (aka @axiomsofchoice) discovered, and which I couldn’t help out with given my appalling lack of Python skills, was that the namespace declared in the XML results file needed handling explicitly. (I wonder if this is also why Yahoo Pipes choked on the XML?)

Anyway, here’s an example of the XML returned from Patient Opinion:

<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Opinion>
    <Author>*****</Author>
    <Body>My mother who is ...
      ...
    </Body>
    <PostingID>27290</PostingID>
    <dtSubmitted>2010-01-05T12:57:19.667</dtSubmitted>
    <syndicOriginalID/><syndicSourceID>po</syndicSourceID>
    <HealthServices>
      <HealthService>
        <NACS>RAJ01_430</NACS>
        <Name>Geriatric medicine</Name>
        <OrganisationNACS>RAJ</OrganisationNACS>
        <Postcode>SS0 0RY</Postcode>
        <SiteNACS>RAJ01</SiteNACS>
        <Town/>
        <Type>service</Type>
      </HealthService>
    </HealthServices>
    <Period>Today</Period>
    <PostingAs>a relative</PostingAs>
    <Responses/>
    <Tags>
      <Tag>
        <TagGroup>Condition</TagGroup>
        <TagName>confused</TagName>
      </Tag>
    </Tags>
    <Title>My darling dad</Title>
    <Type>Story</Type>
  </Opinion>
</Opinions>
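
To see the gotcha in isolation, here’s a minimal sketch (written for Python 3, and using a cut-down copy of the XML above rather than a live API call) showing what ElementTree does with the default namespace: every tag name gets the namespace URI folded into it in curly braces, so a search for the bare tag name finds nothing.

```python
import xml.etree.ElementTree as ET

# Cut-down sample modelled on the response shown above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <PostingID>27290</PostingID>
    <Title>My darling dad</Title>
  </Opinion>
</Opinions>"""

root = ET.fromstring(SAMPLE)

# The tag is stored as "{namespace-uri}localname", not just "Opinions"
print(root.tag)

# Searching on the bare name silently returns None...
bare = root.find("Opinion")
print(bare)

# ...whereas the namespace-qualified name matches
qualified = root.find("{http://www.patientopinion.org.uk/api/rest/v1}Opinion")
print(qualified is not None)
```

The nasty part is that the failed lookup doesn’t raise an error; you only find out when you try to read `.text` from `None`.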

And here’s a snippet for how to handle it, as gleaned from the ever helpful Stack Overflow…

import urllib.request
from xml.etree.ElementTree import ElementTree

# Search the PatientOpinion API for postings tagged "dirty" (API key elided)
req = urllib.request.Request(url='http://www.patientopinion.org.uk/api/rest.svc/v1/postings/search?tag=dirty&take=20&apikey=******')
f = urllib.request.urlopen(req)

# Parse the response and get the root <Opinions> element
tree = ElementTree()
tree.parse(f)
doc = tree.getroot()

# Every step in the path has to be qualified with the namespace URI in {braces}
# http://stackoverflow.com/questions/1319385/need-help-using-xpath-in-elementtree
namespace = "{http://www.patientopinion.org.uk/api/rest/v1}"
t = doc.find("{0}Opinion/{0}HealthServices/{0}HealthService/{0}Postcode".format(namespace))
print(t.text)
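
A slightly tidier alternative to formatting the URI into every step of the path is to pass ElementTree a prefix-to-URI mapping. The sketch below is for Python 3 and works on a cut-down copy of the sample XML; the prefix `po` is an arbitrary name picked here for illustration and doesn’t have to appear in the document itself.

```python
import xml.etree.ElementTree as ET

# Cut-down sample modelled on the response shown above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <HealthServices>
      <HealthService>
        <Name>Geriatric medicine</Name>
        <Postcode>SS0 0RY</Postcode>
      </HealthService>
    </HealthServices>
  </Opinion>
</Opinions>"""

# "po" is an arbitrary prefix chosen for this example
NS = {"po": "http://www.patientopinion.org.uk/api/rest/v1"}

root = ET.fromstring(SAMPLE)

# The second argument tells find() how to expand each "po:" prefix
postcode = root.find(
    "po:Opinion/po:HealthServices/po:HealthService/po:Postcode", NS
)
print(postcode.text)
```

Under the hood this does the same thing as the `{uri}tag` form; it just keeps the path readable.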

It also seems as if the full path from the root is required?
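
As far as I can tell, the full path isn’t strictly required: ElementTree’s limited XPath support includes a `.//` prefix that searches all descendants, though the tag itself still needs the namespace. Here’s a sketch (Python 3, cut-down sample data again) that also shows the catch, which is that a descendant search can match more elements than you meant it to.

```python
import xml.etree.ElementTree as ET

# Cut-down sample modelled on the response shown above; note that
# <Type> appears both inside <HealthService> and directly under <Opinion>
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <HealthServices>
      <HealthService>
        <Postcode>SS0 0RY</Postcode>
        <Type>service</Type>
      </HealthService>
    </HealthServices>
    <Type>Story</Type>
  </Opinion>
</Opinions>"""

namespace = "{http://www.patientopinion.org.uk/api/rest/v1}"
root = ET.fromstring(SAMPLE)

# ".//" finds the first matching descendant at any depth
first = root.find(".//{0}Postcode".format(namespace))
print(first.text)

# The same search on an ambiguous tag returns every match, in document order
types = [el.text for el in root.findall(".//{0}Type".format(namespace))]
print(types)
```

So the short form is handy for tags that only occur in one place, and the full path is the safer bet for tags like Type that turn up at more than one level.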

PS Are there any Python libraries out there that would have been able to handle the namespaced XML automagically?
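
Not a library as such, but one workaround that stays within the standard library is to strip the namespace from every tag after parsing, so that plain tag names work in paths. This is only a sketch: `strip_namespaces` is a hypothetical helper written for this post, not part of ElementTree, it assumes Python 3, it leaves attribute names alone, and it throws away information you would need if a document mixed several namespaces.

```python
import xml.etree.ElementTree as ET

# Cut-down sample modelled on the response shown above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1">
  <Opinion>
    <HealthServices>
      <HealthService>
        <Postcode>SS0 0RY</Postcode>
      </HealthService>
    </HealthServices>
  </Opinion>
</Opinions>"""


def strip_namespaces(root):
    """Hypothetical helper: rewrite '{uri}name' tags to 'name' in place."""
    for el in root.iter():
        # Guard against non-string tags (e.g. comments kept by some parsers)
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(SAMPLE))

# With the namespaces gone, the bare path works
postcode = root.find("Opinion/HealthServices/HealthService/Postcode").text
print(postcode)
```

On Python 3.8 or later, ElementTree paths also accept a `{*}` wildcard in place of the namespace (for example `.//{*}Postcode`), which may be enough on its own for quick hacks like this one.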

Written by Tony Hirst

March 28, 2011 at 5:02 pm

Posted in Anything you want

