Sketching Sponsor Partners Running UK Clinical Trials

Using data from the registry (search for UK clinical trials), I downloaded, as XML, all the records for trials that ran at least in part in the UK, then mapped the links between each trial’s lead sponsor and its collaborators. Here’s a quick sketch of the result:

ukClinicalTrialPartners (PDF)

The XML data schema defines the corresponding fields as follows:

<!-- === Sponsors ==================================================== -->

  <xs:complexType name="sponsors_struct">
    <xs:sequence>
      <xs:element name="lead_sponsor" type="sponsor_struct"/>
      <xs:element name="collaborator" type="sponsor_struct" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
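
To make the schema concrete, here’s a minimal sketch that pulls the lead sponsor and collaborators out of a single record. The sample record is made up for illustration, and I’m assuming the sponsors sit inside a sponsors element (under a clinical_study root) with each organisation’s name in an agency child, which is what the XPath in the script below suggests. It uses the standard library’s ElementTree rather than lxml so it runs without extra installs.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration. <lead_sponsor> and <collaborator> come from
# the schema fragment; <clinical_study>, <sponsors> and <agency> are assumed.
SAMPLE_RECORD = """
<clinical_study>
  <sponsors>
    <lead_sponsor><agency>Example University</agency></lead_sponsor>
    <collaborator><agency>Example Pharma Ltd</agency></collaborator>
    <collaborator><agency>Example NHS Trust</agency></collaborator>
  </sponsors>
</clinical_study>
"""

def sponsors_of(xml_text):
    """Return (lead sponsor name, list of collaborator names) for one record."""
    root = ET.fromstring(xml_text)
    sponsors = root.find('.//sponsors')
    if sponsors is None:
        return '', []
    lead = sponsors.findtext('./lead_sponsor/agency', default='').strip()
    # minOccurs="0" means this list may legitimately be empty
    collaborators = [
        (el.text or '').strip()
        for el in sponsors.findall('./collaborator/agency')
    ]
    return lead, collaborators

lead, collaborators = sponsors_of(SAMPLE_RECORD)
print(lead, collaborators)
```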

Here’s the Python script I used to extract the data and build a graph representation of it (requires networkx and lxml). I then exported the graph as a GEXF file that can be loaded into Gephi and used to generate the sketch shown above.

import os
from lxml import etree
import networkx as nx

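def flatten(el):
	# Return all the text inside an element, including text nested in child elements
	if el is not None:
		result = [ (el.text or "") ]
		for sel in el:
			result.append(flatten(sel))
			result.append(sel.tail or "")
		return "".join(result)
	return ''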

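def graphOut(DG):
	# Write the graph out as a GEXF file that Gephi can open
	# (uses networkx's write_gexf helper rather than driving a GEXF writer by hand)
	nx.write_gexf(DG, 'workfile.gexf')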

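def sponsorGrapher(DG,xmlRoot,sponsorList):
	# Find the sponsors block for this trial (element name assumed to be <sponsors>)
	sponsors_xml = xmlRoot.find('.//sponsors')
	if sponsors_xml is None:
		return DG, sponsorList
	lead = flatten(sponsors_xml.find('./lead_sponsor/agency'))
	if lead == '':
		# No lead sponsor means there is nothing to draw edges from
		return DG, sponsorList
	# A sponsor's position in sponsorList doubles as its node ID
	if lead not in sponsorList:
		sponsorList.append(lead)
		DG.add_node( sponsorList.index( lead ), label=lead, name=lead, Label=lead )
	for collab in sponsors_xml.findall('./collaborator/agency'):
		collabname = flatten(collab)
		if collabname !='':
			if collabname not in sponsorList:
				sponsorList.append(collabname)
				DG.add_node( sponsorList.index( collabname ), label=collabname, name=collabname, Label=collabname )
			# Edges run from the lead sponsor to each collaborator
			DG.add_edge( sponsorList.index(lead), sponsorList.index(collabname) )
	return DG, sponsorList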

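def parsePage(path,fn,sponsorGraph,sponsorList):
	# Parse one trial record and add its sponsors to the graph
	xmldata = etree.parse( os.path.join(path,fn) )
	xmlRoot = xmldata.getroot()
	sponsorGraph,sponsorList = sponsorGrapher(sponsorGraph,xmlRoot,sponsorList)
	return sponsorGraph,sponsorList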

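# Placeholder path: point this at the directory holding the downloaded XML records
XML_DATA_DIR = 'trial_xml'
listing = os.listdir(XML_DATA_DIR)

sponsorDG = nx.DiGraph()
sponsorList = []
for page in listing:
	if os.path.splitext( page )[1] =='.xml':
		sponsorDG, sponsorList = parsePage(XML_DATA_DIR,page, sponsorDG, sponsorList)

graphOut(sponsorDG)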


Once the file is loaded into Gephi, you can hover over nodes to see which organisations partnered with which others.

One thing the graph doesn’t show directly is the links between co-collaborators – edges simply go from the lead partner to each collaborator. It would also be possible to generate a graph that represents pairwise links between every sponsor of a particular trial.
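
As a rough sketch of that pairwise idea: given the names of all the sponsors on one trial (lead plus collaborators), itertools.combinations yields every unordered pair. The function name and the sample sponsor names are made up for illustration.

```python
from itertools import combinations

def pairwise_links(sponsor_names):
    """Return every unordered pair of distinct sponsors named on one trial."""
    # De-duplicate and drop empty names so no sponsor is paired with itself
    unique = sorted(set(name for name in sponsor_names if name))
    return list(combinations(unique, 2))

# Hypothetical trial: one lead sponsor followed by two collaborators
trial_sponsors = ['Example University', 'Example Pharma Ltd', 'Example NHS Trust']
for a, b in pairwise_links(trial_sponsors):
    print(a, '<->', b)
```

The pairs could then be added to an undirected networkx Graph with add_edges_from, since direction no longer means "lead to collaborator" in this view.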

The XML data download also includes information about the locations of trials (sometimes at the city level, sometimes giving postcode level data). So the next thing I may look at is a map to see where sponsors tend to run trials in the UK, and maybe even see whether different sponsors tend to favour different trial sites…
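
A first step towards that might be a simple tally of which lead sponsors run trials in which cities. This is only a sketch: the location/facility/address/city path is my assumption about how sites are recorded and should be checked against the schema, and the sample records are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def sponsor_city_counts(records):
    """Count (lead sponsor, city) pairs across an iterable of record XML strings."""
    counts = Counter()
    for xml_text in records:
        root = ET.fromstring(xml_text)
        lead = root.findtext('.//sponsors/lead_sponsor/agency', default='').strip()
        # Assumed path to each trial site's city; check it against the real schema
        cities = {
            (el.text or '').strip()
            for el in root.findall('.//location/facility/address/city')
        }
        # Each trial counts at most once per city, however many sites it lists there
        for city in cities:
            if lead and city:
                counts[(lead, city)] += 1
    return counts

# Two made-up records sharing a lead sponsor
RECORDS = [
    "<clinical_study><sponsors><lead_sponsor><agency>Example University</agency>"
    "</lead_sponsor></sponsors>"
    "<location><facility><address><city>Leeds</city></address></facility></location>"
    "<location><facility><address><city>Bristol</city></address></facility></location>"
    "</clinical_study>",
    "<clinical_study><sponsors><lead_sponsor><agency>Example University</agency>"
    "</lead_sponsor></sponsors>"
    "<location><facility><address><city>Leeds</city></address></facility></location>"
    "</clinical_study>",
]

counts = sponsor_city_counts(RECORDS)
for (sponsor, city), n in counts.most_common():
    print(sponsor, city, n)
```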

Further down the line, I wonder whether any linkage can be made across to things like GP practice level prescribing behaviour, or grant awards from the MRC?

PS these may be handy too – World Health Organisation Clinical Trials Registry portal, EU Clinical Trials Register

PPS looks like we can generate a link to the download file easily enough. Original URL is:
Download URL is:
So now I wonder: can Scraperwiki accept a zip file, unzip it, then parse all the resulting files? Answers, with code snippets, via the comments, please:-) DONE – example here: Scraperwiki: test
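
For reference, the unzip-then-parse step can be sketched in plain Python with the standard library’s zipfile module. This assumes nothing about Scraperwiki’s own helpers: the function name is made up, and the in-memory archive stands in for the bytes of the real download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def parse_zipped_xml(zip_bytes):
    """Yield (filename, parsed root element) for each .xml member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith('.xml'):
                # Members are read straight from the archive; nothing is written to disk
                with archive.open(name) as member:
                    yield name, ET.parse(member).getroot()

# Build a tiny in-memory archive to stand in for the real download
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('trial1.xml', '<clinical_study><id>1</id></clinical_study>')
    archive.writestr('trial2.xml', '<clinical_study><id>2</id></clinical_study>')
    archive.writestr('readme.txt', 'not a trial record')

for name, root in parse_zipped_xml(buffer.getvalue()):
    print(name, root.tag)
```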
