Using OAI-PMH as a Single Record Level Query Interface to Citeseer

Picking up on a query I raised in Citation Positioning, here’s a quick summary of an online discussion featuring variously @edsu, @epoz, @ostephens and myself (I’m the one who knows absolutely nothing…!)

The context is this: can I use the OAI-PMH interface on Citeseer to grab record-level, machine-readable results? Note that I don't really want to harvest all the Citeseer data, pop it into a database of my own, and then run queries on that. I just want a quick and dirty API so I can make a handful of calls for particular queries as part of a proof-of-concept hack;-)

Here’s what the Citeseer HTML page looks like:

(Screenshot: a Citeseer results page)

It has a URL of the form:

The tabbed results pages have their own URLs:

– Active Bibliography, of the form
– Co-Citation, of the form
– Clustered Documents, of the form

Here’s what I’m guessing:
– the ‘front page’ results are links to papers that reference/cite the target article, ordered by the number of times they have themselves been cited; this is a subset of the total set of papers that cite the target article;
– the Active Bibliography is a subset of the articles referenced from/cited by the target article that have themselves been cited elsewhere recently (?! I’m guessing here; the Citeseer site doesn’t seem to provide an explanation anywhere);
– the co-citations are… I have no idea. Other papers that have been cited by papers that cite the target paper?
– Clustered Documents: these seem to be other Citeseer records relating to the same paper. Do they all have the same citation info? I have no idea.
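For what it's worth, the usual bibliometric meaning of co-citation matches the guess above: two papers are co-cited when some third paper cites both of them. Here is a minimal sketch of that definition, run over a made-up toy citation graph. The `CITES` data and the `co_cited_with` helper are hypothetical, and nothing here confirms that Citeseer's Co-Citation tab is computed this way.

```python
from collections import Counter

# Hypothetical toy citation graph: each key cites the papers in its list.
# These identifiers are invented for illustration; they are not Citeseer IDs.
CITES = {
    "A": ["target", "X", "Y"],
    "B": ["target", "X"],
    "C": ["Y", "Z"],
}


def co_cited_with(target: str, cites: dict[str, list[str]]) -> Counter:
    """Count how often each paper is cited alongside `target`.

    Two papers are co-cited when some third paper cites both of them;
    the count is the number of such citing papers.
    """
    counts: Counter = Counter()
    for citing, refs in cites.items():
        if target in refs:
            # Every other reference in this paper is co-cited with the target.
            for ref in set(refs):
                if ref != target:
                    counts[ref] += 1
    return counts


print(co_cited_with("target", CITES).most_common())  # → [('X', 2), ('Y', 1)]
```

"Z" never shows up because the only paper citing it ("C") doesn't cite the target.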

As far as the OAI interface goes, it seems we can grab an individual record using a query of the form:
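As a rough sketch of what such a request involves: an OAI-PMH GetRecord request is just an HTTP GET carrying three arguments, `verb`, `identifier` and `metadataPrefix`. The `BASE_URL` below is a placeholder, not the real Citeseer endpoint, and the identifier suffix `EXAMPLE` is made up.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real CiteSeerX OAI endpoint is not given here.
BASE_URL = "https://example.org/oai2"


def get_record_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    # verb, identifier and metadataPrefix are the arguments OAI-PMH 2.0
    # defines for GetRecord; urlencode percent-escapes the colons.
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# "oai:CiteSeerX.psu:" is the prefix seen in the response; "EXAMPLE" is invented.
print(get_record_url("oai:CiteSeerX.psu:EXAMPLE"))
# → https://example.org/oai2?verb=GetRecord&identifier=oai%3ACiteSeerX.psu%3AEXAMPLE&metadataPrefix=oai_dc
```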

which returns a result of the form:

<OAI-PMH xmlns="" xmlns:xsi="" xsi:schemaLocation="">
  <request identifier="oai:CiteSeerX.psu:" metadataPrefix="oai_dc" verb="GetRecord"></request>
  <GetRecord>
    <record>
      <metadata>
        <oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xsi:schemaLocation="">
          <dc:title>The structure and function of complex networks</dc:title>
          <dc:creator>M. E. J. Newman</dc:creator>
          <dc:description>Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.</dc:description>
          <dc:contributor>The Pennsylvania State University CiteSeerX Archives</dc:contributor>
          <dc:source>…graphsurvey.pdf</dc:source>
          <dc:language>en</dc:language>
          <dc:relation></dc:relation>
          <dc:relation></dc:relation>
          <dc:relation></dc:relation>
          <!-- …many more dc:relation elements… -->
          <dc:rights> Metadata may be used without restrictions as long as the oai identifier remains attached to it. </dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
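To pull the interesting bits out of a response like this one, a few lines of Python's standard-library `xml.etree.ElementTree` should be enough. This is a sketch under assumptions: `SAMPLE` is a trimmed stand-in for a real response that uses the standard OAI-PMH and Dublin Core namespace URIs, the `related-1`/`related-2` relation values are invented placeholders, and `summarise_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard oai_dc and Dublin Core namespace URIs (assumed; not shown in the post).
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-trimmed stand-in for a GetRecord response; relation values are invented.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>The structure and function of complex networks</dc:title>
      <dc:creator>M. E. J. Newman</dc:creator>
      <dc:relation>related-1</dc:relation>
      <dc:relation>related-2</dc:relation>
      <dc:relation></dc:relation>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""


def summarise_record(xml_text: str) -> dict:
    """Pull title, creators and non-empty dc:relation values from one record."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        raise ValueError("no oai_dc:dc element found")
    return {
        "title": dc.findtext("dc:title", default="", namespaces=NS),
        "creators": [e.text for e in dc.findall("dc:creator", NS)],
        # Skip empty <dc:relation></dc:relation> elements.
        "relations": [e.text.strip() for e in dc.findall("dc:relation", NS)
                      if e.text and e.text.strip()],
    }


print(summarise_record(SAMPLE))
```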

I’m guessing the dc:relation elements refer to the papers listed on the ‘front page’ of the results for a given paper, that is, they are the most heavily cited papers that cite the target paper?

So, a few questions arise:

– what do the different results listings on the HTML pages actually refer to?
– what do the results in the OAI query above relate to?
– is it possible to get a list of all the papers cited/referenced by a target article? (Or failing that, is it possible to get hold of the Active Bibliography relations, which are presumably a subset of the complete set of bibliographic references contained within a paper?)
– is it possible to get a list of all the papers that cite/reference a particular target article?
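On the last question, one hedged observation: if each record did carry its paper's references, then "who cites X?" is just the inverse of "what does X cite?". The catch is that the inverted index only covers records you have actually fetched, so a complete cited-by list would mean harvesting far more than a handful of records. A sketch with hypothetical data follows. Whether `dc:relation` really holds a paper's references is exactly what's unclear, so the `REFERENCES` mapping and the `cited_by` helper are assumptions.

```python
from collections import defaultdict

# Hypothetical: a handful of fetched records, each mapped to the identifiers
# of the papers it references (assumed, not confirmed, to come from dc:relation).
REFERENCES = {
    "paper-1": ["target", "paper-3"],
    "paper-2": ["target"],
    "paper-3": ["paper-4"],
}


def cited_by(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a 'cites' mapping into a 'cited by' mapping."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for citing, refs in references.items():
        for ref in refs:
            index[ref].append(citing)
    return dict(index)


# Only papers among the fetched records can ever appear as citers.
print(cited_by(REFERENCES)["target"])  # → ['paper-1', 'paper-2']
```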

If you can answer any or all of the above questions, please feel free to post the answer(s) in a comment below…:-)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
