Visualising Related Entries in Wikipedia Using Gephi

Sometime last week, @mediaczar tipped me off to a neat recipe on the wonderfully named Drunks&Lampposts blog, Graphing the history of philosophy, that uses Gephi to map an influence network in the world of philosophy. The data is based on the extraction of the “influencedBy” relationship over philosophers referred to in Wikipedia using the machine readable, structured data view of Wikipedia that is DBpedia.

The recipe given hints at how to extract data from DBpedia, tidy it up and then import it into Gephi… but there is a quicker way: the Gephi Semantic Web Import plugin. (If it’s not already installed, you can install this plugin via the Tools -> Plugins menu, then look in the Available Plugins tab.)

To get DBpedia data into Gephi, we need to do three things:

– tell the importer where to find the data by giving it a URL (the “Driver” configuration setting);
– tell the importer what data we want to get back, by specifying what is essentially a database query (the “Request” configuration setting);
– tell Gephi how to create the network we want to visualise from the data returned from DBpedia (in the context of the “Request” configuration).
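
To make the first two ingredients a little more concrete, here is a minimal sketch in Python of how an endpoint URL and a query might be combined into a single web request. It is only an illustration, not what the plugin does internally: the endpoint address https://dbpedia.org/sparql and the helper name build_request_url are my assumptions, and the query is a throwaway placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: DBpedia's public SPARQL endpoint. The plugin's Driver setting may differ.
ENDPOINT = "https://dbpedia.org/sparql"

# A deliberately tiny placeholder request: the CONSTRUCT part describes the graph
# we want back, the WHERE part describes the data to match.
REQUEST = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"


def build_request_url(endpoint: str, query: str) -> str:
    """Combine the 'Driver' (endpoint URL) and the 'Request' (query) into a GET URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, REQUEST)

# Round trip: decoding the URL's query string should give back the original request.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
print(decoded == REQUEST)
```

Nothing is sent over the network here; the sketch just shows that the "Driver" and the "Request" are separate pieces that get joined at the last moment.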

Fortunately, we don’t have to work out how to do this from scratch – from the Semantic Web Import Configuration panel, configure the importer by setting the configuration to DBPediaMovies.

Hitting “Set Configuration” sets up the Driver (a Remote SOAP Endpoint, with the Endpoint URL pointing at DBpedia) and provides a dummy, sample query Request.

We need to do some work creating our own query now, but not too much – we can use this DBpediaMovies example and the query given on the Drunks&Lampposts blog as a starting point:

?p a
<> .
?p <> ?influenced.

This query essentially says: ‘give me all the pairs of people, (?p, ?influenced), where each person ?p is a philosopher, and each person ?influenced is influenced by ?p’.

We can replace the WHERE part of the query in the Semantic Web Importer with the WHERE part of this query, but what graph do we want to put together in the CONSTRUCT part of the Request?

The graph we are going to visualise will have nodes that are philosophers or the people who influenced them. The edges connecting the nodes will represent that one influenced the other, using a directed line (with an arrow) to show that A influenced B, for example.

The following construction should achieve this:

CONSTRUCT {
  ?p <> ?influenced .
} WHERE {
  ?p a <> .
  ?p <> ?influenced .
} LIMIT 10000

(The LIMIT argument limits the number of rows of data we’re going to get back. It’s often good practice to set this quite low when you’re trying out a new query!)
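
If you find yourself writing lots of these requests, it can help to see the two halves as separate lists of triple patterns that get glued together. The following Python sketch does just that. The function name construct_query is made up for illustration, and the two DBpedia identifiers are assumptions on my part that you should check against DBpedia before relying on them.

```python
# Assumed identifiers (not confirmed by the text above): check them on DBpedia first.
PHILOSOPHER_CLASS = "http://dbpedia.org/class/yago/Philosopher110423589"
INFLUENCED = "http://dbpedia.org/ontology/influenced"


def construct_query(construct_patterns, where_patterns, limit=100):
    """Assemble a SPARQL CONSTRUCT request from two lists of triple patterns."""
    construct_body = "\n".join(f"  {pattern} ." for pattern in construct_patterns)
    where_body = "\n".join(f"  {pattern} ." for pattern in where_patterns)
    return (
        f"CONSTRUCT {{\n{construct_body}\n}} "
        f"WHERE {{\n{where_body}\n}} LIMIT {int(limit)}"
    )


# The same edge pattern is matched in the WHERE part and emitted in the CONSTRUCT part.
edge = f"?p <{INFLUENCED}> ?influenced"

query = construct_query(
    construct_patterns=[edge],
    where_patterns=[f"?p a <{PHILOSOPHER_CLASS}>", edge],
    limit=100,  # keep this low while experimenting
)
print(query)
```

The printed text can be pasted into the Request box; raise the limit once the query returns what you expect.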

Hit Run and a graph should be imported:

If you click on the Graph panel (in the main Overview view of the Gephi tool), you should see the graph:

If we run the PageRank or Eigenvector Centrality statistic, size the nodes according to that value, and lay out the graph using a force directed or Fruchterman-Reingold layout algorithm, we get something like this:
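
For anyone wondering what the PageRank statistic is doing to the influence graph, here is a minimal power-iteration sketch in Python. It is a textbook-style version, not Gephi's implementation; the damping value, the iteration count and the three-edge toy graph are all illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared among all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / count + damping * dangling / count
            for node in nodes
        }
        # Every node passes its rank along its outgoing edges in equal shares.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Toy data: an edge (a, b) stands for "a influenced b".
edges = [("Socrates", "Plato"), ("Plato", "Aristotle"), ("Socrates", "Aristotle")]
ranks = pagerank(edges)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Note that rank flows along the direction of the edges, so with "a influenced b" edges it is the most influenced nodes that score highly; if the edges point from a person to their influences, the ranking favours the influencers instead.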

The nodes are labelled in a rather clumsy way (each label is the DBpedia resource URI for the node), but we can tidy this up. Going to one of the DBpedia pages, we find what else DBpedia knows about this person:

In particular, we see we can get hold of the name of the philosopher using the foaf:name property/relation. If you look back to the original DBpediaMovies example, we can start to pick it apart. It looks as if there is a set of gephi properties we can use to create our network, including a “label” property. Maybe this will help us label our nodes more clearly, using the actual name of a philosopher for example? You may also notice the declaration of a gephi “prefix”, which appears in various constructions (such as gephi:label). Hmmm.. Maybe gephi:label is to the prefix gephi: declaration as foaf:name is to something? If we do a web search for the phrase foaf:name prefix, we turn up several results that contain a prefix foaf: declaration, so maybe we need one of those to get the foaf:name out of DBpedia….?

But how do we get it out? We’ve already seen that we can get hold of a person who was influenced by a philosopher by asking for results where the influenced relation holds between ?p and ?influenced. So it follows we can get the name of a philosopher (?pname) by asking for the foaf:name in the WHERE part of the query:

?p foaf:name ?pname.

and then using this name as a label in the CONSTRUCTion:

?p gephi:label ?pname.

We can also do a similar exercise for the person who is influenced.

Looking through the DBpedia record, I notice that as well as an influenced relation, there is an influencedBy relation (I think this is the one that was actually used in the Drunks&Lampposts blog?). So let’s use that in this final version of the query:

prefix gephi:<>
prefix foaf: <>
CONSTRUCT {
  ?philosopher gephi:label ?philosopherName .
  ?influence gephi:label ?influenceName .
  ?philosopher <> ?influence .
} WHERE {
  ?philosopher a <> .
  ?philosopher <> ?influence .
  ?philosopher foaf:name ?philosopherName .
  ?influence foaf:name ?influenceName .
} LIMIT 10000
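
As a rough mental model of what happens to the triples that come back, the Python sketch below sorts them into node labels and edges. This is my guess at the general idea rather than the plugin's actual logic: the expansion of gephi:label to http://gephi.org/label is an assumption, and the example.org identifiers and the influencedBy predicate name are hypothetical stand-ins.

```python
GEPHI_LABEL = "http://gephi.org/label"  # assumed expansion of gephi:label


def triples_to_graph(triples):
    """Split (subject, predicate, object) triples into node labels and directed edges."""
    labels, edges = {}, []
    for subject, predicate, obj in triples:
        if predicate == GEPHI_LABEL:
            # A label triple names a node rather than linking two nodes.
            labels[subject] = obj
        else:
            # Any other triple becomes an edge; unlabelled nodes fall back to their id.
            labels.setdefault(subject, subject)
            labels.setdefault(obj, obj)
            edges.append((subject, obj))
    return labels, edges


# Hypothetical identifiers standing in for DBpedia resources.
triples = [
    ("http://example.org/Aristotle", "http://example.org/influencedBy", "http://example.org/Plato"),
    ("http://example.org/Aristotle", GEPHI_LABEL, "Aristotle"),
    ("http://example.org/Plato", GEPHI_LABEL, "Plato"),
]
labels, edges = triples_to_graph(triples)
print(labels)
print(edges)
```

The point is simply that the label triples tidy up node names, while every other triple in the CONSTRUCT part ends up as a line in the graph.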

If you’ve already run a query to load in a graph, running this query may load the new graph on top of the previous one, so it’s best to clear the workspace first. At the bottom right of the screen is a list of workspaces – click on the RDF Request Graph label to pop up a list of workspaces, and close the RDF Request Graph one by clicking on the x.

Now run the query into a newly launched, pristine workspace, and play with the graph to your heart’s content…:-) [I’ll maybe post more on this later – in the meantime, if you’re new to Gephi, here are some Gephi tutorials]

Here’s what I get sizing nodes and labels by PageRank, and laying out the graph by using a combination of Force Atlas2, Expansion and Label Adjust (to stop labels overlapping) layout tools:

Using the Ego Network filter, we can then focus on the immediate influence network (influencers and influenced) of an individual philosopher:
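
The idea behind an ego network is easy to sketch in Python: keep the chosen node, everything within a given number of hops of it, and the edges among the nodes that are kept. Gephi's own filter may differ in its details; the function name ego_network, the choice to ignore edge direction when finding neighbours, and the toy edge list are all assumptions for illustration.

```python
def ego_network(edges, ego, depth=1):
    """Return the edges among nodes within `depth` hops of `ego` (direction ignored)."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    keep, frontier = {ego}, {ego}
    for _ in range(depth):
        # Step one hop further out, skipping nodes we have already collected.
        frontier = {n for node in frontier for n in neighbours.get(node, ())} - keep
        keep |= frontier
    return [(source, target) for source, target in edges if source in keep and target in keep]


# Toy data: an edge (a, b) stands for "a influenced b".
edges = [
    ("Socrates", "Plato"),
    ("Plato", "Aristotle"),
    ("Aristotle", "Aquinas"),
    ("Kant", "Hegel"),
]
plato_ego = ego_network(edges, "Plato")
print(plato_ego)
```

Increasing depth to 2 would also pull in the people one step further removed from the chosen philosopher.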

What this recipe hopefully shows is how you can directly load data from DBpedia into Gephi. The two tricks you need to learn to do this for other data sets are:

1) figuring out how to get data out of DBpedia (the WHERE part of the Request);
2) figuring out how to get that data into shape for Gephi (the CONSTRUCT part of the request).

If you come up with any other interesting graphs, please post Request fragments in the comments below:-)

[See also: Graphing Every* Idea In History]

PS via @sciencebase (Mapping research on Wikipedia with Wikimaps), there’s this related tool: WikiMaps, an online (and desktop?) tool for visualising various Wikipedia powered graphs, such as, erm, Justin Bieber’s network…

Any other related tools out there for constructing and visualising Wikipedia powered network maps? Please add a link via the comments if you know of any…

PPS for a generalisation of this approach, and a recipe for finding other DBpedia networks to map, see Mapping How Programming Languages Influenced Each Other According to Wikipedia.

PPPS Here’s another handy recipe that shows how to pull SPARQLed DBpedia queries into R, analyse them there, and then generate a GraphML file for rendering in Gephi: SPARQL Package for R / Gephi – Movie star graph visualization Tutorial

PPPPS related – a large scale version of this? Wikipedia Mining Algorithm Reveals The Most Influential People In 35 Centuries Of Human History

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

13 thoughts on “Visualising Related Entries in Wikipedia Using Gephi”

  1. I threw a Wiki Graph example together for HPCCSystems. I loaded the whole english wiki snapshot, calculated pagerank, built queries that return gexf and visualized in Sigmajs, all in the one ecosystem to give a simple demo on putting it together end to end. When I get some more downtime I’ll turn on exploring the graph etc..

    Check it out at…

    let me know what you think?

    1. Hi Jo – thanks for the comment – and hinting the recipe used to create the demo… the scale of it is beyond anything I play with!

      A couple of quick comments – for me, one of the attractions of graphs is the way we can use them to generate maps in which the spatial layout is influenced by the connectivity of the graph and the relations between the nodes. Creative definitions of edges and edge weights, along with judicious selection of the layout algorithm, mean we can create a wide variety of maps, some of which may reveal pattern or structure that is meaningful or makes sense to the person creating the visualisation. In a lot of the sketches I produce, I use node and edge colour to represent modularity groups that correlate reasonably well with clusters that emerge under force directed layout, resulting in coherently coloured regions within the map. The demo you linked to renders a star graph in which I guess the node size relates to PageRank, colour to some sort of category, and position (in two senses: proximity to the root node, and proximity to other leaves) I’m not sure? This makes me wonder about what insights my eyes can give me from the visualisation, other than which the big nodes are? (I’ve been trying to start thinking a little more critically about my own viz practice lately, so I’m just airing some of the thoughts I’ve been having about my own work/play!;-) Second thing: clicking on a node leads to the Wikipedia page for the node, allowing the visualisation to provide a navigational surface over Wikipedia concepts. When I clicked on a node, I think I expected it to refocus the graph with that node as the root? Though from your comment I guess this is on the todo list next time you have downtime?;-)

      As to the edges – you’re using in and outlinks – does your graph also include semantically labelled edges, or did you create it from a link scrape? (How long did that take if so?!) What dataset did you actually use to create the graph? I was playing with DBpedia because I wanted to access the semantically labelled edges.

      Just by the by, for the style of viz you’re using – ie the star – did you also look at using ? (I’m not sure how well it scales to large numbers of nodes?)

      Good stuff, anyway:-) Let me know (@psychemedia) if you do any updates:-)

  2. Good stuff – very useful thank you! Been playing with Gephi a bit myself over the weekend. So good tips here.

  3. Hi, this is absolutely wonderful. I’ve just started a course in SNA, and this is astounding to me.
    However, the foaf:name for some reason calls the second name listed in dbpedia (if there is one, that is), which is often messy. I have absolutely no idea how to fix this (haven’t worked with SPARQL before, and I’ve only just started to work with Gephi). Any ideas?

  4. thanks. it was really useful :) but i got many error in the log saying that
    org.openide.util.Exceptions printStackTrace
    SEVERE: null
    at Method)
    org.netbeans.core.startup.TopLogging$AWTHandler uncaughtException
    SEVERE: null Stream closed

    any clue?

    1. Hi Sally – I can’t be much help I’m afraid. When it comes to these plugins, I’m very much just a user. One debugging approach might be just to load one of the default examples and see if that works. Also check you have the latest updated version of the plugin, and maybe also check out the forums for issue reports?

        1. @sally :-) So what are you mapping…? Anything interesting? If you blog it, please post a link back.. I’m always on the look out for new ideas:-)
