Semantic Cartography – Mapping Dodgy Goth Bands With Common Members Using Wikipedia Data

Several years ago I did some doodles using the Gephi network visualiser Semantic Web Import plugin to sketch out how various sorts of thing (philosophers, music genres, programming languages) were related in Wikipedia (or at least, DBpedia, the semantic web derivative of Wikipedia). A couple of days ago, I started sketching some new queries in a Jupyter IPython notebook to generate a wider range of maps, using the networkx package to analyse the results locally, as well as building and export a graph that I could then visualise in Gephi.

The following bit of code provides a simple function for running a SPARQL query against a SPARQL endpoint, such as the DBpedia endpoint. It also accepts a set of prefix definitions for the query.

from SPARQLWrapper import SPARQLWrapper, JSON

#Add some helper functions
def runQuery(endpoint,prefix,q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

endpoint='http://dbpedia.org/sparql'

prefix='''
prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbp: <http://dbpedia.org/property/>
prefix dbr: <http://dbpedia.org/resource/>
prefix dbc: <http://dbpedia.org/resource/Category:>
prefix dct: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix yago: <http://dbpedia.org/class/yago/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
'''

Here’s an example of the style of query I explored a few years ago – it identifies a thing that’s a band in a particular genre, and then tries to find other genres associated with that band. Each combination of genres adds an edge to the resulting graph. The FILTER element makes sure that we make edges between different genres.

m='Gothic_rock'
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?band dbp:genre dbr:{}.
?band <http://dbpedia.org/property/background> "group_or_band"@en.
?band dbp:genre ?a.
?band dbp:genre ?b.
?a dbp:name ?an.
?b dbp:name ?bn.
FILTER(?a != ?b && langMatches(lang(?an), "en")  && langMatches(lang(?bn), "en"))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)

Another simple function takes the resulting edge list and creates a node labeled graph from it using the networkx library. We can then export a graph file from this network that can be visualised in Gephi. (On my to do list is using networkx to  calculate some simple network statistics and generate a first attempt at a preview layout automatically, rather than doing it by hand in Gephi, which is what I do at the moment…)

def nxGrapher_hack(response,config,typ='undirected'):
    ''' typ: forward | reverse | undirected'''
    if typ=='undirected':
        G = nx.Graph()
    else:
        G = nx.DiGraph()

    fr,fr_l=config['from']
    to,to_l=config['to']
    for r in response['results']['bindings']:
        G.add_node(r[fr]['value'], label=r[fr_l]['value'])
        G.add_node(r[to]['value'], label=r[to_l]['value'])
        if typ=='reverse':
            G.add_edge(r[to]['value'],r[fr]['value'])
        else:
            G.add_edge(r[fr]['value'],r[to]['value'])
    return G

G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "music_{}.gexf".format(m))

Here’s the sort of map/graph we can generate as a result:
Gephi_0_9_1_-_Project_1_-_Project_2

As well as genre information, we can look up information about band members, such as the current or previous members of a particular band*.

About__Wayne_Hussey

*Since generating the data files last night, and running them again today, a whole raft of bander membership details appear to have disappeared. WTF?! Now I remember another of the reasons I keep avoiding the semantic web – it’s as flakey as anything and you can never tell if the problem is yours, someone else’s or the result of an update (or downgrade) in the data!

What this means is that we can anchor a query on a band, and find the current or previous members. In the following snippet, the single braces (“{}”@en) are replaced by the value of the declared band name:

m="The Mission (band)"
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?x <http://dbpedia.org/property/background> "group_or_band"@en.
?x rdfs:label "{}"@en.

?a <http://dbpedia.org/property/background> "group_or_band"@en.
?a rdfs:label ?an.

?b rdfs:label ?bn.
?b a dbo:Person.
{{?a dbp:pastMembers ?b.}} UNION
{{?a dbp:currentMembers ?b.}}.
{{?x dbp:pastMembers ?b.}} UNION
{{?x dbp:currentMembers ?b.}}

FILTER((lang(?an)=&amp;amp;quot;en&amp;amp;quot;) &amp;amp;amp;&amp;amp;amp; (lang(?bn)=&amp;amp;quot;en&amp;amp;quot;) &amp;amp;amp;&amp;amp;amp; !(STRSTARTS(?bn,&amp;amp;quot;List of&amp;amp;quot;)) &amp;amp;amp;&amp;amp;amp; !(STRSTARTS(?an,&amp;amp;quot;List of&amp;amp;quot;)))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)

G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "band_{}.gexf".format(m))

A slight tweak to the code lets us replace the anchoring (that is, the search) around a single band name to a set of band names. This allows us to get the current and previous members of all the declared bands.

m=['The Mission (band)','The Cult','The Sisters of Mercy','Fields of the Nephilim','All_About_Eve_(band)']

p='''
?x rdfs:label "{}"@en.
'''

ms=''' UNION
'''.join(['{'+p.format(i)+'}' for i in m])

#In the query, replace ?x rdfs:label &amp;amp;quot;{}&amp;amp;quot;@en. with {}
#In the format method, replace m with ms

Rather than searching around one or more bands, we could instead hook into bands associated with a particular genre. Rather than anchoring around ?x rdfs:label "{}"@en, for example, use ?x dbp:genre dbr:{}. This then lets us generate views of the following form:

Gephi_0_9_1_-_Project_1

As well as mapping the territory around particular musical genres, we can also generate maps for other contexts, such as around particular art movements. For example:

m='Surrealism'
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?movement dct:subject dbc:Art_movements.
?movement dct:subject dbc:{}.
?artist dbp:movement ?movement.
?artist dbp:movement ?a.
?artist dbp:movement ?b.
?a rdfs:label ?an.
?b rdfs:label ?bn.
FILTER(?a != ?b && (lang(?an)="en") && (lang(?bn)="en"))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)
G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "art_{}.gexf".format(m))

Or we can tap into other ontologies to limit our searches, and generate a range of influence maps:

y='Artist109812338'
#Artist109812338
#Painter110391653
#Potter110460806
#Sculptor110566072
#Philosopher110423589
#PhilosophersOfLanguage
#PhilosophersOfMathematics
#PhilosophersOfMind
q='''
SELECT ?a ?an ?b ?bn WHERE {{
  ?a a yago:{typ} .
  ?b a yago:{typ} .
  ?a rdfs:label ?an.
  ?b rdfs:label ?bn.
  {{?a <http://dbpedia.org/ontology/influencedBy> ?b.}}
   UNION {{
  ?b <http://dbpedia.org/ontology/influenced> ?a.
  }}
  }}'''.format(typ=y)
r=runQuery(endpoint,prefix,q)
G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')},typ='forward')
nx.write_gexf(G, "influence_{}.gexf".format(y))

So why bother?

Here are several reasons: first, because it’s interesting/fun/recreational; secondly, it allows us to compare our own mental model of the wider context around a particular genre or movement with the Wikipedia version; thirdly, if we’re expert, it might allow us to spot gaps or errors in the Wikipedia data, and fix it; fourthly, these sorts of data collections are used to make recommendations to you, so it helps to get a feel for the sorts of things they can represent, the relations they claim exist, and the ways they can go wrong, so you trust the machines a little bit less, or are least, a little bit more informedly.

PS One of the reasons for grabbing the data using Python was because Gephi has recently undergone an update, and the extensions developed for the earlier version are still being migrated. However, checking today, I notice that the SemanticWebImport plugin has made it across, so it should be possible to run variants of the queries directly in Gephi. See the previous posts for examples.

One comment

  1. Tony Hirst

    I should probably have also done maps using dbo:associatedMusicalArtist and dbo:associatedBand under the assumption that these are /maintained/ relationships that identify that two bands have shared a common member? (Although I guess it could mean they also co-performed, eg as per Motorhead & Girlschool?)