Mapping How Programming Languages Influenced Each Other According to Wikipedia

By way of demonstrating how the recipe described in Visualising Related Entries in Wikipedia Using Gephi can easily be turned to other things, here’s a map of how different computer programming languages influence each other according to DBpedia/Wikipedia:

Here’s the code that I pasted in to the Request area of the Gephi Semantic Web Import plugin as configured for a DBpedia import:

prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT{
  ?a gephi:label ?an .
  ?b gephi:label ?bn .
  ?a <http://dbpedia.org/ontology/influencedBy> ?b
} WHERE {
  ?a a <http://dbpedia.org/ontology/ProgrammingLanguage>.
  ?b a <http://dbpedia.org/ontology/ProgrammingLanguage>.
  ?a <http://dbpedia.org/ontology/influencedBy> ?b.
  ?a foaf:name ?an.
  ?b foaf:name ?bn.
}

As to how I found the <http://dbpedia.org/ontology/ProgrammingLanguage> relation, I had a play around with the SNORQL query interface for DBpedia looking for possible relations using queries along the lines of:

SELECT DISTINCT ?c WHERE {
?a <http://dbpedia.org/ontology/influencedBy> ?b.
?a rdf:type ?c.
?b a ?c.
} limit 50 offset 150

(a, as in ?x a ?y, is SPARQL shorthand for rdf:type, so the two are interchangeable.)

This query looks for pairs of things (?a, ?b), each of the same type, ?c, where ?a is influenced by ?b, then reports what sort of thing (?c) they are (philosophers, for example, or programming languages). We can then use this thing in our custom Wikipedia/DBpedia/Gephi semantic web mapping request to map out the “internal” influence network pertaining to that thing (internal in the sense that the things that are influencing and influenced are both representatives of the same, erm, thing…;-).
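As a rough sketch of swapping in a different thing, the import request from earlier in the post can be templated on the type. The helper name build_influence_query is made up for illustration (it is not part of Gephi or DBpedia), and Philosopher is just the example type mentioned above:

```python
# Hypothetical helper: plain string templating around the request shown
# earlier in the post. Nothing here talks to DBpedia or Gephi.
QUERY_TEMPLATE = """prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT{{
  ?a gephi:label ?an .
  ?b gephi:label ?bn .
  ?a <http://dbpedia.org/ontology/influencedBy> ?b
}} WHERE {{
  ?a a <{cls}>.
  ?b a <{cls}>.
  ?a <http://dbpedia.org/ontology/influencedBy> ?b.
  ?a foaf:name ?an.
  ?b foaf:name ?bn.
}}"""


def build_influence_query(class_iri: str) -> str:
    """Return the CONSTRUCT request with both ends restricted to class_iri."""
    return QUERY_TEMPLATE.format(cls=class_iri)


if __name__ == "__main__":
    # Paste the printed text into the Request area of the import plugin.
    print(build_influence_query("http://dbpedia.org/ontology/Philosopher"))
```

Both ?a and ?b are pinned to the same type, which is what keeps the resulting network “internal” to that type.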

The limit term specifies how many results to return; the offset essentially allows you to page through results (so an offset of 500 will return results starting with the 501st result overall). DISTINCT ensures we see unique relations.
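To make the paging arithmetic concrete, here is a minimal sketch that mimics limit, offset and DISTINCT on an ordinary Python list. The function names and the numbered rows are invented for illustration; a real endpoint applies these to query results, and without an ORDER BY clause the order of those results is not guaranteed to stay the same between requests.

```python
def page(results, limit, offset):
    """Mimic LIMIT/OFFSET: skip `offset` rows, then keep at most `limit`."""
    return results[offset:offset + limit]


def distinct(results):
    """Mimic DISTINCT: drop repeated rows, keeping first-seen order."""
    return list(dict.fromkeys(results))


if __name__ == "__main__":
    # Made-up rows, numbered from 1 so the position is easy to read off.
    rows = [f"result {n}" for n in range(1, 1001)]
    print(page(rows, limit=50, offset=500)[0])  # result 501
    print(distinct(["Philosopher", "ProgrammingLanguage", "Philosopher"]))
```

Stepping the offset up by the limit each time (0, 50, 100, 150, ...) walks through the results one page at a time.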

If you see a relation that looks like dbpedia:ontology/Philosopher, put it in angle brackets (<>) and replace dbpedia: with http://dbpedia.org/ to give something like <http://dbpedia.org/ontology/Philosopher>.
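The same rewrite can be sketched as a tiny function. The name to_full_iri and the one-entry prefix table are assumptions for illustration, covering only the dbpedia: case described above:

```python
# Assumed prefix table: only the single mapping described in the text.
PREFIXES = {"dbpedia": "http://dbpedia.org/"}


def to_full_iri(short: str) -> str:
    """Expand e.g. 'dbpedia:ontology/Philosopher' to a bracketed full IRI."""
    prefix, sep, local = short.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {short!r}")
    return f"<{PREFIXES[prefix]}{local}>"


if __name__ == "__main__":
    print(to_full_iri("dbpedia:ontology/Philosopher"))
```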

PS see how to use a similar technique to map out musical genres ascribed to bands on Wikipedia

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

11 thoughts on “Mapping How Programming Languages Influenced Each Other According to Wikipedia”

  1. Pingback: Grayson Hodnett
  2. It’s interesting, but kind of hard to understand what the visualisation means. I can see a cluster of list processing languages over to the left, and OO languages in the middle. But why does Algol get so far separated from Algol 60 and Algol 68, with FORTRAN and Pascal in between? I guess the actual locations are artefacts and we should see distances in terms of steps that are hard to see. But I can’t see a direct link from Algol to the others, or from Pascal to Object Pascal.

    I also realise this doesn’t mean your code is wrong, maybe these are artefacts of the ways that myriad Wikipedia editors expressed themselves. The Wikipedia article on Algol 60 mentions the generic term Algol many times, but not as a language itself. It doesn’t mention that Algol 60 was influenced by Algol, even though the Algol article says it influenced “Most subsequent imperative languages (so-called ALGOL-like languages) e.g. Simula, C, CPL, Pascal, Ada”. The article on Algol starts “ALGOL (short for ALGOrithmic Language) is a family of imperative computer programming languages originally developed in the mid 1950s which greatly influenced many other languages and was the standard method for algorithm description used by the ACM, in textbooks, and academic works for the next 30 years and more.[2] In the sense that most modern languages are “algol-like”,[3] it was arguably the most successful of the four high level programming languages with which it was roughly contemporary, Fortran, Lisp, and COBOL.”

    So is this visualisation seductive but ultimately thoroughly misleading, giving a veneer of sense but hiding away really bad underlying data?

    1. @chris I was making no claims at all about anything the visualisation does or doesn’t say, or the way in which it may or may not be interpreted, nor even, I think, what the node sizing is related to? The point of the post was to demonstrate a recipe about how to get particular subsets of data out of DBpedia and into Gephi. That is all… ;-)

      PS FWIW, I think one of the major benefits to be had from these sorts of visualisation is in support of a visual analytical conversation between the analyst and the data. It also helps, as you say, to identify those cases where there may be something wrong with the data (as Martin Hawksey says, “data use == data validation”), which is an important part of the process of working with data. One thing I try to avoid doing on the blog is produce anything that might be construed as being intended as a ‘finished graphic’. Pretty much all of the “visualisations” I post here are sketches, snapshots of a process that is an ongoing conversation between myself and the data, and that as a consequence is only really meaningful to anyone who is aware of the full history of that conversation.

      The programming languages image was intended simply as an illustration of the structure of the data and some of the manipulations that could be applied to it, rather than as a meaningful visualisation. And it was not my intention at all to claim it as a “true” statement about the relationships between computer languages. At the most, it is a statement made about the claims made within a set of Wikipedia pages (a set that is opaque to me) that make use of a particular structured relation in a particular context.
