The Structure of OUseful.Info
A blog is just so many blog posts, right? Wrong… it also has the potential to be full of structure. One of the things I try to do in OUseful.info is post links not only to related third party sites, but also back to previous blog posts on OUseful.info to provide additional information, context or explanations that add value to the current post. Through the magic of trackbacks/pingbacks, the WordPress platform notices when I link to one OUseful.info post from another, and adds a trackback/pingback style comment to the linked-to post referring back to the post that included the link (got that!?;-)
So if, whilst writing post B, I add a link to post A, a pingback style comment will be added to post A saying that post B mentioned it.
If we take an export dump of a WordPress blog, we can search through it to identify each post, and each trackback/pingback comment:
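As a rough illustration of what that search looks like, the sketch below parses a tiny hand-made fragment in the style of a WordPress export and lists each post along with the pingback comments attached to it. The sample posts, the example.com URLs, the `wp` namespace URI and the helper names are all made up for the example, not taken from the real export file.

```python
from xml.dom import minidom

# A made-up fragment in the style of a WordPress export file.
# The xmlns:wp URI here is a placeholder for illustration only.
SAMPLE = """<rss xmlns:wp="http://example.com/wp-export-ns/">
<channel>
<item>
<title>Post A</title>
<link>http://blog.example.com/2010/01/01/post-a/</link>
<wp:comment>
<wp:comment_author_url>http://blog.example.com/2010/02/01/post-b/</wp:comment_author_url>
<wp:comment_type>pingback</wp:comment_type>
</wp:comment>
<wp:comment>
<wp:comment_author_url>http://elsewhere.example.org/</wp:comment_author_url>
<wp:comment_type></wp:comment_type>
</wp:comment>
</item>
<item>
<title>Post B</title>
<link>http://blog.example.com/2010/02/01/post-b/</link>
</item>
</channel>
</rss>"""


def text_of(parent, tag):
    """Return the text of the first <tag> under parent, or '' if missing or empty."""
    elements = parent.getElementsByTagName(tag)
    if elements and elements[0].firstChild:
        return elements[0].firstChild.data
    return ''


def pingbacks_by_post(xml_text):
    """Map each post link to the list of URLs that pingbacked it."""
    dom = minidom.parseString(xml_text)
    found = {}
    for item in dom.getElementsByTagName('item'):
        link = text_of(item, 'link')
        found[link] = [
            text_of(comment, 'wp:comment_author_url')
            for comment in item.getElementsByTagName('wp:comment')
            if text_of(comment, 'wp:comment_type') == 'pingback'
        ]
    return found


for link, sources in pingbacks_by_post(SAMPLE).items():
    print(link, '<-', sources)
```

Only the comment whose type is `pingback` is kept; the ordinary comment with an empty type is ignored.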
We can then create a file that defines each blog post as a network node, and each pingback as an edge connecting two nodes (so if blog post A links to post B, we draw an edge from node A to B).
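To make the node and edge idea concrete, here is a minimal sketch that turns a (linking post URL, linked-to post URL) pair into edge lines. It assumes that the last path segment of a post's permalink (its slug) is unique enough to act as a node ID; the helper names and example URLs are invented for the illustration.

```python
def node_id(url):
    """Use the last path segment of a permalink (its slug) as the node ID."""
    return url.strip('/').rpartition('/')[2]


def dot_edge(linking_url, linked_url):
    """One directed edge in Graphviz dot syntax: linking post -> linked-to post."""
    return '"%s"->"%s"' % (node_id(linking_url), node_id(linked_url))


def csv_edge(linking_url, linked_url):
    """The same edge as a quoted source,target CSV row."""
    return '"%s","%s"' % (node_id(linking_url), node_id(linked_url))


# Hypothetical example: the "newer" post links back to the "older" one,
# so the edge runs from newer-post to older-post.
older = 'http://blog.example.com/2010/01/01/older-post/'
newer = 'http://blog.example.com/2010/02/01/newer-post/'

print(dot_edge(newer, older))
print(csv_edge(newer, older))
```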
Here’s a cobbled together Python script to do just that (as a gist):
from xml.dom import minidom
# Based on http://code.activestate.com/recipes/551792-convert-wordpress-export-file-to-multiple-html-fil/
infile = 'wpexport.xml'
dotfile = 'internalstructure.dot'
csvfile = 'internalstructure.csv'
dom = minidom.parse(infile)
f = open(dotfile, 'w')
f2 = open(csvfile, 'w')
blog = []
f.write('digraph blogstruct{\n')
# Each <item> element in the export is one post
for node in dom.getElementsByTagName('item'):
    post = dict()
    post["title"] = node.getElementsByTagName('title')[0].firstChild.data
    post["date"] = node.getElementsByTagName('pubDate')[0].firstChild.data
    post["link"] = node.getElementsByTagName('link')[0].firstChild.data
    post['comments'] = []
    for comment in node.getElementsByTagName('wp:comment'):
        commentInfo = dict()
        types = comment.getElementsByTagName('wp:comment_type')
        # Skip comments with a missing or empty type element
        if types and types[0].firstChild:
            commentInfo['type'] = types[0].firstChild.data
            commentInfo['url'] = comment.getElementsByTagName('wp:comment_author_url')[0].firstChild.data
            commentInfo['date'] = comment.getElementsByTagName('wp:comment_date')[0].firstChild.data
            # Only keep pingbacks that come from the blog itself
            if commentInfo['type'] == 'pingback' and commentInfo['url'].find('http://blog.ouseful.info') != -1:
                # The last path segment of each URL (the post slug) is the node ID
                cID = commentInfo['url'].strip('/')
                cID = cID.rpartition('/')
                rID = post["link"].strip('/')
                rID = rID.rpartition('/')
                # Edge runs from the linking post to the linked-to post
                f.write('"' + cID[2] + '"->"' + rID[2] + '"\n')
                f2.write('"' + cID[2] + '","' + rID[2] + '"\n')
        #post['comments'].append(comments)
    #blog.append(post)
f.write('}')
f.close()
f2.close()
Note that the WordPress export file seemed to be incomplete (the Python parser didn’t like it…), so I had to add the Atom namespace definition: xmlns:atom="http://www.w3.org/2005/Atom/"
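One way to apply that sort of patch without hand-editing the file is to add the declaration as text before parsing. This is only a sketch, assuming the problem is simply that the opening `<rss>` tag lacks an `xmlns:atom` attribute. The broken sample document and the `add_atom_namespace` helper are made up for the illustration, and the URI used in the code is the standard Atom namespace (the parser only needs the prefix to be bound to something).

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

ATOM_NS = 'http://www.w3.org/2005/Atom'  # standard Atom namespace URI

# Made-up stand-in for an export that uses atom: without declaring it.
BROKEN = """<rss version="2.0">
<channel>
<atom:link href="http://blog.example.com/feed/" rel="self"/>
<item><title>Post A</title></item>
</channel>
</rss>"""


def add_atom_namespace(xml_text):
    """Declare xmlns:atom on the opening <rss> tag if it is not already there."""
    if 'xmlns:atom=' in xml_text:
        return xml_text
    return re.sub(r'<rss\b', '<rss xmlns:atom="%s"' % ATOM_NS, xml_text, count=1)


try:
    minidom.parseString(BROKEN)
except ExpatError as err:
    print('Unpatched export fails to parse:', err)

dom = minidom.parseString(add_atom_namespace(BROKEN))
print(len(dom.getElementsByTagName('item')), 'item(s) parsed after patching')
```

For a real export you would read the file into a string, patch it, and pass the result to `minidom.parseString` in place of calling `minidom.parse` on the file directly.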
In the above snippet, I generate two sorts of output file: a dot file for use with Graphviz, and a CSV file that can be loaded into Gephi. The code is also customised to show only pingbacks from the blog.ouseful.info domain; to use the code on your own blog you’d have to tweak that bit…
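The bit that needs tweaking is the test on the pingback's source URL. Here is a hedged sketch of one way to parameterise it, written for Python 3. It compares hostnames rather than searching for a substring, which is a swapped-in approach and not what the script above does; `BLOG_HOST` and `is_internal` are names invented for the example.

```python
from urllib.parse import urlparse

BLOG_HOST = 'blog.example.com'  # hypothetical: replace with your own blog's hostname


def is_internal(url, blog_host=BLOG_HOST):
    """True if url points at the given blog, judged by hostname alone."""
    host = urlparse(url).hostname or ''
    return host.lower() == blog_host.lower()


print(is_internal('http://blog.example.com/2010/02/01/post-b/'))
print(is_internal('http://elsewhere.example.org/?ref=http://blog.example.com/'))
```

The second URL only mentions the blog's address in its query string, so a plain substring search would count it as internal, whereas the hostname comparison does not.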
Anyway, here’s a glimpse of the structure of the internal pingback links from OUseful.info visualised using Gephi and a Force Atlas layout:
You can see that the blog as a whole contains a fair amount of structure. For sure, there are some posts that are only linked to by one other post and appear to “float” with respect to the rest of the blog posts (unlinked posts are not identified as nodes by my trackback graph script); but there are also long chains of posts that suggest I have developed an idea over multiple posts…
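One rough way to check that impression without a visual layout is to group the posts into connected clusters, ignoring edge direction. A pair of posts linked only to each other comes out as a cluster of two, whereas a chain of related posts comes out as one larger cluster. The sketch below does this in plain Python on a made-up edge list in the same source, target form as the CSV file; it is an illustration, not an analysis of the real data.

```python
from collections import defaultdict

# Made-up edges (linking post, linked-to post), as in the CSV output.
EDGES = [
    ('post-b', 'post-a'),
    ('post-c', 'post-b'),
    ('post-d', 'post-c'),
    ('post-y', 'post-x'),
]


def clusters(edges):
    """Group nodes into connected clusters, treating edges as undirected."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Walk outwards from start, collecting everything reachable.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    # Largest clusters first
    return sorted(groups, key=len, reverse=True)


for group in clusters(EDGES):
    print(len(group), group)
```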
When I get a chance, I’ll have a go at using some of Gephi’s network analysis tools on this graph, but for now – back to the Bank Holiday:-)
See also: Visualising CoAuthors in Open Repository Online Papers, Part 1 and Visualising CoAuthors in Open Repository Online Papers, Part 2, as well as Emergent Structure in the Digital Worlds Uncourse Blog Experiment and Uncovering a Little More Digital Worlds Structure



