The Structure of OUseful.Info

A blog is just so many blog posts, right? Wrong… it also has the potential to be full of structure. One of the things I try to do on OUseful.info is link not only to related third party sites, but also back to previous OUseful.info posts that provide additional information, context or explanations that add value to the current post. Through the magic of trackbacks/pingbacks, the WordPress platform notices when I link to one OUseful.info post from another, and adds a trackback/pingback style comment to the linked-to post, referring back to the post that included the link (got that!?;-)

So if, whilst writing post B, I add a link to post A, a pingback style comment will be added to post A saying that post B mentioned it.

If we take an export dump of a WordPress blog, we can search through it to identify each post, and each trackback/pingback comment:

[Image: WordPress pingback]
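As a rough sketch of that search, the snippet below parses a tiny, made-up export fragment and picks out the pingback comments. The element names are the ones the script further down relies on; the sample XML, the post URLs, the wp namespace URI and the helper names text_of and find_pingbacks are all assumptions for illustration, and a real export file contains far more elements.

```python
from xml.dom import minidom

# Hypothetical, heavily trimmed export fragment (URLs and namespace URI are illustrative only).
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
<item>
<title>Post A</title>
<link>http://blog.ouseful.info/2010/01/01/post-a/</link>
<wp:comment>
<wp:comment_author_url>http://blog.ouseful.info/2010/02/02/post-b/</wp:comment_author_url>
<wp:comment_type>pingback</wp:comment_type>
</wp:comment>
<wp:comment>
<wp:comment_author_url>http://example.com/someone/</wp:comment_author_url>
<wp:comment_type></wp:comment_type>
</wp:comment>
</item>
</channel>
</rss>"""


def text_of(parent, tag):
    """Return the text of the first <tag> under parent, or '' if it is missing or empty."""
    found = parent.getElementsByTagName(tag)
    if found and found[0].firstChild:
        return found[0].firstChild.data
    return ''


def find_pingbacks(xml_text):
    """Return (linked_to_post_url, linking_url) pairs, one per pingback comment."""
    dom = minidom.parseString(xml_text)
    pairs = []
    for item in dom.getElementsByTagName('item'):
        link = text_of(item, 'link')
        for comment in item.getElementsByTagName('wp:comment'):
            if text_of(comment, 'wp:comment_type') == 'pingback':
                pairs.append((link, text_of(comment, 'wp:comment_author_url')))
    return pairs


pingbacks = find_pingbacks(SAMPLE)
for linked_to, linking in pingbacks:
    print(linking, 'links to', linked_to)
```

The second comment in the sample has an empty type element, so it is skipped rather than treated as a pingback.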

We can then create a file that defines each blog post as a network node, and each pingback as an edge connecting two nodes (so if blog post A links to post B, we draw an edge from node A to B).
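To make the node and edge idea concrete, here is a minimal sketch that turns a list of (linking post, linked-to post) URL pairs into Graphviz dot text and CSV text. The function names slug, to_dot and to_csv and the example URLs are hypothetical; using the last path segment of each URL as the node ID mirrors what the script does.

```python
def slug(url):
    """Use the last path segment of a post URL as its node ID."""
    return url.strip('/').rpartition('/')[2]


def to_dot(edges, name='blogstruct'):
    """Render (source, target) pairs as a Graphviz directed graph."""
    lines = ['digraph %s {' % name]
    for source, target in edges:
        lines.append('"%s" -> "%s";' % (source, target))
    lines.append('}')
    return '\n'.join(lines)


def to_csv(edges):
    """Render (source, target) pairs as one quoted CSV row per edge."""
    return ''.join('"%s","%s"\n' % (source, target) for source, target in edges)


# Hypothetical example: post B links to post A, so the edge runs from B to A.
links = [('http://blog.ouseful.info/2010/02/02/post-b/',
          'http://blog.ouseful.info/2010/01/01/post-a/')]
edges = [(slug(src), slug(dst)) for src, dst in links]

print(to_dot(edges))
print(to_csv(edges))
```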

Here’s a cobbled together Python script to do just that (as a gist):

from xml.dom import minidom
# Based on http://code.activestate.com/recipes/551792-convert-wordpress-export-file-to-multiple-html-fil/

infile='wpexport.xml'
dotfile='internalstructure.dot'
csvfile='internalstructure.csv'

dom = minidom.parse(infile)

f = open(dotfile, 'w')
f2 = open(csvfile,'w')

blog=[]

f.write('digraph blogstruct{')

for node in dom.getElementsByTagName('item'):
	post = dict()
	post["title"] = node.getElementsByTagName('title')[0].firstChild.data
	post["date"] = node.getElementsByTagName('pubDate')[0].firstChild.data
	post["link"] = node.getElementsByTagName('link')[0].firstChild.data
	post['comments'] =[]
	
	for comment in node.getElementsByTagName('wp:comment'):
		commentInfo = dict()
		types = comment.getElementsByTagName('wp:comment_type')
		urls = comment.getElementsByTagName('wp:comment_author_url')
		# Skip comments whose type or author URL element is missing or empty
		if types and types[0].firstChild and urls and urls[0].firstChild:
			commentInfo['type'] = types[0].firstChild.data
			commentInfo['url'] = urls[0].firstChild.data
			# Only keep pingbacks that come from this blog's own domain
			if commentInfo['type']=='pingback' and 'http://blog.ouseful.info' in commentInfo['url']:
				# The last path segment of each URL is used as the node ID
				cID=commentInfo['url'].strip('/').rpartition('/')
				rID=post["link"].strip('/').rpartition('/')
				# Edge runs from the linking post to the linked-to post
				f.write('"'+cID[2]+'"->"'+rID[2]+'"\n')
				f2.write('"'+cID[2]+'","'+rID[2]+'"\n')
				#post['comments'].append(comments)
	#blog.append(post)

f.write('}')
f.close()
f2.close()

Note that the WordPress export file seemed to be incomplete (the Python parser didn’t like it…), so I had to add the Atom namespace definition: xmlns:atom="http://www.w3.org/2005/Atom"

[Image: Adding the Atom namespace to WP export]
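If you would rather not edit the export file by hand, something like the following sketch could patch the declaration in before parsing. It assumes the export's opening tag starts with <rss, which is not confirmed above, and the function name declare_atom_namespace and the sample document are made up for illustration.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

ATOM_DECL = 'xmlns:atom="http://www.w3.org/2005/Atom"'

# Hypothetical minimal document that uses the atom: prefix without declaring it.
BROKEN = '<rss version="2.0"><channel><atom:link href="http://example.com/feed"/></channel></rss>'


def declare_atom_namespace(xml_text):
    """Add an xmlns:atom declaration to the opening <rss tag if none is present."""
    if 'xmlns:atom=' in xml_text:
        return xml_text
    # Assumption: the opening tag of the export starts with "<rss".
    return re.sub(r'<rss\b', '<rss ' + ATOM_DECL, xml_text, count=1)


try:
    minidom.parseString(BROKEN)
    parsed_unpatched = True
except ExpatError:
    # The parser rejects the undeclared atom: prefix.
    parsed_unpatched = False

fixed = declare_atom_namespace(BROKEN)
dom = minidom.parseString(fixed)
print(parsed_unpatched, len(dom.getElementsByTagName('atom:link')))
```

For a real export you would read the file into a string, patch it, and hand the result to minidom.parseString in place of the minidom.parse(infile) call.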

In the above snippet, I generate two sorts of output file: a dot file for use with Graphviz, and a CSV file that can be loaded into Gephi. The code is also customised to show only pingbacks from the blog.ouseful.info domain, so to use the code on your own blog you’d have to tweak that bit…

Anyway, here’s a glimpse of the structure of the internal pingback links from OUseful.info visualised using Gephi and a Force Atlas layout:
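One way to make that bit tweakable is to pull the domain out into a parameter. The sketch below is a variation rather than what the script does: it compares the host part of the pingback URL with the domain instead of searching the whole URL for a substring, and the function name is_internal and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_internal(url, domain):
    """True if the URL's host is exactly the given domain (case-insensitive)."""
    return urlparse(url).netloc.lower() == domain.lower()


BLOG_DOMAIN = 'blog.ouseful.info'  # change this to your own blog's domain

print(is_internal('http://blog.ouseful.info/2010/02/02/post-b/', BLOG_DOMAIN))
print(is_internal('http://example.com/?ref=http://blog.ouseful.info', BLOG_DOMAIN))
```

Comparing hosts means a third party URL that merely mentions the blog's address in its query string is not counted as an internal link, whereas a plain substring search would let it through.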

[Image: Link structure of blog.ouseful.info]

You can see that the blog as a whole contains a fair amount of structure. For sure, there are some posts that are only linked to by one other post and appear to “float” with respect to the rest of the blog posts (unlinked posts are not identified as nodes by my trackback graph script); but there are also long chains of posts that suggest I have developed an idea over multiple posts…
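The "floating" pairs and longer linked groups can also be picked out without a visualisation. As a sketch only, the code below reads edge rows in the same two-column CSV shape as the script's output and groups posts into connected clusters, ignoring edge direction. The sample rows and the function name weak_components are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same "source","target" shape as the CSV output.
SAMPLE_CSV = '"a","b"\n"b","c"\n"c","d"\n"x","y"\n'


def weak_components(edges):
    """Group nodes into clusters connected by links, ignoring link direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this post.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    # Largest clusters first.
    return sorted(components, key=len, reverse=True)


edges = [tuple(row) for row in csv.reader(io.StringIO(SAMPLE_CSV)) if len(row) == 2]
components = weak_components(edges)
for component in components:
    print(len(component), sorted(component))
```

Clusters of size two correspond to posts linked to by just one other post, while larger clusters are candidates for ideas developed over several posts.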

When I get a chance, I’ll have a go at using some of Gephi’s network analysis tools on this graph, but for now it’s back to the Bank Holiday:-)

See also: Visualising CoAuthors in Open Repository Online Papers, Part 1 and Visualising CoAuthors in Open Repository Online Papers, Part 2, as well as Emergent Structure in the Digital Worlds Uncourse Blog Experiment and Uncovering a Little More Digital Worlds Structure

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...