Using Graphviz to Explore the Internal Link Structure of a WordPress Blog

In The Structure of OUseful.Info, I showed how it was possible to extract an autopingback graph from a WordPress blog (that is, the graph that shows which of the posts in a particular WordPress blog link to other posts in that blog), illustrating the post with a visualisation of the linkage within the blog generated using Gephi.

What I didn’t do was post any examples of the views that we can generate in Graphviz – so here are a couple generated without additional flourishes from a simple statement of links between posts.

Firstly, we see a series of posts relating to WriteToReply, and commentable documents:

Blog linkage examples using Graphviz

In the following example, we see a series of self-contained posts on Library Analytics:

Linkage structure in OUseful.Info

Note that from the original library-analytics-part-1 post, we can see how two strands developed around this topic (remember, arrows typically go from a more recent post to an earlier one; that is, the links typically go from a new post to one that already exists…)
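As a minimal sketch of what a "simple statement of links between posts" might look like, here is a short Python function that turns a list of (newer post, older post) pairs into Graphviz dot text. The function name and the post slugs (other than library-analytics-part-1) are made up for illustration; the one detail worth noting is that slugs containing hyphens need to be quoted to be valid dot identifiers.

```python
def links_to_dot(links, name="bloglinks"):
    """Return a Graphviz digraph with one edge per (newer_post, older_post) pair."""
    lines = [f"digraph {name} {{"]
    for source, target in links:
        # Post slugs contain hyphens, so they are quoted to be valid dot IDs.
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical slugs: two strands that both lead back to the original post.
links = [
    ("library-analytics-part-2", "library-analytics-part-1"),
    ("library-analytics-part-3", "library-analytics-part-2"),
    ("library-analytics-again", "library-analytics-part-1"),
]
dot_text = links_to_dot(links)
print(dot_text)
```

The printed text can be saved to a .dot file and laid out with any of the Graphviz layout engines.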

Here’s another set of posts on a topic – this time privacy and Facebook:

OUseful posts on Facebook and privacy

(The bidirectional linkage arose from me editing the body of a pre-existing post with a link to a later one.)

One thing I haven’t explored yet is the groupings that arise from an analysis of the tags and categories I used to annotate each post. But what the above shows is that even in the absence of tags and categories, link structure may also be used to aggregate posts on a particular topic, and allow clusters of blog posts, or partitions containing link-related posts, to be easily identified – and extracted – from the blog…
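One way of pulling out those partitions, sketched here in Python under the assumption that the links are available as (source, target) pairs, is to treat the links as undirected and collect the connected groups of posts. This is not how the graphs above were produced; the function name and post slugs are hypothetical.

```python
from collections import defaultdict


def link_clusters(links):
    """Group posts into clusters connected by links, ignoring link direction."""
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Walk outwards from an unvisited post to collect its whole cluster.
        cluster, stack = set(), [start]
        while stack:
            post = stack.pop()
            if post in cluster:
                continue
            cluster.add(post)
            stack.extend(neighbours[post] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical slugs: one bidirectional pair and one chain of three posts.
links = [
    ("facebook-privacy-2", "facebook-privacy-1"),
    ("facebook-privacy-1", "facebook-privacy-2"),
    ("library-analytics-part-2", "library-analytics-part-1"),
    ("library-analytics-part-3", "library-analytics-part-2"),
]
clusters = link_clusters(links)
print(clusters)
```

Each cluster is a candidate "topic" that could be extracted from the blog and plotted on its own.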

…and my supposition is that this sort of structure might be used to facilitate value adding navigation structures…

Handling Yahoo Pipes Serialised PHP Output

One of the output formats supported by Yahoo Pipes is a PHP style array. In this post, which describes a way of seeing how well connected a particular Twitter user is to other Twitterers who have recently used a particular hashtag, I’ll show you how it can be used.

The following snippet (cribbed from Coding Forums) shows how to handle this array:

//Declare the required pipe, specifying the php output
$req = "";

// Make the request
$phpserialized = file_get_contents($req);

// Parse the serialized response
$phparray = unserialize($phpserialized);

//Here's the raw contents of the array
print_r($phparray);

//Here's how to parse it
foreach ($phparray['value']['items'] as $key => $val) {
	printf("<div><p><a href=\"%s\">%s</a></p><p>%s</p></div>\n", $val['link'], $val['title'], $val['description']);
}

The pipe used in the above snippet displays a list of people who have recently used a particular hashtag on Twitter a minimum specified number of times.

It’s easy enough to parse out the Twitter ID of each individual, and then for a particular named individual see which of those hashtagging Twitterers they are either following, or are following them. (Why’s this interesting? Well, for any given hashtag community, it can show you how well connected you are with that community).

So let’s see how to do it. First, parse out the Twitter ID:

$idList = array();
foreach ($phparray['value']['items'] as $key => $val) {
	//Keep just the screenname from a title that starts with @screenname
	$id=preg_replace("/@([^\s]*)\s.*/", "$1", $val['title']);
	$idList[] = $id;
}

We have the Twitter screennames, but now we want the actual Twitter user IDs. There are several PHP libraries for accessing the Twitter API. The following relies on an old, rejigged version of one of them (the code may need tweaking to work with the current version…), and is really kludged together… (Note to self – tidy this up one day!)

The algorithm is basically as follows, and generates a GraphViz .dot file that will plot the connections a particular user has with the members of a particular hashtagging community:

  • get the list of hashtagger Twitter usernames (as above);
  • for each username, call the Twitter API to get the corresponding Twitter ID, and print out a label that maps each ID to a username;
  • for the user we want to investigate, pull down the list of people who follow them from the Twitter API; for each follower, if the follower is in the hashtaggers set, print out that relationship;
  • for the user we want to investigate, pull down the list of people who they follow (i.e. their ‘friends’) from the Twitter API; for each friend, if the friend is in the hashtaggers set, print out that relationship;
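
$Twitter = new Twitter($myTwitterID, $myTwitterPwd);

//Get the Twitter ID for each user identified by the hashtagger pipe
$userID = array();
foreach ($idList as $user) {
	$user_det=$Twitter->showUser($user, 'xml');
	$p = xml_parser_create();
	//Assumed step: parse the XML into $vals, with $index mapping (upper-cased) tag names to positions
	xml_parse_into_struct($p, $user_det, $vals, $index);
	$id = $vals[$index['ID'][0]]['value'];
	$userID[] = $id;
	//Assumes $userfocus is one of the hashtaggers, so their ID can be picked up here
	if (strtolower($user) == strtolower($userfocus)) $currUser = $id;
	//print out labels in the Graphviz .dot format
	echo $id."[label=\"".$user."\"];\r";
}

//$userfocus is the Twitter screenname of the person we want to examine
//So who in the hashtagger list is following them?
$follower_det=$Twitter->getFollowers($userfocus, 'xml');
$p = xml_parser_create();
xml_parse_into_struct($p, $follower_det, $vals, $index);
foreach ($index['ID'] as $item){
	$follower = $vals[$item]['value'];
	//print out edges in the Graphviz .dot format
	if (in_array($follower,$userID)) echo $follower."->".$currUser.";\r";
}

//And who in the hashtagger list are they following?
$friends_det=$Twitter->getFriends($userfocus, 'xml');
$p = xml_parser_create();
xml_parse_into_struct($p, $friends_det, $vals, $index);
foreach ($index['ID'] as $item){
	$followed = $vals[$item]['value'];
	//print out edges in the Graphviz .dot format
	if (in_array($followed,$userID)) echo $currUser."->".$followed.";\r";
}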

For completeness, here are the Twitter object methods and their associated Twitter API calls that were used in the above code:

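function showUser($id,$format){
	//$api_call holds the Twitter API URL for looking up the details of user $id in $format
	return $this->APICall($api_call, false);
}

function getFollowers($id,$format){
	//$api_call holds the Twitter API URL for listing the IDs of the followers of $id
	return $this->APICall($api_call, false);
}

function getFriends($id,$format){
	//$api_call holds the Twitter API URL for listing the IDs of the people $id follows
	return $this->APICall($api_call, false);
}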

Running the code uses N+2 Twitter API calls, where N is the number of different users identified by the hashtagger pipe.

The output of the script is almost a minimal Graphviz .dot file. All that’s missing is the wrapper, e.g. something like: digraph twitterNet { … }. Here’s what a valid file looks like:

(The labels can appear either before or after the edges – it makes no difference as far as GraphViz is concerned.)
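To make the shape of the finished file concrete, here is a small Python sketch of the same recipe (labels, follower edges, friend edges, then the digraph wrapper), run against made-up data rather than the Twitter API. The function name, the IDs and the screennames are all hypothetical.

```python
def hashtag_connections_dot(hashtaggers, focus_id, follower_ids, friend_ids,
                            name="twitterNet"):
    """Build a complete dot file; hashtaggers maps Twitter user ID -> screenname."""
    lines = [f"digraph {name} {{"]
    # One label statement per hashtagger, mapping the ID to a readable name.
    for user_id, screen_name in hashtaggers.items():
        lines.append(f'  {user_id} [label="{screen_name}"];')
    # Followers of the focus user who are also hashtaggers: follower -> focus.
    for follower in follower_ids:
        if follower in hashtaggers:
            lines.append(f"  {follower} -> {focus_id};")
    # Hashtaggers the focus user follows: focus -> friend.
    for friend in friend_ids:
        if friend in hashtaggers:
            lines.append(f"  {focus_id} -> {friend};")
    lines.append("}")
    return "\n".join(lines)


# Made-up IDs and names; 999 is a follower who never used the hashtag.
hashtaggers = {101: "alice", 102: "bob", 103: "carol"}
dot_text = hashtag_connections_dot(
    hashtaggers, focus_id=101, follower_ids=[102, 999], friend_ids=[102, 103]
)
print(dot_text)
```

The pair of edges between 101 and 102 is what a reciprocated connection looks like in the file.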

Plotting the graph will show you who the individual of interest is connected to, and how, in the particular hashtag community.

So for example, in the recent #ukoer community, here’s how LornaMCampbell is connected. First a ‘circular’ view:


The arrow direction goes FROM one person TO a person they are following. In the circular diagram, it can be quite hard to see whether a connection is reciprocated or one way.

The Graphviz network diagram uses a separate edge for each connection and makes it easier to spot reciprocated links:


So, there we have it. Another way of looking at Twitter hashtag networks to go along with Preliminary Thoughts on Visualising the OpenEd09 Twitter Network, A Quick Peek at the IWMW2009 Twitter Network and More Thinkses Around Twitter Hashtag Networks: #JISCRI

Scripted Diagrams Getting Easier

A quick heads-up on another tool (diagrammr) that makes it easy to create network/graph diagrams like this:

Just type in a description of the graph and the diagram will be generated at the same time [video]:

[Infoskills note to self: when making a screencast with Jing, after clicking in a text box area, remember to move the mouse cursor out of the way…]

(Regular readers will know I’ve been exploring this sort of thing for some time; for example, see Scripting Charts WIth GraphViz – Hierarchies; and a Question of Attitude, Writing Diagrams, RESTful Image Generation – When Text Just Won’t Do or Visual Gadgets: Scripting Diagrams.)

As well as creating diagrams, Diagrammr allows you to embed them, providing an image/PNG URI for your diagram; you can also edit the image (that is, edit the script that generates the image) after the fact via a shareable URI.

The URI for the editor page can be generated from the image URI, though, so without the ability to set a password on the editor page when you first create a new image, this means that any time you embed a Diagrammr image, someone else could go and edit the image?

In an educational context, tools like this make it much easier for students to create their own diagrams (typing in a graph description is far quicker than trying to lay it out by hand in a drawing package). As you script the diagram, your attention is focussed on the local structural components/relations that define the graph, whilst at the same time the automatically generated diagram visualises the overall structure and brings alive its complexity at the network level.
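To give a feel for how little is needed to go from a typed description to a graph, here is a rough Python sketch that turns lines of the form "subject verb object" into labelled dot edges. I don't know how Diagrammr actually parses its input, so the input format, the function name and the sentences are all assumptions made for illustration.

```python
def sentences_to_dot(text, name="diagram"):
    """Turn 'subject verb object' lines into labelled edges in dot format."""
    lines = [f"digraph {name} {{"]
    for sentence in text.splitlines():
        words = sentence.split()
        if len(words) < 3:
            continue  # skip lines that are not at least "subject verb object"
        # First word is the source node, last word the target, the rest the label.
        subject, verb, obj = words[0], " ".join(words[1:-1]), words[-1]
        lines.append(f'  "{subject}" -> "{obj}" [label="{verb}"];')
    lines.append("}")
    return "\n".join(lines)


description = """student writes script
script generates diagram
diagram shows structure"""
dot_text = sentences_to_dot(description)
print(dot_text)
```

Each line typed adds one local relation, and the layout engine takes care of the overall picture.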

(I’m not sure how the graph layouts are generated – maybe using Graphviz on the server to generate the image and return it to the browser? If so, an improved version of diagrammr might be able to return the compiled xdot version of the graph back to an interactive canviz component running in the browser?)

If you’re working in an institutional VLE context, where the powers that be are still trying to retain control of everything, the Canviz component might offer one solution – an HTML 5 canvas library for displaying ‘compiled’ Graphviz network descriptions.

Although I haven’t tried it out, there is apparently a recipe for integrating Graphviz with Drupal (Graphviz Filter) and a suggestion for including Canviz into the mix (GraphMapping Framework (graphviz_api + graphviz_fields + graphviz_views + graphviz_filter) – has this been implemented yet by anyone, I wonder?). I’ve no idea if anyone has tried to do something similar in a Moodle environment…

PS here’s another one – a UML editor: YUML

Where Next With The Hashtagging Twitterers List?

This post is a holding position, so it’s probably gonna be even more cryptic than usual…

In Who’s Tweeting Our Hashtag?, I described a recipe for generating a list of people who had been tweeting, twittering or whatever, using a particular hashtag.

So what’s next on my to do list with this info?

Well, first of all I thought it’d be interesting to try to plot a graph of connections between the followers of everyone on the list, to see how large the hashtag audience might be.

Using a list of about 60 or so twitterers, captured yesterday, I called the Twitter API for each one to pull down an XML list of all of their followers by ID number, and topped it up with the user info for each person on the original list; this info meant I could in turn spot the ID for each of the hashtagging twitterers amongst the followers lists.

It’s easy enough to transform these lists into the dot format that can be plotted by GraphViz, but the 10,000 edges or so generated from the followers lists were too much for my version of GraphViz to cope with.

So instead, I thought I’d just try to plot a subgraph, such as the graph of people who were following a minimum specified number of people in the original hashtag twittering list: for example, the graph of people who were following at least five of the people who’d used the particular hashtag.

I hacked a piece of code to do this, but it’s far from ideal and I’m not totally convinced it works properly… Ideally what I want is a simple (efficient) utility that will accept a .dot file and prune it, removing nodes whose degree is less than a specified value. (If you know of such a tool, please post a link to it in the comments:-)
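For what it’s worth, here is a minimal Python sketch of the sort of pruning utility I have in mind. It is not the code I hacked together, and it makes some simplifying assumptions: it only understands edge lines of the form a->b; with unquoted IDs, it counts degree as in-links plus out-links, it drops label lines, and it makes a single pass (so a node that survives may end up with a lower degree once its neighbours have gone). The function name is hypothetical.

```python
import re
from collections import Counter

# Matches simple edge statements such as "123->456;" with unquoted IDs.
EDGE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*;?\s*$")


def prune_dot(dot_text, min_degree):
    """Keep only edges whose two end nodes both have degree >= min_degree."""
    edges = []
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if match:
            edges.append(match.groups())

    # Degree here is the total number of edges touching a node (in plus out).
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1

    keep = {node for node, count in degree.items() if count >= min_degree}
    kept = [(s, t) for s, t in edges if s in keep and t in keep]
    body = "\n".join(f"  {s} -> {t};" for s, t in kept)
    return "digraph pruned {\n" + body + "\n}"


sample = """digraph g {
1->2;
1->3;
2->3;
4->1;
}"""
pruned = prune_dot(sample, min_degree=2)
print(pruned)
```

In the sample, node 4 has only one edge, so it and its edge are removed while the triangle between 1, 2 and 3 is kept.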

Here’s the first graph I managed to plot:

If my code is working, an edge points to a person if that person is following at least, err, lots of the other people [that is: lots of other people who used the hashtag]. So under the assumption that the code is working, this graph shows one person at the centre of the graph who is following lots of people who have tweeted the hashtag. Any guesses who that person might be? People who have edges directed towards them in this sort of plot are people who are heavily following the people using a particular hashtag. If you’re a conference organiser, I’m guessing that you’d probably want to appear in this sort of graph?

(If the code isn’t working, I’m not sure what the hell it is doing, or what the graph shows?!;-)

One other thing I thought I’d look at was the people who are following lots of people on the hashtagging list who haven’t themselves used the hashtag. These are the people to whom the event is being heavily amplified.

So for example, here we have a chart that is constructed as follows. The hashtag twitterers list is constructed from a sample of the most recent 500 opened09 hashtagged tweets around about the time stamp of this post and contains people who are in that list at least 3 times.

The edges on the chart are directed towards people who are not on the hashtag list but who are following more than 13 of the people who are on the list.
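The construction just described can be sketched in Python as two steps: build the hashtagger list from the authors of a sample of tweets, then find the people who are not on that list but follow more than some number of people who are. This is a sketch on made-up data, not the code used for the chart; the function names, the names in the sample and the low threshold in the example are hypothetical.

```python
from collections import Counter


def frequent_hashtaggers(tweet_authors, min_tweets=3):
    """People who appear at least min_tweets times in a sample of tagged tweets."""
    counts = Counter(tweet_authors)
    return {author for author, n in counts.items() if n >= min_tweets}


def amplified_audience_edges(followers_of, hashtaggers, more_than=13):
    """Edges hashtagger -> follower, for followers who are not hashtaggers
    themselves but who follow more than `more_than` of the hashtaggers."""
    followed_count = Counter()
    for tagger in hashtaggers:
        for follower in set(followers_of.get(tagger, [])):
            if follower not in hashtaggers:
                followed_count[follower] += 1
    audience = {f for f, n in followed_count.items() if n > more_than}
    return sorted(
        (tagger, follower)
        for tagger in hashtaggers
        for follower in set(followers_of.get(tagger, []))
        if follower in audience
    )


# Made-up sample: ann and bob tweet the tag three times each, cat only once.
authors = ["ann", "ann", "ann", "bob", "bob", "bob", "cat"]
taggers = frequent_hashtaggers(authors, min_tweets=3)

# Made-up follower lists: dan follows both hashtaggers, eve follows only one.
followers_of = {"ann": ["bob", "dan", "eve"], "bob": ["ann", "dan"]}
edges = amplified_audience_edges(followers_of, taggers, more_than=1)
print(edges)
```

With real data the thresholds would be the 3 and 13 used above; the tiny values here are only to keep the example readable.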

Hmmmm… anyway, that’s more than enough confusion for now… I’m going to try not to tinker with this any more for a bit, because a holiday beckons and this could turn into a mindf**k project… However, when I do return to it, I think I’m going to have a go at attacking it with a graph/network toolkit, such as NetworkX, and see if I can do a proper bit of network analysis on the resulting graphs.