OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Trackforward – Following the Consequences with N’th Order Trackbacks

with 9 comments

One of the nice things about blogging within the WordPress ecosystem is the way that trackbacks/pingbacks capture information about posts that link back to your posts, in much the same way that using the link: search limit on a web or blog search engine allows you to see what other webpages are linking back to a particular web page.

In the latter case, for example, searching for link:http://hedebate.jiscinvolve.org/on-line-higher-education-learning/ on Google blogsearch will turn up blog posts that link back to the original HE Debate blog post on On-Line Higher Education Learning.

(Actually, that’s not quite true. After an apparent tweak to the Google blogsearch algorithm last year, the engine now seems to index and return results from complete web pages rather than just the content of RSS feeds, i.e. blog posts. This means that as well as the useful links in the body of a post, it also indexes links from blogrolls, Twitter feeds and bookmark lists displayed in blog sidebars, blog comments and so on. Which in turn is to say that Google blogsearch, qua a web search of blog web pages, is not much use as a blog search engine at all…)

By judicious linking back to your own blog posts, it’s possible to build up quite complex pathways between related posts that are navigable in two directions: from one post that links to another, previously published post, via an inline link; and “forwards” in time to a later post that has itself linked back to a post of interest and been picked up via a trackback/pingback.
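Navigating "backwards" through a post's inline links and "forwards" through its trackbacks amounts to reading one link graph in both directions. As a minimal sketch, assuming each post's outbound links are already available as a PHP array keyed by post URL, inverting that array gives each post its list of inbound links, which is roughly the view a trackback/pingback record offers. The `invertLinks` name and the example URLs are hypothetical:

```php
<?php
// Hypothetical helper: given post URL => URLs that post links to,
// return target URL => posts that link to it (the "trackback" direction).
function invertLinks(array $outlinks): array {
    $inlinks = [];
    foreach ($outlinks as $post => $targets) {
        // array_unique() so a post that links twice to a target is only counted once
        foreach (array_unique($targets) as $target) {
            $inlinks[$target][] = $post;
        }
    }
    return $inlinks;
}

// Made-up example: post b links back to a; post c links back to both a and b.
$outlinks = [
    'http://example.com/a' => [],
    'http://example.com/b' => ['http://example.com/a'],
    'http://example.com/c' => ['http://example.com/a', 'http://example.com/b'],
];
print_r(invertLinks($outlinks));
```

Here post a ends up with inbound links from b and c, and post b with one from c, while c, the most recent post, has none yet.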

(For examples of these emergent link structures, see Emergent Structure in the Digital Worlds Uncourse Blog Experiment, Uncovering a Little More Digital Worlds Structure and Trackback Graphs and Blog Categories.)

So the question arises – if I write a blog post that several other people link back to, and several further posts in turn link back to those posts that referred back to my post, but not my original post, how do I keep track of the conversation?

Keeping track of posts that cite my post is easy enough – if I have an effective pingback set-up, that will tell me who’s linking back to my posts; or I can simply run link: searches against the URLs of my posts every so often to see who the search engines think are linking back to me.
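Running those `link:` searches every so often is easy to script. A minimal sketch, assuming the Google AJAX blog search endpoint that the PHP code later in this post calls (the `linkQueryUrl` helper is a hypothetical name): the post URL has to be URL-encoded before it is appended after `q=link%3A`, otherwise any `?` or `&` in it would break the query string.

```php
<?php
// Base query as used by the blog search API call in this post; "%3A" is an encoded ":".
const LINK_QUERY_STUB = "http://ajax.googleapis.com/ajax/services/search/blogs?v=1.0&rsz=large&q=link%3A";

// Hypothetical helper: build the link: search URL for one post URL.
function linkQueryUrl(string $postUrl): string {
    // urlencode() escapes ":", "/", "?", "=" and "&" so the post URL stays inside q=
    return LINK_QUERY_STUB . urlencode($postUrl);
}

// One query URL per post to check, e.g. from a scheduled job.
$posts = ["http://hedebate.jiscinvolve.org/on-line-higher-education-learning/"];
foreach ($posts as $post) {
    echo linkQueryUrl($post), "\n";
}
```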

The answer lies in a recursive algorithm of the form:

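function showInLinks($url){
  // getLinksto() stands for any lookup (e.g. a link: search)
  // that returns the URLs of pages linking to $url
  $links=getLinksto($url);
  foreach ($links as $link){
    print $link;
    showInLinks($link);
  }
}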

This will then display URLs for the pages that link to an originally specified URL, the URLs of pages that link to those URLs, and so on…
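As it stands, the recursive function only stops when it runs out of inbound links, and two posts that link to each other would send it round in circles. A minimal sketch of one way to guard against both, assuming a depth limit and a record of already-seen URLs (neither is part of the function above): it swaps the recursion for a breadth-first, level-by-level loop, so each page is labelled with the lowest order at which it turns up. `getLinksTo` here is a hypothetical stand-in that reads a hard-coded array instead of calling a search engine.

```php
<?php
// Hypothetical stand-in for a link: search: URL => URLs of the pages that link to it.
function getLinksTo(string $url): array {
    static $inlinks = [
        'A' => ['B', 'C'],  // B and C link to A
        'B' => ['D'],
        'C' => ['A'],       // A links back to C, so A and C form a cycle
        'D' => ['E'],
    ];
    return $inlinks[$url] ?? [];
}

// Breadth-first variant of showInLinks(): returns URL => order (1 = links to $url,
// 2 = links to a 1st order page, ...), stopping after $maxDepth levels.
function inLinks(string $url, int $maxDepth): array {
    $order = [$url => 0];   // every URL seen so far, with the level it was first reached at
    $frontier = [$url];
    for ($depth = 1; $depth <= $maxDepth && $frontier; $depth++) {
        $next = [];
        foreach ($frontier as $page) {
            foreach (getLinksTo($page) as $link) {
                if (isset($order[$link])) continue;  // already seen: this breaks cycles
                $order[$link] = $depth;
                $next[] = $link;
            }
        }
        $frontier = $next;
    }
    unset($order[$url]);    // don't report the starting URL itself
    return $order;
}

foreach (inLinks('A', 2) as $link => $n) {
    echo str_repeat('  ', $n), "$n. $link\n";
}
```

With a limit of 2, B and C come out as 1st order links and D as 2nd order; E, three hops away, is left out, and the A–C loop is followed only once.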

So here for example is a quick test:

The items numbered “1.” are pages that Google blogsearch thinks link back to the original URL. The items numbered “2.” are pages that link to those “1.” pages, i.e. second order links back to the original URL.

Here’s some minimal PHP code if you want to try it out:

<?php
// Google AJAX blog search API query for "link:" results, i.e. pages linking to a URL.
// (Adding scoring=d to the query string orders results by date rather than relevance.)
define('LINKSEARCH_STUB', "http://ajax.googleapis.com/ajax/services/search/blogs?v=1.0&rsz=large&q=link%3A");
$url="http://halfanhour.blogspot.com/2008/11/future-of-online-learning-ten-years-on_16.html";
if (!empty($_GET['url'])) $url=$_GET['url'];
// the target URL must be URL-encoded so it stays inside the q= parameter
$testurl=LINKSEARCH_STUB.urlencode($url);
echo "Starting with: ".htmlspecialchars($url)."<br/>";
echo "via: ".htmlspecialchars($testurl)."<br/><br/>";
$depth=0;

// Print the pages that link to $url, indented by order, then recurse on each of them.
function handlelinks($url, $depth){
	$depth++;
	if ($depth >= 3) return;	// depth trap: only show 1st and 2nd order links
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, LINKSEARCH_STUB.urlencode($url));
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	$body = curl_exec($ch);
	curl_close($ch);
	if ($body === false) return;	// request failed
	// now, process the JSON string
	$json = json_decode($body);
	if (empty($json->responseData->results)) return;	// no (or malformed) results
	foreach ($json->responseData->results as $result) {
		for ($i=0;$i<$depth;$i++) echo "&nbsp;&nbsp;";
		echo $depth.". ".$result->title." ";
		echo "<a href='".htmlspecialchars($result->postUrl, ENT_QUOTES)."'>".htmlspecialchars($result->postUrl)."</a><br/>";
		handlelinks($result->postUrl, $depth);
	}
}
handlelinks($url, $depth);
?>
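The step that pulls links out of the API response can be tried on its own, without any network calls. A minimal sketch, assuming the response shape the script above relies on (a `responseData` object holding a `results` array whose items carry `title` and `postUrl`); the `extractPostUrls` name and the sample JSON are made up for illustration:

```php
<?php
// Hypothetical helper: return the postUrl of each result in a blog search JSON response,
// or an empty array if the body is missing, malformed or has no results.
function extractPostUrls(?string $body): array {
    if ($body === null || $body === '') return [];
    $json = json_decode($body);
    if (!is_object($json) || empty($json->responseData->results)) return [];
    $urls = [];
    foreach ($json->responseData->results as $result) {
        if (isset($result->postUrl)) $urls[] = $result->postUrl;
    }
    return $urls;
}

// Made-up response body in the shape the script above expects.
$sample = '{"responseData": {"results": [
    {"title": "First reply", "postUrl": "http://example.com/reply-1"},
    {"title": "Second reply", "postUrl": "http://example.com/reply-2"}
]}}';
print_r(extractPostUrls($sample));
```

Returning an empty array for an error response means a recursive caller simply stops at that branch instead of failing.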

By using this sort of algorithm to generate an RSS feed of links, it becomes possible to subscribe to a feed that will keep you updated about all the downstream posts (“blogversation” posts) that are contributing to a discussion that at some point referred to a URL you are interested in.
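Turning the collected links into a feed is mostly a matter of wrapping each one in an RSS 2.0 `<item>`. A minimal sketch, assuming the links have already been gathered into a URL => title array (the `linksToRss` name, channel title and example URLs are hypothetical); it builds the document with PHP's DOMDocument so that titles and URLs are escaped correctly:

```php
<?php
// Hypothetical helper: build an RSS 2.0 document listing the given links (URL => title).
function linksToRss(string $channelTitle, string $channelLink, array $links): string {
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $rss = $doc->appendChild($doc->createElement('rss'));
    $rss->setAttribute('version', '2.0');
    $channel = $rss->appendChild($doc->createElement('channel'));

    // RSS 2.0 requires title, link and description elements on the channel.
    $channelInfo = [
        'title' => $channelTitle,
        'link' => $channelLink,
        'description' => "Posts linking (directly or indirectly) to $channelLink",
    ];
    foreach ($channelInfo as $name => $value) {
        // createTextNode() escapes characters such as & and < in the text
        $channel->appendChild($doc->createElement($name))->appendChild($doc->createTextNode($value));
    }

    foreach ($links as $url => $title) {
        $item = $channel->appendChild($doc->createElement('item'));
        $item->appendChild($doc->createElement('title'))->appendChild($doc->createTextNode($title));
        $item->appendChild($doc->createElement('link'))->appendChild($doc->createTextNode($url));
        // guid lets feed readers recognise items they have already shown
        $item->appendChild($doc->createElement('guid'))->appendChild($doc->createTextNode($url));
    }
    return $doc->saveXML();
}

// Example with made-up URLs and titles, e.g. as collected by a link-following crawl.
$links = [
    'http://example.com/reply-1' => 'A first order reply',
    'http://example.com/reply-2' => 'A reply to the reply',
];
echo linksToRss('Trackforward', 'http://example.com/original-post', $links);
```

Using each post URL as the `guid` means a feed reader polling the feed only flags genuinely new downstream posts, which matches the idea of letting the reader aggregate results over time.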

Written by Tony Hirst

January 8, 2009 at 1:09 pm

Posted in Analytics, Tinkering


9 Responses


  1. That’s very clever, Tony…. I may need one, two more cups of coffee to process. Thoughts and questions:

    * As a recursive function, should there be some exit condition? Of course, that assumes some insane popularity of a URL

    * Are there any implications for hitting the google api so many times?

    * What is the significance of the order of said list? Is it by search rank prevalence? Should it be by time? By?…

    * This produces a snapshot; what might be interesting is some way to record the growth over time, even visually, to show how a URL grows, spreads, dies…

    damn clever, thanks

    Alan Levine

    January 8, 2009 at 3:23 pm

  2. I’m loving your recent posts – they’ve prompted me to explore Pipes properly.

    I’m struggling with relevancy in the blog searches I’m running on Google Blogsearch and aggregating in my pipes. There’s the problem with sites like Blogspot which put tag mentions in the sidebar and hence return spurious results (as per your point above), but also lots of spam blogs which just re-publish random posts from elsewhere.

    A colleague pointed me to IceRocket as a better blog search engine recently, which might help. But have you got any further pointers on improving relevancy of this kind of approach?

    Steph

    January 8, 2009 at 3:25 pm

  3. @alan

    “As a recursive function, should there be some exit condition? Of course, that assumes some insane popularity of a URL”

    Arrgh – yes – I didn’t escape the code properly and WordPress ate a bit of it (corrected now). I used a simple trap to limit the depth of the recursion (“if ($depth<3)”).

    “Are there any implications for hitting the google api so many times?”

    Maybe ;-) The code was more of a proof of concept; it would be good if this was taken up as a service by a proper blogsearch engine that didn’t index blogrolls etc. and just limited itself to indexing feed content… ;-)

    “What is the significance of the order of said list? Is it by search rank prevalence? Should it be by time? By?…”

    The order is just the order of the results returned by each blogsearch API call (the query can return either top results or most recent results, and is limited to a maximum of 8 results). I guess I could postrank them, get more than 8 results etc.? ( http://ouseful.wordpress.com/2008/12/17/getting-lots-of-results-out-of-a-google-custom-search-engine-cse-via-rss/ )

    “This produces a snapshot; what might be interesting is some way to record the growth over time, even visually, to show how a URL grows, spreads, dies…”

    I had more in mind this routine producing a feed so that you would get the latest results; the feed reader could then aggregate the results over time (inefficient, I know, on the search calls). I’m not sure if there is a search limit that can find just the results SINCE a given time, which could then be used to collect data on a cron schedule?

    Tony Hirst

    January 9, 2009 at 9:24 am

  4. [...] Trackforward, not to be confused with ForwardTrack by Eyebeam, is an idea put forward by Tony Hirst at OUseful. The idea is to find not just the sites that link to your postings, but the sites that link to those sites but not directly to your original post. With current tools, it’s not possible to track linking patterns like this. So the question arises – if I write a blog post that several other people link back to, and several further posts in turn link back to those posts that referred back to my post, but not my original post, how do I keep track of the conversation? [...]

  5. [...] See the original post here: Trackforward – Following the Consequences with N’th Order … [...]

  6. @steph I’ve found the Google Reader search to be okay – plus it lets you search through the feeds you subscribe to, and the posts you have actually read, in effect providing you with various flavours of custom search engine, but so far I haven’t found an API or a way of subscribing to the results via an external feed.

    Rest assured, I’ll post about any effective blogsearch tools I come across. :-)

    Tony Hirst

    January 11, 2009 at 11:30 pm

  7. [...] Trackforward – Following the Consequences with N’th Order Trackbacks, I showed a technique for tracking the posts that link to a particular URI, the posts that link to [...]

  8. [...] related posts that appear in the future.” Tony Hirst, OUseful Info, January 8, 2009 [Link] [tags: Google, [...]

  9. [...] put me in mind of Trackforward – Following the Consequences with N’th Order Trackbacks and Trackbacks, Tweetbacks and the Conversation Graph, Part I where I’d started thinking [...]

