OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for February 2010

Grabbing JSON Data from One Web Page and Displaying it in Another

Lots of web pages represent data within the page as a javascript object. But if you want to make use of that data in another page, how can you do that?

A case in point is Yahoo Pipes. The only place I’m currently aware of where we can look at how a particular Yahoo pipe is constructed is the Yahoo Pipes editor. The pipe is represented as a Javascript object within the page (as described in Starting to Think About a Yahoo Pipes Code Generator), but it’s effectively locked into the page.

So here’s a trick for liberating that representation…

Firstly, we need to know what the name of the object is. In the case of Yahoo Pipes, the pipe’s definition is contained in the editor.pipe.working object (not editor.pipe.definition, as I originally thought).

In order to send the object to another page on the web, the first thing we need to do is generate a text string view of it that we can POST to another web page. This serialised representation of the object can be obtained by calling the .toSource() function on it.

The following bookmarklets show what that representation looks like.

[UPDATE: an earlier version of these bookmarklets used editor.pipe.definition, which doesn’t give a complete description of the pipe: the RULE data (e.g. the rules in a Regular Expression block) was missing. I first thought .toSource() wasn’t digging into arrays, then that the missing data was in the terminaldata (it isn’t), then that it was in editor.pipe.module_info (nah… that is more the UI side of things). UPDATE 2: found it… I should be using editor.pipe.working, NOT editor.pipe.definition.]

Firstly, we can display the serialised representation in a browser alert box:

javascript:(function(){alert(editor.pipe.working.toSource())})()

Alternatively, we can view it in the browser console (for example, in Firefox, we might do this via the Firebug plugin):

javascript:(function(){console.log(editor.pipe.working.toSource())})()

The object actually contains several other objects, not all of which are directly relevant to the logical definition of the pipe (e.g. they are more to do with layout), so we can modify the console logging bookmarklet to make it easier to see the objects we are interested in – the definitions of each of the pipe blocks (that is, editor.pipe.working.modules), and the connections that exist between the modules (editor.pipe.working.wires; [UPDATE: we also need the terminaldata]):

javascript:(function(){var p=editor.pipe.working;console.log('MODULES: '+p.modules.toSource());console.log('WIRES: '+p.wires.toSource());console.log('TERMINALS: '+p.terminaldata.toSource())})()


[terminaldata not shown]
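Note that .toSource() is non-standard (Firefox only, as far as I know), so the bookmarklets above may not run in other browsers. As a hedged alternative, here is a minimal sketch that uses the standard JSON.stringify to get a similar text view. The pipeWorking object is a cut-down stand-in I have made up for editor.pipe.working, and summarisePipe is a hypothetical helper name, not something from the Pipes editor.

```javascript
// Stand-in for editor.pipe.working (an assumption: a cut-down version of the
// structure seen in the Pipes editor, not a real pipe).
var pipeWorking = {
  modules: [{ type: 'output', id: '_OUTPUT', conf: {} }],
  wires: [],
  terminaldata: []
};

// Hypothetical helper: serialise just the parts that define the pipe logic.
function summarisePipe(p) {
  return {
    modules: JSON.stringify(p.modules),
    wires: JSON.stringify(p.wires),
    terminaldata: JSON.stringify(p.terminaldata)
  };
}

var summary = summarisePipe(pipeWorking);
console.log('MODULES: ' + summary.modules);
console.log('WIRES: ' + summary.wires);
console.log('TERMINALS: ' + summary.terminaldata);
```

In the editor page you would pass editor.pipe.working instead of the stand-in. The JSON text can then be read by any standard JSON parser, which may make life easier on the receiving side.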

To actually send the representation to another web page, we can use a bookmarklet to dynamically create a form element, attach the serialised object to it as a form argument, append the form to the page and then submit it:

javascript:(function(){var ouseful={};ouseful=editor.pipe.working;ouseful=ouseful.toSource(); var oi=document.createElement('form');oi.setAttribute('method','post');oi.setAttribute('name','oif');oi.setAttribute('action','http://ouseful.open.ac.uk/ypdp/jsonpost.php');var oie=document.createElement('input');oie.setAttribute('type','text');oie.setAttribute('name','data');oie.setAttribute('value',ouseful);oi.appendChild(oie);document.body.appendChild(oi);document.oif.submit();})()
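Unpacked, the bookmarklet above does roughly the following. This is a readable sketch rather than a drop-in replacement: buildPostForm is a hypothetical helper name, the document is passed in as an argument so the function can be tried outside a browser, and I have used a hidden input where the bookmarklet uses a text input.

```javascript
// Hypothetical helper: build a form that will POST one named value to a URL.
// `doc` is expected to behave like the browser's `document` object.
function buildPostForm(doc, action, fieldName, value) {
  var form = doc.createElement('form');
  form.setAttribute('method', 'post');
  form.setAttribute('action', action);

  var input = doc.createElement('input');
  input.setAttribute('type', 'hidden');
  input.setAttribute('name', fieldName);
  input.setAttribute('value', value);

  form.appendChild(input);
  return form;
}

// In the Pipes editor page, usage might look like this (not run here):
//   var form = buildPostForm(document,
//     'http://ouseful.open.ac.uk/ypdp/jsonpost.php',
//     'data', editor.pipe.working.toSource());
//   document.body.appendChild(form);
//   form.submit();
```

Returning the form rather than submitting it straight away keeps the "build" and "send" steps separate, which makes it easier to check what is about to be posted.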

In this case, the page I am submitting the form to is a PHP page. The code below accepts the POSTed serialised object and then republishes it as a javascript object wrapped in a callback function (i.e. it packages it so it can be copied and then used within a web page):

<?php
$str= $_POST['data'];
$str = substr($str, 1, strlen($str) - 2); // remove outer ( and )
$str=stripslashes($str);
echo "ypdp(".$str.")";
?>

[Note that I did try to parse the object using PHP, but I kept hitting all sorts of errors with the parsing of it... The simplest approach was just to retransmit the object as Javascript so it could be handled by a browser.]
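To make the repackaging step concrete, here is a small sketch of the same idea in Javascript: strip the outer brackets that .toSource() puts round an object, wrap what is left in a call to the callback, and then let a page that defines that callback receive the object. The wrapInCallback name and the sample string are my own illustrations, not part of the PHP page.

```javascript
// Hypothetical helper mirroring the PHP above: drop the first and last
// characters (the outer "(" and ")") and wrap the rest in a callback call.
function wrapInCallback(source, callbackName) {
  var inner = source.slice(1, -1);
  return callbackName + '(' + inner + ')';
}

// Made-up example of what .toSource() might return for a tiny object.
var posted = '({modules:[], wires:[], terminaldata:[]})';
var wrapped = wrapInCallback(posted, 'ypdp');
console.log(wrapped);

// A receiving page defines the callback; running the wrapped text as script
// (here via new Function, in a page via a script element) hands it the object.
var received = null;
function ypdp(pipe) { received = pipe; }
new Function('ypdp', wrapped + ';')(ypdp);
console.log(Object.keys(received).join(','));
```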

If we want to display the serialised version of the object in another page, rather than in an alert box or the browser console, we need to pass the serialised object within the URI using an HTTP GET to the other page, so we can generate a link to it. For long pipes, this might break…*

*(Anyone know of an equivalent to a URL shortening service that will accept HTTP POST arguments and give you a short URL that will do a POST on your behalf? As well as the POST payload, we’d need to pass the target URL (i.e. the address to which the POST data is to be sent) to the shortener. It would then give you a short URL such that, when you click on it, it will POST the data to the desired target URL. I suppose another approach would be a service that will store the POST data for you, give you a short URI in return, and then you call the short URI with the address of the page you want the data posted to as a key?)

PS If you do run the bookmarklet to generate a URI that contains the serialised version of the pipe (that is, use a GET method in the form and a $_GET handler in the PHP script), you can load the object (wrapped in the ypdp() callback function) into your own page via a <script> element in the normal way, by setting the src attribute of the script to the URI that includes the serialised version of the pipe description.

Written by Tony Hirst

February 28, 2010 at 7:27 pm

Some of My Dev8D Tinkerings – Yahoo Pipes Quick Start Guide, Cross-Domain JSON with JQuery and Council Committee Treemaps from OpenlyLocal

with one comment

One of the goals I set myself for this year’s Dev8D was to get round to actually using some of the things I’ve been meaning to try out for ages, particularly Google App Store and JQuery, and also to have a push on some of the many languishing “projects” I’ve started over the last year, tidying up the code, making the UIs a little more presentable, and so on…

Things never turn out that way, of course. Instead, I did a couple of presentations, only one of which I was aware of beforehand!;-) (A chance remark alerted me to the fact I was down to do a lightning talk yesterday…)

I did start looking at JQuery, though, and did manage to revisit the Treemapping Council Committees Using OpenlyLocal Data idea I’d done a static proof of concept for some time ago…

On the JQuery front, I quickly picked up how easy it is to grab JSON feeds into a web page if you have access to JSON-P (that is, the ability to attach a callback function to a JSON URL so you can call a function in the web page with the object as soon as it loads), but I also ran into a couple of issues. Firstly, if I want to load more than one JSON feed into a page, and then run foo(json1, json2, json3, json4, json5), how do I do it? That is, how do I do a “meta-callback” that fires when all the separate JSON calls have loaded content into the page. (Hmm – I just got a payoff from writing this para and then looking at it – it strikes me I could do a daisy chain – use the callback from the first JSON call to call the second JSON object, use the callback from that to call the third, and so on; but that’s not very elegant…?) And secondly, how do I get a JSON object into a page if there is no callback function available (i.e. no JSON-P support)?

I’m still stuck on the first issue (other than the daisy chain/bucket brigade hack), but I found a workaround for the second – use a Yahoo pipe as a JSON-P proxy. I’ll be writing more about this in a later post, but in the meantime, I popped a code snippet up on github.

On the Openlylocal/council treemap front, I’d grabbed some sample JSON files from the Openlylocal site as I left Dev8D last night for the train home, and managed to hack the resulting objects into a state that could be used to generate the treemap from them.

A couple of hours fighting with getting the Openlylocal JSON into the page (solved as shown above with the Pipes hack) and I now have a live demo – e.g. http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id=111. The id is the openlylocal identifier used to identify a particular council on the Openlylocal site.

If you’re visiting Openlylocal council pages, the following bookmarklet will (sometimes*;-) display the corresponding council committee treemap:

javascript:var s=window.location.href;s=s.replace(/.*=/,"");window.location.href="http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id="+s;

(It works for pages with URLs that end =NNN;-)
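The “sometimes” comes from the regular expression: /.*=/ greedily strips everything up to the last = sign, so if the page address has no =NNN on the end, the whole address gets passed along as the id. Here is a slightly more defensive sketch of the same idea; treemapUrlFor is a hypothetical helper name and the example.org addresses in the comments are made up.

```javascript
var TREEMAP_BASE = 'http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id=';

// Hypothetical helper: return the treemap URL for a page whose address ends
// in =NNN, or null if no trailing numeric id can be found.
function treemapUrlFor(pageUrl) {
  var match = /=(\d+)$/.exec(pageUrl);
  return match ? TREEMAP_BASE + match[1] : null;
}

// Made-up addresses, just to show the two cases.
console.log(treemapUrlFor('http://example.org/some/page?id=111'));
console.log(treemapUrlFor('http://example.org/some/page/111'));
```

In a bookmarklet you would call it with window.location.href and only redirect when the result is not null.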
Council committee treemap

The code is still a bit tatty, and I need to tidy up the UI, (and maybe also update to a newer JIT visualisation library), so whilst the URI shown above will persist, I’ll be posting an updated version to somewhere else (along with a longer post about how it all works) when I get round to making the next set of tweaks… Hopefully, this will be before Dev8D next year!;-)

PS I also had a huge win in discovering a javascript function that works at least on Firefox: .toSource(). Apply it to a javascript object (e.g. myobj.toSource()), and then if you do things like alert(myobj.toSource()) you can get a quick preview of the contents of that object without having to resort to a debugger or developer plugin tool:-)

PPS can you tell my debugging expertise is limited to: alert("here"); all over the place ;-) Heh heh…

Written by Tony Hirst

February 27, 2010 at 4:10 pm

Posted in Tinkering


Starting to Think About a Yahoo Pipes Code Generator

with 15 comments

Following a marathon session demoing Yahoo Pipes yesterday (the slides I didn’t really use but pretty much covered are available here) I thought I’d start to have a look at what would be involved in generating a Pipes2PHP, Pipes2Py, or Pipes2JS conversion tool as I’ve alluded to before (What Happens If Yahoo! Pipes Dies?)…

So how are pipes represented within the Yahoo Pipes environment? With a little bit of digging around using the Firebug extension to Firefox, we can inspect the Javascript object representation of a pipe (that is, the thing that is used to represent the pipework and that gets saved to the server whenever we save a pipe).

So to start, let’s look at the following simple pipe:

Simple pipe

Here’s a Firebug view showing the path to the representation of a pipe (for editor.pipe.definition, read editor.pipe.working):

And here’s what we see being passed to the Yahoo pipes server when the pipe is saved…

Here’s how it looks as a Javascript object:

{"modules":[{"type":"fetch","id":"sw-502","conf":{"URL":{"value":"http://writetoreply.com/feed","type":"url"}}},{"type":"output","id":"_OUTPUT","conf":{}},{"type":"filter","id":"sw-513","conf":{"MODE":{"type":"text","value":"permit"},"COMBINE":{"type":"text","value":"and"},"RULE":[{"field":{"value":"description","type":"text"},"op":{"type":"text","value":"contains"},"value":{"value":"the","type":"text"}}]}}],"terminaldata":[],"wires":[{"id":"_w3","src":{"id":"_OUTPUT","moduleid":"sw-502"},"tgt":{"id":"_INPUT","moduleid":"sw-513"}},{"id":"_w6","src":{"id":"_OUTPUT","moduleid":"sw-513"},"tgt":{"id":"_INPUT","moduleid":"_OUTPUT"}}]}

Let’s try to pick that apart a little… firstly, all the modules are defined. Here’s the Fetch module:

{
 "type":"fetch",
 "id":"sw-502",
 "conf":{
  "URL":{
   "value":"http://writetoreply.com/feed",
   "type":"url"
  }
 }
}

The output module:

{
 "type":"output",
 "id":"_OUTPUT",
 "conf":{}
}

The filter module:

{
 "type":"filter",
 "id":"sw-513",
 "conf":{
  "MODE":{"type":"text","value":"permit"},
  "COMBINE":{"type":"text","value":"and"},
  "RULE":[{
   "field":{"value":"description","type":"text"},
   "op":{"type":"text","value":"contains"},
   "value":{"value":"the","type":"text"}
  }]
 }
}

Each of these blocks (that is, modules) has a unique id. The wires then specify how these modules are connected.

So here’s the wire that connects the output of the fetch block to the input of the filter module:

{
 "id":"_w3",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-502"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"sw-513"
 }
}

And here we connect the output of the filter to the input of the output block:

{
 "id":"_w6",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-513"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"_OUTPUT"
 }
}
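Given the two wires above, the order in which the modules run can be recovered by following each wire from its source module to its target module. The sketch below assumes a simple linear pipe (one input per module, no branches or loops); moduleOrder is a hypothetical helper name.

```javascript
// The two wires from the pipe above.
var wires = [
  { id: '_w3', src: { id: '_OUTPUT', moduleid: 'sw-502' }, tgt: { id: '_INPUT', moduleid: 'sw-513' } },
  { id: '_w6', src: { id: '_OUTPUT', moduleid: 'sw-513' }, tgt: { id: '_INPUT', moduleid: '_OUTPUT' } }
];

// Hypothetical helper: walk a linear chain of wires from its first module
// (the one nothing feeds into) to its last.
function moduleOrder(wires) {
  var next = {};      // source module id -> target module id
  var hasInput = {};  // module ids that appear as a wire target
  wires.forEach(function (w) {
    next[w.src.moduleid] = w.tgt.moduleid;
    hasInput[w.tgt.moduleid] = true;
  });
  var start = Object.keys(next).filter(function (id) { return !hasInput[id]; })[0];
  var order = [];
  for (var id = start; id !== undefined; id = next[id]) order.push(id);
  return order;
}

console.log(moduleOrder(wires).join(' -> '));
```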

***UPDATE – I’m not sure if we also need to look at the terminaldata information. I seem to have lost sight of where the multiple “RULES” that might appear inside a block are described…? Ah…. editor.pipe.module_info? Hmm, not – that is more the UI side of things.., so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block?)***

*** UPDATE 2 – Found it… I should be using editor.pipe.working NOT editor.pipe.definition

So what would a code generator need to do? I’m guessing one way would be to do something like this…

  • for each module, create an equivalent function by populating a templated function with the appropriate arguments, e.g.
    function f_sw_502(){ return fetchURL("http://writetoreply.com/feed"); }
    (the hyphen in the module id has to be swapped for something like an underscore to give a legal function name);
  • for each wire, do something along the lines of f_sw_513(f_sw_502()); it’s been a long day, so I’m not sure how to deal with modules that have multiple inputs? But this is just the start, right…? (If anyone else is now intrigued enough to start thinking about building a code generator from a pipes representation, please let me know…;-)
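As a very rough sketch of those two steps, the code below turns the simple pipe from this post into Javascript source text: one templated function per module, then a nested call built by following the wires back from the output. Everything here is an assumption for illustration: fetchURL and filterItems are placeholder names for runtime helpers that would still have to be written, the templates cover only the three module types in the example, and only single-input modules are handled.

```javascript
// The simple pipe from this post (modules and wires only).
var pipe = {
  modules: [
    { type: 'fetch', id: 'sw-502', conf: { URL: { value: 'http://writetoreply.com/feed', type: 'url' } } },
    { type: 'output', id: '_OUTPUT', conf: {} },
    { type: 'filter', id: 'sw-513', conf: {
      MODE: { type: 'text', value: 'permit' },
      COMBINE: { type: 'text', value: 'and' },
      RULE: [{ field: { value: 'description', type: 'text' },
               op: { type: 'text', value: 'contains' },
               value: { value: 'the', type: 'text' } }] } }
  ],
  wires: [
    { id: '_w3', src: { id: '_OUTPUT', moduleid: 'sw-502' }, tgt: { id: '_INPUT', moduleid: 'sw-513' } },
    { id: '_w6', src: { id: '_OUTPUT', moduleid: 'sw-513' }, tgt: { id: '_INPUT', moduleid: '_OUTPUT' } }
  ]
};

// Module ids such as "sw-502" are not legal function names, so swap any
// awkward characters for underscores.
function fnName(id) { return 'f_' + id.replace(/[^A-Za-z0-9_]/g, '_'); }

// Hypothetical templates, one per module type.
var templates = {
  fetch: function (m) {
    return 'function ' + fnName(m.id) + '(){ return fetchURL(' + JSON.stringify(m.conf.URL.value) + '); }';
  },
  filter: function (m) {
    return 'function ' + fnName(m.id) + '(items){ return filterItems(items, ' + JSON.stringify(m.conf) + '); }';
  },
  output: function (m) {
    return 'function ' + fnName(m.id) + '(items){ return items; }';
  }
};

function generate(pipe) {
  var feeds = {};  // target module id -> id of the module wired into it
  pipe.wires.forEach(function (w) { feeds[w.tgt.moduleid] = w.src.moduleid; });
  var defs = pipe.modules.map(function (m) { return templates[m.type](m); });
  function call(id) {
    return fnName(id) + '(' + (feeds[id] ? call(feeds[id]) : '') + ')';
  }
  return defs.join('\n') + '\n' + call('_OUTPUT') + ';';
}

var generated = generate(pipe);
console.log(generated);
```

Modules with several inputs would need the feeds lookup to hold a list of sources per target (probably keyed by the terminal id on the wire), which is exactly the bit I haven’t thought through.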

As to why this approach might be useful?
- saving a copy of the Javascript representation of a pipe gives us an archival copy of the algorithm, albeit in a javascripty objecty way…
- if we have a code generator, we can use Yahoo Pipes as a rapid prototyping tool to create code that can be locally hosted.

PS a question that was raised a couple of times in the session yesterday related to whether or not Yahoo pipes can be run behind a corporate firewall. I don’t think it can, but does anyone know for sure? Is there a commercial offering available, for example, so corporate folk can run their own instance of pipes in the privacy of their own network?

PPS here’s a handy trick… when in a Yahoo pipes page, pop up the description of the pipe with this javascript call in a Firefox location bar:
javascript:alert(editor.pipe.working.toSource());

Written by Tony Hirst

February 25, 2010 at 5:13 pm

Posted in Pipework


Online Apps for Live Code Tutorials/Demos

with 4 comments

With Dev8D coming up, here’s a quick round-up/reminder of some tools/techniques for hacking around with code via a browser, or running interactive coding presentations in a browser…

  • Advanced Javascript Tutorial – an interactive Javascript tutorial; double click on the code examples to edit them, then run them in the presentation window; (read more about it here: Adv. JavaScript and Processing.js);
  • Obsessing – an interactive version of Processing that runs in the browser;
  • Hacking with PHP – if you’re looking for ideas about how to present code demos, here’s a good example;
  • Codepad – “an online compiler/interpreter, and a simple collaboration tool. Paste your code, and codepad will run it and give you a short URL you can use to share it in chat or email.”
  • Yahoo Pipes – online tool for hacking around with RSS feeds, CSV, simple screenscraping;
  • Google spreadsheets – 2D programming canvas (honestly!;-)
  • Google Code Playground – an interactive playspace for tinkering with Google APIs;
  • KML Interactive Sampler – mess around with KML code and see how Google Earth treats it. (NB The Google Earth API is also available in the Google Code Playground… so this sampler may be deprecated?)
  • Wonderfl – edit ActionScript and run it live in a browser;
  • jsLinux – boot Linux in a Javascript PC emulator

And if your presentation includes visits to websites, remember to share the URL via a SplashURL bookmarklet (developed at Dev8D last year; SplashURL screencast.)

PS if you know of any other apps in a similar vein, or links to videos showing really effective ways of presenting code, please add a comment below.

- HTML5 presentation in HTML5

PPS On the notion of live docs, see also:
- dexy.it
- Wolfram computable document format (?)

PPPS seems someone is “monetising” interactive coding tutorials… Codecademy

PPPPS sort of related to CDF, the notion of ‘active readers and reactive documents‘ eg as implemented using Tangle Javascript Library

Written by Tony Hirst

February 23, 2010 at 11:44 am

Posted in Anything you want


Due Out Soon – The Google “Qualified Developer Program”


A blog post on the Google GeoDevelopers blog last week announced:

Currently we are in the process of piloting certifications for several new APIs. We are building out certifications for KML, Google Earth Enterprise, and 3D in preparation for our first master certification, the Google Qualified Geo Web Developer. We’re also working on certifications for the AJAX Search API, Enterprise Apps, and Android.

(It seems like I was a little ahead of the curve when I blogged this almost 4 years ago: Google/Yahoo/Amazon Certified Professionals…;-)

There are already certified programmes for Cisco and Microsoft, of course, so it was only a matter of time before we started seeing badges like this one:

I wonder when we’ll be seeing a Google curriculum for computer science degrees too, building on the resources collected as part of the Google Code University? It seems they’re already trying to compete with the OU’s new short course Linux: an introduction with their Tools 101 tutorials, which include intros to the Linux command line and grep;-) (It would be no loss to HE if Google did take on compsci education, of course, because Computer Science degrees are ever harder to find, and much harder to do (too much reliance on logic and algorithm design) than Computing degrees… Hmmm, a case of HE dumping the academic in favour of the, err, more practical?!;-)

Of course, it may be that the Goog will get into delivering teaching qualifications?

One school subject area I think they could drive curriculum development is in geography – you do know they have a Geo Education website, don’t you…?;-)

Why does this matter? The internet based communications revolution hasn’t yet had a huge impact on the way we examine, assess and validate learning in formal academic education or on the curricula that are delivered. Maybe it shouldn’t. But whilst corporates have always produced educational promo packs, their reach has been limited to those students studying under teachers who have made use of those materials. And now we have search engines, and students will be coming across learning materials with corporate branding in the course of their own research. Maybe the kids will discount these materials as ‘tainted’ in some corporate way? Maybe they’ll see them as training materials and discount them as irrelevant to their academic educational studies? Or maybe they’ll see them as part of that userguide to the world that they’ll be referring to for the rest of their lives?

See also: Education, Training and Lifelong Learning, and Towards Vendor Certification on the Open Web? Google Training Resources.

Written by Tony Hirst

February 22, 2010 at 2:33 pm

Posted in OU2.0, Stirring, Thinkses


Scheduling Content Round the Edges – Supporting OU/BBC Co-Productions

with 2 comments

Following the broadcast of the final episode of The Virtual Revolution, the OU/BBC co-produced history of the web, over the weekend, and the start today of the radio edit on BBC World Service, here are a few thoughts about how we might go about building further attention traps around the programme.

Firstly, additional content via Youtube playlists and a Boxee Channel – how about if we provide additional programming around the edges based on curating 3rd party content (including open educational video resources) as well as OU produced content?

Here’s a quick demo channel I set up, using the DeliTV way of doing things, and a trick I learned from @liamgh (How to build a basic RSS feed application for Boxee):

I opted for splitting up the content by programme:

Whilst the original programme is on iPlayer, we should be able to watch it on Boxee. I also created and bookmarked a Youtube playlist for each episode:

So for example, it’s easy to moderate or curate content that is posted on Youtube via a programme specific playlist.

Here’s the channel definition code:

<app>
<id>bbcRevolution</id>
<name>Virtual Revolution, Enhanced</name>
<version>1.0.1</version>
<description>Watch items related to the BBC/OU Virtual Revolution.</description>
<thumb>http://www.bbc.co.uk/virtualrevolution/images/ou_126x71.jpg</thumb>
<media>video</media>
<copyright>Tony Hirst</copyright>
<email>a.j.hirst@open.ac.uk</email>
<type>rss</type>
<platform>all</platform>
<minversion>0.9.20</minversion>
<url>rss://pipes.yahoo.com/ouseful/delitv?&_render=rss&q=psychemedia/_delitvS+bbcrevolution</url>
<test-app>true</test-app>
</app>

[This needs to be saved as the file descriptor.xml in a folder named bbcRevolution in the location identified in Liam's post... alternatively, I guess it should be possible to prescribe the content you want to appear in the channel literally, e.g. as a list of "hard coded" links to video packages? Or a safer middle way might be to host a custom defined and moderated RSS feed on the open.ac.uk domain somewhere?]

Anyway, here’s where much of the “programming” of the channel takes place in the DeliTV implementation:

(Note that the Youtube playlist content is curated on the Youtube site using Youtube playlists, partly because there appeared to be a few pipework problems with individual Youtube videos bookmarked to delicious as I was putting the demo together!;-)

Secondly, subtitle based annotations, as demonstrated by Martin Hawksey’s Twitter backchannel as iPlayer subtitles hack. The hack describes how to create an iPlayer subtitle feed (I describe some other ways we might view “timed text” here: Twitter Powered Subtitles for BBC iPlayer Content c/o the MASHe Blog).

With The Virtual Revolution also being broadcast in a radio form on the BBC World Service, it strikes me that it could be interesting to consider how we might use timed text to supplement radio broadcasts as well, with either commentary or links, or as Martin described, using a replay of a backchannel from the original broadcast, maybe using something like a SMILtext player alongside the radio player? (Hmmm, something to try out for the next co-pro of Digital Planet maybe..?;-)

Written by Tony Hirst

February 22, 2010 at 12:09 pm

Posted in BBC, OBU, OpenPlatform, OU2.0, Tinkering


Grabbing “Facts” from the Guardian Datastore with a Google Spreadsheets Formula

with 2 comments

In Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula) I picked up on an idea outlined in Mulling Over =datagovukLookup() in Google Spreadsheets to show how to use Google Apps script to create a formula that could pull in live data from a data.gov.uk datastore.

So just because, here’s how to do something similar with data from a Google spreadsheet in the Guardian datastore. Note that what I’m essentially proposing here is using the datastore as a database…

To ground the example, consider the HE satisfaction tables:

Lots of data about lots of courses on lots of sheets in a single spreadsheet. But how do you compare the satisfaction ratings across subjects for a couple of institutions? How about like this:

Creating Subject comparison tables from Guardian HE data

(We can just click and drag the formula across a range of cells as we would any other formula.)

That is, how about defining a simple spreadsheet function that lets us look up a particular data value for a particular subject and a particular institution? How about being able to write a formula like:
=gds_education_unitable("elecEng","Leeds","NSSTeachingPerCent")
and get the national student satisfaction survey teaching satisfaction result back from students studying Electrical/Electronic Engineering at Leeds University?

Google Apps script provides a mechanism for defining formulae that can do this, and more:

Guardian Datastore as a database

The script takes the arguments and generates a query to the spreadsheet using the spreadsheet’s visualisation API, as used in my Guardian Datastore Explorer. The results are pulled back as CSV, run through a CSV2Javascript object function (CSVToArray in the code below, not shown here) and then returned to the calling spreadsheet. Here’s an example of the Apps script:

function gds_education_unitable(sheet,uni,typ){
  var key="phNtm3LmDZEM6HUHUnVkPaA";
  var gid='0';//'Overall Institutional Table';
  var category="C"; //(Average) Guardian teaching score
  switch (sheet){
    case "full":
      gid='0';//'Overall Institutional Table';
      break;
    case "chemEng":
      gid='16';//'15 Chem Eng';
      break;
    case "matEng":
      gid='17';//'16 Mat Eng';
      break;
    case "civilEng":
      gid='18';//'17 Civil Eng';
      break;
    case "elecEng":
      gid='19';//'18 Elec Eng';
      break;
    case "mechEng":
      gid='20';//'19 Mech Eng';
      break;
    default:
  }

  switch (typ){
    case "guardianScore":
      category='C';//(Average) Guardian teaching score
      break;
    case "NSSTeachingPerCent":
      category='D';//
      break;
    case "expenditurePerStudent":
      category='E';//
      break;
    case "studentStaffRatio":
      category='F';//
      break;
    default:
  }

  if (sheet!='full') category=String.fromCharCode(category.charCodeAt(0)+2);

  var url="http://spreadsheets.google.com/tq?tqx=out:csv&tq=select%20B%2C"+category+"%20where%20B%20matches%20%22"+encodeURIComponent(uni)+"%22&key="+key+"&gid="+gid;

  var x=UrlFetchApp.fetch(url);
  var ret=x.getContentText();
  ret = CSVToArray( ret, "," );
  return ret[1][1];
}

(The column numbering between the first sheet in the spreadsheet and the subject spreadsheets is inconsistent, which is why we need a little bit of corrective fluff (if (sheet!='full')) in the code…)
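The moving parts of that script can be tried outside Apps Script. The sketch below is my own restatement, not the script itself: lookup tables replace the switch statements, buildQueryUrl and columnFor are hypothetical helper names, parseCsv is a minimal stand-in for the CSVToArray function (which isn’t shown in this post), and the sample CSV text is made up rather than real datastore output.

```javascript
var SHEET_GIDS = { full: '0', chemEng: '16', matEng: '17', civilEng: '18', elecEng: '19', mechEng: '20' };
var COLUMNS = { guardianScore: 'C', NSSTeachingPerCent: 'D', expenditurePerStudent: 'E', studentStaffRatio: 'F' };

// Pick the column letter for a measure; on anything other than the overall
// sheet the same measures sit two columns further right.
function columnFor(sheet, typ) {
  var col = COLUMNS[typ] || 'C';
  if (sheet !== 'full') col = String.fromCharCode(col.charCodeAt(0) + 2);
  return col;
}

// Build the query URL in the same format as the script above.
function buildQueryUrl(key, sheet, uni, typ) {
  var query = 'select B,' + columnFor(sheet, typ) + ' where B matches "' + uni + '"';
  return 'http://spreadsheets.google.com/tq?tqx=out:csv&tq=' + encodeURIComponent(query) +
    '&key=' + key + '&gid=' + (SHEET_GIDS[sheet] || '0');
}

// Minimal CSV reader: handles quoted fields, commas inside quotes and
// doubled quotes. Returns an array of rows, each an array of strings.
function parseCsv(text) {
  var rows = [[]], field = '', inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') { inQuotes = false; }
      else { field += ch; }
    } else if (ch === '"') { inQuotes = true; }
    else if (ch === ',') { rows[rows.length - 1].push(field); field = ''; }
    else if (ch === '\n') { rows[rows.length - 1].push(field); field = ''; rows.push([]); }
    else if (ch !== '\r') { field += ch; }
  }
  rows[rows.length - 1].push(field);
  return rows;
}

console.log(buildQueryUrl('phNtm3LmDZEM6HUHUnVkPaA', 'elecEng', 'Leeds', 'NSSTeachingPerCent'));

// Made-up response text: a header row, then one matching row.
var sampleCsv = '"Institution","Score"\n"Example Uni","42"\n';
console.log(parseCsv(sampleCsv)[1][1]);
```

As in the original script, the value wanted is in the second column of the second row (ret[1][1]), because the first row of the CSV is the header.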

Of course, we can also include script that will generate calls to other spreadsheets, or as I have shown elsewhere to other data sources such as the data.gov.uk Linked Data datastore.

Something that occurred to me though is if and how Google might pull on such “data formula” definitions to feed apps such as Google Squared (related: =GoogleLookup: Creating a Google Fact Engine Directory and Is Google Squared Just a Neatly Packaged and Generalised =googlelookup Array?).

Written by Tony Hirst

February 19, 2010 at 2:12 pm

Twitter Mailing Lists…


I’m not sure if this app already exists, but it struck me it might be useful for conferences/events – a broadcast Twitterlist that subscribers can send short messages to. (I don’t have time to try to build this, unfortunately.)

So what do we have available to us at the moment?

Hashtags allow communities to come together in an ad hoc way around an event. If you want to keep track of the event’s activities, subscribe to a search on the hashtag. The downside? If you aren’t following the hashtag, and you aren’t following the people using it, you won’t see the tweets.

Twitter lists pull together a list of Twitter users and let you see tweets from all of them. So if we run an event and get participants’ Twitter IDs, we can generate a list of participants to provide a single point of access to follow those participants. For greater salience, we could also run the feed through a hashtag filter, so we only get to see tweets from those participants tagged with the event hashtag. (Note to self – I need to create a “generate list of hashtaggers” from the hashtag community app.)

But a question now arises – what if I can’t rely on everyone at the event following the list or hashtag? What if I want to actually send a message to each and every person on the list? That is, what if I want to spam the members of the list?

How about this recipe: create a private Twitter user ID for the event. Encourage event attendees to follow that user (and follow them back) and don’t accept anyone else. Set up an autoresponder that works along the lines of the following. Suppose @follower7 sends:
d eventID BROADCAST: This is the spam message…
Then for every other follower, @eventID sends them a personal message:
@follower1 @follower7 says: “This is the spam message…”
@follower2 @follower7 says: “This is the spam message…”
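The autoresponder logic itself is tiny. Here is a sketch of just the fan-out step, with no Twitter API calls at all: fanOut is a hypothetical function name, and I am assuming the bot receives the text of the direct message (starting with BROADCAST:) along with the sender’s name and a list of its followers.

```javascript
var PREFIX = 'BROADCAST:';

// Hypothetical fan-out step: given one incoming direct message, return the
// list of messages the event account would send, one per other follower.
function fanOut(sender, text, followers) {
  if (text.indexOf(PREFIX) !== 0) return [];   // not a broadcast request
  var body = text.slice(PREFIX.length).trim();
  return followers
    .filter(function (f) { return f !== sender; })
    .map(function (f) { return '@' + f + ' @' + sender + ' says: "' + body + '"'; });
}

var outgoing = fanOut('follower7', 'BROADCAST: This is the spam message...',
  ['follower1', 'follower2', 'follower7']);
outgoing.forEach(function (m) { console.log(m); });
```

Note that the sketch does nothing about message length: adding the two names and the “says:” wrapper could easily push a long message over the limit, so a real version would need to truncate or split.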

Of course, Twitter may see this as a spambot and delete it. (If the bot just sends out DMs, does this go under the radar?) However, I prefer to see it as the equivalent of a mailing list…

Written by Tony Hirst

February 19, 2010 at 1:44 pm

Posted in Thinkses


Paragraph Level Search Results on WordPress Using Digress.it and Yahoo Pipes

with 3 comments

One of the many RSS related feature requests I put in when we were working on the JISCPress project was the ability to get a page level RSS feed out where each paragraph was represented as a separate item in the page feed.

WordPress already delivers a single item RSS feed for each page containing just the substantive content of the page (i.e. the content without the header, footer and sidebar fluff), which means you can do things like this, but what I wanted is for the paragraphs on each page to be atomised as separate feed elements.

Eddie implemented support for this, but I didn’t do anything with it at the time, so here’s an example of just why I thought it might be handy – paragraph level search.

At the moment, searching a document on WriteToReply returns page level results – that is, you get a list of search results detailing the pages on which the search term(s) appear. As you might expect with WordPress, we can get access to these results as a feed by shoving feed in the URI, like this:
http://ouseful.wordpress.com/feed?s=test

Paragraph level feeds, as implemented in the Digress.it WordPress theme we were developing, are keyed by URLs of the form:
http://writetoreply.org/legaldeposit/feed/paragraphlevel/annex-c-online-content-to-be-published/#56

That is:

http://writetoreply.org/DOCNAME/feed/paragraphlevel/PAGENAME/#PARA_NUMBER

So can you guess what I’m gonna do yet…?

First of all, grab the search feed for a particular query on a particular document into a Yahoo Pipe:

Rewrite the URI of each page linked to in the results feed as the full fat, itemised paragraph feed for the page, and emit those items (that is, replace each original search results item with the set of paragraph items from that page).

The next step is to filter those paragraph feed items for just the paragraphs that contain the original search terms:

We need to rewrite the link because (at the time of writing) the page paragraphs feed doesn’t link to each paragraph, it links to the parent page (a bug report has been made;-)

You can find the pipe here: Double dip JISCPress search

Note that at the time of writing, there’s also a problem with the paragraph number reported in the link (again a report has been made), a workaround patch for which is included in this pipe.

What this means is that we now have a workaround for indexing into individual paragraphs using a search term. If we tag content at the paragraph level, (e.g. by running a page-level paragraph feed, or double dip search results feed through OpenCalais), we can generate related search links into the document, or other documents on the platform, at a paragraph level, increasing the relevance, or resolution (in terms of increased focus), of the returned results.

Just by the by, the approach shown above is based on a search, expand and filter pattern (cf. a search within results pattern), in which a search query is used to obtain an initial set of results which are then expanded to give higher resolution detail over the content, and then filtered using the original search query to deliver the final results. If a patent for this doesn’t already exist, then if I worked for Google, Yahoo, etc etc you could imagine it being patented. B*****ds.
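Stripped of the pipework, the search, expand and filter pattern looks something like the sketch below. It runs over a made-up, in-memory “document” rather than real WordPress feeds; the function names are hypothetical, and numbering paragraphs from 1 in the link is my assumption about how the paragraph level URLs are keyed.

```javascript
// Made-up miniature document: page name -> array of paragraph texts.
var pages = {
  'intro': ['Welcome to the consultation.', 'Legal deposit covers printed works.'],
  'annex': ['Online content is discussed here.', 'Nothing relevant in this paragraph.']
};

function contains(text, term) {
  return text.toLowerCase().indexOf(term.toLowerCase()) !== -1;
}

// Step 1 (search): page level results, i.e. names of pages mentioning the term.
function searchPages(pages, term) {
  return Object.keys(pages).filter(function (name) {
    return pages[name].some(function (p) { return contains(p, term); });
  });
}

// Step 2 (expand): replace each page result with one item per paragraph.
function expand(pages, pageNames) {
  var items = [];
  pageNames.forEach(function (name) {
    pages[name].forEach(function (text, i) {
      items.push({ page: name, para: i + 1, text: text });
    });
  });
  return items;
}

// Step 3 (filter): keep only the paragraphs that contain the original term.
function searchExpandFilter(pages, term) {
  return expand(pages, searchPages(pages, term)).filter(function (item) {
    return contains(item.text, term);
  });
}

// Build a paragraph level link following the URL pattern shown earlier.
function paragraphLink(docName, item) {
  return 'http://writetoreply.org/' + docName + '/feed/paragraphlevel/' + item.page + '/#' + item.para;
}

var hits = searchExpandFilter(pages, 'legal deposit');
hits.forEach(function (item) {
  console.log(paragraphLink('legaldeposit', item) + ' : ' + item.text);
});
```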

PS here’s a trick I picked up from Joss’ blog somewhere for reversing the order of feed items published by WordPress:
http://writetoreply.org/legaldeposit/feed/?orderby=ID&order=ASC
I assume these parameters also work?

Written by Tony Hirst

February 18, 2010 at 12:04 pm

Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)

with 2 comments

Disclaimer: before any Linked Data purists say I’m missing the point about what Linked Data is, does, or whatever, I don’t care, okay? I just don’t care. This is for me, and me is someone who can’t get to grips with writing SPARQL queries, can’t stand the sight of unreadable <rdf> <all:over the=”place”>, can’t even work out how to find what things are queryable in a Linked Data triple store, let alone write queries that link data from one data store with data from another data store (or maybe SPARQL can’t do that yet? Not that I care about that either, because I can, in Yahoo Pipes, or Google Spreadsheets, and in a way that’s meaningful to me…)

In Mulling Over =datagovukLookup() in Google Spreadsheets, I started wondering about whether or not it would be useful to be able to write formulae to look up “live facts” in various datastores from within a spreadsheet (you know, that Office app that is used pretty much universally in the workplace whenever there is tabular data to hand. That or Access of course…)

Anyway, I’ve started tinkering with how it might work, so now I can do things like this:

The formulae in columns G, H and I are defined by a Google Apps script that takes a school ID and returns something linked to it in the data.gov.uk education datastore, such as the name of the school, or its total capacity.

Formulae look like:

  • =datagovuk_education(A2,"name")
  • =datagovuk_education(A2,"opendate")
  • =datagovuk_education(A2,"totcapacity")

and are defined to return a single cell element. (I haven’t worked out how to return several cells worth of content from a Google Apps Script yet!)

At the moment, the script is a little messy, taking the form:

function datagovuk_education(id,typ) {
  var args="";
  // Pick the pre-built query string for the requested property
  switch (typ) {
    case 'totcapacity':
      args= _datagovuk_education_capacity_quri(id);
      break;
    ...
    default:
      //hack something here;
  }
  // POST the query to Sparqlproxy and parse the XML response
  var x=UrlFetchApp.fetch('http://data-gov.tw.rpi.edu/ws/sparqlproxy.php',{method: 'post', payload: args});
  var xmltest=Xml.parse(x.getContentText());
  // Return the text of the single literal in the result set
  return xmltest.sparql.results.result.binding.literal.getText();
}
function _datagovuk_education_capacity_quri(id){
  return "query=prefix+sch-ont%3A+%3Chttp%3A%2F%2Feducation.data.gov.uk%2Fdef%2Fschool%2F%3E%0D%0ASELECT+%3FschoolCapacity+WHERE+{%0D%0A%3Fschool+a+sch-ont%3ASchool%3B%0D%0Asch-ont%3AuniqueReferenceNumber+"+id+"%3B%0D%0Asch-ont%3AschoolCapacity+%3FschoolCapacity.%0D%0A}+ORDER+BY+DESC%28%3Fdate%29+LIMIT+1&output=xml&callback=&tqx=&service-uri=http%3A%2F%2Fservices.data.gov.uk%2Feducation%2Fsparql";
}

The datagovuk_education(id,typ) function takes the school ID and the requested property, uses the case statement to create an appropriate query string, and then fetches the data from the education datastore, returning the result in an XML format like this. The data is pulled from the datastore via Sparqlproxy, and the query string URIs generated (at the moment) by adding the school ID number into a query string generated by running the desired SPARQL query on Sparqlproxy and then grabbing the appropriate part of the URI. (It’s early days yet on this hack!;-)
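Rather than pasting in a pre-encoded query string for every property, the payload could probably be built from a readable SPARQL query instead. The following is a rough sketch, not what the script currently does: buildSparqlPayload is a hypothetical helper, the query is a simplified version of the capacity query above (without the ORDER BY clause), and I'm assuming Sparqlproxy only needs the query, output and service-uri parameters from the original string.

```javascript
// Hypothetical helper: build the Sparqlproxy POST payload from a readable
// SPARQL query. `property` is a name from the school ontology, such as
// 'schoolCapacity' (the one used in the capacity query above).
function buildSparqlPayload(id, property) {
  var sparql = [
    'prefix sch-ont: <http://education.data.gov.uk/def/school/>',
    'SELECT ?value WHERE {',
    '?school a sch-ont:School;',
    // Number() keeps anything other than a numeric ID out of the query
    'sch-ont:uniqueReferenceNumber ' + Number(id) + ';',
    'sch-ont:' + property + ' ?value.',
    '} LIMIT 1'
  ].join('\n');
  // Parameter names are taken from the hand-built query string above
  return 'query=' + encodeURIComponent(sparql) +
    '&output=xml' +
    '&service-uri=' +
    encodeURIComponent('http://services.data.gov.uk/education/sparql');
}

// 100000 is just a placeholder school ID for illustration
var payload = buildSparqlPayload(100000, 'schoolCapacity');
```

With something like this in place, each case in the switch statement would only need to name an ontology property rather than carry its own encoded query string.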

By defining appropriate Apps script functions, I can also create formulae to call other datastores, run queries on Google spreadsheets (e.g. in the Guardian datastore) and so on. I assume similar sorts of functionality would be supported using VB Macros in Excel?

Anyway – this is my starter for ten on how to make live datastore data available to the masses. It’ll be interesting to see whether this approach (or one like it) is used in favour of getting temps to write SPARQL queries and RDF parsers… The obvious problem is that my approach can lead to an explosion in the number of formulae and parameters you need to learn; the upside is that I think these could be quite easily documented in a matrix/linked formulae chart. The approach also scales to pulling in data from CSV stores and other online spreadsheets, using spreadsheets as a database via the =QUERY() formula (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula), and so on. There might also be a market for selling prepackaged or custom formulae as script bundles via a script store within a larger Google Apps App store.

PS I’m trying to collect example SPARQL queries that run over the various data.gov.uk end points because: a) I’m really struggling in getting my head round writing my own, not least because I struggle to make sense of the ontologies, if I can find them, and write valid queries off the back of them; even (in fact, especially) really simple training/example queries will do! b) coming up with queries that either have interesting/informative/useful results, or clearly demonstrate an important ‘teaching point’ about the construction of SPARQL queries, is something I haven’t yet got a feel for. If you’ve written any, and you’re willing to share, please post a gist to github and add a link in a comment here.

PPS utility bits, so I don’t lose track of them:
- education datastore ontology
- Apps script XML Element class

PPPS Here’s how to dump a 2D CSV table into a range of cells: Writing 2D Data Arrays to a Google Spreadsheet from Google Apps Script Making an HTTP POST Request for CSV Data

Written by Tony Hirst

February 17, 2010 at 11:52 pm
