Conference Treasure Boxes

I’m at OpenEd12 at the moment, in the wonderful city of Vancouver, loving Eduroam, and pondering the Conference Treasure Box

Here it is:

[photo: the Conference Treasure Box]

…and here’s a peek inside:

[photo: inside the box]

…and another:

[photo: more of the box’s innards]

As @cogdog describes:

We are also seeking new ways of documenting the conference experience through the device created by David Darts, the PirateBox, which turns a local space into a communication and sharing network.

Using open source technology and under US$30 in parts, the PirateBox creates a local, open wireless network. Upon joining this network, you are not connected to the internet, but to a web server running locally on the box, which is set up with simple tools for uploading and downloading files, synchronous chat, and a message board. All communication with the PirateBox is anonymous.

For non-local participants, there’s also an email route for dropping attachments into the box (I think this is handled directly/automatically, rather than Alan checking his email every so often and then moving any attached files over…?)
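
To give a flavour of just how lightweight the serving side of this can be – and to be clear, this is emphatically not the PirateBox’s actual code (the real thing runs its own wifi access point and adds upload forms, chat and a message board), just a toy sketch – here’s a minimal local file-sharing web server in Python:

# Serve the files in the current directory to anyone on the local network.
# Toy sketch only - the PirateBox itself does considerably more than this.
import http.server
import socketserver

PORT = 8080  # arbitrary choice

handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(("", PORT), handler) as httpd:
    print("Sharing the current directory on port %d" % PORT)
    httpd.serve_forever()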

This is a really powerful idea, I think, particularly for conference workshop sessions. For the workshop session I’m due to give (with @mhawksey) at ILI2012, one of the things it would be handy to do would be to have data files (and maybe app installers) to hand for participants to make use of. I was thinking a USB memory stick would be the best way of making these files available (assuming flaky conference wifi and sizeable downloads), but something like the PirateBox provides a really neat alternative for local file sharing, with the added advantage that it can take some of the load off a typically stressed conference wifi network.

I notice that the PirateBox includes a simple chat environment (though it didn’t seem to work for me?), but I guess it could also include an etherpad for shared local notetaking?

It also occurs to me that we could grab copies of web pages and files that have been linked to in a Twitter backchannel, generate PDF equivalents of the web pages (maybe?!), and then pop copies into the PirateBox. The box would then serve both as a file access point for some of the things being shared in the conference and as a local archive of resources shared around the event – a complement to a traditional conference proceedings, which might also include shared copies of presented papers. As for an interface to the shared contents, a Flipboard-style news magazine comes to mind, maybe?
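
As a rough sketch of the harvesting step (the tweets and folder name here are made up, and the PDF-generation step is left as an open question):

# Pull the URLs out of a set of hashtag tweets and stash local copies of
# whatever they point at, ready to drop into the box's shared folder.
import os
import re
import urllib.request

tweets = [  # in practice, these would come from a Twitter search on the event tag
    "Slides from the treasure box session http://example.com/slides.pdf #opened12",
    "Useful writeup at http://example.com/notes.html #opened12",
]

SHARE_DIR = "piratebox_share"  # hypothetical staging folder
os.makedirs(SHARE_DIR, exist_ok=True)

for tweet in tweets:
    for url in re.findall(r"https?://\S+", tweet):
        filename = os.path.join(SHARE_DIR, url.rstrip("/").split("/")[-1])
        try:
            urllib.request.urlretrieve(url, filename)
        except OSError:
            pass  # dead link, or no connection - skip it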

PS via @grantpotter, here’s an even more powerful alternative: FreedomBox.

Creating Simple Interactive Visualisations in R-Studio: Subsetting Data

Watching a fascinating Google Tech Talk by Hadley Wickham on The Future of Interactive Graphics in R – A Joint Visualization and UseR Meetup, I was reminded of the manipulate command provided in R-Studio that lets you create slider and dropdown widgets that in turn let you dynamically interact with R based visualisations, for example by setting data ranges or subsetting data.

Here are a couple of quick examples, one using the native plot command, the other using ggplot. In each case, I’m generating an interactive visualisation that lets me display as a line chart two user selected data series from a larger data set.

manipulate UI builder in RStudio

[Data file used in this example]

Here’s a crude first attempt using plot:

hun_2011comprehensiveLapTimes <- read.csv("~/code/f1/generatedFiles/hun_2011comprehensiveLapTimes.csv")
View(hun_2011comprehensiveLapTimes)

library("manipulate")
h=hun_2011comprehensiveLapTimes

manipulate(
  # plot() draws the first car's laptime trace; lines() overlays the second
  plot(lapTime~lap,data=subset(h,car==cn1),type='l',col=car) +
    lines(lapTime~lap,data=subset(h,car==cn2),col=car),
  # one slider per car number
  cn1=slider(1,25),cn2=slider(1,25)
)

This has the form manipulate(command1+command2, uiVar=slider(min,max)), so we see, for example, two R commands to plot the two separate lines, each of them filtered on a value set by the corresponding slider variable.

Note that we plot the first line using plot, and the second line using lines.

The second approach uses ggplot within the manipulate context:

manipulate(
ggplot(subset(h,car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)

In this case, rather than explicitly adding additional line layers, we use the group setting to force the display of lines by group value. The initial ggplot command sets the context, and filters the complete set of timing data down to the timing data associated with at most two cars.

We can add a title to the plot using:

manipulate(
ggplot(subset(h,car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)) +
opts(title=paste("F1 2011 Hungary: Laptimes for car",Car_1,'and car',Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)

My reading of the manipulate function is that if you make a change to one of the interactive components, the variable values are captured and then passed to the R command sequences, which then execute as normal. (I may be wrong in this assumption, of course!) Which is to say: if you write a series of chained R commands, and can abstract out one or more variable values to the start of the sequence, then you can create corresponding interactive UI controls to set those variable values by placing the command series within the manipulate() context.

More Pivots Around Twitter Data (little-l, little-d, again;-)

I’ve been having a play with Twitter again, looking at how we can do the linked data thing without RDF, both within a Twitter context and also (heuristically) outside it.

First up, hashtag discovery from Twitter lists. Twitter lists can be used to collect together folk who have a particular interest, or be generated from lists of people who have used a particular hashtag (as Martin H does with his recipe for Populating a Twitter List via Google Spreadsheet … Automatically! [Hashtag Communities]).

The thinking is simple: grab the most recent n tweets from the list, extract the hashtags, and count them, displaying them in descending order. This gives us a quick view of the most popular hashtags recently tweeted by folk on the list:

Popular recent tags from folk on a twitter list

This is only a start, of course: it might be that a single person has been heavily tweeting the same hashtag, so a sensible next step would be to also take into account the number of people using each hashtag in ranking the tags. It might also be useful to display the names of folk on the list who have used the hashtag?
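
To make both of those refinements concrete, here’s a quick sketch in Python – the sample tweets are made up, standing in for the most recent n tweets pulled from the list, and the ranking favours the number of distinct people using each tag before raw tweet counts:

# Count hashtags across a batch of tweets, tracking which listed folk used
# each tag so the ranking can favour tags used by more people.
import re
from collections import Counter, defaultdict

tweets = [  # hypothetical (screen_name, tweet_text) pairs from the list
    ("psychemedia", "Playing with #rstats and #f1 laptime data"),
    ("mhawksey", "Notes on #rstats manipulate demos"),
    ("cogdog", "#opened12 treasure box is live"),
]

tag_counts = Counter()        # how often each tag was tweeted
tag_users = defaultdict(set)  # which folk used each tag

for user, text in tweets:
    for tag in re.findall(r"#(\w+)", text.lower()):
        tag_counts[tag] += 1
        tag_users[tag].add(user)

# Rank by distinct users first, then raw tweet count.
for tag, n in sorted(tag_counts.items(),
                     key=lambda kv: (len(tag_users[kv[0]]), kv[1]),
                     reverse=True):
    print("#%s: %d tweets, %d people (%s)"
          % (tag, n, len(tag_users[tag]), ", ".join(sorted(tag_users[tag]))))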

I also updated a previous toy app that makes recommendations of who to follow on twitter based on a mismatch between the people you follow (and who follow you) and the people following and followed by another person – follower recommender (of a sort!):

The second doodle was inspired by discussions at Dev8D relating to a possible “UK HE Developers’ network”, and relies on an assumption: that the usernames people use on Twitter might be used by the same people on Github. Again, the idea is simple: can we grab a list of Twitter usernames for people that have used the dev8d hashtag (that much is easy), then look those names up on Github, pulling down the followers and following lists for any IDs that are recognised, in order to identify a possible community of developers on Github from the seed list of dev8d-hashtagging Twitter names? (It also occurs to me that we could pull down the projects Git-folk are associated with (in order to identify projects Dev8D folk are committing to), along with the developers who also commit or subscribe to those projects.)

follower/following connections on github using twitter usernames that tweeted dev8d hashtag

As the above network shows, it looks like we get some matches on usernames…:-)
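
A rough sketch of that lookup recipe, using today’s GitHub REST API (the seed list here is hypothetical, pagination is ignored for brevity, and unauthenticated calls are heavily rate-limited):

# Probe GitHub for accounts matching Twitter screen names, then pull each
# matched account's follower/following lists to build a candidate network.
import requests

twitter_names = ["psychemedia", "grantpotter", "someotherdev"]  # hypothetical seeds

edges = []
for name in twitter_names:
    if requests.get("https://api.github.com/users/%s" % name).status_code != 200:
        continue  # no GitHub account under this username
    followers = requests.get("https://api.github.com/users/%s/followers" % name).json()
    following = requests.get("https://api.github.com/users/%s/following" % name).json()
    edges += [(f["login"], name) for f in followers]
    edges += [(name, f["login"]) for f in following]

# Keep just the edges between seed names to surface the possible community.
seed = set(twitter_names)
print([(a, b) for a, b in edges if a in seed and b in seed])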

Handling RDF on Your Own System – Quick Start

One of the things that I think tends towards being a bit of an elephant in the Linked Data room is the practical difficulty of running a query that links together results from two different datastores, even if they share common identifiers. The solution – at the moment, at least – seems to require grabbing a dump of both datastores, uploading them to a common datastore and then querying that…

…which means you need to run your own triple store…

This quick post links out to the work of two others, as much as a placeholder for myself as for anything, describing how to get started doing exactly that…

First up, John Goodwin, aka @gothwin, (a go to person if you ever have dealings with the Ordnance Survey Linked Data) on How can I use the Ordnance Survey Linked Data: a python rdflib example. As John describes it:

[T]his post shows how you just need rdflib and Python to build a simple linked data mashup – no separate triplestore is required! RDF is loaded into a Graph. Triples in this Graph reference postcode URIs. These URIs are de-referenced and the RDF behind them is loaded into the Graph. We have now enhanced the data in the Graph with local authority area information. So as well as knowing the postcode of the organisations taking part in certain projects we now also know which local authority area they are in. Job done! We can now analyse funding data at the level of postcode, local authority area and (as an exercise for the reader) European region.
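
The nub of the trick, as a minimal rdflib sketch (the local file name and example postcode URI are placeholders; John’s post walks through the full mashup):

# Load local data into an rdflib Graph, then de-reference one of the URIs it
# mentions so the remote Linked Data gets merged into the same graph.
from rdflib import Graph, URIRef

g = Graph()
g.parse("projects.rdf")  # hypothetical local file citing OS postcode URIs

postcode = URIRef("http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA")
g.parse(str(postcode))  # fetch the RDF served at that URI into the same graph

# One graph, two datasets: a single query can now span both.
for p, o in g.query("SELECT ?p ?o WHERE { <%s> ?p ?o }" % postcode):
    print(p, o)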

Secondly, if you want to run a fully blown triple store on your own localhost, check out this post from Jeni Tennison, aka @jenit, (a go to person if you’re using the data.gov.uk Linked Datastores, or have an interest in the Linked Data JSON API): Getting Started with RDF and SPARQL Using 4store and RDF.rb, which documents how to get started on the following challenges (via Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge):

Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.
Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
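
For the last two items on that list, the SPARQLWrapper Python library is one way to go – something like this sketch, with an illustrative endpoint URL (e.g. a local 4store instance):

# Query a SPARQL endpoint and convert the results to a Python dictionary,
# following the standard SPARQL JSON results format.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8000/sparql/")  # illustrative endpoint
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()  # a plain Python dict
for b in results["results"]["bindings"]:
    print(b["s"]["value"], b["p"]["value"], b["o"]["value"])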

PS it may also be worth checking out these posts from Kingsley Idehen:
SPARQL Guide for the PHP Developer
SPARQL Guide for Python Developer
SPARQL Guide for the Javascript Developer
SPARQL for the Ruby Developer

A Few More Thoughts on GetTheData.org

As we come up to a week in on GetTheData.org, there’s already an interesting collection of questions – and answers – starting to appear on the site, along with a fledgling community (thanks for chipping in, folks:-), so how can we maintain – and hopefully grow – interest in the site?

A couple of things strike me as most likely to make the site attractive to folk:

– the ability to find an appropriate – and useful – answer to your question without having to ask it, for example because someone has already asked the same, or a similar, question;
– timely responses to questions once asked (which leads to a sense of community, as well as utility).

I think it’s also worth bearing in mind the context that GetTheData sits in. Many of the questions result in answers that point to data resources that are listed in other directories. (The links may go to either the data home page or its directory page on a data directory site.)

Data Recommendations
One thing I think is worth exploring is the extent to which GetTheData can both receive and offer recommendations to other websites. Within a couple of days of releasing the site, Rufus had added a recommendation widget that could recommend datasets hosted on CKAN that seem to be related to a particular question.

GetTheData.org - related datasets on CKAN

What this means is that even before you get a reply, a recommendation might be made to you of a dataset that meets your requirements.
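
I don’t know how Rufus’s widget is wired up under the hood, but a minimal version of the idea might just throw keywords from the question at CKAN’s package search – a sketch using today’s CKAN action API (the instance URL and keywords are illustrative):

# Look up candidate datasets for a question by keyword-searching a CKAN site.
import requests

keywords = "local authority spending"  # e.g. pulled from the question title

r = requests.get(
    "https://demo.ckan.org/api/3/action/package_search",  # illustrative instance
    params={"q": keywords, "rows": 5},
)
for pkg in r.json()["result"]["results"]:
    print(pkg["title"], "-", (pkg.get("notes") or "")[:80])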

(As with many other Q&A sites, GetTheData also tries to suggest related questions to you when you enter your question, to prompt you to consider whether or not your question has already been asked – and answered.)

I think the recommendation context is something we might be able to explore further, both in terms of linking to recommendations of related data on other websites, but also in the sense of reverse links from GetTheData to those sites.

For example:

– would it be possible to have a recommendation widget on GetTheData that links to related datasets from the Guardian datastore, or National Statistics?
– are there other data directory sites that can take one or more search terms and return a list of related datasets?
– could a getTheData widget be located on CKAN data package pages to alert package owners/maintainers that a question possibly related to the dataset had been posted on GetTheData? This might encourage the data package maintainer to answer the question on the getTheData site with a link back to the CKAN data package page.

As well as recommendations, would it be useful for GetTheData to syndicate new questions asked on the site? For example, I wonder if the Guardian Datastore blog would be willing to add the new questions feed to the other datablogs they syndicate?;-) (Disclosure: data tagged posts from OUseful.info get syndicated in that way.)

Although I don’t have any good examples of this to hand from GetTheData, it strikes me that we might start to see questions that relate to obtaining data which is actually a view over a particular data set, a view best obtained via a particular query onto that data set – such as a specific SPARQL query on a Linked Data set, or a particular Google query language request to the visualisation API against a particular Google spreadsheet.
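
For example (spreadsheet key and query both made up), the “view” might amount to nothing more than a query URL, here using the Google visualisation API query language against a public spreadsheet:

# Construct a Google Visualization API query URL against a public spreadsheet;
# the key and the query itself are placeholders.
import urllib.parse

key = "SPREADSHEET_KEY"
query = "SELECT A, B WHERE C > 1000 ORDER BY B DESC"
url = ("http://spreadsheets.google.com/tq?tq=%s&key=%s"
       % (urllib.parse.quote(query), key))
print(url)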

If we do start to see such queries, then it would be useful to aggregate these around the datastores they relate to, though I’m not sure how we could best do this at the moment other than by tagging?

News announcements
There are a wide variety of sites publishing data independently, and a fractured network of data directories and data catalogues. Would it make sense for GetTheData to aggregate news announcements relating to the release of new data sets, and somehow use these to provide additional recommendations around data sets?

Hackdays and Data Fridays
As suggested in Bootstrapping GetTheData.org for All Your Public Open Data Questions and Answers:

If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel, and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.

Alternatively, how about “Data Fridays”, on the first Friday in the month, where folk agree to check GetTheData two or three times that day and engage in something of a distributed data related Question and Answer sprint, helping answer unanswered questions, and maybe pitching in a few new ones?

Aggregated Search
It would be easy enough to put together a Google custom search engine that searches over the domains of data aggregation sites, and possibly also offer filetype search limits?

So What Next?
Err, that’s it for now…;-) Unless you fancy seeing if there’s a question you can help out on right now at GetTheData.org

Ba dum… Education for the Open Web Fellowship: Uncourse Edu

A couple of weeks ago, I started getting tweets and emails linking to a call for an Education for the Open Web Fellowship from the Mozilla and Shuttleworth Foundations.

The way I read the call was that the fellowship provides an opportunity for an advocate of open ed on the web to do their thing with the backing of a programme that sees value in that approach…

…and so, I’ve popped an (un)application in (though it wasn’t helped by my having spent the weekend in a sick bed… bleurrrgh… man flu ;-) It’s not as polished as it should be, and it could be argued that it’s unfinished, but that is, erm, part of the point… After all, my take on the Fellowship is that the funders are seeking to act as a patron to a person, helping them achieve as much as they can, howsoever they can, as much as they are supporting a very specific project? (And if I’m wrong, then it’s right that my application is wrong, right?!;-)

The proposal – Uncourse Edu – is just an extension of what it is I spend much of my time doing anyway, as well as an attempt to advocate the approach through living it: trying to see what some of the future consequences of emerging tech might be, and demonstrating them (albeit often in a way that feels too technical to most) in a loosely educational context. As well as being my personal notebook, an intended spin-off of this blog is to try to help drive down barriers to the use of web technologies, or demonstrate how technologies that are currently only available to skilled developers are becoming more widely usable, and how access to them as building blocks is being “democratised”. As to what the barriers to adoption are, I see them as being at least two-fold: one is ease of use (how easy the technology is to actually use); the second is attitude: many people just aren’t, or don’t feel they’re allowed to be, playful. This stops them innovating in the workplace, as well as learning for themselves. (So for example, I’m not an auto-didact, I’m a free player…;-)

The Fellowship applications are templated (loosely) and submitted via the Drumbeat project pitching platform. This platform allows folk to pitch projects and hopefully gather support around a project idea, as well as soliciting (small amounts of) funding to help run a project. (It’d be interesting if, in any future rounds of JISC Rapid Innovation Funding, projects were solicited this way and one of the marking criteria was the amount of support a pitched proposal received?)

I’m not sure if my application is allowed to change, but if it doesn’t get locked by the Drumbeat platform it may well do so… (Hopefully I’ll get to do at least another iteration of the text today…) In particular, I really need to post my own video about the project (that was my undone weekend task:-(

Of course, if you want to help out producing the video, and maybe even helping shape the project description, then why not join the project? Here’s the link again: Uncourse Edu.

PS I think there’s a package on this week’s OU co-produced episode of Digital Planet on BBC World Service (see also: Digital Planet on open2) that includes an interview with Mark Shuttleworth and a discussion about some of the work the Shuttleworth Foundation gets up to… (first broadcast is tomorrow, with repeats throughout the week).

DISCLAIMER: I’m the OU academic contact for the Digital Planet.

As Time Goes By, It Makes a World of Diff

Prompted by a DevCSI Developer Focus Group conference call just now, I had a quick look through the list of Bounty competition entries (and the winners) to see whether there was any code that might be fun to play with.

One app that’s quite fun is Chris Gutteridge’s Wayback/Memento Animation, which animates the history of a website using archived copies of the site from the Wayback Machine. So for example, here’s the animated history of the OU home page.

And here are links to the history of the current Labour Party and Conservative Party domains: The animated history of: http://www.labour.org.uk/ and The animated history of: http://www.conservatives.com/.

The app will also animate changes from a MediaWiki wiki as this link demonstrates: Dev8D wiki changes over time.

(I can’t help thinking it needs: a) a pause button, so at least you can scroll up and down a page, if not explore the site; and b) a bookmarklet, to make it easier to get a site into the replayer;-)

The Dev8D pages also suggest a “Web Diff” app was entered in one of the challenges, but I couldn’t see a link to the app anywhere?

Diffs have been on my mind lately in a slightly different context, in particular relating to the changes made to the Digital Economy Bill at the various stages it went through in the Lords. But here again a developer challenge event turned up the goods, in this case the Rewired State: dotgovlabs event held last Saturday, and @1jh’s Parliamentary Bill analyser:

So for example, if we compare the Digital Economy Bill as introduced to the Lords:
http://www.publications.parliament.uk/pa/ld200910/ldbills/001/10001.i-ii.html
and the version that was passed to the Commons:
http://www.publications.parliament.uk/pa/cm200910/cmbills/089/10089.i-iii.html
here’s what we get:

Luvverly stuff :-)

PS @cogdog beats me to it again in a comment to Reversible, Reverse History and Side-by-Side Storytelling, specifically: “maybe this is like watching Memento backwards?” Which is to say, maybe the Wayback/Memento Animation should have a “play backwards” switch? And of course, this being a Chris Gutteridge production, it has. So for example, going back in time with the JISC home page

(Sob, I have no original ideas any more, and can’t even think of them before other people do, let alone implement them…;-(

Grabbing JSON Data from One Web Page and Displaying it in Another

Lots of web pages represent data within the page as a javascript object. But if you want to make use of that data in another page, how can you do that?

A case in point is Yahoo Pipes. The only place I’m currently aware of where we can look at how a particular Yahoo pipe is constructed is the Yahoo Pipes editor. The pipe is represented as a Javascript object within the page (as described in Starting to Think About a Yahoo Pipes Code Generator), but it’s effectively locked into the page.

So here’s a trick for liberating that representation…

Firstly, we need to know what the name of the object is. In the case of Yahoo Pipes, the pipe’s definition is contained in the editor.pipe.definition [NO: it’s in editor.pipe.working] object.

In order to send the object to another page on the web, the first thing we need to do is generate a text string view of it that we can POST to another web page. This serialised representation of the object can be obtained by calling the .toSource() function on it.

The following bookmarklets show what that representation looks like.

[UPDATE: the bookmarklets originally used editor.pipe.definition, but that doesn’t provide a complete description of the pipe – .toSource() didn’t appear to dig into the arrays holding the RULE data, and the missing data isn’t in the terminaldata either. After a false start with editor.pipe.module_info (which turns out to be more the UI side of things), I found it: I should be using editor.pipe.working, NOT editor.pipe.definition – the bookmarklets below do just that.]

Firstly, we can display the serialised representation in a browser alert box:

javascript:(function(){alert(editor.pipe.working.toSource())})()

Alternatively, we can view it in the browser console (for example, in Firefox, we might do this via the Firebug plugin):

javascript:(function(){console.log(editor.pipe.working.toSource())})()

The object actually contains several other objects, not all of which are directly relevant to the logical definition of the pipe (e.g. they are more to do with layout), so we can modify the console logging bookmarklet to make it easier to see the two objects we are interested in – the definitions of each of the pipe blocks (editor.pipe.working.modules) and the connections that exist between the modules (editor.pipe.working.wires) – along with the terminaldata, which we also need:

javascript:(function(){var c=console.log;var p=editor.pipe.working;c('MODULES: '+p.modules.toSource());c('WIRES: '+p.wires.toSource());c('TERMINALS: '+p.terminaldata.toSource())})()


[terminaldata not shown]

To actually send the representation to another web page, we can use a bookmarklet to dynamically create a form element, attach the serialised object to it as a form argument, append the form to the page and then submit it:

javascript:(function(){var ouseful={};ouseful=editor.pipe.working;ouseful=ouseful.toSource(); var oi=document.createElement('form');oi.setAttribute('method','post');oi.setAttribute('name','oif');oi.setAttribute('action','http://ouseful.open.ac.uk/ypdp/jsonpost.php');var oie=document.createElement('input');oie.setAttribute('type','text');oie.setAttribute('name','data');oie.setAttribute('value',ouseful);oi.appendChild(oie);document.body.appendChild(oi);document.oif.submit();})()

In this case, the page I am submitting the form to is a PHP page. Here’s the code that accepts the POSTed serialised object and republishes it as a javascript object wrapped in a callback function (i.e. packaged so it can be copied and then used within a web page):

<?php
$str = $_POST['data'];
$str = substr($str, 1, strlen($str) - 2); // remove outer ( and )
$str = stripslashes($str);
echo "ypdp(".$str.")";
?>

[Note that I did try to parse the object using PHP, but I kept hitting all sorts of errors with the parsing of it… The simplest approach was just to retransmit the object as Javascript so it could be handled by a browser.]

If we want to display the serialised version of the object in another page, rather than in an alert box or the browser console, we need to pass the serialised object within the URI using an HTTP GET to the other page, so we can generate a link to it. For long pipes, this might break…*

*(Anyone know of an equivalent to a URL shortening service that will accept HTTP POST arguments and give you a short URL that will do a POST on your behalf? As well as the POST payload, we’d need to pass the target URL (i.e. the address to which the POST data is to be sent) to the shortener. It would then give you a short URL, such that when you click on it it will POST the data to the desired target URL. I suppose another approach would be a service that will store the post data for you, give you a short URI in return, and then let you call the short URI with the address of the page you want the data posted to as a key?)

PS If you do run the bookmarklet to generate a URI that contains the serialised version of the pipe (that is, use a GET method in the form and a $_GET handler in the PHP script), you can load the object (wrapped in the ypdp() callback function) into your own page via a <script> element in the normal way, by setting the src attribute of the script to the URI that includes the serialised version of the pipe description.

Some of My Dev8D Tinkerings – Yahoo Pipes Quick Start Guide, Cross-Domain JSON with JQuery and Council Committee Treemaps from OpenlyLocal

One of the goals I set myself for this year’s Dev8D was to get round to actually using some of the things I’ve been meaning to try out for ages, particularly Google App Engine and JQuery, and also to have a push on some of the many languishing “projects” I’ve started over the last year, tidying up the code, making the UIs a little more presentable, and so on…

Things never turn out that way, of course. Instead, I did a couple of presentations, only one of which I was aware of beforehand!;-) – a chance remark alerted me to the fact that I was down to do a lightning talk yesterday…

I did start looking at JQuery, though, and did manage to revisit the Treemapping Council Committees Using OpenlyLocal Data idea I’d done a static proof of concept for some time ago…

On the JQuery front, I quickly picked up how easy it is to grab JSON feeds into a web page if you have access to JSON-P (that is, the ability to attach a callback function to a JSON URL so you can call a function in the web page with the object as soon as it loads), but I also ran into a couple of issues. Firstly, if I want to load more than one JSON feed into a page, and then run foo(json1, json2, json3, json4, json5), how do I do it? That is, how do I do a “meta-callback” that fires when all the separate JSON calls have loaded content into the page. (Hmm – I just got a payoff from writing this para and then looking at it – it strikes me I could do a daisy chain – use the callback from the first JSON call to call the second JSON object, use the callback from that to call the third, and so on; but that’s not very elegant…?) And secondly, how do I get a JSON object into a page if there is no callback function available (i.e. no JSON-P support)?

I’m still stuck on the first issue (other than the daisy chain/bucket brigade hack), but I found a workaround for the second – use a Yahoo pipe as a JSON-P proxy. I’ll be writing more about this in a later post, but in the meantime, I popped a code snippet up on github.

On the Openlylocal/council treemap front, I’d grabbed some sample JSON files from the Openlylocal site as I left Dev8D last night for the train home, and managed to hack the resulting objects into a state that could be used to generate the treemap from them.

A couple of hours fighting with getting the Openlylocal JSON into the page (solved as shown above with the Pipes hack) and I now have a live demo – e.g. http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id=111. The id is the openlylocal identifier used to identify a particular council on the Openlylocal site.

If you’re visiting Openlylocal council pages, the following bookmarklet will (sometimes*;-) display the corresponding council committee treemap:

javascript:var s=window.location.href;s=s.replace(/.*=/,"");window.location.href="http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id="+s;

(It works for pages with URLs that end =NNN;-)
Council committee treemap

The code is still a bit tatty, and I need to tidy up the UI, (and maybe also update to a newer JIT visualisation library), so whilst the URI shown above will persist, I’ll be posting an updated version to somewhere else (along with a longer post about how it all works) when I get round to making the next set of tweaks… Hopefully, this will be before Dev8D next year!;-)

PS I also had a huge win in discovering a javascript function that works at least on Firefox: .toSource(). Apply it to a javascript object (e.g. myobj.toSource()), and then if you do things like alert(myobj.toSource()) you can get a quick preview of the contents of that object without having to resort to a debugger or developer plugin tool:-)

PPS can you tell my debugging expertise is limited to: alert(“here”); all over the place ;-) Heh heh…

Starting to Think About a Yahoo Pipes Code Generator

Following a marathon session demoing Yahoo Pipes yesterday (the slides I didn’t really use but pretty much covered are available here) I thought I’d start to have a look at what would be involved in generating a Pipes2PHP, Pipes2Py, or Pipes2JS conversion tool as I’ve alluded to before (What Happens If Yahoo! Pipes Dies?)…

So how are pipes represented within the Yahoo Pipes environment? With a little bit of digging around using the Firebug extension to Firefox, we can inspect the Javascript object representation of a pipe (that is, the thing that is used to represent the pipework and that gets saved to the server whenever we save a pipe).

So to start, let’s look at the following simple pipe:

Simple pipe

Here’s a Firebug view showing the path to the representation of a pipe – editor.pipe.definition (though see the update below: editor.pipe.working is the one to use):

And here’s what we see being passed to the Yahoo pipes server when the pipe is saved…

Here’s how it looks as a Javascript object:

"modules":[{"type":"fetch","id":"sw-502","conf":{"URL":{"value":"http://writetoreply.com/feed","type":"url"}}},{"type":"output","id":"_OUTPUT","conf":{}},{"type":"filter","id":"sw-513","conf":{"MODE":{"type":"text","value":"permit"},"COMBINE":{"type":"text","value":"and"},"RULE":[{"field":{"value":"description","type":"text"},"op":{"type":"text","value":"contains"},"value":{"value":"the","type":"text"}}]}}],"terminaldata":[],"wires":[{"id":"_w3","src":{"id":"_OUTPUT","moduleid":"sw-502"},"tgt":{"id":"_INPUT","moduleid":"sw-513"}},{"id":"_w6","src":{"id":"_OUTPUT","moduleid":"sw-513"},"tgt":{"id":"_INPUT","moduleid":"_OUTPUT"}}

Let’s try to pick that apart a little… firstly, all the modules are defined. Here’s the Fetch module:

{
 "type":"fetch",
 "id":"sw-502",
 "conf":{
  "URL":{
   "value":"http://writetoreply.com/feed",
   "type":"url"
  }
 }
}

The output module:

{
 "type":"output",
 "id":"_OUTPUT",
 "conf":{}
}

The filter module:

{
 "type":"filter",
 "id":"sw-513",
 "conf":{
  "MODE":{"type":"text","value":"permit"},
  "COMBINE":{"type":"text","value":"and"},
  "RULE":[{
   "field":{"value":"description","type":"text"},
   "op":{"type":"text","value":"contains"},
   "value":{"value":"the","type":"text"}
  }]
 }
}

Each of these blocks (that is, modules) has a unique id. The wires then specify how these modules are connected.

So here’s the wire that connects the output of the fetch block to the input of the filter module:

{
 "id":"_w3",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-502"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"sw-513"
 }
}

And here we connect the output of the filter to the input of the output block:

{
 "id":"_w6",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-513"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"_OUTPUT"
 }
}

***UPDATE – I’m not sure if we also need to look at the terminaldata information. I seem to have lost sight of where the multiple RULEs that might appear inside a block are described…? Ah – editor.pipe.module_info? Hmm, no – that is more the UI side of things… so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block)?***

*** UPDATE 2 – Found it… I should be using editor.pipe.working NOT editor.pipe.definition ***

So what would a code generator need to do? I’m guessing one way would be to do something like this…

  • for each module, create an equivalent function by populating a templated function with the appropriate arguments, e.g.
    f_sw_502(){ return fetchURL("http://writetoreply.org/feed") }
  • for each wire, do something along the lines of f_sw_513(f_sw_502()); it’s been a long day, so I’m not sure how to deal with modules that have multiple inputs? But this is just the start, right…? There’s a first rough doodle along these lines below. (If anyone else is now intrigued enough to start thinking about building a code generator from a pipes representation, please let me know…;-)
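
By way of that first doodle, here’s a minimal sketch of how a Pipes2Py-style generator might start out, working over a saved JSON copy of the editor.pipe.working object from the example above. The fetchURL and filterFeed helpers the generated code calls are imaginary placeholders, and only the three module types shown above are handled:

# Sketch of a Pipes2Py-style code generator: read a saved editor.pipe.working
# object (as JSON) and emit one Python function per module, plus notes on how
# the wires chain them together. fetchURL/filterFeed are imaginary helpers.
import json

def gen_module(m):
    mid = m["id"].replace("-", "_")  # sw-502 -> sw_502, so it's a legal name
    conf = m["conf"]
    if m["type"] == "fetch":
        return 'def f_%s():\n    return fetchURL("%s")\n' % (
            mid, conf["URL"]["value"])
    if m["type"] == "filter":
        rules = [(r["field"]["value"], r["op"]["value"], r["value"]["value"])
                 for r in conf["RULE"]]
        return 'def f_%s(feed):\n    return filterFeed(feed, mode="%s", combine="%s", rules=%r)\n' % (
            mid, conf["MODE"]["value"], conf["COMBINE"]["value"], rules)
    if m["type"] == "output":
        return 'def f_%s(feed):\n    return feed\n' % mid
    return "# TODO: unhandled module type %s (%s)\n" % (m["type"], mid)

def generate(pipe):
    code = [gen_module(m) for m in pipe["modules"]]
    # Each wire feeds one module's output terminal into another's input.
    for w in pipe["wires"]:
        src = w["src"]["moduleid"].replace("-", "_")
        tgt = w["tgt"]["moduleid"].replace("-", "_")
        code.append("# wire %s: f_%s(f_%s(...))" % (w["id"], tgt, src))
    return "\n".join(code)

pipe = json.load(open("pipe_working.json"))  # a saved copy of the pipe object
print(generate(pipe))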

As to why this approach might be useful?
– saving a copy of the Javascript representation of a pipe gives us an archival copy of the algorithm, albeit in a javascripty objecty way…
– if we have a code generator, we can use Yahoo Pipes as a rapid prototyping tool to create code that can be locally hosted.

PS a question that was raised a couple of times in the session yesterday related to whether or not Yahoo pipes can be run behind a corporate firewall. I don’t think it can, but does anyone know for sure? Is there a commercial offering available, for example, so corporate folk can run their own instance of pipes in the privacy of their own network?

PPS here’s a handy trick… when in a Yahoo pipes page, pop up the description of the pipe with this javascript call in a Firefox location bar:
javascript:alert(editor.pipe.working.toSource());