The Yahoo Pipes Documentation Project – Initial Thoughts

One of the issues I keep coming across when trying to get folk interested in using Yahoo Pipes is the question of what happens if Yahoo Pipes dies. I tend to post a lot of screenshots of the pipes I build, so I’d stand some chance of recreating them elsewhere, but that is not generally the case for most pipes users.

If we could compile a version of a pipe into code that could be run on our own servers, then we could use Yahoo Pipes as a rapid prototyping tool and code generator, rather than as a runtime platform. This might then make the environment more palatable to developers working in a conservative regime (see e.g. Starting to Think About a Yahoo Pipes Code Generator).

As a stop gap, I thought it would be worth starting up a little Yahoo Pipes Documentation Project to produce some sort of documentation around Yahoo Pipes. Grabbing JSON Data from One Web Page and Displaying it in Another contains a couple of bookmarklets that demonstrate a simple way of grabbing the content of a pipe.

Here’s a first attempt at a reporter, done as a bookmarklet that can be applied when editing a pipe. The result is displayed in the browser console:

javascript:(function(){var p=editor.pipe.working;var d="----- ";var c=console.log;c(d+d);var q,r,w,i,m,k,K,ok,oK,x,X;var D=false;var ins=[];var outs=[];var b=[];var o=[];if(D)c("CONNECTIONS");m=p.wires;for (i=0;i<m.length;i++){if(D)c(d);q=m[i].src;r=m[i].tgt;K=r.moduleid;k=q.moduleid;if (!(b[K])){b[K]={};b[K].conns=[]};if (!(b[k])){b[k]={};b[k].conns=[]};ok={};oK={};x=b[k].conns.length;X=b[K].conns.length;b[k].conns[x]={};b[k].conns[x].o=q.id;b[k].conns[x].typ="out";b[k].conns[x].t=K;b[k].conns[x].T=r.id;b[K].conns[X]={};b[K].conns[X].typ="in";b[K].conns[X].f=k;b[K].conns[X].F=q.id;b[K].conns[X].i=r.id;if(D)c("FROM "+k+" ("+q.id+") TO "+K+" ("+r.id+")");}if(D)c(d+d); m=p.modules;c('MODULES');for (i=0;i<m.length;i++){q=m[i];if (!(b[q.id]))b[q.id]=[];b[q.id].n=q.type};for (i=0;i < m.length;i++){c(d);q=m[i];k=q.id;c("MODULE: "+k+" ("+q.type+")");c("-- PROPERTIES:");for (w in q)c(d+w+" "+q[w].toSource()+" ");c("-- CONNECTIONS:");if (b[k])if(b[k].conns)for (j=0;j<b[k].conns.length;j++){K=b[k].conns[j];if(K.typ=="in"){w=d+'<INPUT> [ '+K.i+' ]';if (K.f)w+=' from '+K.f+' [ '+b[K.f].n+" : "+K.F+' ]';c(w)};if(K.typ=="out")c(d+'<OUTPUT> [ '+K.o+' ] connects to '+K.t+' [ '+b[K.t].n+' : '+K.T+' ]')}};c(d+d)})()

Here’s a snippet of the sort of display it gives:
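To make the logic of the one-line bookmarklet a little easier to follow, here is a rough Python 3 sketch of the same bookkeeping: index the wires by module, then list each module with its inputs and outputs. It assumes the pipe object has already been turned into a Python dict with `modules` (each with `id` and `type`) and `wires` (each with `src` and `tgt`, which in turn carry `moduleid` and `id`), mirroring the fields the bookmarklet reads. The function name `describe_pipe` and the example data are made up for illustration, and the per-module property dump is left out.

```python
def describe_pipe(pipe):
    """Return text lines listing each module and its wired connections."""
    # Map module id -> module type, so connections can name the other end.
    types = {module["id"]: module["type"] for module in pipe["modules"]}

    # Index every wire twice: as an output of its source module
    # and as an input of its target module.
    conns = {}
    for wire in pipe["wires"]:
        src, tgt = wire["src"], wire["tgt"]
        conns.setdefault(src["moduleid"], []).append(
            ("out", src["id"], tgt["moduleid"], tgt["id"]))
        conns.setdefault(tgt["moduleid"], []).append(
            ("in", tgt["id"], src["moduleid"], src["id"]))

    lines = []
    for module in pipe["modules"]:
        module_id = module["id"]
        lines.append("MODULE: %s (%s)" % (module_id, module["type"]))
        for kind, port, other, other_port in conns.get(module_id, []):
            if kind == "in":
                lines.append("----- <INPUT> [ %s ] from %s [ %s : %s ]"
                             % (port, other, types.get(other), other_port))
            else:
                lines.append("----- <OUTPUT> [ %s ] connects to %s [ %s : %s ]"
                             % (port, other, types.get(other), other_port))
    return lines


# Hypothetical two-module pipe: a fetch block wired straight to the output.
example = {
    "modules": [
        {"id": "sw-502", "type": "fetch"},
        {"id": "_OUTPUT", "type": "output"},
    ],
    "wires": [
        {"src": {"moduleid": "sw-502", "id": "_OUTPUT"},
         "tgt": {"moduleid": "_OUTPUT", "id": "_INPUT"}},
    ],
}

for line in describe_pipe(example):
    print(line)
```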

PS @hapdaniel just alerted me to a way I can get hold of a description of a pipe and make it available via a JSON feed. I’ll get a post up about how to do that when I get a chance… Note that this approach supersedes both the method above and the approach alluded to in the Grabbing JSON Data post. (In the meantime, here’s a taster…)

Grabbing JSON Data from One Web Page and Displaying it in Another

Lots of web pages represent data within the page as a JavaScript object. But if you want to make use of that data in another page, how can you do that?

A case in point is Yahoo Pipes. The only place I’m currently aware of where we can look at how a particular Yahoo pipe is constructed is the Yahoo Pipes editor. The pipe is represented as a JavaScript object within the page (as described in Starting to Think About a Yahoo Pipes Code Generator), but it’s effectively locked into the page.

So here’s a trick for liberating that representation…

Firstly, we need to know what the name of the object is. In the case of Yahoo Pipes, the pipe’s definition is contained in the editor.pipe.working object (not in editor.pipe.definition, as I first thought).

In order to send the object to another page on the web, the first thing we need to do is generate a text string view of it that we can POST to another web page. This serialised representation of the object can be obtained by calling the .toSource() function on it.

The following bookmarklets show what that representation looks like.

[UPDATE: an earlier version of these bookmarklets used editor.pipe.definition, which doesn’t give a complete description of the pipe. I first suspected that .toSource() wasn’t digging into arrays, then wondered whether the missing data (e.g. the rules in a Regular Expression block) was in terminaldata (it isn’t) or in editor.pipe.module_info (which has more to do with the UI side of things). The rule data is actually in editor.pipe.working, so that is the object the bookmarklets below use.]

Firstly, we can display the serialised representation in a browser alert box:

javascript:(function(){alert(editor.pipe.working.toSource())})()

Alternatively, we can view it in the browser console (for example, in Firefox, we might do this via the Firebug plugin):

javascript:(function(){console.log(editor.pipe.working.toSource())})()

The object actually contains several other objects, not all of which are directly relevant to the logical definition of the pipe (e.g. they are more to do with layout), so we can modify the console logging bookmarklet to make it easier to see the objects we are interested in: the definitions of each of the pipe blocks (that is, editor.pipe.working.modules), the connections that exist between the modules (editor.pipe.working.wires) and [UPDATE] the terminaldata:

javascript:(function(){var c=console.log;var p=editor.pipe.working;c('MODULES: '+p.modules.toSource());c('WIRES: '+p.wires.toSource());c('TERMINALS: '+p.terminaldata.toSource())})()


[terminaldata not shown]

To actually send the representation to another web page, we can use a bookmarklet to dynamically create a form element, attach the serialised object to it as a form argument, append the form to the page and then submit it:

javascript:(function(){var ouseful={};ouseful=editor.pipe.working;ouseful=ouseful.toSource(); var oi=document.createElement('form');oi.setAttribute('method','post');oi.setAttribute('name','oif');oi.setAttribute('action','http://ouseful.open.ac.uk/ypdp/jsonpost.php');var oie=document.createElement('input');oie.setAttribute('type','text');oie.setAttribute('name','data');oie.setAttribute('value',ouseful);oi.appendChild(oie);document.body.appendChild(oi);document.oif.submit();})()

In this case, the page I am submitting the form to is a PHP page. Here’s the code that accepts the POSTed serialised object and then republishes it as a JavaScript object wrapped in a callback function (i.e. packages it so it can be copied and then used within a web page):

<?php
$str = $_POST['data'];
$str = substr($str, 1, strlen($str) - 2); // remove outer ( and )
$str = stripslashes($str);
echo "ypdp(".$str.")";
?>

[Note that I did try to parse the object using PHP, but I kept hitting all sorts of errors with the parsing of it… The simplest approach was just to retransmit the object as Javascript so it could be handled by a browser.]
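For comparison, the same "strip the outer brackets and wrap in a callback" step can be sketched in a few lines of Python 3. This is only an illustration of what the PHP script does, not a replacement for it: the function name `wrap_in_callback` is made up, the input string is a toy example, and there is no equivalent of stripslashes() because nothing here adds escaping slashes in the first place.

```python
def wrap_in_callback(serialised, callback="ypdp"):
    """Wrap a .toSource()-style string in a JavaScript callback call.

    .toSource() returns the object wrapped in parentheses, e.g. "({a:1})",
    so the outer "(" and ")" are dropped (if present) before wrapping.
    """
    text = serialised.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return "%s(%s)" % (callback, text)


# Toy input standing in for the serialised pipe description.
print(wrap_in_callback("({modules:[], wires:[]})"))
```

The object is never parsed on the server; it is treated as an opaque string, which is the same design choice the PHP script makes.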

If we want to display the serialised version of the object in another page, rather than in an alert box or the browser console, and we want to be able to generate a link to it, we need to pass the serialised object within the URI using an HTTP GET to the other page. For long pipes, this might break, because the URI may end up too long.*

*(Anyone know of an equivalent to a URL shortening service that will accept HTTP POST arguments and give you a short URL that will do a POST on your behalf? As well as the POST payload, we’d need to pass the target URL (i.e. the address to which the POST data is to be sent) to the shortener. It would then give you a short URL such that, when you click on it, it POSTs the data to the desired target URL. I suppose another approach would be a service that stores the POST data for you, gives you a short URI in return, and then you call the short URI with the address of the page you want the data posted to as a key?)

PS If you do run the bookmarklet to generate a URI that contains the serialised version of the pipe (that is, use a GET method in the form and a $_GET handler in the PHP script), you can load the object (wrapped in the ypdp() callback function) into your own page via a <script> element in the normal way, by setting the src attribute of the script to the URI that includes the serialised version of the pipe description.
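As a rough sketch of the GET variant, the following Python 3 snippet builds a URI that carries the serialised object in a `data` argument (the same argument name the form uses), checks its length, and generates the corresponding script element. The 2000 character limit is an assumption on my part, a conservative rule of thumb rather than a documented limit of any particular browser or server, and the helper names are made up.

```python
from html import escape
from urllib.parse import urlencode

ENDPOINT = "http://ouseful.open.ac.uk/ypdp/jsonpost.php"
MAX_URI_LENGTH = 2000  # assumed rule-of-thumb limit, not a hard standard


def build_get_uri(serialised, endpoint=ENDPOINT):
    """Return a URI that passes the serialised object as the 'data' argument."""
    return endpoint + "?" + urlencode({"data": serialised})


def fits_in_uri(uri, limit=MAX_URI_LENGTH):
    """True if the URI is short enough to be reasonably safe as a link."""
    return len(uri) <= limit


def script_tag(uri):
    """Return a script element that loads the callback-wrapped object."""
    return '<script type="text/javascript" src="%s"></script>' % escape(uri, quote=True)


uri = build_get_uri("({a:[1, 2]})")  # toy stand-in for a pipe description
print(uri)
print(fits_in_uri(uri))
print(script_tag(uri))
```

Because every character outside a small safe set is percent-encoded, the URI grows much faster than the serialised object itself, which is why long pipes are likely to hit the limit.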

Grabbing the JSON Description of a Yahoo Pipe from the Pipe Itself

In a series of recent posts (The Yahoo Pipes Documentation Project – Initial Thoughts, Grabbing JSON Data from One Web Page and Displaying it in Another, Starting to Think About a Yahoo Pipes Code Generator) I’ve started exploring some of the various ingredients that might be involved in documenting the structure of a Yahoo Pipe and potentially generating some program code that will then implement a particular pipe.

One problem I’d come across was how to actually obtain the abstract description of a pipe. I’d found an appropriate Javascript object within an open Pipes editor, but getting that data out was a little laborious…

…and then came a comment on one of the posts from Paul Daniel/@hapdaniel, pointing me to a pipe that included a little trick he was aware of: a trick for grabbing the description of a pipe from a pipe’s pipe.info feed (e.g. http://pipes.yahoo.com/pipes/pipe.info?_out=json&_id=eed5e097836289dfb4e8586220b18e0e).

Paul used something akin to this YPDP pipe’s internals pipe to grab the data from the info feed of a specified pipe (the URL of which has the form http://pipes.yahoo.com/pipes/pipe.info?_id=PIPE_ID) using YQL:

http://query.yahooapis.com/v1/public/yql?url=http%3A%2F%2Fpipes.yahoo.com%2Fpipes%2Fpipe.info%3F_out%3Djson%26_id%3D44d4492a582d616bffda237d461c5ef4&q=select+PIPE.working+from+json+where+url%3D%40url&format=json

It’s just as easy to grab the JSON feed from YQL, e.g. using a query of the form:
select PIPE.working from json where url="http://pipes.yahoo.com/pipes/pipe.info?_out=json&_id=44d4492a582d616bffda237d461c5ef4"
The pipe id is the id of the pipe you want the description of.
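If you want to generate the YQL request URL for an arbitrary pipe id rather than hand-editing it, something like the following Python 3 sketch would do. It only builds the URL (no request is made), using the endpoint, the `url`, `q` and `format` arguments and the query text that appear in the example URL above; the function names are my own.

```python
from urllib.parse import urlencode

PIPE_INFO = "http://pipes.yahoo.com/pipes/pipe.info"
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def pipe_info_url(pipe_id):
    """URL of the JSON info feed for the pipe with the given id."""
    return PIPE_INFO + "?" + urlencode([("_out", "json"), ("_id", pipe_id)])


def yql_url(pipe_id):
    """YQL request URL that selects PIPE.working from the pipe's info feed.

    The info feed URL is passed as a separate 'url' argument and referenced
    from the query as @url, as in the example URL above.
    """
    params = [
        ("url", pipe_info_url(pipe_id)),
        ("q", "select PIPE.working from json where url=@url"),
        ("format", "json"),
    ]
    return YQL_ENDPOINT + "?" + urlencode(params)


print(yql_url("44d4492a582d616bffda237d461c5ef4"))
```

For the pipe id used above, this reproduces the example request URL character for character.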

If you have a Yahoo account, you can try this for yourself in the YQL developer console:

We can then grab the JSON feed either from YQL or the YPDP pipe’s internals pipe into a web page and run whatever we want from it.

So for example, the demo service I have set up at http://ouseful.open.ac.uk/ypdp/pipefed.php will take an id argument containing the id of a pipe, and display a crude textual description of it. Like this:

So what’s next on the “to do” list? Firstly, I want to tidy up – and further unpack – the “documentation” that the above routine produces. Secondly, there’s the longer term goal of producing the code generator. If anyone fancies attacking that problem, you can get hold of the JSON description of a pipe from its ID using either the YPDP internals pipe or the YQL query that are shown above.

Yahoo Pipes Code Generator (Python): Pipe2Py

Wouldn’t it be nice if you could use Yahoo Pipes as a visual editor for generating your own feed powered applications running on your own server? Now you can…

One of the concerns occasionally raised around Yahoo Pipes (other than the stability and responsiveness issues) relates to the dependence on the Yahoo Pipes platform that results from creating a pipe. Where a pipe is used to construct an information feed that may get published on an “official” web page, users need to feel that content will always be fed through the pipe, not just when Pipes feels like it. (Actually, I think the Pipes backend is reasonably stable, it’s just the front end editor/GUI that has its moments…)

Earlier this year, I started to have a ponder around the idea of a Yahoo Pipes Documentation Project, which would at least display a textual description of a pipe based on the JSON representation of it that you can access via the Pipes environment (the code appears to have rotted, unfortunately; I think I need to put a proper JSON parser in place :-( ). Around the same time, I floated an idea for a code generator that would take the JSON description of a pipe and generate Python or PHP code capable of achieving a similar function to the pipe.

Greg Gaughan picked up the challenge and came up with a code generator, written in Python, for doing just that. (I didn’t blog it at the time because I wanted to help Greg extend the code to cover more modules, but I never delivered on my part of the bargain :-( )

Anyway – the code is at http://github.com/ggaughan/pipe2py and it works as follows. Install the universal feed parser (sudo easy_install feedparser) and simplejson (sudo easy_install simplejson), then download Greg’s code and declare the path to it, maybe something like:
export PYTHONPATH=$PYTHONPATH:/path/to/pipe2py

Given the ID for a pipe on Yahoo Pipes, generate a compiled Python version of it:
python compile.py -p PIPEID

This generates a file pipe_PIPEID.py containing a function pipe_PIPEID() which returns a JSON object equivalent of the output of the corresponding Yahoo pipe, the major difference being that it’s the locally compiled pipe code that’s running, not the Yahoo pipe…

So for example, for the following simple pipe, which just grabs the OUseful.info blog feed and passes it straight through:

Simple pipe for compilation

we generate a Python version of the pipe as follows:
python compile.py -p 404411a8d22104920f3fc1f428f33642

This generates the following code:

from pipe2py import Context
from pipe2py.modules import *

def pipe_404411a8d22104920f3fc1f428f33642(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}

    forever = pipeforever.pipe_forever(context, None, conf=None)

    sw_502 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'https://blog.ouseful.info/feed'}})
    _OUTPUT = pipeoutput.pipe_output(context, sw_502, conf={})
    return _OUTPUT

We can then run this code as part of our own program. For example, grab the feed items and print out the feed titles:

context = Context()
p = pipe_404411a8d22104920f3fc1f428f33642(context, None)
for i in p:
  print i['title']

running a compiled pipe on the desktop

Not all the Yahoo Pipes blocks are implemented (if you want to volunteer code, I’m sure Greg would be happy to accept it!;-), but for simple pipes, it works a dream…

So for example, here’s a couple of feed mergers and then a sort on the title…

Another pipe compilation demo

And a corresponding compilation, along with a small amount of code to display the titles of each post, and the author:

from pipe2py import Context
from pipe2py.modules import *

def pipe_2e4ef263902607f3eec61ed440002a3f(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}

    forever = pipeforever.pipe_forever(context, None, conf=None)

    sw_550 = pipefetch.pipe_fetch(context, forever, conf={u'URL': [{u'type': u'url', u'value': u'https://blog.ouseful.info/feed'}, {u'type': u'url', u'value': u'http://feeds.feedburner.com/TheEdTechie'}]})
    sw_572 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://www.greenhughes.com/rssfeed'}})
    sw_580 = pipeunion.pipe_union(context, sw_550, conf={}, _OTHER = sw_572)
    sw_565 = pipesort.pipe_sort(context, sw_580, conf={u'KEY': [{u'field': {u'type': u'text', u'value': u'title'}, u'dir': {u'type': u'text', u'value': u'ASC'}}]})
    _OUTPUT = pipeoutput.pipe_output(context, sw_565, conf={})
    return _OUTPUT


context = Context()

p = pipe_2e4ef263902607f3eec61ed440002a3f(context, None)
for i in p:
    print i['title'], ' by ', i['author']

And the result?
MCMT013:pipes ajh59$ python basicTest.py
Build an app to search Delicious using your voice with the Android App Inventor by Liam Green-Hughes
Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting by Tony Hirst
Everyday I write the book by mweller
...
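The generated code works by handing the output of each module to the next one as its input. To show that chaining pattern in isolation, here is a small Python 3 sketch built from plain generators. To be clear, this is not pipe2py’s code: the `fetch`, `union` and `sort_by` functions are simplified stand-ins I’ve made up, and the "feeds" are in-memory lists rather than anything fetched over the network.

```python
from itertools import chain


def fetch(feeds, urls):
    """Stand-in for a fetch block: yield the items of each named feed."""
    for url in urls:
        for item in feeds[url]:
            yield item


def union(first, other):
    """Stand-in for a union block: items of one stream, then the other."""
    return chain(first, other)


def sort_by(items, key, descending=False):
    """Stand-in for a sort block: sorting has to consume the whole stream."""
    return iter(sorted(items, key=lambda item: item[key], reverse=descending))


# Made-up feed contents, keyed by made-up feed names.
FEEDS = {
    "feed-a": [{"title": "Zebra", "author": "A"},
               {"title": "Apple", "author": "A"}],
    "feed-b": [{"title": "Mango", "author": "B"}],
}


def pipeline(feeds):
    """Two fetches merged by a union, then sorted on title."""
    first = fetch(feeds, ["feed-a"])
    second = fetch(feeds, ["feed-b"])
    merged = union(first, second)
    return sort_by(merged, "title")


titles = [item["title"] for item in pipeline(FEEDS)]
print(titles)
```

Wiring blocks together then amounts to nothing more than passing one function’s return value into the next, which is the shape the compiled pipes above have.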

So there we have it… Thanks to Greg, the first pass at a Yahoo Pipes to Python compiler…

PS Note to self… I noticed that the ‘truncate’ module isn’t supported, so as it’s a relatively trivial function, maybe I should see if I can write a compiler block to implement it…
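By way of a first stab, a truncate block might look something like the Python 3 sketch below. It follows the calling convention visible in the generated code above (context, _INPUT, conf), but the shape of the conf entry is a guess: I am assuming the number of items arrives as conf['count']['value'], by analogy with the URL and KEY entries shown earlier, and that would need checking against a real pipe description. The name `pipe_truncate` is likewise only a suggestion.

```python
from itertools import islice


def pipe_truncate(context, _INPUT, conf=None, **kwargs):
    """Yield at most N items from _INPUT.

    Assumption: N is given as conf['count']['value'] (possibly as a string).
    """
    conf = conf or {}
    count = int(conf["count"]["value"])
    # islice stops after 'count' items without reading the rest of the stream.
    for item in islice(_INPUT, count):
        yield item


items = iter([{"title": "a"}, {"title": "b"}, {"title": "c"}, {"title": "d"}])
conf = {"count": {"type": "number", "value": "2"}}
first_two = list(pipe_truncate(None, items, conf=conf))
print(first_two)
```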

PPS Greg has also started exploring how to export a pipe so that it can be run on Google App Engine: Running Yahoo! Pipes on Google App Engine