
OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education and data journalism. Snarky and sweary to anyone who emails to offer me content for the site.

Category: Radical Syndication

Guardian RSS Feeds Make Offline eNewspaper Easy

Last week, the Guardian stepped up its battle with the Telegraph for online readers by upgrading its RSS feeds to provide the full content of each article across a wealth of topic-themed feeds.

To find a feed on the Guardian site, “simply add /rss to the end of the URL you see in the location bar in the browser”. If you do this for a “topic” page, you seem to get a full text feed of the articles in that topic. If you add /rss to the end of the URL of an actual article, you appear to get the comments via RSS (similar feed URL semantics to WordPress).

It didn’t take long for the twittersphere to catch on:

or for some far more informed commentators than I to have exactly the same thought about the immediate consequences:

Stanza? Stanza – the ebook reader that does for ebooks what iTunes does for music tracks…

Regular readers may remember I posted about Stanza quite recently – OpenLearn ebooks, for free, courtesy of OpenLearn RSS and Feedbooks… – describing how the Stanza iPhone app could, with a little help from the Feedbooks RSS2ebook converter, be used to view offline ebook versions of OpenLearn course materials.

And now I can do the same with Guardian technology stories:

– for example, pop over to http://www.guardian.co.uk/technology/blog;
– add /rss: http://www.guardian.co.uk/technology/blog/rss;
– feed feedbooks and create my own technology news newspaper:
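(In the spirit of the bookmarklets that crop up elsewhere on this blog, step two can be reduced to a one-click sketch – a minimal bookmarklet that just appends /rss to whatever Guardian page you happen to be on, assuming the page follows the convention described above:

javascript:window.location=window.location.href.replace(/\/+$/,'')+'/rss';

Click it on a topic or blog page and you land on the corresponding feed.)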

Hmm – let’s add some BBC news in there too, from http://news.bbc.co.uk/1/hi/technology/default.stm (via http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/technology/rss.xml).

And there I have it – an eNewspaper I can sync to the Stanza app on my iPod Touch (or to an ebook reader on other platforms), and read whenever I like as an offline ebook…

PS There is one thing that appears to be missing from the Guardian RSS offerings though – single page RSS… that is, a single item RSS feed for each Guardian news story that only contains that news story (or that story plus the comments for it…).

Author: Tony Hirst | Posted on October 28, 2008 / October 27, 2008 | Categories: Radical Syndication | 2 comments

Serialised OpenLearn Daily RSS Feeds via WordPress

Regular readers will know that I’ve been posting about daily – or serialised – RSS feeds for several years now, so here’s a quick recap of what serialised feeds actually are, and then a bit of news about my latest attempt at OpenLearn Daily feeds.

If you’re reading this via an RSS feed, then I’m assuming you’re familiar with the idea of RSS (if not, check out the Common Craft video: RSS in Plain English). The important thing to take away from this understanding is that separate RSS feed items contain separate chunks of content.

Typically, RSS is used to syndicate content as it is published from a regularly updated source, such as a blog, news service, or saved search. In these cases, each feed item might correspond to a separate news item, blog post, or search result.

In contrast to syndication feeds from continually or regularly updated sources, a serialised feed is an RSS feed derived from an unchanging (or “static”) body of content, such as a book, or OpenLearn course unit, for example.

The original work is partitioned (serialised) into a set of separate component parts or chunks – in the case of a book, this might correspond to separate chapters, for example. Each chunk is then published as a separate RSS item. By scheduling the release of each feed item, a book or course can be released as a part-work over a period of time, with each part delivered as a separate feed item.

“Serialisation” should thus be understood in the sense of “serialised” books, such as might appear over several issues of a newspaper, or several episodes of a radio programme in the case of a book serialised for radio.

Daily feeds are a special case of serialised feed in which items are released according to a daily schedule.

Serialised feeds may be published according to a “global” schedule – starting on a particular day, and running for a fixed period of time – or a local, “personal” schedule, in which case the serialisation starts at the time an individual subscribes to the feed. So if I subscribe to a daily, personalised feed today, I get the first item today, the second item tomorrow, and so on; whereas if you subscribe next Wednesday, you get the first item next Wednesday, the second item next Thursday, etc.

Two good examples of services that publish serialised feeds are DailyLit and Podiobooks. DailyLit produces books (some free, some for a fee) as serialised feeds (or via email installments) and Podiobooks serialises books as audio podcasts. Another example is RSS Response, an RSS “auto-responder” service in which a series of staged marketing or product support releases can be delivered to a potential customer over an open-ended period of time, but starting when they subscribe to the feed.

There are two main ways of handling personalised feeds at a server level: datestamping, and unique feed subscription identifiers.

The first, and simplest, way is to add a datestamp to the feed subscription URL whenever a page that contains a link to the feed is published. When a feed reader polls the feed server, the current time is sampled, compared to the datestamp in the subscription URL, and the appropriate number of items published to the feed. This approach is demonstrated on Openlearnigg, where a Yahoo pipe is used to provide daily serialised feeds for OpenLearn units (see Static RSS Feed Content, Delivered Daily for more on this); the “clockRSS” icon links to the daily feed:

[Note: the openlearnigg daily feeds appear to be broken at the moment:-(]
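To make the datestamp approach concrete, here is a minimal JavaScript sketch (purely illustrative – the Openlearnigg implementation is actually a Yahoo pipe) of the calculation a feed server might perform each time the feed is polled:

// items: the serialised content, one chunk per feed item
// startDateParam: the datestamp baked into the subscription URL
function itemsToPublish(items, startDateParam, now) {
  var MS_PER_DAY = 24 * 60 * 60 * 1000;
  var start = new Date(startDateParam);
  // day 1 of the subscription releases item 1, day 2 releases item 2, etc.
  var daysElapsed = Math.floor((now - start) / MS_PER_DAY) + 1;
  return items.slice(0, Math.max(0, Math.min(daysElapsed, items.length)));
}

Note that no per-user state is stored anywhere: the subscription URL itself carries everything the server needs.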

The second way is to add a unique subscription identifier (or set of identifiers, such as a user ID plus a separate feed identifier) to the feed URL each time it is linked to. User settings can then be associated with each identifier in a small database: the start date of the subscription, information relating to the schedule (such as how frequently items should be added to the feed), and any offsets or accelerators. (An offset might say that the user wants the first three items on the first day, item four on the next day, and so on; an accelerator might allow a subscriber to grab an additional item before the next scheduled release is due, speeding up the rate at which the serialisation is published to them.) DailyLit uses unique subscription IDs to allow personalised scheduling:
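By way of comparison, here is a sketch of the kind of per-subscriber record an identifier-based scheduler might keep (the field names are made up for illustration; they are not taken from DailyLit or any other real service):

var subscription = {
  id: 'a1b2c3',             // unique identifier embedded in the subscription URL
  feed: 'example-course',   // which serialisation this subscription is for
  startDate: '2009-01-06',
  itemsPerDay: 1,           // basic schedule frequency
  offset: 3,                // e.g. release the first three items on day one
  accelerated: 0            // extra items the subscriber has pulled forward
};
// items released so far = offset + daysElapsed * itemsPerDay + accelerated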

In a commentary on Misconceptions about reuse of open educational resources, Juliette Culver rightly identifies “Misconception 1: The only type of reuse is wholesale reuse” (see also some of my own thoughts on this: So What Exactly Is An OpenLearn Content Remix?). However, that is not to say that wholesale reuse of opencourseware is not possible. For example, whilst I have not been engaged in remixing OpenLearn content, per se, I have dabbled in various ways of re-publishing course units in different formats (ebooks, for example, or via embeddable Grazr widgets on Openlearnigg, as shown in the screenshot below).

(For more examples, see Feeding from opencourseware.)

Republication is also possible in different contexts (such as an environment that provides alternative community, commenting or discussion tools).

Serialised feeds offer another possible context for re-publication, particularly for opencourseware, in which pacing is returned to the mix, at least insofar as the subscriber to a serialised course is concerned. (There are loads of issues here with respect to the discussion around serialised course content, but that is for another day and another post ;-) That is, by subscribing to a serialised version of a course, a learner can benefit from the paced delivery of the course materials, akin to the pacing achieved by attending a series of timetabled lectures over a term at a traditional institution.

For a long time, I’ve felt that serialised content on demand is an attractive offering, and I’ll be looking to explore this area far more aggressively than I have in the past over the next year or so…

As Stephen Downes noted in his essay “The Future of Online Learning 2008” whilst discussing Time Independence:

Being able to time the distribution of resources is a significant advantage. It allows for presentations, interactions and other activities to be encountered dynamically during the course of days or weeks. This space can be used to pedagogical advantage in addition to meeting the student’s scheduling needs, facilitating ongoing practice and recall. Dynamic scheduling does not guarantee success – students may simply delete the material as it arrives. But having this level of control makes it more likely students will be able to attend to the material when it arrives.

Self-pacing in online learning, therefore, isn’t simply the learner picking up the work from time to time whenever he or she feels like it. It is rather the employment of various mechanisms that will enable work to be scheduled. Pacing continues to be important, even in instances of self-pacing. Being free to set one’s own schedule does not mean setting no schedule at all. Nor does it mean that the release of learning activities and content is not scheduled at all. It is, rather, a meshing of schedules.

So to start the year, I’ve spent a little bit of time putting together a WordPress MU site that republishes OpenLearn content in a blog format, and additionally provides a daily feed for the republished courses: http://learningfeed.org.

The installation makes use of two WordPress plugins that were developed under the auspices of the OpenLearn project mid-way through last year: OpenLearn WordPress Plugins. The first plugin makes it easy to create WPMU blogs based on separate OpenLearn course units; the second provides the daily feed service.

The WordPress serialised feed plugin uses a timestamped subscription URL, and a relatively simple approach to scheduling (though I intend to develop this further).

You can find links to several examples of republished courses – and their corresponding serialised RSS feeds – here: LearningFeeds Miscellany.

My intention for the first iteration of LearningFeeds is to see just what sorts of things “come to mind” when playing around with the syndication (republishing) of opencourseware in a WordPress environment, as well as the “issues” that will no doubt arise from using the first versions of the OpenLearn WordPress plugins in anger. (I will document my explorations in the LearningFeeds blog, and cross-link to them from OUseful.info.)

In the meantime, please feel free to comment back here with any thoughts you might have with respect to the use of serialised RSS feeds in online education and training:-)

Author: Tony Hirst | Posted on January 6, 2009 / January 4, 2009 | Categories: Open Content, Radical Syndication | Tags: daily | 9 comments

Just Feed Me One Piece at a Time

As Downes starts to commission his daily/serialised feed platform, here are a few more fragmentary thoughts about feeds and feed items…

If we are to construct uncourses out of separate “blog posts” (that is, out of small chunks of content that can be sensibly represented as independent RSS feed items) then it would be handy to get access to that content in bite size RSS chunks.

At the moment, it’s quite hard to get RSS representations of blog content anywhere except from the original feed. A search on a blog search engine doesn’t turn up the full content of the posts in the results, just an opening fragment of each. In a feed reader such as Bloglines or Google Reader, all the blog posts are indexed, can often be saved as favourites, and are typically displayed as full text/full post results following a search.

But can I construct a new feed containing some of those separate search-result and favourited feed items, in an order I require? I don’t think so…

So here’s what I have in mind – something like the Grazr editor that lets me construct new feeds by dragging and dropping the content of separate feed items from my feed reader and into a “new feed” editor. The editor would let me construct and publish new feeds using items collected from arbitrary feeds, and then maybe drag and drop those items around inside the editor to change the order they are listed in.

The easiest way of doing this would probably be to extend a current feed reader, since feed readers tend to sit on top of platforms that already have a database store of separate feed items.

It would be more satisfying, however, for blogging engines to publish single item RSS versions of each blog post. That is, as well as publishing each blog post as a separate HTML page, and as well as adding the content of each post to the blog feed and any corresponding category feeds, the blog engine should also publish a separate RSS feed containing just a single item, made up from the title and body (description) of the corresponding post.

That is, for a blog post published at http://example.com/myblogpost.html, I’d also like to see http://example.com/myblogpost.rss (or whatever…;-) containing a single item RSS feed for that page.

This shouldn’t be too hard to do – as well as the page template for each post, we just need a minimal template to publish the post as a single item RSS formatted document.

WordPress blogs already offer page level feeds, of course, but as far as I know they just syndicate the comments posted to each page?

(As a workaround for OUseful.info, my WordPress single feed item (raw) pipe does some screenscraping to generate a single item RSS feed version of an OUseful.info blog post from its HTML page version, and the WordPress single feed item pipe takes the scraped blog post content and adds it to the top of the comment feed that WordPress already publishes for each post…)

Why would I want to do this? Well, suppose I want to publish my own crazy learning journey through OpenLearn content – the easiest way would be to just grab the RSS version of each page I wanted from whatever course it happened to be in, and construct my own series of posts/pages as a new feed. If you’re interested in this, here are some tricks and tools that will let you do it (if they still work and haven’t rotted yet?!):

  • Embedding Single OpenLearn Unit Pages in an Arbitrary Blog Post;
  • Single Item RSS Feeds from OpenLearn Pages

PS Note to self – check to see if Microsoft pursued web slices in IE?

Author: Tony Hirst | Posted on January 31, 2009 / October 23, 2010 | Categories: Radical Syndication, Thinkses | 2 comments

Single Item RSS Feeds from WordPress Blogs

Okay – following on from Just Feed Me One Piece at a Time, here’s a quick fix for how to get a single item RSS feed for each separate blog post on a WordPress blog…

…but first, a hint for those of you who want to work it out yourselves: Features: RSS (WordPress.com).

Can you see what it is yet?

Here: If you go to your main site feed, and then add ?s=oogabooga to the end of the URL, it’ll show you a feed of just the posts that contain the word “oogabooga” in them. (This is called a search feed.)

So a “nearly there” solution is…

…go to the main site feed and use the title of the post you want the feed for as the search term.

Like this:
https://ouseful.wordpress.com/feed/?s="Just%20Feed%20Me%20One%20Piece%20at%20a%20Time"

Now this works fine if you never mention the exact title of the post in another post on the blog, because post titles are likely to be unique phrases (and so the search will only turn up one result)…

But if you do refer to one post by its exact title in another post, you’ll get multiple hits.

So a quick fix workaround is just to push the feed through a pipe that filters the results list by title:

(An alternative approach would be to use the heuristic that the first mention of the phrase will be in the original post (i.e. the post whose title matches the search phrase), in which case you could just filter the search feed to return the oldest hit; in a pipe, the easiest way to do this would be to reverse the feed order (or sort by ascending date), then truncate the feed after one item.)

Note that the workaround relies on the WordPress search doing its thing properly and the user getting the search term right…

For a useful workflow, it’d be handy to have a bookmarklet that would generate the URL for a single item RSS feed for a given WordPress blog post (a bit like the OpenLearn single item RSS feed bookmarklet does for OpenLearn unit pages). This means capturing the blog top-level URL (e.g. https://ouseful.wordpress.com) and the post title. The following pipe attempts to do just that, given the URL of the actual post you want the single feed item for (WordPress single item RSS from URL):

This pipe works for OUseful.info, but probably won’t work for a lot of other WordPress blogs because it uses a heuristic to capture the post title from the page title. More specifically, in the OUseful.info case, page titles have the form This is the Post Title « OUseful.Info, the blog…, so the pipe looks for the « then loses it and everything to the right of it in the page title to determine the post title.
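For what it’s worth, the heuristic itself boils down to a one-liner; here is a sketch, again assuming the « separator convention:

// 'This is the Post Title « OUseful.Info, the blog…' -> 'This is the Post Title'
function postTitleFromPageTitle(pageTitle) {
  return pageTitle.split('«')[0].replace(/\s+$/, '');
}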

What would be handy here would be for all WordPress templates to add a “title” metadata element to the header containing the exact post title..?

Though of course, it’d be nicer still if WordPress and the other blogging platforms made the single item RSS feed available for each post natively, just anyway… ;-)

PS it seems like WordPress does do such a thing…

So the general case solution to single item RSS feeds for WordPress blog posts is to use the following construction:

http://wp.example.com/post-URL/?feed=rss2&withoutcomments=1

Here’s a bookmarklet that I think should work on any WordPress blog…

javascript:window.location+="?feed=rss2&withoutcomments=1";

(Crappy WordPress won’t let me actually provide a link to the bookmarklet – it insists on stripping out the “javascript:” :-( )

And so the web just got a little bit more wired for me… Thanks, Shanta Rohse:-)

(Note that if the blog publisher has configured the feeds to only include summaries rather than full posts, you’ll only get the summary… Unless there’s a URI argument that will force the full item to be published (which I doubt!)?)

PPS it strikes me that you can add a link to the single item RSS feed for a WordPress post by adding something like this to the post (or sidebar, maybe, if your template has sidebar widgets alongside individual posts):
<a href="?feed=rss2&withoutcomments=1">Single item RSS feed for this post</a>

Like this: Single item RSS feed for this post

(I’m not sure whether that link will work from within a feed reader, though?)

Author: Tony Hirst | Posted on February 2, 2009 | Categories: Pipework, Radical Syndication, Tinkering | 5 comments

Embedding Yahoo Pipes Output With a Single Click

…sort of…

Here are a quick couple of bookmarklets that I’ve been meaning to put together and only just got around to. They work on the “homepage” of any Yahoo pipe (such as this POIT Report beta recommendations pipe that reverses the order of feed items from the recommendations page of the POIT report (beta)) and do the following:

– preview the feed output of the pipe in a Grazr widget:
javascript:window.location=
"http://grazr.com/gzpanel.html?file=http://pipes.yahoo.com/pipes/pipe.run?_id="+
window.location.href.split('=')[1]+"&_render=rss";
[Line breaks added for display purposes – you’ll need to remove them for the bookmarklet to work.]
Clicking on this bookmarklet when you are on a pipe’s homepage will display the pipe’s feed output in a Grazr widget. So what? So you can preview the pipe’s feed output in a “legitimate” feed reader.

– go the Grazr widget embedding page to customise your own embeddable widget container for the pipe’s output RSS feed:
javascript:window.location=
"http://grazr.com/config?file=http://pipes.yahoo.com/pipes/pipe.run?_id="+
window.location.href.split('=')[1]+"&_render=rss";
[Line breaks added for display purposes – you’ll need to remove them for the bookmarklet to work.]
Clicking on this bookmarklet when you are on a pipe’s homepage will display the pipe’s feed output in a Grazr widget on the Grazr widget editor page. So what? So you can grab the pipe’s feed output into a “legitimate” feed reader and then get the embed code to embed the feed in your own page.

Remember that if the feed has Slideshare slideshows, audio files or YouTube movies added as enclosures, the Grazr widget will display them within the widget…

  • delicious Feed Enhancer – Auto-Enclosures in Grazr;
  • Viewing Presentations in Grazr via Slideshare RSS Feeds.

(I’m not sure if Scribd enclosures are also automatically rendered in Grazr, but they definitely can be if you create your own OPML file… Embed Scribd iPaper in Grazr Demo.)

Author: Tony Hirst | Posted on February 7, 2009 | Categories: Radical Syndication, Tinkering | 1 comment

Mashing Up Government the RSS Way: Raw Materials

Three or four weeks ago, @adrianshort tipped me off about a campaign he was trying to put together to encourage local councils to start publishing autodiscoverable RSS feeds on their homepages. Various overcommitments of my own meant I couldn’t contribute anything to this initiative, but it’s great to see it up and running now at Mash the State:

So how does my local council do?

Boo – no autodiscoverable feeds on their homepage… (I wonder whether it might be an idea to have a link to the council page that is being checked for autodiscoverable links, so that people can see which page it actually is and scout around it for non-autodiscoverable feeds?)

Although the campaign is targeted at encouraging councils to publish RSS news feeds, there’s a range of other feeds that they could usefully publish too, potentially without too much effort.

For example, councils can make use of the Planning Alerts service, which scrapes planning info from local council websites (presumably it would get the data via feeds if the data were made available that way? [Update: the link is there, I just hadn’t noticed it – the name of the council in the body text is a link to the assumed council home page.]):

These feeds include geo-data too, which means you can plot the feed on a map:

(I started exploring an even richer planning map for the IW Council, who provide (albeit in a hard to find way) audio recordings of council planning meetings. You can find the proof of concept here: Barriers to Open Availability of Information? IW Planning Committee Audio Recordings.)

Roadworks feeds might be another useful service? Elgin (the electronic local government information network) is one source of this information, although their results listings aren’t available as RSS, and in constructing the URLs for the search results, you need to know the Local Authority Area number :-( (Is there a straightforward list of these available anywhere?)

As well as the opening up of the Mash the State Campaign, I also spotted this week that the UK Parliament website was now providing RSS feeds detailing the progress of every bill currently going through Parliament:

Having the RSS feed means it’s trivial to create a timeline view of a Bill’s progress using a service such as Dipity. So for example, here’s a timeline depicting the progress of the Coroners and Justice Bill:

Coroners and Justice bill timeline http://www.dipity.com/psychemedia/Coroners-and-Justice

(I’m not sure if there’s an official way of tracking amendments to already enacted Acts? If not, here’s a workaround I put together some time ago – Tracking UK Parliamentary Act Amendments – although I don’t know whether it’s still working?)

PS this looks like an interesting related collection of links: Mashups in government; and this post – Sign up, sign up for Open Source – describes some innovative looking local council projects (I like the idea of a planning application tracker, cf. the government Bill tracker, maybe?)

PPS Although the percentage of councils that currently have autodiscoverable feeds on their homepage is quite low, it’s still a better uptake than for HEIs: Back from Behind Enemy Lines, Without Being Autodiscovered(?!) and Autodiscoverable RSS Feeds From HEI Library Websites. See also 404 “Page Not Found” Error pages and Autodiscoverable Feeds for UK Government Departments.

Author: Tony Hirst | Posted on April 11, 2009 | Categories: Policy, Radical Syndication | 3 comments

Ordered Lists of Links from delicious Using Yahoo Pipes

One of the things that I often use the delicious social bookmarking service for is to push lists of links into web pages, web dashboards, or the feedshow link presenter. However, sometimes it’s important to be able to push the links in a particular order (particularly for the link presenter) rather than the order in which the links were bookmarked (i.e. order by timestamp based on when the bookmark was saved).

So a couple of days ago it occurred to me that I should be able to do this with a simple Yahoo Pipe, by using tags to order the sequence of links and sorting on those. For anyone who remembers programming in BASIC, and numbering the lines 10, 20, 30 (or 100, 200, 300) to give yourself “room” to insert additional lines, the following convention may be familiar…

STEP 1: tag your links according to the convention: ORDERLABEL:nnn. So for example, to provide raw testing material for my pipe I tagged three links with the following variants: orderA:1000, orderB:120, orderC:103; orderA:3000, orderB:110, orderC:102; and orderA:2000, orderB:130, orderC:101. It also makes sense to tag each item with just ORDERLABEL, so you can pull out just those items from delicious.

STEP 2: build the pipe. My idea here was to grab the list of tags for each link as a string, use a regular expression to parse out just the sequence number provided by the order label (e.g. orderA, orderB or orderC in my test case) from that string, and then sort the feed on those numbers…

Unfortunately, delicious doesn’t emit all the tags in a single element (at least, not as far as Yahoo! Pipes are concerned):

And even more unfortunately (for me at least), I don’t know an effective way of combining these sub-elements into a single element. (The Sub-element pipe operator will convert every item in each category subelement list to an element in its own right, but that’s not a lot of use, as I don’t know how to copy the title, link and description elements into each category subelement…)

So what to do?

Well, it turns out you can use this sort of construction in a regular expression block:
${category.0.content}
which says “use the content of the 0’th category subelement”.

Which means in turn that if I refer to each of n tags explicitly (as in: ${category.n.content}), I can construct a single string containing all n categories (i.e. all the tags in a single string).

We copy the title element as an element of convenience to create an order element within the feed. The string block constructs a single replacement string for the regular expression that will replace the original contents of the order element with the content element from the first 16 category subelements. Following the regular expression replacement, the order element now contains up to the first 16 tags associated with the element in a single string.

The next step is to filter the feed so that we only pass elements that contain tags that are based on the ORDERLABEL root (in this case, I am sorting on things like orderA:1000, orderA:2000, etc):

(Remember that we could use another tag (I used orderedfeedtest in this example) to pull in all the orderA:nnn tagged bookmarks.)

The appropriately order-number-tagged elements are then processed so that the order element is rewritten with just the “line number” for each feed item (so e.g. orderA:2000 would become 2000), and the items in the feed are sorted using this element.

By specifying the appropriate ordering label, we can force the order in which feed items are displayed:

And then:

You can find the pipe here: delicious feed ordered by “tag line numbers”.
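Outside the Pipes environment, the same trick only takes a few lines of JavaScript; here is a sketch, assuming each bookmark item carries an array of tags:

// pull out the ORDERLABEL:nnn tag (e.g. 'orderA:1000') and sort on the number
function sortByOrderTag(items, orderLabel) {
  var re = new RegExp('^' + orderLabel + ':(\\d+)$');
  var tagged = [];
  for (var i = 0; i < items.length; i++) {
    for (var j = 0; j < items[i].tags.length; j++) {
      var m = items[i].tags[j].match(re);
      if (m) {
        items[i].order = parseInt(m[1], 10);
        tagged.push(items[i]);
        break;
      }
    }
  }
  return tagged.sort(function (a, b) { return a.order - b.order; });
}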

Author: Tony Hirst | Posted on April 22, 2009 | Categories: Pipework, Radical Syndication, Tinkering | 7 comments

Mashlib Pipes Tutorial: 2D Journal Search

[This post is a more complete working of Mash Oop North – Pipes Mashup by Way of an Apology]

Keeping current with journal articles in a particular subject area is one of the challenges that faces many researchers, and by implication the academic and research librarians tasked with supporting the information needs of those researchers.

This relatively simple recipe shows how to create a “two dimensional” search that allows a user to provide two sets of keywords, one to identify a set of journals in a particular subject area, the other to filter the current articles down to a particular subtopic in that subject area.

What this demo shows is:
– how to pull a list of journals in a particular subject area, identified using user provided keywords, into the Yahoo Pipes environment;
– how to pull the most recent tables of contents for those journals into that environment;
– how to then filter those recent articles to display only articles on a particular subtopic.

The starting point for this recipe is jOPML, a service created by Scott Wilson that allows you to run a keyword search over the titles of journals whose tables of contents are made available as RSS on ticTOCs, and generate an OPML feed containing the RSS feed URLs for those journal TOCs. (OPML is an XML formatted language that, among other things, can be used to transport bundles of RSS feed URLs around the web. In much the same way that RSS is one of the most effective ways of moving sets of links to web pages around the web (as, for example, in the case of RSS feeds from social bookmarking sites such as delicious), so OPML is one of the best ways of moving collections of RSS links around.)

Now as well as consuming RSS feeds, Yahoo Pipes can also pull in other data formats. So for example:

– the Fetch Feed block can pull in a wide variety of RSS flavoured formats (different versions of RSS, Atom etc); [Handy tip – a pipe that just wires a Fetch Feed block directly to the pipe output can be used to “normalise” different flavours of RSS/Atom in order to provide a single, standard feed format at the output of the pipe.]

– Fetch Data can be used to import XML and JSON into the pipes environment (with Fetch CSV pulling in CSV data files, from sources such as Google Spreadsheets);

– Fetch Page can be used to load HTML web pages into Yahoo Pipes, providing the means by which to develop simple screen scraping applications within the Pipes environment.

What this means is that we can pull in the OPML file generated by jOPML into the Yahoo Pipes environment and have a play with it :-)

So let’s see how. To start with, we need to find a way of getting arbitrary OPML files out of jOPML. Running a search for science history on jOPML returns:

with the OPML available here: http://jopml.org/feeds.opml?q=science+history

Looking at this URI, you’ll hopefully see that it contains the search terms used to query the journals database on jOPML. In effect, the URI is an API to the jOPML service. By rewriting the URI, we can make different calls on the jOPML service, and return different OPML files for different topic areas.
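In code terms, that “API” amounts to nothing more than string construction; a minimal sketch (the URL pattern is the one shown above – whether it holds for every query is an assumption):

function jopmlUrl(keywords) {
  return 'http://jopml.org/feeds.opml?q=' + encodeURIComponent(keywords);
}
// jopmlUrl('science history') -> 'http://jopml.org/feeds.opml?q=science%20history'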

AS-AN-ASIDE TAKE HOME POINT: many URIs effectively provide an API to a web service. If you ever see a search form, run some queries using it, and look at the URIs of the results pages. If you can see your search terms in the URI, you are now in a position to construct your own queries to that service simply by using the URI, rather than having to go via the search form.

Here are a couple of services you can try this out with:
– Google: http://google.com;
– Twitter: http://search.twitter.com.
Remember, the goal is to:
1) run a search;
2) look at the URI of the results page and see if you can spot the search terms;
3) try to hack the URI to run a search using a new set of search terms.
Are there any other hackable items in the URI? For example, run several Twitter searches returning different numbers of search results, and look at the URI in each case. Can you see how to hack it to return the number of results items that you want? (Note that there is a hard limit set at the Twitter end that defines the maximum number of results that can be returned.)

It’s not just search terms that appear in URIs either. For example, the ISBN Playground will generate a wide variety of URIs that are all “keyed” using an arbitrary ISBN. (Actually, that’s not quite true; many of them require ISBN 10 format ISBNs. But there are ways around that, as I’ll show in a later post…) If I’m missing any URIs you know of that contain ISBNs, please let me know in a comment to this post ;-)

Anyway, that’s more than enough of that! Let’s go back to the 2D journal search recipe, and let the pipework begin…

The main idea behind Yahoo Pipes is to “wire” together different components in order to complete some sort of task. When you create a new Yahoo pipe you are presented with an empty canvas that dominates the screen, on which to create your “pipe”, and a menu area on the left that contains the different blocks you can use to build it.

Blocks are added to the canvas either by dragging them from the menu area and dropping them on the canvas, or by clicking the + symbol on the block you want in the menu area, which adds it to the canvas automatically.

Blocks are wired together by clicking on the circle on the bottom of a block and dragging and dropping the “wire” onto a circle on the top of the next block in your pipe.

The idea is that content flows through one block into the next, entering the block along its top edge, being processed by the block as appropriate, and then passing out through the bottom edge of the block.

Blocks that do not have an input circle on the top edge are used to pull content into the pipe from elsewhere. (These can be found in the Sources part of the menu panel.)

In contrast, the Pipe Output block does not have any circles on its lower edge – the output from this block is exposed to the outside world on the pipe’s public home page. (The single Pipe Output block is added to the canvas automatically when you add an input block. Pipes can have multiple input blocks, but only one output block.)

(If this sort of interaction design appeals to you – that is, “wiring” separate components together in some sort of linear workflow – a Javascript library is available that implements the drag, drop and wire features, so you can implement an interface similar to the Yahoo Pipes interface in your own web applications: WireIt – a Javascript Wiring Library. To see WireIt in action, check out Tarpipe.)

So where do we start? The first thing to do is to construct the URI to the OPML feed that we can then use to pull in the OPML feed for a set of journals on a particular topic:

If you highlight a block by clicking on it, it will glow orange. You can then inspect the output from just that block by looking in the preview pane at the bottom of the screen:

The “Journal keywords (text)” block is actually a Text input block:

The URL Builder constructs a URL from the fixed elements of the URI (the page location http://jopml.org/feeds.opml and the query variable q) and the user-inputted search terms. The user inputs are exposed as text entry boxes on the front page of the pipe, as well as by arguments in the URI for the pipe (in the same way that the query terms appear in the jOPML URIs).

In order to import the contents of the jOPML file, we can use the Fetch Data block.

To see what we’ll be working with, here’s what an original OPML file looks like:

If we load this XML file into Pipes, we need to tell the Fetch Data block which parts of the OPML file it should use as separate items within the pipe. Looking at the OPML file, we ideally want each journal to be represented as a separate item within the pipe. We do this by specifying the path to the outline element in the OPML feed, noting that each journal listing is represented using a separate outline element.

Within the pipes environment, the OPML file is represented as follows:

Each outline element contains information regarding a single journal – its title, xmlUrl, and so on. The xmlUrl element contains the URI of the RSS feed for the contents of the current issue of the particular journal – that is, the xmlUrl points to the RSS feed of the journal on the publisher’s site.

So for example, the RSS version of the TOCs for the journal The British Journal for the History of Science can be found at http://journals.cambridge.org/data/rss/feed_BJH_rss_2.0.xml.
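If you wanted to do the same thing outside Pipes – in the browser, say – the Fetch Data step amounts to picking out those outline elements. Here is a sketch using the standard DOMParser:

// parse an OPML string and return one {title, xmlUrl} object per journal
function journalsFromOpml(opmlText) {
  var doc = new DOMParser().parseFromString(opmlText, 'text/xml');
  var outlines = doc.querySelectorAll('outline[xmlUrl]');
  var journals = [];
  for (var i = 0; i < outlines.length; i++) {
    journals.push({
      title: outlines[i].getAttribute('title') || outlines[i].getAttribute('text'),
      xmlUrl: outlines[i].getAttribute('xmlUrl')
    });
  }
  return journals;
}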

Now, you could of course subscribe to all these journal table of contents feeds simply by importing the OPML file into an RSS reader such as Google Reader, but where would the fun be in that? After all, most of the time I’m not actually that interested in most of the articles in any particular journal. It would be far more efficient (?!) if I were only alerted to articles in my subject area. So let’s see how to do that…

The Loop block lets us work with each item in the selected journals feed. Essentially, it says “for each item in a feed, do something to or with that item”. (For each is a really powerful idea in computational thinking. It does pretty much exactly what it says on the tin: for each item in a list, do something with it. In the Yahoo Pipes environment, the Loop block essentially implements for each):

You’ll see that the loop block has a space for adding another block – the block whose functionality will be applied to each element in the incoming feed. As well as placing ‘standard’ pipes blocks taken from the blocks menu in a Loop element, you can also use pipes you have created yourself.

If we embed a Fetch Feed block in the Loop, then for each journal item identified in the imported OPML feed, we can locate its TOCs RSS feed URI (the xmlUrl element) and use it to fetch the contents of that feed.
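Translated out of the Pipes environment, the Loop + Fetch Feed combination is just a “for each” over the journal list; a sketch (assuming a fetch-style HTTP call is available, as in a modern browser or Node):

// for each journal identified in the OPML, fetch its TOC RSS feed
async function fetchAllTocs(journals) {
  var tocs = [];
  for (var journal of journals) {
    var response = await fetch(journal.xmlUrl);
    tocs.push({ title: journal.title, rss: await response.text() });
  }
  return tocs;
}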

Now, you may notice that the Loop block can output the results of the Fetch Feed call in one of two ways. It can either annotate the original feed items, for example by assigning (that is, adding) the current list of contents for a journal to a subelement of each item in the pipe:

In more abstract terms, we might represent that as follows:

Or it can emit the items, which is to say that each item that comes into the Loop block is replaced by the set of items that were created within the Loop block:

Here’s how that looks diagrammatically:

Because I want to produce a feed that just contains links to articles that may be of interest to me, we’re going to use the “emit all results” option.

So let’s just recap where we are. Here’s the pipe so far:

We start by taking some user keyword terms and construct a URI that calls the jOPML service, returning an OPML file that contains the titles and TOC RSS URLs of journals related to those keywords. We then loop through that list of journals, replacing each journal item with a list of items corresponding to the current table of contents of each journal. These items are pulled from the table of contents RSS feed for each journal as obtained from the ticTOCs listings.

The next step is to filter the contents list so that we only get passed journal articles on a particular topic. We’ll do that using a crude keyword filter that only lets articles through whose contents contain a particular keyword or set of keywords.

Taking the filter block, we wire in another user input that allows the user to specify keywords that must appear in the title element of an article for it to be emitted from the pipe, and take the output from this filter to the output of the pipe.
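Sketched as a plain predicate, the Filter block’s logic looks something like this (article objects with a title element are assumed):

// keep only articles whose title contains every user-supplied keyword
function byTitleKeywords(keywords) {
  return function (article) {
    return keywords.every(function (k) {
      return article.title.toLowerCase().indexOf(k.toLowerCase()) !== -1;
    });
  };
}
// e.g. articles.filter(byTitleKeywords(['genetics']))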

So there we have it: a 2D search that takes two sets of keywords, one set that pulls out likely suspect journals on a topic, and the second set that filters articles from those journals on a more detailed subject.

The output from the pipe is then available as an RSS feed in its own right, as a Google personal (iGoogle) widget, etc etc.

The whole pipe looks like this:

It works by generating a jOPML URI based on user provided keyword terms, importing the jOPML feed into the pipe, grabbing the RSS feed of the table of contents for each journal specified in the OPML feed and then filtering those contents listings using another set of keyword terms based on the title of each article.

In doing so, you have seen how to use the URL Builder block to construct the jOPML URI using user provided search terms entered via a Text Input block; the Fetch Data block to grab the jOPML XML feed; the Loop and Fetch Feed blocks to pull in the table of contents RSS feed from the journal publisher for each journal identified in the jOPML feed; and the Filter block to pass through only those articles that contain a second set of user specified keywords in the article title.

Enjoy :-)

PS If I manage to blag being able to run a Library Mashup uncourse in the autumn, this is about the level of detail of post I was planning to write. So – too much detail? Not enough? Just right? How’s it for levelling? Appropriate for a “not necessarily techie, but keen to learn” audience?

Author: Tony Hirst | Posted on July 9, 2009 / August 6, 2012 | Categories: Pipework, Radical Syndication, Tutorial | Tags: jOPML, mashlib, mashlib09, ticTOCs | 8 comments

Feed Powered Auto-Responders

A few weeks ago, I got my first “real” mobile phone, an HTC Magic (don’t ask; suffice to say, I wish I’d got an iPhone :-( ), and as part of the follow-up service from the broker (Phones4U – I said I might be tempted to recommend them, so I am) I got a “will you take part in a short customer satisfaction survey” type text message.

So when I responded (by text) I immediately got the next message in the sequence back as a response.

That is, the SMS I sent back was caught and handled by an auto-responder, that parsed my response, and automatically replied with an appropriate return message.

Auto-responders are widely used in email marketing and instant messaging environments, of course, and as well as acting in a direct response mode, can also be used to schedule the delivery of outgoing messages either according to a fixed calendar schedule (a bulk email to all subscribers on the first of the month, for example) or according to a more personalised, relative time schedule.

So for example, a day or two after getting my new phone, Vodafone started sending me texts about how to use my phone on their network*, presumably according to a schedule that was initiated when I registered the phone for the first time on the network; and the Phones4U courtesy chase up was presumably also triggered according to some preset schedule.

* something sucks here, somewhere: I keep finding my phone has connected to other, rival networks, and as such seems to spend large amounts of its time roaming, even when in a Vodafone signal area. Flanders – you owe me for making such a crappy recommendation… and Kelly, you have something to answer for, too…

So, these auto-scheduled, auto-responding systems are exactly the same idea as daily feeds: whenever you subscribe, a clock starts ticking and content is delivered to you according to a predefined schedule via that same channel.

In a true autoresponder, of course, the next mailing in a predefined sequence is sent in response to some sort of receipt from the recipient, rather than a relative time schedule, and in the case of autoresponding feeds this can be supported too if the feed scheduler supports unique identifiers for each subscription.

(The simplest daily feed system has a subscription URL that contains the start date; content is then delivered according to a relative time schedule that starts on the date contained in the subscription URL. A more elaborate syndication platform would use a unique identifier in the subscription URL, and the content delivery schedule is then tied to the current state of the schedule associated with that unique identifier.)

So how might a feed autoresponder work? How about in the same way as a feed stats package such as Feedburner? These measure ‘reach’ by inserting a small image at the very end of each feed item that is loaded whenever the feed item is viewed. By tracking how many images are served, it’s possible to get an idea of how many times the feed item was viewed.

The same mechanism can be used as part of a feed auto-responder system: for a subscription via a URI that contains a unique identifier, serve an image with a unique, obfuscated (impossible to guess at, and robots excluded) filename for each item. When the image is polled from a browser client, assume that the subscriber has read that item and publish the next item to the feed after a short delay. The next time the user visits their feedreader, the next item should be there waiting for them.
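Here is a sketch of what that trigger might look like server side – a hypothetical Node.js handler, with a stubbed-out store standing in for whatever persistence layer tracks the subscription schedules:

var http = require('http');

// 1x1 transparent GIF, served as the tracking image
var ONE_PIXEL_GIF = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');

// stub persistence layer for the sketch
var store = {
  markRead: function (sub, item) { console.log('read', sub, item); },
  scheduleNextItem: function (sub) { console.log('schedule next item for', sub); }
};

http.createServer(function (req, res) {
  // hypothetical URL scheme: /img/<subscriberId>/<obfuscated-item-id>.gif
  var parts = req.url.split('/');
  var subscriberId = parts[2], itemId = parts[3];
  store.markRead(subscriberId, itemId);   // treat the image request as a read receipt
  store.scheduleNextItem(subscriberId);   // queue the next item after a short delay
  res.writeHead(200, { 'Content-Type': 'image/gif' });
  res.end(ONE_PIXEL_GIF);
}).listen(8080);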

PS Note that someone somewhere has probably patented this, although as a mechanism it’s been around and blogged about for years (prior art doesn’t seem to be respected much in the world of software patents…) If you have a reference, please provide a link to it in the comments to this post.

Author: Tony Hirst | Posted on July 20, 2009 / July 15, 2009 | Categories: Radical Syndication, Thinkses | Tags: feed based autoresponder, RSS autoresponder

Content Transclusion: One Step Closer

Following a brief exchange with @lesteph last night, I thought it might be worth making a quick post about the idea of content or document transclusion.

Simply put, transclusion refers to the inclusion, or embedding, of one document or resource in another. To a certain extent, embedding an image or a YouTube video in a page is a form of transclusion. (Actually, I’m not sure that’s strictly true? But it gets the point across…)

Whilst doing a little digging around for references to fill out this post, I came across a nicely worked example of transclusion from Wikipedia – Transclusion in Wikipedia

[Image: content transclusion in Wikipedia]

The idea? You can embed the content of any Wikipedia page in any other Wikipedia page. And presumably the same is true within any Mediawiki installation.

That is, in a MediaWiki wiki:

you can embed the content of any one page in any other page.
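(For reference, the syntax involved is MediaWiki’s double-brace construction: to transclude an ordinary page from the main namespace – rather than a template – you include it with a leading colon, along the lines of {{:Name of the source page}}, where the page name is obviously just a placeholder.)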

(I’m not sure if one MediaWiki installation can transclude content from any other MediaWiki installation? I assume it can???)

It’s also possible to include (that is, transclude) MediaWiki content in a WordPress environment using the Wiki Inc plugin. A compelling demonstration of this is provided by Jim Groom, who has shown how to republish documentation authored in a wiki via a WordPress page, an approach we adopted in our WriteToReply Digital Britain tinkerings.

One of the things we’ve started exploring in the JISCPress project is the ability to publish each separate paragraph in a document (each with its own URI) in a variety of formats – txt, JSON, HTML, XML. That is, we have (or soon will have) an engine in place that supports the “publishing” side of paragraph level transclusion of content from reports published via the JISCPress/WTR platform. Now all we need is the transclusion (re-presentation of transcluded content) part, to be able to transclude content from one document in another. (See Taking the Conversation Elsewhere – Embedded Quotes; see also Image Based Quotes from WriteToReply Using Kwout for a related mashup.)

(Hmm – although Joss won’t like this, I do think we need a [WTR-include=REF] shortcode handler installed by default in WTR/JISCPress that will pull paragraph level content into one document from a document elsewhere on the local platform?)

Now this is really what hypertext is about – URIs (that is, links) that can act as portals, pulling content into one location from another. It may be, of course, that the idea of textual transclusion is just too confusing for people. But it’s something we’re going to explore with WriteToReply.

And one of the things we’re looking at for both WriteToReply and JISCPress is the use of semantic tagging to automatically annotate parts of the document (at the paragraph level, if possible?) so that content on a particular topic (i.e. tagged in a particular way) in one document can be automatically transcluded in – or alongside – a related paragraph in a separate document. (Hmm – maybe we need a “related paragraphs” panel, cf. the comments panel, that could display transcluded, related paragraphs from elsewhere in the document or from other documents?)

PS If you have an hour, here’s the venerable Ted Nelson giving a Google Tech Talk on the topic of transclusion:

Enjoy…

PPS here’s an old library that provides a more general case framework for content transclusion: Purple Include. I’m not sure if it still works though?

PPPS Here’s the scary W3C take on linking and transclusion ;-) This is also interesting: auto/embed is not node transclusion

PPPPS for another take on including content by reference, see Email By Reference, Not By Value, or “how I came up with the idea for Google Wave first”;-)

PPPPPS Seems like eprints may also support transclusion… E-prints – VLit Transclusion Support.

Author: Tony Hirst | Posted on August 7, 2009 / October 23, 2016 | Categories: Radical Syndication, WriteToReply | Tags: Actually, JISCPress, transclusion | 7 comments
