A recent post on the journalism.co.uk site asks: How much computer science does a journalist really need?, commenting that whilst coding skills may undoubtedly be useful for journalists, knowing what can be achieved easily in a computational way may be more important, because there are techies around who can do the coding for you… (For another take on this, see Charles Arthur’s If I had one piece of advice to a journalist starting out now, it would be: learn to code, and this response to it: Learning to Think Like A Programmer.)
Picking up on a few thoughts that came to mind around a presentation I gave yesterday (Web Lego And Format Glue, aka Get Yer Mashup On), here’s a slightly different take on it, based on the idea that programming doesn’t necessarily mean writing arcane computer code.
Note that a lot of what follows I’d apply to librarians as well as journalists… (So for example, see Infoskills for the Future – If You Can’t Handle Information, Get Out of the Library for infoskills that I think librarians as information professionals should at least be aware of (and these probably apply to journalists too…); Data Handling in Action is also relevant – it describes some of the practical skills involved in taking a “dirty” data set and getting it into a form where it can be easily visualised…)
So here we go…. An idea I’ve started working on recently as an explanatory device is the notion of feed oriented programming. I appreciate that this probably already sounds scary geeky, but it’s a made up phrase and I’ll try to explain it. A feed is something like an RSS feed. (If you don’t know what an RSS feed, this isn’t a remedial class, okay… go and find out… this old post should get you started: We Ignore RSS at OUr Peril.)
Typically, an RSS feed will contain a set of items, such as a set of blog posts, news stories, or bookmarks. Each item has the same structure in terms of how it is represented on a computer. Typically, the content of the feed will change over time – a blog feed represents the most recent posts on a blog, for example. That is, the publisher of the feed makes sure that the feed has current content in it – as a “programmer” you don’t really need to do anything to get the fresh content in the feed – you just need to look at the feed to see if there is new content in it – or let your feed reader show you that new content when it arrives. The feed is accessed via a web address/URL.
Some RSS feeds might not change over time. On WriteToReply, where we republish public documents, it’s possible to get hold of an RSS version of the document. The document RSS feed doesn’t change because the content of the document doesn’t change), although the content of the comment feeds might change as people comment on the document.
A nice thing about RSS is that lots of things publish it, and lots of things can import it. Importing an RSS feed into an application such as Google Reader simply means pasting the web address of the feed into a “Subscribe to feed” box in the application. Although it can do other things too, like supporting search, Google Reader is primarily a display application. It takes in RSS feeds and presents them to the user in an easy to read way. Google Maps and Google Earth are other display applications – they display geographical information in an appropriate way, a way that we can readily make sense of.
So what do we learn from this? Information can be represented in a standard way, such as RSS, and displayed in a visual way by an application that accepts RSS as an input. By subscribing to an RSS feed, which we identify by a fixed/permanent web address, we can get new content into our reader without doing anything. Subscribing is just a matter of copying a web address from the publisher’s web site and pasting it into our reader application. Cut and paste. No coding required. The feed publisher is responsible for putting new content into the feed, and our reader application is responsible for pulling that new content out and displaying it to us.
One of the tools I use a lot is Yahoo Pipes. Yahoo Pipes can take in RSS feeds and do stuff with it; it can take in a list of blog posts as an RSS feed and filter them so that you only get posts out that do – or don’t – mention cats, for example. And the output is in the form of an RSS feed…
What this means is that if we have a Yahoo pipe that does something we want in computational terms to an RSS feed, all we have to do is give it the web address of the feed we want to process, and then grab the RSS output web address from the Pipe. Cut and paste the original feed web address into the Pipe’s input. Cut and paste the web address of the RSS output from the pipe into our feed reader. No coding required.
Another couple of tools I use are Google Spreadsheets (a spreadsheet application) and Many Eyes WIkified (an interactive visualisation application). If you publish a spreadsheet on Google docs, you can get a web address/URL that points to a CSV (comma separated variable) version of the selected sheet. A CSV file is a simple text file where each spreadsheet row is a represented as a row in the CSV structured text file; and the value of each cell along a row in the original spreadsheet is represented as the same value in the text file, separated from the previous value by a comma. But you don’t need to know that… All you do need to know is that you can think of it as a feed… With a web address… And in a particular format…
Going to the “Share” options in the spreadsheet, you can publish the sheet and generate a web address that points to a range of cells in the spreadsheet (eg: B1:D120) represented as a CSV file. If we now turn to Many Eyes Wikified, I can provide it with the web address of a CSV file and it will go and fetch the data for me. At the click of a button I can then generate an interactive visualisation of the data in the spreadsheet. Cut and paste the web address of the CSV version of the data in a spreadsheet that Google Docs will generate for me into Many Eyes Wikified, and I can then create an interactive visualisation using the spreadsheet at the click of a button. Cut and paste a URL/web address that is generated for me. No coding required.
As to where the data in the spreadsheet came from? Well maybe it came from somewhere else on the web, via a URL? Like this, maybe?
So the model I’m working towards with feed oriented programming is the idea that you can get the web address of a feed which a publisher will publish current content or data to, and paste that address in an application that will render, or display the content (e.g. Google Reader, Many Eyes Wikified) or process/transform that data on your behalf.
So for example, Google Reader can transfrom an HTML table to CSV for you; (Google spreadsheets also lets you do all the normal spreadsheet things, so you could generate one sheet from another sheet using whatever spreadsheet formulae you like, and publish the CSV representation of that second sheet). Or in Yahoo Pipes, you can process an RSS feed by filtering its contents so that you only see posts that mention cats.
Yahoo Pipes offers other sorts of transformation as well. For example, in my original Wikipedia scraping demo, I took the feed from a Google spreadsheet and passed it to Yahoo Pipes where I geocoded city names and let pipes generate a map friendly feed (known as a KML feed) for me. Copying the web address of the KML feed output from the pipe and pasting it into Google Maps means I can generate an embeddable Google map view of data originally pulled from Wikipedia:
Once you start to think of the world in this way:
– where the web contains data and information that is represented in various standard ways and made available via a unique and persistent web address,
– where web applications can accept data and content that is represented in a standard way given the web address of that data,
– where web applications can transform data represented at one web address in one particular way and republish it in another standard format at another web address,
– or where web applications can take data represented in a particular way from one web adress and provide you with the tools to then visualise or display that data,
then the world is your toolbox. Have URL, will travel. All you need to know is which applications can import what format data, and how they can republish that data for you, whether in a different format, such as Google spreadsheets taking an HTML table from Wikipedia and publishing it as a CSV file, or as a visualisation/human friendly display (Many Eyes Wikified, Google Reader). And if you need to do “proper” programmer type things, then you might be able to do it using a spreadsheet formula or a Yahoo Pipe (no coding required…;-)