Crowd Sourcing a Promotion Case…

So racked with embarrassment at doing this (’tis what happens when you don’t publish formally, don’t get academic citations in the literature, and don’t have a “proper” academic impact factor;-) I’m going to take the next 10 days off in a place with no internet connection…. but anyway, here goes: an attempt at crowd-sourcing parts of my promotion case….

It’s All About Flow…

One of the compelling features of Yahoo Pipes for me is the way the user interface encourages you to think of programming in terms of pipelines and feeds, in which a bundle of stuff (RSS feed, CSV data, or whatever) is processed in a sequence of steps (the pipeline), with each step being applied to each item in the feed.
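To make the pipeline-and-feed idea a bit more concrete, here's a minimal sketch in plain Python using generators. It is not what Yahoo Pipes or pipe2py actually produce; the step names (`fetch_items`, `filter_items`, `rename_field`) and the example feed are made up purely for illustration.

```python
def fetch_items(rows):
    """Source step: hand over the items in the feed one at a time."""
    for row in rows:
        yield dict(row)  # copy, so later steps never modify the original feed


def filter_items(items, key, wanted):
    """Filter step: pass on only items whose `key` field mentions `wanted`."""
    for item in items:
        if wanted.lower() in item[key].lower():
            yield item


def rename_field(items, old, new):
    """Rename step: move the value stored under `old` to `new` in each item."""
    for item in items:
        item = dict(item)
        item[new] = item.pop(old)
        yield item


# A made-up two-item feed (the links are placeholders).
feed = [
    {"title": "Yahoo Pipes Code Generator (Python)", "link": "http://example.com/1"},
    {"title": "My Understanding of SPARQL", "link": "http://example.com/2"},
]

# Wire the steps together: each step is applied to each item as it flows through.
pipeline = rename_field(filter_items(fetch_items(feed), "title", "pipes"), "link", "url")
result = list(pipeline)
print(result)
```

Because every step is a generator, nothing is processed until the final `list()` call pulls items through the whole chain, which is roughly how I picture items flowing through a pipe.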

A few days ago I blogged about pipe2py, a toolkit from Greg Gaughan that lets you “compile” a simple Yahoo pipe into a Python code equivalent programme (Yahoo Pipes Code Generator (Python)). Given that, in general, I don’t believe the “build it and they will come” mantra, I spent half an hour or so this morning looking round the web for people who had posted queries about how to generate code equivalents of Yahoo Pipes, so that I could point them to pipe2py.

In doing so, I came across a couple of other visual pipeline environments that are maybe worth looking at in a little more detail.

PyF is a “[flow based] open source Python programming framework and platform dedicated to large data processing, mining, transforming, reporting and more.”

PyF - flow based Python programming

On the other hand, Orange claims to offer “[o]pen source data visualization and analysis for novice and experts. Data mining through visual programming or Python scripting. Components for machine learning. Extensions for bioinformatics and text mining. Packed with features for data analytics.”

Here’s one of their promo shots:

Orange - piped visual data analysis

I haven’t had a chance to play with either of these environments – and probably won’t for a little time yet – so whilst I feel like I’m cheating by posting about them in such a cursory way without having even a simple demo to show, they’re maybe of interest to anyone who stumbles across this blog by way of pipe2py… [Update: see my Orange Visualisation tool review.]

PS as well as PyF, see also: Pypes [via @dartdog]

My Understanding of SPARQL, the First Attempt…

(This one’s for Mike…) [Disclaimer: some or all of this post may be wrong…;-)]

So (and that was for Niall)… so: here’s what I think I know about SPARQL; or at least, here’s what I think I know about SPARQL and what I think is all you need to know to get started with SPARQL.

SPARQL is a query language that can interrogate an RDF triple store.

An RDF triple store contains facts as 3 parts of a whole:

<knownThing1> <hasKnownAttribute> <knownThing2>
<knownThing1> <hasKnownRelationshipWith> <knownThing3>

That’s it, part one…

Let’s imagine a very small triple store that contains the following facts:

<Socrates> <existsAs> <aMan>
<Plato> <existsAs> <aMan>
<Zeus> <existsAs> <aGod>
<aGod> <hasLife> <Immortal>
<aMan> <hasLife> <Mortal>
<Socrates> <isTeacherOf> <Plato>

We can ask the following sorts of questions of this database, using the SPARQL query language (or should that be: using the SPAR Query Language?!)

Who is <aGod>?
select ?who where {
?who <existsAs> <aGod>
}

This should return: <Zeus>

Who is <aMan>?
select ?who where {
?who <existsAs> <aMan>
}

This should return: <Socrates>, <Plato>.

What sort of existence does <Zeus> have?
select ?existence where {
<Zeus> <existsAs> ?existence.
}

This should return: <aGod>.

What sort of life does <aMan> have?
select ?life where {
<aMan> <hasLife> ?life.
}

This should return: <Mortal>.

We only know that <Zeus> exists, but what sort of life does he have?
select ?life where {
<Zeus> <existsAs> ?dummyVar.
?dummyVar <hasLife> ?life.
}

This should return: <Immortal>.

Note that the full stop in the query is important. It means “AND”. [UPDATE: LD folk have taken issue with me saying the dot represents AND. See comment below for my take on this…]

We don’t know much, other than things exist and they have some sort of life: but who has what sort of life?
select ?who ?life where {
?who <existsAs> ?dummyVar.
?dummyVar <hasLife> ?life.
}

This should return: (<Socrates> <Mortal>),(<Plato> <Mortal>),(<Zeus> <Immortal>).

We know <Socrates>, but little more: what is there to know about him?
select ?does ?what where {
<Socrates> ?does ?what.
}

This should return: (<existsAs> <aMan>),(<isTeacherOf> <Plato>).

I think… If someone wants to set up a small triple store to try this out on, that would be handy…
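In the absence of a real triple store, here's a minimal sketch of a toy one in plain Python, so the queries above can at least be tried out. It is not SPARQL and not a real RDF library: the names `TRIPLES`, `match_pattern` and `select` are my own inventions, facts are just tuples of strings, and anything starting with `?` is treated as a variable. Each extra pattern narrows down the bindings found so far, which is how I read the full stop in the queries above.

```python
# The toy store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Socrates", "existsAs", "aMan"),
    ("Plato", "existsAs", "aMan"),
    ("Zeus", "existsAs", "aGod"),
    ("aGod", "hasLife", "Immortal"),
    ("aMan", "hasLife", "Mortal"),
    ("Socrates", "isTeacherOf", "Plato"),
}


def match_pattern(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A known thing: it has to match exactly.
            return None
    return new


def select(variables, patterns, triples=TRIPLES):
    """Return sorted rows of values for `variables` that satisfy every pattern."""
    bindings = [{}]  # start from a single, empty set of bindings
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted({tuple(b[v] for v in variables) for b in bindings})


# Who is <aMan>?
print(select(["?who"], [("?who", "existsAs", "aMan")]))

# Who has what sort of life? (two patterns joined on ?dummyVar)
print(select(["?who", "?life"], [
    ("?who", "existsAs", "?dummyVar"),
    ("?dummyVar", "hasLife", "?life"),
]))
```

Rows come back sorted alphabetically, so the order may differ from the order I wrote the expected answers in above, but the contents should be the same.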

If I’ve gone wrong somewhere, please let me know… (I’m writing this because I can’t sleep!)

If I haven’t gone wrong – that’s a relief… and furthermore: that is all I think you need to know to get started… (Just bear in mind that in real triple stores and queries, the syntactic clutter is a nightmare and is there solely to confuse you and scare you away…;-)

PS @cgutteridge’s Searching a SPARQL Endpoint demonstrates a useful ‘get you started’ query for exploring a real datastore (look for something by name, his example being a search over the Ordnance Survey triple store for things relating to ‘Ventnor’).

Google Translate Equilibrium Finder and Google Books Ngrams

A few days ago, in the post Translate to Google Statistical (“Google Standard”?!) English?, I wondered whether there were any apps that looked for convergence of phrases going from one language, to another, and back again until a limit was reached. A comment from Erik at digitalmethods.net posted a link to Translation Party, a single web page app that looks for limit cycles between English and Japanese (as a default).

Having a look at the source, it seems there’s a switch to let you search for limits between English and other languages too, as the following screenshot shows:

Translation Party - Google translation limit finder http://www.translationparty.com/?lang=fr

(Though I have to admit I don’t fully understand why the phrase in the above example appears to map to two different French translations?!)

Here’s another – timely – example, showing the dangers of this iterative approach to translation…

Translationparty.com

The switch is the URL argument lang=LANGUAGE_CODE, so for example, the French translation can be cued using http://www.translationparty.com/?lang=fr.
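The round-trip idea itself (translate out, translate back, repeat until nothing changes) can be sketched in a few lines of Python. I don't know how Translation Party actually implements it, and I'm not calling any real translation service here: `find_equilibrium` is a hypothetical helper, and the two tiny word tables are made-up, deliberately lossy stand-ins for a translator.

```python
def find_equilibrium(phrase, to_other, to_english, max_rounds=20):
    """Round-trip `phrase` until it stops changing.

    Returns (history, converged): every English phrase seen along the way,
    and whether a stable phrase was reached within `max_rounds`.
    """
    history = [phrase]
    for _ in range(max_rounds):
        round_trip = to_english(to_other(history[-1]))
        if round_trip == history[-1]:
            return history, True  # out and back gives the same phrase: a limit
        history.append(round_trip)
    return history, False  # gave up, e.g. the phrases are cycling


# Made-up word tables standing in for a real translator (an assumption for
# illustration only): several English words collapse onto one foreign word.
EN_TO_OTHER = {"enormous": "grand", "big": "grand", "home": "maison", "house": "maison"}
OTHER_TO_EN = {"grand": "big", "maison": "house"}


def toy_translate(phrase, table):
    """Translate word by word, leaving unknown words untouched."""
    return " ".join(table.get(word, word) for word in phrase.split())


history, converged = find_equilibrium(
    "enormous home",
    to_other=lambda p: toy_translate(p, EN_TO_OTHER),
    to_english=lambda p: toy_translate(p, OTHER_TO_EN),
)
print(history, converged)
```

The `max_rounds` cut-off is there because a round trip need not settle on a single phrase; it could bounce between two or more phrases for ever.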

Another fun toy for the holiday break is the Google Books Ngrams trends viewer, which plots the occurrence of searched-for phrases across a sample of books scanned as part of the Google Books project.

http://ngrams.googlelabs.com/graph?content=philosophy,religion,science&year_start=1800&year_end=2000&corpus=0&smoothing=3

Here’s another one:

http://ngrams.googlelabs.com/graph?content=teenager,pop+music,disco&year_start=1800&year_end=2000&corpus=0&smoothing=3
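Since those chart links are just URLs with a handful of arguments, they're easy enough to generate. Here's a minimal sketch using Python's standard library; `ngram_url` is a name I've made up, the base address and argument names are simply copied from the links above, and I'm assuming (without having checked any documentation) that `corpus` and `smoothing` can be passed through as-is.

```python
from urllib.parse import urlencode

# Base address copied from the example links above.
NGRAMS_BASE = "http://ngrams.googlelabs.com/graph"


def ngram_url(phrases, year_start=1800, year_end=2000, corpus=0, smoothing=3):
    """Build a chart URL for a list of phrases, mirroring the links above."""
    params = {
        "content": ",".join(phrases),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # safe="," keeps the commas between phrases readable; spaces become "+".
    return NGRAMS_BASE + "?" + urlencode(params, safe=",")


print(ngram_url(["teenager", "pop music", "disco"]))
```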

This is reminiscent of other trendspotting tools such as Google Trends (time series trends in Google search), or Trendistic (time series trends in Twitter), which long-time readers may recall I’ve posted about before. (See also: Trendspotting, the webrhythms hashtag archive.)