Approxi-mapping Mash-ups, with a Google MyMaps Tidy Up to Follow

What do you do when you scrape a data set, geocode it so you can plot it on a map, and find that the geocoding isn’t quite as good as you’d hoped?

I’d promised myself that I wasn’t going to keep on posting “yet another way of scraping data into Google spreadsheets then geocoding it with a Yahoo pipe” posts along the lines of Data Scraping Wikipedia with Google Spreadsheets, but a post on Google Maps mania – Water Quality Google Map – sent me off on a train of thought that has sort of paid dividends…

So first up, the post got me thinking about whether there are maps of Blue Flag beaches in the UK, and where I could find them. A link on the UK page of blueflag.org lists them: UK Blue Flag beaches, (but there is a key in the URL, so I’m not sure how persistent that URL is).

Pull it into a Google spreadsheet using:
=ImportHtml("http://www.blueflag.org/tools/beachsearch?q=beach&k={E1BB12E8-A3F7-4EE6-87B3-EC7CD55D3690}&f=locationcategory", "table", 1)

Publish the CSV:

Geocode the beaches using a Yahoo pipe – rather than using the Pipe location API, I’m making a call to the Yahoo GeoPlanet/Where API – I’ll post about that another day…

Grab the KML from the pipe:

Now looking at the map, it looks like some of the markers may be mislocated – like the ones that appear in the middle of the country, hundreds of miles from the coast. So what it might be handy to do is use the scraped data as a buggy, downloaded data set that needs cleaning. (This means that we are not going to treat the data as “live” data any more.)

And here’s where the next step comes in… Google MyMaps lets you seed a map by importing a KML file:

The import can be from a desktop file, or a URL:

Import the KML from the Yahoo pipe, and we now have the data set in the Google MyMap.
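If the pipe isn’t to hand, KML of the sort MyMaps imports can be generated directly from the geocoded rows. Here’s a minimal sketch, assuming the published CSV has columns called name, lat and lon (those column names, and the function name, are my own illustrative assumptions, not what the pipe actually emits):

```python
import csv
import io
from xml.sax.saxutils import escape


def rows_to_kml(csv_text, name_col="name", lat_col="lat", lon_col="lon"):
    """Turn geocoded CSV rows into a minimal KML document, one Placemark per row.

    The column names are hypothetical defaults; change them to match your CSV.
    """
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row[lat_col])
        lon = float(row[lon_col])
        # KML coordinates are written longitude first, then latitude.
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
                escape(row[name_col]), lon, lat
            )
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>\n"
    )
```

Save the returned string to a .kml file and it should be importable in the same way as the pipe output. Getting the longitude/latitude order the wrong way round is one easy way to end up with markers in unexpected places.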

So the data set in the map is now decoupled from the pipe, the spreadsheet and the original Blue Flag website. It exists as a geo data set within Google MyMaps. Which means that I can edit the markers, and relocate the ones that are in the wrong place:

And before the post-hegemonic tirade comes in (;-), here’s an attempt at capturing the source of the data on the Google MyMap.

So, to sum up – Google MyMaps can be used to import an approximately geocoded data set, tidy it up, and republish it.

PS don’t forget you can also use Google Maps (i.e. MyMaps) for geoblogging.

Where is the Open University Homepage?

Several weeks ago, I was listening to one of the programmes delivered to me every week via my subscription to the IT Conversations podcast feed, when I came across this Technometria episode on Search Engine Marketing (if you have a daily commute, it’s well worth listening to on one of your trips this week…).

One of the comments that resonated quite strongly with me, in part because I’ve heard several people in the OU comms team asking several times over the last few months “what’s the point of the OU homepage?”, was that to all intents and purposes, Google is now the de facto homepage for many institutions.

That is, this is the OU homepage for many people:


rather than this:

(As far as I know, very little of our online marketing sends traffic to the homepage – most campaigns send traffic to a URL deeper in the site more relevant to the particular campaign).

Just in passing, a post on Google Blogoscoped today – What Do People Search For? – picked up on an item from Search Engine Land describing a new tool from Google: Search based Keyword Tool.

What this tool does is to “suggest keywords based on actual Google search queries” that are “matched to specific pages of your website”:

Hmmm…. (and yes, that Savings Interest Rates page is on an OU domain…)

PS this search based keyword tool is also in the ball park of Google Trends, Google Insights for Search, and Google Trends for websites, which I’ve been playing with a lot recently (e.g. Playing with Google Search Data Trends and Recession, What Recession?), as well as the Google Adwords keywords tool:

which looks a lot more reasonable than the Search based Keyword tool?!

PPS Again in passing, and something I intend to pick up on a little more in a later post, Yahoo have just opened up a Key Terms service as part of the BOSS platform that will let you see the keywords that Yahoo has used to index a particular web page (Key Terms provide “an ordered terminological representation of what a document is about. The ordering of terms is based on each term’s frequency and its positional and contextual heuristics.”).

Services like Reuters’ OpenCalais already allow you to do ‘semantic tagging’ of free text, and Yahoo’s Term Extraction service also extracts keywords from text. I’m not sure how the BOSS exposed keywords compare with the keywords identified by the Term Extraction service as applied to a particular web page?

If I get a chance to run some tests, I’ll let you know, unless anyone can provide more info in the meantime?

Will Lack of Relevancy be the Downfall of Google?

Every so often, posts come around about new search engines that are going to make a bid to become a Google search killer, but I wonder if the changing nature of the web itself will lead people to a search engine that appears to do search better in those bits of the web that they’re spending most time, and so lead them away from Google?

It’s hard thinking back the 10 years or so to a time before Google, so I’m not sure what prompted me to switch allegiance from Metacrawler to Google? Maybe it was that Google results were dominating the Metacrawler results page? (For those of you who have no idea what I’m talking about, Metacrawler (which lives on to this day, as… MetaCrawler;-) was essentially a federated search engine that pooled results from several early web search engines. Before Metacrawler, I used Webcrawler, which was one of the first search engines to do full text search, I think?)

In those early days, Google won out on producing “better” results in part because of its PageRank algorithm, in part because of its speedy response. PageRank essentially determines the authority of a page by the number of pages that link to it, and the authority of those pages. There’s lots of other voodoo magic in the ranking and relevancy algorithm now, of course, but that was at the heart of what made Google different in the early days.

So Google came good in large part because it used the structure of the web to help people better navigate the web.
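To make the PageRank idea a little more concrete, here’s a toy sketch of the textbook power-iteration version: each page repeatedly shares its current score out among the pages it links to. The damping factor of 0.85 and the tiny link graph in the usage note are illustrative assumptions, and this is certainly not what Google actually runs:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share on each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its authority on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

Calling pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}) ranks c highest, because three pages link to it, and a next, because its single inbound link comes from the most authoritative page.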

But what of the structure of the web now? Many of the recently launched search engines have made great play of being “social” or “people powered” search engines, that leverage personal recommendations to improve search results. The big search engines are experimenting with tools that let searchers “vote up” more relevant results, and so on (e.g. Google’s experiment, or Microsoft’s URank experiment).

But it might be that the nature of recommending a page to someone else is now less to do with publishing a link to another site on a web page or in a blog post, than sharing a link with someone in a more conversational way (though as to how you found that link in the first place – there lies a problem;-)

So although Google won’t be able to snoop on link sharing in “walled garden” social networks like Facebook, I wonder if they are tracking link sharing in services like Twitter? (Google owns rival microblogging site Jaiku, but since buying it, all has been quiet. Maybe they’re waiting for the masses to become conscious of the thing called Twitter, then they’ll go prime time with Jaiku?)

Just by the by, there’s also the “problem” that many shared links are now being obfuscated by URL shortening services, which means that TinyURLs, bit.ly URLs and is.gd URLs all need resolving back to the pages they point to in order to rank those pages. (Hmm…. so when will the Goog be pushing its own URL shortening service, I wonder?)

This link resolution is easy enough to achieve, though. For example, the Twitturly service tracks the most popular links being shared on Twitter over a 24 hour period (I think they also used to let you see who was tweeting about a particular URL, because I built a pipe around it – A Pipe for Twitturly – although that functionality appears to have disappeared?)
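By way of illustration, here’s a minimal sketch of that sort of link resolution: follow the redirects from a shortened URL to find where it ends up, then count shared links by their resolved destination. I have no idea how any particular tracking service does it; the function names are my own, and the expand argument is just a hook so the counting can be tried out without touching the network:

```python
from collections import Counter
import urllib.request


def expand_url(short_url, timeout=10):
    """Follow redirects and return the final URL (needs network access)."""
    request = urllib.request.Request(short_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def most_shared(urls, expand=expand_url, top=10):
    """Count shared links by the page they resolve to.

    urls is a list of (possibly shortened) links, one entry per share.
    expand is any function mapping a short URL to its destination.
    """
    cache = {}  # resolve each distinct short URL only once
    counts = Counter()
    for url in urls:
        if url not in cache:
            cache[url] = expand(url)
        counts[cache[url]] += 1
    return counts.most_common(top)
```

The point of counting after resolution is that two different short URLs pointing at the same page get credited to that one page, rather than being treated as two unrelated links.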

PS Maybe that Jaiku launch moment will actually be on mobile devices – on the iPhone (which now has Google Voice search), and on Android devices? Maybe Jaiku’s relaunch (and remember, Jaiku was heavy on the mobile stuff) will be a defining moment that hails: “the era of the PC [i]s over,… the future belong[s] to cloud applications accessed via phones”, via Daddy, Where’s Your Phone? which also includes a lovely story to illustrate this: a child overhears her dad answering “I don’t know” to a question:

“Daddy, where’s your phone?”

“What do you mean, where’s my phone?” She explained that she’d overheard the question. Why wasn’t he just looking up the answer on his phone?

Cf. the apocryphal story of the child looking behind the TV set for a mouse, and the idea that “a screen without a mouse is broken”… ;-)

Dual View Media Channels

When I was putting together a talk (Users and Demons) for some visitors to the OU Library from the Cambridge University Library Arcadia project (who also put together the Cambridge Library Science Portal) a month or two ago, I included a slide depicting what might be a “typical” user of Library research related services.

//flickr.com/photos/zachklein/320561109/
“Where I work” by Zach Klein

Note the presence of the dual computer screens on the desk – wandering round the various corridors of the OU, it’s surprising how many people are now working with dual screen computers.

But the dual screen view is not just for the office desktop. I now find that I watch television with a laptop on my knee (and looking at my friends’ Tweets, I know some of them are in the habit of watching television with iPod or iPhone to hand (I can tell from the clients that the tweets are posted with)) – dual screen viewing again, though this time with one big screen relaying “pure” video content, and the other information, or a conversational back channel.

I also read Sunday papers with a laptop nearby – for fact checking, story chasing, and related info… Not a dual screen view, but a dual media view: one display surface for “fixed” textual information (the newspaper), one screen, with network connection.

Every time I go to a seminar or conference presentation, and many of the times I go into a meeting, I take a laptop. Dual channel, stereo info… One channel: other people, face-to-face; one channel: a screen and keyboard connection to the net.

And even if 2D Sema codes are not the way to go for Printing Out Online Course Materials With Embedded Movie Links, I’m convinced that dual media channels are going to have a huge impact on the way we deliver educational materials, particularly to distance education students.

In fact, I’d probably go further and suggest that it’s likely that one of the channels will be a predominantly one way, fixed content, information delivery channel (a book, TV programme or lecture, for example), and the other channel will be a two way channel to the net, providing access to supplementary information, user discovered resources, and people – discussion, conversation, and active reflection.

We used to engage with content through making marks on paper – it was called taking notes. We’re going to engage with it in a far more active way, embellishing it and enriching it (not just noting it or annotating it) with supplementary material pulled in and viewed via a screen.

PS if you haven’t checked out the Cambridge University Library Portal, you should do…

The shape of things to come, maybe? It’ll be interesting to see what their web analytics say about the performance of the site? ;-)

PS see also: Daddy, Where’s Your Phone?

Recession, What Recession?

Following on from my own Playing With Google Search Data Trends, and John’s Google’s predictive power (contd.) pick up on this post from Bill Thompson, The net reveals the ties that bind, here’s one possible quick look at the impending state of the recession…

What search terms would you say are recession indicators?

PS I wonder to what extent, if any, the financial wizards factor real time “search intent” tracking into their stock trading strategies?

PPS I have to admit, I don’t really understand the shape of this trend at all?

The minima around Christmas look much of a muchness, but the New Year peak – and then the yearly average – are increasing, year on year? Any ideas?

Innovation in Institutions – and Yet More Jobs…

One of the things I’ve noticed about Twitter is that if you post a link there to a recent blog post, the post can start to get read very quickly. I’ve done a couple of experiments by tweeting links to old posts and comment threads to see if it can give them a little burst of renewed life, and I can anecdotally report that it does seem to work, if you get your twittertext right…

And it’s potentially also a way of using a subset of readers as a sounding board for whether or not to post more widely, to a larger set of readers. So for example, on Friday I replied to a comment on an earlier post (Printing Out Online Course Materials With Embedded Movie Links) with a rather <ranty> comment of my own… and got the following tweet back from @jukesie:

So here goes – I’ve blockquoted it, but it’s not strictly a quote – I have made a few minor changes – so if you want to read the comment in its original form, and in the original context, you can find it here.

The context was whether there was any value in adding a QR code visual link to a Youtube movie in the print stylesheet of a piece of online learning material that included an embedded video.

I picked up a catch phrase earlier today, about what UK HE needs: Flexibility, Innovation, Imagination.

So here’s my problem. The future lies around us, and some of us paddle in it. Innovation in the OU is hard to achieve – the feeling is whatever we give to our students, it has to scale and it has to be equally accessible to everyone. We often go for lowest common denominator plays, particularly with respect to assumptions about the availability of technology. The Innovator’s Dilemma rules…

Time out:

When I play with mashups – when I play with ideas – I’m balancing logic rocks. Sometimes they fall over, but that’s okay; if I wanted to build something a little longer lasting, I’d use concrete.

“if QR codes do take off here (they are used in industry but I mean, frequently used for general public) and all new phones start including the technology, and presumably by that time watching videos on phones will be more generally useful, the situation would change.”

QR codes may well not take off, but that’s as may be; something better may come along instead. But finding out how to teach effectively across multiple media at the same time is something I’d argue we don’t know how to do with contemporary devices and today’s lifestyles and expectations, assuming that the mean age of our students is less than the average age of OU staff.

The QR code was a throw away idea that made use of stuff that’s available and is low risk – a simple stylesheet change at its simplest, maybe switched by a preference cookie.

(Sharp intake of breath: “preference cookie – sheesh, that’ll be another week’s work, guv…” And if that is the case, then whither the OU student personalisation project. Here, the “QR code if cookie set” is a lite, but very real, test case of using cookie based personalisation.)

And if we can add a QR code into the print style file, we can maybe do other things – like print stylesheets that include registration patterns for augmented reality models.

So … by focussing on the fact that the QR code route won’t work, you’re missing the whole point. Which is that we need to find ways of exploring how to doodle with new technology in a distance classroom setting, and we need to build flexible components that make it easier – and quicker – to do related and next step things in the future.

The OU is probably unique in that we have a long tradition of using “blended” learning – teaching using different media – although arguably we have let those skills slide somewhat.

The future I have seen trending over the last year – that I’m willing to bet *will* come good over the next 3-5 years – is a “dual view” interaction with media. I sit with a laptop watching the TV – dual view; I read the Sunday papers with a laptop or iPod touch to hand: dual view; I read books and dip onto the web to chase references and look things up: dual view; researchers, designers and programmers at their desks – with two screen: dual view. The near-term future is: Dual View.

QR codes may suck – but that’s not the point. The point is looking for ways of using the technology that’s around us, and maybe the good will of some of our early adopter students, to explore how to use that technology. And also to cobble together building blocks and jigsaw pieces. I have dozens of pipes and pipe fragments on Yahoo pipes. And it’s amazing how the old ones can come in useful…

And I believe in evolution; and in evolution, stuff fails. All the time. And still things move on…

Anyone who works for the OU knows it can take years to produce a course. So if we wait for the tech then learn how to use it, then write the course material to exploit it, a decade can have gone by. A decade…

I believe that once again we’re looking at various pilots of how to use text messaging with our students? Six or seven years ago, I spent 2-3 days clock time building a mobile WAP site around a course and a programme.

That experiment showed how to repurpose small chunks of info, and looked at some of the information design issues around “micro-sites”. I think I also built an SMS system that was architected in a similar way, and explored the mapping between SMS and WAP sites. The app also provided a use case specification for what information might be usefully marked up in microformats on the OU courses and quals pages, which would have made scraping them easy (though of course a lite web service endpoint – maybe serving up a forerunner of XCRI) would have

WAP didn’t fly, but “micro info” has – tweets, SMS, the iUI aesthetic of iPhone apps. (I gave up the Micro Info blog 3 years ago because no-one grokked it.)

Exploring how to supplement text with video, and audio, in a dual view world, with navigation schemes that are natural to use and non-obtrusive (particularly to non-users) is something we need to explore by doing.

Maybe we all need to listen a little to what OUr Chancellor has to say?

(I guess one issue that now arises is that the potential for further commenting has been forked…?)

Just by the by, I’m also engaged in a, err, conversation at the moment about whether or not it will be possible to embed Youtube movies in learning materials delivered via our Moodle VLE. (We have already embedded Youtube videos in at least two of our online Relevant Knowledge short courses, but they use a different delivery environment.)

My argument for embedding is that it presents the material in the flow of the text. A link is a click away, which means that some (possibly significant) percentage of students won’t click through to watch it, and it also takes the student to a different context – specifically, Youtube… which is a vehicle for pushing advertising and keeping visitors onsite…

(There is an advantage to sending students to Youtube, of course – they may find additional, related material there that is in context and relevant – but pedagogically speaking maybe it’s not so good? (The “pedagogy” word is like a Joker in the OU card game. You can play it to try to justify anything… ;-)

Another approach that I’ve idled around over the last couple of years is that we don’t embed videos in the text as such, but we find a way of using progressive enhancement to view a video, from a link, in a lightbox/shadowbox (I do try to be accommodating, you see?). (For a discussion on this, see Interaction Design – “Now Follow This Link” and Progressive Enhancement – Some Examples. For an example of this technique in use, see Animation – Not Just For Numerical Data and click on the “Heavy Metal Umlaut” video link. Note that I’m deliberately pointing to a page where the video is outsize compared to the lightbox window, to make the point that I know there are “issues” with using this technique naively… I’m not sure that I’m using the most recent version of that particular lightbox script either..)

There are good reasons for not supporting embedding/streamed replaying of media from third party servers in the page resources of a Moodle course, of course, one of which seems essentially policy driven: that media resources are served using an embedded player that draws from a locally hosted content store (I’m not sure if this is a real policy, but it appears, from my limited experience, to be an almost de facto one? Maybe I’m being a little harsh and someone can correct me on that?). So if we were to grab a copy of a Youtube video, and host it ourselves, I believe it wouldn’t be such a technical problem… (Though it would be for Youtube – who make the content available for embedding as long as you stream the content from their servers, which is how they keep track of how it’s being used…)

Hmmm – and I thought the idea was to make more use of third party content, and find ways of working effectively within a well lubricated rights environment? Now I wonder… can I embed a slideshare presentation in our Moodle VLE? A flickr photo? A scribd document? An IT Conversations podcast?

And finally, here’s a chaser to my recent OU jobs round-up post (which also referred to the concerns of institutions, in particular, sharing), in the form of a couple more newly opened up vacancies:

  • 2 x Senior Lecturer – Knowledge Media Disciplines: The Open University’s Knowledge Media Institute has two positions for the role of Senior Lecturer in Knowledge Media Disciplines. The posts are intended to strengthen KMi’s reputation as an internationally leading Research Centre, and to further raise the profile of the Open University.
    You will aim in the first instance to strengthen our research in mobile computing and semantic social software however, we will be open to strategic guidance from successful candidates to other related areas. You will be expected to bid for and win significant research funding, produce high impact research outcomes, build comprehensive collaboration networks, manage project teams to deliver against project tasks, publish your research both individually and jointly, and supervise PhD students.
  • 3 x Technical Developers, Learning and Teaching Solutions (LTS): Over the last three years The Open University has been redeveloping the systems we use to allow our staff to teach and our students to learn online – we are now extending the development team to allow us to continue this work. Do you want to come and join us?
    You will be able to solve complex technical problems, think strategically and work on collaborative teams. Applications are particularly welcome from candidates with experience using PHP, particularly of developing for the Moodle platform. Experience of developing within an open source community would be advantageous.

As ever, none of the above jobs have anything to do with me…

PS I guess this post is related to the On-Line Higher Education Learning Debate? (In case you haven’t guessed, I’m a Trackback whore…!;-)

Playing With Google Search Data Trends

Early last week, Google announced a Google Flu trends service, that leverages the huge number of searches on Google to provide a near real-time indicator of ‘flu outbreaks in the US. Official reports from medical centres and doctors can lag actual outbreaks by up to a couple of weeks, but by correlating search trend data with real medical data, the Google folks were able to show that their data led the official reports.

John Naughton picked up on this service in his Networker Observer column this week, and responded to an email follow-up comment I sent him idly wondering what search terms might be indicators of recession in this post on Google as a predictor. “Jobseeker’s allowance” appears to be on the rise, unfortunately (as does “redundancy”).

For some time, I’ve been convinced that spotting clusters of related search terms, or meaningful correlations between clusters of search terms, is going to be the next big step towards, err, something(?!), and Google Flu trends is one of the first public appearances of this outside the search, search marketing and ad sales area.

Which is why, on the playful side, I tried to pitch something like Trendspotting to the Games With a Purpose (GWAP) folks (so far unreplied to!), the idea being that players would have to try to identify search terms whose trends were correlated in some “folk reasonable” way. Search terms like “flowers” and “valentine”, for example, which appear to be correlated according to the Google Trends service:

Just out of interest, can you guess what causes the second peak? Here’s one way of finding out – take a look at those search terms on the Google Insights for Search service (like Google Trends on steroids!):

Then narrow down the date over which we’re looking at the trend:

By inspection, it looks like the peak hits around May, so narrow the trend display to that period:

If you now scroll down the Google Insights for Search page, you can see what terms were “breaking out” (i.e. being searched for in volumes way out of the norm) over that period:

So it looks like a Mother’s Day holiday? If you want to check, the Mother’s Day breakout (and ranking in the top searches list) is even more evident if you narrow down the date range even further.

Just by the by, what else can we find out? That the “Mother’s Day” holiday at the start of May is not internationally recognised, maybe?

There are several other places that are starting to collect trend data – not just search trend data – from arbitrary sources, such as Microsoft Research’s DataDepot (which I briefly described in Chasing Data – Are You Datablogging Yet?) and Trendrr.

The Microsoft service allegedly allows you to tweet data in, and the Trendrr service has a RESTful API for getting data in.

Although I’ve not seen it working yet (?!), the DataDepot looks like it tries to find correlations between data sets:

Next stop convolution of data, maybe?
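As a rough illustration of what finding correlated trends might involve, here’s a minimal sketch that computes the Pearson correlation between two equally sampled series, and then tries a few time shifts to see whether one series leads the other (in the way the flu searches were said to lead the official reports). This is plain textbook statistics with function names of my own choosing; I’m not claiming it’s how Google, DataDepot or Trendrr actually do it:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Find the shift (in samples) at which follower best tracks leader.

    Returns (lag, r): follower[t + lag] is compared with leader[t].
    Both series must be the same length and longer than max_lag + 1.
    """
    best = None
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]
        ys = follower[lag:]
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

With weekly data, a best lag of 2 would suggest the first series runs about two weeks ahead of the second, though a high correlation on its own says nothing about why the two move together.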

So whither the future? In an explanatory blog post on the flu trends service – How we help track flu trends – the Googlers let slip that “[t]his is just the first launch in what we hope will be several public service applications of Google Trends in the future.”

It’ll be interesting to see what exactly those are going to be?

PS I’m so glad I did electronics as an undergrad degree. Discrete maths and graph theory drove web 2.0 social networking theory algorithms, and signal processing – not RDF – will drive web 3.0…