OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘ddj’

Local Data Journalism – Care Homes


The front page of this week’s Isle of Wight County Press describes a tragic incident relating to a particular care home on the Island earlier this year:

[Photos: Isle of Wight County Press front page story]

(Unfortunately, the story doesn’t seem to appear on the County Press’ website? Must be part of a “divide the news into print and online, and never the twain shall meet” strategy?!)

As I recently started pottering around the CQC website and various datasets they publish, I thought I’d jot down a few notes about what I could find. The clues from the IWCP article were the name of the care home – Waxham House, High Park Road, Ryde – and the proprietor – Sanjay Ramdany.

Using the “CQC Care Directory – With Filters” from the CQC data and information page, I found a couple of homes registered to that provider.

1-120578256, 19/01/2011, Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP
1-120578313, 19/01/2011, Cornelia Heights, 93 George Street, Ryde, Isle of Wight, PO33 2JE

1-101701588, Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany
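For what it’s worth, that sort of lookup is easy enough to script as well. Here’s a minimal sketch using pandas, assuming the directory CSV has been downloaded locally – the file name and column names below are guesses for illustration rather than the actual headings used in the CQC file:

import pandas as pd

# File name and column names are assumptions, not the actual CQC headings
df = pd.read_csv("cqc_care_directory.csv")

# Find all locations registered to a provider whose name mentions "Ramdany"
matches = df[df["Provider Name"].str.contains("Ramdany", case=False, na=False)]
print(matches[["Location ID", "Location Name", "Postcode"]])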

Looking up “Waxham House” on the CQC website gives us a copy of the latest report outcome:

[Screenshot: CQC inspection report summary for Waxham House]

Looking at the breadcrumb navigation, it seems we can directly get a list of other homes operated by the same proprietors:

[Screenshot: CQC provider page listing the proprietors’ registered locations]

I wonder if we can search the site by proprietor name too?

[Screenshot: searching the CQC site by proprietor name]

Looks like it…

So how did their other home fare?

[Screenshots: CQC inspection report summary for Cornelia Heights]

Hmmm…

By the by, according to the Food Standards Agency, how’s the food?

[Screenshots: Food Standards Agency ratings for Waxham House and Cornelia Heights]

And how much money is the local council paying these homes?

(Note – I haven’t updated the following datasets for a bit – I also note I need to add dates to the transaction tables. local spending explorer info; app.)

[Click through on the image to see the app - hit Search to remove the error message and load the data!]

[Screenshots: IW Council spending explorer results for Waxham House and Cornelia Heights]

Why the refunds?

A check on OpenCorporates for director names turned up nothing.

I’m not trying to offer any story here about the actual case reported by the County Press; rather, this is a partial story about how we can start to look for data around a story, to see whether open data sources have anything more to add to it.

Written by Tony Hirst

September 14, 2014 at 9:55 am

Posted in Anything you want


AP Business Wire Service Takes on Algowriters

Via @simonperry, news that AP will use robots to write some business stories (Automated Insights are one of several companies I’ve been tracking over the years who are involved in such activities, eg Notes on Narrative Science and Automated Insights).

The claim is that using algorithms to do the procedural writing opens up time for the journalists to do more of the sensemaking. One way I see this is that we can use data2text techniques to produce human readable press releases from things like statistical releases, which has at least a couple of advantages.

Firstly, the grunt – and error prone – work of running the numbers (calculating month on month or year on year changes, handling seasonal adjustments etc) can be handled by machines using transparent and reproducible algorithms. Secondly, churning numbers into simple words (“x went up month on month from Sept 2013 to Oct 2013 and down year on year from 2012”) makes them searchable using words, rather than having to write our own database or spreadsheet queries with lots of inequalities in them.
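By way of example, here’s a minimal data2text sketch of the sort of thing I have in mind – turning a couple of numbers into a searchable sentence. (A toy sketch: the function, the template wording and the index values are all made up for illustration.)

def change_phrase(old, new):
    # Describe the direction and size of a change in simple words
    if new == old:
        return "was unchanged"
    direction = "went up" if new > old else "went down"
    return "{} by {:.1f}%".format(direction, abs(100.0 * (new - old) / old))

# Entirely made-up index values
sept_2013, oct_2013, oct_2012 = 104.2, 105.1, 106.3

print("The index {} month on month from Sept 2013 to Oct 2013, and {} year on year from Oct 2012.".format(
    change_phrase(sept_2013, oct_2013), change_phrase(oct_2012, oct_2013)))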

In this respect, something that’s been on my to do list for way too long is to produce some simple “press release” generators based on ONS releases (something I touched on in Data Textualisation – Making Human Readable Sense of Data).

Matt Waite’s upcoming course on “automated story bots” looks like it might produce some handy resources in this regard (code repo). In the meantime, he already shared the code described in How to write 261 leads in a fraction of a second here: ucr-story-bot.

For the longer term, on my “to ponder” list is what might something like “The Grammar of Graphics” be for data textualisation? (For background, see A Simple Introduction to the Graphing Philosophy of ggplot2.)

For example, what might a ggplot2 inspired gtplot library look like for converting data tables not into chart elements, but textual elements? Does it even make sense to try to construct such a grammar? What would the corollaries to aesthetics, geoms and scales be?

I think I perhaps need to mock up some examples to see if anything comes to mind, and what the function names, as well as the outputs, might look like, let alone the code to implement them! Or maybe code first is the way, to get a feel for how to build up the grammar from sensible looking implementation elements? Or more likely, perhaps a bit of iteration may be required?!
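As a first doodle in that direction, here’s a toy mock-up of what a gtplot-ish call might feel like. (Everything here is hypothetical – the function names, the “grammar” and the data are invented purely for the sake of the sketch: an aes() mapping binds data columns to sentence roles, and a “geom” renders each row as a textual element.)

# Hypothetical doodle - not a real library
def aes(**mapping):
    # Bind sentence roles (subject, value, previous, ...) to column names
    return mapping

def geom_change(row, mapping):
    # Render one data row as a sentence describing a change
    subject = row[mapping["subject"]]
    now, then = row[mapping["value"]], row[mapping["previous"]]
    direction = "rose" if now > then else ("fell" if now < then else "held steady")
    return "{} {} from {} to {}.".format(subject, direction, then, now)

data = [{"region": "South East", "q1": 6.8, "q2": 7.2},
        {"region": "North West", "q1": 7.9, "q2": 7.4}]

for row in data:
    print(geom_change(row, aes(subject="region", value="q2", previous="q1")))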

Written by Tony Hirst

July 2, 2014 at 10:00 am

Data Journalism – Conversations With Data Sources

Annotated slides from my opening talk at the University of Lincoln Journalism dept. research day – Data Journalism – Having Conversations with Data:

[Embedded slides: Data Journalism – Having Conversations with Data]

Written by Tony Hirst

June 28, 2014 at 11:40 am

Posted in Presentation


Personal Recollections of the “Data Journalism” Phrase

@digiphile’s been doing some digging around current popular usage of the phrase data journalism – here are my recollections…

My personal recollection of the current vogue is that “data driven journalism” was the phrase that dominated the discussions/community I was witness to around early 2009, though for some reason my blog doesn’t give any evidence for that (must take better contemporaneous notes of first noticings of evocative phrases;-). My route in was via “mashups”, mashup barcamps, and the like, where folk were experimenting with building services on newly emerging (and reverse engineered) APIs; things like crime mapping and CraigsList maps were in the air – putting stuff on maps was very popular I seem to recall! Yahoo were one of the big API providers at the time.

I noted the launch of the Guardian datablog and datastore in my personal blog/notebook here – http://blog.ouseful.info/2009/03/10/using-many-eyes-wikified-to-visualise-guardian-data-store-data-on-google-docs/ – though for some reason don’t appear to have linked to a launch post. With the arrival of the datastore it looked like there were to be “trusted” sources of data we could start to play with in a convenient way, accessed through Google docs APIs:-) Some notes on the trust thing here: http://blog.ouseful.info/2009/06/08/the-guardian-openplatform-datastore-just-a-toy-or-a-trusted-resource/

NESTA did an event on News Innovation London in July 2009, a review of which by @kevglobal mentions “discussions about data-driven journalism” (sic on the hyphen). I seem to recall that Journalism.co.uk (@JTownend) were also posting quite a few noticings around the use of data in the news at the time.

At some point, I did a lunchtime session at the Guardian for their developers – there was a lot about Yahoo Pipes, I seem to remember! (I also remember pitching the Guardian Platform API to developers in the OU as a way of possibly getting fresh news content into courses. No-one got it…) I recall haranguing Simon Rogers on a regular basis about their lack of data normalisation (which I think in part led to the production of the Rosetta Stone spreadsheet) and their lack of use (at the time) of Fusion Tables. Twitter archives may turn something up there. Maybe Simon could go digging in the Twitter archives…?;-)

There was a session on related matters at the first(?) news:rewired event in early 2010, but I don’t recall the exact title of the session (I was in a session with Francis Irving/@frabcus from the then nascent Scraperwiki): http://blog.ouseful.info/2010/01/14/my-presentation-for-newsrewired-doing-the-data-mash/ Looking at the content of that presentation, it’s heavily dominated by notions of data flow; the data driven journalism (hence #ddj) phrase seemed to fit this well.

Later that year, in the summer, there was a roundtable event hosted by the EJC (European Journalism Centre) on “data driven journalism” – I recall meeting Mirko Lorenz there (who maybe had a background in business data, and has since helped launch datawrapper.de) and Jonathan Gray – who then went on to help edit the Data Journalism Handbook – among others.

http://blog.ouseful.info/2010/08/25/my-slides-from-the-data-driven-journalism-round-table-ddj/

For me the focus at the time was very much on using technology to help flow data into usable content (eg in a similar, but perhaps slightly weaker, sense than the more polished content generation services that Narrative Science/Automated Insights have since come to work on, or other data driven visualisations, or what I guess we might term local information services; more about data driven applications with a weak local news/specific theme or issue general news relevance, perhaps). I don’t remember where the sense of the journalist was in all this – maybe as someone who would be able to take the flowed data, or use tools that were being developed to get the stories out of data with tech support?

My “data driven journalism” phrase notebook timeline

http://blog.ouseful.info/?s=%22data%20driven%20journalism%22&order=asc

My “data journalist” phrase notebook timeline

http://blog.ouseful.info/?s=%22data%20journalist%22&order=asc

My first blogged use of the data journalism phrase – in quotes, as it happens, so it must have been a relatively new sounding phrase to me – was here: http://blog.ouseful.info/2009/05/20/making-it-a-little-easier-to-use-google-spreadsheets-as-a-database-hopefully/ (h/t @paulbradshaw)

Seems like my first use of the “data journalist” phrase was in picking up on a job ad – so certainly the phrase was common to me by then.

http://blog.ouseful.info/2010/12/04/what-is-a-data-journalist/

As a practice and a commonplace, things still seemed to be developing in 2011 enough for me to comment on a situation where the Guardian and Telegraph teams were co-opetitively bootstrapping each other’s ideas: http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/

I guess the deeper history of CAR (computer assisted reporting), database journalism and precision journalism may throw off trace references, though maybe not representing situations that led to the phrase gaining traction in “popular” usage?

Certainly, now I’m wondering what the relative rise in popularity of “data journalist” versus “data journalism” was? For me, “data driven journalism” was a phrase I was familiar with way before the other two, though I do recall a sense of unease about its applicability to news stories that were perhaps “driven” by data more in the sense of being motivated or inspired by it, or whose origins lay in a data set, rather than “driven” in a live, active sense of someone using an interface that was powered by flowing data.

Written by Tony Hirst

April 29, 2014 at 9:36 am

Posted in Anything you want


An(other Attempt at an) Intro to Data Journalism…

I was pleased to be invited back to the University of Lincoln again yesterday to give a talk on data journalism to a couple of dozen or so journalism students…

I’ve posted a copy of the slides, as well as a set of annotated handouts onto slideshare, and to get a bump in my slideshare stats for meaningless page views, I’ve embedded the latter here too…

I was hoping to generate a copy of the slides (as images) embedded in a markdown version of the notes but couldn’t come up with a quick recipe for achieving that…

When I get a chance, it looks as if the easiest way will be to learn some VBA/Visual Basic for Applications macro scripting… So for example:

* How do I export powerpoint slide notes to individual text files?
* Using VBA To Export PowerPoint Slides To Images

If anyone beats me to it, I’m actually on a Mac, so from the looks of things on Stack Overflow, hacks will be required to get the VBA to actually work properly?

Written by Tony Hirst

February 19, 2014 at 6:59 pm

Posted in Infoskills


So what is a data journalist exactly? A view from the job ads…

A quick snapshot of how the data journalism scene is evolving at the moment based on job ads over the last few months…

Via mediauk, I’m not sure when this post for a Junior Data Journalist at Trinity Mirror Regionals (Manchester) was advertised (maybe it was for its new digital journalism unit?). Here’s what they were looking for:

Trinity Mirror is seeking to recruit a junior data journalist to join its new data journalism unit.

Based in Manchester, the successful applicant will join a small team committed to using data to produce compelling and original content for its website and print products.

You will be expected to combine a high degree of technical skill – in terms of finding, interrogating and visualising data – with more traditional journalistic skills, like recognising stories and producing content that is genuinely useful to consumers.

Reporting to the head of data journalism, the successful candidate will be expected to help create and develop data-based packages, solve problems, find and ‘scrape’ key sources of data, and assist with the production of regular data bulletins flagging up news opportunities to editors and heads of content across the group.

You need to have bags of ideas, be as comfortable with sport as you are with news, know the tools to source and turn data into essential information for our readers and have a strong eye for detail.
This is a unique opportunity for a creative, motivated and highly-skilled individual to join an ambitious project from its start.


News International were also recruiting a data journalist earlier this year, but I can’t find a copy of the actual ad.

From March, £23k-26k pa was on offer for a “Data Journalist” role that involved:

Identification of industry trends using quantitative-based research methods
Breaking news stories using digital research databases as a starting point
Researching & Analysing commercially valuable data for features, reports and events
Maintaining the Insolvency Today Market Intelligence Database (MID)
Mastering search functions and navigation of public databases such as London Gazette, Companies House, HM Court Listings, FSA Register, etc.
Using data trends as a basis for news stories and then using qualitative methods to structure stories and features.
Researching and producing content for the Insolvency cluster of products. (eg. Insolvency Today, Insolvency News, Insolvency BlackBook, Insolvency & Rescue Awards, etc.)
Identifying new data sources and trends, relevant to the Insolvency cluster.
Taking news stories published from rival sources and creating ‘follow up’ and analysis pieces using fresh data.
Occasional reporting from the High Court.
Liaising with the sales, events and marketing teams to share relevant ideas.
Sharing critical information with, and supporting sister editorial teams in the Credit and Payroll clusters.
Attending industry events to build contacts, network and represent the company.

On the other hand, a current rather clueless looking ad is offering £40k-60k for a “Data Journalist/Creative Data Engineer”:

Data Journalist/Creative Data Engineer is required by a leading digital media company based in Central London. This role is going to be working alongside a team of data modellers/statistical engineers and “bringing data to life”; your role will specifically be looking over data and converting it from “technical jargon” to creative, well written articles and white papers. This role is going to be pivotal for my client and has great scope for career progression.

To be considered for this role, you will ideally be a Data Journalist at the moment in the digital media space. You will have a genuine interest in the digital media industry and will have more than likely produced a white paper in the past or articles for publications such as AdAge previously. You will have a creative mind and will feel confident taking information from data and creating creative and persuasive written articles. Whilst this is not a technical role by any means it would definitely be of benefit if you had some basic technical knowledge with data mining or statistical modelling tools.

Here’s what the Associated Press were looking for from “Newsperson/Interactive Data Journalist”:

The ideal candidate will have experience with database management, data analysis and Web application development. (We use Ruby for most of our server-side coding, but we’re more interested in how you’ve solved problems with code than in the syntax you used to solve them.) Experience with the full lifecycle of a data project is vital, as the data journalist will be involved at every stage: discovering data resources, helping craft public records requests, managing data import and validation, designing queries and working with reporters and interactive designers to produce investigative stories and interactive graphics that engage readers while maintaining AP’s standards of accuracy and integrity.

Experience doing client-side development is a great advantage, as is knowledge of data visualization and UI design. If you have an interest in DevOps, mapping solutions or advanced statistical and machine learning techniques, we will want to hear about that, too. And if you have shared your knowledge through technical training or mentorship, those skills will be an important asset.

Most importantly, we’re looking for someone who wants to be part of a team, who can collaborate and communicate with people of varying technical levels. And the one absolute requirement is intellectual curiosity: if you like to pick up new technologies for fun and aren’t afraid to throw yourself into research to become the instant in-house expert on a topic, then you’re our kind of candidate.

And a post that’s still open at the time of writing – “Interactive Data Journalist” with the FT:

The Financial Times is seeking an experienced data journalist to join its Interactive News team, a growing group of journalists, designers and developers who work at the heart of the FT newsroom to develop innovative forms of online storytelling. This position is based at our office in London.

You will have significant experience in obtaining, processing and presenting data in the context of news and features reporting. You have encyclopedic knowledge of the current best practices in data journalism, news apps, and interactive data visualisation.

Wrangling data is an everyday part of this job, so you are a bit of a ninja in Excel, SQL, Open Refine or a statistics package like Stata or R. You are conversant in HTML and CSS. In addition, you will be able to give examples of other tools, languages or technologies you have applied to editing multimedia, organising data, or presenting maps and statistics online.
More important than your current skillset, however, is a proven ability to solve problems independently and to constantly update your skills in a fast-evolving field.

While you will primarily coordinate the production of interactive data visualisations, you will be an all-round online journalist willing and able to fulfil other roles, including podcast production, writing and editing blog posts, and posting to social media.

We believe in building people’s careers by rotating them into different jobs every few years so you will also be someone who specifically wants to work for the FT and is interested in (or prepared to become interested in) the things that interest us.

So does that make it any clearer what a data journalist is or does?!

PS you might also find this relevant: Tow Center for Digital Journalism report on Post Industrial Journalism: Adapting to the Present

Written by Tony Hirst

May 31, 2013 at 4:42 pm

Posted in Jobs


Questioning Election Data to See if It Has a Story to Tell

I know, I know, the local elections are old news now, but elections come round again and again, which means building up a set of case examples of what we might be able to do – data wise – around elections in the future could be handy…

So here’s one example of a data-related question we might ask (where in this case by data I mean “information available in: a) electronic form, that b) can be represented in a structured way”): are the candidates standing in different seats local to that ward/electoral division? By “local”, I mean – can they vote in that ward by virtue of having a home address that lies within that ward?

Here’s what the original data for my own local council (the Isle of Wight council, a unitary authority) looked like – a multi-page PDF document collating the Notice of polls for each electoral division (archive copy):

[Image: IW Council notice of poll]

Although it’s a PDF, the document is reasonably nicely structured for scraping (I’ll do a post on this over the next week or two) – you can find a Scraperwiki scraper here. I pull out three sorts of data – information about the polling stations (the table at the bottom of the page), information about the signatories (of which, more in a later post…;-), and information about the candidates, including the electoral division in which they were standing (the “ward” column) and a home address for them, as shown here:

[Screenshot: candidate data in the Scraperwiki table]

So what might we be able to do with this information? Does the home address take us anywhere interesting? Maybe. If we can easily look up the electoral division the home addresses fall in, we have a handful of news story search opportunities: 1) to what extent are candidates – and election winners – “local”? 2) do any of the parties appear to favour standing in/out of ward candidates? 3) if candidates are standing out of their home ward, why? And if we complement the data with information about the number of votes cast for each candidate, might we be able to find any patterns suggesting that living within, or outside of, the electoral division a candidate is standing in has a beneficial or detrimental effect? And so on.

In this post, I’ll describe a way of having a conversation with the data using OpenRefine and Google Fusion Tables, as a means of starting to explore some of the stories we may be able to tell with, and around, the data. (Bruce Mcphereson/Excel Liberation blog has also posted an Excel version of the methods described in the post: Mashing up electoral data. Thanks, Bruce:-)

Let’s get the data into OpenRefine so we can start to work it. Scraperwiki provides a CSV output format for each scraper table, so we can get a URL for it that we can then use to pull the data into OpenRefine:

[Screenshot: Scraperwiki CSV export]

In OpenRefine, we can Create a New Project and then import the data directly:

[Screenshot: OpenRefine – create project by importing data from a URL]

The data is in comma separated CSV format, so let’s specify that:

[Screenshot: OpenRefine import settings – comma separated values]

We can then name and create the project and we’re ready to start…

…but start what? If we want to find out if a candidate lives in ward or out of ward, we either need to know whether their address is in ward or out of ward, or we need to find out which ward their address is in and then see if it is the same as the one they are standing in.

Now it just so happens (:-) that MySociety run a service called MapIt that lets you submit a postcode and it tells you a whole host of things about what administrative areas that postcode is in, including (in this case) the unitary authority electoral division.

[Screenshot: MapIt postcode lookup]

And what’s more, MapIt also makes the data available in a format that OpenRefine can read directly, at a web address (aka a URL) that we can construct from a postcode:

[Screenshot: MapIt JSON output]

Here’s an example of just such a web address: http://mapit.mysociety.org/postcode/PO36%200JT

Can you see the postcode in there? http://mapit.mysociety.org/postcode/PO36%200JT

The %20 is a character encoding for a space. In this case, we can also use a +.

So – to get information about the electoral division an address lies in, we need to get the postcode, construct a URL to pull down the corresponding data from MapIt, and then figure out some way to get the electoral division name out of the data. But one step at a time, eh?!;-)

Hmmm…I wonder if postcode areas necessarily fall within electoral divisions? I can imagine (though it may be incorrect to do so!) a situation where a division boundary falls within a postcode area, so we need to be suspicious about the result, or at least bear in mind that an address falling near a division boundary may be wrongly classified. (I guess if we plot postcodes on a map, we could look to see how close to the boundary line they are, because we already know how to plot boundary lines.)

To grab the postcode, a quick skim of the addresses suggests that they are written in a standard way – the postcode always seems to appear at the end of the string preceded by a comma. We can use this information to extract the postcode, by splitting the address at each comma into an ordered list of chunks, then picking the last item in the list. Because the postcode might be preceded by a space character, it’s often convenient for us to strip() any white space surrounding it.
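(The same recipe looks like this in Python, for comparison – split the address on commas, take the last chunk, and strip the whitespace:)

# Address taken from the candidate data above
address = "Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP"
postcode = address.split(",")[-1].strip()
print(postcode)  # PO33 1BP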

What we want to do then is to create a new, derived column based on the address:

[Screenshot: OpenRefine – add column based on this column]

And we do this by creating a list of comma separated chunks from the address, picking the last one (by counting backwards from the end of the list), and then stripping off any whitespace/space characters that surround it:

[Screenshot: GREL expression for grabbing the postcode]

Here’s the result…

[Screenshot: the extracted postcodes column]

Having got the postcode, we can now generate a URL from it and then pull down the data from each URL:

[Screenshot: OpenRefine – add column by fetching URLs]

When constructing the web address, we need to remember to encode the postcode by escaping it so as not to break the URL:

[Screenshot: constructing the MapIt URL from the postcode]

The throttle value slows down the rate at which OpenRefine loads in data from the URLs. If we set it to 500 milliseconds, it will load one page every half a second.

When it’s loaded in all the data, we get a new column, filled with data from the MapIt service…

[Screenshot: MapIt data loaded into a new column]

We now need to parse this data (which is in a JSON format) to pull out the electoral division. There’s a bit of jiggery pokery required to do this, and I couldn’t work it out myself at first, but Stack Overflow came to the rescue:

[Screenshot: Stack Overflow answer on parsing JSON in OpenRefine]

We need to tweak that expression slightly by first grabbing the areas data from the full set of MapIt data. Here’s the expression I used:

filter(('[' + (value.parseJson()['areas'].replace( /"[0-9]+":/,""))[1,-1] + ']' ).parseJson(), v, v['type']=='UTE' )[0]['name']

to create a new column containing the electoral division:

[Screenshot: the parsed electoral division column]
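As an aside, the whole lookup is only a few lines in a scripting language too. Here’s a sketch in Python using the requests library, assuming the MapIt JSON is shaped as shown above (an “areas” object keyed by area ID, each entry carrying a “type” and a “name”); the half second sleep plays the same role as OpenRefine’s throttle:

import time
from urllib.parse import quote
import requests

def electoral_division(postcode):
    # Fetch the MapIt data for a postcode and pull out the UTE area name
    url = "http://mapit.mysociety.org/postcode/" + quote(postcode)
    areas = requests.get(url).json()["areas"]
    ute = [a["name"] for a in areas.values() if a["type"] == "UTE"]
    return ute[0] if ute else None

for pc in ["PO36 0JT", "PO33 1BP"]:
    print(pc, electoral_division(pc))
    time.sleep(0.5)  # throttle the requests, as in OpenRefine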

Now we can create another column, this time based on the new Electoral Division column, that compares the value against the corresponding original “ward” column value (i.e. the electoral division the candidate was standing in) and prints a message saying whether they were standing in ward or out:

[Screenshot: in-ward/out-of-ward comparison column]

If we collapse down the spare columns, we get a clearer picture:

[Screenshot: collapsing the spare columns]

Like this:

[Screenshot: the summary data]

If we generate a text facet on the In/Out column, and increase the number of rows displayed, we can filter the results to show just the candidates who stood in their local electoral division (or conversely, those who stood outside it):

[Screenshot: text facet on the In/Out column]

We can also start to get investigative, and ask some more questions of the data. For example, we could apply a text facet on the party/desc column to let us filter the results even more…

[Screenshot: In/Out facet filtered by party]

Hmmm… were most of the Labour Party candidates standing outside their home division (and hence unable to vote for themselves?!)

[Screenshot: Labour candidates filtered on the In/Out facet]

There aren’t too many parties represented across the Island elections (a text facet on the desc/party description column should reveal them all), so it wouldn’t be too hard to treat the data as a source, get paper and pen in hand, and write down the in/out counts for each party describing the extent to which they fielded candidates who lived in the electoral divisions they were standing in (and as such, could vote for themselves!) versus those who lived “outside”. This data could reasonably be displayed using a stacked bar chart (the data collection and plotting are left as an exercise for the reader [See Bruce Mcphereson's Mashing up electoral data post for a stacked bar chart view.];-) Another possible line of questioning is how the different electoral divisions fare in terms of in-vs-out resident candidates. If we pull in affluence/poverty data, might it tell us anything about the likelihood of candidates living in area, or even tell us something about the likely socio-economic standing of the candidates?
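(If pen and paper doesn’t appeal, the counting is only a couple of lines in something like pandas – a sketch, assuming the data has been exported from OpenRefine to a CSV file with column names along the lines of those used above, which is an assumption on my part:)

import pandas as pd

# File and column names are guesses based on the walkthrough above
df = pd.read_csv("candidates_with_divisions.csv")
df["in_ward"] = df["Electoral Division"] == df["ward"]

# One row per party, with in-division vs out-of-division candidate counts
counts = df.groupby(["desc", "in_ward"]).size().unstack(fill_value=0)
print(counts)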

One more thing we could try to do is to geocode the postcode of the address of each candidate rather more exactly. A blog post by Ordnance Survey blogger John Goodwin (@gothwin) shows how we might do this (note: copying the code from John’s post won’t necessarily work; WordPress has a tendency to replace single quotes with all manner of exotic punctuation marks that f**k things up when you copy and paste them into forms for use in other contexts). When we “Add column by fetching URLs”, we should use something along the lines of the following:

'http://beta.data.ordnancesurvey.co.uk/datasets/code-point-open/apis/search?output=json&query=' + escape(value,'url')

[Screenshot: Ordnance Survey postcode lookup in OpenRefine]

The data, as imported from the Ordnance Survey, looks something like this:

[Screenshot: the returned Ordnance Survey data]

As is the way of national services, the Ordnance Survey returns a data format that is all well and good but isn’t the one that mortals use. Many of my geo-recipes rely on latitude and longitude co-ordinates, but the call to the Ordnance Survey API returns Eastings and Northings.

Fortunately, Paul Bradshaw had come across this problem before (How to: Convert Easting/Northing into Lat/Long for an Interactive Map) and bludgeoned(?!;-) Stuart Harrison/@pezholio, formerly of Lichfield Council, now of the Open Data Institute, to produce a pop-up service that returns lat/long co-ordinates in exchange for a Northing/Easting pair.

The service relies on URLs of the form http://www.uk-postcodes.com/eastingnorthing.php?easting=EASTING&northing=NORTHING, which we can construct from data returned from the Ordnance Survey API:

[Screenshot: constructing the easting/northing lookup URL]

Here’s what the returned lat/long data looks like:

[Screenshot: the returned lat/long JSON]

We can then create a new column derived from this JSON data by parsing it as follows:

[Screenshot: parsing the lat/long JSON to extract the latitude]

A similar trick can be used to generate a column containing just the longitude data.
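(As another aside, rather than relying on a third party web service, the easting/northing to lat/long conversion can also be done locally with a projection library. Here’s a sketch using pyproj – not what’s used in this walkthrough – converting British National Grid co-ordinates to WGS84 lat/long; the example values are made up:)

from pyproj import Transformer

# British National Grid (EPSG:27700) -> WGS84 lat/long (EPSG:4326)
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326")

easting, northing = 459300, 85600  # made-up values, roughly Isle of Wight
lat, lon = transformer.transform(easting, northing)
print(lat, lon)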

We can then export a view over the data to a CSV file, or direct to Google Fusion tables.

[Screenshot: export options for the postcode lat/long data]

With the data in Google Fusion Tables, we can let Fusion Tables know that the Postcode lat and Postcode long columns define a location:

[Screenshot: Fusion Tables – edit column]

Specifically, we pick either the lat or the long column and use it to cast a two column latitude and longitude location type:

[Screenshot: configuring a two column location type in Fusion Tables]

We can inspect the location data using a more convenient “natural” view over it…

[Screenshot: Fusion Tables map view]

By applying a filter, we can look to see where the candidates for a particular ward have declared their home address to be:

[Screenshot: map of the Havenstreet candidates’ declared home addresses]

(Note – it would be more useful to plot these markers over a boundary line defined region corresponding to the area covered by the corresponding electoral ward. I don’t think Fusion Tables lets you do this directly (or if it does, I don’t know how to do it..!). This workaround – FusionTablesLayer Wizard – on merging outputs from Fusion Tables as separate layers on a Google Map is the closest I’ve found following a not very thorough search;-)

We can go back to the tabular view in Fusion Tables to run a filter to see who the candidates were in a particular electoral division, or we can go back to OpenRefine and run a filter (or a facet) on the ward column to see who the candidates were:

[Screenshot: filtering by division in OpenRefine]

Filtering on some of the other wards using local knowledge (i.e. using the filter to check/corroborate things I knew), I spotted a couple of missing markers. Going back to the OpenRefine view of the data, I ran a facetted view on the postcode to see if there were any “non-postcodes” there that would in turn break the Ordnance Survey postcode geocoding/lookup:

[Screenshot: facet revealing a missing postcode]

Ah – oops… It seems we have a “data quality” issue, albeit a minor one…

So, what do we learn from all this? One takeaway for me is that data is a source we can ask questions of. If we have a story or angle in mind, we can tune our questions to tease out corroborating facts (possibly! caveat emptor applies!) that might confirm, help develop, or even cause us to rethink, the story we are working towards telling, based on the support the data gives us.

Written by Tony Hirst

May 5, 2013 at 11:38 pm
