So It Seems My Ballot Didn’t Count Twice in the PCC Election…

At the start of my second year in the sixth form, way back when, we had an external speaker – a Labour miner, through and through – come and talk to us about voting. In colourful language, he made it clear that he didn’t mind who we voted for, as long as we voted. I’m not sure what he had to say about spoiled votes, but as far as I can remember, I have always cast a ballot whenever I have been eligible to vote in a public election.

For folk dissatisfied with the candidates standing, I guess there are three +1 options available: 1) don’t vote at all; 2) spoil the paper; 3) cast an empty ballot (showing just how much you trust the way ballots are processed and counted). I can actually think of a couple of ways of spoiling or casting an empty ballot – one in the privacy of the voting booth, the other in full sight of the people staffing the ballot box. The +1 is to stand yourself… For the first time ever, I cast an empty ballot this time round and it felt wrong, somehow… I should have made my mark on the voting form.

Anyway… the PCC (Police and Crime Commissioner) election forms allowed voters to nominate a first choice and (optionally) a second choice under a supplementary vote mechanism, described by the BBC as follows: “If a candidate has won more than 50% of first preferences they are elected. If no candidate has won more than 50%, all but the top two candidates are then eliminated. Any second preferences for the top two candidates from the eliminated candidates are added to the two remaining candidates’ totals. Whoever has the most votes combined is declared the winner.”
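To make the mechanism concrete, here’s a minimal R sketch of that two round count, using made-up first and second preference figures rather than any real PCC numbers:

# Supplementary vote sketch - the numbers here are invented for illustration
first_prefs <- c(A = 42000, B = 38000, C = 15000, D = 5000)
total <- sum(first_prefs)
if (max(first_prefs) > total / 2) {
  winner <- names(which.max(first_prefs))
} else {
  # No candidate over 50%: keep the top two, eliminate the rest
  top_two <- names(sort(first_prefs, decreasing = TRUE))[1:2]
  # Second preferences on eliminated ballots that name one of the top two
  # (again, invented numbers)
  second_prefs <- c(A = 6000, B = 9000)
  final <- first_prefs[top_two] + second_prefs[top_two]
  winner <- names(which.max(final))
}
winner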

The Guardian Datablog duly published a spreadsheet of the PCC election results (sans spoiled ballot counts) and Andy Powell hacked them around to do a little bit of further analysis. In particular, Andy came up with a stacked bar chart showing the proportion of votes cast for the winner, vs. others, vs. didn’t vote. Note that the count recorded for the winner in the Guardian data, and Andy’s data (which is derived from the Guardian data), appears to be the first round count

…which means we can look to see which elections returned a Commissioner based on second preference votes. If I use my Datastore Explorer tool to treat the spreadsheet as a database, and run a query looking for rows where the winner’s vote was less than any of the other vote counts, here’s what we get:

Here’s a link to my spreadsheet explorer view over Andy’s spreadsheet: PCC count – spreadsheet explorer:
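If you’d rather run the same sort of filter locally, here’s a hedged sketch in R over a downloaded copy of the results; the file and column names (pcc_results.csv, area, winner_votes, runner_up_votes) are assumptions for illustration, not the actual headers in the Guardian/Andy’s spreadsheet:

# Find contests where the winner's first round count was lower than the
# runner-up's, i.e. contests decided on second preferences
results <- read.csv("pcc_results.csv")   # assumed local copy of the results sheet
subset(results, winner_votes < runner_up_votes,
       select = c(area, winner_votes, runner_up_votes))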

So it seems that as someone in the Hampshire area, I could have had two preferences counted in the returned result, if I had voted for the winner as my second choice.

Open Webstats from GovUK

In the #solo12eval session* on Monday organised by Shane McCracken and Karen Bultitude on the topic of evaluating impact (whatever that is) of online science engagement, I was reminded (yet again…) of the Culture24 report on Evaluating Impact online. The report included as one of its deliverables a set of example Google Analytics report templates (now rotted?) that provided a starting point for what could be a commonly-accepted-as-sensible reporting framework. (I keep wondering whether it would be useful to try to do the same for academic library websites/library website analytics?) One of the things I pondered afterwards was whether it would make sense to open up Google Analytics from a ‘typical’ website in that sector to all-comers, so that different parties could demonstrate what stories and information they could pull out of the stats using a common data basis. Something a bit like CSS Zen Garden, but around a common Google Analytics dataset, for example?

* From the session, I also learned of the JISC Impact Analysis Programme, which includes an interestingly titled project on Tracking Digital Impact (TDI). That project is presumably in stealth mode, because it was really hard to find out anything about it… (I thought JISC projects were all encouraged to do the blogging thing? Or is that just from certain enlightened parts of JISC…?)

Loosely related to the workshop, and from my feeds, I noticed a couple of announcements over the last couple of days relating to the publication of web/traffic stats on a couple of government web properties.

First up, the Government Digital Service/@gdsteam posted on their Updat[ed] GOV.UK Performance Dashboard, which you can find here: Performance Platform Dashboard.

As you can see, this dashboard reports on a variety of Google Analytics stats – average unique visitors, weekly pageviews, and so on.

As well as the dashboard itself, the @gds_datashark team seem to be quite happy to show their working, and presumably allow others to propose check-ins of their own bug fixes and code solutions, via the Gov github repository.

To make it easy to play along, they’re publishing a set of raw data feeds (Headline narrative text, Yesterday’s hourly traffic and comparison average, Weekly visits to GOV.UK, Direct Gov and Businesslink, Weekly unique visitors to GOV.UK, Direct Gov and Businesslink, Format success metrics) although the blog post notes these are ‘internal’ URLs and hence are subject to change…
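As a starter for ten, here’s a minimal R sketch of pulling one of those JSON feeds into a data frame for inspection; the URL is a placeholder (as noted above, the feed URLs are internal and liable to change), so swap in whichever feed you’re after:

# Pull a GDS raw JSON feed into R and eyeball its structure
library(jsonlite)
feed_url <- "https://www.gov.uk/performance/example-feed.json"   # placeholder URL
feed <- fromJSON(feed_url)
str(feed)   # inspect what the feed contains before trying to chart anything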

(Via tweets from @jukesie and @lseteph, I was also reminded that Steph experimented with publishing BIS’ departmental webstats way back when)

In the past, UKGov has posted a certain amount of costings related data around website provision (for example, So Where Do the Numbers in Government Reports Come From?), so if there are any armchair web analysts/auditors out there (unlikely, I know;-), it seems as if data is there for the taking, as well as the asking (the GDS folk seem to be quite open to ideas…)

The second announcement that caught my eye was the opening up of site usage stats on the data.gov.uk website.

Data is broken down into site-wide, publisher and datasets groupings, and reports on things like:

– browser type
– O/S type
– social network referrals
– language
– country

The data is also available via a CSV file.
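By way of example, here’s a hedged R sketch of reading that CSV and tabulating one of the breakdowns; the file name and column names (browser, visits) are assumptions, so check them against the actual download:

# Read the data.gov.uk site usage CSV and tabulate visits by browser type
usage <- read.csv("datagovuk_site_usage.csv")   # assumed local copy of the CSV
round(100 * prop.table(xtabs(visits ~ browser, data = usage)), 1)   # % share by browser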

So I wonder: could we use the GDS and data.gov.uk data/data feeds as the basis for a crude webstats Zen Garden? How would such a site best be architected? (One central github repo pulling in exemplar view requests from cloned repos?) And would it make sense to publish webstats data/analytics from a “typical” science engagement website (or library website, or course website), and allow the community to see what sorts of take on it folk can come up with, in respect of different ways of presenting the data and, more importantly, identifying different ways of making sense of it/finding different ways of telling stories with it?

Historical OUseful Jottings on Amplified Conferences

Via a tweet, @briankelly asks for links to things I’ve posted about around the notion of amplified conferences, whatever they are (I’m starting to reconsider what this might mean in the context of Conference Situational Awareness). I fear that there’s an increasing number of posts under the OUseful.info banner that I’ve completely forgotten about, but I’ll try to round up some of the ones here that Brian may (or may not) be interested in… If you’re not Brian, I wouldn’t bother reading any more of this post. If you are Brian, I’d probably stop now, too…

To try to get my recall juices flowing, I guess I’ve muddled around the idea of amplified conferences in several different contexts:

Conference Treasure Boxes

I’m at OpenEd12 at the moment, in the wonderful city of Vancouver, loving Eduroam, and pondering the Conference Treasure Box

Here it is:

And here’s a peek inside:

..and another:

As @cogdog describes:

We are also seeking new ways of documenting the conference experience through the device created by David Darts, the PirateBox, which turns a local space into a communication and sharing network.

Using open source technology and under US$30 in parts, the PirateBox creates a local, open wireless network. Upon joining this network, you are not connected to the internet, but a web server running locally on the box, which is set up with simple tools for uploading and downloading files, synchronous chat, and a message board. All communication with the PirateBox is anonymous.

For non-local participants, there’s also an email route for dropping attachments into the box (I think this is handled directly/automatically, rather than Alan checking his email every so often and then moving any attached files over…?)

This is a really powerful idea, I think, particularly for conference workshop sessions. For the workshop session I’m due to give (with @mhawksey) at ILI2012, one of the things it would be handy to do would be to have data files (and maybe app installers) to hand for participants to make use of. I was thinking a USB memory stick would be the best way of making these files available (assuming flaky conference wifi and sizeable downloads), but something like the PirateBox provides a really neat alternative for local file sharing, with the added advantage that you can take some of the load off a typically stressed conference wifi network.

I notice that the PirateBox includes a simple chat environment (though it didn’t seem to work for me?), but I guess it could also include an etherpad for shared local notetaking?

It also occurs to me that we could be grabbing copies of web pages and files that have been linked to in a Twitter backchannel, generating PDF equivalents of webpages (maybe?!) and then popping copies into the PirateBox to serve both as a file access point for some of the things being shared in the conference, and as a local archive of resources shared around the event. This provides a complement to a traditional conference proceedings, though it might also include shared copies of presented papers. As for an interface to the shared contents, maybe a Flipboard-style news magazine interface?

PS via @grantpotter, here’s an even more powerful alternative: FreedomBox.

Conference Situational Awareness

Confluence, again… first up, along with Doug Clow I’m one of the Awareness, Interaction & Memory co-chairs for LAK13, the Third International Learning Analytics & Knowledge Conference [LAK13 on Lanyrd] (and let me make this absolutely clear, memory is not legacy… and awareness is not just amplification (thanks, Kay…;-))

Part of the role is drumming up awareness of, and hopefully participation in, the conference itself; part of it will relate to helping folk get the most out of it whilst it’s on; and part of it will be about building on and making the most of the various conference activities after the event. As befits the theme of the conference, we’ll also be looking to turn conference related activity into datasets that we can put to work and hopefully extract value from. To a certain extent, I guess we could characterise part of this activity as “conference awareness”, or even conference situational awareness, which feeds into…

…a call from DSTL (the Defence Science and Technology bit of the MoD, the bit of DERA, as was, that wasn’t sold off as Qinetiq, I guess?) on Cyber Situational Awareness:

Cyber Situational Awareness (Cyber SA) in an MOD context can be defined as the ability for MOD to understand the effect that a Cyber event (ie an attack on us, or some action we could take ourselves) could have on the ability for the MOD enterprise to conduct its business. Enterprise in this context means the day-to-day business of MOD, and the military operations it conducts – eg the supply chain, the pay system, the training of its personnel. It should be able to evaluate the implications and impact of a particular piece of action and feed that information back into the decision-support process. Being able to achieve Cyber SA is a key enabler to allow MOD to defend all of its digital assets and freedom to operate in Cyberspace.

Linkages and dependencies between Cyberspace ‘layers’
We can try to visualise the various components of Cyber SA in a number of ways. The Cyber SA Layer Diagram (Figure 1) is one such method of depicting how users interact with Cyberspace. The elements of Cyberspace are represented as a series of layers, from the physical lay-down of digital assets up through the information layer to the human layers.

The physical layer (real world and network) consists of a geographic aspect – the physical location of elements of a network, such as under the sea, or under the ground or in a building – and the physical network components, which consists of: physical hardware and infrastructure (wired, wireless and optical); and the physical connections (wires, cables, radio frequency, satellite communications, routers, servers and computers).
Parts of MOD do monitor and gain SA on certain individual layers, but no-one in MOD gains an understanding of all the layers.

The logical layer (information) consists of the logical connections that exist between network nodes. A node is any physical device connected to a computer network; for example, computers, personal digital assistants and cell phones.
The logical layer includes applications and data and protocols that enable interactions across the physical layer, along with the configuration of individual networks. Characteristics of the logical layer:
• draws together a variety of information feeds from both open and closed sources
• fusion of sources to improve the overall information credence
• dealing with uncertainty and conflicts in the information – corroboration and believability
• information also gained from human and socio cultural intelligence.
The logical layer also includes details of communication service providers, transfer protocols, internet domain names and ownership information.

The social layer (persona, people and social) consists of the details that connect people to Cyberspace and the actual people and groups who interact by using the networks. Unique addresses or titles are matched to virtual addresses, which in turn map to the physical layer.
A single person can have multiple personas; and equally, multiple people can share a single persona.
How do we provide a level of assurance to the decision maker that we have the correct information? How do we measure our effect in this area?
The social layer can be further analysed through sub-areas such as, social networking, operating procedures, maintenance, and security.
Cyber SA will use this information as a feed to inform an understanding of who we are likely to be the instigators of attacks on us – eg malicious criminal groups, or state/non state sponsored hackers.
An understanding of the linkages and dependencies between the Cyberspace layers is key to gaining Cyber SA. Being able to understand and quantify an event in Cyberspace that occurs on a particular layer is challenging, but gaining an understanding of how such an event affects and impacts the other layers and the resulting impact it has on the real world is more challenging still.
We need to be able to identify and comprehend the interdependencies, influence and interaction (causes and effects) that exist between the Cyber layers and how we can mitigate these effects to maintain MOD’s freedom of manoeuvre in Cyberspace.

True Cyber SA should enable a decision maker to gain an understanding of all these layers, the linkages and dependencies between them and the impact that an event has in any particular layer on the ‘real world’

Here’s another little nugget I found in the slides that went with the call town hall meeting, taken (I think) from the Joint Doctrine Publication JDP 04 – Understanding:

(Whilst doctrine statements can often be long, jargon and acronym filled documents, they do often contain the occasional graphic, word equation or conceptualisation that can actually be quite rich when considered in other domains. I haven’t read JDP04 yet, but I have popped it onto my toread list…)

What actually jumped out at me from the call document in the context of LAK13 situational awareness was this:

Another interesting example was identified at the recent London 2012 Olympic Games where it was identified that tweets relating to the congestion of the Olympic park entrances had a direct effect on crowd flow through the site.
– This phenomena was a demonstration of the interaction between the Cyber layers.
— How do we extrapolate meaning from occurrences like this in real time and project that meaning onto the SA picture?
— We are likely to be transcending the Cyber layers in the process, so how do we capture this information and understand the impact that it has on wider Cyberspace?

So for example: could activity in the online social layer around a particular topic cause shifts in the way folk attend different sessions in the physical conference? Or could action at a distance, eg from participants not physically attending the conference, influence physical activity at the conference venue?

(It also reminded me how much the DERA/DSTL folk like layered models;-)

As ever, confluence posts are a good opportunity for me to do a bit of a tab sweep, so here are a couple of related things I have open at the moment:
– Finding Your Friends and Following Them to Where You Are [PDF]
– Beating the event hashtag spammers; I mentioned to Kirsty that Twitter has a “blocked user” call in the (authenticated) API, so it’s possible to look up folk who have recently been blocked by an arbitrary user. I’m thinking this might be interesting in the context of a dreamcatcher filter, as well as pondering the extent to which a dreamcatcher idea might work in a situational awareness setting?
– Online Journalism Blog – Get Involved in a New HMI Project: Investigating CCGs (Clinical Commissioning Groups); this is one of those areas where there are multiple sources of disconnected information, that when fused using a mosaic theory technique, may start to reveal structures that folk would rather weren’t common knowledge…

its the Gramma an punctuashun wot its’ about, Rgiht?

This is another of those confluence style posts, where a handful of things I’ve read in quick succession seem to phase lock in my mind:

(brought to mind in part via @downes a week or so ago: How to Synch 32 Metronomes)

The first was a post by Alan Levine on Making Text Work, which describes a simple technique for making text overlays on photographs create a more coherent image than just slapping text on:

One of the techniques I use on presentation slides from time to time is a solid filled banner or stripe containing text, sometimes opaque, sometimes transparent. I wouldn’t claim to be much of an artist, but it makes the slides slightly more interesting than a header with an image in the middle of the slide, surrounded by whitespace…

(Which reminds me: maybe I should look through Presentation Zen Design and Presentation Zen: Simple Ideas on Presentation Design and Delivery again…)

Reading Alan’s post, it occurred to me that once you get the idea of using a solid or semi-transparent filled background to a text label, you tend to remember it and add it to your toolbox of presentation ideas (of course, you might also forget and later rediscover this sort of trick… My own slides tend to follow particular design ideas I’ve recently picked up on and decided I want to try to explore, albeit in a crude and often not very well polished way…) In the slide above, several tricks are evident: the solid filled text label, the positioning of it, the backgrounded blog post that actually serves as a reference for the slide (you can do a web search for the post title to learn more about the topic), the foregrounded image, rotated slightly, and so on.

The thing that struck me about Alan’s post was that it reminded me of a time before I was really aware of using a solid filled label to foreground a piece of text, which in turn caused me to reflect on other things I now take for granted as ideas that I can draw on and combine with other ideas.

In the same way we learn to spell, and learn to use punctuation, and start to pick up on the grammar that structures a language, so we can use those rules to construct ever more complex sentences. Once we know the rules, it becomes a choice as to whether or not we employ them.

Here’s an example of how we might come to acquire a new design idea, drawn from a brief conversation with @mediaczar a couple of nights ago when Mat asked if anyone knew the name of this sort of chart combination:

I didn’t know what to call the chart, but thought it should be easy enough to try to wrangle one together using ggplot in R, guessing that a geom_errorbarh() might work; Mat came back suggesting geom_crossbar().

Here’s a minimal code fragment I used to explore it:

# Plot a horizontal bar ("crossbar") across a bar chart
library(ggplot2)
# Two bars, heights 10 and 43; stack() gives columns 'values' and 'ind'
a <- data.frame(x = 10, y = 43)
b <- stack(a)
# Positions (f) of the crossbars to overlay on each bar (d)
cb <- data.frame(d = c('x', 'y'), f = c(3, 22))
g <- ggplot(b) + geom_bar(aes(x = ind, y = values), stat = 'identity')
g <- g + geom_crossbar(data = cb, aes(x = d, y = f, ymax = f + 1, ymin = f - 1),
                       colour = NA, fill = "red", width = 0.9, alpha = 0.5)
print(g)

Here’s an example of how I used it – in an as yet unlabelled sketch showing, for a particular F1 driver, their grid position for each race (red bar) and the number of places they gained (or lost) during the first lap:

So now I know how to achieve that effect…

Now for one or two more things… Just after reading Alan’s post, I read a post by James Allen on possible race strategies for the Japanese Grand Prix:

The first thing that struck me was that even if you vaguely understand how a race chart works, the following statement may not be readily obvious to you from the top chart (my emphasis):

Three stops is actually faster [taking new softs on lap 12], as the [upper] graph … shows, but it requires the driver to pass the two stoppers in the final stint. If there is a safety car, it will hand an advantage to the two stoppers.

So, can you see why the three stopper (the green line) “requires the driver to pass the two stoppers in the final stint”? Let’s step back for a moment – can you see which bit of the graphs represent an overtake?

This is actually quite a complex graph to read – the axes are non-obvious, and not that easy to describe, though you soon pick up a feeling for how the chart works (honest!). Getting a sensible interpretation working for the surprising feature – the sharp vertical drops – is one way of getting into this chart, as well as looking at how the lines are positioned at the extreme x values (that is, at the end of the first and last laps).

The second thing that occurred to me was that we could actually remove the fragment of the line that shows the pitstop and instead show a separate line segment for each stint for each driver, and hence avoid the line crossings that do not represent required overtakes. I’ve used this technique before, for example to show the separate stints on a chart of laptimes for a particular driver over the course of a race:

And as to where I got that trick? I think it was a bastardisation of a cycle plot, which can be used to show monthly, weekly or seasonal trends over a series of years:

…but it could equally have been a Stephen Few highlighted trick of disconnecting a timeseries line at the crossing point of one month to the next…

Whatever the case, one of the ideas I always have in mind is whether it may be possible to introduce white space in the form of a break in a line in order to separate out different groups of data in a very simple way.
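In ggplot2 terms, that sort of break is easy to get by mapping the grouping variable (the stint, in the F1 case) to the group aesthetic; here’s a minimal sketch with made-up laptimes:

# Break a laptime line into one segment per stint via the group aesthetic
library(ggplot2)
laps <- data.frame(
  lap     = 1:15,
  laptime = c(92, 91.5, 91.2, 91.0, 90.8,    # stint 1
              93.5, 91.1, 90.9, 90.7, 90.6,  # stint 2, after a pit stop
              94.0, 91.0, 90.8, 90.6, 90.5), # stint 3
  stint   = rep(1:3, each = 5)               # invented data for illustration
)
ggplot(laps, aes(x = lap, y = laptime, group = stint)) + geom_line()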

Therapy Time: Networked Personal Learning and a Reflection on the Urban Peasant…

Way back when I was a postgrad, I used to spend a coffee fuelled morning reading in bed, and then get up to eat a cooked breakfast whilst watching the Urban Peasant, a home kitchen chef with a great attitude:

My abiding memory, in part confirmed by several of the asides in the above clip (can you guess which?!), was that of “agile cooking” and flexible recipes. A chicken curry (pork’s fine too, or beef, even fish if you like; or potato if you want a vegetarian version) could be served with rice (or bread, or a baked potato); if you didn’t like curry, you could leave out the spices or curry powder, and just use a stock cube. If a recipe called for chopped vegetables, you could also grate them or slice them or dice them or…”it’s your decision”. Potato and peas could equally well be carrot or parsnip and beans. If you needed to add water to a recipe, you could add wine, or beer, or fruit juice or whatever instead; if you wanted to have scrambled egg on toast, you could also fry it, or poach it, or boil it. And the toast could be a crumpet or a muffin or just use “whatever you’ve got”.

The ethos was very much one of: start with an idea, and/or see what you’ve got, and then work with it – a real hacker ethic. It also encouraged you to try alternative ideas out, to be adaptive. And I’m pretty sure mistakes happened too – but that was fine…

When I play with data, I often have a goal in mind (albeit a loose one), used to provide a focus for exploring a data set a little (typically using Shneiderman’s “Overview first, zoom and filter, then details-on-demand” approach), to see what potential it might hold, or to act as a testbed for a tool or technique I want to try out. The problem then becomes one of coming up with some sort of recipe that works with the data and tools I have to hand, as well as the techniques and processes I’ve used before. Sometimes, a recipe I’m working on requires me to get another ingredient out of the fridge, or another utensil out of the cupboard. Sometimes I use a tea towel as an oven glove, or a fork as a knife. Sometimes I taste the food-in-process to know when it’s done, sometimes I go by the colour, texture, consistency, shape, smell or clouds of smoke that have started to appear.

Because I haven’t had any formal training in any of this “stuff”, using “approved” academic sources (I’ve recently been living by R-Bloggers (which is populated by quite a few academics) and Stack Overflow, for example), I suffer from a lack of confidence in talking about it in an academic way (see for example For My One Thousandth Blogpost: The Un-Academic), and a similar lack of confidence in feeling that I could ever charge anybody a fee for telling them what I (think I) know (leave aside for the moment that I effectively charge the OU my salary, benefits and on-costs… hmmm?!). I used to do the academic thing way back when as a postgrad and early postdoc, but fell out of the habit over the last few years because there seemed to me to be a huge amount of investment of time required for very little impact or consequence of what I was doing. Yes, it’s important for things to be “right”, but I’m not sure my maths is up to generating formal proofs of new algorithms. I may be able to do the engineering or technologist thing of getting something working, -ish, good enough “for now”, research-style coding, but it’s always mindful of an engineering-style trade-off: it might not be “right”, it’s just something I figured out that seems to work, and it’ll do because it lets me get something done… As Artur Bergman puts it, using rather colourful language – “yes, correlation isn’t causation, but…”

(This clip was originally brought to mind by a recent commentary from Stephen Downes on The Internet Blowhard’s Favorite Phrase, and the original post it refers to.)

Also mixed up in the notion of “right” is seeing things as “right” if they are formally recognised or accepted as such, which is where assessment and peer review come in: you let other people you trust make an assessment about whatever it is you do/have done, publicly recognising your achievements which in turn allows you to make a justifiable claim to them. (I am reminded here of the definition of knowledge as justified true belief. That word “justified” is interesting, isn’t it…?)

As well as resisting getting into the whole grant bidding cycle for turnover generating, public money cycling projects that are set up to fail, I’ve also recently started to fall out of OU-style formal teaching roles… again, in part because of the long lead times involved with producing course materials and my preference for a network based, rather than teamwork based, working style. (I so need to revisit formal models of teamwork and try to come up with a corresponding formulation for networks rather than teams… Or do a lit review to find one that’s already out there…!) I tend to write in 1 hour chunks based on 3-4 hours work, then post whatever it is I’ve done. One reason for doing this is because I figure most people read or do things in 5 to 15 minute or one to two hour chunks, and that in a network-centric, distributed online educational setting, small chunks are more likely to be discoverable and immediately useful (directly and immediately learnable from). There’s no shame in using a well crafted Wikipedia page as a starting point for discovering more detailed – and academic – resources: at least you stand a good chance of finding the Wikipedia page! In the same way, I try to link out to supporting resources from most of my posts so that readers (including myself as a revisitor to these pages) have some additional context, supporting or conflicting material to help get more value from it. (Related: Why I (Edu)Blog.)

Thinking about my own personal lack of confidence, which in part arises from the way I have informally learned whatever it is that I have actually learned over the last few years, without having it formally validated by anybody else, my interest in espousing an informal style of networked learning to others is an odd one… Because based on my own experience, it doesn’t give me the feeling that what I know is valid (justified…?), or even necessarily trustable by anybody other than me (because I know how it’s caveated because of what I have personally learned about it, rather than just being told about it), even if it is pragmatic and at least occasionally appears to be useful. (Hmm… I don’t think an OU editor would let me get away with a sentence like that in a piece of OU course material!) Maybe I need to start keeping a second, formalised reflective learning journal, as the OU Skills for OU Study suggests, to log what I learn, and provide some sort of indexable and searchable metadata around it? In fact, this might be a useful approach if I do another uncourse? (It also brings to mind the word equation: learning material + metadata = learning object (it was something like that, wasn’t it?!))

To the extent that this blog is an act of informal, open teaching, I think it offers three main things: a) “knowledge transferring” discoverable resources on a variety of specialised topics; b) fragmentary records of created knowledge (I *think* I’ve managed to make up odd bits of new stuff over the last few years…); c) a model of some sort of online distributed network centric learning behaviour (see also the Digital Worlds Uncourse Blog Experiment in this respect).

I guess one of the things I do get to validate against is the real world. When I used to go into schools doing robotics activities*, kids would ask me if their robot or programme was “right”. In many cases, there wasn’t really a notion of “right”, it was more a case of:

  • were there things that were obviously wrong?
  • did the thing work as anticipated (or indeed, did any elements of it work at all?!;-)?
  • were there any bits that could be improved, adapted or done in another more elegant way?

So it is with some of my visualisation “experiments” – are they not wrong (is the data right, is there a sensible relationship between the data and the visual mappings)? Do they “work” at all (eg in the sense of communicating a particular trend, or revealing a particular anomaly)? Could they be improved? By running the robot program, or trying to read the story a data visualisation appears to be telling us, we can get a sense of how “right” it is; but there is often no single “right” for it to be. Which is where doubt can creep in… Because if something is “not right”, then maybe it’s “wrong”…?

In the world of distributed, networked learning, I think one thing we need to work on is developing an appropriate sense of validation and legitimisation of personal learning. Things like badges are weak extrinsic signs that some would claim have a role in this, but I wonder how networks and communities can be shaped and architected, or how their dynamics might work, so that learners develop not only a well-founded intrinsic confidence about what they have self-learned, but also a feeling that what they have self-learned is as legitimate as something they have been formally taught? (I know, I know: “I was at the University of Life, me”… As I am, now… which reminds me, I’ve a Coursera video and Feynman lecture on Youtube to watch, and a couple of code mentor answers to questions I’ve raised on various Stack Exchange sites to read through; and I probably should check to see if there are any questions recently posted to Stack Overflow that I may be able to answer and use to link out to other, more academic “open educational” resources…)

[Rereading this post, I think I am suffering from a lack of formality and the sense of justification that comes with it. Hmmm…]

* This is something I’ve recently been asked to do again for a local MK primary school in the new year; the contact queried how much I might charge, and whilst in the past I would have said “no need”, for some reason this time I felt obliged to seek advice from the Deanery about whether I should charge, and if so how much. This is a huge personal cultural shift away from my traditional “of course/pro bono” attitude, and it felt wrong, somehow. To the extent that universities are public bodies, they should work with other public services in their local and extended communities. But of course, I get the sense we’re not really being encouraged to think of ourselves as public bodies very much any more, we’re commercial services… And that feeling affects the personal responsibility I feel when acting for and on behalf of the university. As it turns out, the Deanery seems keen that we participate freely in community events… But I note here that I felt (for the first time) as if I had to check first. So what’s in the air?

See also: Terran Lane’s On Leaving Academia and (via @boyledsweetie) Inspirational teaching: since when did entertainment not matter?

Appropriating IT: Glue Steps

Over the years, I’ve been fortunate enough to have been gifted some very evocative, and powerful, ideas that immediately appealed to me when I first heard them and that I’ve been able to draw on, reuse and repurpose over and over again. One such example is “glue logic”, introduced to me by my original OU PhD supervisor George Kiss. The idea of glue logic is to provide a means by which two digital electronic circuits (two “logic” circuits) that don’t share a common interface can be “glued” together.

To generalise things a little, I have labelled the circuits as applications in the figure. But you can think of them as circuits if you prefer.

A piece of custom digital circuitry that can talk to both original circuits, and translate the outputs of each into a form that can be used as input to the other, is placed between them to take on this interfacing role: glue logic.

Sometimes, we might not need to transform all the data that comes out of the first circuit or application:

This idea is powerful enough in its own right, but there was a second bit to it that made it really remarkable: the circuitry typically used to create the glue logic was a device known as a Field Programmable Gate Array, or FPGA. This is a type of digital circuit whose logical function can be configured, or programmed. That is, I can take my “shapeless” FPGA, and programme it so that it physically implements a particular digital circuit. Just think about that for a moment… You probably have a vague idea that the same computer can be reprogrammed to do particular things, using some vaguely mysterious and magical thing called software, instructions that computer processors follow in order to do incredible things. With an FPGA, the software actually changes the hardware: there is no processor that “runs a programme”; when you programme an FPGA, you change its hardware. FPGAs are, literally, programmable chips. (If you imagine digital circuits to be like bits of plastic, an FPGA is like polymorph.)

The notion of glue logic has stuck with me for two reasons, I think: firstly, because of what it made possible, the idea of flexibly creating an interface between two otherwise incompatible components; secondly, because of the way in which it could be achieved – using a flexible, repurposable, reprogrammable device – one that you could easily reprogramme if the mapping from one device to another wasn’t quite working properly.

So what has this got to do with anything? In a post yesterday, I described a recipe for Grabbing Twitter Search Results into Google Refine And Exporting Conversations into Gephi. The recipe does what it says on the tin… but it actually isn’t really about that at all. It’s about using Google Refine as glue logic, for taking data out of one system in one format (JSON data from the Twitter search API, via a hand assembled URL) and getting it in to another (Gephi, using a simple CSV file).

The Twitter example was a contrived one… the idea of using Google Refine as demonstrated in the example is much more powerful, because it provides a way of thinking about how we might be able to decompose data chain problems into simple steps, using particular tools that do certain conversions for you ‘for free’.

Simply knowing that Google Refine can import JSON (and if you look at the import screen, you’ll also notice it can read in Excel files, CSV files, XML files, etc, not just JSON), tidy it a little, and then export some or all of it as a simple CSV file, means you now have a tool that might just do the job if you ever need to glue together an application that publishes or exports JSON data (or XML, or etc etc) with one that expects to read CSV. (Google Refine can also be persuaded to generate other output formats too – not just CSV…)

You can also imagine chaining separate tools together. Yahoo Pipes, for example, is a great environment for aggregating and filtering RSS feeds. As well as publishing RSS/XML via a URL, it also publishes more comprehensive JSON feeds via a URL. So what? So now you know that if you have data coming through a Yahoo Pipe, you can pull it out of the pipe and into Google Refine, and then produce a custom CSV file from it.
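For comparison, here’s what that glue step might look like in code rather than in Google Refine – JSON from a URL in, flat CSV out – using the jsonlite package; the URL is a placeholder for whatever feed you’re actually gluing, and it assumes the feed parses to something tabular:

# JSON in (e.g. a Yahoo Pipes JSON feed), flat CSV out (e.g. for Gephi)
library(jsonlite)
json_url <- "http://example.com/pipe_output.json"   # placeholder URL
items <- fromJSON(json_url, flatten = TRUE)          # nested JSON -> data frame
write.csv(items, "pipe_output.csv", row.names = FALSE)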

PS Here’s another really powerful idea: moving in and out of different representation spaces. In Automagic Map of #opened12 Attendees, Scott Leslie described a recipe for plotting out the locations of #opened12 attendees on a map. This involved a process of geocoding the addresses of the home institutions of the participants to obtain the latitude and longitude of those locations, so that they could be appropriately placed. So Scott had to hand: 1) institution names and/or messy addresses for those institutions; 2) latitude and longitude data for those addresses. (I use the word “messy” to get over the idea that the addresses may be specified in all manner of weird and wonderful ways… Geocoders are built to cope with all sorts of variation in the way addresses are presented to them, so we can pass the problem of handling these messy addresses over to the geocoder.)

In a comment to his original post, Scott then writes: [O]nce I had the geocodes for each registrant, I was also interested in doing some graphs of attendees by country and province. I realized I could use a reverse geocode lookup to do this. That is, by virtue of now having the lat/long data, Scott could present these co-ordinates to a reverse geo-coding service that takes in a latitude/longitude pair and returns an address for it in a standardised form, including, for example, an explicit country or country code element. This clean data can then be used as the basis for grouping the data by country, for example. The process is something like this:

Messy address data -> [geocode] -> latitude/longitude data -> [reverse geocode] -> structured address data.

Beautiful. A really neat, elegant solution that dances between ways of representing the data, getting it into different shapes or states that we can work in different particular ways. :-)
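Here’s a rough R sketch of that dance; geocode_address() and reverse_geocode() are hypothetical stand-ins for whichever geocoding service you actually call, with dummy return values so the shape of the pipeline is clear:

# messy address -> [geocode] -> lat/long -> [reverse geocode] -> structured address
geocode_address <- function(addr) {
  # call a real geocoder here; dummy coordinates returned for the sketch
  c(lat = 52.02, lon = -0.71)
}
reverse_geocode <- function(lat, lon) {
  # call a real reverse geocoder here; dummy structured address returned
  list(country = "United Kingdom", region = "Milton Keynes")
}
messy  <- c("Open University, Walton Hall, Milton Keynes", "The OU, MK7 6AA")
coords <- lapply(messy, geocode_address)
clean  <- lapply(coords, function(p) reverse_geocode(p["lat"], p["lon"]))
table(sapply(clean, `[[`, "country"))   # e.g. group registrants by country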

PPS Tom Morris tweeted this post likening Google Refine to an FPGA. For this to work more strongly as a metaphor, I think we might have to take the JSON transcript of a set of operations in Google Refine as the programming, and then bake them into executable code, as for example we can do with the Stanford Data Wrangler, or using Pipe2Py, with Yahoo Pipes?

#StrataConf Dreamcatcher

I’ve been listening to the #strataconf live feed this morning, tales of goodness relating to some of the things we can start to do with “Big Data”. I also had a peek at the Twitter feed, which looked to be swamped by spambots and pr0nbots…

This amused me no end – I imagine that IRC is still the favoured communication channel for big data developers, but Twitter is the home of big data’s social media loving evangelists. To be in any way useful to observers, the backchannel needs filtering, something that I suspect many of the attendees might have solutions for…

One of the reasons I was looking at the hashtag feed was to sketch a quick map of the folk commonly followed by recent tag users. Whilst the bots tend to fall out of making any contribution to the map (the way they follow and/or are followed by other accounts using the hashtag is atypical compared to “legitimate” members of the hashtag community, so they tend to get filtered out), the large number and quick fire tweeting behaviour of the bots trashes my sampling method: I tend to grab tweets using the Twitter search API, which gives me access to the 1500 most recent tweets containing whatever search term I use. If 50% of those tweets are spam, it reduces my sample size… (I guess I need to hook up a streaming API collection mechanism… If you have a tweepy recipe to share for collecting samples of N tweets from a stream on a particular search phrase, please post a link (or code) in the comments;-)

What we really need is a filter. A dreamcatcher, maybe?

• “Long ago when the word was sound, an old Lakota spiritual leader was on a high mountain and had a vision. In his vision, Iktomi, the great trickster and searcher of wisdom, appeared in the form of a spider. Iktomi spoke to him in a sacred language. As he spoke, Iktomi the spider picked up the elder’s willow hoop which had feathers, horsehair, beads and offerings on it, and began to spin a web. He spoke to the elder about the cycles of life, how we begin our lives as infants, move on through childhood and on to adulthood. Finally we go to old age where we must be taken care of as infants, completing the cycle. But, Iktomi said as he continued to spin his web, in each time of life there are many forces, some good and some bad. If you listen to the good forces, they will steer you in the right direction. But, if you listen to the bad forces, they’ll steer you in the wrong direction and may hurt you. So these forces can help, or can interfere with the harmony of Nature. While the spider spoke, he continued to weave his web. When Iktomi finished speaking, he gave the elder the web and said, The web is a perfect circle with a hole in the center. Use the web to help your people reach their goals, making good use of their ideas, dreams and visions. If you believe in the great spirit, the web will filter your good ideas and the bad ones will be trapped and will not pass.”
• Native Americans believed that dreams were floating in the air.
• The Bad dreams would get caught in the web and expire when the sun rose while the good dreams would go through the center and then flow down through the feathers to the sleeping individual to come to them.
[Native American Dream Catchers]

So how do we catch the bad dreams, the unwanted tweets from the spam bots, and let the useful tweets through? Or how do we let the spam tweets pass through and capture the good ones? Here’s a quick sketch I made of the #strataconf hashtag community, showing how the people who sent a sample of 1500 recently so-tagged tweets follow each other, laid out in Gephi using the ARF layout function with nodes sized according to eigenvector centrality:

The unconnected grey nodes in the upper right sector are typically the bots. The connected nodes are folk who are interested in the topic and who follow each other. Even if the spam bots followed some of these parties, we could still identify them, for example by sizing nodes by in-degree using a non-linear mapping:

So in my mind’s-eye I have a tweetcatcher that catches tweets from folk who are part of the greater connected component, by virtue of having one or more folk in the community follow them, and discards the rest…
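Here’s a hedged sketch of that tweetcatcher in R using igraph: keep accounts that sit in the giant weakly connected component of the follower graph, and drop the rest. The file names and columns (a follower/friend edge list, a screen_name column in the tweet dump) are assumptions about how the harvested data is laid out:

# Filter a tweet sample down to folk in the giant connected component
library(igraph)
edges  <- read.csv("strataconf_follower_edges.csv")   # assumed follower -> friend pairs
g      <- graph_from_data_frame(edges, directed = TRUE)
comps  <- components(g, mode = "weak")
keep   <- V(g)$name[comps$membership == which.max(comps$csize)]
tweets <- read.csv("strataconf_tweets.csv")           # assumed tweet dump
caught <- subset(tweets, screen_name %in% keep)       # the 'good dreams' pass through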

A Question About Klout…

I’ve no idea how Klout works out its scores, but I’m guessing that there is an element of PageRank style algorithmic bootstrapping going on, in which a person’s Klout score is influenced by the Klout scores of the folk who interact with that person.

So for example, if we look at @briankelly, we see how he influences other influential (or not) folk on Klout:

One thing I’ve noticed about my Klout score is that it tends to be lower than that of most of the folk I have an OU/edtech style relationship with; and no, I don’t obsess about it… I just occasionally refer to it when Klout is in the news, as it was today with an announced tie-up with Bing: Bing and Klout Partner to Strengthen Social Search and Online Influence. In this case, if my search results are going to be influenced by Bing, I want to understand what effect that might have on the search results I’m presented with, and how my content/contributions might be being weighted in other peoples’ search results.

So here’s a look at the Klout scores of the folk I’ve influenced on Klout:

Hmm… seems like many of them are sensible and are completely ignoring Klout. So I’m wondering: is my Klout score depressed relative to other ed-tech folk who are on Klout because I’m not interacting with folk who are playing the Klout game? Which is to say: if you are generating ranking scores based at least in part on the statistics of a particular network, it can be handy to know what network those stats are being measured on. If Klout stats are dominated by components based on network statistics calculated from membership of the Klout network, that is very different to the sorts of scores you might get if the same stats were calculated over the whole of the Twitter network graph…

Sort of, but not quite, related: a few articles on sampling error and sample bias – Is Your Survey Data Lying to You? and The Most Dangerous Profession: A Note on Nonsampling Error.

PS Hmmm.. I wonder how my Technorati ranking is doing today…;-)