OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Anything you want’ Category

A Conversation With Data – Isle of Wight Car Parking Meter Transaction Logs

Killer post title, eh?

Some time ago I put in an FOI request to the Isle of Wight Council for the transaction logs from a couple of ticket machines in the car park at Yarmouth. Since then, the Council made some unpopular decisions about car parking charges, got a recall and then in passing made the local BBC news (along with other councils) in respect of the extent of parking charge overpayments…

Here’s how hyperlocal news outlet OnTheWight reported the unfolding story…

I really missed a trick not getting involved in this process – because there is, or could be, a significant data element to it. And I had a sample of data that I could have doodled with, and then gone for the whole data set.

Anyway, I finally made a start on looking at the data I did have with a view to seeing what stories or insight we might be able to pull from it – the first sketch of my conversation with the data is here: A Conversation With Data – Car Parking Meter Data.

It’s not just the parking meter data that can be brought to bear in this case – there’s another set of relevant data too, and I also had a sample of that: traffic penalty charge notices (i.e. traffic warden ticket issuances…)

With a bit of luck, I’ll have a go at a quick conversation with that data over the next week or so… Then maybe put in a full set of FOI requests for data from all the Council operated ticket machines, and all the penalty notices issued, for a couple of financial years.

Several things I think might be interesting to look at Island-wide:

  • in much the same way as Tube platforms suffer from loading problems, where folk surge around one entrance or another, do car parks “fill up” in some sort of order, eg within a car park (one meter lags the other in terms of tickets issued) or one car park lags another overall;
  • do different car parks have a different balance of ticket types issued (are some used for long stay, others for short stay?) and does this change according to what day of the week it is?
  • how does the issuance of traffic penalty charge notices compare with the sorts of parking meter tickets issued?
  • from the timestamps of when traffic penalty charge notices are issued, can we work out the rounds of different traffic warden patrols?

The last one might be a little bit cheeky – just like you aren’t supposed to share information about the mobile speed traps, perhaps you also aren’t supposed to share information that there’s a traffic warden doing the rounds…?!
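As a first doodle with the first two questions, something like the following Python might do. It's only a sketch: the row layout (car park, machine, timestamp, ticket type) and the sample values are made up for illustration, and the real FOI logs will need their own parsing.

```python
from collections import Counter, defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rows: (car park, machine id, ISO timestamp, ticket type).
# These values are invented; the real transaction logs will look different.
SAMPLE_ROWS = [
    ("Yarmouth", "M1", "2014-07-05T09:02:00", "short stay"),
    ("Yarmouth", "M2", "2014-07-05T09:15:00", "long stay"),
    ("Yarmouth", "M1", "2014-07-05T09:20:00", "short stay"),
    ("Yarmouth", "M1", "2014-07-07T08:45:00", "long stay"),
]


def ticket_mix_by_weekday(rows):
    """Count ticket types issued per (car park, weekday name)."""
    mix = defaultdict(Counter)
    for car_park, _machine, stamp, ticket_type in rows:
        weekday = WEEKDAYS[datetime.fromisoformat(stamp).weekday()]
        mix[(car_park, weekday)][ticket_type] += 1
    return mix


def first_ticket_per_machine(rows):
    """Earliest ticket per machine per day, to see which machine lags."""
    firsts = {}
    for car_park, machine, stamp, _ticket_type in rows:
        when = datetime.fromisoformat(stamp)
        key = (car_park, machine, when.date())
        if key not in firsts or when < firsts[key]:
            firsts[key] = when
    return firsts


mix = ticket_mix_by_weekday(SAMPLE_ROWS)
firsts = first_ticket_per_machine(SAMPLE_ROWS)
```

Comparing the entries in `firsts` for two machines in the same car park on the same day gives a crude measure of the lag between them; a fuller look would compare the whole cumulative ticket count through the day.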

Written by Tony Hirst

July 25, 2014 at 6:04 pm

MOOCs as Partworks

with 2 comments

A couple of posts crossed my feeds recently, mooching around the idea that doing a MOOC is Like Reading a Newspaper; or not: MOOC Completion Rates DO Matter.

First up, Downes suggests that:

The traditional course is designed like a book – it is intended to run in a sequence, the latter bits build on the first bits, and if you start a book and abandon it part way through there is a real sense in which you can say the book has failed, because the whole point of a book is to read it from beginning to end.

But our MOOCs are not designed like that. Though they have a beginning and an end and a range of topics in between, they’re not designed to be consumed in a linear fashion the way a book is. Rather, they’re much more like a magazine or a newspaper (or an atlas or a city map or a phone book). The idea is that there’s probably more content than you want, and that you’re supposed to pick and choose from the items, selecting those that are useful and relevant to your present purpose.

And so here’s the response to completion rates: nobody ever complained that newspapers have low completion rates. And yet no doubt they do. Probably far below the ‘abysmal’ MOOC completion rates (especially if you include real estate listings and classified ads). People don’t read a newspaper to complete it, they read a newspaper to find out what’s important.

Martin (Weller) responds:

Stephen Downes has a nice analogy, (which he blogged at my request, thankyou Stephen) in that it’s like a newspaper, no-one drops out of a newspaper, they just take what they want. This has become repeated rather like a statement of fact now. I think Stephen’s analogy is very powerful, but it is really a statement of intent. If you design MOOCs in a certain way, then the MOOC experience could be like reading a newspaper. The problem is 95% of MOOCs aren’t designed that way. And even for the ones that are, completion rates are still an issue.

Here’s why they’re an issue. MOOCs are nearly always designed on a week by week basis (which would be like designing a newspaper where you had to read a certain section by a certain time). I’ve blogged about this before, but from Katy Jordan’s data we reckon 45% of those who sign up, never turn up or do anything. It’s hard to argue that they’ve had a meaningful learning experience in any way. If we register those who have done anything at all, eg just opened a page, then by the end of week 2 we’re down to about 35% of initial registrations. And by week 3 or 4 it’s plateauing near 10%. The data suggests that people are definitely not treating it like a newspaper. In Japan some research was done on what sections of newspapers people read.

He goes on:

… Most MOOCs are about 6-7 weeks long, so 90% of your registered learners are never even looking at 50% of your content. That must raise the question of why are you including it in the first place? If a subject requires a longer take at it, beyond 3 weeks say, then MOOCs really may not be a very good approach to it. There is a hard, economic perspective here, it costs money to make and run MOOCs, and people will have to ask if the small completion rates are the most effective way to get people to learn that subject. You might be better off creating more stand alone OERs, or putting money into better supported outreach programmes where you can really help people stay with the course. Or maybe you will actually design your MOOC to be like a newspaper.

So…

I buy three newspapers a week – the Isle of Wight County Press (to get a feel for what’s happened and is about to happen locally, as well as seeing who’s currently recruiting), the Guardian on a Saturday (see what news stories made it as far as Saturday comment, do the Japanese number puzzles, check out the book reviews, maybe read the odd long form interview and check a recipe or two), and the Observer on a Sunday (read colleagues’ columns, longer form articles by journalists I know or have met, check out any F1 sports news that made it into that paper, book reviews, columns, and Killer again…).

So I skim bits, have old faithfuls I read religiously, and occasionally follow through on a long form article that was maybe advertised on the cover and I might have missed otherwise.

Newspapers are organised in a particular way, and that lets me quickly access the bits I know I want to access, and throw the rest straight onto the animal bedding pile, often unread and unopened.

So MOOCs are not really like that, at least, not for me.

For me MOOCs are freebie papers I’ve picked up and then thrown, unread, onto the animal bedding pile. For me.

What I can see, though, is MOOCs as partworks. Partworks are those titles you see week on week in the local newsagent with a new bit on the cover that, if collected over weeks and months and assembled in the right way, result in a flimsy plastic model you’ve assembled yourself with an effective cost price running into hundreds of pounds.

[Retro: seems I floated the MOOC as partwork idea before - Online Courses or Long Form Journalism? Communicating How the World Works… - and no-one really bit then either...]

In the UK, there are several notable publishers of partwork titles, including for example Hachette, De Agostini, Eaglemoss. Check out their homepages – then check out the homepages of a few MOOC providers. (Note to self – see if any folk working in marketing of MOOC platform providers came from a partwork publishing background.)

Here’s a riff reworking the Wikipedia partwork page (the original wording I’ve struck out is shown in [square brackets], followed by my replacement):

A [partwork] MOOC is [a written publication] an online course released as a series of planned [magazine-like issues] lessons over a period of time. [Issues] Lessons are typically released on a weekly, fortnightly or monthly basis, and often a completed set is designed to form a [reference work on] complete course in a particular topic.

[Partwork series] MOOCs run for a determined length and have a finite life. Generally, [partworks] MOOCs cover specific areas of interest, such as [sports, hobbies, or children’s interest and stories such as PC Ace and the successful The Ancestral Trail series by Marshall Cavendish Ltd] random university module subjects, particularly ones that tie in to the telly or hyped areas of pseudo-academic interest. They are generally [sold at newsagents and are mostly supported by massive television advertising campaigns for the launch] hosted on MOOC platforms because exploiting user data and optimising user journeys through learning content is something universities don't really understand and avoid trying to do. In the United Kingdom, [partworks] MOOCs are [usually launched by heavy television advertising each January] mentioned occasionally in the press, often following a PR campaign by the UK MOOC platform, FutureLearn.

[Partworks] MOOCs often include cover-mounted items with each issue that build into a complete set over time. For example, a [partwork about art] MOOC might include a small number of [paints or pencils that build into a complete art-set] so-called "badges" that can be put into an online "backpack" to show off to your friends, family, and LinkedIn trawlers; a partwork about dinosaurs might include a few replica bones that build a complete model skeleton at the end of the series; a partwork about films may include a DVD with each issue. In Europe, partworks with collectable models are extremely popular; there are a number of different publications that come with character figurines or diecast model vehicles, for example: The James Bond Car Collection.

In addition, completed [partworks] MOOCs have sometimes been used as the basis for receiving a non-academic credit bearing course completion certificate, or [to create case-bound reference works and encyclopedias] a basis for a piece of semi-formal assessment and recognition. An example is [the multi-volume Illustrated Science and Invention Encyclopedia which was created with material first published in the How It Works partwork] NEED TO FIND A GOOD EXAMPLE.

In the UK, [partworks] MOOCs are [the fourth-best selling magazine sector, after TV listing guides, women’s weeklies and women’s monthlies] NEED SOME NUMBERS HERE*.... A common inducement is [a heavy discount for the first one or two issues] ??HOW DO MOOCs SELL GET SOLD?. The same [series] MOOC can be sold worldwide in different languages and even in different variations.

* Possibly useful starting point? BBC News Magazine: Let’s get this partwork started

The Wikipedia page goes on to talk about serialisation (ah, the good old days when I still had hoped for feeds and syndication… eg OpenLearn Daily Learning Chunks via RSS and then Serialised OpenLearn Daily RSS Feeds via WordPress) and the Pecia System (new to me), which looks like it could provide an interesting starting point on a model of peer-co-created learning, or somesuch. There’s probably a section on it in this year’s Innovating Pedagogy report. Or maybe there isn’t?!;-)

Sort of related but also not: reading this article from icrossing on ‘Subscribe is the new shop.’ – Are subscription business models taking over? and John Naughton’s column last week on the (as then, just leaked) Kindle subscription model – Kindle Unlimited: it’s the end of losing yourself in a good book, I’m reminded of Subscription Models for Lifelong Students and Graduate With Who (Whom?!;-), Exactly…?, which several people argued against and which I never really tried to defend, though I can’t remember what the arguments were, and I never really tried to build a case with numbers in it to see whether or not it might make sense. (Because sometimes you think the numbers should work out in your favour, but then they don’t… as in this example: Restaurant Performance Sunk by Selfies [via RBloggers].)

Erm, oh yes – back to the MOOCs.. and the partworks models. Martin mentioned the economics – just thinking about the partwork model (pun intended, or maybe not) here, how are parts costed? Maybe an expensive loss leader part in week 1, then cheap parts for months, then the expensive parts at the end when only two people still want them? How will print on demand affect partworks (newsagent has a partwork printer round the back to print off the bits that are needed for whatever magazines are sold that week?) And how do the partwork costing models then translate to MOOC production and presentation models?

Big expensively produced materials in front loaded weeks, then maybe move to smaller presentation methods, get the forums working a little better with smaller, more engaged groups? How about the cMOOC ideas – up front in early weeks, or pushed back to later weeks, where different motivations, skills, interest and engagement models play out.

MOOCs are newspapers? Nah… MOOCs as partwork – that works better as a model for me. (You can always buy a partwork mid-way through because you are interested in that week’s content, or the content covered by the magazine generally, not because you are interested in the plastic model or badge.)

Thinks: hmm, partworks come in at least two forms, don’t they – one to get pieces to build a big model of a boat or a steam train or whatever. The other where you get a different superhero figurine each week and the aim is to attract the completionist. Which isn’t to say that part 37 might not be stupidly popular because it has a figure that is just generally of interest, ex- of being part of a set?

Written by Tony Hirst

July 22, 2014 at 3:26 pm

Posted in Anything you want

F1 Timing Screen as a Spreadsheet?

One of the comment themes I’ve noticed around the first Challenge in the Tata F1 Connectivity Innovation Prize, a challenge to rethink what’s possible around the timing screen given only the data in the real time timing feed, is that the non-programmers don’t get to play. I don’t think that’s true – the challenge seems to be open to ideas as well as practical demonstrations, but it got me thinking about what technical ways in there might be for non-programmers who wouldn’t know where to start when it came to working with the timing stream messages.

The answer is surely the timing screen itself… One of the issues I still haven’t fully resolved is a proven way of getting useful information events from the timing feed – it updates the timing screen on a cell by cell basis, so we have to finesse the way we associate new laptimes or sector times with a particular driver, bearing in mind cells update one at a time, in a potentially arbitrary order, and with potentially different timestamps.

[Image: screenshot from The F1 Connectivity Innovation Prize – Challenge 1 Brief (PDF)]

So how about if we work with a “live information model” by creating a copy of an example timing screen in a spreadsheet? If we know how, we might be able to parse the real data stream to directly update the appropriate cells, but that’s largely by the by. At least we have something we can work with to start playing with the timing screen in terms of a literal reimagining of it. So what can we do if we put the data from an example timing screen into a spreadsheet?

If we create a new worksheet, we can reference the cells in the “original” timing sheet and pull values over. The timing feed updates cells on a cell by cell basis, but spreadsheets are really good at rippling through changes from one or more cells which are themselves referenced by one or more others.
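For anyone who would rather prototype the “live information model” in code than in a spreadsheet, here’s a minimal Python sketch of the same idea: a grid of cells that is updated one cell at a time, with views pulled from it on demand. The message shape (timestamp, row, column, value), the column names and the sample values are all my own assumptions for illustration, not the real timing feed format.

```python
class TimingSheet:
    """A toy 'original' timing sheet, updated one cell at a time."""

    def __init__(self):
        self.cells = {}    # (row, column) -> latest value
        self.updated = {}  # (row, column) -> timestamp of that value

    def apply(self, message):
        # Hypothetical message shape: (timestamp, row, column, value).
        timestamp, row, column, value = message
        key = (row, column)
        # Cells can arrive in any order, so ignore an update that is
        # older than the value already held for that cell.
        if key in self.updated and timestamp < self.updated[key]:
            return
        self.cells[key] = value
        self.updated[key] = timestamp

    def row(self, row, columns):
        """Pull a row's values over, like referencing cells from a new sheet."""
        return [self.cells.get((row, column)) for column in columns]


sheet = TimingSheet()
messages = [
    (10.0, 1, "driver", "AAA"),  # made-up driver codes and times
    (12.5, 1, "s1", 23.412),
    (12.1, 2, "driver", "BBB"),
    (12.9, 2, "s1", 23.390),
    (11.0, 1, "s1", 23.900),     # late arrival, older than the stored value
]
for message in messages:
    sheet.apply(message)

first_row = sheet.row(1, ["driver", "s1"])
```

Because the view is computed when it is asked for, it always reflects the latest state of the underlying cells, which is roughly what the spreadsheet’s rippling through of changes gives you for free.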

The first thing we might do is just transform the shape of the timing screen. For example, we can take the cells in a column relating to sector 1 times and put them into a row.

The second thing we might do is start to think about some sums. For example, we might find the difference between each of those sector times and (for practice and qualifying sessions at least) the best sector time recorded in that session.

The third thing we might do is to use a calculated value as the basis for a custom cell format that colours the cell according to the delta from the best session time.

Simple, but a start.
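The three steps (reshape, subtract, colour) can also be sketched in a few lines of Python. The driver codes and sector times here are made up, and the colour names and the 0.1s threshold are arbitrary choices of mine rather than anything taken from the timing screen spec.

```python
# Made-up sector 1 times in seconds, keyed by hypothetical driver codes.
sector1 = {"AAA": 23.412, "BBB": 23.390, "CCC": 23.871}


def deltas_to_best(times):
    """Difference between each time and the best time in the session."""
    best = min(times.values())
    return {driver: round(t - best, 3) for driver, t in times.items()}


def colour_for(delta, close=0.1):
    """Map a delta onto a cell colour (an arbitrary scheme for illustration)."""
    if delta == 0:
        return "purple"  # the session best itself
    return "green" if delta <= close else "yellow"


# Step 1: the column of sector times laid out as a row instead.
as_row = [sector1[driver] for driver in sorted(sector1)]

# Step 2: the sums.
deltas = deltas_to_best(sector1)

# Step 3: a colour per cell, driven by the calculated value.
colours = {driver: colour_for(delta) for driver, delta in deltas.items()}
```

In a spreadsheet the same thing would be a row of cell references, a subtraction against a MIN over the column, and a conditional formatting rule keyed on the result.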

I’ve not really tried to push this idea very far – I’m not much of a spreadsheet jockey – but I’d be interested to know how folk who are might be able to push this idea…

If you need example data, there’s some on the F1 site – f1.com – results for Spanish Grand Prix, 2014 and more on Ergast Developer API.

PS FWIW, my entry to the competition is here: #f1datajunkie challenge 1 entry. It’s perhaps a little off-brief, but I’ve been meaning to do this sort of summary for some time, and this was a good starting point. If I get a chance, I’ll have a go at getting the parsers to work properly properly!

Written by Tony Hirst

July 17, 2014 at 4:38 pm

Posted in Anything you want

Tracking Anonymous Wikipedia Edits From Specific IP Ranges

with one comment

Via @davewiner’s blog, I spotted a link to @congressedits, “a bot that tweets anonymous Wikipedia edits that are made from IP addresses in the US Congress”. (For more info, see why @congressedits?, /via @ostephens.) I didn’t follow the link to the home page for that account (doh!), but in response to a question about whether white label code was available, @superglaze pointed me to https://github.com/edsu/anon, a script that “will watch Wikipedia for edits from a set of named IP ranges and will tweet when it notices one”.

It turns out the script was inspired by @parliamentedits, a bot built by @tomscott that “tracks edits to Wikipedia made from Parliamentary IP addresses” built using IFTTT and possibly a list of IP ranges operated by the House of Commons gleaned from this FOI request?

Nice…

My immediate thought was to set up something to track edits made to Wikipedia from OU IP addresses, then idly wondered if a set of feeds for tracking edits from HEIs in general might also be useful (something to add to the UK University Web Observatory for example?)
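The core test in any tracker like this is whether the IP address recorded against an anonymous edit falls inside one of the ranges being watched. Here’s a minimal sketch of that check using Python’s standard `ipaddress` module. It is not how the anon script itself does it (I haven’t looked), and the organisation names and ranges are placeholders taken from the reserved documentation address blocks, not real institutional ranges.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737),
# NOT real university or council ranges.
WATCHED = {
    "Example University": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Council": [ipaddress.ip_network("198.51.100.0/25")],
}


def owner_of(editor):
    """Return the watched organisation an anonymous editor maps to, or None.

    Anonymous Wikipedia edits are attributed to an IP address, so an
    editor string that doesn't parse as an address is treated as a
    logged-in username and skipped.
    """
    try:
        address = ipaddress.ip_address(editor)
    except ValueError:
        return None
    for name, networks in WATCHED.items():
        if any(address in network for network in networks):
            return name
    return None
```

For example, `owner_of("192.0.2.17")` comes back as `"Example University"`, whereas a username, or an address outside every watched range, comes back as `None`. Hooking this up to a live stream of edits and to a set of real ranges is the bit that still needs doing.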

To the extent that Wikipedia represents an authoritative source of information, for some definition of authoritative(?!), it could be interesting to track the “impact” of our foolish universities in terms of contributing to the sum of human knowledge as represented by Wikipedia.

It’d also be interesting to track the sorts of edits made from anonymous and named editors from HEI IP ranges. I wonder what classes they may fall into?

  1. edits from the marketing and comms folk?
  2. ego and peer ego edits, eg from academics keeping the web pages of other academics in their field up to date?
  3. research topic edits – academics maintaining pages that relate to their research areas or areas of scholarly interest?
  4. teaching topic edits – academics maintaining pages that relate to their teaching activities?
  5. library edits – edits made from the library?
  6. student edits – edits made by students as part of a course?
  7. “personal” edits – edits made by folk who class themselves as Wikimedians in general and just happen to make edits while they are on an HEI network?

My second thought was to wonder to what extent might news and media organisations be maintaining – or tweaking – Wikipedia pages? The BBC, for example, who have made widespread use of Wikipedia in their Linked Data driven music and wildlife pages.

Hmmm… news.. reminds me: wasn’t a civil servant who made abusive edits to a Wikipedia page sacked recently? Ah, yes: Civil servant fired after Telegraph investigation into Hillsborough Wikipedia slurs, as my OU colleague Andrew Smith suggested might happen.

Or how about other cultural organisations – museums and galleries for example?

Or charities?

Or particular brands? Hmm…

So I wonder: could we try to identify areas of expertise on, or attempted/potential influence over, particular topics by doing reverse IP lookups from pages focussed on those topics? This sort of mapping activity pivots the idea of visualising related entries in Wikipedia to map IP ranges, and perhaps from that locations and individuals associated with maintaining a set of resources around a particular topic area (cf. Visualising Delicious Tag Communities).

I think I started looking at how we might start to map IP ranges for organisations once….? Erm… maybe not, actually: it was looking up domains a company owned from its nameservers.

Hmm.. thinks… webstats show IP ranges of incoming requests – can we create maps from those? In fact, are there maps/indexes that give IP ranges for eg companies or universities?

I’m rambling again…

PS Related: Repository Googalytics – Visits from HEIs which briefly reviews the idea of tracking visits to HEI repositories from other HEIs…

Written by Tony Hirst

July 11, 2014 at 7:45 am

Posted in Anything you want

Foolish is as foolish does… fragments…

It’s been that CDSA paperwork time of year again, and if nothing else it has forced me to start pulling together some fragments of ideas backed up by other peoples’ (or people’s – I never can remember) words…

So here are some fragments that are, I think, aligned to some of the things that I’ve thought for a long time and have been thinking again more recently, things that resonated with me as I read them just now…

The way in to this, this time round, was part of a talk by Prof Richard Keeble at the University of Lincoln School of Journalism Research Symposium last week, when he mentioned the role of the fool as a safety valve or regulator in the classical court…

First up, from “Class Clown and Court Jester“, David Chevreau, MA thesis, UBC, April 1994

p10-11
The fool can fill a number of specific roles in the group. He represents the rejected values, lost causes, fiascoes, and incompetencies of the larger gathering. His lowly yet valued position in the office of scapegoat and butt of humour gives him license to depart from the group’s accepted social norms with a unique impunity.
Footnote 10 – The fool is a rebel, outcast, prophet and whipping boy, and his office is a well defined social phenomenon.

As a scapegoat, you can speak the truth and people can choose to listen or not. The truth can be ignored if spoken by the fool, because the fool said it..

p13
Unperturbed by misplaced authority, the Class Clown seizes an opportunity either the naivete of the natural innocent or the insight of the wizened fool.

I’m often confused…

p16
The Fool exists on the fringe of social convention, and thus has a license which frees him from responsibility and consequence.

Scruffy hippy…

p18
It is difficult to accuse a Fool of meaning anything, for his foolish words may be nothing more than the babbling of the idiot or a disguise which reveals hidden truth to some but appears to be senseless chatter to others.

WTF is he on about?

p22
Transgressing the bounds of propriety in his failure to cope with convention the Fool does not suffer the usual loss of dignity associated with social failure.

FFS… swearing again…

p22
Standing at the fringe, the Fool may be a disinterested truth-teller whose apparent madness masks his breadth of perspective. The Fool is a detached observer who lives at the boundary not just between order and chaos but also between what is and what appears to be, and is often confused with the silly and deluded.

So, I deserve a Chair, right?!:-)

p23
The Fool perceives that the world has more to do with seeming than with seeing, for too much of our world is actually unknown and irrational. To take the imposition of order too seriously is the height of folly, and so the Fool, standing aside, sees through the illusion.

Ooh, big data… and squillionty billionty pounds will be made from open data… innit.

So that was that one…

Here’s another take… As I read (/red/) this, I also thought about my own role within the university… As above, so below…? “Institutional Heterogeneity and Change: The University as Fool”, Donncha Kavanagh, Organization Volume 16(4): 575–595 ISSN 1350–5084, 2009.

First, the scene is set…

p577 “Detailed study of the history of the University suggests that it is an institution that acts and has a role akin to the Fool in the royal court of medieval times.”

The paper then explores this idea in narrative form…

Fool as normative narrator:
p586 “The Fool is a story-teller, but its stories are always embedded in a framework of norms and values that connect the moment into longer conversations over time and space.”

There is a context to what we do, and a tradition that informs it, in both form and in content…

p587 “Akin to the medieval fool, who is not there to merely tell stories, the University is expected to provide a normative narrative or a critical interpretation of the world. … the University’s long tradition of academic freedom mirrors the Fool’s position as the Sovereign’s independent critic. … The university does not just (re-)tell stories, parables, and proverbs. Its power also comes about from its material ability to sort things out (Bowker and Star, 1999 [Bowker, G. and Star, S. L. (1999) Sorting Things Out: Classification and its Consequences. London: MIT Press.]); it is a sorter par excellence.”

The university helps make sense of the world… it can do this by putting things into perspective, or ordering them/organising them, in a particular way (that is, it can “sort” them, as you might sort a sock drawer, albeit one that doesn’t necessarily contain any actual pairs of socks…)

p588 “Through these twin processes of normative narrating and sorting the university constructs and maintains what I term the semiotic nexus. The semiotic nexus gives meaning to an institution — be it the University, its sovereign or one of the other institutions in the realm—through telling a multi-part, compelling, value-laden tale about the institution and its place in the world. The university is not the only institution engaged in this process of ‘making meaning’—narrating is a form of theorizing that everyone engages in—but it plays a central role in determining what counts as knowledge, as well as defining what is valuable, peripheral, obscene, sacred, profane, reputable, opinion, fact, etc. The University, like the Fool, personifies truth and reason, in that it is required to tell the truth, to abolish myth, and to distinguish fact from mere opinion. In other words, the University’s normative story-telling ability allied to its sorting practices and technologies are basic to how the University realizes its imagined community of academics, how it at once becomes an institution itself, and also how it maintains and sustains the semiotic nexus underpinning other institutions. In other words, these practices play a significant role in the process of institutionalization.”

I’ve noticed that people find it very hard to play… I can play all day… Erm… I can only play?!

p589 Play in the fool
“The Fool is a ludic spirit within the institutional complex, and play—a free activity standing outside of and opposed to the seriousness of ordinary life (Huizinga, 1955)—is its modus operandi. As with the child, the Fool is allowed, expected and given time and space to play. Through playing with language the Fool sparks a new (yet old) understanding of the here and now. This incandescent quality at once makes events alive—giving them immediate meaning—while simultaneously framing them within a longer temporal structure or longue durée that articulates the empirical with a transcendent truth. Each ‘play’ then endures as a new mental creation, to be repeated and retained in memory, echoing older refrains of truth and tradition. Following Huizinga, play is primordial and because of its close links with the sacred, it works to keep old norms and beliefs alive. The Fool as playmaker extraordinaire is central to this continual process of institutional re-creation through which an institution breathes, lives and renews itself.
Yet, because it takes work to create order within play, play always (sub- liminally) reminds us that the world is fundamentally chaotic and that any meaning within this chaos is always provisional and artificial. The Fool’s work of play then is to institutionalize order and at once to open up order to de-institutionalization. Through its role as playmaker, the Fool puts an institution ‘into play’, which means that work must be done to either recreate or de-stabilize the institution. In this way, the Fool’s ability and license to play is paradoxically central to both institutionalization and de-institutionalization.”

FWIW, seeing that mention of Huizinga, I’m reminded of how play is a serious business… see for example, Getting Philosophical About Games. The Magic Circle applies similarly to the little closed off worlds we lock ourselves into when doing a research project. Don’t let anyone ever tell you that research is anything more than play, (though it’s often less…). Note also that that Digital Worlds blog post was itself an ‘output’ from an uncourse I played with creating a few years ago. The material ended up being used by an actual OU course that followed on. I don’t think anyone in the OU really got it. Then MOOC hype shite came along and nothing really changed.

Playing the fool is a responsible job. If you aren’t responsible, you can step beyond the bounds of playful foolishness and start “stirring”, or trying to use the cloak of foolishness to cause trouble directly…

“Another transgression occurs when the Fool cannot see beyond the play-making; i.e. the Fool becomes a Trickster, a Lucifer figure working solely to undermine and destroy order. This happens when the Fool forgets that part of the Fool’s role is sustaining order in the institutional complex.”

Beware Anansi taking over, in other words…

The “Emperor’s New Clothes” is one of my favourite stories. The boy is portrayed as foolish in his innocence, but he speaks a truth as a naif, or innocent. We see how corollaries to that story can be played out by the wise fool, rather than truth telling innocent…

p590 fool as educator
“Pursuing the metaphor of the Fool presents an interesting perspective on the University as an educational institution. While the Fool is an educator of sorts, she does not really ‘own’ knowledge that she ‘passes on’ as per our conventional understanding of pedagogy. Unlike the teacher who is usually cast as the learner’s caring coach, the Fool is an irritant, a provocateur, whose modus operandi is to provoke new wisdom in others. The Fool’s approach is, quite literally, to play the fool, acting as a lucid and ludic lens through which others perceive and recognize profound truths, truths that indeed may be lost in the conventions of learning and scholarship. The fool (like the child) is not expected to ‘know’ anything and is therefore free to act the fool, because she cannot, by definition, ‘know any better’. Paradoxically, this epistemic vacuum is also a potential source of great wisdom, which is why the idea of the ‘wise fool’ has such a long tradition. Moreover, the oxymoron ‘wise fool’ is also reversible: he that believes himself to be wise is necessarily foolish. For the Fool also reminds us that knowledge of the mystery of life is always beyond even the wise; at best we can only know that there is much of which we are and can only be ignorant.

[The university] must be the institutional manifestation of an oxymoron, remembering that this word comes from the Greek, oxumōrone, meaning ‘pointedly foolish’.”

– Fin

Written by Tony Hirst

July 10, 2014 at 6:11 pm

Posted in Anything you want


Confused about “Confused About…” Posts

with 2 comments

Longtime readers will know that every so often I publish a post whose title starts “Confused About”. The point of these posts is to put a marker down in this public notebook about words, phrases, ideas or stories that seem to make sense to everyone else but that I really don’t get.

They’re me putting my hand up and asking the possibly stupid question, then trying to explain my confusion and the ways in which I’m trying to make sense of the idea.

As educators, we’re forever telling learners not to be afraid of asking the question (“if you have a question, ask it: someone even less comfortable than you with asking questions probably has the same question too, so you’ll be doing everyone a favour”), not to be afraid of volunteering an answer.

Of course, as academics, we can’t ever be seen to be wrong or to not know the answer, which means we can’t be expected to admit to being confused or not understanding something. Which is b*****ks of course.

We also can’t be seen to write down anything that might be wrong, because stuff that’s written down takes on the mantle of some sort of eternal truthiness. Which is also b*****ks. (This blog is a searchable, persistent, though mutable by edits, notebook of stuff that I was thinking at the time it was written. As the disclaimer could go, it does not necessarily represent even my own ideas or beliefs…)

It’s easy enough to take countermeasures to avoid citation of course – never publish in formal literature; if it’s a presentation that’s being recorded, put some music in it whose rights owners are litigious, or some pictures of Disney characters. Or swear…. Then people won’t link to you.

Anyway, back to being confused… I think that’s why I post these posts…

I also like to think they’re an example of open practice…

Written by Tony Hirst

July 2, 2014 at 9:03 am

Posted in Anything you want


It’s Not a Watch, It’s an Electronic Monitoring Tag

with one comment

In the closing line of his Observer Networker column on Sunday 23 March 2014 (Eisenhower’s military-industrial warning rings truer than ever), John Naughton/@jjn1 concluded: “we’re witnessing the evolution of a military-information complex”. John also tweeted the story:

Google’s corporate Code of Conduct may begin with Don’t be evil, but I think a precautionary principle of considering the potential for evil should also be applied when trying to think through the possible implications of what Google, and other information companies of the same ilk, could do…

This isn’t about paranoid tin foil hat wearing fantasy – it’s about thinking through how cool it would be to try stuff out… and then oopsing. Or someone realising that whatever can make shed loads of money, and surely it can’t hurt. Or a government forcing the company to do whatever. Or another company with an evilness agnostic motto (such as “maximise short term profits”) buying the company.

“Just” and “all you need to do” are often phrases that unpack badly in the tech world (“just” can be really hard, with multiple dependencies; “all you have to do” might mean you have to do pretty much everything). On the other hand “sure, we can do that” can cover things that are flick of a switch possible, but tend not to be done for policy reasons.

Geeks are optimists – “just” can take hours, days, weeks.. “Sure” can be dangerous. “Erm, well we can, but…” can mean game over when don’t be evil becomes a realised potential for evil.

What would we think if G4S, purveyors of monitoring technologies that have nothing to do with wearables or watches (although your “smart watch” is a tag, right?) bought Skybox Imaging?

What would we think if Securitas had bought Boston Dynamics, to keep up with G4S’ claimed use of surveillance drones?

What if Google, who sell advertising to influence you, or Amazon, who try to directly influence you to buy stuff, had been running #psyops experiments testing their ability to manipulate your emotions? [They do, of course...] Like advertising company Facebook did – Experimental evidence of massive-scale emotional contagion through social networks:

The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion. People who viewed Facebook in English were qualified for selection into the experiment.

[The experiment] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research. [my emphasis]

(For a defense of the experiment, see: In Defense of Facebook by Tal Yarkoni; I think one reason the chatterati are upset is because they remember undergrad psychology experiments and mention of ethics committees…)

Experian collect shed loads of personal data – including personal data they have privileged access to by virtue of being a credit reference checking agency – about you, me, all of us. Then we’re classed, demographically. And this information is sold, presumably to whoever’s buying, (local) government as well as commerce. Do we twitch if Google buys them? Do we care if GCHQ buys data from them?

What about Dunnhumby, processors of supermarket loyalty data among other things? Do we twitch if Amazon buys them?

Tesco Bank just started to offer a current account. Lots more lovely transaction data there:-)

I don’t know how to think through any of this. If raised in conversation, it always comes across as paranoid fantasy. But if you had access to the data, and all those toys? Well, you’d be tempted, wouldn’t you? To see if the “just” is a “just”, or it’s actually a “can’t”, or whether indeed it’s a straightforward “sure”…

Written by Tony Hirst

June 29, 2014 at 6:28 pm

Posted in Anything you want

Give the kids the power tools…

with one comment

How do we trade off giving students ridiculously powerful applications to play with, guarding against the “dangers” of doing so and the requirement to spend time teaching the tool rather than the concepts, versus just teaching them the concepts and hoping they’ll be able to see how to wield those concepts as they are manifest in and by the power tools?

“I … cannot disagree strongly enough with statements about the dangers of putting powerful tools in the hands of novices. Computer algebra, statistics, and graphics systems provide plenty of rope for novices to hang themselves and may even help to inhibit the learning of essential skills needed by researchers. The obvious problems caused by this situation do not justify blunting our tools, however. They require better education in the imaginative and disciplined use of these tools. And they call for more attention to the way powerful and sophisticated tools are presented to novice users.”
Leland Wilkinson, The Grammar of Graphics, Springer-Verlag, 1999, ISBN 0-387-98774-6, p15-16.

+1, as they say; +1.

Written by Tony Hirst

May 19, 2014 at 11:42 pm

Posted in Anything you want

Confused Again About VM Ecology… I Blame Not Blogging

leave a comment »

Via a cc’d tweet from Martin Hawksey, this lovely post from Tom Smith/@everythingabili (who has the best ever twitter bio strapline) on How I Learn ( And What I’m Learning ).

I like to think that I used to write blog posts that had the same sort of sense as that post…

…but for the last few months at least, I don’t think I have.

“Working” for once – starting production on an OU course (TM351, due out October 2015 (sic; I’m gonna be late on the first draft of the 7 weeks of the course I’m responsible for: it’s due in a little over a fortnight…)), and also helping out on an investigative project the School of Data is partnering on – has meant that most of the learnings and failings that I used to blog about have been turned inward to emails (which are even more poorly structured in terms of note-taking than this blog is) if at all.

Which is a shame and makes me not happy.

Reading through completed academic papers, making useful (I hope) use of them in the course draft, has been quite fun in part – and made me regret at times not writing up work of my own in a self-contained, peer reviewed way over the last decade or so; getting stuff “into the record”, properly citable, and coherent enough to actually be cited. But as you pick away at the papers, you know they’re a revisionist telling, a straightforward narrative of how the pieces fit together and in which nothing went wrong along the way; (you also know that the closer you get to trying to replicate a paper, the harder it is to find the missing pieces (process, trick, equation or insight) that actually make it work; remember school maths, or physics, and the textbook that goes from one equation to the next with a “hence”, but there’s no way in hell you can figure out how to make that step and you know you’ll be stuck when that bit comes up in the exam…?! That. Or worse. When you bang your head against a wall trying to get something to work, contort your mental models to make it work, sort of, then see the errata list correcting that thing. That, too.)

On the other hand, this blog is not coherent, shapes no whole, but is full of hence steps. Disjointed ones, admittedly. But they keep a track of all the bits I tried at and failed at and worked around, and they keep on getting me out of holes… Like the emails won’t. Lost. Wasted effort because the learning fumblings that are OUseful learning fumblings are lost and locked up in email hell.

It makes me very not happy.

So that, by way of intro, to this: a quick catchup follow-up to Cursory Thoughts on Virtual Machines in Distance Education Courses and Doodling With IPython Notebooks for Education, a partial remembering of the various shades of hell associated with them and trying to share them.

Here’s what I think I now want to do (whether or not it’s the right thing I’m not sure).

  • generate a script that will build a VM. We’ve opted for Virtualbox as the VM container. The VM will need to contain: pandas; IPython notebook (course team want it to run Python 3.3. I’ve lost track of how many hours I’ve spent trying and failing to get Python libraries I think we need to run under Python 3.3; wasted effort; I should have settled with Python 2.7 and then revisited 3.3 several months hence; the 2.7 to 3.3 tweaks to any code we write for the course should be manageable in migration terms. Pratting around on libraries that I’m guessing will get patched as native distributions move to 3.3 by default but don’t work yet is wasted effort. October. 2015. First presentation.); PostgreSQL (perhaps with some extensions); mongodb; ipythonblocks; blockdiag; I came across shellinabox today and wonder if we should run that; OpenRefine (CT against this – I think it’s good for developing mental models); python-nvd3; folium; a ggplot port to python; (CT take – too much new stuff; my take, we should go as high up the stack as we can in terms of the grammar of the calling functions); I think we should run R and RStudio too to make for a really useful VM, making the point that the language doesn’t matter and we use whatever’s appropriate to get things done, but I don’t think anyone else does. if. Which computer language is that from then? for. Or that one? How about in? print? Cars are for driving. Mine’s blue. I have no idea how it works. Can I drive your car? The red one. With the left-hand drive.
  • access the services running on the headless VM via a browser on host. I think we should talk to the databases using Python, but I may be in the minority. We may need something more graphical to talk to postgresql. If we do, I would argue it should be a browser based client – if it’s an app, we’re moving functionality provision outside of the VM.
  • use the script to build machines with the same package config; CT seem to prefer a single build on a 32 bit machine. I think we should support 64 bit as well. And deployment on at least one cloud service – I’d go for Amazon, but that’s mainly because it’s the only one I’ve tried. If we could support more, even better.
  • as far as maintenance goes, I wrote the vagrant script to update libraries whenever the provisioner is run (which is quite a lot at the mo as I keep finding new things to add to the box!;-) This may or may not be sensible for student use. If there is a bug in a library, an update could help. If there is a security patch to the kernel, we should be updating as good citizens. The current plan is to ship a built box (which I think would have to go on to a USB stick – we can’t rely on folk having computers with a DVD any more, and a 1.5GB download seems to be proving unreliable without a proper download manager). As it is, students will have to download virtualbox and vagrant, and install those themselves. (Unless we can put installers for them on a USB stick too.) If we do ship a built box, we need to think of some way of kickstarting the services and perhaps rebooting the machine (and then kickstarting the services). There is a separate question of whether we should also be able to update config scripts during presentation. This would presumably have to be done on the host. One way might be to put config scripts on a git server then use git to keep the config scripts on the students’ host machine up to date, but that would probably also require them to install a git commandline tool, even if we automated the rest. Did I say this all has to run cross platform? Students may be on Windows (likely?), Mac or Linux. I think the course should be doable, if painfully, via a tablet, which means the VM needs the cloud hosted option. If we could also find a way to help students configure their whatever platform host so that they could access services from the VM running on it via their tablet, so much the better.
  • files need to be shared between VM and host. This raises an interesting issue for a cloud hosted VM. Either, we need to find a way to synch files between desktop and cloud VM, persist state on the cloud host so that the VM can synch to it, or pop dropbox into the cloud VM (though there would then be a synch delay, as there would with a desktop synch). I favour persisting on the cloud, though there is then the question of the student who is working on a home machine one day and a cloud machine the next.
  • Starting and stopping services: students need to be able to start and stop services running on the VM without having to ssh in. One click easy. A dashboard with buttons that show if a service is running or not, click a button to toggle the run state of the service. No idea how to do this.
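
One way the one-click start/stop idea in the last bullet might be approached, as a minimal sketch rather than a tested solution: a small Python helper on the host builds the `vagrant ssh -c` command for a service action, and a dashboard button could then run it. The service names, the helper names and the use of the `service` command inside the guest are all assumptions for illustration.

```python
import subprocess

# Hypothetical service names: the names the guest OS actually uses may differ.
KNOWN_SERVICES = {"postgresql", "mongodb"}
ACTIONS = {"start", "stop", "status"}


def service_command(service, action):
    """Build the host-side command that runs `service <name> <action>` in the VM."""
    if service not in KNOWN_SERVICES:
        raise ValueError("unknown service: %s" % service)
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    # `vagrant ssh -c` runs a single command in the guest and then returns.
    return ["vagrant", "ssh", "-c", "sudo service %s %s" % (service, action)]


def run_service_action(service, action, vagrant_dir="."):
    """Run the action from the directory holding the Vagrantfile; return the exit code."""
    return subprocess.call(service_command(service, action), cwd=vagrant_dir)
```

Keeping the command builder separate from the function that runs it means the button logic can be checked without a VM to hand; whether `sudo service …` is the right incantation for every service in the box is something that would need testing.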

Here’s the approach I’ve taken:

  • reusing DataminerUK’s infinite-interns model as a starting point, I’m using vagrant to build and provision a VM using puppet. At the moment I have duplicate setups for two different Linux base boxes (precise64 and raring64. The plan is to move to the latest Ubuntu LTS.) I really should have a single setup with the different machine setups called by name from a single Vagrantfile. I think.
  • The puppet provisioner builds the box from a minimal base and starts the services running. It’s aggressive on updates. The precise64 box is running python 2.7 and the raring64 box 3.3. Getting pip3 running in the raring box was a pain, and I don’t know how to tell puppet to use the pip3 thing I eventually installed to update. At the moment I fudge with:
    exec { "pip3-update":
    command => "/usr/local/bin/pip3 install --upgrade pip"
    }

    but it breaks because I’m not convinced that is always the right path (I’d like to hedge on /usr/bin:/usr/local/bin), or that pip3 is actually installed when I try to exec it… I think what I really want to do is something like
    package {
    [
    JUST UPGRADE YOURSELF, PLEASE
    ]: ensure => latest,
    provider => 'pip3';
    }

    with an additional dependency check (=>) that pip3 has been installed first, and from all the other pip3 installs that pip3 has been upgraded first.
  • The IPython notebook is started by a config shell script called from puppet. I think I’m also using a config script to set up a user and test tables in Postgres (though I plan to move to the puppet extension as soon as I can get round to reading the docs/finding out how to do it).
  • There are times when a server needs restarting. At the moment I have to run vagrant provision to do that – or even vagrant halt;vagrant up, which means it also runs the updates. It’d be nice to just be able to run the bits that restart the services (the DBMS’, IPython notebook etc) without doing any of the package updates, installs, checks etc.
  • We need a tool to check whether services are running on the necessary ports to help debugging without requiring a user to SSH into the VM; I’ve also fixed on default ports. We really need to change ports if a default port is being used to a free port and then somehow tell the IPython notebook scripts which port each service is running on. With vagrant letting you run a VM from within a particular directory, being able to know what VMs are being run and from where, wherever vagrant on host started them, would be useful.
  • I don’t use a configurator for the postgres db (it needs seeding with some example tables) but should do – on my to do list is to look at https://github.com/puppetlabs/puppetlabs-postgresql . Similarly for mongo db – and perhaps https://github.com/puppetlabs/puppetlabs-mongodb
  • To make use of python-nvd3, suggested route is to use bower. I got the npm package manager to work but have failed to find a way of installing any actual packages [issue].
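
The port checking and port swapping described in the list above could be sketched along the following lines, using only the Python standard library. This is a sketch under assumptions, not something I’ve run against the course VM: the function names and the idea of writing the chosen ports to a JSON file for the notebook scripts to read are hypothetical, and there is an unavoidable race between picking a free port and a service actually binding to it.

```python
import json
import socket


def port_is_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


def pick_port(preferred, host="127.0.0.1"):
    """Return the preferred port if nothing is listening on it, else one chosen by the OS."""
    if not port_is_open(preferred, host):
        return preferred
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        return sock.getsockname()[1]
    finally:
        sock.close()


def write_port_map(defaults, path):
    """Choose a port per service and save the mapping as JSON (hypothetical file format)."""
    ports = {name: pick_port(port) for name, port in defaults.items()}
    with open(path, "w") as f:
        json.dump(ports, f, indent=2)
    return ports
```

A call such as `write_port_map({"postgresql": 5432, "mongodb": 27017, "notebook": 8888}, "ports.json")` uses the usual default ports for those services; the same `port_is_open` check, run on the host against the forwarded ports, would also give a crude "is it running?" test without needing to SSH into the VM.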

Sorting out issues to date (aside from things like port clashes, and all manner of f**k ups because I distributed a README with an error in it and folk tried to follow it rather than patches posted elsewhere) has been impeded by not having a good way of logging and sharing errors. OU network issues have also contributed to the fun. I always avoid the OU staff network, and nothing seems to work on it. I suspect this is a proxy issue, but I’m unwilling to invest any time exploring it or how to config the VM to cope (no-one else has offered to go down this rat hole). Poxy proxies could well be an issue for some students, but I’m not sure what the best way forward is. Cloud hosted VMs?!

I also had a problem on OU eduroam – mongodb wants to get something signed from a keyserver before it will install mongodb, but OU eduroam (the network I always use) seems to block the keyserver. Tracking that down wasted a good hour.

Here are some other things I’ve heard about:

- https://github.com/psychemedia/notebookcloud This is cloned from https://github.com/carlsmith who appears to have taken his repo – and the app – down? It provided a dashboard for firing up notebook servers on Amazon cloud. If I hadn’t been ‘working’ I’d have blogged screenshots and the workflow. As it is, all I have are vague memories of how it worked and what it did and the ideas that sprung off of having an artefact to talk around. [Hmm... app seems to have come back up - maybe I caught it at a bad time... https://notebookcloud.appspot.com/login ]

- provisioning things: chef, vagrant, puppet, docker.

What should I be using for what?

I thought about different VMs for different services, but that adds too much VM weight and requires networking between the VMs, which could lead to “support issues”. Would docker help here? A base VM built from vagrant and puppet, then docker to put additional machines on top.

What I want is students to be able to:

- install minimum number of third party apps on their desktop (currently virtualbox and vagrant)
- click one button to get their VM. My feeling is we should have a largely prebuilt box on a USB stick they can use as a base box, then do a top up build and provision. I suspect CT would like one click somewhere to fire up a machine, get services running, and open a tab to the IPython notebook in their browser, maybe a status button somewhere, a single button to restart any services that have stopped and options to suspend or shutdown machines. In terms of student workflow, I think suspending and resuming machines (if services can resume appropriately) would be the neatest workflow. Note: the course runs over 9 months…
- be able to access files on host that are used in the VM. If they are using multiple VMs (eg on cloud and desktop), find a sensible way of synching notebooks and data/database state across those machines, which could be tricky at least as far as database state goes.
- if a student is not using postgresql or mongo – and they won’t for the first 8 weeks of the course – it could be useful to run the VM without those services running (perhaps aside from a quick self-test in week 1 so we can check out any issues as early as possible and give ourselves a week or two to come up with any patches before those apps are used in anger). So maybe a control panel to fire up the VM and the services you want to run. Yes mongo, no postgresql. No DB at all. And so on. Would docker help here? Decide on host somehow which services to run, fire up the VM, then load in and start up the required services. Next session, change which services are running in the VM?
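
The “yes mongo, no postgresql” control panel idea boils down to turning a set of ticked boxes into a start/stop plan. A minimal Python sketch of that step, with hypothetical service names and no claim about how the plan would actually be applied inside the VM:

```python
# Hypothetical service names, for illustration only.
ALL_SERVICES = ("ipython-notebook", "postgresql", "mongodb")


def plan_services(wanted, all_services=ALL_SERVICES):
    """Map every known service to "start" or "stop" given the ones the student wants."""
    wanted = set(wanted)
    unknown = wanted - set(all_services)
    if unknown:
        raise ValueError("unknown services: %s" % ", ".join(sorted(unknown)))
    return {name: ("start" if name in wanted else "stop") for name in all_services}
```

So `plan_services({"ipython-notebook", "mongodb"})` would mark postgresql to be stopped and the other two to be started; changing the ticked boxes next session just produces a different plan.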

All in all, I reckon I’m between 20 and 40% there (further along in terms of time?) but not sure how much further I can push it to the desired level of robustness without learning how to do this stuff a bit more properly… I’m also not really sure what’s practically and reliably possible and what’s best for what. We have to maximise the robustness of stuff ‘just working’ and minimise support issues because students are at a distance and we don’t know what platform they’re on. I think if I started from scratch and rewrote the scripts now they’d be a lot clearer, but I know that’d take half a day and the functional return – for now – I think would be minimal.

That said, I’ve done a fair amount of learning, though large chunks of it have been lost to email and not blogging. Which is a shame. Must do better. Must take public notes.

Written by Tony Hirst

May 15, 2014 at 11:37 pm

Posted in Anything you want


Innovation’s End

with one comment

In that oft referred to work on innovation, The Innovator’s Dilemma, Clayton Christensen suggested that old guard companies struggle to innovate internally because of the value networks they have built up around their current business models. Upstart companies compete around the edges, providing cheaper but lower quality alternative offerings that allow the old guard to retreat to the higher value, higher quality products. As the upstarts improve their offerings, they grow market share and start to compete for the higher value customers. The upstarts also develop their own value networks which may be better adapted to an emerging new economy than the old guard’s network.

I don’t know if this model is still in favour, or whether it has been debunked by a more recent business author with an alternative story to sell, but in its original form it was a compelling tale, easily co-opted and reused, as I have done here. I imagine over the years, the model has evolved and become more refined, perhaps offering ever more upmarket consultancy opportunities to Christensen and his acolytes.

The theory was also one of the things I grasped at this evening to try to help get my head round why the great opportunities for creative play around the technologies being developed by companies such as Google, Amazon and Yahoo five or so years ago don’t seem to be there any more. (See for example this post mourning the loss of a playful web.)

The following screenshots – taken from Data Scraping Wikipedia with Google Spreadsheets – show how the original version of Google spreadsheets used to allow you to generate different file output formats, with their own URL, from a particular sheet in a Google spreadsheet:

[Screenshots: publishAsWebpage, publishFormats, publishOPtions, morePublishOptions]

In the new Google spreadsheets, this is what you’re now offered from the Publish to Web options:

[Screenshot: googleshshare]

[A glimmer of hope - there's still a route to CSV URLs in the new Google spreadsheets. But the big question is - will the Google Query language still work with the new Google spreadsheets?]

embed changes everything

(For some reason, WordPress won’t let me put angle brackets round that phrase. #ffs)

That’s what I said in this video put together for a presentation to a bunch of publishers visiting the OU Library at an event I couldn’t be at in person (back when I used to be invited to give presentations at events…)

I saw embed as a way that the publishers could retain control over content whilst still allowing people to access the content, and make it accessible, in ways that the publishers wouldn’t have thought of.

Where content could be syndicated but remain under control of the publisher, the idea was that new value networks could spring up around legacy content, and the publishers could then find a way to take a cut. (Publishers don’t see it that way of course – they want it all. However big the pie, they want all of it. If someone else finds a way to make the pie bigger, that’s not interesting. My pie. All mine. My value network, not yours, even if yours feeds mine. Because it’s mine. All mine.)

I used to build things around Amazon’s API, and Yahoo’s APIs, and Google APIs, and Twitter’s API. As those companies innovated, they built bare bones services that they let others play with. Against the established value network order of SOAP and enterprise service models, the RESTful upstarts played with their toys. And the upstarts let us play with their toys. And we did, because they were easy to play with.

But they’re not anymore. The upstarts started to build up their services, improve them, entrench them. And now they’re not something you can play with. The toys became enterprise warez and now you need professional tools to play with them. I used to hack around URLs and play with the result using a few lines of Javascript. Now I need credentials and heavyweight libraries, programming frameworks and tooling.

Christensen saw how the old guard, with their entrenched value networks couldn’t compete. The upstarts had value networks with playful edges and low hanging technological fruit we could pick up and play with. The old guard entrenched upwards, the upstarts upped their technology too, their value networks started to get real monetary value baked in, grown up services, ffs stop playing with our edges and bending our branches looking for low hanging fruit, because there isn’t any more. Go away and play somewhere else.

Go away and play somewhere else.

Go somewhere else.

Lock (y)our content in, Google, lock it in. Go play with yourself. Your social network sucked and your search is getting ropey. You want to lock up content, well so does every other content generating site, which means you’re all gonna be faced with the problem of ranking content that all intranets face. And their searches universally suck.

The innovator’s dilemma presented incumbents with the problem of how to generate new products and business models that might threaten their current ones. The upstarts started scruffy and let people play alongside, let people innovate along with them. The upstarts moved upwards and locked out the innovation networks around them. Innovations end. Innovation’s end. Innovation send. Away.

< embed > changes everything. Only this time it’s gone the wrong way. I saw embed as a way for us to get their closed content. Now Google’s gone the other way – open data has become an embedded package.

“God help us.” Withnail.

PS Google – why did my, sorry, your Chrome browser ask for my contacts today? Why? #ffs, why?

Written by Tony Hirst

May 2, 2014 at 11:47 pm

Posted in Anything you want

