OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Confused Again About VM Ecology… I Blame Not Blogging


Via a cc’d tweet from Martin Hawksey, this lovely post from Tom Smith/@everythingabili (who has the best ever twitter bio strapline) on How I Learn ( And What I’m Learning ).

I like to think that I used to write blog posts that had the same sort of sense as that post…

…but for the last few months at least, I don’t think I have.

“Working” for once – starting production on an OU course (TM351, due out October 2015 – sic; I’m gonna be late on the first draft of the 7 weeks of the course I’m responsible for: it’s due in a little over a fortnight…), and also helping out on an investigative project the School of Data is partnering on – has meant that most of the learnings and failings that I used to blog about have been turned inward, into emails (which are even more poorly structured in terms of note-taking than this blog is), if at all.

Which is a shame and makes me not happy.

Reading through completed academic papers, and making useful (I hope) use of them in the course draft, has been quite fun in part – and made me regret at times not writing up work of my own in a self-contained, peer reviewed way over the last decade or so; getting stuff “into the record”, properly citable, and coherent enough to actually be cited. But as you pick away at the papers, you know they’re a revisionist telling, a straightforward narrative of how the pieces fit together in which nothing went wrong along the way. (You also know that the closer you get to trying to replicate a paper, the harder it is to find the missing pieces – process, trick, equation or insight – that actually make it work. Remember school maths, or physics, and the textbook that goes from one equation to the next with a “hence”, but there’s no way in hell you can figure out how to make that step, and you know you’ll be stuck when that bit comes up in the exam…?! That. Or worse. When you bang your head against a wall trying to get something to work, contort your mental models to make it work, sort of, then see the errata list correcting that thing. That, too.)

On the other hand, this blog is not coherent, shapes no whole, but is full of hence steps. Disjointed ones, admittedly. But they keep track of all the bits I tried, failed at and worked around, and they keep on getting me out of holes… in a way the emails won’t. Lost. Wasted effort, because the learning fumblings that are OUseful learning fumblings are lost and locked up in email hell.

It makes me very not happy.

So that, by way of intro, to this: a quick catchup follow-up to Cursory Thoughts on Virtual Machines in Distance Education Courses and Doodling With IPython Notebooks for Education, a partial remembering of the various shades of hell associated with them and trying to share them.

Here’s what I think I now want to do (whether or not it’s the right thing I’m not sure).

  • generate a script that will build a VM. We’ve opted for Virtualbox as the VM container. The VM will need to contain: pandas; IPython notebook (the course team want it to run Python 3.3. I’ve lost track of how many hours I’ve spent trying and failing to get Python libraries I think we need to run under Python 3.3; wasted effort; I should have settled on Python 2.7 and then revisited 3.3 several months hence; the 2.7 to 3.3 tweaks to any code we write for the course should be manageable in migration terms. Pratting around on libraries that I’m guessing will get patched as native distributions move to 3.3 by default, but which don’t work yet, is wasted effort. October. 2015. First presentation.); PostgreSQL (perhaps with some extensions); mongodb; ipythonblocks; blockdiag; I came across shellinabox today and wonder if we should run that; OpenRefine (CT against this – I think it’s good for developing mental models); python-nvd3; folium; a ggplot port to python (CT take – too much new stuff; my take – we should go as high up the stack as we can in terms of the grammar of the calling functions); I think we should run R and RStudio too to make for a really useful VM, making the point that the language doesn’t matter and we use whatever’s appropriate to get things done, but I don’t think anyone else does. if. Which computer language is that from, then? for. Or that one? How about in? print? Cars are for driving. Mine’s blue. I have no idea how it works. Can I drive your car? The red one. With the left-hand drive.
  • access the services running on the headless VM via a browser on the host (see the Vagrantfile sketch after this list for the sort of port forwarding I have in mind). I think we should talk to the databases using Python, but I may be in the minority. We may need something more graphical to talk to postgresql. If we do, I would argue it should be a browser based client – if it’s an app, we’re moving functionality provision outside of the VM.
  • use the script to build machines with the same package config; CT seem to prefer a single build on a 32 bit machine. I think we should support 64 bit as well. And deployment on at least one cloud service – I’d go for Amazon, but that’s mainly because it’s the only one I’ve tried. If we could support more, even better.
  • as far as maintenance goes, I wrote the vagrant script to update libraries whenever the provisioner is run (which is quite a lot at the mo, as I keep finding new things to add to the box!;-) This may or may not be sensible for student use. If there is a bug in a library, an update could help. If there is a security patch to the kernel, we should be updating as good citizens. The current plan is to ship a built box (which I think would have to go onto a USB stick – we can’t rely on folk having computers with a DVD drive any more, and a 1.5GB download seems to be proving unreliable without a proper download manager). As it is, students will have to download virtualbox and vagrant, and install those themselves (unless we can put installers for them on a USB stick too). If we do ship a built box, we need to think of some way of kickstarting the services, and perhaps rebooting the machine (and then kickstarting the services). There is a separate question of whether we should also be able to update config scripts during presentation. This would presumably have to be done on the host. One way might be to put config scripts on a git server and then use git to keep the config scripts on the students’ host machine up to date, but that would probably also require them to install a git commandline tool, even if we automated the rest. Did I say this all has to run cross platform? Students may be on Windows (likely?), Mac or Linux. I think the course should be doable, if painfully, via a tablet, which means the VM needs the cloud hosted option. If we could also find a way to help students configure their whatever-platform host so that they could access services from the VM running on it via their tablet, so much the better.
  • files need to be shared between VM and host. This raises an interesting issue for a cloud hosted VM: either we need to find a way to synch files between desktop and cloud VM, persist state on the cloud host so that the VM can synch to it, or pop dropbox into the cloud VM (though there would then be a synch delay, as there would with a desktop synch). I favour persisting on the cloud, though there is then the question of the student who is working on a home machine one day and a cloud machine the next.
  • Starting and stopping services: students need to be able to start and stop services running on the VM without having to ssh in. One click easy. A dashboard with buttons that show if a service is running or not; click a button to toggle the run state of the service. No idea how to do this.
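By way of illustration, here’s roughly the shape of Vagrantfile I have in mind – box names, port numbers and paths are placeholders rather than actual course settings, and the multi-machine blocks are only there to show how different builds could be called up by name from a single file:

    # Illustrative sketch only - box names, ports and paths are guesses, not the TM351 config
    Vagrant.configure("2") do |config|

      # Different builds callable by name, e.g. `vagrant up precise64`
      { "precise64" => "precise64-base", "raring64" => "raring64-base" }.each do |name, box|
        config.vm.define name do |machine|
          machine.vm.box = box

          # Forward guest services so they can be reached from a browser on the host
          machine.vm.network "forwarded_port", guest: 8888,  host: 8888   # IPython notebook
          machine.vm.network "forwarded_port", guest: 5432,  host: 5432   # PostgreSQL
          machine.vm.network "forwarded_port", guest: 27017, host: 27017  # mongodb

          # Share notebooks and data files between host and guest
          machine.vm.synced_folder "./notebooks", "/vagrant/notebooks", create: true

          # Build the box contents with puppet
          machine.vm.provision "puppet" do |puppet|
            puppet.manifests_path = "puppet/manifests"
            puppet.manifest_file  = "site.pp"
          end
        end
      end
    end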

Here’s the approach I’ve taken:

  • reusing DataminerUK’s infinite-interns model as a starting point, I’m using vagrant to build and provision a VM using puppet. At the moment I have duplicate setups for two different Linux base boxes (precise64 and raring64; the plan is to move to the latest Ubuntu LTS). I really should have a single setup, with the different machine setups called by name from a single Vagrantfile. I think.
  • The puppet provisioner builds the box from a minimal base and starts the services running. It’s aggressive on updates. The precise64 box is running python 2.7 and the raring64 box 3.3. Getting pip3 running in the raring box was a pain, and I don’t know how to tell puppet to use the pip3 thing I eventually installed to update. At the moment I fudge with:
    exec { "pip3-update":
    command => "/usr/local/bin/pip3 install --upgrade pip"
    }

    but it’s fragile: I’m not convinced that is always the right path (I’d like to hedge on /usr/bin:/usr/local/bin), or that pip3 is actually installed by the time I try to exec it… I think what I really want to do is something like
    package {
    [
    JUST UPGRADE YOURSELF, PLEASE
    ]: ensure => latest,
    provider => 'pip3';
    }

    with an additional dependency check (require =>) that pip3 has been installed first and, for all the other pip3 installs, that pip3 has been upgraded first. (A hedged sketch of what I mean follows this list.)
  • The IPython notebook is started by a config shell script called from puppet. I think I’m also using a config script to set up a user and test tables in Postgres (though I plan to move to the puppet extension as soon as I can get round to reading the docs/finding out how to do it).
  • There are times when a server needs restarting. At the moment I have to run vagrant provision to do that – or even vagrant halt; vagrant up – which means it also runs the updates. It’d be nice to just be able to run the bits that restart the services (the DBMSs, IPython notebook etc) without doing any of the package updates, installs, checks etc.
  • We need a tool to check whether services are running on the necessary ports, to help debugging without requiring a user to SSH into the VM; I’ve also fixed on default ports. We really need to be able to switch to a free port if a default port is already in use, and then somehow tell the IPython notebook scripts which port each service is running on. With vagrant letting you run a VM from within a particular directory, it would also be useful to be able to see which VMs are running and from where, wherever vagrant on the host started them.
  • I don’t use a configurator for the postgres db (it needs seeding with some example tables) but should do – on my to do list is to look at https://github.com/puppetlabs/puppetlabs-postgresql . Similarly for mongo db – and perhaps https://github.com/puppetlabs/puppetlabs-mongodb
  • To make use of python-nvd3, the suggested route is to use bower. I got the npm package manager to work but have failed to find a way of installing any actual packages [issue].
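For what it’s worth, here’s a hedged sketch of the pip3 arrangement described above – the package names are just examples, and it assumes a pip3 package provider (or a pip provider pointed at pip3) is actually available in the puppet version being used:

    # Untested sketch - package names are examples; the pip3 provider is assumed, not guaranteed
    package { 'python3-pip':
      ensure => present,
    }

    # Upgrade pip3 itself first, hedging on where it might live
    exec { 'pip3-update':
      command => 'pip3 install --upgrade pip',
      path    => ['/usr/bin', '/usr/local/bin'],
      require => Package['python3-pip'],
    }

    # Then install/upgrade the course libraries via pip3, only after pip3 has been upgraded
    package { ['pandas', 'ipython', 'ipythonblocks', 'blockdiag']:
      ensure   => latest,
      provider => 'pip3',
      require  => Exec['pip3-update'],
    }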

Tracking down issues to date – aside from things like port clashes and all manner of f**k ups because I distributed a README with an error in it and folk tried to follow it rather than the patches posted elsewhere – has been impeded by not having a good way of logging and sharing errors. OU network issues have also contributed to the fun. I always avoid the OU staff network, but nothing seems to work on that. I suspect this is a proxy issue, but I’m unwilling to invest any time exploring it or how to config the VM to cope (no-one else has offered to go down this rat hole). Poxy proxies could well be an issue for some students, but I’m not sure what the best way forward is. Cloud hosted VMs?!

I also had a problem on OU eduroam – the mongodb install wants to fetch a signing key from a keyserver before it will install mongodb, but OU eduroam (the network I always use) seems to block the keyserver. Tracking that down wasted a good hour.

Here are some other things I’ve heard about:

- https://github.com/psychemedia/notebookcloud This is cloned from https://github.com/carlsmith who appears to have taken his repo – and the app – down? It provided a dashboard for firing up notebook servers on Amazon cloud. If I hadn’t been ‘working’ I’d have blogged screenshots and the workflow. As it is, all I have are vague memories of how it worked and what it did and the ideas that sprung off of having an artefact to talk around. [Hmm... app seems to have come back up - maybe I caught it at a bad time... https://notebookcloud.appspot.com/login ]

- provisioning things: chef, vagrant, puppet, docker.

What should I be using for what?

I thought about different VMs for different services, but that adds too much VM weight and requires networking between the VMs, which could lead to “support issues”. Would docker help here? A base VM built from vagrant and puppet, then docker to put additional machines on top.

What I want is for students to be able to:

- install the minimum number of third party apps on their desktop (currently virtualbox and vagrant)
- click one button to get their VM. My feeling is we should have a largely prebuilt box on a USB stick they can use as a base box, then do a top-up build and provision. I suspect CT would like one click somewhere to fire up a machine, get services running, and open a tab to the IPython notebook in their browser; maybe a status button somewhere, a single button to restart any services that have stopped, and options to suspend or shut down machines. In terms of student workflow, I think suspending and resuming machines (if services can resume appropriately) would be the neatest workflow. Note: the course runs over 9 months…
- be able to access files on the host that are used in the VM; and, if they are using multiple VMs (eg on cloud and desktop), to have a sensible way of synching notebooks and data/database state across those machines – which could be tricky, at least as far as database state goes.
- if a student is not using postgresql or mongo – and they won’t for the first 8 weeks of the course – it could be useful to run the VM without those services running (perhaps aside from a quick self-test in week 1, so we can check out any issues as early as possible and give ourselves a week or two to come up with any patches before those apps are used in anger). So maybe a control panel to fire up the VM and the services you want to run. Yes mongo, no postgresql. No DB at all. And so on. Would docker help here? Decide on the host somehow which services to run, fire up the VM, then load in and start up the required services. Next session, change which services are running in the VM? (A purely illustrative sketch of what that might look like follows this list.)
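I don’t know enough about docker yet to say whether that’s sensible, but a purely illustrative sketch of the sort of thing I mean might look like this on the host – the container names are made up, and it assumes the services have been wrapped up as docker containers inside the VM:

    # Illustrative only: toggle per-topic services inside the VM from the host
    # (container names such as tm351-postgres are invented for the example)
    vagrant ssh -c "sudo docker start tm351-postgres"   # relational databases block starts
    vagrant ssh -c "sudo docker stop tm351-mongodb"     # not needed yet this week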

All in all, I reckon I’m between 20 and 40% there (further along in terms of time?) but not sure how much further I can push it to the desired level of robustness without learning how to do this stuff a bit more properly… I’m also not really sure what’s practically and reliably possible and what’s best for what. We have to maximise the robustness of stuff ‘just working’ and minimise support issues because students are at a distance and we don’t know what platform they’re on. I think if I started from scratch and rewrote the scripts now they’d be a lot clearer, but I know that’d take half a day and the functional return – for now – I think would be minimal.

That said, I’ve done a fair amount of learning, though large chunks of it have been lost to email and not blogging. Which is a shame. Must do better. Must take public notes.

Written by Tony Hirst

May 15, 2014 at 11:37 pm

Posted in Anything you want


Innovation’s End

with one comment

In that oft referred to work on innovation, The Innovator’s Dilemma, Clayton Christensen suggested that old guard companies struggle to innovate internally because of the value networks they have built up around their current business models. Upstart companies compete around the edges, providing cheaper but lower quality alternative offerings that allow the old guard to retreat to the higher value, higher quality products. As the upstarts improve their offerings, they grow market share and start to compete for the higher value customers. The upstarts also develop their own value networks which may be better adapted to an emerging new economy than the old guard’s network.

I don’t know if this model is still in favour, or whether it has been debunked by a more recent business author with an alternative story to sell, but in its original form it was a compelling tale, easily co-opted and reused, as I have done here. I imagine over the years, the model has evolved and become more refined, perhaps offering ever more upmarket consultancy opportunities to Christensen and his acolytes.

The theory was also one of the things I grasped at this evening to try to help get my head round why the great opportunities for creative play around the technologies being developed by companies such as Google, Amazon and Yahoo five or so years ago don’t seem to be there any more. (See for example this post mourning the loss of a playful web.)

The following screenshots – taken from Data Scraping Wikipedia with Google Spreadsheets – show how the original version of Google spreadsheets used to allow you to generate different file output formats, with their own URL, from a particular sheet in a Google spreadsheet:

publishAsWebpage

publishFormats

publishOPtions

morePublishOptions

In the new Google spreadsheets, this is what you’re now offered from the Publish to Web options:

googleshshare

[A glimmer of hope - there's still a route to CSV URLs in the new Google spreadsheets. But the big question is - will the Google Query language still work with the new Google spreadsheets?]

embed changes everything

(For some reason, WordPress won’t let me put angle brackets round that phrase. #ffs)

That’s what I said in this video put together for a presentation to a bunch of publishers visiting the OU Library at an event I couldn’t be at in person (back when I used to be invited to give presentations at events…)

I saw embed as a way that the publishers could retain control over content whilst still allowing people to access the content, and make it accessible, in ways that the publishers wouldn’t have thought of.

Where content could be syndicated but remain under control of the publisher, the idea was that new value networks could spring up around legacy content, and the publishers could then find a way to take a cut. (Publishers don’t see it that way of course – they want it all. However big the pie, they want all of it. If someone else finds a way to make the pie bigger, that’s not interesting. My pie. All mine. My value network, not yours, even if yours feeds mine. Because it’s mine. All mine.)

I used to build things around Amazon’s API, and Yahoo’s APIs, and Google APIs, and Twitter’s API. As those companies innovated, they built bare bones services that they let others play with. Against the established value network order of SOAP and enterprise service models, the RESTful upstarts played with their toys. And the upstarts let us play with their toys. And we did, because they were easy to play with.

But they’re not anymore. The upstarts started to build up their services, improve them, entrench them. And now they’re not something you can play with. The toys became enterprise warez and now you need professional tools to play with them. I used to hack around URLs and play with the result using a few lines of Javascript. Now I need credentials and heavyweight libraries, programming frameworks and tooling.

Christensen saw how the old guard, with their entrenched value networks couldn’t compete. The upstarts had value networks with playful edges and low hanging technological fruit we could pick up and play with. The old guard entrenched upwards, the upstarts upped their technology too, their value networks started to get real monetary value baked in, grown up services, ffs stop playing with our edges and bending our branches looking for low hanging fruit, because there isn’t any more. Go away and play somewhere else.

Go away and play somewhere else.

Go somewhere else.

Lock (y)our content in, Google, lock it in. Go play with yourself. Your social network sucked and your search is getting ropey. You want to lock up content, well so does every other content generating site, which means you’re all gonna be faced with the problem of ranking content that all intranets face. And their searches universally suck.

The innovator’s dilemma presented incumbents with the problem of how to generate new products and business models that might threaten their current ones. The upstarts started scruffy and let people play alongside, let people innovate along with them. The upstarts moved upwards and locked out the innovation networks around them. Innovations end. Innovation’s end. Innovation send. Away.

< embed > changes everything. Only this time it’s gone the wrong way. I saw embed as a way for us to get their closed content. Now Google’s gone the other way – open data has become an embedded package.

“God help us.” Withnail.

PS Google – why did my, sorry, your Chrome browser ask for my contacts today? Why? #ffs, why?

Written by Tony Hirst

May 2, 2014 at 11:47 pm

Posted in Anything you want


Personal Recollections of the “Data Journalism” Phrase


@digiphile’s been doing some digging around current popular usage of the phrase data journalism – here are my recollections…

My personal recollection of the current vogue is that “data driven journalism” was the phrase that dominated the discussions/community I was witness to around early 2009, though for some reason my blog doesn’t give any evidence for that (must take better contemporaneous notes of first noticings of evocative phrases;-). My route in was via “mashups”, mashup barcamps, and the like, where folk were experimenting with building services on newly emerging (and reverse engineered) APIs; things like crime mapping and CraigsList maps were in the air – putting stuff on maps was very popular I seem to recall! Yahoo were one of the big API providers at the time.

I noted the launch of the Guardian datablog and datastore in my personal blog/notebook here – http://blog.ouseful.info/2009/03/10/using-many-eyes-wikified-to-visualise-guardian-data-store-data-on-google-docs/ – though for some reason don’t appear to have linked to a launch post. With the arrival of the datastore it looked like there were to be “trusted” sources of data we could start to play with in a convenient way, accessed through Google docs APIs:-) Some notes on the trust thing here: http://blog.ouseful.info/2009/06/08/the-guardian-openplatform-datastore-just-a-toy-or-a-trusted-resource/

NESTA did an event on News Innovation London in July 2009, a review of which by @kevglobal mentions “discussions about data-driven journalism” (sic on the hyphen). I seem to recall that Journalism.co.uk (@JTownend) were also posting quite a few noticings around the use of data in the news at the time.

At some point, I did a lunchtime at the Guardian for their developers – there was a lot about Yahoo Pipes, I seem to remember! (I also remember pitching the Guardian Platform API to developers in the OU as a way of possibly getting fresh news content into courses. No-one got it…) I recall haranguing Simon Rogers on a regular basis about their lack of data normalisation (which I think in part led to the production of the Rosetta Stone spreadsheet) and their lack of use (at the time) of fusion tables. Twitter archives may turn something up there. Maybe Simon could go digging in the Twitter archives…?;-)

There was a session on related matters at the first(?) news:rewired event in early 2010 but I don’t recall the exact title of the session (I was in a session with Francis Irving/@frabcus from the then nascent Scraperwiki) http://blog.ouseful.info/2010/01/14/my-presentation-for-newsrewired-doing-the-data-mash/ Looking at the content of that presentation, it’s heavily dominated by notions of data flow; the data driven journalism (hence #ddj) phrase, seemed to fit this well.

Later that year, in the summer, there was a roundtable event hosted by the EJC on “data driven journalism” – I recall meeting Mirko Lorenz there (who maybe had a background in business data? and has since helped launch datawrapper.de) and Jonathan Gray – who then went on to help edit the Data Journalism Handbook – among others.

http://blog.ouseful.info/2010/08/25/my-slides-from-the-data-driven-journalism-round-table-ddj/

For me the focus at the time was very much on using technology to help flow data into usable content (eg in a similar but perhaps slightly weaker sense than the more polished content generation services that Narrative Science/Automated Insights have since come to work on, or other data driven visualisations, or what I guess we might term local information services; more about data driven applications with a weak local news/specific theme or issue relevance than a general news relevance, perhaps). I don’t remember where the sense of the journalist was in all this – maybe as someone who would be able to take the flowed data, or use tools that were being developed to get the stories out of data with tech support?

My “data driven journalism” phrase notebook timeline

http://blog.ouseful.info/?s=%22data%20driven%20journalism%22&order=asc

My “data journalist” phrase notebook timeline

http://blog.ouseful.info/?s=%22data%20journalist%22&order=asc

My first blogged use of the data journalism phrase – in quotes, as it happens, so it must have been a relatively new sounding phrase to me – was here: http://blog.ouseful.info/2009/05/20/making-it-a-little-easier-to-use-google-spreadsheets-as-a-database-hopefully/ (h/t @paulbradshaw)

Seems like my first use of the “data journalist” phrase was in picking up on a job ad – so certainly the phrase was common to me by then.

http://blog.ouseful.info/2010/12/04/what-is-a-data-journalist/

As a practice and a commonplace, things still seemed to be developing in 2011 enough for me to comment on a situation where the Guardian and Telegraph teams were co-opetitively bootstrapping each other’s ideas: http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/

I guess the deeper history of CAR, database journalism, precision journalism may throw off trace references, though maybe not representing situations that led to the phrase gaining traction in “popular” usage?

Certainly, now I’m wondering what the relative rise in popularity of “data journalist” versus “data journalism” was? Certainly, for me, “data driven journalism” was a phrase I was familiar with way before the other two, though I do recall a sense of unease about its applicability to news stories that were perhaps “driven” by data more in the sense of being motivated or inspired by it, or whose origins lay in a data set, rather than “driven” in a live, active sense of someone using an interface that was powered by flowing data.

Written by Tony Hirst

April 29, 2014 at 9:36 am

Posted in Anything you want


Writing Diagrams – Boxes and Arrows

with 2 comments

If you’ve ever had to draw “blocks and arrows” diagrams, you’ll know how irritating it can be if you spend hours laying out the diagram using a presentation editor or drawing tool, only to find you need to edit the drawing, add another box, and lay the whole thing out again.

Surely there must be a better way?

Let’s just think about what a box and arrow diagram is intended to show: when describing a process, the connections typically represent a flow from one thing to another; furthermore, the layout is often rectilinear, laid out along straight lines, the boxes tidily spaced and their edges lined up with each other. In a diagram such as a mindmap, different ideas or concepts are related to each other by drawing a line between them and the layout may be more fluid, with like or related concepts grouped together in space, or by the additional use of colour themes, for example.

The primary information contained in the diagram comprises the text elements and the connections between them. The positioning on the page often reflects the structure of these connections. When we lay out a diagram, we unconsciously favour layouts that minimise the number of crossed lines (to keep the diagram “clean” looking), and group connected items close together (unless some other information requires us to separate them – for example, we might be using a timeline basis for a horizontal x-axis and placing boxes in areas of the canvas we are working on that are associated with a particular month).

The online Google Drawing document type is typical of drawing tools included in many office applications. As well as being able to draw boxes and connect registration points on each box by lines or arrows, a range of layout tools provides support for aligning and spacing boxes.

google draw

Tools such as popplet provide a friendlier environment for generating similar sorts of diagram:

popplet

Whilst drawing tools such as these allow you to craft your diagram by hand, building it up as you go along, actually putting previously collected information into blocks on the canvas, let alone connecting the blocks together and laying them out nicely, may be quite an involved and error prone affair.

In these circumstances, it may make more sense to take a raw representation of the block contents and a simple representation of connections between appropriate blocks and just write the relationships down, letting a drawing tool do the hard work of drawing the blocks, connecting them together and laying them out, at least in draft layout form. To provide for a final layer of customisation, it might also be useful to be able to take a vector/SVG representation of the automatically sketched layout into a drawing package where it can be tidied up by hand and the application of a human designer’s eye.

There are several online tools available that you can use to sketch box and arrow diagrams from simple text descriptions.

Text2mindmap

Text2Mindmap allows you to construct tree based mindmaps from a simple outline style description of the mindmap.

text2mindmap

The layout has a radial basis. Designs can be saved and images downloaded as JPG or PDF files.

Diagrammr

Diagrammr allows you to draw simple graph based network structures in which text labelled block elements can be connected to other blocks by labelled edges.

diagrammr

Designs are given a persistent URL, but anyone with access to the URL can edit the diagram.

JS Sequence Diagrams

JS Sequence Diagrams is a javascript library for generating sequence diagrams from a textual description of them.

jssequencediagrams

Diagrams can be saved as SVG files. The source code is available and depends on Jison and Raphaël as the graphics library.

blockdiag

blockdiag (application) uses a language similar to the DOT language for describing a range of block diagram types (block diagrams, sequence diagrams, activity diagrams, logical network diagrams).

blockdiag2

Diagrams can be saved as SVG diagrams and associated with a URL that contains all the information used to recreate the diagram. As such, large diagrams are not supported if they make the URL too long. Source code is available.

Several diagram types are available using blockdiag, including graphviz diagrams constructed using the DOT language.
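For example, a minimal blockdiag description of a simple process – the labels are made up purely for the sake of the example – is little more than a list of the connections:

    blockdiag {
      acquire -> clean -> analyse -> visualise;
      clean -> publish;
    }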

GraphvizFiddle

GraphvizFiddle is a fiddler style application that lets you enter Graphviz DOT language (http://www.graphviz.org/content/dot-language) descriptions and preview the result.

graphvizfiddle

Files can be generated in SVG format, or as textual definitions (for example, in the dot layout language).

Summary

Generating boxes and arrows style diagrams can be a pain at times because the semantics of the diagram – how one item is related to another – are represented in a graphical rather than data based form. By writing down the relations and then automatically generating visual representations of them, we retain access to the data representation whilst leaving the hard work of generating the initial draft of the layout, at least, to a machine.

Several tools are available to support the creation of such literally described box and arrow diagrams using a variety of description languages and generating a range of output image formats (SVG probably being the most useful if you need to edit the sketch diagram for yourself to tweak the layout for its final presentation). Code for some of the tools (JS Sequence diagrams, blockdiag) is available.

Arguably the most powerful tools allow you to “write” diagrams using the Graphviz DOT layout language. Whilst there is a certain overhead associated with learning this language, it does save time in the long run if you regularly need to create network style diagrams. Graphviz also supports a range of layout algorithms – see the Graphviz gallery for examples.
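To give a flavour, here’s a minimal DOT description – node labels invented for the example – that graphviz, or a tool such as GraphvizFiddle, will lay out as a left-to-right boxes and arrows diagram:

    digraph process {
      rankdir=LR;
      node [shape=box];
      "Collect data" -> "Clean data" -> "Analyse" -> "Visualise";
      "Clean data" -> "Publish dataset";
    }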

PS If you want to write your own diagramming application, the JointJS library looks like a handy library to have on hand… The Venn.js library also looks quite pretty – if you have to generate Venn diagrams, that is!

Written by Tony Hirst

April 28, 2014 at 1:57 pm

Posted in Infoskills

Open Data (or Not) About Designated Public Place Orders on the Isle of Wight

leave a comment »

A couple of notices taken out in this week’s Isle of Wight County Press by the Isle of Wight Council give notice that a couple more areas are to become “designated public spaces”, which means that the police and other recognised officers are allowed to confiscate alcohol (no drinking on the beach…).

Untitled

Despite my best efforts, I failed to find a listing on the Isle of Wight Council website detailing all designated public spaces on the island, or showing maps of their extent.

As with other council orders that have a geographical component, I am ever hopeful that appropriate maps will be made available, in the case of designated public places as a shapefile, not least because I generally have no idea what boundaries the notices refer to when they just describe road names and other geographical features (“along the beach to the point in line with the junction of Whitecross Lane and Whitecross Farm Lane”, for example…). The notice does say “as shown on the map to be attached to the Order”, but I can’t readily find the order on the council website to see whether the map has yet been attached (maybe it’s buried in papers relating to licensing committee meetings?) I did manage to find a map (search: designated public place lake/) but only because I read URLs…

spot the map

Re: finding maps relating to DPPOs, I’ve had this problem before. The same criticisms still apply – the Home Office don’t appear to maintain a single register of DPPOs in force, despite the original guidance note saying they would, although they will release the data at a crude level in response to an FOI request. Again, I wonder about setting up an FOI repeater to schedule monthly submissions of the same request to handle this sort of query to the Home Office.

Drawing on the fan in/fan out notion, a Home Office aggregation of the data represents a large fan-in of data from separate councils, and seems to be the easiest place to get hold of such data. However, it would help if they also requested and made available information about the geographical extent of each order as a shapefile.

Here’s the map, though I’m not clear whether I’m breaching copyright in displaying it without permission:

dppo map - licensed

Returning to the notice, I can’t find it displayed on the Isle of Wight County Press website, or on OnTheWight, our hyperlocal blog, despite online fora being legitimate spaces for publishing notices, so I guess buying the County Press is another of those effective taxes I need to pay to keep up with council notices (spending – (click search to run…)).

(As well as posting notices in the IWCP, why doesn’t the council have a /notices area of the website to which it could post its statutory announcements in addition to any other media channels it uses that would act as a convenient, authoritative and archival repository of such notices? [NB quite a few councils do have an area for public notices on their website. For example, do a web search for intitle:"public notices" site:gov.uk] In addition, an area of the site detailing orders in place and in force on the Island so we could keep up with how freedoms are being restricted at any time by the council.)

In passing, whilst trying to find details about DPPOs on the island, I came across a copy of the consultation around the DPPOs on the Gurnard Parish Council website:

police request dppo

“The Isle of Wight County Council has received a request from the Hampshire Constabulary to consider the imposition of Designated Public Place Orders (DPPOs) in …”

Hmm… so does the Hampshire Constabulary website list the extent of DPPOs in the region it covers? (I note on WhatDoTheKnow several people have FOId information about DPPOs from police forces.)

I’m pretty sure there’s a DPPO covering Ryde (here’s an example of additional evidence from Hampshire Constabulary offered in support of a DPPO in Ryde in 2007 that was used to support an application for an order at that time) but I can’t find any statement about where, or if, such an order applies on the Isle of Wight Council website, or on the appropriate local neighbourhood area of the Hampshire Constabulary site.

hampshire dppo

So the only way for me to find out is to go round Ryde looking for signs, and maybe (outside chance) a notice that has a map of the area? Hmm…

PS Just by the by, shapefiles would also help when it comes to working out which bits of the beach I can and can’t walk the dog on…

shapefiles would help...

Written by Tony Hirst

April 18, 2014 at 9:55 am

Posted in Open Data, opengov


Open Data, Transparency, Fan-In and Fan-Out

with one comment

In digital electronics, the notions of fan in and fan out describe, respectively, the number of inputs a gate (or, on a chip, a pin) can handle, or the number of output connections it can drive. I’ve been thinking about this notion quite a bit, recently, in the context of concentrating information, or data, about a particular service.

For example, suppose I want to look at the payments made by a local council, as declared under transparency regulations. I can get the data for a particular council from a particular source. If we consider each organisation that the council makes a payment to as a separate output (that is, as a connection that goes between that council and the particular organisation), the fan out of the payment data gives the number of distinct organisations that the council has made a declared payment to.

One thing councils do is make payments to other public bodies who have provided them with some service or other. This may include other councils (for example, for the delivery of services relating to out of area social care).

Why might this be useful? If we aggregate the payments data from different councils, we can set up a database that allows us to look at all payments from different councils to a particular organisation, (which may also be a particular council, which is obliged to publish its transaction data, as well as a private company, which currently isn’t). (See Using Aggregated Local Council Spending Data for Reverse Spending (Payments to) Lookups for an example of this. I think startup Spend Network are aggregating this data, but they don’t seem to be offering any useful open or free services, or data collections, off the back of it. OpenSpending has some data, but it’s scattergun in what’s there and what isn’t, depending as it does on volunteer data collectors and curators.)

The payments incoming to a public body from other public bodies are therefore available as open data, but not in a generally, or conveniently, concentrated way. The fan in of public payments is given by the number of public bodies that have made a payment to a particular body (which may itself be a public body or may be a private company). If the fan in is large, it can be a major chore searching through the payments data of all the other public bodies trying to track down payments to the body of interest.
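As a sketch of the idea – the column names and filename are invented for the purposes of the example – given an aggregated table of council payments we might count fan out and fan in with something like:

    # Sketch only - assumes a CSV of aggregated payments with columns: payer, payee, amount
    import pandas as pd

    payments = pd.read_csv('all_council_payments.csv')

    # Fan out: how many distinct organisations each public body has paid
    fan_out = payments.groupby('payer')['payee'].nunique()

    # Fan in: how many distinct public bodies have paid a particular organisation
    fan_in = payments.groupby('payee')['payer'].nunique()

    # Total public money flowing in to each recipient
    receipts = payments.groupby('payee')['amount'].sum()

    print(fan_in.sort_values(ascending=False).head())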

Whilst I can easily discover fan out payments from a public body, I can’t easily discover the originators of fan in public payments to a body, public or otherwise. Except that I could possibly FOI a public body for this information (“please send me a list of payments you have received from these bodies…”).

As more and more public services get outsourced to private contractors, I wonder if those private contractors will start to buy services off the public providers? I may be able to FOI the public providers for their receipts data (any examples of this, successful or otherwise?), but I wouldn’t be able to find any publicly disclosed payments data from the private provider to the public provider.

The transparency matrix thus looks something like this:

  • payment from public body to public body: payment disclosed as public data, receipts available from analysis of all public body payment data (and receipts FOIable from receiver?)
  • payment from public body to private body: payment disclosed as public data; total public payments to private body can be ascertained by inspecting payments data of all public bodies. Effective fan-in can be increased by setting up different companies to receive payments and make it harder to aggregate total public monies incoming to a corporate group. (It would be useful if private companies had to disclose: a) the total amount of public monies received from any public source, exceeding some threshold; b) individual payments above a certain value from a public body.)
  • payment from private body to public body: receipt FOIable from public body? No disclosure requirement on private body? Private body can effectively reduce fan out (that is, easily identified concentration of outgoing payments) by setting up different companies through which payments are made.
  • payment from private body to private body: no disclosure requirements.

I have of course already wondered Do We Need Open Receipts Data as Well as Open Spending Data?. My current take on this would perhaps argue in favour of requiring all bodies, public or private, that receive more than £25,000, for example, in total per financial year from a particular corporate group* to declare all the transactions (over £500, say) received from that group. A step on the road towards that would be to require bodies that receive more than a certain amount of receipts, summed across all public bodies, to be subject to FOI at least in respect of payments data received from public bodies.

* We would need to define a corporate group somehow, to get round companies setting up EvilCo Public Money Receiving Company No. 1, EvilCo Public Money Receiving Company No. 2354 Ltd, etc, each of which only ever invoices up to £24,999. There would also have to be a way of identifying payments from the same public body but made through different accounts (for example, different local council directorates).

Whilst this would place a burden on all bodies, it would also start to level out the asymmetry between public body reporting and private body reporting in the matter of publicly funded transactions. At the moment, private company overheads for delivering subcontracted public services are less than public body overheads for delivering the same services in the matter of, for example, transparency disclosures, placing the public body at a disadvantage compared to the private body when it comes to transparency disclosures. (Note that things may be changing, at least in FOI stakes… See for example the latter part of Some Notes on Extending FOI.)

One might almost think the government was promoting transparency of public services gleeful in the expectation that, as their privatisation agenda moves on, a decreasing proportion of service providers will actually have to make public disclosures. Again, this asymmetry would make for unfair comparisons between service providers based on publicly available data if only data from public body providers of public services, rather than private providers of tendered public services, had to be disclosed.

So the take home, which has got diluted somewhat, is the proposal that the joint notions of fan in and fan out, when it comes to payment/receipts data, may be useful when it comes to helping us think about how easy it is to concentrate data/information about payments to, or from, a particular body, and how policy can be defined to shine light where it needs shining.

Comments?

Written by Tony Hirst

April 17, 2014 at 7:58 pm

Posted in Open Data, Thinkses

Losing Experimental Edtech Value from IPython Notebooks Because of New Security Policies?

Just like the way VLEs locked down what those who wanted to try stuff out could do with educational websites, usually on the grounds of “security”, so a chunk of lightweight functionality with possible educational value that I was about to start exploring inside IPython notebooks has been locked out by the new IPython notebook security policy:

Affected use cases
Some use cases that work in IPython 1.0 will become less convenient in 2.0 as a result of the security changes. We do our best to minimize these annoyance, but security is always at odds with convenience.

Javascript and CSS in Markdown cells
While never officially supported, it had become common practice to put hidden Javascript or CSS styling in Markdown cells, so that they would not be visible on the page. Since Markdown cells are now sanitized (by Google Caja), all Javascript (including click event handlers, etc.) and CSS will be stripped.

Here’s what I’ve been exploring – using a simple button:

ipynb button

to reveal an answer:

ipynb button reveal

It’s a 101 interaction style in “e-learning” (do we still call it that?!) and one that I was hoping to explore more given the interactive richness of the IPython notebook environment.

Here’s how I implemented it – a tiny bit of Javascript hidden in one of the markdown cells:

<script type="text/javascript">
   function showHide(id) {
       var e = document.getElementById(id);
       if(e.style.display == 'block')
          e.style.display = 'none';
       else
          e.style.display = 'block';
   }
</script>

and then a quick call from a button onclick event handler to reveal the answer block:

<input type="button" value="Answer" onclick="showHide('ans2')">

<div id="ans2" style="display:none">I can see several ways of generating common identifiers:

<ul><li>using the **gss** code from the area data, I could generate identifiers of the form `http://statistics.data.gov.uk/id/statistical-geography/GSS`</li>
<li>from the housing start data, I could split the *Reference Area* on space characters and then extract the GSS code from the first item in the split list</li>
<li>The *districtname* in the area data looks like it may have "issues" with spacing in area names. If we remove spaces and turn everything to lower case in the area data *districtname* and the *Reference Area* in the housing data, we *may* be able to create matching keys. But it could be a risky strategy...</li>
</ul></div>

This won’t work anymore – and I don’t have the time to learn whether custom CSS can do this, and if so, how.

I don’t really want to have to go back to the approach I tried before I demoed the button triggered reveal example to myself…

ipynb another interaction

That is, putting answers into a python library and then using code to pull the text answer in…
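That earlier approach was something along these lines – the module, function and key names are all made up for the purposes of illustration:

    # answers.py - a hypothetical course module holding the 'hidden' answer text
    from IPython.display import HTML, display

    _ANSWERS = {
        'ans2': '<p>I can see several ways of generating common identifiers...</p>',
    }

    def show_answer(key):
        """Render the answer text for a given question key in the notebook."""
        display(HTML(_ANSWERS.get(key, '<em>No answer recorded for this key.</em>')))

The student then runs something like from answers import show_answer; show_answer('ans2') in a code cell to reveal the text.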

ipynb color styling

Note also the use of colour in the cells – this is something else I wanted to try to explore, the use of styling to prompt interactions; in the case of IPython notebooks, I quite like the idea of students taking ownership of the notebooks and adding content to it, whether by adding commentary text to cells we have written in, adding their own comment cells (perhaps using a different style – so a different cell type?), amending code stubs we have written, adding in their own code, perhaps as code complements to comment prompts we have provided, etc etc.

ipynb starting to think about different interactions...

The quick hack, try-and-see option that immediately came to mind to support these sorts of interaction seems to have been locked out (or maybe not – rather than spending half an hour on a quick hack I’ll have to spend half an hour reading docs…). This is exactly the sort of thing that cuts down on our ability to mix ideas and solutions picked up from wherever, and just try them out quickly; and whilst I can see the rationale, it’s just another of those things to add to the when the web was more open pile. (I was going to spend half an hour blogging a post to let other members of the course team I’m on know how to add revealed answers to their notebooks, but as I’ve just spent 18 hours trying to build a VM box that supports python3 and the latest IPython notebook, I’m a bit fed up at the thought of having to stick with the earlier version py’n’notebook VM I built because it’s easier for us to experiment with…)

I have to admit that some of the new notebook features look like they could be interesting from a teaching point of view in certain subject areas – the ability to publish interactive widgets where the controls talk to parameters accessed via the notebook code cells, but that wasn’t on my to do list for the next week…

What I was planning to do was explore what we’d need to do to get elements of the notebook behaving like elements in OU course materials, under the assumption that our online materials have designs that go hand in hand with good pedagogy. (This is a post in part about OU stuff, so necessarily it contains the p-word.)

ou teaching styling

Something else on the to do list was to explore how to tweak the branding of the notebook, for example to add in an OU logo or (for my other day per week) a School of Data logo. (I need to check the code openness status of IPython notebooks… How bad form would it be to remove the IPy logo, for example? And where should a corporate logo go? In the toolbar, or at the top of the content part of the notebook? If you just contribute content, I guess the latter; if you add notebook functionality, maybe the topbar is okay?)

There are a few examples of styling notebooks out there, but I wonder – will those recipes still work?

Ho hum – this post probably comes across as negative about IPython notebooks, but it shouldn’t, because they’re a wonderful environment (for example, Doodling With IPython Notebooks for Education and Time to Drop Calculators in Favour of Notebook Programming?). I’m just a bit fed up that after a couple of days’ graft I don’t get to have half an hour’s fun messing around with look and feel. Instead, I need to hit the docs to find out what’s possible and what isn’t, because the notebooks are no longer the open environment they once were… Bah..:-(

Written by Tony Hirst

April 11, 2014 at 6:10 pm

Posted in Open Education, OU2.0, Tinkering

