I couldn’t get to sleep last night mulling over thoughts that had surfaced after posting Time to Drop Calculators in Favour of Notebook Programming?. This sort of thing: what goes on when you get someone to add three and four?
Part of the problem is associated with converting the written problem into mathematical notation:
3 + 4
For more complex problems it may require invoking some equations, or mathematical tricks or operations (chain rule, dot product, and so on).
3 + 4
Cast the problem into numbers then try to solve it:
3 + 4 =
That equals gets me doing some mental arithmetic. In a calculator, there’s a sequence of button presses, then the equals gives the answer.
In a notebook, I type:
3 + 4
that is, I write the program in mathematicalese, hit the right sort of return, and get the answer:
The mechanics of finding the right hand side by executing the operations on the left hand side are handled for me.
Try this on WolframAlpha: integral of x squared times x minus three from -3 to 4
Or don’t… do it by hand if you prefer.
I may be able to figure out the maths bit – figure out how to cast my problem into a mathematical statement – but not necessarily have the skill to solve the problem. I can get the method marks but not do the calculation and get the result. I can write the program. But running the programme, dividing 3847835 by 343, calculating the square root of 26,863 using log tables or whatever means, that’s the blocker – that could put me off trying to make use of maths, could put me off learning how to cast a problem into a mathematical form, if all that means is that I can do no more than look at the form as if it were a picture, poem, or any other piece of useless abstraction.
So why don’t we help people see that casting the problem into mathematical form is the creative bit, the bit the machines can’t do? Because the machines can do the mechanical bit:
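As a sketch of the machine doing the mechanical bit, here is the WolframAlpha query from above expressed in a Python notebook using sympy – assuming the query means the integral of x²·(x − 3), since the natural-language phrasing is ambiguous:

```python
# The creative bit: casting "integral of x squared times x minus three
# from -3 to 4" into mathematical notation. The mechanical bit: sympy.
import sympy as sp

x = sp.symbols("x")
result = sp.integrate(x**2 * (x - 3), (x, -3, 4))
print(result)  # -189/4
```

The notebook hands back the right-hand side; the working (expand, antidifferentiate, evaluate at the limits) is all done for me.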
Maybe this is the approach that the folk over at Computer Based Math are thinking about (h/t Simon Knight/@sjgknight for the link), or maybe it isn’t… But I get the feeling I need to look over what they’re up to… I also note Conrad Wolfram is behind it; we kept crossing paths, a few years ago…I was taken by his passion, and ideas, about how we should be helping folk see that maths can be useful, and how you can use it, but there was always the commercial blocker, the need for Mathematica licenses, the TM; as in Computer-Based Math™.
Then tonight, another example of interactivity, wired in to a new “learning journey” platform that again @sjgknight informs me is released out of Google 20% time…: Oppia (Oppia: a tool for interactive learning).
Here’s an example….
The radio button choice determines where we go next on the learning journey:
Nice – interactive coding environment… 3 + 4 …
What happens if I make a mistake?
History of what I did wrong, inline, which is richer than a normal notebook style, where my repeated attempts would overwrite previous ones…
Depending how many common incorrect or systematic errors we can identify, we may be able to add richer diagnostic next step pathways…
..but then, eventually, success:
The platform is designed as a social one where users can create their own learning journeys and collaborate on their development with others. Licensing is mandated as “CC-BY-SA 4.0 with a waiver of the attribution requirement”. The code for the platform is also open.
The learning journey model is richer and potentially far more complex in graph structure terms than I remember the attempts developed for the [redacted] SocialLearn platform, but the vision appears similar. SocialLearn was also more heavily geared to narrative textual elements in the exposition; by contrast, the current editing tools in Oppia make you feel as if using too much text is not a Good Thing.
So – how are these put together… The blurb suggests it should be easy, but Google folk are clever folk (and I’m not sure how successful they’ve been getting their previous geek style learning platform attempts into education)… here’s an example learning journey – it’s a state machine:
Each block can be edited:
When creating new blocks, the first thing you need is some content:
Then some interaction. For the interactions, a range of input types you might expect:
and some you might not. For example, these are the interactive/executable coding style blocks you can use:
There’s also a map input, though I’m not sure what dialogic information you can get from it when you use it?
After the interaction definition, you can define a set of rules that determine where the next step takes you, depending on the input received.
The rule definitions allow you to trap on the answer provided by the interaction dialogue, optionally provide some feedback, and then identify the next step.
The rule branches are determined by the interaction type. For radio buttons, rules are triggered on the selected answer. For text inputs, simple string distance measures:
For numeric inputs, various bounds:
For the map, what looks like a point within a particular distance of a target point?
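Pulling those pieces together, here is a minimal sketch of how such a rules-driven learning journey might be modelled as a state machine – the state names, rule predicates and feedback strings are all invented for illustration, not Oppia’s actual data model:

```python
import difflib

def fuzzy_match(answer, target, threshold=0.8):
    """Text-input rule: trigger when the answer is 'close enough' to target."""
    return difflib.SequenceMatcher(None, answer.lower(), target.lower()).ratio() >= threshold

# Each state: a list of (predicate, next_state, feedback) rules, plus a
# default branch that fires when no rule matches (loop back with a hint).
STATES = {
    "intro": {
        "rules": [
            (lambda a: a == "3 + 4", "code_step", "Good - now run it."),
        ],
        "default": ("intro", "Try writing the sum in mathematical notation."),
    },
    "code_step": {
        "rules": [
            (lambda a: a == "7", "done", "Correct!"),
            (lambda a: fuzzy_match(a, "seven"), "done", "Correct (in words)!"),
        ],
        "default": ("code_step", "Not quite - check your arithmetic."),
    },
    "done": {"rules": [], "default": ("done", "Journey complete.")},
}

def step(state, answer):
    """Apply the first matching rule; fall back to the state's default branch."""
    for predicate, next_state, feedback in STATES[state]["rules"]:
        if predicate(answer):
            return next_state, feedback
    return STATES[state]["default"]
```

The default branch is what gives the inline “history of what I did wrong” behaviour: a wrong answer keeps you in the same state with feedback, rather than moving you on.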
The choice of programming languages currently available in the code interaction type kinda sets the tone about who might play with this…but then, maybe as I suggested to colleague Ray Corrigan yesterday, “I don’t think we’ve started to consider consequences of 2nd half of the chessboard programming languages in/for edu yet?”
All in all – I don’t know… getting the design right is what’ll make for a successful learning journey, and that’s not something you can necessarily do quickly. The interface is as with many Google interfaces, not the friendliest I’ve seen (function over form, but bad form can sometimes get in the way of the function…).
I was interested to see they’re calling the learning journeys explorations. The Digital Worlds course I ran some time ago used that word too, in the guise of Topic Explorations, but those were a little more open ended, and used a question mechanic to guide reading within a set of suggested resources.
Anyway, one to watch, perhaps, erm, maybe… No badges as yet, but that would be candy to offer at the end of a course, as well as a way of splashing your own brand via the badges. But before that, a state machine design mountain to climb…
In a comment based conversation with Anne-Marie Cunningham/@amcunningham last night, it seems I’d made a few errors in the post Demographically Classed, mistakenly attributing the use of HES data by actuaries in the Extending the Critical Path report to the SIAS when it should have been a working group of (I think?!) the Institute and Faculty of Actuaries (IFoA). I’d also messed up in assuming that the HES data was annotated with ACORN and MOSAIC data by the researchers, a mistaken assumption that begged the question as to how that linkage was actually done. Anne-Marie did the journalistic thing and called the press office (seems not many hacks did…) and discovered that “The researchers did not do any data linkage. This was all done by NHSIC. They did receive it fully coded. They only received 1st half of postcode and age group. There was no information on which hospitals people had attended.” Thanks, Anne-Marie:-)
Note – that last point could be interesting: it would suggest that in the analysis the health effects were decoupled from the facility where folk were treated?
Here are a few further quick notes adding to the previous post:
- the data that will be shared by GPs will be in coded form. An example of the coding scheme is provided in this post on the A Better NHS blog – Care dot data. The actual coding scheme can be found in this spreadsheet from the HSCIC: Code set – specification for the data to be extracted from GP electronic records and described in Care Episode Statistics: Technical Specification of the GP Extract. The tech spec uses the following diagram to explain the process (p19):
I’m intrigued as to what they mean by the ‘non-relational database’…?
As far as the IFoA report goes, an annotated version of this diagram to show how the geodemographic data from Experian and CACI was added, and then how personally identifiable data was stripped before the dataset was handed over to the IFoA, would have been a useful contribution to the methodology section. I think over the next year or two, folk are going to have to spend some time being clear about the methodology in terms of “transparency” around ethics, anonymisation, privacy etc, whilst the governance issues get clarified and baked into workaday processes and practice.
Getting a more detailed idea of what data will flow and how filters may actually work under various opt-out regimes around various data sharing pathways requires a little more detail. The Falkland Surgery in Newbury provides a useful summary of what data in general GP practices share, including care.data sharing. The site also usefully provides a map of the control-codes that preclude various sharing principles (As simple as I [original site publisher] can make it!):
Returning to the care episode statistics reporting structure, the architecture to support reuse is depicted on p21 of the tech spec as follows:
There also appear to be two main summary pages of resources relating to care data that may be worth exploring further as a starting point: Care.data and Technology, systems and data – Data and information. Further resources are available more generally on Information governance (NHS England).
As I mentioned in my previous post on this topic, I’m not so concerned about personal privacy/personal data leakage as I am about trying to think through the possible consequences of making bulk data releases available that can be used as the basis for N=All/large scale data modelling (which can suffer from dangerous (non)sampling errors/bias when folk start to opt out), the results of which are then used to influence the development of, and then algorithmic implementation of, policy. This issue is touched on by blogger and IT, Intellectual Property and Media Law lecturer at the University of East Anglia Law School, Paul Bernal, in his post Care.data and the community…:
The second factor here, and one that seems to be missed (either deliberately or through naïveté) is the number of other, less obvious and potentially far less desirable uses that this kind of data can be put to. Things like raising insurance premiums or health-care costs for those with particular conditions, as demonstrated by the most recent story, are potentially deeply damaging – but they are only the start of the possibilities. Health data can also be used to establish credit ratings, by potential employers, and other related areas – and without any transparency or hope of appeal, as such things may well be calculated by algorithm, with the algorithms protected as trade secrets, and the decisions made automatically. For some particularly vulnerable groups this could be absolutely critical – people with HIV, for example, who might face all kinds of discrimination. Or, to pick a seemingly less extreme and far more numerous group, people with mental health issues. Algorithms could be set up to find anyone with any kind of history of mental health issues – prescriptions for anti-depressants, for example – and filter them out of job applicants, seeing them as potential ‘trouble’. Discriminatory? Absolutely. Illegal? Absolutely. Impossible? Absolutely not – and the experience over recent years of the use of black-lists for people connected with union activity (see for example here) shows that unscrupulous employers might well not just use but encourage the kind of filtering that would ensure that anyone seen as ‘risky’ was avoided. In a climate where there are many more applicants than places for any job, discovering that you have been discriminated against is very, very hard.
This last part is a larger privacy issue – health data is just a part of the equation, and can be added to an already potent mix of data, from the self-profiling of social networks like Facebook to the behavioural targeting of the advertising industry to search-history analytics from Google. Why, then, does care.data matter, if all the rest of it is ‘out there’? Partly because it can confirm and enrich the data gathered in other ways – as the Telegraph story seems to confirm – and partly because it makes it easy for the profilers, and that’s something we really should avoid. They already have too much power over people – we should be reducing that power, not adding to it. [my emphasis]
There are many trivial reasons why large datasets can become biased (for example, see The Hidden Biases in Big Data), but there are also deeper reasons why we need to start paying more attention to “big” data models and the algorithms that are derived from and applied to them (for example, It’s Not Privacy, and It’s Not Fair [Cynthia Dwork & Deirdre K. Mulligan] and Big Data, Predictive Algorithms and the Virtues of Transparency (Part One) [John Danaher]).
The combined HES’n’insurance report, and the care.data debacle, provide an opportunity to start to discuss some of these issues around the use of data, the ways in which data can be combined, and the undoubted rise in data brokerage services. So for example, a quick pop over to CCR Data and they’ll do some data enhancement for you (“We have access to the most accurate and validated sources of information, ensuring the best results for you. There are a host of variables available which provide effective business intelligence [including] [t]elephone number appending, [d]ate of [b]irth, MOSAIC”), [e]nhance your database with email addresses using our email append data enrichment service, or wealth profiling. Lovely…
With the UK national curriculum for schools set to include a healthy dose of programming from September 2014 (Statutory guidance – National curriculum in England: computing programmes of study) I’m wondering what the diff will be on the school day (what gets dropped if computing is forced in?) and who’ll be teaching it?
A few years ago I spent way too much time engaged in robotics related school outreach activities. One of the driving ideas was that we could use practical and creative robotics as a hands-on platform in a variety of curriculum context: maths and robotics, for example, or science and robotics. We also ran some robot fashion shows – I particularly remember a two(?) day event at the Quay Arts Centre on the Isle of Wight where a couple of dozen or so kids put on a fashion show with tabletop robots – building and programming the robots, designing fashion dolls to sit on them, choosing the music, doing the lights, videoing the show, and then running the show itself in front of a live audience. Brilliant.
On the science side, we ran an extended intervention with the Pompey Study Centre, a study centre attached to the Portsmouth Football Club, that explored scientific principles in the context of robot football. As part of the ‘fitness training’ programme for the robot footballers, the kids had to run scientific experiments as they calibrated and configured their robots.
The robot platform – mechanical design, writing control programmes, working with sensors, understanding interactions with the real world, dealing with uncertainty – provided a medium for creative problem solving that could provide a context for, or be contextualised by, the academic principles being taught from a range of curriculum areas. The emphasis was very much on learning by doing, using an authentic problem solving context to motivate the learning of principles in order to be able to solve problems better or more easily. The idea was that kids should be able to see what the point was, and rehearse the ideas, strategies and techniques of informed problem solving inside the classroom that they might then be able to draw on outside the classroom, or in other classrooms. Needless to say, we were disrespectful of curriculum boundaries and felt free to draw on other curriculum areas when working within a particular curriculum area.
In many respects, robotics provides a great container for teaching pragmatic and practical computing. But robot kit is still pricey and if not used across curriculum areas can be hard for schools to afford. There are also issues of teacher skilling, and the set-up and tear-down time required when working with robot kits across several different classes over the same school day or week.
So how is the new computing curriculum teaching going to be delivered? One approach that I think could have promise if kids are expected to use text-based programming languages (which they are required to do at KS3) is to use a notebook-style programming environment. The first notebook-style environment I came across was Mathematica, though expensive license fees mean I’ve never really used it (Using a Notebook Interface).
More recently, I’ve started playing with IPython Notebooks (“ipynb”; for example, Doodling With IPython Notebooks for Education).
(Start at 2 minutes 16 seconds in – I’m not sure that WordPress embeds respect the time anchor I set. Yet another piece of hosted WordPress crapness.)
For a history of IPython Notebooks, see The IPython notebook: a historical retrospective.
Whilst these can be used for teaching programming, they can also be used for doing simple arithmetic, calculator style, as well as simple graph plotting. If we’re going to teach kids to use calculators, then maybe:
1) we should be teaching them to use “found calculators”, such as on their phone, via the Google search box, in those two-dimensional programming surfaces we call spreadsheets, using tools such as WolframAlpha, etc;
2) maybe we shouldn’t be teaching them to use calculators at all? Maybe instead we should be teaching them to use “programmatic calculations”, as for example in Mathematica, or IPython Notebooks?
Maths is a tool and a language, and notebook environments, or other forms of (inter)active, executable worksheets that can be constructed and/or annotated by learners, experimented with, and whose exercises can be repeated, provide a great environment for exploring how to use and work with that language. They’re also great for learning how the automated execution of mathematical statements can allow you to do mathematical work far more easily than you can do by hand. (This is something I think we often miss when teaching kids the mechanics of maths – they never get a chance to execute powerful mathematical ideas with computational tool support. One argument against using tools is that kids don’t learn to spot when a result a calculator gives is nonsense if they don’t also learn the mechanics by hand. I don’t think many people are that great at estimating numbers, even across orders of magnitude, with the maths they have learned to do by hand, so I don’t really rate that argument!)
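By way of a concrete sketch of “programmatic calculation” (plain Python here, as would run in a notebook cell): exact calculator-style arithmetic, plus a powerful idea – summing a series – that is tedious by hand but trivial to execute:

```python
# Exact, calculator-style arithmetic - no rounding surprises:
from fractions import Fraction

print(Fraction(1, 3) + Fraction(1, 6))  # 1/2

# Executing a powerful mathematical idea: partial sums of 1/n^2
# creeping towards pi^2/6 (~1.6449) - tedious by hand, one line here.
partial = sum(Fraction(1, n * n) for n in range(1, 50))
print(float(partial))
```

The learner can change the range, re-run the cell, and watch the convergence – the sort of repeatable experiment that a calculator doesn’t really support.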
Maybe it’s because I’m looking for signs of uptake of notebook ideas, or maybe it’s because it’s an emerging thing, but I noticed another example of notebook working again today, courtesy of @alexbilbie: reports written over Neo4J graph databases submitted to the Neo4j graph gist winter challenge. The GraphGist how to guide looks like they’re using a port of, or extensions to, an IPython Notebook, though I’ve not checked…
Note that IPython notebooks have access to the shell, so other languages can be used within them if appropriate support is provided. For example, we can use R code in the IPython notebook context.
Note that interactive, computational and data analysis notebooks are also starting to gain traction in certain areas of research under the moniker “reproducible research”. An example I came across just the other day was The Dataverse Network Project, and an R package that provides an interface to it: dvn – Sharing Reproducible Research from R.
In much the same way that I used to teach programming as a technique for working with robots, we can also teach programming in the context of data analysis. A major issue here is how we get data in to and out of a programming environment in a seamless way. Increasingly, data sources hosted online are presenting APIs (programmable interfaces) with wrappers that provide a nice interface to a particular programming language. This makes it easy to use a function call in the programming language to pull data into the programme context. Working with data, particularly when it comes to charting data, provides another authentic hook between maths and programming. Using them together allows us to present each as a tool that works with the other, helping answer the question “but why are we learning this?” with the response “so now you can do this, see this, work with this, find this out”, etc. (I appreciate this is quite a utilitarian view of the value of knowledge…)
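A sketch of that pattern – one function call to pull a remote dataset into the programming context, then immediately do some chart-ready maths on it. The URL, field name and stand-in records are all invented for illustration:

```python
import json
from urllib.request import urlopen

def get_json(url):
    """One function call to pull a remote JSON dataset into Python."""
    with urlopen(url) as response:
        return json.load(response)

def summarise(records, field):
    """Simple chart-ready summary: min, max and mean of a numeric field."""
    values = [r[field] for r in records]
    return min(values), max(values), sum(values) / len(values)

# e.g. records = get_json("https://example.com/api/admissions.json")
records = [{"count": 3}, {"count": 4}, {"count": 5}]  # stand-in data
print(summarise(records, "count"))  # (3, 5, 4.0)
```

Once the data is in, it’s just a Python data structure – the gap between “fetch the data” and “do the maths” is one line.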
But how far can we go in terms of using “raw”, but very powerful, computational tools in school? The other day, I saw this preview of the Wolfram Language:
There is likely to be a cost barrier to using this language, but I wonder: why shouldn’t we use this style of language, or at least the notebook style of computing, in KS3 and 4? What are the barriers (aside from licensing cost and machine access) to using such a medium for teaching computing in context (in maths, in science, in geography, etc)?
Programming puritans might say that notebook style computing isn’t real programming… (I’m not sure why, but I could imagine they might… erm… anyone fancy arguing that line in the comments?!:-) But so what? We don’t want to teach everyone to be a programmer, but we do maybe want to help them realise what sorts of computational levers there are, even if they don’t become computational mechanics?
So it seems that in a cost-recovered data release that was probably lawful then but possibly wouldn’t be now* – Hospital records of all NHS patients sold to insurers – the
Staple Inn Actuarial Society Critical Illness Definitions and Geographical Variations Working Party (of what, I’m not sure? The Institute and Faculty of Actuaries, perhaps?) got some Hospital Episode Statistics data from the precursor to the HSCIC, blended it with some geodemographic data**, and then came to the conclusion that “that the use of geodemographic profiling could refine Critical illness pricing bases” (source: Extending the Critical Path), presenting the report to the Staple Inn Actuarial Society who also headline branded the PDF version of the report? Maybe?
* House of Commons Health Committee, 25/2/14: 15.59:32 for a few minutes or so; that data release would not be approved now: 16.01:15 reiterated at 16.03:05 and 16.07:05
** or maybe they didn’t? Maybe the data came pre-blended, as @amcunningham suggests in the comments? I’ve added a couple of further questions into my comment reply… – UPDATE: “HES was linked to CACI and Experian data by the Information Centre using full postcode. The working party did not receive any identifiable data.”
CLARIFICATION ADDED (source )—-
“In a story published by the Daily Telegraph today research by the IFoA was represented as “NHS data sold to insurers”. This is not the case. The research referenced in this story considered critical illness in the UK and was presented to members of the Staple Inn Actuarial Society (SIAS) in December 2013 and was made publically available on our website.
“The IFoA is a not for profit professional body. The research paper – Extending the Critical Path – offered actuaries, working in critical illness pricing, information that would help them to ask the right questions of their own data. The aim of providing context in this way is to help improve the accuracy of pricing. Accurate pricing is considered fairer by many consumers and leads to better reserving by insurance companies.
There was also an event on 17 February 2014.
Via a tweet from @SIAScommittee, since deleted for some reason(?), this is clarified further: “SIAS did not produce the research/report.”
The branding that misled me – I must not be so careless in future…
Many of the current arguments about possible invasions of privacy arising from the planned care.data release relate to the possible reidentification of individuals from their supposedly anonymised or pseudonymised health data (on my to-read list: NHS England – Privacy Impact Assessment: care.data), but to my mind the
IFoA report presented to the SIAS suggests that we also need to think about the consequences of the ways in which aggregated data is analysed and used (for example, in the construction of predictive models). Where aggregate and summarised data is used as the basis of algorithmic decision making, we need to be mindful that sampling errors, as well as other modelling assumptions, may lead to biases in the algorithms that result. Where algorithmic decisions are applied to people placed into statistical sampling “bins” or categories, errors in the assignment of individuals into a particular bin may result in decisions being made against them on an incorrect basis.
Rather than focussing always on the ‘can I personally be identified from the supposedly anonymised or pseudonymised data’, we also need to be mindful of the extent to, and ways in, which:
1) aggregate and summary data is used to produce models about the behaviour of particular groups;
2) individuals are assigned to groups;
3) attributes identified as a result of statistical modelling of groups are assigned to individuals who are (incorrectly) assigned to particular groups, for example on the basis of estimated geodemographic binning.
What worries me is not so much ‘can I be identified from the data’, but ‘are there data attributes about me that bin me in a particular way that statistical models developed around those bins are used to make decisions about me’. (Related to this are notions of algorithmic transparency – though in many cases I think this must surely go hand in hand with ‘binning transparency’!)
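To make that worry concrete, here is a toy sketch: the decision is made from the bin’s modelled statistics, not from the individual, so putting someone in the wrong bin changes the decision made about them. The bin labels, rates and pricing rule are invented for illustration:

```python
# Modelled claim rates per geodemographic segment (invented numbers):
BIN_RISK = {"A": 0.02, "B": 0.10}

def premium(base, bin_label):
    """Price from the segment model, not from the individual."""
    return round(base * (1 + 10 * BIN_RISK[bin_label]), 2)

correct = premium(100, "A")    # 120.0
misbinned = premium(100, "B")  # 200.0 - same person, wrong bin
```

Nothing about the individual changed between the two calls – only the (possibly estimated, possibly wrong) bin assignment did.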
That said, the personal-reidentification-privacy lobbyists may want to pick up on the claim in the
IFoA report (page 19) that:
In theory, there should be a one to one correspondence between individual patients and HESID. The HESID is derived using a matching algorithm mainly mapped to NHS number, but not all records contain an NHS number, especially in the early years, so full matching is not possible. In those cases HES use other patient identifiable fields (Date of Birth, Sex, Postcode, etc.) so imperfect matching may mean patients have more than one HESID. According to the NHS IC 83% of records had an NHS number in 2000/01 and this had grown to 97% by 2007/08, so the issue is clearly reducing. Indeed, our data contains 47.5m unique HESIDs which when compared to the English population of around 49m in 1997, and allowing for approximately 1m new lives a year due to births and inwards migration would suggest around 75% of people in England were admitted at least once during the 13 year period for which we have data. Our view is that this proportion seems a little high but we have been unable to verify that this proportion is reasonable against an independent source.
Given two or three data points, if this near 1-1 correspondence exists, you could possibly start guessing at matching HESIDs to individuals, or family units, quite quickly…
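A toy sketch of why: with a near 1-1 HESID-to-person mapping, a handful of quasi-identifiers (partial postcode, age band, sex) can already single out a record. All the data here is invented:

```python
# Invented pseudonymised records - partial postcode, age band, sex:
records = [
    {"hesid": "H1", "postcode": "RG14", "ageband": "40-44", "sex": "F"},
    {"hesid": "H2", "postcode": "RG14", "ageband": "40-44", "sex": "M"},
    {"hesid": "H3", "postcode": "PO30", "ageband": "40-44", "sex": "M"},
]

def candidates(records, **known):
    """HESIDs consistent with what we already know about a person."""
    return [r["hesid"] for r in records
            if all(r[k] == v for k, v in known.items())]

print(candidates(records, postcode="RG14", sex="M"))  # ['H2'] - unique match
```

Two known attributes were enough here; in a real dataset the arithmetic depends on bin sizes, but the principle scales.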
- ACORN (A Classification of Residential Neighbourhoods) is CACI’s geodemographic segmentation system of the UK population. We have used the 2010 version of ACORN which segments postcodes into 5 Categories, 17 Groups and 57 Types.
- Mosaic UK is Experian’s geodemographic segmentation system of the UK population. We have used the 2009 version of Mosaic UK which segments postcodes into 15 Groups and 67 Household Types.
The ACORN and MOSAIC data sets seem to provide data at the postcode level. I’m not sure how this was then combined with the HES data, but it seems the
IFoA folk found a way (p 29) [or, as Anne-Marie Cunningham suggests in the comments, maybe it wasn’t combined by the IFoA – maybe the data came that way?]:
The HES data records have been encoded with both an ACORN Type and a Mosaic UK Household Type. This enables hospital admissions to be split by ACORN and Mosaic Type. This covers the “claims” side of an incidence rate calculation. In order to determine the exposure, both CACI and Experian were able to provide us with the population of England, as at 2009 and 2010 respectively, split by gender, age band and profiler.
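What that linkage step amounts to, mechanically, is a join on postcode: annotate each episode record with the geodemographic type for its postcode, after which the postcode itself can be dropped from the release. A sketch, with invented postcodes and type labels standing in for the CACI/Experian classifications:

```python
# Hypothetical lookup from full postcode to a geodemographic type:
acorn = {"RG14 1AB": "Type 12", "PO30 2CD": "Type 41"}

episodes = [
    {"hesid": "H1", "postcode": "RG14 1AB"},
    {"hesid": "H2", "postcode": "PO30 2CD"},
]

# The linkage: encode each episode with the type for its postcode.
for e in episodes:
    e["acorn_type"] = acorn.get(e["postcode"], "Unknown")
    del e["postcode"]  # identifiable field stripped after encoding
```

The point of sketching it is that the join is trivial once someone holds both keys – which is why *who* performs it, and on what data, matters so much to the methodology section.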
This then represents another area of concern – the extent to which even pseudonymised data can be combined with other data sets, for example based on geodemographic data. So for example, how are the datasets actually combined, and what are the possible consequences of such combinations? Does the combination enrich the dataset in such a way that makes it easier for us to deanonymise either of the original datasets (if that is your primary concern); or does the combination occur in such a way that it may introduce systematic biases into models that are then produced by running summary statistics over groupings that are applied over the data, biases that may be unacknowledged (to possibly detrimental effect) when the models are used for predictive modelling, pricing models or as part of policy-making, for example?
Just by the by, I also wonder:
- what data was released lawfully under the old system that wouldn’t be allowed to be released now, and to whom, and for what purpose?
- are the people to whom that data was released allowed to continue using and processing that data?
- if they are allowed to continue using that data, under what conditions and for what purpose?
- if they are not, have they destroyed the data (16.05:44), for example by taking a sledgehammer to the computers the data was held on in the presence of NHS officers, or by whatever other means the state approves of?
I was pleased to be invited back to the University of Lincoln again yesterday to give a talk on data journalism to a couple of dozen or so journalism students…
I was hoping to generate a copy of the slides (as images) embedded in a markdown version of the notes but couldn’t come up with a quick recipe for achieving that…
When I get a chance, it looks as if the easiest way will be to learn some VBA/Visual Basic for Applications macro scripting… So for example:
If anyone beats me to it, I’m actually on a Mac, so from the looks of things on Stack Overflow, hacks will be required to get the VBA to actually work properly?
Via my feeds, I noticed this the other day: Google is pushing a new content-recommendation system for publishers, in which VentureBeat quoted a Google originated email sent to them: “Our engineers are working on a content recommendation beta that will present users relevant internal articles on your site after they read a page. This is a great way to drive loyal users and more pageviews.” Hmm.. what’s taken them so long?
(FWIW, using contextual ad-servers to serve content has been one of those ideas that I keep coming round to but never really pursuing: for example, Contextual Content Server, Courtesy of Google?, Google Banner Ads as On-Campus Signage? or Contextual Content Delivery on Higher Ed Websites Using Ad Servers.)
Reflecting on this, I started thinking again about the uses to which we might be able to put adservers. It struck me that one way is actually to use them to serve… ads.
One of the things I’ve noticed about the Open Knowledge Foundation (disclaimer: I work one day a week for the Open Knowledge Foundation’s School of Data) is that it throws up a lot of websites. Digging out a couple of tricks from Pondering Mapping the Pearson Network, I spot at least these domains, for example:
An emergent social positioning map around the SchoolOfData twitter account also identifies a wealth of OKF related projects and local chapters (bottom region of the map), many of which will also run their own web presence:
One of the issues associated with such a widely dispersed and loosely coupled networked organisation relates to the running of campaigns, and promoting strong single campaign issue messages out across the various websites. So I wonder: would an internal adserver work???
… you define web sites, and for each website you then define one or more zones. A zone is a representation of a place on the web pages where the adverts must appear. For every zone, Revive Adserver generates a small piece of HTML code, which must be placed in the site at the exact spot where the ads must be displayed. …
You must also create advertisers, campaigns and advertisements …
The final step is to link the right campaigns to the right zones. This determines which ads will be displayed where. You can combine this with various forms of ‘targeting’, which means you can adjust the advertising to specific situations.
So…each website in the OKF sprawl could include a local adserver zone and display OKF ads. Such ads might be campaign-related, or announcements of upcoming dates and events likely to be relevant across the OKF network (for example, international open data days, or open data census days).
Other ad blocks/zones could be defined to serve content from particular ad channels or campaigns.
Ad/content could in part be editorially controlled from the centre – for example, a campaign manager might be responsible for choosing which ads are in the pool for a particular campaign or set of campaigns. Site owners might allocate different zones, signing each one up to a different ad channel that only serves ad/content on a particular theme?
Members of local groups and project teams could submit ads to the adserver relating to their projects or group activities, with associated campaign codes and topics so that content can be suitably targeted by the platform. The adserver thus also becomes a(nother) possible communications channel across the network.
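To make the idea concrete, here’s a toy sketch (emphatically not Revive Adserver’s actual logic – the ad, zone, and topic names are all made up) of the sort of zone/topic targeting that would route centrally submitted content out to the zones different sites have signed up to:

```python
# Toy model of topic-targeted ad serving across a network of sites.
# All names and structures here are illustrative assumptions.
ads = [
    {"campaign": "open-data-day", "topic": "events",
     "html": "<a href='#'>International Open Data Day</a>"},
    {"campaign": "data-census", "topic": "data",
     "html": "<a href='#'>Open Data Census</a>"},
]

# Each zone subscribes to one or more topics/channels.
zones = {
    "sidebar": {"topics": {"events"}},
    "footer": {"topics": {"events", "data"}},
}

def ads_for_zone(zone_name):
    """Return the ads whose topic matches the zone's subscribed topics."""
    topics = zones[zone_name]["topics"]
    return [ad for ad in ads if ad["topic"] in topics]
```

A site that only wanted event announcements would expose a zone like `sidebar`; a more general zone like `footer` would pick up content from several channels.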
One of the big issues when it comes to working with data is that things are often done most easily using small fragments of code. Something I’ve been playing with recently is IPython notebooks, which provide an interactive, browser-based interface to a Python shell.
The notebooks are built around the idea of cells of various types, including header and markdown/HTML interpreted cells and executable code cells. Here are a few immediate thoughts on how we might start to use these notebooks to support distance and self-paced education:
The output from executing a code cell is displayed in the lower part of the cell when the cell is run. Code execution causes state changes in the underlying IPython session, the current state of which is accessible to all cells.
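For example, here are two “cells” of the sort you might run in sequence (the content is illustrative): the first changes the state of the underlying session, and the second refers back to names the first defined.

```python
# Cell 1: running this updates the state of the underlying IPython session
message = "three plus four"
total = 3 + 4

# Cell 2: a later cell can use names defined in earlier cells;
# the value of the last expression is displayed as the cell's output
result = "{0} = {1}".format(message, total)
result
```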
Graphical outputs, such as chart objects generated using libraries such as matplotlib, can also be displayed inline (not shown).
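A minimal sketch of the kind of chart cell whose output can appear inline (inside the notebook itself you would typically enable inline plotting first, e.g. with `%matplotlib inline`; the `Agg` backend line below is only needed when running the fragment outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside the notebook
import matplotlib.pyplot as plt

# Plot x squared over a small range of integer x values
xs = list(range(-3, 5))
ys = [x * x for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
```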
There are several ways we might include a call to action for students:
* invitations to run code cells;
* invitations to edit and run code cells;
* invitations to enter commentary or reflective notes.
We can also explore ways of providing “revealed” content. One way is to make use of the ability to execute code, for example by importing a crib class that we can call…
Here’s how we might prompt its use:
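As a sketch of the idea (the class name, method, and answer text below are all hypothetical, not from any actual course module), the imported crib class might hold worked answers that a student reveals on demand:

```python
# Hypothetical crib module for "revealed" content.
class Crib:
    """Holds worked answers that a student can reveal on demand."""
    _answers = {
        "activity_1": "3 + 4 evaluates to 7",
    }

    @classmethod
    def reveal(cls, activity):
        """Return the crib text for a named activity, if one exists."""
        return cls._answers.get(activity, "No crib available for this activity")

# At the end of an activity, a notebook cell might prompt:
Crib.reveal("activity_1")
```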
All code cells in a notebook can be executed one after another, or a cell at a time (you can also execute all cells above or below the cursor). This makes it easy to test a notebook, and to work through it at your own pace.
A helper application, nbconvert, allows you to generate alternative versions of a notebook, whether as HTML, Python code, LaTeX, or HTML slides (reveal.js). There is also a notebook viewer available that displays an HTML view of a specified .ipynb notebook file. (There is also an online version: the IPython Notebook Viewer.)
Another advantage of the browser based approach is that the IPython shell can run in a virtual machine (for example, Cursory Thoughts on Virtual Machines in Distance Education Courses) and expose the notebook as a service that can be accessed via a browser on a host machine:
A simple configuration tweak allows notebook files and data files to occupy a folder that is shared across both the host machine and the guest virtual machine.
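The sort of tweak involved might look like the following lines in the notebook server’s Python config file (a sketch only: the exact option names vary between IPython versions, and the path shown is an assumed shared-folder mount point, not a real one):

```python
# Illustrative fragment of an IPython notebook server config file
# (e.g. ipython_notebook_config.py). Option names and paths are assumptions.
c = get_config()

c.NotebookApp.ip = "0.0.0.0"          # listen on all interfaces, not just localhost
c.NotebookApp.port = 8888             # the port forwarded/exposed by the VM
c.NotebookApp.open_browser = False    # headless VM: no browser to launch

# Point the server at a folder shared between the guest VM and the host,
# so notebooks and data files are visible on both sides.
c.NotebookApp.notebook_dir = "/vagrant/notebooks"
```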
It is also possible to run remote notebook servers that can be accessed via the web. It would be nice if institutional IT services could support the sort of Agile EdTech that Jim Groom has been writing about recently, which would allow course team developers to experiment with this sort of technology quickly and easily, but in the meantime, we can still do it ourselves…
For example, I am currently running a couple of virtual machines whose configurations I have “borrowed” from elsewhere – Matthew Russell’s Mining the Social Web 2nd Edition VM, and @datamineruk’s pandas infinite-intern machine.
I’ve only really just started exploring what may be possible with IPython notebooks, and how we might be able to use them as part of an e-learning course offering. If you’ve seen any particularly good (or bad!) examples of IPython notebooks being used in an educational context, please let me know via the comments…:-)