The Four S’s of Real Data – And a Need for Data Technicians, Not Data Scientists?

“Meh” to the 4 Vs of “big data”: for most people, most of the time, real data is:

  • small: a few rows and a few columns;
  • slow: comes out rarely, often according to a trailing schedule (once a week, once a month, some time after the reported period);
  • spreadsheeted: it just is…
  • smelly: there are indications in the data that something is wrong with the way it’s been collected, processed or analysed (cf. code smells, spreadsheet smells).
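Some of those smells can be sniffed out automatically. As a minimal sketch (plain Python, no spreadsheet library assumed; `column_smells` is a hypothetical helper, not an existing tool), here is a check for three common column-level smells: numbers and text mixed in one column, stray whitespace, and blank cells:

```python
from collections import Counter


def column_smells(name, values):
    """Return a list of human-readable warnings for one spreadsheet column."""
    smells = []
    kinds = Counter()  # how many cells look blank, numeric or textual
    for v in values:
        if v is None or str(v).strip() == "":
            kinds["blank"] += 1
            continue
        s = str(v)
        if s != s.strip():
            smells.append(f"{name}: stray whitespace in {s!r}")
        try:
            float(s.replace(",", ""))  # treat "1,200" as a number too
            kinds["number"] += 1
        except ValueError:
            kinds["text"] += 1
    if kinds["number"] and kinds["text"]:
        smells.append(
            f"{name}: mixes numbers ({kinds['number']}) and text ({kinds['text']})"
        )
    if kinds["blank"]:
        smells.append(f"{name}: {kinds['blank']} blank cell(s)")
    return smells
```

For example, `column_smells("Count", ["12", "14", "n/a", " 9", ""])` flags the padded `" 9"`, the mix of three numbers with one text value, and the single blank cell.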

At the same time, data projects, big or small, often require folk to do a whole chunk of work with the data before they can actually get round to using it. Much of the time spent on data projects goes on getting the data; cleaning it (is “J. Smith” the same as “J Smith”?); data-typing it (is “1” the number one or the character “1”? Should “12/1/17” be saved as a date, and if so, is it day or month first? Is it the date, or the period corresponding to that day?); putting it into a form you can work with (which may be a database, or a well-formed spreadsheet); getting it into the right shape (that is, structured using rows and columns you can easily work with); and so on.
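To make those questions concrete, here is a rough Python sketch of three of the cleaning decisions mentioned above. The helper names are made up for illustration, the name-matching rule is just one possible choice, and the date helper assumes two-digit years mean 20xx; rather than guessing, it returns every valid reading of an ambiguous date:

```python
import re
from datetime import date


def name_key(name):
    """Hypothetical matching key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def to_number(cell):
    """Return an int or float if the cell text is numeric, else the original text."""
    text = cell.strip()
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return cell


def candidate_dates(text):
    """All valid readings of a d/m/yy or m/d/yy string (assumes 20xx years)."""
    a, b, yy = (int(part) for part in text.split("/"))
    year = 2000 + yy
    readings = set()
    for day, month in ((a, b), (b, a)):
        try:
            readings.add(date(year, month, day))
        except ValueError:
            pass  # not a real calendar date under this reading
    return sorted(readings)
```

So `name_key("J. Smith") == name_key("J  Smith")`, and `candidate_dates("12/1/17")` gives both 12 January and 1 December 2017, while `"25/12/17"` has only one valid reading. Note that `to_number` is only safe where the whole column really is numeric: an identifier such as “007” would silently lose its leading zeros, which is exactly the sort of craft decision that needs a human in the loop.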

If the value you think you want from your data scientist, and what you pay them for, is the stats’n’insights’n’data mining stuff, should they really be spending most of their time doing the grunt work, much of which relies on craft knowledge and skills? How many data scientists do we actually need if they aren’t spending all their time poking around fixing the plumbing?

Don’t we need more data technicians, or data tech engs (technical engineers), who can do the labour-intensive stuff well (using their craft knowledge) as well as make a bit of sense of it (getting a bit of “insight” out of it based on familiarity with it), using the real data every company has? I just don’t get this whole “data science” hype thing… More people need to fix a dripping tap than a leaking high-pressure, superheated steam valve in an online nuclear power station. So why the hype about a huge skills gap in the latter when what every company needs is someone who can do the former?

Personal Health Calendar Feeds and a Social Care Annunciator?

Over the last few weeks and months I’ve started pondering all sorts of health and care related stuff that may help when trying to support family members a couple of hundred miles away. One of the things we picked up on was a “friendly” digital clock display (often sold as a “dementia clock” or “memory loss calendar”), with a clear screen, and easy to read date and time.

The clock supports a variety of daily reminders (“Take your pills…”) and can also be programmed to display images or videos at set dates and times (“Doctor’s today”).

One of the things this reminded me of was the parliamentary annunciators, which detail the current activity in the House of Commons and House of Lords and can be found all over the parliamentary estate.

Which got me thinking:

  • what if I could send a short text message or reminder to the screen via SMS?
  • what if I could subscribe to a calendar feed from the device that could be interpreted to generate a range of alerts leading up to an event (“Doctor’s tomorrow morning”, “Hospital this afternoon at 2pm”)?
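As a sketch of the second idea, assume the calendar feed has already been parsed into (description, start time) pairs; parsing the feed itself would need an iCalendar library. The phrasing rules below are hypothetical choices for turning an upcoming event into a short, readable message:

```python
from datetime import datetime


def part_of_day(dt):
    """Map a time to 'morning', 'afternoon' or 'evening'."""
    return "morning" if dt.hour < 12 else "afternoon" if dt.hour < 18 else "evening"


def clock_time(dt):
    """Format 14:00 as '2pm' and 14:30 as '2:30pm'."""
    hour = dt.hour % 12 or 12
    suffix = "am" if dt.hour < 12 else "pm"
    return f"{hour}{suffix}" if dt.minute == 0 else f"{hour}:{dt.minute:02d}{suffix}"


def alert_text(what, start, now):
    """Return a short display message for an upcoming event, or None if not due."""
    if start < now:
        return None  # already started or in the past
    days_ahead = (start.date() - now.date()).days
    if days_ahead == 0:
        return f"{what} this {part_of_day(start)} at {clock_time(start)}"
    if days_ahead == 1:
        return f"{what} tomorrow {part_of_day(start)}"
    if days_ahead <= 7:
        return f"{what} on {start.strftime('%A')}"  # weekday name
    return None  # too far ahead to be worth showing yet
```

With `now = datetime(2017, 5, 1, 9, 0)`, `alert_text("Hospital", datetime(2017, 5, 1, 14, 0), now)` gives “Hospital this afternoon at 2pm”, and a 9.30am doctor’s appointment the next day gives “Doctor’s tomorrow morning”. Keeping the rules this small fits the goal of a simple, clear display.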

(Lots of other ideas came to mind too, but the primary goal is to keep the device as simple as possible and the display as clear as possible, which includes being able to read it from a distance.)

The calendar feed idea also sparked a far more interesting idea – one of the issues of trying to support family members with ongoing health appointments is knowing when they are taking place, whether you need to go along to provide advocacy or support, whether hospital stays are being planned, and so on. Recent experience suggests that different bits of the NHS act independently of each other:

  • the GP doesn’t know when hospital surgery has been booked, let alone when pre-op assessments requiring a hospital visit are scheduled;
  • district nurses don’t know when hospital visits are planned;
  • different parts of the hospital don’t know when other parts of the hospital have visits planned,

and so on…

In short, the hospital doesn’t seem to have a calendar associated with each patient.

As with “student first” initiatives in HE, I suspect “patient first” initiatives are more to do with tracking internal performance metrics and KPIs rather than initiatives formulated from the patient perspective, but a personal “health and social care calendar” could benefit a whole range of parties:

  • the patient, wanting to keep track of appointments;
  • health and social care agencies wanting to book appointments and follow up on appointments with other parts of the service;
  • family members looking to support the patient.

So I imagine a situation where a patient books a GP appointment, and the receptionist adds it to the patient’s personal calendar.

A hospital appointment is generated by a consultant and, along with the letter informing the patient of the date, the event is added to the patient’s calendar (possibly with an option to somehow acknowledge it, confirm it, cancel it?).

A patient asks the GP to add a family member to the calendar/calendar feed so they can also access it.
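In practice, “subscribing” to a calendar usually means pointing a calendar app at a URL that serves an iCalendar (RFC 5545) document. A minimal sketch of generating such a document from a list of appointments follows. The PRODID and UIDs are made-up placeholders, start times are written as floating local times, and the folding that RFC 5545 requires for lines longer than 75 octets is omitted:

```python
from datetime import datetime, timezone


def ics_escape(text):
    """Escape characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def calendar_feed(events, stamp=None):
    """Serialise (uid, summary, start) tuples as a minimal iCalendar document."""
    stamp = stamp or datetime.now(timezone.utc)  # DTSTAMP should be in UTC
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//personal-health-calendar//EN",  # hypothetical PRODID
    ]
    for uid, summary, start in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uid}",
            f"DTSTAMP:{stamp.strftime('%Y%m%dT%H%M%SZ')}",
            f"DTSTART:{start.strftime('%Y%m%dT%H%M%S')}",  # floating local time
            f"SUMMARY:{ics_escape(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF
```

Serving the output of `calendar_feed` at a private URL is one way a family member’s phone calendar could pick up new appointments as they are booked.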

A range of privacy controls allow different parts of the health and social care system to request/make use of read access to a patient’s health and social care calendar.
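One simple way to model those controls is as per-party read and write grants on each patient’s calendar. The class below is a hypothetical sketch of that idea, not a description of any NHS system:

```python
from dataclasses import dataclass, field


@dataclass
class HealthCalendar:
    """Hypothetical per-patient calendar with explicit read/write grants."""
    patient: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def grant(self, party, write=False):
        # Anyone who can add entries can also read them.
        self.readers.add(party)
        if write:
            self.writers.add(party)

    def revoke(self, party):
        self.readers.discard(party)
        self.writers.discard(party)

    def add_event(self, party, event):
        if party not in self.writers:
            raise PermissionError(f"{party} may not add events for {self.patient}")
        self.events.append(event)

    def read(self, party):
        if party not in self.readers:
            raise PermissionError(f"{party} may not read {self.patient}'s calendar")
        return list(self.events)  # a copy, so readers can't edit the calendar
```

So a GP surgery might be granted write access, a family member read-only access, and either grant could be revoked later without touching the appointments themselves.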

The calendar keeps a memory of historical appointments as well as future ones. Options may be provided to say whether an appointment was attended, cancelled or rescheduled. Such information may be useful to a GP (“I see you had your appointment with the consultant last week…”) or consultant (“I see you have an appointment with your GP next week? It may be worth mentioning to them…”)
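Recording a status against each appointment is enough to support the kind of prompts quoted above. Another hypothetical sketch: the status values mirror the options in the text, and `history_notes` picks out past attended visits and upcoming bookings:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    BOOKED = "booked"
    ATTENDED = "attended"
    CANCELLED = "cancelled"
    RESCHEDULED = "rescheduled"


@dataclass
class Appointment:
    with_whom: str
    start: datetime
    status: Status = Status.BOOKED


def history_notes(appointments, now):
    """Summaries a clinician might see: attended past visits, booked future ones."""
    notes = []
    for appt in sorted(appointments, key=lambda a: a.start):
        when = appt.start.date().isoformat()
        if appt.start < now and appt.status is Status.ATTENDED:
            notes.append(f"attended {appt.with_whom} on {when}")
        elif appt.start >= now and appt.status is Status.BOOKED:
            notes.append(f"upcoming {appt.with_whom} on {when}")
    return notes
```

Cancelled and rescheduled entries are kept in the history but left out of these notes; a fuller version might show them too.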

Hmmm… thinks… is this sort of thing being explored (or has it been in the past?), or maybe considered at an NHS Hack Day? Or is it the sort of thing I could put together as an NHS tech funding pitch?

PS Some of the features of the Amazon Echo Show could also work well in the context of a health/care annunciator, but… the Echo Show is too feature-rich and could easily lead to feature creep and complexity in use; I’d also have “privacy concerns” about using the Amazon backend and the always-on Alexa/Echo mic.

ergastR – R Wrapper for ergast F1 Results Data API

By the by, I’ve posted a first attempt at an R package, ergastR, to wrap the ergast developer API, which is where I get chunks of data from for my f1datajunkie tinkerings.

You can find it on Github: psychemedia/ergastR.

The function names are the ones used in the Wrangling F1 Data With R book.

The R package needs a bit of tidying up, and also needs work on the following: caching, so that we don’t keep hitting the ergast API unnecessarily; paged results handling (I fudge this a bit at the moment by explicitly setting a large results limit); and dual handling of ergast API versus downloaded ergast database requests (if a database connection string is passed, use that rather than make a call to the ergast API). But it’s a start… Feel free to raise issues via the repo :-)

In related news, Will Vaughan tipped me off to a Python package he’s started putting together to wrap the ergast API: ergast-python. He’s also making a start on some Wrangling F1 Data Jupyter notebooks that make use of the Python wrapper: wranglingf1data.

I Just Don’t Understand Why…

…there seems to be so much resistance in the OU to Jupyter notebooks, when I’m seeing this sort of thing more and more…

Folk creating open educational resources to support their technical ramblings using IPython (which is to say, Jupyter) notebooks…

I just, …., whatever… #ffs

PS see also: Introducing learnr. I can just imagine what sort of response that would get… Whuurrr? Wossat? No idea…

Time to Revisit Tangle?

Engaged, as ever, in displacement activity, I just came across Distill (via https://deepmind.com/blog/distill-communicating-science-machine-learning/), yet another “modern medium for presenting research”, this time “in the area of Machine Learning”.

The most recent (April 2017) paper, Why Momentum Really Works, includes several inline embedded interactives that let you explore some of the maths described in the paper.

Another of the interactive features allows you to play with the parameters of an equation and see the result:

This interaction reminded me of Bret Victor’s Tangle.js library, which is quite old now (I’m not sure if it is/ever needs to be maintained?).

Poking around in the Distill post, I couldn’t trivially see how to create my own versions of the above Tangle-like interaction, which made me think that when publishing docs like that it would be really handy if folk also documented some simple, minimal how-tos on how the interactives were created.
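As I understand it, the core of a Tangle-style interaction is a model with some adjustable inputs and an update step that recomputes the derived values whenever an input changes. Here is a minimal, UI-free Python sketch of that pattern (the class names are hypothetical), using gradient descent with momentum on the one-dimensional quadratic f(w) = λw²/2, with the momentum update written as z ← βz + ∇f(w), w ← w − αz, the form used in Why Momentum Really Works:

```python
class ReactiveModel:
    """Tangle-style model: changing an input re-runs update() to refresh outputs."""

    def __init__(self, **inputs):
        self.values = dict(inputs)
        self.update()

    def set(self, name, value):
        self.values[name] = value
        self.update()  # a UI would redraw the derived values at this point

    def update(self):
        raise NotImplementedError


class MomentumDemo(ReactiveModel):
    """Gradient descent with momentum on f(w) = 0.5 * lam * w**2."""

    def update(self):
        v = self.values
        w, z = v["w0"], 0.0
        for _ in range(v["steps"]):
            z = v["beta"] * z + v["lam"] * w  # accumulate the gradient lam * w
            w = w - v["alpha"] * z            # step along the accumulated direction
        v["w_final"] = w
```

With `MomentumDemo(w0=1.0, lam=1.0, alpha=1.0, beta=0.0, steps=5)` the iterate lands on the minimum in one step, so `w_final` is `0.0`; calling `.set("beta", 0.5)` recomputes it, giving `0.1875` after five steps. In a notebook, each `set` call would be wired to a slider or a draggable number.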

I was also prompted to have a quick poke around to see whether Tangle-like features are supported in Jupyter notebooks or Rmd/knitr/shiny. It seems that there are some old demos of using Tangle in those environments – bollwyvl/ipytangle and hadley/tanglekit – but I haven’t had a chance to try them to see if they still work…

PS In passing, I note that the RStudio folk have just produced a toolkit for generating tutorials from RMarkdown docs – Introducing learnr. Lowering the barriers to educators creating their own interactives – again. Just a shame so few want to try out such things and explore how we might be able to make use of them… :-(


Animated GIFs created using Giphy Capture.