What Happens When “Computers” Are Replaced by Tablets and Phones?

Personal email services have been managed online since what feels like forever (and probably is “forever”, for many users). Personally accessed productivity apps – things like Microsoft Office Online or Google Docs – are delivered via online services (perhaps with some minimal support for in-browser, offline use); video and music are provided via online streaming services rather than large file downloads; image galleries are stored in the cloud; and social networking is provided exclusively online. Given all that, and in the absence of data about connecting devices (which is probably available from both OU and OU-owned FutureLearn server logs), I wonder if the OU strategists and curriculum planners are considering a future where a significant percentage of OUr distance education students do not have access to a “personal (general purpose) computer” onto which arbitrary software applications can be installed, rather than from which they can simply be accessed, but do have access to a network connection via a tablet device, and perhaps a wireless keyboard?

And if the learners do have access to a desktop or laptop computer, what happens if that is likely to be a works machine, or perhaps a public access desktop computer (though I’m not sure how much longer they will remain around), probably with administrative access limits on it (if the OU IT department’s obsession with minimising general purpose and end-user defined computing is anything to go by…)

If we are to require students to make use of “installed software” rather than software that can be accessed via browser based clients/user interfaces, then we will need to ask the access question: is it fair to require students to buy a desktop computer onto which software can be installed purely for the purposes of their studies, given they presumably have access otherwise to all the (online) digital services they need?

I seem to recall that the OU’s student computing requirements are now supposed to be agnostic as to operating system (the same is not true internally, unfortunately, where legacy systems still require Windows and may even require obsolete versions of IE!;-) although the general guidance on the matter is somewhat vague and perhaps not a little out of date…?!

I wish I’d kept copies of OU computing (and network) requirements over the years. Today, network access is likely to come in the form of wired, fibre, or wireless broadband access (the latter particularly in rural areas) or, for the cord-cutters, a mobile/3G-4G connection; personal computing devices that connect to the network are likely to be smartphones, tablets, laptop computers, Chromebooks and their ilk, and gaming desktop machines. Time was when a household was lucky to have a single personal desktop computer, a requirement that became expected of OU students. I suspect that is still largely true… (the yoof’s gaming machine; the 10 year old “office” machine).

If we require students to run “desktop” applications, should we then require the students to have access to a computer on which they can install those applications, or should we be making those applications available in a way that allows them to be installed and run anywhere – either on local machines (for offline use), or on remote machines (either third party managed or managed by the OU) where a network connection is more or less always guaranteed?

One of the reasons I’m so taken by the idea of containerised computing is that it provides us with a mechanism for deploying applications to students that can be run in a variety of ways. Individuals can run the applications on their own computers, in the cloud, via service providers accessed and paid for directly by the students on a metered basis, or by the OU.

Container contents can be very strictly version controlled and archived, and are easily restored if something should go wrong (there are various ‘switch-it-off-and-switch-it-on-again’ possibilities with several degrees of severity!) Container image files can be distributed using physical media (USB memory sticks, memory cards) for local use, and for OU cloud servers, at least, those images could be pre-installed on student accessed container servers (meaning the containers can start up relatively quickly…)

If updates are required, these are likely to be lightweight – only those bits of the application that need updating will be updated.

At the moment, I’m not sure how easy it is to arbitrarily share a data container containing a student’s work with application containers that are arbitrarily launched on various local and remote hosts? (Linking containers to Dropbox containers is one possibility, but they would perhaps be slow to synch? Flocker is perhaps another route, with its increased emphasis on linked data container management?)

If any other educational institutions, particularly those involved in distance education, are looking at using containers, I’d be interested to hear what your take is…

And if any folk in the OU are looking at containers in any context (teaching, research, project work), please get in touch – I need folk to bounce ideas around with, sanity check with, and ask for technical help!;-)

Notes on Robot Churnalism, Part I – Robot Writers

In Some Notes on Churnalism and a Question About Two Sided Markets, I tried to pull together a range of observations about the process of churnalism, in which journalists propagate PR copy without much, if any, critique, contextualisation or corroboration.

If that view in any way represents a fair description of how some pre-packaged content, at least, makes its way through to becoming editorial content, where might the robots fit in? To what extent might we start to see “robot churnalism”, and what form or forms might it take?

There are two particular ways in which we might consider robot churnalism:

  1. “robot journalists” that produce copy that acts as a third conveyor belt complementary to PA-style wire and PR feedstocks;
  2. robot churnalists as ‘reverse’ gatekeepers, choosing what wire stories to publish where based on traffic stats and web analytics.

A related view is taken by Philip Napoli (“Automated media: An institutional theory perspective on algorithmic media production and consumption.” Communication Theory 24.3 (2014): 340-360; a shorter summary of the key themes can be found here) who distinguishes roles for algorithms in “(a) media consumption and (b) media production”. He further refines the contributions algorithms may make in media production by suggesting that “[t]wo of the primary functions that algorithms are performing in the media production realm at this point are: (a) serving as a demand predictor and (b) serving as content creator.”

Robot Writers

“Automated content can be seen as one branch of what is known as algorithmic news” writes Christer Clerwall (2014, Enter the Robot Journalist, Journalism Practice, 8:5, pp519-531), a key component of automated journalism “in which a program turns data into a news narrative, made possible with limited — or even zero — human input” (Matt Carlson (2015) The Robotic Reporter, Digital Journalism, 3:3, 416-431).

In a case study based around the activities of Narrative Science, a company specialising in algorithmically created, data driven narratives, Carlson further conceptualises “automated journalism” as “algorithmic processes that convert data into narrative news texts with limited to no human intervention beyond the initial programming”. He goes on:

The term denotes a split from data analysis as a tool for reporters encompassed in writings about “computational and algorithmic journalism” (Anderson 2013) to indicate wholly computer-written news stories emulating the compositional and framing practices of human journalism (ibid, p417).

Even several years ago, Arjen van Dalen observed that “[w]ith the introduction of machine-written news computational journalism entered a new phase. Each step of the news production process can now be automated: “robot journalists” can produce thousands of articles with virtually no variable costs” (The Algorithms Behind the Headlines, Journalism Practice, 6:5-6, 648-658, 2012, p649).

Sport and financial reporting examples abound from the bots of Automated Insights and Narrative Science (for example, Notes on Narrative Science and Automated Insights or Pro Publica: How To Edit 52,000 Stories at Once, and more recently e.g. Robot-writing increased AP’s earnings stories by tenfold), with robot writers generating low-cost content to attract page views, “producing content for the long tail, in virtually no time and with low additional costs for articles which can be produced in large quantities” (ibid, p649).

Although writing back in 2012, van Dalen noted in his report on “the responses of the journalistic community to automatic content creation” that:

[t]wo main reasons are mentioned to explain why automated content generation is a trend that needs to be taken seriously. First, the journalistic profession is more and more commercialized and run on the basis of business logics. The automation of journalism tasks fits in with the trend to aim for higher profit margins and lower production costs. The second reason why automated content creation might be successful is the quality of stories with which it is competing. Computer-generated news articles may not be able to compete with high quality journalism provided by major news outlets, which pay attention to detail, analysis, background information and have more lively language or humour. But for information which is freely available on the Internet the bar is set relatively low and automatically generated content can compete (ibid, p651).

As Christer Clerwall writes in Enter the Robot Journalist, (Journalism Practice, 8:5, 2014, pp519-531):

The advent of services for automated news stories raises many questions, e.g. what are the implications for journalism and journalistic practice, can journalists be taken out of the equation of journalism, how is this type of content regarded (in terms of credibility, overall quality, overall liking, to mention a few aspects) by the readers? p520.

van Dalen puts it thus:

Automated content creation is seen as serious competition and a threat for the job security of journalists performing basic routine tasks. When routine journalistic tasks can be automated, journalists are forced to offer a better product in order to survive. Central in these reflections is the need for journalists to concentrate on their own strengths rather than compete on the strengths of automated content creation. Journalists have to become more creative in their writing, offer more in-depth coverage and context, and go beyond routine coverage, even to a larger extent than they already do today (ibid, p653).

He then goes on to produce the following SWOT analysis to explore just how the humans and the robots compare:

algo_behind_headlines

One possible risk associated with the automated production of copy is that it becomes published without human journalistic intervention, and as such is not necessarily “known”, or even read, by any member at all of the publishing organisation. To paraphrase Daniel Jackson and Kevin Moloney, “Inside Churnalism: PR, journalism and power relationships in flux”, Journalism Studies, 2015, this would represent an extreme example of churnalism in the sense of “the use of unchecked [robot authored] material in news”.

This is dangerous, I think, on many levels. The more we leave the setting of the news agenda and the identification of news values to machines, the more we lose any sensitivity to what’s happening in the world around us and what stories are actually important to an audience as opposed to merely being Like-bait titillation. (As we shall see, algorithmic gatekeepers that channel content to audiences based on various analytics tools respond to one definition of what audiences value. But it is not clear that these are necessarily the same issues that might weigh more heavily in a personal-political sense. Reviews of the notion of “hard” vs. “soft” news (e.g. Reinemann, C., Stanyer, J., Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239) may provide lenses to help think about this more deeply?)

Of course, machines can also be programmed to look for links and patterns across multiple sources of information and at far greater scale than a human journalist could hope to cover, but we are then in danger of creating some sort of parallel news world, where events are only recognised, “discussed” and acted upon by machines and human actors are oblivious to them. (For an example, see The Wolf of Wall Tweet: A Web-reading bot made millions on the options market. It also ate this guy’s lunch, which describes how bots read the news wires and trade off the back of them. They presumably also read wire stories created by other bots…)

So What Is It That Robot Writers Actually Do All Day?

In a review of Associated Press’ use of Automated Insights’ Wordsmith application (In the Future, Robots Will Write News That’s All About You), Wired reported that Wordsmith “essentially does two things. First, it ingests a bunch of structured data and analyzes it to find the interesting points, such as which players didn’t do as well as expected in a particular game. Then it weaves those insights into a human readable chunk of text.”

One way of getting deeper into the mind of a robot writer is to look to the patents held by the companies who develop such applications. For example, in The Anatomy of a Robot Journalist, one process used by Narrative Science is characterised as follows:

narrativeScience

Identifying newsworthy features is a process of identifying features and then filtering them to keep the ones that are somehow notable. Angles are possibly defined in terms of sets of features that need to be present within a particular dataset for that angle to provide a possible frame for a story. The process of reconciling interesting features with angle points populates the angle with known facts, and a story engine then generates the natural language text within a narrative structure suited to an explication of the selected angle.
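
To make that pipeline a little more concrete, here is a minimal R sketch of the general idea: extract features, flag the notable ones, pick an angle whose required features are all present, then fill a text template. It is only an illustration under my own assumptions, not Narrative Science's actual method; the match data, the feature names (big_margin, upset), the thresholds, the angle definitions and the sentence templates are all hypothetical.

```r
# Hypothetical match data: scores plus a made-up pre-match expectation
game <- data.frame(team = c("Reds", "Blues"),
                   score = c(4, 1),
                   expected_score = c(1.5, 2.0),
                   stringsAsFactors = FALSE)

# Step 1: derive facts from the data and flag which features are notable
# (the thresholds used here are arbitrary, illustrative choices)
extract_features <- function(df) {
  w <- df[which.max(df$score), ]
  l <- df[which.min(df$score), ]
  list(
    facts = list(winner = w$team, loser = l$team,
                 winner_score = w$score, loser_score = l$score),
    flags = c(big_margin = (w$score - l$score) >= 3,
              upset = w$expected_score < l$expected_score)
  )
}

# Step 2: each (hypothetical) angle names the features it needs and a template
angles <- list(
  shock_thrashing = list(needs = c("big_margin", "upset"),
                         template = "Underdogs %s stunned %s with a %d-%d thrashing."),
  thrashing = list(needs = "big_margin",
                   template = "%s swept aside %s, winning %d-%d."),
  upset = list(needs = "upset",
               template = "%s upset the odds to beat %s %d-%d."),
  plain_result = list(needs = character(0),
                      template = "%s beat %s %d-%d.")
)

# Step 3: pick the first angle whose required features are all notable
select_angle <- function(flags, angles) {
  notable <- names(flags)[flags]
  for (name in names(angles)) {
    if (all(angles[[name]]$needs %in% notable)) return(name)
  }
  NULL
}

# Step 4: the "story engine" populates the chosen angle's template with facts
write_story <- function(df, angles) {
  f <- extract_features(df)
  angle <- select_angle(f$flags, angles)
  with(f$facts, sprintf(angles[[angle]]$template,
                        winner, loser, winner_score, loser_score))
}

story <- write_story(game, angles)
print(story)
```

Listing the angles from most to least specific means the richest frame that the data supports wins, with a plain result report as the fallback.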

(An early – 2012 – presentation by Narrative Science’s Larry Adams also reviews some of the technicalities: Using Open Data to Generate Personalized Stories.)

In actual fact, the process may be a relatively straightforward one, as demonstrated by the increasing numbers of “storybots” that populate social media. One well known class of examples are earthquake bots that tweet news of earthquakes (see also: When robots help human journalists: “This post was created by an algorithm written by the author”). (It’s easy enough to see how various newsworthiness filters might work here: a geo-based one for reporting a story locally, a wider interest one for reporting an earthquake above a particular magnitude, and so on.)
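
As a rough sketch of how such newsworthiness filters might be put together, the following R code applies a geo-based "local" filter and a magnitude-based "wider interest" filter to some earthquake reports. The data, the home location, the function names and all of the thresholds are made up for illustration; a real bot would take its reports from a live feed.

```r
# Hypothetical earthquake reports (all values made up)
quakes <- data.frame(
  place = c("Offshore A", "Near town B", "Region C"),
  magnitude = c(6.8, 3.1, 4.0),
  lat = c(35.0, 50.7, 10.0),
  lon = c(140.0, -1.3, 20.0),
  stringsAsFactors = FALSE
)

# Great-circle distance in km using the haversine formula
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Keep a report if it is big enough to be of wide interest, or if it is
# close to "home" and above a lower, local-interest magnitude
# (the default thresholds are arbitrary assumptions)
newsworthy <- function(q, home_lat, home_lon,
                       local_km = 100, local_mag = 2.5, wide_mag = 6) {
  dist_km <- haversine_km(q$lat, q$lon, home_lat, home_lon)
  is_wide <- q$magnitude >= wide_mag
  is_local <- dist_km <= local_km & q$magnitude >= local_mag
  q$reason <- ifelse(is_wide, "wide",
                     ifelse(is_local, "local", NA_character_))
  q[!is.na(q$reason), ]
}

alerts <- newsworthy(quakes, home_lat = 50.7, home_lon = -1.3)

# A very simple "announcer" template for each newsworthy report
headlines <- sprintf("Magnitude %.1f earthquake reported: %s",
                     alerts$magnitude, alerts$place)
print(headlines)
```

Recording the reason a report passed the filter makes it easy to route the same event to different outlets, for example a local feed versus a general one.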

It’s also easy enough to create your own simple storybot (or at least, an “announcer bot”) using something like IFTTT that can take in an RSS feed and make a tweet announcement about each new item. A collection of simple twitterbots produced as part of a journalism course on storybots, along with code examples, can be found here: A classroom experiment in Twitter Bots and creativity. Here’s another example, for a responsive weatherbot that tries to geolocate someone sending a message to the bot and respond to them with a weather report for their location.


Not being of a journalistic background, and never having read much on media or communications theory, I have to admit I don’t really have a good definition for what angles are, or a typology for them in different topic areas, and I’m struggling to find any good structural reviews of the idea, perhaps because it’s so foundational? For now, I’m sticking with a definition of “an angle” as being something along the lines of the thing you want to focus on and dig deeper around within the story (the thing you want to know more about or whose story you want to tell; this includes abstract things: the story of an indicator value for example, over time). The blogpost Framing and News Angles: What is Bias? contrasts angles with the notions of framing and bias. Entman, Robert M. “Framing: Toward clarification of a fractured paradigm.” McQuail’s reader in mass communication theory (1993): 390-397 [pdf] seems foundational in terms of the framing idea, De Vreese, Claes H. “News framing: Theory and typology.” Information design journal & document design 13.1 (2005): 51-62 [PDF] offers a review (of sorts) of some related literature, and Reinemann, C., Stanyer, J., Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239 (PDF) perhaps provides another way in to related literature? Bias is presumably implicit in the selection of any particular frame or angle? Blog posts such as What makes a press release newsworthy? It’s all in the news angle look to be linkbait, perhaps even stolen content (eg here’s a PDF), but I can’t offhand find a credible source or inspiration for the original list? Resource packs like this one on Working with the Media from the FAO give a crash course in what I guess are some of the generally taught basics around story construction?


Festival Segregation

Isle of Wight Festival time again, and some immediate reflections from the first day…

I seem to remember a time-was-when festivals were social levellers – unless you were crew or had a guest pass that got you backstage. Then the backstage areas started to wend their way up the main stage margins so the backstage guests could see the stage from front-of-stage. Then you started to get the front-of-stage VIP areas with their own bars, and a special access area in front of the stage to give you a better view and keep you away from the plebs.

There has also been a growth in other third party retailed add-ons – boutique camping, for example:

Isle_of_Wight_Festival_2015_-_11th-14th_June

and custom toilets:

Isle_of_Wight_Festival_2015

One of the things I noticed about the boutique camping areas (which are further distinguished from the VIP camping areas…) was that they are starting to include their own bars, better toilets, and so on. Gated communities, for those who can afford a hefty premium on top of the base ticket price. Or a corporate hospitality/hostility perk.

I guess festivals always were a “platform” creating two sided markets that could sell tickets to punters, location to third party providers (who were then free to sell goods and services to the audience), and sponsorship of every possible surface. But the festivals were, to an extent, open; level playing fields. Now they’re increasingly enclosed. So far, the music entertainment has remained free. But how long before you have to start paying to access “exclusive” events in some of the music tents?

PS I wonder: when it comes to eg toilet capacity planning, are the boutique poo-stations over-and-above capacity compared to the capacity provided by the festival promoter to meet sanitation needs, or are they factored in as part of that core capacity? Which is to say, if no-one paid the premium, would the minimum capacity requirements still be met?

PPS I also note that the IW Festival had a heliport this year (again…?)

PPPS On the toilet front, the public toilets all seemed pretty clean this year… and what really amused me was seeing a looooonnngggg queue for the purchased-access toilets…

Spotting Potential Battles in F1 Races

Over the last couple of races, I’ve started trying to review a variety of battlemaps for various drivers in each race. Prompted by an email request for more info around the battlemaps, I generated a new sketch charting the on track gaps between each driver and the lap leader for each lap of the race (How the F1 Canadian Grand Prix Race Evolved on Track).

Colour is used to distinguish cars on the lead lap from lapped drivers. For lapped drivers, a count of how many laps they are behind the leader is displayed. I additionally overplot with a highlight for a specified driver, as well as adding in a mark that shows the on track position of the leader of the next lap, along with their driver code.

Rplot06

Battles can be identified through the close proximity of two or more drivers within a lap, across several laps. The ‘next-lap-leader’ time at the far right shows how close the leader on the next lead lap is to the backmarker (on track) on the current lead lap.

By highlighting two particular drivers, we could compare how their races evolved, perhaps highlighting different strategies used within a race that eventually bring the drivers into a close competitive battle in the last few laps of a race.

The unchanging leader-on-track-delta-of-0 line is perhaps missing an informational opportunity? For example, should we set the leader’s time to be the delta compared to the leader’s lap time on the previous lead lap? Or a delta compared to the fastest laptime on the previous lead lap? And if we do start messing about with an offset to the leader’s lap time, we presumably need to apply the same offset to the laptime of everyone else on the lap so we can still see the comparative on-track gaps to leader?
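
One way of reading that offset idea, sketched in R with made-up numbers: take the change in the leader's lap time relative to the previous lead lap as the offset, and add the same offset to every car on that lead lap so the within-lap gaps are preserved. The variable names lstart, leadlap, code and trackdiff follow the code later in this post, but the data here, and the trackdiff_offset column, are hypothetical.

```r
# Hypothetical accumulated race times (s) at which the leader starts lead laps 1 to 4
lstart <- c(0, 80, 158, 240)

# Leader's lap time on lead laps 1 to 3
leader_laptime <- diff(lstart)            # 80, 78, 82

# Offset for each lead lap: leader's lap time minus their previous lap time
# (there is no previous lap to compare against on lap 1, so use 0)
offset <- c(0, diff(leader_laptime))      # 0, -2, 4

# Hypothetical on-track gaps to the leader for two cars over three lead laps
gaps <- data.frame(leadlap = rep(1:3, each = 2),
                   code = rep(c("AAA", "BBB"), times = 3),
                   trackdiff = c(0, 1.5, 0, 2.0, 0, 1.2),
                   stringsAsFactors = FALSE)

# Apply the same offset to every car on the lap, so within-lap gaps are unchanged
gaps$trackdiff_offset <- gaps$trackdiff + offset[gaps$leadlap]
print(gaps)
```

With this choice the leader's mark moves left when they lap faster than on the previous lead lap and right when they lap slower, while the spacing between cars within each lap stays the same.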

On the to-do list are various strategies for automatically identifying potential battles based on a variety of in-lap and across-lap heuristics.
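
As a first, hedged sketch of what such heuristics might look like, the following R code combines an in-lap rule (a car is within some maximum on-track gap of the car directly ahead) with an across-lap rule (the same pair stays that close for a minimum number of consecutive lead laps). The data, the function names close_pairs and battles, and the thresholds are all hypothetical; only the column names leadlap, code and trackdiff are taken from the code below.

```r
# Hypothetical on-track gaps to the lap leader (s): three drivers, four lead laps
laps <- data.frame(
  leadlap = rep(1:4, each = 3),
  code = rep(c("AAA", "BBB", "CCC"), times = 4),
  trackdiff = c(0, 2.5, 3.1,
                0, 2.0, 2.7,
                0, 1.8, 2.4,
                0, 0.8, 5.0),
  stringsAsFactors = FALSE
)

# In-lap heuristic: pairs of cars running within max_gap seconds on track
close_pairs <- function(df, max_gap = 1) {
  per_lap <- lapply(split(df, df$leadlap), function(lap) {
    lap <- lap[order(lap$trackdiff), ]
    n <- nrow(lap)
    if (n < 2) return(NULL)
    data.frame(leadlap = lap$leadlap[-1],
               ahead = lap$code[-n],
               behind = lap$code[-1],
               gap = diff(lap$trackdiff),
               stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, per_lap)
  pairs[pairs$gap <= max_gap, ]
}

# Across-lap heuristic: keep pairs that stay close for at least
# min_laps consecutive lead laps, whichever of the two is ahead
battles <- function(pairs, min_laps = 3) {
  key <- ifelse(pairs$ahead < pairs$behind,
                paste(pairs$ahead, pairs$behind, sep = "-"),
                paste(pairs$behind, pairs$ahead, sep = "-"))
  runs <- lapply(split(pairs$leadlap, key), function(l) {
    l <- sort(unique(l))
    # Start a new run whenever the lead laps are not consecutive
    run_id <- cumsum(c(1, diff(l) != 1))
    max(table(run_id))
  })
  out <- data.frame(pair = names(runs),
                    longest_run = as.integer(unlist(runs)),
                    stringsAsFactors = FALSE)
  out[out$longest_run >= min_laps, ]
}

found <- battles(close_pairs(laps))
print(found)
```

The pair key ignores which car is ahead, so a battle in which the two drivers swap places still counts as a single run.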

Here’s the code:

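#Load the packages used below: plyr provides ddply(), ggplot2 the plotting functions
library(plyr)
library(ggplot2)

#Grab some data
#(lapsData.df() and battlemap_encoder() are helper functions not defined in this post)
lapTimes = lapsData.df(2015,7)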

#Process the laptimes
lapTimes=battlemap_encoder(lapTimes)

#Find the accumulated race time at the start of each leader's lap
lapTimes=ddply(lapTimes,.(leadlap),transform,lstart=min(acctime))

#Find the on-track gap to leader
lapTimes['trackdiff']=lapTimes['acctime']-lapTimes['lstart']

#Construct a dataframe that contains the difference between the 
#leader accumulated laptime on current lap and next lap
#i.e. how far behind current lap leader is next-lap leader?
ll=data.frame(t=diff(lapTimes[lapTimes['position']==1,'acctime']))
#Generate a de facto lap count
ll['n']=1:nrow(ll)
#Grab the code of the lap leader on the next lap
ll['c']=lapTimes[lapTimes['position']==1 & lapTimes['lap']>1,'code']

#Plot the on-track gap to leader versus leader lap
g = ggplot(lapTimes) 
g = g + geom_point(aes(x=trackdiff,y=leadlap,col=(lap==leadlap)), pch=1)
g = g + geom_point(data=lapTimes[lapTimes['driverId']=='vettel',],
                  aes(x=trackdiff,y=leadlap), pch='+')
g = g + geom_text(data=lapTimes[lapTimes['lapsbehind']>0,],
                  aes(x=trackdiff,y=leadlap, label=lapsbehind),size=3)
g = g + geom_point(data=ll,aes(x=t, y=n), pch='x')
g = g + geom_text(data=ll,aes(x=t+3, y=n,label=c), size=2)
g = g + geom_vline(aes(xintercept=17), linetype=3)
g

This chart will be included in a future update to the Wrangling F1 Data With R book. I hope to do a sprint on that book to tidy it up and get it into a reasonably edited state in the next few weeks. At that point, the text will probably be frozen, a print-on-demand version generated, and, if it ends up on Amazon, the minimum price hiked considerably.

Notebooks, knitr and the Language-Markdown View Source Option…

One of the foundational principles of the web, though I suspect ever fewer people know it, is that you can “View Source” on a web page to see what bits of HTML, JavaScript and CSS are used to create it.

In the WordPress editor I’m currently writing in, I’m using a Text view that lets me write vanilla HTML; but there is also a WYSIWYG (what you see is what you get) view that shows how the interpreted HTML text will look when it is rendered in the browser as a web page.

viewtext

Reflecting on IPython Markdown Opportunities in IPython Notebooks and Rstudio, it struck me that the Rmd (Rmarkdown) view used in RStudio, the HTML preview of “executed” Rmd documents generated from Rmd by knitr and the interactive Jupyter (IPython, as was) notebook view can be seen as standing in this sort of relation to each other:

rmd-wysiwyg

From that, it’s not too hard to imagine RStudio offering the following sort of RStudio/IPython notebook hybrid interface – with an Rmd “text” view, and with a notebook “visual” view (eg via an R notebook kernel):

viewrmd

And from both, we can generate the static HTML preview view.

In terms of underlying machinery, I guess we could have something like this:

rmdviewarch

I’m looking forward to it:-)

Google Gets Out of Personal Control?

This is a rant… It may or may not be coherent… it’s just me venting and trolling myself…

Earlier today I posted a selection of F1 battlemaps in a post on the F1DataJunkie blog, which is hosted on Blogger: F1 Canada 2015 Battlemaps – How the Race Happened from the Drivers’ Perspective. The charts were uploaded to the blog, which in turn means that they’re stored on Google photos or whatever the service is called.

Being in a Blogger – and hence Google – context, a Google+ (or Google Accounts or whatever we’re supposed to call it now) profile button was present in the top right hand corner of the screen. It alerted me to some activity, and even though I generally avoid Google Plus, I think Blogger autoposts there, so I clicked through.

blogger_autoawesome

It seems that Google had created an animated gif (an “auto-awesome” picture) out of the images that were contained in the blog post and “added” it somewhere (?) for me.

In this case, the animation is pure nonsense.

I don’t recall ever having opted in to this content-creation-on-my-behalf, and I’m not really interested in Google taking my stuff and mucking about with it. (I know it does this when it resizes images, for example, but in that case, it doesn’t change the content. And I know it does who knows what with my data, and any content that goes anywhere near any of its storage services so it can “better” “personalise” things for me (as well as presumably using that content and context in a whole range of learning and training algorithms).)

Anyway – as to auto-awesome – I think this is how to disable it?

Settings_-_Google_Photos

PS I don’t remember offhand how I’ve licensed the content on the F1DataJunkie blog (did I get round to CC-BYing it?), but whatever the copyright status, I assume that by my agreeing to my uploaded Blogger images being stored on Google Photos, I grant Google a license to do whatever the f**k it wants with them, if only for my own access and amusement, and then go on to grab at my attention to tell me?

PPS In passing, in response to an iOS update, I tweeted: itunes update on ios. 37 pages of terms and conditions. Thirty Seven. God only knows what terms and conditions I “agreed” to. But presumably, given that I

PPPS see also Mia Ridge on The rise of interpolated content?.

IPython Markdown Opportunities in IPython Notebooks and Rstudio

One of the reasons I started working on the Wrangling F1 Data With R book was to see what the Rmd (RMarkdown) workflow was like. Rmd allows you to combine markdown and R code in the same document, as well as executing the code blocks and then displaying the results of that code execution inline in the output document.

rmd_demo

As well as rendering to HTML, we can generate markdown (md is actually produced as the interim step to HTML creation), PDF output documents, etc etc.

One thing I’d love to be able to do in the RStudio/RMarkdown environment is include – and execute – Python code. Does a web search to see what Python support there is in R… Ah, it seems it does it already… (how did I miss that?!)

knitr_py

ADDED: Unfortunately, it seems as if Python state is not persisted between separate python chunks – instead, each chunk is run as a one-off inline python command. However, it seems as if there could be a way round this, which is to use a persistent IPython session; and the knitron package looks like just the thing for supporting that.

So that means in RStudio, I could use knitr and Rmd to write a version of Wrangling F1 Data With Python.

Of course, it would be nicer if I could write such a book in an everyday python environment – such as in an IPython notebook – that could also execute R code (just to be fair;-)

I know that we can already use cell magic to run R in an IPython notebook:

ipynb_rmagic

…so that’s that part of the equation.

And the notebooks do already allow us to mix markdown cells and code blocks/output. The default notebook presentation style is to show the code cells with the numbered In []: and Out []: block numbering, but it presumably only takes a small style extension or customisation to suppress that? And another small extension to add the ability to hide a code cell and just display the output?

So what is it that (to my mind at least) makes RStudio a nicer writing environment? One reason is the ability to write the Rmarkdown simply as Rmarkdown in a simple text editor environment. Another is the ability to inline R code and display its output in-place.

Taking that second point first, the ability to do better inlining in IPython notebooks – this looks like just what the python-markdown extension does:

python_markdown

But how about the ability to write some sort of pythonMarkdown and then open in a notebook? Something like ipymd, perhaps…?

rossant_ipymd

What this seems to do is allow you to open an IPython-markdown document as an IPython notebook (in other words, it replaces the ipynb JSON document with an ipymd markdown document…). To support the document creation aspects better, we just need an exporter that removes the code block numbering and trivially allows code cells to be marked as hidden.

Now I wonder… what would it take to be able to open an Rmd document as an IPython notebook? Presumably just the ability to detect the code language, and then import the necessary magics to handle its execution? It’d be nice if it could cope with inline code, e.g. using the python-markdown magic too?
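
The language detection part, at least, looks simple enough to sketch. The following R code is a minimal illustration that assumes knitr-style chunk headers, where a chunk opens with three backticks followed by the engine name in braces. The chunk_languages function, the sample document and the language-to-magic lookup are hypothetical; the only magic I am relying on is the %%R cell magic already mentioned above.

```r
# Build the three-backtick chunk delimiter programmatically
fence <- strrep("`", 3)

# A tiny, made-up Rmd document as a vector of lines
rmd <- c(
  "# A title",
  "",
  paste0(fence, "{r setup, echo=FALSE}"),
  "x <- 1",
  fence,
  "",
  "Some markdown text.",
  "",
  paste0(fence, "{python}"),
  "print('hello')",
  fence
)

# Return the language (engine) named in each chunk header, in document order
chunk_languages <- function(lines) {
  pattern <- paste0("^", fence,
                    "\\{[[:space:]]*([A-Za-z0-9_]+)[^}]*\\}[[:space:]]*$")
  headers <- grep(pattern, lines, value = TRUE)
  sub(pattern, "\\1", headers)
}

languages <- chunk_languages(rmd)
print(languages)

# Hypothetical lookup from chunk language to the notebook cell magic needed
# to run it from a Python kernel (python cells need no magic)
magic_for <- c(r = "%%R", python = "")
magics <- unname(magic_for[languages])
print(magics)
```

An importer would then prefix each converted code cell with the matching magic, and load whatever notebook extension provides it once at the top of the notebook.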

Exciting times could be ahead:-)