OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Exporting Markdown and XML From Google Docs


Just over a year ago, we started production of a new OU course using Google docs as the medium within which we’d share draft course materials. This was something of an experiment to see whether the online social document medium encouraged sharing and discussion of ideas, resources, and ongoing feedback and comment on the work in progress amongst the course team, rather than, or at least in addition to, the traditional handover of significant chunks of the course at set handover dates. (In case you’re wondering, it didn’t…)

The question we’re now faced with is: how do we get the content that’s in Google docs into the next stage of the OU document workflow?

The actual HTML-based course materials that appear in the VLE (or the ebook versions of them, etc) are generated automatically from an “OU Structured Content” XML document. The XML documents are prepared using the <oXygen/> XML editor, extended with an OU structured content framework that includes the requisite DTDs/schema files, hooks for rendering, previewing and publishing into the OU environment, and so on. (Details about the available tags can be found here: tag guide [OU internal link].)

Whilst the preferred authoring route is presumably that authors use the <oXygen/> editor from the start, the guidance also suggests that many authors use Word (and an appropriate OU style sheet) and then copy and paste content over into the XML editor, at which point some amount of tidying and retagging may be required.

As Google docs doesn’t seem to support the addition of custom style elements or tags (users are limited to customising the visual style of the provided style elements), we need to find another way of getting content out of Google docs and into <oXygen/>. One approach would be to grab a copy of the Google doc into Microsoft Word, apply an OU template to mark up the content using the appropriate custom style elements, and then copy the content over to the XML editor, at which point it will probably need further tidying.

Another approach is to try to export the data as an XML document. Looking around, I found a Google Apps Script script (sic) that allows you to export the content of a Google doc as markdown. Whilst markdown documents don’t have the same tree-like document structure as XML, it did provide an example of how to parse a Google docs document. My first attempt at a script to export a Google doc in the OU XML format can be found here: export Google doc as OU SC-XML. (Note: the script is subject to change; now I have a basic operational/functional spec, I can start to try to tidy up the code and try to parse out more structure…)
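
For anyone curious about the general approach, here’s a minimal sketch – not the actual exporter script linked above, and with purely illustrative tag handling – of how an Apps Script function can walk a document body and map heading styles onto OU-XML style tags:

// Minimal sketch only: walk the doc and emit crude XML, mapping
// HEADING1/2/3 paragraphs onto Session/Section/SubSection style title tags.
// (The exporter described above also handles lists, images and tables.)
function sketchExportAsXML() {
  var body = DocumentApp.getActiveDocument().getBody();
  var xml = [];
  for (var i = 0; i < body.getNumChildren(); i++) {
    var el = body.getChild(i);
    if (el.getType() != DocumentApp.ElementType.PARAGRAPH) continue;
    var para = el.asParagraph();
    var text = para.getText();
    if (!text) continue;
    switch (para.getHeading()) {
      case DocumentApp.ParagraphHeading.HEADING1:
        xml.push('<Session><Title>' + text + '</Title>'); // closing tags not handled in this sketch
        break;
      case DocumentApp.ParagraphHeading.HEADING2:
        xml.push('<Section><Title>' + text + '</Title>');
        break;
      case DocumentApp.ParagraphHeading.HEADING3:
        xml.push('<SubSection><Title>' + text + '</Title>');
        break;
      default:
        xml.push('<Paragraph>' + text + '</Paragraph>');
    }
  }
  Logger.log(xml.join('\n'));
}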

Having got a minimal exporter working, the question now arises as to where effort needs to be spent next. The exporter produces a minimal form of OU-XML that sometimes validates and (in early testing) sometimes doesn’t. (If the script is working properly it should produce output that always validates as XML; ideally, it should then produce output that always validates as OU-XML.) Should time be spent improving the script to produce better XML, or can we live with the fact that the exporter gets the document some way into <oXygen/>, but further work will be required to fix a few validation breaks?
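
One cheap sanity check that could be bolted onto the script – a sketch only, and quite separate from validating against the OU-XML schema – is to try parsing the exported string with Apps Script’s XmlService, which throws an error if the output isn’t even well-formed XML:

// Sketch: check the exported string is at least well-formed XML.
// (Well-formedness says nothing about whether it validates as OU-XML.)
function isWellFormedXML(xmlString) {
  try {
    XmlService.parse(xmlString);
    return true;
  } catch (e) {
    Logger.log('Not well-formed: ' + e.message);
    return false;
  }
}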

Another issue that arises is how rich a form of OU-XML we try to export. When working in the Microsoft Word environment, a document style can be defined using elements that map onto the elements in OU-XML. When working in Google docs, we need to define a convention that the parser can respond to.

At the moment, the parser is sensitive to:

  • HEADING1: treated as Session;
  • HEADING2: treated as Section;
  • HEADING3: treated as SubSection;
  • LIST_ITEM: numbered and unnumbered lists are treated as BulletedList; sublists to depth 1 are supported as BulletedSubsidiaryList
  • coloured text: treated as AuthorComment; at the moment this may incorrectly grab title elements as such;
  • INLINE_IMAGE: images are rendered as relatively referenced Figure elements with an empty Description element. A copy of the image is locally stored. (Note: INLINE_DRAWING elements are unsupported – there’s no way of exporting them; maybe I should export an empty Figure with a Description saying there’s a missing INLINE_DRAWING?)
  • TABLE: rendered as Table with empty TableHead;
  • bold, italic, LinkUrl: rendered as b, i and a tags respectively;
  • font.COURIER_NEW: rendered as ComputerCode.
  • By convention, we should be able to detect and parse things like activities, exercises and SAQs. Some mechanism needs to be supported for identifying the block elements in such cases. For example, one convention might take the form:

    Exercise N

    Discussion

    End

    The ^Exercise and ^End$ elements denote the block; heading style (eg HEADING4 for the ^Exercise line, and perhaps HEADING5 for the ^End$ line?) could further aid detection.

    Another approach would be to use horizontal lines to denote the start and stop of a block. For example:


    --------------------------------
    SAQ

    My Answer
    --------------------------------

    where the horizontal lines denote the start and end of the block. Again, heading styles within the block could either identify or reinforce a particular block element type. (A rough sketch of this sort of convention-based detection is given below.)
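
By way of illustration, here’s a rough sketch – the function and tag names are mine, purely for illustration, and not part of the current exporter – of how the ^Exercise/^End$ convention might be detected in Apps Script, based purely on paragraph text:

// Sketch only: spot Exercise ... End blocks from paragraph text alone.
// A fuller version might also check the heading level (eg HEADING4/HEADING5).
function sketchFindExerciseBlocks() {
  var paras = DocumentApp.getActiveDocument().getBody().getParagraphs();
  var inBlock = false;
  var xml = [];
  paras.forEach(function(p) {
    var text = p.getText().trim();
    if (!inBlock && /^Exercise/.test(text)) {
      inBlock = true;
      xml.push('<Exercise><Title>' + text + '</Title>');
    } else if (inBlock && /^End$/.test(text)) {
      inBlock = false;
      xml.push('</Exercise>');
    } else if (inBlock && text) {
      xml.push('<Paragraph>' + text + '</Paragraph>');
    }
  });
  Logger.log(xml.join('\n'));
}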

Rendering a preview of the OU-XML as it would appear in the OU VLE is possible by uploading the OU-XML file, or a zip file containing it and related assets, to an OU URL that sits behind OU authentication. The <oXygen/> editor handles previews by using the default web client – your default browser – to post a selected XML document to the appropriate OU upload/preview URL. Unfortunately, the functions that allow HTTP POST operations from Google Apps Script run on Google’s servers, which means that we can’t just create a button in Google docs that would post an XML export of the current Google doc to the authentication-required OU URL. (I don’t know if this would be possible in an OU/Google Apps domain?)

I’m not sure whether a workaround would be to launch a preview window in the browser from Google docs containing a copy of the OU-XML version of a document, highlight the XML, and then use a bookmarklet to post the highlighted XML to the OU preview service URL within the browser context, using a browser in which the user has logged in to the OU web domain. Alternatively, could a Chrome application access content from Google Drive and then post it to the authenticated OU preview URL using browser permissions? (That is, can a Chrome app access Google Drive using a user’s Google permissions, or machine access permissions, and then post content grabbed from that source to the OU URL using permissions granted to the browser?) As ever, my ignorance about browser security policies on the one hand, and the Google Apps/Chrome apps security model on the other, makes it hard to know what workarounds might be possible.

If any members of OU staff would like to try out the exporter, please get in touch for hints, or let me know how you get on :-) In addition, if any members of OU staff are using Google docs for course production, I’d love to know how you’re using them and how you’re getting on :-)

PS via @mahawksey, I see that we can associate metadata with a doc using PropertiesService.getDocumentProperties(). Could be handy for adding things like course code, author metadata, publishing route, template etc. to a doc. I’m not sure if we can also associate metadata with a folder, though I guess we could include a file in a folder that contains metadata relating to the files held more generally within that same folder?
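
For example, something along the following lines – the property names here are made up for the purposes of illustration – would stash simple course metadata against the document and read it back again, perhaps as part of the export:

// Sketch: attach simple course metadata to the current document...
function setCourseMetadata() {
  PropertiesService.getDocumentProperties().setProperties({
    courseCode: 'TMxxx',      // made-up course code
    author: 'A. N. Author',   // made-up author
    publishingRoute: 'VLE'
  });
}

// ...and read it back, eg when exporting to OU-XML.
function getCourseMetadata() {
  var props = PropertiesService.getDocumentProperties();
  Logger.log(props.getProperty('courseCode') + ', ' + props.getProperty('author'));
}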

Written by Tony Hirst

December 4, 2014 at 10:27 pm

Posted in OU2.0, Tinkering


Information Density and Custom Chart Designs


I’ve been doodling today with some charts for the Wrangling F1 Data With R living book, trying to see how much information I can pack into a single chart.

The initial impetus came simply from thinking about a count of laps led in a particular race by each driver; this morphed into charting the number of laps in each position for each driver, and then onto a more comprehensive race summary chart (see More Shiny Goodness – Tinkering With the Ergast Motor Racing Data API for an earlier graphical attempt at producing a race summary chart).

[Image: lapPosChart – race summary chart]

The chart shows:

- grid position: identified using an empty grey square;
- race position after the first lap: identified using an empty grey circle;
- race position on each driver’s last lap: y-value (position) of corresponding pink circle;
- points cutoff line: a faint grey dotted line to show which positions are inside – or out of – the points;
- number of laps completed by each driver: size of pink circle;
- total laps completed by driver: greyed annotation at the bottom of the chart;
- whether a driver was classified or not: the total lap count is displayed using a bold font for classified drivers, and in italics for unclassified drivers;
- finishing status of each driver: classification statuses other than *Finished* are also recorded at the bottom of the chart.

The chart also shows drivers who started the race but did not complete the first lap.

What the chart doesn’t show is at what stage of the race a driver was in each position, or for how long. But I have an idea for another chart that could help there, as well as being able to reuse elements used in the chart shown here.

FWIW, the following fragment of R code shows the ggplot function used to create the chart. The data came from the ergast API, though it did require a bit of wrangling to get it into a shape that I could use to power the chart.

#Reorder the drivers according to a final ranked position
g=ggplot(finalPos,aes(x=reorder(driverRef,finalPos)))
#Highlight the points cutoff
g=g+geom_hline(yintercept=10.5,colour='lightgrey',linetype='dotted')
#Highlight the position each driver was in on their final lap
g=g+geom_point(aes(y=position,size=lap),colour='red',alpha=0.15)
#Highlight the grid position of each driver
g=g+geom_point(aes(y=grid),shape=0,size=7,alpha=0.2)
#Highlight the position of each driver at the end of the first lap
g=g+geom_point(aes(y=lap1pos),shape=1,size=7,alpha=0.2)
#Provide a count of how many laps each driver held each position for
g=g+geom_text(data=posCounts,
              aes(x=driverRef,y=position,label=poscount,alpha=alpha(poscount)),
              size=4)
#Number of laps completed by driver
g=g+geom_text(aes(x=driverRef,y=-1,label=lap,fontface=ifelse(is.na(classification), 'italic' , 'bold')),size=3,colour='grey')
#Record the status of each driver
g=g+geom_text(aes(x=driverRef,y=-2,label=ifelse(status!='Finished', status,'')),size=2,angle=30,colour='grey')
#Styling - tidy the chart by removing the transparency legend
g+theme_bw()+xRotn()+xlab(NULL)+ylab("Race Position")+guides(alpha=FALSE)

The fully worked code can be found in a forthcoming update to the Wrangling F1 Data With R living book.

Written by Tony Hirst

November 21, 2014 at 6:21 pm

Posted in Rstats


Looking for an Alternative to Twitter – and Vodafone…


Whilst at an event over the weekend – from which I would generally have tweeted once or twice – I got the following two part message in what I regard as my personal Twitter feed to my phone:

1/2: Starting Nov 15, Vodafone UK will no longer support some Twitter SMS notifications. You will still be able to use 2-factor authentication and reset your pa
2/2: ssword over SMS.However, Tweet notifications and activity updates will cease. We are very sorry for the service disruption.

The notification appeared from a mobile number I have listed as “Twitter” in my contacts book. This makes both Twitter and Vodafone very much less useful to me – direct messages and mentions used to come direct to my phone as SMS text messages, and I used to be able to send tweets and direct messages to the same number (Docs: Twitter SMS commands). The service connected an SMS channel on my personal/private phone, with a public address/communication channel that lives on the web.

Why not use a Twitter client? A good question that deserves an answer that may sound foolish: Vodafone offers crappy coverage in the places I need it most (at home, at the in-laws’) where data connections are as good as useless. At home there’s wifi – and other screens running Twitter apps – elsewhere there typically isn’t. I also use my phone at events in the middle of fields, where coverage is often poor and even getting a connection can be chancy. (Client apps also draw on power – which is a factor on a weekend away at an event where there may be few recharging options; and they’re often keen to access contact book details, as well as all sorts of other permissions over your phone.)

So SMS works – it’s low power, typically on, low bandwidth, personal and public (via my Twitter ID).

But now both my phone contract and Twitter are worth very much less to me. One reason I kept my Vodafone contract was because of the Twitter connection. And one reason I stick with Twitter is the SMS route I have – I had – to it.

So now I’m looking for an alternative. To both.

I thought about rolling my own service using an SMS channel on IFTTT, but I don’t think it supports/is supported by Vodafone in the UK? (Do any mobile operators support it? If so, I think I may have to change to them…)

[Image: IFTTT SMS channel]

If I do change contract, though, I hope it’s easier than the last contract we tried – are still trying – to kill. After several years it seems a direct debit on an old contract is still going out; after letters – and a phone call to Vodafone where they promised the direct debit was cancelled – it pops up again, paying out again, the direct debit that will never die. This week we’ll try again. Next month, if it pops up again, I guess we need to call on the ombudsman.

I guess what I’d really like is a mobile operator that offers me an SMS gateway so that I can call arbitrary webhooks in response to text messages I send, and field web requests that can then forward messages to my phone. (Support for the IFTTT SMS channel would be almost as good.)

From what I know of Twitter’s origins (as twttr), the mobile SMS context was an important part of it – “I want to have a dispatch service that connects us on our phones using text” @Jack [Dorsey, Twitter founder] possibly once said (How Twitter Was Born). Text still works for me in the many places I go where wifi isn’t available and data connections are unreliable and slow, if indeed they’re available at all. (If wifi is available, I don’t need my phone contract…)

The founding story continues: I remember that @Jack’s first use case was city-related: telling people that the club he’s at is happening. If Vodafone and Twitter hadn’t stopped playing over the weekend, I’d have tweeted that I was watching the Wales Rally (WRC/FIA World Rally Championship) Rallyfest stages in North Wales at Chirk Castle on Saturday, and Kinmel Park on Sunday. As it was, I didn’t – so WRC also lost out on the deal. And I don’t have a personal tweet record of the event I was at.

If I’m going to have to make use of a web client and data connection to make use of Twitter messaging, it’s probably time to look for a service that does it better. What’s WhatsApp like in this respect?

Or if I’m going to have to make use of a web client and data connection to make use of Twitter messaging, I need to find a mobile operator that offers reliable data connections in places where I need it, because Vodafone doesn’t.

Either way, this cessation of the service has made me realise where I get most value from Twitter, and where I get most value from Vodafone: it was in a combination of those services. With them now separated, the value of both to me is significantly reduced. Reduced to such an extent that I am looking for alternatives – to both.

Written by Tony Hirst

November 17, 2014 at 10:43 am

Posted in Anything you want

Teaching Material Analytics


A couple of weeks ago, I had a little poke around some of the standard reports that we can get out of the OU VLE. OU course materials are generated from a structured document format – OU XML – with each document generating one or more HTML pages bound to a particular Moodle resource id. Additional Moodle resources are associated with forums, admin pages, library resource pages, and so on.

One of the standard reports provides a count of how many times each resource has been accessed within a given time period, such as a weekly block. Data can only be exported for so many weeks at a time, so getting stats for course materials over the presentation of a course (which may be up to 9 months long) requires multiple exports and then aggregating the data.

We can then generate simple visual summaries over the data such as the following heatmap.

[Image: course_material_usage – heatmap of course material accesses by week]

Usage is indicated by colour density; time, in weeks, is organised along the horizontal x-axis. From the chart, we can clearly see waves of activity over the course of the module as students access resources associated with particular study weeks. We can also see when materials aren’t being accessed, or are only being accessed a small number of times (that is, necessarily by a low proportion of students; if we get data about unique user accesses, or unique first-use activity, we can get a better idea of the proportion of students in a cohort as a whole accessing a resource).

This sort of reporting – about material usage rather than student attainment – was what originally attracted me to thinking about data in the context of OU courses (eg Course Analytics in Context). That is, I wasn’t that interested in how well students were doing, per se, or interested in trying to find ways of spying on individual students to build clever algorithms behind experimental personalisation and recommender systems that would never make it out of the research context.

That could come later.

What I originally just wanted to know was whether this resource was ever looked at, whether that resource was accessed when I expected (eg whether an end-of-course assessment page was accessed when students were prompted to start thinking about it during an exercise two-thirds of the way into the course), whether students tended to study for half an hour or three hours (so I could design the materials accordingly), how (and when) students searched the course materials – and for what (keyphrase searches copied wholesale out of the continuous assessment materials) – and so on.

Nothing very personal in there – everything aggregate. Nothing about students, particularly; everything about course materials. As a member of the course team, I’m asking how are the course materials working? rather than how is that student performing?

There’s nothing very clever about this – it’s just basic web stats run with an eye to looking for patterns of behaviour over the life of a course to check that the materials appear to be being worked in the way we expected. (At the OU, course team members are often a step removed from supporting students.)

But what it is, I think, is an important complement to the “student centred” learning analytics. It’s analytics about the usage and utilisation of the course materials – the things we actually spend a couple of years developing, but don’t really seem to track the performance of.

It’s data that can be used to inform and check on “learning designs”. Stats that act as indicators about whether the design is being followed – that is, used as expected, or planned.

As a course material designer, I may want to know how well students perform based on how they engage with the materials, but I should really want to know how the materials are being utilised, because they’re designed to be utilised in a particular way. And if they’re not being used in that way, maybe I need to have a rethink?

Written by Tony Hirst

November 14, 2014 at 12:57 pm

Posted in Analytics, Anything you want


F1 Championship Race, 2014 – Winning Combinations…


As we come up to the final two races of the 2014 Formula One season, the double points mechanism for the final race means that two drivers are still in with a shot at the Drivers’ Championship: Lewis Hamilton and Nico Rosberg.

As James Allen describes in Hamilton closes in on world title: maths favour him but Abu Dhabi threat remains:

Hamilton needs 51 points in the remaining races to be champion if Rosberg wins both races. Hamilton can afford to finish second in Brazil and at the double points finale in Abu Dhabi and still be champion. Mathematically he could also finish third in Brazil and second in the finale and take it on win countback, as Rosberg would have just six wins to Hamilton’s ten.
If Hamilton leads Rosberg home again in a 1-2 in Brazil, then he will go to Abu Dhabi needing to finish fifth or higher to be champion (echoes of Brazil 2008!!). If Rosberg does not finish in Brazil and Hamilton wins the race, then Rosberg would need to win Abu Dhabi with Hamilton not finishing; no other scenario would give Rosberg the title.

A couple of years ago, I developed an interactive R/shiny app for exploring finishing combinations of two drivers in the last two races of a season to see what situations led to what result: Interactive Scenarios With Shiny – The Race to the F1 2012 Drivers’ Championship.

[Image: f12014champshiny – F1 2014 championship scenario Shiny app]

I’ve updated the app (taking into account the matter of double points in the final race) so you can check out James Allen’s calculations with it (assuming I got my sums right too!). I tried to pop up an interactive version to Shinyapps, but the Shinyapps publication mechanism seems to be broken (for me at least) at the moment…:-(

In the meantime, if you have RStudio installed, you can run the application yourself. The code is available and can be run from RStudio (with the shiny package loaded) using: runGist("81380ff09ebe1cd67005")

When I get a chance, I’ll weave elements of this recipe into the Wrangling F1 Data With R book.

PS I’ve also started using the F1dataJunkie blog again as a place to post drafts and snippets of elements I’m working on for that book…

Written by Tony Hirst

November 8, 2014 at 2:07 pm

Posted in Rstats


Where’s My Phone…?


Several years ago, I came across this mocked-up Google search that still makes me laugh even now…

[Image: mocked-up Google search – “where are my keys”]

And a couple of days ago, I realised I’d misplaced my phone. An Android device. An Android device that I have associated with a secondary Google profile I set up specifically to work with my phone (and that is linked in certain respects, such as calendars, with my primary Google ID).

Not being overly trusting of Google, I thought I’d switched off the various location awareness services that Google, and others, keep trying to get me to enable. Which made me feel a little silly – because if I had put a location tracker on my device, I would have been able to check whether I had accidentally lost it from my pocket in the place I thought. Or erase it if not.

Oops..

Or perhaps, not oops. I thought I’d do a quick search for “locate Android phone” anyway, and turned up the so-called Android Device Manager (about, help). Logging in to that service, lo and behold, there was a map locating the phone… pretty much exactly at the point where I’d thought – I’d hoped – I’d misplaced it.

[Image: Android_Device_Manager – map locating the phone]

There are also a couple of other device management services – call the phone (to help find it down the side of the sofa, for example), or erase the phone and lock it. A service also exists to display a number on the locked phone, in case a kindly soul finds it and wants to call you on that number to let you know they have it.

[Image: Android_Device_Manager_ring – ring-the-phone option]

Another example of a loss of sovereignty? And another example of how Google operates enterprise-level control over our devices (its operating system and deep-seated features), albeit giving us some sort of admin privileges too in a vague attempt to persuade us that we’re in control. Which we aren’t, of course. Useful, yes – but disconcerting and concerning too; because I really thought I’d tried to opt out of, and even disable, location-revealing service access on that phone. (I know for a fact GPS was disabled – but then, mobile cell triangulation topped up with wifi hotspot location seems to pin things down pretty well…)

Hmmm…

Written by Tony Hirst

November 2, 2014 at 6:17 pm

Posted in Anything you want


Wrangling F1 Data With R – F1DataJunkie Book


Earlier this year I started trying to pull some of my #f1datajunkie R-related ramblings together in book form. The project stalled, but to try to reboot it I’ve started publishing it as a living book over on Leanpub. Several of the chapters are incomplete – with TO DO items sketched in – and others are still unpublished. The beauty of the Leanpub model is that if you buy a copy, you continue to get access to all future updated versions of the book. (And my idea is that by getting the book out there as it is, I’ll feel as if there’s more (social) pressure on actually trying to keep up with it…)

I’ll be posting more details about how the Leanpub process works (for me at least) in the next week or two, but for now, here’s a link to the book: Wrangling F1 Data With R: A Data Junkie’s Guide.

Here’s the table of contents so far:

  • Foreword
    • A Note on the Data Sources
  • Introduction
    • Preamble
    • What are we trying to do with the data?
    • Choosing the tools
    • The Data Sources
    • Getting the Data into RStudio
    • Example F1 Stats Sites
    • How to Use This Book
    • The Rest of This Book…
  • An Introduction to RStudio and R dataframes
    • Getting Started with RStudio
    • Getting Started with R
    • Summary
  • Getting the data from the Ergast Motor Racing Database API
    • Accessing Data from the ergast API
    • Summary
  • Getting the data from the Ergast Motor Racing Database Download
    • Accessing SQLite from R
    • Asking Questions of the ergast Data
    • Summary
    • Exercises and TO DO
  • Data Scraped from the F1 Website
    • Problems with the Formula One Data
    • How to use the FormulaOne.com alongside the ergast data
  • Reviewing the Practice Sessions
    • The Weekend Starts Here
    • Practice Session Data from the FIA
    • Sector Times
    • FIA Media Centre Timing Sheets
  • A Quick Look at Qualifying
    • Qualifying Session Position Summary Chart
    • Another Look at the Session Tables
    • Ultimate Lap Positions
  • Lapcharts
    • Annotated Lapcharts
  • Race History Charts
    • The Simple Laptime Chart
    • Accumulated Laptimes
    • Gap to Leader Charts
    • The Lapalyzer Session Gap
    • Eventually: The Race History Chart
  • Pit Stop Analysis
    • Pit Stop Data
    • Total pit time per race
    • Pit Stops Over Time
    • Estimating pit loss time
    • Tyre Change Data
  • Career Trajectory
    • The Effect of Age on Performance
    • Statistical Models of Career Trajectories
    • The Age-Productivity Gradient
    • Summary
  • Streakiness
    • Spotting Runs
    • Generating Streak Reports
    • Streak Maps
    • Team Streaks
    • Time to N’th Win
    • TO DO
    • Summary
  • Conclusion
  • Appendix One – Scraping formula1.com Timing Data
  • Appendix Two – FIA Timing Sheets
    • Downloading the FIA timing sheets for a particular race
  • Appendix – Converting the ergast Database to SQLite

If you think you deserve a free copy, let me know… ;-)

Written by Tony Hirst

October 31, 2014 at 12:04 am

Posted in Rstats

