Category: Tinkering

A First Attempt at Wrangling WRC (World Rally Championship) Data With pandas and matplotlib

Last year was a quiet year on the Wrangling F1 Data With R front, with a barely started attempt at a python/pandas equivalent project. With the new owners of F1 in place, things may change for the better in terms of engaging with fans and supporters, and I may revisit that idea properly; in the meantime, I thought I'd start tinkering with a wider range of motorsport data.

The start of the BTCC season is still a few months away, but the WRC started over the weekend, and with review highlights and live coverage of one stage per rally on Red Bull TV, I thought I might give that data a go…

Results and timing info can be found on the WRC web pages (I couldn’t offhand find a source of official FIA timing sheets) so here’s a first quick sketch using stage results from the first rally of the year – Monte Carlo.

[Image: world_rally_championship_-_results_monte_carlo_-_wrc_com]

To start with, we need to grab the data. I’m using the pandas library, which has a handy .read_html() method that can scrape tables (crudely) from an HTML page given its URL.

import pandas as pd

def getStageResultsBase(year,rallyid,stages):
    ''' Get stage results and overall results at end of stage '''
    
    # Accept one stage number or a list of stage numbers
    stages=[stages] if not isinstance(stages,list) else stages
    
    #There are actually two tables on the stage results page
    df_stage=pd.DataFrame()
    df_overallpart=pd.DataFrame()
    
    #Grab data for each stage
    for stage in stages:
        url='http://www.wrc.com/live-ticker/daten/{year}/{rallyid}/stage.{rallyid}.{stage}.all.html'.format(year=year, rallyid=rallyid, stage=stage)
        #scrape the data
        results=pd.read_html(url,encoding='utf-8')
        results[0].columns=['pos', 'carNo', 'driverName', 'time', 'diffPrev', 'diffFirst']
        results[1].columns=['pos', 'carNo', 'driverName', 'time', 'diffPrev', 'diffFirst']
        
        #Simple cleaning - cast the data types as required
        for i in [0,1]:
            results[i].fillna(0,inplace=True)
            results[i]['pos']=results[i]['pos'].astype(float).astype(int)
            for j in ['carNo','driverName','time','diffPrev','diffFirst']:
                results[i][j]=results[i][j].astype(str)
        
        #Add a stage identifier
        results[0]['stage']=stage
        results[1]['stage']=stage
        
        #Add the scraped stage data to combined stage results data frames
        df_stage=pd.concat([df_stage,results[0]])
        df_overallpart=pd.concat([df_overallpart,results[1]])

    return df_stage.reset_index(drop=True), df_overallpart.reset_index(drop=True)

The data we pull back looks like the following.

[Image: wrc_results_scraper1]

Note that deltas (the time differences) are given as offset times in the form of a string. As the pandas library was in part originally developed for working with financial time series data, it has a lot of support for time handling. This includes the notion of a time delta:

pd.to_timedelta("1:2:3.0")
#Timedelta('0 days 01:02:03')

We can use this datatype to represent time differences from the results data:

#If we have hh:mm:ss format we can easily cast a timedelta
def regularTimeString(strtime):

    #Go defensive, just in case we're passed eg 0 as an int
    strtime=str(strtime)
    strtime=strtime.strip('+')

    modifier=''
    if strtime.startswith('-'):
        modifier='-'
        strtime=strtime.strip('-')

    timeComponents=strtime.split(':')
    ss=timeComponents[-1]
    mm=timeComponents[-2] if len(timeComponents)>1 else 0
    hh=timeComponents[-3] if len(timeComponents)>2 else 0
    timestr='{}{}:{}:{}'.format(modifier,hh,mm,ss)
    return pd.to_timedelta(timestr)
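Repeating the helper here as a self-contained check, we can see it copes with the sorts of strings that appear in the deltas columns (a leading `+`, negative offsets, missing hour/minute components, and even a bare `0` passed as an int):

```python
import pandas as pd

def regularTimeString(strtime):
    #Normalise strings like '+1:23.4', '-12.3' or 0 to a timedelta
    strtime = str(strtime).strip('+')
    modifier = ''
    if strtime.startswith('-'):
        modifier = '-'
        strtime = strtime.strip('-')
    timeComponents = strtime.split(':')
    ss = timeComponents[-1]
    mm = timeComponents[-2] if len(timeComponents) > 1 else 0
    hh = timeComponents[-3] if len(timeComponents) > 2 else 0
    return pd.to_timedelta('{}{}:{}:{}'.format(modifier, hh, mm, ss))

print(regularTimeString('+1:23.4'))  #a gap of 1m 23.4s
print(regularTimeString('-12.3'))    #a negative offset
print(regularTimeString(0))          #zero offset
```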

We can use the time handler to cast the time differences from the scraped data as timedelta typed data:

def getStageResults(year,rallyid,stages):
    df_stage, df_overallpart = getStageResultsBase(year,rallyid,stages)
    for col in ['time','diffPrev','diffFirst']:
        df_stage['td_'+col]=df_stage.apply(lambda x: regularTimeString(x[col]),axis=1)
        df_overallpart['td_'+col]=df_overallpart.apply(lambda x: regularTimeString(x[col]),axis=1)
    return df_stage, df_overallpart 

[Image: wrc_results_scraper2]

The WRC results cover all entrants to the rally, but not all the cars are classed as full-blown WRC cars (class RC1). We can limit the data to just the RC1 cars and generate a plot showing the position of each driver at the end of each stage:

%matplotlib inline
import matplotlib.pyplot as plt

rc1=df_overall[df_overall['groupClass']=='RC1'].reset_index(drop=True)

fig, ax = plt.subplots(figsize=(15,8))
ax.get_yaxis().set_ticklabels([])
rc1.groupby('driverName').plot(x='stage',y='pos',ax=ax,legend=None);

[Image: wrc_results_scraper3]

The position is actually the position of the driver across all entry classes, not just RC1. This means if a driver has a bad day, they could be placed well down the all-class field; but that's not of much interest if all we're interested in is in-class ranking.

So what about if we rerank the drivers within the RC1 class? And perhaps improve the chart a little by adding a name label to identify each driver at their starting position?

rc1['rank']=rc1.groupby('stage')['pos'].rank()

fig, ax = plt.subplots(figsize=(15,8))
ax.get_yaxis().set_ticklabels([])
rc1.groupby('driverName').plot(x='stage',y='rank',ax=ax,legend=None)

#Add some name labels at the start
for i,d in rc1[rc1['stage']==1].iterrows():
    ax.text(-0.5, d['rank'], d['driverName'])

[Image: wrc_results_scraper4]

This chart is a bit cleaner, but now we lose information about the lower placed in-class drivers, in particular their overall position when other classes are taken into account too…

The FIA recover this information in their stage charts, which report on the evolution of the rally for the top 10 cars overall (irrespective of class): excursions in interim stages outside the top 10 are shown "below the line", annotated with the car's overall classification on the corresponding stage.

[Image: stage_chart___federation_internationale_de_l_automobile]

We can use this idea by assigning a “re-rank” to each car if they are positioned outside the size of the class.
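The re-ranking trick can be sketched on a toy example (using a made-up class size of 3 rather than the actual RC1 class size): flag the cars placed outside the class size, number those excursions in order, then bunch them up immediately below the class-sized block.

```python
import pandas as pd

CLASS_SIZE = 3  #toy class size for illustration

df = pd.DataFrame({'stage': [1, 1, 1, 1, 1],
                   'pos': [1, 2, 5, 8, 9]})
#Flag cars positioned outside the class size
df['xrank'] = (df['pos'] > CLASS_SIZE)
#Number those excursions in order within each stage
df['xrank'] = df.groupby('stage')['xrank'].cumsum()
#In-class cars keep their position; the rest are bunched just below
df['xrank'] = df.apply(lambda row: row['pos'] if row['pos'] <= CLASS_SIZE
                       else row['xrank'] + CLASS_SIZE, axis=1)
print(df['xrank'].tolist())  # [1, 2, 4, 5, 6]
```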

#Reranking...
RC1SIZE=12 #Number of positions notionally filled by the RC1 class
rc1['xrank']= (rc1['pos']>RC1SIZE)
rc1['xrank']=rc1.groupby('stage')['xrank'].cumsum()
rc1['xrank']=rc1.apply(lambda row: row['pos'] if row['pos']<=RC1SIZE else row['xrank']+RC1SIZE, axis=1)
fig, ax = plt.subplots(figsize=(15,8))
ax.get_yaxis().set_ticklabels([])
rc1.groupby('driverName').plot(x='stage',y='xrank',ax=ax,legend=None)

#Name labels
for i,d in rc1[rc1['stage']==1].iterrows():
    ax.text(-0.5, d['xrank'], d['driverName'])
for i,d in rc1[rc1['stage']==17].iterrows():
    ax.text(17.3, d['xrank'], d['driverName'])

[Image: wrc_results_scraper5]

The chart now shows the evolution of the race for the RC1 cars, retaining the spaced ordering of the top 12 positions that would be filled by WRC1/RC1 cars if they were all placed above cars from other classes, and then bunching those placed outside the group size. (Only 11 names are shown because one of the entries retired right at the start of the rally.)

So for example, in this case we see how Neuville, Hänninen and Serderidis are classed outside Lefebvre, who was actually classed 9th overall.

Further drawing on the FIA stage chart, we can label the excursions outside the top 12, and also invert the y-axis.

fig, ax = plt.subplots(figsize=(15,8))
ax.get_yaxis().set_ticklabels([])
rc1.groupby('driverName').plot(x='stage',y='xrank',ax=ax,legend=None);

#Label excursions outside the top 12 with the overall position
for i,d in rc1[rc1['xrank']>RC1SIZE].iterrows():
    ax.text(d['stage']-0.1, d['xrank'], d['pos'],
            bbox=dict(boxstyle='round,pad=0.3', color='pink'))

#Name labels
for i,d in rc1[rc1['stage']==1].iterrows():
    ax.text(-0.5, d['xrank'], d['driverName'])
for i,d in rc1[rc1['stage']==17].iterrows():
    ax.text(17.3, d['xrank'], d['driverName'])

#Flip the y-axis
plt.gca().invert_yaxis()

Lefebvre’s excursions outside the top 12 are now recorded and plain to see.

[Image: wrc_results_scraper6]

We now have a chart that blends rank ordering with additional information showing where cars are outpaced by cars from other classes, in a space efficient manner.

PS as with Wrangling F1 Data With R, I may well turn this into a Leanpub book, this time exploring the workflow to markdown (and/or maybe reveal.js slides!) from Jupyter notebooks, rather than from RStudio/Rmd.

A Recipe for Automatically Going From Data to Text to Reveal.js Slides

Over the last few years, I've experimented on and off with various recipes for creating text reports from tabular data sets (spreadsheet plugins are also starting to appear with a similar aim in mind). There are several issues associated with this, including:

  • identifying what data or insight you want to report from your dataset;
  • (automatically deriving the insights);
  • constructing appropriate sentences from the data;
  • organising the sentences into some sort of narrative structure;
  • making the sentences read well together.

Another approach to humanising the reporting of tabular data is to generate templated webpages that review and report on the contents of a dataset; this has certain similarities to dashboard style reporting, mixing tables and charts, although some simple templated text may also be generated to populate the page.

In a business context, reporting often happens via Powerpoint presentations. Slides within the presentation deck may include content pulled from a templated spreadsheet, which itself may automatically generate tables and charts for such reuse from a new dataset. In this case, the recipe may look something like:

[Image: exceldata2slide]

#render via: http://blockdiag.com/en/blockdiag/demo.html
{
  X1[label='macro']
  X2[label='macro']

  Y1[label='Powerpoint slide']
  Y2[label='Powerpoint slide']

   data -> Excel -> Chart -> X1 -> Y1;
   Excel -> Table -> X2 -> Y2 ;
}

In the previous couple of posts, the observant amongst you may have noticed I’ve been exploring a couple of components for a recipe that can be used to generate reveal.js browser based presentations from the 20% that account for the 80%.

The dataset I’ve been tinkering with is a set of monthly transparency spending data from the Isle of Wight Council. Recent releases have the form:

[Image: iw_transparency_spending_data]

So as hinted at previously, it’s possible to use the following sort of process to automatically generate reveal.js slideshows from a Jupyter notebook with appropriately configured slide cells (actually, normal cells with an appropriate metadata element set) used as an intermediate representation.

[Image: jupyterslidetextgen]

{
  X1[label="text"]
  X2[label="Jupyter notebook\n(slide mode)"]
  X3[label="reveal.js\npresentation"]

  Y1[label="text"]
  Y2[label="text"]
  Y3[label="text"]

  data -> "pandas dataframe" -> X1  -> X2 ->X3
  "pandas dataframe" -> Y1,Y2,Y3  -> X2 ->X3

  Y2 [shape = "dots"];
}

There’s an example slideshow based on October 2016 data here. Note that some slides have “subslides”, that is, slides underneath them, so watch the arrow indicators bottom left to keep track of when they’re available. Note also that the scrolling is a bit hit and miss – ideally, a new slide would always be scrolled to the top, and for fragments inserted into a slide one at a time, the slide should scroll down to follow them.

The structure of the presentation is broadly as follows:

[Image: demo_-_interactive_shell_for_blockdiag_-_blockdiag_1_0_documentation]

For example, here’s a summary slide of the spends by directorate – note that we can embed charts easily enough. (The charts are styled using seaborn, so a range of alternative themes are trivially available). The separate directorate items are brought in one at a time as fragments.

[Image: testfullslidenotebook2_slides1]

The next slide reviews the capital versus revenue expenditure spend for a particular directorate, broken down by expenses type (corresponding slides are generated for all other directorates). (I also did a breakdown for each directorate by service area.)

The items listed are ordered by value, and taken together account for at least 80% of the spend in the corresponding area. Any further items contributing more than 5%(?) of the corresponding spend are also listed.
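A minimal sketch of that selection rule, on made-up spend figures (both the 80% cumulative cut-off and the 5% side condition are as described above):

```python
import pandas as pd

#Hypothetical spend by expenses type
spend = pd.Series({'A': 500, 'B': 300, 'C': 120, 'D': 50, 'E': 30})
spend = spend.sort_values(ascending=False)

share = spend / spend.sum()
cumshare = share.cumsum()
#Keep items until the running total reaches at least 80%, plus any
#other item that on its own contributes more than 5%
keep = spend[(cumshare.shift(fill_value=0) < 0.8) | (share > 0.05)]
print(keep.index.tolist())  # ['A', 'B', 'C']
```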

[Image: testfullslidenotebook2_slides2]

Notice that subslides are available going down from this slide, rather than across the main slides in the deck. This 1.5D structure means we can put an element of flexible narrative design into the presentation, giving the reader an opportunity to explore the data, but in a constrained way.

In this case, I generated subslides for each major contributing expenses type to the capital and revenue pots, and then added a breakdown of the major suppliers for that spending area.

[Image: testfullslidenotebook2_slides3]

This just represents a first pass at generating a 1.5D slide deck from a tabular dataset. A Pareto (80/20) heuristic is used to try to prioritise the information displayed, in order to account for 80% of spend in different areas, or other significant contributions.

Applying this principle repeatedly allows us to identify major spending areas, and then major suppliers within those spending areas.

The next step is to look at other ways of segmenting and structuring the data in order to produce reports that might actually be useful…

If you have any ideas, please let me know via the comments, or get in touch directly…

PS FWIW, it should be easy enough to run any other dataset that looks broadly like the example at the top through the same code with only a couple of minor tweaks…

Automatically Generating Two Dimensional Reveal.js Documents Using Jupyter Notebooks

One of the things I finally got round to exploring whilst at the Reproducible Research Using Jupyter Notebooks curriculum development hackathon was the ability to generate slideshows from Jupyter notebooks.

The underlying slideshow presentation framework is reveal.js. This uses a 1.5(?) dimensional slide geometry, so slides can transition left to right, or you can transition down to subslides off a single slide.

This got me wondering… could I use a notebook/script to generate a reveal.js slideshow that could provide a convenient way of navigating automatically generated slideshows made up from automatically generated data2text slides?

The 1.5/two dimensional component would mean that slides could be structured by topic horizontally, with subtopic vertically downwards within a topic.

A quick test suggests that this is absolutely doable…

import IPython.nbformat as nb
import IPython.nbformat.v4.nbbase as nb4

test=nb4.new_notebook()
test.cells.append(nb4.new_markdown_cell('# Test slide1',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('# Test slide2',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide2 extra content line 1\n\nSlide2 extra content line 2'))
test.cells.append(nb4.new_markdown_cell('# Test slide3',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide3 fragment 1',metadata={"slideshow": {"slide_type": "fragment"}}))
test.cells.append(nb4.new_markdown_cell('Slide3 fragment 2',metadata={"slideshow": {"slide_type": "fragment"}}))
test.cells.append(nb4.new_markdown_cell('# Slide4',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide4 extra content line 1\n\nSlide4 extra content line 2'))
test.cells.append(nb4.new_markdown_cell('# Slide4 subslide',metadata={"slideshow": {"slide_type": "subslide"}}))

nbf='testslidenotebook.ipynb'
nb.write(test,nbf)

#Generate and render slideshow
!jupyter nbconvert $nbf --to slides --post serve

Let the fun begin…:-)

PS here’s a second pass:

def addSlideComponent(notebook, content, styp=''):
    if styp in ['slide','fragment','subslide']: styp={"slideshow": {"slide_type":styp}}
    else: styp={}
    notebook.cells.append(nb4.new_markdown_cell(content, metadata=styp))

test=nb4.new_notebook()
addSlideComponent(test,'# Test2 slide1','slide')
addSlideComponent(test,'# Test slide2','slide')
addSlideComponent(test,'Slide2 extra content line 1\n\nSlide2 extra content line 2')
addSlideComponent(test,'# Test slide3','slide')
addSlideComponent(test,'Slide3 fragment 1','fragment')
addSlideComponent(test,'Slide3 fragment 2','fragment')
addSlideComponent(test,'# Slide4','slide')
addSlideComponent(test,'Slide4 extra content line 1\n\nSlide4 extra content line 2')
addSlideComponent(test,'# Slide4 subslide','subslide')

nbf='testslidenotebook2.ipynb'
nb.write(test,nbf)

Wrangling Time Periods (such as Financial Year Quarters) In Pandas

Looking at some NHS 111 and A&E data today, the data I was interested in was reported over different sorts of period – specifically, months and quarters.

The pandas python library has quite a few tools for dealing with periods, so here are a couple of examples of tricks I put to use today.

Firstly, casting months to a month period. Monthly periods (in column df['Periodname']) were reported in the form “Dec-10”, “Jan-11”, etc, which is to say a three letter month followed by a two digit year representation. If we create a datetime on the first of the corresponding month, we can then cast that date to a month period for the corresponding month:

#Generate a pandas period for time series indexing purposes
#Create a date from each month (add the first date of the month) then set to period
#Note: if the datetime is an index, drop the .dt.
df['_period']=pd.to_datetime(('01-'+df['Periodname']), \
                             format='%d-%b-%y').dt.to_period('M')

If the datetimes form the dataframe index, we could drop the .dt. element and cast the timeseries index directly to the monthly period. The %b term maps onto the locale-specific month abbreviation.
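For example, applying the same cast to a couple of made-up Periodname values:

```python
import pandas as pd

df = pd.DataFrame({'Periodname': ['Dec-10', 'Jan-11']})
#Prefix a day-of-month, parse to a datetime, then cast to a monthly period
df['_period'] = pd.to_datetime('01-' + df['Periodname'],
                               format='%d-%b-%y').dt.to_period('M')
print(df['_period'].tolist())
```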

The second example had financial years and dates in two columns – Year and Quarter. The months specified in the quarter description mix three letter representations and the fully written name of the month. (As with many NHS datasets, the text used to describe dates can be all over the place, even within a single spreadsheet or datafile!)

If we grab the first three letters of the month, we can identify the month in its short form. If we create a date stamp for the first day of the quarter (in the UK, corporate financial years run from 1 April to 31 March; it's the personal tax year that reports from April 5th?), we can then cast to a quarterly reporting period whose year ends in a particular month, eg March (freq='Q-MAR').
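As a sketch, using made-up Year and Quarter strings of the kind described (the string slicing is an assumption about their exact format):

```python
import pandas as pd

year, quarter = '2015-16', 'Quarter 1 (April to June)'  #hypothetical values

#First three letters of the month named in the quarter description -
#this works whether the month is given in short form or written out in full
month = quarter.split('(')[1][:3]      # 'Apr'
startyear = year.split('-')[0]         #calendar year the quarter starts in
start = pd.to_datetime('01-{}-{}'.format(month, startyear), format='%d-%b-%Y')

#Cast to a quarterly period for a financial year ending in March
q = start.to_period('Q-MAR')
print(q, q.start_time.date(), q.end_time.date())
```

(For the January to March quarter, the start year would need bumping to the second year of the pair.)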

[Image: nhs_111]

Note that the YearQuarter uses the calendar year in which the financial year ends. We can check the start and end date of the quarters explicitly:

[Image: nhs_111_2]

One issue when it comes to plotting the financial year quarterly periods is that the chart looks to me as if everything is a year out:

[Image: nhs_111_3]

That is, I read the year as the calendar year, but it’s actually the calendar year in which a particular financial year ends.

So I’m wondering – should pandas really report and plot the calendar year in which the financial year starts? Is there a switch for this? Or do I need to submit a bug report (and if so, what’s the bug?)

PS for a notebook worked example, see here: NHS 111 example.

Creating a Self-Updating WordPress Post Using WordPress Transients and Data from a Third Party API

In the post Pondering a Jupyter Notebooks to WordPress Publishing Pattern: MultiMarker Map Widget, I described a simple pattern I started exploring last year that used a custom WordPress shortcode plugin to render data added to one or more custom fields associated with a WordPress post; the post text (including shortcode) and custom fields data were themselves posted into WordPress using some Python code executed from a Jupyter notebook. The idea behind that pattern was to provide a way of automating the creation of custom posts largely from a supplied data set, rendered using a generic shortcode plugin.

Another pattern I explored last year used the WordPress Transients API to cache data pulled from a 3rd party API in the WordPress database, and allow that data to be used by a custom plugin to render the post.

Here’s some example code for a plugin that renders a map containing recent planning applications on the Isle of Wight: the data is grabbed via an API from a morph.io webscraper, which scrapes the data from the Isle of Wight council website.

The two key bits of the script are where I check to see if cached data exists (get_transient( 'iwcurrplanningitems' );) and, if it doesn't, grab a recent copy from the API and cache it for 8 hours (set_transient('iwcurrplanningitems', $markers, 60*60*8);).

<?php
/*
Plugin Name: IWPlanningLeafletMap
Description: Shortcode to render an interactive map displaying clustered markers. Markers are pulled in via JSON from an external URL. Intended primarily to support automated post creation. Inspired by folium python library and Google Maps v3 Shortcode multiple Markers WordPress plugin
Version: 1.0
Author: Tony Hirst
*/

//Loaded in from multimarker shortcode
add_action( 'wp_enqueue_scripts', 'custom_scripts' );
add_action( 'wp_enqueue_scripts', 'custom_styles' );


// Add stuff to header
add_action('wp_head', 'IWPlanningLeafletMap_header');
add_action('wp_head', 'fix_css');

/*
function fix_css() { 
	echo '<style type="text/css">#map {
        position:absolute;
        top:0;
        bottom:0;
        right:0;
        left:0;
      }</style>' . "\n";
 } 
*/


function IWPlanningLeafletMap_header() {
}

function IWPlanningLeafletMap_call($attr) {
// Generate the map template

	// Default attributes - can be overwritten from shortcode
	$attr = shortcode_atts(array(	
									'lat'   => '50.675', 
									'lon'    => '-1.32',
									'id' => 'iwmap_1',
									'zoom' => '11',
									'width' => '800',
									'height' => '500',
									'markers'=>''
									), $attr);

	$html = '<div class="folium-map" id="'.$attr['id'].'" style="width: '. $attr['width'] .'px; height: '. $attr['height'] .'px"></div>

   <script type="text/javascript">
      var base_tile = L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png", {
          maxZoom: 18,
          minZoom: 1,
          attribution: "Map data (c) OpenStreetMap contributors - http://openstreetmap.org"
      });

      var baseLayer = {
        "Base Layer": base_tile
      }

      /*
      list of layers to be added
      */
      var layer_list = {
      };

      /*
      Bounding box.
      */
      var southWest = L.latLng(-90, -180),
          northEast = L.latLng(90, 180),
          bounds = L.latLngBounds(southWest, northEast);

      /*
      Creates the map and adds the selected layers
      */
      var map = L.map("'.$attr['id'].'", {
                                       center:['.$attr['lat'].', '.$attr['lon'].'],
                                       zoom: '.$attr['zoom'].',
                                       maxBounds: bounds,
                                       layers: [base_tile]
                                     });

      L.control.layers(baseLayer, layer_list).addTo(map);

      //cluster group
      var clusteredmarkers = L.markerClusterGroup();
      //section for adding clustered markers
      ';
	
	$markers =  get_transient( 'iwcurrplanningitems' );
	if ( false === $markers ) {
		$url='https://api.morph.io/psychemedia/iwplanningscraper/data.json?key=*****MORPHIOKEY****&query=select%20*%20from%20IWPLANNING%20where%20date(%22Consultation%20End%20Date_t%22)%3Edate(%22now%22)';
		$json = file_get_contents($url);
		$markers=json_decode($json, true);
		set_transient('iwcurrplanningitems', $markers, 60*60*8);
	}
	
	
	for ($i = 0;$i < count($markers);$i ++){
		$arrkeys=['Agent or Applicant','Location','Proposal'];
		foreach($arrkeys as $arrkey){
			$markers[$i][$arrkey] = str_replace("\n", "<br/>", $markers[$i][$arrkey]);
			$markers[$i][$arrkey] = str_replace("\r", "<br/>", $markers[$i][$arrkey]);
		}	
		$html .='
			var marker_'.$i.'_icon = L.AwesomeMarkers.icon({ icon: "info-sign",markerColor: "blue",prefix: "glyphicon",extraClasses: "fa-rotate-0"});
      		var marker_'.$i.' = L.marker(['.$markers[$i]['lat'].','.$markers[$i]['lon'].'], {"icon":marker_'.$i.'_icon});
      marker_'.$i.'.bindPopup("<strong>Consultation start:</strong> '.$markers[$i]['Consultation Start Date'].'<br/><strong>Consultation end:</strong> '.$markers[$i]['Consultation End Date'].'<br/><strong>Location:</strong> '.$markers[$i]['Location'].'<br/><em> '.$markers[$i]['Parish'].' parish, '.$markers[$i]['Ward'].' ward.</em><br/><strong>Proposal:</strong> '.$markers[$i]['Proposal'].'<br/><strong>Agent or Applicant:</strong> '.$markers[$i]['Agent or Applicant'].'<br/><strong>Case Officer:</strong> '.$markers[$i]['Case Officer'].'<br/><em><a href=\'https://www.iwight.com/planning/'.$markers[$i]['stub'].'\'>View application</a></em>");
      marker_'.$i.'._popup.options.maxWidth = 300;
      clusteredmarkers.addLayer(marker_'.$i.');
      
     		//add the clustered markers to the group anyway
      		map.addLayer(clusteredmarkers);

    	';
	}
	$html .= '</script>';
	return $html;
	?>

<?php
}
add_shortcode('IWPlanningLeafletMap', 'IWPlanningLeafletMap_call');
?>

One thing I started to wonder over the Christmas break was whether this approach could provide a way of sharing “data2text” content. For example, having a plugin that creates a canned summary of jobseeker’s allowance figures from data cached from the ONS website? A downside of this is that I’d have to write the data2text script using PHP, which means I couldn’t directly build on related code I’ve written previously…

I also wonder if we could use custom fields to permanently store data for a particular post. For example, we might check whether or not a custom field exists for the post, and if it doesn't, create and populate it using data pulled from an API (possibly keyed by plugin/shortcode parameters, or the post publication date), using a WordPress add_post_meta() function call?

Pondering a Jupyter Notebooks to WordPress Publishing Pattern: MultiMarker Map Widget

On my to do list for next year is to finally get round to doing something consistently with open data in an Isle of Wight context, probably on isleofdata.com. One of the things I particularly want to explore are customisable WordPress plugins that either source data from an online data source or that can be configured as part of an external publishing system.

For example, the following code, saved as MultiMarkerLeafletMap2.php and zipped up, implements a WordPress plugin that can render an interactive leaflet map with clustered markers.

[Image: datawire_-_test_blog_-_just_seeing_what_the_bots_can_do]

<?php
/*
Plugin Name: MultiMarkerLeafletMap2
Description: Shortcode to render an interactive map displaying clustered markers. Markers to be added as JSON. Intended primarily to support automated post creation. Inspired by folium python library and Google Maps v3 Shortcode multiple Markers WordPress plugin
Version: 1.0
Author: Tony Hirst
*/

function MultiMarkerLeafletMap2_custom_styles() {

	wp_deregister_style( 'oi_css_map_leaflet' );
	wp_register_style( 'oi_css_map_leaflet', '//cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.3/leaflet.css',false, '0.7.3' );
	wp_enqueue_style( 'oi_css_map_leaflet' );

	wp_deregister_style( 'oi_css_map_bootstrap' );
	wp_register_style( 'oi_css_map_bootstrap', '//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css', false, '3.2.0' );
	wp_enqueue_style( 'oi_css_map_bootstrap' );

	wp_deregister_style('oi_css_map_bootstrap_theme');
	wp_register_style('oi_css_map_bootstrap_theme','//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap-theme.min.css',false,false);
	wp_enqueue_style( 'oi_css_map_bootstrap_theme');

	wp_deregister_style('oi_css_map_fa');
	wp_register_style('oi_css_map_fa','//maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css',false,false);
	wp_enqueue_style( 'oi_css_map_fa');

	wp_deregister_style('oi_css_map_lam');
	wp_register_style('oi_css_map_lam','//cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.css',false,false);
	wp_enqueue_style( 'oi_css_map_lam');

	wp_deregister_style('oi_css_map_lmcd');
	wp_register_style('oi_css_map_lmcd','//cdnjs.cloudflare.com/ajax/libs/leaflet.markercluster/0.4.0/MarkerCluster.Default.css',false,false);
	wp_enqueue_style( 'oi_css_map_lmcd');

	wp_deregister_style('oi_css_map_lmc');
	wp_register_style('oi_css_map_lmc','//cdnjs.cloudflare.com/ajax/libs/leaflet.markercluster/0.4.0/MarkerCluster.css',false,false);
	wp_enqueue_style( 'oi_css_map_lmc');

	wp_deregister_style('oi_css_map_lar');
	wp_register_style('oi_css_map_lar','//birdage.github.io/Leaflet.awesome-markers/dist/leaflet.awesome.rotate.css',false,false);
	wp_enqueue_style( 'oi_css_map_lar');

}

function MultiMarkerLeafletMap2_custom_scripts() {

	wp_deregister_script( 'oi_script_leaflet' );
	wp_register_script( 'oi_script_leaflet', '//cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.3/leaflet.js',array('oi_script_jquery'),'0.7.3');
	wp_enqueue_script( 'oi_script_leaflet' );

	wp_deregister_script( 'oi_script_jquery' );
	wp_register_script( 'oi_script_jquery', '//ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js', false, '1.11.3' );
	wp_enqueue_script( 'oi_script_jquery' );

	wp_deregister_script( 'oi_script_bootstrap' );
	wp_register_script( 'oi_script_bootstrap', '//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js',false,'3.2.0');
	wp_enqueue_script( 'oi_script_bootstrap' );

	wp_deregister_script( 'oi_script_lam' );
	wp_register_script( 'oi_script_lam', '//cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.js',array('oi_script_leaflet'),'2.0');
	wp_enqueue_script( 'oi_script_lam' );

	wp_deregister_script( 'oi_script_lmc' );
	wp_register_script( 'oi_script_lmc', '//cdnjs.cloudflare.com/ajax/libs/leaflet.markercluster/0.4.0/leaflet.markercluster.js',array('oi_script_leaflet'),'0.4.0');
	wp_enqueue_script( 'oi_script_lmc' );	

	wp_deregister_script( 'oi_script_lmcsrc' );
	wp_register_script( 'oi_script_lmcsrc', '//cdnjs.cloudflare.com/ajax/libs/leaflet.markercluster/0.4.0/leaflet.markercluster-src.js',array('oi_script_leaflet'),'0.4.0');
	wp_enqueue_script( 'oi_script_lmcsrc' );

}
add_action( 'wp_enqueue_scripts', 'MultiMarkerLeafletMap2_custom_scripts' );
add_action( 'wp_enqueue_scripts', 'MultiMarkerLeafletMap2_custom_styles' );

// Add items to header
add_action('wp_head', 'MultiMarkerLeafletMap2_header');
add_action('wp_head', 'MultiMarkerLeafletMap2_fix_css');

function MultiMarkerLeafletMap2_fix_css() {
	echo '<style type="text/css">#map {
        position:absolute;
        top:0;
        bottom:0;
        right:0;
        left:0;
      }</style>' . "\n";
 } 

function MultiMarkerLeafletMap2_header() {
}

function MultiMarkerLeafletMap2_call($attr) {
// Generate the map template

	// Default attributes - can be overwritten from shortcode
	$attr = shortcode_atts(array(
									'lat'   => '0',
									'lon'    => '0',
									'id' => 'oimap_1',
									'zoom' => '7',
									'width' => '600',
									'height' => '400',
									'type' => 'multimarker',
									'markers'=>''
									), $attr);

	$html = '<div class="folium-map" id="'.$attr['id'].'" style="width: '. $attr['width'] .'px; height: '. $attr['height'] .'px"></div>
<script type="text/javascript">
      var base_tile = L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png", {
          maxZoom: 18,
          minZoom: 1,
          attribution: "Map data (c) OpenStreetMap contributors - http://openstreetmap.org"
      });

      var baseLayer = {
        "Base Layer": base_tile
      };

      /*
      list of layers to be added
      */
      var layer_list = {
      
      };
      /*
      Bounding box.
      */
      var southWest = L.latLng(-90, -180),
          northEast = L.latLng(90, 180),
          bounds = L.latLngBounds(southWest, northEast);

      /*
      Creates the map and adds the selected layers
      */
      var map = L.map("'.$attr['id'].'", {
                                       center:['.$attr['lat'].', '.$attr['lon'].'],
                                       zoom: '.$attr['zoom'].',
                                       maxBounds: bounds,
                                       layers: [base_tile]
                                     });

      L.control.layers(baseLayer, layer_list).addTo(map);

      //cluster group
      var clusteredmarkers = L.markerClusterGroup();
      //section for adding clustered markers
      ';
	
    if($attr['type']=='multimarker'){
	// Get our custom fields
	global $post;

	$premarkers=get_post_meta( $post->ID, 'markers', true );
	$markers = json_decode($premarkers,true);
	$legend = get_post_meta( $post->ID, 'maplegendtemplate', true );
	$legendkeys = get_post_meta( $post->ID, 'maplegendkeys', true );

	if (is_array($markers) && count($markers)>0){
		for ($i = 0;$i < count($markers);$i ++){
			$popup=$legend;
			foreach (explode(',', $legendkeys) as $k) {
				$popup=str_replace("%".$k."%",$markers[$i][$k],$popup);
			};

			$html .='
			     var marker_'.$i.'_icon = L.AwesomeMarkers.icon({ icon: "info-sign",markerColor: "blue",prefix: "glyphicon",extraClasses: "fa-rotate-0"});
      var marker_'.$i.' = L.marker(['.$markers[$i]['lat'].','.$markers[$i]['lon'].'], {"icon":marker_'.$i.'_icon});
      marker_'.$i.'.bindPopup("'.$popup.'");
      marker_'.$i.'._popup.options.maxWidth = 300;
      clusteredmarkers.addLayer(marker_'.$i.');
      
      	';

		}   
	} 
    } 

	$html .= '//add the clustered markers to the group anyway
      map.addLayer(clusteredmarkers);</script>';
	return $html;
}
add_shortcode('MultiMarkerLeafletMap2', 'MultiMarkerLeafletMap2_call');

Data is passed to the plugin embedded in a WordPress post via three custom fields associated with the post:

  • markers: a JSON list that contains information associated with each marker;
  • maplegendkeys: a comma separated list of key values that refers to keys in each marker object that are referenced when constructing the popup legend for each marker;
  • maplegendtemplate: a template that is used to construct each popup legend, of the form ‘Asset type: %typ% (%loc%)’, where the %VAR% elements reference key values VAR associated with object attributes in the markers list.
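The templating the plugin applies can be sketched in Python – a minimal mock-up of the PHP str_replace loop, using made-up marker data rather than the plugin’s actual custom field values:

```python
import json

# Made-up custom field values, in the same shape the plugin expects
maplegendtemplate = 'Asset type: %typ% (%loc%)'
maplegendkeys = 'typ,loc'
markers = json.dumps([{'lat': '50.7', 'lon': '-1.3', 'typ': 'School', 'loc': 'Newport'}])

# For each marker, substitute %key% placeholders in the template
for marker in json.loads(markers):
    popup = maplegendtemplate
    for k in maplegendkeys.split(','):
        popup = popup.replace('%' + k + '%', marker[k])
    print(popup)  # Asset type: School (Newport)
```

Each marker object just needs to carry the keys named in maplegendkeys; anything else in the object is ignored by the popup builder.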

In the set up I have, the post content – including the plugin shortcode – is generated from a Python script running in a Jupyter notebook, and can be posted using the following code fragment:

#!pip3 install python-wordpress-xmlrpc

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.compat import xmlrpc_client
from wordpress_xmlrpc.methods import media, posts
from wordpress_xmlrpc.methods.posts import NewPost

wpoi = Client(WORDPRESS_BLOG_URL+'/xmlrpc.php', 'robot1', WORDPRESS_API_KEY)

def wp_customPost(client,title='ping',content='pong, <em>pong</em>',custom={}):
    post = WordPressPost()
    post.title = title
    post.content = content
    post.custom_fields = []
    for c in custom:
        post.custom_fields.append({
            'key': c,
            'value': custom[c]
        })
    response = client.call(NewPost(post))
    return response

A list of objects is created from a pandas dataframe where each object contains the information associated with each marker – we limit the list to only include items for which we have latitude and longitude information:

def itemiser(row):
    item={'lat': row['latlong'].split(',')[0],
          'lon': row['latlong'].split(',')[1],
          'typ': row['Asset Type Description'],
          'tenure': row['Tenure'],
          'loc': row['Location'],
          'location': '{}, {}, {}'.format(row['Address 1'], row['Address 2'], row['Post Code']),
         }
    return item

jd1=df[(df['latlong']!='') & (df['latlong'].notnull())].apply(itemiser,axis=1)
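For example, running a cut-down itemiser (just a subset of the keys above) over a toy two-row dataframe – one row geocoded, one not – keeps only the geocoded row:

```python
import pandas as pd

# Toy dataframe with the same column names as the asset register (made-up rows);
# the second row has no lat/long so it should be filtered out
df = pd.DataFrame([
    {'latlong': '50.7,-1.3', 'Asset Type Description': 'School', 'Location': 'Newport'},
    {'latlong': '', 'Asset Type Description': 'Car park', 'Location': 'Ryde'},
])

def itemiser(row):
    # Same shape as the full itemiser above, fewer keys
    return {'lat': row['latlong'].split(',')[0],
            'lon': row['latlong'].split(',')[1],
            'typ': row['Asset Type Description'],
            'loc': row['Location']}

jd = df[(df['latlong'] != '') & (df['latlong'].notnull())].apply(itemiser, axis=1)
print(jd.tolist())  # a single dict, for the geocoded School row
```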

A post is then constructed that includes a reference to the plugin (as part of the text of the body of the post) and the data that is to be passed to the custom variables.

import json

#txt contains the content for the blog post
txt = "[MultiMarkerLeafletMap2 zoom=11 lat=50.675 lon=-1.31 width=800 height=500]"
txt = '{}{}'.format(txt, '''
<div><em>Data produced by <a href="https://www.iwight.com/Council/transparency/Our-Assets/Transparency-Our-Assets/Property">Isle of Wight Council</a>.</em></div>
''')

#jsondata contains the custom variable data that will be associated with the post
jsondata={'markers':json.dumps( jd1.tolist() ),
          'maplegendkeys':'typ,tenure,location,loc',
          'maplegendtemplate':'Asset type: %typ% (%loc%)<br/>Tenure: %tenure%<br/>%location%'}

wp_customPost(wpoi, "Properties on the Isle of Wight Council property register", txt, jsondata)

Student Workload Planning – Section Level Word Count Reports in MS Word Docs

One of the things the OU seems to have gone in for big time lately is “learning design”, with all sorts of planning tools and who knows what to try and help us estimate student workloads.

One piece of internal research I saw suggested that we “adopt a University-wide standard for study speed of 35 words per minute for difficult texts, 70 words per minute for normal texts and 120 words per minute for easy texts”. This is complemented by a recommended level 1 (first year equivalent) 60:40 split between module-directed (course text) work and student-directed (activities, exercises, self-assessment questions, forum activity etc) work. Another constraint is the available study time per week – for a 30 CAT point course (300 hours study), this is nominally set at 10 hours study per week. I seem to recall that retention charts show retention rates going down as mean study time gets anywhere close to this figure…

One of the things that seems to have been adopted is the assumption that the first year equivalent study material should all be rated at the 35 words per minute level. For 60% module-led work, at 10 hours a week, this gives approximately 35 * 60 * 6 ≈ 12,600 words of reading per week. With novels coming in around 500 words a page, that’s 25 pages of reading or so.
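The back-of-the-envelope sum, for the record:

```python
# Reading budget sanity check: 35 wpm, 60% of a nominal 10 hour study week
wpm = 35
module_led_hours = 10 * 0.6           # 6 hours of module-directed study
words_per_week = wpm * 60 * module_led_hours
pages = words_per_week / 500          # at ~500 words per novel page
print(words_per_week, pages)          # 12600.0 25.2
```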

This is okay for dense text, but we tend to write in quite a conversational way, with a strong narrative, using relatively straightforward prose, explaining things a step at a time, with plenty of examples. Dense sentences are rewritten and the word count goes up (but not the reading rate… not sure I understand that?)

As part of the production process, materials go through multiple drafts and several stages of critical reading by third parties. Part of the critical reading process is to estimate (or check) workload. To assist this, materials are chunked and should be provided with word counts and estimated study times. The authoring process uses Microsoft Word.

As far as I can tell, there is an increasing drive to segment all the materials and chunk them all to be just so, one more step down the line towards rigidly templated materials. For a level 1 study week, the template seems to be five sections per week with four subsections each, each subsection about 500 words or so. (That is, 10 to 20 blog posts per study week…;-)

I’m not sure what, if any, productivity tools there are to automate the workload guesstimates, but over coffee this morning I thought I’d have a go at writing a Visual Basic macro to do some of the counting for me. I’m not really familiar with VB – in fact, I’m not sure I’ve ever written a macro before – but it seemed to fall together okay if the document was structured appropriately.

To wit, the structure I adopted was: a section break to separate each section and subsection (which meant I could count words in each section); a heading as the first line after a section break (so the word count could be associated with the (sub)section heading). This evening, I also started doodling a convention for activities, where an activity would include a line on its own of the form – Estimated study time: NN minutes – which could then be used as a basis for an activity count and an activity study time count.
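That activity line convention is trivially machine-readable; here’s the same extraction logic as the macro’s, sketched in Python with a regex (the sample text is made up):

```python
import re

# Pull "Estimated study time: NN minutes" lines out of a chunk of document
# text and total them, mirroring the macro's string matching
text = """Activity 2.1
Estimated study time: 15 minutes
Some activity text...
Estimated study time: 30 minutes
"""

times = [int(m) for m in re.findall(r'Estimated study time:\s*(\d+)\s*minutes', text)]
total_minutes = sum(times)
print(len(times), total_minutes)  # 2 45
```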

Running the macro generates a pop up report and also inserts the report at the cursor insertion point. The report for a section looks something like this:

tm112block2part6_d2_2nd_attempt_docm

A final summary report also gives the total number of words.

It should be easy enough to also insert wordcounts into the document at the start of each section, though I’m not sure (yet) how I could put a placeholder in at the start of each section that the macro could update with the current wordcount each time I run it? (Also how the full report could just be updated, rather than appended to the document, which could get really cluttered…) I guess I could also create a separate Word doc, or maybe populate an Excel spreadsheet, with the report data.

Another natural step would be to qualify each subsection with a conventional line declaring the estimated reading complexity level, detecting this, and using it with a WPM rate to estimate the study time of the reading material. Things are complicated somewhat by my version of Word (on a Mac) not supporting regular expressions, but then, in the spirit of trying to build tools at the same level of complexity as the level at which we’re teaching, regex are probably out of scope (too hard, I suspect…)

To my mind, exploring such productivity tools is the sort of thing we should naturally do; at least, it’s the sort of thing that felt natural in a technology department. Computing seems different; computing doesn’t seem to be about understanding the technical world around us and getting our hands dirty with it. It’s about… actually, I’m not sure what it’s about. The above sketch really was a displacement activity – I have no illusions at all that the above will generate any interest at all, not even as a simple daily learning exercise (I still try to learn, build or create something new every day to keep the boredom away…) In fact, the “musical differences” between my view of the world and pretty much everyone else’s is getting to the stage where I’m not sure it’s tenable any more. The holiday break can’t come quickly enough… Roll on HoG at the weekend…

Sub WordCount()

    Dim NumSec As Integer
    Dim S As Integer
    Dim P As Integer
    Dim Summary As String

    Dim SubsectionCnt As Integer
    Dim SubsectionWordCnt As Integer
    Dim SectionText As String

    Dim ActivityTime As Integer
    Dim OverallActivityTime As Integer
    Dim SectionActivities As Integer

    Dim ParaText As String

    Dim ActivityTimeStr As String

    ActivityTime = 0
    OverallActivityTime = 0
    SectionActivities = 0

    SubsectionCnt = 0
    SubsectionWordCnt = 0

    NumSec = ActiveDocument.Sections.Count
    Summary = "Word Count" & vbCrLf

    For S = 1 To NumSec
        SectionText = ActiveDocument.Sections(S).Range.Paragraphs(1).Range.Text

        For P = 1 To ActiveDocument.Sections(S).Range.Paragraphs.Count
            ParaText = ActiveDocument.Sections(S).Range.Paragraphs(P).Range.Text
            If InStr(ParaText, "Estimated study time:") Then
                ActivityTimeStr = ParaText
                ActivityTimeStr = Replace(ActivityTimeStr, "Estimated study time: ", "")
                ActivityTimeStr = Replace(ActivityTimeStr, " minutes", "")
                ActivityTime = ActivityTime + CInt(ActivityTimeStr)
                SectionActivities = SectionActivities + 1
            End If
        Next

        If InStr(SectionText, "Section") = 1 Then
            OverallActivityTime = OverallActivityTime + ActivityTime
            Summary = Summary & vbCrLf & "SECTION SUMMARY" & vbCrLf _
            & "Subsections: " & SubsectionCnt & vbCrLf _
            & "Section Wordcount: " & SubsectionWordCnt & vbCrLf _
            & "Section Activity Time: " & ActivityTime & vbCrLf _
            & "Section Activity Count: " & SectionActivities & vbCrLf & vbCrLf
            SubsectionCnt = 0
            SubsectionWordCnt = 0
            ActivityTime = 0
            SectionActivities = 0
        End If

        Summary = Summary & "[Document Section " & S & "] " _
        & SectionText _
        & "Word count: " _
        & ActiveDocument.Sections(S).Range.Words.Count _
        & vbCrLf

        SubsectionCnt = SubsectionCnt + 1
        SubsectionWordCnt = SubsectionWordCnt + ActiveDocument.Sections(S).Range.Words.Count
    Next

    Summary = Summary & vbCrLf & vbCrLf & "Overall document wordcount: " & _
    ActiveDocument.Range.Words.Count

    Summary = Summary & vbCrLf & "Overall Activity Time: " & (OverallActivityTime + ActivityTime) & " minutes"
    MsgBox Summary

    Selection.Paragraphs(1).Range.InsertAfter vbCr & Summary & vbCrLf
End Sub

PS I’ve no idea what idiomatic VB is supposed to look like; all the examples I saw seemed universally horrible… If you can give me any pointers to cleaning the above code up, feel free to add them in the comments…

PPS Thinks… I guess each section could also return a readability score? Does VB have a readability score function? VB code anywhere implementing readability scores?
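For what it’s worth, a crude Flesch reading ease scorer is only a few lines of Python – the syllable count below is a rough vowel-group heuristic rather than a dictionary lookup, so treat the scores as indicative only:

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def flesch_reading_ease(text):
    # Flesch reading ease: higher scores mean easier text
    sentences = max(1, len(re.findall(r'[.!?]+', text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

score = flesch_reading_ease("The cat sat on the mat. The dog ran.")
print(round(score, 1))  # 117.7
```

Porting the arithmetic itself into a VB function should be straightforward; it’s the syllable counting that would be fiddly without regular expressions.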