Converting Pandas Generated HTML Data Tables to PNG Images

Over the weekend, I noticed the Dakar 2019 rally was on, so I spent Sunday evening putting a scraper together to pull timing data down from the official website (notebook code here).

The data on its own is all a bit “so what?”; it only comes alive when you start playing with it. One of the displays I was still tinkering with at the end of last year’s WRC season was a tabular stage report that tries to capture a chunk of information from stage timing, such as split times, so it made sense to start riffing on that.

The Rally Dakar timing screen presents split times like this:

You can view either the running time in stage at each split / waypoint, or the gap, which is to say, the time difference to the person who was fastest at each split. (I think sometimes Gap times may report the time difference to the person who ranked first on the stage overall, rather than the gap to the person ranked first at a particular split.)
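As a rough sketch of the two readings of "gap" described above, assuming the running times have been loaded into a pandas dataframe with one row per driver and one column per waypoint (the driver names, waypoint labels and times here are invented for illustration, not scraped data):

```python
import pandas as pd

# Hypothetical running times (seconds into the stage) at three waypoints.
running = pd.DataFrame(
    {
        "WP1": [600.0, 590.0, 605.0],
        "WP2": [1250.0, 1260.0, 1245.0],
        "WP3": [1900.0, 1930.0, 1880.0],
    },
    index=["Driver A", "Driver B", "Driver C"],
)

# Reading 1: gap to whoever was fastest at each individual split.
# running.min() is the best time per waypoint; subtraction aligns on columns.
gap_to_split_leader = running - running.min()

# Reading 2: gap to the driver ranked first at the final waypoint.
stage_winner = running["WP3"].idxmin()
gap_to_stage_winner = running - running.loc[stage_winner]
```

Under the first reading no gap is ever negative; under the second, a driver who led at an early waypoint but did not win the stage shows a negative gap there.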

One of the things that interests me (from a data storytelling point of view) is how much time a driver gains, or loses, within a split compared to other drivers. We can use this to spot parts of the stage where a driver has hit a problem, or pushed hard.

The sort of display I’ve been working up looks, at least with the Dakar data, like this so far (there are a few columns missing compared to my WRC tables, but there’s also an extra one: the in-line bimodal sparkline chart).

This particular view displays split times rebased relative to Peterhansel (it’s easy enough to generate views rebased relative to any other specified driver). That is, the table shows how much time Peterhansel gained/lost relative to each other driver at each waypoint. The table is ordered by stage rank. The columns on the left show how much time was gained/lost going from one waypoint to the next. The columns on the right show how the gap relative to each driver evolved over the stage. The inline chart tracks the gap evolution.
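The rebasing step can be sketched along the following lines. This is an illustration rather than the notebook code: the `rebase()` helper, the driver names and the times are all made up, and the sign convention (positive means the other driver is behind the reference driver) is an assumption.

```python
import pandas as pd

# Hypothetical running times (seconds into the stage) at three waypoints.
running = pd.DataFrame(
    {
        "WP1": [600.0, 590.0, 605.0],
        "WP2": [1250.0, 1260.0, 1245.0],
        "WP3": [1900.0, 1930.0, 1880.0],
    },
    index=["Driver A", "Driver B", "Driver C"],
)

def rebase(times, driver):
    """Gap of each driver to `driver` at each waypoint.

    Assumed convention: positive means the other driver is behind `driver`.
    """
    return times - times.loc[driver]

# Evolution of the gap over the stage (the right-hand columns of the table).
gap = rebase(running, "Driver A")

# Time gained/lost going from one waypoint to the next (the left-hand columns).
within = gap.diff(axis=1)
# diff() leaves the first waypoint empty; the gain up to WP1 is just the gap at WP1.
within[gap.columns[0]] = gap[gap.columns[0]]
```

Swapping `"Driver A"` for any other index label rebases the whole table to that driver.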

The table is a styled pandas table, rendered as HTML. After applying styling, you can get a preview in a notebook using something of the form:

from IPython.display import display, HTML
# Styler.render() was removed in pandas 2.0; Styler.to_html() replaces it (pandas 1.3+)
display( HTML( df.style.to_html() ) )

I’ve previously posted a recipe for Grabbing Screenshots of folium Produced Choropleth Leaflet Maps from Python Code Using Selenium, so here’s the latest iteration of my code fragment (which builds on the previous example) for taking a chunk of HTML, using selenium to open it in a browser, and grabbing a screenshot of it.

The code is h/t to several Stack Overflow posts.

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

#Via https://stackoverflow.com/a/52572919/454773
def setup_screenshot(driver,path):
    ''' Grab screenshot of browser rendered HTML.
        Ensure the browser is sized to display all the HTML content. '''
    original_size = driver.get_window_size()
    # Resize the window to the full document size so nothing is cropped
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # find_element_by_tag_name() was removed in Selenium 4.3; use find_element(By.TAG_NAME, ...)
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    # Put the window back to its original size
    driver.set_window_size(original_size['width'], original_size['height'])

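def getTableImage(url, fn='dummy_table', basepath='.', path='.', delay=5, height=420, width=800):
    ''' Render HTML file in browser and grab a screenshot.
        Note: height and width are currently unused; setup_screenshot()
        sizes the browser window to fit the content. '''
    imgpath='{}/{}.png'.format(path,fn)
    imgfn = '{}/{}'.format(basepath, imgpath)
    imgfile = '{}/{}'.format(os.getcwd(),imgfn)

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        #Give the html some time to load
        time.sleep(delay)
        setup_screenshot(browser,imgfile)
    finally:
        #Always close the browser, even if the screenshot fails
        browser.quit()

    #Tidy up the temporary HTML file written by getTablePNG(), if there is one
    htmlfile = imgfile.replace('.png','.html')
    if os.path.exists(htmlfile):
        os.remove(htmlfile)
    #print(imgfn)
    return imgpath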

def getTablePNG(tablehtml, basepath='.', path='testpng', fnstub='testhtml'):
    ''' Save HTML table as: {basepath}/{path}/{fnstub}.png '''
    #Create {basepath}/{path} if it doesn't already exist
    #(previously the check was on path alone, but the directory created was basepath/path)
    os.makedirs('{}/{}'.format(basepath, path), exist_ok=True)
    fn='{cwd}/{basepath}/{path}/{fn}.html'.format(cwd=os.getcwd(), basepath=basepath, path=path,fn=fnstub)
    tmpurl='file://{fn}'.format(fn=fn)
    #Write the HTML to a temporary file the browser can open
    with open(fn, 'w') as out:
        out.write(tablehtml)
    return getTableImage(tmpurl, fnstub, basepath, path)

#call as: getTablePNG(s)
#where s is a string containing html, eg s = df.style.to_html()
#(df.style.render() in pandas versions before 2.0)
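The functions above build file paths and the file:// URL by string formatting against the current working directory. As a possible alternative, and only a sketch, the same paths could be put together with pathlib, which also handles the URL encoding. The `html_and_png_paths()` helper is hypothetical; its default arguments simply mirror those of `getTablePNG()`.

```python
from pathlib import Path

def html_and_png_paths(basepath=".", path="testpng", fnstub="testhtml"):
    """Return (html_path, png_path, file_url) for a temporary HTML file
    and the screenshot that will sit alongside it.

    Nothing is created on disk; the caller still has to make the directory.
    """
    # resolve() makes the path absolute, which as_uri() requires.
    outdir = (Path(basepath) / path).resolve()
    html_path = outdir / "{}.html".format(fnstub)
    png_path = html_path.with_suffix(".png")
    return html_path, png_path, html_path.as_uri()

html_path, png_path, url = html_and_png_paths()
```

Here `url` is something that could be handed to `browser.get()`, and `str(png_path)` to the screenshot call. Using `with_suffix()` also avoids the edge case where a string `replace('.png','.html')` matches part of a directory name.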

The screenshot is saved as a PNG file that can be embedded in other HTML pages, shared via soshul meeja, etc…

PS here’s what looks to be another route (though I haven’t tried it yet..) using imgkit with xvfb in a headless server environment: https://github.com/fomightez/dataframe2img

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
