Generating HTML Reports from Excel Metadata Sheets and Coded Columns

Head. Cold. Bleurghh. Stuff.

Specifically, Local authority housing statistics data returns for 2017 to 2018.

Downloads are Excel and CSV:

Excel looks like this:

So — lots of sheets, multiheaded columns, one row per authority, no end-user-app sheet that lets you easily pull out data for a particular LA that I could see?

CSV is one row per LA, in a coded wide format — so how do we decode the cols?

There are guidance notes (Completing local authority housing statistics 2017 to 2018: guidance notes and bulk upload) in the form of another Excel document that has what looks like a form layout for each sheet in the report:

If you’re an Excel guru, can you render the wide CSV with coded columns through the sheets in the bulk upload guidance doc?

Here’s what I did…

Imported pandas, a Python package for working with tabular data, and loaded in CSV data, and searched for the Isle of Wight row…

You could probably guess at how to search for the single data row concerning an LA of interest to you.

Next up, read in the bulk upload guidance spreadsheet. Each sheet helps us decode what the code refers to, but we can also use each sheet as a search and replace display surface…

Tidy it up by removing blank rows and columns, then preview the sheet names…

We can generate a simple metadata collection. You don’t really need to make sense of this, but you can probably figure out bits if you look at the code, and the output, in association with some of the sheets. Things in square brackets are index values. Indexes start at 0. The .columns method returns a list of column names on a pandas dataframe.

If we preview one of the tidied bulk uopload sheets, we see how we can use it as a display surface for a laid out report if we swap in the values from the code headed CSV dataset:

I can use the dict generated from the single row of CSV data I’m interested in as the basis for a search and replace on lots of values… The .fillna('') method tidies up the display by removing the NaN values:

If you’re that keen, you can play around with pulling out bits of the guidance dataframe into a smaller, tidier data table. The loc[X,Y] method lets you identify rows and columns you want to grab. The : and [] stuff looks difficult but 10 minutes practice and the pain’ll be over.

We can also generate simple reports around the data if we know the keys. For example:

This is the sort of thing I could see different individuals in a loose affiliation of local news / reporting outlets sharing the work around, writing different bits of boilerplate for their reporting that other hyperlocals can make use of.

Yes, it’s code, but it’s not that hard — you can probably figure out from the above how to write your own bit of templated stuff. And the point about it being code is that hundreds of developer hours aren’t spent writing user interfaces that don’t work very well and that have to cater to a wide range of users.

If you want to play along, you can launch a demo Jupyter notebook (look ’em up) using MyBinder (look it up) by clicking the button…:

Binder

Or if you prefer, here’s the original repo.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

One thought on “Generating HTML Reports from Excel Metadata Sheets and Coded Columns”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.