Displaying Wide pandas Dataframes Over Several, Narrower Dataframes

One of the problems associated with using Jupyter notebooks, and Jupyter Book, for publishing content that incorporates pandas dataframes is that wide dataframes are overflowed in the displayed page. In an HTML output, the wide dataframe is rendered via a scroll box; in a PDF output, the content is just lost over the edge of page.

Controls are available to limit the width of the displayed dataframe:

# Pandas dataframe display options
pd.options.display.show_dimensions=False
pd.options.display.width = 80
pd.options.display.max_colwidth = 10
pd.options.display.max_columns = 10

but sometimes you want to see all the columns.

The easiest way of rendering all the columns is to split the wide table over several smaller tables, each with a limited number of columns that do fit in the page width, but I havenlt spotted a simple way to do that (which is really odd, becuase I suspect this is a widely (?!) desired feature..

Anyway, here’s a first attempt at a simple function to do that:

#| echo: false
import pandas as pd

def split_wide_table(df, split=True,
                     hide=None, padding=1, width=None):
    """Split a wide table over multiple smaller tables."""
    # Note: column widths may be much larger than heading widths
    if hide:
        hide = [hide] if isinstance(hide, str) else hide
    else:
        hide = []
    cols = [c for c in df.columns if c not in hide]
    width = width or pd.options.display.width
    _cols = []
    _w = 0
    continues = False
    for col in cols:
        if (len("".join(_cols)) + (len(_cols)+1)*padding) < width:
            _cols.append(col)
        else:
            print("\n")
            display(df[_cols].style.hide(axis="index"))
            _cols = []
            continues = False
    if _cols:
        print("\n")
        display(df[_cols].style.hide(axis="index"))

Given a wide dataframe such as the following:

wdf = pd.DataFrame({f"col{x}":range(3) for x in range(25)})
wdf

We can instead render the table over multiple, narrower tables:

I havenlt really written any decorators before, but it also seems to me that this sort of approach might make decorative display sense…

For example, given the decorator:

def split_wide_display(func):
    """Split a wide table of multiple tables."""
    def inner(*args, **kwargs):
        resp = func(*args[:3])
        if (not "display" in kwargs) or \
            kwargs["display"]:
            # Optionally display the potentially wide
            # table as a set of narrower tables
            split_wide_table(resp, **kwargs)
        return resp
 
    return inner

we can decorate a function returns a wide dataframe to optionally display the dataframe over mulitple narrower tables:

# Use fastf1 Formula One data grabbing package
from fastf1 import ergast

@split_wide_display
def ergast_race_results(year, event, session):
    erd_race = ergast.fetch_results(year, event, session)
    df = pd.json_normalize(erd_race)
    return df

It seems really odd that I have to hack this solution together, and there isn’t a default way of doing this? Or maybe there is, and I just missed it? If so, please let me know via the comments etc…

PS the splitter is far from optimal, becuase it splits columns on the basis of the cumulative column header widths, rather than the larger of the column head or column content width. I’m not sure what the best way might be of finding the max displayed column content width?

PPS is there a Sphinx extension to magically handle splitting overflowing tables into narraoewer subtables somehow? And a LaTeX one?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: