Styling Python and SQL Code in Jupyter Notebooks

One of the magics we use in the TM351 Jupyter notebooks is the ipython-sql magic that lets you create a connection to a database server (in our case, a PostgreSQL database) and then run queries on it:

Whilst we try to use consistent code styling across the notebooks, such as capitalisation of SQL reserved words (SELECT, FROM, WHERE etc), sometimes inconsistencies can crop in. (The same is true when formatting Python code.)

One of the notebook extensions can help in this respect: code prettifier. This extension allows you to style one or all code cells in a notebook using a templated recipe:

The following snippet applies to Python cells and will apply the yapf Python code formatter to Python code cells by default, or the sqlparse SQL code formatter if the cell starts with a %%sql block magic. (It needs a bit more work to cope with %sql line magic, in which case the Python formatter needs to be applied first and then the SQL formatter applied from the start of the %sql line magic to the end of the line.

#"library":'''
import json
def code_reformat(cell_text):
    import yapf.yapflib.yapf_api
    import sqlparse
    import re
    comment = '--%--' if cell_text.startswith('%%sql') else '#%#'
    cell_text = re.sub('^%', comment, cell_text, flags=re.M)
    reformated_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper')
    return re.sub('^{}'.format(comment), '%', reformated_text, flags=re.M)
#''',
#"prefix":"print(json.dumps(code_reformat(u",
#"postfix": ")))"

Or as a string:

"python": {\n"library": "import json\ndef code_reformat(cell_text):\n import yapf.yapflib.yapf_api\n import sqlparse\n import re\n comment = '--%--' if cell_text.startswith('%%sql') else '#%#'\n cell_text = re.sub('^%', comment, cell_text, flags=re.M)\n reformated_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper')\n return re.sub('^{}'.format(comment), '%', reformated_text, flags=re.M)",
"prefix": "print(json.dumps(code_reformat(u",
"postfix": ")))"\n}

On my to do list is to find a way of running the code prettifier over notebooks from the command line using nbconvert. If you have a snippet that shows how to do that, please share via the comments:-)

PS clunky, but this sort of handles the line magic?

import json

def code_reformat(cell_text):
    import yapf.yapflib.yapf_api
    import sqlparse
    import re

    def sqlmatch(match):
        return '%sql'+sqlparse.format(match.group(1), keyword_case='upper')

    comment = '--%--' if cell_text.startswith('%%sql') else '#%#'
    cell_text = re.sub('^%', comment, cell_text, flags=re.M)
    reformatted_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper', identifier_case='upper')
    reformatted_text = re.sub('^{}'.format(comment), '%', reformatted_text, flags=re.M)
    if not cell_text.startswith('%%sql'):
        reformatted_text=re.sub('%sql(.*)',sqlmatch , reformatted_text, re.MULTILINE)
    return reformatted_text

NB the sqlparse function doesn’t seem to handle functions (eg count(*)) [bug?] but this horrible workaround hack to substitute for sqlparse.format() may provide a stop gap?

#replace sqlparse.format with sqlhackformat() defined as follows:
def sqlhackformat(sql):
    #Escape the brackets, parse, then unescape the brackets
    return re.sub(r'\\(.)', r'\1', sqlparse.format(re.escape(sql), keyword_case='upper', identifier_case= 'upper'))

If there is no semi-colon at the end of the final statement, we could handle any extra white space and add one with something like: s  = '{};'.format(s.strip()) if not s.strip().endswith(';') else s.strip().

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: