One of the magics we use in the TM351 Jupyter notebooks is the ipython-sql magic that lets you create a connection to a database server (in our case, a PostgreSQL database) and then run queries on it:
Whilst we try to use consistent code styling across the notebooks, such as capitalisation of SQL reserved words (SELECT
, FROM
, WHERE
etc), sometimes inconsistencies can crop in. (The same is true when formatting Python code.)
One of the notebook extensions can help in this respect: code prettifier
. This extension allows you to style one or all code cells in a notebook using a templated recipe:
The following snippet applies to Python cells and will apply the yapf
Python code formatter to Python code cells by default, or the sqlparse
SQL code formatter if the cell starts with a %%sql
block magic. (It needs a bit more work to cope with %sql
line magic, in which case the Python formatter needs to be applied first and then the SQL formatter applied from the start of the %sql
line magic to the end of the line.
#"library":''' import json def code_reformat(cell_text): import yapf.yapflib.yapf_api import sqlparse import re comment = '--%--' if cell_text.startswith('%%sql') else '#%#' cell_text = re.sub('^%', comment, cell_text, flags=re.M) reformated_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper') return re.sub('^{}'.format(comment), '%', reformated_text, flags=re.M) #''', #"prefix":"print(json.dumps(code_reformat(u", #"postfix": ")))"
Or as a string:
"python": {\n"library": "import json\ndef code_reformat(cell_text):\n import yapf.yapflib.yapf_api\n import sqlparse\n import re\n comment = '--%--' if cell_text.startswith('%%sql') else '#%#'\n cell_text = re.sub('^%', comment, cell_text, flags=re.M)\n reformated_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper')\n return re.sub('^{}'.format(comment), '%', reformated_text, flags=re.M)", "prefix": "print(json.dumps(code_reformat(u", "postfix": ")))"\n}
On my to do list is to find a way of running the code prettifier over notebooks from the command line using nbconvert
. If you have a snippet that shows how to do that, please share via the comments:-)
PS clunky, but this sort of handles the line magic?
import json def code_reformat(cell_text): import yapf.yapflib.yapf_api import sqlparse import re def sqlmatch(match): return '%sql'+sqlparse.format(match.group(1), keyword_case='upper') comment = '--%--' if cell_text.startswith('%%sql') else '#%#' cell_text = re.sub('^%', comment, cell_text, flags=re.M) reformatted_text = yapf.yapflib.yapf_api.FormatCode(cell_text)[0] if comment=='#%#' else sqlparse.format(cell_text, keyword_case='upper', identifier_case='upper') reformatted_text = re.sub('^{}'.format(comment), '%', reformatted_text, flags=re.M) if not cell_text.startswith('%%sql'): reformatted_text=re.sub('%sql(.*)',sqlmatch , reformatted_text, re.MULTILINE) return reformatted_text
NB the sqlparse
function doesn’t seem to handle functions (eg count(*)
) [bug?] but this horrible workaround hack to substitute for sqlparse.format()
may provide a stop gap?
#replace sqlparse.format with sqlhackformat() defined as follows: def sqlhackformat(sql): #Escape the brackets, parse, then unescape the brackets return re.sub(r'\\(.)', r'\1', sqlparse.format(re.escape(sql), keyword_case='upper', identifier_case= 'upper'))
If there is no semi-colon at the end of the final statement, we could handle any extra white space and add one with something like: s = '{};'.format(s.strip()) if not s.strip().endswith(';') else s.strip()
.