In Build Your Own Learning Tools I described how I was inspired to try to build a simple colour palette explorer to complement an OpenLearn unit on art history that asked learners various questions about the colour themes used in particular paintings.
This is in part driven by the Jupyter notebook solutionism obsession that’s currently consuming me, where I’m trying to put together various demos that show how Jupyter notebooks might be used as an authoring tool to support the creation – and enhancement – of reproducible (which is also to say, easily modifiable) (open) educational resources.
Over the last couple of days, I’ve started looking at cltk, the Classical Language Toolkit, a classics counterpart to the nltk Python natural language toolkit. This provides access to a wide range of classical texts in a variety of languages, and text analysis tools to work with them. I’m struggling to find easy ways in to working with this package, so progress is a bit slower than I’d like in just familiarising myself with it, and I’m yet to look at it in the context of some actual Latin or Greek open teaching material (eg Discovering Ancient Greek and Latin or Getting started on classical Latin).
As I’ve been trying to familiarise myself with the package, I’ve been reflecting on things that may be helpful to me as a learner if I was trying to get started with reading Latin or Greek texts.
One thing would be accessing texts: cltk does have a wide range of corpuses available, but I’ve struggled to find any index files or metadata files to help me know what’s in them and how to retrieve them.
Apols for screenshots rather than code, and incomplete code screenshots at that. A link to the notebook is provided at the end of the post if you want the src.
Once you have found and loaded a text, it’s easy to search:
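In the absence of the screenshot, here is a rough, library-free sketch of the sort of whole-word search I mean. It uses only the Python standard library rather than anything from cltk, and the `find_word` name and the sample sentence are purely illustrative.

```python
import re

def find_word(text, word):
    """Return (offset, match) pairs for whole-word, case-insensitive hits."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]

# Illustrative sample text (opening of Caesar's Gallic War)
sample = "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."

# "in" is matched as a word, but not inside "incolunt"
hits = find_word(sample, "in")
print(hits)
```

The `\b` word boundaries are what stop the search matching inside longer words; drop them if you want substring hits too.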
You can also find concordances:
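A concordance is really just a keyword-in-context listing, so a minimal sketch is easy enough to put together by hand. This is not how cltk or nltk do it internally; the `concordance` function and its `width` parameter are hypothetical names of my own, and the tokenisation is a crude split on word characters.

```python
import re

def concordance(text, word, width=2):
    """Return keyword-in-context strings, `width` tokens either side of each hit."""
    tokens = re.findall(r"\w+", text)  # crude tokeniser: runs of word characters
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Illustrative sample text (Catullus 85)
sample = ("Odi et amo. Quare id faciam, fortasse requiris. "
          "Nescio, sed fieri sentio et excrucior.")

for line in concordance(sample, "et"):
    print(line)
```

Punctuation is thrown away by the tokeniser here, which is fine for eyeballing contexts but not if you need to map hits back to exact positions in the source text.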
There is a trained named entity tagger, although it seems to be a bit basic/ropey. It’s something to work with, though:
On my to do list generally is to learn How to Train your Own Model with NLTK and Stanford NER Tagger? (for English, French, German…).
Latin declensions are something I remember from my schooldays. Enumerating them may be handy when creating resources, and might also be useful to students. One thing I did find, though, was a lack of documentation about how to decode the person/tense information. Anyone, anyone? Really… Anyone? (I imagine present, maybe, but then what? Also, isn’t it conjugation for verbs, rather than declension?)
From a declension/conjugation/whatever it is, we can do a lookup, but to do this you need to know the root(?), and for it to be useful, you need to know how to read the grammar(?) code string:
So how do I properly decode those strings? Docs anywhere? Maybe even a simple py function that turns a string like v2pfia--- into words?
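One possibility, and it is only a guess that I have not found confirmed in the cltk docs, is that these strings follow the nine-position morphological tag layout used by the Perseus treebanks: part of speech, person, number, tense, mood, voice, gender, case, degree, with `-` meaning "not applicable". Under that assumption, a decoder might be sketched as below. The `FIELDS` table and the `decode_tag` function are my own hypothetical names, and only the Latin codes are listed.

```python
# ASSUMPTION: tags use the nine-position Perseus treebank layout;
# this is a guess at the scheme, not something taken from the cltk docs.
FIELDS = [
    ("part of speech", {"n": "noun", "v": "verb", "t": "participle",
                        "a": "adjective", "d": "adverb", "c": "conjunction",
                        "r": "preposition", "p": "pronoun", "m": "numeral",
                        "i": "interjection", "e": "exclamation",
                        "u": "punctuation"}),
    ("person", {"1": "first person", "2": "second person", "3": "third person"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("tense", {"p": "present", "i": "imperfect", "r": "perfect",
               "l": "pluperfect", "t": "future perfect", "f": "future"}),
    ("mood", {"i": "indicative", "s": "subjunctive", "n": "infinitive",
              "m": "imperative", "p": "participle", "d": "gerund",
              "g": "gerundive", "u": "supine"}),
    ("voice", {"a": "active", "p": "passive", "d": "deponent"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "b": "ablative", "v": "vocative",
              "l": "locative"}),
    ("degree", {"c": "comparative", "s": "superlative"}),
]

def decode_tag(tag):
    """Turn a nine-character tag into a comma-separated description."""
    if len(tag) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} characters, got {len(tag)}")
    parts = []
    for (name, codes), ch in zip(FIELDS, tag):
        if ch == "-":  # position not applicable to this form
            continue
        parts.append(codes.get(ch, f"unknown {name} code {ch!r}"))
    return ", ".join(parts)

print(decode_tag("v2pfia---"))
```

If the assumption holds, `v2pfia---` reads as a verb in the second person plural, future indicative active. Unrecognised codes are flagged rather than silently dropped, which should make it obvious fairly quickly if the guess about the scheme is wrong.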
Another issue I’m guessing students face when reading classical texts, and that educators must grapple with when teaching people how to read the texts in a prosodically meaningful way, is how to split the words into syllables and how to sound them out.
Splitting texts into syllables may help with this, and is also likely to be useful when working out the meter of a verse, for example?
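To give a feel for what syllable splitting involves, here is a deliberately naive, rule-based sketch. It is not the cltk syllabifier, and the rules are my own simplification: vowels and the diphthongs ae, au and oe are nuclei, a single consonant goes with the following vowel, a stop plus l or r stays together, and `qu` counts as one consonant. It ignores consonantal i and u, Greek digraphs such as ch, ph and th, and words like poeta where oe is not a diphthong, so treat it as a toy.

```python
import re

VOWELS = set("aeiouy")
DIPHTHONGS = ("ae", "au", "oe")   # simplification: other diphthongs ignored
STOPS = set("bcdgptf")            # first half of a stop + liquid cluster
LIQUIDS = set("lr")

def syllabify(word):
    """Naively split a Latin word into syllables (illustrative rules only)."""
    word = word.lower()
    # 1. Locate syllable nuclei as (start, end) spans
    nuclei, i = [], 0
    while i < len(word):
        after_q = word[i] == "u" and i > 0 and word[i - 1] == "q"
        if word[i] in VOWELS and not after_q:
            end = i + 2 if word[i:i + 2] in DIPHTHONGS else i + 1
            nuclei.append((i, end))
            i = end
        else:
            i += 1
    if not nuclei:
        return [word]
    # 2. Decide where to cut the consonant cluster between each pair of nuclei
    bounds = [0]
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        units = re.findall(r"qu|.", word[prev_end:next_start])
        if len(units) <= 1:
            cut = prev_end                      # lone consonant joins next vowel
        elif units[-2] in STOPS and units[-1] in LIQUIDS:
            cut = next_start - 2                # keep stop + liquid together
        else:
            cut = next_start - len(units[-1])   # last consonant joins next vowel
        bounds.append(cut)
    bounds.append(len(word))
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]

for w in ["arma", "virumque", "patris", "aqua"]:
    print(w, syllabify(w))
```

Even a toy like this shows why a learner might want the computer to do the splitting: the exceptions pile up quickly, and working out the meter of a verse depends on getting them right.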
Transliteration into a phonetic alphabet may also be useful, although this requires that the learner also knows how to read and sound out characters in that alphabet. (The one used in cltk, the IPA phonetic transliteration alphabet, differs from the one used in the OpenLearn texts I skimmed. Any idea what that one is called? Anyone, anyone? I’m not sure if cltk offers other alphabets, or how you’d train a system to use one.)
As I mentioned, I’ve been struggling to find many useful docs/tutorials for working with cltk, and haven’t managed to find any meaningful corpus metadata (or a recipe for building it in a standard way). If you can point me to anything useful, please do so via the comments…
You can find my current work in progress notebook on Azure notebooks: Getting Started With Notebooks/4.2.0 Classics.ipynb.