A week or two ago, I bought a couple of ready-made print-on-demand public domain volumes that collected together all of Andrew Lang’s coloured Fairy Books. There are no contents lists and no index, but the volumes didn’t cost that much more than printing them on demand myself, and they saved me the immediate hassle of compiling my own PDFs.
But… there’s too much to skim through if you’re trying to find a particular story. So I started to wonder about creating a simple full-text search tool for the stories. A first attempt, which scrapes the story texts from sacred-texts.com, can be accessed here, but it’s in pretty raw form – essentially a SQL query interface published via GitHub Pages and running against a db in the repo. (The query interface is powered by SQLite compiled to WASM and running in the browser, a trick I discovered several years ago… I’m still waiting for datasette in the browser! ;-))
Anyway… the code for the scraper and the db constructor is in the repo, with an earlier version available as a gist. And of course, the query UI is available here. The scraper and sample db queries took maybe a couple of hours to pull together in all, and then another half hour today to set the repo up with the SQL GUI and write this blog post…
Note to self – the db is intended to run as a full-text searchable db behind a neater user interface, ideally with some sensible facet-based search options, with facet elements identified using entity extraction etc. (I think I need to start working on my own “fairy story entities” model too…)
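For anyone wondering what the full-text plus facet idea might look like at the SQLite level, here’s a minimal sketch using SQLite’s FTS5 extension from Python’s `sqlite3` module. It assumes your SQLite build includes FTS5 (most do), and the `stories` table, its `book`/`title`/`text` columns and the sample rows are all hypothetical placeholders rather than the schema or data in the repo. The “facet” here is just a per-book count over the matching stories; entity-based facets would need extra columns or tables.

```python
import sqlite3

# Placeholder rows standing in for scraped stories: (book, title, text).
# These are dummy values, not data from the real db.
ROWS = [
    ("Blue Fairy Book", "Story one", "placeholder text about a glass slipper and a ball"),
    ("Red Fairy Book", "Story two", "placeholder text about a spinning wheel and gold"),
    ("Blue Fairy Book", "Story three", "placeholder text about a wolf and a glass mountain"),
]


def build_db(rows, path=":memory:"):
    """Create an FTS5 full-text index over story titles and texts."""
    conn = sqlite3.connect(path)
    # 'book' is UNINDEXED: stored with each story for faceting, but not tokenised.
    conn.execute("CREATE VIRTUAL TABLE stories USING fts5(book UNINDEXED, title, text)")
    conn.executemany("INSERT INTO stories (book, title, text) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, query):
    """Return (book, title, snippet) rows, best bm25 match first."""
    sql = """
        SELECT book, title, snippet(stories, 2, '[', ']', '...', 8)
        FROM stories
        WHERE stories MATCH ?
        ORDER BY bm25(stories)
    """
    return conn.execute(sql, (query,)).fetchall()


def book_facets(conn, query):
    """Count matching stories per book: a simple facet over the search results."""
    sql = """
        SELECT book, COUNT(*) FROM stories
        WHERE stories MATCH ?
        GROUP BY book
        ORDER BY COUNT(*) DESC, book
    """
    return conn.execute(sql, (query,)).fetchall()


if __name__ == "__main__":
    conn = build_db(ROWS)
    for row in search(conn, "glass"):
        print(row)
    print(book_facets(conn, "placeholder"))
```

The snippet column (index 2, the `text` column) wraps matched terms in square brackets, which is handy for showing hits in a results list; bm25 scores are lower for better matches, hence the ascending sort.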
Also on the to do list:
- annotate stories with Aarne-Thompson or Aarne-Thompson-Uther (ATU) story classification codes (is there a dataset keyed on story title that I can pull in to do this? There is one for Grimm here. There’s a motif index here.);
- pull out first lines as a separate item;
- explore generating book indexes based on a hacky pagination estimate;
- put together a fairy story entity model.
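The first-lines and pagination items could both start out very hackily. Here’s a rough sketch, assuming each story’s text is available in book order as `(title, text)` pairs; `first_line`, `estimate_index` and the `WORDS_PER_PAGE` figure are all hypothetical, and the words-per-page value is a guess that would need calibrating against a few known page numbers in the printed volumes.

```python
import re

# Assumed average words per printed page; a guess, to be calibrated by hand.
WORDS_PER_PAGE = 400

# First sentence: shortest run ending in . ! or ? (optionally followed by a
# closing quote) that is followed by whitespace or the end of the paragraph.
# Abbreviations such as "Mr." will fool it.
SENTENCE_END = re.compile(r"(.+?[.!?][\"'”’]?)(?=\s|$)")


def first_line(text):
    """Return the first sentence of the first non-empty paragraph."""
    for para in text.split("\n"):
        para = para.strip()
        if para:
            m = SENTENCE_END.match(para)
            return m.group(1) if m else para
    return ""


def estimate_index(stories, start_page=1, words_per_page=WORDS_PER_PAGE):
    """Estimate the start page of each (title, text) pair, in book order."""
    index = []
    words_so_far = 0
    for title, text in stories:
        # A story starts on whichever page the running word count has reached.
        index.append((title, start_page + words_so_far // words_per_page))
        words_so_far += len(text.split())
    return index
```

Sorting the `estimate_index` output by title would then give a crude alphabetical index; it ignores illustrations, headings and page breaks between stories, so the page numbers are approximate at best.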
It’d also be really interesting to come up with a way of tagging each record with a story structure / morphology (e.g. the Propp morphology of each story), so that I could easily search for stories with different structure types.
PS (suggested via @TarkabarkaHolgy / @OnlineCrsLady): also add a link to the Wikipedia page for each story (thinks: that should be easy enough to at least partially automate…)
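As a first pass at that partial automation, one option is to generate a *candidate* English Wikipedia URL from each story title. The sketch below is only a heuristic: it assumes titles of the form “X, or Y” should be cut back to “X”, and it relies on the usual MediaWiki convention of underscores for spaces in page URLs. The function names are hypothetical, and every generated link would still need checking, since many story titles won’t have a page of their own. The same normalised title could also serve as the join key for an ATU lookup, if a suitable title-keyed dataset turns up.

```python
import re
from urllib.parse import quote


def normalise_title(title):
    """Tidy a story title for matching (heuristic, not exhaustive)."""
    # Drop alternative titles of the form "X, or Y" (an assumption about Lang's titling).
    title = re.split(r",\s+or\s+", title, maxsplit=1, flags=re.IGNORECASE)[0]
    # Collapse runs of whitespace and strip stray surrounding punctuation.
    return re.sub(r"\s+", " ", title).strip(" .,;:")


def candidate_wikipedia_url(title):
    """Build a guessed English Wikipedia URL; it may not exist."""
    page = normalise_title(title).replace(" ", "_")
    # Percent-encode anything else that isn't URL-safe (e.g. apostrophes).
    return "https://en.wikipedia.org/wiki/" + quote(page)


if __name__ == "__main__":
    print(candidate_wikipedia_url("Cinderella, or the Little Glass Slipper"))
```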