When I write code, I typically build on functions and algorithms I’ve pinched from elsewhere.
There are Python packages out there that will likely do pretty much whatever you want, at least as a first draft, so to my mind it makes sense to use them. I file issues (and occasionally PRs) if they don’t quite do what I want or expect, or if I find they aren’t working as advertised or as intended. (I’m also one of those people who emails webmasters to tell them their website is broken…)
But I’m starting to realise that there’s now a whole other class of off-the-shelf computational building blocks available in the form of AI pipelines. (For example, I’ve been using the whisper speech-to-text model for some time via a Python package.)
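For reference, the whisper workflow I have in mind looks something like the following. This is a minimal sketch of my own, assuming the openai-whisper package; the audio file name is purely illustrative:

```python
def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with OpenAI's Whisper model."""
    # Deferred import so the sketch parses even without the package installed
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)  # weights are downloaded on first run
    result = model.transcribe(audio_path)
    return result["text"]


if __name__ == "__main__":
    # Illustrative file name; substitute your own recording
    print(transcribe("lecture.mp3"))
```

The same task can also be run through the Hugging Face transformers pipeline machinery described below, so the pattern is much the same either way.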
The Hugging Face huggingface/transformers package, for example, contains a whole raft of pre-trained AI models wrapped in simple, handy Python function calls.
Consider a “table question answering” task, for example: how would I go about creating a simple application that lets me ask natural language queries over a tabular data set? A peek at the Hugging Face table question answering task page suggests using the table-question-answering pipeline with the google/tapas-base-finetuned-wtq model:
from transformers import pipeline
import pandas as pd

# prepare the table and the question
data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
}
table = pd.DataFrame.from_dict(data)
question = "how many movies does Leonardo Di Caprio have?"

# load the pipeline model
# Note: you must install torch-scatter first.
tqa = pipeline(task="table-question-answering", model="google/tapas-base-finetuned-wtq")

# result
print(tqa(table=table, query=question)["cells"][0])
# 53
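As a sanity check (my own addition, not part of the Hugging Face example), the expected answer can be verified with a plain, non-AI pandas lookup, since in this toy case the answer is just a single cell in the table:

```python
import pandas as pd

# The same toy table used with the TAPAS pipeline above
data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
}
table = pd.DataFrame.from_dict(data)

# Filter the row by actor name, then read the cell
answer = table.loc[table["Actors"] == "Leonardo Di Caprio", "Number of movies"].iloc[0]
print(answer)  # 53
```

Of course, the whole point of the TAPAS model is that it works when the question can’t be translated into a lookup by hand; this is just a way of checking the pipeline’s output on a known case.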
The task description also points to at least one application demonstrating the approach — Table Question Answering (TAPAS) — although when I tried it, I could only get it to return a single column in reply to the queries I posed…
…which is to say, the models / applications may not do quite what you want them to do. But as with a lot of found code, they can often help get you started, either with some code you can revise, or as an example of an approach that does not do what you want and that you should avoid.
Now I’m wondering: are there other transformers-like packages out there? And should I be looking at getting myself a machine with a reasonable GPU so I can run this stuff locally? Or bite the bullet and start paying for AI APIs and on-demand GPU servers…