From Packages to Transformers and Pipelines

When I write code, I typically co-opt functions and algorithms I’ve pinched from elsewhere.

There are Python packages out there that are likely to do pretty much whatever you want, at least as a first draft, so to my mind, it makes sense to use them, and file issues (and occasionally PRs) if they don’t quite do what I want or expect, or if I find they aren’t working as advertised or as intended. (I’m also one of those people who emails webmasters to tell them their website is broken…)

But I’m starting to realise that there’s now a whole other class of off-the-shelf computational building blocks available in the form of AI pipelines. (For example, I’ve been using the whisper speech2text model for some time via a Python package.)
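As a rough sketch of the sort of thing I mean, here's how little code that involves — I'm assuming the openai-whisper package here, and the model name and audio file are just placeholders:

import whisper

# load a pre-trained Whisper model (the smaller ones run tolerably on CPU)
model = whisper.load_model("base")

# transcribe an audio file; the returned dict includes the full transcript text
result = model.transcribe("some_recording.mp3")
print(result["text"])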

For example, the Hugging Face huggingface/transformers package contains a whole raft of pre-trained AI models wrapped by simple, handy Python function calls.
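The pattern tends to be the same whatever the task: name the task, and the pipeline() function pulls down a default pre-trained model for it. A minimal sentiment analysis sketch, for instance (the default model is downloaded on first use, so the exact output depends on which model that happens to be):

from transformers import pipeline

# a sentiment-analysis pipeline backed by the task's default pre-trained model
classifier = pipeline("sentiment-analysis")

print(classifier("I love being able to reuse off-the-shelf AI building blocks."))
# something like: [{'label': 'POSITIVE', 'score': ...}]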

Consider a “table question answering” task: how would I go about creating a simple application to help me ask natural language queries over a tabular data set? A peek at the Hugging Face table question answering task page suggests using the table-question-answering pipeline with the google/tapas-base-finetuned-wtq model:

from transformers import pipeline
import pandas as pd

# prepare table + question
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
question = "how many movies does Leonardo Di Caprio have?"

# pipeline model
# Note: you may need to install torch-scatter first
tqa = pipeline(task="table-question-answering", model="google/tapas-base-finetuned-wtq")

# result
print(tqa(table=table, query=question)['cells'][0])
# 53
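It's also worth noting that the matched cells aren't the only thing the pipeline hands back — if I've read the docs right, printing the full result gives you the answer string, the cell co-ordinates and any aggregation operator applied, which can help when you're puzzling over why you got the answer you did:

# inspect the full response rather than just the matched cells
result = tqa(table=table, query=question)
print(result)
# a dict with keys along the lines of 'answer', 'coordinates', 'cells', 'aggregator'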

The task description also points to at least one demo application of the approach — Table Question Answering (TAPAS) — although when I tried it, I could only get it to return a single column in reply to the queries I posed to it…

…which is to say that the models / applications may not do quite what you want them to do. But as with a lot of found code, they can often help get you started, either as code you can revise, or as an example of an approach that does not do what you want and that you should avoid.

Now I’m wondering: are there other transformers-like packages out there? And should I be looking at getting myself a machine with a reasonable GPU so I can run this stuff locally… Or bite the bullet and start paying for AI APIs and on-demand GPU servers…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
