Information held vs “information” generated

Noting that noyb.eu (“none of your business”; an information rights organisation) has filed a complaint in Austria against OpenAI: ChatGPT provides false information about people, and OpenAI can’t correct it.

So I wonder: if you make a subject access request under data protection law for everything a company holds about you in its databases, the company can look that data up.

But what if you ask the company what it “believes” about you, i.e. what would be returned if you searched any of its information systems, assuming that genAI models are classed as “information systems” (if not, what are they classed as)?

If the model has been trained on personal information, then that information has influenced the model weights, and the model may return elements of that information through a statistical process. If you ask a model what it “knows” about a particular person, could you argue that what it returns is what that organisation believes about you, and is therefore subject to personal information subject access requests? If the process is a statistical one, how can it return with any degree of confidence what that “information” is? And how can it correct it?
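The “statistical process” point can be made concrete with a toy sketch. This is not a real language model: the names, statements, and probabilities below are all invented for illustration. The point is that a generative model’s “knowledge” of a person behaves like a probability distribution over statements, not a stored record, so repeated queries can yield different “beliefs”.

```python
import random

# Hypothetical distribution over statements a model might generate about
# a person. All entries and weights are invented for illustration.
STATEMENT_DISTRIBUTION = {
    "works at Acme Corp": 0.5,
    "works at Apex Ltd": 0.3,       # plausible-sounding but false
    "is a freelance journalist": 0.2,
}

def generate_claim(rng: random.Random) -> str:
    """Sample one generated 'belief' about the person."""
    statements = list(STATEMENT_DISTRIBUTION)
    weights = list(STATEMENT_DISTRIBUTION.values())
    return rng.choices(statements, weights=weights, k=1)[0]

rng = random.Random()
# Ask the "model" the same question five times: the answer is a sample,
# so it can differ from run to run.
answers = [generate_claim(rng) for _ in range(5)]
print(answers)
```

Under this framing, “what the model knows about me” has no single canonical answer to disclose or correct, only a distribution you could at best characterise.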

How responsible is a company for any generated statements a model may make about you if a model operated by the company:

  • has been trained from the ground up on the company’s data?
  • is the result of a third party model having been further trained or fine-tuned on the company’s data?
  • is purely a third party model?

If a company operates a retrieval augmented generation (RAG) process where your (actual) data is “interpreted” and returned after processing through a model as generated text, is that generated text what the company believes about you?
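A minimal sketch of a RAG flow helps separate the two artefacts the question turns on: the retrieved record (data the company actually holds) and the generated text (a model’s paraphrase of it). All record fields and function names here are hypothetical, and the “model” is a stand-in string template rather than a real LLM call.

```python
# Hypothetical held data: plainly within scope of a subject access request.
RECORDS = {
    "jane.doe@example.com": {"segment": "frequent-buyer", "region": "AT"},
}

def retrieve(subject_id: str) -> dict:
    """Step 1: look up the data the company actually holds."""
    return RECORDS.get(subject_id, {})

def generate(record: dict) -> str:
    """Step 2: stand-in for an LLM that paraphrases the retrieved record.
    A real model could add, omit, or distort details at this step."""
    if not record:
        return "No information is held about this person."
    return (f"This person appears to be a {record['segment']} "
            f"based in {record['region']}.")

record = retrieve("jane.doe@example.com")
summary = generate(record)  # is *this* text what the company "believes"?
print(record)   # the held data
print(summary)  # the generated gloss on it
```

The held record is clearly “data held”; the open question is the status of the generated gloss, which exists only transiently at query time.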

If a marketing company has a database that puts me into a particular labelled demographic group, can I request to know what those groups and labels are?

If someone in a company looks me up and a conversational AI labels me based on its training and on my data (e.g. provided via a RAG mechanism), am I allowed to request that information, e.g. via chat logs? But what about the next time someone in the company asks a corporate chat UI about me: what will it say then? Can I ask for everything the system might ever say about me, along with probabilities for each response?!

What is the status, in GDPR terms, of “generated information” (is it even information)? How does “generated information” relate to information (or data) “held” about me?

For a person using a user interface, how is “generated information” distinguished from retrieved information? In each case, to what extent might that information be said to be what the company “believes” about me?

PS see also Appropriate/ing Knowledge and Belief Tools?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
