How Do We Prevent Large Language Models From Being Trained On Their Own Outputs?

I wondered about this over on the Twitterz…

@jordaneiter suggested: lang="en ai"

@haihaeppchen elaborated:

That should also make it possible to have a browser extension that marks those sections…

Theoretically, 'cite' could be misused for this. I also like the idea of adding an extension to language tags, e.g., lang="en-US-ai" (would probably need to supersede RFC 5646)
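To make the idea a bit more concrete, here is a rough sketch of how a scraper building a training corpus might skip marked-up sections. It assumes the purely hypothetical convention floated in the tweets above (an "ai" token in the lang attribute, written either as "en ai" or "en-US-ai"), which is not part of any current standard. The names is_ai_lang, AITextFilter and strip_ai_sections are made up for illustration, and the parser assumes reasonably well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_ai_lang(value):
    """Return True if a lang value carries the hypothetical 'ai' marker."""
    if not value:
        return False
    # Accept both spellings suggested in the tweets: "en ai" and "en-US-ai".
    parts = value.lower().replace(" ", "-").split("-")
    # The first part is the language itself, so only later parts count.
    return "ai" in parts[1:]


class AITextFilter(HTMLParser):
    """Collect text, skipping anything inside an element marked as AI-generated."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: is it inside marked content?
        self.kept = []   # text fragments found outside marked content

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self.stack[-1] if self.stack else False
        self.stack.append(inherited or is_ai_lang(dict(attrs).get("lang")))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not (self.stack and self.stack[-1]):
            self.kept.append(data)


def strip_ai_sections(html):
    """Return the page text with marked sections removed and whitespace collapsed."""
    parser = AITextFilter()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.kept).split())


page = '<p>Human text.</p><p lang="en-US-ai">Machine text.</p>'
print(strip_ai_sections(page))  # prints: Human text.
```

A browser extension that highlights the same sections would need the same test on the lang attribute, just run against the live page rather than a scraped copy. Of course, any of this only helps if publishers actually apply the marker honestly.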

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

