Should there be a clause for AI?

Vasileios Valatsos me at aethrvmn.gr
Fri Jul 11 07:49:35 UTC 2025


I apologize in advance for opening what is, in my view, a giant can of 
worms.

Normally, when you use any software under a copyleft license, you must 
disclose any modifications and release them under that same license. 
Recently, however, with the training of language models and other 
generative methods, there is a laundering effect: the end user is 
"handed" copyleft-licensed code generated by the model, which 
effectively bypasses the copyleft clause. To my understanding (I am not 
a USA citizen), this is because, in the USA, the products of non-humans 
are not copyrightable (or copyleftable). At the same time, the capacity 
of machine learning models to output copyleft code *verbatim* is very 
uncomfortable; a picture taken by a monkey isn't copyrightable, or 
attributable to me, but when I purposefully train monkeys to take 
pictures of people, there must be some liability.

Adding a clause to the effect of "copyleft must be preserved even after 
being generated by non-human methods" is obviously unreasonable: one 
can't expect the end user of a generative model to look up the license 
for each generated line, especially since the recent focus is on 
generating entire app suites or frameworks in an "agentic" fashion 
(i.e. you create a framework where models attempt to coordinate among 
themselves to better solve the given task), better known as "vibe 
coding".

At the same time, the argument from the developers of machine learning 
methods capable of outputting verbatim copyleft-licensed code is that 
the model itself isn't a derivative of the code, but rather that the 
code has been embedded into the "semantic understanding" of the model 
(as in, it *has* affected the model parameters, but the model would 
still exist without it), and therefore the model itself doesn't fall 
under copyleft. In other words, the fact that the copyleft code was 
included in the training dataset rather than the codebase means that 
there is nothing for the copyleft to apply to.

To this end, I personally use a second license, specifically for 
projects used as training data:

---
Training Public License (TPL) v1.0

Copyright © 2025

This code and content is licensed under the GPLv3 or later, with the 
following special condition:

If you use any part of this code, notes, or data for training, 
fine-tuning, or evaluating a machine learning system (including but not 
limited to neural networks, large language models, or any algorithm 
where this content influences the resulting system), you must release 
all resulting models, weights, and related code under the GPLv3 or later.

All other uses are governed by the regular terms of the GPLv3 or later.
---

I wonder (a) whether this would make sense in copyleft-next, (b) 
whether it even belongs in the conversation, and (c) whether there is a 
better way to tackle this.

It does feel like a nuclear option, despite adhering perfectly to the 
spirit of copyleft. Then again, you could say that copyleft itself is a 
nuclear option.
