From bkuhn at sfconservancy.org  Fri Oct 25 20:42:20 2024
From: bkuhn at sfconservancy.org (Bradley M. Kuhn)
Date: Fri, 25 Oct 2024 13:42:20 -0700
Subject: SFC Announces Aspirational Statement on LLM-backed generative AI
 for Programming:
 https://sfconservancy.org/news/2024/oct/25/aspirational-on-llm-generative-ai-programming/
Message-ID: <87sesj3oer.fsf@ebb.org>

I wanted to open up discussion, comments, complaints, inquiries, and anything
else folks here might have about the statement that SFC issued today.
It's included in full below so you don't have to go looking for it and/or
click through:

                  SFC Announces Aspirational Statement on
                  LLM-backed generative AI for Programming

Main URL:  https://sfconservancy.org/news/2024/oct/25/aspirational-on-llm-generative-ai-programming/
Fediverse: https://social.sfconservancy.org/notice/AnNIXuHBBEJUKN5I80

In 2022, Software Freedom Conservancy (SFC) [0] convened a committee in the
wake of Microsoft's GitHub Copilot announcement [1] to meet and begin
considering the complex questions that arise from the use of large language
models (LLMs) in generative AI systems that seek to assist software developers.

Today, we announce a joint statement by this committee [2], entitled
Machine-Learning-Assisted Programming that Respects User Freedom.

Everyone on our committee has watched as interest in this issue has grown in
the FOSS community. While the committee was initially convened to consider how
copyleft relates to these systems, our focus changed as we examined the
complex issues. With the unending influx of models, products, and projects in
this area, we began to see a potential dystopia: no systems available today are
reproducible by the public, and all of them seem to disrespect user rights and
freedoms in some manner. Rather than despair, we turned our minds to what FOSS
does best: imagining the ideal if corporate interests were not the primary
force defining society's relationship with software.

In the past, the FOSS community has responded to new challenges with a
race-to-the-bottom document that defines the bare minimum of user rights and
freedoms that the community of activists will accept. For-profit companies hope
to legitimately claim whatever they produce is "FOSS enough". As such, we have
avoided any process that effectively auto-endorses the problematic practices of
companies whose proprietary products are already widely deployed. No system,
particularly a proprietary one, should ever be "too big to fail".

While our proposal may seem unrealistic, nearly every proposal in the history
of FOSS has seemed unrealistic until it happened. We call on the FOSS
community to not lament what is, but to dream and strive for what can be. The
statement follows:

Machine-Learning-Assisted Programming that Respects User Freedom

There has been intense industry ballyhoo about a specific branch of Artificial
Intelligence (AI): generative AI backed by large language models (LLMs). We
have reached an era in computing history where input data sets for many
different types of works are quite large (after decades of Internet content
archiving), and hardware is powerful enough to rebuild LLMs repeatedly. As
FOSS (Free and Open Source Software) activists, we must turn at least a modicum
of attention to the matter, lest its future be dominated by the same
proprietary software companies that have curtailed user rights for so long.

LLM-backed generative AI impacts the rights of everyone, including developers,
creators, and users. Software freedom, both in theory and practice, yields
substantial public good. Yet, traditional, narrow FOSS analysis has boundaries
and confines; it's inadequate when applied to these technologies.

We propose an aspirational vision of a FOSS, LLM-backed generative AI system
for computer-assisted programming that software rights supporters would be
proud to use and improve.

This narrow approach is by design. We are keenly cognizant that LLMs have been
built for myriad kinds of works: visual art, the spoken human voice, music,
literature, and actors' performances. However, this document focuses on
systems that employ LLM-backed generative AI to assist programmers because such
systems have a critical role in the future of FOSS. While the long-term impact
of AI-based programming assistants on the daily life of programmers remains
unclear, it seems likely that AI assistants have the potential to advance FOSS
goals around the democratization of software development. For example, such
systems can help newcomers get started with unfamiliar codebases. We must look
hopefully to these technologies and seek ways to deploy them that help
everyone.

Aspirational Target for a Software-Rights-Respecting AI Assisted Programming
System

The ideal system for generative-AI-assisted programming should have the
following properties:

 1. The system is built using only FOSS, and is used only for the creation of
    FOSS, and never for proprietary software. In this manner, the system would
    propagate and improve interest in software freedom and rights.
 2. The system must respect the principle of "FOSS in, FOSS out, and FOSS
    throughout". In detail, this means:
  2(a). All software and generally useful technical information (including but
        not limited to: user interface code and applications for generating new
        material from the model, data-cleaning code, model architecture,
        hyperparameters, model weights, and the model itself) needed to create
        the system are freely available to the public under a FOSS license [3].
  2(b). All training data should be fully identified, and available freely and
        publicly on the Internet, under a FOSS license.
 3. The system will aid the user in adding necessary licensing notices and
    determining any licensing requirements [4] of the output.

As an aspirational document, this is intended to be neither prescriptive nor
definitional. We describe the absolute ideal LLM-backed generative AI system
for FOSS that we can imagine. Articulating the ideal paves the road to
understanding why common consensus remains insufficient. We must be the
change we want in the world, and strive for what is right until the
politically unviable becomes viable.

-------------------------------------------------------------------------------
References / Footnotes:

[0] https://sfconservancy.org/news/2022/feb/23/committee-ai-assisted-software-github-copilot/
[1] https://sfconservancy.org/blog/2022/feb/03/github-copilot-copyleft-gpl/
[2] https://sfconservancy.org/activities/aspirational-statement-on-llm-generative-ai-for-programming.html
[3] It is well established that FOSS activists consider it a moral imperative
    to share any generally useful technical information under a FOSS license.
    As such, we should not tolerate any portion of the software and generally
    useful technical information released under a license that is non-FOSS. 

[4] Since recitation (i.e., verbatim repetition of parts of the training set)
    is known to occur in these systems, we know they will occasionally output
    Works Based on the training set, so our ideal system would be capable of
    notifying the user that recitation occurred and of properly marking the
    licensing for it.
-- 
Bradley M. Kuhn - he/them
Policy Fellow & Hacker-in-Residence at Software Freedom Conservancy
========================================================================
Become a Conservancy Sustainer today: https://sfconservancy.org/sustainer