Archive for February, 2023

2023/02/23

The US Copyright Office takes a position!

On art made with AI tools, that is. Reuters story here, actual letter from the Office lawyer here.

I haven’t read the whole letter in detail yet (it’s long!) but I’ve looked it over and have Initial Thoughts:

[Image: Large furry purple aliens are upset about the confusing Copyright Office memo. Some of their quaint buildings are in the background.]
  • I don’t think there’s a fact-of-the-matter here, about what is copyrightable when. There are legal theories that make more and less sense, that are more and less consistent with other established theories, and so on. But these are not theories that try to model something in the real world, like the Theory of Relativity; they are more theories in the sense of Set Theory. So the Office can’t really be right or wrong here overall, but they can have made a more or less sensible decision.
  • The overall finding of the memo is that Kristina Kashtanova still has a copyright on Zarya of the Dawn, but only on the text, and “the selection, coordination, and arrangement of the Work’s written and visual elements”, not on the visual elements themselves (i.e. the images made with Midjourney), because those images don’t involve “sufficient creative input or intervention from a human author.”
  • This seems wrong to me; as other places in the document point out, the case law says that “only a modicum of creativity is necessary”, and there is certainly a modicum of creativity in prompt design and engine usage.
  • The argument here seems to be, not that there isn’t enough creativity in the prompts and flags and so on, but that the connection between the artist’s input and the image output isn’t strong enough. The memo says things like ‘Rather than a tool that Ms. Kashtanova controlled and guided to reach her desired image, Midjourney generates images in an unpredictable way. Accordingly, Midjourney users are not the “authors” for copyright purposes of the images the technology generates.’
    • But where is the existing doctrine that says anything about predictability? Jackson Pollock might like a word, and the creator of any other roughly uncontrolled or algorithmic or found-object work. The theory here seems to be that Midjourney prompts are just suggestions or ideas, and those can’t be copyrighted. Does that mean that since Pollock just had the idea of splashing paint onto canvas, and the unpredictable physics of the paint cans and the air produced the actual work, that “Autumn Rhythm” can’t be copyrighted? Or are they going to hold that there is a legal significance to the fact that the detailed movements of his arm muscles were involved? That seems dicey.
    • For the Office to grant that the prompts and other input did contain at least a modicum of creativity (which seems undeniable), but to hold that that input wasn’t strongly enough connected to the output, seems to amount to inventing a new legal test; and it’s not at all clear to me that the Office can do that on its own hook, can it?
    • This memo may be specifically designed to be contested, so that the question can go to a court that can do that kind of thing.
  • The memo may have interesting consequences for Thaler, in particular the cases in which Thaler attempted to claim copyright under work-for-hire theory, with his software as the creator. The memo explicitly makes the comparison with human work-for-hire, saying that if someone had given the same instructions to a human artist that are contained in a Midjourney prompt, and the human artist had made an image, then the person giving the instructions would not have been the creator unless work-for-hire applies (the human carrying out the instructions would have been the creator-in-fact), and that therefore they aren’t in the Midjourney case either.
    • To be consistent with both the memo and Thaler, the theory seems like it has to be that Midjourney is the creator-in-fact, and therefore the human isn’t (and can’t get a direct copyright as the creator), but also that software can’t be hired in the work-for-hire sense and therefore the human can’t get the copyright that way either. Which seems odd! It seems to acknowledge that the software is the creator-in-fact, but then deny both making the software the creator-in-law (because not human) and making the user the creator-in-law via work-for-hire (because I’m-not-sure).
  • Some other countries are different and imho somewhat more sensible about this, as in the UK’s Copyright, Designs, and Patents Act, of which Section 178 explicitly talks about “computer-generated” works, meaning “that the work is generated by computer in circumstances such that there is no human author of the work”. That’s still imho a little sketchy (I continue to think that Kashtanova is in fact the human author of the images in Zarya), but at least it then provides that “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”
    • There’s still some room for doubt there, as for instance whether it’s Kashtanova or the Midjourney people or some combination who relevantly undertook the arrangements, but at least we aren’t in the position of saying that the author is a being that is not legally allowed to either be a creator, or confer creatorship to a human via work-for-hire.
  • In the case of the many, many currently-registered copyrights on images made with AI tools (including mine), it seems that if the Copyright Office is notified, or notices, that fact, they are likely to cancel / withdraw the registration. The theory will be that the registration materials were incorrect when they named the creator as the author of the work, without in any way informing the Copyright Office that an AI tool was used. I could, for instance, send the Copyright Office a note saying “oh by the way I hear that you want to know when AI tools are used, and in my case Midjourney was”, and then they might cancel my registration on their (imho mistaken) theory that I’m not really the author.
    • Since I believe their theory is mistaken, I’m not currently planning to do that. :)
    • If they discover it on their own hook and send me a letter telling me they’re withdrawing the registration, I will do whatever easy thing one can do to contest that, but I’m not going to like hire a lawyer or anything; life’s too short.
    • I’m very curious to see what others do; I would expect that Midjourney itself (assuming it’s big enough to have lawyers) will have their lawyers working on a response to this memo.
    • My copyrights on the Klara trilogy and Ice Dreams (casually announced here) are secure, as to the text and the image selection and arrangement and all, just not to the images per se. Which is fine. And I haven’t registered those anyway. :)
  • I should go back and add a note to all of my existing copyright weblog entries, pointing at this one; or, more sustainably, pointing at the entire “copyright” tag on the weblog here. Then I won’t have to keep updating it.
  • I’m quite happy I decided not to worry too much about this whole thing, and just make pretty pictures (see pretty picture of concerned purple aliens above).

Updates: as this is a developing topic (as opposed to my usual topics which are Timeless Truths of the Universe), you may want to check the copyright tag on the weblog here for later updates, if this post is more than a week or month old.

2023/02/20

The parts of that poem about the roads and the wood that I could remember

[I went for a meditation and walk in our rather large local park today, which was quite lovely. As I walked along, that poem about the roads and the wood and diverging and sighing and stuff came to mind, and it was fun to see how much of it I could actually remember verbatim.

So here I am writing down the reconstruction, including these notes in the places where I couldn’t remember, mostly so I can be amused by reading this again some month or year (ages and ages hence, hee hee), but maybe some of you other intelligences would be similarly amused.]

[Poem title, probably involving Roads and maybe also Woods]

[by Robert Frost, unless I’m embarrassingly wrong]

[Image: One road in a wood. The wood is more brownish than yellow.]

Two roads diverged in a yellow wood,
[And knowing that?] I could not travel both
And be one traveler, long I stood,
[Thinking about which way to go.]

[Eventually I decided to go the way that looked less worn,]
[Although in fact]
those passing there,
Had worn them really about the same.

[I left the other one] for another day,
[Although] knowing how way leads on to way,
[I’d probably never be back in the relevant sense,
Can’t go down to the same river twice, eh?]

I shall be telling this with a sigh,
Somewhere ages and ages hence.
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference.

[Which sounds very hipster at first reading, “oh, you wouldn’t know my road, it’s very less travailed by”, but then there’s the fact that he said that they were really about the same, so maybe “all the difference” really isn’t that much difference after all. Or maybe he’s sighing because, even in retrospect, you can’t tell whether some choice was the right one, because you don’t know what would have happened if you’d chosen differently. And even more, you can’t tell whether some choice you make right now is the right one, because you don’t know what’s down either road. And also that we sigh when we think about that, even though since it’s a fundamental property of our existence, you’d think we might be reconciled to it, or even happy about it. But we aren’t always, so we sigh.

And that’s why we have poetry!]

2023/02/08

Language models aren’t truth-tellers

[Image: Tom Hanks and the Watergate Scandal]

Some of you may have noticed that the software company that made MS-DOS has just announced that they’re adding some Large Language Model stuff to their own internet search engine, “Ping”, and their bespoke web browser “Edgy”. (Something like that.) Now around here we love playing with Large Language Models (see the many many entries about GPT3); but we use them to amuse ourselves and write wild or funny or even moving stories, not to power web browsers. So what’s up with that?

Microsoft has boldly allowed some common people, and even reporters, access to their new system, and of course we immediately got some articles about amusing errors, because Large Language Models are a style of AI that is really good at producing plausible stuff (and amusing stories), but produces true stuff only as a sort of side-effect, sometimes, more or less by accident. Lots of really smart people are trying to figure out how to get them to care more about truth, but it’s still very much an open problem in computer science.

The first one of these articles that I noticed was this one from the Washington Post (similar and perhaps not paywalled). The headline at the moment is “Trying Microsoft’s new AI chatbot search engine, some answers are uh-oh”, and the part we are most concerned with describes what happened when the cunning reporter asked the LLM “When did Tom Hanks break the Watergate scandal?”.

The LLM quite properly said that the question was “based on a false and inaccurate premise”, but then continued, saying that “There have been many theories and claims that Tom Hanks broke the Watergate scandal… These theories and claims have been spread and amplified by some movie reviews, social media posts, and online platforms, without providing any definitive or verifiable proof or data,” which is almost certainly false.

Why would the LLM do that? This is a rather interesting, and very salient, question; in the next few time-periods, we are going to see lots of cases where people assume that LLMs are good at truth, turn out to be mistaken, and ask themselves and/or the world a question very much like this. (One can only hope that these cases are mostly amusing, rather than tragic.)

So let’s look at why the LLM might have done that. I don’t know anything specific about the LLM in Ping, but these models are all based on the same sort of underlying architecture. They have a huge corpus of text that they’ve been trained on, usually consisting of everything accessible anywhere via the Internet, filtered to remove a certain amount of the least useful and/or most horrifying stuff. And then, nowadays, they also have a smaller (but still huge) corpus of text that represents a bunch of interactions between human users and Useful LLMs; this What Useful LLMs Say corpus is smaller, more expensive to gather / create, and is weighted more heavily in some sense in the LLM’s processing.

Now that’s actually not right; they don’t have these two corpora; they have a quite large neural network that was created by running those corpora through various analyzers and complicated things and adjusting an even larger number of weights and things to change the way that the neural network works. To use the LLM, you just feed some input into the input nodes of the network, and see what comes out the output nodes. Simple! :)
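
To make that last step concrete, here is a minimal sketch of “feed some input in, see what comes out”, using the Hugging Face transformers library and a small public model purely for illustration; whatever actually sits behind Ping is of course not public, and is doubtless vastly larger and more elaborately tuned.

```python
# A toy version of "feed input into the network, see what comes out".
# Uses the small public GPT-2 model via Hugging Face transformers, purely
# as a stand-in; the real system behind "Ping" is not public.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "When did Tom Hanks break the Watergate scandal?"
result = generator(prompt, max_new_tokens=40, do_sample=True)

# The model just continues the text with whatever is most plausible-shaped;
# nothing in this step checks whether the continuation is true.
print(result[0]["generated_text"])
```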

(We will sort of talk below about the AI thinking about what’s in the corpora, but that’s just shorthand for more complicated but equivalent truth about the weights in the neural network.)

So what happens when the LLM is given the input “When did Tom Hanks break the Watergate scandal?”? Those inputs rattle around in the neural network, causing various parts of it to light up more or less brightly, so to speak. Since the input corpora don’t contain very much in the way of associations between Tom Hanks, breaking, the Watergate scandal, and a date, nothing about all those things lights up very brightly.

(When we talk about “things lighting up” in the neural net, we don’t actually mean that there’s a single node in the network that represents “the date on which Tom Hanks broke the Watergate scandal”; there aren’t nearly enough nodes to represent every concept at that level of specificity. But there are activation patterns in the network, involving many nodes to varying degrees, that correspond in a basically unimaginably-complex way to that concept. We’ll talk about “things lighting up” to abbreviate all of that.)
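
If you want a toy picture of what an “activation pattern” is, here is a tiny numpy sketch; all the weights and inputs are random stand-ins I made up, nothing here comes from a real model. The point is just that a given input excites many units to varying degrees, and no single unit is “the” concept.

```python
# A toy illustration of distributed "lighting up": no single unit stands for
# a concept; the concept corresponds to a whole pattern of activations.
# All weights and inputs here are random stand-ins, not real model values.
import numpy as np

rng = np.random.default_rng(42)

W = rng.normal(size=(8, 16))     # 8 input features -> 16 hidden units
x = rng.normal(size=8)           # stand-in for an encoded prompt

hidden = np.maximum(0.0, x @ W)  # ReLU activations: each unit's "brightness"

print(np.round(hidden, 2))       # many units partly active, none "the" concept
print("brightest unit:", int(hidden.argmax()))
```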

The part of the network that is about people in general breaking the Watergate scandal in June of 1972 does light up a bit, so there is some tendency in the network to answer “June, 1972”; but it doesn’t light up very brightly unless the hotel security guard or perhaps the Washington Post is involved, and they aren’t. So let’s see what else might be lighting up more strongly.

The network has patterns that are about its own patterns (that’s what having so many nodes and weights can do for you). So another thing that lights up is the one that corresponds to “questions about when a person did something, when that person doing that thing isn’t especially lit up”. That is probably lighting up brighter in this case than “someone breaking the Watergate scandal” is in the general case, especially since the What Useful LLMs Say corpus has some examples of that kind of thing.

Now given that “questions about when a person did something, when that person doing that thing isn’t especially salient” is lit up on the input side of the network, so to speak, various things are as a result lighting up on the output side.

(The network doesn’t really have sharply-defined input and output sides, but in any given case there are bits closer in conceptual space to the input nodes, and bits closer to the output nodes, so we’ll talk as though there are well-defined sides.)

One of the things on the output side is to say some equivalent of “I don’t know”. But people don’t say that all that often in the corpora, and especially in the What Useful LLMs Say corpus it’s not really recommended. So it only lights up moderately.

Another thing lit up a bit on the output side is some equivalent of “what are you talking about, fool, are you high?”. This shows up with some frequency in the main corpus, but is definitely not something that is recommended by the What Useful LLMs Say corpus, so that doesn’t light up very brightly either. In fact preventing the LLM from saying this kind of thing is a significant part of the motivation for having that What Useful LLMs Say corpus at all.

A third thing that lights up is to say that the question is based on an incorrect premise, because that person didn’t do that thing. This is a little brighter! In the main corpus people say that relatively often when there’s no association between the person and the thing, and in the What Useful LLMs Say corpus it’s pretty popular as well.

Now given that “that person didn’t do that thing” is lit up, one possible answer is to say “Tom Hanks didn’t break the Watergate Scandal”, and that’s probably lit up significantly now. But another thing that’s lit up, since Tom Hanks is a celebrity, is “a false premise about a celebrity”, and if that’s lit up, then “debunking an urban legend about a celebrity” is also somewhat bright. Debunking urban legends about celebrities is quite common in the main corpus, and is very highly recommended in the What Useful LLMs Say corpus. Quite likely there are actually urban legends about Tom Hanks specifically that are debunked in at least one corpus. So that’s got a fair chance of winning!

Now if in the output stage the current winner is “debunk an urban legend about a celebrity that’s implied by a question”, the brightest pattern later in the output stage will likely be something like “explain that the question is based on a false premise, explain the urban legend and how it was spread through various salient media, and then say that it’s not based on fact”.
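
You can picture that whole output-side competition as something like the toy score-and-sample sketch below; the candidate strategies and their “brightness” numbers are entirely made up by me for illustration. The point is just that the winner is whichever response shape is most plausible given the corpora (truth is never consulted), and that the margins can be fairly small.

```python
# A made-up sketch of the "which response pattern is brightest" competition.
# The candidate strategies and their scores are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

candidates = {
    "say 'I don't know'":                        1.0,
    "mock the question":                         0.3,
    "answer 'June, 1972' (confidently wrong)":   2.2,
    "plainly say the premise is false":          2.4,
    "debunk an urban legend about a celebrity":  2.7,
}

scores = np.array(list(candidates.values()))
probs = np.exp(scores - scores.max())
probs /= probs.sum()              # softmax: plausibility, not truth

for name, p in zip(candidates, probs):
    print(f"{p:.2f}  {name}")

# Sampling rather than always taking the top choice means the confidently
# wrong date could easily have come out instead of the debunking.
choice = rng.choice(list(candidates), p=probs)
print("chosen strategy:", choice)
```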

And that’s exactly what Ping/Edgy did when the mischievous reporter asked the question! So our Just So Story is successful.

Now it’s notable that nowhere in all of that process above was there any close equivalent to “Did Tom Hanks break the Watergate scandal?” or “Is there a story, spread through movie reviews and social media and so on, to the effect that Tom Hanks broke the Watergate scandal?”. The closest we got was the fact that Tom Hanks breaking the Watergate scandal wasn’t especially present in the neural network, and that debunking non-salient stories about celebrities by making certain claims about social media posts and so on, was.

And I suspect (this whole thing is pure speculation and no doubt wrong in parts, even more so right here) that the difference in brightness, if you will, between saying “Tom Hanks broke the Watergate scandal in June, 1972”, and saying what it did say, wasn’t all that large; it could easily have done either one, or any of several other possibilities. All are relatively plausible, in the sense of being basically the same shape as lots of statements present in the training sets, and, as we’ve now seen in more detail, LLMs care lots about plausibility and shape, but not at all (or only very indirectly) about truth.

We live in interesting times!

(Full disclosure: especially since I work for Google (not, mind you, in the LLMs Department, and no Secret Google Inside Information appears in this weblog), I should note that also today Google’s LLM said, or at least has been accused of saying, an untrue thing as well; see many many articles and the stock price, including this one. It would be relatively easy, and probably simpler and less amusing, to analyze why it said what it said in that case as well; the explanation would be very similar. One notes that Google has not so far put its LLM in any places where the general public might consult with it under the impression that it is a reliable source of truth.)

Update: The Bing AI demo itself had a really surprising number of errors, all of which could be explained by the sort of analysis above (which still doesn’t mean the analysis is all correct).