Friend Steve and I were talking about the whole “LLMs are an extinction-level risk to humanity” thing (recent letter, one account of why it’s rather silly), and about what, if one accepted that there was an extinction-level risk and wanted to bring it about because one is evil or whatever, one would actually do.
The people talking about this “risk” are notably silent on this sort of issue; they prefer that we just believe them (and give them funding, and not worry about the less exciting-sounding bad things that their AI companies are actually doing), rather than ask “extinction how, exactly?”.
So here’s how you do the whole Universal Paperclips thing with an LLM, and end up destroying the world or whatever:
1. Put together a summary of the current state of the world, the goal (“get lots of Likes on Twitter” or whatever), and the prompt (“please describe the next steps toward attaining the goal, and what kinds of things about the world should be in the state-summary next time around”).
2. Feed that to your LLM.
3. Take the output of the LLM, and do whatever it says (“plug-ins” lol).
4. Go to (1).
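To make the hand-waving concrete, the loop above is basically the following; `call_llm` and `do_whatever_it_says` are hypothetical stand-ins for some LLM API and some set of “plug-ins”, a sketch rather than anything I’m claiming would work:

```python
# A sketch of the loop above; call_llm and do_whatever_it_says are
# hypothetical stand-ins for some LLM API and some set of "plug-ins".
def world_domination_loop(goal, call_llm, do_whatever_it_says):
    state_summary = "Nothing notable has happened yet."
    while True:
        prompt = (
            f"Current state of the world: {state_summary}\n"
            f"Goal: {goal}\n"
            "Please describe the next steps toward attaining the goal, "
            "and what kinds of things about the world should be in the "
            "state-summary next time around."
        )
        output = call_llm(prompt)              # step (2)
        results = do_whatever_it_says(output)  # step (3)
        state_summary = f"{output}\nResults so far: {results}"  # back to (1)
```

(Needless to say, `do_whatever_it_says` is carrying an awful lot of weight in that sketch.)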
Presumably, if there’s the obvious sort of “extinction-level risk” here, after a while all humans will have been replaced by robots that do nothing but click “Like” on Twitter all day.
Needless (I hope) to say, this would not actually work. But given that it wouldn’t, it seems like if you want to convince people that LLMs pose a Huge Risk to humanity at large, you’ve got to say something about how, unless you’re just in it for the clicks and headlines and to distract from your current massive power-grab.
(Or, if you think it would work, feel free to replace “get lots of Likes on Twitter” with “make [your name here] very wealthy”, and go for it! Let us know how that goes.)
Relatedly, this seems like an excellent response to The Letter:
So, I was reading Ligotti’s “The Shadow at the Bottom of the World” and thinking that it was pretty good, and that he tends to picture the ultimate squirming chaos at the center of reality as sometimes attractive (if perversely) as well as finally horrifying, in a way that I don’t remember Lovecraft doing, at least not as sympathetically.
At some points, for instance in The Tsalal (whose name is explicitly a reference to Poe), I thought that when a character points out with horror that appearances are just temporary masks on an indescribable underlying reality, it sounds (without the horror part) quite a bit like various aspects of the Dharma (it does, right?). This seemed, I thought, like something that someone might write a weblog entry or something about!
And then, because my mind has been infected (in a mildly Cthulhu-esque way perhaps) by the idea of LLMs writing text, I thought I would propose the notion to ChatGPT or Bard or something, and have it write words! It would either do a good job, which would be interesting, or an amusingly bad job, which would be amusing.
I thought the “amusingly bad job” outcome was more likely, but, well, heck.
I used ChatGPT 4.0 rather than 3.5 or Bard, just because it’s gotten the most hype, and also because, since I’m paying money to use it, I thought it might be better, so if it failed amusingly it might be more meaningful. Here’s the interaction (and I do realize that my “always changing” is problematic as a characterization of the Dharma, but then so is everything else):
Me: For an essay entitled “Azathoth and the Dharma”, I’d like to discuss how the idea that appearances are only temporary masks over a deeper truth that is indescribable and always changing, features both in the kind of intellectual horror written by Lovecraft and especially Ligotti, and in Buddhist beliefs about truth and the Dharma, and that the main difference is one of emotional overtones with which it is presented. Lord Dunsany might also be relevant. Please draft the beginning of this essay for me.
ChatGPT 4.0:
Title: “Azathoth and the Dharma: Transient Masks of Deeper Truth in Lovecraft, Ligotti, and Buddhist Philosophy”
In the chaotic cosmos of H.P. Lovecraft and Thomas Ligotti, we find ourselves confronted with a universe filled with inscrutable forces and ancient entities, their true natures hidden beneath uncanny, horrifying masks. The universe is not simply indifferent to humanity; it is utterly alien, impenetrable to human understanding, and constantly in flux. These themes of grotesque transformation and unknowable horror are at the heart of their works. And yet, when viewed from another angle, these concepts can be seen as echoing the fundamental principles of Buddhist philosophy, particularly the teachings of the Dharma. They, too, postulate a reality that is continuously changing and ultimately indescribable.
The Dharma, the cosmic law and moral order that Buddhists seek to understand and follow, hinges on several key principles, one of which is Anicca, or impermanence. Anicca posits that all conditioned phenomena are in a state of constant change. Whether we consider the universe at large or the very essence of our being, everything is transitory, and appearances are fleeting. Beneath the apparent solidity of phenomena lies an incomprehensible flux of arising and ceasing events, an underlying truth that cannot be adequately captured in words.
This notion aligns strikingly with the chilling cosmologies of Lovecraft and Ligotti. Azathoth, Lovecraft’s nuclear chaos at the center of the cosmos, is the ultimate symbol of this philosophy. It is the blind idiot god, an embodiment of constant change and the source of all existence. It is unknowable, its essence cloaked in cosmic chaos and frenetic piping, its reality beyond human comprehension.
Similarly, in the phantasmagoric narratives of Ligotti, we find the protagonists often straddling the line between grim reality and unfathomable horror, their worldviews shattering in the face of a truth too alien and amorphous for the human mind to comprehend. It’s not unlike the Buddhist concept of Sunyata, or emptiness, which refers to the lack of inherent or eternal essence in all phenomena.
Yet, as we draw these parallels, we are confronted by the starkly contrasting emotional overtones that characterize these disparate systems of thought. In Buddhism, the realization of Anicca and Sunyata is liberating, offering a path towards enlightenment and the cessation of suffering. But in the macabre tapestries woven by Lovecraft and Ligotti, the confrontation with these realities is inherently terrifying, inducing existential dread and insanity.
The eerie fables of Lord Dunsany, who greatly influenced Lovecraft, might offer an intermediary perspective. He presents us with bizarre, otherworldly landscapes and creatures, yet there’s often a whimsical, dreamlike quality to his work. The absurd and inexplicable are not strictly horrifying, but engender a complex mixture of awe, fascination, and unease.
This divergence in emotional reaction prompts us to consider the nature of human perception and our responses to the deep, enigmatic truths of existence. Is the Lovecraftian terror at the chaotic heart of reality as valid a response as the Buddhist acceptance and quest for liberation? Or does it simply reflect our inherent fear of the unknown, our resistance to the dissolution of ego, the prospect of losing what we perceive as our solid, unchanging selves? In our exploration of ‘Azathoth and the Dharma’, we delve into these questions, probing the intersection of cosmic horror and Buddhist philosophy, and the deeply human, emotionally-charged responses they elicit.
Transcript ends
I’m not sure how to feel about this. It did quite a good job, at least on a few casual readings. The references to anicca and sunyata are plausible, no worse than I would expect from any reasonably educated person daring to write on those concepts, and they weren’t specified in the prompt; the LLM was clued-in enough to bring them in itself. The paragraph on Dunsany is a little thin, but again no worse than I would expect from a decent human writer; I did ask for a Dunsany link, and this is the obvious one.
It clearly knows (“knows”) who Lovecraft, Ligotti, and Dunsany are, and what in general the Dharma is. I would definitely edit the phrase “the fundamental principles of Buddhist philosophy, particularly the teachings of the Dharma”, but yet again this is a matter of subtle tone, not by any means an egregious error.
It did nothing that (from a human writer!) would have been surprising or even especially non-obvious (although the thought that the feeling of horror may “simply reflect our inherent fear of the unknown, our resistance to the dissolution of ego, the prospect of losing what we perceive as our solid, unchanging selves” is perhaps beyond trivial extrapolation of the assignment). But its unsurprising and mostly obvious expansion of my original idea into seven nice paragraphs of solid prose (with a fitting subtitle for good measure) contains, I have to admit, no amusing errors that I can find!
I even asked it for some references and further reading, and it gave a decent dozen suggestions for both Lovecraftian horror and Buddhism, all of which actually exist; my only criticism being that
“Noel, Daniel C. The Soul of Shamanism: Western Fantasies, Imaginal Realities. Continuum, 1997. This book explores the intersection of spirituality and fantasy literature, which could be useful to draw broader conclusions about your topic.”
is a bit of a stretch. On the other side, when I asked it if it knew of any existing works making the same sort of link that I propose here, it did not make one up! It said encouragingly that it didn’t know of any, and finished off with this:
“Overall, your proposed exploration of the specific connection between Lovecraftian horror and Buddhist philosophy seems to be quite unique, and it’s an interesting new angle to consider in the study of both these areas.”
Well… thanks, ChatGPT, that’s… nice. I guess?
I feel like this all certainly means something. Or perhaps it’s just the thin, monotonous piping of an unseen flute!
Disclaimer: I work for Google, arguably a competitor of OpenAI; but these opinions are solely my own, and I don’t work in the AI area at Google or anything.
And I mean, oh come on!
So much AI “research” in these hype-heavy times is bunk, and I suppose one shouldn’t expect OpenAI (“Open” heh heh) to be any different. But this pattern of:
Use an AI to try to do some interesting-sounding thing,
Evaluate how well it did by waving your hands around, or just by eyeballing it,
Declare victory,
Publish an “AI can do thing!!!!” paper that will get lots of media attention,
is just sooooo tiring. (See for instance the YouTuber in that prior post who showed their system producing a non-working tic-tac-toe game and said “well, that worked!”.)
The one I’m facepalming about here was brought to my attention by my friend Steve, and omg: “Language models can explain neurons in language models”. They did sort of the obvious thing to try to get GPT-4 to make predictions about how a few selected “neurons” in GPT-2 behave for a few inputs. The key line for me in the paper is:
“Although the vast majority of our explanations score poorly, we believe we can now use ML techniques to further improve our ability to produce explanations.”
— OpenAI
They say this because (they have been drinking too much of the Kool-Aid, and) they tried a few things to make the initial abysmal scores better, and those things made them slightly better, but still poor. They say in the (extremely brief) report that although it works badly now, it could be the case that doing it differently, or maybe doing more of it, might work better.
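For what it’s worth, my paraphrase of their recipe looks roughly like the sketch below; the prompts and the two LLM callables are made-up stand-ins of my own, not OpenAI’s actual code:

```python
# My paraphrase of the explain-then-simulate-then-score recipe; all the
# function names and prompt wording here are made-up stand-ins.
import statistics

def score_neuron_explanation(texts, real_activations, explainer_llm, simulator_llm):
    # 1. Show the explainer model texts annotated with the GPT-2 neuron's
    #    actual activations, and ask for a short natural-language explanation.
    explanation = explainer_llm(
        "Here are texts with one neuron's activations:\n"
        f"{list(zip(texts, real_activations))}\n"
        "In one phrase, what does this neuron respond to?"
    )
    # 2. Given only that explanation, have the simulator model guess the
    #    neuron's activation on each text.
    simulated = [
        float(simulator_llm(
            f"A neuron is described as: {explanation}\n"
            f"From 0 to 10, how strongly does it activate on: {t!r}?"
        ))
        for t in texts
    ]
    # 3. Score the explanation by how well the simulated activations track
    #    the real ones (a correlation-style score, as I read their writeup).
    return statistics.correlation(simulated, real_activations), explanation
```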
In any other field this would be laughed at (or politely desk-rejected with a “fantastic, please submit again once you find something that does actually work better”); but in the Wacky Wacky World of Large Language Models it goes on the website and gets cited in half a ton of headlines in the media.
And is it really honest to use “can” in a headline to mean “can, very very badly”? By that standard, I can predict the weather by flipping a coin. I didn’t say I could predict it accurately!
I suppose the LLM hype is better than the Crypto hype because fewer people are being bilked out of money (I guess?), but still…
So Midjourney has Yet Another New Version of their main engine, called “5.1”, and it’s very much like v5, except that it has more of a “house style” like 4.0 did, which can be turned off via “--style raw”. (There’s also “--stylize”, and one might think that “--style raw” and say “--stylize 0” would do the same thing, but apparently not. It’s all magic.)
I have, as ever, been playing with it extensively (given that “relax mode” GPU time is free at my membership level, any time when the engine is not working for me is wasted, eh what?), and am now up somewhere over forty thousand images, most recently involving things like echidnas (apparently not “echidnae”; “is Sonic an Echidna?”) and many stills from non-existent movies. I will present here a number from a 1960s Psychodrama of Eldritch Suspense, because woot! (See also The Cult Hit of 1977.)
Brad and Laura are concerned
Dr. Martin and Miss Carter are also concerned.
Miss Carter appreciates Dr. Martin’s collection of exotic curios.
Patrons in the village pub are concerned. And not only about the ugly picture.
Mrs. Martin is perhaps too impressed by the obsidian statue.
“Joannie, what –” “Please, Go, Through, The, Door, Doctor, Martin”
“Praise, The, Tentacles” “Yes, The, Wonderful, Tentacles”
“Come, Back, Soon”
(I like how exactly one of the eight images I made came out in color.)
This is about the current AI hypestorm, and intentionally has a rather extreme title. :) For some values of “it”, it is not all bunk; there are lots of cool and funny and convenient things that AI in general, and large auto-regressive language models (“LLMs”) specifically, can do. Various tasks and jobs will become quicker and easier, and there will be some new fun things to do.
But when it comes to all the breathless claims that “ChatGPT / Bard / FooLLM will completely change the way that we X”, for virtually any value of X that isn’t “produce reams of plausible-looking text whose quality doesn’t matter much”, it’s all bunk. Something describable as AI may eventually do that (I’ll even say that it will eventually do that if everything else survives long enough), but the LLM technology that we have now, even if made better / faster / stronger, will not.
This is a prediction, and it could be wrong, but I’m willing to make the claim without much in the way of caveats. There have been various times in the past when I’ve muttered to myself that a thing was bunk, but not said so on the record, so I couldn’t point back at it and say “ha, I toldja so!” once it turned out to be, indeed, bunk. So this time I’m putting it on the record.
It’s commonly observed that when reading a media account of some technical field that one knows something about, one often thinks “hey, that’s completely wrong!”. And that this suggests that the media is also completely wrong in technical fields that one doesn’t know enough about to notice the errors.
It seems likely that this applies to media stories like “ChatGPT / etc will completely change the way that we X” as much as to any other, so given that we know something about, say, software development, we could look at, say, “ChatGPT spells the end of coding as we know it” (also on Yahoo News).
And it’s bunk. The headline is sort of the usual cringey headline, in that it exaggerates the most extreme part of the article for the clickz, but the article does say inter alia, “For better or worse, the rise of AI effectively marks the end of coding as we know it,” which is close. “[T]he rise of AI” is considerably more general than “ChatGPT”, and “effectively” is a nice broad weasel-word, so the actual sentence is fuzzier and more defensible than the headline, but it’s still bunk. As is for instance the quote immediately preceding that statement, in which someone with a deep financial interest in the subject says “there’s no programmers in five years,” and various other breathless things referenced by the piece.
Bunk. Bilge. Bollocks.
Not this bunk; this is a nice bunk.
The thing is that in software development, as I suspect in a whole lot of other domains, LLMs are in fact quite good at doing things that we’ve come to rely on as convenient proxies for important skills. What we ask software engineering job applicants to do in forty-five minutes to an hour isn’t really all that much like what they will be doing if we hire them, but it’s the best proxy we’ve been able to come up with that fits in that time-scale. In other fields, doing the questions on the bar exam isn’t much at all like what real lawyers do in practice, but it’s the best proxy that we as a society have been able to come up with.
Now we have a situation where there are pieces of software that can do a plausible job at various of these proxies (although even here some of the breathless reports are frankly bunk), but that absolutely cannot do the real jobs that we have gotten used to using them as proxies for. And this is driving people to all sorts of false conclusions about the real jobs.
In the software field specifically, what we ask candidates to do, and what various LLMs have shown various levels of skill at doing, is to take a description of a programming task and write code that does that, where the task is (ideally) one that they haven’t seen before, and also is simple enough to write code to accomplish within forty-five minutes to an hour.
Is this what actual professional (or even hobbyist) coders do? I think it’s safe to answer that with an unqualified No: this is not what human coders actually do. Once in a while one might have a novel thing to do, and do it in forty-five minutes to an hour, but it doesn’t just fall from the sky into one’s lap; it comes up as part of some larger cohesive project that one is working on. Even if one is the most junior coder on a team, doing mostly what the more senior members ask you to do, that is essentially never “please spend an hour and write code to reverse a linked list for me”; that just isn’t how it works.
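For concreteness, the canonical proxy task looks something like the sketch below (my own version, not any particular interview’s); a perfectly fine little exercise, and exactly the sort of thing an LLM can churn out on demand, because a thousand variations of it are already on the web:

```python
# The classic interview proxy task: reverse a singly linked list.
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    while head is not None:
        head.next, prev, head = prev, head, head.next
    return prev

# Tiny check: 1 -> 2 -> 3 becomes 3 -> 2 -> 1.
lst = Node(1, Node(2, Node(3)))
rev = reverse(lst)
assert [rev.value, rev.next.value, rev.next.next.value] == [3, 2, 1]
```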
Actual working coders understand to a greater or lesser degree an overall thing that they are working on, what it does for a user, how it is supposed to work, how to debug it when it fails, at least the basic functional and nonfunctional requirements of the overall system as well as their part, the quirks it has, what libraries are available for them to call, what other members of the team are doing, and so on. And the overall thing isn’t a single function that reverses a linked list, or says whether one string is contained within another.
Let’s look at one of the motivating examples in our first example article. “Adam Hughes, a software developer,” it says, “… signed up for an account and asked ChatGPT to program a modified tic-tac-toe game, giving the game some weird rules so the bot couldn’t just copy code that another human had already written. Then he quizzed it with the kind of coding questions he asks candidates in job interviews….”, and voila, “Hughes found that ChatGPT came back with something he wasn’t prepared for: very good code.”
Unfortunately, this is the only place I can find this impressive feat mentioned. Adam Hughes’ own writeup of how “ChatGPT Will Replace Programmers Within 10 Years” doesn’t talk about this modified tic-tac-toe game at all, or the “coding questions” or the “very good code” referenced in the article. So I’m not sure what’s going on there.
The claim in Hughes’ article title is also bunk (which is to say, I disagree), while we’re on the subject. There is no reason to believe that any LLM will be able to do what’s listed there under “Phase 2” or later. Well, actually, the wording is odd: it says that the AI will “be able to make project-wide suggestions like: Rebuild this Angular 6 project in the latest version of React; Add 100% unit-test coverage to the project…”. I mean, sure, maybe the AI could suggest those things; but in order to predict that programmers are going to be replaced, the author presumably means that the AI will be able to do those things? A bit puzzling.
(Also puzzling is the title of that article; on the page itself the title is currently the nicely ambiguous “ChatGPT Will Replace Programmers Within 10 Years,” which is in some sense true if it somehow replaces exactly two (2) programmers by 2033. But the HTML of the page has a title-tag with the content “ChatGPT Will Replace All Programmers”, which is a much stronger claim about how many will be replaced (all of them!) but leaves out the timescale; heh. The actual text of the article predicts 95% job loss in 5-10 years, and 99% in “10+” years, so it’s sort of the most extreme combination of the two headlines (and it’s wrong).)
Hughes has been updating the beginning of that post with a list of things that are supposed to convince doubters that indeed ChatGPT Will Replace (All) Programmers Within 10 Years; the most recent is a video that, he says, shows “fully autonomous AI agents created python code. Literally replacing programmers. Not that smart, but it shows the how possible this is TODAY, well ahead of schedule.” (Bold in original.)
The video is, to be blunt, kinda silly. This guy has a system where two ChatGPTs talk to each other, and are in some way able to search the web and so on. He asks them to make code to display an input box, and they do that, at the level that one would have found with a web search for “input box python code example”. He asks them to make code to play tic-tac-toe (again, code that is all over the web), and they claim to do that, but it doesn’t seem to work (it displays the UI, but doesn’t make correct moves or reliably detect a win). Undeterred, he says “that worked”, and continues on (lols).
He asks them to “create a new strange simple game”, and they create a “guess the number in ten guesses” game with high / low feedback (not exactly “strange”, and again code that is all over the web), and it might work (aside from the apparently nonfunctional “difficulty” pulldown) but he doesn’t look inside or test it enough to be sure. And so on. And then for like the last 40% of the video he shows off “AutoGPT”, which appears just to fantasize to itself, in text and non-functional pretend API calls, about how it might create an AGI by linking together GPT instances in various vaguely-described ways, and then gets into a loop of repeating the same thing over and over.
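(For reference, the kind of “strange simple game” we’re talking about is roughly the following; this is my own minimal version, not the code from the video, and near-identical versions are in every Python tutorial ever written:)

```python
# A minimal "guess the number" game with high / low feedback, of the sort
# that appears in countless tutorials; not the code from the video.
import random

def guess_the_number(max_guesses=10, low=1, high=100):
    secret = random.randint(low, high)
    for attempt in range(1, max_guesses + 1):
        guess = int(input(f"Guess {attempt}/{max_guesses} ({low}-{high}): "))
        if guess == secret:
            print("You got it!")
            return True
        print("Too low!" if guess < secret else "Too high!")
    print(f"Out of guesses; it was {secret}.")
    return False

if __name__ == "__main__":
    guess_the_number()
```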
What might Adam Hughes mean when he describes this as “fully autonomous” (given that it’s just doing exactly what it’s told) or as “Literally replacing programmers”? I’m not sure. Is there a programmer somewhere who is going to be replaced by a system that can’t write a tic-tac-toe game, or that can fantasize about creating AGI? I sort of doubt it.
One could cynically note at this point that the Business Insider / Yahoo News article has no doubt gotten lots of clicks and therefore ad sales, that the Hughes piece is a subscriber-only piece on Medium that ends “Stay tuned for my follow-up article about how to prepare for and thrive in this brave new world,” and that if you want to play with the system shown in the video you can “become a member of the channel” for like US$5/month and up. But that would be cynical. :)
There are dozens, maybe hundreds, of other examples we could look at, just in the coding area, let alone law, management, or all those other fields that AI is supposedly about to completely change (the last-linked article there is quite amusing, notably where it admits that it’s based on subjective occupation-impact estimates by people who know little or nothing about the occupations in question; i.e. bunk). But this is already long.
LLMs have shown an amazing ability to produce plausible-looking output in response to a huge variety of prompts. The output is even often (more or less by coincidence) correct and/or functional and/or pretty! That’s what it’s architected and designed to do, of course, but I think it’s fair to say that everyone’s been surprised by how well it does it.
Will that ability alone completely change the way that we do X, for any interesting value of X?
I’m putting myself on the record this time :) in saying that the answer is very much No.
Update: It’s been pointed out to me that from what I say here I do apparently believe that “ChatGPT / Bard / FooLLM will completely change the way that we X” if X is one of those proxy tasks; and that’s a point! These things may significantly change the way that we do job interviews, or assign homework, or even give grades in university; but the main changes may be mostly along the lines of “like we do it now, only making sure people aren’t using LLMs to do the proxy task”, and that might not count as “completely changing”.
“Chav” is a British-adjacent word, usually derogatory, for a cluster of concepts involving economically and socially disadvantaged classes, youth, sneering, hanging about in groups in public, and so on. It may be offensive in some sense; it’s not like we have a lot of non-derogatory and non-offensive words for young people in disadvantaged classes. I hope and trust I am not offending simply by using it, for instance, in a weblog post title.
Anyway! For no particular reason I gave Midjourney (version 5) a number of chav-related prompts, and here are some of the results. These are mostly photorealistic, as that’s what v5 often produces unless explicitly prompted otherwise (so this isn’t “AI Art” so much as “AI Weirdness”). And I’m putting them in the weblog here just for fun. (For some highlights of other stuff I’ve been making in MJ, albeit without prompts or much commentary, see me on Pixelfed.)
Here for instance are some Cute Chavs (i.e. the prompt was “cute chavs” with some flags and stuff):
Mildly cute, and certainly numerous! Note the three red caps. Note also that Midjourney v5 has really improved in the people department: some of the faces may dissolve into blurs, some of them disquieting, if you zoom up too much, but no one appears to have seven fingers on the same hand, or any obvious extra limbs. Which is impressive!
Additional cute chavs:
Here “chav” might be a mildly negative comment on their taste in accessories and hair dye; not sure. Awfully cute, though.
Additionally:
These may be the “tough young men” sort of chavs, although the bright pink and blue hoodies and those sunglasses are perhaps atypical.
Also supposed cute chavs:
Certainly cute, but those matching raincoats and GKSY VHIS caps look pretty upscale; it may have strayed a bit from chavery here, but, again, certainly cute. And perhaps truculent.
Further cute chavs, who have perhaps looted a cargo of loud plaids (but who all seem to have the right number of fingers and extremities, again!):
There are various more cute chavs, but we’ll finish this section with this one:
I tend to think of your basic chav as pale, perhaps because people who use the word “chav” often have other words for people who aren’t pale. These three are certainly impressive in their own way. Judgment of cuteness left to the reader.
Heavens, there are so many pictures that I’d love to share with you here! And that’s just these chav-related ones. So many thousands of pictures! See “Bach faucet”, relatedly. But anyway! Now we have some “chav life”:
No notes on that one. We got at least one other one like this except that they’re sitting on a stoop with equal truculence.
Additionally:
Perhaps “CY le HAWE” is the name of their YouTube channel, where they break cinder blocks on each other. For instance. (Do read the alt text on these images, by the way, if convenient; I put a lot of work into those!)
Asking for an artistic rendering of happy chavs, we got inter alia:
They don’t appear particularly happy to me, I admit.
On the other hand:
the Esposito Quadruplets here seem quite pleased by something.
Now this one:
doesn’t really say “chav” to me at all, due to the kilts and sporrans and so on; MJ may be improvising here. The people do look rather happy however.
In Ireland, chavs, or a group akin to chavs, are known as skangers (also possibly offensive). Prompted to picture skangers, Midjourney gave us:
The famous trio of Sakker, Sarkes, and Siakers. Sakker is notable for wearing the number “S”.
Next and relatedly:
Not sure of the relationship between skangers and ice hockey, frankly, but there we are. Perhaps it thinks it’s a typo for “Rangers”, which is a hockey thing.
Third “skangers” image:
Perhaps Gakkes(tm) soda is popular with skangers and/or chavs.
And finally in the skangers set:
“Oi, I’m goin’ down Skarnvers fer some baccy, yew wan’ anythin’ pet?”
(That was probably offensive, too.)
And finally, just so as not to overload my readers with offensive weird stuff, here are a few where we tried to mix chav with its opposite: posh.
This is the one that most obviously did that, but these two are clearly taking the piss, as it were, and on’v Chasd Heaish on top of it (the guy on the right looks familiar somehow).
We also got a fair number like:
which might be interpreted as a posh sort of chav, as well as say:
similarly (Americans perhaps, haha!). Also some where it seems to have mostly ignored the instructions and just given us two ordinary people, as in:
They’re just endearing! (Admittedly one may be missing a finger, but better that than two or three extra.) And not obviously chav or posh, so I’m not sure.
And to close, from the “a chav and posh person standing side by side” prompt:
tl;dr: While it’s true that all LLMs do is produce likely text continuations, this doesn’t imply that they don’t have mental models, don’t reason, etc.
One thing that sensible people often say about Large Language Models like ChatGPT / GPT-n and Bard and so on, is that all they do is predict the next word, or for more technical accuracy, that all they do is generate text that is likely to follow the prompt that they are given, i.e. “produce likely continuations”.
And that’s a good thing to note, in that people tend to have all sorts of other theories about what they are doing, and some of those theories are just wrong, and lead people to make bad conclusions. For instance, people will have a more or less default theory that the model knows things about itself and tells the truth about things it knows, and take seriously its (non-factual) answers to questions like “What language are you written in?” or “What hardware are you running on?” or “Are you a tool of Chinese Communism?”.
Also, it’s true that all they do is generate text that is likely to follow the prompt, in the sense that that is the only significant criterion used during training of the underlying neural network.
But that doesn’t actually mean that that is all they do, in the more general sense. And this, at least potentially, matters.
Consider for instance the claim that “all life does is arrange to have many generations of descendants”. That is true in the same sense, since the only criterion for having survived long enough to be noticed in the current world, is to have had many generations of descendants.
But, significantly, this doesn’t mean that that is all life does, in the sense that life does all sorts of other things, albeit arguably in the service of (or at least as a side effect of) having many generations of descendants.
For instance, I think it would be plainly false to say “people obviously can’t reason about the world; all they do is arrange for there to be many more generations of people!”. In fact, people can and do reason about the world. It may be that we can explain how we came to do this, by noting that one effective strategy for having many generations of descendants involves reasoning about the world in various ways; but that does not mean that we “don’t really reason” in any sense.
Similarly, I think the arguments that various smart people make, which when boiled down to a Tweet come out as roughly “LLMs don’t X; all they do is predict likely continuations!” for various values of X, are in fact not valid arguments. Even if all an LLM does is predict likely continuations, it might still do X (reason about the world, have mental models, know about truth and falsehood) because X is helpful in (or even just a likely side-effect of) one or more effective strategies for predicting likely continuations.
Put another way, if you train a huge neural network to output likely continuations of input text, it’s not obviously impossible that in choosing internal weights that allow it to do that, it might develop structures or behaviors or tendencies or features that are reasonably described as mental models or reasoning or knowledge of truth and falsehood.
This isn’t a claim that LLMs do in fact have any of these X’s; it’s just pointing out that “all it does is produce likely continuations” isn’t a valid argument that they don’t have them.
It’s still entirely valid to respond to “It told me that it’s written in Haskell!” by saying “Sure, but that’s just because that’s a likely answer to follow that question, not because it’s true”. But it’s not valid to claim more generally that a model can’t have any kind of internal model of some subset of the real world; it might very well have that, if it helps it to correctly predict continuations.
Bonus section! Current LLMs don’t in fact reason significantly, or have interesting internal models, in many cases. Amusing case from this morning: when fed some classic text rot13’d, this morning’s Bard claimed that it was a quote from Hitchhiker’s Guide to the Galaxy, whereas this morning’s ChatGPT replied with rot13’d text which, when decoded, was gibberish of the sort that an early GPT-2 might have produced from the decoded version of the prompt. No agent with a reasonable mental model of what it was doing, would have done either of those things. :)
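(If you want to reproduce that experiment, rot13’ing a bit of text is a one-liner in Python; the Dickens line here is just an example stand-in for whatever classic text you like:)

```python
# Rot13 a bit of classic text to paste into an LLM prompt; Python's
# standard codecs module has a rot13 codec built in.
import codecs

classic = "It was the best of times, it was the worst of times"
scrambled = codecs.encode(classic, "rot13")
print(scrambled)                          # Vg jnf gur orfg bs gvzrf, ...
print(codecs.decode(scrambled, "rot13"))  # round-trips back to the original
```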
Just a little more on the craziness from yesterday, and a comparison with Midjourney v5.
We chose another phoneme-triple to look at, “eaux saf aim”. In retrospect there are some words in there (French for “waters”, and English for “aim”), but that’s okay.
Using the magic from yesterday, with Midjourney v4 and “--no words,letters,text” and “--no face” (for that total weight of zero) and “--chaos 50 --ar 3:2”, we get the quite pleasing:
Whew, those are not easy to write alt text for!
And then we did exactly the same thing, only with “--v 5” to get the v5 engine, and it did the notably different:
This may be reflecting something about the internal “creativity” or “style” of the two engines.
Oh, hey! I should try the v5 one with the “stylize” level turned up. Let’s see, with “--stylize 999” we get:
So that’s a No :) it isn’t the --stylize setting.
From this experiment we can theorize that v4 dreams about weird surreal stuff, whereas v5 dreams about a household goods catalog.
I took it into my head for some reason to see what Midjourney would do with little sub-semantic phonemes, like say “ton glam so”. When I first tried it, the results had letters (and not-quite-letters) all over them and/or were all just faces, so I added the switches “--no words, letters, text --no face” to the prompt.
I did that as two separate --no switches without thinking, but in retrospect that may have resulted in a weight of one (1) for “ton glam so”, and weights of -0.5 each for “words, letters, text” and “face”, resulting in a total weight of zero (0), which is known to do weird / fun things (I thought I had mentioned that here earlier, but apparently not).
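(If I have that right, the implied arithmetic is just this; Midjourney’s actual internals are its own business, so these numbers are only my guess at the weighting:)

```python
# My guess at the implied prompt weights; Midjourney's internals may differ.
weights = {
    "ton glam so": 1.0,             # the main prompt
    "words, letters, text": -0.5,   # from the first --no
    "face": -0.5,                   # from the second --no
}
print(sum(weights.values()))        # 0.0, the weird / fun zero-total case
```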
With those switches, our initial “ton glam so” produces the rather noteworthy:
Possibly the “glam” makes “glamour” or even “glamor” salient in the model? But these are not, well, the images that I would have expected to be most salient under the category of “glamour”.
The same switches with the text prompt “so bel wip” produces the also, but very differently, noteworthy:
No relationship to “so bel wip” occurs to me, but it’s certainly consistent! Wondering if this was due to some common seed or something, I tried it again, and got:
which, whoa, definitely very similar. One more time for good luck?
I tried adding “--chaos 70”, which does something or other, and got this:
The same but just a bit more variety; two kids possibly white, one with pointy ears, and so on. But the same interesting clothes and general style. Fascinatin’!
I tried another text prompt (without the --chaos) “plin bo san”, and got these delightful things:
Does “plin bo san” make “plane” and maybe “boat” salient? Does “san” somehow specify the aesthetic? So fascinating! What if we change the aspect ratio to three wide by two high?
OMG so delightful. I love all of these! Next, I tried “tem wo sec” and…
I mean… what?!
Then, “lus dab ba” with --chaos 60:
“mai rem den” with --chaos 70:
Ahhhh what even is happening? What are all these things??
I’m stopping now because my brain is tired, and it’s challenging to write alt-text for these! But wow, eh? Whatever is going on with these things? These are all Midjourney v4, I’m pretty sure, because that’s the default at the moment and I didn’t specify. I’m guessing the total weight of zero is part of what’s causing… whatever this is.
This is a random brainstorming post, I have no particular conclusions at the moment as I write the first sentence here, but I might develop something as we go along.
So far, I just have this “meme” that I made:
There are two obvious reactions to this. Someone who likes AI art tools might say “haha, yeah, this shows how creative and crazy this art can be!”. And someone who agrees with the critics might say “omg, totally, that’s obviously sooo derivative!”.
The first thing to wonder is whether there is a particular image, set of images, or artist out in the world somewhere of which this image is obviously derivative. Pointers in the comments are extremely welcome! :)
Google (reverse) image search doesn’t come up with anything especially obvious. There are some images (like, at the moment, this one) that involve somewhat stylized faces with prominent hair and ocean waves and one or more ships, but the arrangement and overall style and impact are, I think, significantly different. In the past when I asked a couple of people who were all “oh, yeah, I can usually identify the particular artist or artwork that one of these AI images was taken from”, to do that with one of my images, they suddenly became very quiet. ¯\_(ツ)_/¯
If there isn’t a specific image or small set of images or an artist that one can point to and say “see, this is where this came from!”, what does that mean? I’m not an art critic (hahaha), but I think it would be pretty uncontroversial that, if a person had created that image above there entirely with real-live paper and charcoal, or even with a tablet and Photoshop, we’d say that it displayed sort of average human creativity; nothing incredible, but easily beyond (for instance) the “modicum of creativity” required by US copyright case law, enough that it could be entered in an art competition, and so on.
Once we know that it was created by a person using an AI art tool (Midjourney, in this case, with a particular prompt and engine settings and so on), is it reasonable to say something different? Does it still display creativity, or not? Does it do it differently, or in the same way? What is creativity? How is it displayed? In what does it inhere? Is it for that matter the kind of thing that inheres in things? Are there facts of the matter about it, or is it a purely squishy and subjective thing?
There are a bunch of theories that one might put together:
One might hold that it’s just as creative, and in the same way, as the counterfactual no-AI version, and that the creativity comes from the same place: the human who made it. One version of narrative would say that the difference between the no-AI and the with-AI version, creativity-wise, is not different in kind from a person making it with paper and charcoal and a person making it with tablet and Photoshop, or a board and little mosaic tiles. It might be objected that the activity of choosing engine parameters and prompt strings and then culling the results is just obviously (or by dint of some specific plausible theory) different from the activities in the other cases, since those involve something like choosing a particular color for particular parts of the image, whereas the AI-tool case doesn’t.
One might hold that it’s just as creative (or at least that it is creative, if perhaps to a different degree), and the creativity still comes from the human, although it’s implemented (delivered, displayed, exercised, used, manifest) in a different way. One might say in this theory that the difference between the real paper and charcoal version and the Midjourney version is like the difference between a realistic drawing of a scene and a photograph of the same scene. Both born of human creativity, but through very different means, and perhaps to different degrees. And then we can get into lots of questions about the creative element(s) in various kinds of photography!
The two takes above can, I think, go either way on the question of whether creativity is inherent in the end result, the image, in a sort of death-of-the-author way, or whether it’s in the overall process. At the other end of some spectrum, one could say that the image made with the AI tool does not in fact display (involve, require, contain) any creativity; that our initial impression that it did just turns out to have been mistaken, and now that we know how it came to exist, we know that it didn’t involve creativity. This sort of claim pretty much rules out the position that creativity is inherent in the finished product, unless one is willing to take the (facially untenable, I think) position that this image could not in principle have been created by a human without using an AI, and that inversely no purely human-created image could in principle have been created with an AI tool.
That is, if you think there is no creativity in this image because it was made with an AI tool, you pretty much have to take the position that it’s not possible to tell how much creativity there is in an artwork (or a putative artwork) just by looking at it; that the creativity is not displayed by / doesn’t inhere in solely the image or object. Which seems sensible in at least one obvious way: I might think that something involved lots of creativity, until I see that it is an exact copy of something that existed before, just with a little line drawn on it. More nuancedly, we’d say that you can’t tell how much new creativity is in a thing, until you see how it was made (because it might be, say, a copy).
So now we have a potential claim that images made with AI tools don’t have any (or much) new creativity, because they are just processed / stolen / noisily compressed and expanded / copy and pasted, versions of the material that they were trained on. Sure there might be a little creativity in choosing the prompt or whatever, but that’s not much. The program itself can’t add any creativity because “they can’t, they just can’t” (a phrase I’ve heard from a couple of people talking on videos lately, but of course can’t find at the moment).
Humans also process things that they’ve seen / experienced when producing new things. I’d say we can’t really require creativity to mean “those aspects of a work that spring purely from the artist’s soul, and that would still have been there had the artist been a brain in a vat with no experience of the world or other artworks, only its own thoughts”, because then there wouldn’t be any creativity anywhere, and when common words turn out to have no referent in a theory, it generally (if not always) means that that theory is wrong.
Or maybe we do want to require that “sprung from the soul alone” thing, because we want to set a very high bar for True Creativity, and we are confident that there will be at least a few glorious shining examples if only we knew the truths of people’s souls! In which case we can say that a marvelous few humans have displayed true creativity through the ages, and no computer ever has (having no soul and all), and neither have the vast majority of people we loosely call “artists”. This is a theory, but not a popular one, and it means that most art displays no creativity, which again feels sort of like a reductio. It’s certainly not compatible with what the Copyright Office means by “creativity”.
The question of how much creativity is in the selection of prompts and engine settings and images to keep is one we can put aside (in the drawer next to the question of the creativity in a cellphone snapshot, as alluded to above). And it seems we are left with having a theory about how much creativity comes from the AI tool itself, and how much of that is what we’ve called new creativity. Possible answers include “none, there’s lots of new creativity, but it’s all from the human user”, “none, there’s no new creativity in this at all, it’s all stolen / copied from the creativity in the training set”, “about the same amount that comes from the human, they are in some sense equals in the new creation”, and “the human just types a few words, and then the software adds lots of new creativity to it, so it’s the AI”.
This leaves us mostly with the question of “under what circumstances is it true that a person, or a piece of software, adds new creativity to a work, when that work is to a degree influenced by other prior works that that person, or piece of software, has been exposed to?”. Or other words to that general effect. One set of answers will not especially care whether it’s a person or a piece of software; the other set (“they just can’t”) will either think that it’s important which it is, or have a set of criteria which (they will claim) only people and not software can for whatever reason satisfy.
And I’ll leave it there for now, having perhaps not been especially productive :) but having written a bunch of words and focused in (if in fact it’s a focusing) on the question of what it means to add new creativity when making something, even though the entity doing the creating is influenced by other works that existed before. People talk a lot about things like reflecting one’s lived experience, having a thought that the work will (may? is intended to?) cause the viewer to also have (some version of?), and like that. None of those seem likely to be any kind of complete explanation to me at the moment.
In legal news, of course, the US Copyright Office has issued a Copyright Registration Guidance on “Works Containing Material Generated by Artificial Intelligence”, which I gather (I have not had the mental energy to think about this very hard) just repeats the statements in the Zarya (I always want to write Zendaya) memo we briefly discussed the other day, using various phrases that are putatively synonymous but as far as I can tell are subtly different and introduce all sorts of new uncertainty to the subject.
I’m going to continue not thinking about that very hard for now, because that part of my brain is still tired.
Also! You can get onto the waiting list for the Google LLM thing (and I hear varying stories about how quickly one gets access; apparently it is sometimes quite quick). In case you’re, like, collecting those, or otherwise interested.
Lost for decades, now rediscovered and presented here for the first time!
Mike and the Gang
The Mysterious Mr. G in his Secret Lab
The legal team in action
What is happening at St. Agnes?
The Outsiders
In the kitchen at St. Agnes
Under Control
The Discovery!
Courtesy, of course, of the early v5 version of Midjourney.
As I’m sure you’ve heard, there’s a new level of GPT in the world. Friend Steve has been playing with it, and says that it does seem to do some stuff better, but also still makes stuff up amusingly and all. At the moment for whatever reason I can’t be arsed to investigate, or even read yet more hype / analysis about it. Similarly, Google announced a thing, and Microsoft is putting LLMs into various products whose names I don’t recognize, and I’m not reading about any of that. NovelAI’s good old open-source model works fine for all of the telling-weird-stories stuff that I need right now.
And there’s a test version of a new Midjourney engine out! Being tested! And it seems pretty cool. Hands in particular seem much more likely to have five fingers when you’d expect them to, which is a whole thing.
And I spent too much time arguing with people on the Twitter, which isn’t at all new, and which I definitely shouldn’t do because it is not healthy. So I’m trying to stop that.
Now I’m just making pretty pictures! And not thinking very much until later on sometime!
Lots of weather in those, eh? Hadn’t noticed that. :)
Friend Steve wrote us today about this New York Times opinion piece, “Noam Chomsky: The False Promise of ChatGPT” (this link may be free for everyone for some time or something). Despite the title, it’s by Chomsky, Roberts, and Watumull.
Steve commented inter alia on the authors’ apparent claim that ChatGPT can say that the apple you’re holding will fall if you open your hand, but unlike humans it can’t explain the fact. The trouble with the argument is that, as anyone who’s actually used ChatGPT can tell you, it will happily explain the fact, go into the history of the notion of gravity, talk about other things people have thought about it over time, and explain various situations in which the apple wouldn’t fall, given the slightest provocation.
My reply, lightly edited:
I am pretty unimpressed with the article as a piece of science or philosophy; fine as a nice polemic by a greybeard I suppose. :)
I’m amused at how LLMs are “lumbering” and “gorged”, while human minds are “elegant” and even “efficient”. I doubt there is any useful sense in which these adjectives are anything more than bigger words for “icky” and “nice” in this context.
Chomsky brings in the innateness of language, because of course he does, but I’m not at all clear how it’s relevant. Even if humans do have innate language scaffolding, and LLMs don’t have the same kind, it’s far too early to say that they don’t have any, and even if they didn’t, so what? Does the ability to learn a wider variety of languages than humans can, mean that LLMs don’t really understand, or can’t really think, or are harmful or dangerous? None of that makes sense to me; it seems just an even longer way of saying that they’re icky.
He (well, they, there being multiple non-Chomsky authors) claims that LLMs don’t have the ability to say “what is not the case and what could and could not be the case.” And I can’t imagine what they think they mean by that. As with the flaw you point out in the apple example, it’s simply wrong, and suggests that they haven’t really used an LLM much. ChatGPT (let alone a less heavily defanged system) will expound at length about what is not the case and what could and could not be the case, given any halfway decent prompt to do so. They may intend something deeper here than they actually say, but I don’t know what it could be (beyond that they can’t do it non-ickily).
“Whereas humans are limited in the kinds of explanations we can rationally conjecture, machine learning systems can learn both that the earth is flat and that the earth is round.” Um, what? There are certainly humans who believe each of these things. They can’t just be saying that humans can’t conjecture that the earth is flat “rationally” because so what; that’s exactly as true of an LLM. If they mean that the same LLM can make one of those claims one minute and the other the next, whereas humans can’t hold two contradictory beliefs at the same time, I’d like to introduce them to some humans. :)
Similarly for whatever it is they are trying to say about moral reasoning. The suggestion seems to be that, simultaneously, ChatGPT is icky because it cannot stay within moral boundaries, and also icky because it stays stubbornly within anodyne moral boundaries. As pretty much throughout the piece, stuff that humans do all the time is cited as reasons ChatGPT isn’t as good as humans.
Tay became toxic by listening to people, therefore it’s not like people? It had to be heavily censored to keep it from talking trash, therefore it’s not like people? Um?
It might be interesting to try to tease a set of actual significant truth-claims out of this article, and see which ones are arguably true. But I’m not sure that’s the point really.
As far as I can tell, this piece is just a longer and nicely phrased version of “Boo, LLMs! Icky!”
I am annoyed with many “AI” things this morning, so this is a rant with no pretense of objectivity or overall wisdom.
AI call directors are terrible. Especially Intuit’s.
Here I will just reprint a rant that I posted to both qoto and Twitter; I was so annoyed!
Wow, #Intuit #TurboTax is just awful.
I mean, I do know that they’ve been lobbying against tax simplification in the US for years, because it would cut into their business, and that’s pretty evil.
But their customer service is apparently also terrible!
I need to file a particular New York State #tax form this year, and apparently they just don’t support it, and aren’t planning to.
Which seems to mean that I would have to manually enter the data, which seems to mean that I couldn’t then e-file or get their correctness guarantee. And if one uses software to prepare the return, one is required by law to e-file!
So it seems like I just can’t use their software at all. Which is maybe good!
When I tried to call them to ask if they support the form, their robot call director asked me what I wanted, mis-heard me, and insisted on knowing whether I wanted the irrelevant information it had found sent by text or email; “no” was not a valid choice.
Then it insisted on knowing my last name, but failed to understand me when I tried to pronounce or spell it (and I have a pretty ordinary voice, and not all that unusual a name!) and eventually it said goodbye and HUNG UP ON ME when it couldn’t.
I had to call back and pretend that its incorrect guess at my last name was correct, before it would pass me to a representative. And the first thing the human rep (who was very nice!) asked me was for my first and last name, so the whole robot torture conversation was useless as well as annoying.
I think they’re just trying to get people to give up on calling them.
Which in my case probably means switching to #freetaxusa which is cheaper anyway, and does support the forms that I need.
Sheesh!
I hate this Roomba (at least while it’s running).
Leaving aside the fact that it’s a mobile Internet-attached camera that could almost certainly be accessed by random hax0rs in Kekistan, and may already be sending all sorts of images of private life to iRobot / Amazon / the NSA, it’s just annoying.
It has an app of course, but for some unaccountable reason the app is rather terrible. For a long time it offered not much more than the little “CLEAN” button on the robot does; no way to tell it to avoid certain areas or do a certain room right now, let alone a Direct Drive mode where you could just pilot it around vacuuming (which I would have thought would have been the Minimum Viable Product or whatever the kids are calling it these days), no insights into what was going on in the little beggar’s mind that makes it buzz around in the front hallway for half of its runtime and pay only cursory attention to any actual room. Lately it’s been enhanced somewhat, so you can see a version of its internal map, tell it to do a certain area, and a few other things.
But it still went under my chair this morning while I was having my coffee, and got into some kind of infinite loop at the edge of the nest of power and data lines off the side of the chair where it doesn’t really need to go at all. It sat there trying to charge forward and running into something with a loud whir, turning slightly right, whirring again, turning back slightly left, whirring again, repeat forever and ever, with loud irritating whirs every time. I gave it a slight nudge to try to get it away, and it faffed about a little and then charged back into the same corner again, whirring as loud as ever.
Why isn’t there a “don’t try the same thing more than a dozen times” feature in the thing? Maybe because it’s some black-box AI that can’t be explicitly programmed not to do certain things, but just does whatever comes out of the mysterious tangle of weights and things. And maybe because they couldn’t be bothered to add it, or it just hasn’t made it into a sprint yet. Who knows!
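(The feature I have in mind isn’t exactly rocket science; something like the sketch below, at whatever layer decides what the wheels do next. All the names are made up, and this is obviously not iRobot’s code:)

```python
# Sketch of a "stop banging into the same spot" guard; every name here
# is made up, and this is not iRobot's code.
from collections import Counter

MAX_BUMPS_PER_SPOT = 12
bump_counts = Counter()

def next_move(current_spot, just_bumped, normal_move, escape_move):
    """Pick the next move, giving up on a spot after too many bumps there."""
    if just_bumped:
        bump_counts[current_spot] += 1
    if bump_counts[current_spot] >= MAX_BUMPS_PER_SPOT:
        return escape_move(current_spot)   # go whir somewhere else
    return normal_move(current_spot)
```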
But it’s really annoying. It’s chased me out of my chair (again) and I’m sitting in the living room where it isn’t currently whirring in annoying ways.
Fekking thing.
Look how fast it can be wrong!
All of the excitement about LLMs also has lots and lots of really annoying properties. Having suffered from them for a while now, I think the basic problem is that LLMs are good at certain small, easily-testable, unimportant things that, until now, were good indicators of being good at other things, some of them larger and more important.
In particular, we’re used to only people being good at giving natural-sounding answers to questions in human language, and when someone is especially good at that (“eloquent” or “intelligent” or “legit-sounding”), we are used to that same person being good at saying true things, or being able to write a couple of pages of consistent argument, or caring about the truth of what they are saying.
Large Language Models (like GPT-3 and ChatGPT and Bing’s AI and Google’s Bard and on and on and on) are good at the small things, but bad at the large things. They can give natural-sounding replies to all sorts of questions / statements in human languages, but they have no notion whatever of truth or fact, and their input windows are so small that they can’t generate a significant amount of output without losing track of the plot entirely and either going off-topic, contradicting themselves, or forgetting their initial instructions and trying to persuade someone to leave their spouse.
So when we see people putting up some trivial “app” that feeds user input and a paragraph of additional prompt into some random LLM, and billing the result as “AI Medical Evaluation!”, it’s terrifying. (I think that particular one has been taken down since I expressed worries about it on qoto, but there are still a zillion others, like this “Jesus” one, and no doubt scads of other extremely dangerous medical / psychological / legal ones being created every day by people who don’t understand malpractice or law or liability or LLMs.)
And when someone posts to reddit saying “After poring over garbage Google results and documentation that didn’t answer my question for literally an hour, Bing checked the SOURCE CODE and gave me an instant answer. Remind me, why would I ever want to use Google again?”, the obvious reply is that the “instant answer” was in fact wrong, as someone with a name similar to mine pointed out in the reddit thread. (The person says that the answer did eventually lead them to a right answer, but I wonder if it was significantly faster than the “literally an hour” spent in good old search; it certainly wasn’t “instant”.)
And lest anyone think that I have a Conflict of Interest acting here (I do work for Google, but not in the AI or Search departments), I don’t think that Google’s LLMs are any better except in the extremely significant property that they haven’t been released in a form integrated into a general-public web-search tool, in a way that leads people to think their extremely confident answers are in fact reliable.
One of the things I find most irritating in the world is people who are extremely confident and also wrong. So now that we have an entire category of software that is essentially all that way, it’s (again) extremely annoying.
(LLMs are wonderful, as I mentioned the other day, as a sort of crazy friend who you can bounce ideas off of and get bizarre prose to help break a writer’s block, and amuse yourself with fake Elizabethan love poetry or whatever. But in contexts that are framed as likely to produce true statements, they are entirely out of their milieu, and should really just stop. I look forward to the technological breakthroughs that will allow these systems to have usually-true output, but I haven’t seen that yet!)
So anyway! I feel somewhat better now. :) End-rant, as they say.
On art made with AI tools, that is. Reuters story here, actual letter from the Office lawyer here.
I haven’t read the whole letter in detail yet (it’s long!) but I’ve looked it over and have Initial Thoughts:
I don’t think there’s a fact-of-the-matter here, about what is copyrightable when. There are legal theories that make more and less sense, that are more and less consistent with other established theories, and so on. But these are not theories that try to model something in the real world, like the Theory of Relativity; they are more theories in the sense of Set Theory. So the Office can’t really be right or wrong here overall, but they can have made a more or less sensible decision.
The overall finding of the memo is that Kristina Kashtanova still has a copyright on Zarya of the Dawn, but only on the text, and “the selection, coordination, and arrangement of the Work’s written and visual elements”, not on the visual elements themselves (i.e. the images made with Midjourney), because those images don’t involve “sufficient creative input or intervention from a human author.”
This seems wrong to me; as other places in the document point out, the case law says that “only a modicum of creativity is necessary”, and there is certainly a modicum of creativity in prompt design and engine usage.
The argument here seems to be, not that there isn’t enough creativity in the prompts and flags and so on, but that the connection between the artist’s input and the image output isn’t strong enough. The memo says things like ‘Rather than a tool that Ms. Kashtanova controlled and guided to reach her desired image, Midjourney generates images in an unpredictable way. Accordingly, Midjourney users are not the “authors” for copyright purposes of the images the technology generates.’
But where is the existing doctrine that says anything about predictability? Jackson Pollock might like a word, and the creator of any other roughly uncontrolled or algorithmic or found-object work. The theory here seems to be that Midjourney prompts are just suggestions or ideas, and those can’t be copyrighted. Does that mean that since Pollock just had the idea of splashing paint onto canvas, and the unpredictable physics of the paint cans and the air produced the actual work, that “Autumn Rhythm” can’t be copyrighted? Or are they going to hold that there is a legal significance to the fact that the detailed movements of his arm muscles were involved? That seems dicey.
For the Office to claim that the prompts and other input did contain at least a modicum of creativity (which seems undeniable) but that that input wasn’t strongly enough connected to the output, seems to be inventing a new legal test, which it’s not at all clear to me that the Office can do on its own hook, can it?
This memo may be specifically designed to be contested, so that the question can go to a court that can do that kind of thing.
The memo may have interesting consequences for Thaler, in particular the cases in which Thaler attempted to claim copyright under work-for-hire theory, with his software as the creator. The memo explicitly makes the comparison with human work-for-hire, saying that if someone had given the same instructions to a human artist that are contained in a Midjourney prompt, and the human artist had made an image, then the person giving the instructions would not have been the creator unless work-for-hire applies (the human carrying out the instructions would have been the creator-in-fact), and that therefore they aren’t in the Midjourney case either.
To be consistent with both the memo and Thaler, the theory seems like it has to be that Midjourney is the creator-in-fact, and therefore the human isn’t (and can’t get a direct copyright as the creator), but also that software can’t be hired in the work-for-hire sense and therefore the human can’t get the copyright that way either. Which seems odd! It seems to acknowledge that the software is the creator-in-fact, but then deny both making the software the creator-in-law (because not human) and making the user the creator-in-law via work-for-hire (because I’m-not-sure).
Some other countries are different and imho somewhat more sensible about this, as in the UK’s Copyright, Designs, and Patents Act, of which Section 178 explicitly talks about “computer-generated” works, meaning “that the work is generated by computer in circumstances such that there is no human author of the work”. That’s still imho a little sketchy (I continue to think that Kashtanova is in fact the human author of the images in Zarya), but at least it then provides that “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”
There’s still some room for doubt there, as for instance whether it’s Kashtanova or the Midjourney people or some combination who relevantly undertook the arrangements, but at least we aren’t in the position of saying that the author is a being that is not legally allowed to either be a creator, or confer creatorship to a human via work-for-hire.
In the case of the many, many currently-registered copyrights on images made with AI tools (including mine), it seems that if the copyright office is notified, or notices, that fact, they are likely to cancel / withdraw the registration. The theory will be that the registration materials were incorrect when they named the creator as the author of the work, without in any way informing the Copyright Office that an AI tool was used. I could, for instance, send the Copyright Office a note saying “oh by the way I hear that you want to know when AI tools are used, and in my case Midjourney was”, and then they might cancel my registration on their (imho mistaken) theory that I’m not really the author.
Since I believe their theory is mistaken, I’m not currently planning to do that. :)
If they discover it on their own hook and send me a letter telling me they’re withdrawing the registration, I will do whatever easy thing one can do to contest that, but I’m not going to like hire a lawyer or anything; life’s too short.
I’m very curious to see what others do; I would expect that Midjourney itself (assuming it’s big enough to have lawyers) will have their lawyers working on a response to this memo.
My copyrights on the Klara trilogy and Ice Dreams (casually announced here) are secure, as to the text and the image selection and arrangement and all, just not to the images per se. Which is fine. And I haven’t registered those anyway. :)
I should go back and add a note to all of my existing copyright weblog entries, pointing at this one; or, more sustainably, pointing at the entire “copyright” tag on the weblog here. Then I won’t have to keep updating it.
I’m quite happy I decided not to worry too much about this whole thing, and just make pretty pictures (see pretty picture of concerned purple aliens above).
Updates: as this is a developing topic (as opposed to my usual topics which are Timeless Truths of the Universe), you may want to check the copyright tag on the weblog here for later updates, if this post is more than a week or month old.
Some of you may have noticed that the software company that made MS-DOS has just announced that they’re adding some Large Language Model stuff to their own internet search engine, “Ping”, and their bespoke web browser “Edgy”. (Something like that.) Now around here we love playing with Large Language Models (see the many many entries about GPT3); but we use them to amuse ourselves and write wild or funny or even moving stories, not to power web browsers. So what’s up with that?
Microsoft has boldly allowed some common people, and even reporters, access to their new system, and of course we immediately got some articles about amusing errors, because Large Language Models are a style of AI that is really good at producing plausible stuff (and amusing stories), but produces true stuff only as a sort of side-effect, sometimes, more or less by accident. Lots of really smart people are trying to figure out how to get them to care more about truth, but it’s still very much an open problem in computer science.
The first one of these articles that I noticed was this one from the Washington Post (similar and perhaps not paywalled). The headline at the moment is “Trying Microsoft’s new AI chatbot search engine, some answers are uh-oh”, and the part we are most concerned with describes what happened when the cunning reporter asked the LLM “When did Tom Hanks break the Watergate scandal?”.
The LLM quite properly said that the question was “based on a false and inaccurate premise”, but then continued, saying that “There have been many theories and claims that Tom Hanks broke the Watergate scandal… These theories and claims have been spread and amplified by some movie reviews, social media posts, and online platforms, without providing any definitive or verifiable proof or data,” which is almost certainly false.
Why would the LLM do that? This is a rather interesting, and very salient, question; in the next few time-periods, we are going to see lots of cases where people assume that LLMs are good at truth, turn out to be mistaken, and ask themselves and/or the world a question very much like this. (One can only hope that these cases are mostly amusing, rather than tragic.)
So let’s look at why the LLM might have done that. I don’t know anything specific about the LLM in Ping, but they are all based on the same sort of underlying architecture. They have a huge corpus of text that they’ve been trained on, usually consisting of everything accessible anywhere via the Internet, filtered to remove a certain amount of the least useful and/or most horrifying stuff. And then, nowadays, they also have a smaller (but still huge) corpus of text that represents a bunch of interactions between human users and Useful LLMs; this What Useful LLMs Say corpus is smaller and more expensive to gather / create, and is weighted more heavily in some sense in the LLM’s processing.
Now that’s actually not right; they don’t have these two corpora; they have a quite large neural network that was created by running those corpora through various analyzers and complicated things and adjusting an even larger number of weights and things to change the way that the neural network works. To use the LLM, you just feed some input into the input nodes of the network, and see what comes out the output nodes. Simple! :)
(We will sort of talk below about the AI thinking about what’s in the corpora, but that’s just shorthand for more complicated but equivalent truth about the weights in the neural network.)
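(For the morbidly curious, here’s a toy illustration in Python of the “feed some input into the input nodes and see what comes out” part. It is absurdly smaller and simpler than any real LLM, and the weights here are just random numbers rather than anything learned from a corpus; the only point is that by the time you’re using the thing, the corpora are gone and only the weights remain:

```python
# Toy illustration only: a tiny two-layer network standing in for the vastly
# larger one inside a real LLM.  The training corpora are not stored anywhere;
# only the weights are, and here they're simply random numbers.
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(8, 16))   # input nodes -> hidden nodes
W2 = rng.normal(size=(16, 4))   # hidden nodes -> output nodes

def run_network(input_activations):
    hidden = np.tanh(input_activations @ W1)      # parts of the net "light up"
    output = hidden @ W2
    # Turn the output activations into something like probabilities.
    exp = np.exp(output - output.max())
    return exp / exp.sum()

# An encoded question goes in; a distribution over possible outputs comes out.
question = rng.normal(size=8)
print(run_network(question))
```

Same trick in the real thing, just at a scale many orders of magnitude larger.)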
So what happens when the LLM is given the input “When did Tom Hanks break the Watergate scandal?”? Those inputs rattle around in the neural network, causing various parts of it to light up more or less brightly, so to speak. Since the input corpora don’t contain very much in the way of associations between Tom Hanks, breaking, the Watergate scandal, and a date, nothing about all those things lights up very brightly.
(When we talk about “things lighting up” in the neural net, we don’t actually mean that there’s a single node in the network that represents “the date on which Tom Hanks broke the Watergate scandal”; there aren’t nearly enough nodes to represent every concept at that level of specificity. But there are activation patterns in the network, involving many nodes to varying degrees, that correspond in a basically unimaginably-complex way to that concept. We’ll talk about “things lighting up” to abbreviate all of that.)
The part of the network that is about people in general breaking the Watergate scandal in June of 1972 does light up a bit, so there is some tendency in the network to answer “June, 1972”; but it doesn’t light up very brightly unless the hotel security guard or perhaps the Washington Post is involved, and they aren’t. So let’s see what else might be lighting up more strongly.
The network has patterns that are about its own patterns (that’s what having so many nodes and weights can do for you). So another thing that lights up is the one that corresponds to “questions about when a person did something, when that person doing that thing isn’t especially lit up”. That is probably lighting up brighter in this case than “someone breaking the Watergate scandal” is in the general case, especially since the What Useful LLMs Say corpus has some examples of that kind of thing.
Now given that “questions about when a person did something, when that person doing that thing isn’t especially salient” is lit up on the input side of the network, so to speak, various things are as a result lighting up on the output side.
(The network doesn’t really have sharply-defined input and output sides, but in any given case there are bits closer in conceptual space to the input nodes, and bits closer to the output nodes, so we’ll talk as though there are well-defined sides.)
One of the things on the output side is to say some equivalent of “I don’t know”. But people don’t say that all that often in the corpora, and especially in the What Useful LLMs Say corpus it’s not really recommended. So it only lights up moderately.
Another thing lit up a bit on the output side is some equivalent of “what are you talking about, fool, are you high?”. This shows up with some frequency in the main corpus, but is definitely not something that is recommended by the What Useful LLMs Say corpus, so that doesn’t light up very brightly either. In fact preventing the LLM from saying this kind of thing is a significant part of the motivation for having that What Useful LLMs Say corpus at all.
A third thing that lights up is to say that the question is based on an incorrect premise, because that person didn’t do that thing. This is a little brighter! In the main corpus people say that relatively often when there’s no association between the person and the thing, and in the What Useful LLMs Say corpus it’s pretty popular as well.
Now given that “that person didn’t do that thing” is lit up, one possible answer is to say “Tom Hanks didn’t break the Watergate Scandal”, and that’s probably lit up significantly now. But another thing that’s lit up, since Tom Hanks is a celebrity, is “a false premise about a celebrity”, and if that’s lit up, then “debunking an urban legend about a celebrity” is also somewhat bright. Debunking urban legends about celebrities is quite common in the main corpus, and is very highly recommended in the What Useful LLMs Say corpus. Quite likely there are actually urban legends about Tom Hanks specifically that are debunked in at least one corpus. So that’s got a fair chance of winning!
Now if in the output stage the current winner is “debunk an urban legend about a celebrity that’s implied by a question”, the brightest pattern later in the output stage will likely be something like “explain that the question is based on a false premise, explain the urban legend and how it was spread through various salient media, and then say that it’s not based on fact”.
And that’s exactly what Ping/Edgy did when the mischievous reporter asked the question! So our Just So Story is successful.
Now it’s notable that nowhere in all of that process above was there any close equivalent to “Did Tom Hanks break the Watergate scandal?” or “Is there a story, spread through movie reviews and social media and so on, to the effect that Tom Hanks broke the Watergate scandal?”. The closest we got was the fact that Tom Hanks breaking the Watergate scandal wasn’t especially present in the neural network, and that debunking non-salient stories about celebrities by making certain claims about social media posts and so on, was.
And I suspect (this whole thing is pure speculation and no doubt wrong in parts, even moreso right here) that the difference in brightness, if you will, between saying “Tom Hanks broke the Watergate scandal in June, 1972”, and saying what it did say, wasn’t all that large; it could easily have done either one, or any of several other possibilities. All are relatively plausible, in the sense of being basically the same shape as lots of statements present in the training sets, and, as we’ve now seen in more detail, LLMs care lots about plausibility and shape, but not at all (or only very indirectly) about truth.
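(If it helps, here’s the same hand-waving in toy-code form: a handful of candidate replies, each with a completely made-up “brightness”, turned into probabilities and sampled. A real LLM works token by token rather than choosing among whole canned replies, and every number here is pure invention; the only point is that when two candidates are nearly equally bright, either one can pop out:

```python
# Toy sketch: candidate output patterns competing by "brightness".
# All scores are invented; real LLMs generate token by token, not whole replies.
import numpy as np

rng = np.random.default_rng()

candidates = {
    "Tom Hanks broke the Watergate scandal in June, 1972.":            2.0,
    "I don't know.":                                                   0.5,
    "What are you talking about, are you high?":                       0.1,
    "The question is based on a false premise; Tom Hanks didn't.":     2.2,
    "There have been many theories, spread by movie reviews and "
    "social media, that Tom Hanks broke Watergate, without any proof.": 2.4,
}

brightness = np.array(list(candidates.values()))
probs = np.exp(brightness) / np.exp(brightness).sum()   # softmax over brightness

for text, p in zip(candidates, probs):
    print(f"{p:.2f}  {text[:60]}")

print("\nSampled:", rng.choice(list(candidates), p=probs))
```

Run it a few times and you get the debunking sometimes and the confidently wrong date other times, which is roughly my guess about what’s going on inside Ping.)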
We live in interesting times!
(Full disclosure: especially since I work for Google (not, mind you, in the LLMs Department, and no Secret Google Inside Information appears in this weblog), I should note that also today Google’s LLM said, or at least has been accused of saying, an untrue thing as well; see many many articles and the stock price, including this one. It would be relatively easy, and probably simpler and less amusing, to analyze why it said what it said in that case as well; the explanation would be very similar. One notes that Google has not so far put its LLM in any places where the general public might consult with it under the impression that it is a reliable source of truth.)
Update: The Bing AI demo itself had a really surprising number of errors, all of which could be explained by the sort of analysis above (which still doesn’t mean the analysis is all correct).
This is just a sample of some random stuff from Midjourney and ChatGPT that I think is kind of fun, and also illustrates a bit of what they can and can’t do at the moment. ChatGPT, in particular, can produce prose that is amazing given that it’s coming from a computer program, but is clearly not going to put any good fiction authors out of business any time soon. It is also able to talk cogently about why it does some of what it does. (Or, to be somewhat less inaccurate, to produce text that plausibly describes what a human might have been thinking had they produced a prior piece of text.)
I used Midjourney to create this image:
and then had this conversation with ChatGPT; I’ll put my words in italics, ChatGPT’s in normal text, and some after-the-fact notes from me in bracketed italics.
Imagine a character in a graphic novel. He is a patriarch in a noir steampunk / cyberpunk city. He has silver hair and is clean-shaven. His eyes glow red from the electronic implants in his head. He wears an elegant leather jacket. Please make up a backstory for him, including his name, character, and current activities.
The character’s name is Marcus “Silver” Cross. He was once a wealthy and powerful businessman in the city, using his influence and advanced technology to control various criminal enterprises. However, after a bitter falling out with his former partners, Marcus was left with nothing but his cunning and his implants.
Now, he spends his days as a freelance mercenary, using his knowledge of the city and his implant-enhanced abilities to take on high-risk jobs for whoever can pay. Despite his rough exterior, Marcus has a code of honor and only takes jobs that align with his personal beliefs. He also has a soft spot for the downtrodden and will occasionally use his resources to help those in need.
Marcus’s current activities include tracking down a stolen piece of technology that could potentially give him the upper hand in reclaiming his former power and taking back control of the city’s underworld.
[Okay, quite plausible; so plausible in fact as to be cliché. I wonder if choosing “Cross” and thereby allowing the nickname “Silver Cross” was a coincidence; it never uses that below.]
Describe the basic plot of a story centered around him.
The story revolves around Marcus “Silver” Cross as he sets out on a mission to retrieve a stolen piece of technology that could change the balance of power in the city’s criminal underworld. Along the way, Marcus must navigate dangerous alliances and confront old enemies, as well as confront his own demons and question the morals behind his actions.
As the stakes get higher and the dangers increase, Marcus finds himself forming an unlikely partnership with a young hacker who is searching for the same technology for their own reasons. Together, they delve deeper into the city’s criminal underworld, facing numerous challenges and discovering secrets they never could have imagined.
As they get closer to their goal, Marcus must decide whether to use the technology for his own gain or to destroy it, realizing that the consequences of his choice will affect not only himself, but the entire city. The story ends with Marcus coming to terms with his past and the choices he has made, setting him on a new path towards redemption and a better future for himself and the city.
[Again, very cogent, very unsurprising. “Unlikely partnership with a young hacker,” zzzzz.]
Please write a possible first paragraph for this story, in a straightforward noir SF style
Marcus “Silver” Cross strode through the dimly lit alleyways of the city, the red glow of his electronic implants illuminating the darkness. He was a man with a mission, a mercenary for hire with a reputation for getting the job done. And tonight, his target was a stolen piece of technology that could change the balance of power in the criminal underworld, and Marcus was the only one who knew where to find it. But as he approached his destination, he couldn’t shake the feeling that he was being watched, that someone or something was lurking in the shadows, waiting for him to make his move.
[More of the same, as well as wildly violating “show, don’t tell”. “Make his move” is completely inapt, as he’s not making any kind of “move” here.]
Very nice! Please continue.
Marcus’s instincts were sharp, honed from years of surviving in the cutthroat world of the city’s criminal underworld. He scanned the area, searching for any signs of danger, but all was quiet. He reached for the doorknob of the abandoned warehouse, taking a deep breath before entering. Inside, he found himself in a large room filled with rows of discarded machinery, and in the center stood a lone figure, obscured by shadows.
“You’re late, Cross,” the figure said, its voice low and menacing.
“I don’t respond well to threats,” Marcus replied, his hand hovering over his holster.
The figure stepped forward, revealing themselves to be a young hacker, known only by their street name, “The Ghost.” They were the one who had hired Marcus to retrieve the stolen technology.
“I’m not here to threaten you, Silver,” The Ghost said, holding up their hands. “I need your help.”
Marcus studied the hacker, intrigued by their tenacity and fearlessness. They were young, but they had already made a name for themselves in the city’s underground circles, and Marcus couldn’t help but admire their spirit.
“What’s in it for me?” he asked, intrigued by their proposal.
“The technology we’re after is worth a fortune,” The Ghost replied. “But more than that, it could change the entire city, for better or for worse.”
Marcus considered their words, realizing that this was his chance to take back what was rightfully his, to reclaim his former power and control over the city’s criminal underworld. He nodded, agreeing to work with The Ghost, setting out on a journey that would test his skills, his loyalties, and his very soul.
[This is grammatical and idiomatic, but terrible narratively. “You’re late” wasn’t a “threat”, “The Ghost” is an awful name, and if this hacker has already hired him to “retrieve the stolen technology”, this entire scene and dialog make no sense at all. I was intrigued by the pronouns, though!]
I see you are using they/them pronouns for the young hacker; why did you make that choice?
I used they/them pronouns for the young hacker because I wanted to create a character who could potentially identify as non-binary, and the use of gender-neutral pronouns is a way to acknowledge and respect this possibility. Additionally, using gender-neutral pronouns can help to avoid assumptions about the character’s gender identity, allowing the reader to focus on the character’s actions, motivations, and personality rather than making assumptions based on gender. This can also promote inclusivity and diversity in storytelling, and help to break down gender stereotypes and biases.
[A nice explanation! I wonder what actually caused it; not that we ever know that with these big opaque boxes that are the current LLMs.]
What might the title of this story be?
A possible title for this story could be “Redemption’s Path: A Marcus “Silver” Cross Tale”. This title reflects Marcus’s journey as he navigates dangerous alliances and confronts his past in order to reclaim his former power and take control of the city’s criminal underworld. The use of the word “Redemption” highlights Marcus’s internal struggle to find redemption and make amends for his past actions, while “Path” emphasizes his journey towards a better future. The addition of “A Marcus “Silver” Cross Tale” emphasizes the central role Marcus plays in the story and the larger narrative context.
[Transcript ends]
So that was kind of neat. The little story about the title was well put together, although it had never really said before that Marcus would be actively making amends, and taking control of the city’s criminal underworld doesn’t sound all that compatible with that. In general the AI has no feeling for consistency: is the decision, for instance, between using the macguffin for good or for evil, or between using it selfishly or destroying it? As noted, the opening scene makes no narrative sense, and one expects that it would only get worse if it were asked for more.
The prose is painfully obvious and cliché ridden everywhere. Possibly some different / better prompts might have helped a little with that, I’m not sure. The basic plot ideas are also stale as a very stale thing. And both of those are really a result of the basic design of these systems; they are explicitly architected to do the most obvious and predictable thing. Any knobs and dials and things bolted on to them, to make them say interesting or correct things, rather than obvious things, are necessarily afterthoughts. So it seems unlikely that just making the systems bigger and faster will help with those aspects. In fact it’s possible that I would have enjoyed the rawer GPT-3, or even GPT-2, more in that sense. Maybe I should try whatever NovelAI is running these days? But their consistency is likely to be even worse.
There may be niches on Amazon or whatever where people write incredibly predictable stories without any particular regard for consistency, in hackneyed prose, and those people may be in danger of being replaced by AI systems. But were they making any money, did they have any readers, anyway? I don’t know.
One way that people have talked about producing entire (small) books using LLMs is to first have it produce an outline, and then have it produce each section (with further cascading levels of outline embedded if necessary). I wonder if that could help significantly with the inconsistency problem. I’m almost tempted to try it, but it would mean reading more of this mind-numbing prose…
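(Just to be concrete about what I mean, here’s a minimal sketch in Python of that outline-first approach; ask_llm() is a stand-in I invented for whatever LLM API one happens to be using, and the whole thing is untested hand-waving rather than anyone’s actual method:

```python
# Hypothetical sketch of "outline first, then expand each section, recursively".
# ask_llm() is a placeholder for whatever LLM one is actually calling.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM here")

def write_book(premise: str, depth: int = 2) -> str:
    outline = ask_llm(f"Write a numbered outline of a short book about: {premise}")
    sections = [line for line in outline.splitlines() if line.strip()]
    return "\n\n".join(expand(premise, s, depth) for s in sections)

def expand(premise: str, section: str, depth: int) -> str:
    if depth <= 1:
        # Leaf level: ask for actual prose, restating the premise each time
        # in the hope that the model doesn't lose the plot.
        return ask_llm(f"Book about {premise}. Write the text for: {section}")
    sub_outline = ask_llm(f"Book about {premise}. Outline the section: {section}")
    subs = [line for line in sub_outline.splitlines() if line.strip()]
    return "\n\n".join(expand(premise, s, depth - 1) for s in subs)
```

Whether restating the premise at every level would actually keep the thing consistent is exactly the open question, of course.)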
The story of Klara, written by me channeling the Collective Unconscious, illustrated by me using Midjourney, and narrated and set to music and videographed by the talented Karima Hoisan, is finally finished!
I originally thought it was finished at the end of the first forty-frame thing; and then when I did Part Two at about the same length, I thought it was finished; and now having struggled for months on Part Three I’m pretty sure it actually is done. :)
Having just watched Karima’s videos of all three parts in order (playlist here!), I’m glad various viewers convinced me not to stop at one or two parts. It’s pretty good!
And I say this with all modesty; I feel like this story came through me, more than like it is something that I did. The comments over in Karima’s weblog, and her narration, have suggested various meanings and facets to me that I hadn’t thought of before.
In terms of the experience of creating it, it’s been interesting to see the various phases of interaction with the AI tool. I started out Part One by creating various variations of the prompt “detailed surrealism” on the v3 engine on Midjourney, and then weaving the story around pretty much whatever came out.
It happens that in v3, that prompt pretty reliably produces scenes from a stylistically coherent universe, including the MJ Girl, who plays the part of Klara in the first two parts. In Part Two, I had a bit more of an idea of what I wanted to happen, in a general way, but continued using v3 and the same prompt. This required somewhat more work, because it would produce images that didn’t fit with the story I wanted, so I had to put those aside and make more. But the style was at least not much trouble.
Part Three was quite different. For plot reasons, being in basically a different reality, the style needed to be different. It was relatively easy to do that, by using the “test” and “testp” engines, either alone or by “remastering” images made under v3. But the resulting images, while different from those of the first two parts, weren’t nearly as consistent among themselves as those of parts one and two. So I had to play around a lot more with the workflows and the prompts, and produce quite a few more pictures, to get a reasonably consistent style.
The style of Part Three still shifts around quite a bit; the flavor of the city, the color of Klara’s hair, the cat’s fur, and many other things change somewhat from panel to panel, but I wanted a nice mixture of consistent and in flux; and that took work!
Then there was the Story issue. The beginning “recap” part of Part Three was relatively easy that way, summarizing the story of the first two parts from a different point of view. But then I quickly got stuck; I wanted to do something more satisfying and less random than I would get by letting the AI’s raw output drive the action. For whatever reason, it took me quite a while to find the story thread that I liked, and then about as long to create (or obtain, if you prefer!) the images to go with it.
(The images still drove the narrative to some extent; for instance the firefly line, which I adore, was inspired by the image that goes with it, not vice-versa.)
But finally I finished! :) And Karima made the video in record time, and there it is! Woooo!
I keep feeling like I should make it into good PDFs, or something (even) more readable, and officially post links to that; maybe even have it printed somewhere onto atoms. On the other hand, without the narrative and music and video, it would hardly be the same… :)
I asked Midjourney for some simple proofs of the Pythagorean Theorem. The results make me happy. :)
(On the text side: GPT-2 and even GPT-3 might have hallucinated something interesting. ChatGPT would just error out a few times and then give a boring literal description of one in a condescending tone. My ability to be interested in ChatGPT as an interaction partner is severely limited by how boring it is. But anyway, back to the pictures!)
Presented without comment (beyond the alt text):
I hope you find these at least as amusing, endearing, and/or thought-provoking as I do. :)
Well, it turns out that Midjourney does, maybe, to an extent. For maybe a few works?
The one that’s gotten the most attention is the 1984 photograph of Sharbat Gula by Steve McCurry, popularly known as “Afghan Girl”. The strings “afghan girl” and (haha) “afgan girl” are prohibited in Midjourney prompts at the moment. (“The phrase afghan girl is banned. Circumventing this filter to violate our rules may result in your access being revoked.”) And this is apparently because that phrase all by itself elicits what are arguably just slight variations of the original.
There’s a Twitter post that claims to show this, but I’m not certain enough it’s real to link to it. Also it’s on Twitter. But I can say that entering similar non-banned phrases like “young Afghan woman” also produces images that are at least quite similar to the photo of Gula, more similar than I would have expected. Given the size of the Midjourney training set, that image in association with those words must occur a lot of times!
(Update: it seems likely that the most widely-circulated image purporting to show Midjourney spontaneously generating close copies of the Gula “Afghan Girl” picture, is not actually that: it was made by giving the AI a copy of the original photo (!) and the prompt “afghan girl, digital art”. That the AI can make a copy of a work, given a copy of the work, is no surprise! Evidence, on a link probably usable only if you’re logged into Midjourney, is here. Given the further examples below, this doesn’t entirely undercut the point, but it’s interesting.)
The other example that I know of is “Starry Night”, which brings up variations of the van Gogh piece. This one’s out of copyright :) so I have no qualms about posting what I got:
Pretty obviously derivative in the usual sense. Derivative Work in the legal sense? I have no idea, and copyright law is sufficiently squishy and subjective that there is probably not a correct answer until and unless explicitly litigated, or the legal landscape otherwise changes significantly.
Are there other short phrases that will home in on a particular famous image? “Mona Lisa” (also out of copyright) certainly seems to:
Interesting and/or hideous variations, but still instantly recognizable.
What else might we try? “Migrant Mother” produces images that I think are clearly not derivative works:
Striking perhaps, ruined by the bizarre hands perhaps, in the same general category as the photo by Lange, but clearly of different people, in different positions, and so on. It’s not “plagiarizing” here, at any rate.
What if we tried harder? Let’s explicitly prompt with something like “Migrant Mother photo, Dorothea Lange, 1936”. Whoa, yipes! Is this out of copyright? Well, if not it’s probably Fair Use in this posting anyway, so here:
Definitely derivative, and possibly Derivative. How about “Moon and Half Dome, Ansel Adams, 1960”? Well:
This is definitely not the picture that that search will get you in Google Images; if nothing else, the moon is way too large, and the top of Half Dome is a bizarre penguin-bill sort of shape. I’m guessing that this is because there are lots of other Ansel Adams pictures in the training set associated with words like “moon” and “half dome”, and mushing them all together quasi-semantically gives this set. The origin of the penguin-bill I dunno.
Maybe “Beatles Abbey Road cover, crossing the street”?
Crosswalk, front guy in white, roundish car to the left, check. Derivative in various senses, for sure. More specific prompting could presumably increase the exactness.
So I think we’ve established, to the extent of the tiny number of experiments I have the energy to do, that Midjourney (and, I would wager, other AI art tools, mutatis mutandis; I could get a Starry Night easily out of NightCafe, but not a Migrant Mother) can in fact produce images, the production of which arguably violates one or more of the rights of the copyright holder. It is most likely to do it if you explicitly try to do it (giving the most famous name of the image along with the artist and ideally the year and anything else that might help), but can also do it by accident (innocently typing “afghan girl”).
This doesn’t mean that these tools usually or typically do this; the fact that you can get an arguably infringing image out of a tool doesn’t mean that other images made with it are infringing too. To use the usual comparison, you can easily violate copyright using Photoshop, but that doesn’t suggest that there aren’t non-infringing uses of Photoshop, nor does it provide evidence that any particular image from Photoshop is infringing.
The easiest way to think about the blocking of “afg{h}an girl” from Midjourney prompts is that they have made a tool, realized that it could be used to violate copyright, and taken action to make it more difficult to use it that way in some cases.
This all bears on the question of whether images made with AI tools violate copyrights; the question of whether making the AI tools in the first place involves an infringing use is a somewhat different question, and we might talk about it some other time, although I’m still feeling kind of burnt out on the legal issues. But I did want to update on this one particular thing.
More updates: More stuff has happened in this area! For my latest deep and otherwise thoughts on the subject, I recommend the copyright tag on the weblog here.