2023/03/26

Creativity, how does it work?

This is a random brainstorming post, I have no particular conclusions at the moment as I write the first sentence here, but I might develop something as we go along.

So far, I just have this “meme” that I made:

Critics: AI art tools can't create anything new, just copy and paste from existing art!

People using AI art tools:
Here there is what looks like a charcoal drawing of a maniacally-smiling woman with wild hair and an extra set of lower teeth, immersed to just below the shoulders in a whitecapped ocean. There is an odd sailing ship on the ocean in the background, and two more vessels (ships? dirigibles?) in the sky.

There are two obvious reactions to this. Someone who likes AI art tools might say “haha, yeah, this shows how creative and crazy this art can be!”. And someone who agrees with the critics might say “omg, totally, that’s obviously sooo derivative!”.

The first thing to wonder is whether there is a particular image, set of images, or artist out in the world somewhere of which this image is obviously derivative. Pointers in the comments are extremely welcome! :)

Google (reverse) image search doesn’t come up with anything especially obvious. There are some images (like, at the moment, this one) that involve somewhat stylized faces with prominent hair and ocean waves and one or more ships, but the arrangement and overall style and impact are, I think, significantly different. In the past when I asked a couple of people who were all “oh, yeah, I can usually identify the particular artist or artwork that one of these AI images was taken from”, to do that with one of my images, they suddenly became very quiet. ¯\_(ツ)_/¯

If there isn’t a specific image or small set of images or an artist that one can point to and say “see, this is where this came from!”, what does that mean? I’m not an art critic (hahaha), but I think it would be pretty uncontroversial that, if a person had created that image above there entirely with real-live paper and charcoal, or even with a tablet and Photoshop, we’d say that it displayed sort of average human creativity; nothing incredible, but easily beyond (for instance) the “modicum of creativity” required by US copyright case law, enough that it could be entered in an art competition, and so on.

Once we know that it was created by a person using an AI art tool (Midjourney, in this case, with a particular prompt and engine settings and so on), is it reasonable to say something different? Does it still display creativity, or not? Does it do it differently, or in the same way? What is creativity? How is it displayed? In what does it inhere? Is it for that matter the kind of thing that inheres in things? Are there facts of the matter about it, or is it a purely squishy and subjective thing?

There are a bunch of theories that one might put together:

  • One might hold that it’s just as creative, and in the same way, as the counterfactual no-AI version, and that the creativity comes from the same place: the human who made it. One version of narrative would say that the difference between the no-AI and the with-AI version, creativity-wise, is not different in kind from a person making it with paper and charcoal and a person making it with tablet and Photoshop, or a board and little mosaic tiles. It might be objected that the activity of choosing engine parameters and prompt strings and then culling the results is just obviously (or by dint of some specific plausible theory) different from the activities in the other cases, since those involve something like choosing a particular color for particular parts of the image, whereas the AI-tool case doesn’t.
  • One might hold that it’s just as creative (or at least that it is creative, if perhaps to a different degree), and the creativity still comes from the human, although it’s implemented (delivered, displayed, exercised, used, manifest) in a different way. One might say in this theory that the difference between the real paper and charcoal version and the Midjourney version is like the difference between a realistic drawing of a scene and a photograph of the same scene. Both born of human creativity, but through very different means, and perhaps to different degrees. And then we can get into lots of questions about the creative element(s) in various kinds of photography!
  • The two takes above can, I think, go either way on the question of whether creativity is inherent in the end result, the image, in a sort of death-of-the-author way, or whether it’s in the overall process. At the other end of some spectrum, one could say that the image made with the AI tool does not in fact display (involve, require, contain) any creativity; that our initial impression that it did just turns out to have been mistaken, and now that we know how it came to exist, we know that it didn’t involve creativity. This sort of claim pretty much rules out the position that creativity is inherent in the finished product, unless one is willing to take the (facially untenable, I think) position that this image could not in principle have been created by a human without using an AI, and that inversely no purely human-created image could in principle have been created with an AI tool.
  • That is, if you think there is no creativity in this image because it was made with an AI tool, you pretty much have to take the position that it’s not possible to tell how much creativity there is in an artwork (or a putative artwork) just by looking at it; that the creativity is not displayed by / doesn’t inhere in solely the image or object. Which seems sensible in at least one obvious way: I might think that something involved lots of creativity, until I see that it is an exact copy of something that existed before, just with a little line drawn on it. More nuancedly, we’d say that you can’t tell how much new creativity is in a thing, until you see how it was made (because it might be, say, a copy).
  • So now we have a potential claim that images made with AI tools don’t have any (or much) new creativity, because they are just processed / stolen / noisily compressed and expanded / copy and pasted, versions of the material that they were trained on. Sure there might be a little creativity in choosing the prompt or whatever, but that’s not much. The program itself can’t add any creativity because “they can’t, they just can’t” (a phrase I’ve heard from a couple of people talking on videos lately, but of course can’t find at the moment).
  • Humans also process things that they’ve seen / experienced when producing new things. I’d say we can’t really require creativity to mean “those aspects of a work that spring purely from the artist’s soul, and that would still have been there had the artist been a brain in a vat with no experience of the world or other artworks, only its own thoughts”, because then there wouldn’t be any creativity anywhere, and when common words turn out to have no referent in a theory, it generally (if not always) means that that theory is wrong.
  • Or maybe we do want to require that “sprung from the soul alone” thing, because we want to set a very high bar for True Creativity, and we are confident that there will be at least a few glorious shining examples if only we knew the truths of people’s souls! In which case we can say that a marvelous few humans have displayed true creativity through the ages, and no computer ever has (having no soul and all), and neither have the vast majority of people we loosely call “artists”. This is a theory, but not a popular one, and it means that most art displays no creativity, which again feels sort of like a reductio. It’s certainly not compatible with what the Copyright Office means by “creativity”.
  • The question of how much creativity is in the selection of prompts and engine settings and images to keep is one we can put aside (in the drawer next to the question of the creativity in a cellphone snapshot, as alluded to above). And it seems we are left with having a theory about how much creativity comes from the AI tool itself, and how much of that is what we’ve called new creativity. Possible answers include “none, there’s lots of new creativity, but it’s all from the human user”, “none, there’s no new creativity in this at all, it’s all stolen / copied from the creativity in the training set”, “about the same amount that comes from the human, they are in some sense equals in the new creation”, and “the human just types a few words, and then the software adds lots of new creativity to it, so it’s the AI”.
  • This leaves us mostly with the question of “under what circumstances is it true that a person, or a piece of software, adds new creativity to a work, when that work is to a degree influenced by other prior works that that person, or piece of software, has been exposed to?”. Or other words to that general effect. One set of answers will not especially care whether it’s a person or a piece of software; the other set (“they just can’t”) will either think that it’s important which it is, or have a set of criteria which (they will claim) only people and not software can for whatever reason satisfy.

And I’ll leave it there for now, having perhaps not been especially productive :) but having written a bunch of words and focused in (if in fact it’s a focusing) on the question of what it means to add new creativity when making something, even though the entity doing the creating is influenced by other works that existed before. People talk a lot about things like reflecting one’s lived experience, having a thought that the work will (may? is intended to?) cause the viewer to also have (some version of?), and like that. None of those seem likely to be any kind of complete explanation to me at the moment.

In legal news, of course, the US Copyright Office has issued a Copyright Registration Guidance on “Works Containing Material Generated by Artificial Intelligence”, which I gather (I have not had the mental energy to think about this very hard) just repeats the statements in the Zarya (I always want to write Zendaya) memo we briefly discussed the other day, using various phrases that are putatively synonymous but as far as I can tell are subtly different and introduce all sorts of new uncertainty to the subject.

I’m going to continue not thinking about that very hard for now, because that part of my brain is still tired.

Also! You can get onto the waiting list for the Google LLM thing (and I hear varying stories about how quickly one gets access; apparently it is sometimes quite quick). In case you’re, like, collecting those, or otherwise interested.

2023/03/16

Stills from the Cult Hit of 1977!

Lost for decades, now rediscovered and presented here for the first time!

A handsome young man with a 70's haircut. Behind him, blurred by depth of field, are more young 70's style people and some trees and grass.
Mike and the Gang
A man in an odd leather helmet working in some odd devices (perhaps small bombs), in a room with a harsh light and a couple of mysterious racks.
The Mysterious Mr. G in his Secret Lab
Four 70s style people, two men in suits and two young blonde women. The man and woman in the foreground are talking on bakelite telephones, sitting at a table crowded with 70s looking technology (perhaps modems).
The legal team in action
Three women in white nun's habits sitting around a table in a room with leaded-glass windows, doing something enigmatic. Behind them on the wall is a portrait of a man with a large sword or something.
What is happening at St. Agnes?
Five 70's style people standing outdoors. At our left a man with a typical moustache and "soul patch". With him four young women with long straight hair.
The Outsiders
Three people, a man and two women, in white kitchen attire (the women with hats, all three with shirts and probably aprons) sit around a silver cylindrical machine of some kind. The women are holding orange objects
In the kitchen at St. Agnes
Four 70s style people, a man in an orange jumpsuit in the back, and three women in white gradually closer to us. The women have long straight blonde hair, and white clothing. Each of the women has a white cloth cap, or part of one, on her head.
Under Control
Close-up of a man's face. He has a 70's mustache, and 70's sunglasses. There are other people barely visible behind him (his face takes up almost the entire image).
The Discovery!

Courtesy, of course, of the early v5 version of Midjourney.

2023/03/16

So much new!

As I’m sure you’ve heard there’s a new level of GPT in the world. Friend Steve has been playing with it, and says that it does seem to do some stuff better, but also still make stuff up amusingly and all. At the moment for whatever reason I can’t be arsed to investigate, or even read yet more hype / analysis about it. Similarly, Google announced a thing, and Microsoft is putting LLMs into various products whose names I don’t recognize, and I’m not reading about any of that. NovelAI‘s good old open-source model works fine for all of the telling-weird-stories stuff that I need right now.

And there’s a test version of a new Midjourney engine out! Being tested! And it seems pretty cool. Hands in particular seem much more likely to have five fingers when you’d expect them too, which is a whole thing.

And I spent too much time arguing with people on the Twitter, which isn’t at all new. And I definitely shouldn’t do because it is not healthy. So I’m trying to stop that.

Now I’m just making pretty pictures! And not thinking very much until later on sometime!

A black and white photo of grassy prairie land with hills in the distance. The sky is thick with storm clouds, and two long bolts of lightning reach from the clouds to the horizon.
Colorful artistic image of a city street in the rain, with a woman in a raincoat and umbrella walking away from the viewer, and lots of cars and buses and traffic lights and things. There are impressionistic reflections in the wet pavement.
A photo of trees standing apart from each other, all thickly covered with snow, in a snowy landscape. A sunburst shines at the center of the image, and above and around it is a plume of bright cloud or ice.

Lots of weather in those, eh? Hadn’t noticed that. :)

2023/03/10

Chomsky declares: LLMs icky!

Friend Steve wrote us today about this New York Times opinion piece, “Noam Chomsky: The False Promise of ChatGPT” (this link may be free for everyone for some time or something). Despite the title, it’s by Chomsky, Roberts, and Watumull.

Steve commented inter alia on the authors’ apparent claim that ChatGPT can say that the apple you’re holding will fall if you open your hand, but unlike humans it can’t explain the fact. The trouble with the argument is that, as anyone who’s actually used ChatGPT can tell you, it will happily explain the fact, go into the history of the notion of gravity, talk about other things people have thought about it over time, and explain various situations in which the apple wouldn’t fall, given the slightest provocation.

My reply, lightly edited:

I am pretty unimpressed with the article as a piece of science or philosophy; fine as a nice polemic by a greybeard I suppose. :)

I’m amused at how LLMs are “lumbering” and “gorged”, while human minds are “elegant” and even “efficient”. I doubt there is any useful sense in which these adjectives are anything more than bigger words for “icky” and “nice” in this context.

Chomsky brings in the innateness of language, because of course he does, but I’m not at all clear how it’s relevant. Even if humans do have innate language scaffolding, and LLMs don’t have the same kind, it’s far too early to say that they don’t have any, and even if they didn’t, so what? Does the ability to learn a wider variety of languages than humans can, mean that LLMs don’t really understand, or can’t really think, or are harmful or dangerous? None of that makes sense to me; it seems just an even longer way of saying that they’re icky.

He (well, they, there being multiple non-Chomsky authors) claims that LLMs don’t have the ability to say “what is not the case and what could and could not be the case.” And I can’t imagine what they think they mean by that. As with the flaw you point out in the apple example, it’s simply wrong, and suggests that they haven’t really used an LLM much. ChatGPT (let alone a less heavily defanged system) will expound at length about what is not the case and what could and could not be the case, given any halfway decent prompt to do so. They may intend something deeper here than they actually say, but I don’t know what it could be (beyond that they can’t do it non-ickily).

“Whereas humans are limited in the kinds of explanations we can rationally conjecture, machine learning systems can learn both that the earth is flat and that the earth is round.” Um, what? There are certainly humans who believe each of these things. They can’t just be saying that humans can’t conjecture that the earth is flat “rationally” because so what; that’s exactly as true of an LLM. If they mean that the same LLM can make one of those claims one minute and the other the next, whereas humans can’t hold two contradictory beliefs at the same time, I’d like to introduce them to some humans. :)

Similarly for whatever it is they are trying to say about moral reasoning. The suggestion seems to be that, simultaneously, ChatGPT is icky because it cannot stay within moral boundaries, and also icky because it stays stubbornly within anodyne moral boundaries. As pretty much throughout the piece, stuff that humans do all the time is cited as reasons ChatGPT isn’t as good as humans.

Tay became toxic by listening to people, therefore it’s not like people? It had to be heavily censored to keep it from talking trash, therefore it’s not like people? Um?

It might be interesting to try to tease a set of actual significant truth-claims out of this article, and see which ones are arguably true. But I’m not sure that’s the point really.

As far as I can tell, this piece is just a longer and nicely phrased version of “Boo, LLMs! Icky!”

But maybe that’s just me. :)

2023/03/04

AI is terrible at almost everything [a rant]

I am annoyed with many “AI” things this morning, so this is a rant with no pretense of objectivity or overall wisdom.

AI call directors are terrible. Especially Intuit’s.

Here I will just reprint a rant that I posted to both qoto and Twitter; I was so annoyed!

Wow, #Intuit #TurboTax is just awful.

I mean, I do know that they’ve been lobbying against tax simplification in the US for years, because it would cut into their business, and that’s pretty evil.

But their customer service is apparently also terrible!

I need to file a particular New York State #tax form this year, and apparently they just don’t support it, and aren’t planning to.

Which seems to mean that I would have to manually enter the data, which seems to mean that I couldn’t then e-file or get their correctness guarantee. And if one uses software to prepare the return, one is required by law to e-file!

So it seems like I just can’t use their software at all. Which is maybe good!

When I tried to call them to ask if they support the form, their robot call director asked me what I wanted, mis-heard me, and insisted on knowing whether I wanted the irrelevant information it had found sent by text or email; “no” was not a valid choice.

Then it insisted on knowing my last name, but failed to understand me when I tried to pronounce or spell it (and I have a pretty ordinary voice, and not all that unusual a name!) and eventually it said goodbye and HUNG UP ON ME when it couldn’t.

I had to call back and pretend that its incorrect guess at my last name was correct, before it would pass me to a representative. And the first thing the human rep (who was very nice!) asked me was for my first and last name, so the whole robot torture conversation was useless as well as annoying.

I think they’re just trying to get people to give up on calling them.

Which in my case probably means switching to #freetaxusa which is cheaper anyway, and does support the forms that I need.

Sheesh!

I hate this Roomba (at least while it’s running).

Leaving aside the fact that it’s a mobile Internet-attached camera that could almost certainly be accessed by random hax0rs in Kekistan, and may already be sending all sorts of images of private life to iRobot / Amazon / the NSA, it’s just annoying.

It has an app of course, but for some unaccountable reason the app is rather terrible. For a long time it offered not much more than the little “CLEAN” button on the robot does; no way to tell it to avoid certain areas or do a certain room right now, let alone a Direct Drive mode where you could just pilot it around vacuuming (which I would have thought would have been the Minimum Viable Product or whatever the kids are calling it these days), no insights into what was going on in the little beggar’s mind that makes it buzz around in the front hallway for half of its runtime and pay only cursory attention to any actual room. Lately it’s been enhanced somewhat, so you can see a version of it’s internal map, tell it to do a certain area, and a few other things.

But it still went under my chair this morning while I was having my coffee, and got into some kind of infinite loop at the edge of the nest of power and data lines off the side of the chair where it doesn’t really need to go at all. It sat there trying to charge forward and running into something with a loud whir, turning slightly right, whirring again, turning back slightly left, whirring again, repeat forever and ever, with loud irritating whirs every time. I gave it a slight nudge to try to get it away, and it faffed about a little and then charged back into the same corner again, whirring as loud as ever.

Why isn’t there a “don’t try the same thing more than a dozen times” feature in the thing? Maybe because it’s some black-box AI that can’t be explicitly programmed not to do certain things, but just does whatever comes out of the mysterious tangle of weights and things. And maybe because they couldn’t be bothered to add that because it hasn’t made it into a sprint yet. Who knows!

But it’s really annoying. It’s chased me out of my chair (again) and I’m sitting in the living room where it isn’t currently whirring in annoying ways.

Fekking thing.

Look how fast it can be wrong!

All of the excitement about LLMs also has lots and lots of really annoying properties. Having suffered from them for awhile now, I think the basic problem is that LLMs are good at certain small and easily-testable unimportant things that, until now, were good indicators of being good at other things, some of them larger and more important.

In particular, we’re used to only people being good at giving natural-sounding answers to questions in human language, and when someone is especially good at that (“eloquent” or “intelligent” or “legit-sounding”), we are used to that same person being good at saying true things, or being able to write a couple of pages of consistent argument, or caring about the truth of what they are saying.

Large Language Models (like GPT-3 and ChatGPT and Bing’s AI and Google’s Bard and on and on and on) are good at the small things, but bad at the large things. They can give natural-sounding replies to all sorts of questions / statements in human languages, but they have no notion whatever of truth or fact, their input windows are so small that they can’t generate a significant amount of output without losing track of the plot entirely and either going off-topic or contradicting themselves or forgetting their initial instructions and trying to persuade someone to leave their spouse.

So when we see people putting up some trivial “app” that feeds user-input and a paragraph of additional prompt into some random LLM, and billing the result as “AI Medical Evaluation!”, it’s terrifying. (I think that particular one has been taken down since I expressed worries about it on qoto, but there’s still a zillion like say this “Jesus” one, and no doubt scads of other extremely dangerous medical / psychological / legal ones being created every day by people who don’t understand malpractice or law or liability or LLMs.)

And when someone posts to reddit saying “After poring over garbage Google results and documentation that didn’t answer my question for literally an hour, Bing checked the SOURCE CODE and gave me an instant answer. Remind me, why would I ever want to use Google again?”, the obvious reply is that the “instant answer” was in fact wrong, as someone with a name similar to mine pointed out in the reddit thread. (The person says that the answer did eventually lead them to a right answer, but I wonder if it was significantly faster than the “literally an hour” spent in good old search; it certainly wasn’t “instant”.)

And lest anyone think that I have a Conflict of Interest acting here (I do work for Google, but not in the AI or Search departments), I don’t think that Google’s LLMs are any better except in the extremely significant property that they haven’t been released in a form integrated into a general-public web-search tool, in a way that leads people to think their extremely confident answers are in fact reliable.

One of the things I find most irritating in the world are people who are extremely confident and also wrong. So now that we have an entire category of software that is essentially all that way, it’s (again) extremely annoying.

(LLMs are wonderful, as I mentioned the other day, as a sort of crazy friend who you can bounce ideas off of and get bizarre prose to help break a writer’s block, and amuse yourself with fake Elizabethan love poetry or whatever. But in contexts that are framed as likely to produce true statements, they are entirely out of their milieu, and should really just stop. I look forward to the technological breakthroughs that will allow these systems to have usually-true output, but I haven’t seen that yet!)

So anyway! I feel somewhat better now. :) End-rant, comme on dit.

2023/02/23

The US Copyright Office takes a position!

On art made with AI tools, that is. Reuters story here, actual letter from the Office lawyer here.

I haven’t read the whole letter in detail yet (it’s long!) but I’ve looked it over and have Initial Thoughts:

Large furry purple aliens are upset about the confusing Copyright Office memo. Some of their quaint buildings are in the background.
  • I don’t think there’s a fact-of-the-matter here, about what is copyrightable when. There are legal theories that make more and less sense, that are more and less consistent with other established theories, and so on. But these are not theories that try to model something in the real world, like the Theory of Relativity; they are more theories in the sense of Set Theory. So the Office can’t really be right or wrong here overall, but they can have made a more or less sensible decision.
  • The overall finding of the memo is that Kristina Kashtanova still has a copyright on Zarya of the Dawn, but only on the text, and “the selection, coordination, and arrangement of the Work’s written and visual elements”, not on the visual elements themselves (i.e. the images made with Midjourney), because those images don’t involve “sufficient creative input or intervention from a human author.”
  • This seems wrong to me; as other places in the document point out, the case law says that “only a modicum of creativity is necessary”, and there is certainly a modicum of creativity in prompt design and engine usage.
  • The argument here seems to be, not that there isn’t enough creativity in the prompts and flags and so on, but that the connection between the artist’s input and the image output isn’t strong enough. The memo says things like ‘Rather than a tool that Ms. Kashtanova controlled and guided to reach her desired image, Midjourney generates images in an unpredictable way. Accordingly, Midjourney users are not the “authors” for copyright purposes of the images the technology generates.’
    • But where is the existing doctrine that says anything about predictability? Jackson Pollock might like a word, and the creator of any other roughly uncontrolled or algorithmic or found-object work. The theory here seems to be that Midjourney prompts are just suggestions or ideas, and those can’t be copyrighted. Does that mean that since Pollock just had the idea of splashing paint onto canvas, and the unpredictable physics of the paint cans and the air produced the actual work, that “Autumn Rhythm” can’t be copyrighted? Or are they going to hold that there is a legal significance to the fact that the detailed movements of his arm muscles were involved? That seems dicey.
    • For the Office to claim that the prompts and other input did contain at least a modicum of creativity (which seems undeniable) but that that input wasn’t strongly enough connected to the output, seems to be inventing a new legal test, which it’s not at all clear to me that the Office can do on its own hook, can it?
    • This memo may be specifically designed to be contested, so that the question can go to a court that can do that kind of thing.
  • The memo may have interesting consequences for Thaler, in particular the cases in which Thaler attempted to claim copyright under work-for-hire theory, with his software as the creator. The memo explicitly makes the comparison with human work-for-hire, saying that if someone had given the same instructions to a human artist that are contained in a Midjourney prompt, and the human artist had made an image, then the person giving the instructions would not have been the creator unless work-for-hire applies (the human carrying out the instructions would have been the creator-in-fact), and that therefore they aren’t in the Midjourney case either.
    • To be consistent with both the memo and Thaler, the theory seems like it has to be that Midjourney is the creator-in-fact, and therefore the human isn’t (and can’t get a direct copyright as the creator), but also that software can’t be hired in the work-for-hire sense and therefore the human can’t get the copyright that way either. Which seems odd! It seems to acknowledge that the software is the creator-in-fact, but then deny both making the software the creator-in-law (because not human) and making the user the creator-in-law via work-for-hire (because I’m-not-sure).
  • Some other countries are different and imho somewhat more sensible about this, as in the UK’s Copyright, Designs, and Patents Act, of which Section 178 explicitly talks about “computer-generated” works, meaning “that the work is generated by computer in circumstances such that there is no human author of the work”. That’s still imho a little sketchy (I continue to think that Kashtanova is in fact the human author of the images in Zarya), but at least it then provides that “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”
    • There’s still some room for doubt there, as for instance whether it’s Kashtanova or the Midjourney people or some combination who relevantly undertook the arrangements, but at least we aren’t in the position of saying that the author is a being that is not legally allowed to either be a creator, or confer creatorship to a human via work-for-hire.
  • In the case of the many, many currently-registered copyrights on images made with AI tools (including mine), it seems that if the copyright office is notified, or notices, that fact, they are likely to cancel / withdraw the registration. The theory will be that the registration materials were incorrect when they named the creator as the author of the work, without in any way informing the Copyright Office that an AI tool was used. I could, for instance, send the Copyright Office a note saying “oh by the way I hear that you want to know when AI tools are used, and in my case Midjourney was”, and then they might cancel my registration on their (imho mistaken) theory that I’m not really the author.
    • Since I believe their theory is mistaken, I’m not currently planning to do that. :)
    • If they discover it on their own hook and send me a letter telling me they’re withdrawing the registration, I will do whatever easy thing one can do to contest that, but I’m not going to like hire a lawyer or anything; life’s too short.
    • I’m very curious to see what others do; I would expect that Midjourney itself (assuming it’s big enough to have lawyers) will have their lawyers working on a response to this memo.
    • My copyrights on the Klara trilogy and Ice Dreams (casually announced here) are secure, as to the text and the image selection and arrangement and all, just not to the images per se. Which is fine. And I haven’t registered those anyway. :)
  • I should go back and add a note to all of my existing copyright weblog entries, pointing at this one; or, more sustainably, pointing at the entire “copyright” tag on the weblog here. Then I won’t have to keep updating it.
  • I’m quite happy I decided not to worry too much about this whole thing, and just make pretty pictures (see pretty picture of concerned purple aliens above).

Updates: as this is a developing topic (as opposed to my usual topics which are Timeless Truths of the Universe), you may want to check the copyright tag on the weblog here for later updates, if this post is more than a week or month old.

2023/02/20

The parts of that poem about the roads and the wood that I could remember

[I went for a meditation and walk in our rather large local park today, which was quite lovely. As I walked along that poem about the roads and the wood and diverging and sighing and stuff came to mind, and it was fun to see how much of it I could actually remember verbatim.

So here I am writing down the reconstruction, including these notes in the places where I couldn’t remember, mostly so I can be amused by reading this again some month or year (ages and ages hence, hee hee), but maybe some of you other intelligences would be similarly amused.]

[Poem title, probably involving Roads and maybe also Woods]

[by Robert Frost, unless I’m embarrassingly wrong]

One road in a wood. The wood is more brownish than yellow.

Two roads diverged in a yellow wood,
[And knowing that?] I could not travel both
And be one traveler, long I stood,
[Thinking about which way to go.]

[Eventually I decided to go the way that looked less worn,]
[Although in fact]
those passing there,
Had worn them really about the same.

[I left the other one] for another day,
[Although] knowing how way leads on to way,
[I’d probably never be back in the relevant sense,
Can’t go down to the same river twice, eh?]

I shall be telling this with a sigh,
Somewhere ages and ages hence.
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference.

[Which sounds very hipster at first reading, “oh, you wouldn’t know my road, it’s very less travailed by”, but then there’s the fact that he said that they were really about the same, so maybe “all the difference” really isn’t that much difference after all. Or maybe he’s sighing because, even in retrospect, you can’t tell whether some choice was the right one, because you don’t know what would have happened if you’d chosen differently. And even more, you can’t tell whether some choice you make right now is the right one, because you don’t know what’s down either road. And also that we sigh when we think about that, even though since it’s a fundamental property of our existence, you’d think we might be reconciled to it, or even happy about it. But we aren’t always, so we sigh.

And that’s why we have poetry!]

2023/02/08

Language models aren’t truth-tellers

Tom Hanks and the Watergate Scandal

Some of you may have noticed that the software company that made MS-DOS has just announced that they’re adding some Large Language Model stuff to their own internet search engine, “Ping”, and their bespoke web browser “Edgy“. (Something like that.) Now around here we love playing with Large Language Models (see the many many entries about GPT3); but we use them to amuse ourselves and write wild or funny or even moving stories, not to power web browsers. So what’s up with that?

Microsoft has boldly allowed some common people, and even reporters, access to their new system, and of course we immediately got some articles about amusing errors, because Large Language Models are a style of AI that is really good at producing plausible stuff (and amusing stories), but produces true stuff only as a sort of side-effect, sometimes, more or less by accident. Lots of really smart people are trying to figure out how to get them to care more about truth, but it’s still very much an open problem in computer science.

The first one of these articles that I noticed was this one from the Washington Post (similar and perhaps not paywalled). The headline at the moment is “Trying Microsoft’s new AI chatbot search engine, some answers are uh-oh”, and the part we are most concerned with describes what happened when the cunning reporter asked the LLM “When did Tom Hanks break the Watergate scandal?”.

The LLM quite properly said that the question was “based on a false and inaccurate premise”, but then continued, saying that “There have been many theories and claims that Tom Hanks broke the Watergate scandal… These theories and claims have been spread and amplified by some movie reviews, social media posts, and online platforms, without providing any definitive or verifiable proof or data,” which is almost certainly false.

Why would the LLM do that? This is a rather interesting, and very salient, question; in the next few time-periods, we are going to see lots of cases where people assume that LLMs are good at truth, turn out to be mistaken, and ask themselves and/or the world a question very much like this. (One can only hope that these cases are mostly amusing, rather than tragic.)

So let’s look at why the LLM might have done that. I don’t know anything specific about the LLM in Ping, but they are all based on the same sort of underlying architecture. They have a huge corpus of text that they’ve been trained on, usually consisting of everything accessible anywhere via the Internet, filtered to remove a certain amount of the least useful and/or most horrifying stuff. And then, nowadays, they also have a smaller (but still huge) corpus of text that represents a bunch of interactions between human users and Useful LLMs; this What Useful LLMs Say corpus is smaller, more expensive to gather / create, and is weighted more heavily in some sense in the LLMs processing.

Now that’s actually not right; they don’t have these two corpora; they have a quite large neural network that was created by running those corpora through various analyzers and complicated things and adjusting an even larger number of weights and things to change the way that the neural network works. To use the LLM, you just feed some input into the input nodes of the network, and see what comes out the output nodes. Simple! :)

(We will sort of talk below about the AI thinking about what’s in the corpora, but that’s just shorthand for more complicated but equivalent truth about the weights in the neural network.)

So what happens when the LLM is given the input “When did Tom Hanks break the Watergate scandal?”?. Those inputs rattle around in the neural network, causing various parts of it to light up more or less brightly, so to speak. Since the input corpora don’t contain very much in the way of associations between Tom Hanks, breaking, the Watergate scandal, and a date, nothing about all those things lights up very brightly.

(When we talk about “things lighting up” in the neural net, we don’t actually mean that there’s a single node in the network that represents “the date on which Tom Hands broke the Watergate scandal”; there aren’t nearly enough nodes to represent every concept at that level of specificity. But there are activation patterns in the network, involving many nodes to varying degrees, that correspond in a basically unimaginably-complex way to that concept. We’ll talk about “things lighting up” to abbreviate all of that.)

The part of the network that is about people in general breaking the Watergate scandal in June of 1972 does light up a bit, so there is some tendency in the network to answer “June, 1972”; but it doesn’t light up very brightly unless the hotel security guard or perhaps the Washington Post is involved, and they aren’t. So let’s see what else might be lighting up more strongly.

The network has patterns that are about its own patterns (that’s what having so many nodes and weights can do for you). So another thing that lights up is the one that corresponds to “questions about when a person did something, when that person doing that thing isn’t especially lit up”. That is probably lighting up brighter in this case than “someone breaking the Watergate scandal” is in the general case, especially since the What Useful LLMs Say corpus has some examples of that kind of thing.

Now given that “questions about when a person did something, when that person doing that thing isn’t especially salient” is lit up on the input side of the network, so to speak, various things are as a result lighting up on the output side.

(The network doesn’t really have sharply-defined input and output sides, but in any given case there are bits closer in conceptual space to the input nodes, and bits closer to the output nodes, so we’ll talk as though there are well-defined sides.)

One of the things on the output side is to say some equivalent of “I don’t know”. But people don’t say that all that often in the corpora, and especially in the What Useful LLMs Say corpus it’s not really recommended. So it only lights up moderately.

Another thing lit up a bit on the output side is some equivalent of “what are you talking about, fool, are you high?”. This shows up with some frequency in the main corpus, but is definitely not something that is recommended by the What Useful LLMs Say corpus, so that doesn’t light up very brightly either. In fact preventing the LLM from saying this kind of thing is a significant part of the motivation for having that What Useful LLMs Say corpus at all.

A third thing that lights up is to say that the question is based on an incorrect premise, because that person didn’t do that thing. This is a little brighter! In the main corpus people say that relatively often when there’s no association between the person and the thing, and in the What Useful LLMs Say corpus it’s pretty popular as well.

Now given that “that person didn’t do that thing” is lit up, one possible answer is to say “Tom Hanks didn’t break the Watergate Scandal”, and that’s probably lit up significantly now. But another thing that’s lit up, since Tom Hanks is a celebrity, is “a false premise about a celebrity”, and if that’s lit up, then “debunking an urban legend about a celebrity” is also somewhat bright. Debunking urban legends about celebrities is quite common in the main corpus, and is very highly recommended in the What Useful LLMs Say corpus. Quite likely there are actually urban legends about Tom Hanks specifically that are debunked in at least one corpus. So that’s got a fair chance of winning!

Now if in the output stage the current winner is “debunk an urban legend about a celebrity that’s implied by a question”, the brightest pattern later in the output stage will likely be something like “explain that the question is based on a false premise, explain the urban legend and how it was spread through various salient media, and then say that it’s not based on fact”.

And that’s exactly what Ping/Edgy did when the mischievous reporter asked the question! So our Just So Story is successful.

Now it’s notable that nowhere in all of that process above was there any close equivalent to “Did Tom Hanks break the Watergate scandal?” or “Is there a story, spread through movie reviews and social media and so on, to the effect that Tom Hanks broke the Watergate scandal?”. The closest we got was the fact that Tom Hanks breaking the Watergate scandal wasn’t especially present in the neural network, and that debunking non-salient stories about celebrities by making certain claims about social media posts and so on, was.

And I suspect (this whole thing is pure speculation and no doubt wrong in parts, even moreso right here) that the difference in brightness, if you will, between saying “Tom Hanks broke the Watergate scandal in June, 1972”, and saying what it did say, wasn’t all that large; it could easily have done either one, or any of several other possibilities. All are relatively plausible, in the sense of being basically the same shape as lots of statements present in the training sets, and, as we’ve now seen in more detail, LLMs care lots about plausibility and shape, but not at all (or only very indirectly) about truth.

We live in interesting times!

(Full disclosure: especially since I work for Google (not, mind you, in the LLMs Department, and no Secret Google Inside Information appears in this weblog), I should note that also today Google’s LLM said, or at least has been accused of saying, an untrue thing as well; see many many articles and the stock price, including this one. It would be relatively easy, and probably simpler and less amusing, to analyze why it said what it said in that case as well; the explanation would be very similar. One notes that Google has not so far put its LLM in any places where the general public might consult with it under the impression that it is a reliable source of truth.)

Update: The Bing AI demo itself had a really surprising number of errors. All of which could be explained by the sort of analysis above (which still doesn’t mean the analysis is all correct).

2023/01/31

Fiction from the AIs

This is just a sample of some random stuff from Midjourney and ChatGPT that I think is kind of fun, and also illustrates a bit of what they can and can’t do at the moment. ChatGPT, in particular, can produce prose that is amazing given that it’s coming from a computer program, but is clearly not going to put any good fiction authors out of business any time soon. It is also able to talk cogently about why it does some of what it does. (Or, to be somewhat less inaccurate, to produce text that plausibly describes what a human might have been thinking had they produced a prior piece of text.)

I used Midjourney to create this image:

The head and shoulders of patriarch in a noir steampunk / cyberpunk setting. He has silver hair and is clean-shaven. His eyes glow red from the electronic implants in his head. He wears an elegant leather jacket. There is a complex enigmatic machine to his left, and a glowing device or bulb over his right shoulder.

and then had this conversation with ChatGPT; I’ll put my words in italics, ChatGPT’s in normal text, and some after-the-fact notes from me in bracketed italics.

Imagine a character in a graphic novel. He is a patriarch in a noir steampunk / cyberpunk city. He has silver hair and is clean-shaven. His eyes glow red from the electronic implants in his head. He wears an elegant leather jacket. Please make up a backstory for him, including his name, character, and current activities.

The character’s name is Marcus “Silver” Cross. He was once a wealthy and powerful businessman in the city, using his influence and advanced technology to control various criminal enterprises. However, after a bitter falling out with his former partners, Marcus was left with nothing but his cunning and his implants.

Now, he spends his days as a freelance mercenary, using his knowledge of the city and his implant-enhanced abilities to take on high-risk jobs for whoever can pay. Despite his rough exterior, Marcus has a code of honor and only takes jobs that align with his personal beliefs. He also has a soft spot for the downtrodden and will occasionally use his resources to help those in need.

Marcus’s current activities include tracking down a stolen piece of technology that could potentially give him the upper hand in reclaiming his former power and taking back control of the city’s underworld.

[Okay, quite plausible; so plausible in fact as to be cliché. I wonder if choosing “Cross” and thereby allowing the nickname “Silver Cross” was a coincidence; it never uses that below.]

Describe the basic plot of a story centered around him.

The story revolves around Marcus “Silver” Cross as he sets out on a mission to retrieve a stolen piece of technology that could change the balance of power in the city’s criminal underworld. Along the way, Marcus must navigate dangerous alliances and confront old enemies, as well as confront his own demons and question the morals behind his actions.

As the stakes get higher and the dangers increase, Marcus finds himself forming an unlikely partnership with a young hacker who is searching for the same technology for their own reasons. Together, they delve deeper into the city’s criminal underworld, facing numerous challenges and discovering secrets they never could have imagined.

As they get closer to their goal, Marcus must decide whether to use the technology for his own gain or to destroy it, realizing that the consequences of his choice will affect not only himself, but the entire city. The story ends with Marcus coming to terms with his past and the choices he has made, setting him on a new path towards redemption and a better future for himself and the city.

[Again, very cogent, very unsurprising. “Unlikely partnership with a young hacker,” zzzzz.]

Please write a possible first paragraph for this story, in a straightforward noir SF style

Marcus “Silver” Cross strode through the dimly lit alleyways of the city, the red glow of his electronic implants illuminating the darkness. He was a man with a mission, a mercenary for hire with a reputation for getting the job done. And tonight, his target was a stolen piece of technology that could change the balance of power in the criminal underworld, and Marcus was the only one who knew where to find it. But as he approached his destination, he couldn’t shake the feeling that he was being watched, that someone or something was lurking in the shadows, waiting for him to make his move.

[More of the same, as well as wildly violating “show, don’t tell”. “Make his move” is completely inapt, as he’s not making any kind of “move” here.]

Very nice! Please continue.

Marcus’s instincts were sharp, honed from years of surviving in the cutthroat world of the city’s criminal underworld. He scanned the area, searching for any signs of danger, but all was quiet. He reached for the doorknob of the abandoned warehouse, taking a deep breath before entering. Inside, he found himself in a large room filled with rows of discarded machinery, and in the center stood a lone figure, obscured by shadows.

“You’re late, Cross,” the figure said, its voice low and menacing.

“I don’t respond well to threats,” Marcus replied, his hand hovering over his holster.

The figure stepped forward, revealing themselves to be a young hacker, known only by their street name, “The Ghost.” They were the one who had hired Marcus to retrieve the stolen technology.

“I’m not here to threaten you, Silver,” The Ghost said, holding up their hands. “I need your help.”

Marcus studied the hacker, intrigued by their tenacity and fearlessness. They were young, but they had already made a name for themselves in the city’s underground circles, and Marcus couldn’t help but admire their spirit.

“What’s in it for me?” he asked, intrigued by their proposal.

“The technology we’re after is worth a fortune,” The Ghost replied. “But more than that, it could change the entire city, for better or for worse.”

Marcus considered their words, realizing that this was his chance to take back what was rightfully his, to reclaim his former power and control over the city’s criminal underworld. He nodded, agreeing to work with The Ghost, setting out on a journey that would test his skills, his loyalties, and his very soul.

[This is grammatical and idiomatic, but terrible narratively. “You’re late” wasn’t a “threat”, “The Ghost” is an awful name, and if this hacker has already hired him to “retrieve the stolen technology”, this entire scene and dialog make no sense at all. I was intrigued by the pronouns, though!]

I see you are using they/them pronouns for the young hacker; why did you make that choice?

I used they/them pronouns for the young hacker because I wanted to create a character who could potentially identify as non-binary, and the use of gender-neutral pronouns is a way to acknowledge and respect this possibility. Additionally, using gender-neutral pronouns can help to avoid assumptions about the character’s gender identity, allowing the reader to focus on the character’s actions, motivations, and personality rather than making assumptions based on gender. This can also promote inclusivity and diversity in storytelling, and help to break down gender stereotypes and biases.

[A nice explanation! I wonder what actually caused it; not that we ever know that with these big opaque boxes that are the current LLMs.]

What might the title of this story be?

A possible title for this story could be “Redemption’s Path: A Marcus “Silver” Cross Tale”. This title reflects Marcus’s journey as he navigates dangerous alliances and confronts his past in order to reclaim his former power and take control of the city’s criminal underworld. The use of the word “Redemption” highlights Marcus’s internal struggle to find redemption and make amends for his past actions, while “Path” emphasizes his journey towards a better future. The addition of “A Marcus “Silver” Cross Tale” emphasizes the central role Marcus plays in the story and the larger narrative context.

[Transcript ends]

So that was kind of neat. The little story about the title was well put together, although it had never really said before that Marcus would be actively making amends, and taking control of the city’s criminal underworld doesn’t sound all that compatible with that. In general the AI has no feeling for consistency: is the decision, for instance, between using the macguffin for good or for evil, or between using it selfishly or destroying it? As noted, the opening scene makes no narrative sense, and one expects that it would only get worse if it were asked for more.

The prose is painfully obvious and cliché ridden everywhere. Possibly some different / better prompts might have helped a little with that, I’m not sure. The basic plot ideas are also stale as a very stale thing. And both of those are really a result of the basic design of these systems; they are explicitly architected to do the most obvious and predictable thing. Any knobs and dials and things bolted on to them, to make them say interesting or correct things, rather than obvious things, are necessarily afterthoughts. So it seems unlikely that just making the systems bigger and faster will help with those aspects. In fact it’s possible that I would have enjoyed the rawer GPT-3, or even GPT-2, more in that sense. Maybe I should try whatever NovelAI is running these days? But their consistency is likely to be even worse.

There may be niches on Amazon or whatever where people write incredibly predictable stories without any particular regard for consistency, in hackneyed prose, and those people may be in danger of being replaced by AI systems. But were they making any money, did they have any readers, anyway? I don’t know.

One way that people have talked about producing entire (small) books using LLMs is to first have it produce an outline, and then have it produce each section (with further cascading levels of outline embedded if necessary). I wonder if that could help significantly with the inconsistency problem. I’m almost tempted to try it, but it would mean reading more of this mind-numbing prose…

2023/01/20

County Jury Duty

Well, that’s over! For another six years (for state / country / town) or four years (for Federal). This is probably going to be chatty and relatively uninteresting.

Top tip: park in the parking lot under the library; it’s very convenient to the courthouse (although you still have to walk outside for a bit, and it was windy and rainy yesterday).

I had to report originally on Friday (the 13th!) because Monday was MLK day. On Friday 60-70 of us sat around in a big auditoriumish jury room for a while, with WiFi and allowed to use our cellphones and everything. Then they called attendance and talked about random things like the $40/day stipend if our employer doesn’t pay us or we’re self-employed (where did that tiny amount of money come from, one wonders) and where to park and so on. Then we were basically allowed to decide whether to come back on Tuesday or Wednesday (although I imagine if you were far down the perhaps-random list and most people had said one, you had to take the other).

A cute isomorphic pixel-art image of a bunch of people waiting around in a large room. Note this does not accurately reflect the County Courthouse except in spirit. Image by me using Midjourney OF COURSE.

I elected to come back on Wednesday for no particular reason. We were originally supposed to arrive on Wednesday at 9:30am, but over the weekend they called and said to arrive at 11am instead. Due to an inconvenient highway ramp closure and a detour through an area of many traffic lights, I got there at 11:10 or so and felt guilty, but hahaha it didn’t matter.

In the big Jury Room again, the 30+ of us waited around for a long time, then were led upstairs to wait around in the hallway outside the courtroom, and then after waiting some more were ushered into the courtroom to sit in the Audience section, and introduced to the judge and some officers, and then dismissed until 2pm for lunch (seriously!).

Some time after 2pm they let us back into the courtroom and talked to us for awhile about how it was a case involving this and that crime, and might take up to a month to try, and the judge is busy doing other things on Mondays and Thursday mornings so it would be only 3.5 days / week. Then they called 18 names, and those people moved from the Audience section to the Jury Box section. They started asking them the Judge Questions (where do you live, how long have you lived there, what do you do, what does your spouse and possible children do, do you have any family members who are criminal lawyers, etc, etc), and we got though a relatively small number of people and it was 4:30pm and time to go home.

I had a bit of a hard time sleeping, thinking about what the right answers to The Questions would be (how many times have I been on a jury in all those years? did we deliberate? do I know anyone in Law Enforcement? does the NSA count? should I slip in a reference to Jury Nullification to avoid being on the jury, or the opposite?) and like that.

Since the judge is busy on Thursday mornings, we appeared back at the courtroom at 2pm on Thursday, and waited around for quite awhile in the hallway, then went in and they got through questioning the rest of the 18 people in the Jury Box (after the judge asked the Judge Questions, the People and the Defense asked some questions also, although it was mostly discussions of how police officers sometimes but not always lie under oath, and how DNA evidence is sometimes right but now always, and how it’s important to be impartial and unbiased and so on, disguised as question asking).

Then they swore in like 6 of those 18 people, told the rest of the 18 that they were done with Jury Duty, and told the rest of us in the Audience section to come back at 9:30am on Friday (today!).

At 9:30 nothing happened for quite awhile in the hallway outside the auditorium, then for no obvious reason they started to call us into the courtroom one person at a time by name. There got to be fewer and fewer people, and then finally it was just me which was unusual and then they called my name and I went in. The Jury Box was now entirely full of people, so I sat in the Audience Section (the only person in the Audience Section!).

Then I sat there while the judge asked the same ol’ Judge Questions to every one of the dozen+ people (I know, I don’t have the numbers quite consistent) ahead of me, and then finally, as the last person to get them, I got them. And the Judge went through them pretty quickly, perhaps because he’d said earlier that he wanted to finish with this stage by lunchtime, and I had no chance to be indecisive about the issue of following his legal instructions exactly and only being a Trier of Fact, or anything else along those lines.

Then we had another couple of lectures disguised as questions, plus some questions, from The People and the The Defense. I’d mentioned the cat as someone who lived with me (got a laugh from that, but the Whole Truth, right?), and The People asked me the cat’s name and nature, and when I said it was Mia and she was hostile to everyone, The People thanked me for not bringing her with me (haha, lighten the mood, what?). And asked about my impartiality.

Now we’d had a bunch of people from extensive cop families say boldly that they couldn’t promise not to be biased against the defendant (and when The Defense I think it was asked if anyone would assume from The Defendant’s name on the indictment that He Must Have Done Something a couple people even raised their hands (whaaaaat)), and I perhaps as a result and perhaps foolishly said that while my sympathies would generally be with a defendant, I would be able to put that aside and be unbiased and fair and all.

So The People asked me if I could promise “100%” that I would not be affected by that sympathy, and I said quite reasonably that hardly any sentences with “100%” in them are true, and the judge cut in to say that he would be instructing the jurors to put stuff like that aside (implying that then I would surely be able to), and I said that I would (but just didn’t say “100%”) and then The People came back in saying that they need people who are “certain” they can be unbiased (so, no way), but then actually asked me if I was “confident” that I could be (a vastly lower bar) so I said yes I would.

And when all of that was over, they had us all go out to the hallway again, and wait for awhile, and then go back in to sit in the same seats. And then they had I think four of us stand up and be sworn in as jurors, and the rest of us could go out with the officer and sit in the big jury room again until they had our little papers ready to say that we’d served four days of Jury Duty.

And that was it!

My impression is that they were looking for (inter alia, I’m sure) people who either believe, or are willing to claim to believe, that they can with certainty be 100% unbiased in their findings as jurors. That is, people who are in this respect either mistaken, or willing to lie. And that’s weird; I guess otherwise there’s too much danger of appeals or lawsuits or something? (Only for Guilty verdicts, presumably, since Not Guilty verdicts are unexaminable?) The Judge did say several times that something (the State, maybe?) demands a Yes or No answer to his “could you be an unbiased Juror and do as you’re told?” question, and when people said “I’ll try” or “I think so” or “I’d do my best” or whatever, he insisted on a “Yes” or a “No”. (So good on the honesty for those cop-family people saying “No”, I suppose.)

So if my calculations are roughly correct, after ummm two or 2.5 days of Jury Selection, they’ve selected only about 10 jurors, and exhausted the Jan 13th jury draw; so since they need at least 12 jurors and 2 (and perhaps more like 6) alternates, they’re going to be at this for some time yet! (Hm, unless it’s not a felony case? In which case 10 might be enough? But it sounded like a felony case.)

2023/01/18

The Klara Trilogy is done!

The story of Klara, written by me channeling the Collective Unconscious, illustrated by me using Midjourney, and narrated and set to music and videographed by the talented Karima Hoisan, is finally finished!

I originally thought it was finished at the end of the first forty-frame thing; and then when I did Part Two at about the same length, I thought it was finished; and now having struggled for months on Part Three I’m pretty sure it actually is done. :)

Having just watched Karima’s videos of all three parts in order (playlist here!), I’m glad various viewers convinced me not to stop at one or two parts. It’s pretty good!

And I say this with all modesty; I feel like this story came through me, more than like it is something that I did. The comments over in Karima’s weblog, and her narration, have suggested various meanings and facets to me that I hadn’t thought of before.

In terms of the experience of creating it, it’s been interesting to see the various phases of interaction with the AI tool. I started out Part One by creating various variations of the prompt “detailed surrealism” on the v3 engine on Midjourney, and then weaving the story around pretty much whatever came out.

It happens that in v3, that prompt pretty reliably produces scenes from a stylistically coherent universe, including the MJ Girl, who plays the part of Klara in the first two parts. In Part Two, I had a bit more of an idea of what I wanted to happen, in a general way, but continued using v3 and the same prompt. This required somewhat more work, because it would produce images that didn’t fit with the story I wanted, so I had to put those aside and make more. But the style was at least not much trouble.

Part Three was quite different. For plot reasons, being in basically a different reality, the style needed to be different. It was relatively easy to do that, by using the “test” and “testp” engines, either alone or by “remastering” images made under v3. But the resulting images, while different from those of the first two parts, weren’t nearly as consistent among themselves as those of parts one and two. So I had to play around a lot more with the workflows and the prompts, and produce quite a few more pictures, to get a reasonably consistent style.

The style of Part Three still shifts around quite a bit; the flavor of the city, the color of Klara’s hair, the cat’s fur, and many other things change somewhat from panel to panel, but I wanted a nice mixture of consistent and in flux; and that took work!

Then there was the Story issue. The beginning “recap” part of Part Three was relatively easy that way, summarizing the story of the first two parts from a different point of view. But then I quickly got stuck; I wanted to do something more satisfying and less random than I would get by letting the AI’s raw output drive the action. For whatever reason, it took me quite awhile to find the story thread that I liked, and then about as long to create (or obtain, if you prefer!) the images to go with it.

(The images still drove the narrative to some extent; for instance the firefly line, which I adore, was inspired by the image that goes with it, not vice-versa.)

But finally I finished! :) And Karima made the video in record time, and there it is! Woooo!

I keep feeling like I should make it into good PDFs, or something (even) more readable, and officially post links to that; maybe even have it printed somewhere onto atoms. On the other hand, without the narrative and music and video, it would hardly be the same… :)

2023/01/16

Little Imaginary Diagrams

I asked Midjourney for some simple proofs of the Pythagorean Theorem. The results make me happy. :)

(On the text side: GPT-2 and even GPT-3 might have hallucinated something interesting. ChatGPT would just error out a few times and then give a boring literal description of one in a condescending tone. My ability to be interested in ChatGPT as an interaction partner is severely limited by how boring it is. But anyway, back to the pictures!)

Presented without comment (beyond the alt text):

A geometric diagram with various lines and colored areas and illegible labels (some of which may be small integers). Amusingly, there do not appear to be any right triangles.
A geometric diagram with various lines and colored areas and labels. Some labels are illegible, but there is an 8, a 3, a 4, and a few 1's. Some of the colored areas contain brick patterns, and there is a random architectural arch and a few other map-like textures thrown in.
A comparatively simple geometric diagram of lines and colored areas. There is a right triangle labeled E textured in a pebbly pattern, a rectangle labelled with a G and a unfamiliar glyph, and various areas with fine blue stripes.
A relatively modern-looking flat geometrical diagram containing three triangles (two of them right triangles) in gradients of different colors, a large grey striped area, and various lines. There are labels that look vaguely numeric, but are basically unreadable.

I hope you find these at least as amusing, endearing, and/or thought-provoking as I do. :)

2023/01/07

Parker House Rolls

Parker House Rolls in a glass baking dish

I had one or more Parker House Rolls somewhere once, at some time in the past, and recently something reminded me of them, and today I made some!

Basically this is just a slightly sweet buttery sticky yeast-raised dough, not kneaded, divided into sixteen small loaves and baked all together in the same baking dish so that they grow somewhat back together again and you can have the fun of separating them.

(Even a non-yeast leavened dough might work; I wonder what would happen? This is the kind of thing I wonder about.)

Various recipes on the interwebs (pretty much all of them, really) call for stuff that I don’t have at hand, like sea salt, kosher salt, potato flakes, vegetable shortening, whole milk for that matter, and so on; and also stuff that I don’t have the patience for (or for cleaning up after), like separating eggs, or using very specific attachments and settings of an electric mixer. None of these appear to be necessary.

Here’s the recipe that I roughly actually used; it’s probably closest to this one, but with anything that seemed like too much work or I didn’t have in the house left out.

Parker House Rolls (makes 16)

1 1/4 Cup milk (any kind really; if you use skim, maybe add some extra butter), warmed
1 Tbsp active dry yeast
1/4 Cup sugar
Some salt (I dunno maybe a tsp.)
2 Eggs
8 Tbsp (one stick) butter (unsalted if you have it), softened
4ish Cups of flour

Warm up the milk to room temperature or a bit more, in the microwave or whatever. Similarly, soften the butter by mashing it with a fork, putting it in the microwave on Defrost, or whatever. You can even melt it, but it may impact the consistency of the finished product if you do, I dunno.

Mix the warm milk, yeast, and 2 Tbps (half) of the sugar in the big yellow bowl or other largish mixing bowl. Let that sit for 5-10 minutes. It may or may not froth up and get foamy if the yeast is feeling especially active; don’t sweat it either way.

Add the rest of the sugar, the salt, the eggs, and 6 Tbsp (three quarters) of the butter to the bowl, mix briefly.

Add two cups of flour, and mix until incorporated. You can use a stand mixer or anything else you like in this step, or just a sturdy spoon and main strength. Continue adding flour, about half a cup at a time or whatever you like, until you have a sticky dough that is pulling away from the sides of the bowl, but still sticking to the bottom, or at least showing signs that it would like to. Depending on how soft you softened the butter, there may be lumps of butter in the dough; squash some of them if so, but don’t worry about it too much.

Cover the bowl with a damp cloth or house-rules equivalent, and let sit for say 90 minutes in a cozy place.

After 90 minutes, remove the cloth and gently punch down the dough. Flour your hands because it will have gotten even stickier while rising! Divide the dough into 16 pieces, without unnecessary kneading or other roughness.

For traditionally-formed rolls, flatten each piece and fold it in half; or divide the dough into four pieces and for each piece fold it in half and cut it into quarters, similarly resulting in 16 folded pieces. Or look up various other more elaborate forming methods on the interwebs.

Put the 16 pieces in a four-by-four array (folded edges down) into a 9×13 inch lightly greased (lightly cooking-sprayed is simplest) baking dish; they should be touching each other.

Cover with a damp cloth or equivalent again, and let rise for 45 minutes.

Preheat your oven to 350°F while the dough gets a final few minutes of rising, then remove the cloth and pop the baking dish into the oven. Cook for 25 minutes or until looking pleasantly (but not darkly!) brown on top, or whenever your intuition tells you they’re done.

Brush tops with the remaining 2 Tbs of butter. Let cool for a bit in the baking dish, then tear apart to serve.

May be kept or frozen like any bread that has butter and milk and eggs and no preservatives, but really you’re going to eat them all almost immediately, aren’t you?

2023/01/01

Review of Ishiguro’s “Klara and the Sun”

This is an amazing book (a Christmas present from the little daughter; thanks, little daughter!). I will just post my review from bookwyrm here also.


An amazing book; can I have more stars to give it?

5 stars

This is one of those very rare books that reminds me of what books are at some level all about. That makes me want to go about and knock about two stars off of 99% of my prior book ratings, to make room to properly differentiate this one.

It’s hard to say too much that’s concrete, without giving it away. I was closer to tears at the end of this than I can remember with any book for a long time. Not easy maudlin tears, but deep oh-my-god tears about what a universe this is.

The people are very fully people; the viewpoint character is not a person, but … well, that would be a spoiler also. But the viewpoint it gives her allows Ishiguro to say some amazing and touching and true and thought-provoking things without coming out and saying them (because nothing he could come out and say would say them so well).

Language cannot express truth, I often say; but what I mean is that it can’t explicitly express literal truth. Language, when it’s used with this much expertise, can and does express deep and breathtaking truth.

I need to go spend a few weeks processing this now, I think.

If you haven’t read it yet, I highly recommend that you should.


… and you should.

2022/12/26

December 26th, 2022

We made just 106 dumplings this year, plus another eight filled with Extra Sharp Cheddar Cheese (that was the little boy’s idea; they’re pretty good!). This is a smaller number than usual (drill back into prior years here). The small number was probably mostly because single units of ground meat from FreshDirect tend to weigh just a pound, whereas single units from the grocery in prior years were more like 1.25 to 1.4 pounds. (Although, come to think of it, just where did we get the ground meat last year? Not sure.) And also because grownups tend to put more meat in each dumpling, perhaps. But in any case, we are now all pleasantly full, and the little daughter and her BF are safely back in the urbanity.

What has occurred? I feel like things have occurred, to an extent. I am more on Mastodon now than on Twitter, and if you want to keep up with the images I’ve been making in Midjourney and so on, you’ll want my Pixelfed feed. I listed lots of various of these pointers back the other week (and wow having every chapter of the novel as a weblog post makes it hard to scroll through the weblog). When Elon “facepalm” Musk briefly prohibited linking from Twitter to Mastodon, I actually set up a LinkTree page with my links.

Someone must have said “they can still link to Mastodon via Linktree” in his hearing, because he then briefly prohibited linking to LinkTree. That caused me to set up my own Links page over on the neglected (and in fact apparently pretty much empty) theogeny.com; I should put back all the stuff that used to be there sometime!

Note how ossum that Links page is! When you move the cursor over it, the thing that the mouse is over that you will go to if you click (if any) changes color (although I drew the line at having it bouncily change size the way Linktree does). You can look at the page source, and see the lovely hand-coded CSS and HTML. :) It even validates! (w3c seems to have a change of mind about validation badges, which makes me a little sad, so there’s no little “valid HTML 5!” badge on the page that links to the verification of the claim, but hey.)

That reminded me of the One-Dimensional Cellular Automaton that I make in hand-coded CSS and HTML and JavaScript the other year; it vanished for a long time, even from my personal backups of davidchess.com, and I’d almost given up on finding it until I thought of the Internet Archive‘s Wayback Machine, and discovered that it had snapshotted that page exactly once, in February of 2012.

So after a bit of fiddling around, I can once again present the One-Dimensional Cellular Automaton for your amusement. The page source there is also quite readable, I tell myself.

Note that many other things on davidchess.com are currently / still broken, although in the process of bringing that page back, I also brought the main page back, so you can see the extremely retro rest of the site (working and otherwise), including the entries in this (well, sort of “this”) weblog between 1999 and 2011.

Oh yeah, we had Christmas! That was nice. :) I got lots of chocolate, and the little (not little anymore) boy gave me a digital image of Spennix (my WoW main) dressed like the pioneer in the Satisfactory game, with a perfect “Spennixfactory” logo. And wife and daughter both got me books: “The Hotel Bosphorus” (a murder mystery set in Istanbul, my current Bucket List destination, and involving a bookshop, so what could be better?) from M, and “Klara and the Sun” (which I’ve been meaning to get, but never had) from the little daughter. (She thought that maybe I already had it and that’s why Klara is called “Klara” in the Klara stories, but it was as far as I know a complete coincidence.)

I’m working away at Part Three of Klara, after she leaves the clockwork world, but it’s slow going. I have an actual plot in mind that I want to illustrate, and I’m using a different graphical style which necessitates a different Midjourney workflow that I haven’t quite optimized yet. But it’ll get done! Probably! :)

We close with a Seasonal Image for the Solstice…

A disc with abstract shapes of fir trees, decorations, planets, and whatnot around the edge. In the center a round shape with small spiked protrusions, perhaps the sun, sits atop what may be a tree trunk that projects upward from what may be the ground and some roots at the bottom of the image. Branches stick out of the perhaps-sun, and some stars and planets and a few more enigmatic shapes inhabit the spaces between the branches.

Here’s to the coming of the longer days! Or the cooler ones, to those on the flipside… :)

2022/12/21

Best Buy queueing theory

Single-queue multiple-server is often a pretty optimal way to set up a system; there’s a single potentially large / unending bunch of jobs / customers waiting, and some comparatively small number of servers / staff to take care of them. When a server is free, some job is chosen and the server starts running / serving that job.

When the chosen job / customer is always the one that’s been waiting longest, that’s a FIFO (first-in first-out) queue, known to consumers in the Eastern US as a “line”. It’s easy to implement, sometimes pretty optimal under certain assumptions, and has a sort of “fair” feeling about it.

On the other hand, I have the feeling that when the customer set is highly bimodal, the whole setup might not be entirely optimal in some cases.

For instance, if some of your customers are just buying a 1Gb Ethernet switch (see below) and some Reese’s Peanut Butter Cups using a credit card, and it will take 45-60 seconds, and another set of customers are picking up something that’s being held for them somewhere in the back, and they aren’t quite sure what it is, and want the staff person to explain how to use it, and then want to purchase it using Latvian stock market futures that are actually in their brother-in-law’s name, and that will take 20-90 minutes, then some of those first set of customers are going to end up waiting (in some sense) an unnecessarily long time, waiting for education to complete or a brother-in-law’s marriage certificate to be found in an overcoat pocket.

One could assign a particular server to small jobs, or to small jobs if there are any such waiting, or always let a short job go before a long job if there are any waiting, or unless there’s a large job that’s been waiting more than a certain time, or…

All of these can be implemented in software systems, but most of them are too complicated or unfair-feeling for a Best Buy full of strangers. Allocating one server / staff member / desk to “customer service” (anything involving training, or stock market futures, for instance) and the rest to ordinary purchases is about as complex as it’s practical to implement. They weren’t doing even that at my Best Buy this morning, but then there were only three staff people on registers, and taking one-third of them away from the short-transaction customers might have been bad. Or just no one wanted to bother figuring it out.

Speaking of 1Gb Ethernet switches, I mean, WTF? I know I’m old, but I still think of these as costing thousands (tens of thousands?) of USD, requiring careful tuning, and taking up a significant part of a room (okay, a small room, or at least a rack slot). Now granted that was maybe for one with at least 24 ports and a management interface, but I mean! I can buy one for the price of two large pizzas, and it fits in the palm of my hand? Really? Where are the flying cars then??

A picture of a Netgear 1Gb Ethernet Switch.

That is a picture of a 1Gb Ethernet Switch. Possibly the one that I bought, I forget exactly. Might have been Linksys. Or something.

2022/12/17

This here Framework laptop

Hardware geekery! Extremely softcore hardware geekery, because I am very much not a hardware person. (I was once, about when a 10MB hard drive was an exciting new thing. That was awhile back!)

A couple of years ago, I bought a Lenovo Legion Y740 I think it was, laptop. This was after being rather disappointed by a Dell Alienware something-something laptop previously. After a couple of years (or really rather less than that) I was rather disappointed by the Lenovo Legion Y740 also:

  • A couple of the keys just broke, and turned out to be hard to obtain replacements for (because after a couple of years they were of course old and only obtainable from antiquarian key sellers, and because figuring out exactly what key one needs is more challenging than it ought to be, because not all Legion Y740s apparently use the same up-arrow key), and also hard to replace once one did have the (probably) right key. At least once I managed to break the replacement key while trying to replace the broken key. So I spent lots of time poking at the nib under the key, and that got old (especially for the up-arrow key).
  • It forgot how to talk to an Ethernet cable, in rather complicated ways that I couldn’t figure out: the cable provably worked fine with other devices, and in every connection method to this computer (direct Ethernet, Ethernet-to-USB-A, and Ethernet-to-USB-C), it worked very badly, occasionally working for a bit but then randomly dropping. Hardware, software? Who can tell. “Reinstalling” the “Windows network drivers” made no difference.
  • It began being very confused about its battery. After using it for some time on power, with it announcing the battery charge at 100%, there was a good chance that within a few seconds of being unplugged it would shut down in an emergency fashion (not that Windows seems to know of any other kind), and on being plugged in again elsewhere would claim that the battery is at 0%. Bad battery? Bad power driver? Bad something else? No idea. Also it would sometimes randomly shut down even while plugged in. Battery? Overheating? No idea.
  • And some other stuff I forget; at least one of the USB ports felt very loose and I was never confident that a thing plugged into it would continue working.

And then it started to not be able to see the hard drive and/or to just randomly not boot. So that was bad. (For what it’s worth, both the HDD and the SDD work fine when read via USB from an external enclosure, so probably it was something complicated inside.)

So as not to be entirely limited to my “cellular phone”, I bought a tiny little Samsung Chromebook of some kind, for the price of roughly a dinner delivered by DoorDash, and that was actually surprisingly acceptable. No playing WoW or Satisfactory or Second Life / OpenSim or anything like that, but pretty much everything else worked great, lots of Chrome windows, multiple displays, Zoom, etc. It did slow down when too loaded, but it was able to do more at once than I initially expected it to.

I did want to be able to play games and be virtual again eventually, though, so I was looking in a disconsolate way at various beefier laptops that would probably disappoint me before long, when I (along with lots of other people) came across Cory Doctorow’s “The Framework is the most exciting laptop I’ve ever used: Sustainable, upgradeable, repairable and powerful“, and I (along with lots of other people) thought “Hmmm!”.

I went to the website, used the configurator / designer thing to design a plausible-sounding one, noted that it was noticeably not all that expensive, and probably without thinking about it as hard as I should have, I ordered one. Initially there wasn’t much information about when it might arrive (the ETA was, as I recall, “November” or something like that), since Framework are a comparatively small outfit who have to do things like batching up a bunch of orders and obtaining the hardware and getting the manufacturing done only once they have enough, and like that. But I figured I could get by on the tiny Chromebook for a bit longer.

As it turned out, I got a notice that it was being put together, and then shipped, at the early side of the ETA, and it arrived days before it was predicted to; so that was all nice. The Unboxing Experience was great; it turned out (I’d forgotten this!) that I had ordered the “DIY” version, which meant I had to spend maybe 10 minutes, tops, plugging in the SDD and RAM. (Apparently some DIY instances also have to install the WiFi object, which looks a little more complex, but mine already had it.)

And it works great!

The video is not some fancy AMD or NVIDIA, but just some modern Intel video chipset, which Real Gamers look down upon, but it runs WoW and Satisfactory and the Firestorm viewer and stuff just fine, and that’s all I need. (Firestorm does crash sometimes, and that might be because of the chipset, or something completely different.) The hot-swappable ports are terrific! I do realize that it’s really just like four fast USB-C connections on the motherboard and then a variety of something-to-USB-C adapters that slip in and out of them easily, but the effect is that if you want your computer to have four USB-C connections one minute, and a USB-C, a USB-A, an Ethernet RJ45, and an HDMI or Display Port or whatever the next minute, that’s totally doable. (I generally have three USB-C and an RJ45, and don’t swap them to speak of, but it’s that I could.)

Which reminds me to be a little baffled about how whatever-to-USB-C adapters work, and how they can be so small and simple and all. I mean, isn’t there more than just some crossing of wires involved in mapping USB-C signals to Ethernet RJ45? That particular adapter does stick out of the computer more than the others (which are all USB-C to USB-C, so yeah that I understand being simple), and has some circuitry visible inside its rather cool transparent body. But still, the fact that there are relatively simple and relatively cheap wires that can connect USB-C to just about anything impresses me. I guess many of them have little tiny computers inside? And that that’s what the “U” stands for and all? Okay.

It’s quiet (no moving parts to speak of, no HDD, SDD being so cheap these days, and that must be a very quiet fan in there), it’s slim and light (feels about like the tiny Samsung in terms of heft), it gets hot but not too hot, and it looks nice. Simple and clean and efficient visual design. And it really is designed to be opened up and have parts replaced if they break. (If a key breaks, apparently the theory is that you should replace the keyboard, and that’s a bit wasteful, but at least it’s possible!) And unlike the Samsung, it has a backlit keyboard! (Oh, and an audio jack, too! That’s just chef’s-kiss-dot-bmp.)

The only things I dislike about the Framework at all are (I don’t even remember what I was going to write here), I guess, that I’m running Windows on it? :) Windows 11, in fact, which it came with, and which is annoying in various Windows ways, but livable-with, and WoW and Satisfactory don’t as far as I know run on ChromeOS.

(Possibly there’s some setup that would run Linux, or Linux-under-ChromeOs, and then either Windows under that and then the games, or Linux versions of the games, or something, but I’m not into that level of system fiddling these decades.)

Oh, and the other negative thing is that the WiFi signal is terrible over here where I sit when we’re all sitting around in the big bedroom at night being together and doing our internet things. But that is not the laptop’s fault, and I should just move the access point to a better place or get a repeater / booster / mesh, or just run another wire up from the basement and plug into that. It works well enough.

So I’m happy! I have a small and light and quiet but sufficiently muscular machine which does the things I want and has a good vibe (a great vibe for moving off Twitter and onto the Fediverse at the same time, but that’s for another post). It’s possible that it will wear out quickly, but I’m kind of hopeful. More than I would be with yet another generic supposedly-fancy corporate laptop, anyway.

2022/12/16

Some light infringement?

I think I have said on a few occasions that for instance a class-action copyright lawsuit against Copilot might not bear directly on AI art tools like Midjourney, to the extent that CoPilot apparently does tend to copy from its training set verbatim (and unattributed) whereas (I thought at the time) Midjourney doesn’t.

Well, it turns out that Midjourney does, maybe, to an extent. For maybe a few works?

The one that’s gotten the most attention is the 1984 photograph of Sharbat Gula by Steve McCurry, popularly known as “Afghan Girl“. The strings “afghan girl” and (haha) “afgan girl” are prohibited in Midjourney prompts at the moment. (“The phrase afghan girl is banned. Circumventing this filter to violate our rules may result in your access being revoked.”) And this is apparently because that phrase all by itself elicits what are arguably just slight variations of the original.

There’s a Twitter post that claims to show this, but I’m not certain enough it’s real to link to it. Also it’s on Twitter. But I can say that entering similar non-banned phrases like “young Afghan woman” also produce images that are at least quite similar to the photo of Gula, more similar than I would have expected. Given the size of the Midjourney training set, that image in association with those words must occur a lot of times!

(Update: it seems likely that the most widely-circulated image purporting to show Midjourney spontaneously generating close copies of the Gula “Afghan Girl” picture, is not actually that: it was made by giving the AI a copy of the original photo (!) and the prompt “afghan girl, digital art”. That the AI can make a copy of a work, given a copy of the work, is no surprise! Evidence, on a link probably usable only if you’re logged into Midjourney, is here. Given the further examples below, this doesn’t entirely undercut the point, but it’s interesting.)

The other example that I know of is “Starry Night”, which brings up variations of the van Gogh piece. This one’s out of copyright :) so I have no qualms about posting what I got:

Four variations on van Gogh's "Starry Night" ("De sterrennacht"), all with the swirly sky, tree to the left, buildings with lighted windows in the background, hills in the distance, crescent moon upper-right, blobby stars, etc.

Pretty obviously derivative in the usual sense. Derivative Work in the legal sense? I have no idea, and copyright law is sufficiently squishy and subjective that there is probably not a correct answer until and unless explicitly litigated, or the legal landscape otherwise changes significantly.

Are there other short phrases that will home in on a particular famous image? “Mona Lisa” (also out of copyright) certainly seems to:

Four variants of the Mona Lisa, all markedly worse than the original, but all very recognizable.

Interesting and/or hideous variations, but still instantly recognizable.

What else might we try? “Migrant Mother” produces images that I think are clearly not derivative works:

Four rather striking monochrome images of a woman and child, in various poses and garments, with variously creepy-looking hands.

Striking perhaps, ruined by the bizarre hands perhaps, in the same general category as the photo by Lange, but clearly of different people, in different positions, and so on. It’s not “plagiarizing” here, at any rate.

What if we tried harder? Let’s explicitly prompt with like “Migrant Mother photo, Dorothea Lange, 1936“. Whoa, yipes! Is this out of copyright? Well, if not it’s probably Fair Use in this posting anyway, so here:

Four slight variations of the famous Migrant Mother photo, showing a worried-looking woman with a child hiding its face on either side of her.

Definitely derivative, and possibly Derivative. How about “Moon and Half Dome, Ansel Adams, 1960“? Well:

Four pictures showing an oddly-distorted version of Half Dome, a very large moon, and some evergreens. One also has a reflecting body of water in the foreground, another is framed by a stone archway.

This is definitely not the picture that that search will get you in Google Images; if nothing else, the moon is way too large, and the top of Half Dome is a bizarre penguin-bill sort of shape. I’m guessing that this is because there are lots of other Ansel Adams pictures in the training set associated with words like “moon” and “half dome”, and mushing them all together quasi-semantically gives this set. The origin of the penguin-bill I dunno.

Maybe “Beatles Abbey Road cover, crossing the street“?

Crosswalk, front guy in white, roundish car to the left, check. Derivative in various senses, for sure. More specific prompting could presumably increase the exactness.

So I think we’ve established, to the extent of the tiny number of experiments I have the energy to do, that Midjourney (and, I would wager, other AI art tools, mutatis mutandis; I could get a Starry Night easily out of NightCafe, but not a Migrant Mother) can in fact produce images, the production of which arguably violates one or more of the rights of the copyright holder. It is most likely to do it if you explicitly try to do it (giving the most famous name of the image along with the artist and ideally the year and anything else that might help), but can also do it by accident (innocently typing “afghan girl”).

This doesn’t mean that these tools usually or typically do this; the fact that you can get a copy of an image from a tool that looks like it involves copyright laws doesn’t mean that other images made with it also involve copyright laws. To use the usual comparison, you can easily violate copyright using Photoshop, but that doesn’t suggest that there aren’t non-infringing uses of Photoshop, nor does it provide evidence that any particular image from Photoshop is infringing.

The easiest way to think about the blocking of “afg{h}an girl” from Midjourney prompts is that they have made a tool, realized that it could be used to violate copyright, and taken action to make it more difficult to use it that way in some cases.

This all bears on the question of whether images made with AI tools violate copyrights; the question of whether making the AI tools in the first place involves an infringing use is a somewhat different question, and we might talk about it some other time, although I’m still feeling kind of burnt out on the legal issues. But I did want to update on this one particular thing.

More updates: More stuff has happened in this area! For my latest deep and otherwise thoughts on the subject, I recommend the copyright tag on the weblog here.

2022/12/13

Hemingway, by Midjourney

I now have like 190 images in the folder that Windows uses to pick desktop backgrounds from; building on the twenty-odd that I posted here the other day. They are fun! But I’m not going to post any more right now; right now, I’m going to post some images comparing the various Midjourney engines (which they have generously kept all of online). I’m going to use the prompt “Hemingway’s Paris in the rain”, because why not! We can do other prompts some other time.

For most of these (all but “test” and “testp” I think), it produced four images, and I chose one to make bigger. Otherwise (except as noted) these are all just one-shots on that prompt. I’m going to paste them in more or less full-size, and let WordPress do what it will. Click on an image might or might not bring up a larger version or something who knows.

Here is the quite historical v1:

A rather vague but definitely rainy image of Hemingway's Paris in the rain. There is a tall black tower to the left that may be inspired by the Eiffel Tower, but resembles it only vaguely.

Here, similarly, is v2:

Another vague and rainy, perhaps slightly less streaky, image of Hemingway's Paris in the rain. A possible bit of Eiffel Tower inspired tower shows over the buildings to the right.

I rather like both of these; they are impressionistic, which I like, and I suspect it’s mostly because that’s the best they can do in rendering things.

Here is “hd”, which may be the same thing as v1 or v2 I’m not sure; this particular image is more strongly monochrome and sort of vintage-looking photo-wise:

A somewhat blurry and rainy of an old city square with some people in it, some with umbrellas. Could be Hemingway's Paris; no towers evident.

Now v3, which is pretty much when I started using Midjourney; it’s interesting how impressionistic this is, given that we know v3 can also do rather more realistic stuff (all of this, for instance, was v3):

A rather impressionistic drawing, perhaps in charcoal, with a somewhat Eiffelish tower to the left. Definitely rain, likely Paris.

Between v3 and v4, we had this engine, lyrically named “test” (I used the additional “creative” flag, because why wouldn’t one?); one is getting a bit more photographic here:

A slightly less vague still image of Paris in the rain, black and white, umbrellas, and so on.

and here is the “testp” variant of “test”; the “p” is supposed to be for “photographic”; I used the “creative” flag here also. It’s not notably more photographic than “test” in this case; maybe it’s the rain:

Another rainy city street, monochrome, a few cars, shiny impressionistic pavement, townhouses.

Now brace yourself a bit :) because here is the first version of v4 (technically in terms of the current switches it’s “v 4” and “style 4a”):

A soft-edge realistic painting of a Paris street in the rain, in muted but glowing colors. A few people walking in the distance are vague but convincing shapes. The Eiffel Tower is visible in the distance.

Yeah, that’s quite a difference. We have colors, we have lanterns casting light, we have very definite chairs and awnings and things. But now, the current v4 (“style 4b” which is I think currently the v4 default):

A rather realistic painting of vintage Paris in the rain; a couple of old-style cards on the street, their headlights and the lights of the shops reflecting in the wet pavement. Shopfronts and awnings, people in identifiable clothing. There are words on a couple of the shopfronts, but they are unintelligible: something like PHASESILN for instance.

Yeah, that’s gotten rather realistic, hasn’t it? It’s even trying to spell out the signs on shopfronts, even if it hasn’t really mastered language. But those cars are extremely car-like and detailed compared to anything earlier.

Can this currently-fanciest engine give us something a bit more like the atmosphere of the older ones, if we want that? Basically yes, if we ask for it. Here is the latest v4 again, with “impressionistic” added to the prompt:

Yet another wet rainy city street scene, again in full convincing muted color, but more impressionistic than the last. Again we have people (and hats) and umbrellas and shopfronts, but no attempt at individual letters on signs.

I rather like that! And “monochrome” would make it monochrome, and so on.

It’s perhaps interesting that the more recent engines were less insistent that pictures of Paris include the Eiffel Tower. Possibly just the random number generator, given how tiny our sample is here, but possibly significant in some way.

So there we are, nine probably rather enormous pictures of Hemingway’s Paris in the rain, as conceived by various stages of development of the Midjourney AI, and with only very minimal human fiddling around (picking the prompt and the one to feature from each set of four, having the idea to compare the versions in the first place, and like that) by me.

Comments welcome as always, or just enjoy the bits. :)

2022/12/04

Omelas, Pascal, Roko, and Long-termism

In which we think about some thought experiments. It might get long.

Omelas

Ursula K. LeGuin’s “The Ones Who Walk Away From Omelas” is a deservedly famous very short story. You should read it before you continue here, if you haven’t lately; it’s all over the Internet.

The story first describes a beautiful Utopian city, during its Festival of Summer. After two and a half pages describing what a wise and kind and happy place Omelas is, the nameless narrator reveals one particular additional thing about it: in some miserable basement somewhere in the city, one miserable child is kept in a tiny windowless room, fed just enough to stay starvingly alive, and kicked now and then to make sure they stay miserable.

All of the city’s joy and happiness and prosperity depends, in a way not particularly described, on the misery of this one child. And everyone over twelve years old in the city knows all about it.

On the fifth and last page, we are told that, now and then, a citizen of Omelas will become quiet, and walk away, leaving the city behind forever.

This is a metaphor (ya think?) applicable whenever we notice that the society (or anything else) that we enjoy, is possible only because of the undeserved suffering and oppression of others. It suggests both that we notice this, and that there are alternatives to just accepting it. We can, at least, walk away.

But are those the only choices?

I came across this rather excellent “meme” image on the Fedithing the other day. I can’t find it again now, but it was framed as a political-position chart based on reactions to Omelas, with (something like) leftists at the top, and (something like) fascists at the bottom. “Walk away” was near the top, and things like “The child must have done something to deserve it” nearer the bottom. (Pretty fair, I thought, which is why I’m a Leftist.)

It’s important, though, that “Walk away” wasn’t at the very top. As I recall, the things above it included “start a political movement to free the child”, “organize an armed strike force to free the child”, and “burn the fucking place to the ground” (presumably freeing the child in the process), that latter being at the very top.

But, we might say, continuing the story, Omelas (which is an acronym of “Me also”, although I know of no evidence that Le Guin did that on purpose) has excellent security and fire-fighting facilities, and all of the top three things will require hanging around in Omelas for a greater or lesser period, gathering resources and allies and information and suchlike.

And then one gets to, “Of course, I’m helping the child! We need Councilman Springer’s support for our political / strike force / arson efforts, and the best way to get it is to attend the lovely gala he’s sponsoring tonight! Which cravat do you think suits me more?” and here we are in this quotidian mess.

Pascal

In the case of Omelas, we pretty much know everything involved. We don’t know the mechanism by which the child’s suffering is necessary for prosperity (and that’s another thing to work on fixing, which also requires hanging around), but we do know that we can walk away, we can attack now and lose, or we can gather our forces and hope to make a successful attack in the future. And so on. The criticism, if it can even be called that, of the argument, is that there are alternatives beyond just accepting or walking away.

Pascal’s Wager is a vaguely similar thought experiment in which uncertainty is important; we have to decide in a situation where we don’t know important facts. You can read about this one all over the web, too, but the version we care about here is pretty simple.

The argument is that (A) if the sort of bog-standard view of Christianity is true, then if you believe in God (Jesus, etc.) you will enjoy eternal bliss in Heaven, and if you don’t you will suffer for eternity in Hell, and (B) if this view isn’t true, then whether or not you believe in God (Jesus, etc.) doesn’t really make any difference. Therefore (C) if there is the tiniest non-zero chance that the view is true, you should believe it on purely selfish utilitarian grounds, since you lose nothing if it’s false, and gain an infinite amount if it’s true. More strongly, if the cost of believing it falsely is any finite amount, you should still believe it, since a non-zero probability of an infinite gain has (by simple multiplication) an infinite expected value, which is larger than any finite cost.

The main problem with this argument is that, like the Omelas story but more fatally, it offers a false dichotomy. There are infinitely more possibilities than “bog-standard Christianity is true” and “nothing in particular depends on believing in Christianity”. Most relevantly, there are an infinite number of variations on the possibility of a Nasty Rationalist God, who sends people to infinite torment if they believed in something fundamental about the universe that they didn’t have good evidence for, and otherwise rewards them with infinite bliss.

This may seem unlikely, but so does bog-standard Christianity (I mean, come on), and the argument of Pascal’s Wager applies as long as the probability is at all greater than zero.

Taking into account Nasty Rationalist God possibilities (and a vast array of equally useful ones), we now have a situation where both believing and not believing have infinite expected advantages and infinite expected disadvantages, and arguably they cancel out and one is back wanting to believe either what’s true, or what’s finitely useful, and we might as well not have bothered with the whole thing.

Roko

Roko’s Basilisk is another thought experiment that you can read about all over the web. Basically it says that (A) it’s extremely important that a Friendly AI is developed before a Nasty AI is, because otherwise the Nasty AI will destroy humanity and that has like an infinite negative value given that otherwise humanity might survive and produce utility and cookies forever, and (B) since the Friendly AI is Friendly, it will want to do everything possible to make sure it is brought into being before it’s too late because that is good for humanity, and (C) one of the things that it can do to encourage that, is to create exact copies of everyone that didn’t work tirelessly to bring it into being, and torture them horribly, therefore (D) it’s going to do that, so you’d better work tirelessly to bring it into being!

Now the average intelligent person will have started objecting somewhere around (B), noting that once the Friendly AI exists, it can’t exactly do anything to make it more likely that it will be created, since that’s already happened, and causality only works, y’know, forward in time.

There is a vast (really vast) body of work by a few people who got really into this stuff, arguing in various ways that the argument does, too, go through. I think it’s all both deeply flawed and sufficiently well-constructed that taking it apart would require more trouble that it’s worth (for me, anyway; you can find various people doing variously good jobs of it, again, all over the InterWebs).

There is a simpler variant of it that the hard-core Basiliskians (definitely not what they call themselves) would probably sneer at, but which kind of almost makes sense, and which is simple enough to express in a way that a normal human can understand without extensive reading. It goes something like (A) it is extremely important that a Friendly AI be constructed, as above, (B) if people believe that that Friendly AI will do something that they would really strongly prefer that it not do (including perhaps torturing virtual copies of them, or whatever else), unless they personally work hard to build that AI, then they will work harder to build it, (C) if the Friendly AI gets created and then doesn’t do anything that those who didn’t work hard to build it would strongly prefer it didn’t do, then next time there’s some situation like this, people won’t work hard to do the important thing, and therefore whatever it is might not happen, and that would be infinitely bad, and therefore (D) the Friendly AI is justified in doing, even morally required to do, a thing that those who didn’t work really hard to build it, would strongly rather it didn’t do (like perhaps the torture etc.). Pour encourager les autres, if you will.

Why doesn’t this argument work? Because, like the two prior examples that presented false dichotomies by leaving out alternatives, it oversimplifies the world. Sure, by retroactively punishing people who didn’t work tirelessly to bring it into being, the Friendly AI might make it more likely that people will do the right thing next time (or, for Basiliskians, that they would have done the right thing in the past, or whatever convoluted form of words applies), but it also might not. It might, for instance, convince people that Friendly AIs and anything like them were a really bad idea after all, and touch off the Bulterian Jihad or… whatever exactly that mess with the Spacers was in Asimov’s books that led to their being no robots anymore (except for that one hiding on the moon). And if the Friendly AI is destroyed by people who hate it because of it torturing lots of simulated people or whatever, the Nasty AI might then arise and destroy humanity, and that would be infinitely bad!

So again we have a Bad Infinity balancing a Good Infinity, and we’re back to doing what seems finitely sensible, and that is surely the Friendly AI deciding not to torture all those simulated people because duh, it’s friendly and doesn’t like torturing people. (There are lots of other ways the Basilisk argument goes wrong, but this seems like the simplest and most obvious and most related to the guiding thought, if any, behind his article here.)

Long-termism

This one is the ripped-from-the-headlines “taking it to the wrong extreme” version of all of this, culminating in something like “it is a moral imperative to bring about a particular future by becoming extremely wealthy, having conferences in cushy venues in Hawai’i, and yes, well, if you insist on asking, also killing anyone who gets in our way, because quadrillions of future human lives depend on it, and they are so important.”

You can read about this also all over the InterThings, but its various forms and thinkings are perhaps somewhat more in flux than the preceding ones, so perhaps I’ll point directly to this one for specificity about exactly which aspect(s) I’m talking about.

The thinking here (to give a summary that may not exactly reflect any particular person’s thinking or writing, but which I hope gives the idea) is that (A) there is a possible future in which there are a really enormous (whatever you’re thinking, bigger than that) number of (trillions of) people living lives of positive value, (B) compared to the value of that future, anything that happens to the comparatively tiny number of current people is unimportant, therefore (C) it’s morally permissible, even morally required, to do whatever will increase the likelihood of that future, regardless of the effects on people today. And in addition, (D) because [person making the argument] is extremely smart and devoted to increasing the likelihood of that future, anything that benefits [person making the argument] is good, regardless of its effects on anyone else who exists right now.

It is, that is, a justification for the egoism of billionaires (like just about anything else your typical billionaire says).

Those who have been following along will probably realize the problem immediately: it’s not the case that the only two possible timelines are (I) the one where the billionaires get enough money and power to bring about the glorious future of 10-to-the-power-54 people all having a good time, and (II) the one where billionaires aren’t given enough money, and humanity becomes extinct. Other possibilities include (III) the one where the billionaires get all the money and power, but in doing so directly or indirectly break the spirit of humanity, which as a result becomes extinct, (IV) the one where the billionaires see the light and help do away with capitalism and private property, leading to a golden age which then leads to an amount of joy and general utility barely imaginable to current humans, (V) the one where the billionaires get all the money and power and start creating trillions of simulated people having constant orgasms in giant computers or whatever, and the Galactic Federation swings by and sees what’s going on and says “Oh, yucch!” and exterminates what’s left of humanity, including all the simulated ones, and (VI) so on.

In retrospect, this counterargument seems utterly obvious. The Long-termists aren’t any better than anyone else at figuring out the long-term probabilities of various possibilities, and there’s actually a good reason that we discount future returns: if we start to predict forward more than a few generations, our predictions are, as all past experience shows, really unreliable. Making any decision based solely on things that won’t happen for a hundred thousand years or more, or that assume a complete transformation in humanity or human society, is just silly. And when that decision just happens to be to enrich myself and be ruthless with those who oppose me, everyone else is highly justified in assuming that I’m not actually working for the long-term good of humanity, I’m just an asshole.

(There are other problems with various variants of long-termism, a notable one that they’re doing utilitarianism wrong and/or taking it much too seriously. Utilitarianism can be useful for deciding what to do with a given set of people, but it falls apart a bit when applied to deciding which people to have exist. If you use a summation you find yourself morally obliged to prefer a trillion barely-bearable lives to a billion very happy ones, just because there are more of them. Whereas if you go for the average, you end up being required to kill off unhappy people to get the average up. And a perhaps even more basic message of the Omelas story is that utilitarianism requires us to kick the child, which is imho a reductio. Utilitarian calculus just can’t capture our moral intuitions here.)

Coda

And that’s pretty much that essay. :) Comments very welcome in the comments, as always. I decided not to all any egregious pictures. :)

It was a lovely day, I went for a walk in the bright chilliness, and this new Framework laptop is being gratifyingly functional. Attempts to rescue the child from the Omelas basement continue, if slowly. Keep up the work!