Posts tagged ‘hype’

2023/05/11

OpenAI: Where by “can” we mean “can’t”

Disclaimer: I work for Google, arguably a competitor of OpenAI; but these opinions are solely my own, and I don’t work in the AI area at Google or anything.

And I mean, oh come on!

So much AI “research” in these hype-heavy times is all bunk, and I suppose one shouldn’t expect OpenAI (“Open” heh heh) to be any different. But this pattern of:

  1. Use an AI to try to do some interesting-sounding thing,
  2. Evaluate how well it did by waving your hands around, or just by eyeballing it,
  3. Declare victory,
  4. Publish an “AI can do thing!!!!” paper that will get lots of media attention.

is just sooooo tiring. (See for instance the YouTuber in that prior post who showed their system producing a non-working tic-tac-toe game and then said “well, that worked!”.)

The one I’m facepalming about here was brought to my attention by my friend Steve, and omg: “Language models can explain neurons in language models”. They did more or less the obvious thing: they tried to get GPT-4 to make predictions about how a few selected “neurons” in GPT-2 behave for a few inputs. The key line for me in the paper is:

“Although the vast majority of our explanations score poorly, we believe we can now use ML techniques to further improve our ability to produce explanations.” 

— OpenAI

They say this because (they have been drinking too much of the Kool-Aid, and) they tried a few things to make the initial abysmal scores better, and those things made the scores slightly better, but still poor. They say in the (extremely brief) report that although it works badly now, doing it differently, or maybe just doing more of it, might work better.
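(For the curious, the recipe in the report is, as far as I can tell, roughly: have GPT-4 write a short natural-language explanation of a neuron from example token/activation pairs, then have it simulate the neuron’s activations given only that explanation, and score the explanation by how well the simulated activations track the real ones. Here’s a minimal sketch of that loop; ask_model and the prompts are my own placeholders, not OpenAI’s actual code.)

```python
# Minimal sketch of the explain / simulate / score loop described in the
# OpenAI report. ask_model() is a hypothetical "call the big model" function,
# and the prompts are placeholders; this is not OpenAI's actual code.
import numpy as np

def explain_neuron(ask_model, examples):
    """Ask the explainer model (GPT-4 in the report) for a short explanation
    of a neuron, given (token, activation) pairs from the subject model (GPT-2)."""
    prompt = "Explain what this neuron responds to:\n" + "\n".join(
        f"{tok!r}: {act:.2f}" for tok, act in examples)
    return ask_model(prompt)

def simulate_activations(ask_model, explanation, tokens):
    """Ask the model to guess, from the explanation alone, how strongly the
    neuron would activate on each token."""
    prompt = (f"A neuron is described as: {explanation}\n"
              "For each token below, guess its activation from 0 to 10:\n"
              + "\n".join(tokens))
    return [float(x) for x in ask_model(prompt).split()]

def score_explanation(real, simulated):
    """Score the explanation by how well simulated activations track real ones:
    ~1.0 means the explanation predicts the neuron well, ~0 means it doesn't."""
    return float(np.corrcoef(real, simulated)[0, 1])
```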

In any other field this would be laughed at (or politely desk-rejected with a “fantastic, please submit again once you find something that does actually work better”); but in the Wacky Wacky World of Large Language Models it goes on the website and gets cited in half a ton of headlines in the media.

And is it really honest to use “can” in a headline to mean “can, very very badly”? By that standard, I can predict the weather by flipping a coin. I didn’t say I could predict it accurately!

I suppose the LLM hype is better than the Crypto hype because fewer people are being bilked out of money (I guess?), but still…

2023/04/26

It’s All Bunk, Again

This is about the current AI hypestorm, and intentionally has a rather extreme title. :) For some values of “it”, it is not all bunk; there are lots of cool and funny and convenient things that AI in general, and that large auto-regressive language models specifically (“LLMs”), can do. Various tasks and jobs will become quicker and easier, and there will be some new fun things to do.

But when it comes to all the breathless claims that “ChatGPT / Bard / FooLLM will completely change the way that we X”, for virtually any value of X that isn’t “produce reams of plausible-looking text whose quality doesn’t matter much”, it’s all bunk. Something describable as AI may eventually do that (I’ll even say that it will eventually do that if everything else survives long enough), but the LLM technology that we have now, even if made better / faster / stronger, will not.

This is a prediction, and it could be wrong, but I’m willing to make the claim without much in the way of caveats. There have been various times in the past when I’ve muttered to myself that a thing was bunk, but not said so on the record, so I couldn’t later point back at it and say “ha, I toldja so!” once it turned out to be, indeed, bunk. So this time I’m putting it on the record.

It’s commonly observed that when reading a media account of some technical field that one knows something about, one often thinks “hey, that’s completely wrong!”, and that this suggests the media is also completely wrong in technical fields that one doesn’t know enough about to notice the errors.

It seems likely that this applies to media stories like “ChatGPT / etc will completely change the way that we X” as much as to any other, so given that we know something about, say, software development, we could look at, say, “ChatGPT spells the end of coding as we know it” (also on Yahoo News).

And it’s bunk. The headline is sort of the usual cringey headline, in that it exaggerates the most extreme part of the article for the clickz, but the article does say inter alia, “For better or worse, the rise of AI effectively marks the end of coding as we know it,” which is close. “[T]he rise of AI” is considerably more general than “ChatGPT”, and “effectively” is a nice broad weasel-word, so the actual sentence is fuzzier and more defensible than the headline, but it’s still bunk. As is for instance the quote immediately preceding that statement, in which someone with a deep financial interest in the subject says “there’s no programmers in five years,” and various other breathless things referenced by the piece.

Bunk. Bilge. Bollocks.

[Image: a drawing in warm colors of a two-level bunk-bed in a pleasantly messy room, with lots of shelves and things hanging on the walls.]
Not this bunk; this is a nice bunk.

The thing is that in software development, as I suspect in a whole lot of other domains, LLMs are in fact quite good at doing things that we’ve come to rely on as convenient proxies for important skills. What we ask software engineering job applicants to do in forty-five minutes to an hour isn’t really all that much like what they will be doing if we hire them, but it’s the best proxy we’ve been able to come up with that fits in that time-scale. In other fields, doing the questions on the bar exam isn’t much at all like what real lawyers do in practice, but it’s the best proxy that we as a society have been able to come up with.

Now we have a situation where there are pieces of software that can do a plausible job at various of these proxies (although even here some of the breathless reports are frankly bunk), but that absolutely cannot do the real jobs that we have gotten used to using them as proxies for. And this is driving people to all sorts of false conclusions about the real jobs.

In the software field specifically, what we ask candidates to do, and what various LLMs have shown various levels of skill at doing, is to take a description of a programming task and write code that does that, where the task is (ideally) one that they haven’t seen before, and also simple enough to accomplish within forty-five minutes to an hour.

Is this what actual professional (or even hobbyist) coders do? I think it’s safe to answer that with an unqualified No: this is not what human coders actually do. Once in a while one might have a novel thing to do, and do it in forty-five minutes to an hour, but it doesn’t just fall from the sky into one’s lap; it comes up as part of some larger cohesive project that one is working on. Even if one is the most junior coder on a team, doing mostly what the more senior members ask one to do, that is essentially never “please spend an hour and write code to reverse a linked list for me”; that just isn’t how it works.
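(To make the contrast concrete, here’s the kind of self-contained proxy exercise we’re talking about, sketched in Python; it’s exactly the sort of thing an LLM, or an interview candidate, can knock out in a few minutes, and exactly the sort of thing that never shows up in isolation on a real project.)

```python
# The classic interview-style proxy task: reverse a singly linked list.
# A few self-contained lines with no surrounding context required,
# which is precisely what makes it unlike real project work.
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Return the head of the reversed list, re-linking the nodes in place."""
    prev = None
    while head is not None:
        head.next, prev, head = prev, head, head.next
    return prev

# Quick check: 1 -> 2 -> 3 becomes 3 -> 2 -> 1.
rev = reverse(Node(1, Node(2, Node(3))))
assert [rev.value, rev.next.value, rev.next.next.value] == [3, 2, 1]
```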

Actual working coders understand to a greater or lesser degree an overall thing that they are working on, what it does for a user, how it is supposed to work, how to debug it when it fails, at least the basic functional and nonfunctional requirements of the overall system as well as their part, the quirks it has, what libraries are available for them to call, what other members of the team are doing, and so on. And the overall thing isn’t a single function that reverses a linked list, or says whether one string is contained within another.

Let’s look at one of the motivating examples in our first example article. “Adam Hughes, a software developer,” it says, “… signed up for an account and asked ChatGPT to program a modified tic-tac-toe game, giving the game some weird rules so the bot couldn’t just copy code that another human had already written. Then he quizzed it with the kind of coding questions he asks candidates in job interviews….”, and voila, “Hughes found that ChatGPT came back with something he wasn’t prepared for: very good code.”

Unfortunately, this is the only place I can find this impressive feat mentioned. Adam Hughes’ own writeup of how “ChatGPT Will Replace Programmers Within 10 Years” doesn’t talk about this modified tic-tac-toe game at all, or the “coding questions” or the “very good code” referenced in the article. So I’m not sure what’s going on there.

The claim in Hughes’ article title is also bunk (which is to say, I disagree), while we’re on the subject. There is no reason to believe that any LLM will be able to do what’s listed there under “Phase 2” or later. Well, actually, the wording is odd: it says that the AI will “be able to make project-wide suggestions like: Rebuild this Angular 6 project in the latest version of React; Add 100% unit-test coverage to the project…”. I mean, sure, maybe the AI could suggest those things; but in order to predict that programmers are going to be replaced, the author presumably means that the AI will be able to do those things? A bit puzzling.

(Also puzzling is the title of that article; on the page itself the title is currently the nicely ambiguous “ChatGPT Will Replace Programmers Within 10 Years,” which is in some sense true if it somehow replaces exactly two (2) programmers by 2033. But the HTML of the page has a title-tag with the content “ChatGPT Will Replace All Programmers”, which is a much stronger claim about how many will be replaced (all of them!) but leaves out the timescale; heh. The actual text of the article predicts 95% job loss in 5-10 years, and 99% in “10+” years, so it’s sort of the most extreme combination of the two headlines (and it’s wrong).)

Hughes has been updating the beginning of that post with a list of things that are supposed to convince doubters that indeed ChatGPT Will Replace (All) Programmers Within 10 Years; the most recent is a video that, he says, shows “fully autonomous AI agents created python code. Literally replacing programmers. Not that smart, but it shows the how possible this is TODAY, well ahead of schedule.” (Bold in original.)

The video is, to be blunt, kinda silly. This guy has a system where two ChatGPTs talk to each other, and are in some way able to search the web and so on. He asks them to make code to display an input box, and they do that, at the level that one would have found with a web search for “input box python code example”. He asks them to make code to play tic-tac-toe (again, code that is all over the web), and they claim to do that, but it doesn’t seem to work (it displays the UI, but doesn’t make correct moves or reliably detect a win). Undeterred, he says “that worked”, and continues on (lols).
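(For scale, the part the generated game apparently couldn’t get right fits in a dozen lines; here’s my own sketch of a complete win check, not the code from the video.)

```python
# A complete tic-tac-toe win check: roughly the part the generated game
# in the video apparently couldn't get right. My own sketch, not theirs.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """board is a list of 9 cells containing 'X', 'O', or None.
    Returns 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

# Example: X holds the top row.
assert winner(['X', 'X', 'X', 'O', 'O', None, None, None, None]) == 'X'
```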

He asks them to “create a new strange simple game”, and they create a “guess the number in ten guesses” game with high / low feedback (not exactly “strange”, and again code that is all over the web), and it might work (aside from the apparently nonfunctional “difficulty” pulldown), but he doesn’t look inside or test it enough to be sure. And so on. And then for like the last 40% of the video he shows off “AutoGPT”, which appears just to fantasize to itself in text and non-functional pretend API calls about how it might create an AGI by linking together GPT instances in various vaguely-described ways, and then gets into a loop of repeating the same thing over and over.

What might Adam Hughes mean when he describes this as “fully autonomous” (given that it’s just doing exactly what it’s told) or as “Literally replacing programmers”? I’m not sure. Is there a programmer somewhere who is going to be replaced by a system that can’t write a tic-tac-toe game, or that can fantasize about creating AGI? I sort of doubt it.

One could cynically note at this point that the Business Insider / Yahoo News article has no doubt gotten lots of clicks and therefore ad sales, that the Hughes piece is a subscriber-only piece on Medium that ends “Stay tuned for my follow-up article about how to prepare for and thrive in this brave new world,” and that if you want to play with the system shown in the video you can “become a member of the channel” for like US$5/month and up. But that would be cynical. :)

There are dozens, maybe hundreds, of other examples we could look at, just in the coding area, let alone law, management, or all those other fields that AI is supposedly about to completely change (the last-linked article there is quite amusing, notably where it admits that it’s based on subjective occupation-impact estimates by people who know little or nothing about the occupations in question; i.e. bunk). But this is already long.

LLMs have shown an amazing ability to produce plausible-looking output in response to a huge variety of prompts. The output is even often (more or less by coincidence) correct and/or functional and/or pretty! That’s what it’s architected and designed to do, of course, but I think it’s fair to say that everyone’s been surprised by how well it does it.

Will that ability alone completely change the way that we do X, for any interesting value of X?

I’m putting myself on the record this time :) in saying that the answer is very much No.

Update: It’s been pointed out to me that from what I say here I do apparently believe that “ChatGPT / Bard / FooLLM will completely change the way that we X” if X is one of those proxy tasks; and that’s a point! These things may significantly change the way that we do job interviews, or assign homework, or even give grades in university; but the main changes may be mostly along the lines of “like we do it now, only making sure people aren’t using LLMs to do the proxy task”, and that might not count as “completely changing”.