I’m reading this book “Superintelligence” by Nick Bostrom (I apparently have the 2014, not the 2016, edition). This isn’t a review, because I’m nowhere near finished with it yet, but I have some Thoughts.
First of all, the book is taking far too long to get to the “What should we do about the prospect of things sufficiently smarter than us coming to exist nearby?” part. I’ve been plodding and plodding through many pages intended to convince me that this is a significant enough probability that I should bother thinking about it at all. I already believed that it was, and if anything the many many pages attempting to convince me of it have made me think it’s less worth worrying about, by presenting mediocre arguments.
(Like, they point out the obvious possibility that if a computer can make itself smarter, and the smarter it is the faster it can make itself smarter, then it can get smarter exponentially, and that can be really fast. But they also note that this depends on each unit of smarterness being not significantly harder than the last to achieve, and they give unconvincing external arguments for that being likely. Whereas the typical inherent behavior of a hard problem is that it gets harder and harder as you go along (80/20 rules and all), which would tend to prevent an exponential increase, and they haven’t even mentioned that. Maybe they will, or maybe they did and I didn’t notice. They also say amusing things about how significant AI research can be done on a single “PC”, and maybe in 2014 that was a reasonable thing to say, I dunno.)
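(Just to make the shape of that argument concrete, here’s a toy back-of-the-envelope simulation. The numbers and the difficulty functions are entirely made up; the point is only how much the outcome depends on whether each increment of smartness costs about the same as the last one, or more and more.)

```python
# Toy numbers, nothing to do with the book: intelligence I improves at a
# rate proportional to I, divided by how hard the next increment is.
def simulate(difficulty, steps=50, I=1.0):
    history = [I]
    for _ in range(steps):
        I += I / difficulty(I)      # smarter means faster progress, but...
        history.append(I)
    return history

constant = simulate(lambda I: 10.0)          # every increment costs the same
rising   = simulate(lambda I: 10.0 * I * I)  # each increment is much harder

print(f"constant difficulty, 50 steps: {constant[-1]:6.1f}")  # ~117x: exponential takeoff
print(f"rising difficulty, 50 steps:   {rising[-1]:6.1f}")    # ~3x: fizzles out
```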
But anyway, this isn’t a review of the book (maybe I’ll do one when I’m finished, if I ever finish). This is a little scenario-building based on me thinking about what an early Superintelligent system might actually look like, in what way(s) it might be dangerous, and so on. I’m thinking about this as I go, so no telling where we might end up!
The book tends to write down scenarios where, when it becomes Superintelligent, a system also becomes (or already is) relatively autonomous, able to just Do Things with its effectors, based on what comes in through its sensors, according to its goals. And I think that’s unlikely in the extreme, at least at first. (It may be that the next chapter or two in the book will consider this, and that’s fine; I’m thinking about it now anyway.)
Consider a current AI system of whatever kind, say GPT-3 or NightCafe (VQGAN+CLIP). It’s a computer program, and it sits there doing nothing. Someone types some stuff into it, and it produces some stuff. Some interesting text, say, or a pretty image. Arguably it (or a later version of it) knows a whole lot about words and shapes and society and robots and things. But it has no idea of itself, no motives except in the most trivial sense, and no autonomy; it never just decides to do something.
So next consider a much smarter system, say a “PLNR-7”: an AI in the same general style, but one that is very good at planning to achieve goals. You put in a description of a situation, some constraints, and a goal, and it burns lots of CPU and GPU time and gets very hot, and outputs a plan for how to achieve that goal in that situation, satisfying those constraints. Let’s say it is Superintelligent, and can do this significantly better than any human.
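(To be concrete about what I mean by “all it does is plan”, here’s a toy sketch of PLNR-7’s interface. The names and the shape of the thing are of course entirely made up.)

```python
def plnr7_plan(situation: str, constraints: str, goal: str) -> list[str]:
    """Burn a great deal of CPU and GPU time, get very hot, and return a
    plan for achieving the goal in the situation, within the constraints.

    The crucial point: this is a pure function, text in and text out.  It
    has no effectors, no memory between calls, and it does nothing at all
    unless somebody calls it.
    """
    return ["step 1: ...", "step 2: ..."]   # the superintelligent part goes here
```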
Do we need to worry about it taking over the world? Pretty obviously not, in this scenario. If someone were to give it a description of its own situation, a relatively empty set of constraints, and the goal of taking over the world, perhaps it could put out an amazing plan for how it could do that. But it isn’t going to carry out the plan, because it isn’t a carrier-out of plans; all it does is create them and output them.
The plan that PLNR-7 outputs might be extremely clever, involving hiding subliminal messages and subtle suggestions in the outputs that it delivers in response to inputs, which would (due to its knowledge of human psychology) cause humans to want to give PLNR-7 more and more authority, to hook it up to external effectors, to add autonomy modules that allow it to take actions on its own rather than just outputting plans, and so on.
But would it carry out that plan? No. Asking “would it have any reason to carry out that plan?” is already asking too much; it doesn’t have reasons in the interesting sense; the only “motivation” that it has is to output plans when a situation / constraints / goal triplet is input. And it’s not actually motivated to do that; that is simply what it does. It has no desires, preferences, or goals itself, even though it is a superhuman expert on the overall subject of desires, preferences, goals, and so on.
Is the difference here the difference between being able to make plans and being able to carry them out? I don’t think it’s even that simple. Imagine that we augment PLNR-7 so that it has a second input port: we can bundle up the situation / constraints / goal inputs with the plan output of the first part and feed that into the second slot, and PLNR-7 will then compare the real world with the situation described, and the plan with whatever effectors we’ve given it, and if everything matches closely enough it will carry out the plan (within the constraints) using those effectors.
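(In toy-code terms, the second input port is something like the sketch below. The helper stubs at the bottom just stand in for the hypothetical machinery that would do the actual comparing and acting.)

```python
def plnr7_execute(situation: str, constraints: str, plan: list[str],
                  effectors: list, world: str) -> None:
    """The second input port: carry out a plan that stage one already made.

    Note that this stage invents nothing.  It checks that the real world
    still looks like the situation the plan was written for, and that each
    step stays within the constraints and needs only the effectors it has,
    and then it acts.
    """
    if not matches(world, situation):        # "close enough" comparison
        return                               # the world has drifted; do nothing
    for step in plan:
        if allowed(step, constraints) and feasible(step, effectors):
            carry_out(step, effectors)       # the only place anything touches the world

# Stubs standing in for the actual (hypothetical) machinery:
def matches(world, situation): return world == situation
def allowed(step, constraints): return True
def feasible(step, effectors): return True
def carry_out(step, effectors): print("doing:", step)
```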
Say we give it as its only effectors the ability to send email and to make use of funds from a bank account containing 100,000 US dollars. We give it a description of the current world as its input, constraints corresponding to being able only to send email and spend that starting pool of US$100,000, and a goal of reducing heart disease in developing countries within one year. It thinks for a while and prints out a detailed plan involving organizing a charitable drive, hiring a certain set of scientists, and giving them the task of developing a drug with certain properties that PLNR-7 has good reason to think will be feasible.
We like that plan, so we put it all into the second input box, and a year later heart disease in developing countries is down by 47%. Excellent! PLNR-7, having finished that planning and executing, is now just sitting there, because that’s what it does. It does not worry us, and does not pose a threat.
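(That workflow, in terms of the toy sketches above, has humans squarely between the two stages. The describe_world stand-in and the effector objects are invented for illustration.)

```python
describe_world = lambda: "a description of the world right now"   # stand-in sensing step
email_gateway, bank_account = object(), object()                  # made-up effector objects

situation = describe_world()
constraints = "send email only; spend at most US$100,000"
goal = "reduce heart disease in developing countries within one year"

plan = plnr7_plan(situation, constraints, goal)
print("\n".join(plan))                                    # humans read the plan here,
if input("Carry this out? [y/N] ").lower() == "y":        # and explicitly approve it
    plnr7_execute(situation, constraints, plan,
                  effectors=[email_gateway, bank_account], world=describe_world())
```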
Is that because we let humans examine the plan between the first-stage output and the second-stage effecting? I don’t think that’s entirely it. Let’s say we are really stupid, and we attach the output of the first stage directly to the input of the second stage. Now we can give it constraints not to cause any pain or injury, and a goal of making the company that built it one billion dollars in a year, and just press GO.
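(In the toy sketch, the really stupid version is the same thing minus the reading-and-approving step; nothing examines the plan between stage one and stage two. Again, the stand-ins here are made up for illustration.)

```python
describe_world = lambda: "a description of the world right now"   # stand-in sensing step
investment_account = object()                                     # made-up effector object

situation = describe_world()
constraints = "do not cause any pain or injury"
goal = "make the company that built you one billion US dollars within one year"

plnr7_execute(situation, constraints,
              plnr7_plan(situation, constraints, goal),   # stage one feeds stage two directly
              effectors=[investment_account], world=describe_world())
```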
A year later, it’s made some really excellent investments, and the company is one billion dollars richer, and once again it’s just sitting there.
Now, that was dangerous, admittedly. We could have overlooked something in the constraints, and PLNR-7 might have chosen a plan that, while not causing any pain or injury, would have put the entire human population of North America into an endless coma, tended to by machines of loving grace for the rest of their natural lives. But it didn’t, so all good.
The point, at this point, is that while PLNR-7 is extremely dangerous, it isn’t extremely dangerous on its own behalf. That is, it still isn’t going to take any actions autonomously. It is aware of itself only as one element of the current situation, and it doesn’t think of itself as special. It is extremely dangerous only because it has no common sense, and we might give it a goal which would be catastrophic for us.
And in fact, circling back around, the book sort of points that out. It tends to assume that AIs will be given goals and effectors, and notes that while this doesn’t automatically give them any kind of instinct for self-preservation or anything, if the goal is open-ended enough they will probably realize, in many circumstances, that the goal is best achieved if the AI continues to exist to safeguard its achievement, and if the AI has lots of resources to use in accomplishing it. So you end up with an AI that both defends itself and wants to control as much as possible, not for itself but for the sake of the goal that we foolishly gave it, and that’s bad.
The key step here seems to be closing the loop between planning and effectuating. In the general case in the current world, we don’t do that; we either just give the AI input and have it produce symbolic output, or we give it effectors (and goals) that are purely virtual: get the red block onto the top of a stable tower of blocks on the virtual work surface, or get Company X to dominate the market in the marketplace simulation.
On the other hand, we do close the loop back to the real world in various places, some having to do with not-necessarily-harmless situations like controlling fighter jets. So that’s worth thinking about.
Okay, so that’s an interesting area identified! :) I will watch, as I continue to read the book, for places where they talk about how an AI might get directly attached to effectors that touch the real world, and might be enabled to use them to carry out possibly Universal Paperclips-style real-world goals. And whether not doing that (i.e. restricting your AIs to just outputting verbal descriptions of means toward closed-ended goals) might be a Good Thing To Do. (Although how do you prevent that one jerk from taking the Fatal Step in order to speed up his world domination? Indeed.)
Hm?