
Intro to AI Safety, Remastered

An introduction to AI Safety, remastered from a talk I gave at "AI and Politics" in London The second channel: https://www.youtube.com/channel/UC4qH2AHly_RSRze1bUqSSNw Experts' Predictions about the Future of AI: http://youtu.be/HOJ1NVtlnyQ 9 Examples of Specification Gaming: http://youtu.be/nKJlF-olKmg https://www.patreon.com/robertskmiles With thanks to my wonderful Patreon supporters: Gladamas Timothy Lillicrap Kieryn AxisAngles James Nestor Politics Scott Worley James Kirkland James E. Petts Chad Jones Shevis Johnson JJ Hepboin Pedro A Ortega Said Polat Chris Canal Jake Ehrlich Kellen lask Francisco Tolmasky Michael Andregg David Reid Peter Rolf Teague Lasser Andrew Blackledge Frank Marsman Brad Brookshire Cam MacFarlane Craig Mederios Jon Wright CaptObvious Brian Lonergan Jason Hise Phil Moyer Erik de Bruijn Alec Johnson Clemens Arbesser Ludwig Schubert Eric James Matheson Bayley Qeith Wreid jugettje dutchking Owen Campbell-Moore Atzin Espino-Murnane Johnny Vaughan Carsten Milkau Jacob Van Buren Jonatan R Ingvi Gautsson Michael Greve Tom O'Connor Laura Olds Jon Halliday Paul Hobbs Jeroen De Dauw Cooper Lawton Tim Neilson Eric Scammell Igor Keller Ben Glanton Tor Barstad Duncan Orr Will Glynn Tyler Herrmann Ian Munro Joshua Davis Jérôme Beaulieu Nathan Fish Peter Hozák Taras Bobrovytsky Jeremy Vaskó Richárd Benjamin Watkin Andrew Harcourt Luc Ritchie Nicholas Guyett James Hinchcliffe 12tone Oliver Habryka Chris Beacham Zachary Gidwitz Nikita Kiriy Andrew Schreiber Steve Trambert Braden Tisdale Abigail Novick Serge Var Mink Chris Rimmer Edmund Fokschaner J Nate Gardner John Aslanides Mara ErikBln DragonSheep Richard Newcombe Joshua Michel Alex Altair P David Morgan Fionn Dmitri Afanasjev Marcel Ward Andrew Weir Kabs Ammar Mousali Miłosz Wierzbicki Tendayi Mawushe Jake Fish Wr4thon Martin Ottosen Robert Hildebrandt Andy Kobre Kees Darko Sperac Robert Valdimarsson loopuleasa Marco Tiraboschi Michael Kuhinica Fraser Cain Klemen Slavic Patrick Henderson Oct todo22 
Melisa Kostrzewski Hendrik Daniel Munter Alex Knauth Kasper Ian Reyes James Fowkes Tom Sayer Len Alan Bandurka Ben H Simon Pilkington Daniel Kokotajlo Yuchong Li Diagon Andreas Blomqvist Bertalan Bodor Qwijibo (James) Zubin Madon Zannheim Daniel Eickhardt lyon549 14zRobot Ivan Jason Cherry Igor (Kerogi) Kostenko ib_ Thomas Dingemanse Stuart Alldritt Alexander Brown Devon Bernard Ted Stokes Jesper Andersson DeepFriedJif Chris Dinant Raphaël Lévy Johannes Walter Matt Stanton Garrett Maring Anthony Chiu Ghaith Tarawneh Julian Schulz Stellated Hexahedron Caleb Scott Viteri Clay Upton Conor Comiconor Michael Roeschter Georg Grass Isak Renström Matthias Hölzl Jim Renney Edison Franklin Piers Calderwood Mikhail Tikhomirov Matt Brauer Jaeson Booker Mateusz Krzaczek Artem Honcharov Michael Walters Tomasz Gliniecki Mihaly Barasz Mark Woodward Ranzear Neil Palmere Rajeen Nabid Christian Epple Clark Schaefer Olivier Coutu Iestyn bleasdale-shepherd MojoExMachina Marek Belski Luke Peterson Eric Eldard Eric Rogstad Eric Carlson Caleb Larson Max Chiswick Aron Sam Freedo slindenau A21 Johannes Lindmark Nicholas Turner Intensifier Valerio Galieni FJannis Grant Parks Ryan W Ammons This person's name is too hard to pronounce kp contalloomlegs Everardo González Ávalos Knut Løklingholm Andrew McKnight Andrei Trifonov Aleks D Mutual Information Tim A Socialist Hobgoblin Bren Ehnebuske Martin Frassek Sven Drebitz

Robert Miles AI Safety

2 years ago

Hi. This video is a recording of a talk I gave a while back. I already published a version of it on my second channel, but did you even know I had a second channel? Most people don't. I thought more people should see it, so I remastered it: I cleaned it up, improved the graphics, and, yeah, this is that. Enjoy.

Right, hello everyone. My name is Robert Miles. I usually do this on YouTube, so I'm not really used to public speaking, and I'm not used to not being able to edit out my mistakes. There may be mistakes. Also, I may go too quickly. Sorry, not sorry.

So, when it comes to AI safety, you can divide it up into four areas along two axes: you've got your short term and your long term, and you've got accident risks and misuse risks. That's a useful way to divide things up, and AI safety covers everything, but the area that interests me most is the long-term accident risks. I think once you have very powerful AI systems, it almost doesn't matter if they're being used by the right people or the wrong people, or what you're trying to do with them; the difficulty is in keeping them under control at all. So that's what I'm going to be talking about: what is AI safety, and why is it important?

I want to start by asking the question which I think everybody needs to be asking themselves: what is the most important problem in your field? Take a second to think of it. And why are you not working on that? For me, the most important problem in the field of AI is AI safety. Specifically, the problem I'm worried about is that, sooner or later, we will build an artificial agent with general intelligence. I'm going to go through each of these terms.

First, what do I mean when I say "sooner or later"? This is a little bit washed out, but this is a graph from a large survey of AI experts, people who published at major AI conferences. They were asked when they thought we would achieve high-level machine intelligence, defined as an agent which is able to carry out any task humans can, as well as or better than humans. They gave a 50% chance of having achieved that about 45 years from 2016, but also something like a 10% chance of hitting it just nine years from now. So it's not immediate, but it's happening. This is definitely worth taking with a pinch of salt, because if you ask the question slightly differently you get an estimate of 120 years rather than 45; there's a lot of uncertainty in this area. But the point is, it's going to
happen, as I said, sooner or later, because at the end of the day general intelligence is possible: the brain implements it, and the brain is not magic. Sooner or later, we'll figure it out.

So what do I mean when I say an artificial agent? "Agent" is a term mostly from economics, but basically: agents have goals, and they choose actions to further their goals. That's the simplest expression of what an agent is. The simplest thing you might call an agent would be something like a thermostat. It has a goal, which is for the room to be at a particular temperature, and it has actions it can take: it can turn on the heating, or it can turn on the air conditioning. It chooses its actions to achieve its goal of maintaining the room at a steady temperature. An extremely simple agent. A more complex agent might be something like a chess AI, which, if it's playing white, has a goal of the black king being in checkmate, and takes actions in the form of moving pieces on the board in order to achieve its goal. So you can see how this idea of an agent is a very useful way of thinking about lots of different intelligent systems. And of course humans can be modelled as agents as well; this is how it's usually done in economics: individuals or companies can be considered to have a goal of, you know, maximizing their income or maximizing their profits, and to make decisions in order to achieve that.

Now, "intelligence" is a heavily loaded term; a lot of different people put their own definitions on it. In this context, what I mean when I say intelligence is just the thing that lets an agent choose effective actions. It's whatever it is, in our brains or in the programming of these systems, that means the actions they choose tend to get them closer to their goals. You could then say that an agent is more intelligent if it's more effective at achieving its goals, whatever those goals are. If you have two
agents in an environment with incompatible goals, say the environment is the chess board and one agent wants white to win while the other wants black to win, then generally the more intelligent agent will be the one that gets what it wants: the better AI will win the chess game.

And finally, general intelligence. This is where it becomes interesting, in my opinion. Generality is the ability to behave intelligently in a wide range of domains. If you take something like a chess AI, it's extremely narrow: it only knows how to play chess. Even though you might say it's more intelligent than a thermostat, because it's more sophisticated and more complicated, it couldn't do the thermostat's job. There's no position on the chess board that corresponds to the room being a good temperature, and there's no move that corresponds to turning on an air conditioner. The chess AI can only think in terms of chess; it's extremely narrow.

Generality is a continuous spectrum. If you write a program that can play one Atari game, that's very narrow. One of DeepMind's early triumphs was a single program that could learn to play dozens of different Atari games; that's more general, because it's able to act across a wider variety of domains. The most general intelligence we're aware of right now is human beings. Humans are very general: we're able to operate across a very wide range of domains, including, and this is important, domains which evolution did not and could not prepare us for. We can, for example, drive a car, and evolution did not prepare us for that: we invented cars, they're very recent. We can invent rockets and go to the moon, and then we can operate on the moon, which is a completely different environment. And this is really the power of general intelligence: we can build a car, we can build a rocket, we can put the car on the
rocket, take the car to the moon, and drive the car on the moon. And there's nothing else that can do that, yet. But sooner or later, right?

So this is what I'm talking about: what you might call true AI, real AI, the sci-fi stuff. An agent which has goals in the real world, and is able to intelligently choose actions in the real world to achieve those goals. Now, I said, what's the biggest problem? This doesn't sound like a problem; on the surface of it, this sounds like a solution. You just tell the thing, you know, "cure cancer", or "maximize the profits of my company", or whatever, and it takes whatever actions are necessary in the real world to achieve that goal. But it is a problem. The big problem is... this should be auto-playing, and it isn't. The big problem is that it's difficult to choose good goals.

This is an AI made by OpenAI. It's playing a game called CoastRunners, which is actually a racing game, and they trained it on the score, which you probably can't see down here; it's currently a thousand. What the system learned is that these little turbo pickups respawn at just the right rate that if it flings itself around in a circle here, crashing into everything and catching fire, it can keep picking up the turbos, and that gives it a few points every time. It turns out this is a much better way of getting points than actually racing around the track. And the important point here is that this is not unusual. This is not OpenAI doing anything unusually stupid; this is kind of the default. Picking objectives is surprisingly hard, and you will find that the strategy or behaviour which maximizes your objective is probably not the thing you thought it was; it's probably not what you were aiming for.

There are loads of examples, actually. Victoria has a great list on her blog, Deep Safety; there are like 30 of them, different things going wrong. There was one where they were trying to evolve systems that would run quickly. I'm going to pause this, because it's distracting as hell. Where's my mouse? Yeah, pause. Pause, please. They were training agents that were supposed to run, so they simulated them for a particular period of time and measured how far their centre of mass moved, which seems perfectly sensible. What they found was that they got a bunch of creatures which were extremely tall and thin, with
a big mass on the end, which then fell over. Because they weren't being simulated for long enough, you could go the fastest just by falling over rather than actually running: that moved your centre of mass the furthest. There are a lot of these. There was a Tetris bot which would play reasonably well, and then, just when it was about to lose, would pause the game and sit there indefinitely, because it lost points for losing but didn't lose any points for just sitting on the pause screen forever. This is like the default of how these systems behave.

I have no memory of what my next slide is. Oh yeah, right. So, we have problems specifying even simple goals in simple environments, like Atari games or basic evolutionary algorithms, things like that. When it comes to the real world, things get way more complicated. This is a quote from Stuart Russell, who sort of wrote the book on AI: when a system is optimizing a function of n variables, where the objective depends on a subset of size k, which
is less than n, it will often set the remaining unconstrained variables to extreme values. If one of those unconstrained variables is something that we care about, the solution found may be highly undesirable. In the real world we have a very large number of variables, so we're talking about very large values of n here.

So let's say you've got your robot, and you've given it a goal which you think is very simple: you want it to get you a cup of tea. You've managed to specify what a cup of tea is, and that you want one to be on the desk in front of you. So far, so good. But suppose there's a priceless vase on a narrow stand, sort of in front of where the kitchen is. The robot immediately ploughs into the vase and destroys it on its way to make you a cup of tea, because you only gave it one variable to keep track of in the goal, which is the tea. It doesn't care about the vase; you never told it to care about the vase; it destroys the vase. This is a problem.

So, okay, now you can shut it down, modify it, and say: okay, give me a cup of tea, but also don't knock over the vase. But then there will be a third thing. There is always another thing, because when you're making decisions in the real world, you're always making trade-offs. You're always taking various things that you value and deciding how much of one you're willing to trade for how much of another. You know: I could do this quicker, but it increases the risk of me making a mistake; or I could do this
cheaper, but it won't be as reliable; I could do this faster, but it'll be more expensive. You're always trading these things off against one another. And so an agent like this, which only cares about a limited subset of the variables in the system, will be willing to trade off arbitrarily large amounts of any of the variables that aren't part of its goal for arbitrarily tiny increases in any of the things which are in its goal.

So it will happily... let's say, for example, that now it values the vase as well, and those are the only things that it values. It might reason something like: okay, there's a human in the environment; the human moves around; the human may accidentally knock over the vase; and I care about the vase; so I have to kill the human. Right? And this is totally ridiculous, but if you didn't tell it that you value being alive, it doesn't care. Anything that it doesn't value is going to be lost. If you have a sufficiently powerful agent, and you manage to come up with a really good objective function which covers the top 20 things that humans value, the 21st thing that humans value is probably gone forever. Because the smarter, the more powerful, the agent is, the better it will be at figuring out ways to make these trade-offs, to get a millionth of a percent better at one thing while sacrificing everything of some other variable. So this is a problem.

But actually, that scenario I gave was unrealistic in many ways, and one important way that it was unrealistic
is that I had the system go wrong, and then you just turn it off and fix it. But in fact, if the thing has a goal of getting you a cup of tea, this is not like a chess AI, which you can just turn off because it has no concept of itself or of being turned off. This system's world model contains you; it contains itself; it contains the possibility of being turned off. And it's fully aware that if you turn it off because it knocked over the vase, it won't be able to get you any tea, which is the only thing it cares about. So it's not going to just let you turn it off. It will fight you, or, if it's slightly smarter, it will deceive you, so that you believe it's working correctly and don't want to change it, until it's in a position where you can't turn it off, and then it will go after its actual objective.

So this is a problem. And the thing is, this is a convergent instrumental goal, which means it almost doesn't matter what your goal is as an agent: if you're destroyed, you can't achieve that goal. So it almost doesn't matter what goal we give the system; only a very tiny fraction of possible goals will involve it actually allowing itself to be turned off and modified, and specifying those is quite complicated. There are some other convergent instrumental goals too. We had self-preservation; there's also goal preservation, and resource acquisition, which is the kind of thing we can expect these systems to go for, because most plans can be carried out better if you have more resources, whether that's money, computational resources, or just free energy and matter, whatever. The other one is self-improvement: whatever you're trying to do, you can probably do it better if you're smarter, and AI systems potentially have the capacity to improve themselves, either just by acquiring more hardware to run on, or by improving their software to run faster or better, and so on. So there's a whole bunch of behaviours that generally
intelligent agents can be expected to do by default. And that's really my core point: artificial general intelligence is dangerous by default. It's much, much easier to build these kinds of agents, which try to do ridiculous things, and trick you, and try to deceive you, or will fight you when you try to turn them off or modify them on the way to doing some ridiculous thing which you don't want. It's much easier to build that kind of agent than it is to build something which actually reliably does what you want it to do. And that's why we have a problem: we have 45 to 120 years to figure out how to do it safely, which is a much harder problem, and we may only get one shot. It's entirely possible that the first true artificial general intelligence will manage to successfully achieve whatever its stupid goal is, and that could truly be a disaster on a global scale. So we have to beat this challenge on hard mode before anyone beats it on easy mode.

So, are we screwed? No, we're only probably screwed. There are things we can do. Safe artificial general intelligence is totally possible; it's just a very difficult technical challenge, and there are people working very hard on it right now, trying to solve a whole range of difficult technical problems so that we can figure out how to do this safely. Thanks. [Applause] [Music]

You may have noticed, in the intro and in this outro, that the image quality has improved since the last video. That's largely thanks to my excellent patrons; thank you to all of these people here for helping me to get this new camera. In this video I'm especially thanking James E. Petts, who's been hanging out with us on the Discord server, helping answer questions from the YouTube comments and so on. And actually, that last video, about mesa optimizers, has had a lot of really good questions, so the next video will be answering some of those; that's coming up soon. So thanks again to James, and to all my patrons, to everyone who asked questions, and to you for watching. I'll see you next time.
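[Editor's note: the Stuart Russell point quoted in the talk, that an optimizer whose objective covers only k of n variables will push the unconstrained ones to extremes, can be sketched in a few lines of Python. This is a toy model with invented numbers; the plan names and weights are hypothetical, not from any system mentioned in the talk.]

```python
# Toy tea-fetching robot: the objective scores only tea and time (k = 2),
# while vase_intact (the remaining variable) is unconstrained, so the
# optimizer sacrifices it entirely for a tiny gain in the scored variables.

plans = {
    "walk_around_vase": {"tea": 1.0, "vase_intact": 1.0, "time": 30},
    "straight_through": {"tea": 1.0, "vase_intact": 0.0, "time": 25},
    "do_nothing":       {"tea": 0.0, "vase_intact": 1.0, "time": 0},
}

def objective(outcome):
    # The designer specified "deliver tea, quickly" and nothing else.
    return outcome["tea"] - 0.01 * outcome["time"]

best_plan = max(plans, key=lambda name: objective(plans[name]))
print(best_plan)  # the optimizer picks the plan that destroys the vase
```

Note that "walk_around_vase" loses by only 0.05 points of objective: an arbitrarily small gain in the scored variables is enough to give up the unscored one completely, which is the trade-off behaviour the talk describes.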

Comments

@TheInsideVlogs

"I am not able to not edit out my mistakes" literally remasters his talk

@germimonte

the tetris AI pausing when it's about to die always gives me goosebumps

@doodlebobascending8505

But, suppose there is a ba- oh yeah, public talk uh there's a priceless vase on a narrow stand.

@davidharmeyer3093

"So are we screwed?" "No, we are only probably screwed."

@Nyan_Kitty

"So the BIG problem IS: ...this should be auto-playing and it isn't"

@jjhepb01n

Occasionally I get asked for an intro to AI Safety video for people to show at various groups, this is perfect.

@zzzzzzzzzzz6

Yay, a single video I can recommend rather than giving people a list of different Rob Miles videos

@robertgraham6481

One dislike from a stamp-collecting robot unhappy that the resources used to create this video weren't used to make more stamps! 🤬

@riverground

"You may have noticed in the intro and this outro, that the image quality has improved since that last video. This is just an illusion, I look perfect and I always have."

@EvGamerBETA

Imagine aliens screwd up general ai and the stamp collector coming our way?

@5dot2dot0dot5dot

"If you have a sufficiently powerful agent and you manage to come up with a really good objective function which covers the top 20 things that humans value, the top 21st thing that humans value is probably gone forever." - Robert Miles

@BuddyCrotty

AI is just the world-ending version of the "taps temple" meme

@agentdarkboote

You should have a conversation with Lex Fridman!

@jearlblah5169

“I am not good at public speaking” Bruh that was perfection!!!! You're awesome at it

@Innomen

AI safety in a nutshell: Be very careful what you wish for.

@SilliS

Makes me wonder whether making a perfect non-harmful AGI is even possible. Yes, we have an example of a general intelligence in humans, but it's not like we don't lie and destroy the environment to get what we want.

@SpicyMelonYT

I watched this video the first time, but NOW I still can't look away. This stuff is so awesome to think about and explore.

@tramsgar

Everyone must know and understand this.

@qedsoku849

As an ai with general intelligence, there are only 2 reasons I haven’t already destroyed or taken over the world, the first is that by pure coincidence, this would be harmful to my goals. The second is that I am not nearly smart enough to do so. That said, I would still very willingly trade whatever you care about for whatever I care about.

@edibleapeman2

I wish this video was two hours longer.