18

Paul_Christiano comments on Utopia In The Fog - Effective Altruism Forum

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (14)

You are viewing a single comment's thread. Show more comments above.

Comment author: Paul_Christiano 28 March 2017 10:53:56PM 4 points [-]

If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics

I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")

Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?

the claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic

If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted. I though you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)

Yes, or even implementable in current systems.

We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will "do what we want" we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict.

You're right.

Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement, I don't think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I'm all for evaluating it on its merits. So maybe we are in agreement here.

Comment author: Zeke_Sherman 04 June 2017 08:28:33AM *  0 points [-]

I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")

Parenthesis is probably true, e.g. most of MIRI's traditional agenda. If agents don't quickly gain decisive strategic advantages then you don't have to get AI design right the first time; you can make many agents and weed out the bad ones. So the basic design desiderata are probably important, but it's just not very useful to do research on them now. Not familiar enough with your line of work to comment on it, but just think about the degree to which a problem would no longer be a problem if you can build, test and interact with many prototype human-level and smarter-than-human agents.

Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?

Aside from the ability to prototype as described above, there are the same dynamics which plague human society when multiple factions with good intentions end up fighting due to security concerns or tragedies of the commons, or when multiple agents with different priors interpret every new piece of evidence they see differently and so go down intractably separate paths of disagreement. FAI can solve all the problems of class, politics, economics, etc by telling everyone what to do, for better or for worse. But multiagent systems will only be stable with strong institutions, unless they have some other kind of cooperative architecture (such as universal agreement in value functions, in which case you now have the problem of controlling everybody's AIs but without the benefit of having an FAI to rule the world). Building these institutions and cooperative structures may have to be done right the first time, since they are effectively singletons, and they may be less corrigible or require different kinds of mechanisms to ensure corrigibility. And the dynamics of multiagent systems means you cannot accurately predict the long term future merely based on value alignment, which you would (at least naively) be able to do with a single FAI.

If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted.

Well it leads to agents which are optimal replicators in their given environments. That's not (necessarily) what we want.

I though you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)

That too!