JoeW comments on Intro to caring about AI alignment as an EA cause - Effective Altruism Forum

Comment author: JoeW 16 April 2017 10:33:11PM, 0 points

Thanks for the reply. You're right that we can't be sure that conscious beings will do good things, but we don't have that assurance for any outcome we might push for.

If AIs are conscious, then a multipolar future filled with vast numbers of unaligned AIs could very plausibly be a wonderful future, brimming with utility. This isn't overwhelmingly obvious, but it's a real possibility. By contrast, if AIs aren't conscious then this scenario would represent a dead future. So distinguishing the two seems quite vital to understanding whether a multipolar outcome is bad or good.

You point out that even compared to the optimistic scenario I describe above, a correctly-aligned singleton could do better, by ensuring the very best future possible. True, but if a singleton isn't inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt. And even if the attempt is successful, we all agree that creating an aligned singleton is a very difficult task. Most singleton outcomes result in a universe almost entirely full of dead matter, produced by the singleton AI optimising for something irrelevant; even if it's conscious itself, resources that could have been put towards creating utility are almost all wasted as paperclips or whatever.

So it seems to me that, unless you're quite certain we're headed for a singleton future, the question of whether AIs will be conscious or not has a pretty huge impact on what path we should try to take.

Comment author: RobBensinger 17 April 2017 03:26:30AM, 1 point

> You're right that we can't be sure that conscious beings will do good things, but we don't have that assurance for any outcome we might push for.

One way to think about the goal is that we want to "zero in" on valuable futures. It's unclear what exactly a good future looks like, and we can't get an "assurance," but some paths plausibly help us zero in. For example, a massive Manhattan Project to develop whole brain emulation (WBE) is a not-implausible path, assuming WBE isn't too difficult to achieve on the relevant timescale and assuming you can avoid accelerating difficult-to-align AI too much in the process. It's a potentially promising option because emulated humans could be leveraged to do a lot of cognitive work in a compressed period of time, sorting out the key questions in moral psychology and philosophy, neuroscience, computer science, etc. that we need to answer in order to get a better picture of good outcomes.

This is also true for a Manhattan Project to develop a powerful search algorithm that generates smart creative policies to satisfy our values, while excluding hazardous parts of the search space -- this is the AI route.

Trying to ensure that AI is conscious, without also solving WBE or alignment or global coordination or something of that kind in the process, doesn't have this "zeroing in" property. It's more of a gamble: a bet that good-ish futures have a high enough base rate even when we don't put much work into steering in a specific direction, and that arbitrary conscious systems might make good things happen. But building a future of conscious AI systems opens up a lot of ways for suffering to end up proliferating in the universe, just as it opens up a lot of ways for happiness to end up proliferating. Just as it isn't obvious that e.g. rabbits experience more joy than suffering in the natural world, it isn't obvious that conscious AI systems in a multipolar outcome would experience more joy than suffering (or otherwise experience good-on-net lives).

> True, but if a singleton isn't inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt.

I think if an AI singleton isn't infeasible or prohibitively difficult to achieve, then it's likely to happen eventually regardless of what we'd ideally prefer to have happen, absent some intervention to prevent it. Either it's not achievable, or something needs to occur to prevent anyone in the world from reaching that point. If you're worried about singletons, I don't think pursuing multipolar outcomes and/or conscious-AI outcomes should be a priority for you, because I don't think either of those paths concentrates very much probability mass (if any) into scenarios where singletons start off feasible but something blocks them from occurring.

Multipolar scenarios are likelier to occur in scenarios where singletons simply aren't feasible, as a background fact about the universe; but conditional on singletons being feasible, I'm skeptical that achieving a multipolar AI outcome would do much (if anything) to prevent a singleton from occurring afterward, and I think it would make alignment much more difficult.

Alignment and WBE look like difficult tasks, but they have the "zeroing in" property, and we don't know exactly how difficult they are. Alignment in particular could turn out to be much harder than it looks or much easier, because there's so little understanding of what specifically is required. (Investigating WBE has less value-of-information because we already have some decent WBE roadmaps.)

Comment author: JoeW 18 April 2017 08:30:52PM, 0 points

To clarify: I don't think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention. Multipolar scenarios don't really work that way: what happens is determined by what's efficient in a competitive world, which doesn't leave much room to make changes now that will actually persist.

And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.

What I'm concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI - such a scenario certainly seems to me like it could fit the "possible but not inevitable" description.

A world takeover attempt has the potential to go very, very wrong, and there's also the serious possibility that creating the singleton would succeed but aligning it would not. Given this, I don't think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can't control that alternative is irrelevant - we can still understand it! Once we have a reasonable picture of that scenario, we can start to think about whether it's so bad that we should embark on dangerous strategies to try to avoid it.

One element of that understanding would be how likely AIs are to be conscious; another would be how good or bad a life conscious AIs would have in a multipolar scenario. I agree entirely that we don't know this yet, whether for rabbits or for future AIs. That's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.