jprwg · 4 karma · Joined Apr 2017 · Comments: 5
Thank you for writing this. I broadly agree with the perspective and find it frustrating how often it’s dismissed based on (what seem to me) somewhat-shaky assumptions.

A few thoughts, mainly on the section on total utilitarianism:

1. Regarding why people tend to assume unaligned AIs won’t innately have any value, or won’t be conscious: my impression is this is largely due to the “intelligence as optimisation process” model that Eliezer advanced. Specifically, in this model, the key ability humans have that enables us to be so successful is our ability to optimise for goals, whereas mind features we like, such as consciousness, joy, curiosity, friendship, and so on, are largely seen as being outside this optimisation ability; they are instead the terminal values we optimise for. (Also, none of the technology we have so far built has really affected this core optimisation ability, so once we do finally build an artificial optimiser it could very well quickly become much more powerful than us, since unlike us it might be able to improve its own optimisation ability.)

I think people who buy this model will tend not to be moved much by observations like consciousness having evolved multiple times, as they’d think: sure, but why should I expect that consciousness is part of the optimisation process bit of our minds, specifically? Ditto for other mind features, and also for predictions that AIs will be far more varied than humans — there just isn’t much scope for variety or detail in the process of doing optimisation. You use the phrase “AI civilisation” a few times; my sense is that most people who expect disaster from unaligned AI would say their vision of this outcome is not well-described as a “civilisation” at all.

2. I agree with you that if the above model is wrong (which I expect it is), and AIs really will be conscious, varied, and form a civilisation rather than being a unified unconscious optimiser, then there is some reason to think their consumption will amount to something like “conscious preference satisfaction”, since a big split between how they function when producing vs consuming seems unlikely (even though it’s logically possible).

I’m a bit surprised, though, by your focus (as you’ve elaborated on in the comments) on consumption rather than production. For one thing, I’d expect production to amount to a far greater fraction of AIs’ experience-time than consumption, I guess on the basis that production enables more subsequent production (or consumption), whereas consumption doesn’t; it just burns resources.

Also, you mentioned concerns about factory farms and wild animal suffering. These seem to me describable as “experiences during production” — do you not have similar concerns regarding AIs’ productive activities? Admittedly, pain might not be very useful for AIs: plausibly, if you’re smart enough to see the effects of different actions on your survival, you don’t need such a crude motivator — even humans trying very hard to achieve goals seem to mostly avoid pain while doing so, rather than using it to motivate themselves. But emotions like fear and stress seem to me plausibly useful for smart minds, and I’d not be surprised if they were common in an AI civilisation in a world where the “intelligence as optimisation process” model is not true. Do you disagree, or do you just think they won’t spend much time producing relative to consuming, or something else?

(To be clear, I agree this second concern has very little relation to what’s usually termed “AI alignment”, but it’s the concern re: an AI future that I find most convincing, and I’m curious on your thoughts on it in the context of the total utilitarian perspective.)

To clarify: I don't think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention: multipolar scenarios don't really work that way. What happens will be determined by what's efficient in a competitive world, which doesn't allow much room for making changes now that will actually persist.

And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.

What I'm concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI - such a scenario certainly seems to me like it could fit the "possible but not inevitable" description.

A world takeover attempt has the potential to go very, very wrong - and then there's the serious possibility that the creation of the singleton would succeed but its alignment would not. Given this, I don't think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can't control that alternative is irrelevant - we can still understand it! When we have a reasonable picture of that scenario, then we can start to think about whether it's so bad that we should embark on dangerous, risky strategies to try to avoid it.

One element of that understanding would be how likely AIs are to be conscious; another would be how good or bad a life conscious AIs would have in a multipolar scenario. I agree entirely that we don't know this yet - whether for rabbits or for future AIs - and that's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.

Thanks for the reply. You're right that we can't be sure that conscious beings will do good things, but we don't have that assurance for any outcome we might push for.

If AIs are conscious, then a multipolar future filled with vast numbers of unaligned AIs could very plausibly be a wonderful future, brimming with utility. This isn't overwhelmingly obvious, but it's a real possibility. By contrast, if AIs aren't conscious then this scenario would represent a dead future. So distinguishing the two seems quite vital to understanding whether a multipolar outcome is bad or good.

You point out that even compared to the optimistic scenario I describe above, a correctly-aligned singleton could do better, by ensuring the very best future possible. True, but if a singleton isn't inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt. And even if the attempt is successful, we all agree that creating an aligned singleton is a very difficult task. Most singleton outcomes result in a universe almost entirely full of dead matter, produced by the singleton AI optimising for something irrelevant; even if it's conscious itself, resources that could have been put towards creating utility are almost all wasted as paperclips or whatever.

So it seems to me that, unless you're quite certain we're headed for a singleton future, the question of whether AIs will be conscious or not has a pretty huge impact on what path we should try to take.

I'm still not sure how the consciousness issue can just be ignored. Yes, given the assumption that AIs will be mindless machines with no moral value, obviously we need to build them to serve humans. But if AIs will be conscious creatures with moral value like us, then...? In this case, finding the right thing to do seems like a much harder problem, as it would be far from clear that a future in which machine intelligences gradually replace human intelligences represents a nightmare scenario, or even an existential risk at all. It's especially frustrating to see AI-risk folks treat this question as an irrelevance, since it seems to have such enormous implications for how important AI alignment actually is.

(Note that I'm not invoking a 'ghost in the machine'; I'm making the very reasonable guess that our consciousness is a physical process that occurs in our brains, that it's there for the same reason other features of our minds are there - because it's adaptive - and that similar functionality might very plausibly be useful for an AI too.)

One implication of a multi-agent scenario is that there would likely be enormous variety in the types of minds that exist, as each mind design could be optimised for a different niche. So in such a scenario, it seems quite plausible that each feature of our minds would turn out to be a good solution in at least a few situations, and so would be reimplemented in minds designed for those particular niches.