Comment author: capybaralet 04 January 2017 07:32:22PM *  2 points [-]

I think I was too terse; let me explain my model a bit more.

I think there's a decent chance (OTTMH, let's say 10%) that without any deliberate effort we make an AI which wipes out humanity, but is anyhow more ethically valuable than us (although not more valuable than something we deliberately design to be ethically valuable). This would happen if it were the default outcome, e.g. if it turns out that intelligence ~ ethical value. This may actually be the most likely path to victory.*

There's also some chance that all we need to do to ensure that AI has (some) ethical value (e.g. due to having qualia) is X. In that case, we might increase our chance of doing X by understanding qualia a bit better.

Finally, my point was that I can easily imagine a scenario in which our alternatives are: 1. Build an AI with a 50% chance of being aligned and a 50% chance of just being an unaligned AI (with P(AI has property X) = 90% if we understand qualia better, 10% otherwise), or 2. Allow our competitors to build an AI with ~0% chance of being ethically valuable.

So then we obviously prefer option 1, and if we understand qualia better, option 1 becomes better still.
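Here is a minimal back-of-the-envelope sketch of that comparison; the relative values assigned below to an aligned AI and to an unaligned AI that has property X are placeholder assumptions, not numbers from the comment:

```python
# Placeholder values for "ethical value", normalized so an aligned AI = 1.
V_ALIGNED = 1.0
V_UNALIGNED_WITH_X = 0.3  # assumption: an unaligned AI with property X is worth something, but less

def option_1(p_x_given_unaligned):
    """50% aligned, 50% unaligned; an unaligned AI has property X with the given probability."""
    return 0.5 * V_ALIGNED + 0.5 * p_x_given_unaligned * V_UNALIGNED_WITH_X

def option_2():
    """Competitor's AI: ~0% chance of being ethically valuable."""
    return 0.0

print(option_1(0.10))  # 0.515 -- without a better understanding of qualia
print(option_1(0.90))  # 0.635 -- with a better understanding of qualia
print(option_2())      # 0.0   -- letting competitors build it
```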

* I notice as I type this that this may have some strange consequences re: high-level strategy; e.g. maybe it's better to just make something intelligent ASAP and hope that it has ethical value, because this reduces *its* X-risk, and we might not be able to change the distribution of ethical value produced by the AI we create that much anyhow. I tend to think that we should aim to be very confident that the AI we build is going to have lots of ethical value, but this may only make sense if we have a pretty good chance of succeeding.

Comment author: Paul_Christiano 20 January 2017 06:58:42PM 1 point [-]

Ah, that makes a lot more sense, sorry for misinterpreting you. (I think Toby has a view closer to the one I was responding to, though I suspect I am also oversimplifying his view.)

I agree that there are important philosophical questions that bear on the goodness of building various kinds of (unaligned) AI, and I think that those questions do have impact on what we ought to do. The biggest prize is if it turns out that some kinds of unaligned AI are much better than others, which I think is plausible. I guess we probably have similar views on these issues, modulo me being more optimistic about the prospects for aligned AI.

I don't think that an understanding of qualia is an important input into this issue though.

For example, from a long-run ethical perspective, whether or not humans have qualia is not especially important, and what mostly matters is human preferences (since those are what shape the future). If you created a race of p-zombies that nevertheless shared our preferences about qualia, I think it would be fine. And "the character of human preferences" is a very different kind of object than qualia. These questions are related in various ways (e.g. our beliefs about qualia are related to our qualia and to philosophical arguments about consciousness), but after thinking about that a little bit I think it is unlikely that the interaction is very important.

To summarize, I do agree that there are time-sensitive ethical questions about the moral value of creating unaligned AI. This was item 1.2 in this list from 4 years ago. I could imagine concluding that the nature of qualia is an important input into this question, but don't currently believe that.


Donor lottery details

On January 15 we will have the drawing for the donor lottery discussed here. The opportunity to participate has passed; this post just lays out the details and final allocation of lottery numbers. If you regret missing out, I expect there will be another round, and it would be useful...
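The post itself isn't excerpted in full here, but a donor lottery of this kind can be sketched roughly as follows; the allocation rule, pot size, and backstop below are my assumptions based on the surrounding discussion, not a description of the actual drawing:

```python
import random

# Hypothetical sketch: each participant's contribution buys a proportional
# slice of "lottery numbers" in [0, POT); a single random draw picks the
# winner, who then directs the whole pot. A guarantor backstops any slice
# nobody bought.
POT = 100_000  # assumed pot size, consistent with the $100k discussed below

def allocate(contributions):
    """Map each donor to a half-open interval of lottery numbers."""
    ranges, start = {}, 0
    for donor, amount in contributions.items():
        ranges[donor] = (start, start + amount)
        start += amount
    return ranges, start  # 'start' is the total contributed; the backstop covers POT - start

def draw(ranges):
    x = random.uniform(0, POT)
    for donor, (lo, hi) in ranges.items():
        if lo <= x < hi:
            return donor
    return "backstop"  # the guarantor wins the slice nobody bought

ranges, total = allocate({"alice": 5_000, "bob": 20_000, "carol": 25_000})
print(ranges, total, draw(ranges))
```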
Comment author: vollmer 08 December 2016 01:01:38PM *  0 points [-]

(If EAs end up committing a significant proportion of the $100k (or even all of it), will Paul reduce his 'haircut' or not?)

Comment author: Paul_Christiano 31 December 2016 03:32:26AM 2 points [-]

It looks like the total will be around $50k, so I'm going to reduce the cut to 0.5%.

Comment author: capybaralet 20 December 2016 01:00:19AM 1 point [-]

Hey, I (David Krueger) remember we spoke about this a bit with Toby when I was at FHI this summer.

I think we should be aiming for something like CEV, but we might not get it, and we should definitely consider scenarios where we have to settle for less.

For instance, some value-aligned group might find that its best option (due to competitive pressures) is to create an AI which has a 50% probability of being CEV-like or "aligned via corrigibility", but has a 50% probability of (effectively) prematurely settling on a utility function whose goodness depends heavily on the nature of qualia.

If (as I believe) such a scenario is likely, then the problem is time-sensitive.

Comment author: Paul_Christiano 20 December 2016 03:20:18AM *  2 points [-]

(effectively) prematurely settling on a utility function whose goodness depends heavily on the nature of qualia

This feels extremely unlikely; I don't think we have plausible paths to obtaining a non-negligibly good outcome without retaining the ability to effectively deliberate about e.g. the nature of qualia. I also suspect that we will be able to solve the control problem, and if we can't then it will be because of failure modes that can't be avoided by settling on a utility function. Of course "can't see any way it can happen" is not the same as "am justifiably confident it won't happen," but I think in this case it's enough to get us to pretty extreme odds.

More precisely, I'd give 100:1 against: (a) we will fail to solve the control problem in a satisfying way, (b) we will fall back to a solution which depends on our current understanding of qualia, (c) the resulting outcome will be non-negligibly good according to our view about qualia at the time that we build AI, and (d) it will be good because we hold that view about qualia.

(My real beliefs might be higher than 1% just based on "I haven't thought about it very long" and peer disagreement. But I think it's more likely than not that I would accept a bet at 100:1 odds after deliberation, even given that reasonable people disagree.)

(By "non-negligibly good" I mean that we would be willing to make some material sacrifice to improve its probability compared to a barren universe, perhaps $1000 per 1% increase. By "because" I mean that the outcome would have been non-negligibly worse according to that view if we had not held it.)
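To see why a four-way conjunction can justify fairly extreme odds, here is a purely illustrative multiplication; the per-clause probabilities below are placeholders I am assuming for the arithmetic, not estimates from this thread, and the clauses are treated as if independent:

```python
# Illustrative only: made-up per-clause probabilities for the conjunction
# (a)-(d) above, multiplied as if the clauses were independent.
p_a = 0.3  # fail to solve the control problem in a satisfying way
p_b = 0.2  # fall back on a solution that depends on our current understanding of qualia
p_c = 0.3  # the outcome is non-negligibly good by our view of qualia at the time
p_d = 0.5  # and it is good *because* we held that view

p_all = p_a * p_b * p_c * p_d
print(p_all)  # 0.009 -- even moderately likely clauses compound to roughly 1-in-100
```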

I'm not sure if there is any way to turn the disagreement into a bet. Perhaps picking an arbiter and looking at their views in a decade? (e.g. Toby, Carl Shulman, Wei Dai?) This would obviously involve less extreme odds.

Probably more interesting than betting is resolving the disagreement. This seems to be a slightly persistent disagreement between me and Toby; I have never managed to really understand his position, but we haven't talked about it much. I'm curious about what kind of solutions you see as plausible---it sounds like your view is based on a more detailed picture rather than an "anything might happen" view.

Comment author: JoshYou 05 December 2016 02:25:48AM 1 point [-]

I'm still pretty confused about why you think donating 10% has to be time-consuming. People who outsource their donation decisions to, say, GiveWell might only spend a few hours a year (or a few minutes, depending on how literally we interpret "outsourcing") deciding where to donate.

Comment author: Paul_Christiano 20 December 2016 03:11:03AM *  5 points [-]

I think that donor lotteries are a considerably stronger argument than GiveWell for the claim "donating 10% doesn't have to be time-consuming."

Your argument (with GiveWell in place of a lottery) requires that either (a) you think that GiveWell charities are clearly the best use of funds, or (b) by "doesn't have to be time-consuming" you mean "if you don't necessarily want to do the most good." I don't think you should be confused about why someone would disagree with (a), nor about why someone would think that (b) is a silly usage.

If there were low-friction donor lotteries, I suspect that most small GiveWell donors would be better-served by gambling up to perhaps $1M and then thinking about it at considerably greater length. I expect a significant fraction of them would end up funding something other than GiveWell top charities.
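A minimal sketch of why the lottery leaves a small donor's expected dollars directed unchanged while concentrating the research effort; the $5k donation and $1M pot below are illustrative figures, the latter echoing the "up to perhaps $1M" above:

```python
donation = 5_000       # illustrative small donor
pot = 1_000_000        # illustrative pot, echoing "up to perhaps $1M" above

p_win = donation / pot             # 0.5% chance of directing the whole pot
expected_directed = p_win * pot    # same as just donating
print(p_win, expected_directed)    # 0.005 5000.0

# Expected dollars directed are unchanged, but serious research time is only
# needed in the 0.5% of worlds where it controls $1M, so it is worth spending
# far more of it there.
```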

(I was originally supportive of but kind of lukewarm about donor lotteries; I think I've now come around to Carl's level of enthusiasm.)

Comment author: Ben_Todd 06 December 2016 12:35:08AM 5 points [-]

Thanks for moving this post here rather than FB. I think it's a good discussion. However, I wanted to flag:

None of these criticisms are new to me. I think all of them have been discussed in some depth within CEA.

This makes me wonder if the problem is actually a failure of communication. Unfortunately, issues like this are costly to communicate outside of the organisation, and it often doesn't seem like the best use of time, but maybe that's wrong.

Given this, I think it also makes sense to run critical posts past the organisation concerned before posting. They might have already dealt with the issue, or have plans to do so, in which case posting the criticism is significantly less valuable (because it incurs similar costs to the org but with fewer benefits). It also helps the community avoid re-treading the same ground.

Comment author: Paul_Christiano 20 December 2016 03:00:02AM *  2 points [-]

I assume this discussion is mostly aimed at people outside of CEA who are considering whether to take and help promote the pledge. I think there are many basic points which those people should probably understand but which CEA (understandably) isn't keen to talk about, and it is reasonable for people outside of CEA to talk about them instead.

I expect this discussion wasn't worth the time at any rate, but it seems like sharing it with CEA isn't really going to save time on net.

Comment author: Robert_Wiblin 09 December 2016 10:47:22PM 2 points [-]

Firstly: I think we should use the interpretation of the pledge that produces the best outcome. The interpretation GWWC and I apply is a completely mainstream use of the term "pledge" (e.g. you 'pledge' to stay with the person you marry, but people nonetheless get divorced if they think the marriage is too harmful to continue).

A looser interpretation is better because more people will be willing to participate, and each person gains from a smaller and more reasonable push towards moral behaviour. We certainly don't want people to be compelled to do things they think are morally wrong - that doesn't achieve an EA goal. That would be bad. Indeed it's the original complaint here.

Secondly: An "evil future you" who didn't care about the good you can do through donations probably wouldn't care much about keeping promises made by a different kind of person in the past either, I wouldn't think.

Thirdly: The coordination thing doesn't really matter here because you are only 'cooperating' with your future self, who can't really reject you because they don't exist yet (unlike another person who is deciding whether to help you).

One thing I suspect is going on here is that people on the autism spectrum interpret all kinds of promises to be more binding than neurotypical people do (e.g. https://www.reddit.com/r/aspergers/comments/46zo2s/promises/). I don't know if that applies to any individual here specifically, but I think it explains how some of us have very different intuitions. But I expect we will be able to do more good if we apply the neurotypical intuitions that most people share.

Of course if you want to make it fully binding for yourself, then nobody can really stop you.

Comment author: Paul_Christiano 20 December 2016 02:47:51AM 6 points [-]

Secondly: An "evil future you" who didn't care about the good you can do through donations probably wouldn't care much about keeping promises made by a different kind of person in the past either, I wouldn't think.

[...] there's no point having a commitment device to prompt you to follow through on something you don't think you should do

Usually we promise to do something that we would not have done otherwise, i.e. which may not be in line with our future self's interests. The promise "I will do X if my future self wants to" is gratuitous.

When I promise to do something I will try to do it, even if my preferences change. Perhaps you are reading "evil" as meaning "lacks integrity" rather than "is not altruistic," but in context that doesn't make much sense.

It seems reasonable for GWWC to say that the GWWC pledge is intended more as a statement of intent than as a commitment; it would be interesting to understand whether this is how most people who come into contact with GWWC perceive the pledge. If there is systematic misperception, it seems like the appropriate response is "oops, sorry" and to fix the misperception.

Thirdly: The coordination thing doesn't really matter here because you are only 'cooperating' with your future self, who can't really reject you because they don't exist yet (unlike another person who is deciding whether to help you).

It does not seem to me that the main purpose of taking the GWWC pledge, nor its main effect, is to influence the pledger's behavior.

Comment author: Pablo_Stafforini 07 December 2016 08:16:38PM *  2 points [-]

I have been put in touch with other donors that are each contributing less than $5k, but you can just team up with us. Email me at MyFirstName at MyLastName, followed by the most common domain extension.

Ideally there should be a better procedure for doing this; the associated trivial inconvenience may be discouraging some people from joining.

Comment author: Paul_Christiano 11 December 2016 12:08:18AM 3 points [-]

Note that this is now being implemented by donation swapping, so small donors don't have to put in any extra work.

Comment author: vollmer 08 December 2016 01:01:38PM *  0 points [-]

(If EAs end up committing a significant proportion of the $100k (or even all of it), will Paul reduce his 'haircut' or not?)

Comment author: Paul_Christiano 08 December 2016 06:35:02PM 7 points [-]

If $50,000 gets contributed, it reduces my risk by a factor of 2, so I could halve the fee (and if $100k got contributed I could reduce it to zero). I'll probably do that.
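A minimal sketch of that rule, assuming the fee scales linearly with the share of the $100k pot that still has to be backstopped; the 1% base fee is inferred from the "reduce the cut to 0.5%" comment above, not stated here:

```python
POT = 100_000
BASE_FEE = 0.01  # assumption: a 0.5% cut at ~$50k contributed implies 1% when backstopping the full pot

def fee(contributed):
    """Fee shrinks in proportion to how much of the pot still has to be backstopped."""
    backstopped_fraction = max(POT - contributed, 0) / POT
    return BASE_FEE * backstopped_fraction

for c in (0, 50_000, 100_000):
    print(c, f"{fee(c):.2%}")  # 1.00%, 0.50%, 0.00%
```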

Comment author: RyanCarey 08 December 2016 12:03:00AM *  0 points [-]

If you had the right financial accounts, you might just short something, because that's very high variance and anticorrelated with other donors. Edit: actually, as Michael points out, it has the same variance (though it has more extreme downside risks).

Comment author: Paul_Christiano 08 December 2016 01:16:24AM 1 point [-]

Also very negative expected value though.
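A rough simulation of the point being made; the return model below (a market that drifts up ~7%/year with ~16% volatility, lognormal returns) is a standard illustrative assumption of mine, not anything from the thread:

```python
import math
import random

# Compare holding vs shorting $10k of an index for one year under an assumed
# lognormal return model (~7% drift, ~16% volatility), i.e. a market that
# tends to go up.
random.seed(0)
n, mu, sigma, stake = 100_000, 0.07, 0.16, 10_000

long_pnl, short_pnl = [], []
for _ in range(n):
    r = math.exp(random.gauss(mu - 0.5 * sigma ** 2, sigma)) - 1  # annual return
    long_pnl.append(stake * r)    # long: loss capped at -stake
    short_pnl.append(-stake * r)  # 1x short: same returns with the sign flipped

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(round(mean(long_pnl)), round(mean(short_pnl)))  # roughly +700 vs -700: the short has negative expected value
print(round(var(long_pnl)), round(var(short_pnl)))    # identical variance (Michael's point)
print(round(min(long_pnl)), round(min(short_pnl)))    # but the short's worst case is much worse
```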
