Comment author: Tobias_Baumann 02 November 2017 04:25:27PM * 3 points

Thanks for writing this up!

I think the idea is intriguing, and I agree that this is possible in principle, but I'm not convinced of your take on its practical implications. Apart from heuristic reasons to be sceptical of a new idea at this level of abstraction and speculativeness, my main objection is that a high degree of similarity with respect to reasoning (which is required for the decisions to be entangled) probably goes along with at least some degree of similarity with respect to values. (And if the values of the agents that correlate with me are similar to mine, then the result of taking their values into account is also closer to my own values than the compromise value system of all agents would be.)

You write:

Superrationality only motivates cooperation if one has good reason to believe that another party’s decision algorithm is indeed extremely similar to one’s own. Human reasoning processes differ in many ways, and sympathy towards superrationality represents only one small dimension of one’s reasoning process. It may very well be extremely rare that two people’s reasoning is sufficiently similar that, having common knowledge of this similarity, they should rationally cooperate in a prisoner’s dilemma.

Conditional on this extremely high degree of similarity to me, isn't it more likely that their values are also similar to mine? For instance, if my reasoning is shaped by the experiences I've had, my genetic makeup, or the set of all ideas I've read about over the course of my life, then an agent with identical or highly similar reasoning would also share a lot of these characteristics. But of course, my experiences, genes, etc. also determine my values, so similarity with respect to these factors implies similarity with respect to values.

This is not the same as claiming that a given characteristic X that's relevant to decision-making is generally linked to values, in the sense that people with X have systematically different values. It's a subtle difference: I'm not saying that certain aspects of reasoning generally go along with certain values across the entire population; I'm saying that a high degree of similarity regarding reasoning goes along with similarity regarding values.
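As a side note, a toy expected-value calculation may make the similarity requirement in the quoted passage more concrete. The payoffs and conditional probabilities below are purely illustrative assumptions of mine, not numbers from the post:

```python
# Toy model of superrational cooperation in a one-shot prisoner's dilemma.
# Payoffs to me, with the standard ordering T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

def expected_values(p_coop_given_coop, p_coop_given_defect):
    """Expected payoff of cooperating vs. defecting, given how strongly the
    other agent's decision is (evidentially) correlated with mine."""
    ev_cooperate = p_coop_given_coop * R + (1 - p_coop_given_coop) * S
    ev_defect = p_coop_given_defect * T + (1 - p_coop_given_defect) * P
    return ev_cooperate, ev_defect

# Nearly identical reasoning: my choice is strong evidence about the other's.
print(expected_values(0.95, 0.05))  # (2.85, 1.2) -> cooperating wins
# Only weakly similar reasoning: the correlation is too weak.
print(expected_values(0.55, 0.45))  # (1.65, 2.8) -> defecting wins
```

On these made-up numbers, cooperation only pays when the two decision procedures are very strongly correlated, which is exactly the point the quoted passage makes.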

S-risk FAQ

The idea that the future might contain astronomical amounts of suffering, and that we should work to prevent such worst-case outcomes, has lately attracted some attention. I've written this FAQ to help clarify the concept and to clear up potential misconceptions. [Crossposted from my website on s-risks.] General... Read More
Comment author: Tobias_Baumann 20 July 2017 08:40:43AM * 11 points

Thanks for writing this up! I agree that this is a relevant argument, even though many steps of the argument are (as you say yourself) not airtight. For example, consciousness or suffering may be related to learning, in which case point 3) is much less clear.

Also, the future may contain vastly larger populations (e.g. because of space colonization), which, all else being equal, may imply (vastly) more suffering. Even if your argument is valid and the fraction of suffering decreases, it's not clear whether the absolute amount will be higher or lower (as you claim in point 7).
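To make the fraction-versus-absolute-amount point concrete, here is a back-of-the-envelope sketch with entirely made-up numbers:

```python
# Illustrative only: the absolute amount of suffering can grow even if the
# fraction of beings who suffer falls, provided the population grows faster.
population_now, fraction_now = 1e10, 0.10        # assumed current values
population_future, fraction_future = 1e13, 0.01  # assume 1000x population, 10x lower fraction

print(population_now * fraction_now)        # 1e9 suffering beings (toy number)
print(population_future * fraction_future)  # 1e11 -- 100x more in absolute terms
```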

Finally, I would argue we should focus on the bad scenarios anyway – given sufficient uncertainty – because there's not much to do if the future will "automatically" be good. If s-risks are likely, my actions matter much more.

(This is from a suffering-focused perspective. Other value systems may arrive at different conclusions.)

Comment author: Daniel_Dewey 07 July 2017 06:17:17PM 2 points

Thanks!

Conditional on MIRI's view that a hard or unexpected takeoff is likely, HRAD is more promising (though it's still unclear).

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)? If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

Comment author: Tobias_Baumann 08 July 2017 08:31:50AM * 0 points

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)?

Yeah, and also (differentially) more promising than AI strategy or AI policy work. But I'm not sure how strong the effect is.

If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

In contrast, if we think there's no such discontinuity and AI development will be gradual, then AI control may be at least somewhat more similar (but surely not entirely comparable) to how we "align" contemporary software systems. That is, it would be more plausible that we could test advanced AI systems extensively without risking catastrophic failure or that we could iteratively try a variety of safety approaches to see what works best.

It would also be more likely that we'd get warning signs of potential failure modes, so that it's comparatively more viable to work on concrete problems whenever they arise, or to focus on making the solutions to such problems scalable – which, to my understanding, is a key component of Paul's approach. In this picture, successful alignment without understanding the theoretical fundamentals is more likely, which makes non-HRAD approaches more promising.

My personal view is that I find a hard and unexpected takeoff unlikely, and accordingly favor approaches other than HRAD, but of course I can't justify high confidence in this given expert disagreement. Similarly, I'm not highly confident that the above distinction is actually meaningful.

I'd be interested in hearing your thoughts on this!

Comment author: Tobias_Baumann 07 July 2017 02:49:05PM * 1 point

Great post! I agree with your overall assessment that other approaches may be more promising than HRAD.

I'd like to add that this may (in part) depend on our outlook on which AI scenarios are likely. Conditional on MIRI's view that a hard or unexpected takeoff is likely, HRAD may be more promising (though it's still unclear). If the takeoff is soft, or if AI development is more like the growth of the economy, then I personally think HRAD is unlikely to be the best way to shape advanced AI.

(I wrote a related piece on strategic implications of AI scenarios.)

Strategic implications of AI scenarios

[Originally posted on my new website on cause prioritization. This article is an introductory exploration of what different AI scenarios imply for our strategy in shaping advanced AI and might be interesting to the broader EA community, which is why I crosspost it here.] Efforts to mitigate the risks... Read More
Comment author: Tobias_Baumann 10 March 2017 10:00:15AM * 11 points

Thanks for your post! I agree that work on preventing risks of future suffering is highly valuable.

It’s tempting to say that it implies that the expected value of a minuscule increase in existential risk to all sentient life is astronomical.

Even if the future is negative according to your values, there are strong reasons not to increase existential risk. Doing so would be extremely uncooperative towards other value systems, and there are many good reasons to be nice to them. It is better to pull the rope sideways by working to improve the future (i.e. reducing risks of astronomical suffering) conditional on there being a future.

In addition, I think it makes sense for utilitarians to adopt a quasi-deontological rule against using violence, regardless of whether one is a classical utilitarian or suffering-focused. Such a rule obviously prohibits anything like increasing risks of extinction.

Comment author: Tobias_Baumann 17 October 2016 12:35:42AM 3 points

Thanks a lot, Peter, for taking the time to evaluate SHIC! I agree that their work seems to be very promising.

In particular, it seems that students and future leaders are among the most important target groups of effective altruism.

Comment author: Tobias_Baumann 17 October 2016 12:27:52AM 1 point

Thanks for this great map!

Comment author: Tobias_Baumann 17 October 2016 12:31:09AM 3 points

A minor detail: It's a bit inaccurate to say that the Foundational Research Institute works on general x-risks. This text explains that FRI focuses on reducing risks of astronomical suffering, which is related to, but not the same as, x-risk reduction.
