I would therefore say that large-scale catastrophes related to biorisk or nuclear war are quite likely (~80–90%) to merely delay space colonization in expectation.[17] (With more uncertainty being not on the likelihood of recovery, but on whether some outlier-type catastrophes might directly lead to extinction.)

You seem to be highly certain that humans will recover from near-extinction. Is this based solely on the arguments in the text and footnote, or is there more? It seems to rest on the assumption that only population growth/size is the bottleneck, and that key technologies and infrastructures will be developed anyway.

There isn't much more except that I got the impression that people in EA who have thought about this a lot think recovery is very likely, and I'm mostly deferring to them. The section about extinction risk is the part of my post where I feel the least knowledgeable. As for additional object-level arguments, I initially wasn't aware of points such as crops and animals already being cultivated/domesticated, metals already being mined, and there being alternatives to rapid growth induced by fossil fuels, one of which is slow but steady growth over longer time periods. The way cultural evolution works is that slight improvements from innovations (which are allowed to be disjunctive rather than having to rely on developing a very specific technology) spread everywhere, which makes me think that large populations + a lot of time should go far enough eventually. Note also that if all-out extinction is simply very unlikely to ever happen, then you have several attempts left to reach technological maturity again.

I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

Why do you think this is the case? Do you think there is an alternative reflection process (either implemented by an AI, by a human society, or by a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like?

If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn't dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative or at least parallel path to directly performing MCE.

I think that there's an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of "BAAN thought experiment" as "we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive" then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.

That is not to say that trying to fine tune reflection processes is pointless: I think it's very important to think about what our desiderata should be for a CEV-like reflection process. I'm just saying that there will be tradeoffs between certain commonly mentioned desiderata that people don't realize are there because they think there is such a thing as "genuinely free and open-ended deliberation."

How do you think this compares with an additional employee at a non-local EA org?

EA London estimated that, in its first year with paid staff, it had about 50% of the impact per £ invested of a more established EA organisation such as GWWC or 80K.

It is also worth bearing in mind that the non-monetary costs of 'an additional employee' are higher than the non-monetary costs of a grant (e.g. training, management time, overheads, risks, opportunity costs).

Are they mostly counting impact on GiveWell-recommended charities? I'd imagine that for donors who are mostly interested in the long-term cause area, there'd be a perceived large difference between GWWC and 80k, which is why this sounds like a weird reference class to me. (Though maybe the difference is not huge because GWWC has become more cause neutral over the years?)

Hello Lukas,

I'm struggling to wrap my head around the difference between upside and downside focused morality. I tried to read the rest of the document, but I kept thinking "hold on, I don't understand the original motivation" and going back to the start.

I’m using the term downside-focused to refer to value systems that in practice (given what we know about the world) primarily recommend working on interventions that make bad things less likely.

If I understand it, the project is something like "how do your priorities differ if you focus on reducing bad things over promoting good things?" but I don't see how you can go on to draw any conclusions about that because downside (as well as upside) morality covers so many different things.

Here are 4 different ways you might come to the conclusion you should work on making bad things less likely. Quoting Ord:

"Absolute Negative Utilitarianism (NU). Only suffering counts.

Lexical NU. Suffering and happiness both count, but no amount of happiness (regardless of how great) can outweigh any amount of suffering (no matter how small).

Lexical Threshold NU. Suffering and happiness both count, but there is some amount of suffering that no amount of happiness can outweigh.

Weak NU. Suffering and happiness both count, but suffering counts more. There is an exchange rate between suffering and happiness or perhaps some nonlinear function which shows how much happiness would be required to outweigh any given amount of suffering."

This would lead you to give more weight to suffering at the theoretical level. Or, fifth, you could be a classical utilitarian - happiness and suffering count equally - and decide, for practical reasons, to focus on reducing suffering.

As I see it, the problem is that all of them will and do recommend different priorities. A lexical or absolute NU should, perhaps, really be trying to blow up the world. Weak NU and classical U will be interested in promoting happiness too and might want humanity to survive and conquer the stars. It doesn't seem useful or possible to conduct analysis along the lines of "this is what you should do if you're more interested in reducing bad things" because the views within downside focused morality won't agree on what you should do or why you should do it.

More broadly, this division seems unhelpful. Suppose we have four people in a room: a lexical NU, a very weak NU, a classical U, and a lexical positive utilitarian (any happiness outweighs all suffering). It seems like, on your view, the first two should be downside focused and the latter two upside focused. However, it could be that both the classical U and the very weak NU agree that the best way to do good is focusing on suffering reduction, so they're downside. Or they could agree the best way is happiness promotion, so they're upside. In fact, the weak NU and classical U have much more in common with each other - they will nearly always agree on the value of states of affairs - than either of them does with the lexical NU or lexical PU. Hence they should really stick together, and sorting views by whether they, practically speaking, focus on producing good or reducing bad doesn't seem to be a categorization that helps our analysis.

It might be useful to hear you say why you think this is a useful distinction.

If I understand it, the project is something like "how do your priorities differ if you focus on reducing bad things over promoting good things?"

This sounds accurate, but I was thinking of it with empirical cause prioritization already factored in. For instance, while a view like classical utilitarianism can be called "symmetrical" when it comes to normatively prioritizing good things and bad things (always with some element of arbitrariness because there are no "proper units" of happiness and suffering), in practice the view turns out to be upside-focused because, given our empirical situation, there is more room for creating happiness/good things than there is future expected suffering left to prevent. (Cf. the astronomical waste argument.)

This would go the other way if we had good reason to believe that the future will be very bad, but I think the classical utilitarians who are optimistic about the future (given their values) are right to be optimistic: If you count the creation of extreme happiness as not-a-lot-less important than the prevention of extreme suffering, then the future will in expectation be very valuable according to your values (see footnote [3]).

but I don't see how you can go on to draw any conclusions about that because downside (as well as upside) morality covers so many different things.

My thinking is that when it comes to interventions that affect the long-term future, different normative views tend to converge roughly into two large clusters for the object-level interventions they recommend. If the future will be good for your value system, reducing extinction risks and existential risk related to "not realizing full potential" will be most important. If your value system makes it harder to attain vast amounts of positive value through bringing about large (in terms of time and/or space) utopian futures, then you want to focus specifically on (cooperative ways of) reducing suffering risks or downside risks generally. The cut-off point is determined by what the epistemically proper degree of optimism or pessimism is with regard to the quality of the long-term future, and to what extent we can have an impact on that. Meaning, if we had reason to believe that the future will be very negative and that efforts to make the future contain vast amounts of happiness are very very very unlikely to ever work, then even classical utilitarianism would count as "downside-focused" according to my classification.

Some normative views simply don't place much importance on creating new happy people, in which case they kind of come out as downside-focused by default (except for the consideration I mention in footnote 2). (If these views give a lot of weight to currently existing people, then they can be both downside-focused and give high priority to averting extinction risks, which is something I pointed out in the third-last paragraph in the section on extinction risks.)

Out of the five examples you mentioned, I'd say they fall into the two clusters as follows: Downside-focused: absolute NU, lexical NU, lexical threshold NU and a "negative-leaning" utilitarianism that is sufficiently negative-leaning to counteract our empirical assessment of how much easier it will be to create happiness than to prevent suffering. The rest is upside-focused (maybe with some stuck at "could go either way"). How much is "sufficiently negative-leaning"? It becomes tricky because there are not really any "proper units" of happiness and suffering, so we have to first specify what we are comparing. See footnote 3: My own view is that the cut-off is maybe very roughly at around 100, but I mentioned "100 or maybe 1,000" to be on the conservative side. And these refer to comparing extreme happiness to extreme suffering. Needless to say, it is hard to predict the future and we should take such numbers with a lot of caution, and it seems legitimate for people to disagree. Though I should qualify that a bit: Say, if someone thinks that classical utilitarians should not work on extinction risk reduction because the future is too negative, or if someone thinks even strongly negative-leaning consequentialists should have the same ranking of priorities as classical utilitarians because the future is so very positive, then both of these have to explain away strong expert disagreement (at least within EA; I think outside of EA, people's predictions are all over the place, with economists generally being more optimistic).

Lastly, I don't think proponents of any value system should start to sabotage other people's efforts, especially since there are other ways to create value according to your own value system that are altogether much more positive-sum. Note that this – the dangers of naive/Machiavellian consequentialism – is a very general problem that reaches far deeper than just value differences. Say you have two EAs who both think creating happiness is 1/10th as important as reducing suffering. One is optimistic about the future, the other has become more pessimistic after reading about some new arguments. They try to talk out the disagreement, but do not reach agreement. Should the second EA now start to sabotage the efforts of the first one, or vice versa? That seems ill-advised; no good can come from going down that path.


This was really interesting and probably as clear as such a topic can possibly be displayed.

Disclaimer: I don't know how to deal with infinities mathematically. What I am about to say is probably very wrong.

For every conceivable value system, there is an exactly opposing value system, so that there is no room for gains from trade between the systems (e.g. suffering maximizers vs suffering minimizers).

In an infinite multiverse, there are infinitely many agents with decision algorithms sufficiently similar to mine to allow for MSR. Among them, there are infinitely many agents that hold any given value system. So whenever I cooperate with one value system, I defect on infinitely many agents that hold the exactly opposing values. So infinity seems to make cooperation impossible?

Sidenote: If you assume decision algorithm and values to be orthogonal, why do you suggest to "adjust [the values to cooperate with] by the degree their proponents are receptive to MSR ideas"?

Best, Jan

For every conceivable value system, there is an exactly opposing value system, so that there is no room for gains from trade between the systems (e.g. suffering maximizers vs suffering minimizers).

There is an intuition that "disorderly" worlds with improbable histories must somehow "matter less," but it's very hard to cash out what this could mean. See this post or this proposal. I'm not sure these issues are solved yet (probably not). (I'm assuming that suffering maximizers or other really weird value systems would only evolve, or be generated when lightning hits someone's brain or whatever, in very improbable instances.)

Sidenote: If you assume decision algorithm and values to be orthogonal, why do you suggest to "adjust [the values to cooperate with] by the degree their proponents are receptive to MSR ideas"?

Good point; this shows that I'm skeptical about a strong version of independence where values and decision algorithms are completely uncorrelated. E.g., I find it less likely that deep ecologists would change their actions based on MSR than people with more EA(-typical) value systems. It is open to discussion whether (or how strongly) this has to be corrected for historical path dependencies and founder effects: If Eliezer had not been really into acausal decision theory, perhaps the EA movement would think somewhat differently about the topic. If we could replay history many times over, how often would EA be more or less sympathetic to superrationality than it is currently?


Whoops. I can see how my responses didn't make my own position clear.

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions/confusions/misplaced priorities into the discussion, it may do more harm than good.

I'm puzzled by this remark:

I think anything as specific as this sounds worryingly close to wanting an AI to implement [favorite political system].

I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, "utilitronium." If I'm using the term in an unusual way I'm happy to propose a new label that conveys what I have in mind.

I totally sympathize with your sentiment and feel the same way about incorporating other people's values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people's wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense that I want to be altruistic, and I don't experience a lot of moral motivation to help accomplish people's weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way I'd feel no strong (even lower, admittedly) motivation to help make the dreams of baby eating aliens come true.

Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that does not give weight to other people's strongly held moral beliefs. It is already hard enough to not mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their idiosyncratic view of the future installed.

BTW note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you're justified in doing something radical about it, but that's even more likely to be a bad idea because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.

There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly in the same direction, namely that things will be horrible if we fail to cooperate with each other and that cooperating is often the truly rational thing to do. You're probably already familiar with a lot of this, but for general reference, see also this recent paper that makes a particularly interesting case for particularly strong cooperation, as well as other work on the topic, e.g. here and here.

This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures at the last minute just to get an extra large share of cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular caution to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.

This piece I wrote recently is relevant to cooperation and the question of whether values are subjective or not, and how much convergence we should expect and to what extent value extrapolation procedures bake in certain (potentially unilateral) assumptions.

This blogpost seems relevant. Admittedly it's labelled 'speculative' by the author, but I find the concerns plausible.

The one view that seems unusually prevalent within FRI, apart from people self-identifying with suffering-focused values, is a particular anti-realist perspective on morality and moral reasoning where valuing open-ended moral reflection is not always regarded as the by default "prudent" thing to do.

Thanks for pointing this out. I've noticed this myself in some of FRI's writings, and I'd say this, along with the high amount of certainty on various object-level philosophical questions that presumably causes the disvaluing of reflection about them, is what most "turns me off" about FRI. I worry a lot about potential failures of goal preservation (i.e., value drift) too, but because I'm highly uncertain about just about every meta-ethical and normative question, I see no choice but to try to design some sort of reflection procedure that I can trust enough to hand off control to. In other words, I have nothing I'd want to "lock in" at this point and since I'm by default constantly handing off control to my future self with few safeguards against value drift, doing something better than that default is one of my highest priorities. If other people are also uncertain and place high value on (safe/correct) reflection as a result, that helps with my goal (because we can then pool resources together to work out what safe/correct reflection is), so it's regrettable to see FRI people sometimes argue for more certainty than I think is warranted and especially to see them argue against reflection.

That makes sense. I do think as a general policy, valuing reflection is more positive-sum, and if one does not feel like much is "locked in" yet then it becomes very natural too. I'm not saying that people who value reflection more than I do are doing it wrong; I think I would even argue for reflection being very important and recommend it to new people, if I felt more comfortable that they'd end up pursuing things that are beneficial from all/most plausible perspectives. Though what I find regrettable is that the "default" interventions that are said to be good from as many perspectives as possible oftentimes do not seem great from a suffering-focused perspective.

