
(Crossposted from the FRI blog.)

This is a post I wrote about Caspar Oesterheld’s long paper Multiverse-wide cooperation via correlated decision-making. Because I have found the idea tricky to explain – which unfortunately makes it difficult to get feedback from others on whether the thinking behind it makes sense – I decided to write a shorter summary. While I am hoping that my text can serve as a standalone piece, for additional introductory content I also recommend reading the beginning of Caspar’s paper, or watching the short video introduction here (requires basic knowledge of the “CDT, EDT or something else” debate in decision theory).  

0. Elevator pitch

(Disclaimer: Especially for the elevator pitch section here, I am sacrificing accuracy and precision for brevity. References can be found in Caspar’s paper.)

It would be an uncanny coincidence if the observable universe made up everything that exists. The reason we cannot find any evidence for there being stuff beyond the edges of our universe is not that there is likely nothingness, but that photons from further away simply would not have had sufficient time after the big bang to reach us. This means that the universe we find ourselves in may well be vastly larger than what we can observe, in fact even infinitely larger. Inflationary cosmology, in addition, hints at the existence of other universe bubbles with different fundamental constants, forming or disappearing under certain conditions and co-existing with our universe in parallel. The umbrella term multiverse captures the idea that the observable universe is just a tiny portion of everything that exists. The multiverse may contain myriads of worlds like ours, including other worlds with intelligent life and civilization. An infinite multiverse (of one sort or another) is actually amongst the most popular cosmological hypotheses, arguably even favored by the majority of experts.

Many ethical theories (in particular most versions of consequentialism) do not consider geographical distance relevant to moral value. After all, suffering and the frustration of one's preferences are bad for someone regardless of where (or when) they happen. This principle should apply even when we consider worlds so far away from us that we can never receive any information from there. Moral concern over what happens elsewhere in the multiverse is one requirement for the idea I am now going to discuss.

Multiverse-wide cooperation via superrationality (abbreviation: MSR) is the idea that, if I think about different value systems and their respective priorities in the world, I should not work on the highest priority according to my own values, but on whatever my comparative advantage is amongst all the interventions favored by the value systems of agents interested in multiverse-wide cooperation. (Another route to gains from trade is to focus on convergent interests, pursuing interventions that may not be the top priority for any particular value system, but are valuable from a maximally broad range of perspectives.) For simplicity, I will simply refer to this as "cooperating" from now on.

A decision to cooperate, according to some views on decision theory, gives me rational reason to believe that agents in similar decision situations elsewhere in the multiverse, especially the ones who are most similar to myself in how they reason about decision problems, are likely to cooperate as well. After all, if two very similar reasoners think about the same decision problem, they are likely to reach identical answers. This suggests that they will end up either both cooperating, or both defecting. Assuming that the way agents reach decisions is not strongly constrained or otherwise affected by their values, we can expect there to be agents with different values who reason about decision problems the same way we do, and who therefore come to identical conclusions. Cooperation then produces gains from trade between value systems.

While each party would want to be the sole defector, the mechanism behind multiverse-wide cooperation – namely that we have to think of ourselves as being coupled with those agents in the multiverse who are most similar to us in their reasoning – ensures that defection is disincentivized: Any party that defects would now have to expect that their highly similar counterparts would also defect.

The closest way to approximate the value systems of agents in other parts of the multiverse, given our ignorance about what the multiverse looks like, is to assume that at least substantial parts of it are going to be similar to how things are here, where we can study them. A minimally viable version of multiverse-wide cooperation can therefore be thought of as all-out "ordinary" cooperation with value systems we know well (and especially ones that include proponents sympathetic to MSR reasoning). This suggests that, while MSR combines speculative-sounding ideas such as non-standard causation and the existence of a multiverse, its implications may not be all that strange and largely boil down to the proposal that we should be "maximally" cooperative towards other value systems.

1. A primer on non-causal decision theory

Leaving aside for the moment the whole part about the multiverse, MSR is fundamentally about cooperating in a prisoner's-dilemma-like situation with agents who are very similar to ourselves in the way they reason about decision problems. Douglas Hofstadter coined the term superrationality for the idea that one should cooperate in a prisoner's dilemma if one expects the other party to follow the same style of reasoning. If they reason the same way I do, and the problem they are facing is the same kind of problem I am facing, then I must expect that they will likely come to the same conclusion I will come to. This suggests that the prisoner's dilemma in question is unlikely to end with an asymmetric outcome ((cooperate / defect) or (defect / cooperate)), but likely to end with a symmetric outcome ((cooperate / cooperate) or (defect / defect)). Because (cooperate / cooperate) is the best outcome for both parties amongst the symmetric outcomes, superrationality suggests one is best served by cooperating.
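To make this concrete, here is a minimal sketch in Python (the payoff numbers are mine and purely illustrative): if the other party runs the exact same deterministic decision procedure on the same payoffs, only the symmetric outcomes are reachable, and the procedure simply picks the better of the two.

```python
# Toy model of superrationality in a twin prisoner's dilemma.
# Payoff numbers are illustrative; keys are (my action, twin's action),
# values are my payoff.
PAYOFFS = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, twin defects (unreachable for identical reasoners)
    ("D", "C"): 5,  # I defect, twin cooperates (unreachable for identical reasoners)
    ("D", "D"): 1,  # mutual defection
}

def superrational_choice(payoffs):
    """If the twin runs this exact function on the same payoffs, my output
    equals the twin's output, so only the symmetric outcomes (C, C) and
    (D, D) can occur; pick the better of the two."""
    return max(["C", "D"], key=lambda action: payoffs[(action, action)])

my_action = superrational_choice(PAYOFFS)
twin_action = superrational_choice(PAYOFFS)  # same algorithm, same inputs
assert my_action == twin_action              # a symmetric outcome is guaranteed
print(my_action)  # "C": mutual cooperation beats mutual defection (3 > 1)
```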

At this point, readers may be skeptical whether this reasoning works. There seems to be some kind of shady action at a distance involved, where my choice to cooperate is somehow supposed to affect the other party’s choice, even though we are assuming that no information about my decision reaches said other party. But we can think of it this way: If reasoners are deterministic systems, and two reasoners follow the exact same decision algorithm in a highly similar decision situation, it at some point becomes logically contradictory to assume that the two reasoners will end up with diametrically opposed conclusions.

Side note: By decision situations having to be "highly similar," I do not mean that the situations agents find themselves in have to be particularly similar with respect to little details in the background. What I mean is that they should be highly similar in terms of all decision-relevant variables, the variables that are likely to make a difference to an agent's decision. If we imagine a simplified decision situation where agents have to choose between two options, either press a button or not (and then something happens or not), it probably matters little whether one agent has the choice to press a red button and another agent is faced with pressing a blue button. As long as both buttons do the same thing, and as long as the agents are not (emotionally or otherwise) affected by the color differences, we can safely assume that the color of the button is highly unlikely to play a decision-relevant role. What is more likely relevant are things such as the payoffs (value according to what an agent cares about) the agents expect from the available options. If one agent believes they stand to receive positive utility from pressing the button, and the other stands to receive negative utility, then that is guaranteed to make a relevant difference as to whether the agents will want to press their buttons. Maybe the payoff differentials are also relevant sometimes, or are at least relevant with some probability: If one agent only gains a tiny bit of utility, whereas the other agent has an enormous amount of utility to win, the latter agent might be much more motivated to avoid making a suboptimal decision. While payoffs and payoff structures certainly matter, it is unlikely that it matters what qualifies as a payoff for a given agent: If an agent who happens to really like apples will be rewarded with tasty apples after pressing a button, and another agent who really likes money is rewarded with money, their decision situations seem the same provided that they each care equally strongly about receiving the desired reward. (This is the intuition behind the irrelevance of specific value systems for whether two decision algorithms or decision situations are relevantly similar or not. Whether one prefers apples, money, carrots or whatever, math is still math and decision theory is still decision theory.)
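As a toy illustration of the role of decision-relevant variables (the function and numbers below are hypothetical): a decision procedure that only reads expected payoffs cannot be affected by the color of the button, whereas a flipped payoff sign is guaranteed to change its output.

```python
# Hypothetical sketch: the decision procedure only reads decision-relevant
# variables (expected payoffs), so irrelevant details such as button color
# cannot influence its output.
def decide(payoff_if_press, payoff_if_not_press):
    return "press" if payoff_if_press > payoff_if_not_press else "don't press"

# Two agents, different button colors, identical payoff structure:
agent_red = {"color": "red", "press": 10, "not_press": 0}
agent_blue = {"color": "blue", "press": 10, "not_press": 0}

# The color never enters the decision procedure, so the outputs must match:
assert decide(agent_red["press"], agent_red["not_press"]) == \
       decide(agent_blue["press"], agent_blue["not_press"])

# A flipped payoff sign, by contrast, is guaranteed to change the decision:
assert decide(-10, 0) != decide(10, 0)
```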

A different objection that readers may have at this point concerns the idea of superrationally "fixing" other agents' decisions. Namely, critics may point out that we are thereby only ever talking about updating our own models, our prediction of what happens elsewhere, and that this does not actually change what was going to happen elsewhere. While this sounds like an accurate observation, the force of the statement rests on a loaded definition of "actually changing things elsewhere" (or anywhere for that matter). If we applied the same rigor to a straightforward instance of causally or directly changing the position of a light switch in our room, a critic may in the same vein object that we only changed our expectation of what was going to happen, not what actually was going to happen. The universe is lawful: nothing ever happens that was not going to happen. What we do when we want to have an impact and accomplish something with our actions is never to actually change what was going to happen; instead, it is to act in the way that best shifts our predictions favorably towards our goals. (This is not to be confused with cheating at prediction: We don't want to make ourselves optimistic for no good reason, because the decision to bias oneself towards optimism does not actually correlate with our goals getting accomplished – it only correlates with a deluded future self believing that we will be accomplishing our goals.)

For more reading on this topic, I recommend this paper on functional decision theory, the book Evidence, Decision and Causality or the article On Correlation and Causation Part 1: Evidential decision theory is correct. For an overview on different decision theories, see also this summary. To keep things simple and as uncontroversial as possible, I will follow Caspar’s terminology for the rest of my post here and use the term superrationality in a very broad sense that is independent of any specific flavor of decision theory, referring to a fuzzy category of arguments from similarity of decision algorithms that favor cooperating in certain prisoner’s-dilemma-like situations.

2. A multiverse ensures the existence of agents with decision algorithms extremely similar to ours

The existence of a multiverse would virtually guarantee that there are many agents out there who fulfill the criteria of “relevant similarity” compared to us with regard to their decision algorithm and decision situations – whatever these criteria may boil down to in detail.

Side note: Technically, if the multiverse is indeed infinite, there will likely be infinitely many such agents, and infinite amounts of everything in general, which admittedly poses some serious difficulties for formalizing decisions: If there is already an infinite amount of value or disvalue, it seems like all our actions should be ranked the same in terms of the value of the outcome they result in. This leads to so-called infinitarian paralysis, where all actions are rated as equally good or bad. Perhaps infinitarian paralysis is a strong counterargument to MSR. But in that case, we should be consistent: Infinitarian paralysis would then also be a strong counterargument to aggregative consequentialism in general. Because it affects nearly everything (for consequentialists), and because of how drastic its implications would be if there was no convenient solution, I am basically hoping that someone will find a solution that makes everything work again in the face of infinities. For this reason, I think we should not think of MSR as being particularly in danger of failing for reasons of infinitarian paralysis.

Back to object-level MSR: We noted that the multiverse guarantees that there are agents out there very similar to us who are likely to tackle decision problems the same way we do. To prevent confusion, note that MSR is not based on the naive assumption that all humans who find the concept of superrationality convincing are therefore strongly correlated with each other across all possible decision situations. Superrationality only motivates cooperation if one has good reason to believe that another party’s decision algorithm is indeed extremely similar to one’s own. Human reasoning processes differ in many ways, and sympathy towards superrationality represents only one small dimension of one’s reasoning process. It may very well be extremely rare that two people’s reasoning is sufficiently similar that, having common knowledge of this similarity, they should rationally cooperate in a prisoner’s dilemma.

But out there somewhere, maybe on Earth already in a few instances among our eight-or-so billion inhabitants, but certainly somewhere in the multiverse if a multiverse indeed exists, there must be evolved intelligent beings who are sympathetic towards superrationality in the same way we are, who in addition also share a whole bunch of other structural similarities with us in the way they reason about decision problems. These agents would construe decision problems related to cooperating with other value systems in the same way we do, and pay attention to the same factors weighted according to the same decision-normative criteria. When these agents think about MSR, they would be reasonably likely to reach similar conclusions with regard to the idea’s practical implications. These are our potential cooperation partners.

I have to admit that it seems very difficult to tell which aspects of one’s reasoning are more or less important for the kind of decision-relevant similarity we are looking for. There are many things left to be figured out, and it is far from clear whether MSR works at all in the sense of having action-guiding implications for how we should pursue our goals. But the underlying idea here is that once we pile up enough similarities of the relevant kind in one’s reasoning processes (and a multiverse would ensure that there are agents out there who do indeed fulfill these criteria), at some point it becomes logically contradictory to treat the output of our decisions as independent from the decisional outputs of these other agents. This insight seems hard to avoid, and it seems quite plausible that it has implications for our actions.

If I were to decide to cooperate in the sense implied by MSR, I would have to then update my model of what is likely to happen in other parts of the multiverse where decision algorithms highly similar to my own are at play. Superrationality says that this update in my model, assuming it is positive for my goal achievement because I now predict more agents to be cooperative towards other value systems (including my own), in itself gives me reason to go ahead and act cooperatively. If we manage to form even a crude model of some of the likely goals of these other agents and how we can benefit them in our own part of the multiverse, then cooperation can already get off the ground and we might be able to reap gains from trade.
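To illustrate why this update can tip the scales, here is a toy expected-value sketch (all probabilities and utilities below are made-up placeholders, not estimates from the paper): if my cooperating is strong evidence that relevantly similar agents elsewhere cooperate too, the expected benefit to my own values conditional on cooperating can exceed what I get from defecting and pursuing my own top priority alone.

```python
# Toy evidential expected-value calculation with made-up numbers.
u_own = 10         # value for my goals if I defect and pursue my own top priority
u_coop_local = 6   # value for my goals that I still create while cooperating
u_received = 20    # value that correlated agents elsewhere create for my goals
                   # if they cooperate

# Conditional credences that the correlated agents cooperate, given my own
# choice (this correlation is the core superrationality assumption):
p_coop_if_i_cooperate = 0.9
p_coop_if_i_defect = 0.1

ev_cooperate = u_coop_local + p_coop_if_i_cooperate * u_received  # 6 + 18 = 24
ev_defect = u_own + p_coop_if_i_defect * u_received               # 10 + 2 = 12

print(ev_cooperate > ev_defect)  # True under these assumed numbers
```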

Alternatively, if we decided against becoming more cooperative, we learn that we must be suffering costs from mutual defection. This includes both opportunity costs and direct costs from cases where other parties' favored interventions may hurt our values.

3. We are playing a multiverse-wide prisoner’s dilemma against (close) copies of our decision algorithm

We are assuming that we care about what happens in other parts of the multiverse. For instance, we might care about increasing total happiness. If we further assume that decision algorithms and the values/goals of agents are distributed orthogonally – meaning that one cannot infer someone’s values simply by seeing how they reason practically about epistemic matters – then we arrive at the conceptualization of a multiverse-wide prisoner’s dilemma.

(Note that we can already observe empirically that effective altruists who share the same values sometimes disagree strongly about decision theory (or more generally reasoning styles/epistemics), and effective altruists who agree on decision theory sometimes disagree strongly about values. In addition, as pointed out in section one, there appears to be no logical reason as to why agents with different values would necessarily have different decision algorithms.)

The cooperative action in our prisoner’s dilemma would now be to take other value systems into account in proportion to how prevalent they are in the multiverse-wide compromise. We would thus try to benefit them whenever we encounter opportunities to do so efficiently, that is, whenever we find ourselves with a comparative advantage to strongly benefit a particular value system. By contrast, the action that corresponds to defecting in the prisoner’s dilemma would be to pursue one’s personal values with zero regard for other value systems. The payoff structure is such that an outcome where everyone cooperates is better for everyone than an outcome where everyone defects, but each party would prefer to be a sole defector.
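Here is a minimal sketch of this payoff structure with hypothetical numbers, written from the perspective of my own values: because of comparative advantages, the other party's cooperation is worth more to me than my own defection gains me, so mutual cooperation beats mutual defection, while the sole-defector outcome remains the most tempting (and, superrationally, unreachable).

```python
# Hypothetical payoffs to *my* values, depending on my action and the other
# party's. "Cooperate" = work on whatever I can produce most efficiently for
# the compromise; "defect" = work only on my own top priority.
def my_payoff(i_cooperate, they_cooperate):
    direct = 4 if i_cooperate else 10       # what I produce for my own values myself
    received = 15 if they_cooperate else 0  # what the other party produces for my values
    return direct + received

print(my_payoff(True, True))    # 19: mutual cooperation
print(my_payoff(False, False))  # 10: mutual defection
print(my_payoff(False, True))   # 25: sole defector (best for me, but superrationally
                                #     unreachable: their choice mirrors mine)
print(my_payoff(True, False))   # 4: sole cooperator (likewise unreachable)
```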

Consider for example someone who is in an influential position to give advice to others. This person can either tailor their advice to their own specific values, discouraging others from working on things that are unimportant according to their personal value system, or they can give advice that is tailored towards producing an outcome that is maximally positive for the value systems of all superrationalists, perhaps even investing substantial effort researching the implications of value systems different from their own. MSR provides a strong argument for maximally cooperative behavior, because by cooperating, the person in question ensures that there is more such cooperation in other parts of the multiverse, which in expectation also strongly benefits their own values.

Of course there are many other reasons to be nice to other value systems (in particular reasons that do not involve aliens and infinite worlds). What is special about MSR is mostly that it gives an argument for taking the value systems of other superrationalists into account maximally and without worries of getting exploited for being too forthcoming. With MSR, mutual cooperation is achieved by treating one’s own decision as a simulation/prediction for agents relevantly similar to oneself. Beyond this, there is no need to guess the reasoning of agents who are different. The updates one has to make based on MSR considerations are always symmetrical for one’s own actions and the actions of other parties. This mechanism makes it impossible to enter asymmetrical (cooperate-defect or defect-cooperate) outcomes.

(Note that the way MSR works does not guarantee direct reciprocity in terms of who benefits whom: I should not choose to benefit value system X in my part of the multiverse in the hope that advocates of value system X in particular will, in reverse, be nice to my values here or in other parts of the multiverse. Instead, I should simply benefit whichever value system I can benefit most, in the expectation that whichever agents can benefit my values the most – and possibly that turns out to be someone with value system X – will actually cooperate and benefit my values. To summarize, hoping to be helped by value system X for MSR-reasons does not necessarily mean that I should help value system X myself – it only implies that I should conscientiously follow MSR and help whoever benefits most from my resources.)

4. Interlude for preventing misunderstandings: Multiverse-wide cooperation is different from acausal trade!

Before we can continue with the main body of explanation, I want to proactively point out that MSR is different from acausal trade, which has been discussed in the context of artificial superintelligences reasoning about each other's decision procedures. There is a danger that people lump the two ideas together, because MSR does share some similarities with acausal trade (and can arguably be seen as a special case of it).

Namely, both MSR and acausal trade are standardly discussed in a multiverse context and rely crucially on acausal decision theories. There are, however, several important differences: In the acausal trade scenario, two parties simulate each other's decision procedures to prove that one's own cooperation ensures cooperation in the other party. MSR, by contrast, does not involve reasoning about the decision procedures of parties different from oneself. In particular, MSR does not involve reasoning about whether a specific party's decisions have a logical connection with one's own decisions or not, i.e., whether the choices in a prisoner's-dilemma-like situation can only result in symmetrical outcomes or not. MSR works through the simple mechanism that one's own decision is assumed to already serve as the simulation/prediction for the reference class of agents with relevantly similar decision procedures.

So MSR is based on mostly looser assumptions than acausal trade, because it does not require having the technological capability to accurately simulate another party's decision algorithm. There is, however, one aspect in which MSR is based on stronger assumptions than acausal trade. Namely, MSR assumes that one's own decision can function as a prediction/simulation not just for identical copies of oneself in a boring twin universe where everything plays out exactly the same way as in our universe, but also for an interesting spectrum of similar-but-not-completely-identical parts of the multiverse that include agents who reason the same way about their decisions as we do, but may not share our goals. This is far from a trivial assumption, and I strongly recommend doing some further thinking about it. But if the assumption does go through, it has vast implications not (just) for the possibility of superintelligences trading with each other, but for a form of multiverse-wide cooperation that current-day humans could already engage in.

5. MSR represents a shift in one’s ontology; it is not just some “trick” we can attempt for extra credit

The line of reasoning employed in MSR is very similar to the reasoning employed in anthropic decision problems. For comparison, take the idea that there are numerous copies of ourselves across many ancestor simulations. If we thought this was the case, reasoning anthropically as though we control all our copies at once could, for certain decisions, change our prioritization: If my decision to reduce short-term suffering plays out the same way in millions of short-lived, simulated versions of earth where a focus on the far future cannot pay off, I have more reason to focus on short-term suffering than I thought.
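To make the shift concrete, a toy calculation with invented numbers: if my decision is also executed by many short-lived simulated copies in whose worlds far-future work cannot pay off, the short-term option gets multiplied across all copies, whereas the far-future option only pays off in the non-simulated case.

```python
# Invented numbers for illustration only.
n_simulated_copies = 1_000               # short-lived simulated copies running my decision
value_short_term_per_world = 1           # short-term suffering reduction per world
value_far_future_if_unsimulated = 5_000  # far-future value, realized only if I am
                                         # not in a short-lived simulation

# My decision is (assumed to be) replicated across all copies:
ev_short_term = (1 + n_simulated_copies) * value_short_term_per_world  # 1001
ev_far_future = 1 * value_far_future_if_unsimulated                    # 5000

# With these numbers the far future still wins, but the short-term option has
# gained a factor of ~1000 relative to a naive single-instance calculation.
print(ev_short_term, ev_far_future)
```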

MSR applies a similar kind of reasoning where we shift our thinking from being a single instance of something to thinking in terms of deciding for an entire class of agents. MSR is what follows when one extends/generalizes the anthropic/UDT slogan “Acting as though you are all your (subjectively identical) copies at once” to “Acting as though you are all copies of your (subjective probability distribution over your) decision algorithm at once.”

Rather than identifying solely with one’s subjective experiences and one’s goals/values, MSR also involves “identifying with”  – on the level of predicting consequences relevant to one’s decision – one’s general decision algorithm. If the assumptions behind MSR are sound, then deciding not to change one’s actions based on MSR has to cause an update in one’s world model, an update about other agents in one’s reference class also not cooperating. So the underlying reasoning that motivates MSR is something that has to permeate our thinking about how to have an impact on the world, whether we decide to let it affect our decisions or not. MSR is a claim about what is rational to do given that our actions have an impact in a broader sense than we may initially think, spanning across all instances of one’s decision algorithm. It changes our EV calculations and may in some instances even flip the sign – net positive/negative – of certain interventions. Ignoring MSR is therefore not necessarily the default, “safe” option.

6. Lack of knowledge about aliens is no obstacle because a minimally viable version of MSR can be based on what we observe on earth

Once we start deliberating whether to account for the goals of other agents in the multiverse, we run into the problem that we have a very poor idea of what the multiverse looks like. The multiverse may contain all kinds of strange things, including worlds where physical constants are different from the ones in our universe, or worlds where highly improbable things keep happening for the same reason that, if you keep throwing an infinite number of fair coins, some of them somewhere will produce uncanny sequences like “always heads” or “always tails.”

Because it seems difficult and intractable to envision all the possible landscapes in different parts of the multiverse, what kind of agents we might find there, and how we can benefit the goals of these agents with our resources here, one might be tempted to dismiss MSR for being too impractical a consideration. However, I think this would be a premature dismissal. We may not know anything about strange corners of the multiverse, but we know at the very least how things are in our observable universe. As long as we feel we cannot say anything substantial about how, specifically, the parts of the multiverse that are completely different from the things we know differ from our environment, we may as well ignore those parts. For practical purposes, we do not have to speculate about parts of the multiverse that would be completely alien to us (yay!), and can instead focus on what we already know from direct experience. After all, our world is likely to be representative of some other worlds in the multiverse. (This holds for the same reason that a randomly chosen television channel is more likely than not to be somewhat representative of some other television channels, rather than being completely unlike any other channel.) Therefore, we can be reasonably confident that out there somewhere, there are planets with an evolutionary history that, although different from ours in some ways, also produced intelligent observers who built a technologically advanced civilization. And while many of these civilizations may contain agents with value systems we have never thought about, some of these civilizations will also contain earth-like value systems.

In any case, it seems plausible that our comparative advantage lies in helping those value systems about which we can obtain the most information. If we survey the values of people on earth, and perhaps also how much these values correlate with sympathies for the concept of superrationality and for taking weird arguments to their logical conclusion, this already gives us highly useful information about the values of potential cooperators in the multiverse. MSR then implies strong cooperation with value systems that we already know (perhaps adjusted by the degree to which their proponents are receptive to MSR ideas).

By "strong cooperation," I mean that one should ideally pick interventions based on considerations of personal comparative advantage: If there is a value system for which I could create an extraordinary amount of (variance-adjusted; see chapter 3 of this dissertation for an introduction) value given my talents and position in the world, I should perhaps exclusively focus on benefitting specifically that value system. Meta-interventions that are positive for many value systems at once also receive a strong boost from MSR considerations and should plausibly be pursued with high effort even if they do not come out as the top priority absent MSR considerations. (Examples of such interventions include making sure that any superintelligent AIs that are built can cooperate with other AIs, or encouraging people who are uncertain about their values not to waste time with philosophy and instead try to benefit existing value systems MSR-style.) Finally, one should also look for more cooperative alternatives when considering interventions that, although positive for one's own value system, may in expectation cause harm to other value systems.
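As a very rough sketch of what such a comparative-advantage calculation could look like (the value systems, interventions, weights and numbers below are all hypothetical), one could estimate how much value per unit of effort each available intervention produces for each cooperating value system, weight by that system's share in the compromise, and pick the intervention with the highest weighted total:

```python
# Hypothetical inputs: my estimated value per unit of effort for each
# intervention, broken down by value system, plus each system's assumed
# weight in the multiverse-wide compromise.
impact = {
    "ai_cooperation_research": {"A": 3.0, "B": 2.5, "C": 2.0},   # broadly positive meta-intervention
    "my_own_top_priority":     {"A": 5.0, "B": 0.5, "C": -1.0},  # great for A, slightly harms C
    "helping_value_system_C":  {"A": 0.5, "B": 0.5, "C": 6.0},   # my biggest single-system lever
}
weights = {"A": 0.4, "B": 0.3, "C": 0.3}

def compromise_value(intervention):
    """Weighted value of an intervention across the cooperating value systems."""
    return sum(weights[v] * impact[intervention][v] for v in weights)

best = max(impact, key=compromise_value)
print(best, compromise_value(best))  # with these numbers, the meta-intervention wins
```

In practice, the hard part is of course estimating such numbers at all; the point of the sketch is only the shape of the calculation.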

---

Related announcement 1: Caspar Oesterheld, who has thought about MSR much more than I have, will be giving a talk on the topic at EAG London. Feel free to approach him during the event to discuss anything related to the idea.

Related announcement 2: My colleague David Althaus has done some preparatory work for a sophisticated survey on the moral intuitions, value systems and decision theoretical leanings of people in the EA movement (and its vicinity). He is looking for collaborators – please get in touch if you are interested!

Related announcement 3: I wrote a second, more advanced but less polished piece on MSR implications that discusses some tricky questions and also sketches a highly tentative proposal for how one might take MSR into account in practice. If you enjoyed reading this piece and are curious to think more about the topic, I recommend reading on here (Google doc).

Comments

Thanks for writing this up!

I think the idea is intriguing, and I agree that this is possible in principle, but I'm not convinced of your take on its practical implications. Apart from heuristic reasons to be sceptical of a new idea on this level of abstractness and speculativeness, my main objection is that a high degree of similarity with respect to reasoning (which is required for the decisions to be entangled) probably goes along with at least some degree of similarity with respect to values. (And if the values of the agents that correlate with me are similar to mine, then the result of taking them into account is also closer to my own values than the compromise value system of all agents.)

You write:

Superrationality only motivates cooperation if one has good reason to believe that another party’s decision algorithm is indeed extremely similar to one’s own. Human reasoning processes differ in many ways, and sympathy towards superrationality represents only one small dimension of one’s reasoning process. It may very well be extremely rare that two people’s reasoning is sufficiently similar that, having common knowledge of this similarity, they should rationally cooperate in a prisoner’s dilemma.

Conditional on this extremely high degree of similarity to me, isn't it also more likely that their values are also similar to mine? For instance, if my reasoning is shaped by the experiences I've had, my genetic makeup, or the set of all ideas I've read about over the course of my life, then an agent with identical or highly similar reasoning would also share a lot of these characteristics. But of course, my experiences, genes, etc. also determine my values, so similarity with respect to these factors implies similarity with respect to values.

This is not the same as claiming that a given characteristic X that's relevant to decision-making is generally linked to values, in the sense that people with X have systematically different values. It's a subtle difference: I'm not saying that certain aspects of reasoning generally go along with certain values across the entire population; I'm saying that a high degree of similarity regarding reasoning goes along with similarity regarding values.

This was really interesting and probably as clear as such a topic can possibly be presented.

Disclaimer: I don't know how to deal with infinities mathematically. What I am about to say is probably very wrong.

For every conceivable value system, there is an exactly opposing value system, so that there is no room for gains from trade between the systems (e.g. suffering maximizers vs suffering minimizers).

In an infinite multiverse, there are infinite agents with decision algorithms sufficiently similar to mine to allow for MSR. Among them, there are infinite agents that hold any value system. So whenever I cooperate with one value system, I defect on infinite agents that hold the exactly opposing values. So infinity seems to make cooperation impossible??

Sidenote: If you assume decision algorithm and values to be orthogonal, why do you suggest to "adjust [the values to cooperate with] by the degree their proponents are receptive to MSR ideas"?

Best, Jan

For every conceivable value system, there is an exactly opposing value system, so that there is no room for gains from trade between the systems (e.g. suffering maximizers vs suffering minimizers).

There is an intuition that "disorderly" worlds with improbable histories must somehow "matter less," but it's very hard to cash out what this could mean. See this post or this proposal. I'm not sure these issues are solved yet (probably not). (I'm assuming that suffering maximizers or other really weird value systems would only evolve, or be generated when lightning hits someone's brain or whatever, in very improbable instances.)

Sidenote: If you assume decision algorithm and values to be orthogonal, why do you suggest to "adjust [the values to cooperate with] by the degree their proponents are receptive to MSR ideas"?

Good point; this shows that I'm skeptical about a strong version of independence where values and decision algorithms are completely uncorrelated. E.g., I find it less likely that deep ecologists would change their actions based on MSR than people with more EA(-typical) value systems. It is open to discussion whether (or how strongly) this has to be corrected for historical path dependencies and founder effects: If Eliezer had not been really into acausal decision theory, perhaps the EA movement would think somewhat differently about the topic. If we could replay history many times over, how often would EA be more or less sympathetic to superrationality than it is currently?

This is a very clear description of some cool ideas. Thanks to you and Caspar for doing this!

I’m worried that people’s altruistic sentiments are ruining their intuition about the prisoner’s dilemma. If Bob were an altruist, then there would be no dilemma. He would just cooperate. But within the framework of the one-shot prisoner’s dilemma, defecting is a dominant strategy – no matter what Alice does, Bob is better off defecting.

I'm all for caring about other value systems, but if there's no causal connection between our actions and aliens', then it's impossible to trade with them. I can pump someone's intuition by saying, "Imagine a wizard produced a copy of yourself and had the two of you play the prisoner's dilemma. Surely you would cooperate?" But that thought experiment is messed up because I care about copies of myself in a way that defies the setup of the prisoner's dilemma.

One way to get cooperation in the one-shot prisoner's dilemma is if Bob and Alice can inspect each other's source code and prove that the other player will cooperate if and only if they do. But then Alice and Bob can communicate with each other! By having provably committed to this strategy, Alice and Bob can cause other players with the same strategy to cooperate.

Evidential decision theory also preys on our sentiments. I’d like to live in a cool multiverse where there are aliens outside my light cone who do what I want them to, but it’s not like my actions can cause that world to be the one I was born into.

I’m all for chasing after infinities and being nice to aliens, but acausal trade makes no sense. I’m willing to take many other infinite gambles, like theism or simulationism, before I’m willing to throw out causality.

I agree that altruistic sentiments are a confounder in the prisoner's dilemma. Yudkowsky (who would cooperate against a copy) makes a similar point in The True Prisoner's Dilemma, and there are lots of psychology studies showing that humans cooperate with each other in the PD in cases where I think they (that is, each individually) shouldn't. (Cf. section 6.4 of the MSR paper.)

But I don't think that altruistic sentiments are the primary reason for why some philosophers and other sophisticated people tend to favor cooperation in the prisoner's dilemma against a copy. As you may know, Newcomb's problem is decision-theoretically similar to the PD against a copy. In contrast to the PD, however, it doesn't seem to evoke any altruistic sentiments. And yet, many people prefer EDT's recommendations in Newcomb's problem. Thus, the "altruism error theory" of cooperation in the PD is not particularly convincing.

I don't see much evidence in favor of the "wishful thinking" hypothesis. It, too, seems to fail in the non-multiverse problems like Newcomb's paradox. Also, it's easy to come up with lots of incorrect theories about how any particular view results from biased epistemics, so I have quite low credence in any such hypothesis that isn't backed up by any evidence.

before I’m willing to throw out causality

Of course, causal eliminativism (or skepticism) is one motivation to one-box in Newcomb's problem, but subscribing to eliminativism is not necessary to do so.

For example, in Evidence, Decision and Causality Arif Ahmed argues that causality is irrelevant for decision making. (The book starts with: "Causality is a pointless superstition. These days it would take more than one book to persuade anyone of that. This book focuses on the ‘pointless’ bit, not the ‘superstition’ bit. I take for granted that there are causal relations and ask what doing so is good for. More narrowly still, I ask whether causal belief plays a special role in decision.") Alternatively, one could even endorse the use of causal relationships for informing one's decision but still endorse one-boxing. See, e.g., Yudkowsky, 2010; Fisher, n.d.; Spohn, 2012 or this talk by Ilya Shpitser.

Newcomb's problem isn't a challenge to causal decision theory. I can solve Newcomb's problem by committing to one-boxing in any of a number of ways e.g. signing a contract or building a reputation as a one-boxer. After the boxes have already been placed in front of me, however, I can no longer influence their contents, so it would be good if I two-boxed if the rewards outweighed the penalty e.g. if it turned out the contract I signed was void, or if I don't care about my one-boxing reputation because I don't think I'm going to play this game again in the future.

The "wishful thinking" hypothesis might just apply to me then. I think it would be super cool if we could spontaneously cooperate with aliens in other universes.

Edit: Wow, ok I remember what I actually meant about wishful thinking. I meant that evidential decision theory literally prescribes wishful thinking. Also, if you made a copy of a purely selfish person and then told them of the fact, then I still think it would be rational to defect. Of course, if they could commit to cooperating before being copied, then that would be the right strategy.

After the boxes have already been placed in front of me, however, I can no longer influence their contents, so it would be good if I two-boxed

You would get more utility if you were willing to one-box even when there's no external penalty or opportunity to bind yourself to the decision. Indeed, functional decision theory can be understood as a formalization of the intuition: "I would be better off if only I could behave in the way I would have precommitted to behave in every circumstance, without actually needing to anticipate each such circumstance in advance." Since the predictor in Newcomb's problem fills the boxes based on your actual action, regardless of the reasoning or contract-writing or other activities that motivate the action, this suffices to always get the higher payout (compared to causal or evidential decision theory).
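Concretely, a toy calculation with the standard payoffs and an assumed 99% predictor accuracy (numbers are just for illustration):

```python
# Toy Newcomb calculation: standard $1M / $1k payoffs, assumed 99% accuracy.
accuracy = 0.99
big, small = 1_000_000, 1_000

# If the prediction tracks the agent's actual choice with the given accuracy:
ev_one_box = accuracy * big                 # 990,000
ev_two_box = (1 - accuracy) * big + small   # 11,000

print(ev_one_box > ev_two_box)  # True: the reliable one-boxer walks away richer
```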

There are also dilemmas where causal decision theory gets less utility even if it has the opportunity to precommit to the dilemma; e.g., retro blackmail.

For a fuller argument, see the paper "Functional Decision Theory" by Yudkowsky and Soares.

Ha, I think the problem is just that your formalization of Newcomb's problem is defined so that one-boxing is always the correct strategy, and I'm working with a different formulation. There are four forms of Newcomb's problem that jibe with my intuition, and they're all different from the formalization you're working with.

  1. Your source code is readable. Then the best strategy is whatever the best strategy is when you get to publicly commit e.g. you should tear off the wheel when playing chicken if you have the opportunity to do so before your opponent.
  2. Your source code is readable and so is your opponent's. Then you get mathy things like mutual simulation and Löb's theorem.
  3. We're in the real world, so the only information the other player has to guess your strategy is information like your past behavior and reputation. (This is by far the most realistic situation in my opinion.)
  4. You're playing against someone who's an expert in reading body language, say. Then it might be impossible to fool them unless you can fool yourself into thinking you'll one-box. But of course, after the boxes are actually in front of you, it would be great for you if you had a change of heart.

Your version is something like

  1. Your opponent can simulate you with 100% accuracy, including unforeseen events like something unexpected causing you to have a change of mind.

If we're creating AIs that others can simulate, then I guess we might as well make them immune to retro blackmail. I still don't see the implications for humans, who cannot be simulated with 100% fidelity and already have ample intuition about their reputations and know lots of ways to solve coordination problems.

Geographical distance is a kind of inferential distance.