I am currently working on a research project as part of CEA’s summer research fellowship. I am building a simple model of so-called “multiverse-wide cooperation via superrationality” (MSR). The model should incorporate the uncertainties most relevant to determining possible gains from trade. To make this model as useful as possible, I would like to ask others for their opinions on the idea of MSR. For instance, what are the main reasons you think MSR might be irrelevant or might not work as intended? Which questions are unanswered and need to be addressed before the merit of the idea can be assessed? I would welcome any input, either in the comments on this post or by email to johannes@foundational-research.org.
An overview of resources on MSR, including introductory texts, can be found at the link above. To briefly illustrate the idea, consider two artificial agents with identical source code playing a prisoner’s dilemma. Even if the two agents cannot causally interact, each agent’s own action provides it with strong evidence about the other agent’s action. Evidential decision theory and recently proposed variants of causal decision theory (Yudkowsky and Soares, 2018; Spohn, 2003; Poellinger, 2013) say that agents should take such evidence into account when making decisions. MSR is based on two ideas. (i) Humans on Earth are in a situation similar to that of the two AI agents: there probably is a large or infinite multiverse containing many exact copies of humans on Earth (Tegmark, 2003, p. 464), but also agents that are similar but not identical to humans. (ii) If humans and these other, similar agents take each other’s preferences into account, then, due to gains from trade, everyone is better off than if everyone were to pursue only their own ends. It follows from (i) and (ii) that humans should take the preferences of other, similar agents in the multiverse into account, in order to produce evidence that those agents in turn take humans’ preferences into account, which leaves everyone better off.
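The twin prisoner’s dilemma above can be made concrete with a small calculation. The following is only a minimal sketch of an evidential expected-utility comparison, not part of the model described in this post. The payoff values (5, 3, 1, 0) and the parameter `p_same` (the probability that the other agent’s action matches one’s own, conditional on one’s own action) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Illustrative prisoner's dilemma payoffs (assumed values, not from the post)."""
    temptation: float = 5.0  # I defect, the other cooperates
    reward: float = 3.0      # both cooperate
    punishment: float = 1.0  # both defect
    sucker: float = 0.0      # I cooperate, the other defects


def edt_expected_utilities(p_same, payoffs=Payoffs()):
    """Return (EU of cooperating, EU of defecting) when the other agent
    takes the same action as me with probability p_same."""
    eu_cooperate = p_same * payoffs.reward + (1 - p_same) * payoffs.sucker
    eu_defect = p_same * payoffs.punishment + (1 - p_same) * payoffs.temptation
    return eu_cooperate, eu_defect


def edt_choice(p_same, payoffs=Payoffs()):
    """Pick the action with the higher evidential expected utility."""
    eu_cooperate, eu_defect = edt_expected_utilities(p_same, payoffs)
    return "cooperate" if eu_cooperate > eu_defect else "defect"


def cooperation_threshold(payoffs=Payoffs()):
    """Smallest p_same above which cooperating beats defecting."""
    numerator = payoffs.temptation - payoffs.sucker
    denominator = (payoffs.reward - payoffs.sucker) + (payoffs.temptation - payoffs.punishment)
    return numerator / denominator


if __name__ == "__main__":
    for p in (0.5, 0.8, 1.0):
        print(p, edt_choice(p))
    print("threshold:", cooperation_threshold())
```

With these assumed payoffs, cooperation only wins once the match probability exceeds 5/7, which is one way to see why the strength of the correlation between agents matters for the size of the gains from trade.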
According to Oesterheld (2017, sec. 4), this idea could have far-reaching implications for prioritization. For instance, given MSR, some forms of moral advocacy could become ineffective: an agent’s advocating for its own particular values provides evidence that other agents do the same for theirs, so that the efforts potentially neutralize each other. Moreover, MSR could play a role in deciding which strategies to pursue in AI alignment. It could become especially valuable to ensure that an AGI will engage in multiverse-wide trade.
A few doubts:
It seems like MSR requires a multiverse large enough to have many well-correlated agents, but not large enough to run into the problems involved with infinite ethics. Most of my credence is on no multiverse or infinite multiverse, although I'm not particularly well-read on this issue.
My broad intuition is something like "Insofar as we can know about the values of other civilisations, they're probably similar to our own. Insofar as we can't, MSR isn't relevant." There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation's values would vary from our own).
I worry that MSR is susceptible to self-mugging of some sort. I don't have a particular example, but the general idea is that you're correlated with other agents even if you're being very irrational. And so you might end up doing things which seem arbitrarily irrational. But this is just a half-fledged thought, not a proper objection.
And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation (because if you always cooperate in prisoner's dilemmas, then your choices are perfectly correlated with CooperateBot, but intuitively it'd still be more rational to defect against CooperateBot, because your decision algorithm isn't similar to CooperateBot in the same way that it's similar to your psychological twin). I guess this requires a solution to logical uncertainty, though.
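To make the CooperateBot worry concrete, here is a minimal sketch. It assumes the dependence structure is simply given: each opponent is written as a hypothetical function of my counterfactual action, which is exactly the information a principled similarity metric would have to supply. The payoff numbers are illustrative.

```python
# Illustrative payoffs for (my action, opponent action); assumed values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def cooperate_bot(my_action):
    """Always cooperates, whatever I would have done."""
    return "C"


def psychological_twin(my_action):
    """Runs my decision algorithm, so its output tracks mine."""
    return my_action


def agrees_when_i_cooperate(opponent):
    """Observed agreement for an agent that always cooperates."""
    return opponent("C") == "C"


def best_action(opponent):
    """Best reply once the counterfactual dependence is taken as known."""
    return max(("C", "D"), key=lambda action: PAYOFF[(action, opponent(action))])


if __name__ == "__main__":
    for name, opponent in (("CooperateBot", cooperate_bot), ("twin", psychological_twin)):
        print(name, agrees_when_i_cooperate(opponent), best_action(opponent))
```

Both opponents agree perfectly with an always-cooperating agent, yet the best reply differs: defect against CooperateBot, cooperate with the twin. Observed correlation alone does not separate the two cases.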
Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.
One way I imagine dealing with this is that there is an oracle that tells us with certainty, for any two algorithms and their decision situations, what the counterfactually possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents' algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.
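A toy version of this picture, as a sketch only: assume the oracle's verdict for any fully specified pair is binary (logically identical or not), and that all smoothness comes from my credence `p_identical` that the other agent runs my algorithm. The baseline `p_match_if_distinct`, the chance that a logically distinct algorithm happens to output the same action, is a hypothetical parameter introduced for illustration.

```python
def match_probability(p_identical, p_match_if_distinct):
    """Probability that the other agent's output equals mine.

    p_identical: my credence that the other algorithm is logically
        identical to mine (in which case outputs match with certainty).
    p_match_if_distinct: assumed chance of a match when it is not.
    """
    for value in (p_identical, p_match_if_distinct):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_identical * 1.0 + (1.0 - p_identical) * p_match_if_distinct


if __name__ == "__main__":
    # The binary oracle verdict is mixed by uncertainty into a smooth curve.
    for step in range(0, 11, 2):
        credence = step / 10
        print(credence, match_probability(credence, 0.5))
```

The match probability rises linearly with the credence, even though each fully specified case is all-or-nothing.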