Why I prioritize moral circle expansion over reducing extinction risk through artificial intelligence alignment

Jacy

This is a linkpost for https://www.sentienceinstitute.org/blog/moral-circle-expansion-vs-reducing-extinction-risk

This blog post was written on an old version of the EA Forum, so it has formatting issues. The title was also updated for clarity.

Many thanks for helpful feedback to Jo Anderson, Tobias Baumann, Jesse Clifton, Max Daniel, Michael Dickens, Persis Eskander, Daniel Filan, Kieran Greig, Zach Groff, Amy Halpern-Laff, Jamie Harris, Josh Jacobson, Gregory Lewis, Caspar Oesterheld, Carl Shulman, Gina Stuessy, Brian Tomasik, Johannes Treutlein, Magnus Vinding, Ben West, and Kelly Witwicki. I also forwarded Ben Todd and Rob Wiblin a small section of the draft that discusses an 80,000 Hours article.

Abstract

When people in the effective altruism (EA) community have worked to affect the far future, they’ve typically focused on reducing extinction risk, especially risks associated with superintelligence or general artificial intelligence alignment (AIA). I agree with the arguments for the far future being extremely important in our EA decisions, but I tentatively favor improving the quality of the far future by expanding humanity’s moral circle more than increasing the likelihood of the far future or humanity’s continued existence by reducing AIA-based extinction risk because: (1) the far future seems to not be very good in expectation, and there’s a significant likelihood of it being very bad, and (2) moral circle expansion seems highly neglected both in EA and in society at large. Also, I think considerations of bias are very important here, given how necessarily intuitive and subjective judgment calls make up the bulk of differences in opinion on far future cause prioritization. I find the argument in favor of AIA that technical research might be more tractable than social change to be the most compelling counterargument to my position.

Context

This post largely aggregates existing content on the topic, rather than making original arguments. I offer my views, mostly intuitions, on the various arguments, but of course I remain highly uncertain given the limited amount of empirical evidence we have on far future cause prioritization.

Many in the effective altruism (EA) community think the far future is a very important consideration when working to do the most good. The basic argument is that humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large amount of moral value. The main narrative has been that this civilization could be a very good one, and that in the coming decades, we face sizable risks of extinction that could prevent us from obtaining this “cosmic endowment.” The argument goes that these risks also seem like they can be reduced with a fairly small amount of additional resources (e.g. time, money), and therefore extinction risk reduction is one of the most important projects of humanity and the EA community.

(This argument also depends on a moral view that bringing about the existence of sentient beings can be a morally good and important action, comparable to helping sentient beings who currently exist live better lives. This is a contentious view in academic philosophy. See, for example, “'Making People Happy, Not Making Happy People': A Defense of the Asymmetry Intuition in Population Ethics.”)

However, one can accept the first part of this argument — that there is a very large amount of expected moral value in the far future and it’s relatively easy to make a difference in that value — without deciding that extinction risk is the most important project. In slightly different terms, one can decide not to work on reducing population risks, risks that could reduce the number of morally relevant individuals in the far future (of course, these are only risks of harm if one believes more individuals is a good thing), and instead work on reducing quality risks, risks that could reduce the quality of morally relevant individuals’ existence. One specific type of quality risk often discussed is a risk of astronomical suffering (s-risk), defined as “events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.”

This blog post makes the case for focusing on quality risks over population risks. More specifically, though also more tentatively, it makes the case for focusing on reducing quality risk through moral circle expansion (MCE), the strategy of impacting the far future through increasing humanity’s concern for sentient beings who currently receive little consideration (i.e. widening our moral circle so it includes them), over AI alignment (AIA), the strategy of impacting the far future through increasing the likelihood that humanity creates an artificial general intelligence (AGI) that behaves as its designers want it to (known as the alignment problem).[1][2]

The basic case for MCE is very similar to the case for AIA. Humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large number of sentient beings. The sort of civilization we create, however, seems highly dependent on our moral values and moral behavior. In particular, it’s uncertain whether many of those sentient beings will receive the moral consideration they deserve based on their sentience, i.e. whether they will be in our “moral circle” or not, like the many sentient beings who have suffered intensely over the course of human history (e.g. from torture, genocide, oppression, war). It seems the moral circle can be expanded with a fairly small amount of additional resources (e.g. time, money), and therefore MCE is one of the most important projects of humanity and the EA community.

Note that MCE is a specific kind of values spreading, the parent category of MCE that describes any effort to shift the values and moral behavior of humanity and its descendants (e.g. intelligent machines) in a positive direction to benefit the far future. (Of course, some people attempt to spread values in order to benefit the near future, but in this post we’re only considering far future impact.)

I’m specifically comparing MCE and AIA because AIA is probably the most favored method of reducing extinction risk in the EA community. AIA seems to be the default cause area to favor if one wants to have an impact on the far future, and I’ve been asked several times why I favor MCE instead.

This discussion risks conflating AIA with reducing extinction risk. These are two separate ideas, since an unaligned AGI could still lead to a large number of sentient beings, and an aligned AGI could still potentially cause extinction or population stagnation (e.g. if according to the designers’ values, even the best civilization the AGI could help build is still worse than nonexistence). However, most EAs focused on AIA seem to believe that the main risk is something quite like extinction, such as the textbook example of an AI that seeks to maximize the number of paperclips in the universe. I’ll note when the distinction between AIA and reducing extinction risk is relevant. Similarly, there are sometimes important prioritization differences between MCE and other types of values spreading, and those will be noted when they matter. (This paragraph is an important qualification for the whole post. The possibility of unaligned AGI that involves a civilization (and, less so because it seems quite unlikely, the possibility of an aligned AGI that causes extinction) is important to consider for far future cause prioritization. Unfortunately, elaborating on this would make this post far more complicated and far less readable, and would not change many of the conclusions. Perhaps I’ll be able to make a second post that adds this discussion at some point.)

It’s also important to note that I’m discussing specifically AIA here, not all AI safety work in general. AI safety, which just means increasing the likelihood of beneficial AI outcomes, could be interpreted as including MCE, since MCE plausibly makes it more likely that an AI would be built with good values. However, MCE doesn’t seem like a very plausible route to increasing the likelihood that AI is simply aligned with the intentions of its designers, so I think MCE and AIA are fairly distinct cause areas.

AI safety can also include work on reducing s-risks, such as specifically reducing the likelihood of an unaligned AI that causes astronomical suffering, rather than reducing the likelihood of all unaligned AI. I think this is an interesting cause area, though I am unsure about its tractability and am not considering it in the scope of this blog post.

The post’s publication was supported by Greg Lewis, who was interested in this topic and donated $1,000 to Sentience Institute, the think tank I co-founded which researches effective strategies to expand humanity’s moral circle, conditional on this post being published to the Effective Altruism Forum. Lewis doesn’t necessarily agree with any of its content. He decided on the conditional donation prior to the post being written. I did ask him to review the post prior to publication, and it was edited based on his feedback.

The expected value of the far future

Whether we prioritize reducing extinction risk partly depends on how good or bad we expect human civilization to be in the far future, given it continues to exist. In my opinion, the assumption that it will be very good is tragically unexamined in the EA community.

What if it’s close to zero?

If we think the far future is very good, that clearly makes reducing extinction risk more promising. And if we think the far future is very bad, that makes reducing extinction risk not just unpromising, but actively very harmful. But what if it’s near the middle, i.e. close to zero?[3] 80,000 Hours wrote, in response to the view that reducing extinction risk is not an EA priority on the basis of the expected moral value of the far future:

...even if you’re not sure how good the future will be, or suspect it will be bad, you may want civilisation to survive and keep its options open. People in the future will have much more time to study whether it’s desirable for civilisation to expand, stay the same size, or shrink. If you think there’s a good chance we will be able to act on those moral concerns, that’s a good reason to leave any final decisions to the wisdom of future generations. Overall, we’re highly uncertain about these big-picture questions, but that generally makes us more concerned to avoid making any irreversible commitments...

This reasoning seems mistaken to me because wanting “civilisation to survive and keep its options open” depends on optimism that civilization will do research, make good[4] decisions based on that research, and be capable of implementing those decisions.[5] In other words, while preventing extinction keeps options open for good things to happen, it also keeps options open for bad things to happen, and desiring this option value depends on an optimism that the good things are more likely. Put differently, the reasoning assumes the optimism (thinking the far future is good, or at least that humans will make good decisions and be able to implement them[6]), which is also its conclusion.

Having that optimism makes sense in many decisions, which is why keeping options open is often a good heuristic. In EA, for example, people tend to do good things with their careers, which means career option value is a useful thing. This doesn’t readily translate to decisions where it’s not clear whether the actors involved will have a positive or negative impact. (Note 80,000 Hours isn’t making this comparison. I’m just making it to explain my own view here.)

There’s also a sense in which reducing extinction risk decreases option value because if humanity progresses past certain civilizational milestones that make extinction less likely (say, the rise of AGI or expansion beyond our own solar system), it might become harder or even impossible to press the “off switch” (ending civilization). However, I think most would agree that there’s more overall option value in a civilization that has gotten past these milestones because there’s a much wider variety of non-extinct civilizations than extinct civilizations.[7]

If you think that the expected moral value of the far future is close to zero, even if you think it’s slightly positive, then reducing extinction risk is a less promising EA strategy than if you think it’s very positive.
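To make this dependence explicit, here is a minimal sketch in Python. It assumes a deliberately simplified model that goes beyond what the text states: extinction is valued at zero, and the value of an intervention is the reduction in extinction probability multiplied by the expected value of the far future conditional on survival. The function name and all numbers are hypothetical placeholders, not estimates.

```python
def value_of_extinction_risk_reduction(risk_reduction, ev_future):
    """Expected value gained by lowering the probability of extinction.

    Simplifying assumption (not from the post): extinction has value 0 and
    survival has expected value ev_future, so the gain is linear in ev_future.
    """
    if not 0.0 <= risk_reduction <= 1.0:
        raise ValueError("risk_reduction must be a probability between 0 and 1")
    return risk_reduction * ev_future


# Hypothetical units of moral value; only the comparison between cases matters.
very_positive = value_of_extinction_risk_reduction(0.01, ev_future=1000.0)
slightly_positive = value_of_extinction_risk_reduction(0.01, ev_future=1.0)
negative = value_of_extinction_risk_reduction(0.01, ev_future=-1000.0)
```

Under this toy model, the same reduction in risk is worth a thousand times less when the far future is only slightly positive, and it becomes harmful when the far future is negative in expectation.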

Key considerations

I think the considerations on this topic are best represented as questions where people’s beliefs (mostly just intuitions) vary on a long spectrum. I’ll list these in order of where I would guess I have the strongest disagreement with people who believe the far future is highly positive in expected value (shortened as HPEV-EAs), and I’ll note where I don’t think I would disagree or might even have a more positive-leaning belief than the average such person.

I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]
I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings, such as for recreation (e.g. safaris, war games), a labor force (e.g. colonists to distant parts of the galaxy, construction workers), scientific experiments, threats (e.g. threatening to create and torture beings that a rival cares about), revenge, justice, religion, or even pure sadism.[10] I believe this because there have been less powerful sentient beings for all of humanity’s existence and well before (e.g. predation), many of whom are exploited and harmed by humans and other animals, and there seems to be little reason to think such power dynamics won’t continue to exist.
Alternative uses of resources include simply working to increase one’s own happiness directly (e.g. changing one’s neurophysiology to be extremely happy all the time), and constructing large non-sentient projects like a work of art, though each of these types of project could still include sentient beings, such as for experimentation or a labor force.
With the exception of threats and sadism, the less powerful minds seem like they could suffer intensely because their intense suffering could be instrumentally useful. For example, if the recreation is nostalgic, or human psychology persists in some form, we could see powerful beings causing intense suffering in order to see good triumph over evil or in order to satisfy curiosity about situations that involve intense suffering (of course, the powerful beings might not acknowledge the suffering as suffering, instead conceiving of it as simulated but not actually experienced by the simulated entities). For another example, with a sentient labor force, punishment could be a stronger motivator than reward, as indicated by the history of evolution on Earth.[11][12]
I place significant moral value on artificial/small/weird minds.
I think it’s quite unlikely that human descendants will find the correct morality (in the sense of moral realism, finding these mind-independent moral facts), and I don’t think I would care much about that correct morality even if it existed. For example, I don’t think I would be compelled to create suffering if the correct morality said this is what I should do. Of course, such moral facts are very difficult to imagine, so I’m quite uncertain about what my reaction to them would be.[13]
I’m skeptical about the view that technology and efficiency will remove the need for powerless, high-suffering, instrumental moral patients. An example of this predicted trend is that factory farmed animals seem unlikely to be necessary in the far future because of their inefficiency at producing animal products. Therefore, I’m not particularly concerned about the factory farming of biological animals continuing into the far future. I am, however, concerned about similar but less inefficient systems.
An example of how technology might not render sentient labor forces and other instrumental sentient beings obsolete is how humans seem motivated to have power and control over the world, and in particular seem more satisfied by having power over other sentient beings than by having power over non-sentient things like barren landscapes.
I do still believe there’s a strong tendency towards efficiency and that this has the potential to render much suffering obsolete; I just have more skepticism about it than I think is often assumed by HPEV-EAs.[14]
I’m skeptical about the view that human descendants will optimize their resources for happiness (i.e. create hedonium) relative to optimizing for suffering (i.e. create dolorium).[15] Humans currently seem more deliberately driven to create hedonium, but creating dolorium might be more instrumentally useful (e.g. as a threat to rivals[16]).
On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.
I’m largely in agreement with the average HPEV-EA in my moral exchange rate between happiness and suffering. However, I think those EAs tend to greatly underestimate how much the empirical tendency towards suffering over happiness (e.g. wild animals seem to endure much more suffering than happiness) is evidence of a future empirical asymmetry.
My view here is partly informed by the capacities for happiness and suffering that have evolved in humans and other animals, the capacities that seem to be driven by cultural forces (e.g. corporations seem to care more about downsides than upsides, perhaps because it’s easier in general to destroy and harm things than to create and grow them), and speculation about what could be done in more advanced civilizations, such as my best guess on what a planet optimized for happiness and a planet optimized for suffering would look like. For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources.
I’m unsure of how much I would disagree with HPEV-EAs about the argument that we should be highly uncertain about the likelihood of different far future scenarios because of how highly speculative our evidence is, which pushes my estimate of the expected value of the far future towards the middle of the possible range, i.e. towards zero.
I’m unsure of how much I would disagree with HPEV-EAs about the persistence of evolutionary forces into the future (i.e. how much future beings will be determined by fitness, rather than characteristics we might hope for like altruism and happiness).[17]
From the historical perspective, it worries me that many historical humans seem like they would be quite unhappy with the way human morality changed after them, such as the way Western countries are less concerned about previously-considered-immoral behavior like homosexuality and gluttony than their ancestors were in 500 CE. (Of course, one might think historical humans would agree with modern humans upon reflection, or think that much of humanity’s moral changes have been due to improved empirical understanding of the world.)[18]
I’m largely in agreement with HPEV-EAs that humanity’s moral circle has a track record of expansion and seems likely to continue expanding. For example, I think it’s quite likely that powerful beings in the far future will care a lot about charismatic biological animals like elephants or chimpanzees, or whatever beings have a similar relationship to those powerful beings as humanity has to elephants and chimpanzees. (As mentioned above, my pessimism about the continued expansion is largely due to concern about the magnitude of bad-but-unlikely outcomes and the harms that could occur due to MCE stagnation.)

Unfortunately, we don’t have much empirical data or solid theoretical arguments on these topics, so the disagreements I’ve had with HPEV-EAs have mostly just come down to differences in intuition. This is a common theme for prioritization among far future efforts. We can outline the relevant factors and a little empirical data, but the crucial factors seem to be left to speculation and intuition.

Most of these considerations are about how society will develop and utilize new technologies, which suggests we can develop relevant intuitions and speculative capacity by studying social and technological change. So even though these judgments are intuitive, we could potentially improve them with more study of big-picture social and technological change, such as Sentience Institute’s MCE research or Robin Hanson’s book The Age of Em, which analyzes what a future of brain emulations would look like. (This sort of empirical research is what I see as the most promising future research avenue for far future cause prioritization. I worry EAs overemphasize armchair research (like most of this post, actually) for various reasons.[19])

I’d personally be quite interested in a survey of people with expertise in the relevant fields of social, technological, and philosophical research, in which they’re asked about each of the considerations above, though it might be hard to get a decent sample size, and I think it would be quite difficult to debias the respondents (see the Bias section of this post).

I’m also interested in quantitative analyses of these considerations — calculations including all of these potential outcomes and associated likelihoods. As far as I know, this kind of analysis has only been attempted so far by Michael Dickens in “A Complete Quantitative Model for Cause Selection,” in which Dickens notes that, “Values spreading may be better than existential risk reduction.” While this quantification might seem hopelessly speculative, I think it’s highly useful even in such situations. Of course, rigorous debiasing is also very important here.
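As a rough illustration of what the skeleton of such a calculation could look like, here is a minimal Python sketch. It is not Dickens’ model, and the scenario names, probabilities, and values are hypothetical placeholders chosen only to show the mechanics, not estimates of anything.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    name: str
    probability: float
    value: float  # moral value in arbitrary units; negative means net suffering


def expected_value(scenarios):
    """Probability-weighted sum of scenario values.

    Raises ValueError if the probabilities do not sum to 1, since the
    scenarios are meant to be mutually exclusive and exhaustive.
    """
    total_probability = sum(s.probability for s in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(s.probability * s.value for s in scenarios)


# Placeholder inputs (assumptions for illustration only).
placeholder_scenarios = [
    Scenario("very good future", 0.5, 100.0),
    Scenario("near-neutral future", 0.25, 0.0),
    Scenario("very bad future", 0.25, -200.0),
]
ev = expected_value(placeholder_scenarios)
```

The point of writing the calculation down is that each disputed consideration becomes an explicit input that can be varied, so disagreements show up as differences in specific probabilities or values rather than as an overall impression.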

Overall, I think the far future is close to zero in expected moral value, meaning it’s not nearly as good as is commonly assumed, implicitly or explicitly, in the EA community.

Scale

Range of outcomes

It’s difficult to compare the scale of far future impacts since they are all astronomical, and overall I don’t find the consideration of scale very useful here.

Technically, it seems like MCE involves a larger range of potential outcomes than reducing extinction risk through AIA because, at least from a classical consequentialist perspective (giving weight to both negative and positive outcomes), it could make the difference between some of the worst far futures imaginable and the best far futures. Reducing extinction risk through AIA only makes the difference between nonexistence (a far future of zero value) and whatever world comes to exist. If one believes the far future is highly positive, this could still be a very large range, but it would still be less than the potential change from MCE.

How much less depends on one’s views of how bad the worst future is relative to the best future. If the absolute value is the same, then MCE has a range twice as large as extinction risk reduction.
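The arithmetic behind this comparison can be sketched in a few lines of Python. The sketch assumes, for simplicity, that nonexistence has value zero and that the upper end of both ranges is the best possible far future; the function name and the numbers are hypothetical.

```python
def range_ratio(best, worst):
    """Ratio of the quality-risk range to the extinction-risk range.

    Quality range: from the worst far future to the best (best - worst).
    Extinction range: from nonexistence, valued at 0, to the best (best - 0).
    """
    if best <= 0:
        raise ValueError("best must be positive")
    if worst > 0:
        raise ValueError("worst must be zero or negative")
    quality_range = best - worst
    extinction_range = best - 0.0
    return quality_range / extinction_range


# Worst future as bad as the best is good: the quality range is twice as large.
symmetric = range_ratio(best=100.0, worst=-100.0)
# Worst future three times as bad as the best is good: four times as large.
asymmetric = range_ratio(best=100.0, worst=-300.0)
```

In general the ratio is 1 plus the magnitude of the worst future divided by the value of the best, so it only becomes large if one thinks the worst outcomes are many times worse than the best outcomes are good.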

As mentioned in the Context section above, the change in the far future that AIA could achieve might not exactly be extinction versus non-extinction. While an aligned AI would probably not involve the extinction of all sentient beings, since that would require the values of its creators to prefer extinction over all other options, an unaligned AI might not necessarily involve extinction. To use the canonical AIA example of a “paperclip maximizer” (used to illustrate how an AI could easily have a harmful goal without any malicious intention), the rogue AI might create sentient beings as a labor force to implement its goal of maximizing the number of paperclips in the universe, or create sentient beings for some other goal.[20]

This means that the range of AIA is the difference between the potential universes with aligned AI and unaligned AI, which could be very good futures contrasted with very bad futures, rather than just very good futures contrasted with nonexistence.

Brian Tomasik has written out a thoughtful (though necessarily speculative and highly uncertain) breakdown of the risks of suffering in both aligned and unaligned AI scenarios, which weakly suggests that an aligned AI would lead to more suffering in expectation.

All things considered, it seems that the range of quality risk reduction (including MCE) is larger than that of extinction risk reduction (including AIA, depending on one’s view of what difference AI alignment makes), but this seems like a fairly weak consideration to me because (i) it’s a difference of roughly two-fold, which is quite small relative to the differences of ten-times, a thousand-times, etc. that we frequently see in cause prioritization, and (ii) there are numerous fairly arbitrary judgment calls (like considering reducing extinction risk from AI versus AIA versus AI safety) that lead to different results.[21]

Likelihood of different far future scenarios[22][23]

MCE is relevant for many far future scenarios where AI doesn’t undergo the sort of “intelligence explosion” or similar progression that makes AIA important; for example, if AGI is developed by an institution like a foreign country that has little interest in AIA, or if AI is never developed, or if it’s developed slowly in a way that makes safety adjustments quite easy as that development occurs. In each of these scenarios, the way society treats sentient beings, especially those currently outside the moral circle, seems like it could still be affected by MCE. As mentioned earlier, I think there is a significant chance that the moral circle will fail to expand to reach all sentient beings, and I think a small moral circle could very easily lead to suboptimal or dystopian far future outcomes.

On the other hand, some possible far future civilizations might not involve moral circles, such as if there is an egalitarian society where each individual is able to fully represent their own interests in decision-making and this societal structure was not reached through MCE because these beings are all equally powerful for technological reasons (and no other beings exist and they have no interest in creating additional beings). Some AI outcomes might not be affected by MCE, such as an unaligned AI that does something like maximizing the number of paperclips for reasons other than human values (such as a programming error) or one whose designers create its value function without regard for humanity’s current moral views (“coherent extrapolated volition” could be an example of this, though I agree with Brian Tomasik that current moral views will likely be important in this scenario).

Given my current, highly uncertain estimates of the likelihood of various far future scenarios, I would guess that MCE is applicable in somewhat more cases than AIA, suggesting it’s easier to make a difference to the far future through MCE. (This is analogous to saying the risk of MCE-failure seems greater than the risk of AIA-failure, though I’m trying to avoid simplifying these into binary outcomes.)

Tractability

How much of an impact can we expect our marginal resources to have on the probability of extinction, or on the moral circle of the far future?

Social change versus technical research

One may believe changing people’s attitudes and behavior is quite difficult, and direct work on AIA involves a lot less of that. While AIA likely involves influencing some people (e.g. policymakers, researchers, and corporate executives), MCE is almost entirely influencing people’s attitudes and behavior.[24]

However, one could instead believe that technical research is more difficult in general, pointing to potential evidence such as the large amount of money spent on technical research (e.g. by Silicon Valley) with often very little to show for it, while huge social change seems to sometimes be effected by small groups of advocates with relatively little money (e.g. organizers of revolutions in Egypt, Serbia, and Turkey). (I don’t mean this as a very strong or persuasive argument, just as a possibility. There are plenty of examples of tech done with few resources and social change done with many.)

It’s hard to speak so generally, but I would guess that technical research tends to be easier than causing social change. And this seems like the strongest argument in favor of working on AIA over working on MCE.

Track record

In terms of EA work explicitly focused on the goals of AIA and MCE, AIA has a much better track record. The past few years have seen significant technical research output from organizations like MIRI and FHI, as documented by user Larks on the EA Forum for 2016 and 2017. I’d refer readers to those posts, but as a brief example, MIRI had an acclaimed paper on “Logical Induction,” which used a financial market process to estimate the likelihood of logical facts (e.g. mathematical propositions like the Riemann hypothesis) that we aren’t yet sure of. This is analogous to how we use probability theory to estimate the likelihood of empirical facts (e.g. a dice roll). In the bigger picture of AIA, this research could help lay the technical foundation for building an aligned AGI. See Larks’ post for a discussion of more papers like this, as well as non-technical work done by AI-focused organizations such as the Future of Life Institute’s open letter on AI safety signed by leading AI researchers and cited by the White House’s “Report on the Future of Artificial Intelligence.”

Using an analogous definition for MCE, EA work explicitly focused on MCE (meaning expanding the moral circle in order to improve the far future) basically only started in 2017 with the founding of Sentience Institute (SI), though there were various blog posts and articles discussing it before then. SI has basically finished four research projects: (1) Foundational Question Summaries that summarize evidence we have on important effective animal advocacy (EAA) questions, including a survey of EAA researchers, (2) a case study of the British antislavery movement to better understand how they achieved one of the first major moral circle expansions in modern history, (3) a case study of nuclear power to better understand how some countries (e.g. France) enthusiastically adopted this new technology, but others (e.g. the US) didn’t, (4) a nationally representative poll of US attitudes towards animal farming and animal-free food.

With a broader definition of MCE that includes activities that people prioritizing MCE tend to think are quite indirectly effective (see the Neglectedness section for discussion of definitions), we’ve seen EA achieve quite a lot more, such as the work done by The Humane League, Mercy For Animals, Animal Equality, and other organizations on corporate welfare reforms to animal farming practices, and the work done by The Good Food Institute and others on supporting a shift away from animal farming, especially through supporting new technologies like so-called “clean meat.”

Since I favor the narrower definition, I think AIA outperforms MCE on track record, but the difference in track record seems largely explained by the greater resources spent on AIA, which makes it a less important consideration. (Also, when I personally decided to focus on MCE, SI did not yet exist, so the lack of track record was an even stronger consideration in favor of AIA (though MCE was also more neglected at that time).)

To be clear, the track records of all far future projects tend to be weaker than near-term projects where we can directly see the results.

Robustness

If one values robustness, meaning a higher certainty that one is having a positive impact, either for instrumental or intrinsic reasons, then AIA might be more promising because once we develop an aligned AI (that continues to be aligned over time), the work of AIA is done and won’t need to be redone in the future. With MCE, assuming the advent of AI or similar developments won’t fix society’s values in place (known as “value lock-in”), then MCE progress could more easily be undone, especially if one believes there’s a social setpoint that humanity drifts back towards when moral progress is made.[25]

I think the assumptions of this argument make it quite weak: I’d guess an “intelligence explosion” has a significant chance of value lock-in,[26][27] and I don’t think there’s a setpoint in the sense that positive moral change increases the risk of negative moral change. I also don’t value robustness intrinsically at all or instrumentally very much; I think that there is so much uncertainty in all of these strategies and such weak prior beliefs[28] that differences in certainty of impact matter relatively little.

Miscellaneous

Work on either cause area runs the risk of backfiring. The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, meaning it’s also easier for humanity to produce an unaligned AI. The main risk for MCE seems to be that certain advocacy strategies will end up having the opposite of their intended effect, such as a confrontational protest for animal rights that ends up putting people off the cause.

It’s unclear which project has better near-term proxies and feedback loops to assess and increase long-term impact. AIA has technical problems with solutions that can be mathematically proven, but these might end up having little bearing on final AIA outcomes, such as if an AGI isn’t developed using the method that was advised or if technical solutions aren’t implemented by policy-makers. MCE has metrics like public attitudes and practices. My weak intuition here, and the weak intuition of other reasonable people I’ve discussed this with, is that MCE has better near-term proxies.

It’s unclear which project has more historical evidence that EAs can learn from to be more effective. AIA has previous scientific, mathematical, and philosophical research and technological successes and failures, while MCE has previous psychological, social, political, and economic research and advocacy successes and failures.

Finally, I do think that we learn a lot about tractability just by working directly on an issue. Given how little effort has gone into MCE itself (see Neglectedness below), I think we could resolve a significant amount of uncertainty with more work in the field.

Overall, considering only direct tractability (i.e. ignoring information value due to neglectedness, which would help other EAs with their cause prioritization), I’d guess AIA is a little more tractable.

Neglectedness

With neglectedness, we also face a challenge of how broadly to define the cause area. In this case, we have a fairly clear goal with our definition: to best assess how much low-hanging fruit is available. To me, it seems like there are two simple definitions that meet this goal: (i) organizations or individuals working explicitly on the cause area, (ii) organizations or individuals working on the strategies that are seen as top-tier by people focused on the cause area. How much one favors (i) versus (ii) depends largely on whether one thinks the top-tier strategies are fairly well-established and thus (ii) makes sense, or whether they will change over time such that one should favor (i) because those organizations and individuals will be better able to adjust.[29]

With the explicit focus definitions of AIA and MCE (recall this includes having a far future focus), it seems that MCE is much more neglected and has more low-hanging fruit.[30] For example, there is only one organization that I know of explicitly committed to MCE in the EA community (SI), while numerous organizations (MIRI, CHAI, part of FHI, part of CSER, even parts of AI capabilities organizations like Montreal Institute for Learning Algorithms, DeepMind, and OpenAI, etc.) are explicitly committed to AIA. Because MCE seems more neglected, we could learn a lot about MCE through SI’s initial work, such as how easily advocates have achieved MCE throughout history.

If we include those working on the cause area without an explicit focus, then that seems to widen the definition of MCE to include some of the top strategies being used to expand the moral circle in the near term, such as farmed animal work done by Animal Charity Evaluators and its top-recommended charities, which had a combined budget of around $7.5 million in 2016. The combined budget of top-tier AIA work is harder to estimate, but the Centre for Effective Altruism estimates all AIA work in 2016 was around $6.6 million. The AIA budgets seem to be increasing more quickly than the MCE budgets, especially given the grant-making of the Open Philanthropy Project. We could also include EA movement-building organizations that place a strong focus on reducing extinction risk, and even AIA specifically, such as 80,000 Hours. The categorization for MCE seems to have more room to broaden, perhaps all the way to mainstream animal advocacy strategies like the work of People for the Ethical Treatment of Animals (PETA), which might make AIA more neglected. (It could potentially go even farther, such as advocating for human sweatshop laborers, but that seems too far removed and I don’t know any MCE advocates who think it’s plausibly top-tier.)

I think there’s a difference in aptitude that suggests MCE is more neglected. Moral advocacy seems like a field in which, although it is quite crowded, it is relatively easy for deliberate, thoughtful people to vastly outperform the average advocate,[31] which can lead to surprisingly large impact (e.g. EAs have already had far more success in publishing their writing, such as books and op-eds, than most writers hope for).[32] Additionally, despite centuries of advocacy, very little quality research has been done to critically examine what advocacy is effective and what’s not, while the fields of math, computer science, and machine learning involve substantial self-reflection and are largely worked on by academics who seem to use more critical thinking than the average activist (e.g. there’s far more skepticism in these academic communities, a demand for rigor and experimentation that’s rarely seen among advocates). In general, I think the aptitude of the average social change advocate is much lower than that of the average technological researcher, suggesting MCE is more neglected, though of course other factors also count.

The relative neglectedness of MCE also seems likely to continue, given the greater self-interest humanity has in AIA relative to MCE and, in my opinion, the net biases towards AIA described in the Biases section of this blog post. (This self-interest argument is a particularly important consideration for prioritizing MCE over AIA in my view.[33])

However, while neglectedness is typically thought to make a project more tractable, it seems that existing work in the extinction risk space has made marginal contributions more impactful in some ways. For example, talented AI researchers can find work relatively easily at an organization dedicated to AIA, while the path for talented MCE researchers is far less clear and easy. This alludes to the difference in tractability that might exist between labor resources and funding resources, as it currently seems like MCE is much more funding-constrained[34] while AIA is largely talent-constrained.

As another example, there are already solid inroads between the AIA community and the AI decision-makers, and AI decision-makers have already expressed interest in AIA, suggesting that influencing them with research results will be fairly easy once those research results are in hand. This means both that our estimation of AIA’s neglectedness should decrease, and that our estimation of its non-neglectedness tractability should increase, in the sense that neglectedness is a part of tractability. (The definitions in this framework vary.)

All things considered, I find MCE to be more compelling from a neglectedness perspective, particularly due to the current EA resource allocation and the self-interest humanity has, and will most likely continue to have, in AIA. When I decided to focus on MCE, there was an even stronger case for neglectedness because no organization committed to that goal existed (SI was founded in 2017), though there was also a greater downside to MCE: the even more limited track record.

Cooperation

Values spreading as a far future intervention has been criticized on the following grounds: People have very different values, so trying to promote your values and change other people’s could be seen as uncooperative. Cooperation seems to be useful both directly (e.g. how willing are other people to help us out if we’re fighting them?) and in a broader sense because of superrationality, an argument that one should help others even when there’s no causal mechanism for reciprocation.[35]

I think this is certainly a good consideration against some forms of values spreading. For example, I don’t think it’d be wise for an MCE-focused EA to disrupt the Effective Altruism Global conferences (e.g. yell on stage and try to keep the conference from continuing) if they have an insufficient focus on MCE. This seems highly ineffective because of how uncooperative it is, given the EA space is supposed to be one for having challenging discussions and solving problems, not merely advocating one’s positions like a political rally.

However, I don’t think it holds much weight against MCE in particular for two reasons: First, because I don’t think MCE is particularly uncooperative. For example, I never bring up MCE with someone and hear, “But I like to keep my moral circle small!” I think this is because there are many different components of our attitudes and worldview that we refer to as values and morals. People have some deeply-held values that seem strongly resistant to change, such as their religion or the welfare of their immediate family, but very few people seem to have small moral circles as a deeply-held value. Instead, the small moral circle seems to mostly be a superficial, casual value (though it’s often connected to the deeper values) that people are okay with — or even happy about — changing.[36]

Second, insofar as MCE is uncooperative, I think a large number of other EA interventions, including AIA, are similarly uncooperative. Many people, even in the EA community, are concerned about, or even opposed to, AIA. For example, one might believe an aligned AI would create a worse far future than an unaligned AI, or think AIA is harmfully distracting from more important issues and gives EA a bad name. This isn’t to say I think AIA is bad because it’s uncooperative; on the contrary, this seems like a level of uncooperativeness that’s often necessary for dedicated EAs. (In a trivial way, basically all action involves uncooperativeness because it’s always about changing the status quo or preventing the status quo from changing.[37] Even inaction can involve uncooperativeness if it means not working to help someone who would like your help.)

I do think it’s more important to be cooperative in some other situations, such as if one has a very different value system than some of their colleagues, as might be the case for the Foundational Research Institute, which advocates strongly for cooperation with other EAs.

Cooperation with future do-gooders

Another argument against values spreading goes something like, “We can worry about values after we’ve safely developed AGI. Our tradeoff isn’t, ‘Should we work on values or AI?’ but instead ‘Should we work on AI now and values later, or values now and maybe AI later if there’s time?’”

I agree with one interpretation of the first part of this argument, that urgency is an important factor and AIA does seem like a time-sensitive cause area. However, I think MCE is similarly time-sensitive because of risks of value lock-in where our descendants’ morality becomes much harder to change, such as if AI designers choose to fix the values of an AGI, or at least to make them independent of other people’s opinions (they could still be amenable to self-reflection of the designer and new empirical data about the universe other than people’s opinions)[38]; if humanity sends out colonization vessels across the universe that are traveling too fast for us to adjust based on our changing moral views; or if society just becomes too wide and disparate to have effective social change mechanisms like we do today on Earth.

I disagree with the stronger interpretation, that we can count on some sort of cooperation with or control over future people. There might be some extent to which we can do this, such as via superrationality, but that seems like a fairly weak effect. Instead, I think we’re largely on our own, deciding what we do in the next few years (or perhaps in our whole career), and just making our best guess of what future people will do. It sounds very difficult to strike a deal with them that will ensure they work on MCE in exchange for us working on AIA.

Bias

I’m always cautious about bringing considerations of bias into an important discussion like this. Considerations easily turn into messy, personal attacks, and often you can fling roughly-equal considerations of counter-biases when accusations of bias are hurled at you. However, I think we should give them serious consideration in this case. First, I want to be exhaustive in this blog post, and that means throwing every consideration on the table, even messy ones. Second, my own cause prioritization “journey” led me first to AIA and other non-MCE/non-animal-advocacy EA priorities (mainly EA movement-building), and it was considerations of bias that allowed me to look at the object-level arguments with fresh eyes and decide that I had been way off in my previous assessment.

Third and most importantly, people’s views on this topic are inevitably driven mostly by intuitive, subjective judgment calls. One could easily read everything I’ve written in this post and say they lean in the MCE direction on every topic, or the AIA direction, and there would be little object-level criticism one could make against that if they just based their view on a different intuitive synthesis of the considerations. This subjectivity is dangerous, but it is also humbling. It requires us to take an honest look at our own thought processes in order to avoid the subtle, irrational effects that might push us in either direction. It also requires caution when evaluating “expert” judgment, given how much experts could be affected by personal and social biases themselves.

The best way I know of to think about bias in this case is to consider the biases and other factors that favor either cause area and see which case seems more powerful, or which particular biases might be affecting our own views. The following lists are presumably not exhaustive but lay out what I think are some common key parts of people’s journeys to AIA or MCE. Of course, these factors are not entirely deterministic and probably not all will apply to you, nor do they necessarily mean that you are wrong in your cause prioritization. Based on the circumstances that apply more to you, consider taking a more skeptical look at the project you favor and your current views on the object-level arguments for it.

One might be biased towards AIA if...

They eat animal products, and thus assign lower moral value and fewer mental faculties to animals.

They haven’t accounted for the bias of speciesism.

They lack personal connections to animals, such as growing up with pets.

They are or have been a fan of science fiction and fantasy literature and media, especially if they dreamed of being the hero.

They have a tendency towards technical research over social projects.

They lack social skills.

They are inclined towards philosophy and mathematics.

They have a negative perception of activists, perhaps seeing them as hippies, irrational, idealistic, “social justice warriors,” or overly emotion-driven.

They are a part of the EA community, and therefore drift towards the status quo of EA leaders and peers. (The views of EA leaders can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

The idea of “saving the world” appeals to them.

They take pride in their intelligence, and would love if they could save the world just by doing brilliant technical research.

They are competitive, and like the feeling/mindset of doing astronomically more good than the average do-gooder, or even the average EA. (I’ve argued in this post that MCE has this astronomical impact, but it lacks the feeling of literally “saving the world” or otherwise having a clear impact that makes a good hero’s journey climax, and it’s closely tied to lesser, near-term impacts.)

They have little personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

They have little personal experience of oppression, such as due to their gender, race, disabilities, etc.

They are generally a happy person.

They are generally optimistic, or at least averse to thinking about bad outcomes like how humanity could cause astronomical suffering. (Though some pessimism is required for AIA in the sense that they don’t count on AI capabilities researchers ending up with an aligned AI without their help.)

One might be biased towards MCE if...

They are vegan, especially if they went vegan for non-animal or non-far-future reasons, such as for better personal health.

Their gut reaction when they hear about extinction risk or AI risk is to judge it nonsensical.

They have personal connections to animals, such as growing up with pets.

They are or have been a fan of social movement/activism literature and media, especially if they dreamed of being a movement leader.

They have a tendency towards social projects over technical research.

They have benefitted from above-average social skills.

They are inclined towards social science.

They have a positive perception of activists, perhaps seeing them as the true leaders of history.

They have social ties to vegans and animal advocates. (The views of these people can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

The idea of “helping the worst off” appeals to them.

They take pride in their social skills, and would love if they could help the worst off just by being socially savvy.

They are not competitive, and like the thought of being a part of a friendly social movement.

They have a lot of personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

They have a lot of personal experience of oppression, such as due to their gender, race, disabilities, etc.

They are generally an unhappy person.

They are generally pessimistic, or at least don’t like thinking about good outcomes. (Though some optimism is required for MCE in the sense that they believe work on MCE can make a large positive difference in social attitudes and behavior.)

They care a lot about directly seeing the impact of their work, even if the bulk of their impact is hard to see. (E.g. seeing improvements in the conditions of farmed animals, which can be seen as a proxy for helping farmed-animal-like beings in the far future.)

Implications

I personally found myself far more compelled towards AIA in my early involvement with EA before I had thought in detail about the issues discussed in this post. I think the list items in the AIA section apply to me much more strongly than the MCE list. When I considered these biases, in particular speciesism and my desire to follow the status quo of my EA friends, a fresh look at the object-level arguments changed my mind.

From my reading and conversations in EA, I think the biases in favor of AIA are also quite a bit stronger in the community, though of course some EAs — mainly those already working on animal issues for near-term reasons — probably feel a stronger pull in the other direction.

How you think about these bias considerations also depends on how biased you think the average EA is. If you, for example, think EAs tend to be quite biased in another way like “measurement bias” or “quantifiability bias” (a tendency to focus too much on easily-quantifiable, low-risk interventions), then considerations of biases on this topic should probably be more compelling to you than they will be to people who think EAs are less biased.

Notes

[1] This post attempts to compare these cause areas overall, but since that’s sometimes too vague, I specifically mean the strategies within each cause area that seem most promising. I think this is basically equal to “what EAs working on MCE most strongly prioritize” and “what EAs working on AIA most strongly prioritize.”

[2] There’s a sense in which AIA is a form of MCE simply because AIA will tend to lead to certain values. I’m excluding that AIA approach of MCE from my analysis here to avoid overlap between these two cause areas.

[3] Depending on how close we’re talking about, this could be quite unlikely. If we’re discussing the range of outcomes from dystopia across the universe to utopia across the universe, then a range like “between modern earth and the opposite value of modern earth” seems like a very tiny fraction of the total possible range.

[4] I mean “good” in a “positive impact” sense here, so it includes not just rationality according to the decision-maker but also value alignment, luck, being empirically well-informed, being capable of doing good things, etc.

[5] One reason for optimism is that you might think most extinction risk is in the next few years, such that you and other EAs you know today will still be around to do this research yourselves and make good decisions after those risks are avoided.

[6] Technically one could believe the far future is negative but also that humans will make good decisions about extinction, such as if one believes the far future (given non-extinction) will be bad only due to nonhuman forces, such as aliens or evolutionary trends, but has optimism about human decision-making, including both that humans will make good decisions about extinction and that they will be logistically able to make those decisions. I think this is an unlikely view to settle on, but it would make option value a good thing in a “close to zero” scenario.

[7] Non-extinct civilizations could be maximized for happiness, maximized for interestingness, set up like Star Wars or another sci-fi scenario, etc. while extinct civilizations would all be devoid of sentient beings, perhaps with some variation in physical structure like different planets or remnant structures of human civilization.

[8] My views on this are currently largely qualitative, but if I had to put a number on the word “significant” in this context, it’d be somewhere around 5-30%. This is a very intuitive estimate, and I’m not prepared to justify it.

[9] Paul Christiano made a general argument in favor of humanity reaching good values in the long run due to reflection in his post “Against Moral Advocacy” (see the “Optimism about reflection” section), though he doesn’t specifically address concern for all sentient beings as a potential outcome, which might be less likely than other good values that are more driven by cooperation.

[10] Nick Bostrom has considered some of these risks of artificial suffering using the term “mind crime,” which specifically refers to harming sentient beings created inside a superintelligence. See his book, Superintelligence.

[11] The Foundational Research Institute has written about risks of astronomical suffering in “Reducing Risks of Astronomical Suffering: A Neglected Priority.” The TV series Black Mirror is an interesting dramatic exploration of how the far future could involve vast amounts of suffering, such as the episodes “White Christmas” and “USS Callister.” Of course, the details of these situations often veer towards entertainment over realism, but their exploration of the potential for dystopias in which people abuse sentient digital entities is thought-provoking.

[12] I’m highly uncertain about what sort of motivations (like happiness and suffering in humans) future digital sentient beings will have. For example, is punishment being a stronger motivator in earth-originating life just an evolutionary fluke that we can expect to dissipate in artificial beings? Could they be just as motivated to attain reward as we are to avoid punishment? I think this is a promising avenue for future research, and I’m glad it’s being discussed by some EAs.

[13] Brian Tomasik discusses this in his essay on “Values Spreading is Often More Important than Extinction Risk,” suggesting that, “there's not an obvious similar mechanism pushing organisms toward the things that I care about.” However, Paul Christiano notes in “Against Moral Advocacy” that he expects “[c]onvergence of values” because “the space of all human values is not very broad,” though this seems quite dependent on how one defines the possible space of values.

[14] This efficiency argument is also discussed in Ben West’s article on “An Argument for Why the Future May Be Good.”

[15] The term “resources” is intentionally quite broad. This means whatever the limitations are on the ability to produce happiness and suffering, such as energy or computation.

[16] One can also create hedonium as a promise to get things from rivals, but promises seem less common than threats because threats tend to be more motivating and easier to implement (e.g. it’s easier to destroy than create). However, some social norms encourage promises over threats because promises are better for society as a whole. Additionally, threats against powerful beings (e.g. other citizens in the same country) do less than threats against less powerful or more distant beings, and the latter category might be increasingly common in the future. Finally, threats and promises matter less when one considers that they are often unfulfilled because the other party doesn’t do the action that was the subject of the threat or promise.

[17] Paul Christiano’s blog post on “Why might the future be good?” argues that “the future will be characterized by much higher influence for altruistic values [than self-interest],” though he seems to just be discussing the potential of altruism and self-interest to create positive value, rather than their potential to create negative value.

Brian Tomasik discusses Christiano’s argument and others in “The Future of Darwinism” and concludes, “Whether the future will be determined by Darwinism or the deliberate decisions of a unified governing structure remains unclear.”

[18] One discussion of changes in morality on a large scale is Robin Hanson’s blog post, “Forager, Farmer Morals.”

[19] Armchair research is relatively easy, in the sense that all it requires is writing and thinking rather than also digging through historical texts, running scientific studies, or engaging in substantial conversation with advocates, researchers, and/or other stakeholders. It’s also more similar to the mathematical and philosophical work that most EAs are used to doing. And it’s more attractive as a demonstration of personal prowess to think your way into a crucial consideration than to arrive at one through the tedious work of research. (These reasons are similar to the reasons I feel most far-future-focused EAs are biased towards AIA over MCE.)

[20] These sentient beings probably won’t be the biological animals we know today, but instead digital beings who can more efficiently achieve the AI’s goals.

[21] The neglectedness heuristic involves a similar messiness of definitions, but the choices seem less arbitrary to me, and the different definitions lead to more similar results.

[22] Arguably this consideration should be under Tractability rather than Scale.

[23] There’s a related framing here of “leverage,” with the basic argument being that AIA seems more compelling than MCE because AIA is specifically targeted at an important, narrow far future factor (the development of AGI) while MCE is not as specifically targeted. This also suggests that we should consider specific MCE tactics focused on important, narrow far future factors, such as ensuring the AI decision-makers have wide moral circles even if the rest of society lags behind. I find this argument fairly compelling, including the implication that MCE advocates should focus more on advocating for digital sentience and advocating in the EA community than they would otherwise.

[24] Though plausibly MCE involves only influencing a few decision-makers, such as the designers of an AGI.

[25] Brian Tomasik discusses this in, “Values Spreading is Often More Important than Extinction Risk,” arguing that, “Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be "locked in" indefinitely. Even absent a singleton, as long as the vastness of space allows for distinct regions to execute on their own values without take-over by other powers, then we don't even need a singleton; we just need goal-preservation mechanisms.”

[26] Brian Tomasik discusses the likelihood of value lock-in in his essay, “Will Future Civilization Eventually Achieve Goal Preservation?”

[27] The advent of AGI seems like it will have similar effects on the lock-in of values and alignment, so if you think AI timelines are shorter (i.e. advanced AI will be developed sooner), then that increases the urgency of both cause areas. If you think timelines are so short that we will struggle to successfully reach AI alignment, then that decreases the tractability of AIA, but MCE seems like it could more easily have a partial effect on AI outcomes than AIA could.

[28] In the case of near-term, direct interventions, one might believe that “most social programmes don’t work,” which suggests that we should have low, strong priors for intervention effectiveness that we need robustness to overcome.

[29] Caspar Oesterheld discusses the ambiguity of neglectedness definitions in his blog post, "Complications in evaluating neglectedness." Other EAs have also raised concern about this commonly-used heuristic, and I almost included this content in this post under the “Tractability” section for this reason.

[30] This is a fairly intuitive sense of the word “matched.” I’m taking the topic of ways to affect the far future, dividing it into population risk and quality risk categories, then treating AIA and MCE as subcategories of each. I’m also thinking in terms of each project (AIA and MCE) being in the category of “cause areas with at least pretty good arguments in their favor,” and I think “put decent resources into all such projects until the arguments are rebutted” is a good approach for the EA community.

[31] I mean “advocate” quite broadly here, just anyone working to effect social change, such as people submitting op-eds to newspapers or trying to get pedestrians to look at their protest or take their leaflets.

[32] It’s unclear what the explanation is for this. It could just be demographic differences such as high IQ, going to elite universities, etc. but it could also be exceptional “rationality skills” like finding loopholes in the publishing system.

[33] In Brian Tomasik’s essay on “Values Spreading is Often More Important than Extinction Risk,” he argues that “[m]ost people want to prevent extinction” while, “In contrast, you may have particular things that you value that aren't widely shared. These things might be easy to create, and the intuition that they matter is probably not too hard to spread. Thus, it seems likely that you would have higher leverage in spreading your own values than in working on safety measures against extinction.”

[34] This is just my personal impression from working in MCE, especially with my organization Sentience Institute. With indirect work, The Good Food Institute is a potential exception since they have struggled to quickly hire talented people after receiving large amounts of funding.

[35] See “Superrationality” in “Reasons to Be Nice to Other Value Systems” for an EA introduction to the idea. See “In favor of ‘being nice’” in “Against Moral Advocacy” as an example of cooperation as an argument against values spreading. In “Multiverse-wide Cooperation via Correlated Decision Making,” Caspar Oesterheld argues that superrational cooperation makes MCE more important.

[36] This discussion is complicated by the widely varying degrees of MCE. While, for example, most US residents seem perfectly okay with expanding concern to vertebrates, there would be more opposition to expanding to insects, and even more to some simple computer programs that some argue should fit into the edges of our moral circles. I do think the farthest expansions are much less cooperative in this sense, though if the message is just framed as, “expand our moral circle to all sentient beings,” I still expect strong agreement.

[37] One exception is a situation where everyone wants a change to happen, but nobody else wants it badly enough to put the work into changing the status quo.

[38] My impression is that the AI safety community currently wants to avoid fixing these values, though they might still be trying to make them resistant to advocacy from other people, and in general I think many people today would prefer to fix the values of an AGI when they consider that they might not agree with potential future values.

Comments (72)

seanrson2y17

Review for the Decade Review

I come back to this post quite frequently when considering whether to prioritize MCE (via animal advocacy) or AI safety. It seems that these two cause areas often attract quite different people with quite different objectives, so this post is unique in its attempt to compare the two based on the same long-term considerations.

I especially like the discussion of bias. Although some might find the whole discussion a bit ad hominem, I think people in EA should take seriously the worry that certain features common in the EA community (e.g., an attraction towards abstract puzzles) might bias us towards particular cause areas.

I recommend this post for anyone interested in thinking more broadly about longtermism.

Gregory Lewis6y32

Thank you for writing this post. An evergreen difficulty that applies to discussing topics of such a broad scope is the large number of matters that are relevant, difficult to judge, and where one's judgement (whatever it may be) can be reasonably challenged. I hope to offer a crisper summary of why I am not persuaded.

I understand from this the primary motivation of MCE is avoiding AI-based dystopias, with the implied causal chain being along the lines of, “If we ensure the humans generating the AI have a broader circle of moral concern, the resulting post-human civilization is less likely to include dystopic scenarios involving great multitudes of suffering sentiences.”

There are two considerations that speak against this being a greater priority than AI alignment research: 1) Back-chaining from AI dystopias leaves relatively few occasions where MCE would make a crucial difference. 2) The current portfolio of ‘EA-based’ MCE is poorly addressed to averting AI-based dystopias.

Re. 1): MCE may prove neither necessary nor sufficient for ensuring AI goes well. On one hand, AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such ... (read more)

mic3y17

In Stuart Russell's Human Compatible (2019), he advocates for AGI to follow preference utilitarianism, maximally satisfying the values of humans. As for animal interests, he seems to think that they are sufficiently represented since he writes that they will be valued by the AI insofar as humans care about them. Reading this from Stuart Russell shifted me toward thinking that moral circle expansion probably does matter for the long-term future. It seems quite plausible (likely?) that AGI will follow this kind of value function which does not directly care about animals rather than broadly anti-speciesist values, since AI researchers are not generally anti-speciesists. In this case, moral circle expansion across the general population would be essential.

(Another factor is that Russell's reward modeling depends on receiving feedback occasionally from humans to learn their preferences, which is much more difficult to do with animals. Thus, under an approach similar to reward modeling, AGI developers probably won't bother to directly include animal preferences, when that involves all the extra work of figuring out how to get the AI to discern animal preferences. And how many AI researc... (read more)

Jacy6y28

Those considerations make sense. I don't have much more to add for/against than what I said in the post.

On the comparison between different MCE strategies, I'm pretty uncertain which are best. The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society. I'm not relatively very worried about, for example, far future dystopias where dog-and-cat-like-beings (e.g. small, entertaining AIs kept around for companionship) are suffering in vast numbers. And environmentalism is typically advocating for non-sentient beings, which I think is quite different than MCE for sentient beings.

I think the better competitors to farmed animal advocacy are advocating broadly for antispeciesism/fundamental rights (e.g. Nonhuman Rights Project) and advocating specifically for digital sentience (e.g. a larger, more sophisticated version of People for the Ethical Treatment of Reinforcement L... (read more)

Pablo6y22

The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society.

Wild animal advocacy is far more neglected than farmed animal advocacy, and it involves even larger numbers of sentient beings ignored by most of society. If the superiority of farmed animal advocacy over global poverty along these two dimensions is a sufficient reason for not working on global poverty, why isn't the superiority of wild animal advocacy over farmed animal advocacy along those same dimensions also a sufficient reason for not working on farmed animal advocacy?

Jacy6y19

I personally don't think WAS is as similar to the most plausible far future dystopias, so I've been prioritizing it less even over just the past couple of years. I don't expect far future dystopias to involve as much naturogenic (nature-caused) suffering, though of course it's possible (e.g. if humans create large numbers of sentient beings in a simulation, but then let the simulation run on its own for a while, then the simulation could come to be viewed as naturogenic-ish and those attitudes could become more relevant).

I think if one wants something very neglected, digital sentience advocacy is basically across-the-board better than WAS advocacy.

That being said, I'm highly uncertain here and these reasons aren't overwhelming (e.g. WAS advocacy pushes on more than just the "care about naturogenic suffering" lever), so I think WAS advocacy is still, in Gregory's words, an important part of the 'far future portfolio.' And often one can work on it while working on other things, e.g. I think Animal Charity Evaluators' WAS content (e.g. [guest blog post by Oscar Horta](https://animalcharityevaluators.org/blog/why-the-situation-of-animals-in-the-wild-should-concern-us/)) has helped them be more well-rounded as an organization, and didn't directly trade off with their farmed animal content.

saulius

But humanity/AI is likely to expand to other planets. Won't those planets need to have complex ecosystems that could involve a lot of suffering? Or do you think it will all be done with some fancy tech that'll be too different from today's wildlife for it to be relevant? It's true that those ecosystems would (mostly?) be non-naturogenic, but I'm not that sure that people would care about them; it'd still be animals/diseases/hunger/etc. hurting animals. Maybe it'd be easier to engineer an ecosystem without predation and diseases, but that is a non-trivial assumption and suffering could then arise in other ways. Also, some humans want to spread life to other planets for its own sake, and relatively few people need to want that to cause a lot of suffering if no one works on preventing it. This could be less relevant if you think that most of the expected value comes from simulations that won't involve ecosystems.

Jacy6y10

Yes, terraforming is a big way in which close-to-WAS scenarios could arise. I do think it's smaller in expectation than digital environments that develop on their own and thus are close-to-WAS.

I don't think terraforming would be done very differently than today's wildlife, e.g. done without predation and diseases.

Ultimately I still think the digital, not-close-to-WAS scenarios seem much larger in expectation.

Evan_Gaensbauer6y12

Thanks for funding this research. Notes:

Ostensibly it seems like much of Sentience Institute's (SI) current research is focused on identifying those MCE strategies which historically have turned out to be more effective among the strategies which have been tried. I think SI as an organization is based on the experience of EA as a movement in having significant success with MCE in a relatively short period of time. Successfully spreading the meme of effective giving; increasing concern for the far future in notable ways; and corporate animal welfare campaigns are all dramatic achievements for a young social movement like EA. While these aren't on the scale of shaping MCE over the course of the far future, these achievements make it seem more possible that EA and allied movements can have an outsized impact by pursuing neglected strategies for values-spreading.
On terminology, to say the focus is on non-human animals, or even on the moral patients which typically come to mind when describing 'animal-like' minds (i.e., familiar vertebrates), is inaccurate. "Sentient being", "moral patient" or "non-human agents/beings" are terms which are inclusive of non-human animals and of the other types of potential moral patients posited. Admittedly these aren't catchy terms.

Ben_West

This is something I also struggle with in understanding the post. It seems like we need:

1. AI creators can be convinced to expand their moral circle
2. Despite (1), they do not wish to be convinced to expand their moral circle
3. The AI follows this second desire to not be convinced to expand their moral circle

I imagine this happening with certain religious things; e.g. I could imagine someone saying "I wish to think the Bible is true even if I could be convinced that the Bible is false". But it seems relatively implausible with regards to MCE?

Particularly given that AI safety talks a lot about things like CEV, it is unclear to me whether there is really a strong trade-off between MCE and AIA.

(Note: Jacy and I discussed this via email and didn't really come to a consensus, so there's a good chance I am just misunderstanding his argument.)

Jacy6y18

Hm, yeah, I don't think I fully understand you here either, and this seems somewhat different than what we discussed via email.

My concern is with (2) in your list. "[T]hey do not wish to be convinced to expand their moral circle" is extremely ambiguous to me. Presumably you mean they -- without MCE advocacy being done -- wouldn't put in wide-MC* values or values that lead to wide-MC into an aligned AI. But I think it's being conflated with, "they actively oppose" or "they would answer 'no' if asked, 'Do you think your values are wrong when it comes to which moral beings deserve moral consideration?'"

I think they don't actively oppose it, they would mostly answer "no" to that question, and it's very uncertain if they will put the wide-MC-leading values into an aligned AI. I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

This leads me to think that you only need (2) to be true in a very weak sense for MCE to matter. I think it's quite plausible that this is the case.

*Wide-MC meaning an extremely wide moral circle, e.g. includes insects, small/weird digital minds.

William_S

Why do you think this is the case? Do you think there is an alternative reflection process (either implemented by an AI, by a human society, or a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like? If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn't dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative or at least parallel path to directly performing MCE.

Lukas_Gloor6y10

I think that there's an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of "BAAN thought experiment" as "we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive" then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.

That is not to say that trying to fine tune reflection processes is pointless: I think it's very important to think about what our desiderata should be for a CEV-like reflection process. I'm just saying that there will be tradeoffs between certain commonly mentioned desiderata that people don't realize are there because they think there is such a thing as "genuinely free and open-ended deliberation."

Jacy6y10

Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven't heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.

Brian_Tomasik6y15

I tend to think of moral values as being pretty contingent and pretty arbitrary, such that what values you start with makes a big difference to what values you end up with even on reflection. People may "imprint" on the values they receive from their culture to a greater or lesser degree.

I'm also skeptical that sophisticated philosophical-type reflection will have significant influence over posthuman values compared with more ordinary political/economic forces. I suppose philosophers have sometimes had big influences on human politics (religions, Marxism, the Enlightenment), though not necessarily in a clean "carefully consider lots of philosophical arguments and pick the best ones" kind of way.

Jacy6y11

I'd qualify this by adding that the philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces, despite overall having less influence than those other forces.

Daniel_Eth6y29

I thought this piece was good. I agree that MCE work is likely quite high impact - perhaps around the same level as X-risk work - and that it has been generally ignored by EAs. I also agree that it would be good for there to be more MCE work going forward. Here's my 2 cents:

You seem to be saying that AIA is a technical problem and MCE is a social problem. While I think there is something to this, I think there are very important technical and social sides to both of these. Much of the work related to AIA so far has been about raising awareness about the problem (eg the book Superintelligence), and this is more a social solution than a technical one. Also, avoiding a technological race for AGI seems important for AIA, and this also is more a social problem than a technical one.

For MCE, the 2 best things I can imagine (that I think are plausible) are both technical in nature. First, I expect clean meat will lead to the moral circle expanding more to animals. I really don't see any vegan social movement succeeding in ending factory farming anywhere near as much as I expect clean meat to. Second, I'd imagine that a mature science of consciousness would increase MCE significantly. Many ... (read more)

Jacy6y19

Thanks for the comment! A few of my thoughts on this:

Presumably we want some people working on both of these problems, some people have skills more suited to one than the other, and some people are just going to be more passionate about one than the other.

If one is convinced non-extinction civilization is net positive, this seems true and important. Sorry if I framed the post too much as one or the other for the whole community.

Much of the work related to AIA so far has been about raising awareness about the problem (eg the book Superintelligence), and this is more a social solution than a technical one.

Maybe. My impression from people working on AIA is that they see it as mostly technical, and indeed they think much of the social work has been net negative. Perhaps not Superintelligence, but at least the work that's been done to get media coverage and widespread attention without the technical attention to detail of Bostrom's book.

I think the more important social work (from a pro-AIA perspective) is about convincing AI decision-makers to use the technical results of AIA research, but my impression is that AIA proponents still think getting those technical results is proba... (read more)

Brian_Tomasik

I would guess that increasing understanding of cognitive science would generally increase people's moral circles if only because people would think more about these kinds of questions. Of course, understanding cognitive science is no guarantee that you'll conclude that animals matter, as we can see from people like Dennett, Yudkowsky, Peter Carruthers, etc.

Jacy

Agreed.

MikeJohnson

I think that's right. Specifically, I would advocate consciousness research as a foundation for principled moral circle expansion. I.e., if we do consciousness research correctly, the equations themselves will tell us how conscious insects are, whether algorithms can suffer, how much moral weight we should give animals, and so on. On the other hand, if there is no fact of the matter as to what is conscious, we're headed toward a very weird, very contentious future of conflicting/incompatible moral circles, with no 'ground truth' or shared principles to arbitrate disputes. Edit: I'd also like to thank Jacy for posting this- I find it a notable contribution to the space, and clearly a product of a lot of hard work and deep thought.

Larks6y22

Thanks for writing this, I thought it was a good article. And thanks to Greg for funding it.

My pushback would be on the cooperation and coordination point. It seems that a lot of other people, with other moral values, could make a very similar argument: that they need to promote their values now, as the stakes are very high with possible upcoming value lock-in. To people with those values, these arguments should seem roughly as important as the above argument is to you.

Christians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with sinners who will go to hell.
Egalitarians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with wider and wider diversities of wealth.
Libertarians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with property rights violations.
Naturalists could argue that, if the singularity is approaching, it is vitally important that we ensure the beauty of nature won't be bespoiled all over the universe.
Nationalists could argue that, if the singular

... (read more)

Jacy6y17

Yeah, I think that's basically right. I think moral circle expansion (MCE) is closer to your list items than extinction risk reduction (ERR) is because MCE mostly competes in the values space, while ERR mostly competes in the technology space.

However, MCE is competing in a narrower space than just values. It's in the MC space, which is just the space of advocacy on what our moral circle should look like. So I think it's fairly distinct from the list items in that sense, though you could still say they're in the same space because all advocacy competes for news coverage, ad buys, recruiting advocacy-oriented people, etc. (Technology projects could also compete for these things, though there are separations, e.g. journalists with a social beat versus journalists with a tech beat.)

I think the comparably narrow space of ERR is ER, which also includes people who don't want extinction risk reduced (or even want it increased), such as some hardcore environmentalists, antinatalists, and negative utilitarians.

I think these are legitimate cooperation/coordination perspectives, and it's not really clear to me how they add up. But in general, I think this matters mostly in situations where you... (read more)

Matthew_Barnett6y11

But it seems that it would be very bad if everyone took this advice literally.

Fortunately, not everyone does take this advice literally :).

This is very similar to the tragedy of the commons. If everyone acts out of their own self-motivated interests, then everyone will be worse off. However, the situation as you described does not fully reflect reality because none of the groups you mentioned are actually trying to influence AI researchers at the moment. Therefore, MCE has a decisive advantage. Of course, this is always subject to change.

In contrast, preventing the extinction of humanity seems to occupy a privileged position

I find that it is often the case that people will dismiss any specific moral recommendation for AI except this one. Personally I don't see a reason to think that there are certain universal principles of minimal alignment. You may argue that human extinction is something that almost everyone agrees is bad -- but now the principle of minimal alignment has shifted to "have the AI prevent things that almost everyone agrees is bad" which is another privileged moral judgement that I see no intrinsic reason to hold.

In truth, I see no neutral assumptions to ground AI alignment theory in. I think this is made even more difficult because even relatively small differences in moral theory from the point of view of information theoretic descriptions of moral values can lead to drastically different outcomes. However, I do find hope in moral compromise.

Evan_Gaensbauer6y10

As EA as a movement has grown so far, the community appears to converge upon a rationalization process whereby most of us have realized what is centrally morally important is the experiences of well-being of a relatively wide breadth of moral patients, and the relatively equal moral weight assigned to well-being of each moral patient. The difference between SI and those who focus on AIA is primarily their differing estimates of the expected value of the far future in terms of average or total well-being. Among the examples you provided, it seems some worldviews are more amenable to the rationalization process which lends itself to consequentialism and EA. Many community members were egalitarians and libertarians who find common cause now in trying to figure out whether to focus on AIA or MCE. I think your point is important in that ultimately advocating for this type of values spreading could be bad. However, what appears to be an extreme amount of diversity could end up looking less fraught in a competition among values as divergent worldviews converge on similar goals.

Since different types of worldviews, like any amenable to aggregate consequentialist frameworks, can collate around a sing... (read more)

John_Maxwell6y21

Thanks for this post. Some scattered thoughts:

The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, meaning it’s also easier for humanity to produce an unaligned AI.

This doesn't seem like a big consideration to me. Even if unfriendly AI comes sooner by an entire decade, this matters little on a cosmic timescale. An argument I find more compelling: If we plot the expected utility of an AGI as a function of the amount of effort put into aligning it, there might be a "valley of bad alignment" that is worse than no attempt at alignment at all. (A paperclip maximizer will quickly kill us and not generate much long-term suffering, whereas an AI that understands the importance of human survival but doesn't understand any other values will imprison us for all eternity. Something like that.)

I'd like to know more about why people think that our moral circles have expanded. I suspect activism plays a smaller role than you think. Steven Pinker talks about possible reasons for declining violence in his book The Better Angels of Our Nature. I'm guessing this is highly relat... (read more)

Lukas_Gloor5y12

Next to the counterpoints mentioned by Gregory Lewis, I think there is an additional reason why MCE seems less effective than more targeted interventions to improve the quality of the long-term future: Gains from trade between humans with different values become easier to implement as the reach of technology increases. As long as a non-trivial fraction of humans end up caring about animal wellbeing or digital minds, it seems likely it would be cheap for other coalitions to offer trades. So whether 10% of future people end up with an expanded moral circle or 100% may not make much of a difference to the outcome: It will be reasonably good either way if people reap the gains from trade.

One might object that it is unlikely that humans would be able to cooperate efficiently, given that we don't see this type of cooperation happening today. However, I think it's reasonable to assume that staying in control of technological progress beyond the AGI transition requires a degree of wisdom and foresight that is very far away from where most societal groups are at today. And if humans do stay in control, then finding a good solution for value disagreements may be the easier problem... (read more)

Brian_Tomasik

Interesting points. :) I think there could be substantial differences in policy between 10% support and 100% support for MCE depending on the costs of appeasing this faction and how passionate it is. Or between 1% and 10% support for MCE applied to more fringe entities. I'm not sure if sophistication increases convergence. :) If anything, people who think more about philosophy tend to diverge more and more from commonsense moral assumptions. Yudkowsky and I seem to share the same metaphysics of consciousness and have both thought about the topic in depth, yet we occupy almost antipodal positions on the question of how many entities we consider moral patients. I tend to assume that one's starting points matter a lot for what views one ends up with.

William_S

I agree with this. It seems like the world where Moral Circle Expansion is useful is the world where:

1. The creators of AI are philosophically sophisticated (or persuadable) enough to expand their moral circle if they are exposed to the right arguments or work is put into persuading them.
2. They are not philosophically sophisticated enough to realize the arguments for expanding the moral circle on their own (seems plausible).
3. They are not philosophically sophisticated enough to realize that they might want to consider a distribution of arguments that they could have faced and could have persuaded them about what is morally right, and design AI with this in mind (ie CEV), or with the goal of achieving a period of reflection where they can sort out the sort of arguments that they would want to consider.

I think I'd prefer pushing on point 3, as it also encompasses a bunch of other potential philosophical mistakes that AI creators could make.

nonn6y13

I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]

I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings

So I'm curious for your thoughts. I see this concern about "incidental suffering of worker-agents" stated frequently, which may be likely in many future scenarios. However, it doesn't seem to be a crucial consideration, specifically because I care about small/weird minds with non-complex experiences (your first consideration).

Caring about small minds seems to imply that "Opportunity Cost/Lost Risks" are the dominant consideration - if small minds have moral value comparable to large minds, then... (read more)

avacyn6y12

This post is extremely valuable - thank you! You have caused me to reexamine my views about the expected value of the far future.

What do you think are the best levers for expanding the moral circle, besides donating to SI? Is there anything else outside of conventional EAA?

Jacy6y11

Thanks! That's very kind of you.

I'm pretty uncertain about the best levers, and I think research can help a lot with that. Tentatively, I do think that MCE ends up aligning fairly well with conventional EAA (perhaps it should be unsurprising that the most important levers to push on for near-term values are also most important for long-term values, though it depends on how narrowly you're drawing the lines).

A few exceptions to that:

Digital sentience probably matters the most in the long run. There are good reasons to be skeptical we should be advocating for this now (e.g. it's quite outside of the mainstream so it might be hard to actually get attention and change minds; it'd probably be hard to get funding for this sort of advocacy (indeed that's one big reason SI started with farmed animal advocacy)), but I'm pretty compelled by the general claim, "If you think X value is what matters most in the long-term, your default approach should be working on X directly." Advocating for digital sentience is of course neglected territory, but Sentience Institute, the Nonhuman Rights Project, and Animal Ethics have all worked on it. People for the Ethical Treatment of Reinforceme

MichaelPlant

I thought this was very interesting, thanks for writing it up. Two comments:

1) It was useful to have a list of reasons why you think the EV of the future could be around zero, but I still found it quite vague/hard to imagine - why exactly would more powerful minds be mistreating less powerful minds? etc. - so I'd have liked to see that sketched in slightly more depth.
2) It's not obvious to me it's correct/charitable to draw the neglectedness of MCE so narrowly. Can't we conceive of a huge amount of moral philosophy, as well as social activism, both new and old, as MCE? Isn't all EA outreach an indirect form of MCE?

Jacy

I'm sympathetic to both of those points personally.

1) I considered that, and in addition to time constraints, I know others haven't written on this because there's a big concern that talking about it makes it more likely to happen. I err more towards sharing it despite this concern, but I'm pretty uncertain. Even the detail of this post was more than several people wanted me to include.

But mostly, I'm just limited on time.

2) That's reasonable. I think all of these boundaries are fairly arbitrary; we just need to try to use the same standards across cause areas, e.g. considering only work with this as its explicit focus. Theoretically, since Neglectedness is basically just a heuristic to estimate how much low-hanging fruit there is, we're aiming at "The space of work that might take such low-hanging fruit away." In this sense, Neglectedness could vary widely. E.g. there's limited room for advocating (e.g. passing out leaflets, giving lectures) directly to AI researchers, but this isn't affected much by advocacy towards the general population.

I do think moral philosophy that leads to expanding moral circles (e.g. writing papers supportive of utilitarianism), moral-circle-foc...

Matthew_Barnett

A very interesting and engaging article indeed.

I agree that people often underestimate the value of strategic value spreading. Oftentimes, proposed moral models that AI agents will follow have some lingering narrowness to them, even when they attempt to apply the broadest of moral principles. For instance, in Chapter 14 of Superintelligence, Bostrom highlights his common good principle:

Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals.

Clearly, even something as broad as tha...

Evan_Gaensbauer

Historically it doesn't seem to be true. As AIA becomes more mainstream, it'll be attracting a wider diversity of people, which may induce a form of common grounding and normalization of the values in the community. We should be looking for opportunities to collect data on this in the future to see how attitudes within AIA change. Of course this could lead to attempts to directly influence the proportionate representation of different values within EA. That'd be prone to all the hazards of an internal tug of war pointed out in other comments on this post. Because the vast majority of the EA movement focused on the impact of advanced AI on the far future are relatively coordinated and have sufficiently similar goals, there isn't much risk of internal fracture in the near future. I think organizations from MIRI to FRI are also averse to growing AIA in ways which drive the trajectory of the field away from what EA currently values.

Michael_S

On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.

This is the main reason I think the far future is high EV. I think we should be focusing on p(Hedonium) and p(Dolorium) more than anything else. I'm skeptical that, from a hedonistic utilitarian perspective, byproducts of civilization could come close to matching the expected value from deliberately tiling the universe (potentially multiverse) with consciousness optimized for pleasure or pain. If p(H)>p(D), the future of humanity is very likely positive EV.

ElizabethBarnes

You say

I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources

Could you elaborate more on why this is the case? I would tend to think that a prior would be that they're equal, and then you update on the fact that they seem to be asymmetrical, and try to work out why that is the case, and whether those factors will apply in future. They could be fundamentally asymmetrical, or evolutionary pressures may tend...

itaibn

My current position is that the amount of pleasure/suffering that conscious entities will experience in a far-future technological civilization will not be well-defined. Some arguments for this:

Generally utility functions or reward functions are invariant under affine transformations (with suitable rescaling for the learning rate for reward functions). Therefore they cannot be compared between different intelligent agents as a measure of pleasure.
The clean separation of our civilization into many different individuals is an artifact of how evolution op

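To make the first argument above concrete, here is a minimal Python sketch. The agent, its options, and all numbers are hypothetical and are not taken from the comment; the sketch only assumes the standard point that a positive affine transformation of a utility function (u -> a*u + b with a > 0) leaves every choice unchanged, so the raw numbers cannot serve as a measure of pleasure that is comparable between agents.

```python
# Hypothetical illustration: an agent's choices are invariant under positive
# affine transformations of its utility function, so raw utility values say
# nothing that can be compared across agents. All names and numbers are made up.

def best_option(utility):
    """Return the option with the highest utility (ties broken by name)."""
    return max(sorted(utility), key=lambda option: utility[option])

def affine(utility, scale, shift):
    """Apply u -> scale * u + shift to every option; scale must be positive."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve preferences")
    return {option: scale * value + shift for option, value in utility.items()}

agent = {"rest": 1.0, "work": 3.0, "play": 2.0}
agent_rescaled = affine(agent, scale=1000.0, shift=-5000.0)

# The preferences, and therefore the behaviour, are identical ...
same_choice = best_option(agent) == best_option(agent_rescaled)

# ... yet a naive "total utility" differs in size and even in sign.
total_original = sum(agent.values())
total_rescaled = sum(agent_rescaled.values())
```

Both representations describe the same agent, so any attempt to read "how much pleasure" off the numbers would need an extra normalization assumption that the utility function itself does not supply.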

MikeJohnson

Possibly the biggest unknown in ethics is whether bits matter, or whether atoms matter. If you assume bits matter, then I think this naturally leads into a concept cluster where speaking about utility functions, preference satisfaction, complexity of value, etc, makes sense. You also get a lot of weird unresolved thought-experiments like homomorphic encryption. If you assume atoms matter, I think this subtly but unavoidably leads to a very different concept cluster-- qualia turns out to be a natural kind instead of a leaky reification, for instance. Talking about the 'unity of value thesis' makes more sense than talking about the 'complexity of value thesis'. TL;DR: I think you're right that if we assume computationalism/functionalism is true, then pleasure and suffering are inherently ill-defined, not crisp. They do seem well-definable if we assume physicalism is true, though.

itaibn

Thanks for reminding me that I was implicitly assuming computationalism. Nonetheless, I don't think physicalism substantially affects the situation. My arguments #2 and #4 stand unaffected; you have not backed up your claim that qualia is a natural kind under physicalism. While it's true that physicalism gives clear answers for the value of two identical systems or a system simulated with homomorphic encryption, it may still be possible to have quantum computations involving physically instantiated conscious beings, by isolating the physical environment of this being and running the CPT reversal of this physical system after an output has been extracted to maintain coherence. Finally, physicalism adds its own questions, namely, given a bunch of physical systems that all appear to have behavior that appears to be conscious, which ones are actually conscious and which are not. If I understood you correctly, physicalism as a statement about consciousness is primarily a negative statement, "the computational behavior of a system is not sufficient to determine what sort of conscious activity occurs there", which doesn't by itself tell you what sort of conscious activity occurs.

MikeJohnson

It seems to me your #2 and #4 still imply computationalism and/or are speaking about a straw man version of physicalism. Different physical theories will address your CPT reversal objection differently, but it seems pretty trivial to me. I would generally agree, but would personally phrase this differently; rather, as noted here, there is no objective fact-of-the-matter as to what the 'computational behavior' of a system is. I.e., no way to objectively derive what computations a physical system is performing. In terms of a positive statement about physicalism & qualia, I'm assuming something on the order of dual-aspect monism / neutral monism. And yes insofar as a formal theory of consciousness which has broad predictive power would depart from folk intuition, I'd definitely go with the formal theory.

itaibn

Thanks for the link. I didn't think to look at what other posts you have published and now I understand your position better. As I now see it, there are two critical questions for distinguishing the different positions on the table:

1. Does our intuitive notion of pleasure/suffering have an objective, precisely defined fundamental concept underlying it?
2. In practice, is it a useful approach to look for computational structures exhibiting pleasure/suffering in the distant future as a means to judge possible outcomes?

Brian Tomasik answers these questions "No/Yes", and a supporter of the Sentience Institute would probably answer "Yes" to the second question. Your answers are "Yes/No", and so you prefer to work on finding the underlying theory for pleasure/suffering. My answers are "No/No", and I am at a loss.

I see two reasons why a person might think that pleasure/pain of conscious entities is a solid enough concept to answer "Yes" to either of these questions (not counting conservative opinions over what futures are possible for question 2). The first is a confusion caused by subtle implicit assumptions in the way we talk about consciousness, which makes a sort of conscious experience that includes pleasure and pain seem more ontologically basic than it really is. I won't elaborate on this in this comment, but for now you can round me as an eliminativist. The second is what I was calling "a sort wishful thinking" in argument #4: These people have moral intuitions that tell them to care about others' pleasure and pain, which implies not fooling themselves about how much pleasure and pain others experience. On the other hand, there are many situations where their intuition does not give them a clear answer, but also tells them that picking an answer arbitrarily is like fooling themselves. They resolve this tension by telling themselves, "there is a 'correct answer' to this dilemma, but I don't know what it is. I should act to best approximate this 'correct

MikeJohnson

Thanks, this is helpful. My general position on your two questions is indeed "Yes/No". The question of 'what are reality's natural kinds?' is admittedly complex and there's always room for skepticism. That said, I'd suggest the following alternatives to your framing:

* Whether the existence of qualia itself is 'crisp' seems prior to whether pain/pleasure are. I call this the 'real problem' of consciousness.
* I'm generally a little uneasy with discussing pain/pleasure in technically precise contexts- I prefer 'emotional valence'.
* Another reframe to consider is to disregard talk about pain/pleasure, and instead focus on whether value is well-defined on physical systems (i.e. the subject of Tegmark's worry here). Conflation of emotional valence & moral value can then be split off as a subargument.

Generally speaking, I think if one accepts that it's possible in principle to talk about qualia in a way that 'carves reality at the joints', it's not much of a stretch to assume that emotional valence is one such natural kind (arguably the 'c. elegans of qualia'). I don't think we're logically forced to assume this, but I think it's prima facie plausible, and paired with some of our other work it gives us a handhold for approaching qualia in a scientific/predictive/falsifiable way. Essentially, QRI has used this approach to bootstrap the world's first method for quantifying emotional valence in humans from first principles, based on fMRI scans. (It also should work for most non-human animals; it's just harder to validate in that case.) We haven't yet done the legwork on connecting future empirical results here back to the computationalism vs physicalism debate, but it's on our list.

TL;DR: If consciousness is a 'crisp' thing with discoverable structure, we should be able to build/predict useful things with this that cannot be built/predicted otherwise, similar to how discovering the structure of electromagnetism let us build/predict useful things we could not ha

itaibn

It wasn't clear to me from your comment, but based on your link I am presuming that by "crisp" you mean "amenable to generalizable scientific theories" (rather than "ontologically basic"). I was using "pleasure/pain" as a catch-all term and would not mind substituting "emotional valence". It's worth emphasizing that just because a particular feature is crisp does not imply that it generalizes to any particular domain in any particular way. For example, a single ice crystal has a set of directions in which the molecular bonds are oriented which is the same throughout the crystal, and this surely qualifies as a "crisp" feature. Nonetheless, when the ice melts, this feature becomes undefined -- no direction is distinguished from any other direction in water. When figuring out whether a concept from one domain extends to a new domain, to posit that there's a crisp theory describing the concept does not answer this question without any information on what that theory looks like. So even if there existed a theory describing qualia and emotional valence as it exists on Earth, it need not extend to being able to describe every physically possible arrangement of matter, and I see no reason to expect it to. Since a far future civilization will be likely to approach the physical limits of matter in many ways, we should not assume that it is not one such arrangement of matter where the notion of qualia is inapplicable.

MikeJohnson

This is an important point and seems to hinge on the notion of reference, or the question of how language works in different contexts. The following may or may not be new to you, but trying to be explicit here helps me think through the argument. Mostly, words gain meaning from contextual embedding- i.e. they’re meaningful as nodes in a larger network. Wittgenstein observed that often, philosophical confusion stems from taking a perfectly good word and trying to use it outside its natural remit. His famous example is the question, “what time is it on the sun?”. As you note, maybe notions about emotional valence are similar- trying to ‘universalize’ valence may be like trying to universalize time-zones, an improper move. But there’s another notable theory of meaning, where parts of language gain meaning through deep structural correspondence with reality. Much of physics fits this description, for instance, and it’s not a type error to universalize the notion of the electromagnetic force (or electroweak force, or whatever the fundamental unification turns out to be). I am essentially asserting that qualia is like this- that we can find universal principles for qualia that are equally and exactly true in humans, dogs, dinosaurs, aliens, conscious AIs, etc. When I note I’m a physicalist, I intend to inherit many of the semantic properties of physics, how meaning in physics ‘works’. I suspect all conscious experiences have an emotional valence, in much the same way all particles have a charge or spin. I.e. it’s well-defined across all physical possibilities.

itaibn

Do you think we should move the conversation to private messages? I don't want to clutter a discussion thread that's mostly on a different topic, and I'm not sure whether the average reader of the comments benefits or is distracted by long conversations on a narrow subtopic. Your comment appears to be just reframing the point I just made in your own words, and then affirming that you believe that the notion of qualia generalizes to all possible arrangements of matter. This doesn't answer the question, why do you believe this? By the way, although there is no evidence for this, it is commonly speculated by physicists that the laws of physics allow multiple metastable vacuum states, and the observable universe only occupies one such vacuum, and near different vacua there are different fields and forces. If this is true then the electromagnetic field and other parts of the Standard Model are not much different from my earlier example of the alignment of an ice crystal. One reason this view is considered plausible is simply the fact that it's possible: It's not considered so unusual for a quantum field theory to have multiple vacuum states, and if the entire observable universe is close to one vacuum then none of our experiments give us any evidence on what other vacuum states are like or whether they exist. This example is meant to illustrate a broader point: I think that making a binary distinction between contextual concepts and universal concepts is oversimplified. Rather, here's how I would put it: Many phenomena generalize beyond the context in which they were originally observed. Taking advantage of this, physicists deliberately seek out the phenomena that generalize as far as possible, and over history broadened their grasp very far. Nonetheless, they avoid thinking about any concept as "universal", and often when they do think a concept generalizes they have a specific explanation for why it should, while if there's a clear alternative to the concept generalizing

MikeJohnson

EA forum threads auto-hide so I’m not too worried about clutter. I don’t think you’re fully accounting for the difference in my two models of meaning. And, I think the objections you raise to consciousness being well-defined would also apply to physics being well-defined, so your arguments seem to prove too much. To attempt to address your specific question, I find the hypothesis that ‘qualia (and emotional valence) are well-defined across all arrangements of matter’ convincing because (1) it seems to me the alternative is not coherent (as I noted in the piece on computationalism I linked for you) and (2) it seems generative and to lead to novel and plausible predictions I think will be proven true (as noted in the linked piece on quantifying bliss and also in Principia Qualia). All the details and sub arguments can be found in those links. Will be traveling until Tuesday; probably with spotty internet access until then.

itaibn

I haven't responded to you for so long firstly because I felt like we got to the point in the discussion where it's difficult to get across anything new and I wanted to be attentive to what I say, and then because after a while without writing anything I became disinclined to continue. The conversation may close soon. Some quick points:

* My whole point in my previous comment is that the conceptual structure of physics is not what you make it out to be, and so your analogy to physics is invalid. If you want to say that my arguments against consciousness apply equally well to physics you will need to explain the analogy.
* My views on consciousness that I mentioned earlier but did not elaborate on are becoming more relevant. It would be a good idea for me to explain them in more detail.
* I read your linked piece on quantifying bliss and I am unimpressed. I concur with the last paragraph of this comment.

ElizabethBarnes

Thanks very much for writing this, and thanks to Greg for funding it! I think this is a really important discussion. Some slightly rambling thoughts below.

We can think about 3 ways of improving the EV of the far future:

1: Changing incentive structures experienced by powerful agents in the future (e.g. avoiding arms races, power struggles, selection pressures)

2: a) Changing the moral compass of powerful agents in the future in specific directions (e.g. MCE).

b) Indirect ways to improve the moral compass of powerful agents in the future (e.g. philosophy r...

oge

Thank you for providing an abstract for your article. I found it very helpful.

(and I wish more authors here would do so as well)

ateabug

Random thought: (factory farm) animal welfare issues will likely eventually be solved by cultured (lab grown) meat when it becomes cheaper than growing actual animals. This may take a few decades, but social change might take even longer. The article even suggests technical issues may be easier to solve, so why not focus more on that (rather than on MCE)?

Jacy

I just took it as an assumption in this post that we're focusing on the far future, since I think basically all the theoretical arguments for/against that have been made elsewhere. Here's a good article on it. I personally mostly focus on the far future, though not overwhelmingly so. I'm at something like 80% far future, 20% near-term considerations for my cause prioritization decisions. To clarify, the post isn't talking about ending factory farming. And I don't think anyone in the EA community thinks we should try to end factory farming without technology as an important component. Though I think there are good reasons for EAs to focus on the social change component, e.g. there is less for-profit interest in that component (most of the tech money is from for-profit companies, so it's less neglected in this sense).

Vidur Kapur

Thank you for this piece. I enjoyed reading it and I'm glad that we're seeing more people being explicit about their cause-prioritization decisions and opening up discussion on this crucially important issue.

I know that it's a weak consideration, but I hadn't, before I read this, considered the argument for the scale of values spreading being larger than the scale of AI alignment (perhaps because, as you pointed out, the numbers involved in both are huge) so thanks for bringing that up.

I'm in agreement with Michael_S that hedonium and dolorium should be...

Jacy

That makes sense. If I were convinced hedonium/dolorium dominated to a very large degree, and that hedonium was as good as dolorium is bad, I would probably think the far future was at least moderately +EV.

zdgroff

Isn't hedonium inherently as good as dolorium is bad? If it's not, can't we just normalize and then treat them as the same? I don't understand the point of saying there will be more hedonium than dolorium in the future, but the dolorium will matter more. They're vague and made-up quantities, so can't we just set it so that "more hedonium than dolorium" implies "more good than bad"?

MetricSulfateFive

He defines hedonium/dolorium as the maximum positive/negative utility you can generate with a certain amount of energy: "For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources."
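
A rough sketch of why the definition matters, in Python. The probabilities and the asymmetry ratio below are hypothetical placeholders, not estimates from the post or from any commenter. The sketch assumes a simple two-outcome model: with probability p(H) a unit of resources becomes hedonium, with probability p(D) it becomes dolorium, and a unit of dolorium is `asymmetry` times as bad as a unit of hedonium is good.

```python
# Hypothetical two-outcome model of the hedonium/dolorium disagreement.
# All parameter values are placeholders chosen for illustration only.

def expected_value(p_hedonium, p_dolorium, asymmetry=1.0, unit_value=1.0):
    """Expected value per unit of resources.

    asymmetry = 1.0 is the symmetric case (hedonium is as good as dolorium
    is bad); asymmetry > 1.0 means dolorium is larger in absolute value.
    """
    if min(p_hedonium, p_dolorium) < 0 or p_hedonium + p_dolorium > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    if asymmetry <= 0 or unit_value <= 0:
        raise ValueError("asymmetry and unit_value must be positive")
    return p_hedonium * unit_value - p_dolorium * asymmetry * unit_value

# Symmetric case: p(H) > p(D) is enough for a positive expected value.
symmetric = expected_value(p_hedonium=0.02, p_dolorium=0.01)

# Asymmetric case: the same probabilities can give a negative expected value.
asymmetric = expected_value(p_hedonium=0.02, p_dolorium=0.01, asymmetry=5.0)

# Break-even happens when asymmetry equals p(H) / p(D).
break_even = expected_value(p_hedonium=0.02, p_dolorium=0.01, asymmetry=2.0)
```

Under these assumptions, "more hedonium than dolorium" implies "more good than bad" only when the asymmetry ratio is below p(H)/p(D), which is why the two quantities cannot simply be normalized to the same scale once they are defined per unit of energy.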

Jacy

Exactly. Let me know if this doesn't resolve things, zdgroff.

brb243

Review for the Decade Review

While the central thesis to expand one’s moral circles can be well-enjoyed by the community, this post is not selling it well. This is exemplified by the “One might be biased towards AIA if…” section, which makes assumptions about individuals who focus on AI alignment. Further, while the post includes a section on cooperation, it discourages it. [Edit: Prima facie,] the post does not invite critical discussion. Thus, ~~I would not recommend this post to any readers interested in moral circles expansion, AI alignment, or cooperation.~~ Thus, I would recommend this post to readers interested in moral circles expansion, AI alignment, and cooperation, as long as they are interested in a vibrant discourse.

MichaelStJules

Do you think there's a better way to discuss biases that might push people to one cause or another? Or that we shouldn't talk about such potential biases at all?

What do you mean by this post discouraging cooperation?

What do you expect an invitation for critical discussion to look like? I usually take that to be basically implicit when something is posted to the EA Forum, unless the author states otherwise.

brb243

Hm, 1) how do you define a bias? What is your reference for evaluating whether something is biased? The objective should be to make the best decisions with available information at any given time while supporting innovation and keeping 'openminded.' This 'bias' assessment should be conducted to identify harmful actions that individuals deem positive due to their biases and inform overall prioritization decisionmaking rather than seeking to change one's perspectives on causes they prefer. This can contribute to systemic change and optimal specialization development by individuals. This is a better way to approach biases. The section on cooperation discourages collaboration because it understands cooperation as asserting one’s perspectives where these are not welcome rather than advancing ventures. The part also states: “insofar as MCE is uncooperative, I think a large number of other EA interventions, including AIA, are similarly uncooperative.” The author’s assumptions, if not critically examined against evidence, can discourage persons who could be seeking encouragement to cooperate with others in this article from doing so, because one may wish to avoid sharing perspectives where they are not welcome. An invitation for critical discussion can include an argument for the writing’s relevance to the development of answers to open-ended questions. But I can agree with your point that this can be superfluous, so would add (added) prima facie and edited the conclusion.

Denkenberger

Impressive article - I especially liked the biases section. I would recommend doing a quantitative model of cost effectiveness comparing to AIA, as I have done for global agricultural catastrophes, especially because neglectedness is hard to define in your case.

Jan_Kulveit

Thanks for writing it.

Here are my reasons for the belief that the wild animal/small minds/... suffering agenda is based mostly on errors and uncertainties. Some of the uncertainties should warrant research effort, but I do not believe the current state of knowledge justifies prioritization of any kind of advocacy or value spreading.

1] The endeavour seems to be based on extrapolating intuitive models far outside the scope for which we have data. The whole suffering calculus is based on extrapolating the concept of suffering far away from the domain for which we have...

Brian_Tomasik

You raise some good points. (The following reply doesn't necessarily reflect Jacy's views.)

I think the answers to a lot of these issues are somewhat arbitrary matters of moral intuition. (As you said, "Big part of it seems arbitrary.") However, in a sense, this makes MCE more important rather than less, because it means expanded moral circles are not an inevitable result of better understanding consciousness/etc. For example, Yudkowsky's stance on consciousness is a reasonable one that is not based on a mistaken understanding of present-day neuroscience (as far as I know), yet some feel that Yudkowsky's view about moral patienthood isn't wide enough for their moral tastes.

Another possible reply (that would sound better in a political speech than the previous reply) could be that MCE aims to spark discussion about these hard questions of what kinds of minds matter, without claiming to have all the answers. I personally maintain significant moral uncertainty regarding how much I care about what kinds of minds, and I'm happy to learn about other people's moral intuitions on these things because my own intuitions aren't settled.

E.g. we can think about the DNA based evolutio

adamaero

@Matthew_Barnett As a senior electrical engineering student, proficient in a variety of programming languages, I do think and believe that AI is important to think about and discuss. The theoretical threat of a malevolent strong AI would be immense. But that does not mean one has cause or a valid reason to support CS grad students financially.

A large, significant, asteroid collision with Earth would also be quite devastating. Yet, to fund and support aerospace grads does not follow. Perhaps I really mean this: AI safety is an Earning to Give non sequitur....

Matthew_Barnett

I don't think anyone here is suggesting supporting random CS grads financially. Although, they might endorse something like that indirectly by funding AI alignment research, which tends to attract CS grads. I agree that simply because an asteroid collision would be devastating, it does not follow that we should necessarily focus on that work in particular. However, there are variables which I think you might be overlooking. The reason why people are concerned with AI alignment is not necessarily because of the scope of the issue, but also the urgency and tractability of the problem. The urgency of the problem comes from the idea that advanced AI will probably be developed this century. The tractability of the problem comes from the idea that there exists a set of goals that we could in theory put into an AI, goals that are congruent with ours -- you might want to read up on the Orthogonality Thesis. Furthermore, it is dangerous to assume that we should judge the effectiveness of certain activities merely based on prior evidence or results. There are some activities which are just infeasible to give post hoc judgements about -- and this issue is one of them. The inherent nature of the problem is that we will probably only get about one chance to develop superintelligence -- because if we fail, then we will all probably die or otherwise be permanently unable to alter its goals. To give you an analogy, few would agree that because climate change is an unprecedented threat, it therefore follows that we should wait until after the damage has been done to assess the best ways of mitigating it. Unfortunately for issues that have global scope, it doesn't look like we get a redo if things start going badly. If you want to learn more about the research, I recommend reading Superintelligence by Nick Bostrom. The vast majority of AI alignment researchers are not worried about malevolent AI despite your statement. I mean this in the kindest way possible, but if you really

adamaero

Please, what AIA organizations? MIRI? And do not worry about offending me. I do not intend to offend. If I do/did through my tone or however, I am sorry. That being said, I wish you would've examined the actual claims I presented. I did not claim AI researchers are worried about a malevolent AI. I am not against researchers; research in robotics, industrial PLCs, nanotech, whatever--are fields in their own right. It is donating my income, as an individual, that I take offense at. People can fund whatever they want: A new planetary wing at a museum, research in robotics, research in CS, research in CS philosophy. Although, Earning to Give does not follow. Thinking about and discussing the risks of strong AI does make sense, and we both seem to agree it is important. The CS grad students being supported, however, what makes them different from a random CS grad? Just because they claim to be researching AIA? Following the money, there is not a clear answer on which CS grad students are receiving it. Low or zero transparency. MIRI or no? Am I missing some public information? Second, what do you define as advanced AI? Before, I said strong AI. Is that what you mean? Is there some sort of AI in between? I'm not aware. This is crucially where I split with AI safety. The theory is an idea of a belief about the far future. To claim that we're close to developing strong AI is unfounded to me. What in this century is so close to strong AI? Neural networks do not seem to be (from my light research). I do not believe climate change is as simple as defining a "before" and "after." Perhaps a large rogue solar flare or the Yellowstone supervolcano. Or perhaps even a time travel analogy would suffice ~ time travel safety research. There is no tractability/solvability. [Blank] cannot be defined because it doesn't exist; unfounded and unknown phenomena cannot be solved. Climate change exists. It is a very real reality. It has solvability. A belief in an idea about the future is a poor re

Matthew_Barnett

Yes, MIRI is one. FHI is another. You did, however, say "The theoretical threat of a malevolent strong AI would be immense. But that does not mean one has cause or a valid reason to support CS grad students financially." I assumed you meant that you believed someone was giving an argument along the lines of "since malevolent AI is possible, we should support CS grads." If that is not what you meant, then I don't see the relevance of mentioning malevolent AI. Since you also stated that you had an issue with me not being charitable, I will reciprocate. I agree that we should be charitable to each other's opinions. Having truthful views is not about winning a debate. It's about making sure that you hold good beliefs for good reasons, end of story. I encourage you to imagine this conversation not as a way to convince me that I'm wrong, but more as a case study about what the current arguments are, and whether they are valid. In the end, you don't get points for winning an argument. You get points for actually holding correct views. Therefore, it's good to make sure that your beliefs actually hold weight under scrutiny. Not in a "you can't find the flaw after 10 minutes of self-sabotaged thinking" sort of way, but in a very deep understanding sort of way. I agree people can fund whatever they want. It's important to make a distinction between normative questions and factual ones. It's true that people can fund whatever project they like; however, it's also true that some projects have a high value from an impersonal utilitarian perspective. It is this latter category that I care about, which is why I want to find projects with particularly high value. I believe that existential risk mitigation and AI alignment are among these projects, although I fully admit that I may be mistaken. If you agree that thinking about something is valuable, why not also agree that funding that thing is valuable? It seems you think that the field should just get a certain t


adamaero

I am not trying to "win" anything. I am stating why MIRI is not transparent, and does not deal in scalable issues. For an individual Earning to Give, it does not follow to fund such things under the guise of Effective Altruism. Existential risk is important to think about and discuss as individuals. However, funding CS grad students does not make sense in the light of Effective Altruism. Funding does not increase "thinking." The whole point of EA is to not give blindly. For example, giving food aid, although well-meaning, can have a very negative effect (i.e., the crowding out effect on the local market). Nonmaleficence should be one's initial position with regard to funding. Lastly, no, I rarely accept something as true first. I do not first accept the null hypothesis. "But there's a whole load of arguments about why it is a tractable field"--What are they? Again, none of the actual arguments were examined: How is MIRI going about tractable/solvable issues? Who at MIRI is getting the funds? How is time travel safety not as relevant as AI safety?

Dunja

Thanks for this discussion, which I find quite interesting. I think the effectiveness and efficiency of funding research projects concerning risks of AI is a largely neglected topic. I've posted some concerns on this below an older thread on MIRI: http://effective-altruism.com/ea/14c/why_im_donating_to_miri_this_year/dce (the primary problem being the lack of transparency on the part of Open Phil concerning the evaluative criteria used in their decision to award MIRI an extremely large grant).
