Sep 11, 2015 · 13 min read

Note: When I speak of extinction risk in this essay, I mean not just complete extinction but any event that collapses civilization to the point where we cannot achieve highly good outcomes for the far future.

There are two major interventions for shaping the far future: reducing human extinction risk and spreading good values. Although we don’t really know how to reduce extinction risk, the problem itself is fairly clear and has seen a lot of discussion among effective altruists. Values spreading is less clear.

A lot of EA activities could be classified as values spreading, but of very different sorts. Meta-organizations like Giving What We Can and Charity Science try to encourage people to value charity more highly; animal charities like The Humane League and Animal Ethics try to get people to assign greater weight to non-human animals. Many supporters of animal welfare interventions believe that these interventions have a large positive effect on the far future via spreading values that cause people to behave in ways that make the world better.

I believe that reducing extinction risk has a higher expected value than spreading good values, and there are a number of concerns with values spreading that make me reluctant to support it. This essay lays out my reasoning.

Personal note: In 2014 I directed my entire donations budget to The Humane League, and in 2015 I directed it to Animal Charity Evaluators. At the time, I generally agreed with the arguments that values spreading is the most important intervention. But recently I have considered this claim more carefully and now I am more skeptical, for the reasons outlined below.

The Upside of Values Spreading

In “Values Spreading Is Often More Important than Extinction Risk”, Brian Tomasik lays out a mathematical intuition for the claim that values spreading has a much higher impact than preventing extinction. If we can slightly increase the probability of creating vastly good outcomes in the far future, this could matter more than preventing extinction, since the difference between all-humans-are-dead and business-as-usual is probably smaller than the difference between business-as-usual and everything-is-as-good-as-possible. He also claims that values spreading is easier than preventing extinction risk. For example, the best future might be one filled with beings that constantly feel euphoria, but this strikes most people as boring and undesirable, so it’s an unpopular belief; and proportionally growing a small group of believers is relatively easy. If one thousand people believe something, increasing their number by 1% means convincing only 10 more people, whereas if one million people believe it, the same 1% requires convincing 10,000.
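
To make this intuition concrete, here is a minimal sketch of the expected-value comparison as I understand it (the symbols below are my own shorthand, not Brian’s notation):

  EV(values spreading) ≈ Δp × (V_best − V_bau)
  EV(extinction risk reduction) ≈ Δq × (V_bau − V_ext)

where V_best, V_bau, and V_ext stand for the value of the best achievable future, a business-as-usual future, and an extinct (or collapsed) future, and Δp and Δq are the probability shifts each intervention buys. If V_best − V_bau is much larger than V_bau − V_ext, values spreading can come out ahead even when Δp is much smaller than Δq.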

I have some concerns with this line of reasoning, and I now generally believe that focusing on preventing extinction risk is more important (although I’m not confident about this conclusion; I believed the opposite about a month before I started writing this). In this essay I lay out some of my concerns and my reasons to prefer working on extinction risk reduction over values spreading.

Uncertainty About Values Spreading

Brian’s mathematical argument for favoring values spreading over existential risk reduction has no obvious flaws, but it is not robust: its inputs come from the reader’s intuitions rather than from empirical evidence about what values people currently hold and how easy they are to change. So I put only weak credence in this particular argument. I also have broader concerns with values-spreading efforts in general.

Overconfidence

In many cases, spreading values is a zero-sum game played against other people spreading different values. (For example, I believe that the best thing to do is to fill the universe with tiny beings experiencing maximum possible happiness, but someone else might think this is a terrible outcome, so we both want to change each other’s mind.) As a general principle, we should be careful about playing zero-sum games rather than reaping gains from cooperation. We should only spread values if we are highly confident that we have the correct values. (I feel sufficiently confident about the fact that (many) non-human animals have value, but not sufficiently confident that we should fill the universe with tiny happy beings.)

I have reflected carefully on my values and feel fairly strongly about them. I believe I have reflected more carefully on my values than perhaps 99.9% of the population, but that still leaves about 7 million people in the world who have reflected as carefully as I have, and many of them disagree with me. The outside view says I would probably change my mind about at least some of my values if I had more information, so I should be cautious about spreading them.

There’s a sense of values spreading for which this reasoning does not apply. If I want to spread values, I might do that by presenting well-constructed arguments for why my values are important. Even if I’m wrong, I believe it is generally beneficial to produce new good arguments or expose more people to good arguments, presuming those people have sufficiently strong ability to evaluate arguments. Although I believe this sort of activity is probably beneficial in most cases, that’s a long way from the claim that it’s the best thing to be doing. 1

Which values to promote?

I can think of a few values that are plausibly extremely important for people to hold:

  • Concern for non-human animals
  • Effective altruism
  • Utilitarianism
  • Rationality
  • Cooperation between people with competing goals
  • Desire for universal eudaimonia: filling the universe with beings experiencing the greatest possible happiness

Right now a lot of EAs support spreading concern for non-human animals and spreading effective altruism itself, and utilitarianism is pretty similar in practice to effective altruism: a lot of people, myself included, got into EA because they see it as the logical conclusion of utilitarian ethics. CFAR and other similar organizations are trying to improve rationality. A lot of people care about spreading cooperation, but I don’t know much about efforts to do this or whether any are effective. I don’t know of anyone specifically trying to spread desire for universal eudaimonia.

Spreading desire for universal eudaimonia is plausibly the best of these, but as far as I know no one is doing it (and it’s not even clear how one would go about doing it). I generally believe that supporting effective object-level charities is more important than spreading effective altruism. This is a non-obvious claim that I don’t want to get into here; the basic argument is that the best charities probably have much higher impact than the most popular EA charities. Spreading anti-speciesism may be highly valuable, but Brian’s mathematical intuition for the value of spreading desire for eudaimonia works against the case for anti-speciesism: a sufficiently large proportion of people already care about animals that increasing this proportion is harder than increasing the proportion of people who share some less common value, like desire for universal eudaimonia.

For the most part, these different values are not mutually contradictory, so in this sense the fact that it’s not obvious which values to spread does not act as an argument against spreading values.

Value Shifts

We have a few reasons to be optimistic that human values are moving in the right direction. Humans have become less violent and more cooperative over time (Steven Pinker has argued this in depth in The Better Angels of Our Nature; Holden discusses it here).

In The Expanding Circle, Peter Singer argues that humans’ circles of compassion have expanded over time. In light of this evidence, it appears likely that concern for all sentient beings will eventually become widespread as most people’s circles of compassion expand. (Although some have disputed Singer’s evidence.)

In the very long term, values shift due to evolutionary forces, and this could be harmful. But for now, values appear to be moving in a positive direction and will probably continue to do so. The evidence here is not very clear-cut and there’s some evidence in the opposite direction, but I’m generally optimistic.

Debiasing people could lead to correct values

In my experience, when people make non-utilitarian judgments in thought experiments, these judgments almost always arise from well-understood cognitive biases. If those biases were corrected, people would make utilitarian judgments in most situations.

A few examples of such situations come to mind. (I’m sure many readers will disagree with my assertion that people’s judgments are wrong in these scenarios. If you are one of them, you can probably think of an area where you disagree with most people but expect that most people would agree with you if they had better information or were less biased.)

It’s important to note that I do not expect standard rationality training to fix these errors in judgment. In most cases they would probably require much stronger interventions, such as creating technology to let people feel what it’s like to be a chicken or building an experience machine. I predict that technology like this will exist in the future. But I nonetheless find it likely that most people will change their minds about these scenarios once they get better information.

  • If we make humans capable of experiencing massively more happiness, then anyone who experiences it should agree that making it happen is extremely important. And if they don’t, then I’m probably wrong about its importance. See Scott Alexander’s Wirehead Gods on Lotus Thrones. See also SMBC.
  • People find the repugnant conclusion repugnant because they are not mentally capable of aggregating small experiences over lots of people; see also Torture vs. Dust Specks and its follow-up Circular Altruism.
  • People don’t believe we should give all our resources to a utility monster because they can’t conceive of how amazingly happy a utility monster would be.
  • People are negative utilitarians because the worst possible suffering outweighs the best possible happiness in humans (and probably in all sentient animals), but this is likely untrue over the space of all possible minds. If we could modify humans to experience happiness equal to their capacity for suffering, they should choose, for example, 2 seconds of extreme happiness plus 1 second of extreme suffering rather than none of either.
  • Intelligent humans frequently discount animals because they believe animals cannot suffer (which is factually false) or because they believe that higher-order reasoning or self-awareness is what makes suffering morally significant. If we could temporarily remove people’s self-awareness (which may be possible if we can upload humans to computers and then modify their minds), it should become obvious to them that self-awareness is not what makes suffering bad. And if it doesn’t become obvious, then I’m probably wrong about that.
  • Many people believe that having twice as many happy people is less than twice as good. They probably believe this because their mental algorithms naively apply the principle of diminishing marginal utility even where it is not relevant; their System 1 doesn’t know that diminishing marginal utility doesn’t apply across people.
  • At EA Global, Joshua Greene gave a talk in which he claimed that drugs that enhance the amygdala response cause people to make more non-utilitarian judgments, and drugs that inhibit it make people more utilitarian. (Talk is available here.)

Will value shifts carry through to the far future?

Changing values now may not have much effect on values in the far future. Paul Christiano argues that changes to present-day values should be discounted by an order of magnitude when considering the far future:

I think that convincing an additional 10% of the world to share my values would increase the value of the future by less than 1% via it’s [sic] effect on far future values[.]

He presents a number of reasons to believe this; I have little to add to his analysis on this point.

Urgency

Societal values appear to be fairly malleable and moving in the right direction. Given sufficient time, it seems likely that we will produce highly good outcomes for not just humans but all sentient beings. We could speed up this process or make it more probable, but it feels much less urgent than extinction risk. Civilization has perhaps a 20% chance of collapsing in the next hundred years; it is critical that we prevent this from happening, and we need to act quickly.

It’s conceivable that society could end up “locking in” its values at some point in the near future, perhaps by creating a singleton. In that case it would be urgent to ensure that the singleton has correct values. Probably the best way to do this is to identify plausible mechanisms by which lock-in could happen and then work to shape the values that would get locked in. For example, if AI researchers create a superintelligent AI with locked-in values that quickly takes control of everything, we have to make sure that AI has good values; in particular, it’s critical to ensure that a superintelligent AI places appropriate value on non-human animals. However, it seems considerably more likely that we will go extinct than that we will get locked into values that are bad but not bad in a way that kills all humans.

Values Spreading as Cooperation

Some people (such as Paul Christiano) frame values spreading as an uncooperative act in which I compete against other moral agents to push my values instead of theirs. But the values I would actually want to spread don’t seem to fit that frame. I could, of course, attempt to spread cooperation itself; but other values I care about amount to a sort of cooperation as well. For example, spreading concern for animals really means getting more people to take actions that serve animals’ interests. It’s not clear that we can coherently say that chickens value not being tortured on factory farms, but they certainly have an interest in not being tortured. On the other hand, chickens cannot be cooperated with in the usual sense, since chickens do not try to spread their values; perhaps it only makes sense to talk about cooperation with other agents that are trying to spread their own values.

Problems with current efforts

For the sake of brevity, I will only discuss spreading concern for animals in this section. I believe it is probably the most promising values-spreading intervention.

We can relatively easily observe whether an advocacy organization successfully convinces people to eat less meat or fewer eggs (although even this is tricky). It’s intuitively plausible that when people eat less meat it’s because they care more about nonhuman animals, and they’re likely to propagate forward those values in ways that will benefit animals in the far future. But although this claim is plausible, there’s little actual evidence to support it–it has no advantage over claims about efforts to reduce existential risk. If the biggest benefit of animal advocacy organizations comes from spreading good values, then they need to put more effort into demonstrating that they actually do this. The evidence on this is extremely murky.

As far as I know, Animal Ethics is the only organization that explicitly tries to spread concern for wild animal suffering, which is by far the biggest source of suffering that currently exists. They are a young organization and I have no idea if they’re succeeding at this goal.

It’s not infeasible to measure shifts in the values we care about. But we have a hard enough time figuring out whether animal charities even reduce meat consumption, much less whether they change people’s fundamental judgments about non-human animals.

The Open Philanthropy Project plans to put $5 million per year into reducing factory farming, but it looks like they plan to focus more on having a direct impact on factory-farmed animals rather than spreading concern for animals in general. Activities like this have to take into account the large effects that factory farming has on wild animals: reducing factory farming might actually be harmful for animals in the short run (although Brian Tomasik believes (with weak confidence) that it is more likely to be positive than negative), so it is critical that any efforts improve concern for animals in a way that will benefit animals in the far future.

How efforts could be better

Probably, some sorts of values spreading matter much, much more than others. Perhaps convincing AI researchers to care more about non-human animals could substantially increase the probability that a superintelligent AI will also care about animals. This could be highly impactful and may even be the most effective thing to do right now. (I am much more willing to consider donating to MIRI now that its executive director values non-human animals.) I don’t see anyone doing this, and if I did, I would want to see evidence that they were doing a good job; but this would plausibly be a highly effective intervention.

There may be other sorts of targeted values spreading that work well. Veg outreach charities already try to target elite colleges on the premise that students there have more social leverage. You could probably do even better than that by targeting smaller and higher-leverage groups of people.

Conclusion

Both existential risk reduction and long-term values spreading potentially have astronomical impact, but both have fairly weak evidence supporting their effectiveness. It’s not clear which values we should spread, and even if we do spread them, it’s not clear whether they will stick or whether they will produce a better world than we would have had anyway.

Additionally, existential risk reduction acts as a hedge: even if we don’t agree on values, lots of value systems agree that preventing extinction is good; and if civilization stays around for longer it gives us more time to find correct values. (It’s not obvious that values are the sort of thing you can deduce through reasoning, but I expect people’s values will often shift in the right direction in response to new knowledge, as explained in “Debiasing people could lead to correct values” above.)

Policy debates should not appear one-sided; I do believe values spreading has some compelling advantages over existential risk reduction. Perhaps it’s easier to push values in the right direction than to reduce the probability of an extinction event; perhaps the difference between the everyone’s-dead world and the business-as-usual world is smaller than the difference between business-as-usual and everyone-has-perfect-values. But on balance I find the concerns about values spreading more significant and clearer, which leads me to conclude that supporting existential risk reduction is more important.

Notes

  1. Actually, producing arguments on important subjects is probably the best thing to be doing in a lot of cases (it’s what I’m doing as I write this), but it has rapidly diminishing marginal utility. It’s probably valuable for me to spend five hours a week on this, but 40 would be excessive. For some people, like Brian Tomasik, who are exceptionally good at producing new and useful arguments, it may be better to do this full time. Producing arguments in this way sounds a lot like doing research. It’s beyond the scope of this essay to discuss whether high-impact research is the best cause to support.

Comments

Great post!

And if they don’t then I’m probably wrong about it being important.

I'm not sure what you mean by "wrong" here. :) Maybe you place a lot of value on the values that would be reached by a collective of smart, rational people thinking about the issues for a long time, and your current values are just best guesses of what this idealized group of people would arrive at? (assuming, unrealistically, that there would be a single unique output of that idealization process across a broad range of parameter settings)

For people who hold very general values of caring what other smart, rational people would care about, values-spreading seems far less promising. In contrast, if -- in light of the utter arbitrariness of values -- you care more about whatever random values you happen to feel now based on your genetic and environmental background, values-spreading seems more appealing.

People are negative utilitarians because the worst possible suffering outweighs the best possible happiness in humans (and probably in all sentient animals), but this is likely untrue over the space of all possible minds. If we could modify humans to experience happiness equal to their capacity for suffering, they should choose, for example, 2 seconds of extreme happiness plus 1 second of extreme suffering rather than none of either.

And if we could modify humans to recognize just how amazing paperclips are, they should choose, for example, 2 paperclips plus 1 second of extreme suffering rather than none of either.

However, it seems considerably more likely that we will go extinct than that we will get locked in to values that are bad but not bad in a way that kills all humans.

I'm curious to know your probabilities of these outcomes. If the chance of extinction (including by uncontrolled AI) in the next century is 20%, and if human-level AI arrives in the next century, then the chance of human-controlled AI would be 80%. Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people. (Indeed, this has been the trend throughout human history and up to the present. Even in democracies, wealthy people have far more power than ordinary citizens.)

Addressing each of your comments in turn:

  1. I'm fairly confident that hedonistic utilitarianism is true (for some sense of "true"). Much of my confidence comes from the observation that people's objections to utilitarianism play into well-known cognitive biases, and if these biases were removed, I'd expect more people to agree with me. If they didn't agree with me even if they didn't have these biases, that would be grounds for questioning my confidence in utilitarianism.

  2. I think there's a difference between modifying people to be able to experience more happiness and modifying them to believe paperclips are great. The former modifies an experience and lets people's preferences arise naturally; the latter modifies preferences directly, so we can't trust that their preferences reflect what's actually good for them. Of course, preferences that arise naturally don't always reflect what's good for people either, but they do tend in that direction.

Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people.

I hadn't considered this as a particularly likely possibility. If you'll allow me to go up one meta level, this sort of argument is why I prefer to be more epistemically modest about far-future concerns, and why I wish more people would be more modest. This argument you've just made had not occurred to me during the many hours of thinking and discussion I've already conducted, and it seems plausible that a nontrivial portion of the probability mass of the far future falls on "a small group of people get control of everything and optimize the world for their own benefit." The existence of this argument, and the fact that I hadn't considered it before, makes me uncertain about my own ability to reason about the expected value of the far future.

Thanks!

  1. One man's bias is another's intrinsic value, at least for "normative" biases like scope insensitivity, status-quo bias, and failure to aggregate. But at least I understand your meaning better. :) Most of LessWrong is not hedonistic utilitarian (most people there are more preference utilitarian or complexity-of-value consequentialist), so one might wonder why other people who think a lot about overcoming those normative biases aren't hedonistic utilitarians.

  2. Of course, one could give people the experience of having grown up in a culture that valued paperclips, of meeting the Great Paperclip in the Sky and hearing him tell them that paperclips are the meaning of life, and so on. These might "naturally" incline people to intrinsically value paperclips. But I agree there seem to be some differences between this case and the pleasure case.

  3. I'm glad that comment was useful. :) I think it's unfortunate that it's so often assumed that "human-controlled AI" means something like CEV, when in fact CEV seems to me a remote possibility. I don't know that you should downshift your ability to reason about the far future that much. :) Over time you'll hear more and more perspectives, which can help challenge previous assumptions.

one might wonder why other people who think a lot about overcoming those normative biases aren't hedonistic utilitarians.

Simple: just because LessWrongers know that these biases exist doesn't mean they're immune to them.

I don't know that you should downshift your ability to reason about the far future that much.

It was already pretty low, this is just an example of why I think it should be low.

The question is what the mechanism of values spreading is.

If the mechanism is having rational discussions then it is not necessarily urgent to have these discussions right now. Once we create a future in which there is no death and no economic pressures to self-modify in ways that are value destructive, we'll have plenty of time for rational discussions. Things like "experience machine" also fit into this framework, as long as the experiences are in some sense non-destructive (this rules out experiences that create addiction, for example).

If the mechanism is anything but rational discussion then

  1. It's not clear in what sense the values you're spreading are "correct" if it's impossible to convince other people through rational discussion.
  2. I would definitely consider this sort of intervention as evil and would fight rather than cooperate with it (at least assuming the effect cannot be reversed by rational discussion; I also consider hedonistic utilitarianism abhorrent except as an approximate model in very restricted contexts).

Regarding MIRI in particular, I don't think the result of their work depends on the personal opinions of its director in the way you suggest. I think that any reasonable solution to the FAI problem will be on the meta-level (defining what it means for values to be "correct") rather than the object level (hard-coding specific values like animal suffering).

I mostly agree with you. I am less confident than you are that a solution to the FAI problem will be on the meta-level. I think you're probably right, but I have enough uncertainty about it that I much prefer someone who's doing AI safety research to share my values so I can be more confident that they will do research that's useful for all sentient beings and not just humans.

Do you think that trying to supplant others' plans with your own is uncooperative? Coercing them for some greater good? We oughtn't define 'cooperative' as 'good', lest it lose all meaning.

Paul could argue that cooperating with someone means helping them achieve their values. Cooperative approaches would be to help people to live out their values, and if you don't agree with their values, then you can trade your plans with theirs to get some Pareto-optimal outcome. That's probably a simple definition of cooperation in some economic fields... A more interesting edge case is trying to help them weigh together their meta-ethical views to arrive at ethical principles, which feels cooperative to me intuitively.

The distinction here is a bit fuzzy. Some sorts of values spreading are clearly uncooperative, but other times it's unclear. Like what about trying to convince selfish people to be more cooperative? That's uncooperative in that it works against their goals, but if you're a "cooperation consequentialist" then you're still doing good because you're increasing the total amount of cooperation in the world.

If you're a war criminal, and I slap you, it's still violence, irrespective of whether I call myself a "pacifism consequentialist"!

Yeah that's true. So it depends on whether you're talking about increasing the total amount of cooperation in the world, or increasing your personal level of cooperation with other agents. It seems to me that the former matters more than the latter.

One point that I like to make is that for some philosophies, it's more important to just help people to think clearly in general, rather than to promote one morality, because it's hard to justify moralising if you don't have strong objective reasons to think your metamoral reasoning is superior. If objectively bad thinking procedures led people to have a 'wrong' moral view, then correcting these could be easier than promoting a more dubious moral conclusion, while also helping selfish people.

I'm curious under what circumstances we can judge thinking to be better or worse but can't make such judgments of "metamoral reasoning".

I'm saying that on some views, you might want to make people do better things on their values, so long as those values are supported by good metamoral thinking. One way to do that is promote good clear thinking, or philosophical thinking in general, rather than just promoting your personal moral system. And for some reasons, perhaps signalling-related, it's much more common to see people profess and evangelise their personal moral beliefs than metaethical or general philosophical ones.