Hi all,
Effective altruism has given a lot of attention to ethics, and in particular suffering reduction. However, nobody seems to have a clear definition for what suffering actually is, or what moral value is. The implicit assumptions seem to be:
- We (as individuals and as a community) tend to have reasonably good intuitions as to what suffering and moral value are, so there's little urgency to put things on a more formal basis; and/or
- Formally defining suffering & moral value is too intractable to make progress on, so trying would be wasted effort (this seems to be the implicit position of, e.g., the Foundational Research Institute and similar orgs).
I think both of these assumptions are wrong, and that formal research into consciousness and valence is tractable, time-sensitive, and critically important.
Consciousness research is critically important and time-sensitive
If one wants to do any sort of ethical calculus for a utilitarian intervention, or realistically estimate the magnitude of wild animal suffering, it's vital to have a good theory of both consciousness (what is conscious?) and valence (which conscious states feel good, which ones feel bad?). It seems obvious but is worth emphasizing: if your goal is to reduce suffering, it's important to know what suffering is.
But I would go further, and say that the fact that we don't have a good theory of consciousness & valence constitutes an existential risk.
First, here's Max Tegmark making the case that precision in moral terminology is critically important when teaching AIs what to value, and that we currently lack this precision:
Relentless progress in artificial intelligence (AI) is increasingly raising concerns that machines will replace humans on the job market, and perhaps altogether. Eliezer Yudkowsky and others have explored the possibility that a promising future for humankind could be guaranteed by a superintelligent "Friendly AI", designed to safeguard humanity and its values. I argue that, from a physics perspective where everything is simply an arrangement of elementary particles, this might be even harder than it appears. Indeed, it may require thinking rigorously about the meaning of life: What is "meaning" in a particle arrangement? What is "life"? What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer the last question rigorously, this future is unlikely to contain humans.
I discuss the potential for consciousness & valence research to help AI safety here. But the x-risk argument for consciousness research goes much further. Namely: if consciousness is a precondition for value, and we can't define what consciousness is, then we may inadvertently trade it away for competitive advantage. Nick Bostrom and Scott Alexander have both noted this possibility:
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today – a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland with no children. (Superintelligence)
Moloch is exactly what the history books say he is. He is the god of Carthage. He is the god of child sacrifice, the fiery furnace into which you can toss your babies in exchange for victory in war.
He always and everywhere offers the same deal: throw what you love most into the flames, and I will grant you power.
...
The last value we have to sacrifice is being anything at all, having the lights on inside. With sufficient technology we will be “able” to give up even the final spark. (Meditations on Moloch)
Finally, here's Andres Gomez Emilsson on the danger of a highly competitive landscape which is indifferent toward consciousness & valence:
I will define a pure replicator, in the context of agents and minds, to be an intelligence that is indifferent towards the valence of its conscious states and those of others. A pure replicator invests all of its energy and resources into surviving and reproducing, even at the cost of continuous suffering to themselves or others. Its main evolutionary advantage is that it does not need to spend any resources making the world a better place.
A decade ago, these concerns would have seemed very sci-fi. Today, they seem interesting and a little worrying. In ten years, they'll seem incredibly pressing and we'll wish we had started on them sooner.
Consciousness research and valence research are tractable
I'm writing this post not because I hope these topics ultimately turn out to be tractable. I'm writing it because I know they are, because I've spent several years of focused research on them and have progress to show for it.
The result of this research is Principia Qualia. Essentially, it's five things:
1. A literature review on what affective neuroscience knows about pain & pleasure, why it's so difficult to research, and why a more principled approach is needed;
2. A literature review on quantitative theories of consciousness, centered on IIT and its flaws;
3. A framework for clarifying & generalizing IIT in order to fix these flaws;
4. A crisp, concise, and falsifiable hypothesis about what valence is, in the context of a mathematical theory of consciousness;
5. A blueprint for turning qualia research into a formal scientific discipline.
The most immediately significant takeaway is probably the fourth item, a definition of valence (pain/pleasure) in terms of identity, not just correlation. The paper is long, but I don't think any shorter paper could do all these topics justice. I encourage systems-thinkers who are genuinely interested in consciousness, morality, and x-risk to read it and comment.
Relatedly, I see this as the start of consciousness & valence research as an EA cause area, an area which is currently not being served within the community or by academia. As such, I strongly encourage organizations dealing with cause prioritization and suffering reduction to consider whether this area is as important, time-sensitive, and tractable as I'm arguing it to be. A handful of researchers are working on the problem (mostly Andres Gomez Emilsson and me), and I've spoken with brilliant, talented folks who really want to work on the research program I've outlined, but we're significantly resource-constrained. (I'm happy to make a more detailed case later for prioritization of this cause area, what organizations are doing related work, and why this specific area hasn't been addressed; I think saying more now would be premature.)
---
Strong claims invite critical examination. This post is intended as sort of an "open house" for EAs to examine the research, ask for clarification, discuss alternatives, et cetera.
One point I'd like to stress is that this research is developed enough to make specific, object-level, novel, falsifiable predictions (Sections XI and XII). I've made the framework broad enough to be compatible with many different theories of consciousness, but in order to say anything meaningful about consciousness, we have to rule out certain possibilities. We can discuss metaphysics, but in my experience it's more effective to discuss things on the object level. So for objections such as "consciousness can't be X sort of thing, because it's Y sort of thing," consider framing them as object-level objections, i.e., as divergent predictions. A final point: the link above goes to an executive summary. The primary document, which can be found here, goes into much more detail.
All comments welcome.
Mike, Qualia Research Institute
Edit, 12-20-16 & 1-9-17: In addition to the above remarks, qualia research also seems important for smoothing certain coordination problems between various EA and x-risk organizations. My comment to Jessica Taylor:
>I would expect the significance of this question [about qualia] to go up over time, both in terms of direct work MIRI expects to do, and in terms of MIRI's ability to strategically collaborate with other organizations. I.e., when things shift from "let's build alignable AGI" to "let's align the AGI", it would be very good to have some of this metaphysical fog cleared away so that people could get on the same ethical page, and see that they are in fact on the same page.
Right now, it's reasonable for EA organizations to think they're on the same page and working toward the same purpose. But as AGI approaches and the stakes get higher & our possible futures become more divergent, I fear apparently small differences may grow very large. Research into qualia alone won't solve this, but it would help a lot. This question seems to parallel a debate between Paul Christiano and Wei Dai, about whether philosophical confusion magnifies x-risk, and if so, how much.
Hi Jessica,
Thanks for the thoughtful note. I do want to be very clear that I’m not criticizing MIRI’s work on CEV, which I like very much! It seems like the best intuition pump & Schelling point in its area, and I think it has potential to be more.
My core offering in this space (where I expect most of the value to be) is Principia Qualia; it’s more up-to-date and comprehensive than the blog post you’re referencing. I pose some hypotheticals in the blog post, but it isn’t intended to stand alone as a substantive work (whereas PQ is).
But I had some thoughts in response to your response on valence + AI safety:
->1. First, I agree that leaving our future moral trajectory in the hands of humans is a great thing. I’m definitely not advocating anything else.
->2. But I would push back on whether our current ethical theories are very good, i.e., good enough to see us through any future AGI transition without needlessly risking substantial amounts of value.
To give one example: currently, some people make the claim that animals such as cows are much more capable of suffering than humans, because they don’t have much intellect to blunt their raw, emotional feeling. Other people make the claim that cows are much less capable of suffering than humans, because they don’t have the ‘bootstrapping strange loop’ mind architecture enabled by language, and necessary for consciousness. Worryingly, both of these arguments seem plausible, with no good way to pick between them.
Now, I don’t think cows are in a strange quantum superposition of both suffering and not suffering; I think there’s a fact of the matter, though we clearly don’t know it.
This example may have moral implications, but little relevance to existential risk. However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it's tractable, based on work already done.)
Based on my understanding, I don’t think Act-based agents or Task AI would help resolve these questions by default, although as tools they could probably help.
->3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).
->4. Re: suffering computations: a big, interesting question here is whether moral value should be defined at the physical or computational level, i.e., “is moral value made out of quarks or bits (or something else)?” This may be the crux of our disagreement, since I’m a physicalist and I gather you’re a computationalist. But PQ’s framework allows for bits to be “where the magic happens”, as long as certain conditions obtain.
One factor that bears mentioning is whether an AGI’s ontology & theory of ethics might be path-dependent upon its creators’ metaphysics in such a way that it would be difficult for it to update if it’s wrong. If this is a plausible concern, this would imply a time-sensitive factor in resolving the philosophical confusion around consciousness, valence, moral value, etc.
->5. I wouldn’t advocate strictly hedonic values (this was ambiguous in the blog post but is clearer in Principia Qualia).
->6. However, I do think that “how much horrific suffering is there in possible world X?” is a hands-down, qualitatively better proxy for whether it’s a desirable future than “what is the Dow Jones closing price in possible world X?”
->7. Re: neuromorphic AIs: I think an interesting angle here is, “how does boredom stop humans from wireheading on pleasurable stimuli?” I view boredom as a sophisticated anti-wireheading technology. It seems possible (although I can’t yet vouch for its plausibility) that if we understood the precise mechanism by which boredom is implemented in human brains, it might help us understand and/or control neuromorphic AGIs better. But this is very speculative and undeveloped.
I'm curious about this, since you mentioned fixing IIT's flaws. I came to the comments to make the same complaint you were responding to Jessica about.