Hi all,
Effective altruism has given a lot of attention to ethics, and in particular to suffering reduction. However, nobody seems to have a clear definition of what suffering actually is, or of what moral value is. The implicit assumptions seem to be:
- We (as individuals and as a community) tend to have reasonably good intuitions as to what suffering and moral value are, so there's little urgency to put things on a more formal basis; and/or
- Formally defining suffering & moral value is much too intractable to make progress on, so it would be wasted effort to try (this seems to be the implicit position of e.g., Foundational Research Institute, and other similar orgs).
I think both of these assumptions are wrong, and that formal research into consciousness and valence is tractable, time-sensitive, and critically important.
Consciousness research is critically important and time-sensitive
Obviously, if one wants to do any sort of ethical calculus for a utilitarian intervention, or realistically estimate the magnitude of wild animal suffering, it's vital to have both a good theory of consciousness (what is conscious?) and valence (which conscious states feel good, which ones feel bad?). It seems obvious but is worth emphasizing that if your goal is to reduce suffering, it's important to know what suffering is.
But I would go further, and say that the fact that we don't have a good theory of consciousness & valence constitutes an existential risk.
First, here's Max Tegmark making the case that precision in moral terminology is critically important when teaching AIs what to value, and that we currently lack this precision:
Relentless progress in artificial intelligence (AI) is increasingly raising concerns that machines will replace humans on the job market, and perhaps altogether. Eliezer Yudkowsky and others have explored the possibility that a promising future for humankind could be guaranteed by a superintelligent "Friendly AI", designed to safeguard humanity and its values. I argue that, from a physics perspective where everything is simply an arrangement of elementary particles, this might be even harder than it appears. Indeed, it may require thinking rigorously about the meaning of life: What is "meaning" in a particle arrangement? What is "life"? What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer the last question rigorously, this future is unlikely to contain humans.
I discuss the potential for consciousness & valence research to help AI safety here. But the x-risk argument for consciousness research goes much further. Namely: if consciousness is a precondition for value, and we can't define what consciousness is, then we may inadvertently trade it away for competitive advantage. Nick Bostrom and Scott Alexander have both noted this possibility:
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today – a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland with no children. (Superintelligence)
Moloch is exactly what the history books say he is. He is the god of Carthage. He is the god of child sacrifice, the fiery furnace into which you can toss your babies in exchange for victory in war.
He always and everywhere offers the same deal: throw what you love most into the flames, and I will grant you power.
...
The last value we have to sacrifice is being anything at all, having the lights on inside. With sufficient technology we will be “able” to give up even the final spark. (Meditations on Moloch)
Finally, here's Andres Gomez Emilsson on the danger of a highly competitive landscape which is indifferent toward consciousness & valence:
I will define a pure replicator, in the context of agents and minds, to be an intelligence that is indifferent towards the valence of its conscious states and those of others. A pure replicator invests all of its energy and resources into surviving and reproducing, even at the cost of continuous suffering to themselves or others. Its main evolutionary advantage is that it does not need to spend any resources making the world a better place.
A decade ago, these concerns would have seemed very sci-fi. Today, they seem interesting and a little worrying. In ten years, they'll seem incredibly pressing and we'll wish we had started on them sooner.
Consciousness research and valence research are tractable
I'm writing this post not because I hope these topics ultimately turn out to be tractable. I'm writing it because I know they are: I've spent several years doing focused research on them and have progress to show for it.
The result of this research is Principia Qualia. Essentially, it's five things:
1. A literature review on what affective neuroscience knows about pain & pleasure, why it's so difficult to research, and why a more principled approach is needed;
2. A literature review on quantitative theories of consciousness, centered on IIT and its flaws;
3. A framework for clarifying & generalizing IIT in order to fix these flaws;
4. A crisp, concise, and falsifiable hypothesis about what valence is, in the context of a mathematical theory of consciousness;
5. A blueprint for turning qualia research into a formal scientific discipline.
The most immediately significant takeaway is probably (4), a definition of valence (pain/pleasure) in terms of identity, not just correlation. The paper is long, but I don't think a shorter one could do all these topics justice. I encourage systems-thinkers who are genuinely interested in consciousness, morality, and x-risk to read it and comment.
Relatedly, I see this as the start of consciousness & valence research as an EA cause area, one which is currently not being served within the community or by academia. As such, I strongly encourage organizations working on cause prioritization and suffering reduction to evaluate my case that this area is as important, time-sensitive, and tractable as I'm arguing. A handful of researchers are working on the problem (mostly Andres Gomez Emilsson and I), and I've spoken with brilliant, talented folks who really want to work on the research program I've outlined, but we're significantly resource-constrained. (I'm happy to later make a more detailed case for prioritizing this cause area, which organizations are doing related work, and why this specific area hasn't been addressed; I think saying more now would be premature.)
---
Strong claims invite critical examination. This post is intended as a sort of "open house" for EAs to examine the research, ask for clarification, discuss alternatives, et cetera.
One point I'd like to stress is that this research is developed enough to make specific, object-level, novel, falsifiable predictions (Sections XI and XII). I've made the framework broad enough to be compatible with many different theories of consciousness, but in order to say anything meaningful about consciousness, we have to rule out certain possibilities. We can discuss metaphysics, but in my experience it's more effective to discuss things on the object level. So for objections such as "consciousness can't be X sort of thing, because it's Y sort of thing," consider framing them as object-level objections, i.e., divergent predictions. A final point: the link above goes to an executive summary. The primary document, which can be found here, goes into much more detail.
All comments welcome.
Mike, Qualia Research Institute
Edit, 12-20-16 & 1-9-17: In addition to the above remarks, qualia research also seems important for smoothing certain coordination problems between various EA and x-risk organizations. My comment to Jessica Taylor:
>I would expect the significance of this question [about qualia] to go up over time, both in terms of direct work MIRI expects to do, and in terms of MIRI's ability to strategically collaborate with other organizations. I.e., when things shift from "let's build alignable AGI" to "let's align the AGI", it would be very good to have some of this metaphysical fog cleared away so that people could get on the same ethical page, and see that they are in fact on the same page.
Right now, it's reasonable for EA organizations to think they're on the same page and working toward the same purpose. But as AGI approaches and the stakes get higher & our possible futures become more divergent, I fear apparently small differences may grow very large. Research into qualia alone won't solve this, but it would help a lot. This question seems to parallel a debate between Paul Christiano and Wei Dai, about whether philosophical confusion magnifies x-risk, and if so, how much.
Thanks for your comments too; I'm finding them helpful for understanding other possible positions on ethics.
OK, how about a rule like this:
(formalizing this rule would require a theory of logical counterfactuals; I'm not sure if I expect a fully general theory to exist but it seems plausible that one does)
I'm not asserting that this rule is correct, but it doesn't seem inconsistent. In particular, it doesn't seem like you could use it to prove both A > B and B > A. And clearly your popcorn embeds neither a cat nor the suffering of five holocausts under this rule.
If it turns out that no simple rule of this form works, I wouldn't be too troubled, though; I'd be psychologically prepared to accept that there isn't a clean quarks→computations mapping. Similar to how I already accept that human value is complex, I could accept that human judgments of "does this physical system implement this computation" are complex (and thus can't be captured in a simple rule). I don't think this would make me inconsistent; I think it would just make me more tolerant of nebulosity in ethics. At the moment it seems like clean mappings might exist, so it makes sense to search for them.
On the object level, it seems like it's possible to think of painting as "how should we arrange the brush strokes on the canvas?". But it seems hard to paint well while only thinking at the level of brush strokes (and not thinking about the higher levels, like objects). I expect ethics to be similar; at the very least if human ethics has an "aesthetic" component then it seems like designing a good light cone is at least as hard as making a good painting. Maybe this is a strawman of your position?
On the meta level, I would caution against this use of "ultimately"; see here and here (the articles are worded somewhat disagreeably but I mostly endorse the content). In some sense ethics is about quarks, but in other senses it's about:
I think these are all useful ways of viewing ethics, and I don't feel the need to pick a single view (although I often find it appealing to look at what some views say about what other views are saying and resolving the contradictions between them). There are all kinds of reasons why it might be psychologically uncomfortable not to have a simple theory of ethics (e.g. it's harder to know whether you're being ethical, it's harder to criticize others for being unethical, it's harder for groups to coordinate around more complex and ambiguous ethical theories, you'll never be able to "solve" ethics once and then never have to think about ethics again, it requires holding multiple contradictory views in your head at once, you won't always have a satisfying verbal justification for why your actions are ethical). But none of this implies that it's good (in any of the senses above!) to assume there's a simple ethical theory.
(For the record, I think it's useful to search for simple ethical theories even if they don't exist, since you might discover interesting new ways of viewing ethics, even if these views aren't complete.)
I suspect this still runs into the same problem: in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.
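The underdetermination worry here (familiar from Putnam/Chalmers-style triviality arguments) can be sketched in a few lines: if "interpretation" is unconstrained, any physical trace of distinct states can be read as implementing any computation of the same length. The names and toy traces below are purely illustrative, not anything from the discussion above.

```python
# Toy sketch of the triviality worry: for any physical state trajectory P
# and any abstract computation C of the same length, we can always
# construct an interpretation map under which P "implements" C.
# (Assumes distinct physical states at each time step.)

def make_interpretation(physical_trace, computational_trace):
    """Map each physical state to whatever computational state
    occupies the same time step."""
    assert len(physical_trace) == len(computational_trace)
    return dict(zip(physical_trace, computational_trace))

def implements(physical_trace, computational_trace, interpretation):
    """Check whether, under the interpretation, the physical trace
    reproduces the computational trace step by step."""
    return [interpretation[p] for p in physical_trace] == computational_trace

# A toy "physical system": four distinct microstates over time.
P = ["p0", "p1", "p2", "p3"]

# Two unrelated computations of the same length.
C_counter = [0, 1, 2, 3]   # a counter
C_flip    = [1, 0, 1, 0]   # a bit-flipper

# Each computation gets its own tailor-made interpretation, and both "succeed":
for C in (C_counter, C_flip):
    interp = make_interpretation(P, C)
    assert implements(P, C, interp)
```

Under this unconstrained notion of interpretation, implementation claims are vacuous, which is exactly why asserting that C changed while freely reinterpreting P carries no force: the extra structure has to live in constraints on the mapping itself.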