Comment author: MikeJohnson 31 July 2017 06:34:36PM 1 point

Speaking of the metaphysical correctness of claims about qualia sounds confused, and I think precise definitions of qualia-related terms should be judged by how useful they are for generalizing our preferences about central cases.

I agree a good theory of qualia should help generalize our preferences about central cases. I disagree that we can get there with the assumption that qualia are intrinsically vague/ineffable. My critique of analytic functionalism is that it is essentially nothing but an assertion of this vagueness.

Regarding the objection that shaking a bag of popcorn can be interpreted as carrying out an arbitrary computation, I'm not convinced that this is actually true, and I suspect it isn't.

Without a bijective mapping between physical states/processes and computational states/processes, I think my point holds. I understand it's counterintuitive, but we should expect counterintuitive conclusions when working in contexts like this.

I think the edge cases that you quote Scott Aaronson bringing up are good ones to think about, and I do have a large amount of moral uncertainty about them. But I don't see these as problems specific to analytic functionalism. These are hard problems, and the fact that some more precise theory about qualia may be able to easily answer them is not a point in favor of that theory, since wrong answers are not helpful.

Correct; they're the sorts of things a theory of qualia should be able to address: necessary, not sufficient.

Re: your comments on the Symmetry Theory of Valence, I feel I have the advantage here since you haven't read the work. Specifically, it feels as though you're pattern-matching me to IIT and channeling Scott Aaronson's critique of Tononi, which is a bit ironic, since that critique forms a significant part of PQ's argument for why an IIT-type approach can't work.

At any rate, I'd be happy to address specific criticisms of my work. This is obviously a complicated topic, and informed external criticism is always helpful. At the same time, I think it's a bit tangential to my critique of FRI's approach: as I noted,

I mention all this because I think analytic functionalism- which is to say radical skepticism/eliminativism, the metaphysics of last resort- only looks as good as it does because nobody’s been building out any alternatives.

Comment author: AlexMennen 01 August 2017 08:22:51PM 0 points

My critique of analytic functionalism is that it is essentially nothing but an assertion of this vagueness.

That's no reason to believe that analytic functionalism is wrong, only that it is not sufficient by itself to answer very many interesting questions.

Without a bijective mapping between physical states/processes and computational states/processes, I think my point holds.

No, it doesn't. I only claim that most physical states/processes have only a very limited collection of computational states/processes that they can reasonably be interpreted as. I do not claim that every physical state/process has exactly one computational state/process that it can reasonably be interpreted as, and certainly not that every computational state/process has exactly one physical state/process that can reasonably be interpreted as it. Those are totally different claims.

it feels as though you're pattern-matching me to IIT and channeling Scott Aaronson's critique of Tononi

Kind of. But to clarify, I wasn't trying to argue that there will be problems with the Symmetry Theory of Valence that derive from problems with IIT. And when I first heard about IIT, I figured that there were probably trivial counterexamples to the claim that Phi measures consciousness, and that perhaps I could come up with one if I thought about the formula enough; this was before Scott Aaronson wrote the blog post where he demonstrated exactly that. So although I used that critique of IIT as an example, I was mainly going off of intuitions I had prior to it. I can see why this kind of very general criticism from someone who hasn't read the details could be frustrating, but I don't expect I'll look into it enough to say anything much more specific.

I mention all this because I think analytic functionalism- which is to say radical skepticism/eliminativism, the metaphysics of last resort- only looks as good as it does because nobody’s been building out any alternatives.

But people have tried developing alternatives to analytic functionalism.

Comment author: Brian_Tomasik 31 July 2017 05:13:53AM 0 points

To steelman the popcorn objection, one could say that separating "normal" computations from popcorn shaking requires at least certain sorts of conditions on what counts as a valid interpretation, and such conditions increase the arbitrariness of the theory. Of course, if we adopt a complexity-of-value approach to moral value (as I and probably you think we should), then those conditions on what counts as a computation may be minimal compared with the other forms of arbitrariness we bring to bear.

I haven't read Principia Qualia and so can't comment competently, but I agree that symmetry doesn't seem like the kind of thing I'm looking for when assessing the moral importance of a physical system, or at least it's not more than one small part of what I'm looking for. Most of what I care about is at the level of ordinary cognitive science, such as mental representations, behaviors, learning, preferences, introspective abilities, etc.

That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.

Comment author: AlexMennen 31 July 2017 01:29:48PM 1 point

That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.

If you expanded on this, I would be interested.

Comment author: AlexMennen 30 July 2017 10:17:36PM 6 points

Speaking of the metaphysical correctness of claims about qualia sounds confused, and I think precise definitions of qualia-related terms should be judged by how useful they are for generalizing our preferences about central cases. I expect that any precise definition for qualia-related terms that anyone puts forward before making quite a lot of philosophical progress is going to be very wrong when judged by usefulness for describing preferences, and that the vagueness of the analytic functionalism used by FRI is necessary to avoid going far astray.

Regarding the objection that shaking a bag of popcorn can be interpreted as carrying out an arbitrary computation, I'm not convinced that this is actually true, and I suspect it isn't. It seems to me that the interpretation would have to be doing essentially all of the computation itself, and it should be possible to make precise the sense in which brains and computers simulating brains carry out a certain computation that waterfalls and bags of popcorn don't. The defense of this objection that you quote from McCabe is weak; the uncontroversial fact that many slightly different physical systems can carry out the same computation does not establish that an arbitrary physical system can be reasonably interpreted as carrying out an arbitrary computation.
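To make that intuition a bit more concrete, here is a toy sketch (my own illustration of this standard counterargument, not anything from McCabe or the post): to read an arbitrary computation into a sequence of popcorn states, the interpretation map has to be constructed by running the target computation itself, so the computational work lives in the interpretation rather than in the popcorn.

```python
import random

def target_computation(x, steps):
    """The computation we want to 'find' in the popcorn: iterate a simple map."""
    trace = [x]
    for _ in range(steps):
        x = (3 * x + 7) % 1000  # stands in for any deterministic computation
        trace.append(x)
    return trace

def shake_popcorn(steps, seed=0):
    """Physical process: a time-indexed sequence of unstructured 'popcorn states'."""
    rng = random.Random(seed)
    return [(t, tuple(rng.randint(0, 9) for _ in range(5))) for t in range(steps + 1)]

def build_interpretation(popcorn_states, x0):
    """Map each popcorn state to a state of the target computation.

    The only way to construct this map is to run the target computation
    ourselves; the popcorn contributes nothing but arbitrary labels.
    """
    trace = target_computation(x0, len(popcorn_states) - 1)
    return dict(zip(popcorn_states, trace))

popcorn = shake_popcorn(steps=10)
interpretation = build_interpretation(popcorn, x0=7)
print([interpretation[state] for state in popcorn])  # the 'computed' trace
```

A brain or a brain simulation, by contrast, plausibly admits a comparatively simple, fixed correspondence to the computation it implements; making that asymmetry precise is the hard part, but the sketch shows where the burden sits in the popcorn case.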

I think the edge cases that you quote Scott Aaronson bringing up are good ones to think about, and I do have a large amount of moral uncertainty about them. But I don't see these as problems specific to analytic functionalism. These are hard problems, and the fact that some more precise theory about qualia may be able to easily answer them is not a point in favor of that theory, since wrong answers are not helpful.

The Symmetry Theory of Valence sounds wildly implausible. There are tons of claims that people put forward, often contradicting other such claims, that some qualia-related concept is actually some other simple thing. For instance, I've heard claims that goodness is complexity and that what humans value is increasing complexity. Complexity and symmetry aren't quite opposites, but they're certainly anti-correlated, and both theories can't be right.

These sorts of theories never end up getting empirical support, although their proponents often claim to have it. For example, proponents of Integrated Information Theory often cite the fact that the cerebrum has a higher Phi value than the cerebellum as support for the hypothesis that Phi is a good measure of the amount of consciousness a system has, as if comparing two data points were enough to support such a claim. It turns out that large regular rectangular grids of transistors, and the operation of multiplication by a large Vandermonde matrix, both have arbitrarily high Phi values, and yet the claim that Phi measures consciousness still survives and still claims empirical support, despite this damning disconfirmation. And I think the "goodness is complexity" people also offered examples of good things they believed they had established to be complex, and bad things they believed they had established not to be.

I know this sounds totally unfair, but I won't be at all surprised if you claim to have found substantial empirical support for your theory, and I still won't take your theory at all seriously if you do, because any evidence you cite will inevitably be highly dubious. The heuristic that claims of the form "this qualia-related concept is really some other simple thing" are wrong, and that claims of empirical support for them never hold up, seems to be pretty well supported. I am almost certain that there are trivial counterexamples to the Symmetry Theory of Valence, even if you have perhaps developed a theory sophisticated enough to avoid the really obvious failure modes, like claiming that a square experiences more pleasure and less suffering than a rectangle because its symmetry group is twice as large.

Comment author: Tobias_Baumann 08 July 2017 08:31:50AM 0 points

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)?

Yeah, and also (differentially) more promising than AI strategy or AI policy work. But I'm not sure how strong the effect is.

If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

In contrast, if we think there's no such discontinuity and AI development will be gradual, then AI control may be at least somewhat more similar (but surely not entirely comparable) to how we "align" contemporary software systems. That is, it would be more plausible that we could test advanced AI systems extensively without risking catastrophic failure or that we could iteratively try a variety of safety approaches to see what works best.

It would also be more likely that we'd get warning signs of potential failure modes, so that it's comparatively more viable to work on concrete problems whenever they arise, or to focus on making the solutions to such problems scalable – which, to my understanding, is a key component of Paul's approach. In this picture, successful alignment without understanding the theoretical fundamentals is more likely, which makes non-HRAD approaches more promising.

My personal view is that I find a hard and unexpected takeoff unlikely, and accordingly favor other approaches than HRAD, but of course I can't justify high confidence in this given expert disagreement. Similarly, I'm not highly confident that the above distinction is actually meaningful.

I'd be interested in hearing your thoughts on this!

Comment author: AlexMennen 10 July 2017 06:11:08AM 2 points

There's a strong possibility, even in a soft takeoff, that an unaligned AI would not act in an alarming way until after it achieves a decisive strategic advantage. In that case, the fact that it takes the AI a long time to achieve a decisive strategic advantage wouldn't do us much good, since we would not pick up an indication that anything was amiss during that period.

Reasons an AI might act in a desirable manner before but not after achieving a decisive strategic advantage:

1. Prior to achieving a decisive strategic advantage, the AI relies on cooperation with humans to achieve its goals, which provides an incentive not to act in ways that would result in it getting shut down. An AI may be capable of following these incentives well before achieving a decisive strategic advantage.

2. It may be easier to give an AI a goal system that aligns with human goals in familiar circumstances than it is to give it a goal system that aligns with human goals in all circumstances. An AI with such a goal system would act in ways that align with human goals while it has little optimization power, but in ways that are not aligned with human goals once it has sufficiently large optimization power, and it may attain that much optimization power only after achieving a decisive strategic advantage (or before achieving a decisive strategic advantage, but after acquiring the ability to behave deceptively, as in reason 1).
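As a toy illustration of the second reason (a Goodhart-style sketch of my own, not anything from the comment): a proxy objective that ranks options the same way as the intended objective over the familiar range can come apart from it badly once far more optimization power is applied.

```python
import numpy as np

# Intended objective: peaks at a moderate value and falls off beyond it.
def true_value(x):
    return -(x - 5.0) ** 2

# Proxy objective the AI is actually given: it ranks options the same way as
# true_value on the familiar range (both increase up to x = 5), but keeps
# rewarding larger x indefinitely.
def proxy_value(x):
    return 10.0 * x - 25.0

familiar = np.linspace(0, 5, 6)     # little optimization power: in-distribution options
extreme = np.linspace(0, 100, 101)  # large optimization power: far wider option space

for label, options in [("weak optimizer", familiar), ("strong optimizer", extreme)]:
    best = options[np.argmax([proxy_value(x) for x in options])]
    print(f"{label}: picks x={best:.0f}, true value={true_value(best):.0f}")

# The weak optimizer picks x=5 (the intended optimum, true value 0); the strong
# optimizer picks x=100 (true value -9025). Behavior that looked aligned under
# weak optimization becomes badly misaligned under strong optimization.
```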

Comment author: AlexMennen 27 February 2017 04:59:23AM 3 points

5) Look at the MIRI and 80k AI Safety syllabus, and see how much of it looks like something you'd be excited to learn. If applicable to you, consider diving into that so you can contribute to the cutting edge of knowledge. This may make most sense if you do it through

...

Comment author: AlexMennen 01 January 2017 02:05:42AM 4 points

Do any animal welfare EAs have anything to say on animal products from ethically raised animals, and how to identify such animal products? It seems plausible to me that consumption of such animal products could even be morally positive on net, if the animals are treated well enough to have lives worth living, and raising them does not reduce wild animal populations much more than the production of non-animal-product substitutes. Most animal welfare EAs seem confident that almost all animals raised for the production of animal products do not live lives worth living, and that most claims by producers that their animals are treated well are false. However, there are independent organizations (e.g. the Cornucopia Institute's egg and dairy scorecards) that agree that such claims are often false, but also claim to be able to identify producers that do treat their animals well. Thoughts?

In response to Lunar Colony
Comment author: AlexMennen 19 December 2016 05:51:42PM 10 points

One thing to keep in mind is that we currently don't have the ability to create a space colony that can sustain itself indefinitely. So pursuing a strategy of creating a space colony in case of human life on Earth being destroyed probably should look like capacity-building so that we can create an indefinitely self-sustaining space colony, rather than just creating a space colony.

Comment author: Maxdalton 12 December 2016 08:01:03AM 2 points

Hi Alex, thanks for the comment, great to pick up issues like this.

I wrote the article, and I agree with and was already aware of your original point. Your edit is also correct: we are using risk aversion in the psychological/pure sense, and so the VNM theorem does imply that this form of risk aversion is irrational. However, I think you're right that, given that people are more likely to have heard of the concept of economic risk aversion, the expected value article is likely to be misleading. I have edited it to emphasise the way that we're using risk aversion in these articles, and to clarify that VNM alone does not imply risk neutrality in an economic sense. I've also added a bit more discussion of economic risk aversion. Further feedback welcome!

Comment author: AlexMennen 12 December 2016 10:03:16PM 0 points

Even though the last paragraph of the expected value maximization article now says that it's talking about the VNM notion of expected value, the rest of the article still seems to be talking about the naive notion of expected value that is linear with respect to things of value (in the examples given, years of fulfilled life). This makes the last paragraph seem pretty out of place in the article.

Nitpicks on the risk aversion article: "However, it seems like there are fewer reasons for altruists to be risk-neutral in the economic sense" is a confusing way of starting a paragraph about how it probably makes sense for altruists to be close to economically risk-neutral as well. And I'm not sure what "unless some version of pure risk-aversion is true" is supposed to mean.

Comment author: MikeJohnson 10 December 2016 05:03:39AM 3 points

Hi Jessica,

Thanks for the thoughtful note. I do want to be very clear that I’m not criticizing MIRI’s work on CEV, which I do like very much! It seems like the best intuition pump & Schelling point in its area, and I think it has potential to be more.

My core offering in this space (where I expect most of the value to be) is Principia Qualia; it’s more up-to-date and comprehensive than the blog post you’re referencing. I pose some hypotheticals in the blog post, but it isn’t intended to stand alone as a substantive work (whereas PQ is).

But I had some thoughts in response to your response on valence + AI safety:

->1. First, I agree that leaving our future moral trajectory in the hands of humans is a great thing. I’m definitely not advocating anything else.

->2. But I would push back on whether our current ethical theories are very good, i.e., good enough to see us through any future AGI transition without needlessly risking substantial amounts of value.

To give one example: currently, some people make the claim that animals such as cows are much more capable of suffering than humans, because they don’t have much intellect to blunt their raw, emotional feeling. Other people make the claim that cows are much less capable of suffering than humans, because they don’t have the ‘bootstrapping strange loop’ mind architecture enabled by language, and necessary for consciousness. Worryingly, both of these arguments seem plausible, with no good way to pick between them.

Now, I don’t think cows are in a strange quantum superposition of both suffering and not suffering— I think there’s a fact of the matter, though we clearly don’t know it.

This example may have moral implications, but little relevance to existential risk. However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it's tractable, based on work already done.)

Based on my understanding, I don’t think Act-based agents or Task AI would help resolve these questions by default, although as tools they could probably help.

->3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).

->4. Re: suffering computations, a big, interesting question here is whether moral value should be defined at the physical or computational level. I.e., “is moral value made out of quarks or bits (or something else)?” This may be the crux of our disagreement, since I’m a physicalist and I gather you’re a computationalist. But PQ’s framework allows for bits to be “where the magic happens”, as long as certain conditions obtain.

One factor that bears mentioning is whether an AGI’s ontology & theory of ethics might be path-dependent on its creators’ metaphysics, in such a way that it would be difficult for the AGI to update if that metaphysics is wrong. If this is a plausible concern, it would imply a time-sensitive factor in resolving the philosophical confusion around consciousness, valence, moral value, etc.

->5. I wouldn’t advocate strictly hedonic values (this was ambiguous in the blog post but is clearer in Principia Qualia).

->6. However, I do think that “how much horrific suffering is there in possible world X?” is a hands-down, qualitatively better proxy for whether it’s a desirable future than “what is the Dow Jones closing price in possible world X?”

->7. Re: neuromorphic AIs: I think an interesting angle here is, “how does boredom stop humans from wireheading on pleasurable stimuli?” I view boredom as a sophisticated anti-wireheading technology. It seems possible (although I can’t vouch for plausible yet) that if we understand the precise mechanism by which boredom is implemented in human brains, it may help us understand and/or control neuromorphic AGIs better. But this is very speculative, and undeveloped.

Comment author: AlexMennen 12 December 2016 04:54:37AM 1 point

->3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).

I'm curious about this, since you mentioned fixing IIT's flaws. I came to the comments to make the same complaint you were responding to Jessica about.

Comment author: AlexMennen 10 December 2016 08:57:55PM 2 points

The article on expected value theory incorrectly cites the VNM theorem as a defense of maximizing expected value. The VNM theorem says that for a rational agent, there must exist some measure of value for which the rational agent maximizes its expectation, but the theorem does not say anything about the structure of that measure of value. In particular, it does not say that value must be linear with respect to anything, so it does not give a reason not to be risk averse. There are good reasons for altruists to have very low risk aversion, but the VNM theorem is not a sufficient such reason.
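As a minimal illustration of the distinction (a toy example of my own, not something from the articles): an agent that maximizes the expectation of a concave utility function satisfies the VNM axioms and yet is risk-averse with respect to the underlying outcome (money donated, lives saved, etc.).

```python
import math

def utility(x):
    """A concave VNM utility function over outcomes (e.g., money donated)."""
    return math.sqrt(x)

# A 50/50 gamble between 0 and 100, versus a certain 50.
gamble = [(0.5, 0.0), (0.5, 100.0)]
certain = 50.0

expected_outcome = sum(p * x for p, x in gamble)                     # 50.0
expected_utility_of_gamble = sum(p * utility(x) for p, x in gamble)  # 5.0
utility_of_certain = utility(certain)                                # ~7.07

# The agent maximizes expected *utility*, so it is VNM-rational, yet it prefers
# the certain 50 to a gamble with the same expected *outcome*: risk aversion
# over outcomes without any violation of the VNM axioms.
print(expected_outcome, expected_utility_of_gamble, utility_of_certain)
```

The theorem only forces linearity in probabilities, not in the outcome variable itself, which is why citing it as a defense of maximizing expected value in the outcome sense doesn't go through.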

Edit: I see the article on risk aversion clarifies that "risk aversion" is meant in the psychological sense, but without that context, it looks like the expected value article is saying that many EAs think altruists should have low risk aversion in the economic sense, which is true, an important point, and not supported by the VNM theorem. The economic version of risk aversion is also an important concept for EAs, so I don't think it's a good idea to establish that "risk aversion" only refers to the psychological notion by default, rather than clarifying it every time.

Edit 2: Since this stuff is kind of a pet peeve of mine, I'd actually be willing to attempt to rewrite those articles myself, and if you're interested, I would let you use and modify whatever I write however you want.
