Comment author: Tobias_Baumann 08 July 2017 08:31:50AM *  0 points [-]

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)?

Yeah, and also (differentially) more promising than AI strategy or AI policy work. But I'm not sure how strong the effect is.

If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

In contrast, if we think there's no such discontinuity and AI development will be gradual, then AI control may be at least somewhat more similar (but surely not entirely comparable) to how we "align" contemporary software systems. That is, it would be more plausible that we could test advanced AI systems extensively without risking catastrophic failure or that we could iteratively try a variety of safety approaches to see what works best.

It would also be more likely that we'd get warning signs of potential failure modes, so that it's comparatively more viable to work on concrete problems whenever they arise, or to focus on making the solutions to such problems scalable – which, to my understanding, is a key component of Paul's approach. In this picture, successful alignment without understanding the theoretical fundamentals is more likely, which makes non-HRAD approaches more promising.

My personal view is that I find a hard and unexpected takeoff unlikely, and accordingly favor other approaches than HRAD, but of course I can't justify high confidence in this given expert disagreement. Similarly, I'm not highly confident that the above distinction is actually meaningful.

I'd be interested in hearing your thoughts on this!

Comment author: AlexMennen 10 July 2017 06:11:08AM *  2 points [-]

There's a strong possibility, even in a soft takeoff, that an unaligned AI would not act in an alarming way until after it achieves a decisive strategic advantage. In that case, the fact that it takes the AI a long time to achieve a decisive strategic advantage wouldn't do us much good, since we would not pick up an indication that anything was amiss during that period.

Reasons an AI might act in a desirable manner before but not after achieving a decisive strategic advantage:

Prior to achieving a decisive strategic advantage, the AI relies on cooperation with humans to achieve its goals, which provides an incentive not to act in ways that would result in it getting shut down. An AI may be capable of following these incentives well before achieving a decisive strategic advantage.

It may be easier to give an AI a goal system that aligns with human goals in familiar circumstances than it is to give it a goal system that aligns with human goals in all circumstances. An AI with such a goal system would act in ways that align with human goals if it has little optimization power but in ways that are not aligned with human goals if it has sufficiently large optimization power, and it may attain that much optimization power only after achieving a decisive strategic advantage (or before achieving a decisive strategic advantage, but after acquiring the ability to behave deceptively, as in the previous reason).

Comment author: AlexMennen 27 February 2017 04:59:23AM 3 points [-]

5) Look at the MIRI and 80k AI Safety syllabus, and see how much of it looks like something you'd be excited to learn. If applicable to you, consider diving into that so you can contribute to the cutting edge of knowledge. This may make most sense if you do it through


Comment author: AlexMennen 01 January 2017 02:05:42AM 4 points [-]

Do any animal welfare EAs have anything to say on animal products from ethically raised animals, and how to identify such animal products? It seems plausible to me that consumption of such animal products could even be morally positive on net, if the animals are treated well enough to have lives worth living, and raising them does not reduce wild animal populations much more than the production of non-animal-product substitutes. Most animal welfare EAs seem confident that almost all animals raised for the production of animal products do not live lives worth living, and that most claims by producers that their animals are treated well are false. However, there are independent organizations (e.g. the Cornucopia Institute's egg and dairy scorecards) that agree that such claims are often false, but also claim to be able to identify producers that do treat their animals well. Thoughts?

In response to Lunar Colony
Comment author: AlexMennen 19 December 2016 05:51:42PM 10 points [-]

One thing to keep in mind is that we currently don't have the ability to create a space colony that can sustain itself indefinitely. So pursuing a strategy of creating a space colony in case of human life on Earth being destroyed probably should look like capacity-building so that we can create an indefinitely self-sustaining space colony, rather than just creating a space colony.

Comment author: Maxdalton 12 December 2016 08:01:03AM *  2 points [-]

Hi Alex, thanks for the comment, great to pick up issues like this.

I wrote the article, and I agree and am aware of your original point. Your edit is also correct in that we are using risk aversion in the psychological/pure sense, and so the VNM theory does imply that this form of risk aversion is irrational. However, I think you're right that, given that people are more likely to have heard of the concept of economic risk aversion, the expected value article is likely to be misleading. I have edited to emphasise the way that we're using risk aversion in these articles, and to clarify that VNM alone does not imply risk neutrality in an economic sense. I've also added a bit more discussion of economic risk aversion. Further feedback welcome!

Comment author: AlexMennen 12 December 2016 10:03:16PM 0 points [-]

Even though the last paragraph of the expected value maximization article now says that it's talking about the VNM notion of expected value, the rest of the article still seems to be talking about the naive notion of expected value that is linear with respect to things of value (in the examples given, years of fulfilled life). This makes the last paragraph seem pretty out of place in the article.

Nitpicks on the risk aversion article: "However, it seems like there are fewer reasons for altruists to be risk-neutral in the economic sense" is a confusing way of starting a paragraph about how it probably makes sense for altruists to be close to economically risk-neutral as well. And I'm not sure what "unless some version of pure risk-aversion is true" is supposed to mean.

Comment author: MikeJohnson 10 December 2016 05:03:39AM *  3 points [-]

Hi Jessica,

Thanks for the thoughtful note. I do want to be very clear that I’m not criticizing MIRI’s work on CEV, which I do like very much! - It seems like the best intuition pump & Schelling Point in its area, and I think it has potential to be more.

My core offering in this space (where I expect most of the value to be) is Principia Qualia- it’s more up-to-date and comprehensive than the blog post you’re referencing. I pose some hypotheticals in the blog post, but it isn’t intended to stand alone as a substantive work (whereas PQ is).

But I had some thoughts in response to your response on valence + AI safety:

->1. First, I agree that leaving our future moral trajectory in the hands of humans is a great thing. I’m definitely not advocating anything else.

->2. But I would push back on whether our current ethical theories are very good- i.e., good enough to see us through any future AGI transition without needlessly risking substantial amounts of value.

To give one example: currently, some people make the claim that animals such as cows are much more capable of suffering than humans, because they don’t have much intellect to blunt their raw, emotional feeling. Other people make the claim that cows are much less capable of suffering than humans, because they don’t have the ‘bootstrapping strange loop’ mind architecture enabled by language, and necessary for consciousness. Worryingly, both of these arguments seem plausible, with no good way to pick between them.

Now, I don’t think cows are in a strange quantum superposition of both suffering and not suffering— I think there’s a fact of the matter, though we clearly don’t know it.

This example may have moral implications, but little relevance to existential risk. However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it's tractable, based on work already done.)

Based on my understanding, I don’t think Act-based agents or Task AI would help resolve these questions by default, although as tools they could probably help.

->3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).

->4. Re: suffering computations- a big, interesting question here is whether moral value should be defined at the physical or computational level. I.e., “is moral value made out of quarks or bits (or something else)?” — this may be the crux of our disagreement, since I’m a physicalist and I gather you’re a computationalist. But PQ’s framework allows for bits to be “where the magic happens”, as long as certain conditions obtain.

One factor that bears mentioning is whether an AGI’s ontology & theory of ethics might be path-dependent upon its creators’ metaphysics in such a way that it would be difficult for it to update if it’s wrong. If this is a plausible concern, this would imply a time-sensitive factor in resolving the philosophical confusion around consciousness, valence, moral value, etc.

->5. I wouldn’t advocate strictly hedonic values (this was ambiguous in the blog post but is clearer in Principia Qualia).

->6. However, I do think that “how much horrific suffering is there in possible world X?” is a hands-down, qualitatively better proxy for whether it’s a desirable future than “what is the Dow Jones closing price in possible world X?”

->7. Re: neuromorphic AIs: I think an interesting angle here is, “how does boredom stop humans from wireheading on pleasurable stimuli?” - I view boredom as a sophisticated anti-wireheading technology. It seems possible (although I can’t vouch for plausible yet) that if we understand the precise mechanism by which boredom is implemented in human brains, it may help us understand and/or control neuromorphic AGIs better. But this is very speculative, and undeveloped.

Comment author: AlexMennen 12 December 2016 04:54:37AM 1 point [-]

->3. I also think theories in IIT’s reference class won’t be correct, but I suspect I define the reference class much differently. :) Based on my categorization, I would object to lumping my theory into IIT’s reference class (we could talk more about this if you'd like).

I'm curious about this, since you mentioned fixing IIT's flaws. I came to the comments to make the same complaint you were responding to Jessica about.

Comment author: AlexMennen 10 December 2016 08:57:55PM *  2 points [-]

The article on expected value theory incorrectly cites the VNM theorem as a defense of maximizing expected value. The VNM theorem says that for a rational agent, there must exist some measure of value for which the rational agent maximizes its expectation, but the theorem does not say anything about the structure of that measure of value. In particular, it does not say that value must be linear with respect to anything, so it does not give a reason not to be risk averse. There are good reasons for altruists to have very low risk aversion, but the VNM theorem is not a sufficient such reason.
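The point can be illustrated with a toy calculation (the numbers and the square-root utility function are hypothetical, chosen only for illustration): two gambles with the same expected number of years of fulfilled life saved are ranked equally by a risk-neutral (linear) utility function, but an agent with a concave utility function strictly prefers the sure thing, while still maximizing the expectation of *some* utility function, which is all the VNM theorem requires.

```python
import math

# Two lotteries over "years of fulfilled life" saved, as (probability, outcome) pairs.
# A: 100 years with certainty.  B: 50% chance of 200 years, 50% chance of nothing.
lottery_a = [(1.0, 100)]
lottery_b = [(0.5, 200), (0.5, 0)]

def expected_utility(lottery, utility):
    return sum(p * utility(x) for p, x in lottery)

linear = lambda x: x               # risk-neutral: utility linear in years of life
concave = lambda x: math.sqrt(x)   # economically risk-averse, yet VNM-rational

# Under linear utility the lotteries are exactly equally good...
assert expected_utility(lottery_a, linear) == expected_utility(lottery_b, linear)
# ...but the concave agent strictly prefers the certain outcome
# (sqrt(100) = 10 vs. 0.5 * sqrt(200) ~= 7.07).
assert expected_utility(lottery_a, concave) > expected_utility(lottery_b, concave)
```

Both agents satisfy the VNM axioms; the theorem alone cannot distinguish between them, which is why it can't be cited against economic risk aversion.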

Edit: I see the article on risk aversion clarifies that "risk aversion" means in the psychological sense, but without that context, it looks like the expected value article is saying that many EAs think altruists should have low risk aversion in the economic sense, which is true, an important point, and not supported by the VNM theorem. The economic notion of risk aversion is also an important concept for EAs, so I don't think it's a good idea to establish that "risk aversion" refers to the psychological notion by default, rather than clarifying it every time.

Edit 2: Since this stuff is kind of a pet peeve of mine, I'd actually be willing to attempt to rewrite those articles myself, and if you're interested, I would let you use and modify whatever I write however you want.

Comment author: [deleted] 13 October 2016 01:46:56PM *  -2 points [-]


In response to comment by [deleted] on Ask MIRI Anything (AMA)
Comment author: AlexMennen 14 October 2016 12:03:47AM 4 points [-]

If many people intrinsically value the proliferation of natural Darwinian ecosystems, and the fact that animals in such ecosystems suffer significantly would not change their mind, then that could happen. If it's just that many people think it would be better for there to be more such ecosystems because they falsely believe that wild animals experience little suffering, and would prefer otherwise if their empirical beliefs were correct, then a human-friendly AI should not bring many such ecosystems into existence.

Comment author: [deleted] 12 October 2016 03:38:54AM *  7 points [-]

A lot of the discourse around AI safety uses terms like "human-friendly" or "human interests". Does MIRI's conception of friendly AI take the interests of non-human sentient beings into consideration as well? Especially troubling to me is Yudkowsky's view on animal consciousness, but I'm not sure how representative his views are of MIRI in general.

(I realize that MIRI's research focuses mainly on alignment theory, not target selection, but I am still concerned about this issue.)

In response to comment by [deleted] on Ask MIRI Anything (AMA)
Comment author: AlexMennen 13 October 2016 04:17:58AM 3 points [-]

I am not a MIRI employee, and this comment should not be interpreted as a response from MIRI, but I wanted to throw my two cents in about this topic.

I think that creating a friendly AI to specifically advance human values would actually turn out okay for animals. Such a human-friendly AI should optimize for everything humans care about, not just the quality of humans' subjective experience. Many humans care a significant amount about the welfare of non-human animals. A human-friendly AI would thus care about animal welfare by proxy through the values of humans. As far as I am aware, there is not a significant number of humans who specifically want animals to suffer. It is extremely common for humans to want things (like food with the taste and texture of bacon) that currently can be produced most efficiently at significant expense to non-human animals. However, it seems unlikely that a friendly AI would not be able to find an efficient way of producing bacon that does not involve actual pigs.

Comment author: So8res 11 June 2015 11:50:10PM *  4 points [-]

(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.

(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position).

It seems hard (but not impossible) to build something that's better than humans at designing AI systems & has access to its own software and new hardware, which does not self-improve rapidly. Scenarios where this doesn't occur include (a) scenarios where the top AI systems are strongly hardware-limited; (b) scenarios where all operators of all AI systems successfully remove all incentives to self-improve; or (c) scenarios where the first AI system is strong enough to prevent all intelligence explosions, but is also constructed such that it does not itself self-improve. The first two scenarios seem unlikely from here; the third is more plausible (if the frontrunners explicitly try to achieve it) but still seems like a difficult target to hit.

(3) I think we're pretty likely to eventually get a singleton: in order to get a multi-polar outcome, you need to have a lot of systems that are roughly at the same level of ability for a long time. That seems difficult but not impossible. (For example, this is much more likely to happen if the early AGI designs are open-sourced and early AGI algorithms are incredibly inefficient such that progress is very slow and all the major players progress in lockstep.)

Remember that history is full of cases where a better way of doing things ends up taking over the world -- humans over the other animals, agriculture dominating hunting & gathering, the Brits, industrialization, etc. (Agriculture and arguably industrialization emerged separately in different places, but in both cases the associated memes still conquered the world.) One plausible outcome is that we get a series of almost-singletons that can't quite wipe out other weaker entities and therefore eventually go into decline (which is also a common pattern throughout history), but I expect superintelligent systems to be much better at "finishing the job" and securing very long-term power than, say, the Romans were. Thus, I expect a singleton outcome in the long run.

The run-up to that may look pretty strange, though.

Comment author: AlexMennen 12 June 2015 12:08:08AM 3 points [-]

I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position.

Perhaps the first of them to voice a position on the matter expected a fast takeoff and was held in high regard by the others, so they followed along, having not previously thought about it?
