Comment author: Robert_Wiblin 05 September 2017 11:21:30PM 1 point

The term "existential risk" has serious problems: it has no obvious meaning unless you've already studied it (is this about existentialism?!), and it is very often misused even by people familiar with it, to mean extinction only while neglecting other persistent 'trajectory changes'.

Comment author: RobBensinger 06 September 2017 08:12:51AM 1 point

"Existential risk" has the advantage over "long-term future" and "far future" that it sounds like a technical term, so people are more likely to Google it if they haven't encountered it (though admittedly this won't fully address people who think they know what it means without actually knowing). In contrast, someone might just assume they know what "long-term future" and "far future" means, and if they do Google those terms they'll have a harder time getting a relevant or consistent definition. Plus "long-term future" still has the problem that it suggests existential risk can't be a near-term issue, even though some people working on existential risk are focusing on nearer-term scenarios than, e.g., some people working on factory farming abolition.

I think "global catastrophic risk" or "technological risk" would work fine for this purpose, though, and avoids the main concerns raised for both categories. ("Technological risk" also strikes me as a more informative / relevant / joint-carving category than the others considered, since x-risk and far future can overlap more with environmentalism, animal welfare, etc.)

Comment author: Kerry_Vaughan 07 July 2017 10:55:00PM 2 points

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

I haven't thought about this in detail, but whether the evidence in this section justifies the claim in 3c might depend, in part, on what you think the AI Safety project is trying to achieve.

On first pass, the "learning to reason from humans" project seems like it may be able to quickly and substantially reduce the chance of an AI catastrophe by introducing human guidance as a mechanism for making AI systems more conservative.

However, it doesn't seem like a project that aims to do either of the following:

(1) Reduce the risk of an AI catastrophe to zero (or near zero)

(2) Produce an AI system that can help create an optimal world

If you think either (1) or (2) is a goal of AI Safety, then you might not be excited about the "learning to reason from humans" project.

You might think that "learning to reason from humans" doesn't accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can't rule out AI catastrophes with high certainty unless you can "peer inside the machine" so to speak. HRAD might allow you to peer inside the machine and make statements about what the machine will do with extremely high certainty.

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

Comment author: RobBensinger 08 July 2017 01:52:08AM 7 points

FWIW, I don't think (1) or (2) plays a role in why MIRI researchers work on the research they do, and I don't think they play a role in why people at MIRI think "learning to reason from humans" isn't likely to be sufficient. The shape of the "HRAD is more promising than act-based agents" claim is more like what Paul Christiano said here:

As far as I can tell, the MIRI view is that my work is aimed at [a] problem which is not possible, not that it is aimed at a problem which is too easy. [...] One part of this is the disagreement about whether the overall approach I'm taking could possibly work, with my position being "something like 50-50" the MIRI position being "obviously not" [...]

There is a broader disagreement about whether any "easy" approach can work, with my position being "you should try the easy approaches extensively before trying to rally the community behind a crazy hard approach" and the MIRI position apparently being something like "we have basically ruled out the easy approaches, but the argument/evidence is really complicated and subtle."

With a clarification I made in the same thread:

I think Paul's characterization is right, except I think Nate wouldn't say "we've ruled out all the prima facie easy approaches," but rather something like "part of the disagreement here is about which approaches are prima facie 'easy.'" I think his model says that the proposed alternatives to MIRI's research directions by and large look more difficult than what MIRI's trying to do, from a naive traditional CS/Econ standpoint. E.g., I expect the average game theorist would find a utility/objective/reward-centered framework much less weird than a recursive intelligence bootstrapping framework. There are then subtle arguments for why intelligence bootstrapping might turn out to be easy, which Nate and co. are skeptical of, but hashing out the full chain of reasoning for why a daring unconventional approach just might turn out to work anyway requires some complicated extra dialoguing. Part of how this is framed depends on what problem categories get the first-pass "this looks really tricky to pull off" label.

Comment author: JoeW 18 April 2017 08:30:52PM 0 points

To clarify: I don't think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention: multipolar scenarios don't really work that way; what will happen is determined by what's efficient in a competitive world, which doesn't allow much room to make changes now that will actually persist.

And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.

What I'm concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI - such a scenario certainly seems to me like it could fit the "possible but not inevitable" description.

A world takeover attempt has the potential to go very, very wrong - and then there's the serious possibility that the creation of the singleton would be successful but the alignment of it would not. Given this, I don't think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can't control that alternative is irrelevant - we can still understand it! When we have a reasonable picture of that scenario, then we can start to think about whether it's so bad that we should embark on dangerous risky strategies to try to avoid it.

One element of that understanding would be how likely AIs are to be conscious; another would be how good or bad a life conscious AIs would have in a multipolar scenario. I agree entirely that we don't know this yet - whether for rabbits or for future AIs - that's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.

Comment author: RobBensinger 27 April 2017 10:59:56PM 1 point

Did anything in Nate's post or my comments strike you as "pushing for a singleton"? When people say "singleton," I usually understand them to have in mind some kind of world takeover, which sounds like what you're talking about here. The strategy people at MIRI favor tends to be more like "figure out what minimal AI system can end the acute risk period (in particular, from singletons), while doing as little else as possible; then steer toward that kind of system". This shouldn't be via world takeover if there's any less-ambitious path to that outcome, because any added capability, or any added wrinkle in the goal you're using the system for, increases accident risk.

More generally, alignment is something that you can partially solve for systems with some particular set of capabilities, rather than being all-or-nothing.

I agree entirely that we don't know this yet - whether for rabbits or for future AIs - that's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.

I think it's much less likely that we can learn that kind of generalization in advance than that we can solve most of the alignment problem in advance. Additionally, solving this doesn't in any obvious way get you any closer to being able to block singletons from being developed, in the scenario where singletons are "possible but only with some effort made". Knowing about the utility of a multipolar outcome where no one ever builds a singleton can be useful for knowing whether you should aim for a multipolar outcome where no one ever builds a singleton, but it doesn't get us any closer to knowing how to prevent anyone from ever building a singleton if you find a way to achieve an initially multipolar outcome.

I'd also add that I think the risk of producing bad conscious states via non-aligned AI mainly lies in AI systems potentially having parts or subsystems that are conscious, rather than in the system as a whole (or executive components) being conscious in the fashion of a human.

Comment author: JoeW 16 April 2017 10:33:11PM 0 points

Thanks for the reply. You're right that we can't be sure that conscious beings will do good things, but we don't have that assurance for any outcome we might push for.

If AIs are conscious, then a multipolar future filled with vast numbers of unaligned AIs could very plausibly be a wonderful future, brimming with utility. This isn't overwhelmingly obvious, but it's a real possibility. By contrast, if AIs aren't conscious then this scenario would represent a dead future. So distinguishing the two seems quite vital to understanding whether a multipolar outcome is bad or good.

You point out that even compared to the optimistic scenario I describe above, a correctly-aligned singleton could do better, by ensuring the very best future possible. True, but if a singleton isn't inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt. And even if the attempt is successful, we all agree that creating an aligned singleton is a very difficult task. Most singleton outcomes result in a universe almost entirely full of dead matter, produced by the singleton AI optimising for something irrelevant; even if it's conscious itself, resources that could have been put towards creating utility are almost all wasted as paperclips or whatever.

So it seems to me that, unless you're quite certain we're headed for a singleton future, the question of whether AIs will be conscious or not has a pretty huge impact on what path we should try to take.

Comment author: RobBensinger 17 April 2017 03:26:30AM 1 point

You're right that we can't be sure that conscious beings will do good things, but we don't have that assurance for any outcome we might push for.

One way to think about the goal is that we want to "zero in" on valuable futures: it's unclear what exactly a good future looks like, and we can't get an "assurance," but for example a massive Manhattan Project to develop whole brain emulation is a not-implausible path to zeroing in, assuming WBE isn't too difficult to achieve on the relevant timescale and assuming you can avoid accelerating difficult-to-align AI too much in the process. It's a potentially promising option for zeroing in because emulated humans could be leveraged to do a lot of cognitive work in a compressed period of time to sort out key questions in moral psychology+philosophy, neuroscience, computer science, etc. that we need to answer in order to get a better picture of good outcomes.

This is also true for a Manhattan Project to develop a powerful search algorithm that generates smart creative policies to satisfy our values, while excluding hazardous parts of the search space -- this is the AI route.

Trying to ensure that AI is conscious, without also solving WBE or alignment or global coordination or something of that kind in the process, doesn't have this "zeroing in" property. It's more of a gamble that hopefully good-ish futures have a high enough base rate even when we don't put a lot of work into steering in a specific direction, that maybe arbitrary conscious systems would make good things happen. But building a future of conscious AI systems opens up a lot of ways for suffering to end up proliferating in the universe, just as it opens up a lot of ways for happiness to end up proliferating in the universe. Just as it isn't obvious that e.g. rabbits experience more joy than suffering in the natural world, it isn't obvious that conscious AI systems in a multipolar outcome would experience more joy than suffering. (Or otherwise experience good-on-net lives.)

True, but if a singleton isn't inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt.

I think if an AI singleton isn't infeasible or prohibitively difficult to achieve, then it's likely to happen eventually regardless of what we'd ideally prefer to have happen, absent some intervention to prevent it. Either it's not achievable, or something needs to occur to prevent anyone in the world from reaching that point. If you're worried about singletons, I don't think pursuing multipolar outcomes and/or conscious-AI outcomes should be a priority for you, because I don't think either of those paths concentrates very much probability mass (if any) into scenarios where singletons start off feasible but something blocks them from occurring.

Multipolar scenarios are likelier to occur in scenarios where singletons simply aren't feasible, as a background fact about the universe; but conditional on singletons being feasible, I'm skeptical that achieving a multipolar AI outcome would do much (if anything) to prevent a singleton from occurring afterward, and I think it would make alignment much more difficult.

Alignment and WBE look like difficult tasks, but they have the "zeroing in" property, and we don't know exactly how difficult they are. Alignment in particular could turn out to be much harder than it looks or much easier, because there's so little understanding of what specifically is required. (Investigating WBE has less value-of-information because we already have some decent WBE roadmaps.)

Comment author: JoeW 14 April 2017 10:41:09PM 1 point

I'm still not sure how the consciousness issue can just be ignored. Yes, given the assumption that AIs will be mindless machines with no moral value, obviously we need to build them to serve humans. But if AIs will be conscious creatures with moral value like us, then...? In this case finding the right thing to do seems like a much harder problem, as it would be far from clear that a future in which machine intelligences gradually replace human intelligences represents a nightmare scenario, or even an existential risk at all. It's especially frustrating to see AI-risk folks treat this question as an irrelevance, since it seems to have such enormous implications for how important AI alignment actually is.

(Note that I'm not invoking 'ghost in the machine', I am making the very reasonable guess that our consciousness is a physical process that occurs in our brains, that it's there for the same reason other features of our minds are there - because it's adaptive - and that similar functionality might very plausibly be useful for an AI too.)

Comment author: RobBensinger 15 April 2017 01:45:36AM 6 points

Consciousness is something people at MIRI have thought about quite a bit, and we don't think people should ignore the issue. But it's a conceptually separate issue from intelligence, and it's important to be clear about that.

One reason to prioritize the AI alignment issue over consciousness research is that a sufficiently general solution to AI alignment would hopefully also resolve the consciousness problem: we care about the suffering, happiness, etc. of minds in general, so if we successfully build a system that shares and promotes our values, that system would hopefully also look out for the welfare of conscious machines, if any exist. That includes looking out for its own welfare, if it's conscious.

In contrast, a solution to the consciousness problem doesn't get us a solution to the alignment problem, because there's no assurance that conscious beings will do things that are good (including good for the welfare of conscious machines).

Some consciousness-related issues are subsumed in alignment, though. E.g., a plausible desideratum for early general AI systems is "limit its ability to model minds" ('behaviorist' AI). And we do want to drive down the probability that the system is conscious if we can find a way to do that.

Comment author: RyanCarey 04 April 2017 11:45:44PM 0 points

Agree that that's the most common operationalization of a GCR. It's a bit inelegant for "GCR" not to include all x-risks, though, especially given that the two terms are often used interchangeably within EA.

It would be odd if the onset of a permanently miserable dictatorship didn't count as a global catastrophe because no lives were lost.

Comment author: RobBensinger 05 April 2017 05:53:14PM 0 points

Could you or Will provide an example of a source that explicitly uses "GCR" and "xrisk" in such a way that there are non-GCR xrisks? You say this is the most common operationalization, but I'm only finding examples that treat xrisk as a subset of GCR, as the Bostrom quote above does.

Comment author: William_MacAskill 30 March 2017 11:34:06PM 9 points

Agree that GCRs are a within-our-lifetime problem. But in my view mitigating GCRs is unlikely to be the optimal donation target if you are only considering the impact on beings alive today. Do you know of any sources that make the opposite case?

And it's framed as long-run future because we think that there are potentially lots of things that could have a huge positive impact on the value of the long-run future which aren't GCRs - like humanity having the right values, for example.

Comment author: RobBensinger 31 March 2017 08:43:25PM 3 points

And it's framed as long-run future because we think that there are potentially lots of things that could have a huge positive impact on the value of the long-run future which aren't GCRs - like humanity having the right values, for example.

I don't have much to add to what Rob W and Carl said, but I'll note that Bostrom defined "existential risk" like this back in 2008:

A subset of global catastrophic risks is existential risks. An existential risk is one that threatens to cause the extinction of Earth-originating intelligent life or to reduce its quality of life (compared to what would otherwise have been possible) permanently and drastically.

Presumably we should replace "intelligent" here with "sentient" or similar. The reason I'm quoting this is that on the above definition, it sounds like any potential future event or process that would cost us a large portion of the future's value counts as an xrisk (and therefore as a GCR). 'Humanity's moral progress stagnates or we otherwise end up with the wrong values' sounds like a global catastrophic risk to me, on that definition. (From a perspective that does care about long-term issues, at least.)

I'll note that I think there's at least some disagreement at FHI / Open Phil / etc. about how best to define terms like "GCR", and I don't know if there's currently a consensus or what that consensus is. Also worth noting that the "risk" part is more clearly relevant than the "global catastrophe" part -- malaria and factory farming are arguably global catastrophes in Bostrom's sense, but they aren't "risks" in the relevant sense, because they're already occurring.

Comment author: RobBensinger 30 March 2017 09:47:22PM 14 points

To some, it’s just obvious that future lives have value and the highest priority is fighting existential threats to humanity (‘X-risks’).

I realize this is just an example, but I want to mention as a side-note that I find it weird what a common framing this is. AFAIK almost everyone working on existential risk thinks it's a serious concern in our lifetimes, not specifically a "far future" issue or one that turns on whether it's good to create new people.

As an example of what I have in mind, I don't understand why the GCR-focused EA Fund is framed as a "long-term future" fund (unless I'm misunderstanding the kinds of GCR interventions it's planning to focus on), or why philosophical stances like the person-affecting view and presentism are foregrounded. The natural things I'd expect to be foregrounded are factual questions about the probability and magnitude over the coming decades of the specific GCRs EAs are most worried about.

Comment author: RobBensinger 17 March 2017 10:43:57PM 4 points

I think wild animal suffering isn't a long-term issue except in scenarios where we go extinct for non-AGI-related reasons. The three likeliest scenarios are:

  1. Humans leverage AGI-related technologies in a way that promotes human welfare as well as (non-human) animal welfare.

  2. Humans leverage AGI-related technologies in a way that promotes human welfare and is effectively indifferent to animal welfare.

  3. Humans accidentally use AGI-related technologies in a way that is indifferent to human and animal welfare.

In all three scenarios, the decision-makers are likely to have "ambitious" goals that favor seizing more and more resources. In scenario 2, efficient resource use almost certainly implies that biological human bodies and brains get switched out for computing hardware running humans, and that wild animals are replaced with more computing hardware, energy/cooling infrastructure, etc. Even if biological humans who need food stick around for some reason, it's unlikely that the optimal way to efficiently grow food in the long run will be "grow entire animals, wasting lots of energy on processes that don't directly increase the quantity or quality of the food transmitted to humans".

In scenario 1, wild animals might be euthanized, or uploaded to a substrate where they can live whatever number of high-quality lives seems best. This is by far the best scenario, especially for people who think (actual or potential) non-human animals might have at least some experiences that are of positive value, or at least some positive preferences that are worth fulfilling. I would consider this extremely likely if non-human animals are moral patients at all, though scenario 1 is also strongly preferable if we're uncertain about this question and want to hedge our bets.

Scenario 3 has the same impact on wild animals as scenario 1, and for analogous reasons: resource limitations make it costly to keep wild animals around. 3 is much worse than 1 because human welfare matters so much; even if the average present-day human life turned out to be net-negative, this would be a contingent fact that could be addressed by improving global welfare.

I consider scenario 2 much less likely than scenarios 1 and 3; my point in highlighting it is to note that scenario 2 is similarly good for the purpose of preventing wild animal suffering. I also consider scenario 2 vastly more likely than "sadistic" scenarios where some agent is exerting deliberate effort to produce more suffering in the world, for non-instrumental reasons.

Comment author: ThomasSittler 11 March 2017 10:40:23AM 4 points

I think I've only ever seen cause-neutrality used to mean cause-impartiality.

Comment author: RobBensinger 15 March 2017 01:18:31AM 2 points

The discussion of CFAR's pivot to focusing on existential risk seemed to use "cause-neutral" to mean something like "cause-general".

Confusingly, the way "cause-neutral" was used there directly contradicts its use here: there, it meant avoiding cause-impartially favoring a specific cause based on its apparent expected value, in favor of a cause-partial commitment to pet causes like rationality and EA capacity-building. (Admittedly, at the organizational level it often makes sense to codify some "pet causes" even if in principle the individuals in that organization are trying to maximize global welfare impartially.)
