Comment author: RobBensinger 31 October 2017 04:04:06AM *  8 points [-]

This was a really good read! In addition to being super well-timed.

I don't think there's a disagreement here about ideal in-principle reasoning. I’m guessing that the disagreement is about several different points:

  • In reality, how generally difficult is it to spot important institutions and authorities failing in large ways? We might ask subquestions here for particular kinds of groups; e.g., maybe you and the anti-modesty camp will turn out to agree about how dysfunctional US national politics is on average, while disagreeing about how dysfunctional academia is on average in the US.

  • In reality, how generally difficult is it to evaluate your own level of object-level accuracy in some domain, the strength of object-level considerations in that domain, your general competence or rationality or meta-rationality, etc.? To what extent should we update strongly on various kinds of data about our reasoning ability, vs. distrusting the data source and penalizing the evidence? (Or looking for ways to not have to gather or analyze data like that at all, e.g., prioritizing finding epistemic norms or policies that work relatively OK without such data.)

  • How strong are various biases, either in general or in our environs? It sounds like you think that arrogance, overconfidence, and excess reliance on inside-view arguments are much bigger problems for core EAs than underconfidence or neglect of inside-view arguments, while Eliezer thinks the opposite.

  • What are the most important and useful debiasing interventions? It sounds like you think these mostly look like attempts to reduce overconfidence in inside views, self-aggrandizing biases, and the like, while Eliezer thinks that it's too easy to overcorrect if you organize your epistemology around that goal. I think the anti-modesty view here is that we should mostly address those biases (and other biases) through more local interventions that are sensitive to the individual's state and situation, rather than through rules akin to "be less confident" or "be more confident".

  • What's the track record for more modesty-like views versus less modesty-like views overall?

  • What's the track record for critics of modesty in particular? I would say that Eliezer and his social circle have a really strong epistemic track record, and that this is good evidence that modesty is a bad idea; but I gather you want to use that track record as Exhibit A in the case for modesty being a good idea. So I assume it would help to discuss the object-level disagreements underlying these diverging generalizations.

Does that match your sense of the disagreement?

Comment author: Pablo_Stafforini 30 October 2017 12:35:07PM *  1 point [-]

I think the main two factual disagreements here might be "how often, and to what extent, do top institutions and authorities fail in large and easy-to-spot ways?" and "for epistemic and instrumental purposes, to what extent should people like you and Eliezer trust your own inside-view reasoning about your (and authorities') competency, epistemic rationality, meta-rationality, etc.?"

Thank you, this is extremely clear, and captures the essence of much of what's going on between Eliezer and his critics in this area.

Could you say more about what you have in mind by "confident pronouncements [about] AI timelines"? I usually think of Eliezer as very non-confident about timelines.

I had in mind forecasts Eliezer made many years ago that didn't come to pass, as well as his most recent bet with Bryan Caplan. But it's a stretch to call these 'confident pronouncements', so I've edited my post and removed 'AI timelines' from the list of examples.

Comment author: RobBensinger 31 October 2017 12:41:35AM *  1 point [-]

Going back to your list:

nutrition, animal consciousness, philosophical zombies, population ethics, and quantum mechanics

I haven't looked much at the nutrition or population ethics discussions, though I understand Eliezer mistakenly endorsed Gary Taubes' theories in the past. If anyone has links, I'd be interested to read more.

AFAIK Eliezer hasn't published why he holds his views about animal consciousness, and I don't know what he's thinking there. I don't have a strong view on whether he's right (or whether he's overconfident).

Concerning zombies: I think Eliezer is correct that the zombie argument can't provide any evidence for the claim that we instantiate mental properties that don't logically supervene on the physical world. Updating on factual evidence is a special case of a causal relationship, and if instantiating some property P is causally impacting our physical brain states and behaviors, then P supervenes on the physical.

I'm happy to talk more about this, and I think questions like this are really relevant to evaluating the track record of anti-modesty positions, so this seems like as good a place as any for discussion. I'm also happy to talk more about meta questions related to this issue, like, "If the argument above is correct, why hasn't it convinced all philosophers of mind?" I don't have super confident views on that question, but there are various obvious possibilities that come to mind.

Concerning QM: I think Eliezer's correct that Copenhagen-associated views like "objective collapse" and "quantum non-realism" are wrong, and that the traditional arguments for these views are variously confused or mistaken, often due to misunderstandings of principles like Ockham's razor. I'm happy to talk more about this too; I think the object-level discussions are important here.

Comment author: RobBensinger 31 October 2017 12:18:52AM *  0 points [-]

Cool. Note that the bet with Bryan Caplan was partly tongue-in-cheek, though it's true that Eliezer is currently relatively pessimistic about humanity's chances.

From Eliezer on Facebook:

Key backstory: I made two major bets in 2016 and lost both of them, one bet against AlphaGo beating Lee Se-dol, and another bet against Trump winning the presidency. In both cases I was betting with the GJP superforecasters, but lost anyway.

Meanwhile Bryan won every one of his bets, again, including his bet that "Donald Trump will not concede the election by Saturday".

So, to take advantage of Bryan's amazing bet-winning capability and my amazing bet-losing capability, I asked Bryan if I could bet him that the world would be destroyed by 2030.

The generator of this bet wasn't a strong epistemic stance, which seems important to emphasize because of the usual expectations involving public bets. BUT you may be licensed to draw conclusions from the fact that, when I was humorously imagining what I could get from exploiting this phenomenon, my System 1 thought that having the world not be destroyed before 2030 was the most it could reasonably ask.

Comment author: ClaireZabel 29 October 2017 10:43:21PM 16 points [-]

Thanks so much for the clear and eloquent post. I think the issues related to lack of expertise and expert bias are stronger than you seem to, and that it's rare, but not inordinately difficult, to adjust for common biases such that in certain cases a less-informed individual can beat the expert consensus (because few enough of the experts are doing this, for now). But it was useful to read this detailed and compelling explanation of your view.

The following point seems essential, and I think underemphasized:

Modesty can lead to double-counting, or even groupthink. Suppose that, in the original example, Beatrice does what I suggest and revises her credence to 0.6, but Adam doesn't. Now Charlie forms his own view (say 0.4 as well) and follows the same procedure as Beatrice, so Charlie now holds a credence of 0.6 as well. The average should be lower: (0.8 + 0.4 + 0.4)/3, not (0.8 + 0.6 + 0.4)/3, but the result is distorted by using one-and-a-half helpings of Adam's credence. In larger cases one can imagine people wrongly deferring their way into consensus around a view they should think is implausible, and, in general, there is the nigh-intractable challenge of trying to infer cases of double-counting from the patterns of 'all things considered' evidence.

One can rectify this by distinguishing 'credence by my lights' versus 'credence all things considered'. So one can say, "Well, by my lights the credence of P is 0.8, but my actual credence is 0.6, once I account for the views of my epistemic peers, etc." Ironically, one's personal 'inside view' of the evidence is usually the most helpful credence to publicly report (as it helps others modestly aggregate), whilst one's all-things-considered modest view is usually for private consumption.
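To make the double-counting arithmetic above concrete, here is a minimal sketch (my own illustration, using the hypothetical credences from the quoted example and simple averaging as a stand-in for whatever aggregation rule one actually uses):

```python
# Toy illustration of the double-counting problem from the quoted example.
# The credences (Adam 0.8, Beatrice 0.4, Charlie 0.4) are the example's
# hypothetical numbers, not real data.

def average(credences):
    return sum(credences) / len(credences)

adam, beatrice, charlie = 0.8, 0.4, 0.4  # inside-view ("by my lights") credences

# Correct aggregation: average everyone's inside-view credences.
correct = average([adam, beatrice, charlie])             # (0.8 + 0.4 + 0.4) / 3 ≈ 0.53

# Distorted aggregation: Beatrice first defers to Adam and reports 0.6,
# so anyone averaging her *report* with Adam's and Charlie's views
# counts Adam's credence one and a half times.
beatrice_reported = average([adam, beatrice])            # 0.6
distorted = average([adam, beatrice_reported, charlie])  # (0.8 + 0.6 + 0.4) / 3 = 0.6

print(f"correct: {correct:.2f}, distorted: {distorted:.2f}")
```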

I rarely see any effort to distinguish between the two outside the rationalist/EA communities, which is one reason I think both over-modesty and overconfident backlash against it are common.

My experience is that most reasonable, intelligent people I know have never explicitly thought of the distinction between the two types of credence. I think many of them have an intuition that something would be lost if they stated their "all things considered" credence only, even though it feels "truer" and "more likely to be right," though they haven't formally articulated the problem. And since other people rarely make this distinction, it's hard for everyone to know how to update based on others' views without double-counting, as you note.

It seems like it's intuitive for people to state either their inside view or their all-things-considered view, but not both. To me, stating "both" > "inside view only" > "outside view only", but I worry that calls for more modest views tend to leak nuance and end up pushing people to publicly state "outside view only" rather than "both".

Also, I've generally heard people call the "credence by my lights" and "credence all things considered" one's "impressions" and "beliefs," respectively, which I prefer because they are less clunky. Just fyi.

(views my own, not my employer's)

Comment author: RobBensinger 30 October 2017 01:14:06AM *  2 points [-]

The dichotomy I see the most at MIRI is 'one's inside-view model' v. 'one's belief', where the latter tries to take into account things like model uncertainty, outside-view debiasing for addressing things like the planning fallacy, and deference to epistemic peers. Nate draws this distinction a lot.
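As a toy illustration of that distinction (my own sketch, not Nate's or MIRI's actual procedure, with made-up numbers): an all-things-considered "belief" might mix an inside-view model's estimate with an outside-view reference class, weighted by how much one trusts the model.

```python
# Toy sketch of turning an inside-view impression into an all-things-considered
# belief by mixing it with an outside-view reference class (e.g., to debias for
# the planning fallacy). The numbers and the 0-1 trust weight are made up.

inside_view_weeks = 4.0       # what my detailed project plan says
reference_class_weeks = 9.0   # what similar past projects actually took
model_trust = 0.3             # how much weight my inside-view model gets

belief_weeks = model_trust * inside_view_weeks + (1 - model_trust) * reference_class_weeks
print(f"impression: {inside_view_weeks} weeks; belief: {belief_weeks:.1f} weeks")  # 7.5
```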

Comment author: Pablo_Stafforini 29 October 2017 08:41:57PM *  1 point [-]

I never claimed that this is what Eliezer was doing in that particular case, or in other cases. (I'm not even sure I understand Eliezer's position.) I was responding to the previous comment, and drawing a parallel between "beating the market" in that and other contexts. I'm sorry if this was unclear.

To address your substantive point: If the claim is that we shouldn't give much weight to the views of individuals and institutions that we shouldn't expect to be good at tracking the truth, despite their status or prominence in society, this is something that hardly any rationalist or EA would dispute. Nor does this vindicate various confident pronouncements Eliezer has made in the past—about nutrition, animal consciousness, philosophical zombies, population ethics, and quantum mechanics, to name a few—that deviate significantly from expert opinion, unless this is conjoined with credible arguments for thinking that warranted skepticism extends to each of those expert communities. To my knowledge, no persuasive arguments of this sort have been provided.

Comment author: RobBensinger 29 October 2017 09:43:57PM *  2 points [-]

Yeah, I wasn't saying that you were making a claim about Eliezer; I just wanted to highlight that he's possibly making a stronger claim even than the one you're warning against when you say "one should generally distrust one's ability to 'beat elite common sense' even if one thinks one can accurately diagnose why members of this reference class are wrong in this particular instance".

If the claim is that we shouldn't give much weight to the views of individuals and institutions that we shouldn't expect to be closely aligned with the truth, this is something that hardly anyone would dispute.

I think the main two factual disagreements here might be "how often, and to what extent, do top institutions and authorities fail in large and easy-to-spot ways?" and "for epistemic and instrumental purposes, to what extent should people like you and Eliezer trust your own inside-view reasoning about your (and authorities') competency, epistemic rationality, meta-rationality, etc.?" I don't know whether you in particular would disagree with Eliezer on those claims, though it sounds like you may.

Nor does this vindicate various confident pronouncements Eliezer has made in the past—about nutrition, animal consciousness, AI timelines, philosophical zombies, population ethics, etc.—unless it is conjoined with an argument for thinking that his skepticism extends to the relevant community of experts in each of those fields.

Yeah, agreed. The "adequacy" level of those fields, and the base adequacy level of civilization as a whole, is one of the most important questions here.

Could you say more about what you have in mind by "confident pronouncements [about] AI timelines"? I usually think of Eliezer as very non-confident about timelines.

Comment author: Austen_Forrester 29 October 2017 04:38:46PM 0 points [-]

I agree that financial incentives/disincentives result in failures (i.e., social problems) of all kinds. One of the biggest reasons, as I'm sure you mention at some point in your book, is corruption: e.g., the beef/dairy industry pays off environmental NGOs and governments to stay quiet about its environmental impact.

But don't you think that non-financial rewards/punishments also play a large role in impeding social progress, in particular social rewards/punishments? E.g., people don't wear enough to stay warm in the winter because others will tease them for being uncool, people bully others because they are then respected more, etc.

Comment author: RobBensinger 29 October 2017 05:29:15PM *  0 points [-]

Non-financial incentives clearly play a major role both in dysfunctional systems and in well-functioning ones. A lot of those incentives are harder to observe and quantify, though; and I'd expect them to vary more interpersonally, and to be harder to intervene on in cases like the Bank of Japan.

It wouldn't be so surprising if (say) key decisionmakers at the Bank of Japan cared more about winning the esteem of particular friends and colleagues at dinner parties than about social pressure from other people to change course; or if they cared more about their commitment to a certain ideology or self-image; or if any number of other small day-to-day factors dominated. Whereas it would be genuinely surprising if those commonplace small factors were able to outweigh a large financial incentive.

Comment author: Pablo_Stafforini 29 October 2017 11:32:55AM *  6 points [-]

The reason people aren't doing this is probably that it isn't profitable once you account for import duties, value added tax and customs clearance fees, as well as the time costs of transacting in the black market. I'm from Argentina and have investigated this in the past for other electronics, so my default assumption is that these reasons generalize to this particular case.

I think this discussion provides a good illustration of the following principle: you should usually be skeptical of your ability to "beat the market" even if you are able to come up with a plausible explanation of the phenomenon in question from which it follows that your circumstances are unique.

Similarly, I think one should generally distrust one's ability to "beat elite common sense" even if one thinks one can accurately diagnose why members of this reference class are wrong in this particular instance.

Very rarely, you may be able to do better than the market or the experts, but knowing that this is one of those cases takes much more than saying "I have a story that implies I can do this, and this story looks plausible to me."

Comment author: RobBensinger 29 October 2017 02:38:46PM *  1 point [-]

Similarly, I think one should generally distrust one's ability to "beat elite common sense" even if one thinks one can accurately diagnose why members of this reference class are wrong in this particular instance.

Note that in Eliezer's example above, he isn't claiming to have any diagnosis at all of what led the Bank of Japan to reach the wrong conclusion. The premise isn't "I have good reason to think the Bank of Japan is biased/mistaken in this particular way in this case," but rather: "It's unsurprising for institutions like the Bank of Japan to be wrong in easy-to-demonstrate ways, so it doesn't take a ton of object-level evidence for me to reach a confident conclusion that they're wrong on the object level, even if I have no idea what particular mistake they're making, what their reasons are, etc. The Bank of Japan just isn't the kind of institution that we should strongly expect to be right or wrong on this kind of issue (even though this issue is basic to its institutional function); so a moderate amount of ordinary object-level evidence can be dispositive all on its own."

From Eliezer:

[W]hen I read some econbloggers who I’d seen being right about empirical predictions before saying that Japan was being grotesquely silly, and the economic logic seemed to me to check out, as best I could follow it, I wasn’t particularly reluctant to believe them. Standard economic theory, generalized beyond the markets to other facets of society, did not seem to me to predict that the Bank of Japan must act wisely for the good of Japan. It would be no surprise if they were competent, but also not much of a surprise if they were incompetent.

Comment author: Robert_Wiblin 05 September 2017 11:21:30PM 1 point [-]

The term existential risk has serious problems - it has no obvious meaning unless you've studied what it means (is this about existentialism?!), and is very often misused even by people familiar with it (to mean extinction only, neglecting other persistent 'trajectory changes').

Comment author: RobBensinger 06 September 2017 08:12:51AM 1 point [-]

"Existential risk" has the advantage over "long-term future" and "far future" that it sounds like a technical term, so people are more likely to Google it if they haven't encountered it (though admittedly this won't fully address people who think they know what it means without actually knowing). In contrast, someone might just assume they know what "long-term future" and "far future" means, and if they do Google those terms they'll have a harder time getting a relevant or consistent definition. Plus "long-term future" still has the problem that it suggests existential risk can't be a near-term issue, even though some people working on existential risk are focusing on nearer-term scenarios than, e.g., some people working on factory farming abolition.

I think "global catastrophic risk" or "technological risk" would work fine for this purpose, though, and avoids the main concerns raised for both categories. ("Technological risk" also strikes me as a more informative / relevant / joint-carving category than the others considered, since x-risk and far future can overlap more with environmentalism, animal welfare, etc.)

Comment author: Kerry_Vaughan 07 July 2017 10:55:00PM 2 points [-]

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

I haven't thought about this in detail, but whether the evidence in this section justifies the claim in 3c might depend, in part, on what you think the AI Safety project is trying to achieve.

On first pass, the "learning to reason from humans" project seems like it may be able to quickly and substantially reduce the chance of an AI catastrophe by introducing human guidance as a mechanism for making AI systems more conservative.

However, it doesn't seem like a project that aims to do either of the following:

(1) Reduce the risk of an AI catastrophe to zero (or near zero)
(2) Produce an AI system that can help create an optimal world

If you think either (1) or (2) are the goals of AI Safety, then you might not be excited about the "learning to reason from humans" project.

You might think that "learning to reason from humans" doesn't accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can't rule out AI catastrophes with high certainty unless you can "peer inside the machine" so to speak. HRAD might allow you to peer inside the machine and make statements about what the machine will do with extremely high certainty.

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

Comment author: RobBensinger 08 July 2017 01:52:08AM *  7 points [-]

FWIW, I don't think (1) or (2) plays a role in why MIRI researchers work on the research they do, and I don't think they play a role in why people at MIRI think "learning to reason from humans" isn't likely to be sufficient. The shape of the "HRAD is more promising than act-based agents" claim is more like what Paul Christiano said here:

As far as I can tell, the MIRI view is that my work is aimed at [a] problem which is not possible, not that it is aimed at a problem which is too easy. [...] One part of this is the disagreement about whether the overall approach I'm taking could possibly work, with my position being "something like 50-50" the MIRI position being "obviously not" [...]

There is a broader disagreement about whether any "easy" approach can work, with my position being "you should try the easy approaches extensively before trying to rally the community behind a crazy hard approach" and the MIRI position apparently being something like "we have basically ruled out the easy approaches, but the argument/evidence is really complicated and subtle."

With a clarification I made in the same thread:

I think Paul's characterization is right, except I think Nate wouldn't say "we've ruled out all the prima facie easy approaches," but rather something like "part of the disagreement here is about which approaches are prima facie 'easy.'" I think his model says that the proposed alternatives to MIRI's research directions by and large look more difficult than what MIRI's trying to do, from a naive traditional CS/Econ standpoint. E.g., I expect the average game theorist would find a utility/objective/reward-centered framework much less weird than a recursive intelligence bootstrapping framework. There are then subtle arguments for why intelligence bootstrapping might turn out to be easy, which Nate and co. are skeptical of, but hashing out the full chain of reasoning for why a daring unconventional approach just might turn out to work anyway requires some complicated extra dialoguing. Part of how this is framed depends on what problem categories get the first-pass "this looks really tricky to pull off" label.

Comment author: JoeW 18 April 2017 08:30:52PM *  0 points [-]

To clarify: I don't think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention: multipolar scenarios don't really work that way. What will happen is determined by what's efficient in a competitive world, which doesn't allow much room to make changes now that will actually persist.

And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.

What I'm concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI - such a scenario certainly seems to me like it could fit the "possible but not inevitable" description.

A world takeover attempt has the potential to go very, very wrong - and then there's the serious possibility that the creation of the singleton would be successful but the alignment of it would not. Given this, I don't think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can't control that alternative is irrelevant - we can still understand it! When we have a reasonable picture of that scenario, then we can start to think about whether it's so bad that we should embark on dangerous risky strategies to try to avoid it.

One element of that understanding would be on how likely AIs are to be conscious; another would be how good or bad a life conscious AIs would have in a multipolar scenario. I agree entirely that we don't know this yet - whether for rabbits or for future AIs - that's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.

Comment author: RobBensinger 27 April 2017 10:59:56PM *  1 point [-]

Did anything in Nate's post or my comments strike you as "pushing for a singleton"? When people say "singleton," I usually understand them to have in mind some kind of world takeover, which sounds like what you're talking about here. The strategy people at MIRI favor tends to be more like "figure out what minimal AI system can end the acute risk period (in particular, from singletons), while doing as little else as possible; then steer toward that kind of system". This shouldn't be via world takeover if there's any less-ambitious path to that outcome, because any added capability, or any added wrinkle in the goal you're using the system for, increases accident risk.

More generally, alignment is something that you can partially solve for systems with some particular set of capabilities, rather than being all-or-nothing.

I agree entirely that we don't know this yet - whether for rabbits or for future AIs - that's part of what I'd need to understand before I'd agree that a singleton seems like our best chance at a good future.

I think it's much less likely that we can learn that kind of generalization in advance than that we can solve most of the alignment problem in advance. Additionally, solving this doesn't in any obvious way get you any closer to being able to block singletons from being developed, in the scenario where singletons are "possible but only with some effort made". Knowing about the utility of a multipolar outcome where no one ever builds a singleton can be useful for deciding whether you should aim for such an outcome, but it doesn't get you any closer to knowing how to prevent anyone from ever building a singleton if you do find a way to achieve an initially multipolar outcome.

I'd also add that I think the risk of producing bad conscious states via non-aligned AI mainly lies in AI systems potentially having parts or subsystems that are conscious, rather than in the system as a whole (or executive components) being conscious in the fashion of a human.
