Comment author: Halffull 18 November 2017 12:11:17AM -1 points [-]

"Imagine two epistemic peers estimating the weighting of a coin. They start with their probabilities bunched around 50% because they have been told the coin will probably be close to fair. They both see the same number of flips, and then reveal their estimates of the weighting. Both give an estimate of p=0.7. A modest person, who correctly weights the other person's estimate as equally informative as their own, will now offer a number quite a bit higher than 0.7, which takes into account the equal information both of them have to pull them away from their prior."

This is what I'm talking about when I say "just-so stories" about the data from the GJP. One explanation is that superforecasters are going through this thought process; another would be that they discard non-superforecasters' knowledge, and therefore end up more extreme without explicitly running the extremizing algorithm on their own forecasts.

Similarly, the existence of superforecasters themselves argues for a non-modest epistemology, while the fact that the extremized aggregation beats the superforecasters may argue for a somewhat more modest epistemology. To my mind, saying that the data here points one way or the other is cherry-picking.

Comment author: Robert_Wiblin 18 November 2017 01:00:17AM *  1 point [-]

"...the existence of super-forecasters themselves argues for a non-modest epistemology..."

I don't see how. No theory on offer argues that everyone is an epistemic peer. All theories predict some people have better judgement and will be reliably able to produce better guesses.

As a result I think superforecasters should usually pay little attention to the predictions of non-superforecasters (unless it's a question on which expertise pays few dividends).

Comment author: vaniver 17 November 2017 01:24:51AM *  3 points [-]

I think with Eliezer's approach, superforecasters should exist, and it should be possible to be aware that you are a superforecaster. Those both seem like they would be lower probability under the modest view. Whether Eliezer personally is a superforecaster seems about as relevant as whether Tetlock is one; you don't need to be a superforecaster to study them.

I expect Eliezer to agree that a careful aggregation of superforecasters will outperform any individual superforecaster; similarly, I expect Eliezer to think that a careful aggregation of anti-modest reasoners will outperform any individual anti-modest reasoner.

It's worth considering what careful aggregations look like when not dealing with binary predictions. The function of a careful aggregation is to disproportionately silence error while maintaining signal. With many short-term binary predictions, we can use methods that focus on the outcomes, without any reference to how those predictors are estimating those outcomes. With more complicated questions, we can't compare outcomes directly, and so need to use the reasoning processes themselves as data.

That suggests a potential disagreement to focus on: the anti-modest view suspects that one can do a careful aggregation based on reasoner methodology (say, weighting more highly forecasters who adjust their estimates more frequently, or who report using Bayes, and so on), whereas I think the modest view suspects that this won't outperform uniform aggregation.

(The modest view has two components--approving of weighting past performance, and disapproving of other weightings. Since other approaches can agree on the importance of past performance, and the typical issues where the two viewpoints differ are those where we have little data on past performance, it seems more relevant to focus on whether the disapproval is correct than whether the approval is correct.)
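To make the contrast concrete, here is a toy sketch of the two aggregation rules in question: a uniform average of probability forecasts versus an average weighted by some methodology-based score. The forecasts and weights below are invented for illustration; the disagreement is over whether any such weighting reliably beats the uniform rule, not over how either is computed.

```python
# Toy contrast between uniform and methodology-weighted aggregation of
# probability forecasts. All numbers below are made up for illustration.

forecasts = [0.55, 0.70, 0.40, 0.65]   # each forecaster's probability for some event
method_scores = [0.9, 2.0, 0.5, 1.6]   # hypothetical weights, e.g. based on update frequency

uniform = sum(forecasts) / len(forecasts)
weighted = sum(w * p for w, p in zip(method_scores, forecasts)) / sum(method_scores)

print(f"Uniform aggregate:  {uniform:.3f}")   # 0.575
print(f"Weighted aggregate: {weighted:.3f}")  # ~0.627
```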

Comment author: Robert_Wiblin 17 November 2017 11:10:49AM *  0 points [-]

OK so it seems like the potential areas of disagreement are:

  • How much external confirmation do you need to know that you're a superforecaster (or have good judgement in general), or even the best forecaster?
  • How narrowly should you define the 'expert' group?
  • How often should you define who is a relevant expert based on whether you agree with them in that specific case?
  • How much should you value 'wisdom of the crowd (of experts)' against the views of the one best person?
  • How much should you follow a preregistered process to whatever conclusion it leads, versus changing the algorithm as you go to get an answer that seems right?

We'll probably have to go through a lot of specific cases to see how much disagreement there actually is. It's possible to talk in generalities and feel you disagree, but actually be pretty close on concrete cases.

Note that it's entirely possible that non-modest contributors will do more to enhance the accuracy of a forecasting tournament because they try harder to find errors, but end up less right than others' all-things-considered views because of insufficient deference to the answer the tournament as a whole spits out. Active traders enhance market efficiency, but still lose money as a group.

As for Eliezer knowing how to make good predictions, but not being able to do it himself, that's possible (though it would raise the question of how he has gotten strong evidence that these methods work). But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.

Comment author: Halffull 17 November 2017 01:20:31AM 0 points [-]

"How is that in conflict with my point? As superforecasters spend more time talking and sharing information with one another, maybe they have already incorporated extremising into their own forecasts."

Doesn't this clearly demonstrate that the superforecasters are not using modest epistemology? At best, this shows that you can improve upon a "non-modest" epistemology by aggregating those forecasters together, but it does not argue against the original post.

Comment author: Robert_Wiblin 17 November 2017 09:54:43AM 1 point [-]

Hi Halffull - now I see what you're saying, but actually the reverse is true. That superforecasters have already extremised shows their higher levels of modesty. Extremising is about updating based on other people's views: realising that, because they have independent information to add, after hearing their view you can be more confident about how far to shift from your prior.

Imagine two epistemic peers estimating the weighting of a coin. They start with their probabilities bunched around 50% because they have been told the coin will probably be close to fair. They both see the same number of flips, and then reveal their estimates of the weighting. Both give an estimate of p=0.7. A modest person, who correctly weights the other person's estimate as equally informative as their own, will now offer a number quite a bit higher than 0.7, which takes into account the equal information both of them have to pull them away from their prior.

Once they've done that, there won't be gains from further extremising. But a non-humble participant would fail to properly extremise based on the information in the other person's view, leaving accuracy to be gained if this is done at a later stage by someone running the forecasting tournament.
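To put rough numbers on that update (these are my own illustrative assumptions, not data from the GJP): take a Beta(10, 10) prior to stand in for "probably close to fair", and suppose each peer has privately seen 20 flips with 18 heads, which happens to give each of them a posterior mean of exactly 0.7. Pooling the two independent samples pushes the shared estimate above 0.7:

```python
# Sketch of the two-peer coin example, with assumed numbers.
# Prior: Beta(10, 10), i.e. probabilities bunched around 0.5 ("probably close to fair").
# Each peer privately sees 20 flips with 18 heads, giving a posterior mean of 0.7.

def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Posterior mean of the coin's heads-probability under a Beta prior."""
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

PRIOR_HEADS = PRIOR_TAILS = 10   # prior pseudo-counts centred on fairness
heads, tails = 18, 2             # one peer's private 20 flips

each_alone = posterior_mean(PRIOR_HEADS, PRIOR_TAILS, heads, tails)
pooled = posterior_mean(PRIOR_HEADS, PRIOR_TAILS, 2 * heads, 2 * tails)

print(f"Each peer alone:   {each_alone:.3f}")  # 0.700
print(f"Both peers pooled: {pooled:.3f}")      # ~0.767 -- more extreme than either report
```

The exact numbers don't matter: any prior concentrated near fairness plus two independent samples pointing the same way produces the same qualitative effect.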

Comment author: Robert_Wiblin 17 November 2017 12:16:33AM *  8 points [-]

It strikes me as much more prevalent for people to be overconfident in their own idiosyncratic opinions. If you see half of people are 90% confident in X and half of people are 90% confident in not-X, then you know that on average they are overconfident, since at most half of them can be right. That's how most of the world looks to me.

But no matter - they probably won't suffer much, because the meek do not inherit the Earth, at least not in this life.

People follow confidence in leaders, generating the pathological start-up founder who is sure they're 100x more likely to succeed than the base rate suggests; someone who portrays themselves as especially competent in a job interview is more likely to be hired than someone who accurately appraises their merits; and I don't imagine deferring to a boring consensus brings more romantic success than elaborating on one's exciting contrarian opinions.

Given all this, it's unsurprising evolution has programmed us to place an astonishingly high weight on our own judgement.

While there are some social downsides to seeming arrogant, people who preach modesty here advocate going well beyond what's required to avoid triggering an anti-dominance reaction in others.

Indeed, even though I think strong modesty is epistemically the correct approach on the basis of reasoned argument, I do not and cannot consistently live and speak that way, because all my personal incentives are lined up in favour of portraying myself as very confident in my inside view.

In my experience it requires a monastic discipline to do otherwise, a discipline almost none possess.

Comment author: Halffull 16 November 2017 10:26:19PM -3 points [-]

It's an interesting just-so story about what IARPA has to say about epistemology, but the actual story is much more complicated. For instance, there is the fact that "extremizing" works to better calibrate general forecasts, but that extremizing superforecasters' predictions makes them worse.

Furthermore, contrary to what you seem to be claiming about people not being able to outperform others, there are in fact "superforecasters" who outperform the average participant year after year, even if they can't outperform the aggregate when their forecasts are factored in.

Comment author: Robert_Wiblin 16 November 2017 10:36:49PM *  3 points [-]

Not sure how this is a 'just-so story' in the sense that I understand the term.

"the fact that "Extremizing" works to better calibrate general forecasts, but that extremizing of superforecaster's predictions makes them worse."

How is that in conflict with my point? As superforecasters spend more time talking and sharing information with one another, maybe they have already incorporated extremising into their own forecasts.
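For readers unfamiliar with the term, one common form of extremizing in the forecasting literature pushes an aggregated probability away from 0.5 in log-odds space. The sketch below uses that generic transform with an illustrative exponent and made-up forecasts; it is not the GJP's exact tuned procedure.

```python
# One standard extremizing transform: push an aggregate probability away from 0.5.
# The exponent below is an assumed illustrative value, not the one tuned by the
# Good Judgment Project, and the forecasts are made up.

def extremize(p, a=2.0):
    """Return p pushed away from 0.5; a > 1 extremizes, a = 1 leaves p unchanged."""
    return p ** a / (p ** a + (1 - p) ** a)

forecasts = [0.65, 0.70, 0.60, 0.75]      # individual probability forecasts
mean = sum(forecasts) / len(forecasts)    # simple average: 0.675

print(f"Plain average:      {mean:.3f}")
print(f"Extremized average: {extremize(mean):.3f}")  # ~0.812 with a = 2
```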

I know very well about superforecasters (I've read all of Tetlock's books and interviewed him last week), but I am pretty sure an aggregation of superforecasters beats almost all of them individually, which speaks to the benefits of averaging a range of people's views in most cases. Though in many cases you should not give much weight to those who are clearly in a worse epistemic position (non-superforecasters, whose predictions Tetlock told me were about 10-30x less useful).

Comment author: Robert_Wiblin 16 November 2017 07:05:57PM *  8 points [-]

Hi Eliezer, I wonder if you've considered trying to demonstrate the superiority of your epistemic approach by participating in one of the various forecasting tournaments funded by IARPA, and trying to be classified as a 'superforecaster'. For example, the new Hybrid Forecasting Competition is actively recruiting participants.

To me your advice seems in tension with the recommendations that have come out of that research agenda (via Tetlock and others), which finds that forecasts carefully aggregated from many people perform better than those of almost any individual - and that individuals who beat the aggregation were almost always lucky and can't repeat the feat. I'd be interested to see how an anti-modest approach fares in direct quantified competition with alternatives.

It would be understandable if you didn't think that was the best use of your time, in which case perhaps some others who endorse and practice the mindset you recommend could find the time to do it instead.

Comment author: Jacy_Reese 04 November 2017 01:29:41AM *  16 points [-]

[Disclaimer: Rob, 80k's Director of Research, and I briefly chatted about this on Facebook, but I want to make a comment here because that post is gone and more people will see it here. Also, as a potential conflict-of-interest, I took the survey and work at an organization that's between the animal and far future cause areas.]

This is overall really interesting, and I'm glad the survey was done. But I'm not sure how representative of EA community leaders it really is. I'd take the cause selection section in particular with a big grain of salt, and I wish it were more heavily qualified and discussed in different language. Going by the organizations surveyed and the number surveyed per organization, my personal count is that 14 respondents were meta, 12.5 were far future, 3 were poverty, and 1.5 were animal.* My guess is that a similar distribution holds for the 5 unaffiliated respondents. So it should be no surprise to readers that meta and far future work were most prioritized. **

I think we shouldn't call this a general survey of EA leadership (e.g. in the title of the post) when it's so disproportionate. I think the inclusion of more meta organizations makes sense, but there are poverty groups like the Against Malaria Foundation and Schistosomiasis Control Initiative, as well as animal groups like The Good Food Institute and The Humane League, that seem to meet the same bar for EA-ness as the far future groups included, like CSER and MIRI.

Focusing heavily on far future organizations might be partly due to selecting only organizations founded after the EA community coalesced, and while that seems like a reasonable metric (among several possibilities), it also seems biased towards far future work, because that's a newer field and, of the reasonable metrics available, it's the one that conveniently syncs up with 80k's cause prioritization views. Also, the ACE-recommended charity GFI was founded explicitly on the principle of effective altruism after EA coalesced. Their team says that quite frequently, and as far as I know, the leadership all identifies as EA. Perhaps you're using a metric more like social ties to other EA leaders, but that's exactly the sort of bias I'm worried about here.

Also, the EA community as a whole doesn't seem to hold this cause prioritization view (http://effective-altruism.com/ea/1e5/ea_survey_2017_series_cause_area_preferences/). Leadership can of course deviate from the broad community, but this is just another reason to be cautious in weighing these results.

I think your note about this selection is fair - "the group surveyed included many of the most clever, informed and long-involved people in the movement,"

and I appreciate that you looked a little at cause prioritization for relatively-unbiased subsets - "Views were similar among people whose main research work is to prioritise different causes – none of whom rated Global Development as the most effective," - "on the other hand, many people not working in long-term focussed organisations nonetheless rated it as most effective"

but it's still important to note that you (Rob and 80k) personally favor these two areas strongly, which seems to create a big potential bias, and that we should be very cautious of groupthink in our community where updating based on the views of EA leaders is highly prized and recommended. I know the latter is a harder concern to get around with a survey, but I think it should have been noted in the report, ideally in the Key Figures section. And as I mentioned at the beginning, I don't think this should be discussed as a general survey of EA leaders, at least not when it comes to cause prioritization.

This post certainly made me more worried personally that my prioritization of the far future could be more due to groupthink than I previously thought.


Here's the categorization I'm using for organizations. It might be off, but it's at least pretty close. ff = far future

80,000 Hours (3) meta
AI Impacts (1) ff
Animal Charity Evaluators (1) animal
Center for Applied Rationality (2) ff
Centre for Effective Altruism (3) meta
Centre for the Study of Existential Risk (1) ff
Charity Science: Health (1) poverty
DeepMind (1) ff
Foundational Research Institute (2) ff
Future of Humanity Institute (3) ff
GiveWell (2) poverty
Global Priorities Institute (1) meta
Leverage Research (1) meta
Machine Intelligence Research Institute (2) ff
Open Philanthropy Project (5) meta
Rethink Charity (1) meta
Sentience Institute (1) animal/ff
Unaffiliated (5)

*The 80k post notes that not everyone filled out all the survey answers, e.g. GiveWell only had one person fill out the cause selection section.

**Assuming the reader has already seen other evidence, e.g. that CFAR only recently adopted a far future mission, or that people like Rob went from other cause areas towards a focus on the far future.

Comment author: Robert_Wiblin 04 November 2017 09:15:07AM *  3 points [-]

Hey Jacy thanks for the detailed comment - with EA Global London on this weekend I'll have to be brief! :)

One partial response is that even if you don't think this is fully representative of the set of all organisations you'd like to have seen surveyed, it's informative about the groups that were. We list the orgs that were surveyed near the start of the article, and point out who wasn't included, so people understand who the answers represent:

"The reader should keep in mind this sample does not include some direct work organisations that some in the community donate to, including the Against Malaria Foundation, Mercy for Animals or the Center for Human-Compatible AI at UC Berkeley."

You can take this information for whatever it's worth!

As for who I chose to sample - on any definition there's always going to be some grey area: orgs that almost meet that definition but don't quite. I tried to find all the organisations with full-time staff who i) were a founding part of the EA movement, or ii) were founded by people who identify strongly as part of the EA community, or iii) are now mostly led by people who identify more strongly as part of the EA movement than any other community. I think that's a natural grouping and don't view AMF, MfA or CHAI as meeting that definition (though I'd be happy to be corrected if any group does meet this definition whose leadership I'm not personally familiar with).

The main problem with that question in my mind is the underrepresentation of GiveWell, which has a huge budget and is clearly a central EA organisation - the participants from GiveWell gave me one vote to work with but didn't provide quantitative answers, as they didn't have a strong or clear enough view. More generally, people from the sample who specialise in one cause were more inclined to say they didn't have a view on which fund was most effective, and so not answer it (which is reasonable but could bias the answers).

Personally, like you, I give more weight to the views of specialist cause priorities researchers working at cause-neutral organisations. They were more likely to answer the question and are singled out in the table with individual votes. Interestingly, their results were quite similar to those of the full sample.

I agree we should be cautious about all piling on to the same causes and falling for an 'information cascade'. That said, if the views in that table are a surprise to someone, it's a reason to update in their direction, even if they don't act on that information yet.

I'd be very keen to get more answers to this question, including from folks at direct work orgs, and also to increase the sample at some organisations that were included in the survey but for which few people answered that question (GiveWell most notably). With a larger sample we'll be able to break the answers down more finely to see how they vary by subgroup, and weight them by organisation size without giving single data points huge leverage over the result.

I'll try to do that in the next week or two once EAG London is over!

Comment author: Robert_Wiblin 02 November 2017 02:40:32PM *  3 points [-]

Hi Sacha, thanks for writing this, good food for thought. I'll get back to you properly next week after EA Global London (won't have any spare time for at least 4 days).

I just wanted to point out quickly that we do have personal fit in our framework and it can give you up to a 100x difference between causes: https://80000hours.org/articles/problem-framework/#how-to-assess-personal-fit

I also wonder if we should think about the effective resources dedicated to solving a problem using a Cobb-Douglas production function: Effective Resources = Funding ^ 0.5 * Talent ^ 0.5. That would help capture cases where an increase in funding without a commensurate increase in talent in the area has actually increased the marginal returns to an extra person working on the problem.
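To make that concrete, here's a quick sketch of what the Cobb-Douglas form implies (the numbers are arbitrary; only the exponents come from the suggestion above): under Effective Resources = Funding^0.5 * Talent^0.5, the gain from adding one more person rises when funding grows while talent stays fixed.

```python
# Sketch of the Cobb-Douglas suggestion: R = Funding**0.5 * Talent**0.5.
# The point: the marginal value of one extra person rises when funding grows
# without a matching rise in talent. Numbers below are arbitrary illustrations.

def effective_resources(funding, talent):
    return funding ** 0.5 * talent ** 0.5

def marginal_person(funding, talent):
    """Gain in effective resources from one additional unit of talent."""
    return effective_resources(funding, talent + 1) - effective_resources(funding, talent)

talent = 100
for funding in (100, 400):
    print(f"funding={funding}: extra person adds {marginal_person(funding, talent):.3f}")
# funding=100: extra person adds ~0.499
# funding=400: extra person adds ~0.998  -- more funding raises returns to talent
```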

Comment author: Robert_Wiblin 30 October 2017 12:12:22AM 5 points [-]

This post is one of the best things I've read on this forum. I upvoted it, but didn't feel that was sufficient appreciation for you writing something this thorough in your spare time!

Comment author: ClaireZabel 29 October 2017 10:43:21PM 16 points [-]

Thanks so much for the clear and eloquent post. I think a lot of the issues related to lack of expertise and expert bias are stronger than you think they are, and I think it's both rare and not inordinately difficult to adjust for common biases, such that in certain cases a less-informed individual can often beat the expert consensus (because few enough of the experts are doing this, for now). But it was useful to read this detailed and compelling explanation of your view.

The following point seems essential, and I think underemphasized:

Modesty can lead to double-counting, or even groupthink. Suppose in the original example Beatrice does what I suggest and revises their credence to 0.6, but Adam doesn't. Now Charlie forms his own view (say 0.4 as well) and does the same procedure as Beatrice, so Charlie now holds a credence of 0.6 as well. The average should be lower: (0.8+0.4+0.4)/3, not (0.8+0.6+0.4)/3, but the results are distorted by using one-and-a-half helpings of Adam's credence. With larger cases one can imagine people wrongly deferring to hold consensus around a view they should think is implausible, and in general the nigh-intractable challenge of trying to infer cases of double counting from the patterns of 'all things considered' evidence.

One can rectify this by distinguishing 'credence by my lights' versus 'credence all things considered'. So one can say "Well, by my lights the credence of P is 0.8, but my actual credence is 0.6, once I account for the views of my epistemic peers etc." Ironically, one's personal 'inside view' of the evidence is usually the most helpful credence to publicly report (as it helps others modestly aggregate), whilst one's all-things-considered modest view is usually for private consumption.

I rarely see any effort to distinguish between the two outside the rationalist/EA communities, which is one reason I think both over-modesty and overconfident backlash against it are common.

My experience is that most reasonable, intelligent people I know have never explicitly thought of the distinction between the two types of credence. I think many of them have an intuition that something would be lost if they stated their "all things considered" credence only, even though it feels "truer" and "more likely to be right," though they haven't formally articulated the problem. And knowing that other people rarely make this distinction, it's hard for everyone to know how to update based on others' views without double-counting, as you note.

It seems like it's intuitive for people to state either their inside view, or their all-things-considered view, but not both. To me, stating "both" > "inside view only" > "outside view only", but I worry that calls for more modest views tend to leak nuance and end up pushing people to publicly state "outside view only" rather than "both".

Also, I've generally heard people call the "credence by my lights" and "credence all things considered" one's "impressions" and "beliefs," respectively, which I prefer because they are less clunky. Just fyi.

(views my own, not my employer's)

Comment author: Robert_Wiblin 29 October 2017 11:36:40PM 4 points [-]

I just thought I'd note that this appears similar to the 'herding' phenomenon in political polling, which reduces aggregate accuracy: http://www.aapor.org/Education-Resources/Election-Polling-Resources/Herding.aspx
