Comment author: tyleralterman 03 June 2016 05:35:40PM 1 point [-]

+1

Though I suspect it will be difficult to get to a sufficient threshold of EAs using LinkedIn as their social network without something similar to a marketing campaign. Any takers?

0

The Multiple Stage Fallacy

This was originally written by Eliezer Yudkowsky and posted on his Facebook wall. It is reposted here with permission from the author:   In August 2015, renowned statistician and predictor Nate Silver wrote "Trump's Six Stages of Doom" in which he gave Donald Trump a 2% chance of getting the... Read More
Comment author: RyanCarey 25 February 2016 07:47:55AM *  13 points [-]

A lot of people start their project later than they should, and a lot start it earlier. So this kind of article is going to move a lot of people to be more rash, for better and for worse. The problem is that audiences probably self-select these articles based on fulfilling their existing preconceptions. On net, it's not obvious whether or not this ends up being useful, even granted that you think most people move too slowly on their projects (which could as easily be your mistake as theirs anyway).

I reckon it would be more useful to discuss some concrete situations (deidentified mashups of career stories) so that people could actually see how rash you think people were beforehand and where they're supposed to end up. This would help people know how to generalise your arguments to their own case, and would also help the community figure out how it actually disagrees about the topic, rather than just pushing abstract arguments back and forth.

Comment author: tyleralterman 27 February 2016 11:07:19PM 1 point [-]

I agree with Owen's comments and the others. The basic message of my post, however, seems to be something like, "Make sure you compare your plans to reality" while emphasizing the failure mode I see more often in EA (that people overestimate the difficulty of launching their own project).

Would it be correct to say that your comments don't disagree with the underlying message, but rather that you believe my framing will have net harmful effects, because you predict that many people reading this forum will be incited to take unwise actions?

Comment author: Sebastian_Farquhar 25 February 2016 10:19:29AM 9 points [-]

Often (in EA in particular) the largest cost of a failed project isn't borne by you, but is a hard-to-see counterfactual impact.

Imagine I believe that building a synth bio safety field is incredibly important. Without a real background in synth bio, I go about building the field, but because I lack context and subtle field knowledge, I screw it up after having reached out to almost all the key players. They are now conditioned to think that synth bio safety is something pursued by naive outsiders who don't understand synth bio. This makes it harder for future efforts to proceed. It makes it harder for them to raise funds. It makes it harder for them to build a team.

The worst case is that you start a project, fail, but don't quit. This can block the space, and stop better projects from entering it.

These can be worked around, but it seems that many of your assumptions are conditional on not having these sorts of large negative counterfactual impacts. While that may work out, it seems overconfident to assume a 0% chance of this, especially if the career-capital-building steps are actually building relevant domain knowledge.

Comment author: tyleralterman 26 February 2016 03:36:20AM 4 points [-]

Agreed. This updates my view.

Comment author: tyleralterman 26 February 2016 03:35:28AM 2 points [-]

Fascinating - this ranks as both my most downvoted and most shared post of all time.

5

Should you start your own project now rather than later?

[Note, when reading this post, I advocate keeping this important comment from Seb in mind.] As the CEA team reviews Pareto Fellowship applications, we are noticing a funny phenomenon. This phenomenon afflicts even some of our very top applicants. These applicants will have an excellent goal G, like "Found a... Read More
Comment author: Stefan_Schubert 14 February 2016 02:06:24PM *  2 points [-]

The ability to judge others' competence is incredibly important for organisational effectiveness, and seems to have been quite neglected, e.g. in the rationalism community. I think one important heuristic is to:

a) Identify well-known biases (e.g. people seem to be biased in favour of attractive people).

b) Systematically try to notice whether you might have fallen prey to these biases, e.g. when recruiting. (This is obviously non-trivial, but one might try to come up with techniques which facilitate it. Getting input from others on one's biases could be one effective if somewhat sensitive technique.)

c) If so, adjust your judgment of the competence of that person downwards or upwards (depending on whether you're positively or negatively biased).

Comment author: tyleralterman 14 February 2016 06:58:41PM 0 points [-]

Yup, this is an important thing to keep in the background of expert assessment.

Comment author: Andrew_SB 14 February 2016 08:42:20AM 3 points [-]

"the measure Pomodoro-maximization might accidentally become the target, even though the intended target is goal completion."

Nonsense.

Comment author: tyleralterman 14 February 2016 06:57:51PM 1 point [-]

I'm glad you think it's nonsense, since - in some strange state of affairs - a certain unnamed person has been crushing on the communal Pom sheet lately. =P

Comment author: Owen_Cotton-Barratt 14 February 2016 09:36:26AM 2 points [-]

Agree with this question.

In general, you've set yourself up for us to give you a hard time on this article, since it's putting us in a frame of mind to question expertise, and even suggesting some tools for analysing that. But if we try to use the tools on you for the question of expertise in assessing expertise, it looks like you're okay on 'P' and that we don't have enough evidence on the rest.

Comment author: tyleralterman 14 February 2016 06:54:47PM *  1 point [-]

Well-observed! Here's my guess on where I rank on the various conditions above:

  • P - Process: Medium. I think my explicit process is still fairly decent, but my implicit processes still need work. E.g., I might perform well at identifying an expert if you gave me a decent amount of time to check markers with my framework, but I'm not fluent enough in my explicit models to do expertise assessments on the fly very well, Sherlock Holmes-style.
  • I - Interaction: Medium. I've spent dozens of hours interacting with expertise assessment tasks, as mentioned in the article. However, for much of this interaction with the data, I did not have strong explicit models (I only developed the expert assessment framework last month.) Since my interaction with the data was not very model-guided for the majority of the time, it's likely that I often didn't pay attention to the right features of the data. So I may have been rather like Bob above:

    Bob, a graphic design novice, pays no attention to the signs and advertisements along the side of the street, even though they are within his field of vision.

    It may have been that lots of data relating to expertise was literally and metaphorically in my field of vision, but that I wasn't focusing on it very well, or wasn't focusing on the proper features.

  • F - Feedback: Low. Since I've only had well-developed explicit models for about a month, I still have only gotten minor feedback on my predictive power. I have run a few predictive exercises - they went well, but the n is still small. My primary feedback method has been to generate lots of examples of people I am confident have expertise and check whether each marker can be found in all the examples. I also did the opposite: generate lots of examples of people I am confident lack expertise, and check whether each marker is absent from all the examples. (A rough sketch of this consistency check appears after this list.) I also used normal proxy methods that one can apply to check the robustness of theories without knowing much about them. (E.g., are there logical contradictions?) I used a couple of other methods (e.g., running simulations and checking whether my system 1 yielded error signals), but I'd need to write a full-length article about them for these to make sense. For now, I will just say that they were weak feedback processes, but useful ones. Overall, I looked for correlation between the various feedback methods.
  • T - Time: Low-medium. I have probably spent more time training specifically in domain-general expertise assessment than most people in the world. But this is not saying much, since domain-general expertise assessment is not a thriving or even recognized field, as far as I can tell. Also, I have spent only a small amount of time on the skill relative to the amount of training required to become skilled in domains falling into a similar reference class. (E.g., I think expertise assessment could be its own scientific discipline, and people spend years in order to gain sufficient expertise in scientific disciplines.)
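The F method above is, mechanically, just a consistency check over hand-picked examples. Here is a minimal sketch of it in Python; the marker names and the people are entirely hypothetical placeholders, not data from the actual exercise:

    # Hypothetical illustration of the F-step consistency check: does each
    # candidate marker appear in every confident-expert example and in no
    # confident-non-expert example?

    EXPERTS = {
        "Alice": {"makes_calibrated_predictions", "cites_specific_evidence", "notices_own_errors"},
        "Priya": {"makes_calibrated_predictions", "cites_specific_evidence", "notices_own_errors"},
    }

    NON_EXPERTS = {
        "Bob": {"cites_specific_evidence"},
        "Dana": set(),
    }

    MARKERS = ["makes_calibrated_predictions", "cites_specific_evidence", "notices_own_errors"]

    for marker in MARKERS:
        in_all_experts = all(marker in traits for traits in EXPERTS.values())
        in_no_non_experts = all(marker not in traits for traits in NON_EXPERTS.values())
        verdict = "keep" if in_all_experts and in_no_non_experts else "re-examine"
        print(f"{marker}: in all experts={in_all_experts}, "
              f"absent from non-experts={in_no_non_experts} -> {verdict}")

A marker that fails either check isn't necessarily wrong, but it is exactly the kind of crude feedback signal this exercise is meant to surface.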

Comment author: RomeoStevens 14 February 2016 08:46:22AM *  10 points [-]

First of all, thanks a lot for spending the time to turn this into a model and polishing it enough to be shared.

Issue: It seems like the model might have trouble filtering out people who have detailed but wrong models. I encounter this a lot in the nutrition literature, where very detailed and technical models with complex evidence from combinations of in vitro, animal, and some human studies compete against outcome-measuring RCTs. As near as I can tell, an expert with a detailed but wrong model can potentially get past three of the four filters: P, I, and T. They will have a harder time with F, but my current guess is that the vast majority of experts fail F, because that is where you have loaded most of the epistemic rigor. Consider how rarely (if ever) you have heard a response like the example given for F from real-life researchers. You might say "all is well, the vast majority fail and the ones left are highly reliable." It seems to me, however, that we must rely on lower-quality evidence from people failing the F filter all the time, simply because in the vast majority of cases there is little to no evidence that really passes muster, and yet we must make a decision anyway.

Side note: in my estimation, The Cambridge Handbook of Expertise would lend support to most of the "work" here being done by F, as opportunities for rapid, measurable feedback are one of the core predictors of performance its authors point to.

Potential improvement: Rather than a binary pass/fail for experts, we would like a metric that grades the material they present. Even crude metrics outperform estimates that do not use metrics, according to the forecasting literature. Cochrane's metric for risk of bias, for example, is simply a list of five common sources of bias which the reviewer grades as low, high, or unclear, with a short summary of the reasoning. A very simple example would be rating each of the PIFT criteria similarly. This gives some path forward for improvement over time as well: checking whether a low or high score on a particular dimension actually predicts subsequent expert performance.

I hope you interpret detailed feedback as a +1 and not too punishing. I am greatly encouraged by seeing work on what I consider core areas of improving the quality of EA research.

Comment author: tyleralterman 14 February 2016 06:30:07PM 1 point [-]

Potential improvement: Rather than a binary pass/fail for experts, we would like a metric that grades the material they present.

Agreed. I tried to make it binary for the sake of generating good examples, but the world is much more messy. In the spreadsheet version I use, I try to assign each marker a rating from "none" to "high."
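For concreteness, here is a minimal sketch (in Python, with made-up ratings, reasoning, and a hypothetical assessee) of the kind of per-dimension grading described above: each PIFT dimension gets a coarse rating plus a one-line justification, which can later be compared against how the assessed expert actually performs.

    # Hypothetical illustration: a crude, Cochrane-style rubric that grades a
    # candidate expert on each PIFT dimension (Process, Interaction, Feedback, Time).

    RATINGS = {"none": 0, "low": 1, "medium": 2, "high": 3}

    def grade_expert(name, assessments):
        """assessments maps each PIFT dimension to a (rating, short reasoning) pair."""
        total = 0
        print(f"Assessment for {name}:")
        for dimension in ("Process", "Interaction", "Feedback", "Time"):
            rating, reasoning = assessments[dimension]
            total += RATINGS[rating]
            print(f"  {dimension}: {rating} - {reasoning}")
        print(f"  Crude overall score: {total} / {3 * len(assessments)}")
        return total

    # Placeholder input, not real data.
    grade_expert("Dr. Example", {
        "Process":     ("medium", "explicit framework, but not yet fluent on the fly"),
        "Interaction": ("high",   "years of hands-on contact with the domain"),
        "Feedback":    ("low",    "few outcome-checked predictions so far"),
        "Time":        ("medium", "some deliberate practice, but not years of it"),
    })

Even the crude numeric total mainly exists so that one can later check whether high scores on a particular dimension actually predicted performance, which is the improvement path suggested above.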
