Comment author: capybaralet 08 April 2018 02:25:11PM 0 points [-]

Based on the report [1], it's a bit misleading to say that they are a charity doing $35 cataracts. The report seems pretty explicit that donations to the charity are used for other activities.

Comment author: John_Maxwell_IV 28 September 2017 10:04:03AM *  20 points [-]

In Tetlock's book Superforecasting, he distinguishes between two skills related to forecasting: generating questions, and answering them. This "disentanglement research" business sounds more like the first sort of work. Unfortunately, Tetlock's book focuses on the second skill, but I do believe he talks some about the first skill (e.g. giving examples of people who are good at it).

I would imagine that for generating questions, curiosity and creativity are useful. Unfortunately, the Effective Altruism movement seems to be bad at creativity.

John Cleese gave this great talk about creativity in which he distinguishes between two mental modes, "open mode" and "closed mode". Open mode is good for generating ideas, whereas closed mode is good for accomplishing well-defined tasks. It seems to me that for a lot of different reasons, the topic of AI strategy might put a person in closed mode:

  • Ethical obligation - Effective altruism is often framed as an ethical obligation. If I recall correctly, a Facebook poll indicated that around half of the EA community sees EA as more of an obligation than an opportunity. Obligations don't typically create a feeling of playfulness.

  • Size of the problem - Paul Graham writes: "Big problems are terrifying. There's an almost physical pain in facing them." AI safety strategy is almost the biggest problem imaginable.

  • Big names - People like Nick Bostrom, Eliezer Yudkowsky, and Eric Drexler have a very high level of prestige within the EA community. (The status difference between them and your average EA is greater than what I've observed between the students & the professor in any college class I remember taking.) Eliezer in particular can get very grumpy with you if you disagree with him. I've noticed that I'm much more apt to generate ideas if I see myself as being at the top of the status hierarchy, and if there is no penalty for coming up with a "bad" idea (even a bad idea can be a good starting point). One idea for solving the EA community's creativity problem is to encourage more EAs to develop Richard Feynman-level indifference to our local status norms.

  • Urgency - As you state in this post, every second counts! Unfortunately urgency typically has the effect of triggering closed mode.

  • Difficulty - As you state in this post, many brilliant people have tried & failed. For some people, this fact is likely to create a sense of intimidation which precludes creativity.

For curiosity, one useful exercise I've found is Anna Salamon's practice of setting a 7-minute timer and trying to think of as many questions as possible within that period. The common pattern here seems to be "quantity over quality". If you're in a mental state where you feel a small amount of reinforcement for a bad idea, and a large amount of reinforcement for a good idea, don't be surprised if a torrent of ideas follows (some of which are good).

Another practice I've found useful is keeping a notebook. Harnessing "ambient thought" and recording ideas as they come to me, in the appropriate notebook page, seems to be much more efficient on a per-minute basis than dedicated brainstorming.

If I was attacking this problem, my overall strategic approach would differ a little from what you are describing here.

I would place less emphasis on intellectual centralization and more emphasis on encouraging people to develop idiosyncratic perspectives/form their own ontologies. Rationale: if many separately developed idiosyncratic perspectives all predict that a particular action X is desirable, that is good evidence that we should do X. There's an analogy to stock trading here. (Relatedly, the finance/venture capital industry might be the segment of society that has the most domain expertise related to predicting the future, modulo principal-agent problems that come with investing other people's money. Please let me know if you can think of other candidates... perhaps the intelligence community?)

Discipline could be useful for reading books & passing classes which expand one's library of concepts, but once you get to the original reasoning part, discipline gets less useful. Centralization could be useful for making sure that the space of ideas relevant to AI strategy gets thoroughly covered through our collective study, and for helping people find intellectual collaborators. But I would go for beers, whiteboards, and wikis with long lists of crowdsourced pros and cons, structured to maximize the probability that usefully related ideas will at one point or another be co-located in someone's working memory, before any kind of standard curriculum. I suspect it's better to see AI strategy as a fundamentally interdisciplinary endeavor. (It might be useful to look at successful interdisciplinary research groups such as the Santa Fe Institute for ideas.) And forget all that astronomical waste nonsense for a moment. We are in a simulation. We score 1 point if we get a positive singularity, 0 points otherwise. Where is the loophole in the game's rules that the designers didn't plan for?

[Disclaimer: I haven't made a serious effort to survey the literature or systematically understand the recommendations of experts on either creativity or curiosity, and everything in this comment is just made up of bits and pieces I picked up here and there. If you agree with my hunch that creativity/curiosity are a core part of the problem, it might be worth doing a serious lit review/systematically reading authors who write about this stuff such as Thomas Kuhn, plus reading innovators in various fields who have written about their creative process.]

Comment author: capybaralet 16 October 2017 03:00:35AM 1 point [-]

I strongly agree that independent thinking seems undervalued (in general and in EA/LW). There is also an analogy with ensembling in machine learning (

By "independent" I mean "thinking about something without considering others' thoughts on it" or something to that effect... it seems easy for people's thoughts to converge too much if they aren't allowed to develop in isolation.
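To make the ensembling analogy a bit more concrete, here is a minimal sketch (my own toy model, not something from the thread) of why averaging many opinions helps much less once those opinions have converged. It assumes each thinker's estimate has unit-variance Gaussian error and that any two thinkers' errors have correlation `rho`; the function name and all parameters are hypothetical.

```python
import math
import random
import statistics

def ensemble_mse(n_members, rho, trials=2000, seed=0):
    """Estimate the mean squared error of an averaged ensemble.

    Each member estimates a true value of 0 with unit-variance error;
    rho is the assumed correlation between any two members' errors.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # error component every member has in common
        errors = [
            math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
            for _ in range(n_members)
        ]
        squared_errors.append(statistics.fmean(errors) ** 2)
    return statistics.fmean(squared_errors)

independent_mse = ensemble_mse(n_members=25, rho=0.0)
converged_mse = ensemble_mse(n_members=25, rho=0.8)
print(f"independent thinkers: {independent_mse:.3f}")
print(f"converged thinkers:   {converged_mse:.3f}")
```

Under this model the expected squared error of the average is rho + (1 - rho)/n, so 25 fully independent thinkers cut the error to about 1/25 of a single thinker's, while 25 thinkers with rho = 0.8 barely improve on one.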

Thinking about it now, though, I wonder if there isn't some even better middle ground; in my experience, group brainstorming can be much more productive than independent thought as I've described it.

There is a very high-level analogy with evolution: I imagine sexual reproduction might create more diversity in a population than horizontal gene transfer, since in the latter case, an idea(=gene) which seems good could rapidly become universal, and thus "local optima" might be more of a problem for the population (I have no idea if that's actually how this works biologically... in fact, it seems like it might not be, since at least some viruses/bacteria seem to do a great job of rapidly mutating to become resistant to defences/treatments.)

Comment author: capybaralet 10 October 2017 04:30:34PM 0 points [-]
Comment author: capybaralet 07 October 2017 04:14:19PM *  3 points [-]

Thanks for writing this. My TL;DR is:

  1. AI policy is important, but we don’t really know where to begin at the object level

  2. You can potentially do 1 of 3 things, ATM: A. “disentanglement” research; B. operational support for (e.g.) FHI; C. get in position to influence policy, and wait for policy objectives to be cleared up

  3. Get in touch / Apply to FHI!

I think this is broadly correct, but have a lot of questions and quibbles.

  • I found “disentanglement” unclear. [14] gave the clearest idea of what this might look like. A simple toy example would help a lot.
  • Can you give some idea of what an operations role looks like? I find it difficult to visualize, and I think uncertainty makes it less appealing.
  • Do you have any thoughts on why operations roles aren’t being filled?
  • One more policy that seems worth starting on: programs that build international connections between researchers (especially around policy-relevant issues of AI (i.e. ethics/safety)).
  • The timelines for effective interventions in some policy areas may be short (e.g. 1-5 years), and it may not be possible to wait for disentanglement to be “finished”.
  • Is it reasonable to expect the “disentanglement bottleneck” to be cleared at all? Would disentanglement actually make policy goals clear enough? Trying to anticipate all the potential pitfalls of policies is a bit like trying to anticipate all the potential pitfalls of a particular AI design or reward specification… fortunately, there is a bit of a disanalogy in that we are more likely to have a chance to correct mistakes with policy (although that still could be very hard/impossible). It seems plausible that “start iterating and create feedback loops” is a better alternative to the “wait until things are clearer” strategy.
Comment author: capybaralet 11 August 2017 01:15:51AM *  1 point [-]

My main comments:

  1. As others have mentioned: great post! Very illuminating!

  2. I agree value-learning is the main technical problem, although I’d also note that value-learning related techniques are becoming much more popular in mainstream ML these days, and hence less neglected. Stuart Russell has argued (and I largely agree) that things like IRL will naturally become a more popular research topic (but I’ve also argued this might not be net-positive for safety:

  3. My main comment wrt the value of HRAD (3a) is: I think HRAD-style work is more about problem definitions than solutions. So I find it to be somewhat orthogonal to the other approach of “learning to reason from humans” (L2R). We don’t have the right problem definitions, at the moment; we know that the RL framework is a leaky abstraction. I think MIRI has done the best job of identifying the problems which could result from our current leaky abstractions, and working to address them by improving our understanding of what problems need to be solved.

  4. It’s also not clear that human reasoning can be safely amplified; the relative safety of existing humans may be due to our limited computational / statistical resources, rather than properties of our cognitive algorithms. But this argument is not as strong as it seems; see comment #3 below.

A few more comments:

  1. RE 3b: I don’t really think the AI community’s response to MIRI’s work is very informative, since it’s just not on people’s radar. The problems are not well known or understood, and the techniques are (AFAIK) not very popular or in vogue (although I’ve only been in the field for 4 years, and only studied machine-learning based approaches to AI). I think decision theory was already a relatively well known topic in philosophy, so I think philosophy would naturally be more receptive to these results.

  2. I’m unconvinced about the feasibility of Paul’s approach**, and share Wei Dai’s concerns about it hinging on a high level of competitiveness. But I also think HRAD suffers from the same issues of competitiveness (this does not seem to be MIRI’s view, which I’m confused by). This is why I think solving global coordination is crucial.

  3. A key missing (assumed?) argument here is that L2R can be a stepping stone, e.g. providing narrow or non-superintelligent AI capabilities which can be applied to AIS problems (e.g. making much more progress on HRAD than MIRI). To me this is a key argument for L2R over HRAD, and generally a source of optimism. I’m curious if this argument plays a significant role in your thought; in other words, is it that HRAD problems don’t need to be solved, or just that the most effective solution path goes through L2R? I’m also curious about the counter-argument for pursuing HRAD now: i.e. what role does MIRI anticipate safe advanced (but not general / superhuman) intelligent systems to play in HRAD?

  4. An argument for more funding for MIRI which isn’t addressed is the apparent abundance of wealth at the disposal of Good Ventures. Since funding opportunities are generally scarce in AI Safety, I think every decent opportunity should be aggressively pursued. There are 3 plausible arguments I can see for the low amount of funding to MIRI: 1) concern about steering other researchers in unproductive directions, 2) concern about bad PR, 3) internal politics.

  5. Am I correct that there is a focus on shorter timelines (e.g. <20 years)?

Briefly, my overall perspective on the future of AI and safety relevance is:

  1. There ARE fundamental insights missing, but they are unlikely to be key to building highly capable OR safe AI.

  2. Fundamental insights might be crucial for achieving high confidence in a putatively safe AI (but perhaps not for developing an AI which is actually safe).

  3. The HRAD line of research is likely to uncover mostly negative results (à la AIXI’s arbitrary dependence on prior).

  4. Theory is behind empiricism, and the gap is likely to grow; this is the main reason I’m a bit pessimistic about theory being useful. On the other hand, I think most paths to victory involve using capability-control for as long as possible while transitioning to completely motivation-control based approaches, so conditioning on victory, it seems more likely that we solve more fundamental problems (i.e. “we have to solve these problems eventually”).

** the two main reasons are: 1) I don’t think it will be competitive and 2) I suspect it will be difficult to prevent compounding errors in a bootstrapping process that yields superintelligent agents.

Comment author: Wei_Dai 11 July 2017 05:45:15PM 3 points [-]

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

Yeah, this is still not clear. Suppose we had a solution to agent foundations, I don't see how that necessarily helps me figure out what to do as H in capability amplification. For example the agent foundations solution could say, use (some approximation of) exhaustive search in the following way, with your utility function as the objective function, but that doesn't help me because I don't have a utility function.

When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction.

Oh I see. In my mind the problems with Solomonoff Induction mean that it's probably not the right way to define how induction should be done as an ideal, so we should look for something kind of like Solomonoff Induction but better, not try to patch it by doing additional things on top of it. (Like instead of trying to figure out exactly when CDT would make wrong decisions and add more complexity on top of it to handle those cases, replace it with UDT.)

Comment author: capybaralet 06 August 2017 11:55:15AM *  0 points [-]

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

I think both approaches potentially enable this, but are VERY unlikely to deliver. MIRI seems more bullish that fundamental insights will yield AI that is just plain better (Nate gave me the analogy of Judea Pearl coming up with Causal PGMs as such an insight), whereas Paul just seems optimistic that we can get a somewhat negligible performance hit for safe vs. unsafe AI.

But I don't think MIRI has given very good arguments for why we might expect this; it would be great if someone can articulate or reference the best available arguments.

I have a very strong intuition that dauntingly large safety-performance trade-offs are extremely likely to persist in practice, thus the only answer to the "how do I compete" question seems to be "be the front-runner".

Comment author: WillPearson 10 July 2017 09:58:19PM 1 point [-]

Fixed, thanks.

I agree that HRAD might be useful. I read some of the stuff. I think we need a mix of theory and practice and only when we have a community where they can feed into each other will we actually get somewhere. When an AI safety theory paper says, "Here is an experiment we can do to disprove this theory," then I will pay more attention than I do.

The "ignored physical aspect of computation" is less about a direction to follow, but more an argument about the type of systems that are likely to be effective and so an argument about which ones we should study. There is no point studying how to make ineffective systems safe if the lessons don't carry over to effective ones.

You don't want a system that puts in the same computational resources trying to decide what brand of oil is best for its bearings as it does to deciding the question of what is a human or not. If you decide how much computational resources you want to put into each class of decision, you start to get into meta-decision territory. You also need to decide how much of your pool you want to put into making that meta-decision as making it will take away from making your other decisions.

I am thinking about a possible system which can allocate resources among decision making systems and this can be used to align the programs (at least somewhat). It cannot align a super intelligent malign program; work needs to be done on the initial population of programs in the system, so that we can make sure they do not appear. Or we need a different way of allocating resources entirely.

I don't pick this path because it is an easy path to safety, but because I think it is the only path that leads anywhere interesting/dangerous and so we need to think about how to make it safe.

Comment author: capybaralet 06 August 2017 07:32:08AM *  0 points [-]

Will - I think "meta-reasoning" might capture what you mean by "meta-decision theory". Are you familiar with this research (e.g. Nick Hay did a thesis w/Stuart Russell on this topic recently)?

I agree that bounded rationality is likely to loom large, but I don't think this means MIRI is barking up the wrong tree... just that other trees also contain parts of the squirrel.

Comment author: sdspikes 01 March 2017 01:50:13AM 1 point [-]

As a Stanford CS (BS/MS '10) grad who took AI/Machine Learning courses in college from Andrew Ng, worked at Udacity with Sebastian Thrun, etc., I have mostly been unimpressed by non-technical folks trying to convince me that AI safety (not caused by explicit human malfeasance) is a credible issue.

Maybe I have "easily corrected, false beliefs" but the people I've talked to at MIRI and CFAR have been pretty unconvincing to me, as was the book Superintelligence.

My perception is that MIRI has focused in on an extremely specific kind of AI that to me seems unlikely to do much harm unless someone is recklessly playing with fire (or intentionally trying to set one). I'll grant that that's possible, but that's a human problem, not an AI problem, and requires a human solution.

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

But maybe you do make friendly nuclear power plants? Not sure if this analogy worked out for me or not.

Comment author: capybaralet 01 March 2017 11:34:16PM 2 points [-]

I'm also very interested in hearing you elaborate a bit.

I guess you are arguing that AIS is a social rather than a technical problem. Personally, I think there are aspects of both, but that the social/coordination side is much more significant.

RE: "MIRI has focused in on an extremely specific kind of AI", I disagree. I think MIRI has aimed to study AGI in as much generality as possible and mostly succeeded in that (although I'm less optimistic than them that results which apply to idealized agents will carry over and produce meaningful insights in real-world resource-limited agents). But I'm also curious what you think MIRI's research is focusing on vs. ignoring.

I also would not equate technical AIS with MIRI's research.

Is it necessary to be convinced? I think the argument for AIS as a priority is strong so long as the concerns have some validity to them, and cannot be dismissed out of hand.

Comment author: capybaralet 27 January 2017 02:31:26AM 3 points [-]

(cross posted on facebook):

I was thinking of applying... it's a question I'm quite interested in. The deadline is the same as ICML tho!

I had an idea I will mention here: funding pools.

  1. You and your friends whose values and judgement you trust and who all have small-scale funding requests join together.

  2. A potential donor evaluates one funding opportunity at random, and funds all or none of them on the basis of that evaluation.

  3. You have now increased the ratio of funding / evaluation available to a potential donor by a factor of #projects.

  4. There is an incentive for you to NOT include people in your pool if you think their proposal is quite inferior to yours... however, you might be incentivized to include somewhat inferior proposals in order to reach a threshold where the combined funding opportunity is large enough to attract more potential donors.
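A rough sketch of the funding-pool mechanism, just to show the moving parts. Everything here is made up for illustration: the proposal fields, the 0.6 quality cutoff standing in for the donor's judgement, and the function names are all hypothetical.

```python
import random

def pool_pass_probability(proposals, passes_review):
    """Chance that a single randomly chosen proposal passes the donor's review."""
    return sum(1 for p in proposals if passes_review(p)) / len(proposals)

def fund_pool(proposals, passes_review, rng):
    """Review one proposal at random; fund the whole pool if it passes, else none."""
    sampled = rng.choice(proposals)
    return list(proposals) if passes_review(sampled) else []

def good_enough(proposal):
    # Hypothetical donor rule: fund if estimated quality is at least 0.6.
    return proposal["quality"] >= 0.6

strong_pool = [
    {"name": "A", "ask": 5_000, "quality": 0.9},
    {"name": "B", "ask": 5_000, "quality": 0.8},
    {"name": "C", "ask": 5_000, "quality": 0.7},
]
# Adding a weaker proposal raises the total ask but lowers the pass probability.
mixed_pool = strong_pool + [{"name": "D", "ask": 5_000, "quality": 0.3}]

funded = fund_pool(strong_pool, good_enough, random.Random(0))
total_funded = sum(p["ask"] for p in funded)  # one evaluation, three asks funded
print(total_funded, pool_pass_probability(mixed_pool, good_enough))
```

In the strong pool a single evaluation releases three proposals' worth of funding, which is the "factor of #projects" gain. The mixed pool shows the trade-off in the last step: the combined ask grows, but every member's chance of being funded drops to the fraction of proposals that would pass review.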


EA essay contest for <18s

I am planning to sponsor an Effective Altruism essay contest for people <18 years old. See this document for details and prompts. The inspiration is the Ayn Rand Institute essay contest. The goal is to motivate young people to learn about and get involved with EA, thus...
