Comment author: capybaralet 11 August 2017 01:15:51AM 1 point

My main comments:

  1. As others have mentioned: great post! Very illuminating!

  2. I agree value-learning is the main technical problem, although I’d also note that value-learning-related techniques are becoming much more popular in mainstream ML these days, and hence less neglected. Stuart Russell has argued (and I largely agree) that things like IRL will naturally become a more popular research topic (but I’ve also argued this might not be net-positive for safety: http://lesswrong.com/lw/nvc/risks_from_approximate_value_learning/).

  3. My main comment wrt the value of HRAD (3a) is: I think HRAD-style work is more about problem definitions than solutions. So I find it to be somewhat orthogonal to the other approach of “learning to reason from humans” (L2R). We don’t have the right problem definitions, at the moment; we know that the RL framework is a leaky abstraction. I think MIRI has done the best job of identifying the problems which could result from our current leaky abstractions, and working to address them by improving our understanding of what problems need to be solved.

  4. It’s also not clear that human reasoning can be safely amplified; the relative safety of existing humans may be due to our limited computational / statistical resources, rather than properties of our cognitive algorithms. But this argument is not as strong as it seems; see comment #3 below.

A few more comments:

  1. RE 3b: I don’t really think the AI community’s response to MIRI’s work is very informative, since it’s just not on people’s radar. The problems are not well known or understood, and the techniques are (AFAIK) not very popular or in vogue (although I’ve only been in the field for 4 years, and have only studied machine-learning-based approaches to AI). I think decision theory was already a relatively well-known topic in philosophy, so I think philosophy would naturally be more receptive to these results.

  2. I’m unconvinced about the feasibility of Paul’s approach**, and share Wei Dai’s concerns about it hinging on a high level of competitiveness. But I also think HRAD suffers from the same issues of competitiveness (this does not seem to be MIRI’s view, which I’m confused by). This is why I think solving global coordination is crucial.

  3. A key missing (assumed?) argument here is that L2R can be a stepping stone, e.g. providing narrow or non-superintelligent AI capabilities which can be applied to AIS problems (e.g. making much more progress on HRAD than MIRI). To me this is a key argument for L2R over HRAD, and generally a source of optimism. I’m curious if this argument plays a significant role in your thought; in other words, is it that HRAD problems don’t need to be solved, or just that the most effective solution path goes through L2R? I’m also curious about the counter-argument for pursuing HRAD now: i.e., what role does MIRI anticipate safe advanced (but not general / superhuman) intelligent systems playing in HRAD?

  4. An argument for more funding for MIRI which isn’t addressed is the apparent abundance of wealth at the disposal of Good Ventures. Since funding opportunities are generally scarce in AI Safety, I think every decent opportunity should be aggressively pursued. There are 3 plausible arguments I can see for the low amount of funding to MIRI: 1) concern about steering other researchers in unproductive directions, 2) concern about bad PR, and 3) internal politics.

  5. Am I correct that there is a focus on shorter timelines (e.g. <20 years)?

Briefly, my overall perspective on the future of AI and its relevance for safety is:

  1. There ARE fundamental insights missing, but they are unlikely to be key to building highly capable OR safe AI.

  2. Fundamental insights might be crucial for achieving high confidence in a putatively safe AI (but perhaps not for developing an AI which is actually safe).

  3. The HRAD line of research is likely to uncover mostly negative results (à la AIXI’s arbitrary dependence on its prior).

  4. Theory is behind empiricism, and the gap is likely to grow; this is the main reason I’m a bit pessimistic about theory being useful. On the other hand, I think most paths to victory involve using capability control for as long as possible while transitioning to approaches based entirely on motivation control, so conditioning on victory, it seems more likely that we solve more fundamental problems (i.e. “we have to solve these problems eventually”).

** The two main reasons are: 1) I don’t think it will be competitive, and 2) I suspect it will be difficult to prevent compounding errors in a bootstrapping process that yields superintelligent agents.

Comment author: Wei_Dai 11 July 2017 05:45:15PM 3 points

> I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

Yeah, this is still not clear. Suppose we had a solution to agent foundations; I don't see how that necessarily helps me figure out what to do as H in capability amplification. For example, the agent foundations solution could say "use (some approximation of) exhaustive search in the following way, with your utility function as the objective function," but that doesn't help me because I don't have a utility function.

> When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer, or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

> On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction.

Oh I see. In my mind the problems with Solomonoff Induction mean that it's probably not the right way to define how induction should be done as an ideal, so we should look for something kind of like Solomonoff Induction but better, not try to patch it by doing additional things on top of it. (Like, instead of trying to figure out exactly when CDT would make wrong decisions and adding more complexity on top of it to handle those cases, replace it with UDT.)

Comment author: capybaralet 06 August 2017 11:55:15AM 0 points

> My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer, or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

I think both approaches potentially enable this, but are VERY unlikely to deliver. MIRI seems more bullish that fundamental insights will yield AI that is just plain better (Nate gave me the analogy of Judea Pearl coming up with Causal PGMs as such an insight), whereas Paul just seems optimistic that the performance hit for safe vs. unsafe AI can be made negligible.

But I don't think MIRI has given very good arguments for why we might expect this; it would be great if someone can articulate or reference the best available arguments.

I have a very strong intuition that dauntingly large safety-performance trade-offs are extremely likely to persist in practice, thus the only answer to the "how do I compete" question seems to be "be the front-runner".

Comment author: WillPearson 10 July 2017 09:58:19PM 1 point

Fixed, thanks.

I agree that HRAD might be useful. I read some of the stuff. I think we need a mix of theory and practice, and only when we have a community where they can feed into each other will we actually get somewhere. When an AI safety theory paper says, "Here is an experiment we can do to disprove this theory," I will pay more attention than I do now.

The "ignored physical aspect of computation" is less about a direction to follow, but more an argument about the type of systems that are likely to be effective and so an argument about which ones we should study. There is no point studying how to make ineffective systems safe if the lessons don't carry over to effective ones.

You don't want a system that puts the same computational resources into deciding which brand of oil is best for its bearings as it does into deciding what is a human and what is not. Once you decide how much computational resource to put into each class of decision, you start to get into meta-decision territory: you also need to decide how much of your pool to put into making that meta-decision, since making it takes resources away from your other decisions.

I am thinking about a possible system which can allocate resources among decision-making systems, and this could be used to align the programs (at least somewhat). It cannot align a superintelligent malign program; work needs to be done on the initial population of programs in the system so that we can make sure such programs do not appear, or we need a different way of allocating resources entirely.
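A minimal toy sketch of this kind of allocator (illustrative only, not a worked-out proposal; the function name, the stakes estimates, and the reserved meta-fraction below are assumptions introduced for the example):

```python
# Toy sketch of a compute allocator across decision classes (illustrative
# only; the stakes numbers and the 5% meta-reserve are made-up assumptions).

def allocate_compute(budget, stakes, meta_fraction=0.05):
    """Split `budget` across decision classes in proportion to `stakes`,
    reserving `meta_fraction` of it for the meta-decision of how to allocate."""
    meta_budget = budget * meta_fraction
    remaining = budget - meta_budget
    total = sum(stakes.values())
    allocation = {name: remaining * s / total for name, s in stakes.items()}
    allocation["meta-decision"] = meta_budget
    return allocation

# The oil-brand decision should get far less compute than "is this a human?".
print(allocate_compute(
    budget=1000.0,
    stakes={"which oil brand for the bearings": 1.0, "is this entity a human": 99.0},
))
```

The unsolved part, of course, is where the stakes estimates come from and how much to spend on the meta-decision itself, which is exactly the regress described above.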

I don't pick this path because it is an easy path to safety, but because I think it is the only path that leads anywhere interesting/dangerous and so we need to think about how to make it safe.

Comment author: capybaralet 06 August 2017 07:32:08AM 0 points

Will - I think "meta-reasoning" might capture what you mean by "meta-decision theory". Are you familiar with this research (e.g. Nick Hay did a thesis w/ Stuart Russell on this topic recently)?

I agree that bounded rationality is likely to loom large, but I don't think this means MIRI is barking up the wrong tree... just that other trees also contain parts of the squirrel.

Comment author: sdspikes 01 March 2017 01:50:13AM 1 point

As a Stanford CS (BS/MS '10) grad who took AI/Machine Learning courses in college from Andrew Ng, worked at Udacity with Sebastian Thrun, etc., I have mostly been unimpressed by non-technical folks trying to convince me that AI safety (not caused by explicit human malfeasance) is a credible issue.

Maybe I have "easily corrected, false beliefs" but the people I've talked to at MIRI and CFAR have been pretty unconvincing to me, as was the book Superintelligence.

My perception is that MIRI has focused in on an extremely specific kind of AI that to me seems unlikely to do much harm unless someone is recklessly playing with fire (or intentionally trying to set one). I'll grant that that's possible, but that's a human problem, not an AI problem, and requires a human solution.

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

But maybe you do make friendly nuclear power plants? Not sure if this analogy worked out for me or not.

Comment author: capybaralet 01 March 2017 11:34:16PM 2 points

I'm also very interested in hearing you elaborate a bit.

I guess you are arguing that AIS is a social rather than a technical problem. Personally, I think there are aspects of both, but that the social/coordination side is much more significant.

RE: "MIRI has focused in on an extremely specific kind of AI", I disagree. I think MIRI has aimed to study AGI in as much generality as possible and mostly succeeded in that (although I'm less optimistic than them that results which apply to idealized agents will carry over and produce meaningful insights in real-world resource-limited agents). But I'm also curious what you think MIRIs research is focusing on vs. ignoring.

I also would not equate technical AIS with MIRI's research.

Is it necessary to be convinced? I think the argument for AIS as a priority is strong so long as the concerns have some validity to them, and cannot be dismissed out of hand.

Comment author: capybaralet 27 January 2017 02:31:26AM 3 points

(cross posted on facebook):

I was thinking of applying... it's a question I'm quite interested in. The deadline is the same as ICML tho!

I had an idea I will mention here: funding pools.

  1. You and your friends whose values and judgement you trust and who all have small-scale funding requests join together.
  2. A potential donor evaluates one funding opportunity at random, and funds all or none of them on the basis of that evaluation.
  3. You have now increased the ratio of funding / evaluation available to a potential donor by a factor of #projects.
  4. There is an incentive for you to NOT include people in your pool if you think their proposal is quite inferior to yours... however, you might be incentivized to include somewhat inferior proposals in order to reach a threshold where the combined funding opportunity is large enough to attract more potential donors.
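A toy simulation of the pooling mechanism above, to make the "one evaluation, fund all or none" dynamic concrete (a sketch under assumed, made-up project-quality numbers; the `donor_decision` helper and its funding threshold are hypothetical, not part of the original idea):

```python
# Sketch of the funding-pool mechanism: the donor samples one project from
# the pool and funds every project, or none, based on that single evaluation.
# Quality numbers and the funding threshold below are made up for illustration.

import random

def donor_decision(pool_quality, threshold=0.5, rng=random):
    """Evaluate one randomly chosen project; fund the whole pool or nothing.

    Each project is assumed to request 1 unit of funding, so the return
    value is the total number of units funded by this one evaluation.
    """
    sampled = rng.choice(pool_quality)   # the single evaluation
    return len(pool_quality) if sampled >= threshold else 0

pool = [0.9, 0.8, 0.6, 0.4]              # hypothetical project qualities
trials = [donor_decision(pool) for _ in range(10_000)]

# With n projects in the pool, each evaluation decides n units of funding,
# so funding-per-evaluation scales with the pool size (point 3 above).
print(sum(trials) / len(trials), "expected units funded per evaluation")
```

This also makes point 4 visible: adding a clearly below-threshold project enlarges the pool but lowers the chance that the randomly sampled project clears the bar.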


EA essay contest for <18s

I am planning to sponsor an Effective Altruism essay contest for people <18 years old. See this document for details and prompts. The inspiration is the Ayn Rand Institute essay contest. The goal is to motivate young people to learn about and get involved with EA, thus...
Comment author: capybaralet 17 January 2017 05:38:13AM 0 points

I was overall a bit negative on Sarah's post, because it demanded a bit too much attention (e.g. the title) and seemed somewhat polemical. It was definitely interesting, and I learned some things.

I find the most evocative bit to be the idea that EA treats outsiders as "marks".
This strikes me as somewhat true, and sadly short-sighted WRT movement building. I do believe in the ideas of EA, and I think they are compelling enough that they can become mainstream.

Overall, though, I think it's just plain wrong to argue for an unexamined idea of honesty as some unquestionable ideal. I think doing so as a consequentialist, without a very strong justification, itself smacks of disingenuousness and seems motivated by the same phony and manipulative attitude towards PR that Sarah's article attacks.

What would be more interesting to me would be a thoughtful survey of potential EA perspectives on honesty, but an honest treatment of the subject does seem to be risky from a PR standpoint. And it's not clear that it would bring enough benefit to justify the cost. We probably will all just end up agreeing with common moral intuitions.

Comment author: MichaelDickens 24 December 2016 08:38:08PM 13 points

I'm glad that you write this sort of thing. 80K is one of the few organizations that I see writing "why you should donate to us" articles. I believe more organizations should do this because they generally know more about their own accomplishments than anyone else. I wouldn't take an organization's arguments as seriously as a third party's because they're necessarily biased toward themselves, but they can still provide a useful service to potential donors by presenting the strongest arguments in favor of donating to them.

I have written before about why I'm not convinced that I should donate to 80K (see the comments on the linked comment thread). I have essentially the same concerns that I did then. Since you're giving more elaborate arguments than before, I can respond in more detail about why I'm still not convinced.

My fundamental concern with 80K is that the evidence in its favor is very weak. My favorite meta-charity is REG because it has a straightforward causal chain of impact, and it raises a lot of money for charities that I believe do much more good in expectation than GiveWell top charities. 80K can claim the latter to some extent but cannot claim the former.

Below I give a few of the concerns I have with 80K, and what could convince me to donate.

Highly indirect impact. A lot of 80K's claims to impact rely on long chains such that your actual effect is pretty indirect. For example, the claim that an IASPC is worth £7500 via getting people to sign the GWWC pledge relies on assuming:

  • These people would not have signed the pledge without 80K.
  • These people would not have done something similarly or more valuable otherwise.
  • The GWWC pledge is as valuable as GWWC claims it is.

I haven't seen compelling evidence that any of these is true, and they all have to be true for 80K to have the impact here that it claims to have.
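One way to state the conjunction concretely (my framing, with symbolic probabilities rather than anything from 80K or the comment): the £7500 figure should be discounted by the joint probability of all three assumptions holding,

$$
\mathbb{E}[\text{value of an IASPC}] \;\approx\; \pounds 7500 \times P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2),
$$

where $A_1$, $A_2$, $A_3$ are the three assumptions listed above; even moderate uncertainty about each one compounds into a large discount.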

Problems with counterfactuals.

When someone switches from (e.g.) earning to give to direct work, 80K adds this to its impact stats. When someone else switches from direct work to earning to give, 80K also adds this to its impact stats. The only way these can both be good is if 80K is moving people toward their comparative advantages, which is a much harder claim to justify. I would like to see more effort on 80K's part to figure out whether its plan changes are actually causing people to do more good.

Questionable marketing tactics.

This is somewhat less of a concern, but I might as well bring it up here. 80K uses very aggressive marketing tactics (invasive browser popups, repeated asks to sign up for things, frequent emails) that I find abrasive. 80K justifies these by claiming that it increases sign-ups, and I'm sure it does, but these metrics don't account for the cost of turning people off.

By comparison, GiveWell does essentially no marketing but has still attracted more attention than any other EA organization, and it has among the best reputations of any EA org. It attracts donors by producing great content rather than by cajoling people to subscribe to its newsletter. For most orgs I don't believe this would work because most orgs just aren't capable of producing valuable content, but like GiveWell, 80K produces plenty of good content.

Perhaps 80K's current marketing tactics are a good idea on balance, but we have no way of knowing. 80K's metrics can only observe the value its marketing produces and not the value it destroys. It may be possible to get better evidence on this; I haven't really thought about it.

Past vs. future impact.

80K has made a bunch of claims about its historical impact. I'm skeptical that the impact has been as big as 80K claims, but I'm also skeptical that the impact will continue to be as big. For example, 80K claims substantial credit for about a half dozen new organizations. Do we have any reason to believe that 80K will cause more organizations to be created, and that they will be as effective as the ones it contributed to in the past? 80K's writeup claims that it will but doesn't give much justification. Similarly, 80K claims that a lot of benefit comes from its articles, but writing new articles has diminishing utility as you start to cover the most important ideas.


In summary, to persuade me to donate to 80K, you need to convince me that it has sufficiently high leverage that it does more good than the single best direct-work org, and it has higher leverage than any other meta org. More importantly, you need to find strong evidence that 80K actually has the impact it claims to have, or better demonstrate that the existing evidence is sufficient.

Comment author: capybaralet 07 January 2017 02:05:59AM 1 point

Do you have any info on how reliable self-reports are wrt counterfactuals about career changes and GWWC pledging?

I can imagine that people would not be very good at predicting that accurately.

Comment author: capybaralet 05 January 2017 07:38:16PM 3 points

People are motivated both by 1) competition and status, and 2) cooperation and identifying with the successes of a group. I think we should aim to harness both of these forms of motivation.

Comment author: John_Maxwell_IV 23 December 2016 12:00:32PM 1 point

9) seems pretty compelling to me. To use some analogies from the business world: it wouldn't make sense for a company to hire lots of people before it had a business model figured out, or run a big marketing campaign while its product was still being developed. Sometimes it feels to me like EA is doing those things. (But maybe that's just because I am less satisfied with the current EA "business model"/"product" than most people.)

Comment author: capybaralet 05 January 2017 06:13:17PM 0 points

"But maybe that's just because I am less satisfied with the current EA "business model"/"product" than most people."

Care to elaborate (or link to something)?
