Comment author: lukeprog 07 September 2017 04:00:23AM 3 points

We got close to doing this when I was at MIRI but just didn't have the outreach capacity to do it. The closest we got was to print a bunch of paperback copies of (the first 17 chapters of) just one book, HPMoR, and we shipped copies of that to contacts at various universities etc. I think we distributed 1000-2000 copies, not sure if more happened after I left.

Comment author: Wei_Dai 07 September 2017 08:10:51AM 2 points

This is a bit tangential, but do you know if anyone has done an assessment of the impact of HPMoR? Cousin_it (Vladimir Slepnev) recently wrote:

The question then becomes, how do we set up a status economy that will encourage research? Peer review is one way, because publications and citations are a status badge desired by many people. Participating in a forum like LW when it's "hot" and frequented by high status folks is another way, but unfortunately we don't have that anymore. From that perspective it's easy to see why the massively popular HPMOR didn't attract many new researchers to AI risk, but attracted people to HPMOR speculation and rational fic writing. People do follow their interests sometimes, but mostly they try to find venues to show off.

Taking this one step further, it seems to me that HPMoR may have done harm by directing people's attention (including Eliezer's own) away from doing the hard work of making philosophical and practical progress in AI alignment and rationality, and towards discussion/speculation about the book and rational fic writing, thereby contributing to the decline of LW. Of course it also helped bring new people into the rationalist/EA communities. What would be a fair assessment of its net impact?

Comment author: Wei_Dai 04 September 2017 11:37:52PM 3 points

I'm also worried about the related danger of AI persuasion technology being "democratically" deployed upon open societies (i.e., by anyone with an agenda, not necessarily just governments and big corporations), with the possible effect that, in the words of Paul Christiano, "we'll live to see a world where it's considered dicey for your browser to uncritically display sentences written by an untrusted party." This is arguably already true today for those especially vulnerable to conspiracy theories, but eventually it will affect more and more people as the technology improves. How will we solve our collective problems when the safety of discussions is degraded to such an extent?

Comment author: Lukas_Gloor 21 July 2017 11:16:58PM 9 points

Is it just something like "preventing suffering is the most important thing to work on (and the disjunction of assumptions that can lead to this conclusion)"?

This sounds right. Before 2016, I would have said that rough value alignment (normatively "suffering-focused") is very close to necessary, but we updated away from this condition and have for quite some time now held the view that it is not essential if people are otherwise a good fit. We still expect researchers to think about research-relevant background assumptions in ways that are not completely different from ours on every issue, but a single disagreement is practically never a dealbreaker. We've had qualia realists both on the team (part-time) and as interns, and some team members now don't hold strong views on the issue one way or the other. Brian especially is a really strong advocate of epistemic diversity and goes much further with it than I feel most people would go.

People who are not so sure about consciousness anti-realism tend to be less certain about their values as a result, and hence don't focus on suffering as much.

Hm, this does not fit my observations. We had and still have people on our team who don't have strong confidence in either view, and there is also a sizeable cluster of people who seem highly confident in both qualia realism and morality being about reducing suffering, the most notable example being David Pearce.

The one view that seems unusually prevalent within FRI, apart from people self-identifying with suffering-focused values, is a particular anti-realist perspective on morality and moral reasoning on which valuing open-ended moral reflection is not always regarded as the "prudent" default. This is far from a consensus, and many team members value moral reflection a great deal, but many of us expect less "work" to be done by value-reflection procedures than others in the EA movement seem to. Perhaps this is due to different ways of thinking about extrapolation procedures, or perhaps it's due to us having made stronger lock-ins to certain aspects of our moral self-image.

Paul Christiano's indirect normativity write-up, for instance, deals with the "Is 'Passing the Buck' Problematic?" objection in a way I find unsatisfying. Working towards a situation where everyone has much more time to think about their values is more promising the more likely it is that there is "much to be gained," normatively. But this somewhat begs the question. If one finds suffering-focused views very appealing, other interventions become more promising. There seems to be high value of information in narrowing down one's moral uncertainty in this domain (much more so, arguably, than with questions of consciousness or which computations to morally care about). One way to attempt to reduce one's moral uncertainty and capitalize on the value of information is to think more about the object-level arguments in population ethics; another is to think more about the value of moral reflection itself: how much it depends on intuition or self-image-based "lock-ins" vs. how much it (either in general or in one's personal case) is based on other things that are more receptive to information gains or intelligence gains.

Personally, I would be totally eager to place the fate of "Which computations count as suffering?" into the hands of some in-advance specified reflection process, even though I feel like I don't understand how moral reflection will work out in the details of this complex algorithm. I'm less confident in my current understanding of consciousness than in my ability to pick a reassuring-seeming way of delegating the decision-making to smarter advisors. However, I get the opposite feeling when it comes to questions of population ethics. There, I feel like I have thought about the issue a lot, and I experience it as easier and more straightforward to think about than consciousness or whether I care about insects or electrons or Jupiter brains. I have some strong intuitions and aspects of my self-identity tied to the matter, and I am unsure in which legitimate ways (as opposed to failures of goal preservation) I could gain evidence that would strongly change my mind. It would feel wrong to me to place the fate of my values into some in-advance specified, open-ended deliberation algorithm where I won't really understand how it will play out and what initial settings make which kind of difference to the end result (and why). I'd be fine with quite "conservative" reflection procedures where I could be confident that the output would not be too far from my current thinking, but I would be gradually more worried about more open-ended ones.

Comment author: Wei_Dai 22 July 2017 10:06:39AM 6 points

The one view that seems unusually prevalent within FRI, apart from people self-identifying with suffering-focused values, is a particular anti-realist perspective on morality and moral reasoning on which valuing open-ended moral reflection is not always regarded as the "prudent" default.

Thanks for pointing this out. I've noticed this myself in some of FRI's writings, and I'd say this, along with the high degree of certainty on various object-level philosophical questions that presumably causes the disvaluing of reflection about them, is what most "turns me off" about FRI. I worry a lot about potential failures of goal preservation (i.e., value drift) too, but because I'm highly uncertain about just about every meta-ethical and normative question, I see no choice but to try to design some sort of reflection procedure that I can trust enough to hand off control to. In other words, I have nothing I'd want to "lock in" at this point, and since I'm by default constantly handing off control to my future self with few safeguards against value drift, doing something better than that default is one of my highest priorities. If other people are also uncertain and place high value on (safe/correct) reflection as a result, that helps with my goal (because we can then pool resources to work out what safe/correct reflection is), so it's regrettable to see FRI people sometimes argue for more certainty than I think is warranted, and especially to see them argue against reflection.

Comment author: Wei_Dai 21 July 2017 09:07:49PM 3 points

I'm a bit surprised to find that Brian Tomasik attributes his current views on consciousness to his conversations with Carl Shulman, since in my experience Carl is a very careful thinker and the case for accepting anti-realism as the answer to the problem of consciousness seems pretty weak, at least as explained by Brian. I'm very curious to read Carl's own explanation of his views, if he has written one down. I scanned Carl Shulman's list of writings but was unable to find anything that addressed this.

Comment author: Kaj_Sotala 20 July 2017 11:10:53PM 10 points

This looks sensible to me. I'd just quickly note that I'm not sure it's quite accurate to describe this as "FRI's metaphysics" - I work for FRI, but haven't been sold on the metaphysics that you're criticizing. In particular, I find myself skeptical of the premise "suffering is impossible to define objectively", which you largely focus on. (Though part of this may simply be because I haven't yet properly read/considered Brian's argument for it, so it's possible that I would change my mind about that.)

But in any case, I've currently got three papers in various stages of review, submission, or preparation (that other FRI people have helped me with), and none of those papers presupposes this specific brand of metaphysics. There's a bunch of other work being done, too, which I know of and which I don't think presupposes it. So it doesn't feel quite accurate to me to suggest that the metaphysics is holding back our progress, though of course there may be some research being carried out that's explicitly committed to this particular metaphysics.

(opinions in this comment purely mine, not an official FRI statement etc.)

Comment author: Wei_Dai 21 July 2017 03:39:04PM 6 points

What would you say are the philosophical or other premises that FRI does accept (or tends to assume in its work), which distinguishes it from other people/organizations working in a similar space such as MIRI, OpenAI, and QRI? Is it just something like "preventing suffering is the most important thing to work on (and the disjunction of assumptions that can lead to this conclusion)"?

It seems to me that a belief in anti-realism about consciousness explains a lot of Brian's (near) certainty about his values and hence his focus on suffering. People who are not so sure about consciousness anti-realism tend to be less certain about their values as a result, and hence don't focus on suffering as much. Does this seem right, and if so, can you explain what premises led you to work for FRI?

Comment author: Wei_Dai 20 July 2017 07:50:49PM 15 points

What lazy solutions will look like seems unpredictable to me. Suppose someone in the future wants to realistically roleplay a historical or fantasy character. The lazy solution might be to simulate a game world with conscious NPCs. The universe contains so much potential computing power (which presumably can be turned into conscious experiences) that even if a very small fraction of people do this (or other things whose lazy solutions happen to involve suffering), that could create an astronomical amount of suffering.

Comment author: Daniel_Dewey 10 July 2017 07:22:05PM 3 points

I think there's something to this -- thanks.

To add onto Jacob's and Paul's comments: I think that while HRAD is more mature in the sense that more work has gone into solving HRAD problems and critiquing possible solutions, the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs. the justification for Paul's approach being promising. In fact, I think the arguments for Paul's work being promising are more solid than those for HRAD, despite it only being Paul making those arguments -- I've had a much harder time understanding anything more nuanced than the basic case for HRAD I gave above, and a much easier time understanding why Paul thinks his approach is promising.

Comment author: Wei_Dai 15 July 2017 03:41:19PM 1 point

Daniel, while re-reading one of Paul's posts from March 2016, I just noticed the following:

[ETA: By the end of 2016 this problem no longer seems like the most serious.] ... [ETA: while robust learning remains a traditional AI challenge, it is not at all clear that it is possible. And meta-execution actually seems like the ingredient furthest from existing ML practice, as well as having non-obvious feasibility.]

My interpretation of this is that between March 2016 and the end of 2016, Paul updated the difficulty of his approach upwards. (I think given the context, he means that other problems, namely robust learning and meta-execution, are harder, not that informed oversight has become easier.) I wanted to point this out to make sure you updated on his update. Clearly Paul still thinks his approach is more promising than HRAD, but perhaps not by as much as before.

Comment author: jsteinhardt 13 July 2017 03:16:20PM 4 points

(Speaking for myself, not OpenPhil, who I wouldn't be able to speak for anyway.)

For what it's worth, I'm pretty critical of deep learning, which is the approach OpenAI wants to take, and still think the grant to OpenAI was a pretty good idea; and I can't really think of anyone more familiar with MIRI's work than Paul who isn't already at MIRI (note that Paul started out pursuing MIRI's approach and shifted in an ML direction over time).

That being said, I agree that the public write-up on the OpenAI grant doesn't reflect that well on OpenPhil, and it seems correct for people like you to demand better moving forward (although I'm not sure that adding HRAD researchers as technical advisors is the solution; also note that OPP does consult regularly with MIRI staff, though I don't know if they did for the OpenAI grant).

Comment author: Wei_Dai 13 July 2017 05:06:54PM 4 points

I can't really think of anyone more familiar with MIRI's work than Paul who isn't already at MIRI (note that Paul started out pursuing MIRI's approach and shifted in an ML direction over time).

The Agent Foundations Forum would have been a good place to look for more people familiar with MIRI's work. Aside from Paul, I see Stuart Armstrong, Abram Demski, Vadim Kosoy, Tsvi Benson-Tilsen, Sam Eisenstat, Vladimir Slepnev, Janos Kramar, Alex Mennen, and many others. (Abram, Tsvi, and Sam have since joined MIRI, but weren't employees of it at the time of the Open Phil grant.)

That being said, I agree that the public write-up on the OpenAI grant doesn't reflect that well on OpenPhil, and it seems correct for people like you to demand better moving forward

I had previously seen some complaints about the way the OpenAI grant was made, but until your comment, hadn't thought of a possible group blind spot due to a common ML perspective. If you have any further insights on this and related issues (like why you're critical of deep learning but still think the grant to OpenAI was a pretty good idea, what are your objections to Paul's AI alignment approach, how could Open Phil have done better), would you please write them down somewhere?

Comment author: jsteinhardt 11 July 2017 03:59:21PM 5 points

I think the argument along these lines that I'm most sympathetic to is that Paul's agenda fits more into the paradigm of typical ML research, and so is more likely to fail for reasons that are in many people's collective blind spot (because we're all blinded by the same paradigm).

Comment author: Wei_Dai 13 July 2017 11:37:14AM 5 points

That actually didn't cross my mind before, so thanks for pointing it out. After reading your comment, I decided to look into Open Phil's recent grants to MIRI and OpenAI, and noticed that of the four technical advisors Open Phil used for the MIRI grant investigation (Paul Christiano, Jacob Steinhardt, Christopher Olah, and Dario Amodei), all either have an ML background or currently advocate an ML-based approach to AI alignment. For the OpenAI grant, however, Open Phil didn't seem to have similarly engaged technical advisors who might be predisposed to be critical of the potential grantee (e.g., HRAD researchers), and in fact two of the Open Phil technical advisors are also employees of OpenAI (Paul Christiano and Dario Amodei). I have to say this doesn't look very good for Open Phil in terms of making an effort to avoid potential blind spots and bias.

Comment author: Paul_Christiano 11 July 2017 04:04:41PM 3 points

On capability amplification:

MIRI's traditional goal would allow you to break cognition down into steps that we can describe explicitly and implement on transistors, things like "perform a step of logical deduction," "adjust the probability of this hypothesis," "do a step of backwards chaining," etc. This division does not need to be competitive, but it needs to be reasonably close (close enough to obtain a decisive advantage).

Capability amplification requires breaking cognition down into steps that humans can implement. This decomposition does not need to be competitive, but it needs to be efficient enough that it can be implemented during training. Humans can obviously implement more than transistors; the main difference is that in the agent foundations case you need to figure out every response in advance (but you can then have a correspondingly greater reason to think that the decomposition will work / will preserve alignment).
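
For concreteness, here is a toy sketch of the kind of decomposition I have in mind; the arithmetic domain and all function names are illustrative assumptions, not the actual scheme:

```python
# Toy capability amplification: a "question" is a nested arithmetic
# expression, and the "human" can only perform one primitive operation
# at a time. The point is the shape of the recursive decomposition,
# not the arithmetic.

def human_step(op, args):
    # A step simple enough for the overseer to implement directly.
    if op == "add":
        return args[0] + args[1]
    if op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: " + op)

def amplify(question):
    # A question is either a number (directly answerable) or a tuple
    # (op, subquestion, subquestion): delegate the subquestions
    # recursively, then combine them with one human-sized step.
    if isinstance(question, (int, float)):
        return question
    op, left, right = question
    return human_step(op, [amplify(left), amplify(right)])

# Example: (2 + 3) * 4, decomposed into human-implementable steps.
print(amplify(("mul", ("add", 2, 3), 4)))  # -> 20
```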

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

On competitiveness:

I would prefer to be competitive with non-aligned AI, rather than count on forming a singleton, but this isn't really a requirement of my approach. When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

On reliability:

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction. Both of these seem likely applicable to ML models, though this would depend on how exactly they play out.
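
As a hedged illustration of the exclusion idea, here is a minimal sketch; the hypothesis class and the is_safe predicate are hypothetical stand-ins, since recognizing catastrophic hypotheses is exactly the unsolved part:

```python
# Bayesian updating over a hypothesis class, with flagged hypotheses
# excluded from induction up front. Assumes at least one hypothesis
# survives the filter.

def filtered_posterior(hypotheses, prior, likelihood, data, is_safe):
    weights = {}
    for h in hypotheses:
        if not is_safe(h):
            continue  # exclude potentially catastrophic hypotheses
        w = prior[h]
        for x in data:
            w *= likelihood(h, x)
        weights[h] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Toy usage: coin-bias hypotheses, one (arbitrarily) flagged as unsafe,
# updated on observed flips.
hs = [0.1, 0.5, 0.9]
post = filtered_posterior(
    hs,
    prior={h: 1 / 3 for h in hs},
    likelihood=lambda h, x: h if x == 1 else 1 - h,
    data=[1, 1, 0],
    is_safe=lambda h: h != 0.9,
)
print(post)
```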

On the ML side, I think the other promising approaches involve either adversarial training or ensembling / unanimous votes, both of which could be applied to the agent foundations problem.
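
A minimal sketch of the unanimous-votes variant, with hypothetical model callables (any real implementation would differ):

```python
# Accept an output only when every model in the ensemble agrees;
# abstain (e.g., escalate to a human) on any disagreement.

def unanimous_vote(models, x):
    answers = [m(x) for m in models]
    if all(a == answers[0] for a in answers):
        return answers[0]  # unanimous: act on the answer
    return None            # disagreement: abstain / escalate

# Toy "models" that should agree on parity:
models = [lambda x: x % 2, lambda x: int(x % 2 == 1), lambda x: x & 1]
print(unanimous_vote(models, 7))                  # -> 1
print(unanimous_vote(models + [lambda x: 0], 7))  # -> None
```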

Comment author: Wei_Dai 11 July 2017 05:45:15PM 3 points

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

Yeah, this is still not clear. Suppose we had a solution to agent foundations; I don't see how that necessarily helps me figure out what to do as H in capability amplification. For example, the agent foundations solution could say: use (some approximation of) exhaustive search in the following way, with your utility function as the objective function. But that doesn't help me, because I don't have a utility function.
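
To make the shape of the problem concrete, a minimal sketch under my own assumptions (the names are illustrative): given a utility function, the search recipe is mechanical, and the missing ingredient is precisely the utility argument:

```python
# "Exhaustive search with your utility function as the objective":
# evaluate every candidate action and take the argmax.

def best_action(actions, utility):
    return max(actions, key=utility)

# Fine if you have a utility function to hand...
print(best_action(range(10), utility=lambda a: -(a - 6) ** 2))  # -> 6
# ...but H in capability amplification has no such utility to supply.
```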

When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy their computation without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer, or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants?"

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction.

Oh I see. In my mind, the problems with Solomonoff Induction mean that it's probably not the right way to define how induction should be done as an ideal, so we should look for something kind of like Solomonoff Induction but better, not try to patch it by doing additional things on top of it. (Similarly, instead of trying to figure out exactly when CDT would make wrong decisions and adding more complexity on top to handle those cases, replace it with UDT.)
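
For reference, one standard formulation of the Solomonoff prior under discussion:

```latex
% The Solomonoff prior: the prior probability of a bit string x is the
% total weight of all programs p that make a fixed universal prefix
% machine U output something beginning with x:
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
% where \ell(p) is the length of p in bits. Its uncomputability and its
% dependence on the choice of U are among the problems that motivate
% replacing it rather than patching it.
```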
