kbog comments on Principia Qualia: blueprint for a new cause area, consciousness research with an eye toward ethics and x-risk - Effective Altruism Forum


Comment author: Jessica_Taylor, 09 December 2016 11:10:33PM, 3 points

Some thoughts:

IMO the most plausible non-CEV proposals are

  1. Act-based agents, which defer to humans to a large extent. The goal is to keep humans in control of the future.
  2. Task AI, which is used to accomplish concrete objectives in the world. The idea would be to use this to accomplish goals people would want accomplished using AI (including reducing existential risk), while leaving the future moral trajectory in the hands of humans.

Both proposals end up deferring to humans to decide the long-run trajectory of humanity. IMO, this isn't a coincidence; I don't think it's likely that we get a good outcome without deferring to humans in the long run.

Some more specific comments:

If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI

There's one story where this makes a little bit of sense, where we basically give up on satisfying any human values other than hedonic values, and build an AI that maximizes pleasure without satisfying any other human values. I'm skeptical that this is any easier than solving the full value alignment problem, but even if it were, I think this would be undesirable to the vast majority of humans, and so we would collectively be better off coordinating around a higher target.

If we're shooting for a higher target, then we have some story for why we get more values than just hedonic values. E.g. the AI defers to human moral philosophers on some issues. But this method should also succeed for loading hedonic values. So there isn't a significant benefit to having hedonic values specified ahead of time.

Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness.

This seems to be in the same reference class as asking questions like "how many humans exist" or "what's the closing price of the Dow Jones". I.e. you can use it to check if things are going as expected, though the metric can be manipulated. Personally I'm pessimistic about such sanity checks in general, and even if I were optimistic about them, I would think that the marginal value of one additional sanity check is low.

There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering.

See Eliezer's thoughts on mindcrime. Also see the discussion in the comments. It does seem like consciousness research could help for defining a nonpersonhood predicate.

I don't have comments on cognitive enhancement since it's not my specialty.

Some of the points (6, 7, 8) seem most relevant if we expect AGI to be designed around internal reinforcement substantially similar to humans' internal reinforcement and substantially different from modern reinforcement learning. I don't have precise enough models of such AGI systems to feel optimistic about doing research related to them, but if you think questions like "how would we incentivize neuromorphic AI systems to do what we want?" are tractable, then maybe it makes sense for you to pursue that research. I'm pessimistic about anything in the reference class of IIT making progress on this question, but maybe you have different models here.

I agree that "Valence research could change the social and political landscape AGI research occurs in" and, like you, I think the sign is unclear.

(I am a MIRI research fellow but am currently speaking for myself, not my employer.)

Comment author: kbog (EA Profile), 10 December 2016 04:21:00AM, 0 points
Comment author: Jessica_Taylor, 10 December 2016 05:09:03AM, 1 point

I expect:

  1. We would lose a great deal of value by optimizing the universe according to current moral uncertainty, without the opportunity to reflect and become less uncertain over time.

  2. There's a great deal of reflection necessary to figure out what actions moral theory X recommends, e.g. to figure out which minds exist or what implicit promises people have made to each other. I don't see this reflection as distinct from reflection about moral uncertainty; if we're going to defer to a reflection process anyway for making decisions, we might as well let that reflection process decide on issues of moral theory.

Comment author: turchin, 08 July 2018 02:09:55PM, 0 points

What if an AI exploring moral uncertainty finds that there is provably no correct moral theory and no moral facts? In that case, there is no moral uncertainty between moral theories, as they are all false. Could it escape this obstacle just by aggregating humans' opinions about possible situations?

Comment author: kbog (EA Profile), 11 July 2018 12:09:16PM, 1 point

What if AI exploring moral uncertainty finds that there is provably no correct moral theory or right moral facts?

In that case it would be exploring traditional metaethics, not moral uncertainty.

But if moral uncertainty is used as a solution then we just bake in some high level criteria for the appropriateness of a moral theory, and the credences will necessarily sum to 1. This is little different from baking in coherent extrapolated volition. In either case the agent is directly motivated to do whatever it is that satisfies our designated criteria, and it will still want to do it regardless of what it thinks about moral realism.

Those criteria might be very vague and philosophical, or they might be very specific and physical (like 'would a simulation of Bertrand Russell say "a-ha, that's a good theory"?'), but either way they will be specified.
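The "maximize expected choiceworthiness with credences summing to 1" idea kbog describes can be sketched in a few lines. This is an illustrative toy model only: the theory names, credences, and choiceworthiness scores are invented for the example, not taken from the thread.

```python
# Toy model of decision-making under moral uncertainty:
# the agent holds credences over admissible moral theories (summing to 1)
# and picks the action that maximizes credence-weighted choiceworthiness.

def expected_choiceworthiness(action, credences, choiceworthiness):
    """Credence-weighted average of how strongly each theory endorses the action."""
    return sum(credences[t] * choiceworthiness[t][action] for t in credences)

# Illustrative credences over two hypothetical theories.
credences = {"hedonic_utilitarianism": 0.6, "preference_view": 0.4}

# Each theory scores each candidate action on a common choiceworthiness scale.
choiceworthiness = {
    "hedonic_utilitarianism": {"A": 1.0, "B": 0.2},
    "preference_view":        {"A": 0.1, "B": 0.9},
}

actions = ["A", "B"]
best = max(actions, key=lambda a: expected_choiceworthiness(a, credences, choiceworthiness))
# Action A scores 0.6*1.0 + 0.4*0.1 = 0.64; B scores 0.6*0.2 + 0.4*0.9 = 0.48,
# so the agent chooses A.
```

Note the feature kbog points to: the agent acts on whatever the designated criteria and credences recommend, regardless of whether any theory is "really true" — moral realism never enters the computation.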