Wei Dai
Some suggestions for you to consider:

  1. Target a different (non-EA) audience.
  2. Do not say anything or cite any data that could be interpreted or misinterpreted as racist (keeping in mind that some people will be highly motivated to interpret it that way).
  3. Tailor your message to what you can say/cite. For example, perhaps frame the cause as one of pure justice/fairness (as opposed to consequentialist altruism), e.g., it's simply unfair that some people cannot afford genetic enhancement while others can. (Added: But please think this through carefully to prevent undesirable side effects, e.g., making some people want to ban genetic enhancement altogether.)
  4. You may need to start a new identity in order to successfully do the above.

> Then I think for practical decision-making purposes we should apply a heavy discount to world A) — in that world, what everyone else would eventually want isn’t all that close to what I would eventually want. Moreover, what me-of-tomorrow would eventually want probably isn’t all that close to what me-of-today would eventually want. So it’s much, much less likely that the world we end up with, even if we save it, is close to the ideal one by my lights. Moreover, even though these worlds possibly differ significantly, I don’t feel like from my present position I have that much reason to be opinionated between them; it’s unclear that I’d greatly prefer imperfect worlds according to the extrapolated volition of some future-me, relative to the imperfect worlds according to the extrapolated volition of someone else I think is pretty reasonable.

  1. You seem to be assuming that people's extrapolated views in world A will be completely uncorrelated with their current views/culture/background, which seems a strange assumption to make.
  2. People's extrapolated views could be (in part) selfish or partial, which is an additional reason that the extrapolated views of you at different times may be closer to each other than to those of strangers.
  3. People's extrapolated views not converging doesn't directly imply "it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights" because everyone could still get close to what they want through trade/compromise, or you (and/or others with extrapolated views similar to yours) could end up controlling most of the future by winning the relevant competitions.
  4. It's not clear that applying a heavy discount to world A makes sense, regardless of the above, because we're dealing with "logical risk" which seems tricky in terms of decision theory.

Thanks, lots of interesting articles in this list that I missed despite my interest in this area.

One suggestion I have is to add some studies of failed attempts at building/reforming institutions, otherwise one might get a skewed view of the topic. (Unfortunately I don't have specific readings to suggest.)

A related topic you don't mention here (maybe due to a lack of writings on it?) is whether humanity should pause AI development and have a long (or even short!) reflection about what it wants to do next, e.g., resume AI development or do something else, like subsidizing intelligence enhancement (e.g., embryo selection) for everyone who wants it, so that more people can meaningfully participate in deciding the fate of our world. (I note that many topics on this reading list are impossible for most humans to fully understand, perhaps even with AI assistance.)

> I claim that this area outscores regular AI safety on importance while being significantly more neglected

This neglect is itself perhaps one of the most important puzzles of our time. With AGI very plausibly just a few years away, why aren't more people throwing money or time/effort at this cluster of problems just out of self-interest? Why isn't there more intellectual/academic interest in these topics, many of which seem so intrinsically interesting to me?

> We have to make judgment calls about how to structure our reflection strategy. Making those judgment calls already gets us in the business of forming convictions. So, if we are qualified to do that (in “pre-reflection mode,” setting up our reflection procedure), why can’t we also form other convictions similarly early?

  1. I'm very confused/uncertain about many philosophical topics that seem highly relevant to morality/axiology, such as the nature of consciousness and whether there is such a thing as "measure" or "reality fluid" (and if so, what it is based on). How can it be right or safe to form moral convictions under such confusion/uncertainty?
  2. It seems quite plausible that in the future I'll have access to intelligence-enhancing technologies that will enable me to think of many new moral/philosophical arguments and counterarguments, and/or to better understand existing ones. I'm reluctant to form any convictions until that happens (or the hope of it ever happening becomes very low).

Also, I'm not sure how I would form object-level moral convictions even if I wanted to. No matter what I decide today, why wouldn't I change my mind if I later hear a persuasive argument against it? The only thing I can think of is to hard-code something to prevent my mind from being changed about a specific idea, or to prevent me from hearing or thinking of arguments against a specific idea, but that seems like a dangerous hack that could mess up my entire belief system.

> Therefore, it seems reasonable/defensible to think of oneself as better positioned to form convictions about object-level morality (in places where we deem it safe enough).

Do you have any candidates for where you deem it safe enough to form object-level moral convictions?

I put the full report here so you don't have to wait for them to email it to you.

Anyone with thoughts on what went wrong with EA's involvement in OpenAI? It's probably too late to apply any lessons to OpenAI itself, but maybe not too late elsewhere (e.g., Anthropic)?

While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk", because the person I asked such a question of is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering my question. So I don't want to single them out for criticism, and I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.

Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.

My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?

  1. Being wrong or overconfident, distracting people or harming the world with bad ideas.
    1. Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writings.
  2. The idea isn't bad, but some people take it too seriously or too far.
    1. Convey my uncertainties. Monitor subsequent discussions and try to argue against people taking my ideas too seriously or too far.
  3. Causing differential intellectual progress in an undesirable direction, e.g., speeding up AI capabilities relative to AI safety, spreading ideas that are more useful for doing harm than doing good.
    1. Check ideas/topics for this risk. Self-censor ideas or switch research topics if the risk seems high.
  4. Being first to talk about some idea, but not developing/pursuing it as vigorously as someone else might if they were first, thereby causing a net delay in intellectual or social progress.
    1. Not sure what to do about this one. So far I'm not doing anything except thinking about it.
  5. PR/political risks, e.g., talking about something that damages my reputation or relationships, and in the worst case harms people/causes/ideas associated with me.
    1. Keep this in mind and talk more diplomatically or self-censor when appropriate.

@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower in the difficulty scale might fool people into having a false sense of security. Concretely, today's AIs are so philosophically incompetent that nobody (or almost nobody) trusts them to do philosophy, but if they seemingly got better without really getting better (or not enough relative to appearances), a lot more people might trust them, and it could be hard to convince them not to.

Thanks for the comment. I agree that what you describe is a hard part of the overall problem. I have a partial plan, which is to solve metaphilosophy (probably using analytic methods) for both analytic and non-analytic philosophy, and then use that knowledge to determine what to do next. I mean, today the debate between the two philosophical traditions is pretty hopeless, since nobody even understands what people are really doing when they do analytic or non-analytic philosophy. Maybe the situation will improve automatically once metaphilosophy has been solved, or at least we'll have a better knowledge base for deciding what to do next.

If we can't solve metaphilosophy in time though (before AI takeoff), I'm not sure what the solution is. I guess AI developers use their taste in philosophy to determine how to filter the dataset, and everyone else hopes for the best?
