
Chris Leong

Organiser @ AI Safety Australia and NZ
6046 karma · Joined Nov 2015 · Sydney NSW, Australia

Bio

Currently doing local AI safety Movement Building in Australia and NZ.

Participation: 7

Comments: 1003

I didn't know that CHAI or 80,000 Hours had recommended material.

The 80,000 Hours syllabus essentially amounts to "go read a bunch of textbooks". This is probably not ideal for a "getting started" guide.

I was there for an AI Safety workshop, though I can't remember the content. Do you know what you included?

I've found that purely open discussion sometimes leads to less valuable conversation, so in both cases I'd focus on a few specific discussion prompts or on trying to help people come to a conclusion on some question.


That's useful feedback. Maybe it'd be best to take some time at the end of the first session of the week to figure out what questions to discuss in the second session? This would also allow people to look things up before the discussion and take some time for reflection.

I'd be keen to hear specifically what the prerequisite knowledge is, just so people can tell whether they 'know enough' to take your course. Maybe it's weeks 1-3 of the alignment course?

Thoughts on prerequisites off the top of my head:
Week 0: Even though it is a theory course, it would likely be useful for participants to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people's backgrounds.
Week 1 & 2: I'd assume that participants have at least a basic understanding of inner vs. outer alignment, deceptive alignment, instrumental convergence, the orthogonality thesis, why we're concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI and wireheading, though I could quite easily create a document defining all of these terms. The purpose of this course also wouldn't be to reiterate the basic AI safety argument, although it might cover debates such as the validity of counting arguments for mesa-optimisers or whether RLHF means that we should expect outer alignment to be solved by default.

I.e. what if you ask 3-5 experts what they think the most important part of agent foundations is, and maybe try to conduct 30-minute interviews with them to solicit the story they would tell in a curriculum? You can also ask them for their top recommended resources, and why they recommend them. That would be a strong start, I think.

That's a great suggestion. I would still be tempted to create a draft curriculum though, even if just at the level of "week 1 focuses on question x and includes readings on topics a, b and c". I could also draw heavily on the previous agent foundations week and other past versions of AISF, Alignment 201, Key Phenomena in AI Safety, the MATS AI Safety Strategy Curriculum, MIRI's Research Guide, John Wentworth's alignment training program and the highlighted AI Safety sequences on Less Wrong (in addition to possibly including some material from the AI Safety Bootcamp or Advanced Fellowship that I ran).

I'd want to first ask them what they would like to see included without them being anchored on my draft, then show them my draft and ask for more specific feedback. Experts' time is valuable, so I'd want to get the most out of it, and it's easier to critique a specific artifact than to generate suggestions from scratch.

I'm quite tempted to create a course for conceptual AI alignment, especially since agent foundations has been removed from the latest version of the BlueDot Impact course[1].

If I did this, I would probably run it as follows:

a) Each week would have two sessions. One to discuss the readings and another for people to bounce their takes off others in the cohort. I expect that people trying to learn conceptual alignment would benefit from having extra time to discuss their ideas with informed participants.
b) The course would be less introductory, though it wouldn't formally require having taken AGISF. AGISF already serves as a general introduction for those who need it, and making progress on conceptual alignment is less of a numbers game, so it would likely make sense to focus on people further along the pipeline rather than trying to expand the top of the funnel. In terms of the rough target audience, I imagine people who have been browsing Less Wrong or hanging around the AI safety community for years, or maybe someone who found out about it more recently and has been seriously reading up on it for the last couple of months. For this reason, I would want to assume that people already know why we're worried about AI Safety and basic ideas like inner/outer alignment and instrumental convergence.[2]
c) I'd probably follow AGISF in picking one question to focus on each week. I also like how it contextualises each reading.

Figuring out what to include seems like it'd be a massive challenge, but I agree that one of the best ways to do this would be to just create a curriculum, send it around to people and then additionally collect feedback from people who have gone through the course.

Anyway, I'd love to hear if anyone has any thoughts on what such a course should look like.

(The closest current course is the Key Phenomena in AI Safety course that PIBBSS ran, but that course would assume that people are more technical - in the broader sense where technical includes maths, physics, comp sci, etc. - and would be less introductory.)

  1. ^

    This is quite a reasonable decision. Shorter timelines make agent foundations work less pressing. Additionally, I imagine that most people who complete AGISF would not gain that much value from covering a week on agent foundations, at least not this early in their alignment journeys. Having a week that leaves a substantial part of the cohort wondering "why was I taught this?" is not a very good experience for them.

  2. ^

    Though it wouldn't be too hard to create a document containing assumed knowledge.

I think the biggest criticism this cause will face from an EA perspective is that it's going to be pretty hard to argue that moving more talent to first-world countries to do random things is better than either convincing more medical, educational or business talent to move to developing countries to help them develop, or focusing on bringing more talent to top cause areas. I'm not saying that such a case couldn't be made, just that I think it'd be tricky.

The upshot is: I recommend only choosing this career entry route if you are someone for whom working exclusively at EA organisations is incredibly high on your priority list.


I think taking a role like this early on could also be high-value if you're trying to determine whether working in a particular cause area is for you, since it's often useful to figure that out early. Of course, the fact that it isn't exactly the same job you might be doing later on might make it somewhat less valuable for this purpose.

This is a very interesting idea. I'd love to see if someone could make it work.

I'm perfectly fine with holding an opinion that goes against the consensus. Maybe I could have worded it a bit better though? Happy to listen to any feedback on this.

I suppose at this stage it's probably best to just agree to disagree.

Sorry, I misread the definition of ex ante.

I agree that the post poses a challenge to the standard EA view.

I don't see "There are no massive differences in impact between individuals" as an accurate characterisation of the claim that the argument actually establishes.

"There are no massive ex ante differences in impact between individuals" would be a reasonable title. Or perhaps "no massive identifiable differences"?
