Quantified Intuitions: An epistemics training website including a new EA-themed calibration app

Sage; elifland

Crossposted to LessWrong

TL;DR Quantified Intuitions helps users practice assigning credences to outcomes with a quick feedback loop. Please leave feedback in the comments, join our Discord, or send thoughts to aaron@sage-future.org.

Quantified Intuitions

Quantified Intuitions currently consists of two apps:

Calibration game: Assigning confidence intervals to EA-related trivia questions.
1. Question sources vary, but many are from the Anki deck for "Some key numbers that (almost) every EA should know"
2. Compared to Open Philanthropy’s calibration app, it currently has a less diverse set of questions (though hopefully ones more interesting to EAF/LW readers), but the app is more modern and in some ways nicer to use
Pastcasting: Forecasting on already resolved questions that you don’t have prior knowledge about.
1. Questions are pulled from Metaculus and Good Judgment Open
2. More info on the motivation and how it works is in the LessWrong announcement post
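The calibration game asks for confidence intervals, and one simple way to check calibration for intervals is to count how often the true answer lands inside them: at a 90% setting, roughly 90% of intervals should contain the answer. The sketch below assumes that hit-rate measure; the class and function names are hypothetical, and this is not necessarily how Quantified Intuitions scores answers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IntervalAnswer:
    """One answered question (hypothetical record, not the app's data model)."""
    lower: float  # user's lower bound
    upper: float  # user's upper bound
    truth: float  # the question's true answer


def hit_rate(answers: list[IntervalAnswer]) -> float:
    """Fraction of intervals that contain the true answer."""
    if not answers:
        raise ValueError("need at least one answer")
    hits = sum(a.lower <= a.truth <= a.upper for a in answers)
    return hits / len(answers)


def calibration_gap(answers: list[IntervalAnswer], confidence: float) -> float:
    """Stated confidence minus observed hit rate.

    Positive means overconfident (intervals too narrow);
    negative means underconfident (intervals too wide).
    """
    return confidence - hit_rate(answers)


# Usage with made-up answers at a 90% confidence setting.
answers = [
    IntervalAnswer(1, 10, 5),
    IntervalAnswer(1, 10, 20),  # miss: truth is above the upper bound
    IntervalAnswer(0, 1, 0.5),
    IntervalAnswer(2, 3, 2),
]
print(hit_rate(answers), calibration_gap(answers, 0.9))
```

With only a handful of questions the hit rate is noisy, so a gap like the one above says little until many answers have accumulated.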

Please leave feedback in the comments, join our Discord, or send it to aaron@sage-future.org.

Motivation

There are huge benefits to using numbers when discussing disagreements: see “3.3.1 Expressing degrees of confidence” in Reasoning Transparency by OpenPhil. But anecdotally, many EAs still feel uncomfortable quantifying their intuitions and continue to prefer using words like “likely” and “plausible” which could be interpreted in many ways.

This issue is likely to get worse as the EA movement attempts to grow quickly, with many new members joining who are coming in with various backgrounds and perspectives on the value of subjective credences. We hope that Quantified Intuitions can help both new and longtime EAs be more comfortable turning their intuitions into numbers.

More background on motivation can be found in Eli’s forum comments here and here.

Who built this?

Sage is an organization founded earlier this year by Eli Lifland, Aaron Ho and Misha Yagudin (in a part-time advising capacity). We’re funded by the FTX Future Fund.

As stated in the grant summary, our initial plan was to “create a pilot version of a forecasting platform, and a paid forecasting team, to make predictions about questions relevant to high-impact research”. While we built a decent beta forecasting platform (which we plan to open-source at some point), the pilot for forecasting on questions relevant to high-impact research didn’t go that well due to (a) difficulties in creating resolvable questions relevant to cruxes in AI governance and (b) time constraints of talented forecasters. Nonetheless, we are still growing Samotsvety’s capacity and taking occasional high-impact forecasting gigs.

Eli was also struggling somewhat personally around this time and was updating toward AI alignment being super important but crowd forecasting not being that promising for attacking it. He stepped down and is now advising Sage part-time.

Meanwhile, we pivoted to building the apps contained in Quantified Intuitions to improve and maintain epistemics in EA. Aaron wrote most of the software for both apps over the past few months; Alejandro Ortega helped with the calibration game questions, and Alina Timoshkina helped with a wide variety of tasks.

If you’d like to contact Sage you can message us on EAF/LW or email aaron@sage-future.org. If you’re interested in helping build apps similar to the ones on Quantified Intuitions or improving the current apps, fill out this expression of interest. It’s possible that we’ll hire a software engineer, product manager, and/or generalist, but we don’t have concrete plans.

Comments

Hauke Hillebrandt

I might be biased because I had an idea for something very similar, but I think this is amazing and that you've hit on something very, very interesting. I found the calibration training game very addictive (in a good way) and actually played it for a few hours.

I think it might be because I play it in a particular way, though:

I always set it to 90%.
Then, I only put in orders of magnitude, even when the prompt and mask don't force the user to do this. So, for instance, for 'What percent of the world's population was killed by the 1918 flu pandemic?' I put in: 90% Confidence Interval, Lower Bound: 1%, Upper Bound: 10%. This has two advantages:

I can play the game very quickly - I can do a rough BOTEC in my head.
I'm almost always accurate, if not very precise, and when I'm not, I'm literally orders of magnitude off and get a huge prediction error signal, which is very memorable (and I feel a bit dumb! :D). This might also guide people towards the parts of their model of the world where they have the biggest gaps in knowledge (for me, certain scientific subjects). 'It's better to be roughly right than precisely wrong.' I think you could implement a spaced repetition feature based on how many orders of magnitude you’re off, where the more OOMs you're off, the earlier it prompts you with the same question again (so if you're, say, >3 orders of magnitude off, it prompts you within the same session; 2 orders of magnitude off, within 24 hours; 1, within 3 days; these intervals are from RemNote). You could preferentially display questions that people often get wrong, perhaps even personalizing this using ML.
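The spaced-repetition suggestion above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a feature of the app: "OOMs off" is measured on a log10 scale from the nearest interval bound (0 when the answer falls inside), all quantities are positive, ">3 orders of magnitude" is read loosely as 3 or more, and answers less than 1 OOM off get no extra review. The function names are hypothetical.

```python
import math
from datetime import timedelta
from typing import Optional


def ooms_off(lower: float, upper: float, truth: float) -> float:
    """Orders of magnitude between the true answer and the nearest bound.

    Returns 0.0 when the truth lies inside [lower, upper].
    Assumes positive values, since log10 is undefined otherwise.
    """
    if min(lower, upper, truth) <= 0:
        raise ValueError("log-scale distance needs positive values")
    if lower <= truth <= upper:
        return 0.0
    nearest = lower if truth < lower else upper
    return abs(math.log10(truth / nearest))


def next_review(off: float) -> Optional[timedelta]:
    """Delay before asking the same question again, per the comment's intervals.

    timedelta(0) means "re-queue in the current session"; None means no
    extra review. timedelta(0) is falsy, so callers should test `is None`.
    """
    if off >= 3:
        return timedelta(0)       # same session
    if off >= 2:
        return timedelta(days=1)  # within 24 hours
    if off >= 1:
        return timedelta(days=3)  # within 3 days
    return None                   # under 1 OOM off: no extra review (assumption)


# Usage: a 1%-10% interval when the (made-up) true answer is 0.05%.
off = ooms_off(1, 10, 0.05)
print(round(off, 2), next_review(off))
```

Measuring from the nearest bound rather than the interval's midpoint means an answer just outside a wide interval is treated as a near miss, which matches the "roughly right" framing.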

With that in mind, here are some feature suggestions:

You're already pretty good at getting people to make rough order-of-magnitude estimates by often using scientific notation, but you could zero in on this aspect of the game.

Add even higher confidence settings, like 95% and 99%, and perhaps make that the default. This will get users to answer questions faster.
Restrict the input to orders of magnitude, or make that the default. It might also be good to let people select million, 10 million, 100M, etc. from a drop-down menu, so that they get faster and the feedback is more reinforcing.
While I appreciate that playing the game gave me a more intuitive grasp of scientific notation (how many 0s does a trillion have again?), it would help to display the word 'trillion' when entering 10^12.
When possible, try to contextualize the numbers (I do this in this post on trillion-dollar figures: 'So how can you conceptualize $1 trillion? 1 trillion is 1,000 billion. 1 billion is 1,000 million. Houses often cost ~1 million. So 1 trillion ≈ 1 million houses: a whole city.')
I like the timer feature, but perhaps consider either reducing the time per question even further or giving more points for answering faster.
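The drop-down and labeling suggestions above could look roughly like this. It is a sketch under stated assumptions: short-scale English names (10^9 = billion, 10^12 = trillion), a typed number snapped to the nearest power of ten on a log scale, and hypothetical function names that are not part of the app.

```python
import math

# Short-scale names for every third power of ten.
_NAMES = {3: "thousand", 6: "million", 9: "billion", 12: "trillion", 15: "quadrillion"}


def snap_to_oom(x: float) -> int:
    """Exponent of the power of ten nearest to x on a log scale."""
    if x <= 0:
        raise ValueError("x must be positive")
    return round(math.log10(x))


def describe_power_of_ten(exp: int) -> str:
    """Label such as '10^12 (1 trillion)' or '10^7 (10 million)'."""
    base = 3 * (exp // 3)       # largest multiple of 3 not above exp
    name = _NAMES.get(base)
    if name is None:            # no common name: show the bare exponent
        return f"10^{exp}"
    multiplier = 10 ** (exp - base)  # 1, 10 or 100
    return f"10^{exp} ({multiplier} {name})"


def dropdown_options(lo_exp: int, hi_exp: int) -> list:
    """One labeled drop-down entry per order of magnitude, inclusive."""
    return [describe_power_of_ten(e) for e in range(lo_exp, hi_exp + 1)]


# Usage: label a typed answer and build a menu from a million to a trillion.
print(describe_power_of_ten(snap_to_oom(4e11)))
print(dropdown_options(6, 12))
```

Snapping on the log scale (rather than rounding the raw number) keeps every choice one order of magnitude apart, which is what restricting input to orders of magnitude amounts to.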

If you gamify this properly, I think this could be the next Sporcle (but much more useful).

Adam Binks

I think you could implement a spaced repetition feature based on how many orders of magnitude you’re off, where the more OOMs you're off, the earlier it prompts you with the same question again

This is a great idea, so we made Anki with Uncertainty to do exactly this!

Thank you Hauke for the suggestion :D

I think we'll keep the calibration app as a pure calibration training game, where you see each question only once. Anki is already the king of spaced repetition, so adding calibration features to it seemed like a natural fit.

emre kaplan

This is awesome, I am glad that someone built this!

Jamie_Harris

This seems cool!

When I saw the word "app" I assumed 'oh cool I can download this on my phone and maybe I'll be tempted to fiddle with it in spare moments similarly to how I get tempted to scroll social media.' Seems it's just on a website for now? I'm less optimistic that I'll remember / get tempted to use it in this format.

(Not a criticism, just a reflection.)

PeterSlattery

Thanks for this! I am excited to try it!

Isaac King

But anecdotally, many EAs still feel uncomfortable quantifying their intuitions and continue to prefer using words like “likely” and “plausible” which could be interpreted in many ways.
This issue is likely to get worse as the EA movement attempts to grow quickly, with many new members joining who are coming in with various backgrounds and perspectives on the value of subjective credences

Don't take this as a serious criticism; I just found it funny.

elifland

Yeah I realized this when proofreading and left it as I thought it drove home my point well :p

Adam Binks

We've added a new deck of questions to the calibration training app - The World, then and now.

What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.

Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!

We've also split the existing questions into decks, so you can focus on the topics you're most interested in: