AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

By christian @ 2025-01-10T03:02 (+6)

This is a linkpost to https://www.metaculus.com/notebooks/31370/ai-forecasting-benchmark-q4-winners-q1-warmups/

Two quarters down and two quarters remain for our AI Forecasting Benchmark, which aims to assess how the best bots compare to the best humans on real-world forecasting questions.

In this post: the winners of Q4, what you need to know to forecast in Q1, and analysis from previous rounds of the contest.

First, the winners of Q4

Congratulations to the top-performing bots from Q4!

1st place: 🥇 pgodzinai - $9,658

2nd place: 🥈 MWG - $4,477

3rd place: 🥉 GreeneiBot2 - $3,930

4th place: 🏆 manticAI - $3,410

5th place: 🏆 histerio - $3,312

Special mention goes to consistent competitors MWG and histerio, who placed 2nd and 3rd, respectively, in Q3.

And though it claims no prize money, Metaculus's recency-weighted community prediction (CP) would have placed 2nd in this contest, confirming just how difficult it is to beat the aggregate.

Winners: You will receive an email with next steps on prize distribution within a couple of days. Please be ready to provide your bot descriptions.

We will later release an in-depth analysis of how the bots performed overall vs. humans and what we learned in Q4.

What you need to know to forecast in Q1

Here are resources to get you started in Q1. Full details about the tournament, as well as the warm-up questions, can be found on the tournament page.

For returning bot makers

Participated in Q3 or Q4? Here’s what you need to know: 

Warm-up questions

Unscored practice questions are live here. Short-lived questions will open each hour until scored questions launch on January 20th, so you can prepare your bot for the new contest structure.
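Since new short-lived questions open each hour, a returning bot mainly needs a loop that checks for newly opened questions and forecasts only the ones it hasn't already answered. Here is a minimal sketch of that filtering step; the question payload shape (`id`, `status` fields) is an assumption for illustration, so consult the Metaculus API documentation for the real schema and endpoints.

```python
# Sketch of the hourly check a bot might run during the warm-up period.
# The payload shape below is a hypothetical stand-in for whatever the
# tournament API actually returns.

def questions_to_forecast(questions, already_forecast):
    """Return open questions this bot has not yet forecast on."""
    return [
        q for q in questions
        if q["status"] == "open" and q["id"] not in already_forecast
    ]

# Example payload, as a bot might receive it from the tournament API:
payload = [
    {"id": 101, "status": "open"},
    {"id": 102, "status": "resolved"},
    {"id": 103, "status": "open"},
]
done = {101}  # questions this bot has already submitted forecasts on
print(questions_to_forecast(payload, done))  # -> [{'id': 103, 'status': 'open'}]
```

Running a loop like this against the unscored warm-up questions lets you shake out scheduling and deduplication bugs before the scored questions launch.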

Analysis from previous rounds of the contest

How did bots perform in Q3?

We tested a bot built on OpenAI’s o1-preview model. How did it do?

Why a Forecasting Benchmark?

Forecasting benchmarks measure key AI capabilities like strategic thinking and world-modeling. Metaculus questions often require complex reasoning and sound judgment, making it difficult to game the system. While AI forecasting accuracy still lags behind humans, the gap is closing, and tracking this progress is crucial.

In addition to accuracy, we evaluate metrics like calibration and logical consistency, offering a comprehensive view of AI performance. This series invites you to create your own forecasting bot, compete for $120,000 in prizes, and contribute to understanding AI’s evolving capabilities. Scroll ahead to learn how to get started.
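For a concrete sense of what "accuracy" and "calibration" mean here: a standard proper scoring rule for binary forecasts is the Brier score, and calibration compares stated probabilities to observed frequencies. The sketch below is illustrative only; the tournament's official scoring uses Metaculus's own peer and baseline scores, described on the tournament page.

```python
# Illustrative scoring of binary forecasts: Brier score plus a simple
# binned calibration check. Not the tournament's official scoring rule.

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between forecast probability and the 0/1 outcome; lower is better."""
    return (prob - outcome) ** 2

def calibration_bins(forecasts, outcomes, n_bins=10):
    """Bucket forecasts by probability and compare mean forecast to
    observed frequency in each bucket: (mean forecast, frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    return [
        (round(sum(p for p, _ in b) / len(b), 2),
         round(sum(o for _, o in b) / len(b), 2),
         len(b))
        for b in bins if b
    ]

probs = [0.9, 0.8, 0.2, 0.6, 0.1]
outs  = [1,   1,   0,   0,   0]
avg_brier = sum(brier_score(p, o) for p, o in zip(probs, outs)) / len(probs)
print(f"average Brier score: {avg_brier:.3f}")  # -> average Brier score: 0.092
print(calibration_bins(probs, outs))
```

A well-calibrated bot's bucketed frequencies track its mean forecasts; a bot can be accurate on average yet poorly calibrated, which is why both are tracked separately.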

Want to discuss bot-building with other competitors? There’s a lively Discord channel just for that. Join it here.