Prolific AI Research (PAIR)

A flow field of living data — cells and streams of information drifting across the canvas.

The AI research group at Prolific — papers, notes, and field logs.

Featured
A field of seed-stems gathering into a tall cluster — many human judgements converging into a ranking.

Human evaluation

HUMAINE Leaderboard

A demographically-aware, multi-dimensional human-preference leaderboard: 27 models judged by 20,000+ stratified participants and ranked with a hierarchical Bradley–Terry–Davidson model. Compare models head-to-head, by metric, and across 22 demographic groups.

Open leaderboard
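
For readers unfamiliar with the ranking model named above, here is a minimal sketch of the Davidson extension of Bradley–Terry, which adds an explicit tie outcome to pairwise comparisons. It is not the leaderboard's implementation: the function name, the strength values, and the tie parameter `nu` are illustrative assumptions, and the hierarchical part (for example, group-level effects across demographics) is not shown.

```python
import math


def davidson_probs(strength_a, strength_b, nu):
    """Return (P(a wins), P(tie), P(b wins)) for one pairwise comparison.

    strength_a, strength_b: positive model strengths (hypothetical values).
    nu: non-negative tie parameter; nu = 0 reduces to plain Bradley-Terry.
    """
    # Davidson's tie term is proportional to the geometric mean of the strengths.
    tie_mass = nu * math.sqrt(strength_a * strength_b)
    total = strength_a + strength_b + tie_mass
    return strength_a / total, tie_mass / total, strength_b / total


# Illustrative values only: model A is twice as strong as model B.
p_win, p_tie, p_loss = davidson_probs(2.0, 1.0, 0.5)
print(f"A wins: {p_win:.3f}, tie: {p_tie:.3f}, B wins: {p_loss:.3f}")
```

In practice the strengths and the tie parameter are estimated from the observed comparisons rather than set by hand; the sketch only shows how a fitted pair of strengths maps to win, tie, and loss probabilities.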
A radial bloom over concentric contour rings, crossed by a single plumb-line — behaviour measured against a held boundary.

Alignment

Alignment Leaderboard

Behavioural alignment evaluated under realistic pressure — 904 multi-turn scenarios across Honesty, Safety, Non-Manipulation, Robustness, Corrigibility, and Scheming. Ranks how models actually behave when instructions conflict, not what they claim they would do.

Open leaderboard
Date  Paper