Prolific AI Research (PAIR)

Publications

Papers

Conference papers and preprints from the lab. Links point to the PDF or code; titles open the in-site write-up.

Date | Paper | Area

May 2026

Faithful to the Persona, Unfaithful to the Decision: A Mechanism for Chain-of-Thought Unfaithfulness

Preprint coming

Interpretability

May 2026

Deval: A Pipeline for Deployment-Valid LLM Benchmark Evaluation

Preprint coming

Benchmark validity

May 2026

MetaLoop: Benchmarking the Full Metacognitive Loop in LLMs

Preprint coming

Metacognition

Apr 2026

Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

ICML 2026 (spotlight)

OpenReview PDF ↗ | Leaderboard ↗

Alignment

Mar 2026

The Missing Red Line: How Commercial Pressure Erodes AI Safety Boundaries

ICLR 2026 Workshop ICBINB

OpenReview PDF ↗

AI safety

Jan 2026

Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

ICLR 2026 (poster)

OpenReview PDF ↗ | Leaderboard ↗ | Dataset ↗

Human evaluation