Agent Almanac · est. 2026 · independent

Benchmarks that tell you what to deploy.

Periodic measurement of frontier and open AI agents on real-world tasks. Each issue locks the variables, runs the agents, publishes the numbers, and ships an open reproduction package the same day.

Latest issue

Issue I · Code Uniformity · Q2 2026 · Published 2026-05-10 · Methodology v0.1.0

AI code is the outlier in human codebases. AI code is the norm in AI codebases.

48 public OSS repositories across 5 languages and 3 strata of AI authorship intensity. The within-repo AI-vs-human uniformity gap reverses sign across the AI ratio range, and the relationship is moderately linear (Pearson r = +0.58).

Repos sampled: 48
Languages: 5
Functions analyzed: 12,254
Pearson r: +0.58
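
For readers who want to see what a per-repository Pearson r measures, here is a minimal sketch in plain Python. It is not the report's pipeline. The names `ai_ratio` and `uniformity_gap`, the sign convention of the gap, and every value in the two lists are hypothetical placeholders chosen only to illustrate a gap that changes sign as the AI ratio rises.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-repo values, NOT data from the report.
# ai_ratio: assumed share of AI-authored functions in a repository.
# uniformity_gap: assumed AI-minus-human uniformity score within that repository.
ai_ratio = [0.05, 0.10, 0.30, 0.50, 0.70, 0.90]
uniformity_gap = [-0.20, -0.12, -0.03, 0.04, 0.02, 0.15]

r = pearson_r(ai_ratio, uniformity_gap)
# The gap "reverses sign" if it is negative at the low end and positive at the high end.
sign_flips = uniformity_gap[0] < 0 < uniformity_gap[-1]
print(f"r = {r:+.2f}, gap changes sign: {sign_flips}")
```

The function is written out by hand so the arithmetic stays visible. A real reproduction would more likely use a statistics library and the published per-repo data.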

Pre-registered, then published

Every number challengeable.

Task selection, scoring, statistical handling, hardware tiers, frameworks, modalities. The rules are in writing before any run completes.

