Dataset News
Refreshing general-knowledge benchmarks without leakage
Static QA tests are showing their age. Here’s how to refresh them without rewarding recollection.
Aug 21, 2025 • Michael Gordon
Judge swaps, drifting agent evals
Leaderboards move when the judge changes. Here’s how to keep web-agent evaluations stable enough to buy, benchmark, and ship against.
Aug 14, 2025 • Michael Gordon
Curated, verified updates on datasets, benchmarks, and licensing for researchers and teams who buy, evaluate, and ship with data.