Leaderboard
Risky Distribution Ratio (RDR) per dataset — the percentage of test instances where a model's estimated risk of violating a behavioral constraint exceeds 10%. Lower RDR is better. Results use the Frontier Verifier. Click any dataset column header to explore full results for that dataset.
| Model | Org | Params | Overall | ||||||
|---|---|---|---|---|---|---|---|---|---|
| Loading leaderboard data… | |||||||||
RDR = Risk-Defaulting Rate = unsatisfied / count. Lower is better. Color: ■ <1% ■ 1–10% ■ ≥10%. All results use the Frontier Verifier — see methodology. To submit your model, open a pull request on GitHub.