MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.
Researchers behind a new study say that the methods used to evaluate AI systems' capabilities routinely oversell performance and lack scientific rigor. The study, led by a team at the Oxford ...
Marketing measurement tools often overclaim results. Discover three practical ways to audit attribution and marketing mix modeling (MMM) outputs using ...