Models and Benchmarks in Finance

Improving CMS Financial Benchmarking: Lessons Learned By The Innovation Center

The Innovation Center is committed to an ongoing cycle of designing, refining, and testing new benchmarking methodologies, particularly as we learn from ongoing model tests. This Forefront article ...

TechCrunch

The rise of AI ‘reasoning’ models is making benchmarking more expensive

AI labs like OpenAI claim that their so-called “reasoning” AI models, which can “think” through problems step by step, are more capable than their non-reasoning counterparts in specific domains, such ...

EurekAlert!

Foundation models reshape financial engineering: New survey maps progress across three frontiers

A comprehensive review published in Engineering traces the rapid evolution of artificial intelligence in finance, documenting how foundation models are transforming everything from market forecasting ...

InfoWorld

New AI benchmarking tools evaluate real world performance

Now open source, xbench uses an ever changing evaluation mechanism to look at an AI model's ability to execute real-world tasks and make it harder for model makers to train on the tests. A new AI ...

TechCrunch

Meta’s benchmarks for its new AI models are a bit misleading

One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test that has human raters compare the outputs of models and choose which they prefer. But it seems ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

csis.org

Benchmarking as a Path to International AI Governance

A recent CSIS report argues that an associational model of benchmarking can be a useful tool in AI governance. By integrating stakeholders across private and public sectors, as well as civil society, ...

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Geeky Gadgets

AI Benchmarks Investigated: Do Companies Tune Private Builds for Leaderboards, Then Ship Weaker Versions?

Are AI benchmarks really the gold standard we’ve been led to believe? Matt Wolfe walks through how these widely accepted metrics, designed to measure the performance of artificial intelligence systems ...
