Benchmark Test Program

Ferrari: First F1 2026 test will focus on mileage and not "pure performance"

Ferrari boss Fred Vasseur reckons “pure performance” is quite irrelevant at the first pre-season test for the 2026 Formula 1 ...

Turnaround or takeover: Austin ISD’s high-stakes year at three middle schools

One semester in, three North Austin middle schools give insights into lessons for other campuses planning turnarounds.

ExtremeTech

Cinebench 2026 Introduces New Redshift Benchmarks

On Monday, Maxon announced Cinebench 2026, the latest version of its benchmarking software for testing CPU and GPU ...

Cinebench 2026 Arrives With Support for Blackwell GPUs, Apple M5 & Snapdragon X

The latest version of the benchmark makes several UX improvements, and also toughens up the testing for new high-end hardware ...

‘The Copenhagen Test’ Review: Simu Liu and Melissa Barrera’s Peacock Espionage Drama Takes Too Long to Uncover the Fun Stuff

An intelligence agency analyst discovers his brain has been hacked and has to figure out whom he can trust in this sci-fi ...

InfoQ

Benchmarking beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs

Uber’s Ceilometer framework automates infrastructure performance benchmarking beyond applications. It standardizes testing ...

SpaceNews

Benchmark demonstrates high-throughput ASCENT thruster in hotfire testing at Edwards Air Force Base

Benchmark’s 22-Newton Macaw ASCENT thruster during hotfire at the company’s propulsion test facility near Pleasanton, California. Credit: ...

VentureBeat

Alibaba's AgentEvolver lifts model performance in tool use by ~30% using synthetic, auto-generated tasks

Researchers at Alibaba’s Tongyi Lab have developed a new framework for self-evolving agents that create their own training data by exploring their application environments. The framework, AgentEvolver ...

Decrypt

Anthropic Completes AI Model Upgrades With Claude Opus 4.5—And Slashes Prices

Anthropic released Claude Opus 4.5 on Monday, completing its three-model family and marking the company's third major launch in just two months. The new flagship model claims the top spot in coding ...

VentureBeat

Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans

Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a ...

Inc

The Winners (and Losers) of This New Vibe-Coding Benchmark Will Surprise You

In a new benchmark named Vibe Code Bench, OpenAI’s GPT-5.1 achieved the highest level of accuracy in completing a series of software engineering tasks, narrowly beating rival Anthropic’s Claude 4.5 ...
