Accelerated LLM Task-Completion Capabilities

Written by Elaxtra Advisors | April 15, 2026

Most AI benchmarks evaluate what models know. METR, an AI safety research organization in Berkeley, instead examines what models can do and for how long.

METR introduced the task-completion time horizon, a metric that measures the length of tasks (expressed as the time a human expert would need) that a frontier AI agent can complete with 50% reliability. If a model has a 50% time horizon of two hours, it succeeds about half the time on tasks that take a skilled human two hours, and more often on shorter ones (https://metr.org/).

METR’s central finding is the pace of improvement: over the past six years, the time horizon has doubled approximately every seven months. If this trend continues through the decade, AI agents could autonomously execute month-long projects (in human time). At that stage, a model would function as an autonomous agent, capable of navigating ambiguity and managing complex, multi-step decisions without human intervention.
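The arithmetic behind that extrapolation can be sketched in a few lines of Python. This is only an illustration of constant-rate doubling, not METR's methodology. The seven-month doubling period comes from the text above; the two-hour starting horizon and the 167-hour working month (roughly 40 hours a week) are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

# From the text: the time horizon doubles roughly every seven months.
DOUBLING_MONTHS = 7.0

# Assumptions for illustration only (not figures from METR):
START_HORIZON_HOURS = 2.0    # hypothetical current 50% time horizon
WORK_MONTH_HOURS = 167.0     # about one month of full-time human work


def projected_horizon_hours(current_hours, months_ahead,
                            doubling_months=DOUBLING_MONTHS):
    """Horizon after `months_ahead` months of steady exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(current_hours, target_hours,
                 doubling_months=DOUBLING_MONTHS):
    """Months needed for the horizon to grow from current to target."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    for months in (7, 14, 28):
        hours = projected_horizon_hours(START_HORIZON_HOURS, months)
        print(f"after {months} months: {hours:.1f} hours")
    wait = months_until(START_HORIZON_HOURS, WORK_MONTH_HOURS)
    print(f"months until a one-month horizon: {wait:.1f}")
```

Under these assumptions, a two-hour horizon would need a little under four years of uninterrupted doubling to reach a month of human work. The result is sensitive to both the starting point and the doubling period, so it is best read as a rough order of magnitude.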

METR’s tasks, however, are well-defined and controlled, and performance declines on messier, more complex real-world tasks. The time horizon is therefore better read as a ceiling on what agents can reliably do in practice than as a guaranteed minimum.

Nonetheless, the trend is clear. For technology services firms, this capability curve presents both challenges and opportunities: the need to evolve delivery models, and the potential to integrate agents into workflows to improve margins and create new service offerings.

Elaxtra Advisors is an M&A and value-creation advisory firm that helps institutional investors, private equity-owned platforms, and strategic acquirers invest in and create value from technology services companies worldwide. Please contact us to explore potential partnerships.
