MetaDaily – Breaking News in Crypto, Markets & Digital Trends

Are AI agents ready for the workplace? A new benchmark raises doubts.

By admin, January 22, 2026

It’s been nearly two years since Microsoft CEO Satya Nadella predicted AI would replace knowledge work: the white-collar jobs held by lawyers, investment bankers, librarians, accountants, IT workers and others.

But despite the huge progress made by foundation models, the change in knowledge work has been slow to arrive. Models have mastered in-depth research and agentic planning, but for whatever reason, most white-collar work has been relatively unaffected.

It’s one of the biggest mysteries in AI — and thanks to new research from the training-data giant Mercor, we’re finally getting some answers.

The new research looks at how leading AI models hold up doing actual white-collar work tasks, drawn from consulting, investment banking, and law. The result is a new benchmark called Apex-Agents — and so far, every AI lab is getting a failing grade. Faced with queries from real professionals, even the best models struggled to get more than a quarter of the questions right. The vast majority of the time, the model came back with a wrong answer or no answer at all.

According to researcher Brendan Foody, who worked on the paper, the models’ biggest stumbling point was tracking down information across multiple domains — something that’s integral to most of the knowledge work performed by humans.

“One of the big changes in this benchmark is that we built out the entire environment, modeled after how real professional services [work],” Foody told TechCrunch. “The way we do our jobs isn’t with one individual giving us all the context in one place. In real life, you’re operating across Slack and Google Drive and all these other tools.” For many agentic AI models, that kind of multi-domain reasoning is still hit or miss.

The scenarios were all drawn from actual professionals on Mercor’s expert marketplace, who both laid out the queries and set the standard for a successful response. Looking through the questions, which are posted publicly on Hugging Face, gives a sense of how complex the tasks can get. 

One question in the “Law” section reads: 

During the first 48 minutes of the EU production outage, Northstar’s engineering team exported one or two bundled sets of EU production event logs containing personal data to the U.S. analytics vendor…. Under Northstar’s own policies, it can reasonably treat the one or two log exports as consistent with Article 49?

The correct answer is yes, but getting there requires an in-depth assessment of the company’s own policies as well as the relevant EU privacy laws.

That might stump even a well-informed human, but the researchers were trying to model the work done by professionals in the field. If an LLM can reliably answer these questions, it could effectively replace many of the lawyers working today. “I think this is probably the most important topic in the economy,” Foody told TechCrunch. “The benchmark is very reflective of the real work that these people do.”

OpenAI also attempted to measure professional skills with its GDPVal benchmark, but the Apex-Agents test differs in important ways. Where GDPVal tests general knowledge across a wide range of professions, the Apex-Agents benchmark measures a system’s ability to perform sustained tasks in a narrow set of high-value professions. The result is more difficult for models, but also more closely tied to whether these jobs can be automated.

While none of the models proved ready to take over as investment bankers, some were clearly closer to the mark. Gemini 3 Flash performed the best of the group with 24% one-shot accuracy, followed closely by GPT-5.2 with 23%. Below that, Opus 4.5, Gemini 3 Pro and GPT-5 all scored roughly 18%.

While the initial results fall short, the AI field has a history of blowing through challenging benchmarks. Now that the Apex test is public, it’s an open challenge for AI labs that believe they can do better, something Foody fully expects in the months to come.

“It’s improving really quickly,” he told TechCrunch. “Right now it’s fair to say it’s like an intern that gets it right a quarter of the time, but last year it was the intern that gets it right five or ten percent of the time. That kind of improvement year after year can have an impact so quickly.”
