Research
theregister.com
Microsoft researchers reveal frontier AI models degrade on long-running tasks
LLMs such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT-5.4 lose up to 50% accuracy over repeated interactions and corrupt documents in most cases. Agents with tools perform even worse.