The Decreasing Value of Chain of Thought in Prompting

By Matrice Jacobine @ 2025-06-08T15:11 (+5)

This is a linkpost to https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5285532

This is the second in a series of short reports that seek to help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. In this report, we investigate Chain-of-Thought (CoT) prompting, a technique that encourages a large language model (LLM) to "think step by step" (Wei et al., 2022). CoT is a widely adopted method for improving performance on reasoning tasks; however, our findings reveal a more nuanced picture of its effectiveness. We demonstrate two things:

1. For non-reasoning models, a simple CoT prompt still improves average performance, but at the cost of longer response times and greater variability in answers.
2. For dedicated reasoning models, which produce step-by-step reasoning by default, explicit CoT prompting yields little additional benefit while substantially increasing processing time.

Taken together, this suggests that a simple CoT prompt is generally still a useful tool for boosting average performance in non-reasoning models, especially older or smaller models that may not engage in CoT reasoning by default. However, the gains must be weighed against increased response times and potential decreases in perfect accuracy caused by greater variability in answers. For dedicated reasoning models, the added benefit of explicit CoT prompting appears negligible and may not justify the substantial increase in processing time.
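As a concrete illustration of what the report is testing, here is a minimal sketch of how a CoT prompt differs from a direct prompt. The model call itself is omitted (you would send these strings to an LLM API of your choice); the function names, the sample question, and the fabricated model response below are all illustrative, not taken from the report.

```python
def build_prompts(question: str) -> dict:
    """Return a direct prompt and its Chain-of-Thought variant.

    Per Wei et al. (2022), the CoT variant differs only by the
    appended 'think step by step' instruction.
    """
    return {
        "direct": f"{question}\nAnswer:",
        "cot": f"{question}\nLet's think step by step.",
    }


def extract_final_answer(cot_response: str) -> str:
    """Take the last non-empty line of a step-by-step response as the answer.

    Real evaluations typically match a stricter pattern
    (e.g. a trailing 'The answer is X.' sentence).
    """
    lines = [line.strip() for line in cot_response.splitlines() if line.strip()]
    return lines[-1] if lines else ""


prompts = build_prompts(
    "A pen costs $2 and a notebook costs three times as much. What is the total cost?"
)

# A CoT-style response might look like this (fabricated for illustration):
sample_response = (
    "The notebook costs 3 * $2 = $6.\n"
    "Total = $2 + $6 = $8.\n"
    "The answer is $8."
)
print(extract_final_answer(sample_response))  # -> The answer is $8.
```

The extra reasoning tokens in the CoT response are what drive the longer response times the report measures; the looser answer format is one source of the added variability.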