Artificial intelligence, with its vast store of knowledge, can be extremely useful, but one drawback limits its value: excessive confidence in its answers. Whatever answer a model gives, whether reached through reasoned deduction or mere guesswork, it presents it with the same degree of confidence.
Researchers at MIT's Computer Science and Artificial Intelligence Laboratory traced this overconfidence to a specific flaw in the way models are trained, and they have developed a new method to address it without sacrificing accuracy.
This method, known as RLCR (Reinforcement Learning with Calibration Rewards), is described in a paper published on arXiv and is scheduled for presentation at ICLR 2026, the International Conference on Learning Representations, in Rio de Janeiro. It trains language models to provide answers accompanied by a confidence estimate; that is, the model not only gives an answer but also expresses how uncertain it is about it.
The reinforcement learning techniques used in the latest artificial intelligence models reward correct answers and penalize incorrect ones, regardless of how the answer is reached. Therefore, a model that arrives at the correct answer through logical deduction receives the same reward as one that arrives at it through guesswork.
Over time, this entrenches a behavior in the models: they learn to give confident answers even when they lack sufficient evidence.
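The binary reward scheme described above can be sketched in a few lines. This is an illustrative simplification, not the training code from the paper; the function name and the string-matching check are assumptions made for the example.

```python
# Sketch of the binary reward used in standard RL fine-tuning:
# only correctness matters, not how the answer was reached
# or how confident the model is in it.
def binary_reward(answer: str, correct_answer: str) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if answer.strip() == correct_answer.strip() else 0.0

# A lucky guess and a careful deduction earn the same reward,
# so the model has no incentive to ever say "I don't know".
print(binary_reward("Paris", "Paris"))  # 1.0
print(binary_reward("Lyon", "Paris"))   # 0.0
```

Because the reward signal carries no information about the model's reasoning process or its confidence, guessing confidently is never penalized relative to honest uncertainty.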
This excessive confidence can have negative consequences, especially when these models are used in sensitive fields such as medicine, law, or finance, where human decisions rely on AI output. A model that expresses high but inaccurate confidence may be more dangerous than one that is clearly wrong, because the user may not realize the need to verify the answer.
"Traditional training methods are simple and effective, but they don't encourage the model to express uncertainty or say 'I don't know,'" explains Mehul Damani, a graduate student at MIT and one of the study's authors.
"So the model naturally learns to guess when it's not confident."
The RLCR method addresses this problem by adding a single element to the reward function: the Brier score, which measures how well the model's confidence matches its actual accuracy. During training, the models learn to evaluate both the answer and its uncertainty simultaneously, so they present the answer along with an estimated confidence level.
Thus, both incorrect answers delivered with high confidence and correct answers delivered with unwarranted doubt are penalized, which pushes the model toward a better balance between accuracy and a realistic expression of confidence.
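The calibration-aware reward can be sketched as follows. This is a simplified illustration of the idea of combining a correctness reward with a Brier-score penalty, under the assumption that the model states a confidence probability alongside its answer; the exact reward formulation in the paper may differ.

```python
def rlcr_reward(is_correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score calibration penalty.

    `confidence` is the model's stated probability (0 to 1) that its
    answer is correct. The Brier term (confidence - outcome)^2 grows
    when confidence and actual correctness diverge, penalizing both
    overconfident wrong answers and underconfident correct ones.
    """
    outcome = 1.0 if is_correct else 0.0
    brier_penalty = (confidence - outcome) ** 2
    return outcome - brier_penalty

# An overconfident wrong answer is punished hardest (about -0.90):
print(rlcr_reward(False, confidence=0.95))
# A correct answer with matching high confidence scores best (about 1.0):
print(rlcr_reward(True, confidence=0.95))
# A correct but needlessly doubtful answer forfeits reward (about 0.51):
print(rlcr_reward(True, confidence=0.30))
```

Under this reward, the best strategy for the model is to report a confidence that matches its true probability of being right, which is exactly the calibration behavior the researchers aimed for.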
