Large language models may appear to perceive humor much as humans do, but it turns out they are not yet capable of creative thinking or of understanding meaning in depth.
A new study, presented at the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), examined these aspects of artificial intelligence in detail.
It was previously believed that artificial intelligence could "understand" jokes, but the study's authors pointed out that the data used in earlier experiments was not rigorous enough. To put the claim to the test, the researchers strengthened the old benchmarks and designed new ones.
For example, the models were asked to distinguish genuine puns both from ordinary sentences and from meaningless expressions. When key words in the puns were swapped out, accuracy dropped sharply: the models often became confused and misclassified the jokes, exposing clear weaknesses in their grasp of context and of the phonetic similarity between words.
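The kind of test described above can be pictured with a small sketch. Everything here is illustrative: the dataset is invented, and `mock_classifier` is a stand-in heuristic, not the study's actual models or prompts. In a real evaluation, each sentence would be sent to an LLM with a prompt such as "Is this sentence a pun?", and the answers scored against gold labels.

```python
# Illustrative sketch of a pun-vs-literal evaluation harness.
# The classifier below is a hypothetical placeholder, NOT an LLM.

def mock_classifier(sentence: str) -> bool:
    """Stand-in for a model's yes/no judgment: 'is this a pun?'
    Pretend the model simply keys on a familiar pun word."""
    return "bank" in sentence

# Toy dataset of (sentence, gold label) pairs. The third item swaps
# the key word so the pun disappears -- the manipulation the study used.
dataset = [
    ("I used to be a banker, but I lost interest.", True),
    ("The river bank was muddy after the rain.", False),
    ("I used to be a banker, but I lost enthusiasm.", False),
]

def accuracy(examples):
    """Fraction of examples the classifier labels correctly."""
    correct = sum(mock_classifier(s) == label for s, label in examples)
    return correct / len(examples)

print(f"accuracy: {accuracy(dataset):.2f}")
```

A surface-level classifier like this one scores well on the original pun but fails as soon as the wording changes, which is the same brittleness the researchers observed in the models.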
Co-author Mohammad Taher Pilehvar of Cardiff University said: "When the models are confronted with unfamiliar wordplay, their success in distinguishing jokes from ordinary sentences can drop to 20%, far below the 50% expected from random guessing. We also found that the models were overconfident in judging text to be genuinely funny, and this became especially evident with unfamiliar wordplay."

