The research team presented an approach they call "mind captioning," which relies on language models to link brain activity to textual content.
The method uses functional magnetic resonance imaging (fMRI) to record brain activity, then employs pretrained language models, RoBERTa-large and DeBERTa-large, to decode semantic features from that activity. The result is a textual description of what the person is seeing, which is refined step by step through an iterative process until it becomes clearer and more coherent.
In the first experiment, six participants watched 2,196 short video clips covering a wide variety of scenes, objects, actions, and events while their brain activity was recorded with fMRI. The participants were native speakers of both Japanese and English.
The same clips had previously been described in written captions by multiple human annotators, and a pretrained language model (DeBERTa-large) was used to extract linguistic features from each caption. These features were then matched with the recorded brain activity, and text was generated iteratively with the RoBERTa-large model.
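To make that pipeline more concrete, the sketch below illustrates the general idea of feature-matching text generation: a language model extracts features from candidate sentences, and a masked language model proposes word-level edits that are kept only if they bring the sentence's features closer to the features decoded from brain activity. This is a minimal illustration under simplifying assumptions, not the authors' code: the mean-pooled sentence features, the greedy search, and the example sentences are all placeholders, and the step that decodes target features from fMRI (typically a regression model trained on caption features) is only mocked up.

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

# Feature extractor (a stand-in for the DeBERTa-large caption features).
feat_tok = AutoTokenizer.from_pretrained("microsoft/deberta-large")
feat_model = AutoModel.from_pretrained("microsoft/deberta-large")

# Masked language model used to propose candidate wordings (RoBERTa-large).
mlm_tok = AutoTokenizer.from_pretrained("roberta-large")
mlm_model = AutoModelForMaskedLM.from_pretrained("roberta-large")


def text_features(text):
    """Mean-pooled hidden states: a simple stand-in for caption features."""
    inputs = feat_tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = feat_model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)


def propose_edits(text, top_k=5):
    """Mask each token in turn and let the masked LM suggest replacements."""
    token_ids = mlm_tok(text, return_tensors="pt")["input_ids"][0]
    candidates = []
    for pos in range(1, len(token_ids) - 1):  # skip <s> and </s>
        masked = token_ids.clone()
        masked[pos] = mlm_tok.mask_token_id
        with torch.no_grad():
            logits = mlm_model(masked.unsqueeze(0)).logits[0, pos]
        for new_id in logits.topk(top_k).indices:
            edited = token_ids.clone()
            edited[pos] = new_id
            candidates.append(mlm_tok.decode(edited[1:-1]).strip())
    return candidates


def refine(target_features, text, max_steps=10):
    """Greedily keep edits whose features move closer to the decoded target."""
    best = text
    best_sim = torch.cosine_similarity(text_features(best), target_features, dim=0)
    for _ in range(max_steps):
        improved = False
        for cand in propose_edits(best):
            sim = torch.cosine_similarity(text_features(cand), target_features, dim=0)
            if sim > best_sim:
                best, best_sim, improved = cand, sim, True
        if not improved:
            break
    return best


# In the real pipeline the target features would be decoded from fMRI activity;
# here a target is faked from a known sentence just to show the refinement loop.
if __name__ == "__main__":
    target = text_features("a dog jumps over a wooden fence")
    print(refine(target, "a thing moves"))
```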
At first, the generated descriptions were fragmented and unclear, but over successive iterations they became coherent and accurately reflected the key events in the clips, including interactions between different objects. When the descriptions were matched against correct and incorrect candidate captions, they achieved an accuracy of approximately 50%, higher than previous methods and pointing to promising potential.
In the next phase, to test the technology's ability to read memories, participants were asked to recall the video clips while their brain activity was again scanned with fMRI. The system generated descriptions that accurately reflected the content of the recalled clips, although accuracy varied among individuals; some participants reached roughly 40% accuracy in identifying the correct clip from among 100 candidates.
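The identification accuracies mentioned above can be read as an n-way matching test: a generated description counts as correct when its features are closer to the true clip's reference caption than to the captions of the other candidate clips. The short sketch below, reusing the feature extractor from the earlier example, shows one plausible way to compute such a score; it is an assumption about the analysis, not the paper's own evaluation code.

```python
import torch

def identify(gen_feat, candidate_feats):
    """Index of the candidate caption most similar to the generated description."""
    sims = torch.cosine_similarity(candidate_feats, gen_feat.unsqueeze(0), dim=1)
    return int(sims.argmax())

def identification_accuracy(gen_feats, true_indices, candidate_feats):
    """Fraction of trials in which the correct clip beats every incorrect candidate."""
    hits = [identify(g, candidate_feats) == t for g, t in zip(gen_feats, true_indices)]
    return sum(hits) / len(hits)

# gen_feats: features of descriptions generated from brain activity (e.g. via
# text_features from the sketch above); candidate_feats: reference-caption
# features of the 100 candidate clips stacked into one tensor.
```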
The findings suggest that this technology could eventually help people who have lost the ability to speak, such as stroke patients, by letting them express their thoughts in more detail than current brain-computer interfaces allow. However, the system needs substantial further development before it reaches that stage.
Despite its significant scientific potential, mind-reading technology raises ethical concerns regarding privacy and the possibility of misuse. Scientists emphasize that informed consent will be a prerequisite for any practical application of this technology, and that questions concerning mental privacy must be addressed before its widespread adoption.
Even so, the study provides a new tool for understanding how the brain represents complex experiences, as well as a potential means of helping people who are unable to express themselves verbally.
The study was published in the journal Science Advances.
