A new artificial intelligence technology can learn human preferences, an important step toward safe and harmonious interaction between humans and robots in many areas of daily life.
Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have taken a significant step toward expanding the real-world applications of artificial intelligence. They developed a technology that lets machines learn human preferences and evaluation criteria autonomously from a limited number of video clips.
Professor Zhang Di Yu of the School of Electrical Engineering presented the new approach, called VOTP (Video-based Optimal TransPort Preference). It aims to let intelligent systems understand human preferences without huge amounts of pre-labeled data.
In recent years, artificial intelligence has learned to write text, create images, and compose music. It is now moving toward a new stage known as "physical artificial intelligence," in which its capabilities extend beyond producing digital content to interacting directly with the real world.
The most prominent applications of this trend include industrial robots that operate in hazardous environments, self-driving cars capable of navigating complex traffic conditions, and surgical robots that assist doctors in performing delicate operations.
Until recently, developing robot behavior faced a major obstacle: artificial intelligence had to learn which behaviors humans consider appropriate or inappropriate. To achieve this, developers collected tens of thousands of human assessments, manually labeling each machine behavior as "appropriate" or "inappropriate." The process was time-consuming, costly, and labor-intensive.
VOTP offers a more efficient and natural approach, based on the simple principle that humans learn new skills by watching a few illustrative examples.
In the same way, the artificial intelligence analyzes a small number of videos showing successful and unsuccessful examples and infers the criteria humans use to evaluate different behaviors.
The new algorithm helps machines infer unspoken human intentions and preferences. For example, a surgical robot performing sutures, or a self-driving car crossing a busy pedestrian intersection, could choose the most appropriate course of action from several available options, based on an understanding of human expectations rather than simply adhering to rigid instructions.
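The article does not describe VOTP's internals, so the following is only a minimal sketch of the general idea suggested by the method's name, under loudly stated assumptions. It assumes each video has already been reduced to a list of per-frame feature vectors (a hypothetical encoding). It compares videos with an entropy-regularized optimal-transport (Sinkhorn) distance and scores a candidate behavior by how much closer it is to the successful examples than to the unsuccessful ones. The robot then picks the highest-scoring option. All function names (`sinkhorn_distance`, `preference_score`, `choose_action`) and the toy data are hypothetical. This is not the KAIST implementation.

```python
import math
from typing import List, Sequence

# Hypothetical encoding (assumption): a video is a list of per-frame feature vectors.
Frame = Sequence[float]
Video = Sequence[Frame]


def _logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def sinkhorn_distance(x: Video, y: Video, eps: float = 0.05, iters: int = 200) -> float:
    """Entropy-regularized optimal-transport cost between the frames of two videos.

    Log-domain Sinkhorn iterations with uniform weights on frames and a
    squared Euclidean cost between frame feature vectors.
    """
    n, m = len(x), len(y)
    cost = [[sum((p - q) ** 2 for p, q in zip(xi, yj)) for yj in y] for xi in x]
    log_a, log_b = -math.log(n), -math.log(m)  # log of uniform weights 1/n and 1/m
    f, g = [0.0] * n, [0.0] * m  # dual potentials
    for _ in range(iters):
        f = [eps * (log_a - _logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        g = [eps * (log_b - _logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)]))
             for j in range(m)]
    # Transport plan P_ij = exp((f_i + g_j - C_ij) / eps); the distance is sum(P * C).
    return sum(math.exp((f[i] + g[j] - cost[i][j]) / eps) * cost[i][j]
               for i in range(n) for j in range(m))


def preference_score(candidate: Video, successes: List[Video],
                     failures: List[Video], eps: float = 0.05) -> float:
    """Positive when the candidate is closer to successful demos than to failed ones."""
    d_good = sum(sinkhorn_distance(candidate, s, eps) for s in successes) / len(successes)
    d_bad = sum(sinkhorn_distance(candidate, f, eps) for f in failures) / len(failures)
    return d_bad - d_good


def choose_action(candidates: List[Video], successes: List[Video],
                  failures: List[Video]) -> int:
    """Return the index of the candidate behavior with the highest preference score."""
    scores = [preference_score(c, successes, failures) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy data (hypothetical): one successful and one unsuccessful demonstration.
successes = [[[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]]
failures = [[[1.0, 0.0], [1.1, 0.1], [1.2, 0.2]]]
candidates = [
    [[0.9, 0.0], [1.0, 0.1], [1.1, 0.2]],    # resembles the failed demonstration
    [[0.05, 0.0], [0.15, 0.1], [0.25, 0.2]],  # resembles the successful demonstration
]
print(choose_action(candidates, successes, failures))  # expected: 1
```

Optimal transport is used here because it compares two videos as distributions of frames without requiring them to be aligned frame by frame. The "closer to success than to failure" score is one simple way to turn a handful of labeled examples into a ranking of new options.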
Experiments across diverse conditions and tasks showed that the technology performed effectively and could transfer the acquired knowledge to new situations the system had not encountered before.
VOTP significantly reduces the cost of collecting data and human feedback. Instead of huge databases of evaluations, intelligent systems need only a limited number of high-quality video clips, which speeds up development and lowers its cost.
The technology also opens the door to a wide range of applications, including automated handling systems in factories, humanoid robots, autonomous cars, smart production lines, drones, advanced surgical systems, and software agents that manage computers on behalf of users.
Researchers believe VOTP could become a key technology for the next generation of physical AI systems, which rely on understanding human needs and preferences to make more accurate and effective decisions.
Commenting on the development, Professor Zhang Di Yu said: "The essence of physical artificial intelligence lies in teaching machines to understand human intentions and choose appropriate actions. Since VOTP technology is able to learn human evaluation criteria through a limited number of video clips, it represents a key step towards accelerating the development of robots and intelligent systems capable of making decisions closer to human decisions."
