
Vision and language combined is the key to a more effective AI

Depending on which theory of intelligence you subscribe to, achieving “human-level” AI will require a system that can draw on multiple modalities, such as sound, vision, and text, to reason about the world. Shown an image of an overturned truck and a police car on a snowy highway, a human-level AI might infer that dangerous road conditions caused an accident. Or, running on a robot and asked to grab a can of soda from the refrigerator, it would navigate around people, furniture, and pets to retrieve the can and place it within the requester's reach.

Today's AI falls short. But new research shows encouraging signs of progress, from robots that can work out the steps needed to follow basic commands (e.g., "grab a bottle of water") to text-producing systems that learn from explanations.

OpenAI's improved DALL-E, DALL-E 2, is easily the most impressive project to emerge from the depths of an AI research lab. While the original DALL-E demonstrated remarkable prowess in creating images to match virtually any prompt (for example, "a dog in a beret"), DALL-E 2 goes further. The images it produces are much more detailed, and DALL-E 2 can intelligently replace a given area in an image, for example by inserting a table into a photo of a marble floor, complete with the appropriate reflections.

Researchers at Google have also detailed an equally impressive visual understanding system called Visually-Driven Prosody for Text-to-Speech (VDTTS) in a post published on Google's AI blog. VDTTS can generate realistic-sounding, lip-synced speech given nothing more than text and video frames of the person speaking.

The speech generated by VDTTS, though not a perfect substitute for recorded dialogue, is still pretty good, with convincingly human-like expressiveness and pacing. Google envisions it one day being used in a studio to replace original audio that was recorded in noisy conditions.

Of course, visual understanding is just one step on the path to a more capable AI. Another component is language understanding, which lags behind in many respects, even setting aside the well-documented issues of toxicity and bias in AI. In one stark example, a state-of-the-art system from Google, the Pathways Language Model (PaLM), memorized 40% of the data that was used to “train” it, according to a paper, resulting in PaLM plagiarizing text down to the copyright notices in code snippets.

Fortunately, DeepMind, the Alphabet-backed AI lab, is among those exploring techniques to address this problem. In a new study, DeepMind researchers investigate whether AI language systems, which learn to generate text from many examples of existing text (think books and social media), could benefit from being given explanations of those texts. After annotating dozens of language tasks (e.g., "Answer these questions by identifying whether the second sentence is an appropriate paraphrase of the first, metaphorical sentence") with explanations (e.g., "David's eyes were not literally daggers; it is a metaphor used to imply that David was glaring at Paul.") and evaluating the performance of different systems on them, the DeepMind team found that the explanations do improve system performance.
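To make the idea concrete, here is a minimal sketch of how worked examples in a prompt might be paired with explanations. It is an illustration only, not DeepMind's actual code or prompt format: the `Example` class, the `build_prompt` function, the "Question/Answer/Explanation" layout, and the two sample sentences are all assumptions made up for this sketch. Only the task instruction and the explanation text are taken from the study description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One worked example for a few-shot prompt (hypothetical structure)."""
    question: str
    answer: str
    explanation: Optional[str] = None  # free-text rationale, may be absent


def build_prompt(instruction: str, examples: List[Example], query: str,
                 with_explanations: bool = True) -> str:
    """Assemble a plain-text prompt: instruction, worked examples, then the query.

    When with_explanations is True, each example's explanation (if it has one)
    is appended after its answer. The layout is an assumption for illustration.
    """
    parts = [instruction]
    for ex in examples:
        block = f"Question: {ex.question}\nAnswer: {ex.answer}"
        if with_explanations and ex.explanation:
            block += f"\nExplanation: {ex.explanation}"
        parts.append(block)
    # The final question is left unanswered for the language system to complete.
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)


instruction = ("Answer these questions by identifying whether the second sentence "
               "is an appropriate paraphrase of the first, metaphorical sentence.")

# The sentence pair below is invented for illustration; the explanation follows the article.
examples = [
    Example(
        question="David's eyes were daggers. <-> David was glaring at Paul.",
        answer="True",
        explanation=("David's eyes were not literally daggers; it is a metaphor "
                     "used to imply that David was glaring at Paul."),
    )
]

query = "Her voice was music. <-> Her voice was pleasant to hear."

prompt_with = build_prompt(instruction, examples, query, with_explanations=True)
prompt_without = build_prompt(instruction, examples, query, with_explanations=False)
print(prompt_with)
```

Comparing a system's answers on `prompt_with` and `prompt_without` across many tasks is, in spirit, the kind of evaluation the study describes, though the real experimental setup is certainly more involved than this sketch.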

DeepMind's approach, should it pass muster within the academic community, could one day be applied in robotics, forming the building blocks of a robot that can understand vague requests (for example, "throw out the trash") without step-by-step instructions.
