In a notable development, robotics company Boston Dynamics has integrated ChatGPT, the large language model developed by OpenAI, into one of its robots, Spot. The canine-like machine is now equipped to lead guided tours of a building, offering commentary on each exhibit along the way.
Spot has also been given a selection of distinctive personalities. Depending on the chosen persona, the robot's voice, tone, and remarks adapt accordingly.
To perceive its surroundings, Spot employs Visual Question Answering (VQA) models, which can generate captions for images and give concise answers to questions about them. This visual data is refreshed approximately once per second and fed to the language model as a text prompt.
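The perception-to-prompt loop described above can be sketched as follows. This is a minimal illustration, not Boston Dynamics' code: `fake_vqa_caption` is a stand-in for a real VQA model, and the frame format, camera names, and prompt wording are all assumptions made for the example.

```python
import time

def fake_vqa_caption(frame):
    """Stand-in for a real VQA model; returns a caption for one camera frame.
    (The actual captioning model used in the demo is not specified here.)"""
    return f"a hallway with {frame['people']} people visible"

def build_scene_prompt(captions):
    """Fold the latest per-camera captions into a single text prompt
    that can be handed to the language model as scene context."""
    lines = [f"- {cam}: {cap}" for cam, cap in captions.items()]
    return "Current scene, as described by the cameras:\n" + "\n".join(lines)

def perception_loop(get_frames, update_prompt, interval=1.0, steps=3):
    """Refresh captions roughly once per second and pass them on as text."""
    for _ in range(steps):
        captions = {cam: fake_vqa_caption(frame)
                    for cam, frame in get_frames().items()}
        update_prompt(build_scene_prompt(captions))
        time.sleep(interval)
```

The key design point is that vision never reaches the language model as pixels; it arrives as text, so the chat model needs no multimodal capability of its own.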
Spot's communication hardware has also been upgraded with a specially crafted vibration-resistant mount for a ReSpeaker V2, a ring-array microphone adorned with LEDs. This hardware connects to Spot's EAP 2 payload via USB.
Control over the robot is managed by an offboard computer, either a desktop PC or a laptop, which communicates with Spot through its Software Development Kit (SDK). A straightforward Spot SDK service has been implemented to facilitate audio communication with the EAP 2.
Regarding verbal responses, Spot relies on the ElevenLabs text-to-speech service. To optimize response time, engineers have devised a system where text is streamed to the tool in parallel as “phrases”, and the resulting audio is played back serially.
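The parallel-synthesis, serial-playback scheme can be illustrated with a short sketch. This is an assumption-laden mock-up, not the actual pipeline: `fake_tts` stands in for a call to a TTS service such as ElevenLabs, and the phrase-splitting rule (break at punctuation) is a guess at what "phrases" means here.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def fake_tts(phrase):
    """Stand-in for a TTS service call; returns pretend audio bytes."""
    return f"<audio:{phrase}>".encode()

def split_phrases(text):
    """Break streamed text into phrase-sized chunks at punctuation."""
    return [p.strip() for p in re.split(r"(?<=[.,;!?])\s+", text) if p.strip()]

def speak(text, synthesize=fake_tts, play=lambda audio: None):
    """Synthesize phrases in parallel, but play the audio strictly in order."""
    phrases = split_phrases(text)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submit all synthesis jobs up front so they run concurrently...
        futures = [pool.submit(synthesize, p) for p in phrases]
        # ...but consume results in submission order, so playback is serial.
        for f in futures:
            play(f.result())
```

Because the first phrase's audio can play while later phrases are still being synthesized, perceived latency drops without any audio overlapping or arriving out of order.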
Spot also exhibits simple body language. It can identify and track moving objects, enabling it to discern the location of the nearest person and orient its arm towards them. For a whimsical touch, a lowpass filter is applied to the generated speech and used to drive the gripper, so that it opens and closes in time with the audio like a puppet's mouth. The effect is accentuated by dressing the gripper in comical costumes and affixing googly eyes.
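One plausible reading of the puppet-mouth trick is to lowpass-filter the speech amplitude into a smooth envelope and map that to gripper openness. The sketch below uses a single-pole (exponential moving average) lowpass filter; the filter choice, `alpha` value, and the 0-to-1 openness range are assumptions, not details from the demo.

```python
def mouth_openness(samples, alpha=0.1):
    """Lowpass-filter the absolute audio amplitude so the gripper opens
    and closes smoothly with the speech, rather than jittering per sample."""
    level = 0.0
    openings = []
    for s in samples:
        level += alpha * (abs(s) - level)  # single-pole lowpass (EMA) step
        openings.append(min(1.0, level))   # clamp to gripper range [0, 1]
    return openings
```

Filtering matters because raw speech amplitude fluctuates far faster than a physical gripper can move; the smoothed envelope gives motion the actuator can actually follow.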
One of the most intriguing aspects of the experiment is the behavior that emerged with minimal fine-tuning. When questioned about its "parents," Spot navigated to the spot where its predecessors were displayed and humorously declared them to be its "elders." This reflects the model's ability to establish statistical associations between concepts, without implying consciousness.
However, the demonstration has its limitations. Like many language-model systems, Spot may occasionally hallucinate, generating fictitious information; an example of this phenomenon appears in an article discussing a Sims-inspired town populated by AI agents. Responses are also not instantaneous, with users sometimes waiting approximately six seconds for a reply.
Despite these minor setbacks, this project marks a significant stride forward in research at the intersection of robotics and AI. Boston Dynamics is committed to further exploring this fusion of technologies, with the ultimate aim of enhancing robotic performance in human-centric environments. This promising endeavour holds the potential to revolutionize the way we interact with machines, ushering in a new era of intelligent companionship.
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate, currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.