Computer scientists at the University of Toronto have developed a camera robot called Stargazer that can create dynamic instructional videos. The robot is designed to assist teachers and remove the restrictions of working with a static camera.
Jiannan Li, a doctoral student in the computer science department of the Faculty of Arts & Science and principal investigator of the Stargazer project, said, “The teachers are there to teach. The role of the robot is to help with the filming, that is, to do the heavy lifting.”
Stargazer carries a single smartphone camera on a seven-degree-of-freedom robotic arm, which allows it to move with the subject and independently track objects of interest. The system recognizes subtle cues from the teacher, such as body movements, gestures, and speech, captured by the camera and other sensors.
Speech is picked up by a wireless microphone and transcribed by Microsoft's Azure speech-to-text service; the transcript, together with a user-defined prompt, is then interpreted by the GPT-3 language model.
Camera commands are issued through natural behavior: pointing at an object, for example, is enough to make the camera pan toward it. Alongside gestures, the teacher's speech is analyzed and translated into robot control commands.
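The cue-to-command pipeline described above can be sketched as follows. This is a hypothetical illustration only: the real system feeds Azure speech-to-text transcripts into GPT-3, whereas here a simple keyword interpreter stands in for the language model, and all names (`CameraCommand`, `interpret_cue`, the gesture labels) are invented for the example.

```python
# Hypothetical sketch of a cue-to-command pipeline like Stargazer's.
# A rule-based interpreter stands in for the GPT-3 step described in
# the article; every identifier here is an assumption, not the real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraCommand:
    action: str                  # e.g. "pan", "zoom", "track"
    target: Optional[str] = None


def interpret_cue(transcript: str,
                  gesture: Optional[str] = None) -> Optional[CameraCommand]:
    """Map a speech transcript plus an optional detected gesture to a command."""
    text = transcript.lower()
    # A pointing gesture alone is enough to pan toward the indicated object.
    if gesture == "point":
        return CameraCommand(action="pan", target="pointed_object")
    # Subtle verbal hints, interpreted here with keywords instead of GPT-3.
    if "closer" in text or "close-up" in text:
        return CameraCommand(action="zoom", target="speaker")
    if "follow me" in text:
        return CameraCommand(action="track", target="speaker")
    return None  # no camera-relevant intent detected


print(interpret_cue("let's take a closer look at this part"))
print(interpret_cue("", gesture="point"))
```

The appeal of this design is that the teacher never issues explicit commands: ordinary demonstration behavior, spoken and gestural, doubles as the control channel.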
The system was evaluated in a user study. Six teachers each completed two practice sessions before producing instructional videos with Stargazer, demonstrating skateboard maintenance, interactive sculpture-making, and virtual reality headset setup.
According to the researchers, all participants were able to produce videos of satisfactory quality with the robotic camera. The team is now working on expanding the vocabulary of subtle cues the robot understands, to improve collaboration and produce more polished videos.
The scientists concede that Stargazer is not yet ready for mass adoption, though a market for professional robotic film equipment already exists. The robotic arm used is still too expensive for teaching applications, and the system's reliance on external sensors adds further complexity.