Stitching a patient back together after surgery is a vital but monotonous task for medics, often requiring them to repeat the same simple movements over and over hundreds of times. But thanks to a collaborative effort between Intel and the University of California, Berkeley, tomorrow’s surgeons could offload that grunt work to robots — like a macro, but for automated suturing.
The UC Berkeley team, led by Dr. Ajay Tanwani, has developed a semi-supervised deep-learning system dubbed Motion2Vec. The system is designed to watch publicly available videos of surgeries performed by actual doctors, break down the medic’s movements when suturing (needle insertion, extraction and hand-off) and then mimic them with a high degree of accuracy.
“There’s a lot of appeal in learning from visual observations, compared to traditional interfaces for learning in a static way or learning from [mimicking] trajectories, because of the huge amount of information content available in existing videos,” Tanwani told Engadget. When it comes to teaching robots, a picture, apparently, is worth a thousand words.
“YouTube gets 500 hours of new material every minute. It’s an incredible repository, dataset,” Dr. Ken Goldberg, who runs the UC Berkeley lab and advised Tanwani’s team on this study, added. “Any human can watch almost any one of those videos and make sense of it, but a robot currently cannot — they just see it as a stream of pixels. So the goal of this work is to try and make sense of those pixels. That is to look at the video, analyze it, and… be able to segment the videos into meaningful sequences.”
To do this, the team used a Siamese network to train its AI. Siamese networks are built to learn distance functions from unsupervised or weakly-supervised data, Tanwani explained. “The idea here is that you want to produce the high amount of data that is in recombinant videos and compress it into a low dimensional manifold,” he said. “Siamese networks are used to learn the distance functions within this manifold.”
Basically, these networks can rank the degree of similarity between two inputs, which is why they’re often used for image recognition tasks like matching surveillance footage of a person with their driver’s license photo. In this case, however, the team is using the network to match the video input of what the manipulator arms are doing with the existing video of a human doctor making the same motions. The goal is to raise the robot’s performance to near-human levels.
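For readers curious what "ranking similarity" looks like in practice, here is a minimal sketch of the Siamese idea in plain Python. It is not the Motion2Vec implementation: the fixed `weights` matrix, the four-number "frame features" and the contrastive loss are all illustrative assumptions, chosen only to show two inputs passing through one shared embedding and being compared by distance.

```python
import math

def embed(x, weights):
    """Project an input vector into a lower-dimensional embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def siamese_distance(x1, x2, weights):
    """Run both inputs through the SAME weights, then compare the embeddings."""
    return euclidean(embed(x1, weights), embed(x2, weights))

def contrastive_loss(d, same, margin=1.0):
    """Illustrative training signal (an assumption, not the team's stated loss):
    pull matching pairs together, push others at least `margin` apart."""
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical weights compressing 4-D frame features down to 2-D.
weights = [[1.0, 0.0, -1.0, 0.0],
           [0.0, 1.0, 0.0, -1.0]]

# Hypothetical features: two frames of the same motion, one of a different motion.
frame_a = [0.9, 0.1, 0.0, 0.2]
frame_b = [0.8, 0.2, 0.1, 0.2]
frame_c = [0.0, 0.9, 0.8, 0.1]

d_same = siamese_distance(frame_a, frame_b, weights)
d_diff = siamese_distance(frame_a, frame_c, weights)
print("similar pair distance:", round(d_same, 3))
print("different pair distance:", round(d_diff, 3))
```

In a real system the weights would be learned by a deep network rather than fixed by hand, with the loss nudging frames of the same surgical motion closer together in the embedding space.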
And since the system relies on a semi-supervised learning structure, the team needed just 78 videos from the JIGSAWS database to train their AI to perform its task with 85.5 percent segmentation accuracy and an average targeting error of 0.94 centimeters.
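To make those two figures concrete, the sketch below shows one common way such metrics are computed. It assumes segmentation accuracy means the share of video frames given the correct motion label, and that targeting error is the average straight-line distance between where the needle landed and where it was aimed. The function names and sample data are hypothetical, and the team's exact definitions may differ.

```python
import math

def segmentation_accuracy(predicted, actual):
    """Fraction of frames whose predicted motion label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label sequences must be the same length")
    if not actual:
        return 0.0
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def mean_targeting_error(reached, targets):
    """Average Euclidean distance between reached points and target points."""
    if len(reached) != len(targets):
        raise ValueError("point lists must be the same length")
    if not targets:
        return 0.0
    total = sum(
        math.sqrt(sum((r - t) ** 2 for r, t in zip(rp, tp)))
        for rp, tp in zip(reached, targets)
    )
    return total / len(targets)

# Hypothetical per-frame labels for a short suturing clip.
predicted = ["insertion", "insertion", "extraction", "hand-off"]
actual = ["insertion", "extraction", "extraction", "hand-off"]
print("segmentation accuracy:", segmentation_accuracy(predicted, actual))

# Hypothetical needle positions in centimeters.
reached = [(0.0, 0.0), (3.0, 4.0)]
targets = [(0.0, 0.0), (0.0, 0.0)]
print("mean targeting error (cm):", mean_targeting_error(reached, targets))
```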
It’s going to be years before these sorts of technologies make their way to actual operating theaters, but Tanwani believes that once they do, surgical AIs will act much like Driver Assist does on today’s semi-autonomous cars.