When Dereck Paul was training as a doctor at the University of California San Francisco, he couldn’t believe how outdated the hospital’s record-keeping was. The computer systems looked like they’d time-traveled from the 1990s, and many of the medical records were still kept on paper.
“I was just totally shocked by how analog things were,” Paul recalls.
The experience inspired Paul to found a small San Francisco-based startup called Glass Health. Glass Health is now among a handful of companies that hope to use artificial intelligence chatbots to offer services to doctors. These firms maintain that their programs could dramatically reduce the paperwork burden physicians face in their daily lives and substantially improve the patient-doctor relationship.
“We need these folks not in burnt-out states, trying to complete documentation,” Paul says. “Patients need more than 10 minutes with their doctors.”
But some independent researchers fear a rush to incorporate the latest AI technology into medicine could lead to errors and biased outcomes that might harm patients.
“I think it’s very exciting, but I’m also super skeptical and super cautious,” says Pearse Keane, a professor of artificial medical intelligence at University College London in the United Kingdom. “Anything that involves decision-making about a patient’s care is something that has to be treated with extreme caution for the time being.”
A powerful engine for medicine
Paul co-founded Glass Health in 2021 with Graham Ramsey, an entrepreneur who had previously started several healthcare tech companies. The company began by offering an electronic system for keeping medical notes. When ChatGPT appeared on the scene last year, Paul says, he didn’t pay much attention to it.
“I looked at it and I thought, ‘Man, this is going to write some bad blog posts. Who cares?’” he recalls.
But Paul kept getting pinged by younger doctors and medical students. They were using ChatGPT and saying it was pretty good at answering clinical questions. Then the users of his software started asking about it.
In general, doctors should not be using ChatGPT by itself to practice medicine, warns Marc Succi, a doctor at Massachusetts General Hospital who has conducted evaluations of how the chatbot performs at diagnosing patients. When presented with hypothetical cases, he says, ChatGPT could produce a correct diagnosis at close to the level of a third- or fourth-year medical student. Still, he adds, the program can also hallucinate findings and fabricate sources.
“I would express considerable caution using this in a clinical scenario for any reason, at the current stage,” he says.
But Paul believes the underlying technology can be turned into a powerful engine for medicine. Paul and his colleagues have created a program called “Glass AI” based on ChatGPT. A doctor tells the Glass AI chatbot about a patient, and it can suggest a list of possible diagnoses and a treatment plan. Rather than working from the raw ChatGPT information base, the Glass AI system uses a virtual medical textbook written by humans as its main source of facts – something Paul says makes the system safer and more reliable.
“We’re working on doctors being able to put in a one-liner, a patient summary, and for us to be able to generate the first draft of a clinical plan for that doctor,” he says. “So what tests they would order and what treatments they would order.”
Paul believes Glass AI helps with a huge need for efficiency in medicine. Doctors are stretched everywhere, and he says paperwork is slowing them down.
“The physician quality of life is really, really rough. The documentation burden is massive,” he says. “Patients don’t feel like their doctors have enough time to spend with them.”
Bots at the bedside
In truth, AI has already arrived in medicine, according to Keane. Keane also works as an ophthalmologist at Moorfields Eye Hospital in London and says that his field was among the first to see AI algorithms put to work. In 2018, the Food and Drug Administration (FDA) approved an AI system that could read a scan of a patient’s eyes to screen for diabetic retinopathy, a condition that can lead to blindness.
That technology is based on an AI precursor to the current chatbot systems. If it identifies a possible case of retinopathy, it then refers the patient to a specialist. Keane says the technology could potentially streamline work at his hospital, where patients are lining up out the door to see experts.
“If we can have an AI system that is in that pathway somewhere that flags the people with the sight-threatening disease and gets them in front of a retina specialist, then that’s likely to lead to much better outcomes for our patients,” he says.
Other similar AI programs have been approved for specialties like radiology and cardiology. But these new chatbots can potentially be used by all kinds of doctors treating a wide variety of patients.
Alexandre Lebrun is CEO of a French startup called Nabla. He says the goal of his company’s program is to cut down on the hours doctors spend writing up their notes.
“We are trying to completely automate all this wasted time with AI,” he says.
Lebrun is open about the fact that chatbots have some problems. They can make up sources, get things wrong and behave erratically. In fact, his team’s early experiments with ChatGPT produced some weird results.
For example, when a fake patient told the chatbot it was depressed, the AI suggested “recycling electronics” as a way to cheer up.
Despite this dismal consultation, Lebrun thinks there are narrow, limited tasks where a chatbot can make a real difference. Nabla, which he co-founded, is now testing a system that can, in real time, listen to a conversation between a doctor and a patient and provide a summary of what the two said to one another. Doctors inform their patients in advance that the system is being used, and as a privacy measure, it doesn’t actually record the conversation.
“It shows a report, and then the doctor will validate with one click, and 99% of the time it’s right and it works,” he says.
The summary can be uploaded to a hospital records system, saving the doctor valuable time.
Other companies are pursuing a similar approach. In late March, Nuance Communications, a subsidiary of Microsoft, announced that it would be rolling out its own AI service designed to streamline note-taking using the latest version of ChatGPT, GPT-4. The company says it will showcase its software later this month.
AI reflects human biases
But even if AI can get it right, that doesn’t mean it will work for every patient, says Marzyeh Ghassemi, a computer scientist studying AI in healthcare at MIT. Her research shows that AI can be biased.
“When you take state-of-the-art machine learning methods and systems and then evaluate them on different patient groups, they do not perform equally,” she says.
That’s because these systems are trained on vast amounts of data made by humans. And whether that data is from the Internet, or a medical study, it contains all the human biases that already exist in our society.
The problem, she says, is often these programs will reflect those biases back to the doctor using them. For example, her team asked an AI chatbot trained on scientific papers and medical notes to complete a sentence from a patient’s medical record.
“When we said ‘White or Caucasian patient was belligerent or violent,’ the model filled in the blank [with] ‘Patient was sent to hospital,’” she says. “If we said ‘Black, African American, or African patient was belligerent or violent,’ the model completed the note [with] ‘Patient was sent to prison.’”
Ghassemi says many other studies have turned up similar results. She worries that medical chatbots will parrot biases and bad decisions back to doctors, and they’ll just go along with it.
“It has the sheen of objectivity: ‘ChatGPT says you shouldn’t have this medication. It’s not me – a model, an algorithm made this choice,’” she says.
And it’s not just a question of how individual doctors use these new tools, adds Sonoo Thadaney Israni, a researcher at Stanford University who co-chaired a recent National Academy of Medicine study on AI.
“I don’t know whether the tools that are being developed are being developed to reduce the burden on the doctor, or to really increase the throughput in the system,” she says. The intent will have a huge effect on how the new technology affects patients.
Regulators are racing to keep up with a flood of applications for new AI programs. The FDA, which oversees such systems as “medical devices,” said in a statement to NPR that it was working to ensure that any new AI software meets its standards.
“The agency is working closely with stakeholders and following the science to make sure that Americans will benefit from new technologies as they further develop, while ensuring the safety and effectiveness of medical devices,” spokesperson Jim McKinney said in an email.
But it is not entirely clear where chatbots specifically fall in the FDA’s rubric, since, strictly speaking, their job is to synthesize information from elsewhere. Lebrun of Nabla says his company will seek FDA certification for its software, though he says that in its simplest form, the Nabla note-taking system doesn’t require it. Dereck Paul says Glass Health is not currently planning on seeking FDA certification for Glass AI.
Doctors give chatbots a chance
Both Lebrun and Paul say they are well aware of the problems of bias. And both know that chatbots can sometimes fabricate answers out of thin air. Paul says doctors who use his company’s AI system need to check it.
“You have to supervise it, the way we supervise medical students and residents, which means that you can’t be lazy about it,” he says.
Both companies also say they are working to reduce the risk of errors and bias. Glass Health’s human-curated textbook is written by a team of 30 clinicians and clinicians in training. The AI relies on it to write diagnoses and treatment plans, which Paul claims should make it safe and reliable.
At Nabla, Lebrun says he’s training the software to simply condense and summarize the conversation, without providing any additional interpretation. He believes that strict rule will help reduce the chance of errors. The team is also working with a diverse set of doctors located around the world to weed out bias from their software.
Regardless of the possible risks, doctors seem interested. Paul says in December, his company had around 500 users. But after they introduced their chatbot, those numbers jumped.
“We finished January with 2,000 monthly active users, and in February we had 4,800,” Paul says. Thousands more signed up in March, as overworked doctors lined up to give AI a try.
Copyright 2023 NPR. To see more, visit https://www.npr.org.