Inside an AI Training for Doctors
Excerpt
At Northwell Health, executives are encouraging clinicians and all 85,000 employees to use a tool called AI Hub, according to a presentation obtained by 404 Media.
Photo by Luis Melendez / Unsplash
Northwell Health, New York State's largest healthcare provider, recently launched a large language model tool that it is encouraging doctors and clinicians to use for tasks including translation and handling sensitive patient data, and has suggested it can be used for diagnostic purposes, 404 Media has learned. Northwell Health has more than 85,000 employees.
An internal presentation and employee chats obtained by 404 Media show how healthcare professionals are using LLMs and chatbots to edit writing, make hiring decisions, do administrative tasks, and handle patient data.
In the presentation, given in August, Rebecca Kaul, senior vice president and chief of digital innovation and transformation at Northwell, along with a senior engineer, discussed the launch of the tool, called AI Hub, and gave a demonstration of how clinicians and researchers, or anyone with a Northwell email address, can use it.
AI Hub "uses [a] generative LLM, used much like any other internal/administrative platform: Microsoft 365, etc. for tasks like improving emails, check grammar and spelling, and summarizing briefs," a spokesperson for Northwell told 404 Media. "It follows the same federal compliance standards and privacy protocols for the tools mentioned on our closed network. It wasn't designed to make medical decisions and is not connected to our clinical databases."
A screenshot from a presentation given to Northwell employees in August, showing examples of "tasks."
But the presentation and materials viewed by 404 Media include leadership saying AI Hub can be used for "clinical or clinical adjacent" tasks, as well as answering questions about hospital policies and billing, writing job descriptions and editing writing, summarizing electronic medical record excerpts, and inputting patients' personally identifying and protected health information. The demonstration also showed potential capabilities that included "detect pancreas cancer" and "parse HL7," a health data standard used to share electronic health records.
The leaked presentation shows that hospitals are increasingly using AI and LLMs to streamline administrative tasks, and that some are experimenting with, or at least considering, how LLMs could be used in clinical settings or in interactions with patients.
A screenshot from a presentation given to Northwell employees in August, showing the ways they can use AI Hub.
In Northwell's internal employee forum, someone asked whether they can use PHI, meaning protected health information that's covered by HIPAA, in AI Hub. "For example we are wondering if we can leverage this tool to write denial appeal letters by copying a medical record excerpt and having AI summarize the record for the appeal," they said. "We are seeing this as being developed with other organizations so just brainstorming this for now."
A business strategy advisor at Northwell responded, "Yes, it is safe to input PHI and PII [Personally Identifiable Information] into the tool, as it will not go anywhere outside of Northwell's walls. It's why we developed it in the first place! Feel free to use it for summarizing EMR [Electronic Medical Record] excerpts as well as other information. As always, please be vigilant about any data you input anywhere outside of Northwell's approved tools."
AI Hub was released in early March 2024, the presenters said, and usage had since spread primarily through word of mouth within the company. By August, more than 3,000 Northwell employees were using AI Hub, they said, and leading up to the demo it was gaining as many as 500 to 1,000 new users a month.
Do you know anything about how your employer is integrating AI into your workplace? I would love to hear from you. Using a non-work device, you can message me securely on Signal at sam.404. Otherwise, send me an email at sam@404media.co.
During the presentation, obtained by 404 Media and given to more than 40 Northwell employees, including physicians, scientists, and engineers, Kaul and the engineer demonstrated how AI Hub works and explained why it was developed. Introducing the tool, Kaul said that Northwell saw examples where external chat systems were leaking confidential or corporate information, and that corporations were banning use of "the ChatGPTs of the world" by employees.
"And as we started to discuss this, we started to say, well, we can't shut [the use of ChatGPT] down if we don't give people something to use, because this is exciting technology, and we want to make the best of it," Kaul said in the presentation. "From my perspective, it's less about being shut down and replaced, but it's more about, how can we harness the capabilities that we have?"
Throughout the presentation, the presenters suggested Northwell employees use AI Hub for things like questions about hospital policies and writing job descriptions or editing writing. At one point, Kaul said "people have been using this for clinical chart summaries." She acknowledged that LLMs are often wrong. "That, as this community knows, is sort of the thing with gen AI. You can't take it at face value out of the box for whatever it is," Kaul said. "You always have to keep reading it and reviewing any of the outputs, and you have to keep iterating on it until you get the kind of output quality that you're looking for if you want to use it for a very specific purpose. And so we'll always keep reinforcing, take it as a draft, review it, and you are accountable for whatever you use."
The tool looks similar to any text-based LLM interface: a text box at the bottom for user inputs, the chatbot's answers in a window above that, and a sidebar showing the users' recent conversations along the left. Users can choose to start a conversation or "launch a task." The examples of tasks presenters gave in their August demo included administrative ones, like summarizing research materials, but also detecting cancer and "parse HL7," which stands for Health Level 7, an international health data standard that allows hospitals to share patient health records and data with each other securely and interoperably.
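Northwell's own "parse HL7" task code isn't public, but the kind of parsing the name refers to can be sketched: HL7 v2 messages are plain text, one segment per line, with pipe-delimited fields. The sample message, field indices, and function below are invented for illustration; real HL7 parsing (escape sequences, repetitions, the MSH segment's special field numbering) is considerably more involved.

```python
# Illustrative sketch only: split an HL7 v2 message into its segments.
# The sample message is fabricated; it is not real patient data.

def parse_hl7(message: str) -> dict:
    """Group pipe-delimited HL7 v2 segments by their segment ID (MSH, PID, ...)."""
    segments = {}
    for line in message.strip().split("\n"):
        fields = line.split("|")
        # fields[0] is the segment ID; the rest are that segment's fields
        segments.setdefault(fields[0], []).append(fields[1:])
    return segments

sample = (
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|202408011200||ADT^A01|123|P|2.5\n"
    "PID|1||PATID1234||DOE^JANE||19800101|F\n"
)

parsed = parse_hl7(sample)
print(parsed["PID"][0][4])  # the patient name field in this sample: DOE^JANE
```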
They can also choose from one of 14 different models to interact with, including Gemini 1.5 Pro, Gemini 1.5 Flash, Claude 3.5 Sonnet, GPT 4 Omni, GPT 4, GPT 4 Omni Mini, Codey, Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, GPT 3.5, PaLM 2, Gemini 1.0 Pro, and MedLM.
MedLM is a Google-developed LLM designed for healthcare. An information box for MedLM in AI Hub calls it "a model possessing advanced clinical knowledge and expertise. Can perform at expert levels on medical tasks, but is still inferior to human clinicians."
A screenshot from a presentation given to Northwell employees in August, showing the different LLMs available to choose from.
Tasks are saved prompts, and the examples the presenters give in the demo include "Break down material" and "Comment My Code," but also include "Detect Pancreas Cancer," with the description "Takes in an abdomen/adb+plv CT/MR and makes and prediction with reasoning about whether or not the report indicates presence or suspicious of pancreatic cancer or pancreatic pre-neoplasia."
Tasks are private to individual users to start, but can be shared to the entire Northwell workforce. Users can submit a task for review by "the AI Hub team," according to text in the demo. At the time of the demo, documents uploaded directly to a conversation or task expired in 72 hours. "However, once we make this a feature of tasks, where you can save a task with a document, make that permanent, that'll be a permanently uploaded document, you'll be able to come back to that task whenever, and the document will still be there for you to use," the senior engineer said.
AI Hub also accepts uploads of photos, audio, videos, and files like PDFs in Gemini 1.5 Pro and Flash, a feature that has been "heavily requested" and is "getting a lot of use," the presenters said. To demonstrate that feature, the senior engineer uploaded a 58-page PDF about how to remotely monitor patients and asked Gemini 1.5 Pro "what are the billing aspects?" which the model summarized from the document.
Another one of the uses Northwell suggests for AI Hub is hiring. In the demo, the engineer uploaded two resumes and asked the model to compare them. Workplaces are increasingly using AI in hiring practices, despite warnings that it can worsen discrimination and systemic bias. Last year, the American Civil Liberties Union wrote that the use of AI poses "an enormous danger of exacerbating existing discrimination in the workplace based on race, sex, disability, and other protected characteristics, despite marketing claims that they are objective and less discriminatory."
At one point in the demo, a radiologist asked a question: "Is there any sort of medical or ethical oversight on the publication of tasks?" They imagined a scenario where someone chooses a task, they said, thinking it does one thing but not realizing it's meant to do another, and receiving inaccurate results from the model. "I saw one that was, 'detect pancreas cancer in a radiology report.' I realize this might be for play right now, but at some point people are going to start to trust this to do medical decision making."
The engineer replied that this is why tasks require a review period before being published to the rest of the network. "That review process is still being developed… Especially for any tasks that are going to be clinical or clinical adjacent, we're going to have clinical input on making sure that those are good to go and that, you know, [they are] as unobjectionable as possible before we roll those out to be available to everybody. We definitely understand that we don't want to just allow people to kind of publish anything and everything to the broader community."
According to a report by National Nurses United, which surveyed 2,300 registered nurses and members of NNU from January to March 2024, 40 percent of respondents said their employer "has introduced new devices, gadgets, and changes to the electronic health records (EHR) in the past year." As with almost every industry around the world, there's a race to adopt AI happening in hospitals, with investors and shareholders promising a healthcare revolution if only networks adopt AI. "We are at an inflection point in AI where we can see its potential to transform health on a planetary scale," Karen DeSalvo, Google Health's chief health officer, said at an event earlier this year for the launch of MedLM's chest x-ray capabilities and other updates. "It seems clear that in the future, AI won't replace doctors, but doctors who use AI will replace those who don't." Some studies show promising results in detecting cancer using AI models, including when used to supplement radiologists' evaluations of mammograms in breast cancer screenings, and early detection of pancreatic cancer.
"Everybody fears that it will release some time for clinicians, and then, instead of improving care, they'll be expected to do more things, and that won't really help"
But patients aren't buying it yet. A 2023 report by Pew Research found that 54 percent of men and 66 percent of women said they would be uncomfortable with the use of AI "in their own health care to do things like diagnose disease and recommend treatments."
A Northwell employee I spoke to about AI Hub told me that as a patient, they would want to know if their doctors were using AI to inform their care. "Given that the chats are monitored, if a clinician uploads a chart and gets a summary, the team monitoring the chat could presumably read that summary, even if they can't read the chart," they said. (Northwell did not respond to a question about who is able to see what information in tasks.)
"This is new. We're still trying to build trust," Vardit Ravitsky, professor of bioethics at the University of Montreal, senior lecturer at Harvard Medical School and president of the Hastings Center, told me in a call. "It's all experimental. For those reasons, it's very possible the patients should know more rather than less. And again, it's a matter of building trust in these systems, and being respectful of patient autonomy and patients' right to know."
Healthcare worker burnout, ostensibly the reason behind automating tasks like hiring, research, writing, and patient intake, as laid out by the AI Hub team in their August demo, is a real and pressing issue. According to industry estimates, burnout could cost healthcare systems at least $4.6 billion annually. And while reports of burnout were down overall in 2023 compared to previous years (during which a global pandemic happened and burnout was at an all-time high), more than 48 percent of physicians "reported experiencing at least one symptom of burnout," according to the American Medical Association (AMA).
"A source of that stress? More than one-quarter of respondents said they did not have enough physicians and support staff. There was an ongoing need for more nurses, medical assistants or documentation assistance to reduce physician workload," said an AMA report based on a national survey of 12,400 responses from physicians across 31 states at 81 health systems and organizations. "In addition, 12.7% of respondents said that too many administrative tasks were to blame for job stress. The lack of support staff, time and payment for administrative work also increases physicians' job stress."
There could be some promise in AI for addressing administrative burdens on clinicians. A recent (albeit small and short) study found that using LLMs for tasks like drafting emails could help with burnout. Studies show that physicians spend between 34 and 55 percent of their work days "creating notes and reviewing medical records in the electronic health record (EHR), which is time diverted from direct patient interactions," and that administrative work includes things like billing documentation and regulatory compliance.
"The need is so urgent," Ravitsky said. "Clinician burnout because of note taking and updating records is a real phenomenon, and the hope is that time saved from that will be spent on the actual clinical encounter, looking at the patient's eyes rather than at a screen, interacting with them, getting more contextual information from them, and they would actually improve clinical care." But this is a double-edged sword: "Everybody fears that it will release some time for clinicians, and then, instead of improving care, they'll be expected to do more things, and that won't really help," she said.
There's also the matter of cybersecurity risks associated with putting patient data into a network, even if it's a closed system.
"I would be uncomfortable with medical providers using this technology without understanding the limitations and risks"
Blake Murdoch, Senior Research Associate at the Health Law Institute in Alberta, told me in an email that if it's an internal tool that's not sending data outside the network, it's not necessarily different from other types of intranet software. "The manner in which it is used, however, would be important," he said.
"Generally we have the principle of least privilege for PHI in particular, whereby there needs to be an operational need to justify accessing a patient's file. Unnecessary layers of monitoring need to be minimized," Murdoch said. "Privacy law can be broadly worded so the monitoring you mention may not automatically constitute a breach of the law, but it could arguably breach the underlying principles and be challenged. Also, some of this could be resolved by automated de-identification of patient information used in LLMs, such as stripping names and assigning numbers, etc. such that those monitoring cannot trace actions in the LLM back to identifiable patients."
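The approach Murdoch describes, stripping names and assigning numbers before text reaches an LLM, can be sketched roughly as below. The class, the names, and the note text are invented for illustration; real de-identification would need to catch far more than exact name matches (dates, addresses, record numbers, and so on), and HIPAA's Safe Harbor method lists 18 categories of identifiers to remove.

```python
# Rough illustration of name-stripping pseudonymization, not a production
# de-identification tool. All names and note text are fabricated.
import itertools

class Pseudonymizer:
    def __init__(self):
        self._counter = itertools.count(1)
        # pseudonym -> real name; this table stays outside the LLM,
        # so monitors of LLM activity see only opaque patient numbers
        self._table = {}

    def strip(self, text: str, names: list[str]) -> str:
        """Replace each known name with a stable opaque pseudonym."""
        for name in names:
            pseudonym = f"PATIENT-{next(self._counter):04d}"
            self._table[pseudonym] = name
            text = text.replace(name, pseudonym)
        return text

p = Pseudonymizer()
note = "Jane Doe reports chest pain; Jane Doe denies allergies."
print(p.strip(note, ["Jane Doe"]))
# Both mentions map to the same pseudonym: PATIENT-0001
```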
As Kaul noted in the AI Hub demo, corporations are in fact banning use of "the ChatGPTs of the world." Last year, a ChatGPT user said his account leaked other people's passwords and chat histories. Multiple federal agencies have blocked the use of generative AI services on their networks, including the Department of Veterans Affairs, the Department of Energy, the Social Security Administration and the Agriculture Department, and the Agency for International Development warned employees not to input private data into public AI systems.
Casey Fiesler, Associate Professor of Information Science at University of Colorado Boulder, told me in a call that while it's good for physicians to be discouraged from putting patient data into the open-web version of ChatGPT, how the Northwell network implements privacy safeguards is important, as is education for users. "I would hope that if hospital staff is being encouraged to use these tools, that there is some significant education about how they work and how it's appropriate and not appropriate," she said. "I would be uncomfortable with medical providers using this technology without understanding the limitations and risks."
There have been several ransomware attacks on hospitals recently, including the Change Healthcare data breach earlier this year that exposed the protected health information of at least 100 million individuals, and a May 8 ransomware attack against Ascension, a Catholic health system comprising 140 hospitals across more than a dozen states, from which hospital staff were still recovering weeks later.
Sarah Myers West, co-executive director of the AI Now Institute, told 404 Media that healthcare worker groups like National Nurses United have been raising the alarm about AI in healthcare settings. "A set of concerns they've raised is that frequently, the deployment of these systems is a pretext for reducing patient facing staffing, and that leads to real harms," she said, pointing to a June Bloomberg report that said a Google AI tool meant to analyze patient medical records missed noting a patient's drug allergies. A nurse caught the omission. West said that along with privacy and security concerns, these kinds of flaws in AI systems have "life or death" consequences for patients.
Earlier this year, a group of researchers found that OpenAI's Whisper transcription tool makes up sentences, the Associated Press reported. The researchers, who presented their work as a conference paper at the 2024 ACM Conference on Fairness, Accountability, and Transparency in June, wrote that many of Whisper's transcriptions were highly accurate, but roughly one percent of audio transcriptions "contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio." The researchers analyzed the Whisper-hallucinated content, and found that 38 percent of those hallucinations "include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority," they wrote. Nabla, an AI copilot tool that recently raised $24 million in a Series B round of funding, uses a combination of Microsoft's off-the-shelf speech-to-text API and a fine-tuned Whisper model. Nabla is already being used in major hospital systems, including at the University of Iowa.
"There are so many examples of these kinds of mistakes or flaws that are compounded by the use of AI systems to reduce staffing, where this hospital system could otherwise just adequately staff their patient beds and lead to better clinical outcomes," West said.