3 Learnings about Human-Centered AI Principles from Designing a Breast Cancer Screening Platform for Radiologists

Dipti Ganeriwala · Published in AIxDESIGN · Apr 26, 2021 · 10 min read


PSA: It is a key value for us at AIxDESIGN to open-source our work and research. The forced paywalls here have led us to stop using Medium so while you can still read the article below, future writings & resources will be published on other platforms. Learn more at aixdesign.co or come hang with us on any of our other channels. Hope to see you there 👋

Design for AI has its own set of principles, and a lot can be said about who or what should determine these rules. Some like to consider AI a design material, while others talk about a human-centered AI design philosophy. There are also numerous discussions on the ethical aspects of harnessing AI technology and the role of designers in ensuring a safe human-AI collaboration. These broader topics feed directly into my work as an AI designer.

In this article I will delve into a very specific use case of AI, the AI-radiologist collaboration during breast cancer screening, and show how some of these principles can be applied in the real world. I would like to point out that the challenges discussed here are complex and multi-dimensional, and solving them required the collaborative effort of the entire team at Vara.

Before getting started, it is important to explain the breast cancer screening use case to provide more context. Breast cancer is the most common cancer, and 1 out of 8 women will be diagnosed with it in their lifetime. Screening has been shown to reduce mortality by up to 25%, which is why countries such as Germany, the USA, the UK, and the Netherlands have nationwide screening programs in place. Women within a certain age group (e.g. 50–70 years old) go for a periodic scan, either yearly or every two years.

However, human expertise alone will not be able to match the ever-increasing workload. Last year alone, 300+ million mammograms were conducted worldwide, and this number will only increase. Because of these resource constraints, the world is looking to AI technology to augment the screening workflow, maximise efficiency, and therefore save more lives. AI can also support radiologists in finding cancers by displaying regions of interest, thus preventing missed cancers.

How does AI augment the radiological workflow?

Triaging — Studies that the algorithm assesses as showing no cancer are classified as normal (the bulk of all studies, around 97%), and their reports are fully pre-filled. Even so, the radiologist still has to sign off.

Triaging classifies studies that have no cancer as “normal”
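To make the triaging step concrete, here is a minimal sketch in Python of what a threshold-based triage decision could look like. The names, data shapes, and threshold value are my own illustrative choices, not Vara's actual implementation; the one invariant taken from the text is that the radiologist always signs off.

```python
from dataclasses import dataclass


@dataclass
class TriageDecision:
    label: str              # "normal" or "needs_full_read"
    report_prefilled: bool  # normal studies get a fully pre-filled report
    requires_signoff: bool  # the radiologist always has to sign off


def triage(suspicion_score: float, normal_threshold: float = 0.02) -> TriageDecision:
    """Classify a study as 'normal' when the model's suspicion score is below
    the configured threshold, and pre-fill the report in that case."""
    if suspicion_score < normal_threshold:
        return TriageDecision("normal", report_prefilled=True, requires_signoff=True)
    return TriageDecision("needs_full_read", report_prefilled=False, requires_signoff=True)
```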

Safety Net — This tool does exactly what the name suggests: it lets radiologists go through the entire case on their own, and only after they have created a report does the safety net alert them, in case the algorithm has spotted something suspicious that they may have missed.

The Safety Net alerts the radiologist in case they may have missed a finding
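As a sketch of the safety-net behaviour described above: the alert only fires after the radiologist has completed their own read, and only when the model strongly disagrees with a "normal" assessment. The parameter names and the alert threshold are hypothetical, not taken from the product.

```python
def safety_net_should_alert(report_complete: bool,
                            radiologist_assessment: str,
                            model_suspicion: float,
                            alert_threshold: float = 0.9) -> bool:
    """Alert only after the radiologist has finished their own report, and only
    when the model is highly suspicious about a study assessed as normal."""
    if not report_complete:
        return False  # never interrupt the independent read
    if model_suspicion < alert_threshold:
        return False
    return radiologist_assessment == "normal"
```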

While these are some ways AI augments the screening workflow, it is equally important to highlight the UX work that is integrated into the product in order to provide a seamless, safe, and valuable experience for the human user, thus optimising the AI-human collaboration.

1. Safety

Medical safety is the topic I'd like to highlight first, simply because it is so important. The first point of discussion has to be computer-aided detection (CAD). CAD systems have had FDA approval since as early as 1998, and 91% of radiology centers in the US employ CAD for mammography screening. Despite very promising results in laboratory settings, in actual clinical use CAD not only increased the rate of false positives, and therefore recalls, but in some cases also decreased sensitivity, leading to missed cancers. CAD could lull the novice reader into a false sense of security.

This article is a great resource for understanding the crux of the medical AI safety topic; my attempt here is to make the discussion more specific to breast cancer screening.

Biases: we all have them

When talking about safety it is imperative to talk about biases. While there is a whole range of biases that we as humans are susceptible to, the two most prominent in this use case are automation bias and anchoring bias.

Automation bias is the tendency for radiologists using computer-aided decision support to over-rely on the software for the diagnosis and ignore their own judgement.

Anchoring bias is the tendency to fixate on salient evidence encountered in the early stages of a case and let it drive the final diagnosis. Anchoring bias can also be heuristic in nature.

Such biases can wreak havoc, especially in a setting like mammography screening, either by increasing the rate of follow-up examinations or by causing cancers to be missed, both of which cause direct patient harm. While studies are still inconclusive on the best method to address these biases, it is important to make the human aware of them (debiasing) and to remain vigilant during training.

This was a constant topic of discussion while designing the product, and it shaped several aspects of it.

Always communicating honestly — Even if our algorithm is 99% sure that a study has been classified as “normal” correctly, there is still a 1% chance of error. This is important to address; placing the message “the assessment has been pre-filled by the AI. Please review the report and make changes if necessary” in every study, and reiterating this during user training, helps cover all bases.

The notification gently reminds the user that they should review the report created by the AI and make changes if necessary

When in doubt, hand over to the human — There are cases where a study's information or images don't match the standard format (for example, the images come from different manufacturers or there is an uneven number of images), or where the algorithm(s) fail. In such cases, we prefer to hand control to the radiologists and tell them explicitly that the study could not be assessed, because an unexplained absence of a prediction could otherwise bias them into thinking the study may be suspicious.

The phrasing of the notification is equally important. In this case, we chose “This study could not be assessed by Vara’s algorithm due to an internal error.” and kept the message intentionally vague so as not to cause anchoring bias. For example, if we had communicated that the triaging algorithm had failed, it might have biased the user into thinking that the study was definitely suspicious. This is one situation where UX writing plays a crucial role in reinforcing safety within a product.

The notification that alerts the user that the study must be read independently of ML
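A minimal sketch of this fail-safe handover, assuming a hypothetical format check and a callable that runs the models: the neutral message text is the one quoted above, everything else (field names, the four-view check) is illustrative.

```python
from typing import Callable

COULD_NOT_ASSESS_MSG = (
    "This study could not be assessed by Vara's algorithm due to an internal error."
)


def has_standard_format(study: dict) -> bool:
    # Illustrative check only, e.g. expect the standard four screening views.
    return len(study.get("images", [])) == 4


def assess_study(study: dict, run_models: Callable[[dict], dict]) -> dict:
    """Return a prediction when possible; otherwise hand over to the human with
    an intentionally neutral message, so the absence of a prediction does not
    anchor the reader towards 'suspicious'."""
    if not has_standard_format(study):
        return {"handover": True, "message": COULD_NOT_ASSESS_MSG}
    try:
        return {"handover": False, "prediction": run_models(study)}
    except Exception:
        # Deliberately does not say which algorithm failed.
        return {"handover": True, "message": COULD_NOT_ASSESS_MSG}
```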

Timing of displaying predictions — While there is value in showing the region of interest at the beginning of the reporting process, we apply a human-first approach: we let the radiologist go through their workflow uninterrupted before alerting them to anything suspicious.

The safety net showing the region of interest (ROI)

Two reads are better than one — In most countries, mammography screening requires two separate reads by two different radiologists. This ensures safety and a better overall outcome. To fit this setting, we support one read with machine learning and keep the other read independent.

Mammography screening setting showing one radiologist working with AI and one independent of AI
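In code, that double-reading arrangement could be modelled roughly as below. The data shapes and reader identifiers are illustrative; the only constraint taken from the text is that exactly one of the two reads is combined with machine learning.

```python
def assign_reads(study_id: str, reader_a: str, reader_b: str) -> list[dict]:
    """One read is supported by machine learning, the other stays independent."""
    return [
        {"study": study_id, "reader": reader_a, "ml_support": True},
        {"study": study_id, "reader": reader_b, "ml_support": False},
    ]
```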

2. Trust

“Imagine finding yourself in a hospital where you didn’t know why you were admitted, what procedures you were going to receive, whose judgments were being followed. Now imagine being a doctor in this hospital, not knowing which drugs you were administering, which patients you were treating, what conditions you were testing them for. This is, in a Kafka-esque way, our current digital world where AI drives our clicks and purchases, our dating lives, our eating choices, our newsfeeds, our opinions, and soon, our physical bodies. Design can bring clarity, intuition, and usability to these kinds of experiences.”

Lingua Franca: A Design Language for Human-Centered AI

Once we’ve established that we can provide a safe product for an AI-human collaboration, what are the measures we can take to help the radiologists trust the system more? Because only then can the true potential of such a combination be harnessed.

Being transparent — Making transparency a default design principle: we provide a dashboard view where users can compare their own performance to the algorithm’s on various metrics. This helps our users know that we are honest.

Concept dashboard view to enable radiologists to check their performance alongside the performance of the algorithm
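A small sketch of the side-by-side comparison the dashboard concept implies: for every metric both sides have, the radiologist's value and the algorithm's value are paired up. The metric names and numbers below are illustrative, not actual product metrics.

```python
def performance_comparison(radiologist: dict, algorithm: dict) -> dict:
    """Pair up the radiologist's and the algorithm's values for every metric
    they have in common, so each dashboard row reads side by side."""
    return {
        metric: {"radiologist": radiologist[metric], "algorithm": algorithm[metric]}
        for metric in radiologist.keys() & algorithm.keys()
    }


# Example with illustrative numbers:
# performance_comparison({"sensitivity": 0.86, "recall_rate": 0.040},
#                        {"sensitivity": 0.89, "recall_rate": 0.045})
```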

Managing expectations — Users will build mental models of the system anyway, so we help them build the correct ones. During every interaction with our users, and especially during user training, we let them know what the algorithm is and isn’t capable of. This increases trustworthiness through social transitivity: instead of leaving radiologists to believe whatever they read or hear about AI, we stay in direct contact with them and, with a solid onboarding process, can direct the conversation to strengthen the AI-human collaboration.

Using social transitivity to increase trustworthiness (Source: MLUX Paris x Octo — Trust in Intelligent Systems, Augustin Grimprel)

A system that fails safely — As mentioned earlier, whenever a case is unusual or the algorithm(s) run into errors, the study is always handed over to the human, because “I don’t know” is always better than a wrong answer.

3. Efficiency

Efficiency is essential in the high-volume screening setting, where a radiologist can read up to 200 studies a day. This is where the AI-radiologist combination can truly shine. If the first two boxes (safety and trust) are ticked, we can then think about optimising for efficiency.

One size doesn’t fit all — Following the human-centered philosophy, we recognised that every radiologist has a different skillset, which means the algorithm can be adjusted to the needs of individual screening units to arrive at the best possible outcome. We therefore provide customised threshold settings for every unit based on an initial evaluation, which can then be reviewed and adapted regularly.
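A sketch of what such per-unit configuration could look like, assuming hypothetical unit identifiers and illustrative numbers; the idea from the text is simply that each screening unit gets its own thresholds, set after an initial evaluation and then reviewed regularly.

```python
# Default thresholds used before a unit has been evaluated (illustrative values).
DEFAULT_THRESHOLDS = {"normal_threshold": 0.02, "safety_net_threshold": 0.90}

# Per-unit settings, established after an initial evaluation and adapted over time.
UNIT_THRESHOLDS = {
    "unit_a": {"normal_threshold": 0.02, "safety_net_threshold": 0.90},
    "unit_b": {"normal_threshold": 0.03, "safety_net_threshold": 0.85},
}


def thresholds_for(unit_id: str) -> dict:
    """Look up a screening unit's thresholds, falling back to the defaults."""
    return UNIT_THRESHOLDS.get(unit_id, DEFAULT_THRESHOLDS)
```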

Streamlining the AI-human interaction — By adding a “normal” tag directly in the worklist (the list in which all unread studies appear) and letting users filter the worklist with tabs, we followed a one-click interaction approach. With a single click, the user can switch into normal mode and read all the studies classified as normal by the algorithm together. Users may prefer this, for example, when they are feeling more fatigued.

The tabs functionality lets users select which mode they would like to operate in
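A minimal sketch of the one-click "normal" tab, assuming studies are represented as simple dictionaries with a read flag and an AI tag; the field names are illustrative, not the product's data model.

```python
def worklist_tab(studies: list[dict], mode: str = "all") -> list[dict]:
    """Filter the unread worklist: 'normal' mode shows only the studies the
    algorithm tagged as normal, 'all' shows every unread study."""
    unread = [s for s in studies if not s.get("read", False)]
    if mode == "normal":
        return [s for s in unread if s.get("ai_tag") == "normal"]
    return unread
```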

Identifying user pain points (in this case, fatigue from a long day of work) and combining them with the strengths of the AI technology can help maximise productivity.

Dashboard — The dashboard’s functionality extends beyond transparency: it also helps radiologists find the most efficient way of working with the algorithm by understanding their own strengths and weaknesses as well as the algorithm’s. It also gives the head of the screening unit an overview of the team’s performance, so that the algorithm’s threshold settings can be adjusted accordingly.

Conclusion

With the increasing performance of the algorithms, many new features will become possible in the future. Beyond following the principles of safety, trust, and efficiency, it is also important to choose the task you want the AI to solve wisely, to know the quality requirements that the technology must fulfil, to understand its current limitations, and to time features based on the requirements of the users and the market (in our case, the patients).

A human-centric approach, managing expectations of what an AI system can and can’t do, de-biasing the human, and making sure a medically safe environment is fostered are some of the key takeaways from this journey.

Resources

AI Is Your New Design Material: Discovering the critical role of UX and product design in AI, which is set to define the next era of digital products.

Lingua Franca: A brief guide to designing human-centered AI.

Katalinic et al. (2019): Breast cancer incidence and mortality before and after implementation of the German mammography screening program.

A. Kohli (2018): Why CAD Failed in Mammography, Journal of the American College of Radiology.

Medical AI Safety: We have a problem.

Cognitive Bias in Diagnostic Radiology

MLUX Paris x Octo — Trust in Intelligent Systems, Augustin Grimprel

The article covers challenges that took the collaborative effort of the entire team at Vara to solve; I've only summarised the learnings from my own perspective.

Thanks to Maximilian Brandstätter for his contributions. All image screenshots in the post use dummy patient data.
