AI use in imaging

Hi all

To those departments that have AI integrated into their workflow (plain chest pathology, stroke, fracture, GI polyp, etc.): do you have a written protocol you could share on how the reporters and the wider hospital referrers deal with the findings generated by the AI package? Thanks in advance.

We have Brainomix, Qure, Gleamer and AIDOC all implemented here, and we’re about to get Aidence Veye too.
Brainomix is just a DICOM SC output into PACS.
Aidence is a GSPS overlay on the original series.
Qure and Gleamer are deployed through the Blackford Platform and both output DICOM SC too as an additional series, but we’re working on HL7 report prioritisation as part of a PACS-based reporting workflow. The idea is to be able to flag an event wherever there’s a positive finding, and then build worklists that prioritise them for reporting first.
AIDOC’s a bit different. There’s no DICOM output to PACS; instead they provide a widget installed on radiologists’ workstations that notifies them of positive findings (incidental PE algorithm). They too support HL7 integration, so as with Qure and Gleamer we’re looking to set up a feed to PACS that flags studies with positive results and then set the duty radiologists’ worklists to populate with those studies.
There are some other nuances around AIDOC - there’s a button in the notification that loads the appropriate studies in PACS (a PowerShell command in our case, though other mechanisms are supported), and we’re going to implement some level of bi-directional integration too, so that when a positive study is loaded for reporting in PACS, the widget automatically displays the heatmap of where it found a PE.

Happy to demo or discuss further :slight_smile:

The way we approached this was first to do a safety risk assessment/hazard log for using the AI in clinical practice. If the AI is used for clinical decision support, we need to consider the human factors of ‘situational awareness’ and ‘automation bias’. There is a risk that referring clinicians and reporters either fail to look at or take account of the AI findings (and a patient comes to harm as a result of the inaction), or over-rely on the AI findings, which we know will be wrong in a percentage of cases (false positives and false negatives), with the patient coming to harm as a result of the subsequent action or inaction.

The protocol for using it in clinical practice then attempts to mitigate those risks.

Reporters and clinicians are used to assimilating and weighting information, some of which is contradictory, and coming to a diagnosis. The risk with AI is that it is something new, and they may not know how much emphasis to place on its findings.

For reporters, our SOP and training is to look at the radiology imaging as you normally would at first - and then check the AI output to see if anything may have been missed. This can reduce the risk of automation bias. We also trained reporters on the false positive and false negative rates to help set expectations - to raise their situational awareness.

For clinicians that’s more difficult to enforce - and referrers may over-rely on the AI findings if they are inexperienced at reading images. This may be mitigated through communication and training; we also include guidance with the AI findings/report stating that this is only decision support and that the formal radiology report should always be taken as the definitive result (if available). If AI findings are being viewed by clinicians without the radiology report (due to reporting backlogs), it may help to reflect the confidence of the AI findings - using words like ‘possible finding’ rather than anything too definitive - and advising them to discuss with radiology if it flags any findings requiring immediate action. It’s important the AI does not circumvent the process of the referring clinician viewing and acknowledging the radiology report (when available).


Where do you get this data from? Is it provided with the software instructions for use?

The statistics can be confusing - and can be misleading as high values on some statistics can be falsely reassuring.

The AI algorithm providers will normally provide the sensitivity, specificity and accuracy of their findings - as that is usually part of their CE mark submission.

From a clinician/reporter perspective, you really want to know: how likely is the AI to miss something, and how likely is it to overcall findings?

A sensitivity of 90% means it will pick up the abnormality 90% of the time, and will therefore miss 10% of cases. That gives users a sense of how likely it is to miss something. Most people find that helpful to know.

Users also find it helpful to know if AI flags something - how likely is it the patient will really have the abnormality (and how likely is it that AI will overcall a finding). That is the positive predictive value (PPV). This may be better described as - If AI says it’s positive - what percentage of the time is it right?

Unfortunately to calculate PPV you also need to know the prevalence of the finding in your population. The AI providers don’t normally publish that as they don’t know your population. You will need to run it on a sample to work that out.

As an example - if the sensitivity for detecting pneumothorax on a chest X-ray is 90% and the specificity is 95%, and if you know from running the AI on chest X-rays that the prevalence of pneumothorax in your population is 3% (so 3% of chest X-rays have a pneumothorax), then you can work out the PPV.

The calculation is PPV = (sensitivity x prevalence) / [ (sensitivity x prevalence) + ((1 – specificity) x (1 – prevalence)) ]
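For anyone who wants to check the arithmetic, here is a minimal sketch of that formula in Python (the function name and the example figures are just illustrative, taken from the pneumothorax example above):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: of the cases the AI flags as positive,
    the fraction that are truly positive."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Worked example: sensitivity 90%, specificity 95%, prevalence 3%
print(f"PPV = {ppv(0.90, 0.95, 0.03):.0%}")  # → PPV = 36%
```

Note how strongly the low prevalence drags the PPV down: even with 95% specificity, the 5% false-positive rate applied to the 97% of normal X-rays swamps the true positives.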

It’s easier to use an online tool, e.g. a ‘Sensitivity and Specificity Calculator’.

In the example above, the PPV works out at about 36%. This means if AI says there is a pneumothorax it will be right about 36% of the time - and wrong about 64% of the time!

That raises the question: why do a test that is wrong more often than it’s right when it flags something? How can that be safe? An A&E doctor could potentially think the AI must be right (the computer says YES) and try to insert a chest drain into someone who doesn’t have a pneumothorax.

The question is then how can you mitigate that risk through training - and still get a positive safety benefit from the AI?

From a training perspective you might then say to clinicians -

“AI does not flag pneumothorax very often (in about 3% of cases) - when it does flag a ‘possible’ pneumothorax, have a second look to check whether you may have missed one. Bear in mind though that AI will only be right in about a third (1 in 3) of cases and will overcall in two thirds (2 in 3) - so if you disagree that’s fine; go with your own judgement. The AI is just there to say have a second look - just in case you missed one.

Also, AI will miss 1 in 10 pneumothoraces - so don’t be falsely reassured if it doesn’t flag one and you think there might be a pneumothorax. Have a second look and ask for advice if you are unsure.”

NB: calculating PPV in this way is an approximation based on the published sensitivity and specificity. In general terms, the less common an abnormality is (the lower the prevalence), the more likely AI is to overcall a finding.

Negative predictive value (NPV) also depends on prevalence - and high values can be misleading. For example, lung cancers are typically diagnosed in only ~0.5% (1 in 200) of GP referral chest X-rays. If you tossed a coin for every chest X-ray, the negative predictive value of ‘tails’ (or heads) for lung cancer would be 99.5%. If anyone wants to buy a coin off me that has a negative predictive value of 99.5% for lung cancer - let me know!
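The coin-toss point is easy to verify with the same style of calculation - a quick sketch in Python (function name mine, using the standard NPV formula):

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: of the cases the test calls negative,
    the fraction that are truly negative."""
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)

# A coin toss is a "test" with 50% sensitivity and 50% specificity.
# With a lung cancer prevalence of 0.5% in GP referral chest X-rays:
print(f"NPV = {npv(0.50, 0.50, 0.005):.1%}")  # → NPV = 99.5%
```

Because almost every X-ray is negative, even a test carrying no information at all looks impressive on NPV - which is exactly the point being made above.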