Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study.
Robinson R., Valindria VV., Bai W., Oktay O., Kainz B., Suzuki H., Sanghvi MM., Aung N., Paiva JM., Zemrak F., Fung K., Lukaschuk E., Lee AM., Carapella V., Kim YJ., Piechnik SK., Neubauer S., Petersen SE., Page C., Matthews PM., Rueckert D., Glocker B.
BACKGROUND: The trend towards large-scale studies, including population imaging, poses new challenges for quality control (QC). This is a particular issue when automatic processing tools such as image segmentation methods are used to derive quantitative measures or biomarkers for further analyses. Manual inspection and visual QC of every segmentation result is not feasible at large scale. However, it is important to detect automatically when a segmentation method fails, so that erroneous measurements are not included in subsequent analyses, where they could lead to incorrect conclusions. METHODS: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy (RCA), which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale, manually annotated set of 4800 cardiovascular magnetic resonance (CMR) scans. We then apply our method to a large cohort of 7250 CMR scans on which we have performed manual QC. RESULTS: We report results for predicting segmentation quality metrics, including the Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy in classifying low- and high-quality segmentations using the predicted DSC scores. As further validation, we show high correlation between real and predicted scores and 95% classification accuracy on 4800 scans for which manual segmentations were available. We mimic real-world application of the method on 7250 CMR scans, where we show good agreement between predicted quality metrics and manual visual QC scores. CONCLUSIONS: We show that Reverse Classification Accuracy has the potential to provide accurate, fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging, as in the UK Biobank Imaging Study.
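The Python sketch below illustrates the quantities named in the abstract. It assumes the common RCA formulation: the segmentation of a new image is treated as pseudo ground truth to train a "reverse classifier", that classifier is applied to reference images with known manual labels, and the best DSC it achieves is taken as the predicted quality of the original segmentation. The reverse classifier itself is not implemented here; its outputs are passed in. The function names and the 0.7 failure threshold are hypothetical choices for illustration, not values taken from the study.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A n B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def rca_predicted_dsc(reverse_predictions, reference_labels) -> float:
    """RCA-style quality estimate (assumed formulation).

    reverse_predictions[i] is the reverse classifier's segmentation of reference
    image i; reference_labels[i] is that image's manual ground truth. The
    predicted DSC for the new segmentation is the maximum over the reference set.
    """
    scores = [dice(p, g) for p, g in zip(reverse_predictions, reference_labels)]
    return max(scores)


def is_failed(predicted_dsc: float, threshold: float = 0.7) -> bool:
    """Flag a segmentation as failed when its predicted DSC falls below a
    hypothetical threshold (0.7 is illustrative only)."""
    return predicted_dsc < threshold


# Toy 4x4 masks: a covers 4 pixels, b covers 6 pixels, and they overlap on 4.
a = np.zeros((4, 4), dtype=np.uint8)
a[1:3, 1:3] = 1
b = np.zeros((4, 4), dtype=np.uint8)
b[1:3, 1:4] = 1

toy_dsc = dice(a, b)                            # 2*4 / (4 + 6) = 0.8
toy_rca = rca_predicted_dsc([b, a], [a, a])     # max(0.8, 1.0) = 1.0
```

Taking the maximum over the reference set reflects the RCA intuition that a good segmentation should yield a reverse classifier that works well on at least one similar reference image; a poor segmentation should not.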