Leveraging Clinically Relevant Biometric Constraints to Supervise a Deep Learning Model for Accurate Caliper Placement to Obtain Sonographic Measurements of the Fetal Brain
Purpose
To develop a deep learning (DL) system for automated caliper placement to obtain key sonographic measurements of the fetal brain, and to evaluate the effect of clinically relevant biometric constraints and domain-relevant data augmentations on its performance.
Methods
A total of 1192 images (596 transcerebellar [TC] and 596 transventricular [TV]) were retrospectively obtained from 473 mid-trimester ultrasonography (USG) examinations (18–24 weeks; transabdominal scans) at 3 centers (2 tertiary referral centers and 1 routine imaging center) using GE Voluson E8, S10, and P8 USG machines. For all training images, medical expert annotators marked the caliper positions for 4 measurements (TV plane: atrial width [AW]; TC plane: transcerebellar diameter [TCD], nuchal fold thickness [NFT], and cisterna magna size [CMS]) following international guidelines. We trained a U-Net-based DL system on the expert-annotated data to predict the caliper positions automatically, and computed each biometric measurement as the Euclidean distance between its calipers. The DL system was evaluated on an unseen test set of 145 images (145 pregnancies) annotated by 7 experienced clinicians.
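A minimal sketch of the landmark formulation, under assumptions not stated above: the U-Net is taken to output one heatmap per caliper point, the caliper is read off as the heatmap peak, and each image is assumed to have a known isotropic pixel spacing in mm (`pixel_spacing_mm` is a hypothetical parameter). `biometric_constraint_penalty` is a hypothetical illustration of how a measurement-level constraint might be expressed (penalizing the gap between predicted and annotated measurements); it is not presented as the constraint used in this work.

```python
import numpy as np


def decode_calipers(heatmaps: np.ndarray) -> np.ndarray:
    """Return the (row, col) peak of each heatmap.

    heatmaps: array of shape (K, H, W), one channel per caliper point
    (an assumed output format for a U-Net landmark detector).
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)  # peak index per channel
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1).astype(float)


def measurement_mm(p1: np.ndarray, p2: np.ndarray, pixel_spacing_mm: float) -> float:
    """Euclidean distance between two caliper points, converted to mm.

    Assumes isotropic pixel spacing (hypothetical parameter).
    """
    return float(np.linalg.norm(p1 - p2) * pixel_spacing_mm)


def biometric_constraint_penalty(pred_pts, true_pts, pairs, pixel_spacing_mm):
    """Hypothetical constraint term: mean absolute difference (mm) between the
    measurements implied by predicted and annotated calipers.

    pairs: list of (i, j) caliper-index pairs, one per measurement.
    """
    diffs = [
        abs(measurement_mm(pred_pts[i], pred_pts[j], pixel_spacing_mm)
            - measurement_mm(true_pts[i], true_pts[j], pixel_spacing_mm))
        for i, j in pairs
    ]
    return float(np.mean(diffs))


if __name__ == "__main__":
    # Synthetic example: two heatmaps whose peaks lie 30 pixels apart on one row.
    hm = np.zeros((2, 64, 64))
    hm[0, 10, 20] = 1.0
    hm[1, 10, 50] = 1.0
    pts = decode_calipers(hm)
    print(pts.tolist())                          # [[10.0, 20.0], [10.0, 50.0]]
    print(measurement_mm(pts[0], pts[1], 0.5))   # 15.0
```

If a penalty like this were used during training, it would have to be written with the DL framework's tensor operations and a differentiable decoder (for example, a soft-argmax over each heatmap), because a hard argmax provides no useful gradient.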
The mean Euclidean error of each caliper position, the Euclidean error between the DL and clinician values of each biometric measurement (DL vs. 7 clinicians), and absolute agreement (intraclass correlation coefficient [ICC]; two-way random effects; single rater) were used as performance metrics. Additionally, the effect of the clinically relevant constraints and domain-relevant data augmentations was tested across three different architectures to demonstrate the generalizability of the approach.
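The two evaluation metrics above can be sketched as follows. The caliper error is the per-point Euclidean distance between predicted and reference positions, scaled by an assumed pixel spacing (`pixel_spacing_mm` is hypothetical). `icc_2_1` implements the standard two-way random-effects, absolute-agreement, single-rater ICC from ANOVA mean squares; arranging the data as one column for the DL system and one per clinician is an illustrative assumption, not a description of the exact analysis performed.

```python
import numpy as np


def mean_caliper_error_mm(pred_pts, true_pts, pixel_spacing_mm):
    """Mean and SD of the Euclidean distance (mm) between predicted and
    reference caliper positions. Points have shape (N, 2) as (row, col)."""
    diff = np.asarray(pred_pts, dtype=float) - np.asarray(true_pts, dtype=float)
    d = np.linalg.norm(diff, axis=-1) * pixel_spacing_mm
    return float(d.mean()), float(d.std(ddof=1))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-subject means
    col_means = x.mean(axis=0)  # per-rater means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


if __name__ == "__main__":
    # Worked example from Shrout and Fleiss (1979): 6 subjects, 4 raters.
    ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(ratings), 2))  # 0.29
```

Because this ICC form measures absolute agreement, a constant offset between raters (for example, a DL system that systematically over-measures) lowers the ICC even when the rankings agree perfectly.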
Results
The mean Euclidean error across the 4 measurements was 0.88 ± 0.59 mm, and the DL system showed good to excellent agreement with the 7 clinicians. The proposed biometric constraint and domain-relevant data augmentations improved performance by 3% and 6%, respectively, across three different architectures.
Conclusion
Traditional computer vision approaches to automated measurement depend on the quality of an automated segmentation. Our approach removes this dependency by framing the task as a landmark detection problem and predicting the caliper points directly. This avoids the need to prepare expensive segmentation datasets and provides a generalizable, reusable framework for obtaining any measurement directly from landmark points, without developing custom computer vision algorithms. Clinically, we believe that successful translation of the proposed framework could help novice users perform accurate, standardized assessment of fetal brain USG examinations and aid screening for central nervous system (CNS) anomalies.