Detection and Characterization of Endplate Structural Defects on CT: A Diagnostic Accuracy Study

Spine (Phila Pa 1976). 2024 Jan 29. doi: 10.1097/BRS.0000000000004936. Online ahead of print.

Abstract

Study design: Diagnostic test study.

Objective: To determine the reliability and validity (diagnostic accuracy) of two previously described endplate structural defect (EPSD) assessment methods.

Summary of background data: Studies of EPSD may further the understanding of pathoanatomical mechanisms underlying back pain. However, clinical imaging methods used to document EPSD have not been validated, leaving uncertainty about what the observations represent.

Methods: Using an evaluation manual, two experienced radiologists and a novice independently assessed 418 endplates for EPSD on sagittal CT slices from 19 embalmed cadavers (9 men and 10 women, aged 62-91 y), applying both methods. The corresponding micro-CT (µCT) scans of the harvested T7-S1 spines were assessed by a further independent rater with excellent intra-rater reliability (Kappa=0.96).
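Reliability in this study is reported as kappa, which corrects raw agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The abstract does not state which kappa statistic was used (for example, pairwise Cohen's kappa or a multi-rater variant such as Fleiss' kappa). The following is a minimal sketch assuming unweighted Cohen's kappa between two raters; the function name `cohens_kappa` and the ratings are hypothetical illustrations, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration; assumes the study's kappa is of
    this two-rater, unweighted form, which the abstract does not confirm.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, product of the two raters' marginal
    # frequencies, summed over labels (Counter returns 0 for unseen labels).
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical EPSD presence calls (1 = present, 0 = absent) on 10 endplates.
rater_1 = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.583
```

In this toy example the raters agree on 8 of 10 endplates (p_o = 0.80), but because both call about 40% of endplates positive, chance agreement is already 0.52, so kappa is only about 0.58. The same function accepts phenotype labels (strings), which is how a phenotype-level kappa can be lower than a presence-level kappa on the same reads.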

Results: Inter-rater reliability was good for EPSD presence (Kappa=0.60-0.69) and fair for specific EPSD phenotypes (Kappa=0.43-0.58). Erosion, for which the Brayda-Bruno classification lacked a category, was mostly (82.8%) classified as wavy/irregular, while many notched defects (n=15, 46.9%) and Schmorl's nodes (n=45, 79%) were recorded as focal defects using Feng's classification. Compared with µCT, endplate fractures (n=53) and corner defects (n=28) were routinely missed on CT. Endplates classified as wavy/irregular on CT corresponded on µCT to erosion (n=29, 21.2%), jagged defects (n=21, 15.3%), calcification (n=19, 13.9%), and other phenotypes. Some focal defects on CT represented endplate fractures on µCT (n=21, 27.6%). Overall, for the presence of an EPSD, sensitivity and specificity were 70.9% and 79.1% using Feng's method, and 79.5% and 57.5% using Brayda-Bruno's method. Inter-rater reliability for defect dimensions was poor to fair (Kappa=0.26-0.47).
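Sensitivity and specificity here treat the µCT reading as the reference standard for EPSD presence on each endplate: sensitivity is the share of µCT-positive endplates also called positive on CT, and specificity is the share of µCT-negative endplates called negative on CT. The following is a minimal sketch assuming one binary present/absent call per endplate from CT and from µCT; the abstract does not describe how reads from the three CT raters were combined, and the function `diagnostic_accuracy` and the example reads are hypothetical.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float


def diagnostic_accuracy(ct_calls: list[bool], reference: list[bool]) -> Accuracy:
    """Sensitivity and specificity of CT calls against a reference standard.

    Hypothetical helper; `reference` stands in for the µCT reading of each endplate.
    """
    if len(ct_calls) != len(reference):
        raise ValueError("each endplate needs both a CT call and a reference call")
    pairs = list(zip(ct_calls, reference))
    tp = sum(ct and ref for ct, ref in pairs)              # EPSD on CT and on µCT
    fn = sum((not ct) and ref for ct, ref in pairs)        # EPSD on µCT, missed on CT
    tn = sum((not ct) and (not ref) for ct, ref in pairs)  # no EPSD on either
    fp = sum(ct and (not ref) for ct, ref in pairs)        # called on CT, absent on µCT
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both reference-positive and reference-negative endplates")
    return Accuracy(sensitivity=tp / (tp + fn), specificity=tn / (tn + fp))


# Hypothetical reads for 10 endplates: first five are EPSD-positive on µCT.
mu_ct = [True] * 5 + [False] * 5
ct = [True, True, True, False, False, True, False, False, False, False]
print(diagnostic_accuracy(ct, mu_ct))  # Accuracy(sensitivity=0.6, specificity=0.8)
```

A method that labels more endplates as defective tends to gain sensitivity (fewer missed defects) at the cost of specificity (more false calls); the higher sensitivity and lower specificity reported for Brayda-Bruno's method relative to Feng's is consistent with that trade-off, although the abstract does not report the underlying counts.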

Conclusion: Both methods showed good inter-rater reliability and evidence of criterion validity for assessing EPSD presence. However, neither method included all the EPSD phenotypes needed for optimal sensitivity, and specific phenotypes were often misclassified.