This retrospective study was approved by the Institutional Review Board of the hospital (Approval No. 2023-P2-074-01), and the requirement for written informed consent was waived.
MRI Datasets
Development dataset: We retrospectively analyzed 1374 multiphase contrast-enhanced liver MRI scans performed at our hospital between May 2020 and February 2021. All scans were from adult patients (≥ 18 years) who underwent MRI for clinical indications using either extracellular non-specific or hepatocyte-specific contrast agents.
External test dataset: The external test cohort (n = 123) was derived from two sources. The primary source was an ongoing multicenter study (ChiCTR**, n = 600, 20 hospitals), from which 100 cases were randomly selected. To supplement the cohort with under-represented cases using gadoxetate disodium, 23 additional cases were included. Of these, 10 cases were from the same multicenter study (but from different patients), and 13 cases were retrospectively collected from two institutions. This resulted in a final test set representing 22 distinct hospitals. Further details on its construction and the distribution of cases across participating centers are provided in the Supplementary Materials (Table S1).
Classification System
The system categorizes the images into 18 classes: 17 diagnostic series (covering all sequences of a standard liver MRI protocol) and one "Others" category for acquisitions outside these 17 classes, typically those with low diagnostic value or unconventional acquisition sequences. The complete classification scheme is detailed in Fig. 1.
Reference Standard
Classification of contrast-enhanced phases, including Early Arterial Phase (EAP), Late Arterial Phase (LAP), Portal Venous Phase (PVP), Delayed Phase (DP), Transitional Phase (TP), and Hepatobiliary Phase (HBP), followed the definitions from the Liver Imaging Reporting and Data System (LI-RADS) v2018 [15]. Non-enhanced sequences were identified based on specific imaging parameters and qualitatively assessed contrast features.
MRI Interpretation
All MRI acquisitions were independently interpreted by two radiologists (with 5 and 10 years of liver imaging experience) for sequence and phase classification. The radiologists based their determinations on image appearance and available DICOM metadata, adhering to the pre-defined reference standard. In cases of disagreement, a third radiologist (with over 15 years of experience) conducted arbitration. Sequences or phases that did not achieve a consensus (≥ 2 radiologists in agreement) after three rounds of evaluation were excluded from the final dataset.
Design of the KDP Framework
An overview of the proposed KDP framework is shown in Fig. 2. The framework integrates two complementary prediction streams: a 3D CNN-based image path and a metadata path utilizing DICOM header attributes. Their outputs are reconciled in a rule-constrained fusion module to produce final classifications across 18 categories.
Raw liver MRI DICOM images were first processed by the Sequence Preprocessing Module, which sequentially performed resizing, equidistant slice sampling, and cropping. The data were then passed through a dual-path architecture. In the image path, preprocessed sequences were fed into a 3D CNN to generate image-based predictions. In the metadata path, key attributes were extracted from DICOM headers (e.g., Series Description, Acquisition Time) to produce metadata-based predictions.
The outputs from both paths were then combined in the Rule-Constrained Fusion Module. In this module, clinically derived rules, together with information from the metadata path, were applied to validate consistency and to correct implausible results. The final classification covered 18 categories, including 17 diagnostic sequences/phases and one “Others” category for non-diagnostic series.
Image Path
The image path was designed to extract spatiotemporal visual features from DICOM pixel data. To mitigate variations introduced by multicenter imaging protocols, all sequences underwent standardized preprocessing, comprising image resizing, equidistant slice sampling, and cropping. Each two-dimensional slice was resized to 256×256 and then cropped to 224×224, and the number of slices was standardized to 12 by equidistant sampling, yielding a fixed input volume of 12×224×224. During training, random cropping was applied to enhance data diversity, whereas center cropping was used during testing to ensure stable predictions.
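The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and a nearest-neighbour resize stands in for whatever interpolation the study actually used.

```python
import numpy as np

def equidistant_sample(volume: np.ndarray, n_slices: int = 12) -> np.ndarray:
    """Select n_slices slices at equal spacing along the depth axis."""
    idx = np.linspace(0, volume.shape[0] - 1, n_slices).round().astype(int)
    return volume[idx]

def resize_nearest(slice_2d: np.ndarray, size: int = 256) -> np.ndarray:
    """Nearest-neighbour resize of one slice (illustrative; the study's
    interpolation method is not specified)."""
    h, w = slice_2d.shape
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    return slice_2d[np.ix_(rows, cols)]

def center_crop(slice_2d: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop the central size x size patch, as used at test time."""
    h, w = slice_2d.shape
    top, left = (h - size) // 2, (w - size) // 2
    return slice_2d[top:top + size, left:left + size]

def preprocess_volume(volume: np.ndarray) -> np.ndarray:
    """Equidistant sampling, then 256x256 resize and 224x224 center crop,
    yielding the fixed 12x224x224 input volume."""
    sampled = equidistant_sample(volume, 12)
    slices = [center_crop(resize_nearest(s, 256), 224) for s in sampled]
    return np.stack(slices)

vol = np.random.rand(40, 320, 320)
print(preprocess_volume(vol).shape)  # (12, 224, 224)
```

At training time the center crop would be replaced by a random 224×224 crop of the 256×256 slice.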
For feature extraction, a lightweight 3D ResNet was adopted as the backbone network, retaining only the first three residual blocks to balance multi-scale feature capture with computational efficiency. Features from different stages were pooled and concatenated, followed by a classification head to predict sequence/phase categories. Training was performed with a cross-entropy loss function and the Adam optimizer, with an initial learning rate of 1e-3 and a learning rate decay strategy. A batch size of 32 was used, and training was conducted in parallel on 8 NVIDIA GPUs.
Metadata Path
The metadata path was designed to leverage semantic information from DICOM metadata to assist sequence classification. Key attributes were extracted from each series, including Series Description, Acquisition Time, and other available sequence-related parameters. These fields were normalized and cleaned to reduce inconsistencies caused by heterogeneous naming conventions and entry practices across centers.
During modeling, the extracted metadata were transformed into structured inputs and encoded to generate preliminary metadata-based predictions. Among these attributes, Series Description provided direct semantic cues, while Acquisition Time played a critical role in distinguishing dynamic contrast-enhanced phases. The metadata-based predictions were finally integrated with the image-based outputs in the rule-constrained fusion module.
Rule-Constrained Fusion Module
The rule-constrained fusion module was designed to integrate predictions from the image and metadata paths. For most sequences, concordant predictions were directly accepted as the final output, whereas in cases where one path was missing or less reliable, the result from the other path was adopted. Rule-based adjudication was selectively applied to specific categories, particularly dynamic contrast-enhanced phases and diffusion-weighted imaging (DWI), where metadata attributes such as Acquisition Time and diffusion b-value were incorporated to validate and refine predictions. For dynamic phases, temporal ordering constraints were imposed to ensure physiologic plausibility (e.g., the pre-contrast phase (PRE) must precede the EAP) and to prevent mutually incompatible phases (e.g., DP and TP) from coexisting within a single examination. For DWI, the diffusion b-value extracted from the DICOM metadata was used to further categorize sequences into low- and high-b-value groups. This selective constraint strategy preserved flexibility for general sequence integration while ensuring logical consistency in critical categories.
Model Evaluation and Statistical Analysis
The performance of the proposed approach was rigorously evaluated on both internal and external test sets by comparing its predictions against the radiologist-adjudicated reference standard. We computed standard classification metrics, including accuracy, precision, sensitivity (recall), specificity, and F1-score, from the resulting confusion matrix. The macro-averaged F1-score was selected as the primary metric for comprehensive performance assessment because it balances precision and recall across multiple imbalanced classes. To quantify the specific contribution of integrating DICOM metadata and clinical rules, we compared the full KDP framework against an ablated image-only model on the subset of 12 sequence categories common to both pathways (the remaining 6 categories relied exclusively on the DICOM metadata pathway and were excluded from this comparative analysis). The statistical significance of performance differences was assessed using McNemar's test with a two-sided significance threshold of α = 0.05. All analyses were performed using Python 3.12.
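For concreteness, the primary metric and the significance test can be sketched in plain Python. This is an illustrative re-implementation (the study's analyses may well have used standard libraries); the exact two-sided McNemar test shown here evaluates the binomial probability of the discordant pair counts.

```python
from math import comb

def macro_f1(y_true: list, y_pred: list, classes: list) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts:
    b = cases model A got right and model B wrong, c = the reverse.
    Under H0, min(b, c) ~ Binomial(b + c, 0.5)."""
    n, k = b + c, min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, p)

print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"]))
print(mcnemar_exact(1, 5))  # 0.21875
```

Only the paired-disagreement cells enter the McNemar test, which is why it is appropriate for comparing two models evaluated on the same test cases.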