This retrospective study was approved by the Institutional Review Board of the hospital (Approval No. 2023-P2-074-01), and the requirement for written informed consent was waived.
MRI Datasets
Development dataset: We retrospectively analyzed 1374 multiphase contrast-enhanced liver MRI scans performed at our hospital between May 2020 and February 2021. All scans were from adult patients (≥ 18 years) who underwent MRI for clinical indications using either extracellular non-specific or hepatocyte-specific contrast agents.
External test dataset: The external test cohort (n = 123) was derived from two sources. The primary source was an ongoing multicenter study (ChiCTR**, n = 600, 20 hospitals), from which 100 cases were randomly selected. To supplement the cohort with under-represented cases using gadoxetate disodium, 23 additional cases were included. Of these, 10 cases were from the same multicenter study (but from different patients), and 13 cases were retrospectively collected from two institutions. This resulted in a final test set representing 22 distinct hospitals. Further details on its construction and the distribution of cases across participating centers are provided in the Supplementary Materials (Table S1).
Classification System
The system categorizes the images into 18 classes: 17 diagnostic series (covering all sequences of a standard liver MRI protocol) and one "Others" category for acquisitions outside these 17 classes, typically those with low diagnostic value or unconventional acquisition sequences. The complete classification scheme is detailed in Fig. 1.
Reference Standard
Classification of contrast-enhanced phases, including Early Arterial Phase (EAP), Late Arterial Phase (LAP), Portal Venous Phase (PVP), Delayed Phase (DP), Transitional Phase (TP), and Hepatobiliary Phase (HBP), followed the definitions from the Liver Imaging Reporting and Data System (LI-RADS) v2018 [15]. Non-enhanced sequences were identified based on specific imaging parameters and qualitatively assessed contrast features.
MRI Interpretation
All MRI acquisitions were independently interpreted by two radiologists (with 5 and 10 years of liver imaging experience) for sequence and phase classification. The radiologists based their determinations on image appearance and available DICOM metadata, adhering to the pre-defined reference standard. In cases of disagreement, a third radiologist (with over 15 years of experience) conducted arbitration. Sequences or phases that did not achieve a consensus (≥ 2 radiologists in agreement) after three rounds of evaluation were excluded from the final dataset.
Design of the KDP Framework
An overview of the proposed KDP framework is shown in Fig. 2. The framework integrates two complementary prediction streams: a 3D CNN-based image path and a metadata path utilizing DICOM header attributes. Their outputs are reconciled in a rule-constrained fusion module to produce final classifications across 18 categories.
Raw liver MRI DICOM images were first processed by the Sequence Preprocessing Module, which sequentially performed resizing, equidistant slice sampling, and cropping. The data were then passed through a dual-path architecture. In the image path, preprocessed sequences were fed into a 3D CNN to generate image-based predictions. In the metadata path, key attributes were extracted from DICOM headers (e.g., Series Description, Acquisition Time) to produce metadata-based predictions.
The outputs from both paths were then combined in the Rule-Constrained Fusion Module. In this module, clinically derived rules, together with information from the metadata path, were applied to validate consistency and to correct implausible results. The final classification covered 18 categories, including 17 diagnostic sequences/phases and one “Others” category for non-diagnostic series.
Image Path
The image path was designed to extract spatiotemporal visual features from DICOM pixel data. To mitigate variations introduced by multicenter imaging protocols, all sequences underwent standardized preprocessing, comprising image resizing, equidistant slice sampling, and cropping. Each two-dimensional slice was resized to 256×256 and then cropped to 224×224, and the number of slices was standardized to 12 by equidistant sampling, yielding a fixed input volume of 12×224×224. During training, random cropping was applied to enhance data diversity, whereas center cropping was used during testing to ensure stable predictions.
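The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and a nearest-neighbour resize stands in for whatever interpolation the study actually used.

```python
import numpy as np

def equidistant_sample(volume: np.ndarray, n_slices: int = 12) -> np.ndarray:
    """Select n_slices slices at equal spacing along the depth axis."""
    idx = np.linspace(0, volume.shape[0] - 1, n_slices).round().astype(int)
    return volume[idx]

def resize_nearest(slice_2d: np.ndarray, size: int = 256) -> np.ndarray:
    """Nearest-neighbour resize of one slice (illustrative; the study's
    interpolation method is not specified)."""
    h, w = slice_2d.shape
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    return slice_2d[np.ix_(rows, cols)]

def center_crop(slice_2d: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop the central size x size patch, as used at test time."""
    h, w = slice_2d.shape
    top, left = (h - size) // 2, (w - size) // 2
    return slice_2d[top:top + size, left:left + size]

def preprocess_volume(volume: np.ndarray) -> np.ndarray:
    """Equidistant sampling, then 256x256 resize and 224x224 center crop,
    yielding the fixed 12x224x224 input volume."""
    sampled = equidistant_sample(volume, 12)
    slices = [center_crop(resize_nearest(s, 256), 224) for s in sampled]
    return np.stack(slices)

vol = np.random.rand(40, 320, 320)
print(preprocess_volume(vol).shape)  # (12, 224, 224)
```

At training time the center crop would be replaced by a random 224×224 crop of the 256×256 slice.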
For feature extraction, a lightweight 3D ResNet was adopted as the backbone network, retaining only the first three residual blocks to balance multi-scale feature capture with computational efficiency. Features from different stages were pooled and concatenated, followed by a classification head to predict sequence/phase categories. Training was performed with a cross-entropy loss function and the Adam optimizer, with an initial learning rate of 1e-3 and a learning rate decay strategy. A batch size of 32 was used, and training was conducted in parallel on 8 NVIDIA GPUs.
Metadata Path
The metadata path was designed to leverage semantic information from DICOM metadata to assist sequence classification. Key attributes were extracted from each series, including Series Description, Acquisition Time, and other available sequence-related parameters. These fields were normalized and cleaned to reduce inconsistencies caused by heterogeneous naming conventions and entry practices across centers.
During modeling, the extracted metadata were transformed into structured inputs and encoded to generate preliminary metadata-based predictions. Among these attributes, Series Description provided direct semantic cues, while Acquisition Time played a critical role in distinguishing dynamic contrast-enhanced phases. The metadata-based predictions were finally integrated with the image-based outputs in the rule-constrained fusion module.
Rule-Constrained Fusion Module
The rule-constrained fusion module was designed to integrate predictions from the image and metadata paths. For most sequences, concordant predictions were directly accepted as the final output, whereas in cases where one path was missing or less reliable, the result from the other path was adopted. Rule-based adjudication was selectively applied to specific categories, particularly dynamic contrast-enhanced phases and diffusion-weighted imaging (DWI), where metadata attributes such as Acquisition Time and diffusion b-value were incorporated to validate and refine predictions. For dynamic phases, temporal ordering constraints were imposed to ensure physiologic plausibility (e.g., the pre-contrast phase (PRE) must precede the EAP) and to prevent mutually incompatible phases (e.g., DP and TP) from coexisting within a single examination. For DWI, the diffusion b-value extracted from the DICOM metadata was used to further categorize sequences into low- and high-b-value groups. This selective constraint strategy preserved flexibility for general sequence integration while ensuring logical consistency in critical categories.
Model Evaluation and Statistical Analysis
The performance of the proposed approach was rigorously evaluated on both internal and external test sets by comparing its predictions against the radiologist-adjudicated reference standard. We computed standard classification metrics, including accuracy, precision, sensitivity (recall), specificity, and F1-score, from the resulting confusion matrix. The macro-averaged F1-score was selected as the primary metric for comprehensive performance assessment because it balances precision and recall across multiple imbalanced classes. To quantify the specific contribution of integrating DICOM metadata and clinical rules, we compared the full KDP framework against an ablated image-only model on the subset of 12 sequence categories common to both pathways (the remaining 6 categories relied exclusively on the DICOM metadata pathway and were excluded from this comparative analysis). The statistical significance of performance differences was assessed using McNemar's test with a two-sided significance threshold of α = 0.05. All analyses were performed using Python 3.12.
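For concreteness, the primary metric and the significance test can be sketched in plain Python. This is an illustrative re-implementation (the study's analyses may well have used standard libraries); the exact two-sided McNemar test shown here evaluates the binomial probability of the discordant pair counts.

```python
from math import comb

def macro_f1(y_true: list, y_pred: list, classes: list) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts:
    b = cases model A got right and model B wrong, c = the reverse.
    Under H0, min(b, c) ~ Binomial(b + c, 0.5)."""
    n, k = b + c, min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, p)

print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"]))
print(mcnemar_exact(1, 5))  # 0.21875
```

Only the paired-disagreement cells enter the McNemar test, which is why it is appropriate for comparing two models evaluated on the same test cases.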