4.1 Synergistic Effects of Multi-Source Data Fusion and Algorithm Performance Comparison
This study evaluated five data combinations (conventional chemical indicators, derivative thermogravimetric (DTG) curves, DTG features, DTG curves combined with chemical indicators, and DTG features combined with chemical indicators) across four algorithms (CNN, RF, KNN, LDA). The improvement that multi-source data fusion brings to leaf regional style classification was assessed systematically with five-fold cross-validation, as summarized in Table 3.
Table 3. Regional Style Prediction Performance of Four Algorithms Across Feature Categories
| Algorithm | Feature Set | K1 | K2 | K3 | K4 | K5 | Average |
|---|---|---|---|---|---|---|---|
| CNN | Chemical Indicators | 89.66% | 81.61% | 85.06% | 80.46% | 80.23% | 83.40% |
| CNN | DTG Curves | 97.70% | 89.66% | 97.70% | 89.66% | 89.53% | 92.85% |
| CNN | DTG Features | 83.91% | 83.91% | 90.80% | 83.91% | 84.88% | 85.48% |
| CNN | DTG Curves + Chemical | 98.85% | 90.80% | 95.40% | 88.51% | 89.53% | 92.62% |
| CNN | DTG Features + Chemical | 94.25% | 91.95% | 97.70% | 90.80% | 90.70% | 93.08% |
| RF | Chemical Indicators | 90.80% | 83.91% | 72.41% | 80.46% | 80.23% | 81.56% |
| RF | DTG Curves | 88.51% | 90.80% | 80.46% | 90.80% | 82.56% | 86.63% |
| RF | DTG Features | 87.36% | 89.66% | 89.66% | 87.36% | 80.23% | 86.85% |
| RF | DTG Curves + Chemical | 90.80% | 93.10% | 80.46% | 91.95% | 83.72% | 88.01% |
| RF | DTG Features + Chemical | 91.95% | 94.25% | 91.95% | 83.72% | 88.37% | 90.05% |
| KNN | Chemical Indicators | 63.22% | 64.37% | 59.77% | 59.77% | 65.12% | 62.45% |
| KNN | DTG Curves | 86.21% | 82.76% | 78.16% | 89.66% | 81.40% | 83.64% |
| KNN | DTG Features | 77.01% | 73.56% | 75.86% | 67.82% | 66.28% | 72.11% |
| KNN | DTG Curves + Chemical | 64.37% | 64.37% | 60.92% | 62.07% | 63.95% | 63.14% |
| KNN | DTG Features + Chemical | 79.31% | 77.01% | 77.01% | 73.56% | 67.44% | 74.87% |
| LDA | Chemical Indicators | 80.46% | 80.46% | 87.36% | 81.61% | 86.05% | 83.19% |
| LDA | DTG Curves | 80.46% | 83.91% | 73.56% | 77.01% | 83.72% | 79.73% |
| LDA | DTG Features | 78.16% | 79.31% | 82.76% | 71.26% | 75.58% | 77.42% |
| LDA | DTG Curves + Chemical | 88.51% | 88.51% | 86.21% | 89.66% | 86.05% | 87.78% |
| LDA | DTG Features + Chemical | 93.10% | 88.51% | 94.25% | 87.36% | 95.35% | 91.71% |
The performance of the four algorithms across feature categories (Table 3) indicates the following. (1) Feature complementarity: CNN, RF, and LDA all achieve higher average accuracy when DTG features are fused with chemical indicators than with either source alone (CNN: 93.08% vs. 85.48% and 83.40%; RF: 90.05% vs. 86.85% and 81.56%; LDA: 91.71% vs. 77.42% and 83.19%), supporting complementary information between pyrolysis behavior and chemical composition. For all three algorithms, fused DTG features also outperform fused full DTG curves (CNN: 93.08% vs. 92.62%; RF: 90.05% vs. 88.01%; LDA: 91.71% vs. 87.78%), suggesting that reducing the curves to features removes redundant information. (2) CNN performance: with DTG features and chemical indicators combined, CNN reaches the highest average accuracy, 93.08%, ahead of LDA (91.71%), RF (90.05%), and KNN (74.87%), although its margin over LDA is small (1.37 percentage points).
For the same combination, model stability under five-fold cross-validation, measured as the range of fold accuracies (maximum minus minimum), ranks CNN (7.00 percentage points) < LDA (7.99) < RF (10.53) < KNN (11.87), suggesting that the deep learning model is the most robust across folds. KNN, by contrast, drops from 83.64% with DTG curves alone to 63.14% when the high-dimensional DTG curves are combined with chemical indicators, consistent with the curse of dimensionality affecting distance-based classifiers.
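The fold statistics discussed above can be recomputed directly from Table 3. The sketch below is illustrative only: it copies the five fold accuracies of the DTG Features + Chemical setting into a hypothetical `FOLD_ACC` dictionary and takes the range (maximum minus minimum) as the stability measure, which is how we read the "range values" in the text.

```python
from statistics import mean

# Five-fold accuracies (%) for the "DTG Features + Chemical" setting, copied from Table 3.
FOLD_ACC = {
    "CNN": [94.25, 91.95, 97.70, 90.80, 90.70],
    "RF": [91.95, 94.25, 91.95, 83.72, 88.37],
    "KNN": [79.31, 77.01, 77.01, 73.56, 67.44],
    "LDA": [93.10, 88.51, 94.25, 87.36, 95.35],
}


def summarize(folds):
    """Return (mean accuracy, max-min range), both in percent, rounded to 2 decimals."""
    return round(mean(folds), 2), round(max(folds) - min(folds), 2)


summary = {alg: summarize(folds) for alg, folds in FOLD_ACC.items()}

# A smaller range means more stable accuracy across the five folds.
by_stability = sorted(summary, key=lambda alg: summary[alg][1])

for alg in by_stability:
    avg, spread = summary[alg]
    print(f"{alg}: mean {avg:.2f}%, range {spread:.2f} pp")
```

Running it lists the algorithms in the order CNN, LDA, RF, KNN, matching the stability ranking in the text. The range is a crude dispersion measure; a standard deviation over folds would be an alternative, but the range is what the text reports.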
Based on these results, the CNN model built on fused multi-source data was selected as the leaf regional style evaluation model (hereafter the leaf-centric CNN model) for subsequent analysis. In validation on the 434 tobacco leaves (Table 4), it achieves an overall accuracy of 99.54% (432/434); the only misclassifications occur in region D, where 2 of 61 samples are assigned to regions E and G.
Table 4. Prediction Results of Leaf-centric CNN Model on 434 Tobacco Samples
| True Region | Sample Size | Pred. A | Pred. B | Pred. C | Pred. D | Pred. E | Pred. F | Pred. G | Pred. H | Errors | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 54 | 54 | -- | -- | -- | -- | -- | -- | -- | 0 | 100.00% |
| B | 78 | -- | 78 | -- | -- | -- | -- | -- | -- | 0 | 100.00% |
| C | 62 | -- | -- | 62 | -- | -- | -- | -- | -- | 0 | 100.00% |
| D | 61 | -- | -- | -- | 59 | 1 | -- | 1 | -- | 2 | 96.72% |
| E | 45 | -- | -- | -- | -- | 45 | -- | -- | -- | 0 | 100.00% |
| F | 38 | -- | -- | -- | -- | -- | 38 | -- | -- | 0 | 100.00% |
| G | 36 | -- | -- | -- | -- | -- | -- | 36 | -- | 0 | 100.00% |
| H | 60 | -- | -- | -- | -- | -- | -- | -- | 60 | 0 | 100.00% |
| Total | 434 | -- | -- | -- | -- | -- | -- | -- | -- | 2 | 99.54% |
4.2 Dominant Style Mechanism Analysis in Blended Leaf Formulations
To clarify how the style of a blended leaf formulation depends on the proportion of its primary source leaf (the component with the largest proportion), the leaf-centric CNN model was first used to predict the styles of all 304,800 blends. The predicted formulation styles agreed with the primary source leaf styles in only 50.91% of cases (Table 5), which raised two competing hypotheses: (1) blending leaves from multiple regions genuinely transforms style, or (2) a model trained on single leaves has limited ability to extract the features that characterize blends.
Table 5. Consistency Rate Between Formulation Styles Predicted by Leaf-centric CNN and Primary Source Leaf Styles in Blends
| Maximum Proportion (Primary Source) | Number of Blends (per Region) | A | B | C | D | E | F | G | H | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| 90% | 140 | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 99.29% | 99.91% |
| 80% | 560 | 99.82% | 100.00% | 100.00% | 100.00% | 100.00% | 99.64% | 100.00% | 94.64% | 99.26% |
| 70% | 1680 | 99.40% | 98.27% | 99.76% | 98.63% | 98.81% | 87.44% | 98.81% | 57.14% | 92.28% |
| 60% | 4200 | 92.14% | 91.88% | 98.21% | 92.00% | 93.90% | 61.52% | 88.10% | 20.74% | 79.81% |
| 50% | 9100 | 67.14% | 74.13% | 93.43% | 74.42% | 79.76% | 35.03% | 65.01% | 4.99% | 61.74% |
| 40% | 14560 | 37.11% | 42.03% | 89.06% | 47.81% | 61.25% | 15.65% | 43.02% | 0.49% | 42.05% |
| 30% | 7860 | 15.41% | 12.39% | 88.49% | 19.39% | 43.65% | 4.40% | 25.64% | 0.00% | 26.17% |
| Total | 38100 | 49.77% | 52.62% | 91.67% | 56.37% | 68.01% | 27.73% | 53.16% | 7.94% | 50.91% |
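The consistency rate in Table 5 can be expressed as a simple grouping computation. The sketch below assumes, for illustration, that each blend is stored as a `{region: percent}` dictionary paired with the predicted region, and that the primary source is the region with the largest proportion; `primary_source`, `consistency_by_level`, and `weighted_total` are hypothetical helpers, not the study's code. Because lower primary proportions account for far more blends, the "Total" row behaves as a blend-count-weighted mean over proportion levels; the last lines reproduce the region A total (49.77%) from the tabulated rates.

```python
from collections import defaultdict


def primary_source(proportions):
    """Region with the largest proportion; ties go to the alphabetically first region."""
    return max(sorted(proportions), key=lambda region: proportions[region])


def consistency_by_level(blends):
    """Consistency rate (%) grouped by the primary source proportion.

    blends: iterable of (proportions, predicted_region), where proportions maps
    region -> integer percent (a hypothetical input format).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for proportions, predicted in blends:
        primary = primary_source(proportions)
        level = proportions[primary]
        totals[level] += 1
        hits[level] += int(predicted == primary)
    return {level: 100 * hits[level] / totals[level] for level in totals}


def weighted_total(rates, counts):
    """Blend-count-weighted mean of per-level rates (the 'Total' row of Table 5)."""
    return sum(rates[k] * counts[k] for k in rates) / sum(counts[k] for k in rates)


# Toy blends, invented for illustration (not study data).
toy = [
    ({"A": 90, "B": 10}, "A"),
    ({"A": 40, "B": 30, "C": 30}, "B"),
    ({"A": 40, "C": 35, "D": 25}, "A"),
]
print(consistency_by_level(toy))

# Blend counts per level and the region A column, copied from Table 5.
COUNTS = {90: 140, 80: 560, 70: 1680, 60: 4200, 50: 9100, 40: 14560, 30: 7860}
RATES_A = {90: 100.00, 80: 99.82, 70: 99.40, 60: 92.14, 50: 67.14, 40: 37.11, 30: 15.41}
print(round(weighted_total(RATES_A, COUNTS), 2))
```

An unweighted mean of the same seven rates would be about 73%, so the weighting explains why the totals sit much closer to the low-proportion rows.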
To discriminate between these hypotheses, a CNN-based proportional regression model was constructed by adding a convolutional layer to the baseline architecture (Figure 2), reducing the output dimension of the fully connected layer to 1, and removing the activation functions. Across all blends, the predicted regional proportion fell within ±10 percentage points of the designed proportion for 82.17% of blends on average (Table 6). This rate was lower for Region C (75.83%) and Region F (75.48%) and higher for Region B (90.74%) and Region H (89.68%), suggesting that style retention differs across regions. In addition, the integrated regression model achieved 84.75% consistency between predicted formulation styles and primary source leaf styles. These results support the view that primary source leaves retain stylistic dominance despite blending interactions: if blending genuinely transformed style, neither the regional proportions nor the primary source leaves could be recovered this accurately. The low consistency in Table 5 therefore points to hypothesis (2), the limited feature extraction of the single-leaf model.
Table 6. Performance of Proportional Regression Model (Within ±10% Error Range)
| Designed Proportion | Predicted Proportion Range | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|---|
| 90% | 80%-100% | 97.86% | 97.86% | 80.71% | 98.57% | 93.57% | 85.00% | 92.14% | 98.57% |
| 80% | 70%-90% | 96.79% | 96.25% | 81.07% | 95.00% | 92.50% | 81.25% | 86.79% | 99.64% |
| 70% | 60%-80% | 93.15% | 92.56% | 72.86% | 88.27% | 87.38% | 76.49% | 83.75% | 97.98% |
| 60% | 50%-70% | 89.38% | 90.24% | 69.90% | 84.64% | 82.17% | 73.50% | 78.19% | 94.81% |
| 50% | 40%-60% | 86.85% | 88.03% | 67.53% | 82.35% | 79.25% | 73.07% | 75.52% | 90.36% |
| 40% | 30%-50% | 84.26% | 86.27% | 66.47% | 80.80% | 76.45% | 72.45% | 73.55% | 87.01% |
| 30% | 20%-40% | 77.46% | 83.36% | 66.16% | 76.96% | 71.55% | 69.95% | 68.44% | 82.43% |
| 20% | 10%-30% | 73.05% | 84.46% | 71.34% | 74.82% | 69.81% | 70.64% | 69.59% | 79.88% |
| 10% | 0-20% | 73.47% | 95.06% | 93.17% | 78.43% | 74.90% | 85.04% | 84.01% | 84.21% |
| 0% | 0-10% | 66.28% | 93.34% | 89.11% | 71.38% | 67.14% | 67.40% | 80.03% | 81.94% |
| Average | | 83.85% | 90.74% | 75.83% | 83.12% | 79.47% | 75.48% | 79.20% | 89.68% |

Region columns give the percentage of blends whose predicted proportion falls within the stated range. Total average over all regions: 82.17%.
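The ±10 percentage-point criterion used in Table 6 amounts to counting predictions whose absolute error does not exceed the tolerance. The sketch below is a minimal illustration under that reading; `within_tolerance_rate` is a hypothetical helper, and the ranges printed in Table 6 (such as 80%-100% and 0-10%) correspond to this band truncated to the feasible 0-100% interval. The second part checks that the total average (82.17%) is the unweighted mean of the eight regional averages and ranks the regions.

```python
from statistics import mean


def within_tolerance_rate(pairs, tol=10.0):
    """Percentage of (designed, predicted) proportion pairs with |error| <= tol points."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions supplied")
    hits = sum(abs(predicted - designed) <= tol for designed, predicted in pairs)
    return 100 * hits / len(pairs)


# Toy predictions for a designed proportion of 90%, invented for illustration.
toy = [(90, 84.2), (90, 100.0), (90, 78.5), (90, 95.0)]
print(within_tolerance_rate(toy))  # 78.5 misses the band, the other three fall inside

# Regional averages from Table 6.
REGION_AVG = {"A": 83.85, "B": 90.74, "C": 75.83, "D": 83.12,
              "E": 79.47, "F": 75.48, "G": 79.20, "H": 89.68}
print(round(mean(REGION_AVG.values()), 2))
print(sorted(REGION_AVG, key=REGION_AVG.get))  # weakest to strongest region
```

The ranking places F and C at the bottom and H and B at the top, as stated in the text.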
4.3 Development and Validation of Fusion Style Model
To overcome the low consistency (50.91%) of the leaf-centric CNN regional style model when applied to formulations, we developed a fusion style model that unifies the feature representation of the 434 single tobacco leaves and the 304,800 blend formulations. The model builds on the finding of Section 4.2 that a blend's style is dominated by that of its primary source leaf.
Applied to regional style prediction for the 434 tobacco leaf samples (Table 7), the fusion model reaches an overall accuracy of 90.09% (391/434). Although this is lower than the 99.54% of the leaf-centric model, it indicates that the fusion model retains robust feature extraction for single-leaf regional style; the largest share of errors occurs in region G (11 of 36 samples misclassified, 7 of them as region B).
Table 7. Prediction Results of Fusion Style Model on 434 Tobacco Samples
| True Region | Sample Size | Pred. A | Pred. B | Pred. C | Pred. D | Pred. E | Pred. F | Pred. G | Pred. H | Errors | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 54 | 50 | 3 | -- | -- | -- | -- | -- | 1 | 4 | 92.59% |
| B | 78 | -- | 76 | -- | 1 | -- | 1 | -- | -- | 2 | 97.44% |
| C | 62 | -- | -- | 54 | -- | 4 | 3 | 1 | -- | 8 | 87.10% |
| D | 61 | -- | -- | -- | 58 | 1 | -- | -- | 2 | 3 | 95.08% |
| E | 45 | -- | 1 | -- | 1 | 41 | 1 | -- | 1 | 4 | 91.11% |
| F | 38 | -- | 2 | 2 | 1 | -- | 33 | -- | -- | 5 | 86.84% |
| G | 36 | -- | 7 | -- | -- | -- | 1 | 25 | 3 | 11 | 69.44% |
| H | 60 | 2 | -- | -- | -- | -- | -- | 4 | 54 | 6 | 90.00% |
| Total | 434 | 52 | 89 | 56 | 61 | 46 | 39 | 30 | 61 | 43 | 90.09% |
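Per-region and overall accuracies in Table 7 follow from the confusion matrix. The sketch below enters the table's counts (with "--" as 0) into a hypothetical `CONFUSION` list; the 90.09% reported in the text is the pooled accuracy (correct predictions over all 434 samples). For comparison it also computes the unweighted mean of per-region accuracies, which is lower (about 88.70%) mainly because of region G.

```python
REGIONS = "ABCDEFGH"

# Table 7 counts: rows are true regions, columns are predicted regions A-H ("--" -> 0).
CONFUSION = [
    [50, 3, 0, 0, 0, 0, 0, 1],   # A
    [0, 76, 0, 1, 0, 1, 0, 0],   # B
    [0, 0, 54, 0, 4, 3, 1, 0],   # C
    [0, 0, 0, 58, 1, 0, 0, 2],   # D
    [0, 1, 0, 1, 41, 1, 0, 1],   # E
    [0, 2, 2, 1, 0, 33, 0, 0],   # F
    [0, 7, 0, 0, 0, 1, 25, 3],   # G
    [2, 0, 0, 0, 0, 0, 4, 54],   # H
]


def per_region_accuracy(cm):
    """Correct predictions divided by the number of samples of each true region (%)."""
    return {r: 100 * cm[i][i] / sum(cm[i]) for i, r in enumerate(REGIONS)}


def overall_accuracy(cm):
    """Pooled accuracy: all correct predictions over all samples (%)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100 * correct / sum(map(sum, cm))


def predicted_totals(cm):
    """Column sums, i.e. how many samples were assigned to each region."""
    return [sum(row[j] for row in cm) for j in range(len(cm))]


acc = per_region_accuracy(CONFUSION)
print({r: round(a, 2) for r, a in acc.items()})
print(round(overall_accuracy(CONFUSION), 2))   # pooled accuracy
print(round(sum(acc.values()) / len(acc), 2))   # unweighted mean over regions
print(predicted_totals(CONFUSION))
```

The column sums reproduce the "Total" row of Table 7 and show that region B absorbs the most misassigned samples (89 predicted vs. 78 true).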
The fusion style model was then applied to predict the styles of the blended leaf formulations (Table 8). The predicted formulation styles align closely with the primary source leaf styles, with an average consistency rate of 87.90%, far above the leaf-centric style model (50.91%) and slightly above the regression model (84.75%). Key findings include:
(1) Threshold Effect: When the primary source leaf proportion is ≥40%, the average style consistency is at least 87.50%.
(2) Region-Specific Retention: Regions B and H show superior style preservation, maintaining >93% consistency even at a 40% proportion (94.38% and 93.19%), clearly ahead of the other regions (at most 88.97%).
(3) Nonlinear Decay: As the primary source leaf proportion decreases from 90% to 30%, average consistency declines nonlinearly from 99.91% to 67.90%: it stays above 97% down to a 50% proportion, then falls to 87.50% at 40% and 67.90% at 30%.
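The threshold and nonlinear-decay findings can be checked from the "Average" column of Table 8. In this illustrative sketch, `AVG_CONSISTENCY` holds the tabulated averages and `step_changes` is a hypothetical helper that reports the change in consistency for each 10-point decrease in primary source proportion.

```python
# Average consistency (%) by primary source proportion, "Average" column of Table 8.
AVG_CONSISTENCY = {90: 99.91, 80: 100.00, 70: 99.99, 60: 99.79,
                   50: 97.19, 40: 87.50, 30: 67.90}


def step_changes(series):
    """Change in consistency (points) for each successive decrease in proportion."""
    levels = sorted(series, reverse=True)
    return {(hi, lo): round(series[lo] - series[hi], 2)
            for hi, lo in zip(levels, levels[1:])}


changes = step_changes(AVG_CONSISTENCY)
for (hi, lo), delta in changes.items():
    print(f"{hi}% -> {lo}%: {delta:+.2f} points")

# Threshold effect: lowest average consistency among proportions >= 40%.
print(min(v for level, v in AVG_CONSISTENCY.items() if level >= 40))
```

The per-step losses stay below one point down to 60%, then grow to 2.60, 9.69, and 19.60 points, which is the accelerating (nonlinear) decline described above.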
Table 8. Consistency Rate Between Formulation Styles Predicted by Fusion Model and Primary Source Leaf Styles in Blends
| Maximum Proportion (Primary Source) | Number of Blends (per Region) | A | B | C | D | E | F | G | H | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| 90% | 140 | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 99.29% | 99.91% |
| 80% | 560 | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| 70% | 1680 | 100.00% | 100.00% | 99.94% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 99.99% |
| 60% | 4200 | 99.88% | 99.95% | 99.31% | 99.90% | 99.57% | 99.81% | 99.88% | 100.00% | 99.79% |
| 50% | 9100 | 98.20% | 99.47% | 94.41% | 97.97% | 94.20% | 96.33% | 97.41% | 99.52% | 97.19% |
| 40% | 14560 | 88.97% | 94.38% | 82.21% | 88.79% | 79.44% | 85.46% | 87.54% | 93.19% | 87.50% |
| 30% | 7860 | 66.56% | 73.69% | 66.93% | 66.34% | 61.48% | 67.42% | 70.66% | 70.13% | 67.90% |
| Total | 38100 | 88.44% | 92.29% | 84.97% | 88.28% | 82.76% | 86.82% | 88.55% | 91.12% | 87.90% |
The fusion model's predictive advantage indicates that it can bridge the feature representations of single leaves and blended systems. Together, these results support the interpretation that blend styles are weighted integrations of their components rather than emergent properties, with the primary source leaves governing the stylistic outcome above quantifiable composition thresholds.
4.4 Style Modulation Strategy in Blended Leaf Formulations
Analysis of the 36,867 style mismatch cases (12.10% of all 304,800 formulations) revealed clear patterns in the relationship between composition and style (Table 9). Of these mismatches, 87.5% (32,246 blends) occurred when the proportion difference between the primary and secondary source regions was within 10 percentage points. Furthermore, 86.5% of the mismatches (31,902 blends) were misclassifications into the style of the secondary source region, highlighting how sensitive the style outcome is to compositional balance.
Table 9. Data of Mismatches Between Predicted and Designed Primary Source Regions in Blends
| Proportion Difference Between Primary and Secondary Sources | Total Blends | Mismatched Blends | Mismatch Rate | Classified as Secondary Source |
|---|---|---|---|---|
| 80% | 1120 | 1 | 0.09% | 0 |
| 70% | 3360 | 0 | 0.00% | 0 |
| 60% | 6720 | 0 | 0.00% | 0 |
| 50% | 12320 | 1 | 0.01% | 1 |
| 40% | 24640 | 11 | 0.04% | 8 |
| 30% | 47040 | 269 | 0.57% | 239 |
| 20% | 80800 | 4339 | 5.37% | 3716 |
| 10% | 128800 | 32246 | 25.04% | 27938 |
| Total | 304800 | 36867 | 12.10% | 31902 |
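The shares quoted at the start of this section, and pooled mismatch rates for larger proportion gaps, can be derived from Table 9. The sketch below is illustrative: `TABLE9` copies the table rows and `pooled_mismatch_rate` is a hypothetical helper that pools all rows at or above a given proportion difference.

```python
# Table 9 rows: proportion difference (points) -> (total blends, mismatched, classified as secondary).
TABLE9 = {
    80: (1120, 1, 0),
    70: (3360, 0, 0),
    60: (6720, 0, 0),
    50: (12320, 1, 1),
    40: (24640, 11, 8),
    30: (47040, 269, 239),
    20: (80800, 4339, 3716),
    10: (128800, 32246, 27938),
}


def pooled_mismatch_rate(table, min_diff):
    """Mismatch rate (%) pooled over all rows with difference >= min_diff."""
    rows = [row for diff, row in table.items() if diff >= min_diff]
    return 100 * sum(r[1] for r in rows) / sum(r[0] for r in rows)


total_blends = sum(r[0] for r in TABLE9.values())
total_mismatched = sum(r[1] for r in TABLE9.values())
share_small_gap = 100 * TABLE9[10][1] / total_mismatched
share_secondary = 100 * sum(r[2] for r in TABLE9.values()) / total_mismatched

print(total_blends, total_mismatched)
print(round(share_small_gap, 1), round(share_secondary, 1))
print(round(pooled_mismatch_rate(TABLE9, 20), 2), round(pooled_mismatch_rate(TABLE9, 30), 2))
```

Pooling rows with a difference of 20 points or more gives a mismatch rate of about 2.63% (4,621 of 176,000 blends), and pooling rows with 30 points or more gives about 0.30%.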
These findings inform a dual-path regulatory framework for precision blend design:
A. Primary Source Style Preservation Protocol
(1) Composition Differential Threshold: Keep the primary source more than 10 percentage points above the secondary source.
(2) Risk Mitigation: Under this condition, the observed style deviation rate falls below 5% (4,621 of 176,000 blends, 2.63%; Table 9).
(3) Dominance Assurance: Blends with a proportion difference of ≥30 percentage points show high style fidelity, with mismatch rates of at most 0.57%.
B. Dual-Source Style Retention Strategy
(1) Controlled Synergy: Keep the primary-secondary proportion difference within 10 percentage points.
(2) Hybrid Expression: Enables balanced integration of regional characteristics.
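As an illustration of how the two paths could be operationalized, the hypothetical function below ranks a blend's regional proportions, computes the primary-secondary gap, and assigns the blend to path A or B. It treats a gap of more than 10 percentage points as path A and a gap of 10 points or less as path B, in line with where Table 9 concentrates mismatches, and flags gaps of 30 points or more as the high-fidelity zone. The function name, input format, and labels are assumptions for this sketch, not part of the study.

```python
def blend_strategy(proportions):
    """Return (gap in points, path label) for a blend given as {region: percent}.

    Hypothetical helper: path A = primary style preservation, path B = dual-source retention.
    """
    if not proportions:
        raise ValueError("empty blend")
    ranked = sorted(proportions.values(), reverse=True)
    secondary = ranked[1] if len(ranked) > 1 else 0
    gap = ranked[0] - secondary
    if gap >= 30:
        path = "A (dominance assured)"
    elif gap > 10:
        path = "A (primary style preservation)"
    else:
        path = "B (dual-source style retention)"
    return gap, path


for blend in ({"B": 60, "C": 30, "H": 10},
              {"C": 50, "D": 30, "E": 20},
              {"A": 45, "F": 35, "G": 20}):
    print(blend, blend_strategy(blend))
```

A rule table like this is only a first screen; blends near the 10-point boundary would still warrant a model prediction before use.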
The framework is particularly useful for addressing region-specific interactions. Regions B and H show superior style retention, while regions C and F show reduced preservation in the proportion regression (Table 6), possibly attributable to stronger cross-component interactions in their phytochemical profiles. By quantifying these relationships, the study establishes a predictive foundation for translating compositional parameters into predefined style outcomes, moving blend design from empirical experimentation toward computational optimization and toward precision engineering of tobacco formulations with controlled stylistic outcomes.