Clinical data
The demographic data of the 254 HCC patients are presented in Table S1.
Somatic mutations
The mutational landscape is shown in Fig. 1. The top 10 most common mutated genes were TP53 (22% vs. 28%), CTNNB1 (13% vs. 24%), MUC16 (8% vs. 16%), LRP1B (6% vs. 8%), ALB (5% vs. 11%), and CSMD3 (5% vs. 8%) and the overall mutation rate was lower than the rate reported in The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA-LIHC). Some mutated genes were present at higher frequencies, such as RB1 (11% vs. 5%), ARID1A (10% vs. 7%), AXIN1 (9% vs. 8%), and ARID2 (8% vs. 5%), compared with TCGA-LIHC; the TERT mutation rate (47.24%) was determined by analysis of promoter (Fig. 1A). Figure 1B shows the most commonly mutated genes and gene mutation frequencies among the various etiologies of Taiwanese HCCs.
We also compared the mutational frequencies of Taiwanese HCCs with the frequencies in various subgroups in TCGA; we found that the frequencies of cancer-related genes were different from the rates in other regions, even in Asia (Fig. S1).
Analysis of mutational frequencies according to etiology showed that FBN2, IRAK3, MAGI1, KDM3B, LRP1B, CTNNB1, SPTA1, and ARID1A were more frequently mutated in HCV-related HCC (p = 0.0024, 0.0143, 0.0143, 0.0151, 0.0153, 0.0286, 0.0337, 0.0482, respectively), ATRX was more frequently mutated in HBV-related HCC (p = 0.0367), COL3A1 was more frequently mutated in dual HBV/HCV inflection (p = 0.0061), and CSMD3 was more frequently mutated in double cancer-related HCC (p = 0.0104) (Fig. 1C).
Analysis of associations between clinical data and genomic alterations showed that more women had HCV infection (p = 0.0127), and HCV-related HCC was associated with a higher rate of cirrhosis (p = 2.961e-05). Kaplan-Meier survival analysis indicated that the different etiologies did not influence survival. (Fig. S2). TERT promoter mutations frequently occurred in HCV-related HCC (p = 5.024e-06) and were uncommon in HBV-related HCC (p = 2.812e-06). Moreover, the presence of TERT promoter mutation was associated with a significant decrease in overall survival (p = 0.039) (Fig. S3).
Copy number alterations (CNAs)
CNAs, which involved chromosomal regions and related genes, are shown in the heatmap (Fig. S4A) and circos plot (Fig. S5A). The most frequent alterations in chromosomal arm were copy number gains in 1q, 6p, 7, 8q, and 17q, as well as copy number losses in 4q, 8p, 13p, 16, and 17p; whole chromosomal amplification did not affect patient survival (p = 0.78). Loss of 17p was associated with better survival (p = 0.05), whereas 7q amplification was associated with worse survival (p = 0.053) (Fig. S4B). Subsequent analyses of cancer-related genes in these regions of chromosomal alteration revealed that they may contain several oncogenes and tumor suppressor genes (TSGs), resulting in simultaneous gain or loss of these genes (i.e., overexpression or down-expression of both oncogene and TSG). For example, deletion of TSGs (CHD3, GPS2, PER1, and YWHAE in the17p deletion) [24, 25] was associated with better survival. Survival-related CNA-involved genes are shown in Fig. S4C and Fig. S5B. Associations between HCC etiology and CNAs of related genes are shown in Fig. S5C. Overall, double cancers and HBV-related HCCs had more copy number gains, compared with other etiologies.
Structural alterations
Analysis of cancer driver structural variants (SVs) revealed 17,639 somatic SVs in our cohort (Fig. S6). We used probability of being loss-of-function intolerant (pLI) to analyze somatic SVs; we identified 1108 (p < 2.2e-16, odds ratio = 1.69) and 1761 (p < 2.2e-16, odds ratio = 0.60) involved oncogenes or TSGs, respectively [24–27], for somatic SVs in our HCC cohort (Fig. S7).
We then analyzed the clinical significance of somatic SVs with > 3% recurrent frequencies; associations between HCC etiology and SVs are shown in Fig. 2. Most SVs were on the same chromosome, and only a few SVs involved different chromosomes. Thirteen SVs (all associated with patient survival) were in the outer areas of the circos plot of chromosome (Fig. 2A). Five SVs (one duplication, two translocations, and two inversions) were associated with double cancers, two deletion SVs were associated with HCV-related HCC; seven SVs (four inversions, two translocations, and one duplication) were associated with HBV-related HCC, seven SVs (five inversions, one duplication, and one translocation) were associated with dual HBV/HCV infection HCC; and five SVs (one inversion and four deletions) were associated with non-HBV/non-HCV-related HCC (Fig. 2B). We identified 13 somatic SVs associated with patient survival (Fig. S8); 7 of the 13 somatic SVs were novel (Fig. S9).
Mutation analysis of 114 histone-related genes, 74 HCC-related long non coding RNAs (lncRNAs), and 36 non-coding driver genes
Among 114 histone-related genes, we identified 39 variants with ClinVar pathogenic/likely pathogenic (P/LP; one case) or CADD ≥ 30 (21 cases); HILS1 (4 cases) had the highest mutated rate (Table S2). We validated HILS1 mutations using Sanger sequencing (Fig. S10). Among 74 HCC-related lncRNAs, we identified 24 possible driver variants in 254 HCCs; HAGLR and LINC473 (5 cases) had more mutations (Table S3). Among 36 non-coding driver genes, we identified two RMRP promoter variants with ClinVar P/LP and one COX6B2 variant with CADD ≥ 30. Many NEAT1 and Malat1 variants were detected, and their CADD scores were null; further analyses are needed to identify the roles of these variants (Table S4).
Mutational signatures
We performed mutational signature analysis on the core set of 254 HCCs by non-negative matrix factorization. There were three types of mutational signatures in Taiwanese HCCs. The case numbers and distributions of mutational signatures in each cluster are shown in Fig. 3A. Associations among viral type, biochemical data, and TERT mutation are shown in Fig. S11. Signature 1 (19.29%, 49 cases) was matched to SBS22 and significantly associated with the plant-derived carcinogen aristolochic acid (AA) signature, with a predominance of A:T-to-T:A transversions at T/CAG tri-nucleotide motifs. Signature 2 (194 cases) was matched to SBS5 with unknown etiology. Signature 3 was matched to SBS9 (11 cases) and associated with polymerase eta.
Greater proportions of HCV and dual HBV/HCV infection-related HCCs exhibited the AA exposure signature (19/82 and 7/19, respectively); this signature was present in 9/106, 0/28, and 0/19 of HBV, non-HBV/non-HCV, and double cancer-related HCC, respectively. Among cases of non-HBV/non-C and double cancer-related HCCs, 6/28 and 8/19, respectively, exhibited the chemotherapy-related signature SBS25; no cases with other etiologies exhibited SBS25 (Fig. 3B).
Differential expression genes (DEGs) and clinical significance
In total, 3585 of 51,497 genes were differentially expressed: 2694 genes were overexpression, and 891 genes were down-expression (details can be provided on request). Next, we analyzed the DEGs and their associations with survival; 229 genes (123 protein-coding and 106 non-coding) were associated with survival. Among the 123 protein-coding genes, increased expression levels of 28 genes were associated with better survival, whereas increased expression levels of the remaining 95 genes were correlated with worse survival. Among the 106 non-coding genes, increased expression levels of 33 genes (including 16 novel genes) and the remaining 73 genes (including 29 novel genes) were associated with better or worse survival, respectively (Table S5).
Fusion gene analysis and clinical significance
After low confidence fusion events had been filtered out, 206 fusion events from 180 fusion genes remained (Table S6); 88 cases exhibited fusion genes, including 19 cases with > 2 fusion genes. Three fusions (SLC45A2-AMACR, ITCH-ASIP, and RNF138-RNF125) were present in > 1 case (Fig. S12A). Compared with HCC patients who lacked fusion events, HCC patients with fusion events had better prognoses (Fig. S12B).
Fifty of fusion genes have been reported and 14 involving known cancer genes (ALDH2-ACAD10, BIRC6-SPAST, BIRC6-TTC27, CGNL1-TCF12, CPEB3-IDE, CTNNA1-KDM3B, DNAJB1-PRKACA, LRP5-CHKA, PPFIBP1-STK38L, RNF213-SLC26A11, RXRA-WDR5, SEC16A-NOTCH1, WNK1-ERC1, and ACVR1B-ACVRL1). Among the 111 novel fusion genes, 18 involving known cancer genes (APOH-NKTR, ARHGEF10L-HEG1, C1S-GNAI2, CCND1-FGF19, CHP1-INO80, CPB2-LCP1, KANSL1-AC005324.3, LASP1-NCOR1, LPP-RNF139, PIK3R1-NKD2, QKI-CDC45, RHOA-MST1, RNF139-LPP, RXRA-RAPGEF1, SLC30A7-DPYD, SRSF3-PNPLA1, TCF12-TMOD3, and UBL3-SYNE1) (Table S7).
Analysis of HBV DNA and transcripts using WGS and RNA-seq
We used kraken tools to analyze HBV DNAs and HBV-related transcripts. In total, 116 HBV-related HCCs were closely associated with the results of serum testing, RNA-seq, and WGS; 131 cases were closely associated with the results of WGS and RNA-seq; and 3 and 12 cases were closely associated with only WGS or only RNA-seq, respectively (Fig. 4A).
Among HBV-related fusion genes, we selected fusion detected by WGS or RNA-seq that exhibited > 2 reads. Twenty-five genes were fused with HBV virus; the top three fusion genes were KMT2B, TERT, and CC2. Only a few DNA fusion genes had RNA transcripts (Fig. 4B); the structures of representative fusion genes are shown (Fig. 4C). We compared the survival of HCCs with and without fusions, we found that HBV-related fusion gens were not associated with patient survival (p = 0.86 using RNA analysis, p = 0.15 using DNA analysis) (Fig. S13).
Alternative splicing (AS) analysis and clinical significance
We used SUPPA2 to estimate the abundances of AS events and calculate proportion spliced-in (PSI) values for Taiwanese HCCs. There were 914 AS events for 0.2 PSI and 920 events for 0.3 PSI. For both 0.2 and 0.3 PSI, the number of common events was 345 (Fig. S14). We then analyzed survival-related AS events; 98 events in 93 genes and 86 events in 75 genes were associated with better or worse survival, respectively (Fig. S15).
For the novel AS genes, 148 have not been previously reported (Table S8). Increased expression level of 79 genes (79/93) that underwent AS were associated with better survival. However, increased expression levels of 65 other genes (65/75) that underwent AS were associated with worse survival. AS for CDK13, CFLAR, EGLN1, and ZNF717 exhibited aberrant effects on survival that were caused by different types of AS event (Fig. S16); for example, CDK13 alternative first exon was associated with better survival, whereas CDK13 alternative last exon was associated with worse survival.
We analysis the impacts of AS on protein structure; different AS forms of the same gene led to changes in survival (Fig. 5A-B). AS of oncogene-like genes [28, 29] resulting in loss of function (e.g., frameshift with nonsense decay or protein truncation) was associated with better survival (Fig. 5C-D and Fig. S17). With respect to AS of TSG-like genes [24, 30]with worse survival, short truncated proteins were identified by analysis of possible coded protein from AS transcripts (Fig. 5E-F and Fig. S18).
Associations between immune checkpoint gene expression and genomic alterations
We analyzed the effects of genomic alterations on the expression patterns of 42 immune checkpoint genes. Eight highly recurrent mutated genes (TP53, CTNNB1, RB1, ARID1A, AXIN1, ARID2, MUC16, and TERT) were selected for analysis (Details can be provided on request). Only mutations in ARID1A was associated with higher expression of CD70 (Fig. S19A).
Recurrent copy changes in many cancer-related genes also affected the expression patterns of immune checkpoint gene (Details can be provided on request). Statistically significant copy changes occurred in AKT1 (influencing TNFSF14), AKT2 (influencing CD40LG), ARID1A (influencing CD276), ARID1B (influencing CD40LG and TNFRSF4), AXIN1 (influencing CD276, CD96, and PDCD1LG2), GNAS (influencing TNFSF14 and PDCD1LG2), IDH2 (influencing CD274 and CD96), NF1 (influencing CD40LG), SMARC4 (influencing HHLA2, TNFSF9, and VTCN1), STAT3 (influencing CD40LG, LGALS3, and CD96) and TSC2 (influencing CD276 and TRAC) (Fig. S19B-V).
Associations of 64 immune and stroma cell types/three scores with various genomic alterations
We used xCell to explore associations between genetic alterations and the tumor microenvironment. For mutation association analysis, we selected the top eight mutated genes (described above) to explore their associations with the numbers of immune and stroma cell. We found that CTNNB1 mutations were associated with a decrease in epithelial cells but increases in hematopoietic stem cells and megakaryocyte-erythroid progenitor (MEP). RB1 mutations and copy loss were associated with decreases in adipocytes, lymphatic (ly) endothelial cells, stroma score, and preadipocytes; they were associated with increases in common lymphoid progenitor (CLP), pro B-cells, and Th1 and Th2 cells. TERT mutations were associated with a decrease in neurons, whereas ARID1B copy loss was associated with an increase in microvascular (mv) endothelial cells. IL6ST copy loss was associated with a decrease in microenvironment score. MDM4 amplification was associated with a decrease in pro B-cells, whereas TSC1 amplification was associated with a decrease in adipocytes. TSC1 copy loss was associated with an increase in sebocytes. AXIN1, BRD7, IDH2 and TSC2 copy loss were each associated with a decrease in hepatocytes (Fig. S20).
With respect to 17p deletion and 7q duplication, none remained statistically significant after adjustment. The SVs (AnnotSV_ID: 1_105963115_182721369_DUP_1) involving 38 protein genes was associated with basophils (Fig. S22).
Associations among the expression of immune checkpoint genes, the number of 64 cell types/3 scores, and AS
We explored associations among survival-related AS events, immune checkpoint genes, and 64 cell types/3 scores. AS events in well-known cancer-related genes involving more cancers usually have greater influence on both the numbers of immune and stromal cells and the expression patterns of immune checkpoint genes. For example, better survival-related AS of 17 well-known cancer-related genes (1st to 17th ) was associated with larger changes in the numbers of mesenchymal stem cells (MSC), natural killer T (NKT), eosinophils, central memory CD4 + T cell (CD4 + Tcm), pericytes, ly endothelial cells, mv endothelial cells, Th1 cells, MEP, endothelial cells, and inflammatory (M1) macrophages, as well as greater changes in stroma score; it was associated with smaller changes in the numbers of CD8 + naïve T-cells, CLP, Th2 cells, and smooth muscle. This AS was also associated with significant increases in the expression levels of the immune checkpoint genes TNFRSF4, TNFRSF18, VSIR, TMIGD2, ICOSLG, TNFRSF14, PDCD1, and CD70, along with decreases in the expression levels of TNFSF4, CEACAM1, PDCD1LG2, and CD86; AS of other genes showed similar associations (Fig. 6). Fig. S22 shows that worse survival-related AS events were associated with distinct changes in cell type and immune checkpoint gene expression.