Generation of Engineered Heart Tissues as a Model of Human Myocardium
Although monolayer hiPSC-CM cultures have revolutionized cardiac disease research [13, 14], their cellular immaturity limits their potential as an experimental system. Engineered heart tissue or engineered heart myocardium [18-22] systems have been developed by mixing hiPSC-CMs with fibroblasts and other cell types, and then placing cell mixtures onto bio-scaffolds where they mature under tension. There are multiple formats for tissue generation, varying in bio-scaffold content and form, ranging from simple strips to rings or even bio-printed ventricular shapes [41]. Here, we molecularly characterized the transcriptomic and open chromatin landscapes of human EHTs and compared these data to hiPSC-CMs in 2D cultures and to left ventricle (LV) myocardium.
To generate EHTs, eGFP-expressing human iPSCs were differentiated to cardiomyocytes in monolayer cell culture for 10 days using WNT modulation followed by depletion of stem cells and enrichment for cardiomyocytes [21-23, 42]. Enriched cardiomyocytes were combined 9:1 with fibroblasts isolated from human hearts in a three-dimensional collagen-based scaffold, forming approximately 15mm ring-like EHTs around two posts. The tissues were then matured for an additional 20 days, totaling 30 days from hiPSC-CM induction to mature EHTs (Figure 1A). Immunofluorescence imaging with an antibody to cMyBP-C in EHTs revealed cardiomyocyte alignment as well as sarcomere alignment parallel to the axis of contraction (Figure 1B). By comparison, 2D monolayer hiPSC-CMs show sarcomeres radially aligned around central nuclei, with little cell-cell alignment (Figure 1C). These characteristics agree with previous morphology described in EHTs [42-44], and further suggest that this cellular alignment more closely models in vivo cardiomyocytes.
Single-cell gene expression in EHTs
To assess the cell-type composition and gene expression patterns within EHTs, we dissociated pools of day 30 EHTs, collected from 2 independent differentiations, into single cell suspensions, and generated single-cell RNA seq libraries using the 10x Genomics ChromiumTM platform. After sequencing, quality control, and filtering, we obtained 17,081 cells. Unbiased clustering of the transcriptomic data revealed 14 communities (Supplement 1A-1C) comprising six distinct cell-types (Figure 1D). By cell number, cardiomyocytes (42.4 %) and cardiac fibroblasts (37.9 %) represented most of the tissue. This cardiomyocyte percentage is similar to estimates found in human hearts from the Heart Cell Atlas (HCA) project, where cardiomyocytes constitute 49% of ventricle cells [45]. SMC/pericyte-like, mesothelial progenitors, neural-like, and endothelial cells contribute the remaining 19.7% of the cell population. EHT cardiomyocytes expressed genes for canonical sarcomere components and known cardiac transcription factors (Figure 1E, Supplement 1D). Furthermore, average expression of the adult ventricular myosin heavy chain, MYH7, exceeded that of the fetal isoform MYH6 in EHTs. In general, each major cell-type expressed canonical markers such as FN1 in cardiac fibroblasts, RGS5 in pericyte/SMC cells, PECAM1 in endothelial cells, and SOX2 in neural-like cells (Figure 1F). As a validation of these cell type annotations, we compared our labels with predictions from CellTypist [46, 47] models trained on the human Heart Cell Atlas. We found general agreement in the predictions of the major cell types, adding confidence regarding EHT composition (Supplement 1B).
To assess potential heterogeneity within the EHT-CM population, we subclustered EHT-CMs at higher resolution with particular attention to sarcomere and transcription factor gene expression. This analysis revealed populations spanning the maturation spectrum (Figure 2A), aligning well with the second principal component of variation in the expression data (Figure 2B). We noted a continuum of early, still cycling, and late cardiomyocytes, characterized by differential regulation of canonical adult and neonatal sarcomere genes as well as the specific transitions between MYH6 and MYH7 (Figure 2C). We also noted subsets of cardiomyocytes across maturation states expressing extracellular matrix components (noted as ECM+). We identified a cardiac progenitor population defined by high expression of the transcription factors GATA4, TBX20, and MYOCD (Figure 2C). To compare cardiac gene expression between input hiPSC-CMs and EHT-CMs, we performed RNA-seq on 2D hiPSC-CMs. Gene expression was moderately and significantly correlated (r: 0.476334. P-value: 0.0025) between the datasets (Figure 2D). However, several genes characteristic of mature cardiomyocytes were expressed at substantially higher levels in EHTs, including the sarcomere components TPM1, ACTC1, TTN, MYL2, MYL3, TNNC1, and MYH7. Conversely, we observe only one gene relatively upregulated in 2D hiPSC-CMs, MYH6, the MYH isoform that characterizes the developing ventricle, highlighting a difference in maturity between the systems.
Open chromatin profiling of engineered heart tissues
Given the advanced maturation of gene expression patterns in EHT-CMs relative to 2D hiPSC-CMs, we speculated that the enhancer landscape of myocardial genes may also more closely resemble that of the adult heart. To test this hypothesis, we generated deep ATAC-Seq datasets from day 30 EHTs and 2D hiPSC-CMs. To separate cell populations into cardiomyocyte-enriched (CM) and fibroblast enriched (FB) pools from which to generate libraries, we sorted the input cell populations using FACS, gated for eGFP signal (Figure 3A). For an adult myocardial comparison, we also reprocessed sequencing data from eight libraries of four high quality bio samples taken from left ventricle (LV) tissue originally published for the ENCODE project [24, 25] (Supplemental Table 1). After peak calling and filtering, this analysis produced a set of 160,270 fixed-width open chromatin regions (OCRs) in the reference ENCODE LV dataset and 153,189 OCRs in the 2D hiPSC-CM ATAC-Seq (Figure 3B). In the EHT-CM and EHT-FB datasets, we found substantially more OCRs: 221,530 and 185,303 regions respectively, for a combined dataset of 406,833 OCRs. This increased number of OCRs likely reflects the model, which includes developmental and adult gene expression patterns, further enhanced by the uniquely deep sequencing applied to the EHTs.
The pattern of genomic context of the identified OCRs was consistent with what is expected from ATAC sequencing [48], where the plurality of regions mapped to introns, followed by intergenic regions, promoters, exons, and finally transcription termination sites (Figure 3C). Comparing the EHT ATAC-Seq results to ENCODE LV data reprocessed using the same metrics, we detected 206,077 EHT intersecting at least one element in the ENCODE LV set (Figure 3D). Of the EHT-CM OCRs, we also found 130,571 annotations (58.9%) overlapped at least one region in the 2D hiPSC-CM set; demonstrating broad overlap with both the adult and early states. We illustrate one example proximal to the cardiac troponin C1 gene, TNNC1, which is absent in ENCODE LV but found in both EHT and 2D hiPSC-CMs (Figure 3E, arrow). We also observe many sites where the EHT dataset mimics a hybrid state between the early open chromatin structure of 2D hiPSC-CMs and the mature structure of ENCODE LV tissue. This hybrid state is exemplified at the MYH6-MYH7 locus where 2D hiPSC-CMs display strong open chromatin signal at a proximal promoter element of the fetal ventricular heavy chain MYH6 (Figure 3F, arrows). In contrast, the adult LV shows open chromatin approximately 1900 bp upstream, with diminished signal at the proximal promoter.The EHT CM OCR dataset captures both elements with comparable signal strength. These findings suggest EHTs can model genomic perturbations relevant to cardiac development and the more mature myocardium. As an additional validation of the cardiac regulatory elements, we analyzed transcription factor binding sites within EHT CM OCRs. We found canonical cardiac transcription factors motifs among the most enriched in the EHT OCRs (Figure 3G) including those of MEF2C, GATA4, TBX20, NKX2-5, and TBX5.
CM- and FB-enriched ATAC-Seq data also enable deconvolution of the bulk ENCODE LV signal into relative cell type contributions. For instance, at DDR2, a gene encoding a collagen-binding receptor tyrosine kinase involved in fibrosis, we detect several promoter elements as well as distal intronic OCRs in the fibroblasts rather than the cardiomyocytes (Figure 3H, arrows). Motif enrichment within fibroblast footprints suggests an active state, likely reflecting the TGFb supplementation used during the tissue consolidation process. These include motifs corresponding to TCF1, ETS1, FLI1, and NR2F2 (Figure 3I). These fibroblast-specific sites provide an additional opportunity to assess potential fibroblast contributions to cardiac disease.
Functional Fine-mapping of Cardiomyopathies from EHTs
The depth of OCR annotations in EHTs enabled fine-mapping of cardiomyopathy GWAS loci [2, 5-8]. Functionally informed fine-mapping is a two-step, empirical Bayes approach to variant prioritization that uses genomic annotations to estimate how strongly GWAS effect sizes are enriched in specific functional regions. These enrichment estimates are then used as prior probabilities and combined with linkage disequilibrium data to pinpoint variants likely to contribute to trait biology at the locus (Figure 4A). We generated effect-size enrichment estimates using TORUS [38] for each OCR dataset, and applied this analysis to EHT-CM, EHT-FB, iPSC-CM, and ENCODE LV data. We used summary statistics from a multi-trait GWAS (MTAG) conducted on cohorts with hypertrophic cardiomyopathy [8] and dilated cardiomyopathy [6], left ventricle ejection fraction (LVEF) and left ventricle end-diastolic volume (LVEDVi) [49]. We also selected two traits unrelated to myocardial tissue: body mass index (BMI) [50] and schizophrenia (SCH) [51] to serve as trait controls.
Our estimates reveal strong GWAS effect size enrichment in all cardiac OCRs across each of the cardiac traits (Figure 4B). Both EHT-CM and ENCODE LV OCRs were enriched for variants associated with contractile traits like LVEF and LVEDVi, as well as DCM and HCM, reflecting the contributions of cardiomyocytes and fibroblasts underlying those phenotypes. By comparison, unrelated traits of body mass index and schizophrenia showed no enrichment (Figure 4B). The strong enrichment in cardiomyopathy associated GWAS effects may reflect increased maturation of gene regulatory circuits in EHTs across entire pathways, relative to 2D hiPSC-CMs. Assaying pathway-level gene expression in EHT-CMs, we note higher expression in EHT-CMs in the sarcomere, as well as in cardiomyopathy and several other disease-associated gene sets, and sets related to cardiac development and contractile function (Figure 4C). Taken together, these estimates demonstrate that the open chromatin regions in EHTs are globally enriched for SNPs identified in GWAS of cardiac traits at levels comparable to adult left ventricle.
We next used the enrichment estimates to derive SNP-level prior probabilities for statistical fine-mapping of each GWAS and OCR dataset with SuSieR [39, 52]. SuSieR uses summary statistics data from GWAS studies and linkage disequilibrium data (LD) to perform fine-mapping. At each locus, we obtained a group of SNPs that together have a 90% chance of containing a functional variant, termed a “credible set.” Each SNP was then assigned a posterior inclusion probability (PIP), reflecting the probability that the variant is functional. In total, 5817 SNPs were mapped to at least one credible SNP set in HCM (2545 SNPs across 83 loci) and/or DCM (3870 SNPs across 87 loci) (Supplemental Tables 3-5).
We found that average credible set sizes were reduced using this prioritized fine-mapping compared to uniform fine-mapping of the same loci. EHT-CM fine-mapping especially outperforms the other analyses in reducing large credible sets of 30 or more SNPs (Supplement 2A). Most fine-mapped SNPs were selected into more than one credible set across each OCR input, indicating a measure of robustness in the strategy (Supplement 2B). These SNPs included in multiple credible sets tended to show higher PIPs when using OCR prior probabilities (max functional PIP) when compared to fine-mapping with uniform priors (uniform PIP) (Figure 4D). This finding suggests that SNPs within open chromatin regions are more likely to be prioritized as potentially functional, as opposed to simply being in LD. A substantial proportion of the entire set of 5817 SNPs were found to intersect with at least one OCR dataset (17.3%), with DCM SNPs in OCRs at a slightly higher rate (20.5%) than HCM SNPs (15.7%) (Supplement 2C). Most of this overlap could be reproduced in EHT datasets alone (Combined: 13.6%, HCM: 12.4%, DCM 16.2%). These fine-mapping results allowed us to identify individual variants of interest which are likely to be functionally relevant for cardiomyopathy expression, improving the efficiency of downstream variant-to-function analysis.
Interrogation of Noncoding Variants in HCM and DCM
When fine mapping the HCM and DCM GWAS data, we found that most credible set SNPs mapped within noncoding regions of the genome, including those SNPs identified as having the highest probability of being functional. The annotation indicates that these SNPs are likely to modify gene expression via cis-regulatory mechanisms, as has been suggested for HCM and DCM associated GWAS signals [53]. To further prioritize loci potentially acting via cis-regulatory mechanisms, we intersected many these credible sets with chromatin conformation capture data which we previously generated in hiPSC-CMs [27].
We identified four HCM GWAS variants at chr17q24.2 that were strongly prioritized by fine-mapping. These SNPs, rs17633401, rs67700546, rs12601850, and rs9892651, were in a locus spanning the PRKCA gene, which encodes protein kinase C alpha (Figure 5A & Supplement 3A). Both rs67700546 and rs17633401 were in OCRs shared by EHT-CMs and EHT-FBs, while rs12601850 intersects a fibroblast specific element, and rs9892651 did not intersect an OCR in any dataset (Figure 5A). In enhancer assays in hiPSC-CM, the regions displayed activity, with rs12601850 and rs9892651 conferring allele-specific expression effects (Figure 5B). These variants were also identified in GTEx [54] as eQTLs for expression of PRKCA in left ventricle in the expected direction, adding external support to the presence of one or more functional variants at the locus(Figure 5C). These results illustrate that the genetic architecture of a given GWAS-associated locus may involve more than one functionally relevant variant affecting different enhancers [55, 56], potentially in different cell types.
We fine-mapped an additional HCM signal within an intron of VTI1A, a gene implicated in vesicle transport on chr10 at 10q25.2 (Supplement 4A). The credible set for this signal spans approximately 38 kilobases (kb), with both ends forming distal chromatin interactions with theTCF7L2 promoter approximately 245 kb away. Interestingly, the EHT-CM and ENCODE LV fine-mapping prioritized two different variants; rs7096151 (EHT CM PIP = 0.878) and rs10885378 (ENCODE LV PIP = 0.542), which may be due to the structure of the underlying open chromatin data. rs7096151 lies in an OCR found in EHT-CM and 2D hiPSC-CMs. Conversely, rs10885378 is at the edge of an ENCODE LV OCR (Supplement 4A). To test whether either variant was located in cardiac enhancers, we expressed 1000-bp sequences centered on each SNP in hiPSC-CMs to assay luciferase reporter activity. rs10885378-T showed enhancer activity, while rs10885378-C was not significantly different from gene desert or rs10885378-T, indicating modest genotype-specificity (Supplement 4B). Similarly, rs70961651-A trended towards higher enhancer activity than rs70961651-G, but was not significantly different (P=0.09) (Supplement 4B). The transcriptional regulator TCF7L2 is differentially active in failed hearts [57], thus, it is possible that this SNP might demonstrate stronger effects under conditions of heart failure, or that both regions may contribute and act synergistically.
We also dissected a credible set associated with DCM locus at chr8p23.1. The credible set at this locus spans 30 kb and forms long-range chromatin interactions over 280 kb in hiPSC-CM with the developmental cardiac transcription factor gene, GATA4 (Supplement 5A). A single variant, rs13260325, strongly prioritized in EHT-CM fine-mapping (PIP = 0.46), ENCODE LV (PIP = 0.577) and 2D-hiPSC-CM (PIP = 0.436), overlapped an OCR shared in all datasets (Supplement 5A). When placed into a luciferase reporter and transfected into hiPSC-CMs, rs13260325 acts as an enhancer of luciferase gene expression (Supplement 5B). These data suggest that noncoding variants, albeit at a distance from GATA4, may impart their effects on through cis-regulation of GATA4 expression in addition to the established roles of rare coding variants in GATA4 in DCM [58].
Characterization of the 3p25.1 HCM Locus
We identified a DCM-associated region at chr3p25.1 harboring a GWAS signal concentrated in an intergenic region near the LSM3 gene previously highlighted in Garnier et al, 2021 [3]. Fine-mapping of the locus prioritized a credible set spanning 6 kb overlapping a dense set of OCRs shared in multiple cardiac data sets (Figure 6A). This cluster of OCRs forms long-range chromatin interactions with SLC6A6, and GRIP2. Each fine-mapping approach strongly prioritized SNP rs6807275 (EHT-CM PIP = 0.886). Variants in this DCM-chr3p25.1 credible set are arranged into four predominant haplotypes in European-associated ancestry populations; designated haplotypes ‘A-D’. Haplotypes ‘A’ and ‘B’ occur most frequently (Allele frequencies: A = 0.455; B = 0.240) (Figure 6B). Both haplotypes carry the prioritized SNP rs6807275, but differ in three other linked variants (rs6786141, rs7650762, and rs12634565) common to the ‘C-D’ group sequences. Luciferase reporter assays confirmed enhancer activity of each of the four haplotypes expression with levels significantly higher than our negative control non-enhancer sequence (A; P = 9.19x10-8, B; P = 6.80x10-10, C; P = 1.38x10-6, D; P = 6.96x10-6) (Figure 6C). Furthermore, we observed that the ‘A’ haplotype – which is associated increased DCM risk, showed the lowest average enhancer activity, and differed significantly from the ‘B’ haplotype (P=6.72x10-3). We hypothesized that one or more of the SNPs differing between ‘A’ and ‘B’ conferred cis-compensatory, stabilizing enhancer function to mitigate the effect of rs6807275. To test this hypothesis, we created synthetic haplotypes on the ‘A’ background, introducing the single substitutions of each ‘B’ specific variant and similarly tested these sequences in hiPSC-CMs (Figure 6D). Two of the tested B-Haplotype SNPs, rs6786141 (P =2.29x10-2) and rs7650762 (P =2.25x10-2) restored enhancer activity, consistent with a compensatory effect (Figure 6E). Notably, both SNPs are also shared on the two minor ‘C-D’ group haplotypes. Reporter constructs for these haplotypes exhibited intermediate activity relative to A and B, suggesting that regulatory output at this locus reflects combinatorial interactions between at the three variants, highlighting a complex cis-regulatory architecture centered on rs6807275.
Finally, we deleted a 6 kb region containing the enhancer haplotypes in hiPSCs using CRISPR-Cas9 genome editing to test its regulatory influence on gene expression, generating heterozygous and homozygous knockout lines. We then differentiated these hiPSCs to CMs and assessed the transcriptome by RNA-seq (Figure 6F). In enhancer deleted hiPSC-CMs, the expression of both SLC6A6 (P=9.27x10-4)and GRIP2 (P=1.77x10-2) was significantly reduced (Figure 6G-H). Despite being highly expressed in the heart, GRIP2 has not been well studied for its cardiac role. There is evidence for role of the taurine transporter SLC6A6 in cardiomyopathy [2, 3, 59], and our data indicates this genomic region regulates both genes and the possibility that both genes contribute to trait liability at the locus. Together, these results identify the GWAS signal at chr3p25.1 as localizing to a distal enhancer, positioned as a modulator of SLC6A6 and GRIP2 expression. These data indicate multiple variants within a single region can induce changes in expression of more than one target gene and may have broader impacts outside of individual-trait biology.