4.1 Genomic data sources
The whole genome sequence, protein sequence, and annotation files of C. sinensis were obtained from the Tea Plant Genome Database (https://eplant.njau.edu.cn/tea/index.html). The corresponding files for the Arabidopsis thaliana MADS-box gene family were acquired from the TAIR database (https://www.arabidopsis.org/).
4.2 Identification of the MADS-box gene family members in C. sinensis
To identify the MADS-box genes in C. sinensis, two complementary strategies were used: a BLAST search and a Hidden Markov Model (HMM) search. First, the gene IDs of known Arabidopsis thaliana MADS-box genes were obtained from previous studies. Subsequently, their corresponding protein sequences were extracted using TBtools and used as queries for a BLASTP search against the C. sinensis protein sequences. Finally, only homologous sequences with E-values less than 1e-5 were retained as candidate genes for subsequent analysis. To identify sequences containing typical MADS-box or K-box domains, the corresponding HMM profiles (PF00319, PF01486) were downloaded from the PFAM database (https://pfam-legacy.xfam.org/) and employed in a domain search. The final set of MADS-box genes was obtained by merging the sequences identified by the BLAST and HMM searches and removing redundant members. The subcellular localization of the MADS-box proteins was predicted using the online tool WoLF PSORT (https://wolfpsort.hgc.jp/). The physicochemical properties of these proteins were analyzed with the ProtParam-based "Protein Parameter Calc" function in TBtools [20].
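The filtering and merging steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the pipeline actually used (the analysis was run through TBtools). It assumes the BLASTP results are in the standard 12-column tabular format (`-outfmt 6`), where column 2 is the subject ID and column 11 is the E-value, and that the HMM search has already produced a list of protein IDs. The function names and all sequence IDs in the example are hypothetical.

```python
def blast_candidates(blast_lines, max_evalue=1e-5):
    """Collect subject IDs from BLAST tabular (outfmt 6) lines whose
    E-value is strictly below max_evalue."""
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]      # column 2: subject (target) sequence ID
        evalue = float(fields[10])  # column 11: E-value
        if evalue < max_evalue:
            hits.add(subject_id)
    return hits


def merge_candidates(blast_hits, hmm_hits):
    """Union of both candidate sets; duplicates are removed by the set union."""
    return sorted(set(blast_hits) | set(hmm_hits))


# Hypothetical example data (IDs are placeholders, not real gene IDs).
example_blast = [
    "AT_query1\tCs_protA\t65.0\t180\t60\t2\t1\t180\t1\t178\t1e-30\t250",
    "AT_query1\tCs_protB\t30.0\t90\t60\t3\t1\t90\t5\t95\t2e-3\t40",
    "AT_query2\tCs_protA\t58.0\t170\t70\t1\t1\t170\t3\t172\t4e-20\t190",
]
example_hmm = ["Cs_protA", "Cs_protC"]

final_set = merge_candidates(blast_candidates(example_blast), example_hmm)
print(final_set)
```

In this example, Cs_protB is discarded because its E-value (2e-3) does not pass the 1e-5 cutoff, and Cs_protA is kept only once even though both searches report it.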
4.3 Phylogenetic analysis of the MADS-box genes of C. sinensis
The protein sequences of the identified C. sinensis MADS-box family members were merged with those from Arabidopsis thaliana and aligned using MUSCLE v3.8 [21]. Subsequently, the optimal model for phylogenetic construction was determined using the Model Selection tool built into MEGA11. Finally, a phylogenetic tree was constructed with the Neighbor-Joining (NJ) method, applying Gamma-distributed rate variation among sites (+G) [22]. After tree construction was completed, the tree files generated by MEGA were refined using Evolview v3 (https://www.evolgenius.info/evolview/#/treeview), with the type I (Mα, Mβ, Mγ) and type II (MIKCC, MIKC*, etc.) genes annotated in different colours [23].
4.4 Analysis of gene structure, conserved domains, and cis-acting elements of the MADS-box genes of C. sinensis
Identification of conserved motifs was performed using the MEME online tool (https://meme-suite.org/meme/index.html), configured to search for 10 motifs, with subsequent visualization conducted using the relevant function in TBtools [24]. For structural domain prediction, protein sequences were submitted to NCBI CDD (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi), and the resulting data were visualized [25]. Investigation of potential regulatory elements involved the extraction of a 2000 bp putative promoter region upstream of the translation initiation codon for each gene from the genomic files using TBtools. Cis-acting elements within these promoter sequences were identified via the PlantCARE database (https://bioinformatics.psb.ugent.be/webtools/plantcare/html/) and visualized using TBtools [26].
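The promoter extraction step can be illustrated with a short Python sketch. It is not the TBtools implementation. It assumes 1-based, inclusive coding-region coordinates as found in GFF-style annotation files, takes the bases immediately upstream of the translation start, and reverse-complements the region for genes on the minus strand so that the promoter is always returned in the gene's own orientation. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bp upstream of the translation start.

    cds_start and cds_end are 1-based inclusive coordinates of the coding
    region on the chromosome (cds_start <= cds_end). The region is
    truncated if the gene lies closer than `length` bp to a chromosome end.
    """
    if strand == "+":
        end = cds_start - 1           # 0-based exclusive end, just before the start codon
        start = max(0, end - length)
        return chrom_seq[start:end]
    if strand == "-":
        start = cds_end               # 0-based index of the first base after the coding region
        end = min(len(chrom_seq), start + length)
        return reverse_complement(chrom_seq[start:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome (16 bp) with a 4 bp window instead of 2000 bp.
toy_chrom = "AAAACCCCGGGGTTTT"
print(upstream_region(toy_chrom, 9, 12, "+", length=4))  # plus-strand gene at 9..12
print(upstream_region(toy_chrom, 1, 4, "-", length=4))   # minus-strand gene at 1..4
```

Handling the strand explicitly matters because, for minus-strand genes, the upstream region lies at higher chromosome coordinates than the gene itself.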
4.5 Genome-Wide Identification of MADS-box Genes: Chromosome Distribution and Synteny in C. sinensis
The chromosomal locations of the MADS-box genes were determined from the C. sinensis genome annotation file and visualized using the "Gene Location Visualize" function in TBtools. The One Step MCScanX-Super Fast function in TBtools was then used to analyse collinearity among the MADS-box gene family members within C. sinensis. A similar procedure was applied for the inter-species collinearity analysis: genomic data for grapevine, Arabidopsis thaliana, rice, and maize were retrieved from the Ensembl Plants database and compared with the MADS-box genes of C. sinensis to show the distribution and collinearity of homologous genes across chromosomes [27].
4.6 RNA Extraction, Reverse Transcription, and Quantitative RT-PCR Analysis in Response to Simulated Drought and Salt Stress
Four potted large-leaf tea plants (C. sinensis) from Wuzhishan, Hainan, with uniform growth status were selected and divided into two groups. One group was irrigated with 20% PEG6000 to simulate drought stress, while the other was treated with 200 mmol/L NaCl for salt stress. Leaves were collected on day 0, day 3, and day 6 to form the following groups: CK, PEG-3Day, PEG-6Day, NaCl-3Day, and NaCl-6Day. Upon harvest, all leaf samples were promptly wrapped in aluminum foil, flash-frozen in liquid nitrogen, and stored at -80°C.
Extraction of total RNA from all collected samples was performed using an RNA extraction kit (Tiangen Biotech, Beijing, China) in accordance with the manufacturer's instructions. Following an assessment of RNA purity, reverse transcription was conducted using the FastKing One-Step cDNA Synthesis PreMix (KR118). Quantitative real-time PCR amplification was executed on a LightCycler® 480 II system, and the relative expression levels of seven selected Type II MADS-box genes were determined by the 2^(−ΔΔCt) method. The corresponding quantitative primers are provided in Supplementary Table 1.
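For reference, the relative expression calculation can be written out in a few lines of Python. This is a minimal sketch, assuming that each target gene is normalized to a single internal reference gene (not named in this section) and that the CK sample serves as the calibrator. The Ct values in the example are hypothetical and are not measured data.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene, normalized to a reference gene
    and expressed relative to the control sample."""
    delta_ct_treated = ct_target_treated - ct_ref_treated  # dCt in the treated sample
    delta_ct_control = ct_target_control - ct_ref_control  # dCt in the control (CK) sample
    delta_delta_ct = delta_ct_treated - delta_ct_control   # ddCt
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier under stress,
# while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)
```

A value above 1 indicates up-regulation relative to CK, and a value below 1 indicates down-regulation. In practice the calculation would be applied to the mean Ct of the technical replicates for each sample.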