The regression model used here combines the two models of Okbay et al. (2022) and of Selzam et al. (2019). The model by Okbay regresses the child’s phenotype on the child’s PGS and parental PGSs. Disregarding covariates, the model can be expressed as:
(1) phik = b0 + b1 × PGSik + b2 × (PGSim + PGSif ) + εik,
where phik denotes the phenotypic score in child k in family i, PGSik, PGSim, and PGSif are the PGSs in the child, the mother and the father, respectively, and εik is the residual. In this model, the effects of the father's and the mother's PGS are equal (i.e., b2). If desired, this assumption can be relaxed. In within-family regression, the phenotypic variance is decomposed into a between-family effect, represented by the family mean, and a within-family effect, represented by the deviation of the individual PGS from the family mean. We used an equivalent, but slightly altered, model: Instead of the deviation from the family mean, we used the raw PGS of the individual to represent the within-family genetic effect. In the presence of genotyped siblings, we regressed twin phenotype on their own PGS and the average PGS of them and their co-twin, such that:
(2) phik = f0 + f1 × PGSik + f2 × PGSi. + εik ,
where PGSi. denotes the average PGS of the twin pair.
Given Eq. 1, the rejection of b2 = 0 suggests that cov(AC) ≠ 0, which is often attributed to cultural transmission (Balbona et al. 2021; Tubbs et al. 2020). Given Eq. 2, rejection of f2 = 0 suggests that cov(AC) ≠ 0, but does not necessarily identify the source (or process) of AC covariance. A significant f2, so an effect of the sibling mean, could be due to either cultural transmission or sibling interaction. Conversely, the rejection of b2 = 0 could be due to sibling interaction, rather than cultural transmission. We investigated this possibility of misattribution using the regression model that combines PGSs of parents and offspring and the phenotypes of the offspring (Demange et al. 2022; Young et al. 2022):
(3) phik = g0 + g1 × PGSik + g2 × PGSi. + g3 × (PGSim + PGSif) + εik ,
Below, we used exact data simulation (see van der Sluis et al. 2008) according to the model, where the residual εik is bivariate normally distributed N(0, Sε2). The 2x2 covariance matrix Sε2 is a function of residual genetic and (shared and unshared) environmental influences relevant to the phenotype. In this model, the test of AC covariance is based on the parameters g2 and g3. In our simulation g3 is attributable to the process of cultural transmission, while g2 is attributable to the process of sibling interaction. We do not consider the main effects of covariates, such as genetic principal components (Price et al. 2006) sex, or age, as these are not relevant to the present issue.
The estimation of the parameters in these regression models can be done using analyses that can account for correlated residuals (e.g., Sε2). We can use GEE regression analysis, linear mixed modeling, or SEM. In all three approaches, the regressors are treated as fixed regressors, such that the distributional assumption concerns the distribution of the residuals (εik). However, in SEM, we can also choose to treat the regressors as random variables. Treated as such, the distributional assumption concerns the joint distribution of the PGS and the phenotypes, as in this approach a covariance structure model is fitted to the PGS and the phenotypes, and the PGS and phenotypes are assumed to be multivariate normally distributed. Using the SEM approach has the considerable practical advantage that it can handle missing PGS and missing phenotypic data (i.e., using full information maximum likelihood estimation). We note that the random/fixed terminology is confusing. Therefore, we emphasize that the treatment of the regressors as random variables does not imply random effects: the effects of the random regressors are fixed (to be estimated) effects. Random effects are relevant in multilevel modeling, in general, and with linear mixed modeling, particularly in genetics (e.g., Yang et al. 2011).
Procedure
Data simulation was based on the model presented in Fig. 1. We limited the analyses to one set of values of the three variance components (ACE) but varied the predictive power of the PGS (i.e., the proportion of A variance that is accounted for by the PGS), and the effect sizes for cultural transmission, and sibling interaction. Explicitly, we first simulated data for a single set of variance components (a2 = .4, c2 = .3, e2 = .3), one sample size (nMZ = nDZ = 4,000 pairs), and set the PGS to explain 10% of A variance. We varied the two sources of AC covariance (settings: 0, 0.05, 0.1) in a 3x3 design. Note that the variance components sum up to 1 (i.e., .4 + .3 + .3), but the total phenotypic variance equals a2 + c2 + e2 + 2 · cov(AC), meaning it will exceed 1 in the presence of positive AC covariance. Subsequently, we varied the predictive power of the PGS (settings: 2%, 5%, 10%, 15%) and conducted analyses for a DZ-only sample (nDZ = 4,000 pairs). The predictive power is defined in terms of a percentage of total additive genetic variance. Below we present the effect size in terms of both PGS predictive power and in explained phenotypic variance. We note that researchers may be interested in a broader range of settings for the ACE variance components. While these are not the focal point of the current study, we consider a broader range of settings in the supplement (Appendix A).
We adopted the SEM approach using OpenMx. We present results for three OpenMx models: The combined model based on Eq. 3, including the parents’ and twins’ PGSs, the cultural transmission-only model based on Eq. 2 (nested under the Eq. 3 model by fixing g3 to zero) and the sibling interaction-only model based on Eq. 1 (nested under the Eq. 3 model by fixing g2 to zero). We focused on how the models that only assume one source of AC covariance performed in comparison to the combined model in terms of power and effect sizes. Hereafter, we refer to the coefficient for cultural transmission as g and the coefficient for sibling interaction as b.
We fitted the models to DZ and MZ twin data, i.e., data typically present in the twin and family registries, but we note that MZ twin data are not required to fit these models. The residual covariance matrix Sε2 is expected to differ in the MZ and DZ samples, due to residual genetic influences (i.e., not explained by the PGS). We also fitted the models to the DZ twin data only, as in practice, phenotypic data may be limited to full siblings.
Power calculations were based on the likelihood-ratio test statistic, given the degrees of freedom of the test, the non-centrality parameter of the χ2 distribution, and the a of 0.05 (e.g., Hewitt & Heath 1988). We also present estimated path coefficients for cultural transmission and sibling interaction. We express the effect size in terms of the increase in phenotypic variance in MZ and DZ twins that is due to AC covariance (in comparison to the baseline ACE model with no AC covariance). This measure is zygosity-dependent in the presence of sibling interaction and will then be higher in MZ twins, assuming positive AC covariance (Carey 1986; Eaves 1976; Eaves et al 1978; see Appendix C). We also present results for four different setting of the predictive power of the PGS, which is expressed as the percentage of A variance captured by the PGS. From these values, we can calculate the proportion of overall phenotypic variance that is captured by the PGS. In absence of AC covariance (g = b = 0), and given a2 = .4 and the values for the predictive power of the PGS (settings: 2%, 5%, 10%, 15%), the PGS captures 0.8%, 2%, 4%, and 6% of the total phenotypic variance, respectively. This value increases in the presence of AC covariance and is, again, zygosity-dependent in the presence of sibling interaction. For example, when g = .1 and b = .1, the PGS captures 4.9% of MZ phenotypic variance and 5.2% of DZ phenotypic variance (see Appendix C).
The SEM model can be extended by fitting an ACE model to the residual (background) covariance matrices Sε2 in the MZ and DZ twin data. Fitting the extended model is not necessary for the purpose of this study, but it is relevant to note that this background model is misspecified in the presence of AC covariance. The estimate of the residual C variance is expected to be biased in the presence of AC covariance, because the PGS only captures part of the A variance, and thus only part of the AC covariance. For reference, we present extended results for the SEM approach in a wider range of settings (Appendix A). We also provide results of the GEE regression analyses (Appendix B).
Materials
All analyses were performed in R (R Development Core Team 2023) used in RStudio (RStudio Team 2023). We used the mvrnorm() function of the MASS package (Ripley et al 2023) for exact data simulation, geepack (Højsgaard et al 2022) for GEE regression (Appendix B), and OpenMx (Neale et al 2016) for SEM. For ease of use, analysis steps were combined in the package gnomesims (Bernardo 2024; see Appendix D).