Issue 1: Twin research designs and analytic approaches

ISSN: 2652-5518 
By Katrina J. Scurrah and John L. Hopper

Main points

  • The classic twin design has been used to study the causes of variation in human characteristics for nearly a century, but researchers now increasingly recognise many other questions that can be addressed using twin study designs.
  • Advances in biological science and technology have created new opportunities and possibilities for twin study designs.
  • These new initiatives are especially important when used in combination with new statistical models being developed to address the fundamental issue of causation.
  • Twin studies will remain highly relevant to health and medical research.


The aim of this article is to summarise different types of research study designs involving twins and the ways in which these can be used to address different specific research questions. We describe the advantages and limitations of the various designs, highlight some issues of which researchers should be aware when using them, and discuss some controversies. Historical perspectives and current methodological research topics are also considered.

Suggested citation: K.J. Scurrah and J.L. Hopper. Twin research: designs and analytic approaches. Conversations in Twins Research, Twins Research Australia, Melbourne, 2019.


Studies of twins can provide insights about the health of all persons, not just twins, so they are relevant to the whole population. The various twin study designs address different aims, require specific statistical analysis techniques with associated model assumptions, and have diverse advantages and limitations.

For example, a recent study found that if one twin in a pair had hallux valgus (bunions), the other twin was also likely to have bunions [1]. This showed that factors shared between twins in a pair were important. However, there was no difference in similarity between monozygotic (MZ) and dizygotic (DZ) twins, suggesting that shared environmental factors, not genes, are important. One implication is that future research can focus on preventing specific non-genetic risk factors, such as the type of shoes worn.

Studies of within-pair differences can also produce valuable outcomes. For example, a study of twin pairs with discordant tobacco smoking histories revealed that the more heavily smoking twin was more likely to have lower bone mineral density and twice the risk of a bone fracture [2]. In a study of twin pairs in which one had breast cancer and the other did not, researchers found that the affected twin usually went through puberty first [3].

The classic twin design

One of the most commonly used twin designs, and one that researchers usually think of first when considering a twin study, is the “classic” twin design. Familial resemblance for a trait (such as height) can be quantified by estimating its correlation in twin pairs; this can be done for all types of twin pairs together or separately for MZ and DZ twins. Correlations are always between -1 and 1, and correlations greater than zero suggest that there are familial factors that cause resemblance for the particular trait. The classic twin model initially estimates correlations separately for MZ and DZ pairs and compares them. If the correlation is greater for MZ pairs than for DZ pairs, this is consistent with (but does not prove) the existence of genetic effects that influence variation, even though genes have not been measured. This inference is made under certain assumptions (see below), and even when correlations are consistent with genetic effects, the specific genes responsible cannot be identified by studying disease traits without measuring genes.

The classic twin method relies on two strong assumptions:

  • MZ and DZ pairs share aspects of the environment (relevant to the trait) to the same extent (often referred to as the “equal environments” assumption [4])
  • the only difference between MZ and DZ pairs is the proportion of genes they share.

The first assumption does not mean that all pairs of twins are assumed to share their environments equally, just that on average the extent of sharing does not depend on zygosity. Put another way, this means that on average MZ pairs do not share their trait-relevant environment any more (or less) than DZ pairs.

The trait mean (the average value of a trait) often depends on measured variables such as age and sex. For example, average height is usually greater for men than for women. In such situations the trait mean should be adjusted for these variables before correlations are estimated. This usually requires fitting statistical models which estimate the effects of the measured variables on the trait of interest (e.g., what is the effect of sex on height, or equivalently, how much taller are males than females on average?).
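As a minimal sketch of the adjust-then-correlate step described above, the Python below subtracts sex-specific means from a trait and then estimates the pair correlation separately for MZ and DZ pairs. The heights are invented for illustration, the function names are hypothetical, and the double-entry Pearson correlation is only one simple choice of estimator; a full analysis would fit a regression model with all relevant covariates.

```python
def adjust_for_group(values, groups):
    """Subtract each group's mean (e.g., the sex-specific mean) from its members."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def pair_correlation(pairs):
    """Double-entry Pearson correlation for a list of (twin 1, twin 2) values."""
    # Enter each pair in both orders so the arbitrary twin labelling cancels out.
    x = [a for a, b in pairs] + [b for a, b in pairs]
    y = [b for a, b in pairs] + [a for a, b in pairs]
    mean = sum(x) / len(x)  # x and y hold the same values, so they share a mean
    sxy = sum((xi - mean) * (yi - mean) for xi, yi in zip(x, y))
    sxx = sum((xi - mean) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical heights in cm for same-sex pairs: (twin 1, twin 2, sex).
mz = [(178, 180, "M"), (165, 164, "F"), (171, 173, "M"), (158, 160, "F")]
dz = [(183, 177, "M"), (168, 174, "M"), (168, 163, "F"), (157, 162, "F")]

everyone = mz + dz
values = [v for a, b, _sex in everyone for v in (a, b)]
sexes = [sex for _a, _b, sex in everyone for _twin in range(2)]

# Adjust for sex across the whole sample, then rebuild the pairs.
resid = adjust_for_group(values, sexes)
pairs = list(zip(resid[0::2], resid[1::2]))

r_mz = pair_correlation(pairs[:len(mz)])
r_dz = pair_correlation(pairs[len(mz):])
print(f"MZ correlation: {r_mz:.2f}, DZ correlation: {r_dz:.2f}")
```

With real data the same two numbers would be the starting point for the MZ versus DZ comparison, interpreted under the assumptions listed earlier.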

Once the correlations for MZ and DZ pairs are estimated, and the statistical significance of their difference determined, models which divide the residual (unexplained) variation into components due to shared genetic effects, shared environmental effects and unshared effects can be fitted. These “variance components models” can be fitted using maximum likelihood estimation [5] or via a structural equations approach [6] and should also incorporate the adjustments for measured variables described in the paragraph above.

The residual variance (which includes variation due to measurement error and random factors, as well as factors unique to individual twins) and the amounts of variation attributable to shared genes and shared environmental effects, provide information about the relative causes of differences in the trait of interest between people across the population. Inclusion of opposite-sex DZ pairs allows inference about whether the extent to which genes or environmental effects influence trait variation differs between males and females. The components of variance are often reported as proportions of the total variance, and in this case the proportion of shared genetic effects is referred to as the heritability. Although it is commonly reported, focusing on the heritability loses a lot of information and is not recommended [7, 8].
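One common way to write such a variance components model is sketched below. The notation is an assumption made here for illustration and is not taken from the cited references: σ²_A denotes variance due to additive genetic effects, σ²_C variance due to shared environmental effects, and σ²_E the residual (individual-specific) variance. The sketch also assumes the two assumptions of the classic twin method hold and that genetic effects are purely additive.

```latex
\begin{align}
\operatorname{Var}(y) &= \sigma^2_A + \sigma^2_C + \sigma^2_E \\
\operatorname{Cov}(y_1, y_2 \mid \mathrm{MZ}) &= \sigma^2_A + \sigma^2_C \\
\operatorname{Cov}(y_1, y_2 \mid \mathrm{DZ}) &= \tfrac{1}{2}\,\sigma^2_A + \sigma^2_C \\
h^2 &= \frac{\sigma^2_A}{\sigma^2_A + \sigma^2_C + \sigma^2_E}
\end{align}
```

Dividing the two covariances by the total variance gives the MZ and DZ correlations, so under these assumptions the difference between the correlations equals h²/2. This is why a larger MZ than DZ correlation is read as consistent with genetic effects, and also why the single ratio h² discards the information carried by the individual components.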

Correlations are usually estimated for continuous traits (such as height), but scientists also often want to estimate how similar twins are with respect to a categorical outcome, such as presence or absence of bunions, baldness (none, mild, moderate or severe) or a schizophrenia diagnosis. This similarity is usually quantified using the concordance (either casewise or pairwise [9]). For schizophrenia, the casewise concordance for MZ pairs has been estimated to be 0.48, meaning that if one member of an MZ pair is diagnosed with the condition there is a 48% chance that the other one will also have the condition (see Figure 3.6 of Plomin et al. [10]). For DZ pairs the concordance has been estimated to be 0.17. If we assume again that the two twin types share their schizophrenia-related environments equally and only differ in the proportion of genes they share, these estimates are consistent with a genetic influence on risk of schizophrenia. This does not exclude the role of environmental factors, because otherwise the concordance for MZ pairs would be 1.00.
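The two concordance measures can be sketched in a few lines of Python. The formulas below are the usual ones for a sample of pairs that has not been selected on disease status; the pair counts are hypothetical and were chosen only so that the casewise value matches the 0.48 quoted above.

```python
def pairwise_concordance(both_affected, one_affected):
    """Proportion of pairs with at least one affected twin in which both are affected."""
    return both_affected / (both_affected + one_affected)


def casewise_concordance(both_affected, one_affected):
    """Probability that a twin is affected, given that the co-twin is affected."""
    # Each concordant pair contributes two affected twins with an affected co-twin.
    return 2 * both_affected / (2 * both_affected + one_affected)


# Hypothetical counts: 12 pairs with both twins affected, 26 with one affected.
print(f"pairwise: {pairwise_concordance(12, 26):.2f}")
print(f"casewise: {casewise_concordance(12, 26):.2f}")
```

The casewise measure is the one with the "if one twin is affected" interpretation used in the schizophrenia example; the pairwise measure is always the smaller of the two unless every affected pair is concordant.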

An alternative approach to quantifying similarity for binary outcomes is the twin odds ratio (ORtwin) method [11, 12], in which the co-twin’s outcome is included as a risk factor in a logistic regression model. This compares the odds of the outcome for an individual with an affected twin with the odds of the outcome for an individual with an unaffected twin, which is referred to as the ORtwin. An ORtwin significantly greater than 1 provides evidence of familial effects. By including an interaction between zygosity and the co-twin’s outcome, different ORs for MZ and DZ twins can be estimated and compared. For example, this approach showed that although familial effects were present for bunions with ORtwin=3.7 (i.e., the co-twin of a person with bunions has 3.7 times the odds of developing bunions compared with a co-twin of a person without bunions), the ORs were not significantly different for MZ and DZ pairs, indicating that environmental but not genetic factors contribute to variation in bunions risk [1].
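When the co-twin's outcome is the only predictor, the odds ratio from the logistic regression equals the cross-product ratio of a 2×2 table in which every twin appears once as the "individual". The sketch below computes that point estimate directly from pair counts. The function name and counts are hypothetical, and the calculation gives no standard error: valid confidence intervals need a method that accounts for each pair being counted twice (e.g., GEEs).

```python
def or_twin(both_affected, one_affected, neither_affected):
    """Point estimate of the twin odds ratio from counts of pairs."""
    # Build the double-entry 2x2 table of individuals by co-twin status.
    a = 2 * both_affected      # affected, co-twin affected
    b = one_affected           # affected, co-twin unaffected
    c = one_affected           # unaffected, co-twin affected
    d = 2 * neither_affected   # unaffected, co-twin unaffected
    return (a * d) / (b * c)


# Hypothetical pair counts by zygosity: (both, one, neither) affected.
counts = {"MZ": (10, 20, 40), "DZ": (9, 22, 44)}
for zygosity, (both, one, neither) in counts.items():
    print(zygosity, round(or_twin(both, one, neither), 2))
```

Computing the estimate separately by zygosity, as in the loop, corresponds to the model with a zygosity by co-twin outcome interaction and no other covariates.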

Variance components models can also be fitted to binary traits (e.g., bunions or no bunions) by using the correlation. Alternatives exist, including estimation of “intrinsic correlation” after adjustment for measured variables such as age, which can affect concordances [1, 13], and generalised linear mixed models (GLMMs) fitted using Markov chain Monte Carlo methods [14]. Methods for outcomes which are best represented as ordinal, categorical [15, 16] or as censored survival times [17, 18] also exist.

Another way to analyse binary traits is to assume there is an unmeasured continuously distributed underlying liability to disease. Although commonly performed, this method of analysis is controversial because it involves untestable assumptions and the interpretation is problematic; for example, heritability of liability is not the proportion of disease caused by genes [19].

Advantages of the classic twin design include the ability to include all types of twin pairs (such as MZ and DZ, same-sex and opposite-sex, both affected/one affected/none affected, both exposed/one exposed/none exposed) regardless of their measured outcomes or exposures, the ability to estimate variation and covariation (the amount of variance shared by twins in a pair) as well as correlations and heritability, and the capacity to assess effects of measured risk factors such as age and sex on both the trait and the variances or covariances. However, this design also has limitations, perhaps most importantly regarding the equal environments assumption, which is crucial to inference but difficult to test. The classic design also has low power to detect shared environmental effects (see, e.g., [4]) and assumes that all differences in correlations are due solely to differences in genetics.

Other types of twin designs

Other twin study designs are very useful in epidemiology generally, not just in genetic epidemiology. Some examples of other designs and of the types of questions that twin research can address are shown in Table 1 below. Most designs require statistical models which allow for correlated observations to be fitted, such as GLMMs, also known as mixed effects models, or models fitted using generalised estimating equations (GEEs). These models have advantages and disadvantages. For example, the GEE approach is appropriate when the researcher is interested in the association of measured risk factors (e.g., military service in Vietnam or calcium supplementation) with the outcome, while GLMMs are required to estimate variation and covariation. Both these models are appropriate for continuous (e.g., height) and binary (e.g., bunions) outcomes, and both can be fitted in standard statistical software packages such as Stata [20]. Which approach and model are best depends on the research question and available data.

Table 1. Twin designs and statistical approaches

| Design | Examples/applications | Statistical model/analytic approach |
|---|---|---|
| Co-twin control study – disease discordant: matched case–control design, generally with identical twins discordant for a trait | Earlier onset of puberty is not associated with breast cancer risk in disease-discordant twins [3] | Conditional logistic regression or binary GLMMs/GEEs |
| Co-twin control study – exposure discordant: identical twins discordant for a health-related characteristic | Military service in Vietnam is associated with higher risk of post-traumatic stress symptoms than military service elsewhere [21] | Linear or logistic regression with adjustment for within-pair correlation (e.g., GEEs or mixed effects models) |
| Differences study: both exposure and outcome are discordant, and both are continuous variables | Differences in smoking consumption predict differences in bone density [22] | Linear regression |
| Cross-sectional study: twin pairs are not selected by their outcomes or exposures | Several common genetic variations associated with breast cancer risk are also associated with mammographic density | As for exposure-discordant studies |
| Randomised controlled trial: identical twins, naturally matched for age, sex and genes, given the same or different interventions (or possibly both, if a cross-over design is used) | Calcium supplementation in adolescence has little effect on bone density [23] | As for exposure-discordant studies |
| Epigenetics: MZ twins share the same DNA but the way in which this DNA operates can differ | Birth weight is associated with epigenetic differences in growth and metabolism genes [24] | As for exposure-discordant studies |
| Longitudinal designs: following the growth and development of twins can identify changing patterns of genetic and environmental influence | Higher cumulative exposure to solvents over the lifetime is associated with increased risk of Parkinson’s disease [25]. Many twin registries have longitudinal data on twins [26-28] | Regression-based approach accounting for correlation (e.g., mixed effects models) |
| Multivariate designs: studying two or more traits simultaneously can identify shared and separate genetic and environmental influences | Genes that influence children’s reading abilities are in part shared with genes that influence mathematics abilities [29] | Multivariate variance components models |
| Extended twin family study: inclusion of other family members such as parents and siblings extends the range of genetic and environmental factors that can be understood | Different patterns of covariation exist for height, body mass index (BMI) and blood pressure [30] | Variance components models |
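The linear regression listed for the differences study in Table 1 can be sketched as follows: the within-pair difference in the outcome is regressed on the within-pair difference in the exposure. This sketch assumes a regression through the origin, which makes the estimate independent of which twin is labelled "twin 1"; the function name and the numbers are invented for illustration.

```python
def within_pair_slope(pairs):
    """Slope from regressing outcome differences on exposure differences.

    pairs: list of ((exposure_1, outcome_1), (exposure_2, outcome_2)).
    The regression is forced through the origin (an assumption of this sketch).
    """
    dx = [x1 - x2 for (x1, _y1), (x2, _y2) in pairs]
    dy = [y1 - y2 for (_x1, y1), (_x2, y2) in pairs]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical data: (pack-years smoked, bone density in g/cm^2) for each twin.
pairs = [
    ((20, 0.90), (0, 1.00)),
    ((5, 1.05), (15, 1.00)),
    ((30, 0.85), (10, 0.97)),
]
print(f"change in bone density per pack-year: {within_pair_slope(pairs):.4f}")
```

Because anything the twins share (age, family background and, for MZ pairs, genes) cancels in the differences, the slope reflects only within-pair variation.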

Advantages and limitations of twin study designs

Each of these study types has advantages and disadvantages. For disease-discordant twin studies, the case–control pairs are matched for both measured and unmeasured factors – 100% for age, genetic factors (perfectly for MZ pairs; 50% for DZ), non-genetic familial factors (not necessarily to the same degree for MZ and DZ pairs), mother, father, uterus and (perhaps) placenta, sex (if only same-sex pairs are included), calendar year and season of birth. Although this type of study can be less costly and time-consuming than a cohort study, it has similar limitations to standard case–control studies (such as potential recall bias and inefficiency of sampling for rare exposures).
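For a binary exposure in one-to-one matched pairs, the conditional logistic regression estimate of the odds ratio reduces to a ratio of discordant pairs: pairs in which only the affected twin was exposed, divided by pairs in which only the unaffected twin was exposed. The sketch below shows that calculation; the function name and the data are hypothetical, and a real analysis would also report a confidence interval.

```python
def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio for a binary exposure in disease-discordant pairs.

    pairs: list of (affected_twin_exposed, unaffected_twin_exposed) booleans.
    Pairs in which both or neither twin was exposed carry no information.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    return case_only / control_only


# Hypothetical data: 6 pairs with only the affected twin exposed, 3 with only
# the unaffected twin exposed, and 5 exposure-concordant pairs.
pairs = (
    [(True, False)] * 6
    + [(False, True)] * 3
    + [(True, True)] * 2
    + [(False, False)] * 3
)
print(matched_pair_odds_ratio(pairs))
```

The exposure-concordant pairs drop out of the estimate, which is one reason this design is inefficient when the exposure is rare or when twins usually share it.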

While exposure-discordant twin studies also match for both unmeasured and measured factors (other than the exposure of interest), and have the potential to allow causal inference, the risk estimates might not be generalisable to the general population. This study type has similar advantages and limitations to standard matched cohort studies: it is advantageous for rare exposures, but it can be difficult to find and recruit exposure-discordant twin pairs. For example, in one study over 2000 pairs of female twins were screened to find 20 pairs discordant by 20 or more pack-years of smoking [2].

In randomised controlled trials, the pairs will be matched for genes, their participation may be enhanced by the pairing, and they may be a motivated group. However, the bond between twins could lead to non-adherence to study protocol (e.g., by discussing or swapping treatments).

Causal inference from twin studies

Epidemiological studies of twin pairs are extremely valuable for estimation of associations of risk factors with outcomes because they control by naturally matching for age, sex, genes and other familial risk factors. They still, however, only estimate associations, and cannot be used to infer causation. Two variables can be associated because there is another variable that causes them both. For example, alcohol use is associated with lung cancer, but doesn’t cause it. Smoking causes lung cancer, and smokers tend to be more likely to be drinkers.

Making correct inference about causation is extremely important for improving health and is an active area of research, and several statistical approaches based on fitting one or more of a series of regression-based models have been described and applied (Table 2). Often, both the twin’s measured risk factor and the co-twin’s measured risk factor are included as risk factors in a regression model. The within–between model, which uses linear or logistic regression with adjustment for within-pair correlation (e.g., GEEs or mixed effects models), was one of the first such models [31, 32]. This model includes as risk factors the twin pair mean exposure, and the difference between the twin pair mean and the individual’s exposure; the authors describe possible scenarios and the inferences that can be made from them. This idea was later developed by Sjölander and colleagues to describe specific circumstances and models under which causal inferences may be made [33, 34].
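The two risk factors used by the within–between model can be constructed as in the sketch below, which also returns ordinary least squares point estimates for a continuous outcome. Because the within-pair deviations sum to zero in every pair, they are uncorrelated with the pair means, so each coefficient can be computed with a simple one-variable formula. The function name and data are hypothetical, and standard errors would still require GEEs or a mixed effects model.

```python
def within_between(pairs):
    """Return (between, within) slopes for a continuous outcome.

    pairs: list of ((exposure_1, outcome_1), (exposure_2, outcome_2)).
    """
    means, devs, ys = [], [], []
    for (x1, y1), (x2, y2) in pairs:
        pair_mean = (x1 + x2) / 2
        for x, y in ((x1, y1), (x2, y2)):
            means.append(pair_mean)     # between-pair term
            devs.append(x - pair_mean)  # within-pair term
            ys.append(y)

    n = len(ys)
    m_bar = sum(means) / n
    y_bar = sum(ys) / n
    between = sum((m - m_bar) * (y - y_bar) for m, y in zip(means, ys)) / sum(
        (m - m_bar) ** 2 for m in means
    )
    # Deviations have mean zero and are orthogonal to the pair means.
    within = sum(d * y for d, y in zip(devs, ys)) / sum(d * d for d in devs)
    return between, within


# Hypothetical data generated as outcome = 1 + 2 * pair mean + 0.5 * deviation.
pairs = [((1, 4.5), (3, 5.5)), ((4, 12.0), (8, 14.0)), ((5, 11.0), (5, 11.0))]
between, within = within_between(pairs)
print(f"between-pair slope: {between:.2f}, within-pair slope: {within:.2f}")
```

Comparing the two slopes is the starting point for the scenarios described in [31, 32]; for example, a within-pair slope much closer to zero than the between-pair slope points towards factors shared by the twins.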

An alternative approach to inference is Inference about Causation through Examination of Familial Confounding (ICE FALCON), which assesses changes in regression coefficients between three regression models with i) only the individual’s exposure, ii) with only the co-twin’s exposure, and iii) with both the individual’s and the co-twin’s exposure included as risk factors. Conclusions from this approach [35, 36] are consistent with those from a Mendelian randomisation approach [37], which uses measured genetic variants known to affect an environmental exposure as instrumental variables to assess whether the exposure is associated with the outcome in each genetic risk stratum. Both approaches show smoking and higher BMI cause DNA methylation changes (which alter the way genes are expressed in the body), rather than vice versa [36]. Causal mediation analyses, which aim to quantify direct and indirect effects of an exposure on an outcome, could in theory also be applied to twin data (with appropriate adjustments for within-pair correlation).
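The three regression models compared in ICE FALCON can be sketched for a continuous outcome as below. This is only an illustration of the model structure using ordinary least squares point estimates: the helper names and data are hypothetical, and the published method [35, 36] additionally handles within-pair correlation and tests the changes in coefficients formally.

```python
def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def slope_one(x, y):
    """Least squares slope of y on a single predictor (with intercept)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def slopes_two(x1, x2, y):
    """Least squares slopes of y on two predictors (with intercept)."""
    a, b, yc = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, yc), _dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def three_models(pairs):
    """pairs: list of ((exposure_1, outcome_1), (exposure_2, outcome_2))."""
    x_self, x_co, y = [], [], []
    for (x1, y1), (x2, y2) in pairs:
        # Each twin appears once as the individual, with the other as co-twin.
        x_self += [x1, x2]
        x_co += [x2, x1]
        y += [y1, y2]
    self_adj, co_adj = slopes_two(x_self, x_co, y)
    return {
        "i: self only": slope_one(x_self, y),
        "ii: co-twin only": slope_one(x_co, y),
        "iii: self, adjusted": self_adj,
        "iii: co-twin, adjusted": co_adj,
    }


# Hypothetical data in which the outcome depends only on a twin's own exposure.
pairs = [((1, 1), (2, 2)), ((2, 2), (3, 3)), ((3, 3), (5, 5)),
         ((4, 4), (4, 4)), ((6, 6), (5, 5))]
for name, value in three_models(pairs).items():
    print(f"{name}: {value:.3f}")
```

In this constructed example the co-twin's exposure is associated with the outcome in model ii only because twins' exposures are correlated, and its coefficient vanishes once the individual's own exposure is included in model iii.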

Table 2. Examples of statistical approaches to causal inference using twin designs

| Approach | Example/application |
|---|---|
| Within–between study (related to causal models) | The observed association between birth weight and cord blood erythropoietin appears due to individual rather than pair-specific factors [31] |
| Causal models in twins | The observed association between BMI and mortality is unlikely to be causal and appears largely due to shared confounding [33, 34] |
| ICE FALCON | Smoking and BMI appear to have a causal effect on DNA methylation [35, 36] |
| Mendelian randomisation | Female DZ twins with a twin brother scored lower for attention deficit hyperactivity disorder (ADHD) and autism traits than those with a twin sister [38] |
| Causal mediation | Twins’ greater risk of poor educational outcomes is mediated by gestational age and small size for gestational age [39] |

Designs for twin-specific issues

Research on twin-specific issues can help parents of twins (with day-to-day decisions) and policymakers (with long-term planning). Standard epidemiological designs, as well as the twin-specific designs described above, can be used to investigate such issues. For example, a cross-sectional design was used in a study which found that for most twin pairs, staying together in the first year of school is a critical social support [40]. Although there is considerable evidence that twins have worse outcomes than singletons at birth, a recent study which used a data linkage approach found that twin children also have poorer educational outcomes in childhood than singletons, while a causal mediation analysis then showed that this disadvantage was mostly due to the shorter average gestational age for twins [39]. Regression-based models, with adjustments for within-pair correlation, are usually fitted to analyse data in these studies but the precise model required depends on the research question and outcome type.

General statistical issues

General statistical principles apply when analysing data from twins and families. Good statistical practice in this context includes:

  • thorough exploration of data prior to model fitting
  • being aware of and testing model assumptions
  • reporting estimated values, plausible ranges for these values (such as 95% confidence intervals), and strength of evidence against the null hypothesis (such as p-values)
  • starting with simple analyses and models then building on them
  • adjusting for measured variables before considering unmeasured effects
  • remembering that analyses of continuous outcomes are usually more powerful than those of binary outcomes created from the continuous outcome.


While the classic twin design has been applied for nearly 100 years to disentangle the causes of variation in human characteristics, researchers increasingly recognise and employ many other twin designs, particularly in health and medical research. Technological advances that have led to the creation of epigenetic, microbiome and other “omics” platforms have opened up new opportunities and possibilities for twin studies using these designs, especially given that new statistical models are being developed to address the fundamental issue of causation. Twin studies will remain highly relevant to many fields of research.


About the authors

Katrina Scurrah is the Lead Biostatistician and Chair of the Statistical Methodology Group within Twins Research Australia, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne.

John Hopper is the Director (Research) of the Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne.


We thank Brian Byrne, Gillian Dite, Minh Bui, Jeff Craig and John Mathews (the founding director of the Australian Twin Registry) for helpful discussions and comments on this paper. Some of this material appears in Chapter 15. Twins and Twinning (Umstad M., L. Calais-Ferreira, K. Scurrah, J.G. Hall, and J.M. Craig) of Pyeritz R., B.R. Korf, and W. Grody (eds) Emery and Rimoin’s Principles and Practices of Medical Genetics (2018).


1. Munteanu, S.E., et al., Hallux valgus, by nature or nurture? A twin study. Arthritis Care & Research, 2016. 69(9): p. 1421-1428.
2. Hopper, J.L. and E. Seeman, The bone density of female twins discordant for tobacco use. New England Journal of Medicine, 1994. 330(6): p. 387-392.
3. Hamilton, A.S. and T.M. Mack, Puberty and genetic susceptibility to breast cancer in a case-control study in twins. New England Journal of Medicine, 2003. 348(23): p. 2313-2322.
4. Hopper, J.L., Why ‘common environmental effects’ are so uncommon in the literature, in Advances in Twin and Sib-pair Analysis, T.D. Spector, H. Snieder, and A.J. MacGregor, Editors. 2000, Oxford University Press: London. p. 151-65.
5. Lange, K., D. Weeks, and M. Boehnke, Programs for pedigree analysis: MENDEL, FISHER and dGene. Genetic Epidemiology, 1988. 5: p. 471-472.
6. Rijsdijk, F.V. and P.C. Sham, Analytic approaches to twin data using structural equation models. Brief Bioinform, 2002. 3(2): p. 119-33.
7. Fisher, R.A., Limits to intensive production in animals. British Agricultural Bulletin, 1951. 4: p. 217-218.
8. Hopper, J.L., Heritability, in Encyclopedia of Biostatistics. 2005, John Wiley & Sons, Ltd.
9. Witte, J.S., J.B. Carlin, and J.L. Hopper, Likelihood-based approach to estimating twin concordance for dichotomous traits. Genet Epidemiol, 1999. 16(3): p. 290-304.
10. Plomin, R., et al., Behavioral genetics. Fifth ed. 2008, New York: Worth Publishers.
11. Betensky, R.A., et al., A computationally simple test of homogeneity of odds ratios for twin data. Genetic Epidemiology, 2001. 20(2): p. 228-238.
12. Ramakrishnan, V., et al., Elementary methods for the analysis of dichotomous outcomes in unselected samples of twins. Genet Epidemiol, 1992. 9(4): p. 273-87.
13. Hannah, M.C., J.L. Hopper, and J.D. Mathews, Twin concordance for a binary trait. I. Statistical models illustrated with data on drinking status. Acta Genet Med Gemellol (Roma), 1983. 32(2): p. 127-37.
14. Burton, P., et al., Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genetic Epidemiology, 1999. 17(2): p. 118-140.
15. Zaloumis, S.G., et al., Non-proportional odds multivariate logistic regression of ordinal family data. Biometrical Journal, 2015. 57(2): p. 286-303.
16. Nyholt, D.R., et al., Genetic basis of male pattern baldness. Journal of Investigative Dermatology, 2003. 121(6): p. 1561-1564.
17. Yashin, I., J. Vaupel, and I. Iachine, Correlated individual frailty: An advantageous approach to survival analysis of bivariate data. Mathematical Population Studies, 1995. 5(2): p. 145-159.
18. Scurrah, K.J., L.J. Palmer, and P.R. Burton, Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 2000. 19(2): p. 127-148.
19. Hopper, J.L. and T.M. Mack, The heritability of prostate cancer-letter. Cancer Epidemiol Biomarkers Prev, 2015. 24(5): p. 878.
20. StataCorp, Stata Statistical Software: Release 14. 2015, StataCorp LLC: College Station, TX.
21. Goldberg, J. and M. Fischer, Co-twin Control Methods, in Encyclopedia of Statistics in Behavioral Science. 2005, John Wiley & Sons, Ltd.
22. Hopper, J.L. and E. Seeman, The bone density of female twins discordant for tobacco use. N Engl J Med, 1994. 330(6): p. 387-92.
23. Nowson, C.A., et al., A co-twin study of the effect of calcium supplementation on bone density during adolescence. Osteoporos Int, 1997. 7(3): p. 219-25.
24. Gordon, L., et al., Neonatal DNA methylation profile in human twins is specified by a complex interplay between intrauterine environmental and genetic factors, subject to tissue-specific influence. Genome Res, 2012. 22(8): p. 1395-406.
25. Goldman, S.M., et al., Solvent Exposures and Parkinson’s Disease Risk in Twins. Annals of Neurology, 2012. 71(6): p. 776-784.
26. Moayyeri, A., et al., Cohort Profile: TwinsUK and healthy ageing twin study. Int J Epidemiol, 2013. 42(1): p. 76-85.
27. Hopper, J.L., et al., Australian Twin Registry: 30 years of progress. Twin Res Hum Genet, 2013. 16(1): p. 34-42.
28. Gatz, M., et al., Cohort Profile: The National Academy of Sciences-National Research Council Twin Registry (NAS-NRC Twin Registry). Int J Epidemiol, 2014.
29. Davis, O.S.P., et al., The correlation between reading and mathematics ability at age twelve has a substantial genetic component. 2014. 5: p. 4204.
30. Harrap, S.B., et al., Familial patterns of covariation for cardiovascular risk factors in adults: The Victorian Family Heart Study. Am J Epidemiol, 2000. 152(8): p. 704-15.
31. Carlin, J.B., et al., Regression models for twin studies: a critical review. International Journal of Epidemiology, 2005. 34(5): p. 1089-1099.
32. Gurrin, L.C., et al., Using bivariate models to understand between- and within-cluster regression coefficients, with application to twin data. Biometrics, 2006. 62(3): p. 745-751.
33. Sjölander, A., T. Frisell, and S. Öberg, Causal Interpretation of Between-Within Models for Twin Research, in Epidemiologic Methods. 2012. p. Article 10.
34. Sjölander, A., et al., Between-within models for survival analysis. Statistics in Medicine, 2013. 32(18): p. 3067-3076.
35. Li, S., et al., Inference about causation between body mass index and DNA methylation in blood from a twin family study. bioRxiv, 2017.
36. Li, S., et al., Causal effect of smoking on DNA methylation in peripheral blood: a twin and family study. Clin Epigenetics, 2018. 10: p. 18.
37. Davey Smith, G. and S. Ebrahim, Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol, 2004. 33(1): p. 30-42.
38. Attermann, J., et al., Traits of ADHD and autism in girls with a twin brother: a Mendelian randomization study. Eur Child Adolesc Psychiatry, 2012. 21(9): p. 503-9.
39. Zeltzer, J., et al., Twins’ greater risk of poor educational outcomes compared with singletons is mediated by gestational age: a population-based data linkage study. In preparation, 2019.
40. Staton, S., et al., To separate or not to separate? Parental decision-making regarding the separation of twins in the early years of schooling. Journal of Early Childhood Research, 2012. 10(2): p. 196-208.