Overview of rare and common disease genetic architecture
Why study genetic diseases?
- Some treatments are empirically effective, but their mechanism is unknown
- Contemporary medicine
- Treat diseases by the causes
- Genetics
- Better understanding of disease, diagnosis, treatment and screening
- Knowing the variant might lead to treatment
Monogenic diseases
- Caused by variants in one gene or a small number of genes
- Inherited in Mendelian patterns (dominant, recessive, X-linked)
- Often individually very rare, but collectively common
Genetic diagnosis
| Genetic diagnosis | Clinical diagnosis |
|---|---|
| Identify the cause of disease | Identify the disease |
| Genetic diagnosis can lead to clinical diagnosis | Clinical diagnosis can suggest which genes to test |
Many conditions are named after the gene that is mutated/defective
Family screening
To identify other family members who are at risk of developing the disease
Prenatal screening
To identify parents who are at risk of having an affected child
An unaffected embryo can be selected using pre-implantation genetic diagnosis (PGD)
Inform decisions about future pregnancies
Recurrence risk counselling
To inform parents of the risk of having another affected child
If the causal variant is de novo (newly arisen in the child rather than inherited), the risk of recurrence is low
Pompe disease
Process:
Mutation of the GAA gene -> deficiency of acid alpha-glucosidase -> impaired glycogen breakdown -> glycogen build-up -> damage to muscle and other cells
Treatment:
- Enzyme replacement therapy (ERT)
- Supportive therapy
  - mechanical ventilation
  - feeding tube
  - physical therapy
Future treatment
- DNA editing
- Antisense oligonucleotides (ASOs)
Understanding the biology of disease
Genetics can implicate new pathogenic pathways
Genome-wide association studies (GWAS) provide an unbiased screen that can identify new genes and pathways involved in disease
Example: Crohn’s disease - the T300A variant in the ATG16L1 gene reduces autophagy in response to bacteria.
Discovering new drug targets
Examples:
- PCSK9 - found through linkage studies; gain-of-function mutations cause hypercholesterolemia.
Drugs that target PCSK9 to lower cholesterol were approved by the FDA in 2015.
- BCL11A - found through GWAS; relevant to beta-thalassemia and sickle cell anemia.
Ongoing clinical trials for gene editing to treat these diseases.
Long time from discovery to treatment
Genetic information in drug development
- Similarity between the phenotype associated with the variant and the drug's actual indication
Example: a blood glucose variant as evidence for a type 2 diabetes drug
- Source of the genetic variant
Monogenic vs polygenic
- Confidence in how the gene can be affected
Coding vs non-coding
Predicting disease onset with genetics
Using genome-wide polygenic risk scores (PRS) to predict disease onset
Personalised medicine
Help to tailor the right medicine and right dosage to the right patient
- Adverse events (i.e. side effects)
- Drug efficacy (i.e. effectiveness)
- Drug dosage (i.e. amount of drug to give)
Example: warfarin, an anticoagulant used to prevent blood clots
Too much: bleeding
Too little: clotting
Genetic architecture
| Aspect | Rare, Monogenic Diseases | Common, Complex Diseases |
|---|---|---|
| Mode of Inheritance | Dominant, recessive, X-linked, etc. De novo (new mutations) or inherited | Often polygenic and multifactorial with no clear pattern of inheritance |
| Heritability | High; nearly all variation in disease risk can be genetic | Variable; a lower proportion of disease risk is genetic |
| Genetic Heterogeneity | Allelic (different mutations in the same gene) and Locus (mutations in different genes) can both occur | More common due to many genes and environmental factors involved |
| Effect Size/Frequency Spectrum | Often rare variants of large effect | Can involve major loci (common variants of large effect), common variants of small effect, and a “polygenic tail” (many genes each contributing a small amount) |
| Penetrance | Typically high; most mutation carriers develop the disease | Variable; many carriers of risk alleles do not develop the disease |
| Rate of Sporadic Phenocopies | Relatively low, as most cases are genetic | Higher, due to the complex interaction of genetic and environmental factors |
| Genetic Model | Typically follows Mendelian patterns of inheritance | Often involves additive effects, but can also include dominant patterns, gene-by-environment interactions, and epistasis (gene-gene interactions) |
Rare diseases
Domaina, Kashmiri and SUM1, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
- Dominant: one copy of the variant is sufficient to cause disease
  - If one parent is affected (heterozygous) and the other is unaffected, each child has a 50% chance of inheriting the variant
- Recessive: two copies of the variant are required to cause disease
  - If both parents are unaffected carriers, each child has a 25% chance of inheriting two copies and being affected
  - Homozygous vs. compound heterozygous: recessive disease can be caused by two alleles carrying the same variant (homozygous) or two alleles carrying different variants in the same gene (compound heterozygous)
- X-linked dominant: one copy of the variant on the X chromosome is sufficient to cause disease
  - Affected fathers pass the variant to all daughters but no sons
  - Affected (heterozygous) mothers pass the variant to 50% of their sons and 50% of their daughters
Domaina, Angelito7 and SUM1 Derivative work: SUM1, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
- X-linked recessive: males with one copy of the variant are affected, whereas females generally need two copies
  - Affected fathers pass the variant to all daughters, who are unaffected carriers, and to no sons
  - Carrier mothers pass the variant to 50% of sons, who are affected, and 50% of daughters, who are unaffected carriers
Domaina, Kashmiri and SUM1 Derivative work: SUM1, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
- De novo: new mutations that arise in the child
  - Neither parent is affected, but the child is affected
  - Can be dominant or recessive
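The autosomal inheritance risks can be checked by enumerating a Punnett square. The following is a minimal sketch, assuming a single autosomal locus and equal transmission of either parental allele; the allele labels `A` (variant) and `a` (reference) and the function name `offspring_genotypes` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return genotype probabilities for a child of two parents.

    Each parent is a 2-character string of alleles, e.g. "Aa".
    Assumes one autosomal locus and equal transmission of either allele.
    """
    counts = Counter(
        "".join(sorted(a1 + a2)) for a1, a2 in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Dominant: affected heterozygous parent (Aa) x unaffected parent (aa)
dominant = offspring_genotypes("Aa", "aa")
# Recessive: two unaffected carriers (Aa x Aa)
recessive = offspring_genotypes("Aa", "Aa")

print(dominant)   # half of the children inherit the variant
print(recessive)  # a quarter of the children inherit two copies
```

For the carrier-by-carrier cross, the same table also shows that half of the children are expected to be unaffected carriers (`Aa`).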
Inherited vs. sporadic diseases
| Aspect | Inherited Disease | Sporadic Disease |
|---|---|---|
| Description | Disease that occurs in family members across multiple generations | Disease that arises in an individual without a clear family history |
| Inheritance | Can be dominant or recessive | Often de novo mutations; can occur without any known inheritance |
| Penetrance | May be complete (all individuals with the mutation express the phenotype) or incomplete (not all individuals express the phenotype) | Not necessarily applicable, but if genetic, may influence the chance of disease |
| Genetic Factors | Usually has a clear genetic basis | May not be genetic; if genetic, it influences the likelihood rather than directly causing the disease |
Identify causal genes
- For some diseases, a variant in a single gene is sufficient to cause disease (e.g. neurofibromatosis)
- For other diseases, variants in any of multiple genes can cause disease (e.g. cardiomyopathies)
- A single causal gene may cause multiple diseases (i.e. mutations in the same gene can cause different diseases)
- Multiple causal variants can lie in the same gene
- Any single variant typically accounts for only a very small proportion of cases
Disease Mechanism
Gain vs. loss of function
| Mechanism | Loss-of-Function | Gain-of-Function |
|---|---|---|
| Protein Function | Absent or non-functional protein | Heightened activity of a protein, such as an overactive kinase |
| Genetic Variants | Variants cause RNA degradation or disrupt a critical protein domain | Variants lead to increased function or activity of the protein |
| Prevalence | Commonly observed in various genetic disorders | Rarer and often harder to detect than loss-of-function mutations |
Dominant negative
A variant in one allele can affect the protein product of the wild-type allele
Early vs. late onset
Age of onset may be influenced by the type of variant (e.g. Huntington’s disease).
Complex diseases
Heritability is the proportion of phenotypic variance due to genetic factors
The Additive Genetic Model for a Single Variant
This model assumes that the effect of a single genetic variant on a phenotype is additive, meaning that each additional copy of the variant has the same effect on the trait.
The equation for the additive genetic model for a single variant is:
$$y_i = \beta_j x_{ij} + \epsilon_i$$
Where:
- $y_i$ is the phenotypic value for individual $i$.
- $\beta_j$ is the effect size of variant $j$.
- $x_{ij}$ is the genotype dosage for individual $i$ at variant $j$ (0, 1, or 2).
- $\epsilon_i$ is the residual error for individual $i$.
Scenario: Researchers have identified a single nucleotide polymorphism (SNP) in the human genome that is associated with height. They wish to determine how this SNP affects height across a population.
Application: Each individual’s height ($y_i$) is modeled based on their genotype at this SNP ($x_{ij}$). The effect size ($\beta_j$) indicates how much each copy of the allele changes height. For instance, they may find that each copy of the minor allele increases height by 0.5 cm. The model accounts for the SNP’s effect while acknowledging that other factors (captured in $\epsilon_i$) also influence height.
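The single-variant model (phenotype = effect size × genotype dosage + residual) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the phenotype is treated as mean-centred height, the minor allele frequency (0.3), residual standard deviation (6 cm) and sample size are hypothetical, and the function names are illustrative. The per-allele effect is recovered as the least-squares slope of phenotype on dosage.

```python
import random


def simulate_additive(n, beta, maf, noise_sd, seed=0):
    """Simulate dosages x (0, 1 or 2) and phenotypes y = beta * x + noise."""
    rng = random.Random(seed)
    # Each individual draws two alleles independently with frequency maf.
    dosages = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    phenotypes = [beta * x + rng.gauss(0.0, noise_sd) for x in dosages]
    return dosages, phenotypes


def estimate_effect(dosages, phenotypes):
    """Least-squares slope of phenotype on dosage (the per-allele effect)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    var_x = sum((x - mean_x) ** 2 for x in dosages)
    return cov_xy / var_x


# Hypothetical values: each minor allele adds 0.5 cm; residual SD of 6 cm.
x, y = simulate_additive(n=20000, beta=0.5, maf=0.3, noise_sd=6.0)
beta_hat = estimate_effect(x, y)
print(f"estimated per-allele effect: {beta_hat:.2f} cm")
```

Because the residual variation is much larger than the SNP effect, the estimate is noisy in small samples, which is one reason association studies need large cohorts.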
The Additive Polygenic Model
This model extends the single variant model to include multiple genetic variants. It assumes that the phenotype is the result of additive effects from several genetic variants.
The equation for the additive polygenic model is:
$$y_i = \sum_{j=1}^{M} \beta_j x_{ij} + \epsilon_i$$
Where:
- $y_i$ is the phenotypic value for individual $i$.
- $\beta_j$ is the effect size of variant $j$.
- $x_{ij}$ is the genotype dosage for individual $i$ at variant $j$.
- $M$ is the total number of genetic variants considered.
- $\epsilon_i$ is the non-genetic residual error for individual $i$.
Scenario: Further research shows that height is influenced by many genetic variants across the genome, not just one SNP.
Application: A polygenic model sums the effects of many SNPs to predict individual height ($y_i$). Each SNP’s genotype dosage ($x_{ij}$) is weighted by its effect size ($\beta_j$), and the weighted dosages are summed over all $M$ SNPs. This model is typical in genome-wide association studies where researchers assess the contribution of many genetic factors to variations in height across a population.
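The genetic part of the polygenic model is a weighted sum of dosages, which is also how a polygenic score is computed for one individual. The sketch below assumes four variants with made-up effect sizes; the function name `polygenic_score` and all numbers are hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of effect size * dosage over all variants for one individual."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need one effect size per variant")
    return sum(b * x for b, x in zip(effect_sizes, dosages))


# Hypothetical per-allele effects (cm) for M = 4 variants.
betas = [0.5, -0.25, 0.125, 0.25]
# Dosages (0, 1 or 2 copies of the effect allele) for two individuals.
person_a = [2, 0, 1, 1]
person_b = [0, 2, 0, 1]

score_a = polygenic_score(person_a, betas)
score_b = polygenic_score(person_b, betas)
print(score_a, score_b)  # 1.375 -0.25
```

The score captures only the genetic sum; the residual term is not observed and is why two people with the same score can still differ in height.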
Defining the Additive Heritability of a Trait
Additive heritability quantifies how much of the variance in a trait can be attributed to genetic factors.
The model used to define the additive heritability of a trait is:
$$y_i = g_i + \epsilon_i$$
Where:
- $y_i$ is the phenotypic value for individual $i$.
- $g_i$ is the additive genetic component for individual $i$.
- $\epsilon_i$ is the environmental and non-genetic residual error for individual $i$.
Additive heritability, denoted as $h^2$, can be defined in two equivalent ways (assuming $g$ and $\epsilon$ are uncorrelated):
- $h^2 = \mathrm{cor}(g, y)^2$ is the square of the correlation coefficient between the genetic component $g$ and the trait $y$.
- $h^2 = \mathrm{Var}(g) / \mathrm{Var}(y)$ is the ratio of the variance of $g$ to the variance of $y$.
Scatterplots are used to visualize the relationship between $g$ and $y$, and the heritability is often computed as the proportion of the variance explained by the genetic factors.
Scenario: After acknowledging that height is affected by numerous genetic factors, researchers want to quantify how much of the variation in height within a population is due to genetic differences, as opposed to environmental factors like nutrition.
Application: Heritability estimation is used here. The total variance in height in the population ($\mathrm{Var}(y)$) is decomposed into a genetic component ($\mathrm{Var}(g)$), which is the variance of the summed effects of all genetic variants, and an environmental component ($\mathrm{Var}(\epsilon)$). By comparing the variation due to genetics with the total variation, researchers can estimate the heritability ($h^2$) of height. A high heritability suggests that genetic factors play a significant role in determining height, while a lower heritability indicates that environmental factors have a greater impact.
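The two definitions of additive heritability (variance ratio and squared correlation between the genetic component and the trait) can be compared on simulated data. This is a sketch under simplifying assumptions: the genetic component `g` is drawn directly from a normal distribution and is treated as observed, which is never the case in real data, and the target heritability of 0.8 is a hypothetical value.

```python
import random
import statistics


def simulate_trait(n, h2, seed=1):
    """Simulate y = g + e with Var(g) = h2 and Var(e) = 1 - h2."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, (1.0 - h2) ** 0.5) for _ in range(n)]
    y = [gi + ei for gi, ei in zip(g, e)]
    return g, y


def variance_ratio(g, y):
    """Heritability as Var(g) / Var(y)."""
    return statistics.pvariance(g) / statistics.pvariance(y)


def squared_correlation(g, y):
    """Heritability as the squared correlation between g and y."""
    mg, my = statistics.fmean(g), statistics.fmean(y)
    cov = sum((a - mg) * (b - my) for a, b in zip(g, y))
    var_g = sum((a - mg) ** 2 for a in g)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_g * var_y)


g, y = simulate_trait(n=20000, h2=0.8)
print(round(variance_ratio(g, y), 2), round(squared_correlation(g, y), 2))
```

In a finite sample the two estimates differ slightly, because the simulated genetic and environmental components are only approximately uncorrelated; they coincide exactly when that correlation is zero.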
From continuous to binary traits
Some traits are binary (e.g. affected vs. unaffected) rather than continuous (e.g. height). The same models can be used to study binary traits, but the interpretation of the results is different.
Liability Threshold Model
Everyone has a continuous “disease liability” ($l_i$), and you have the disease if this goes above a threshold $T$:
$$D_i = \begin{cases} 1 & \text{if } l_i > T \\ 0 & \text{otherwise} \end{cases}$$
where $D_i = 1$ means that individual $i$ is affected.
Logistic Risk Model
Individuals are assigned a continuous log odds ($\eta_i$). Your risk of disease depends on $\eta_i$:
$$P(D_i = 1) = \frac{1}{1 + e^{-\eta_i}}$$
This equation is the logistic function, which converts the log odds into a probability of disease.
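Both binary-trait models are short enough to write out directly. The sketch below makes one assumption that the notes do not state: liability is taken to follow a standard normal distribution, so the threshold can be derived from disease prevalence. The 1% prevalence and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def threshold_for_prevalence(prevalence):
    """Threshold T on a standard-normal liability giving this prevalence."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def is_affected(liability, threshold):
    """Liability threshold model: affected if liability exceeds threshold."""
    return liability > threshold


def logistic_risk(log_odds):
    """Logistic risk model: convert log odds to a disease probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


T = threshold_for_prevalence(0.01)   # hypothetical 1% prevalence
print(round(T, 2))                   # roughly 2.33
print(is_affected(2.5, T), is_affected(1.0, T))
print(logistic_risk(0.0))            # log odds of 0 means a 50% risk
```

The threshold model gives a deterministic outcome for a given liability, whereas the logistic model gives a probability, so the same genetic score is interpreted differently in each.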
Calculating Heritability from Twin Studies
Using monozygotic (MZ) twins, which are genetically identical, we can estimate the heritability of a trait. Assuming that the correlation between MZ twins ($r_{MZ}$) is purely due to genetics, the correlation can be described by the formula:
$$r_{MZ} = h^2$$
This makes it seem straightforward to estimate heritability.
However, we must consider that twins also share environmental factors. To account for this, we can include dizygotic (DZ) twins in our analysis, who share about half of their DNA and are assumed to share similar environmental conditions as MZ twins.
If we assume that the genetic and environmental components are independent, and write $c^2$ for the proportion of variance due to the shared environment, for MZ twins we can express the correlation as:
$$r_{MZ} = h^2 + c^2$$
Under the assumption that MZ and DZ twins experience the same level of environmental similarity (the “equal environment assumption”), we can then write the correlation for DZ twins as:
$$r_{DZ} = \tfrac{1}{2} h^2 + c^2$$
Subtracting the DZ equation from the MZ equation and multiplying by two gives the equation to calculate heritability:
$$h^2 = 2\,(r_{MZ} - r_{DZ})$$
Thus, with adequate data from MZ and DZ twins, we can estimate the heritability of a trait, provided these assumptions hold.
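The twin calculation amounts to solving two linear equations: MZ correlation = heritability + shared environment, and DZ correlation = half the heritability + shared environment. A minimal sketch follows; the function name `falconer` and the twin correlations are hypothetical, and the result is only as good as the equal environment assumption.

```python
def falconer(r_mz, r_dz):
    """Estimate h2 and shared-environment c2 from twin correlations.

    Assumes r_mz = h2 + c2 and r_dz = h2 / 2 + c2 (equal environments).
    """
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    return h2, c2


# Hypothetical twin correlations, chosen only for illustration.
h2, c2 = falconer(r_mz=0.75, r_dz=0.5)
print(h2, c2)  # 0.5 0.25
```

If the DZ correlation is less than half the MZ correlation, the shared-environment estimate becomes negative, which is a sign that the simple model does not fit the data.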
Common modifiers of rare diseases
Increasing or decreasing the expression of a variant may affect disease penetrance, severity or expression
Natural Selection and Genetic Variants
The segregation of genetic variants within a population, including their frequency, is influenced by natural selection. A simplified representation of this concept is:
$$s_v \propto K \cdot s_d \cdot \beta$$
- $s_v$: Selective coefficient of the variant, indicating the strength of selection against that variant.
- $K$: Prevalence of the disease within the population.
- $s_d$: Selective coefficient of the disease, representing the reduction in the number of offspring produced by individuals with the disease compared to those without.
- $\beta$: Effect size of the variant on disease risk.
Key Insights:
- Natural selection tends to decrease the frequency of variants with large effect sizes on diseases, so these variants tend to have lower allele frequencies.
- Variants with large effects are often found in diseases that are either less common (e.g., certain autoimmune diseases with a low $K$) or those that manifest later in life (e.g., Alzheimer’s disease, macular degeneration with a low $s_d$), where the selection pressure against them is relatively weaker.
Variant and Effect
- Major loci (common variants of large effect)
  - Large effect size, high frequency
  - Very important
  - Not all traits have major loci (most don’t)
  - Example: APOE gene and Alzheimer’s disease
- Common variants of small effect
  - Small effect size
  - Often non-coding
- Rare variants of large effect
  - Low frequency, large effect size
  - Hard to find, need whole exome or genome sequencing
  - Do not contribute much to heritability
  - Example: Type 2 diabetes
- Polygenic tail (many genes each contributing a small amount)
  - Small effect size
  - Almost all traits have a polygenic tail
  - Example: Height, 12,111 variants discovered
Common modifiers of rare diseases
Regulatory variants can increase or decrease the expression of a variant, affecting the disease penetrance, severity or expression
Individual polygenic background can also affect the risk and expression of a variant
Common and rare diseases can have similar symptoms
Sometimes mutations in the same gene can cause either rare or common diseases
There isn’t a clear boundary between monogenic and polygenic diseases
Allele Frequency and Penetrance
Allele Frequency
The proportion of all copies of a locus in a population that carry a given allele
Allele: Major allele, minor allele, ancestral allele, variant allele
Population: allele frequency may differ between populations
Determine allele frequency
For example:
A||A A||A A||A A||G A||G A||A A||G G||G G||G
Allele frequency for A: 11/18 = 0.61
Allele frequency for G: 7/18 = 0.39
Genotype frequency for AA: 4/9 = 0.44
Genotype frequency for AG: 3/9 = 0.33
Genotype frequency for GG: 2/9 = 0.22
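The counts in this example can be reproduced in a few lines. The sketch below encodes the nine genotypes from the example as two-letter strings (an encoding chosen here for convenience); the function names are illustrative.

```python
from collections import Counter

# The nine diploid genotypes from the example above.
genotypes = ["AA", "AA", "AA", "AG", "AG", "AA", "AG", "GG", "GG"]


def allele_frequencies(genotypes):
    """Count every allele copy (two per diploid individual)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequencies(genotypes):
    """Count individuals per genotype."""
    counts = Counter(genotypes)
    return {g: n / len(genotypes) for g, n in counts.items()}


print(allele_frequencies(genotypes))
print(genotype_frequencies(genotypes))
```

Allele frequencies are divided by the number of allele copies (18), whereas genotype frequencies are divided by the number of individuals (9).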
gnomAD: can be used to look up allele frequencies
A variant may look very rare overall but not be rare in a particular population
We can only determine the allele frequency in the populations we have data for, and these may not be representative of the population we are interested in
Penetrance
The proportion of individuals with a variant who develop the disease
Can be measured in different ways:
- Often reported as a percentage
- Often reported by age
- Can be measured for a particular variant or for a gene
Penetrance vs. Expressivity
Penetrance: The proportion of individuals with a variant who develop the disease
Expressivity: The severity of the disease in individuals who develop it
Why is penetrance important?
- To inform genetic counselling
- Incidental findings
How to determine penetrance?
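- Ideally:
  - Unbiased cohort study with particular genotype
  - Long follow-up
  - Determine the proportion of individuals who develop the disease
- Family studies
- Population studies
- Bayes’ theorem:
  $$P(D \mid G) = \frac{P(G \mid D)\, P(D)}{P(G)}$$
  where $P(D \mid G)$ is the penetrance, $P(D)$ is the disease prevalence, $P(G \mid D)$ is the allele frequency in cases, and $P(G)$ is the allele frequency in the population.
  - Disease prevalence: hard to estimate for rare diseases
  - Case allele frequency: hard to estimate for rare alleles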
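The Bayes’ theorem approach estimates penetrance as (case allele frequency × disease prevalence) / population allele frequency. The following is a minimal sketch with hypothetical inputs; the function name `penetrance` is illustrative, and capping the result at 1 is a choice made here because noisy frequency estimates can otherwise yield values above 100%.

```python
def penetrance(prevalence, case_allele_freq, population_allele_freq):
    """P(disease | variant) = P(variant | disease) * P(disease) / P(variant)."""
    if not 0 < population_allele_freq <= 1:
        raise ValueError("population allele frequency must be in (0, 1]")
    value = case_allele_freq * prevalence / population_allele_freq
    return min(value, 1.0)  # noisy inputs can push the estimate above 1


# Hypothetical inputs, chosen only for illustration.
estimate = penetrance(
    prevalence=0.002,               # 1 in 500 people have the disease
    case_allele_freq=0.01,          # variant seen in 1% of cases
    population_allele_freq=0.0001,  # variant seen in 0.01% of the population
)
print(f"{estimate:.2f}")  # 0.20
```

Because the estimate is a ratio of small numbers, modest errors in the prevalence or in either allele frequency translate into large errors in the estimated penetrance.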