Historical Approaches in Human Disease Genetics

Some history

  • Before 1990: Studying genetic diseases was expensive, time-consuming, and labor-intensive. Smaller labs could only do segregation analysis, collecting non-genetic data from families to infer genetic models.

  • 1990s:

    • Linkage Analysis: Genotyping small numbers of variants spread thinly across the genome in disease families to find regions of the genome linked to a disease.
    • Candidate Gene Studies: Testing pre-selected variants in a gene of interest for association with disease.
  • 2005 onwards: Whole-genome genotyping and sequencing became more affordable and accessible, leading to genome-wide association studies (GWAS).

Genotyping vs. Sequencing: Genotyping involves testing for specific variants, while sequencing involves reading the entire genome.

Segregation Analysis

Genetics with no genetic data

Involves studying family trees (pedigrees) to determine inheritance patterns of diseases without genetic data.

Example: Cystic fibrosis was identified as a recessive trait by Dorothy Hansine Andersen using segregation analysis.

Multiple models explaining the inheritance mode of a disease

Complex Segregation Analysis (CSA)

Build a statistical model of genetic risk, and average over all possible (unknown) genotypes present in the family.

Need to have lots of families to get a good estimate of the genetic model.

The CSA likelihood measures how likely the observed pattern of disease across all families is under a given genetic model.

Use the log-likelihood because the raw likelihoods are usually very small numbers.

The best model is the one with the least negative log-likelihood (the log-likelihood is always negative).
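
As a toy illustration of comparing models by log-likelihood, the sketch below scores two simple Mendelian models against hypothetical family counts. It is not a full CSA: it assumes the parental mating type is known, ignores how families were ascertained, and does not average over unknown genotypes. The family data, model names, and function names are made up for illustration.

```python
import math

def family_log_likelihood(n_children, n_affected, p_affected):
    """Log of the binomial probability of seeing n_affected among n_children."""
    pmf = (math.comb(n_children, n_affected)
           * p_affected ** n_affected
           * (1 - p_affected) ** (n_children - n_affected))
    return math.log(pmf)

def total_log_likelihood(families, p_affected):
    """Families are independent, so their log-likelihoods add."""
    return sum(family_log_likelihood(n, k, p_affected) for n, k in families)

# Hypothetical data: (number of children, number affected) per family.
families = [(4, 1), (3, 1), (5, 1), (4, 0), (6, 2)]

# Assumed per-child risk under each toy model.
models = {
    "recessive (carrier x carrier)": 0.25,
    "dominant (affected x unaffected)": 0.50,
}

scores = {name: total_log_likelihood(families, p) for name, p in models.items()}
best_model = max(scores, key=scores.get)  # least negative log-likelihood

for name, score in scores.items():
    print(f"{name}: log-likelihood = {score:.2f}")
print("Best model:", best_model)
```

Summing logs instead of multiplying raw probabilities avoids numerical underflow once many families are included.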

Unified Model

A segregation analysis model, implemented in the package SAGE.

  • It uses:
    • A major locus, an allele frequency, and penetrance
    • A polygenic component
    • An environmental component

All parameters are estimated from the data.
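
To make the major-locus parameters concrete, the sketch below computes the population prevalence implied by an allele frequency and three genotype penetrances. It assumes Hardy-Weinberg genotype frequencies and leaves out the polygenic and environmental components. The numbers and function names are hypothetical, and this is not how SAGE itself is invoked.

```python
def genotype_frequencies(risk_allele_freq):
    """Hardy-Weinberg frequencies for 0, 1, and 2 copies of the risk allele."""
    q = risk_allele_freq
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)

def prevalence(risk_allele_freq, penetrances):
    """Population prevalence: genotype frequency times penetrance, summed."""
    freqs = genotype_frequencies(risk_allele_freq)
    return sum(f * pen for f, pen in zip(freqs, penetrances))

# Hypothetical dominant-like locus: penetrance for 0, 1, 2 risk alleles.
risk_allele_freq = 0.01
penetrances = (0.01, 0.60, 0.60)

k = prevalence(risk_allele_freq, penetrances)
freqs = genotype_frequencies(risk_allele_freq)
# Share of all cases that carry at least one risk allele.
carrier_case_fraction = (freqs[1] * penetrances[1] + freqs[2] * penetrances[2]) / k

print(f"prevalence = {k:.4f}")
print(f"fraction of cases carrying the risk allele = {carrier_case_fraction:.2f}")
```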

Good example: Parameterizing BRCA mutations in familial breast cancer.
Bad example: Attending medical school fits a recessive model, because segregation analysis cannot tell shared family environment apart from genes.

Genetic Linkage in Families

Linkage maps

Made with Linkage Markers - highly polymorphic markers spaced throughout the genome.

Earliest days: fewer than 200 variants
Now: about 3000 variants

Parametric Linkage and LOD score

Parametric linkage uses the same likelihood-based modeling as CSA:

  • A specified genetic model
  • Test every site in the genome for how consistent the marker data is with the genetic model

The LOD score is the log (base 10) of the likelihood ratio: the likelihood of the data under the alternative hypothesis (linkage) divided by the likelihood under the null hypothesis (no linkage).

Larger LOD score means more evidence for linkage.
A LOD score of 3 is considered significant evidence for linkage (above 4 is safer).
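
A minimal sketch of a two-point LOD calculation follows. It assumes the simplest textbook setting, which these notes do not spell out: phase-known meioses where each one can be scored as recombinant or non-recombinant, with the recombination fraction `theta` as the only parameter (`theta = 0.5` means no linkage). The counts and function names are illustrative.

```python
import math

def lod_score(recombinants, meioses, theta):
    """log10 of L(data | linkage at theta) / L(data | no linkage, theta = 0.5)."""
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    return math.log10(likelihood_linked / likelihood_unlinked)

def max_lod(recombinants, meioses):
    """Scan theta from 0 to 0.5 in steps of 0.01 and return (best theta, LOD)."""
    thetas = [i / 100 for i in range(0, 51)]
    if recombinants > 0:
        # The likelihood is zero at theta = 0 once any recombinant is seen.
        thetas = thetas[1:]
    best_theta = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)

# Hypothetical family data: 10 informative meioses.
for recombinants in (0, 1, 3):
    theta, lod = max_lod(recombinants, 10)
    print(f"{recombinants} recombinants: best theta = {theta:.2f}, LOD = {lod:.2f}")
```

Because LOD scores are logs of likelihood ratios, scores from independent families can be added at the same `theta`.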

Non-parametric linkage

Wrong parametric model can lead to poor results.

Non-parametric linkage tests whether affected family members are more closely related to each other at the site of the disease gene than expected by chance.

Identical by Descent (IBD)

  • IBD 0: Siblings share no alleles by descent at a locus.
  • IBD 1: Siblings share one allele by descent.
  • IBD 2: Siblings share both alleles by descent.

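In the absence of disease risk, siblings have a 25/50/25 ratio of IBD 0/1/2.

Affected sibling pairs share more than this. For a rare, fully penetrant disease, the IBD 0/1/2 ratios are:

  • Dominant disease: 0/50/50
  • Recessive disease: 0/0/100

Almost all genetic models generate this excess sharing.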

This method has lower power, but it is more robust because it does not require a correct genetic model.
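
The 25/50/25 null expectation for sibling pairs can be checked with a small simulation. The sketch below assumes a single locus, two parents with distinguishable alleles, and independent transmission to each sibling. The function names and seed are arbitrary.

```python
import random
from collections import Counter

def simulate_sib_pair_ibd(rng):
    """Return how many alleles (0, 1, or 2) a sibling pair shares IBD at one locus."""
    shared = 0
    for _parent in ("mother", "father"):
        # Each sibling independently receives one of this parent's two alleles.
        allele_sib1 = rng.randrange(2)
        allele_sib2 = rng.randrange(2)
        if allele_sib1 == allele_sib2:
            shared += 1
    return shared

def ibd_proportions(n_pairs, seed=0):
    """Simulate n_pairs sibling pairs and return the proportions of IBD 0, 1, 2."""
    rng = random.Random(seed)
    counts = Counter(simulate_sib_pair_ibd(rng) for _ in range(n_pairs))
    return [counts[k] / n_pairs for k in (0, 1, 2)]

proportions = ibd_proportions(20000)
# Mean proportion of alleles shared IBD; 0.5 is expected with no linkage.
mean_sharing = 0.5 * proportions[1] + proportions[2]

print("IBD 0/1/2 proportions:", [round(p, 3) for p in proportions])
print("mean sharing:", round(mean_sharing, 3))
```

A non-parametric test then asks whether affected pairs show mean sharing above 0.5 at a marker.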

Example: Continuing the BRCA example, linkage analysis was used to map the gene to chromosome 17.

Rise and Fall of Linkage Studies

  • Successes: Cystic Fibrosis, Huntington’s Disease, and BRCA1

  • Some successes in complex diseases: inflammatory bowel disease

  • Need large effect sizes

  • Problems with heterogeneity

Candidate Gene Studies

Sequencing and genotyping became more affordable.

  • Pick a gene that is thought to be associated with a disease
  • Collect a few hundred cases and controls
  • Genotype variants in the gene
  • Test for association
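
The steps above end in an association test. One common choice, assumed here rather than taken from these notes, is an allelic test: count risk and non-risk alleles in cases and controls, then compute an odds ratio and a Pearson chi-square statistic (1 degree of freedom, no continuity correction). The allele counts are hypothetical.

```python
import math

def allelic_association(case_risk, case_other, control_risk, control_other):
    """Odds ratio, chi-square statistic, and p-value for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    odds_ratio = (a * d) / (b * c)
    total = a + b + c + d
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical study: 300 cases and 300 controls, so 600 alleles per group.
odds_ratio, chi2, p_value = allelic_association(
    case_risk=180, case_other=420, control_risk=120, control_other=480
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.1f}, p = {p_value:.1e}")
```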

Advantages:

  • Not too expensive
  • Results for a gene
  • Power is high for odds ratios > 1.5

Good Example: The Pro12Ala variant in PPARγ (PPARG) was found to be associated with type 2 diabetes.

Bad Example: Multiple candidate genes for depression failed to replicate in larger studies.

Problems:

  • Power: Probability of getting a positive result given that the hypothesis is true. It depends on:
    • Effect size
    • Sample size
    • Allele frequency
  • Type 1 Error: Probability of getting a positive result given that the hypothesis is false.
  • Prior: Probability of the hypothesis being true.
    • In candidate gene studies, it is the proportion of tested genes that actually increase risk.
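
How effect size, sample size, and allele frequency drive power can be sketched with a normal-approximation power calculation for an allelic case-control test. This is an assumed, simplified model and not a method taken from these notes: it treats alleles as independent draws, uses a two-sided z-test of two proportions, and ignores the small chance of rejecting in the wrong direction. All parameter values are hypothetical.

```python
from statistics import NormalDist

def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)

def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test at significance level alpha."""
    norm = NormalDist()
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    m1, m0 = 2 * n_cases, 2 * n_controls  # two alleles per person
    pooled = (m1 * p1 + m0 * p0) / (m1 + m0)
    se_null = (pooled * (1 - pooled) * (1 / m1 + 1 / m0)) ** 0.5
    se_alt = (p1 * (1 - p1) / m1 + p0 * (1 - p0) / m0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)

baseline = allelic_test_power(0.20, 1.5, 300, 300)
smaller_effect = allelic_test_power(0.20, 1.2, 300, 300)
rarer_allele = allelic_test_power(0.02, 1.5, 300, 300)
larger_sample = allelic_test_power(0.20, 1.5, 1000, 1000)

for label, value in [("baseline", baseline), ("smaller effect", smaller_effect),
                     ("rarer allele", rarer_allele), ("larger sample", larger_sample)]:
    print(f"{label}: power = {value:.2f}")
```
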

| Pick gene | Hypothesis | Result |
|---|---|---|
| Pick a gene and variant | Variant increases risk (Prior) | p < 0.05: True Positive (Power) |
| | | p > 0.05: False Negative (1 - Power) |
| | Variant does not increase risk (1 - Prior) | p < 0.05: False Positive |
| | | p > 0.05: True Negative |

$$
\Pr(\text{True Positive} \mid \text{Positive}) = \frac{\text{prior} \times \text{power}}{\text{prior} \times \text{power} + (1 - \text{prior}) \times \alpha}
$$

We know:

  • The significance threshold α
  • The sample size
  • Allele frequency

We don’t know:

  • Prior
  • Effect size

With a large sample size (high power) and a stringent p-value threshold, a positive result is likely to be a true positive even when the prior is small.
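
The formula for Pr(True Positive | Positive) is easy to evaluate directly. The sketch below plugs in hypothetical values for the prior and power, which are unknown in practice, and compares a candidate-gene threshold of α = 0.05 with a much stricter one. The stricter value of 5e-8 is an assumption chosen for illustration, as is keeping power fixed at 0.8 in both cases.

```python
def prob_true_positive(prior, power, alpha):
    """Probability that a positive (significant) result reflects a true effect."""
    true_positive = prior * power
    false_positive = (1 - prior) * alpha
    return true_positive / (true_positive + false_positive)

# Hypothetical: 1% of tested variants truly increase risk, power is 80%.
lenient = prob_true_positive(prior=0.01, power=0.8, alpha=0.05)
strict = prob_true_positive(prior=0.01, power=0.8, alpha=5e-8)

print(f"alpha = 0.05: Pr(true | positive) = {lenient:.3f}")
print(f"alpha = 5e-8: Pr(true | positive) = {strict:.6f}")
```

With these made-up inputs, only about 14% of positives at α = 0.05 are true, which is one way to read the replication failures of many candidate gene studies.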