Sequencing Technologies
Sanger Sequencing
Image credit: Estevezj, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons
Sanger Sequencing, also known as the chain termination method, was developed by Frederick Sanger in 1977. It is a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication.
- Principle: Incorporation of dideoxynucleotides (ddNTPs), which lack the 3’-hydroxyl group, resulting in chain termination once incorporated.
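- Process: The DNA sample is divided into four separate sequencing reactions, each containing one of the four ddNTPs. DNA fragments of varying lengths are produced, each ending with the incorporated ddNTP. Separating the fragments by size (gel or capillary electrophoresis) then shows which base terminates each length, and reading the lengths in order gives the sequence.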
- Read Length: Produces read lengths of about 500 to 900 bases.
- Accuracy: Sanger sequencing is known for its high accuracy, but it has a low throughput compared to newer methods.
- Frederick Sanger developed chain termination method for DNA sequencing
- Uses modified nucleotides to terminate sequencing reaction
- Radioactive labeling used originally to detect sequence
- Automated later with fluorescent labels
- Enabled the sequencing carried out in the first Human Genome Project
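The chain termination principle above can be illustrated with a small simulation. This is a minimal sketch, not a model of real chemistry: the function names, the `ddntp_fraction` parameter, and the example sequence are all assumptions made for illustration, and the four separate ddNTP reactions are collapsed into a single loop.

```python
import random

def sanger_fragments(new_strand, n_molecules=2000, ddntp_fraction=0.05, seed=0):
    """Simulate chain termination on many copies of one template.

    new_strand is the sequence of the strand being synthesized (5'->3').
    Each molecule is extended base by base; at every step there is a small
    chance (ddntp_fraction, an illustrative value) that a ddNTP is
    incorporated instead of a dNTP, which stops extension.
    Returns a set of (fragment_length, terminal_base) pairs.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < ddntp_fraction:
                fragments.add((length, base))  # chain terminated here
                break
    return fragments

def read_sequence(fragments):
    """Mimic electrophoresis: sort fragments by length, read terminal bases."""
    return "".join(base for _, base in sorted(fragments))

if __name__ == "__main__":
    strand = "ATGCGTACGTTAGCCTAGGA"  # made-up example sequence
    frags = sanger_fragments(strand)
    print(read_sequence(frags))
```

With enough molecules, every position ends up represented by at least one terminated fragment, so sorting by length recovers the sequence. Longer positions are reached by fewer molecules, which is one intuition for why Sanger reads top out at several hundred bases.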
Why High Throughput Sequencing
- The Human Genome Project took 13 years, about $3 billion, and many labs
- This catalyzed the development of high-throughput sequencing technologies
- These greatly increased speed and reduced cost
- Illumina became the dominant platform
Illumina Sequencing
Illumina Sequencing, the most widely used form of next-generation sequencing (NGS), is a massively parallel sequencing technology that offers high-throughput sequencing of DNA and RNA samples.
- Principle: Uses sequencing by synthesis (SBS), where fluorescently-labeled nucleotides are added to the DNA strand, and their incorporation is detected by their fluorescence.
- Process: DNA is fragmented and adapters are ligated to fragments. The fragments are then attached to a solid surface and amplified to form clusters. Sequencing is performed in a flow cell, and millions of fragments are sequenced in parallel.
- Read Length: Typically generates short reads ranging from 50 to 300 bases.
- Throughput: Provides a high-throughput option, capable of sequencing millions of fragments simultaneously, making it suitable for large-scale genomic projects.
- Attachment of DNA fragments to flow cell, bridge amplification creates clusters
- Sequencing by synthesis with reversible terminators
- Paired-end sequencing reads each fragment from both ends, one end at a time
- Limitations: Alignment of structural variants, de novo assembly of complex genomes, and sequencing of repetitive regions
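Paired-end sequencing, mentioned in the list above, can be sketched as taking one read from each end of a fragment, with the second read coming from the opposite strand. The function names and the example fragment are hypothetical, and the sketch ignores adapters, clusters, and sequencing errors.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Return (read1, read2) for one fragment.

    read1 is read from the 5' end of the forward strand.
    read2 is read from the 5' end of the reverse strand, so it is the
    reverse complement of the fragment's last read_length bases.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2

if __name__ == "__main__":
    fragment = "ACGTTGCAAG"  # made-up 10-base fragment
    print(paired_end_reads(fragment, read_length=4))  # ('ACGT', 'CTTG')
```

Because both reads come from the same fragment, an aligner can use the expected distance and orientation between them as extra evidence, which partly offsets the short read length.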
Long Read Sequencing
Long Read Sequencing, often associated with technologies like PacBio Sequencing and Oxford Nanopore, provides the ability to sequence much longer DNA fragments.
- Principle: Directly sequences single molecules of DNA, enabling the detection of long stretches of nucleotides in a single read without the need for assembly of short reads.
- Process: PacBio uses single-molecule real-time (SMRT) sequencing, where a DNA polymerase incorporates labeled nucleotides as it copies a DNA template strand; Circular Consensus Sequencing is used to obtain HiFi reads. Nanopore sequencing uses protein nanopores through which individual DNA molecules are threaded, and changes in ionic current are measured as nucleotides pass through.
- Read Length: Capable of generating read lengths in the kilobase to megabase range.
- Application: Particularly useful for genome assembly, identification of structural variants, and sequencing of areas with high GC content or repeats.
- Can detect DNA modifications (e.g., methylation) directly from the native DNA molecule (with nanopore sequencing, there is no need to synthesize a complementary strand).
Example: Highly polymorphic MHC gene HLA-A
- Short read sequencing has limitations for complex regions
- Long reads span repetitive regions and structural variations better
- Currently more expensive than short read sequencing
- PacBio and Oxford Nanopore are main long read sequencing technologies
- Limitations: Cost, Throughput, Accuracy (Nanopore’s newer technology is trying to read both strands of DNA to improve accuracy), DNA amount, DNA quality
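The point that long reads span repetitive regions better than short reads can be shown with a toy exact-match search. This is a sketch under simplifying assumptions: the reference, the repeat, and both reads are invented, and real aligners tolerate mismatches and use indexes rather than a linear scan.

```python
def find_all(reference, read):
    """Return every start position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

# Toy reference with the same 15-base repeat at two loci.
repeat = "CAG" * 5
reference = "TTGACC" + repeat + "GATTACA" + repeat + "CCGTAA"

short_read = "CAGCAGCAG"            # lies entirely inside the repeat
long_read = "ACC" + repeat + "GAT"  # spans the repeat plus unique flanks

if __name__ == "__main__":
    print(find_all(reference, short_read))  # [6, 9, 12, 28, 31, 34]
    print(find_all(reference, long_read))   # [3]
```

The short read has six equally good placements across the two repeat copies, while the long read carries unique flanking sequence on both sides and maps to exactly one position.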
Other Sequencing Technologies
- Multiplex Sequencing: Simultaneous sequencing of multiple samples in a single run. (e.g., Illumina’s multiplexing, PacBio)
- Virtual Long Reads: Massive barcoding of short reads to create long reads (e.g., MGI’s stLFR)
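Multiplexing relies on tagging each sample with its own barcode and sorting the reads afterwards (demultiplexing). The sketch below assumes, for simplicity, that the barcode is the first few bases of each read; on real instruments the index is often read separately. The function name, barcodes, and sample names are all hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact-match in-line barcode.

    reads: iterable of read strings that start with a barcode.
    barcode_to_sample: dict mapping barcode sequence to sample name.
    Reads with an unknown barcode go to the "undetermined" bin.
    Returns a dict of sample name -> list of reads with barcode removed.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

if __name__ == "__main__":
    barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # made-up barcodes
    reads = ["ACGTTTTT", "TGCAGGGG", "ACGTCCCC", "NNNNAAAA"]
    for sample, inserts in demultiplex(reads, barcodes, 4).items():
        print(sample, inserts)
```

Exact matching is the simplest policy; production tools usually allow a mismatch or two in the barcode, which is a design choice this sketch leaves out.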
Comparison Table
| Feature | Sanger Sequencing | Illumina Sequencing | Long Read Sequencing |
|---|---|---|---|
| Method | Chain termination | Sequencing by synthesis | Single-molecule |
| Read Length | 500-900 bases | 50-300 bases | Kilobase to megabase |
| Throughput | Low | High | Moderate to High |
| Accuracy | High | High | Lower than Sanger/Illumina |
| Suitability | Small-scale projects | Large-scale genomics | Complex genome assembly |
| Complexity | Low | Moderate | High |
| Cost | Low per run, high per base | Low per base | Higher per read/base |
| Infrastructure | Basic lab equipment | Specialized equipment | Specialized equipment |
Sample Prep for Sequencing
- Key steps: sample preparation, adding adapters, sequencing, data analysis
- Adapters allow sequencing primers to bind
- PCR replicates DNA fragments to increase signal
RNA Sequencing
- Sequencing RNA provides information on gene expression
- Reverse transcription converts RNA to cDNA for sequencing
- Strategies for dealing with RNA degradation
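The two ideas above, reverse transcription and expression measurement, can be sketched in a few lines. This is an illustrative simplification: `reverse_transcribe` only models base pairing (first-strand cDNA as the DNA reverse complement of the RNA), and `count_reads_per_gene` assumes each read has already been assigned to a gene by some upstream alignment step. The gene names are hypothetical.

```python
from collections import Counter

# RNA base -> complementary DNA base.
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")

def reverse_transcribe(rna):
    """Return first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    return rna.translate(RNA_TO_CDNA)[::-1]

def count_reads_per_gene(gene_assignments):
    """Count reads per gene as a crude proxy for expression level."""
    return Counter(gene_assignments)

if __name__ == "__main__":
    print(reverse_transcribe("AUGGCC"))  # GGCCAT
    assignments = ["geneA", "geneB", "geneA"]  # made-up read assignments
    print(count_reads_per_gene(assignments))
```

Raw counts like these are normally normalized for gene length and sequencing depth before samples are compared; that step is outside this sketch.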
Additional Notes
- Read Length: the length of a DNA sequence that is read by a sequencing machine
- Read Depth: the number of times a DNA sequence is read by a sequencing machine (PCR duplicates, which are reads that are identical to other reads, are removed before counting)
- Coverage: the average number of reads that align to, or “cover”, a given nucleotide in the reference genome during the sequencing process
- Deletion: When there is a deletion in the DNA sequence, there will be a gap in the alignment
- Adapter Contamination: Adapter contamination occurs when the adapter sequence is accidentally sequenced along with the DNA fragment. This typically happens when the DNA fragment is shorter than the read length, so the read runs into the adapter, or when adapter dimers are not removed during the library preparation process. Adapter contamination can be detected by looking for the adapter sequence in the reads.
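The definitions of read depth, coverage, and adapter contamination above can be made concrete with a short sketch. Everything here is an assumption for illustration: alignments are given as half-open `(start, end)` intervals, duplicates are treated as reads with identical coordinates, and the adapter string is the commonly cited start of Illumina TruSeq adapters, used only as an example value.

```python
def per_base_depth(reference_length, alignments):
    """Return depth at each reference position.

    alignments: iterable of (start, end) half-open intervals.
    Reads with identical coordinates are counted once, as a crude
    stand-in for PCR duplicate removal.
    """
    depth = [0] * reference_length
    for start, end in set(alignments):
        for pos in range(start, end):
            depth[pos] += 1
    return depth

def mean_coverage(depth):
    """Average depth over all reference positions."""
    return sum(depth) / len(depth)

def trim_adapter(read, adapter, min_overlap=3):
    """Remove a full adapter anywhere in the read, or a partial one at the 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

if __name__ == "__main__":
    depth = per_base_depth(10, [(0, 5), (0, 5), (3, 8), (6, 10)])
    print(depth, mean_coverage(depth))
    adapter = "AGATCGGAAGAGC"  # example adapter prefix (assumption)
    print(trim_adapter("ACGTACGTAGATCGGAAGAGC", adapter))  # ACGTACGT
    print(trim_adapter("ACGTACGTAGATC", adapter))          # ACGTACGT
```

The `min_overlap` threshold is a trade-off: a small value trims more true adapter remnants but also removes genuine bases that happen to match the adapter start by chance.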