Long Read Sequencing Fills in the Missing Pieces of Genomics
Summary
The development of DNA sequencing technology has made it possible for humans to explore their own biology and that of other organisms, while the advent of the genomics era has placed higher demands on sequencing technologies. Scientific research has now entered the high-throughput era, moving from single, localized genes or fragments to the study of entire genomes, with applications in basic science, disease diagnosis and treatment, agriculture and the environment.
Author Name: Kiko Garcia
Currently, most next-generation sequencing (NGS) technologies require an amplification step first, which can introduce base mismatches and amplification bias that affect accuracy. In addition, NGS reads are short, with an average length of 100-150 bp, which hampers later genome assembly and downstream data analysis. With ultra-long reads (between 10,000 and 100,000 base pairs), long read sequencing (i.e., PacBio SMRT and Nanopore sequencing) can produce reads in real time and shorten sequencing time. It also reduces the difficulty of genome assembly, gene prediction and annotation, which helps in sequencing large and complex whole genomes, such as those of animals, plants and humans.
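One way to see why read length matters for assembly: a read can only place a repeat unambiguously if it covers the whole repeat plus some unique sequence on each side. The sketch below is a minimal illustration under that assumption; the function name, the coordinates and the `min_flank` value are hypothetical and not taken from any particular assembler.

```python
def read_spans_repeat(read_start, read_length, repeat_start, repeat_length,
                      min_flank=50):
    """Return True if the read covers the whole repeat plus at least
    min_flank bases of flanking sequence on each side.
    Coordinates are 0-based; ends are exclusive."""
    read_end = read_start + read_length
    repeat_end = repeat_start + repeat_length
    return (read_start <= repeat_start - min_flank
            and read_end >= repeat_end + min_flank)


# Hypothetical 5,000 bp repeat starting at position 100,000.
short_read_ok = read_spans_repeat(99_950, 150, 100_000, 5_000)
long_read_ok = read_spans_repeat(98_000, 20_000, 100_000, 5_000)
print(short_read_ok, long_read_ok)  # prints: False True
```

A 150 bp read can never satisfy the condition for a 5,000 bp repeat, whatever its start position, while a 20,000 bp read can.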
PacBio SMRT Sequencing
This technology was developed by Pacific Biosciences and applies a sequencing-by-synthesis strategy. A single DNA polymerase is immobilized within a zero-mode waveguide (ZMW), which provides the smallest available volume for light detection, and captures the SMRTbell template (a double-stranded DNA template capped with hairpin adaptors at both ends) for replication. Four fluorescently labeled nucleotides are paired with the template, and bases are recognized from the light pulses generated on excitation. The recorded continuous light pulse signal, which can be read as a continuous base sequence, is called a continuous long read (CLR). By identifying and removing the hairpin adaptor sequences, a CLR can be divided into multiple subreads, and the consensus sequence derived from the subreads within the same ZMW is called the circular consensus sequence (CCS).
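The relationship between a raw read, its subreads and a consensus can be sketched in a few lines. This is a toy illustration, not the algorithm used by PacBio software: it assumes an exact, made-up adaptor sequence, error-free adaptor matches, and equal-length subreads with substitution errors only, whereas real consensus calling aligns subreads and handles insertions and deletions. All names and sequences below are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_subreads(raw_read, adaptor):
    """Cut the raw read at every adaptor occurrence and drop empty pieces."""
    return [piece for piece in raw_read.split(adaptor) if piece]


def naive_consensus(subreads):
    """Column-wise majority vote over the subreads.
    Every second subread is reverse-complemented first, because successive
    passes around the circular template read opposite strands."""
    oriented = [s if i % 2 == 0 else reverse_complement(s)
                for i, s in enumerate(subreads)]
    length = len(oriented[0])
    if any(len(s) != length for s in oriented):
        raise ValueError("toy consensus needs equal-length subreads")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*oriented))


adaptor = "TCTCTCTC"  # made-up placeholder, not a real adaptor sequence
# Three passes over the same insert; the third carries one substitution error.
raw_read = adaptor.join(["ACGGTTAC", "GTAACCGT", "ACGCTTAC"])
subreads = split_into_subreads(raw_read, adaptor)
print(subreads)                   # ['ACGGTTAC', 'GTAACCGT', 'ACGCTTAC']
print(naive_consensus(subreads))  # ACGGTTAC
```

The single error in the third pass is outvoted by the other two, which is the intuition behind why more passes over the same molecule give a more accurate consensus.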
Oxford Nanopore Sequencing
Oxford Nanopore sequencing is based on electrical rather than optical signals and produces reads in real time as DNA molecules pass through specialized nanopores. As a DNA molecule passes through a nanopore, it briefly changes the intensity of the current flowing through the pore, and the bases are identified by detecting these changes in current.
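The idea of reading bases from changes in current can be illustrated with a simple step detector that groups a current trace into segments of roughly constant level. This is a deliberately simplified sketch: the current values and the threshold are invented, and production basecallers work quite differently, typically applying neural networks to the raw signal, in which each current level reflects several bases in the pore at once.

```python
def segment_events(current_pa, threshold=5.0):
    """Group consecutive current samples into events.
    A new event starts when a sample differs from the running mean of the
    current event by more than threshold. Returns the mean of each event."""
    events = []
    current_event = []
    for sample in current_pa:
        if current_event:
            mean = sum(current_event) / len(current_event)
            if abs(sample - mean) > threshold:
                events.append(mean)
                current_event = []
        current_event.append(sample)
    if current_event:
        events.append(sum(current_event) / len(current_event))
    return events


# Invented trace (picoamperes) with three distinct current levels.
trace = [80.0, 81.0, 79.0, 95.0, 96.0, 94.0, 70.0, 71.0]
print(segment_events(trace))  # [80.0, 95.0, 70.5]
```

Each detected level would then have to be mapped to sequence, which is the hard part that real basecalling models handle.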
Applications of Long Read Sequencing
Single-molecule real-time sequencing is beginning to enter the limelight. This technology overcomes the read-length limits of short-read sequencing by directly sequencing individual molecules of DNA/RNA, and it is gradually coming into wide use.
Long read sequencing is well suited to de novo genome sequencing and resequencing, especially for improving the accuracy of detecting repeat regions, structural variants and other complex regions. Researchers have used long read sequencing to produce a haploid human genome assembly, filling multiple gaps in the reference genome, including short tandem repeats and structural variant regions in high-GC regions. The more complete maps of the genome, structural variants and repetitive sequences obtained by long read sequencing can provide a reference for genetic and evolutionary studies.
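Gaps in a reference assembly are conventionally written as runs of the letter N, so the regions that long reads are expected to fill can be located with a short scan. The following is a minimal sketch on a made-up sequence; the function name and the `min_length` parameter are illustrative choices, and a real workflow would read the sequence from a FASTA file.

```python
import re


def find_gaps(sequence, min_length=1):
    """Return (start, end) pairs for each run of N in the sequence.
    Coordinates are 0-based and the end is exclusive."""
    return [(match.start(), match.end())
            for match in re.finditer(r"[Nn]+", sequence)
            if match.end() - match.start() >= min_length]


reference = "ACGTNNNNNACGTACGTNNACG"  # toy sequence with two gaps
print(find_gaps(reference))                # [(4, 9), (17, 19)]
print(find_gaps(reference, min_length=3))  # [(4, 9)]
```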
In addition, long read sequencing technology can directly sequence epigenetic modifications in their natural state. PacBio SMRT sequencing can detect modifications such as methylation and acetylation of DNA and RNA by characterizing 25 methylation types. Using long read sequencing, researchers can achieve both de novo assembly and identification of epigenetic modifications, as was done to determine the genome and epigenome of H. pylori. Long read sequencing was also the first technique used to obtain a map of 6mA methylation sites and their distribution in the human genome.