Drawings
Figure 1 illustrates a CE device 100 in accordance with one embodiment.
Referring to Figure 1, a CE device 100 in one embodiment comprises a voltage bias source 102, a capillary 104, a body 114, a detector 106, a sample injection port 108, a heater 110, and a separation media 112. A sample is injected into the sample injection port 108, which is maintained at an above-ambient temperature by the heater 110. Once injected, the sample engages the separation media 112 and is split into component molecules. The components migrate through the capillary 104 under the influence of an electric field established by the voltage bias source 102 until they reach the detector 106.
Figure 2 illustrates a CE device 200 in accordance with one embodiment.
Referring to Figure 2, a CE device 200 in one embodiment comprises a voltage bias source 202, a capillary 204, a body 214, a detector 206, a sample injection port 208, a heater 210, and a separation media 212. A sample is injected into the sample injection port 208, which is maintained at an above-ambient temperature by the heater 210. Once injected, the sample engages the separation media 212 and is split into component molecules. The components migrate through the capillary 204 under the influence of an electric field established by the voltage bias source 202 until they reach the detector 206. The CE device 200 may be a component of an instrument 216 that includes a computational device to collect and process image signals from the detector 206. The instrument 216 may be a capillary electrophoresis genetic analyzer providing many of the features found in exemplary commercial CE instruments.
Figure 3 illustrates a CE system 300 in accordance with one embodiment.
Referencing Figure 3, a CE system 300 in one embodiment comprises a source buffer 316 initially comprising the fluorescently labeled sample 318, a capillary 320, a destination buffer 324, a power supply 326, a computing device 302 comprising a processor 308 and memory 306 comprising a basecaller algorithm 304, and a controller 310. The source buffer 316 is in fluid communication with the destination buffer 324 by way of the capillary 320. The power supply 326 applies voltage to the source buffer 316 and the destination buffer 324, generating a voltage bias through an anode 328 in the source buffer 316 and a cathode 330 in the destination buffer 324. The voltage applied by the power supply 326 is configured by the controller 310, which is operated by the computing device 302. The fluorescently labeled sample 318 near the source buffer 316 is pulled through the capillary 320 by the voltage gradient, and optically labeled nucleotides of the DNA fragments within the sample are detected as they pass an optical sensor 322. Differently sized DNA fragments within the fluorescently labeled sample 318 are pulled through the capillary 320 at different times due to their size. The optical sensor 322 detects the fluorescent labels on the nucleotides as an image signal and communicates the image signal to the computing device 302. The computing device 302 aggregates the image signal as sample data and utilizes the basecaller algorithm 304 stored in memory 306 to transform the sample data into processed data and generate an electropherogram 314 to be shown on a display device 312.
Figure 4 illustrates a CE process 400 in accordance with one embodiment.
Referencing Figure 4, a CE process 400 involves a computing device 412 communicating a configuration control 416 to a controller 408 to control the voltage applied by a power supply 406 to the buffers 402. After the prepared fluorescently labeled sample has been added to the source buffer, the controller 408 communicates an operation control 418 to the power supply 406 to apply a voltage 420 to the buffers, creating a voltage bias/electrical gradient. The applied voltage causes the fluorescently labeled sample 422 to move through the capillary 404 between the buffers 402 and pass by the optical sensor 410. The optical sensor 410 detects fluorescent labels on the nucleotides of the DNA fragments that pass through the capillary and communicates the image signal 424 to the computing device 412. The computing device 412 aggregates the image signal 424 to generate the sample data for further processing. A basecaller algorithm processes the sample data (e.g., signal values) to generate processed data. The computing device 412 then generates a display control 426 to display an electropherogram of the processed data on a display device 414.
Figure 5 illustrates a CE process 500 in accordance with one embodiment.
Referencing Figure 5, a CE process 500 involves configuring the operating parameters of a capillary electrophoresis instrument to sequence at least one fluorescently labeled sample (block 502). The configuration of the instrument may include creating or importing a plate setting for running a series of samples and assigning labels to the plate samples to assist in the processing of the collected imaging data. The process may also include communicating configuration controls to a controller to start applying voltage at a predetermined time. In block 504, the CE process 500 loads the fluorescently labeled sample into the instrument. After the sample is loaded into the instrument, the instrument may transfer the sample from a plate well into the capillary tube and then position the capillary tube in the starting buffer at the beginning of the capillary electrophoresis process. In block 506, after the sample has been loaded into the capillary, the CE process 500 begins the instrument run by applying a voltage to the buffer solutions positioned at opposite ends of the capillary, forming an electrical gradient that transports DNA fragments of the fluorescently labeled sample from the starting buffer toward a destination buffer, traversing an optical sensor. In block 508, the CE process 500 detects the individual fluorescent signals on the nucleotides of the DNA fragments as they move past the optical sensor toward the destination buffer and communicates the image signal to the computing device. In block 510, the CE process 500 aggregates the image signal from the optical sensor at the computing device and generates sample data that corresponds to the fluorescent intensity of the nucleotides of the DNA fragments. In block 512, the CE process 500 processes the sample data to identify the bases called in the DNA fragments at each particular time point. In block 514, the CE process 500 displays the processed data as an electropherogram on a display device.
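Blocks 510 and 512 can be illustrated with a minimal Python sketch. It is not the basecaller algorithm of any instrument: the channel-to-base order in `BASES`, the `threshold` value, and the function names `aggregate` and `call_bases` are all hypothetical, and the peak rule (a local maximum in the strongest channel) is a deliberately simple stand-in for a real basecaller.

```python
from typing import Dict, List, Tuple

# Hypothetical channel order; real instruments define their own dye sets.
BASES = ("A", "C", "G", "T")

# One image-signal reading: one fluorescence intensity per dye channel.
Frame = Tuple[float, float, float, float]


def aggregate(frames: List[Frame]) -> Dict[str, List[float]]:
    """Block 510 (sketch): collect per-frame readings into one trace per channel."""
    return {base: [frame[i] for frame in frames] for i, base in enumerate(BASES)}


def call_bases(sample_data: Dict[str, List[float]], threshold: float = 50.0) -> str:
    """Block 512 (sketch): emit a base wherever the strongest channel peaks."""
    n = len(sample_data[BASES[0]])
    called = []
    for t in range(1, n - 1):
        # Pick the channel with the highest intensity at this time point.
        base = max(BASES, key=lambda b: sample_data[b][t])
        trace = sample_data[base]
        # Accept it only if it is a local maximum above the assumed threshold.
        if trace[t] >= threshold and trace[t - 1] < trace[t] >= trace[t + 1]:
            called.append(base)
    return "".join(called)


frames = [
    (0, 0, 0, 0),
    (100, 5, 5, 5),
    (10, 5, 5, 5),
    (5, 5, 200, 5),
    (5, 5, 20, 5),
    (5, 150, 5, 5),
    (0, 0, 0, 0),
]
print(call_bases(aggregate(frames)))
```

In this toy input each peak occupies a single time point; real traces have broad, overlapping peaks, which is why production basecallers also model peak shape, spacing, and mobility shift.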
Figure 6 illustrates sequencing data 600 in accordance with one embodiment.
Referencing Figure 6, sequencing data 600 shows Sanger sequencing data 602 in an electropherogram. Capillary electrophoresis (CE) technologies can examine several varieties of biopolymer, for example DNA, methylated DNA, mRNA, and proteins tagged with variable-length oligos. The resulting data appears as a series of peaks. Minor variations in a biopolymer appear as smaller peaks, possibly overlapping the peaks corresponding to the dominant form of the biopolymer. The smaller peaks can be confused with peaks that arise from biochemical noise associated with the chemical reactions used to process the biological sample. In the Sanger sequencing data 602, a number of peaks 604 are seen at nucleotide position 116, corresponding to a cytosine. Through existing analysis techniques that involve examining characteristics of the peaks such as peak height and width, together with data manipulation 608, the Sanger sequencing data 602 may be presented as adjusted data 610 showing the peaks 604 as a detected variant 606 for nucleotide position 116.
Current solutions for recognizing the smaller peaks associated with the minor biopolymeric forms present in a sample, such as commercially available variant detection software (e.g., Thermo Fisher Minor Variant Finder, Multilocus Variant Analysis, Variant Reporter Software v1.1, and the variant analysis and identification modules in MicrobeBridge Software), involve examining characteristics of the peaks such as peak height and width.
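As a rough illustration of what "examining peak height and width" can mean in practice, the following Python sketch measures a peak's height and its width at half height. The function name `peak_height_and_width` and the half-height convention are assumptions made for this example; the commercial packages named above do not document their internals here, and this is not a description of them.

```python
from typing import List, Tuple


def peak_height_and_width(trace: List[float], apex: int) -> Tuple[float, int]:
    """Return (height, width in data points at half height) for the peak at `apex`."""
    height = trace[apex]
    half = height / 2.0
    # Walk left from the apex while the signal stays at or above half height.
    left = apex
    while left > 0 and trace[left - 1] >= half:
        left -= 1
    # Walk right from the apex under the same condition.
    right = apex
    while right < len(trace) - 1 and trace[right + 1] >= half:
        right += 1
    return height, right - left + 1


trace = [0, 10, 60, 100, 70, 20, 0]
print(peak_height_and_width(trace, apex=3))
```

A minor-variant peak would typically show a much smaller height than the dominant peak at the same position, which is why height alone is hard to separate from biochemical noise.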
The Sanger sequencing data 602 and the adjusted data 610, as well as the analysis process utilized to identify the peak characteristics, may be stored within sequencing data storage to be used in the statistical analysis of similar results or applied as inputs to machine learning algorithms to distinguish dominant from minor variants of biomolecules.
Parts List
100
CE device
102
voltage bias source
104
capillary
106
detector
108
sample injection port
110
heater
112
separation media
114
body
200
CE device
202
voltage bias source
204
capillary
206
detector
208
sample injection port
210
heater
212
separation media
214
body
216
instrument
300
CE system
302
computing device
304
basecaller algorithm
306
memory
308
processor
310
controller
312
display device
314
electropherogram
316
source buffer
318
fluorescently labeled sample
320
capillary
322
optical sensor
324
destination buffer
326
power supply
328
anode
330
cathode
400
CE process
402
buffers
404
capillary
406
power supply
408
controller
410
optical sensor
412
computing device
414
display device
416
configuration control
418
operation control
420
voltage
422
fluorescently labeled sample
424
image signal
426
display control
500
CE process
502
block
504
block
506
block
508
block
510
block
512
block
514
block
600
sequencing data
602
Sanger sequencing data
604
peaks
606
detected variant
608
data manipulation
610
adjusted data
Terms/Definitions
average peak
sample data
the output of a single lane or capillary on a sequencing instrument. Sample data is entered into Sequencing Analysis, SeqScape, and other sequencing analysis software.
plasmid
a genetic structure in a cell that can replicate independently of the chromosomes, typically a small circular DNA strand in the cytoplasm of a bacterium or protozoan. Plasmids are much used in the laboratory manipulation of genes.
polymerase
an enzyme that catalyzes polymerization. DNA and RNA polymerases build single‐stranded DNA or RNA (respectively) from free nucleotides, using another single‐stranded DNA or RNA as the template.
mixed base
One-base positions that contain 2, 3, or 4 bases. These bases are assigned the appropriate IUB code.
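The mixed base definition above can be made concrete with a small Python lookup from a set of observed bases to its IUB (IUPAC) ambiguity code. The code letters are the standard ones; the function name `iub_code` is simply an illustrative choice.

```python
# Standard IUB/IUPAC ambiguity codes for positions containing 2, 3, or 4 bases.
IUB_CODES = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",
    frozenset("AGT"): "D",
    frozenset("ACT"): "H",
    frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iub_code(bases: str) -> str:
    """Return the single-letter code for the bases observed at one position."""
    observed = frozenset(bases.upper())
    if not observed or not observed <= frozenset("ACGT"):
        raise ValueError(f"expected only A, C, G, T; got {bases!r}")
    if len(observed) == 1:
        # A pure base is its own code.
        return next(iter(observed))
    return IUB_CODES[observed]


print(iub_code("AG"), iub_code("CT"), iub_code("ACGT"))
```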
noise
average background fluorescent intensity for each dye.
capillary electrophoresis genetic analyzer
instrument that applies an electrical field to a capillary loaded with a sample so that the negatively charged DNA fragments move toward the positive electrode. The speed at which a DNA fragment moves through the medium is inversely proportional to its molecular weight. This process of electrophoresis can separate the extension products by size at a resolution of one base.
raw data
a multicolor graph displaying the fluorescence intensity (signal) collected for each of the four fluorescent dyes.
basecall
the process of assigning a nucleotide base (A, C, G, T, or N) to each peak of the fluorescence signal.
primer
A short single strand of DNA that serves as the priming site for DNA polymerase in a PCR reaction.
amplicon
the product of a PCR reaction. Typically, an amplicon is a short piece of DNA.
variant
bases where the consensus sequence differs from the reference sequence that is provided.
base pair
complementary nucleotide in a DNA sequence. Thymine (T) is complementary to adenine (A) and guanine (G) is complementary to cytosine (C).
3′ end
single nucleotide polymorphism
a variation in a single base pair in a DNA sequence.
quality values
an estimate (or prediction) of the likelihood that a given basecall is in error. Typically, the quality value is scaled following the convention established by the phred program: QV = –10 log10(Pe), where Pe stands for the estimated probability that the call is in error. Quality values are a measure of the certainty of the base-calling and consensus-calling algorithms. Higher values correspond to a lower chance of algorithm error. Sample quality values refer to the per-base quality values for a sample, and consensus quality values are per-consensus quality values.
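The phred convention in the quality values definition can be checked with a few lines of Python. The function names `quality_value` and `error_probability` are illustrative; only the formula QV = –10 log10(Pe) comes from the definition itself.

```python
import math


def quality_value(p_error: float) -> float:
    """QV = -10 * log10(Pe), for an error probability 0 < Pe <= 1."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(qv: float) -> float:
    """Inverse of the phred scaling: Pe = 10 ** (-QV / 10)."""
    return 10.0 ** (-qv / 10.0)


for pe in (0.1, 0.01, 0.001):
    print(pe, round(quality_value(pe), 1))
```

Each increase of 10 in QV corresponds to a tenfold drop in the estimated error probability, so QV 20 means an estimated 1 in 100 chance that the call is wrong and QV 30 means 1 in 1,000.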
width curve
heterozygous insertion deletion variant
see single nucleotide polymorphism
average peak width
Sanger Sequencer
a DNA sequencing process that takes advantage of the ability of DNA polymerase to incorporate 2´,3´-dideoxynucleotides, nucleotide base analogs that lack the 3´-hydroxyl group essential in phosphodiester bond formation. Sanger dideoxy sequencing requires a DNA template, a sequencing primer, DNA polymerase, deoxynucleotides (dNTPs), dideoxynucleotides (ddNTPs), and reaction buffer. Four separate reactions are set up, each containing radioactively labeled nucleotides and either ddA, ddC, ddG, or ddT. The annealing, labeling, and termination steps are performed on separate heat blocks. DNA synthesis is performed at 37°C, the temperature at which DNA polymerase has the optimal enzyme activity. DNA polymerase adds a deoxynucleotide or the corresponding 2´,3´-dideoxynucleotide at each step of chain extension. Whether a deoxynucleotide or a dideoxynucleotide is added depends on the relative concentration of both molecules. When a deoxynucleotide (A, C, G, or T) is added to the 3´ end, chain extension can continue. However, when a dideoxynucleotide (ddA, ddC, ddG, or ddT) is added to the 3´ end, chain extension terminates. Sanger dideoxy sequencing results in the formation of extension products of various lengths terminated with dideoxynucleotides at the 3´ end.
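The link between chain termination and size-based separation can be sketched in Python. This is an idealized toy, assuming exactly one extension product per length and perfect one-base resolution; the function names `extension_products` and `read_by_size` are hypothetical.

```python
import random
from typing import List


def extension_products(synthesized: str) -> List[str]:
    """Idealized reaction output: one product per position, each ending in a ddNTP."""
    return [synthesized[:i] for i in range(1, len(synthesized) + 1)]


def read_by_size(fragments: List[str]) -> str:
    """Order products by length, as electrophoresis does, and read each terminal base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


fragments = extension_products("GATTACA")
random.Random(0).shuffle(fragments)  # products are mixed together in the reaction tube
print(read_by_size(fragments))
```

Because shorter products reach the detector first, reading the terminal base of each product in order of size recovers the synthesized strand.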
mobility shift
electrophoretic mobility changes imposed by the presence of different fluorescent dye molecules associated with differently labeled reaction extension products.
spacing curve
5′ end
image signal
a number that indicates the intensity of the fluorescence from one of the dyes used to identify bases during a data run. Signal strength numbers are shown in the Annotation view of the sample file.
n-1 peak
pure base
assignment mode for a base caller, in which the base caller assigns an A, C, G, or T to a position instead of a variable.
polymerase slippage
a form of mutation that leads to either a trinucleotide or dinucleotide expansion or contraction during DNA replication. A slippage event normally occurs when a sequence of repetitive nucleotides (tandem repeats) is found at the site of replication. Tandem repeats are unstable regions of the genome where frequent insertions and deletions of nucleotides can take place.
relative fluorescence unit
a unit of measurement used in analyses that employ fluorescence detection, such as electrophoresis methods for DNA analysis.
base spacing
the number of data points from one peak to the next. A negative spacing value or a spacing value shown in red indicates a problem with your samples, and/or the analysis parameters.
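Base spacing as defined above can be computed directly from peak positions. The sketch below assumes peak positions are given as data-point indices in scan order; the function name `base_spacings` is illustrative, and no claim is made about how any analysis software derives the spacing value it reports.

```python
from statistics import mean
from typing import List


def base_spacings(peak_positions: List[int]) -> List[int]:
    """Number of data points from each peak to the next."""
    return [b - a for a, b in zip(peak_positions, peak_positions[1:])]


positions = [100, 112, 125, 137]
spacings = base_spacings(positions)
print(spacings, round(mean(spacings), 2))
```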
Exemplary commercial CE devices
include the Applied Biosystems, Inc. (ABI) genetic analyzer models 310 (single capillary), 3130 (4 capillary), 3130xL (16 capillary), 3500 (8 capillary), 3500xL (24 capillary), 3730 (48 capillary), and 3730xL (96 capillary), the Agilent 7100 device, Prince Technologies, Inc.’s PrinCE™ Capillary Electrophoresis System, Lumex, Inc.’s Capel-105™ CE system, and Beckman Coulter’s P/ACE™ MDQ systems, among others.
separation or sieving media
include gels; however, non-gel liquid polymers such as linear polyacrylamide, hydroxyalkylcellulose (HEC), agarose, cellulose acetate, and the like can also be used. Other separation media that can be used for capillary electrophoresis include, but are not limited to, water-soluble polymers such as poly(N,N′-dimethylacrylamide) (PDMA), polyethylene glycol (PEG), poly(vinylpyrrolidone) (PVP), polyethylene oxide, polysaccharides, and pluronic polyols; various poly(vinylalcohol) (PVAL)-related polymers; polyether-water mixtures; and lyotropic polymer liquid crystals, among others.
beam search
a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set. Beam search is an optimization of best-first search that reduces its memory requirements. Best-first search is a graph search that orders all partial solutions (states) according to some heuristic. In beam search, however, only a predetermined number of best partial solutions are kept as candidates. It is thus a greedy algorithm. Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level and sorts them in increasing order of heuristic cost. However, it only stores a predetermined number, β, of best states at each level (called the beam width). Only those states are expanded next. The greater the beam width, the fewer states are pruned. With an infinite beam width, no states are pruned and beam search is identical to breadth-first search. The beam width bounds the memory required to perform the search. Since a goal state could potentially be pruned, beam search sacrifices completeness (the guarantee that an algorithm will terminate with a solution, if one exists). Beam search is not optimal (that is, there is no guarantee that it will find the best solution). In general, beam search returns the first solution found. Beam search for machine translation is a different case: once the configured maximum search depth (i.e., translation length) is reached, the algorithm evaluates the solutions found during search at various depths and returns the best one (the one with the highest probability). The beam width can be either fixed or variable. One approach that uses a variable beam width starts with the width at a minimum. If no solution is found, the beam is widened and the procedure is repeated.
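The level-by-level procedure in the beam search definition can be sketched in Python as follows. This is a generic illustration with a fixed beam width that returns the first solution found; the function signature, the `max_depth` cap, the duplicate-state check, and the toy integer graph are all assumptions made for the example.

```python
from typing import Callable, Hashable, Iterable, List, Optional


def beam_search(
    start: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    heuristic: Callable[[Hashable], float],
    is_goal: Callable[[Hashable], bool],
    beam_width: int,
    max_depth: int = 100,
) -> Optional[List[Hashable]]:
    """Return the first path found from start to a goal, or None if pruned or exhausted."""
    if is_goal(start):
        return [start]
    beam = [[start]]
    seen = {start}  # assumed extra: skip states that were already generated
    for _ in range(max_depth):
        candidates = []
        # Generate all successors of every state kept at the current level.
        for path in beam:
            for state in successors(path[-1]):
                if state in seen:
                    continue
                seen.add(state)
                new_path = path + [state]
                if is_goal(state):
                    return new_path
                candidates.append(new_path)
        if not candidates:
            return None
        # Sort by increasing heuristic cost and keep only the best beam_width states.
        candidates.sort(key=lambda p: heuristic(p[-1]))
        beam = candidates[:beam_width]
    return None


# Toy graph: from n you may move to n + 1 or 2 * n; the goal is 10.
path = beam_search(
    start=1,
    successors=lambda n: [n + 1, n * 2],
    heuristic=lambda n: abs(10 - n),
    is_goal=lambda n: n == 10,
    beam_width=2,
)
print(path)
```

Raising `beam_width` keeps more candidates per level at the cost of memory; a narrower beam can prune every path to the goal, in which case the function returns `None`, reflecting the loss of completeness noted in the definition.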