Result interpretation guide for whole plasmid sequencing
In our whole plasmid sequencing service we not only deliver a high-quality assembled and polished sequence, but also provide an informative HTML report. In case you need assistance interpreting the quality of your data, please find detailed descriptions below, along with examples of good and bad quality results. Additionally, we offer tips and tricks to ensure that you achieve only high-quality results.
Oxford Nanopore Technologies (ONT) sequencing
In ONT nanopore sequencing, the input material directly influences the output. If a flow cell is loaded with small DNA fragments, the resulting reads will be correspondingly short. Conversely, if high-molecular-weight DNA is loaded, longer reads will be obtained. All molecules in the sample will be sequenced, and the relative read counts of the various molecular species will generally reflect the actual proportions of those species in the sample. There is one caveat, though: small fragments are sequenced at a higher frequency than longer ones.
Also be aware that agarose gels are not very sensitive: only DNA fragments at a high enough concentration can be visualized. There might be more going on at a lower level in the background of the sample (you might see a smear on the gel).
Nanopore sequencing by ONT does not require primers and typically involves sequencing the entire plasmid molecule with read lengths that span its entirety. Therefore, all molecules present in the received sample, including degraded plasmids or background genomic DNA, are sequenced.
Our sample requirements are:
Plasmid size: required DNA concentration
2.5 – 25 kbp: 30 ng/µl
25 – 125 kbp: 50 ng/µl
125 – 300 kbp: 50 ng/µl
The required DNA concentration according to our specifications is between 30 and 50 ng/µl, depending on plasmid size. We highly recommend fluorometric concentration measurements (e.g. Qubit) instead of photometric ones (e.g. Nanodrop), because they are significantly more accurate for double-stranded DNA. Photometric measurements frequently overestimate a sample's DNA concentration. We often receive plasmids whose concentration was measured photometrically, and the resulting overestimation is the most common reason for failed plasmid sequencing attempts. We will attempt to sequence your sample even if it does not fulfill our requirements. In that case we cannot guarantee success and, unfortunately, must still charge for the sequencing attempt.
As our service is optimized for clonal populations of plasmid molecules, we recommend checking the quality of your sample on a gel or with a Bioanalyzer/Fragment Analyzer, preferably as a linearized plasmid; a good sample shows a single band (however, watch out for biological concatemers, see below). If your samples failed to be sequenced, please consider performing a new plasmid preparation to rule out contamination. Additionally, size selection on a gel can help remove contaminating degraded DNA.
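For illustration only, the size classes and concentrations listed above can be written as a small lookup. This is a minimal sketch, not part of our service: the function names are hypothetical, the listed concentration is treated as a minimum (an assumption), and a size that sits exactly on a class boundary is assigned to the lower class (also an assumption).

```python
# Size classes and concentrations copied from the sample requirements above.
# Names are illustrative; treating the value as a minimum is an assumption.
REQUIREMENTS = [
    (2.5, 25.0, 30.0),     # (min size in kbp, max size in kbp, ng/µl)
    (25.0, 125.0, 50.0),
    (125.0, 300.0, 50.0),
]


def required_concentration(size_kbp):
    """Return the listed concentration in ng/µl, or None outside 2.5-300 kbp."""
    for low, high, concentration in REQUIREMENTS:
        if low <= size_kbp <= high:
            # First match wins, so boundary sizes fall into the lower class.
            return concentration
    return None


def meets_requirement(size_kbp, measured_ng_per_ul):
    """Check a (preferably fluorometric) measurement against the listed value."""
    required = required_concentration(size_kbp)
    return required is not None and measured_ng_per_ul >= required


print(meets_requirement(5.0, 35.0))    # 5 kbp plasmid at 35 ng/µl
print(meets_requirement(100.0, 35.0))  # 100 kbp plasmid at 35 ng/µl
```

Because photometric readings tend to overestimate, a value that only just passes such a check on a Nanodrop reading may still be too low in practice.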
Accuracy of plasmid sequencing results
According to the specifications provided by Oxford Nanopore for the chemistry and flow cells used in our current plasmid sequencing, the raw read accuracy exceeds 99%. In general, higher coverage, meaning that more reads are available for consensus building, tends to improve the accuracy of the results.
Nevertheless, we also include variant calling results in the report, which identify positions in the final plasmid sequence with lower confidence.
If you observe discrepancies between our consensus and your reference, it is possible that the plasmid construct you provided differs from your reference due to missing elements, mutations, or other factors. Such outcomes are commonly revealed through whole plasmid sequencing.
Lower confidence bases
Our consensus assembly process utilizes deep sequencing to achieve a high level of accuracy at the individual base level. However, Oxford Nanopore long read data encounters challenges in resolving certain common motifs. To tackle this issue, we polish the sequence to correct many of these problematic bases.
In addition, we employ a strategy where we map your reads against a high-quality consensus assembly to identify lower confidence bases. During this process, we determine the frequency of each nucleotide at a specific position. In regions with high confidence bases, the majority of raw reads will contain the same assembled base. However, in areas that pose challenges, such as motifs like Dcm methylation sites (CC[A/T]GG) or long stretches of homopolymer bases, different nucleotides may be identified at the same position in the raw reads, despite the assembled base potentially being correct.
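The idea of counting nucleotide frequencies per position can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual pipeline: the input is assumed to be an already computed pileup (one string of observed read bases per consensus position), and the 0.8 threshold and all names are hypothetical.

```python
from collections import Counter


def low_confidence_positions(columns, consensus, min_fraction=0.8):
    """Flag positions where too few reads agree with the consensus base.

    columns[i] holds the bases observed in the reads at consensus position i.
    min_fraction is an arbitrary illustrative cutoff, not a service parameter.
    """
    flagged = []
    for position, (column, base) in enumerate(zip(columns, consensus)):
        counts = Counter(column)
        total = sum(counts.values())
        if total == 0:
            continue  # no reads cover this position, so nothing to compare
        fraction = counts[base] / total
        if fraction < min_fraction:
            flagged.append((position, base, round(fraction, 2)))
    return flagged


# Toy pileup: position 2 has 6 reads with T and 4 reads with A.
columns = ["AAAAAAAAAA", "CCCCCCCCCC", "TTTTTAAAAT", "GGGGGGGGGG"]
consensus = "ACTG"
print(low_confidence_positions(columns, consensus))  # [(2, 'T', 0.6)]
```

A flagged position is not necessarily wrong: as described above, the assembled base can still be correct even when the raw reads disagree.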
If your assembly differs from your expectations, it is important to consider these factors.
Errors or low confidence positions in a homopolymer region or at a Dcm / Dam methylation site
The most common error modes for Oxford Nanopore are deletions in homopolymer stretches, errors at the middle position of the Dcm methylation sites CCTGG or CCAGG, and errors at the Dam methylation site GATC.
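To see where such motifs occur in your own reference, a simple pattern search is usually enough. The sketch below is an illustration only: the function name and the minimum homopolymer length of 5 are assumptions, the sequence is treated as linear (sites spanning the start/end junction of a circular plasmid are not found), and only one strand is scanned.

```python
import re

DCM_SITE = re.compile(r"CC[AT]GG")  # Dcm sites CCAGG and CCTGG
DAM_SITE = re.compile(r"GATC")      # Dam site


def find_error_prone_sites(sequence, min_homopolymer=5):
    """Return sorted (start, end, label) tuples, 0-based with exclusive end."""
    seq = sequence.upper()
    # One base followed by at least (min_homopolymer - 1) copies of itself.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    patterns = (("homopolymer", homopolymer), ("Dcm", DCM_SITE), ("Dam", DAM_SITE))
    sites = []
    for label, pattern in patterns:
        for match in pattern.finditer(seq):
            sites.append((match.start(), match.end(), label))
    return sorted(sites)


example = "ACGTAAAAAAGCCAGGTTGATCCCTGGA"
for site in find_error_prone_sites(example):
    print(site)
```

Discrepancies between the consensus and your reference that fall inside the reported intervals are more likely to be sequencing artifacts than true mutations, although this cannot be decided from the motif alone.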
Sequencing coverage of plasmids
We cannot guarantee a specific level of coverage, as the number of raw reads generated can vary significantly with the quality of the sample. Typically, successful samples sent at the recommended concentration yield a substantial number of raw sequencing reads, ranging from high dozens to hundreds or even thousands. The average coverage is indicated in the report, and a coverage of approximately 20x or higher suggests a highly accurate consensus.
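As a rough back-of-the-envelope check, average coverage can be approximated as the total number of sequenced bases divided by the plasmid length. The sketch below assumes that every read maps to the plasmid, which overestimates coverage when the sample contains genomic DNA or other contaminants; the function name is hypothetical, and the value stated in the report remains the one to rely on.

```python
def approximate_mean_coverage(read_lengths, plasmid_length):
    """Estimate mean coverage as total read bases / plasmid length.

    Assumes all reads originate from the plasmid (an idealization).
    """
    if plasmid_length <= 0:
        raise ValueError("plasmid_length must be positive")
    return sum(read_lengths) / plasmid_length


# 40 full-length reads of a 5 kbp plasmid give roughly 40x coverage.
coverage = approximate_mean_coverage([5000] * 40, 5000)
print(coverage)        # 40.0
print(coverage >= 20)  # True: at or above the ~20x guideline mentioned above
```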
Read length histograms
Prior to sequencing your plasmids, the library preparation workflow linearizes the circular DNA to obtain predominantly full-length sequence reads. In the result report we plot a read length histogram, which shows the read length from all DNA molecules present in the sample (and which can be sequenced). One dominant peak in the histogram indicates a clean plasmid preparation, usually with a sufficient concentration (see good examples below).
Non-weighted vs. weighted histogram
Distribution of read lengths from sequenced data is shown in the following histograms. Read length histograms can be used to assess the quality of sequencing data, as the distribution of read lengths can indicate extraction quality and fragmentation, the presence of contaminants, or biases in the sequencing process. They can also be used to determine the size of short plasmids, depending on the quality of the sample.
The first histogram displays the number of reads on the y-axis and the read length on the x-axis. Each bar in the histogram represents a range of read lengths, and the height of the bar indicates the total number of reads falling within that range.
The second histogram displays the number of sequenced bases (bp) on the y-axis and the read length on the x-axis. Instead of the total number of reads, the height of each bar indicates the total number of bases (bp) falling within that range. The result is a plot weighted by the number of nucleotides per bin, so longer reads carry more weight in the histogram.
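The difference between the two histograms can be illustrated with a few lines of code. This is a minimal sketch with hypothetical names and an arbitrary bin width of 1000 bp; the binning used in the actual report may differ.

```python
from collections import Counter


def read_length_histograms(read_lengths, bin_width=1000):
    """Return (reads per bin, bases per bin), keyed by the bin's start length."""
    reads_per_bin = Counter()
    bases_per_bin = Counter()
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        reads_per_bin[bin_start] += 1       # non-weighted: one count per read
        bases_per_bin[bin_start] += length  # weighted: each read adds its length
    return dict(sorted(reads_per_bin.items())), dict(sorted(bases_per_bin.items()))


# Four short fragments and two full-length reads of a ~5 kbp plasmid.
lengths = [500, 600, 700, 800, 5000, 5100]
reads_hist, bases_hist = read_length_histograms(lengths)
print(reads_hist)  # {0: 4, 5000: 2}
print(bases_hist)  # {0: 2600, 5000: 10100}
```

In this toy example the short fragments dominate the non-weighted histogram (4 reads versus 2), while the weighted histogram shows that most of the sequenced bases come from the full-length reads.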