Result interpretation guide for clonal amplicon sequencing
In our clonal amplicon service, we not only deliver a high-quality assembled and polished sequence, but also provide an informative HTML report. In case you need assistance interpreting the quality of your data, please find detailed descriptions below, along with examples of good and bad quality results. Additionally, we offer tips and tricks to ensure that you achieve only high-quality results.
Oxford Nanopore Technologies (ONT) sequencing
In ONT nanopore sequencing, the input material directly influences the output. If a flow cell is loaded with short DNA fragments, the resulting reads will be correspondingly short. Conversely, if high-molecular-weight DNA is loaded, longer reads will be obtained. All molecules in the sample are sequenced, and the relative read counts of the various molecular species generally reflect their actual proportions in the sample. There is one caveat, though: small fragments are sequenced at a higher frequency than longer ones.
Also be aware that agarose gels are not very sensitive: only DNA fragments present at a high enough concentration can be visualized. The sample may contain additional material at a lower level in the background (which you might see as a smear on the gel).
Nanopore sequencing by ONT does not require primers. Therefore, all molecules present in the received sample, including unspecific amplifications, primer dimers or background genomic DNA, are sequenced.
Sample requirements:
Our sample requirements are:
Size category | Length | Concentration | Min. volume
Clonal amplicons | 1 – 25 kbp | 30 ng/µl | 20 µl
According to our specifications, the required DNA concentration is at least 30 ng/µl. We highly recommend fluorometric concentration measurements (e.g. Qubit) instead of photometric ones (e.g. Nanodrop), because they are significantly more accurate for double-stranded DNA. Photometric measurements frequently overestimate a sample's DNA concentration. We often receive amplicons whose concentration was measured photometrically, and this is the most common reason for failed amplicon sequencing attempts. We will attempt to sequence your sample even if it does not fulfill our requirements. In that case we cannot guarantee success and, unfortunately, must still charge for the sequencing attempt.
As our service is optimized for clonal populations of molecules, we recommend checking the purity of each sample (ideally a single band) on a gel or with a Bioanalyzer/Fragment Analyzer. If your samples failed to be sequenced, please consider performing a new PCR to rule out contamination. Additionally, size selection on a gel can remove contaminating unspecific amplification products or primer dimers.
Accuracy of clonal amplicon sequencing results
According to the specifications provided by Oxford Nanopore for the chemistry and flow cells used in our current clonal amplicon sequencing, the raw read accuracy exceeds 99%. In general, higher coverage, meaning more reads available for consensus building, tends to improve the accuracy of the results.
The report also includes variant calling, which flags positions in the final clonal amplicon sequence that have lower confidence.
If you observe discrepancies between our consensus and your reference, it is possible that the amplicon you provided differs from your reference due to missing elements, mutations, or other factors. Such outcomes are commonly revealed through clonal amplicon sequencing.
Lower confidence bases
Our consensus assembly process utilizes deep sequencing to achieve a high level of accuracy at the individual base level. However, Oxford Nanopore long read data encounters challenges in resolving certain common motifs. To tackle this issue, we polish the sequence to correct many of these problematic bases.
In addition, we employ a strategy where we map your reads against a high-quality consensus assembly to identify lower confidence bases. During this process, we determine the frequency of each nucleotide at a specific position. In regions with high confidence bases, the majority of raw reads will contain the same assembled base. However, in areas that pose challenges, such as motifs like Dcm methylation sites (CC[A/T]GG) or long stretches of homopolymer bases, different nucleotides may be identified at the same position in the raw reads, despite the assembled base potentially being correct.
If your assembly differs from your expectations, it is important to consider these factors.
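The per-position frequency check described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the report: the function names, the toy pileup, and the 0.8 support threshold are assumptions chosen only for illustration.

```python
from collections import Counter

def base_frequencies(column):
    """Return the fraction of each symbol observed in one pileup column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}

def low_confidence_positions(columns, consensus, min_fraction=0.8):
    """List 0-based positions where fewer than min_fraction of the reads
    support the consensus base (min_fraction is an assumed threshold)."""
    flagged = []
    for pos, (column, base) in enumerate(zip(columns, consensus)):
        freqs = base_frequencies(column)
        if freqs.get(base, 0.0) < min_fraction:
            flagged.append(pos)
    return flagged

# Toy pileup: one string of read bases per consensus position,
# with "-" marking a read that has a deletion at that position.
columns = ["AAAAAAAAAA", "CCCCCCCCCC", "TTTTTT--AA", "GGGGGGGGGG"]
consensus = "ACTG"
flagged = low_confidence_positions(columns, consensus)
print(flagged)  # [2]
```

In this toy example only position 2 is flagged, because just 6 of 10 reads agree with the consensus base there, even though the consensus base itself may still be correct.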
Errors or low confidence positions in a homopolymer region or at a Dcm/Dam methylation site
The most common error modes for Oxford Nanopore are deletions in homopolymer stretches, errors at the middle position of the Dcm methylation sites CCTGG or CCAGG, and errors at the Dam methylation site GATC.
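To see whether a flagged position falls into one of these regions, you can scan the consensus sequence for the motifs yourself. The following is a minimal sketch: the function name and the minimum homopolymer length of 6 are assumptions for illustration, not part of our report.

```python
import re

# Motifs named in the text: Dcm sites CCTGG / CCAGG and the Dam site GATC.
DCM = re.compile(r"CC[AT]GG")
DAM = re.compile(r"GATC")

def find_error_prone_regions(seq, min_homopolymer=6):
    """Return (kind, start, end) tuples with 0-based, end-exclusive
    coordinates. min_homopolymer is an assumed cut-off."""
    seq = seq.upper()
    # A base followed by at least (min_homopolymer - 1) copies of itself.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    regions = []
    for m in homopolymer.finditer(seq):
        regions.append(("homopolymer", m.start(), m.end()))
    for m in DCM.finditer(seq):
        regions.append(("dcm", m.start(), m.end()))
    for m in DAM.finditer(seq):
        regions.append(("dam", m.start(), m.end()))
    return sorted(regions, key=lambda r: r[1])

regions = find_error_prone_regions("ACGATCTTCCAGGTAAAAAAAGC")
print(regions)
# [('dam', 2, 6), ('dcm', 8, 13), ('homopolymer', 14, 21)]
```

A low confidence position that lies inside one of the returned intervals is more likely to be a sequencing artifact than a true variant, although this should be confirmed against the read data.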
Sequencing coverage of clonal amplicons
We cannot guarantee a specific level of coverage, as the number of raw reads generated can vary significantly with the quality of the sample. Typically, successful samples sent at the recommended concentration yield a substantial number of raw sequencing reads, ranging from high dozens to hundreds or even thousands. The average coverage is indicated in the report, and a coverage of approximately 20x or higher suggests a highly accurate consensus.
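As a rough back-of-the-envelope check, average coverage can be approximated as the total number of sequenced bases divided by the amplicon length. The sketch below assumes that every read originates from the amplicon; the value in the report may be computed differently (for example from aligned reads), and the read lengths shown are hypothetical.

```python
def mean_coverage(read_lengths, amplicon_length):
    """Approximate average coverage: total sequenced bases / amplicon length."""
    if amplicon_length <= 0:
        raise ValueError("amplicon_length must be positive")
    return sum(read_lengths) / amplicon_length

# 50 hypothetical reads from a 3 kbp amplicon.
reads = [1200, 800, 3000, 2500, 1500] * 10
coverage = mean_coverage(reads, 3000)
print(coverage)        # 30.0
print(coverage >= 20)  # True: above the ~20x guideline mentioned above
```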
Data interpretation
Read length histograms
Prior to sequencing your amplicons, the library preparation workflow cuts the linear DNA and attaches sequencing adapters. In the result report we plot a read length histogram, which shows the read lengths of all DNA molecules in the sample that could be sequenced. Because the library preparation has to cut the amplicon, we don't expect a dominant peak at the length of the initial amplicon, but rather sequencing reads of different lengths up to the amplicon size. The length of each sequencing read ultimately depends on where the amplicon was cut during library preparation.
Non-weighted vs. weighted histogram
The following histograms show the distribution of read lengths in the sequencing data. Read length histograms can be used to assess the quality of sequencing data, as the distribution of read lengths can indicate extraction quality, unspecific amplification, the presence of contaminants, or biases in the sequencing process.
Non-weighted histogram
The first histogram displays the number of reads on the y-axis and the read length on the x-axis. Each bar in the histogram represents a range of read lengths, and the height of the bar indicates the total number of reads falling within that range.
Weighted histogram
The second histogram displays the number of sequenced bases (bp) on the y-axis and the read length on the x-axis. Instead of the total number of reads, the height of each bar indicates the total number of bases (bp) falling within that range. The result is a plot weighted by the number of nucleotides per bin, so longer reads carry more weight in the histogram.
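The difference between the two histograms can be made concrete with a small Python sketch. The function name, the bin width of 500 bp, and the example read lengths are assumptions for illustration; the report may use different binning.

```python
from collections import defaultdict

def read_length_histograms(read_lengths, bin_width=500):
    """Return two dicts keyed by bin start (bp):
    non-weighted (reads per bin) and weighted (bases per bin)."""
    counts = defaultdict(int)
    bases = defaultdict(int)
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        counts[bin_start] += 1       # non-weighted: one count per read
        bases[bin_start] += length   # weighted: each read adds its length
    return dict(counts), dict(bases)

# Three short reads and two near-full-length reads (hypothetical values).
lengths = [300, 400, 450, 2900, 3000]
counts, bases = read_length_histograms(lengths)
print(counts)  # {0: 3, 2500: 1, 3000: 1}
print(bases)   # {0: 1150, 2500: 2900, 3000: 3000}
```

In the non-weighted view the short reads dominate (3 of 5 reads), while in the weighted view the two long reads contribute most of the sequenced bases. This is why a peak of short fragments can look prominent in the first histogram and small in the second.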
Insufficient data
Frequently, the read count is insufficient to generate a consensus sequence. A read count that is too low is typically due to samples not being prepared at the required DNA concentration of at least 30 ng/µl according to our specifications. We highly recommend fluorometric concentration measurements (e.g. Qubit) instead of photometric ones (e.g. Nanodrop), because they are significantly more accurate for double-stranded DNA. Photometric measurements frequently overestimate a sample's DNA concentration. We often receive PCR products whose concentration was measured photometrically, and this is the most common reason for failed amplicon sequencing attempts.