Troubleshooting guide

Troubleshooting & result interpretation

Result interpretation guide for whole plasmid sequencing

In our whole plasmid sequencing service we not only deliver a high-quality assembled and polished sequence, but also provide an informative HTML report. In case you need assistance interpreting the quality of your data, please find detailed descriptions below, along with examples of good and bad quality results. Additionally, we offer tips and tricks to ensure that you achieve only high-quality results.

Oxford Nanopore Technologies (ONT) sequencing

In ONT nanopore sequencing, the input material directly influences the output. If a flow cell is loaded with small DNA fragments, the resulting reads will be correspondingly short. Conversely, if high-molecular-weight DNA is loaded, longer reads will be obtained. All molecules in the sample are sequenced, and the relative read counts of the molecular species generally reflect their actual proportions in the sample. There is one caveat, though: small fragments are sequenced at a higher frequency than longer ones.
Also be aware that agarose gels are not very sensitive: only DNA fragments present at a high enough concentration can be visualized. There might be more going on at a lower level in the background of the sample (you might see a smear on the gel).
Nanopore sequencing by ONT does not require primers and typically involves sequencing the entire plasmid molecule with read lengths that span its entirety. Therefore, all molecules present in the received sample, including degraded plasmids or background genomic DNA, are sequenced.

Sample requirements

Our sample requirements are:

Size category	Length	Concentration	Min. volume
Regular	2.5 – 25 kbp	30 ng/µl	20 µl
Large	25 – 125 kbp	50 ng/µl	30 µl
XL	125 – 300 kbp	50 ng/µl	50 µl

The required DNA concentration according to our specifications is between 30 and 50 ng/µl, depending on the size category. We highly recommend fluorometric concentration measurements (e.g. Qubit) instead of photometric ones (e.g. Nanodrop), because of their significantly higher accuracy for double-stranded DNA. Photometric measurements frequently overestimate the DNA concentration of a sample. We often receive plasmids from customers who still measure concentration photometrically, and this is the most common reason for failed plasmid sequencing attempts. We will attempt to sequence your sample even if it doesn’t fulfill our requirements. In that case we cannot guarantee success, and unfortunately we must still charge for the sequencing attempt.
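As a quick pre-submission check, the table above can be encoded in a few lines. The following is a minimal sketch, not part of our pipeline: the function name `check_sample` is hypothetical, the listed concentration is treated as a minimum (an assumption), and a boundary length such as 25 kbp is assigned to the smaller category (also an assumption).

```python
# Size categories copied from the sample requirements table:
# (name, min length kbp, max length kbp, concentration ng/ul, min volume ul)
SIZE_CATEGORIES = [
    ("Regular", 2.5, 25.0, 30.0, 20.0),
    ("Large", 25.0, 125.0, 50.0, 30.0),
    ("XL", 125.0, 300.0, 50.0, 50.0),
]


def check_sample(length_kbp, conc_ng_per_ul, volume_ul):
    """Return (category, problems); category is None if the length is out of range."""
    for name, min_len, max_len, min_conc, min_vol in SIZE_CATEGORIES:
        # Assumption: boundary lengths fall into the first (smaller) category.
        if min_len <= length_kbp <= max_len:
            problems = []
            # Assumption: the table concentration is interpreted as a minimum.
            if conc_ng_per_ul < min_conc:
                problems.append(
                    f"concentration {conc_ng_per_ul} ng/ul is below {min_conc} ng/ul"
                )
            if volume_ul < min_vol:
                problems.append(f"volume {volume_ul} ul is below {min_vol} ul")
            return name, problems
    return None, ["length is outside the 2.5 to 300 kbp range"]
```

The concentration passed in should ideally come from a fluorometric measurement, since a photometric value may overestimate the true concentration.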

As our service is optimized for clonal plasmid populations, we recommend checking the quality of a sample (i.e. a single gel band), preferably as a linearized plasmid, on a gel or with a Bioanalyzer/Fragment Analyzer (however, watch out for biological concatemers, see below). If your samples failed to be sequenced, please consider performing a new plasmid preparation to rule out contamination. Additionally, size selection on a gel can help remove contaminating degraded DNA.

Accuracy of plasmid sequencing results

According to the specifications provided by Oxford Nanopore for the chemistry and flow cells used in our current plasmid sequencing, the raw read accuracy exceeds 99%. In general, higher coverage, meaning more reads available for consensus building, tends to improve the accuracy of the results.

In addition, the report includes variant calling, which flags positions in the final plasmid sequence with lower confidence.

If you observe discrepancies between our consensus and your reference, it is possible that the plasmid construct you provided differs from your reference due to missing elements, mutations, or other factors. Such outcomes are commonly revealed through whole plasmid sequencing.

Lower confidence bases

Our consensus assembly process utilizes deep sequencing to achieve a high level of accuracy at the individual base level. However, Oxford Nanopore long read data encounters challenges in resolving certain common motifs. To tackle this issue, we polish the sequence to correct many of these problematic bases.

In addition, we employ a strategy where we map your reads against a high-quality consensus assembly to identify lower confidence bases. During this process, we determine the frequency of each nucleotide at a specific position. In regions with high confidence bases, the majority of raw reads will contain the same assembled base. However, in areas that pose challenges, such as motifs like Dcm methylation sites (CC[A/T]GG) or long stretches of homopolymer bases, different nucleotides may be identified at the same position in the raw reads, despite the assembled base potentially being correct.
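The per-position frequency check described above can be illustrated with a small sketch. This is not our production pipeline: the function name, the pre-computed pileup input (one string of observed read bases per consensus position), and the 0.8 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def low_confidence_positions(consensus, pileup_columns, min_fraction=0.8):
    """Flag positions where too few reads agree with the assembled base.

    consensus: assembled sequence as a string.
    pileup_columns: one string per position holding the bases seen in the
        raw reads mapped at that position (hypothetical input format).
    min_fraction: hypothetical agreement threshold, not a documented value.
    Returns a list of (position, assembled_base, agreeing_fraction).
    """
    flagged = []
    for pos, (base, column) in enumerate(zip(consensus, pileup_columns)):
        if not column:
            # No reads cover this position, so there is no support at all.
            flagged.append((pos, base, 0.0))
            continue
        counts = Counter(column)
        fraction = counts[base] / len(column)
        if fraction < min_fraction:
            flagged.append((pos, base, fraction))
    return flagged
```

A flagged position is not necessarily wrong: as noted above, the assembled base can still be correct even when the raw reads disagree.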

If your assembly differs from your expectations, it is important to consider these factors.

Errors or low confidence positions in homopolymer regions or at Dcm/Dam methylation sites

The most common error modes for Oxford Nanopore are deletions in homopolymer stretches, errors at the middle position of the Dcm methylation sites CCTGG or CCAGG, and errors at the Dam methylation site GATC.
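To see where such motifs occur in your own reference, a simple scan can help. The following is a minimal sketch using Python regular expressions; the function name and the minimum homopolymer length of 5 are assumptions, and motifs that span the origin of a circular sequence are not detected by this linear scan.

```python
import re


def find_error_prone_sites(seq, min_homopolymer=5):
    """Return sorted (start, end, kind) tuples with 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    sites = []
    # Dcm methylation sites: CCAGG or CCTGG.
    for m in re.finditer(r"CC[AT]GG", seq):
        sites.append((m.start(), m.end(), "Dcm"))
    # Dam methylation sites: GATC.
    for m in re.finditer(r"GATC", seq):
        sites.append((m.start(), m.end(), "Dam"))
    # Homopolymer runs: one base repeated at least min_homopolymer times
    # (the cutoff is a hypothetical choice for illustration).
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homopolymer.finditer(seq):
        sites.append((m.start(), m.end(), "homopolymer"))
    return sorted(sites)
```

Comparing these coordinates with the low confidence positions in the report can indicate whether a discrepancy falls into one of the known difficult motifs.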

Sequencing coverage of plasmids

We cannot guarantee a specific level of coverage, as the number of raw reads generated can vary significantly with the quality of the sample. Typically, successful samples sent at the recommended concentration yield a substantial number of raw sequencing reads, ranging from high dozens to hundreds or even thousands. The average coverage is indicated in the report, and a coverage of approximately 20x or higher suggests a highly accurate consensus.
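For orientation, average coverage is commonly estimated as the total number of sequenced bases divided by the plasmid length. The sketch below uses that simplification; the function names are hypothetical, and it assumes every read originates from the plasmid, which will overestimate coverage for samples with contamination.

```python
def mean_coverage(read_lengths, plasmid_length):
    """Approximate average coverage as total read bases / plasmid length."""
    if plasmid_length <= 0:
        raise ValueError("plasmid_length must be positive")
    return sum(read_lengths) / plasmid_length


def meets_coverage_guideline(coverage, threshold=20.0):
    """Compare against the approximate 20x guideline mentioned above."""
    return coverage >= threshold
```

For example, 40 full-length reads of a 5 kb plasmid give roughly 40x, whereas 20 half-length reads give only about 10x.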

For more questions, please also visit our FAQs.

Data interpretation

Read length histograms

Prior to sequencing your plasmids, the library preparation workflow linearizes the circular DNA to obtain predominantly full-length sequence reads. In the result report we plot a read length histogram, which shows the read length from all DNA molecules present in the sample (and which can be sequenced). One dominant peak in the histogram indicates a clean plasmid preparation, usually with a sufficient concentration (see good examples below).

Non-weighted vs. weighted histogram

Distribution of read lengths from sequenced data is shown in the following histograms. Read length histograms can be used to assess the quality of sequencing data, as the distribution of read lengths can indicate extraction quality and fragmentation, the presence of contaminants, or biases in the sequencing process. They can also be used to determine the size of short plasmids, depending on the quality of the sample.

Non-weighted histogram

The first histogram displays the number of reads on the y-axis and the read length on the x-axis. Each bar in the histogram represents a range of read lengths, and the height of the bar indicates the total number of reads falling within that range.

Weighted histogram

The second histogram displays the number of sequenced bases (bp) on the y-axis and the read length on the x-axis. Instead of the total number of reads, the height of each bar indicates the total number of bases (bp) falling within that range. The plot is thus weighted by the number of nucleotides per bin, so longer reads carry more weight in the histogram.
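The difference between the two histograms can be reproduced from a plain list of read lengths. This is a minimal sketch: the function name and the 1,000 bp bin width are assumptions, and the binning used in our report may differ.

```python
from collections import Counter


def read_length_histograms(read_lengths, bin_width=1000):
    """Return (reads_per_bin, bases_per_bin), keyed by the start of each bin."""
    reads_per_bin = Counter()
    bases_per_bin = Counter()
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        reads_per_bin[bin_start] += 1        # non-weighted: count reads
        bases_per_bin[bin_start] += length   # weighted: count sequenced bases
    return dict(reads_per_bin), dict(bases_per_bin)


# Ten short fragments of 500 bp and four plasmid-length reads of 7,200 bp.
counts, bases = read_length_histograms([500] * 10 + [7200] * 4)
# counts -> {0: 10, 7000: 4}        short fragments dominate by read number
# bases  -> {0: 5000, 7000: 28800}  the plasmid dominates by sequenced bases
```

In this toy example the short fragments form the tallest bar in the non-weighted histogram, while the plasmid-length reads form the tallest bar in the weighted one.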

Examples

Example of good plasmid preparations

One dominant peak indicates a clean monoclonal plasmid preparation, which usually yields good sequencing results (if enough coverage is achieved). Plasmid mixtures can result in erroneous assemblies, depending on sequence homologies within the sample.

Please be aware that a single apparent peak in the histogram could represent multiple plasmids of the same size or multiple plasmids of varying lengths that happen to fall within the same bin. In the analysis pipeline, sequences that are highly similar are treated as variations of a single species, resulting in an attempt to generate a single consensus sequence (with potentially low confidence positions reported in the report).

Example of plasmid preparations that meet the required criteria

In certain instances, we encounter samples exhibiting a prominent peak alongside a significant presence of degraded DNA, including both genomic and plasmid fragments. In the majority of cases the dominant peak still yields a consensus sequence if the read coverage and accuracy meet the required thresholds.

Sometimes the dominant read length peak is split across two adjacent size bins of the histogram. This can be a result of:

bin boundaries that split the dominant read length peak of one clean plasmid into two (this has no influence on our analysis)
small variations in the plasmid (e.g. homopolymer regions of different lengths, small insertions and deletions (InDels)) that result in slightly different plasmid sizes around the bin boundaries
a mixture of two plasmids of very similar size

These variations in the plasmid prep can only be seen in the read length histogram but would not be visible in traditional Sanger sequencing.

Example of multiple peaks

You may observe multiple peaks in the read length histograms. This can have three reasons:

a mixture of plasmids of different sizes in the sample
biological concatemers of a plasmid
unexpected side products of plasmid propagation in the cloning strain, resulting from deletions, recombination, or the concatemers mentioned above

We frequently observe concatemers, which are not sequencing artifacts. Concatemers cannot be detected through Sanger sequencing, and they are not visible on gels of digested or linearized plasmids. Consequently, they may appear unfamiliar to those who are not used to encountering them. However, running the sample uncut on a gel will reveal bands for the dimer and higher multimers. Concatemers often form in vivo during growth in a recA+ strain. For a detailed description of this phenomenon see https://blog.addgene.org/plasmids-101-dimers-and-multimers
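One way to judge whether an additional histogram peak could be a concatemer is to check whether its length is close to an integer multiple of the expected monomer size. The sketch below illustrates this idea only; the function name and the 5% tolerance are assumptions, and a matching length does not prove that a peak is a concatemer.

```python
def classify_peak(peak_length, monomer_length, tolerance=0.05):
    """Return the multiple (1 = monomer, 2 = dimer, ...) or None if no match.

    tolerance is a hypothetical relative deviation allowed on the peak length.
    """
    ratio = peak_length / monomer_length
    multiple = round(ratio)
    # Accept the peak only if it lies within the tolerance of an exact multiple.
    if multiple >= 1 and abs(ratio - multiple) <= tolerance * multiple:
        return multiple
    return None
```

For a 7 kb plasmid, a peak near 14 kb would be classified as a possible dimer, while a peak near 10.5 kb would not match any multiple and more likely points to a different molecule.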

As our service is optimized for clonal plasmid populations, we recommend checking the quality of a sample (i.e. a single gel band) on a gel or with a Bioanalyzer/Fragment Analyzer.

Our pipeline is designed to return only the major peak and the corresponding plasmid in the sample (see weighted vs. non-weighted histograms above). If the example sample is a plasmid mixture, only the plasmid of around 7 kb would be reported.

Resolving plasmid mixtures

Our plasmid service is designed for analyzing clonal populations of molecules. While it is possible to submit mixtures of molecular species, it comes with an inherent risk as we cannot predict the outcome of the analysis.

If your plasmid molecules are highly similar in length and sequence, with only a few nucleotide differences, the analysis pipeline will typically produce a single .fasta consensus file with low confidence positions at SNP/indel locations. You can refer to the report provided to identify these locations.
If your molecular species are sufficiently distinct, the pipeline will generate a .fasta consensus file for the most abundant species, provided the plasmid size is below 25 kb.

Degraded DNA or contamination with small fragments

If your plasmid DNA is degraded during the preparation process, the resulting sequencing reads will predominantly consist of small fragments with no dominant peak, despite a high read count. This can result in insufficient coverage, so that no plasmid consensus sequence can be generated. Another potential scenario is contamination with degraded host genomic DNA, which would have a similar effect. As our service is optimized for clonal plasmid populations, we recommend checking the quality of a sample (i.e. a single gel band), preferably as a linearized plasmid, on a gel or with a Bioanalyzer/Fragment Analyzer (however, watch out for biological concatemers, see above).

We do not offer re-sequencing or refunds for samples that fail, unless a technical issue is identified on our end. Samples that do not meet the required DNA concentration, or that contain multiple plasmids, substantial host contamination, or plasmids with multiple large repetitive elements, are more likely to fail to generate a consensus sequence. If your sample failed to be sequenced, please refer to the sample requirements stated above, where you will find suggestions on how to improve your sample quality.

Insufficient data

Frequently, the read count is insufficient to differentiate individual peaks or generate a consensus. When the read count is too low, it is typically because samples were not prepared at the required DNA concentration of 30 to 50 ng/µl (depending on the size category). We highly recommend fluorometric concentration measurements (e.g. Qubit) instead of photometric ones (e.g. Nanodrop), because of their significantly higher accuracy for double-stranded DNA. Photometric measurements frequently overestimate the DNA concentration of a sample. We often receive plasmids from customers who still measure concentration photometrically, and this is the most common reason for failed plasmid sequencing attempts.

Assembly failed/final plasmid size doesn’t fit

One possible reason is biological plasmid concatemers (see above), which typically form via homologous recombination during propagation in recA+ strains. If concatemers make up a high proportion of the sample, they might be picked up by the pipeline. Concatemers cannot be seen on a gel of a digested/linearized plasmid, but you can run the supercoiled/uncut plasmid alongside a supercoiled ladder.

Additionally, although the third-generation ONT service sequences plasmids with long reads that span the full plasmid length, in rare cases large, highly repetitive plasmids might not be resolved correctly by the assembler. Finally, the sample quality might not be sufficient (a clean single-plasmid prep has one dominant peak). Multiple peaks indicate plasmid mixtures or other side products arising from insertions/deletions and/or recombination.
