Seven points that could make or break your next Illumina NGS project
Overview – What are the points to ensure a successful Illumina NGS project
There are several important factors to consider before starting a next-generation sequencing (NGS) project on an Illumina platform. A well-planned experiment greatly improves the chances of a successful outcome, regardless of whether you perform the sequencing yourself or outsource it to a service provider.
- Starting material
The starting point of your project is the choice of a suitable DNA/RNA extraction method for your organism of interest. The lysis and homogenisation steps of the protocol should be tailored to the specific material in order to maximise yield and quality. The protocol should be performed by experienced users to avoid degradation of the nucleic acid due to missteps or delays. For some materials, such as plants, the removal of inhibitors is essential. In all cases, the quality of your DNA/RNA should be assessed using capillary gel electrophoresis or a similar method.
- Number of samples
The number of samples depends very much on the aim of your sequencing experiment. If you want to check for the presence of a gene or SNP in a bacterial strain, one sample might be enough to answer the question. However, if you want to analyse effects across groups of samples, include replicates wherever possible so that the results can be analysed statistically. Also consider including controls to validate your results.
- Number of reads
Your project will require a minimum number of sequencing reads in order to generate reliable data. If you are sequencing amplicons or small RNAs, or resequencing small genomes, as few as 5 million reads could be sufficient. For resequencing projects, the number of reads required depends on the genome size and the desired sequencing depth (coverage), so larger eukaryotic genomes need more reads than small prokaryotic genomes. Over-sequencing not only costs more money and time, it also complicates downstream data analysis.
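The relationship between genome size, depth and read number can be sketched with the common approximation that mean coverage equals the total number of sequenced bases divided by the genome size. The short Python example below is only a rough estimate under that assumption: it treats coverage as uniform and ignores duplicates, unmapped reads and low-quality bases, so real projects usually need a safety margin on top. The function name, the genome sizes and the 150 bp read length are illustrative choices, not values from a specific protocol.

```python
import math


def fragments_needed(genome_size_bp, target_coverage, read_length_bp, paired_end=True):
    """Estimate how many fragments must be sequenced for a target mean coverage.

    Assumes mean coverage = sequenced bases / genome size. In paired-end mode
    each fragment contributes two reads, so the result is a number of read pairs.
    """
    if min(genome_size_bp, target_coverage, read_length_bp) <= 0:
        raise ValueError("all inputs must be positive")
    bases_per_fragment = read_length_bp * (2 if paired_end else 1)
    return math.ceil(genome_size_bp * target_coverage / bases_per_fragment)


# Illustrative inputs (assumptions): a 5 Mb bacterial genome and a 3.1 Gb
# human-sized genome, both at 30x with 2 x 150 bp paired-end reads.
bacterial_pairs = fragments_needed(5_000_000, 30, 150)
human_pairs = fragments_needed(3_100_000_000, 30, 150)

print(f"Bacterial genome: {bacterial_pairs:,} read pairs")
print(f"Human-sized genome: {human_pairs:,} read pairs")
```

With these assumed inputs the estimate is 500,000 read pairs for the bacterial genome and 310 million read pairs for the human-sized genome, which shows how strongly genome size drives the required output.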
- Sequencing depth
The sequencing coverage, or the average number of times a single base is read during a run, is also important. The more often a base is sequenced, the more reliable the base call is likely to be. The coverage required is highly dependent on the application. For example, 30x coverage is usually sufficient to reliably identify germline mutations. However, coverage of 100x or more is needed to detect somatic mutations in tumour samples.
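One way to see why somatic mutations need more depth than germline mutations is a simple binomial model, sketched below in Python. It assumes that each read covering a position carries the variant independently with a probability equal to the variant allele fraction, and it ignores sequencing errors and mapping problems. The allele fractions (0.5 for a heterozygous germline variant, 0.05 for a subclonal somatic variant) and the threshold of three supporting reads are illustrative assumptions, not recommendations.

```python
from math import comb


def prob_detect(depth, allele_fraction, min_alt_reads=3):
    """Probability that at least min_alt_reads reads carry the variant.

    Binomial model: each of `depth` reads shows the variant independently
    with probability `allele_fraction`. Sequencing errors are ignored.
    """
    p_too_few = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


for depth in (30, 100):
    germline = prob_detect(depth, 0.5)   # heterozygous germline variant
    somatic = prob_detect(depth, 0.05)   # assumed 5% subclonal somatic variant
    print(f"{depth}x: germline {germline:.3f}, somatic {somatic:.3f}")
```

Under these assumptions, the germline variant is detected almost every time at 30x, whereas the low-frequency somatic variant is missed more often than not at 30x and found in most cases at 100x.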
- Sequencing mode
Illumina offer two distinct sequencing modes: single-read and paired-end. Single-read runs sequence each DNA fragment from one end only, and how much of the fragment is covered depends on the fragment length and the read length. The single-read mode is fast and cheap, and can be beneficial for some RNA-Seq and ChIP-Seq experiments. In paired-end mode the fragment is first read from one end, and in a second read the same fragment is read from the opposite end, so two paired reads are generated for each fragment. Although this sequencing mode is more expensive, it generates more data and makes mapping to the genomic reference more reliable. It is therefore the preferred choice for applications such as SNP analysis and genome assembly.
- Multiplexing
During library preparation each sample is labelled with a sample-specific molecular tag, called a barcode. This allows multiple samples to be processed in the same sequencing reaction and then separated during BioIT analysis. Besides lowering costs, multiplexing allows for randomisation and can help minimise sequencing bias. In a perfect world, the experimental design would involve pooling all controls and experimental samples together and sequencing them on the same lane. If this cannot be achieved, samples should be randomised so that each batch contains both cases and controls. For low-complexity samples, such as amplicons and bisulfite-treated DNA, pooling them with high-complexity samples can improve sequencing yield and quality.
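The randomisation step can be sketched in a few lines of Python. The example below is a minimal illustration, assuming simple lists of sample names: it shuffles cases and controls separately and then deals them out in turn, so that every batch (for example, a lane) receives both groups. The function name, the sample names and the fixed seed are hypothetical and only serve to make the example reproducible.

```python
import random


def assign_batches(cases, controls, n_batches, seed=0):
    """Randomly spread cases and controls across batches.

    Each group is shuffled and then dealt out round-robin, so every batch
    receives samples from both groups.
    """
    if len(cases) < n_batches or len(controls) < n_batches:
        raise ValueError("need at least one case and one control per batch")
    rng = random.Random(seed)  # fixed seed so the design can be reproduced
    batches = [[] for _ in range(n_batches)]
    offset = 0
    for group in (cases, controls):
        shuffled = list(group)
        rng.shuffle(shuffled)
        for i, sample in enumerate(shuffled):
            batches[(i + offset) % n_batches].append(sample)
        offset += len(shuffled)  # continue dealing where the last group stopped
    return batches


# Hypothetical sample names, for illustration only.
cases = [f"case_{i}" for i in range(1, 7)]
controls = [f"control_{i}" for i in range(1, 7)]
batches = assign_batches(cases, controls, n_batches=3)

for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {', '.join(batch)}")
```

Recording the seed together with the sample sheet makes it possible to reconstruct the batch assignment later, which is useful when checking for batch effects during analysis.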
- BioIT analysis
As sequencing costs are lower than ever before, the current bottleneck of NGS tends to be the bioinformatics analysis. Thinking in advance about how you will analyse your data can help ensure that you have included all necessary controls. You can perform the data analysis yourself using free or commercially available software, but these programs often have a steep learning curve. Alternatively, you can outsource the BioIT analysis to an experienced provider.