Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Introduction
The advent of next-generation sequencing (NGS) or high-throughput sequencing has revolutionized the field of microbial ecology and brought classical environmental studies to another level. This type of cutting-edge technology has led to the establishment of the field of “metagenomics”, defined as the direct genetic analysis of genomes contained within an environmental sample without the prior need for cultivating clonal cultures. Initially, the term was only used for functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample, but currently it is also widely applied to studies performing polymerase chain reaction (PCR) amplification of certain genes of interest. The former can be referred to as “full shotgun metagenomics”, and the latter as “marker gene amplification metagenomics” (ie, 16S ribosomal RNA gene) or “meta-genetics”.
Such methodologies allow a much faster and more detailed genomic/genetic profile of an environmental sample to be generated at a very acceptable cost. Full shotgun metagenomics has the capacity to fully sequence the majority of available genomes within an environmental sample (or community). This creates a community biodiversity profile that can be further associated with functional composition analysis of known and unknown organism lineages (ie, genera or taxa). Shotgun metagenomics has evolved to address the questions of who is present in an environmental community, what they are doing (function-wise), and how these microorganisms interact to sustain a balanced ecological niche. It further provides unparalleled access to the functional gene composition of microbial communities inhabiting natural ecosystems.
Marker gene metagenomics is a fast, if coarser, way to obtain a community/taxonomic distribution profile or fingerprint using PCR amplification and sequencing of evolutionarily conserved marker genes, such as the 16S rRNA gene. This taxonomic distribution can subsequently be associated with environmental data (metadata) derived from the sampling site under investigation.
Several types of ecosystems have been studied so far using metagenomics, including extreme environments such as areas of volcanism or areas of extreme temperature, alkalinity, acidity, low oxygen, and high heavy-metal composition.17 This invaluable resource provides an immense capacity for bioprospecting and allows the discovery of novel enzymes capable of catalyzing reactions of commercial biotechnological interest.
The first metagenomic studies focused on low-diversity environments, such as an acid mine drainage site, the human gut microbiome, and water samples from the Sargasso Sea, mainly because, at the time, neither high-throughput sequencing technologies nor software suitable for scaffold assembly were available. As more and more researchers entered this new field of study, the need for powerful tools and software became apparent and led to the creation of several such tools.
Sequencing Technologies
The two NGS technologies most commonly utilized to date are the 454 Life Sciences and the Illumina systems, with the ratio of usage recently shifting in favor of the latter. Both technologies have been widely used in metagenomic studies, and hence it is important to briefly describe their advantages and disadvantages with respect to the sequencing of metagenomic samples.
The 454 pyrosequencer was the first next-generation sequencer to be introduced commercially, in 2004. Its chemistry relies on the immobilization of DNA fragments on DNA-capture beads in a water–oil emulsion, followed by PCR amplification of the fixed fragments. The beads are placed on a PicoTiterPlate (a fiber-optic chip). DNA polymerase is also packed in the plate, and pyrosequencing is performed. Its main difference from classic Sanger sequencing is that pyrosequencing relies on the detection of pyrophosphate release on nucleotide incorporation rather than chain termination with dideoxynucleotides. The release of pyrophosphate is converted into light through enzyme reactions, which is then translated into actual sequence information.
In the initial years of high-throughput sequencing, scientists embraced the new technology and thereby discovered the existence of the “rare biosphere”. However, in many cases the apparent assignment of a microbial operational taxonomic unit (OTU) was in fact an artifact of sequencing errors, which caused an overinflation of diversity estimates.27 Noise generated by 454 pyrosequencing affected different aspects of metagenomic data analysis and led to biased results.
PCR errors may lead to replicate sequence artifacts, which can cause overestimation of species abundance and functional gene abundance in 16S rRNA and full shotgun metagenomics, respectively. PCR can also generate noise in the form of single base pair errors (ie, substitutions, deletions) that can cause frame shifts in protein coding genes in shotgun metagenomics. Moreover, PCR chimeras (sequences generated by undesired end-joining of two or more true sequences) can also affect 16S metagenomics results with respect to species distribution. Sequencing errors can also occur due to the actual chemistry underlying the technology. For example, there is an inherent difficulty in clearly resolving the signal intensities of 454 pyrosequencing-generated flowgrams. This task becomes even more difficult during the sequencing of homopolymers. The 454 pyrosequencing technology can generate reads up to 1,000 bp in length and ~1,000,000 reads per run. The relatively long read length generated by this technology (in comparison to other sequencing technologies) allows a significantly less error-prone assembly in shotgun metagenomics and permits greater annotation accuracy. The cost of sequencing using 454 pyrosequencing technology is estimated at around US$20 per Mb, but it has a relatively low coverage of 0.7 GB per sequencing run. With respect to pyrosequencing, <20 ng of DNA is sufficient for sequencing single-end libraries, although paired-end sequencing may require larger quantities of DNA.
Although the 454 platform will eventually stop being supported, one should take into account that a large number of existing unpublished datasets have been generated with this technology. It is therefore important to include it in this review and to compare it with the sequencing platforms that have become more popular over the last years, namely Illumina.
Illumina dye sequencing by synthesis begins with the attachment of DNA molecules to primers on a slide, followed by amplification of that DNA to produce local colonies. This generation of “DNA clusters” is accompanied by the addition of fluorescently labeled, reversible terminator bases (adenine, cytosine, guanine, and thymine) attached to a blocking group. The four bases then compete for binding sites on the template DNA to be sequenced, and the nonincorporated molecules are washed away. After each synthesis cycle, a laser is used to excite the dyes, and a high-resolution scan of the incorporated base is made. A chemical deblocking step removes the 3′ terminal blocking group and the dye in a single step. The process is repeated until the full DNA molecule is sequenced. Illumina has a variety of sequencing instruments dedicated to different applications. MiSeq, for example, has an output of 15 GB and 25 million sequencing reads of 300 bp in length; clustered fragments can be sequenced from both ends (paired-end sequencing), and the two reads can be merged so that 600 bp fragments are obtained. HiSeq2500 has a much greater output (1,000 GB per run) but offers 125 bp reads. Illumina sequencing comes at a much lower cost (~US$0.50 per Mb), but the run time is longer than that of 454 pyrosequencing. This drawback is addressed by the MiSeq instrument, which has been developed to run smaller jobs at a much faster rate with relatively high throughput. Illumina allows sample preparation from <20 ng of DNA (similar to 454 pyrosequencing). The shorter read length produced by Illumina may increase errors during assembly and, subsequently, annotation inaccuracies during shotgun metagenomics data analysis. In contrast, when analyzing 16S metagenomics data, this technology obviates the need for the time-consuming noise removal algorithms required for pyrosequencing and makes analysis less error-prone. The greater coverage/yield generally offered by Illumina allows a significant decrease in systematic errors. This advantage and the low cost are the decisive factors that have turned Illumina into the preferred high-throughput sequencing technology for metagenomics studies.
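To make the paired-end merging step mentioned above concrete, the following minimal Python sketch joins a forward read with the reverse complement of its mate when their ends overlap. The reads, overlap threshold, and mismatch allowance are all illustrative; real pipelines use dedicated, quality-aware merging tools.

```python
# A minimal sketch (not a production tool) of paired-end read merging:
# join the forward read with the reverse complement of its mate when
# their ends overlap. Reads and thresholds below are illustrative.

def revcomp(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def merge_pair(r1, r2, min_overlap=10, max_mismatch=2):
    """Merge forward read r1 with reverse read r2 if their ends overlap."""
    r2 = revcomp(r2)
    # Try the longest plausible overlap first, shrinking until one fits.
    for olen in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2[:olen]))
        if mismatches <= max_mismatch:
            return r1 + r2[olen:]  # merged fragment spanning both reads
    return None  # no confident overlap; keep the reads unmerged

# Toy 14 bp "reads" from a 20 bp fragment, overlapping by 8 bp:
print(merge_pair("ACGTACGTACGTAA", "GGAACCTTACGTAC", min_overlap=6))
# -> ACGTACGTACGTAAGGTTCC
```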
Additional sequencing technologies are available and can potentially be used for metagenomic studies. These include the Applied Biosystems SOLiD 5500 W Series sequencer, which offers higher coverage than 454 pyrosequencing but lower than Illumina (~120 GB per run). It allows fragment or mate-paired sequencing; however, it can only guarantee a low error rate for sequencing reads of at most 50 bp in length. This reduces the possibility of generating a reliable and usable de novo assembly for shotgun metagenomics; on the other hand, this technology performs very well when a reference genome is available for mapping or assembly of reads. Moreover, using the Exact Call Chemistry (ECC) module, the SOLiD system can boost the accuracy of its ligation-based sequencing.
An emerging sequencing technology that may have a high impact on the fields of genomics and metagenomics was recently developed by Pacific Biosciences (PacBio). This technology uses single-molecule real-time (SMRT) sequencing, a parallelized single-molecule DNA sequencing-by-synthesis approach. SMRT sequencing utilizes the zero-mode waveguide (ZMW), whereby a single DNA polymerase enzyme is fixed to the bottom of a ZMW with a single molecule of DNA as a template. The ZMW is a structure that creates an illuminated observation volume small enough to allow the observation of a single nucleotide of DNA (also known as a base) being incorporated by DNA polymerase. Each of the four DNA bases is attached to one of four different fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW, where its fluorescence is no longer observable. A detector records the fluorescent signal of the nucleotide incorporation, and the base call is made according to the corresponding fluorescence of the dye. PacBio provides much longer read lengths (~10,000 bp) than the aforementioned technologies, with obvious advantages when addressing issues of annotation and assembly for shotgun metagenomics. PacBio technology uses a process called strobing to perform paired-end read sequencing. Despite its high read length, this technology is limited by high error rates and low coverage (albeit at higher throughput than Sanger sequencing).
In addition to the aforementioned technologies, which are based on optics, technologies such as Ion Torrent's semiconductor sequencing benchtop sequencer and Ion Proton are now coming into play. These technologies detect nucleotide incorporation via the protons emitted during DNA polymerization. This system promises read lengths of >200 bp and relatively high throughput, on the order of magnitude achieved by 454 Life Sciences systems. Additionally, it offers higher quality than 454, especially when sequencing homopolymers, but at a similar cost (about US$23 per Mb for the Ion Torrent PGM 314 Chip). Looking into the future, and given that the 454 platform will eventually be phased out, it is very likely that former users of 454 pyrosequencing will switch to Ion Torrent sequencing chemistry, owing to the similarities between the two (eg, the emulsion PCR step) and the significant advantages of the latter.
An even more cutting-edge technology is currently under development by Oxford Nanopore Technologies, which is developing “strand sequencing”, a method of DNA analysis that could potentially sequence completely intact DNA strands/polymers passed through a protein nanopore. This obviates the need for shotgun sequencing and aims to revolutionize the sequencing industry in the future. Oxford Nanopore intends to commercialize this technology with the company's GridION™ and MinION™ systems. For metagenomics, this technology can have obvious advantages, as it would eliminate the errors introduced by shotgun fragmentation and exclude the need for the error-prone assembly step during data analysis (see below). However, nanopore sequencing is at the moment noncommercialized (offered only through the MinION™ Access Program) and is still being optimized on a case-by-case basis for specific templates and sequencing needs.
Another example of an innovative and very promising technology is the Irys Technology (BioNano Genomics), which uses micro- and nanostructures and offers new ways of constructing genome maps de novo. The input is DNA labeled at specific sequence motifs that can be used for imaging and identification in IrysChips. These labeling steps result in a uniquely identifiable, sequence-specific pattern of labels to be used for de novo map assembly or for anchoring sequencing contigs.
Shotgun Metagenomics
Assembly of shotgun metagenomics data
Metagenomics studies are commonly applied to investigate the specific genomes (known as well as unknown, both cultured and uncultured) that are present within an environmental community under study. Moreover, when performing full shotgun metagenomics, the complete sequences of protein coding genes (previously characterized or novel) as well as full operons in the sequenced genomes can offer invaluable functional knowledge about the community. For these reasons, an assembly of shorter reads into genomic contigs, and the orientation of these into scaffolds, is often performed to provide a more compact and concise view of the sequenced community under investigation. Early attempts at metagenomic data assembly utilized tools initially implemented for single genome assemblies. They therefore fell short when forced to assemble reads into contigs for metagenomic samples. However, assembly tools have significantly evolved since then, and the current line of tools have been modified and specifically designed to assemble samples containing multiple genomes, thereby rendering them much more effective for the task at hand.
The process of assembling shorter reads into contigs can take two different routes: 1) reference-based assembly and 2) de novo assembly. The choice of which route to follow depends on the dataset that needs to be analyzed and on the specific needs of each research project. For example, de novo assembly could in theory be used even when a reference genome exists, provided the computational power allows for it.
Reference-based assembly refers to the use of one or more reference genomes as a “map” in order to create contigs, which can represent genomes or parts of genomes belonging to a specific species or genus. Tools such as Newbler (Roche), MIRA 4, or AMOS, as well as the recent MetAMOS, are commonly used in metagenomics for performing reference-based assemblies. These tools are not computationally intensive and perform well when metagenomic samples are derived from extensively studied and researched environments. In such cases, sequences from closely related organisms would already have been deposited in online data repositories and databases, allowing them to be used as references for the assembly process. Often, assemblies are visually evaluated using genome browser tools such as Artemis. The observation of large gaps in the query genome(s) of the resulting assembly, when compared to the reference genome(s), can be taken as an indication that the assembly is incomplete or that the reference genome(s) used are too distantly related to the community under investigation for the assembly to perform optimally.
De novo assembly refers to the generation of assembled contigs without reference to known genome(s). This task is computationally expensive and relies heavily on sophisticated graph theory algorithms, such as de Bruijn graphs, which were specifically employed to tackle this job. Tools such as EULER, Velvet, SOAP, and ABySS were amongst the first to perform de novo assembly and are still widely used today. They require computers with large amounts of memory and generally long execution times (depending on the size of the dataset). However, these tools were built with the assumption of assembling a single genome and often underperform when used for metagenome assemblies. Problems arise from 1) variation between similar subspecies, 2) genomic sequence similarity between different species, and 3) differences in abundance between species in a sample, compounded by uneven sequencing depths across individual species. These issues introduce kinks (or branches) in the de Bruijn graph, which have to be addressed in order to improve the assembly.
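As a minimal illustration of the de Bruijn graph idea underlying these assemblers, the Python sketch below maps each (k-1)-mer prefix to the suffixes that follow it. The reads and the value of k are toy inputs; real assemblers add error correction, graph simplification, and paired-end resolution on top of this basic structure.

```python
# Toy de Bruijn graph construction from short reads. Each k-mer in a
# read contributes an edge from its (k-1)-mer prefix to its (k-1)-mer
# suffix; assembly then corresponds to walking paths in this graph.
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]  # illustrative overlapping reads
for node, neighbours in de_bruijn(reads, k=4).items():
    print(node, "->", neighbours)
```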
The next generation of assembly tools, such as MetaVelvet, its recent successor MetaVelvet-SL, and Meta-IDBA, was developed to address these issues. MetaVelvet and Meta-IDBA employ a combined binning (for details on binning, see below) and assembly approach to create more accurate assemblies from datasets containing a mixture of multiple genomes. They make use of k-mer frequencies to detect kinks in the de Bruijn graph and then use k-mer thresholds to decompose the graph into subgraphs. These tools further assemble contigs and scaffolds based on the decomposed subgraphs, and thus perform a more efficient grouping/assembly of contigs, effectively separating those belonging to different species.
The IDBA-UD algorithm was recently developed to additionally address the problem of uneven sequencing depths in metagenomic data. It makes use of multiple depth-relative k-mer thresholds to remove erroneous k-mers in both low-depth and high-depth regions. Comparison of the performance of these tools is often based on the N50 length score, which is defined as “the length for which the collection of all contigs of that length, or longer, contains at least half of the total of the lengths of the contigs in the assembly”. A recent comparison of the latest line of assembly tools shows that IDBA-UD can reconstruct longer contigs with higher accuracy. However, there is still much room for improvement of metagenomic assembly algorithms in order for them to conceptually capture the task at hand.
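The N50 definition quoted above translates directly into a few lines of Python; the contig lengths in the example are made up.

```python
# N50 computed directly from the definition above: the length L such that
# contigs of length >= L together contain at least half of the total
# assembly length.
def n50(contig_lengths):
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length

print(n50([100, 80, 60, 40, 20]))  # total=300; 100+80=180 >= 150 -> N50=80
```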
Binning tools for metagenomes
Binning is the process of grouping (binning) reads or contigs into individual genomes and assigning the groups to specific species, subspecies, or genera. Binning methods can be characterized in two different ways depending on the information used to group the sequences at hand: 1) Composition-based binning is based on the observation that individual genomes have a unique distribution of k-mer sequences (also denoted as genomic signatures). By making use of this conserved species-specific nucleotide composition, these methods are capable of grouping sequences into their respective genomes. 2) Similarity- or homology-based binning refers to the process of using alignment algorithms such as BLAST or profile hidden Markov models (pHMMs) to obtain similarity information about specific sequences/genes from publicly available databases (eg, NCBI's nonredundant database, nr, or PFAM). Thereafter, sequences are binned according to their assigned taxonomic information.
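As a minimal sketch of such a genomic signature, assuming simple frequency counts without the normalization that real tools apply, the following Python snippet computes the 256-dimensional tetranucleotide frequency vector of a sequence:

```python
# Sketch of a composition-based genomic signature: the 256-dimensional
# tetranucleotide frequency vector. The sequence is illustrative; real
# tools also normalize counts against expected background frequencies.
from itertools import product
from collections import Counter

def tetra_signature(seq):
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = max(sum(counts.values()), 1)
    # Fixed ordering over all 4**4 = 256 tetranucleotides.
    return [counts["".join(t)] / total for t in product("ACGT", repeat=4)]

sig = tetra_signature("ACGTACGTACGTTTGCAACGT" * 50)
print(len(sig), round(max(sig), 3))  # 256 dimensions; peak frequency
```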
Available composition-based binning algorithms are included in tools such as TETRA, S-GSOM, Phylopythia and its successor PhylopythiaS, TACOA, PCAHIER, ESOM, and ClaMS, while examples of purely similarity-based binning software include tools such as CARMA, MetaPhyler, and SOrt-ITEMS. Some tools employ similarity-based binning algorithms in their metagenomics analysis pipelines. Examples of such tools are IMG/MER 4, MG-RAST, and MEGAN, which will be described in more detail below.
Certain binning tools employ a hybrid approach using both composition- and similarity-based information to group sequences. Examples of such tools are PhymmBL and MetaCluster. More innovative binning approaches include co-abundance gene segregation across a series of metagenomic samples, thus facilitating the assembly of microbial genomes without the need for reference sequences. This new method promises to overcome the usual computational challenges of other binning tools and has been tested on a human gut microbiome.
Binning tools can further be characterized with respect to the type of algorithm they employ: 1) ab initio unsupervised classifiers and 2) supervised/training-based classifiers. Unsupervised binning refers to the process of using pre-existing bins derived from genomic sequences to classify a given dataset without user supervision. In contrast, supervised binning allows user interference and supervision in the training process per se. More specifically, the user may specify the type of sequences that will be used to train each bin and, furthermore, select sequences from known taxonomic lineages to use while training the classifier. Sophisticated algorithms such as support vector machines (PhylopythiaS), hidden Markov models (PhymmBL, TETRA), as well as self-organizing maps (ESOMs) have been used in binning algorithms. However, tools such as PhylopythiaS and TETRA allow little user intervention, while ClaMS and ESOM provide a more supervised training approach that can be fine-tuned to allow optimal classification for the specific dataset under consideration.
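To give a flavor of the supervised, composition-based approach, in the spirit of PhylopythiaS but greatly simplified, the sketch below trains a linear SVM on dinucleotide signatures of two synthetic "taxa" and classifies a new fragment. It assumes scikit-learn is installed; all sequences and labels are invented.

```python
# Hedged sketch of supervised composition-based binning: train an SVM on
# k-mer signatures of labeled reference fragments, then classify unknown
# contigs. Training data here are synthetic, not real genomes.
from collections import Counter
from sklearn.svm import SVC

def kmer_vector(seq, k=2):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    keys = [a + b for a in "ACGT" for b in "ACGT"]
    total = max(sum(counts.values()), 1)
    return [counts[key] / total for key in keys]

train_seqs = ["ACACACACGTGT" * 10, "GGGGCCCCGGCC" * 10]  # two mock taxa
labels = ["taxonA", "taxonB"]
clf = SVC(kernel="linear").fit([kmer_vector(s) for s in train_seqs], labels)
print(clf.predict([kmer_vector("ACACACGTACAC" * 10)]))  # expected: ['taxonA']
```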
There are certain aspects that one must take into consideration when performing the binning of metagenomic sequences. Composition-based binning using genomic signatures has its drawbacks, especially when performed on short reads (ie, 150 bp). Given that all possible tetranucleotide combinations amount to 256, short reads are unlikely to yield sufficient information to reliably assign a taxonomic rank to a specific bin. Therefore, it is common practice to perform composition-based binning on assembled datasets. This way, longer contigs can provide the required k-mer distribution information, which allows effective binning and taxonomic assignment. Observation of a taxonomic marker sequence (ie, the 16S rRNA gene) within the bins can further facilitate reliable taxonomic assignment for the respective bin. Similarity-based binning also has its disadvantages. Although capable of binning reads of short length, it fails to do so accurately when the metagenome under consideration consists of numerous closely related species. This may cause assignment of closely related sequences to the same reference genome, perhaps at a higher taxonomic level (ie, order or class), thereby generating bins containing a mixture of genomes. Therefore, optimal binning results are expected to be attained by combining both composition- and similarity-based approaches, as adopted by hybrid tools such as PhymmBL and MetaCluster.
Annotation of metagenomics sequences
Annotation pipelines for metagenomes are specifically designed to work with mixtures of genomes and contigs of varying length. Initially, a series of preprocessing steps prepare the reads for annotation. These include 1) Trimming of low-quality reads using platform-specific tools such as the FASTX-Toolkit; additionally, FastQC can provide summary statistics for FASTQ files, and both have recently been integrated into the Galaxy platform, while SolexaQA and Lucy 2 are also used for FASTQ files. Most of these tools make use of Phred (Q) quality scores, the thresholds of which depend on the sequencing technology (a minimal illustration of Phred-based trimming is sketched below); 2) Masking of low-complexity reads, performed using tools such as DUST; 3) A de-replication step that removes sequences that are more than 95% identical; 4) A screening step performed by some tools (ie, MG-RAST) in which the pipeline provides the option of removing reads that are near-exact matches to the genomes of a handful of model organisms, including fly, mouse, cow, and human. This is done using mapping tools such as Bowtie 2.
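The sketch below shows the arithmetic behind Phred-based trimming (Q = -10 log10 P, with the common Sanger/Illumina 1.8+ ASCII offset of 33). The read, quality string, and threshold are illustrative; production pipelines use the dedicated tools named above.

```python
# Minimal FASTQ quality-trimming sketch using Phred scores. Each quality
# character encodes Q = ord(char) - 33; Q20 corresponds to a 1% error
# probability. Thresholds and the example read are illustrative.
def phred_scores(quality_string, offset=33):
    return [ord(ch) - offset for ch in quality_string]

def trim_read(seq, qual, min_q=20):
    """Truncate the read at the first base whose quality drops below min_q."""
    scores = phred_scores(qual)
    for i, q in enumerate(scores):
        if q < min_q:
            return seq[:i], qual[:i]
    return seq, qual

print(trim_read("ACGTACGT", "IIIIII#I"))  # '#' = Q2 -> read trimmed to 6 bp
```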
The next main stage of the annotation pipeline is the identification of genes within the reads/assembled contigs, a process often denoted as “gene calling”. Genes are labeled as coding DNA sequences (CDSs) and noncoding RNA genes, and certain annotation pipelines (eg, IMG/MER) also predict regulatory elements such as clustered regularly interspaced short palindromic repeats (CRISPRs).
CDSs are identified using a number of tools including MetaGeneMark, Metagene, Prodigal, Orphelia, and FragGeneScan, all of which utilize ab initio gene prediction algorithms. Often, annotation pipelines use an intersection of these tools to obtain a more informative prediction of the protein coding genes. Gene prediction tools utilize codon information (ie, the start codon, AUG) to identify potential open reading frames and hence label sequences as coding or noncoding. Most tools can be trained using a desired training set. For example, FragGeneScan is trained for prokaryotic genomes only, and is used by IMG/MER and MG-RAST as well as EBI Metagenomics. It is believed to be one of the most accurate gene-prediction tools currently available. However, like most of these tools, it is expected to have an average prediction accuracy of only ~65%–70%, meaning that many genes are missed altogether.
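For intuition, a toy open-reading-frame scan over the three forward frames is sketched below. Real gene callers such as Prodigal or FragGeneScan rely on trained statistical models, handle both strands, and cope with fragmented genes, none of which this sketch does.

```python
# Toy ab initio ORF scan: report start-to-stop stretches in the three
# forward reading frames only. Coordinates and the minimum length are
# illustrative; this is not a substitute for a trained gene caller.
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=10):
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                       # open a candidate ORF
            elif codon in STOPS and start is not None:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # half-open ORF coordinates
                start = None
    return orfs

print(find_orfs("ATG" + "GCT" * 12 + "TAA"))  # -> [(0, 42)]
```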
CRISPR elements are identified by programs such as CRT and PILER-CR. IMG/MER uses a concatenation of results obtained from both these programs, retaining the longest element prediction in case of overlap.
Noncoding RNAs such as tRNAs are predicted using programs like tRNAscan. Ribosomal RNA (rRNA) genes (5S, 16S, and 23S) are predicted using internally developed rRNA models in IMG/MER, whereas MG-RAST predicts rRNA genes by similarity searches against three well-known databases (SILVA, Greengenes, and the Ribosomal Database Project, RDP).
The next stage of the annotation pipeline involves functional assignment of the predicted protein coding genes. This is currently achieved by homology-based searches of query sequences against databases containing known functional and/or taxonomic information. Due to the large size of metagenomic datasets, this stage is often computationally very expensive and highly automated. BLAST or other sequence-similarity-based algorithms often run on high-performance computer clusters. Often, multithreading or other parallel programming approaches are used to divide jobs among multiple central/graphics processing units (CPUs/GPUs), significantly reducing execution time.
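One common pattern, sketched below under the assumption that BLAST+ (blastp) and a formatted protein database are installed, and that the query FASTA has already been split into chunk files (the file names are invented), is to fan the chunks out over a process pool:

```python
# Sketch of embarrassingly parallel homology searching: split the query
# FASTA into chunks and run one BLAST+ process per chunk. Assumes the
# blastp binary and an "nr" database are available; names are illustrative.
import subprocess
from multiprocessing import Pool

CHUNKS = ["chunk_00.faa", "chunk_01.faa", "chunk_02.faa"]  # pre-split queries

def run_blast(chunk):
    out = chunk.replace(".faa", ".tsv")
    subprocess.run(
        ["blastp", "-query", chunk, "-db", "nr",
         "-outfmt", "6", "-out", out],  # tabular output, one file per chunk
        check=True,
    )
    return out

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        print(pool.map(run_blast, CHUNKS))  # one result table per chunk
```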
Some widely used data repositories for obtaining annotations for metagenomic datasets include functional annotation databases such as KEGG, SEED, eggNOG, and COG/KOG, as well as protein domain databases such as PFAM and TIGRFAM. Often, annotation pipelines make use of multiple databases, or of composite protein domain databases such as Interpro (see EBI Metagenomics), in order to obtain a more collective, cumulative biological functional annotation.
IMG/MER utilizes HMMsearch (profile HMMs) to associate genes with PFAM, and genes are further annotated using COGs. Databases of position-specific scoring matrices (PSSMs) for COGs are downloaded from NCBI and used to annotate protein sequences. Moreover, genes are labeled with KEGG-associated KO terms and EC numbers, and assigned phylogeny using similarity searches. With a large set of genomes in its public repositories, IMG/MER can exploit its own resources, using them as nonredundant reference databases from which it obtains additional functional annotation.
MG-RAST utilizes many of the databases described above for annotation mapping as well as the NCBI taxonomy. The primary data product displayed to the user by MG-RAST is in the form of abundance profiles, and taxonomic information is projected against this data.
Both IMG/MER and MG-RAST are widely used data management repositories and comparative genomics environments. They are fully automated pipelines that provide quality control, gene prediction, and functional annotation. Both tools support user download of data products generated, as well as optional sharing and publishing within the respective portals. However, there are important differences between MG-RAST and IMG/MER that are relevant to the way MG-RAST calculates abundance profiles.
MG-RAST predicts all genes in the metagenome, and then identifies the best homologs of those genes in isolate genomes using a tool called BLAT (BLAST-like alignment tool). BLAT misses similarities below 70% identity, so many strong hits to other genes are missed. After the best hits to genes from isolate genomes are identified, all subsequent analysis is done using the genes of the isolate genomes, not the genes of the metagenome at hand. This creates limitations, because the analysis is not performed on the original genes of the metagenome but on these “proxy” genes from the isolate genomes instead. The advantage of this method is its speed; the only computationally intensive step is to find the best hits of the metagenome genes against the isolates. Once this is done, all other comparisons are already pre-computed. The other major advantage is that the MG-RAST database does not grow in size, as is the case with the IMG/MER database.
IMG/MER also begins with the prediction of all genes from the metagenome, but then runs all computations on those genes rather than on their proxies. This allows the identification of PFAM hits (which is not supported in MG-RAST) and provides much more detailed functional information than COGs, the only protein families database used in MG-RAST. The major bottleneck for IMG/MER is the exponential growth of the gene number, which is not an issue for MG-RAST since the metagenome genes are not kept for analysis. It is, however, important to use PFAM for functional analysis: comparing the number of genes from any metagenome that fall into COG or PFAM clusters shows that the latter provides significantly higher coverage and therefore allows a much deeper analysis. Another major advantage of IMG/MER is that, since the tool keeps the original metagenome genes, it also keeps the original contigs, which provide synteny information. It is therefore far more suitable if one is interested in identifying novel biosynthetic gene clusters (BGCs) in metagenomes, a type of analysis that may be less viable using MG-RAST. The prediction of BGCs from metagenomics data has recently been gaining a great deal of interest due to their potential in biotechnological applications. The possibility of engineering BGCs for the production of secondary metabolites with improved properties, known for their use in anticancer drugs and antibiotics, offers limitless potential for bioprospecting.
The EBI Metagenomics service is a newly developed web-based portal that uses metadata structures and formats complying with the Genomic Standards Consortium (GSC) guidelines. Moreover, a novel data scheme currently hosted by EMBL-EBI is being adopted by the EBI Metagenomics service. This is known as the European Nucleotide Archive (ENA) data schema and aims to integrate data derived from sequencing technologies under a consensus, mutually accepted standard. EBI Metagenomics offers a dual shotgun and marker gene analysis service. It allows the extraction of rRNA data from shotgun metagenomic data using tools such as rRNASelector for concurrent marker gene metagenomic analysis. It therefore supports additional 16S rRNA-based analysis tools such as QIIME (see the section on Marker Gene Metagenomics) for the efficient taxonomic assignment of these sequences. For functional analysis and annotation of CDS sequences, EBI Metagenomics uses FragGeneScan to obtain protein coding sequences and thereafter utilizes databases such as Interpro, a composite, cumulative system comprising multiple protein family databases, which allows protein domain prediction and functional assignment. EBI Metagenomics provides data archiving via ENA and provides unique accession numbers for submitted datasets. Archiving policies require the data to be made public; however, there is a 2-year period (upon submission) during which the data are kept private pending user publication of analysis results.
CAMERA is another online cloud computing service that provides hosted software tools and a high-performance computing infrastructure for the analysis of metagenomic data. One advantage of CAMERA is that it allows greater user intervention and flexibility during the analysis process. However, this means that users must have expertise, knowledge, and hands-on experience in metagenomic data analysis per se, in order to ensure correct execution of the pipeline and accuracy of results. Moreover, in order to perform comparative metagenomics using CAMERA, the datasets at hand must be run through the CAMERA pipeline, making the integration of data from different resources more computationally demanding. MEGAN 5 is yet another tool that performs analysis of metagenomic data and offers a wide range of visualization tools for metagenomic annotation results. It supports multiple visualization schemes, including functional or taxonomic dendrograms, tag clouds, bar charts, and Krona taxonomic plots, which allow hierarchical data to be explored in the form of zoomable pie charts.
Marker Gene Metagenomics
It is widely accepted that sequencing of the 16S rRNA gene reflects eubacterial evolution. Since the introduction of SSU rDNA-based molecular techniques, the study of microbial diversity in natural environments has advanced significantly. In addition, pyrosequencing of the 16S rRNA gene has been widely applied in the field of microbial ecology and has resulted in a great number of sequences deposited in relevant databases, thus enhancing the value of 16S as the “gold standard” in microbial ecology. While the 16S rRNA gene fragment, containing one or more variable regions, is the preferred target marker gene for bacteria and archaea, this is not the case for fungi and other eukaryotes, where the preferred marker genes are the internal transcribed spacer (ITS) and the 18S rRNA gene, respectively.
Taxonomic analysis of prokaryotes (ie, bacteria and archaea) is regularly performed using 16S data derived from various sequencing technologies (ie, 454 pyrosequencing as well as Illumina, SOLiD, and Ion Torrent), and, for the purposes of this review, we list software relevant to most sequencing technologies. Commonly used tools for 16S data analysis and denoising include QIIME, Mothur, SILVAngs, MEGAN, and AmpliconNoise. Despite the vast availability of algorithms and software for the analysis of 16S metagenomics datasets, QIIME appears to have become established as the “gold standard”.
It is important to be aware of certain terminology required for the efficient analysis of 16S metagenomics data. This includes the following: 1) Amplicon – a DNA fragment that is amplified by PCR, eg, one or more 16S rRNA variable regions or other marker genes. Most researchers will make use of standard PCR primers; 2) OTU – the operational proxy for species distinction in microbiology, typically defined using rRNA sequences and a percentage similarity threshold for classifying microbes within the same, or different, OTUs; 3) Barcode – a short DNA sequence that is added to each read during amplification and that is specific for a given sample. This allows samples to be mixed (multiplexed) to reduce sequencing cost. During analysis, sequences need to be demultiplexed, ie, separated by sample, as sketched below.
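A toy version of that demultiplexing step follows: each read is assigned to a sample by its 5' barcode, which is then stripped. The barcodes, sample names, and reads are all invented.

```python
# Toy demultiplexing: route each read to its sample via the leading
# barcode, then remove the barcode from the read. Values are illustrative.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}
BARCODE_LEN = 4

def demultiplex(reads):
    by_sample = {name: [] for name in BARCODES.values()}
    unassigned = []
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN])
        if sample:
            by_sample[sample].append(read[BARCODE_LEN:])  # barcode removed
        else:
            unassigned.append(read)  # unknown barcode, set aside
    return by_sample, unassigned

print(demultiplex(["ACGTGGGTTTCCC", "TGCAAAACCCGGG", "NNNNACGTACGT"]))
```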
Analysis usually requires a reference database that is searched to find the closest match to an OTU, from which a taxonomic lineage is inferred. Some widely utilized databases include Greengenes (16S), the Ribosomal Database Project (16S), SILVA (16S + 18S), and UNITE (ITS). These databases are less suitable for certain groups of organisms, such as protists and viruses, which are extremely diverse and for which considerably less sequence information is available compared to bacteria.
Denoising
Denoising is important for 16S metagenomic data analysis, and it is platform-specific; ie, certain platforms (eg, Illumina) require less denoising than others (eg, pyrosequencing). For example, denoising of 454 pyrosequencing data, despite being computationally expensive, is necessary due to intrinsic errors generated by pyrosequencing that can give rise to erroneous OTUs. A procedure called “flowgram clustering” removes problematic reads and increases the accuracy of the taxonomic analysis. Several denoising algorithms have been developed so far, but for the purpose of this review three of them will be described in detail.
Denoising is performed very efficiently by AmpliconNoise, a tool that uses the following basic denoising steps: 1) Filtering of noisy reads: reads are truncated based on the appearance of low signal intensities; 2) Removing pyrosequencing noise: a distance between flowgrams is defined, and true sequences and their frequencies are inferred by an expectation-maximization (EM) algorithm; 3) Removing PCR noise: the same ideas are used for removing PCR errors; 4) Chimera identification and removal: for each sequence, exact pairwise alignments are performed against all sequences of equal or greater abundance, which constitute the set of possible parents. Although a considerable number of sequences are lost during the denoising process, it results in high-quality sequences; however, there has been some debate on the level of stringency required to achieve such high quality.
A very popular software package for the analysis of microbial communities is QIIME. Initially, QIIME was implemented for 454 pyrosequencing datasets only, ie, using sff (Standard Flowgram Format) files, but it has since been modified to accept the fastq file format, thereby making the analysis of Illumina datasets possible. The QIIME developers provide users with extensive online tutorials for several workflows, and, moreover, QIIME is available as an open-source software package mostly implemented in the programming language Python.
Another widely used software package for the analysis of microbial communities is Mothur. It was created from the combination of pre-existing software, such as DOTUR, SONS, and Treeclimber, but, owing to the community support it has received, it currently incorporates many more algorithms, thus providing the user with a variety of choices.
More recently, a web-based application called SILVAngs was developed, which provides a fully automated analysis pipeline for data derived from rRNA marker gene amplicon sequencing. The analysis workflow is based on 1) alignment of reads, 2) quality assessment and filtering of reads, 3) dereplication, whereby identical sequences are collapsed to avoid overestimation (a minimal sketch of this step follows below), 4) clustering and OTU picking using a priori defined thresholds, and 5) taxonomic assignment of OTUs using the SILVA rDNA database.
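As a minimal sketch of the dereplication step, assuming exact-duplicate collapsing only (real pipelines may also cluster near-identical reads), the following snippet collapses identical sequences while retaining their abundances:

```python
# Minimal dereplication: collapse identical sequences and keep their
# abundances so that downstream OTU counts are not inflated by duplicates.
from collections import Counter

def dereplicate(seqs):
    counts = Counter(seqs)
    # Most abundant unique sequences first, as pipelines typically report.
    return counts.most_common()

print(dereplicate(["ACGT", "ACGT", "GGCC", "ACGT"]))
# -> [('ACGT', 3), ('GGCC', 1)]
```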
The choice of denoising algorithm largely depends on the user. Once a choice is made, the user should also consider whether to deviate from the default parameters. Parameter adjustment is related to the dataset produced, ie, which specific 16S rRNA region was sequenced and which technology was used to perform the actual sequencing. In addition, it has been suggested that the use of different denoising methods can produce significantly different outcomes, which should be taken into careful consideration when comparing studies that have utilized different algorithms for data analysis.
OTU clustering, picking, and taxonomic assignment
After the demultiplexing of the dataset, ie, the assignment of reads to samples using barcode information, the next step is OTU picking. For bacteria/archaea, it is accepted that OTUs of similarity greater than 97% correspond to the same species, but other dissimilarity cutoffs can be employed if needed for the downstream analyses. There are numerous OTU picking strategies: 1) De novo picking is used if amplicons overlap and a reference sequence collection is not available. It clusters all reads without using a reference and is computationally quite expensive, hence not very suitable for very large datasets. 2) Closed-reference picking is used if amplicons do not overlap and a reference sequence collection is available. This approach discards reads that do not hit a reference sequence. 3) Open-reference picking is used if amplicons overlap and a reference sequence collection is available; reads that fail to hit the reference are subsequently clustered de novo. A toy illustration of greedy de novo clustering is sketched after the tool listing below.

The tools commonly used at each step of the two analysis routes are summarized below (see also Fig. 1):

Shotgun metagenomics
Assembly: EULER, Velvet, SOAP, ABySS, MetaVelvet, MetaVelvet-SL, Meta-IDBA, IDBA-UD, Newbler (Roche), MIRA, Mapsembler, ALLPATHS, MetaORFA, MetAMOS
Binning: TETRA, S-GSOM, PhylopythiaS, TACOA, PCAHIER, ESOM, ClaMS, CARMA, WGSQuikr, SPHINX, MetaPhyler, SOrt-ITEMS, PhymmBL, MetaCluster
Annotation: FASTX-Toolkit, FastQC, SolexaQA, Lucy 2, DUST, Bowtie, MetaGeneMark, LEfSe, TACOA, Metagene, CREST, Prodigal, mOTU-LG, Orphelia, Kraken, FragGeneScan, CRT, NBC, MyTaxa, RITA, PILER-CR, tRNAscan, KEGG, MetaCluster TA, SEED, eggNOG, ProViDE, COG/KOG, PFAM, TIGRFAM, MetaPhlAn, HighSSR, BLAT
Analysis pipelines: IMG/MER, MG-RAST, MEGAN 5, CAMERA, Parallel-META, EBI Metagenomics, METAREP, PHACCS

Marker gene metagenomics
Standalone software: QIIME, Mothur, JAguc, M-pick, OTUbase, CopyRighter, AbundantOTU, UniFrac, ESPRIT
Analysis pipelines: SILVAngs, FunFrame, PANGEA, FastGroupII, CLOTU
Denoising: AmpliconNoise, DADA, JATAC, UCHIME, Bellerophon, CANGS
Databases: SILVA, Greengenes, Ribosomal Database Project (RDP), UNITE
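The toy sketch below illustrates greedy de novo OTU clustering at the 97% threshold discussed above. Identity here is a naive position-wise comparison of equal-length reads, chosen purely for brevity; real tools such as UCLUST compute proper pairwise alignments.

```python
# Toy greedy de novo OTU clustering at a 97% identity threshold. The
# first read of each OTU serves as its centroid; identity is a naive
# position-wise measure, not a real alignment.
def identity(a, b):
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def cluster_otus(reads, threshold=0.97):
    centroids = []
    otus = []
    for read in reads:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus[idx].append(read)   # join an existing OTU
                break
        else:
            centroids.append(read)       # found nothing close: new OTU
            otus.append([read])
    return otus

print(len(cluster_otus(["A" * 100, "A" * 99 + "T", "G" * 100])))  # -> 2
```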
There has been some controversy within the metagenomics community regarding the actual need for performing assembly on metagenomes. One contention is that using clustering algorithms such as CD-HIT or UCLUST is sufficient to group similar reads together and thereafter proceed to annotation of these clusters without prior assembly. This clustering approach may allow more accurate annotation of highly diverse samples containing rare, uncultured genomes that may otherwise be excluded from the assembly process due to their low coverage. One drawback of not performing an assembly is that complex regulatory elements such as CRISPRs may not be identified successfully.
Binning and annotation methods are also constantly being modified and altered to specifically address metagenomic analysis pipelines. A significant improvement of these processes will be achieved once the repository of cultured as well as uncultured genomes within the public databases grows. Composition-based as well as similarity-based binning methods, especially those making use of supervised machine learning algorithms (ie, PhylopythiaS, trained on reference genomes), will become increasingly accurate as more reliable information becomes available.
At this stage it is important to mention that, in spite of the best efforts to reconstruct and prepare datasets by 1) quality filtering, 2) performing assemblies, and 3) binning sequences into taxonomically informative groups, annotation pipelines still achieve successful annotation for only ~50% of the sequences under analysis. As mentioned above, the annotation process is highly dependent on the available databases and hence limited by the amount of information present within these repositories. Sequences that have no similarity to any other sequence in a known database are termed “orphan genes”. These genes are believed to be 1) a consequence of sequencing errors and/or the inaccuracy of gene prediction tools, or 2) truly novel genes that have no sequence or functional similarity to known genes but may share higher-order similarity in the form of protein folds. Much work is currently being undertaken to shed light on these unknowns/orphans using various types of information. Some existing tools use pathway information from metagenomic neighbors, as well as context-dependent metabolomic data, to assign a functional annotation to unknown genes. Along these lines, the use of metabolomic, metatranscriptomic, and/or metaproteomic data will provide a more elaborate view of the picture, addressing all aspects of the central dogma in the metagenomics era. Moreover, single-cell genomics, which derives information from sequencing individual cells, is becoming increasingly popular. The synergy of single-cell genomics with metagenomics can allow a more accurate separation of metagenomic sequences into individual genomes, guided by the single-cell sequencing data.
A wide array of software is currently available to perform each step of the marker gene metagenomics analysis pipeline. What is missing from the literature is a systematic evaluation of the software and algorithms used so far and a standardized means of comparing results derived from different workflows. Variation in results can occur due to inconsistencies in a number of factors, such as DNA extraction, primer pair and amplification region, sequencing platform, and the software used. All of the aforementioned sources of variation make it very difficult to compare studies and obtain trustworthy results. Improvements to the already available software can be achieved, but only through benchmarks, simulations, and thorough testing. Initiatives such as the GSC could potentially take over the design of “Minimum Analysis Requirements for Metagenome Sequences (MARMS)”. This would consist of standardized methodologies and a consensus on the choice of software, analysis steps, threshold values, and parameters. Such an initiative would eliminate, or at least minimize, the biases that can be generated by analyzing data using multiple methodologies.
The availability of web-based platforms such as EBI Metagenomics, IMG/MER, MG-RAST, and SILVAngs will further allow users with limited computational facilities to perform analysis of metagenomic samples. In comparative metagenomic analyses, one can use such tools to compare samples from different ecological niches and extract information that is common and/or unique to a specific environment. Moreover, the GSC is striving toward the successful integration of analyzed data under a unified and mutually acceptable structure/format that will facilitate the exchange of valuable insights and information in the field of microbial ecology and environmental microbiology.
To sum up, we have created a metagenomics flowchart (Fig. 1) outlining all the aforementioned basic steps of the analysis pipeline. Analysis can take two different routes depending on the type of sequencing data (marker gene or shotgun metagenomics). Every analysis step shown in the flowchart is complemented by a list of some well-established tools used by the metagenomics community.
Flowchart of basic metagenomics steps and tools currently in practice.
Notes: The analysis pipeline can take two different routes depending on the type of sequencing data (marker gene or shotgun metagenomics) available. The flowchart outlines the basic steps in the analysis pipeline, starting with preprocessing of the data and ending with the final extraction of results and concurrent storage and management of the data. Some popular tools that have been used extensively by the metagenomics community are shown for every step, as well as the databases and algorithms in common practice.
Footnotes
ACADEMIC EDITOR: J.T. Efird, Editor in Chief
FUNDING: This work was supported by the European Commission FP7 programs INFLA-CARE (EC grant agreement number 223151), “Translational Potential” (EC grant agreement number 285948), and LifeWatchGreece Research Infrastructure (http://www.lifewatchgreece.eu/) [384676–94/GSRT/NSRF(C&E)]. The authors confirm that the funder had no influence over the study design, content of the article, or selection of this journal.
COMPETING INTERESTS: Authors disclose no potential conflicts of interest.
Paper subject to independent expert blind peer review by minimum of two reviewers. All editorial decisions made by independent academic editor. Upon submission manuscript was subject to anti-plagiarism scanning. Prior to publication all authors have given signed confirmation of agreement to article publication and compliance with all applicable ethical and legal requirements, including the accuracy of author and contributor information, disclosure of competing interests and funding sources, compliance with ethical requirements relating to human and animal study participants, and compliance with any copyright requirements of third parties. This journal is a member of the Committee on Publication Ethics (COPE). Provenance: the authors were invited to submit this paper.
Author Contributions
AO, GAP, II conceived the idea of the manuscript. AO, CP wrote the first draft of the manuscript. All other authors (GAP, II, NP, PP, GK, CA) made critical revisions and approved the final version of the manuscript.
Abstract
Technological advances have led to the introduction of next-generation sequencing (NGS) platforms in cancer investigation. NGS allows massive parallel sequencing that affords maximal tumor genomic assessment. NGS approaches vary and concern both DNA and RNA analysis. DNA sequencing includes whole-genome, whole-exome, and targeted sequencing, which focuses on a selection of genes of interest for a specific disease. RNA sequencing facilitates the detection of alternative gene-spliced transcripts, posttranscriptional modifications, gene fusions, mutations/single-nucleotide polymorphisms, small and long noncoding RNAs, and changes in gene expression. Most applications are in the cancer research field, but lately NGS technology has been revolutionizing cancer molecular diagnostics, owing to the many advantages it offers compared to traditional methods. Knowledge is more extensive for solid cancer diagnostics, but interest has recently extended to the field of hematologic cancer. In this review, we report the latest data on NGS diagnostic/predictive clinical applications in solid and hematologic cancers. Moreover, since the amount of NGS data produced is very large and their interpretation is very complex, we briefly discuss two bioinformatic aspects, variant-calling accuracy and copy-number variation detection, which are gaining importance in cancer-diagnostic assessment.
Introduction
In recent years, next-generation sequencing (NGS) technologies have played an essential role in understanding the altered genetic pathways involved in human cancer. Compared to earlier genome-sequencing methods, NGS has numerous advantages. Primarily, it is a high-throughput method: massive parallel sequencing interrogates multiple targeted genomic regions in multiple samples simultaneously, allowing concomitant mutations to be detected in the same run. Another important advantage for routine tumor sequencing is the reduced turnaround time of analysis, which leads to reduced clinical reporting time. Moreover, an NGS analysis requires very low input of DNA/RNA, in contrast to traditional sequencing methods. A variety of genomic aberrations can be screened simultaneously with high accuracy and sensitivity, such as single/multiple-nucleotide variants, small and large insertions and deletions, copy-number variations (CNVs), and fusion transcripts. The sensitivity of NGS is higher than that of Sanger sequencing (detection down to 2%–10% versus 15%–25% allele frequency, respectively), and it allows quantitative evaluation of the mutated allele.
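As a worked illustration of that quantitative readout (the counts are invented), the variant allele frequency at a position is simply the fraction of covering reads that carry the mutant base:

```python
# Variant allele frequency (VAF): mutant reads divided by total reads
# covering the position. Counts below are illustrative.
def variant_allele_frequency(alt_reads, ref_reads):
    total = alt_reads + ref_reads
    return alt_reads / total if total else 0.0

# 60 mutant reads out of 1,000 -> 6% VAF: within the ~2%-10% range
# detectable by NGS but likely below the ~15%-25% Sanger threshold.
print(variant_allele_frequency(60, 940))  # 0.06
```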
The NGS workflow consists of several steps, from nucleic acid extraction to variant annotation, as shown in Figure 1. There are currently three main companies offering NGS platforms: Roche, Illumina, and Life Technologies (Thermo Fisher Scientific, Waltham, MA, USA). Each of the available platforms uses different sequencing chemistry and methods for signal detection. Roche 454 platforms employ pyrosequencing, whereby a chemiluminescent signal indicates base incorporation and the intensity of the signal correlates with the number of bases incorporated in homopolymer reads. However, the NGS platforms most commonly used employ sequencing by synthesis, in which the DNA strand to be sequenced is used as a template, a complementary strand is synthesized, and the sequence of the template strand is thereby obtained. Illumina MiSeq and HiSeq sequencers use four distinct fluorescently labeled nucleotides and optical imaging to visualize the growing complementary strand. The error rate estimated for Illumina technology is <0.4%. Life Technologies, in contrast, uses a nonoptical approach and unlabeled nucleotides. Sequencing by synthesis is performed in microscopic wells interfaced with a semiconductor chip. The DNA is clonally amplified on microscopic beads. As nucleotides are incorporated one at a time, the protons released produce a change in pH, which is measured by the semiconductor chip. The error rate estimated for Ion Torrent technology is 1.8%–1.9%, mostly in the detection of homopolymer stretches.
NGS workflow from nucleic acid extraction to variant annotation.
Abbreviation: NGS, next-generation sequencing.
NGS approaches differ, and concern tumoral DNA and RNA analysis. DNA sequencing includes whole-genome sequencing (WGS), whole-exome sequencing (WES), and targeted sequencing. WGS allows sequencing of the entire genome but requires a large DNA sample. To detect clinical mutations accurately, 100- to 200-fold sequencing coverage may be needed, which is both time- and cost-prohibitive. Usually, 30- to 60-fold coverage, sufficient to identify structural rearrangements, is employed. WES focuses on the coding regions (exons) of a genome, typically ~2.5% of the human genome, to discover rare or common variants associated with a disorder or phenotype. WES reduces cost and time compared to WGS. The most common methods rely on hybridization with oligonucleotide probes to “capture” targeted DNA fragments, thereby enriching for exonic sequences. Targeted sequencing, which focuses on a selection of genes of interest for a specific disease, can be more accurate and more accessible in terms of time and cost, making clinical application feasible for more laboratories.
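A back-of-the-envelope calculation makes the WGS/WES coverage trade-off above concrete. The sketch assumes the usual Lander-Waterman approximation (coverage = reads x read length / target size); the run size is hypothetical.

```python
# Fold-coverage estimate: coverage = reads x read length / target size.
# Genome size and exome fraction follow the text; the run is hypothetical.
def fold_coverage(n_reads, read_len, target_bp):
    return n_reads * read_len / target_bp

GENOME = 3.2e9           # human genome, bp
EXOME = 0.025 * GENOME   # coding exons, ~2.5% of the genome (see text)

reads = 1.0e9            # hypothetical run of one billion 100 bp reads
print(round(fold_coverage(reads, 100, GENOME), 1))  # ~31x whole genome
print(round(fold_coverage(reads, 100, EXOME), 1))   # ~1250x if exome-targeted
```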
RNA sequencing (RNA-Seq) facilitates the detection of alternatively spliced transcripts, posttranscriptional modifications, gene fusions, mutations/single-nucleotide polymorphisms (SNPs), and changes in gene expression. The extracted RNA is first enriched and reverse-transcribed into complementary DNA, which is then processed. Moreover, the NGS approach makes it possible to investigate epigenetic alterations, such as promoter methylation, microRNAs, and the expression of other small RNAs, even if no relevant panels are currently available for diagnostic use. Life Technologies has focused more on disease-specific kits (Ion AmpliSeq Colon and Lung Panel version 2, BRCA1/2 Panel, AML Panel, and RNA Lung Fusion Panel), whereas the Illumina approach is based on the development of generic cancer-panel kits covering genes from several cancers (TruSeq Amplicon and TruSight Cancer).
Although NGS is extensively used for research purposes, its application in clinical practice has not yet been fully formalized in guidelines, owing to the novelty of the approach. Despite this, NGS is beginning to be widely used for diagnostic requests. The Italian Society of Human Genetics has recently released early indications on this topic, summarizing the criteria required for new NGS-based molecular diagnoses.
This review includes the advances and initial clinical applications of NGS in solid and hematologic cancer diagnosis. Moreover, we briefly discuss two bioinformatic aspects that are gaining significant importance in cancer-diagnostic assessment: first, the accuracy and quality of variant calling, which is still an open question in terms of reducing the false-positive rate; and second, CNV detection, which is an essential analysis in the clinical setting.
NGS analysis for solid cancer diagnosis
Detection of critical cancer-gene alterations in solid-tumor samples better defines patient diagnosis and prognosis, and indicates which targeted therapies should be administered to improve the care of selected cancer patients in the personalized-medicine scenario. The NGS studies on solid cancers described here offer a fundamental overview of how molecular approaches to cancer are changing, highlighting advantages over traditional diagnostic methods.
Hereditary breast cancer
Hereditary breast cancers (HBCs) account for 5%–10% of all BCs, and in about 30% of cases are caused by BRCA1 and BRCA2 mutations. The BRCA1 and BRCA2 genes encode tumor-suppressor proteins essential for DNA repair and genomic stability. The presence of these mutations increases the lifetime risk of developing HBC, so genetic counseling and a BRCA-gene test are recommended for BC patients with early onset or a significant family history.
Conventional DNA sequencing, such as direct Sanger sequencing, requires long analysis times and incurs high costs, because the BRCA1 and BRCA2 genes span 23 and 27 exons, respectively. For this reason, prescreening methods, such as denaturing high-performance liquid chromatography, have been suggested to speed up the molecular analysis.
Our lab experience and several recent papers have demonstrated that NGS methods are adequate to detect point mutations and indels in the BRCA1/BRCA2 genes, revolutionizing this genetic analysis and reducing time and costs. This approach is in fact suitable for the routine diagnostic workflow, since it is faster and more sensitive than denaturing high-performance liquid chromatography/Sanger sequencing methods. Data quality is assured by participation in international quality programs on BRCA1/BRCA2 testing with the NGS method (ie, the European Molecular Genetics Quality Network), which also provide specific certification of correct results, sensitivity, specificity, and interpretation of variant calling.
Genes other than BRCA1/BRCA2 have now been shown to confer high BC risk. NGS platforms allow the customization of gene panels, giving patients more opportunity to determine their BC risk. Tung et al found that the frequency of mutations in non-BRCA1/BRCA2 genes was 4.3% with their 25-gene panel. Lin et al developed a sequencing panel containing 68 genes associated with cancer risk for patients with early-onset or familial BC. They discovered alterations in RAD50, TP53, ATM, BRIP1, FANCI, MSH2, MUTYH, and RAD51C, which may be valuable in BC-risk assessment. Lhota et al performed NGS of 581 genes in 325 BC patients (who had tested negative in previous BRCA1/BRCA2/PALB2 analyses), identifying 127 truncating variants.
Despite several findings on HBC with NGS, a recent study of the two most common platforms demonstrated that neither the Illumina MiSeq sequencer with the supplied MiSeq Reporter software nor the Life Technologies Ion Torrent Personal Genome Machine (Ion PGM) with the supplied Torrent Suite software was completely suitable for clinical laboratory sequencing of BRCA1 or BRCA2. The MiSeq system failed to detect insertions and deletions larger than nine base pairs; similarly, the Ion PGM with Torrent Suite software failed to detect a ten-base-pair insertion and a 64-base-pair deletion. However, the authors reported that an alternative alignment and variant-calling software, Quest Sequencing Analysis Pipeline (QSAP), was capable of detecting large deletions and insertions. By combining the MiSeq platform with QSAP alignment, they were able to design an assay with 100% sensitivity and specificity for BRCA1- and BRCA2-sequence variations. These results underline the strong impact of choosing alignment and variant-calling tools appropriate to the application of interest, as we describe herein.
Melanoma
BRAF mutations play a key role in 40%–70% of malignant melanomas. According to the COSMIC database, 44% of melanomas harbor BRAF mutations, and 97.1% of these mutations are localized in codon 600 of the BRAF gene. Mutated BRAF can be inhibited by small-molecule kinase inhibitors, among which are vemurafenib (Roche), approved by the US Food and Drug Administration (FDA) in August 2011 for unresectable or metastatic melanoma, and dabrafenib. For these therapies, it is mandatory to detect BRAF alterations by gold-standard methods, such as Sanger sequencing and real-time polymerase chain reaction (PCR).
Ihle et al evaluated several parameters of different methods for BRAF-mutation analysis. They compared allele-specific PCR performed with the Cobas BRAFV600 test, pyrosequencing using the Therascreen BRAF Pyro kit, high-resolution melting analysis, immunohistochemistry, the NGS approach, and Sanger sequencing with regard to sensitivity, specificity, costs, workload, feasibility, and limitations. They suggested that the best method was a combination of VE1-antibody staining and high-resolution melting for p.V600E-mutation analysis, combining the lowest detection limit with a fast method and 100% sensitivity. However, the authors also reported the numerous advantages of NGS for melanoma molecular diagnostics, supporting the future substitution of the current methods with an NGS approach.
However, there is a clinical need to analyze other genes, both to find other types of targeted therapy and to understand eventual resistance. Currently, validated diagnostic panels are not commercially available, and very few studies have addressed the development of a custom-designed gene panel. van Engen-van Grunsven et al designed a panel containing hotspot alterations, such as BRAF exon 15, NRAS exons 2 and 3, HRAS exons 2 and 3, AKT1 exon 3, GNAQ exons 4 and 5, GNA11 exons 4 and 5, KIT exons 8, 9, 11, 13, and 14, and PDGFRA exons 12, 14, and 18. Our AmpliSeq custom panel includes eleven crucial full-length genes (BRAF, NRAS, PTEN, MITF, CDK4, MGMT, CTLA4, PIK3CA, MC1R, KIT, and RB1) involved in melanoma carcinogenesis and therapy-response pathways. We tested its clinical applicability on the Ion PGM NGS platform in order to identify new or already known SNPs and mutations that could be related to differences in the duration of response to BRAF inhibitors. Our results showed higher sensitivity and specificity in detecting a wide range of genetic alterations compared to traditional sequencing methods. Moreover, we identified alterations in CTLA4, MITF, PIK3CA, KIT, and MC1R related to BRAF-inhibitor response duration. This panel is now undergoing validation for routine use in diagnosis, prognosis, and therapy prediction.
Prostate cancer
Prostate cancer (PC) has become the leading cause of cancer death among males in many countries. Its high tumor heterogeneity suggests that numerous genetic events are responsible for the indolent and aggressive forms of the disease. Currently, there is no way to differentiate accurately between these two forms before treatment. Most men diagnosed with PC have clinically indolent disease that does not require immediate radical treatment, and overtreatment of these men can lead to worse quality of life. The clinical response to therapy varies widely from patient to patient: some patients relapse shortly after treatment, whereas others remain disease-free for a long time before relapsing.
Recent advances in NGS technology have improved the understanding of PC biology and clinical variability. In particular, DNA-Seq, RNA-Seq, chromatin immunoprecipitation-Seq, and methyl-Seq experiments have better elucidated the major pathways affecting prostate tumorigenesis, which are the AR-signaling, PI3K–PTEN–Akt, and RTK–Ras–MAPK pathways.
Two studies have demonstrated the feasibility of large-scale screening of PC patients in routine diagnosis. Manson-Bahr et al showed that DNA from cancer material dissected from transrectal ultrasound needle-core biopsy specimens can be analyzed. The authors observed a pattern of mutation consistent with those previously observed in PC surgical tissues, including TMPRSS2–ERG fusion and mutations in SPOP, TP53, ATM, and MEN1, while nonsense mutations were observed in the MAP2K5 and NCOR2 genes. Iacono et al performed the first retrospective NGS study on 60 specimens: 30 high- and 30 intermediate-risk patients. They identified nonsynonymous variations and SNPs with an allelic frequency ≥10% in the TP53, CSFR1, KDR, KIT, PIK3CA, MET, and FGFR2 genes, evidencing their role in the progression and aggressiveness of PC. However, at present the study of multiple genetic alterations in PC is not suggested for routine diagnostic purposes.
Thyroid cancer
Thyroid nodules, very frequent in the general population, are mostly benign, but accurate identification of the nodules that could be precursors of cancer is needed. A common diagnostic approach that allows differential diagnosis between cancerous and benign nodules in most cases is ultrasound-guided fine-needle aspiration (FNA) of the thyroid nodule followed by cytological examination. However, in approximately 25% of nodules, the diagnosis cannot be established by FNA cytology, since the limited diagnostic material available is not sufficient to perform a comprehensive molecular characterization by traditional techniques.
In the last few years, several studies have explored the possibility of improving thyroid cancer (TC) diagnosis with an NGS molecular test. In 2013, the first custom gene panel, ThyroSeq, was developed, allowing the targeting of 284 mutational hotspots in 12 cancer genes. Sequencing was performed on 228 neoplastic and nonneoplastic thyroid samples, including 105 frozen, 72 formalin-fixed, and 51 FNA samples, representing all major types of TC. Using this approach, point mutations were detected in 30%–83% of specific types of TC, but in only 6% of benign thyroid nodules.
In 2014, Nikiforov et al validated the performance of a new gene-mutation panel (ThyroSeq version 2) and a gene-fusion panel (ThyroSeq RNA) in a large series of thyroid nodules cytologically classified as follicular or oncocytic (Hürthle cell) neoplasm/suspicious for a follicular or oncocytic (Hürthle cell) neoplasm, demonstrating that it allowed accurate cancer-risk assessment in these nodules. In 2015, the same authors demonstrated the possibility of stratifying, with high sensitivity and specificity, patients with benign and malignant thyroid nodules cytologically diagnosed as atypia of undetermined significance/follicular lesion. The latest custom panel developed by the authors (ThyroSeq version 2.1) included 14 genes analyzed for point mutations and 42 types of gene fusion occurring in TC.
Recently, Simbolo et al investigated the diagnostic stratification of sporadic medullary TC using the Ion AmpliSeq Hot Spot Cancer Panel version 2 (Life Technologies). Thirteen cases had a somatic RET mutation; only ten of these were detected by both Sanger sequencing and NGS, while three were missed by Sanger, revealing the higher sensitivity of NGS. In summary, these studies demonstrated that NGS offers the possibility of better classifying thyroid nodules. This should improve patient management and allow clinicians to avoid diagnostic surgeries associated with significant costs and potential risks.
Lung cancer
Lung cancer (LC) is the leading cause of cancer-related death in developed countries, and is often diagnosed at an advanced stage. A comprehensive knowledge of predictive biomarkers has enabled the selection of LC patients for treatment with tyrosine-kinase inhibitors (TKIs). In clinical practice, EGFR mutations must be evaluated to assign patients to TKI treatment appropriately. Most (80%–90%) EGFR mutations are either small exon 19 deletions or the L858R mutation in exon 21, but other TKI-sensitive EGFR mutations can occur in exons 18–21. The T790M mutation in exon 20 needs to be investigated, because it is associated with resistance to first-generation TKIs but sensitivity to third-generation TKIs. Another marker of TKI resistance is ALK rearrangement. Indeed, to date, EGFR and ALK are the only actionable genes with drugs approved by the FDA for LC treatment.
Formalin-fixed paraffin-embedded tissue is considered an optimal specimen for molecular analysis. For several years, the gold-standard technique to detect EGFR mutations was Sanger sequencing, but recently other methods have been employed for molecular diagnostics (high-resolution melting, restriction fragment-length polymorphism, mutant allele-specific PCR, peptide nucleic acid-mediated PCR, pyrosequencing, immunohistochemistry with specific EGFR antibodies, and the Scorpion Amplification Refractory Mutation System). For ALK rearrangements, in contrast, the gold standard is still immunohistochemistry or fluorescence in situ hybridization.
Several studies have documented the changes brought by the introduction of NGS into daily clinical practice for LC molecular diagnosis, reporting high sensitivity for detecting actionable alterations using a gene panel on LC specimens. In fact, Lim et al recently reported that 58% of patients classified as wild-type by standard EGFR/KRAS/ALK testing showed alterations identified by NGS, thus giving these patients a therapeutic chance.
However, tissue biopsies are not always available, because 60% of non-small-cell LCs (NSCLCs) are high-stage, locally advanced, and/or inoperable tumors that have already metastasized to distant sites by the time they are detected. The diagnosis of LC sometimes depends on metastatic lymph-node specimens obtained by FNA cytology. In these patients, cytology specimens are usually the only material available for histological typing and molecular analysis. In these cases, the tumor-cell content may be very low, implying the need for very sensitive methods. Scarpa et al demonstrated for the first time, in 2013, the diagnostic relevance of the Ion AmpliSeq Colon and Lung Cancer Panel on lung adenocarcinoma cytological samples. The first version of this panel included 504 mutational hotspot regions in 22 cancer-related genes, and it was able to detect variants down to 1% allelic frequency, which corresponds to 2% of cancer cells in a sample. An implementation of the Ion AmpliSeq Colon and Lung Cancer Panel was reported in a study in which seven different labs belonging to the OncoNetwork Consortium tested the NGS panel on the same samples. This final version of the panel comprised 1,825 selected mutational hotspots in 22 cancer-related genes. Recent studies have confirmed that DNA from cytological LC samples is of sufficient quantity and quality for NGS molecular analysis.
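The correspondence between a 1% allelic frequency and 2% cancer cells quoted above follows from simple arithmetic: a clonal heterozygous mutation in a diploid tumor contributes one of two alleles per tumor cell. The minimal sketch below illustrates this relationship, assuming diploidy and no copy-number changes (an illustrative simplification).

def expected_vaf(tumor_fraction: float, mutated_copies: int = 1, total_copies: int = 2) -> float:
    """Expected variant allele frequency for a clonal mutation, assuming
    diploid tumor and normal cells (illustrative simplification)."""
    return tumor_fraction * mutated_copies / total_copies

print(expected_vaf(0.02))  # 0.01 -> a 1% VAF detection limit implies >=2% tumor cells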
Neoplastic tissues remain the standard specimen for molecular analysis. However, the potential to obtain noninvasive sampling compared with tissue biopsy is very attractive. Blood collection is less invasive than tissue sampling, and can be used when tissue specimens are limited/not available or for critically ill patients. Moreover, it can allow for sampling at several time points to monitor the genetic evolution of the tumor and also to predict early treatment resistance or nonresponse.
Plasma DNA can also be analyzed by NGS to detect cancer-related gene alterations useful in LC-treatment decisions, because plasma may reflect disease status more comprehensively than a single tumor biopsy. Moreover, during treatment, plasma analysis could reveal EGFR treatment-resistance mutations, indicating early clinical progression.
In our lab, two NGS panels on Ion Torrent are in daily use for NSCLC patients: the Ion AmpliSeq Colon and Lung Cancer Panel version 2 and the Ion AmpliSeq RNA Fusion Lung Cancer Research Panel. We also participated in Thermo Fisher Scientific’s international validation program for the final version of this fusion panel. Routinely, NGS clinical analysis is performed on NSCLC formalin-fixed paraffin-embedded and cytological samples. A comparative study of NGS application to tissue and plasma is ongoing, with encouraging results (manuscript in preparation). Moreover, the Ion AmpliSeq Colon and Lung Cancer Panel is a fundamental step in our clinical analysis for characterizing the EGFR deletion type, because specific in vitro diagnostic molecular tests on Rotor-Gene real-time PCR do not provide this information.
Colorectal cancer
EGFR, involved in cancer growth and survival, is targeted by several drugs in colorectal cancer (CRC) therapy. However, only a small subgroup of patients with metastatic CRC can benefit from anti-EGFR therapies (cetuximab or panitumumab), and thus prediction of patient responses is necessary to avoid side effects and to save costs. Ras proteins (HRas, KRas, and NRas) are important downstream effectors that transmit signals from EGFR to the intracellular signaling cascade. KRAS is considered a predictive biomarker for the efficacy of anti-EGFR therapy, since KRAS-mutant CRC patients (codons 12 and 13 in exon 2) are resistant to treatment with EGFR inhibitors. However, approximately 40%–50% of patients harboring wild-type KRAS exon 2 do not benefit from these targeted agents, suggesting the potential involvement of other genetic alterations in pathways downstream of EGFR. In fact, a recent study suggested that additional mutations in KRAS and NRAS, as well as downstream mutations in BRAF or PIK3CA, may cause resistance to anti-EGFR treatment. Inter- and intratumoral genetic heterogeneity is another factor in predicting treatment failure and drug resistance in CRC therapies. The recently updated National Comprehensive Cancer Network guideline strongly recommends genotyping of tumor tissue (either primary tumor or metastasis) in all patients with metastatic CRC for RAS (exons 2–4 of KRAS and NRAS), and patients with any known KRAS or NRAS mutation should not be treated with cetuximab or panitumumab. To date, the gold standard for analysis of these genes is real-time PCR or pyrosequencing, methods that are time-consuming and of limited sensitivity.
In order to investigate CRC specimens with NGS in clinical practice, Tops et al developed a multigene panel already used for LC investigation. This panel has also been employed by several other groups in CRC research, who have recommended it for clinical use over traditional methods. Another clinical application of NGS to CRC is represented by an interesting recent study in which a mutational-load cutoff, identified via multigene tumor profiling, discriminated between CRC patients proficient and deficient in the DNA-mismatch repair (MMR) pathway, since 15%–20% of CRC patients are deficient in one or more MMR genes. This approach can be used for initial screening for Lynch syndrome. Moreover, the authors demonstrated the feasibility of analyzing MMR deficiency and RAS/BRAF mutations in CRC patients with the same panel, reducing the time and costs of analysis.
Lately, several custom gene panels have been developed with Illumina and Life Technologies to investigate many other crucial CRC genes. A multigene approach is in fact necessary to capture a larger mutational spectrum simultaneously, increasing our knowledge of CRC. In the future, additional information emerging from these NGS studies will probably prove useful for predicting the duration of anti-EGFR therapy response or for developing other targeted therapies.
NGS and hematologic cancer
Hematological malignancies are grounded in genetic aberrations, in particular large mutations that underlie the different phenotypes in the spectrum of hematologic cancers. NGS technologies have been applied to hematological disorders in a variety of contexts: guiding diagnosis (TCR gene rearrangement to establish T-cell clonality), subclassification (recurrent cytogenetic translocations in acute myeloid leukemia), prognosis (Philadelphia chromosome positivity in acute lymphoblastic leukemia), and minimal residual disease (MRD) testing (BCR–ABL transcripts in chronic myelogenous leukemia), often allowing the identification of novel mutations. The characterization of leukemias, lymphomas, and myelomas is continually evolving, and includes the precise identification of additional common mutations that may be of great prognostic value and clinical importance.
Multiple myeloma
Multiple myeloma (MM) is a malignancy of plasma cells that arises through a multistep transformation process: an asymptomatic stage of monoclonal gammopathy of undetermined significance (MGUS) precedes virtually all cases of MM. Its genetic landscape changes over time due to additional events, such as somatic mutations and epigenetic and chromosomal copy-number changes, driving its progression from MGUS to symptomatic MM and ultimately, in some patients, to aggressive extramedullary disease.
The first important event in plasma-cell transformation is hyperdiploidy, observed in up to 55% of patients. The second is IGH translocation, found in 40%–50% of patients. Moreover, t(11;14) (dysregulation and overexpression of the CCND1 gene), t(4;14) (upregulation of FGFR3 and MMSET/WHSC1), and many other chromosomal rearrangements are present in the tumor plasma cells at the time of diagnosis. All these abnormalities have long been known, because they are visible on the conventional karyotype. More recent data based on comparative genomic hybridization or SNP-array technologies have revealed other important chromosomal changes, especially homozygous deletions.
With the development of NGS, the understanding of MM has greatly improved over the past 5 years, confirming its wide heterogeneity at the molecular level, but also providing a clearer picture of disease pathogenesis and progression. The quantitative nature of NGS data allows for higher resolution of the subclonal architecture of cancers. Nevertheless, initial reports of genomic evolution in MM using NGS were conducted on small cohorts, suggesting that MM shows a heterogeneous subclonal structure at diagnosis and only a few recurrently mutated genes of likely pathogenetic significance, including KRAS, NRAS, TP53, BRAF, and FAM46C.
With NGS, Bolli et al confirmed subclonal KRAS, NRAS, and BRAF mutations in about one-third of MM patients: findings with crucial therapeutic implications for trials of MEK and BRAF inhibitors. Recently, Kortüm et al designed a targeted panel of 47 genes, comprising 39 genes known to be mutated in ≥3% of MM cases and eight genes in pathways therapeutically targeted in MM. Mutation analysis revealed KRAS as the most commonly mutated gene, followed by NRAS, TP53, DIS3, FAM46C, and SP140. They tracked clonal evolution and identified mutation acquisition and/or loss in FAM46C, FAT1, KRAS, NRAS, SPEN, PRDM1, NEB, and TP53, as well as two mutations in XBP1, a gene associated with bortezomib resistance.
Lymphomas
In recent years, the development of NGS has also allowed the acquisition of important molecular information in a variety of lymphoid tumors, including Hodgkin’s lymphoma, diffuse large B-cell lymphoma, Burkitt’s lymphoma, chronic lymphocytic leukemia, follicular lymphoma, mantle-cell lymphoma, hairy-cell leukemia, and splenic marginal zone lymphoma. Although there have been many advances in this field, NGS panels are not yet available for clinical practice. The current modalities for diagnosing hematological disease are based on fluorescence in situ hybridization, classic molecular biology, and radiographic studies; the latter in particular are associated with radiation exposure and limited specificity.
The new sequencing technologies, in addition to identifying somatic mutations involved in cancer progression (ie, mutations of BRAF, MYD88, and NOTCH2), have provided scientific evidence that might be useful for clinical treatment, as well as for the diagnosis and monitoring of progression of these diseases. NGS aims to detect the tumor-specific clonotype and circulating tumor-specific sequences in the peripheral blood of patients with Hodgkin’s lymphoma. Quesada et al used this approach to identify lymphoma-specific immunoglobulin gene rearrangements in primary tumor samples at diagnosis or disease recurrence, as well as in follow-up. Moreover, the sequencing of B-cell lymphoma genomes has identified recurrent mutations, some of which have prognostic impact or serve as drug targets. Mutation of TP53 predicts poor response to treatment and shortened overall survival across lymphoma entities, and mutations in NOTCH1 and SF3B1 have been shown to be independent predictors of poor outcome in chronic lymphocytic leukemia.
Minimal residual disease
MRD is defined as the small number of cancer cells that persist in a patient during or after treatment, even though clinical and microscopic examinations confirm complete remission and the patient shows no signs or symptoms of disease. MRD detection and quantification are used for the evaluation of treatment efficiency, patient-risk stratification, and long-term outcome prediction in hematological malignancies.
Currently, flow cytometry is the most commonly used technique for the diagnosis and characterization of hematological malignancies and MRD. Although the method is widely used, a high level of expertise is required to interpret the data precisely when it comes to rare-event detection, such as MRD. The sensitivity for the detection of malignant cells varies according to the type of disorder, the panel of antibodies used, the number of cells analyzed, and the expertise of the laboratory. Furthermore, DNA and RNA tests usually lack the sensitivity required for MRD monitoring.
NGS approaches allow searching not only for known mutations/translocations but also for all clonal gene mutations and rearrangements present in diagnostic samples, providing a better understanding of the possible evolution of MRD. In a recent study, consensus primers and high-throughput sequencing were employed to amplify and sequence all rearranged IGH and TCR gene segments.
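As an illustration of how such rearrangement sequencing supports MRD quantification, the hypothetical sketch below counts reads matching a patient-specific clonotype sequence and reports its frequency among all rearranged reads. The sequences and sample data are invented for illustration; real assays (eg, LymphoTrack) use dedicated pipelines that also correct for amplification bias and sequencing error.

def mrd_fraction(reads, clonotype_seq):
    """Fraction of rearranged-receptor reads containing the patient-specific
    clonotype sequence (toy example with invented data)."""
    if not reads:
        return 0.0
    hits = sum(clonotype_seq in read for read in reads)
    return hits / len(reads)

# Toy usage: two of four reads carry the (invented) clonotype sequence.
reads = ["TTACGTTGCAAT", "GGCCTTAAGGCC", "ACGTTGCAGGTT", "CCCGGGAAATTT"]
print(mrd_fraction(reads, "ACGTTGCA"))  # 0.5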
Ladetto et al described a comparison between real-time quantitative PCR and LymphoSight NGS as methods for MRD detection using clonal IGH rearrangements. The primary results demonstrated that NGS enabled the detection of this molecular marker in a high proportion of cases, including a fraction in which standard PCR-based amplification failed. In addition, NGS showed a sensitivity comparable to that obtained by real-time quantitative PCR, allowing its use for detection of MRD.
Unfortunately, NGS for this purpose is not yet routinely employed in clinical practice. NGS might overcome some disadvantages of PCR-based methods and avoid the need for patient-specific reagents. In addition, the NGS approach enables the analysis of genetic diversity and clonogenic heterogeneity, which may contribute to our current understanding of disease biology and relapse kinetics. To date, only one CE (Conformité Européenne)-marked in vitro diagnostic panel is commercially available: the LymphoTrack Dx assay (for Illumina MiSeq and Ion PGM), used to identify the DNA sequence, clonal prevalence, and V–J family identity of each gene rearrangement and, for IGH assays, the extent of IGHV somatic hypermutation.
Variant calling and copy-number variations
NGS produces large-scale data that continue to pose a major challenge. To call variants from NGS data, many aligners and variant callers have been developed and assembled into diverse pipelines. A typical pipeline contains an aligner, which maps the sequencing reads to a reference genome, and a variant caller, which identifies variant sites and assigns a genotype to a subject. The performances of different aligners have been extensively studied, and great effort is still needed to identify the best analysis pipeline correctly.
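A minimal sketch of such a pipeline is shown below, using one common open-source tool chain (BWA-MEM for alignment, samtools for sorting/indexing, GATK HaplotypeCaller for variant calling) and generic file names as assumptions. Production pipelines additionally assign read groups, mark duplicates, recalibrate base qualities, and run quality control.

import subprocess

REF = "reference.fasta"   # indexed reference genome (assumed file name)
FASTQ = "sample.fastq"    # sequencing reads (assumed file name)

def run(cmd: str) -> None:
    """Run one pipeline step, failing loudly if the tool reports an error."""
    subprocess.run(cmd, shell=True, check=True)

# 1) Map reads to the reference genome.
run(f"bwa mem {REF} {FASTQ} > sample.sam")
# 2) Coordinate-sort and index the alignments.
run("samtools sort -o sample.bam sample.sam")
run("samtools index sample.bam")
# 3) Identify variant sites and assign genotypes; the output is a VCF
#    ready for downstream annotation and filtering.
run(f"gatk HaplotypeCaller -R {REF} -I sample.bam -O sample.vcf.gz")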
The Genome Analysis Toolkit (GATK; Broad Institute, Cambridge, MA, USA) is a powerful set of tools for NGS-data analysis. Recently, we focused on optimizing GATK to call variants from data sets produced with an Ion Torrent targeted custom panel including eleven genes involved in melanoma. In particular, we investigated the variant-filtration step. For this step, GATK provides variant quality-score recalibration (VQSR). VQSR filtering uses annotation metrics (eg, quality by depth, mapping quality, strand bias) from true variants, annotated in HapMap for instance, to generate an adaptive model. Applied to the other variants, such a model allows calculation of the probability that a variant is true or false. Although this is a powerful method, it requires a large call set. Indeed, GATK’s best practices suggest not applying VQSR in “small-scale experiments, such as targeted gene panels or exome studies with fewer than 30 exomes”. In these cases, hard filtering is the approach indicated by GATK. General rules are available, but appropriate filters have to be set up specifically for each study, considering also that GATK does not provide any technical documentation for Ion Torrent data.

Therefore, starting from a comparison of results from GATK and the proprietary Torrent Suite variant caller (TVC) on a real data set, our aim was to determine a framework for GATK hard filtering in order to lower false-positive calls (Figure 2). We observed a high discrepancy between TVC and GATK, particularly for indels, suggesting that such variant types are difficult to detect even with present bioinformatic tools. We then simulated two data sets, each with a different coverage and each carrying alterations found in the real data. Indeed, defining a “gold standard” data set to test variant-calling methods is a very active topic, and “synthetic” matched tumor–normal samples have recently been created to compare the performance of popular variant callers in detecting “somatic” single-nucleotide variants (SNVs). The first important finding is that results are strictly correlated with coverage: in the high-coverage data set, SNV calling produced fewer false positives than in the low-coverage data set. For indels, however, the picture is more complex, and the number of false positives was high in both data sets when looking at the variants suggested by GATK in the phase preceding the filtering of “good” variants.

To select suitable hard filters, we considered the most important quality parameters reported in the raw Variant Call Format (VCF) file. We built regression trees to identify the best choices for hard filtering, in order to discriminate better between true and false calls. We performed the analyses on SNV and indel subsets, both stratified by genotype, in the high- and low-coverage data sets. The regression trees allowed us to set a series of filters for each type of alteration. Recently, Vanni et al used GATK to analyze sequencing data from the targeted AmpliSeq Colon and Lung Cancer Panel (Life Technologies); methodologically, they filtered out variants with a Phred score of 5–30, marking them as low quality. Our results showed that such an approach might not be enough to obtain a high-quality GATK call set. In detail, we found that different parameters could be tuned depending on the type of mutation and the genotype suggested.
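To make the hard-filtering step concrete, the sketch below applies GATK’s documented generic SNP thresholds (QD < 2.0, FS > 60.0, MQ < 40.0) to a VCF. These defaults are only illustrative starting points; as discussed above, the thresholds should be tuned per study (eg, with regression trees), since GATK provides no Ion Torrent-specific guidance, and the file name is an assumption.

def parse_info(info: str) -> dict:
    """Turn a VCF INFO field into a {key: value} dict."""
    out = {}
    for field in info.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            out[key] = value
    return out

def passes_hard_filters(info: dict) -> bool:
    """Apply illustrative GATK SNP thresholds; records missing a metric pass."""
    qd = float(info.get("QD", "inf"))
    fs = float(info.get("FS", "0"))
    mq = float(info.get("MQ", "inf"))
    return qd >= 2.0 and fs <= 60.0 and mq >= 40.0

with open("sample.vcf") as vcf:
    for line in vcf:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if passes_hard_filters(parse_info(cols[7])):
            print(cols[0], cols[1], cols[3], cols[4])  # CHROM POS REF ALT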
The application of hard filtering reduced the number of false positives. At times the loss of true variants could be high, in particular for indels, but it should be noted that the number of false variants was also high. Therefore, hard filtering can help to reduce this high number of false positives drastically, and we argue that increasing coverage should improve filtering results, in that fewer true variants would be incorrectly discarded. We explored the flanking regions of each type of alteration, in particular searching for recurrent homopolymeric strings, and showed that they are partly responsible for false-positive calls. The hard filters were tested on an independent real cohort, sequenced with the same custom panel, and we found almost 100% concordance for SNV calling (manuscript in preparation).
Our approach to setting up a pipeline for SNV calling.
Abbreviations: SNV, single-nucleotide variant; GATK, Genome Analysis Toolkit; TVC, Torrent Suite variant caller; VCF, Variant Call Format.
Another NGS application is CNV analysis. CNVs occur frequently during carcinogenesis, and thus the detection of these aberrations is essential in cancer-genome analysis to improve diagnosis and treatment. NGS-based CNV algorithms frequently handle WGS and WES data, and a number of somatic CNV-detection programs have been developed, each based on a different approach. With regard to targeted sequencing, the approach used in diagnostic settings, however, the bioinformatic challenge remains open. In essence, all pipelines for CNV detection in targeted-sequencing data use the read-depth approach: they calculate the coverage of the amplicons and detect outliers after an appropriate normalization step. Some algorithms require matched tumor–normal samples or a reference DNA, but recently an R package, Ioncopy, was introduced that does not need control samples. Different biases have to be considered in a read depth-based approach. PCR can distort coverage because of nonuniform amplification efficiency. A well-studied issue in CNV identification is guanine–cytosine (GC)-content bias, which affects read coverage. Another important bias concerns the alignment step, because short reads might not be unambiguously mapped to the reference genome. In conclusion, even though a number of methods have been developed, validation is still needed before they can be included in a clinical setting.
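The read-depth idea can be illustrated with a short sketch: normalize each amplicon’s coverage by the sample median (to remove library-size effects), compare against a reference built from normal samples, and flag amplicons whose log2 ratio is an outlier. The data, amplicon names, and the ±1 log2 cutoff are illustrative assumptions; real tools additionally model GC bias and mappability.

import math

def normalize(depths):
    """Scale amplicon depths by the sample median to remove library-size effects."""
    median = sorted(depths.values())[len(depths) // 2]
    return {amp: d / median for amp, d in depths.items()}

def call_cnvs(sample, reference, cutoff=1.0):
    """Return amplicons whose |log2(sample/reference)| exceeds the cutoff."""
    calls = {}
    for amp, ref_depth in reference.items():
        ratio = math.log2(sample[amp] / ref_depth)
        if abs(ratio) > cutoff:
            calls[amp] = round(ratio, 2)
    return calls

# Toy usage: "AMP3" is covered ~4-fold more than in the normal reference.
sample = normalize({"AMP1": 480, "AMP2": 510, "AMP3": 2100, "AMP4": 495})
reference = normalize({"AMP1": 500, "AMP2": 500, "AMP3": 500, "AMP4": 500})
print(call_cnvs(sample, reference))  # only AMP3 is flagged, as a gain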
Conclusion
Despite several critical points, mostly regarding technology implementation and data interpretation, in this review we have shown numerous benefits of an NGS approach (Figure 3). Indeed, recent innovations in sequencing technologies have made it possible to capture a wide spectrum of the genomic alterations occurring within tumors.
Benefits obtained from the use of NGS methods in clinical molecular diagnostics.
Note: Introducing NGS into clinical guidelines requires improvement of the critical points shown in this figure.
Abbreviation: NGS, next-generation sequencing.
At present, the clinical utility and efficacy of comprehensive genomic profiling with NGS are under evaluation, with a view to introducing this technology into clinical guidelines for solid and hematologic cancer management. Initial results demonstrate that NGS might improve patient care, guiding patients toward specific screening programs and targeted therapies with greater accuracy and specificity than traditional sequencing methods, even if many further studies are needed.
Footnotes
Disclosure
The authors report no conflicts of interest in this work.