The concept of the pangenome was conceived in 2005, in an attempt to describe and model the genomic totality of a taxon (species, serovar, phylum, kingdom, etc.) of interest. Since then, the nomenclature of this concept has widened to accommodate terms like pangenome, core and dispensable genes, strain-specific genes (Medini et al. 2005), supragenome, distributed and unique genes (Lapierre and Gogarten 2009), and flexible regions (Rodriguez-Valera and Ussery 2012). Simply put, using the original definition, the core genome describes the set of sequences shared by all members of the taxon of interest; the dispensable genome captures the subset of sequences shared by only some members of the group (dictating the diversity of the group: alternative biochemical pathways, niche adaptation, antibiotic resistance, etc.); and the pangenome is simply the union of the core and dispensable genomes (describing the totality of the taxon at the level of sequence datasets).

The exponential growth of genomic databases started in 1995, with Haemophilus influenzae being the first complete genome project (Fleischmann et al.). Today, as of August 2018, 110,660 complete whole-genome sequencing projects (of which 87% are bacteria) and 15,066 finished whole-genome sequencing projects (Mukherjee et al. 2017) are available in the public domain. These fueled the interest of many researchers in carrying out pangenome analysis at every conceivable level of phylogenetic resolution (Table 1), exploiting various modeling frameworks, assumptions, and underlying homology search engines.

Table 1. Examples of the application of pangenome approaches at different levels of phylogenetic resolution

A pivotal work in terms of phylogenetic resolution was carried out by Lapierre and Gogarten (2009), showing that, on average, in the largest bacterial group analyzed so far, the core gene set accounts for only 8% of the pangenome. The pangenome concept can be implemented in either a reverse or a forward-thinking approach: in the first case, we are interested in capturing the genomic diversity of the group of interest, while in the second case we are more interested in exploring and predicting, from a pragmatic perspective, the minimum number of genome sequences required to capture the totality of the group. Obviously, limited or sparse datasets might lead to erroneous conclusions; therefore, it was recommended (Vernikos et al. 2015) that the minimum number of genomes to analyze be at least five.

The lifestyle of the species of interest is one of the parameters that most strongly dictates the shape of the pangenome distribution. For example, if the pangenome continues to grow with the recurring addition of group members, we are analyzing an open pangenome (such examples include human pathogens and environmental bacteria) (Hiller et al.). On the other hand, if the group complexity is exhausted very quickly, even from the analysis of a handful of group members, then we are dealing with a closed pangenome, whereby only a few representatives are needed to describe the totality of the sequence variability.

Where C is the total number of comparisons, and N is the total number of genomes.

A workaround to an exhaustive approach is to subsample (Vernikos et al. 2015) the total number of comparisons needed: comparisons are randomly selected, making sure that each genome undergoes the same number of comparisons. The trick here is to set the number of comparisons to a value that optimally balances the available computational power against the target dataset size. Indeed, observations from datasets of limited size showed that even extreme sampling is still able to model the pangenome reliably, bypassing the need for an exhaustive all-against-all comparison (Fig.). Additional optimizations can be achieved by exploiting regression functions that are alternatives to the original exponential decay; practical implementations of such optimizations are described in Tettelin et al. (2012).

Bacterial Pangenome Analysis (BPGA) (Chaudhari et al. 2016) comes with a handful of new options and features, most notably optimized speed of execution. In addition, it offers phylogenies for various entities (core genome, pangenome, and MLST), phyletic profile analysis (gene presence/absence), subset analysis, atypical sequence composition analysis, orthology and functional annotation for all gene datasets, user selection of the gene clustering algorithm, a command-line interface, and good-quality graphics. It runs on both Windows and Linux as executable files (the source code is in Perl). BPGA depends on other tools that must be installed separately.
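The definitions of the core, dispensable, and pangenome, together with the idea of sampling random comparisons rather than enumerating all of them, can be illustrated with a minimal Python sketch. It is not the method of any cited tool. It assumes each genome has already been reduced to a set of gene-family identifiers, and it assumes that one "comparison" is a random ordering in which genomes are added, so that every genome takes part in every sampled comparison exactly once. The function names, the toy strains, and the gene labels are all hypothetical.

```python
import random


def partition_pangenome(genomes):
    """Split gene families into core, dispensable, and pangenome sets.

    genomes: dict mapping a genome name to a set of gene-family ids
    (assumed to come from a prior homology-clustering step).
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*gene_sets)        # every family seen in any genome
    core = set.intersection(*gene_sets)  # families present in all genomes
    dispensable = pan - core             # families missing from >= 1 genome
    return core, dispensable, pan


def sampled_pangenome_sizes(genomes, n_orderings, seed=0):
    """Mean pangenome size after adding 1..N genomes.

    Instead of enumerating every possible ordering, only n_orderings
    random orderings are sampled (an assumption of this sketch).
    """
    rng = random.Random(seed)
    names = sorted(genomes)
    totals = [0] * len(names)
    for _ in range(n_orderings):
        order = names[:]
        rng.shuffle(order)
        seen = set()
        for i, name in enumerate(order):
            seen |= genomes[name]        # add this genome's gene families
            totals[i] += len(seen)
    return [t / n_orderings for t in totals]


# Hypothetical toy data: three strains, five gene families.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}
core, dispensable, pan = partition_pangenome(toy)
curve = sampled_pangenome_sizes(toy, n_orderings=20)
print(sorted(core), sorted(dispensable), sorted(pan))
print(curve)
```

A curve that keeps rising as genomes are added points to an open pangenome, while a curve that flattens after a few genomes points to a closed one. Raising or lowering `n_orderings` trades accuracy against computing time.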