RevDate: 2021-09-15

Colquhoun RM, Hall MB, Lima L, et al (2021)

Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs.

Genome biology, 22(1):267.

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.

RevDate: 2021-09-13

Da Silva K, Pons N, Berland M, et al (2021)

StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs.

PeerJ, 9:e11884 pii:11884.

Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain variants and phenotype are of great interest for diagnostic and therapeutic purposes. We developed StrainFLAIR with the aim of showing the feasibility of using variation graphs for indexing highly similar genomic sequences up to the strain level, and for characterizing a set of unknown sequenced genomes by querying this graph. On simulated data composed of mixtures of strains from the same bacterial species Escherichia coli, results show that StrainFLAIR was able to distinguish and estimate the abundances of close strains, as well as to highlight the presence of a new strain close to a referenced one and to estimate its abundance. On a real dataset composed of a mix of several bacterial species and several strains for the same species, results show that in a more complex configuration StrainFLAIR correctly estimates the abundance of each strain. Hence, results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.

RevDate: 2021-09-09

Hall RJ, Whelan FJ, Cummins EA, et al (2021)

Gene-gene relationships in an Escherichia coli accessory genome are linked to function and mobility.

Microbial genomics, 7(9):.

The pangenome contains all genes encoded by a species, with the core genome present in all strains and the accessory genome in only a subset. Coincident gene relationships are expected within the accessory genome, where the presence or absence of one gene is influenced by the presence or absence of another. Here, we analysed the accessory genome of an Escherichia coli pangenome consisting of 400 genomes from 20 sequence types to identify genes that display significant co-occurrence or avoidance patterns with one another. We present a complex network of genes that are either found together or that avoid one another more often than would be expected by chance, and show that these relationships vary by lineage. We demonstrate that genes co-occur by function, and that several highly connected gene relationships are linked to mobile genetic elements. We find that genes are more likely to co-occur with, rather than avoid, another gene in the accessory genome. This work furthers our understanding of the dynamic nature of prokaryote pangenomes and implicates both function and mobility as drivers of gene relationships.
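
The definitions in this abstract (core genes present in all strains, accessory genes in only a subset, and gene pairs found together more often than expected by chance) can be sketched on a toy presence/absence matrix. This is a minimal illustration only: the gene names, the six-genome matrix, and the helper functions are hypothetical, and the one-sided hypergeometric test is a generic stand-in, not necessarily the statistic the authors used.

```python
from math import comb

def partition_pangenome(presence):
    """Split genes into core (present in every genome) and accessory (present in only a subset)."""
    n_genomes = len(next(iter(presence.values())))
    core = {g for g, row in presence.items() if sum(row) == n_genomes}
    accessory = {g for g, row in presence.items() if 0 < sum(row) < n_genomes}
    return core, accessory

def cooccurrence_p_value(row_a, row_b):
    """Return (genomes carrying both genes, one-sided hypergeometric p-value).

    The p-value is the chance of seeing at least this many shared genomes
    if the two genes were distributed independently of each other.
    """
    n = len(row_a)
    a = sum(row_a)  # genomes carrying gene A
    b = sum(row_b)  # genomes carrying gene B
    both = sum(1 for x, y in zip(row_a, row_b) if x and y)
    tail = sum(comb(a, k) * comb(n - a, b - k) for k in range(both, min(a, b) + 1))
    return both, tail / comb(n, b)

# Hypothetical toy matrix: rows are genes, columns are six genomes (1 = present).
presence = {
    "geneA": [1, 1, 1, 1, 1, 1],  # in every genome
    "geneB": [1, 1, 1, 0, 0, 0],
    "geneC": [1, 1, 1, 0, 0, 0],  # always found with geneB
    "geneD": [0, 0, 0, 1, 1, 1],  # never found with geneB
}

core, accessory = partition_pangenome(presence)
shared_bc, p_bc = cooccurrence_p_value(presence["geneB"], presence["geneC"])
shared_bd, p_bd = cooccurrence_p_value(presence["geneB"], presence["geneD"])
```

A small p-value for a pair suggests co-occurrence; an avoidance test would sum the lower tail of the same distribution instead. With real data, correction for multiple testing and for lineage structure would also be needed.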

RevDate: 2021-09-07

Mazzuoli MV, Daunesse M, Varet H, et al (2021)

The CovR regulatory network drives the evolution of Group B Streptococcus virulence.

PLoS genetics, 17(9):e1009761 pii:PGENETICS-D-21-00538 [Epub ahead of print].

Virulence of the neonatal pathogen Group B Streptococcus is under the control of the master regulator CovR. Inactivation of CovR is associated with large-scale transcriptome remodeling and impairs almost every step of the interaction between the pathogen and the host. However, transcriptome analyses suggested a plasticity of the CovR signaling pathway in clinical isolates leading to phenotypic heterogeneity in the bacterial population. In this study, we characterized the CovR regulatory network in a strain representative of the CC-17 hypervirulent lineage responsible for the majority of neonatal meningitis cases. Transcriptome and genome-wide binding analyses reveal the architecture of the CovR network, characterized by the direct repression of a large array of virulence-associated genes and the extent of co-regulation at specific loci. Comparative functional analysis of the signaling network links strain specificities to the regulation of the pan-genome, including the two specific hypervirulent adhesins and horizontally acquired genes, to mutations in CovR-regulated promoters, and to variability in CovR activation by phosphorylation. This regulatory adaptation occurs at the level of genes, promoters, and CovR itself, and allows a global reshaping of the expression of virulence genes. Overall, our results reveal the direct, coordinated, and strain-specific regulation of virulence genes by the master regulator CovR and suggest that the intra-species evolution of the signaling network is as important as the expression of specific virulence factors in the emergence of clones associated with specific diseases.

RevDate: 2021-09-07

Li G, Jiang T, Li J, et al (2021)

PanSVR: Pan-Genome Augmented Short Read Realignment for Sensitive Detection of Structural Variations.

Frontiers in genetics, 12:731515.

The comprehensive discovery of structural variations (SVs) is fundamental to many genomics studies, and high-throughput sequencing has become a common approach to this task. However, due to the limited read length, it is still non-trivial for state-of-the-art tools to accurately align short reads and produce high-quality SV callsets. The pan-genome provides a novel and promising framework for short read-based SV calling, since it can comprehensively integrate known variants to reduce the incompleteness and bias of a single reference, break through the bottlenecks of short read alignment, and provide new evidence for the detection of SVs. However, developing effective computational approaches that take full advantage of pan-genomes remains an open problem. Herein, we propose Pan-genome augmented Structure Variation calling tool with read Re-alignment (PanSVR), a novel pan-genome-based SV calling approach. PanSVR uses several tailored methods to implement precise re-alignment of SV-spanning reads against a well-organized pan-genome reference containing many known SVs. PanSVR greatly improves the quality of short read alignments and produces clear and homogeneous SV signatures, which facilitate SV calling. Benchmark results on real sequencing data suggest that PanSVR substantially improves the sensitivity of SV calling compared with state-of-the-art SV callers, especially for SVs in repeat-rich regions and/or novel insertions, which are difficult for existing tools.

RevDate: 2021-09-07

Karthik K, Anbazhagan S, Thomas P, et al (2021)

Genome Sequencing and Comparative Genomics of Indian Isolates of Brucella melitensis.

Frontiers in microbiology, 12:698069.

Brucella melitensis causes small ruminant brucellosis and is a zoonotic pathogen prevalent worldwide. Whole genome phylogeny of all available B. melitensis genomes (n = 355) revealed that all Indian isolates (n = 16) clustered in the East Mediterranean lineage except the ADMAS-GI strain. Pangenome analysis indicated the presence of limited accessory genomes, with few clades showing specific gene presence/absence patterns. A total of 43 virulence genes were predicted in all the Indian strains of B. melitensis except 2007BM-1 (ricA and wbkA are absent). Multilocus sequence typing (MLST) analysis indicated that all Indian strains except one (ADMAS-GI) fall into sequence type 8 (ST 8). In comparison with MLST, core genome phylogeny indicated two major clusters (>70% bootstrap support values) among Indian strains. Clusters with <70% bootstrap support values represent strains with diverse evolutionary origins present among animal and human hosts. Genetic relatedness among animal (sheep and goats) and human strains with 100% bootstrap values shows the potential for zoonotic transfer. SNP-based analysis indicated similar clustering to that of core genome phylogeny. Among the Indian strains, the highest number of unique SNPs (112 SNPs) was shared by a node that involved three strains from Tamil Nadu. The node SNPs involved several peptidase genes like U32, M16 inactive domain protein, clp protease family protein, and M23 family protein and mostly represented non-synonymous (NS) substitutions. Vaccination has been followed in several parts of the world to prevent small ruminant brucellosis but not in India. Comparison of Indian strains with vaccine strains showed that M5 is genetically closer to most of the Indian strains than the Rev.1 strain. The presence of most of the virulence genes among all Indian strains and conserved core genome compositions suggest the use of any circulating strain/genotypes for the development of a vaccine candidate for small ruminant brucellosis in India.

RevDate: 2021-09-07

Agarwal G, Choudhary D, Stice SP, et al (2021)

Pan-Genome-Wide Analysis of Pantoea ananatis Identified Genes Linked to Pathogenicity in Onion.

Frontiers in microbiology, 12:684756.

Pantoea ananatis, a Gram-negative, facultatively anaerobic bacterium, is a member of a Pantoea spp. complex that causes center rot of onion, which significantly affects onion yield and quality. This pathogen does not have typical virulence factors like type II or type III secretion systems but appears to require a biosynthetic gene cluster, HiVir/PASVIL (chromosomally located and comprising 14 genes), for a phosphonate secondary metabolite, and the 'alt' gene cluster (located on a plasmid and comprising 11 genes) that aids in bacterial colonization of onion bulbs by imparting tolerance to thiosulfinates. We conducted a deep pan-genome-wide association study (pan-GWAS) to predict additional genes associated with pathogenicity in P. ananatis using a panel of diverse strains (n = 81). We utilized a red-onion scale necrosis assay as an indicator of pathogenicity. Based on this assay, we differentiated pathogenic (n = 51) from non-pathogenic (n = 30) strains phenotypically. Pan-genome analysis revealed a large core genome of 3,153 genes and a flexible accessory genome. Pan-GWAS using the presence and absence variants (PAVs) predicted 42 genes, including 14 from the previously identified HiVir/PASVIL cluster associated with pathogenicity, and 28 novel genes that were not previously associated with pathogenicity in onion. Of the 28 novel genes identified, eight have annotated functions of site-specific tyrosine kinase, N-acetylmuramoyl-L-alanine amidase, conjugal transfer, and HTH-type transcriptional regulator. The remaining 20 genes are currently hypothetical. Further, a core-genome SNP-based phylogeny and horizontal gene transfer (HGT) studies were also conducted to assess the extent of lateral gene transfer among diverse P. ananatis strains. Phylogenetic analysis based on PAVs and whole genome multi locus sequence typing (wgMLST) rather than core-genome SNPs distinguished red-scale necrosis inducing (pathogenic) strains from non-scale necrosis inducing (non-pathogenic) strains of P. ananatis. A total of 1182 HGT events including the HiVir/PASVIL and alt cluster genes were identified. These events could be regarded as a major contributing factor to the diversification, niche-adaptation and potential acquisition of pathogenicity/virulence genes in P. ananatis.

RevDate: 2021-09-07

Letcher B, Hunt M, Z Iqbal (2021)

Gramtools enables multiscale variation analysis with genome graphs.

Genome biology, 22(1):259.

Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this multiscale variation and develop a JSON extension of VCF (jVCF) allowing for variant calls on multiple references, both implemented in our software gramtools. We find gramtools outperforms existing methods for genotyping SNPs overlapping large deletions in M. tuberculosis and is able to genotype on multiple alternate backgrounds in P. falciparum, revealing previously hidden recombination.

RevDate: 2021-09-06

Gupta PK (2021)

GWAS for genetics of complex quantitative traits: Genome to pangenome and SNPs to SVs and k-mers.

BioEssays : news and reviews in molecular, cellular and developmental biology [Epub ahead of print].

The development of improved methods for genome-wide association studies (GWAS) for genetics of quantitative traits has been an active area of research during the last 25 years. This activity initially started with the use of mixed linear model (MLM), which was variously modified. During the last decade, however, with the availability of high throughput next generation sequencing (NGS) technology, development and use of pangenomes and novel markers including structural variations (SVs) and k-mers for GWAS has taken over as a new thrust area of research. Pangenomes and SVs are now available in humans, livestock, and a number of plant species, so that these resources along with k-mers are being used in GWAS for exploring additional genetic variation that was hitherto not available for analysis. These developments have resulted in significant improvement in GWAS methodology for detection of marker-trait associations (MTAs) that are relevant to human healthcare and crop improvement.

RevDate: 2021-09-06

Mann A, Malik S, Rana JS, et al (2021)

Whole genome sequencing data of Klebsiella aerogenes isolated from agricultural soil of Haryana, India.

Data in brief, 38:107311 pii:S2352-3409(21)00595-3.

Klebsiella aerogenes is a Gram-negative bacterium that was previously known as Enterobacter aerogenes. It is present in all environments such as water, soil, air and hospitals, and is an opportunistic pathogen that causes several types of infections. Compared with other clinically important pathogens included in the ESKAPE category (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), the pangenome and population structure of Klebsiella aerogenes are still poorly understood. For the present study, the bacterial sample was isolated from agricultural soils of Haryana, India. With the aim of identifying the occurrence of multi-drug resistance genes in the agricultural field soil bacterial isolate, whole genome sequencing (WGS) of the bacterium was performed, and the antibiotic resistance genes, the genes responsible for other major functions of the cell, and the different single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) were identified. The data presented in this manuscript can be reused by researchers as a reference for determining the antibiotic resistance genes that could be present in different bacterial isolates, and would also help in determining the functions of various other genes present in other genomes of Klebsiella species.

RevDate: 2021-09-06

Rai A, Jagadeeshwari U, Deepshikha G, et al (2021)

Phylotaxogenomics for the Reappraisal of the Genus Roseomonas With the Creation of Six New Genera.

Frontiers in microbiology, 12:677842.

The genus Roseomonas is a significant group of bacteria which is invariably of great clinical and ecological importance. Previous studies have shown that the genus Roseomonas is polyphyletic in nature. Our present study focused on generating a lucid understanding of the phylogenetic framework for the re-evaluation and reclassification of the genus Roseomonas. Phylogenetic studies based on the 16S rRNA gene and 92 concatenated genes suggested that the genus is heterogeneous, forming seven major groups. Existing Roseomonas species were subjected to an array of genomic, phenotypic, and chemotaxonomic analyses in order to resolve the heterogeneity. Genomic similarity indices (dDDH and ANI) indicated that the members were well-defined at the species level. The Percentage of Conserved Proteins (POCP) and the average Amino Acid Identity (AAI) values between the groups of the genus Roseomonas and other interspersing members of the family Acetobacteraceae were below 65 and 70%, respectively. The pan-genome evaluation depicted that the pan-genome was an open type and the members shared 958 core genes. This claim of reclassification was equally supported by the phenotypic and chemotaxonomic differences between the groups. Thus, in this study, we propose to re-evaluate and reclassify the genus Roseomonas and propose six novel genera as Pararoseomonas gen. nov., Falsiroseomonas gen. nov., Paeniroseomonas gen. nov., Plastoroseomonas gen. nov., Neoroseomonas gen. nov., and Pseudoroseomonas gen. nov.

RevDate: 2021-09-04

Vandamme P, Peeters C, Seth-Smith HMB, et al (2021)

Gulosibacter hominis sp. nov.: a novel human microbiome bacterium that may cause opportunistic infections.

Antonie van Leeuwenhoek [Epub ahead of print].

We present genomic, phylogenomic, and phenotypic taxonomic data to demonstrate that three human ear isolates represent a novel species within the genus Gulosibacter. These isolates could not be identified reliably using MALDI-TOF mass spectrometry during routine diagnostic work, but partial 16S rRNA gene sequence analysis revealed that they belonged to the genus Gulosibacter. Overall genomic relatedness indices between the draft genome sequences of the three isolates and of the type strains of established Gulosibacter species confirmed that the three isolates represented a single novel Gulosibacter species. A biochemical characterisation yielded differential tests between the novel and established Gulosibacter species, which could also be differentiated using MALDI-TOF mass spectrometry. We propose to formally classify these three isolates into Gulosibacter hominis sp. nov., with 401352-2018 T (= LMG 31778 T, CCUG 74795 T) as the type strain. The whole-genome sequence of strain 401352-2018 T has a size of 2,340,181 bp and a G+C content of 62.04 mol%. A Gulosibacter pangenome analysis revealed 467 gene clusters that were exclusively present in G. hominis genomes. While these G. hominis specific gene clusters were enriched in several COG functional categories, this analysis did not reveal functions that suggested a role in the human microbiome, nor did it explain the occurrence of G. hominis in ear infections. The absence of acquired antimicrobial resistance determinants and virulence factors in the G. hominis genomes, and an analysis of publicly available 16S rRNA gene sequences and 16S rRNA amplicon sequencing data sets suggested that G. hominis is a member of the human skin microbiota that may occasionally be involved in opportunistic infections.

RevDate: 2021-08-31

Li Q, Tian S, Yan B, et al (2021)

Building a Chinese pan-genome of 486 individuals.

Communications biology, 4(1):1016.

Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.

RevDate: 2021-08-31

Peters S, Pascoe B, Wu Z, et al (2021)

Campylobacter jejuni genotypes are associated with post-infection irritable bowel syndrome in humans.

Communications biology, 4(1):1015.

Campylobacter enterocolitis may lead to post-infection irritable bowel syndrome (PI-IBS) and while some C. jejuni strains are more likely than others to cause human disease, genomic and virulence characteristics promoting PI-IBS development remain uncharacterized. We combined pangenome-wide association studies and phenotypic assays to compare C. jejuni isolates from patients who developed PI-IBS with those who did not. We show that variation in bacterial stress response (Cj0145_phoX), adhesion protein (Cj0628_CapA), and core biosynthetic pathway genes (biotin: Cj0308_bioD; purine: Cj0514_purQ; isoprenoid: Cj0894c_ispH) were associated with PI-IBS development. In vitro assays demonstrated greater adhesion, invasion, IL-8 and TNFα secretion on colonocytes with PI-IBS compared to PI-no-IBS strains. A risk-score for PI-IBS development was generated using 22 genomic markers, four of which were from Cj1631c, a putative heme oxidase gene linked to virulence. Our finding that specific Campylobacter genotypes confer greater in vitro virulence and increased risk of PI-IBS has potential to improve understanding of the complex host-pathogen interactions underlying this condition.

RevDate: 2021-08-31

Porcellato D, Smistad M, Skeie SB, et al (2021)

Whole genome sequencing reveals possible host species adaptation of Streptococcus dysgalactiae.

Scientific reports, 11(1):17350.

Streptococcus dysgalactiae (SD) is an emerging pathogen in human and veterinary medicine, and is associated with several host species, disease phenotypes and virulence mechanisms. SD has traditionally been divided into the subspecies dysgalactiae (SDSD) and subsp. equisimilis (SDSE), but recent molecular studies have indicated that the phylogenetic relationships are more complex. Moreover, the genetic basis for the niche versatility of SD has not been extensively investigated. To expand the knowledge about virulence factors, phylogenetic relationships and host-adaptation strategies of SD, we analyzed 78 SDSD genomes from cows and sheep, and 78 SDSE genomes from other host species. Sixty SDSD and 40 SDSE genomes were newly sequenced in this study. Phylogenetic analysis supported SDSD as a distinct taxonomic entity, presenting a mean value of the average nucleotide identity of 99%. Bovine and ovine associated SDSD isolates clustered separately on pangenome analysis, but no single gene or genetic region was uniquely associated with host species. In contrast, SDSE isolates were more heterogenous and could be delineated in accordance with host. Although phylogenetic clustering suggestive of cross species transmission was observed, we predominantly detected a host restricted distribution of the SD-lineages. Furthermore, lineage specific virulence factors were detected, several of them located in proximity to hotspots for integration of mobile genetic elements. Our study indicates that SD has evolved to adapt to several different host species and infers a potential role of horizontal genetic transfer in niche specialization.

RevDate: 2021-08-30

Bachert BA, Richardson JB, Mlynek KD, et al (2021)

Development, Phenotypic Characterization and Genomic Analysis of a Francisella tularensis Panel for Tularemia Vaccine Testing.

Frontiers in microbiology, 12:725776.

Francisella tularensis is one of several biothreat agents for which a licensed vaccine is needed. To aid in the development of a vaccine protective against pneumonic tularemia, we generated and characterized a panel of F. tularensis isolates that can be used as challenge strains to assess vaccine efficacy. Our panel consists of both historical and contemporary isolates derived from clinical and environmental sources, including human, tick, and rabbit isolates. Whole genome sequencing was performed to assess the genetic diversity in comparison to the reference genome F. tularensis Schu S4. Average nucleotide identity analysis showed >99% genomic similarity across the strains in our panel, and pan-genome analysis revealed a core genome of 1,707 genes, and an accessory genome of 233 genes. Three of the strains in our panel, FRAN254 (tick-derived), FRAN255 (a type B strain), and FRAN256 (a human isolate) exhibited variation from the other strains. Moreover, we identified several unique mutations within the Francisella Pathogenicity Island across multiple strains in our panel, revealing unexpected diversity in this region. Notably, FRAN031 (Scherm) completely lacked the second pathogenicity island but retained virulence in mice. In contrast, FRAN037 (Coll) was attenuated in a murine pneumonic tularemia model and had mutations in pdpB and iglA which likely led to attenuation. All of the strains, except FRAN037, retained full virulence, indicating their effectiveness as challenge strains for future vaccine testing. Overall, we provide a well-characterized panel of virulent F. tularensis strains that can be utilized in ongoing efforts to develop an effective vaccine against pneumonic tularemia to ensure protection is achieved across a range of F. tularensis strains.

RevDate: 2021-08-30

Outten J, A Warren (2021)

Methods and Developments in Graphical Pangenomics.

Journal of the Indian Institute of Science pii:255 [Epub ahead of print].

Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years including the development of graph based models that better resolve biological phenomena, and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition we briefly discuss what these developments may mean for the future of genomics.

RevDate: 2021-08-27

Mashima I, Liao YC, Lin CH, et al (2021)

Comparative Pan-Genome Analysis of Oral Veillonella Species.

Microorganisms, 9(8): pii:microorganisms9081775.

The genus Veillonella is a common and abundant member of the oral microbiome. It includes eight species: V. atypica, V. denticariosi, V. dispar, V. infantium, V. nakazawae, V. parvula, V. rogosae and V. tobetusensis. They possess important metabolic pathways that utilize lactate as an energy source. However, the overall metabolome of these species has not been studied. To further understand the metabolic framework of Veillonella in the human oral microbiome, we conducted a comparative pan-genome analysis of the eight species of oral Veillonella. Analysis of the oral Veillonella pan-genome, based on KEGG pathway information, revealed features for adaptation to the oral environment. We found that the fructose metabolic pathway was conserved in all oral Veillonella species, and that oral Veillonella have conserved pathways that utilize carbohydrates other than lactate as an energy source. This discovery may help to better understand the metabolic network among oral microbiomes and will provide guidance for the design of future in silico and in vitro studies.

RevDate: 2021-08-27

Agarwal G, Gitaitis RD, B Dutta (2021)

Pan-Genome of Novel Pantoea stewartii subsp. indologenes Reveals Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer.

Microorganisms, 9(8): pii:microorganisms9081761.

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot on foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onions. Our recent host range evaluation study identified two pathovars; P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov. that are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome using the whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in genomes of all 17 strains) were the core genes that were a subset of 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using PAVs/wgMLST approach in comparison with core genome SNPs-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely, P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene-clusters, which were associated with the pathogenicity phenotype (necrosis on seedling) on onions. One of the gene-clusters contained 11 genes with known functions and was found to be chromosomally located.
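
The frequency categories reported in this abstract can be expressed as a small classifier, followed by an arithmetic check that the reported counts add up to the stated pan-genome size. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the thresholds are read directly from the abstract (core = all 17 strains, soft-core = 16 or more, cloud = 2 or fewer), and the shell category is assumed to be everything in between. Note that the abstract counts core genes as a subset of soft-core genes, whereas the function below labels them separately.

```python
def classify_gene(n_present, n_strains=17, soft_core_min=16, cloud_max=2):
    """Assign a gene to a frequency category from the number of strains carrying it."""
    if n_present == n_strains:
        return "core"
    if n_present >= soft_core_min:
        return "soft-core"  # soft-core but not strictly core
    if n_present <= cloud_max:
        return "cloud"
    return "shell"  # assumed: everything between cloud and soft-core

# Counts reported in the abstract.
PAN_GENOME = 7030
CORE = 3546        # subset of the soft-core genes
SOFT_CORE = 3682   # includes the core genes
SHELL = 1308
CLOUD = 2040

# Soft-core, shell and cloud together account for the whole pan-genome.
total = SOFT_CORE + SHELL + CLOUD

# Genes present in exactly 16 of the 17 strains (soft-core but not core).
soft_core_only = SOFT_CORE - CORE

labels = [classify_gene(n) for n in (17, 16, 9, 2)]
```

The three non-overlapping categories sum to 7030, which matches the reported pan-genome size, and 136 of the soft-core genes are missing from exactly one strain.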

RevDate: 2021-08-27

Lee JY, Lee DH, DH Kim (2021)

Characterization of Martelella soudanensis sp. nov., Isolated from a Mine Sediment.

Microorganisms, 9(8): pii:microorganisms9081736.

Gram-stain-negative, strictly aerobic, non-spore-forming, non-motile, and rod-shaped bacterial strains, designated NC18T and NC20, were isolated from the sediment near-vertical borehole effluent originating 714 m below the subsurface located in the Soudan Iron Mine in Minnesota, USA. The 16S rRNA gene sequence showed that strains NC18T and NC20 grouped with members of the genus Martelella, including M. mediterranea DSM 17316T and M. limonii YC7034T. The genome sizes and G + C content of both NC18T and NC20 were 6.1 Mb and 61.8 mol%, respectively. Average nucleotide identity (ANI), the average amino acid identity (AAI), and digital DNA-DNA hybridization (dDDH) values were below the species delineation threshold. Pan-genomic analysis showed that NC18T, NC20, M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T had 8470 pan-genome orthologous groups (POGs) in total. The five Martelella strains shared 2258 core POGs, which were mainly associated with amino acid transport and metabolism, general function prediction only, carbohydrate transport and metabolism, translation, ribosomal structure and biogenesis, and transcription. The two novel strains had major fatty acids (>5%) including summed feature 8 (C18:1 ω7c and/or C18:1 ω6c), C19:0 cyclo ω8c, C16:0, C18:1 ω7c 11-methyl, C18:0, and summed feature 2 (C12:0 aldehyde and/or iso-C16:1 I and/or C14:0 3-OH). The sole respiratory quinone was ubiquinone-10 (Q-10). On the basis of polyphasic taxonomic analyses, strains NC18T and NC20 represent a novel species of the genus Martelella, for which the name Martelella soudanensis sp. nov. is proposed. The type strain is NC18T (=KCTC 82174T = NBRC 114661T).

RevDate: 2021-08-27

Castillo D, Donati VL, Jørgensen J, et al (2021)

Comparative Genomic Analyses of Flavobacterium psychrophilum Isolates Reveals New Putative Genetic Determinants of Virulence Traits.

Microorganisms, 9(8): pii:microorganisms9081658.

The fish pathogen Flavobacterium psychrophilum is currently one of the main pathogenic bacteria hampering the productivity of salmonid farming worldwide. Although putative virulence determinants have been identified, the genetic basis for variation in virulence of F. psychrophilum is not fully understood. In this study, we analyzed whole-genome sequences of a collection of 25 F. psychrophilum isolates from Baltic Sea countries and compared genomic information with a previous determination of their virulence in juvenile rainbow trout. The results revealed a conserved population of F. psychrophilum that were consistently present across the Baltic Sea countries, with no clear association between genomic repertoire, phylogenomic, or gene distribution and virulence traits. However, analysis of the entire genome of four F. psychrophilum isolates by hybrid assembly provided an unprecedented resolution for discriminating even highly related isolates. The results showed that isolates with different virulence phenotypes harbored genetic variances on a number of consecutive leucine-rich repeat (LRR) proteins, repetitive motifs in gliding motility-associated protein, and the insertion of transposable elements into intergenic and genic regions. Thus, these findings provide novel insights into the genetic variation of these elements and their putative role in the modulation of F. psychrophilum virulence.

RevDate: 2021-08-27

Lin N, Tao Y, Gao P, et al (2021)

Comparative Genomics Revealing Insights into Niche Separation of the Genus Methylophilus.

Microorganisms, 9(8): pii:microorganisms9081577.

The genus Methylophilus, which uses methanol as a carbon and energy source, is widely distributed in terrestrial, freshwater and marine ecosystems. Here, three strains (13, 14 and QUAN) related to the genus Methylophilus were newly isolated from Lake Fuxian sediments. The draft genomes of strains 13, 14 and QUAN were 3.11 Mb, 3.02 Mb and 3.15 Mb, with G+C contents of 51.13, 50.48 and 50.33%, respectively. ANI values between strains 13 and 14, 13 and QUAN, and 14 and QUAN were 81.09, 81.06 and 91.46%, respectively. The pan-genome and core genome included 3994 and 1559 genes across 18 Methylophilus genomes, respectively. Phylogenetic analysis based on 1035 single-copy genes and 16S rRNA genes revealed two clades, one containing strains isolated from aquatic habitats and the other strains from the leaf surface. Twenty-three aquatic-specific genes, such as 2OG/Fe(II) oxygenase and diguanylate cyclase, reflected the strategy to survive in oxygen-limited water and sediment. Accordingly, 159 genes were identified specific to leaf association. Besides niche separation, Methylophilus could utilize the combination of ANRA and DNRA to convert nitrate to ammonia and reduce sulfate to sulfur according to the complete sulfur metabolic pathway. Genes encoding the cytochrome c protein and riboflavin were detected in Methylophilus genomes, which directly or indirectly participate in electron transfer.

RevDate: 2021-08-26

Xu S, Li Z, Huang Y, et al (2021)

Whole genome sequencing reveals the genomic diversity, taxonomic classification, and evolutionary relationships of the genus Nocardia.

PLoS neglected tropical diseases, 15(8):e0009665 pii:PNTD-D-21-00306 [Epub ahead of print].

Nocardia is a complex and diverse genus of aerobic actinomycetes that cause complex clinical presentations, which are difficult to diagnose due to being misunderstood. To date, the genetic diversity, evolution, and taxonomic structure of the genus Nocardia are still unclear. In this study, we investigated the pan-genome of 86 Nocardia type strains to clarify their genetic diversity. Our study revealed an open pan-genome for Nocardia containing 265,836 gene families, with about 99.7% of the pan-genome being variable. Horizontal gene transfer appears to have been an important evolutionary driver of genetic diversity shaping the Nocardia genome and may have caused historical taxonomic confusion from other taxa (primarily Rhodococcus, Skermania, Aldersonia, and Mycobacterium). Based on single-copy gene families, we established a high-accuracy phylogenomic approach for Nocardia using 229 genome sequences. Furthermore, we found 28 potentially new species and reclassified 16 strains. Finally, by comparing the topology between a phylogenomic tree and 384 phylogenetic trees (from 384 single-copy genes from the core genome), we identified a novel locus for inferring the phylogeny of this genus. The dapb1 gene, which encodes dipeptidyl aminopeptidase BI, was far superior to commonly used markers for Nocardia and yielded a topology almost identical to that of genome-based phylogeny. In conclusion, the present study provides insights into the genetic diversity, contributes a robust framework for the taxonomic classification, and elucidates the evolutionary relationships of Nocardia. This framework should facilitate the development of rapid tests for the species identification of highly variable species and has given new insight into the behavior of this genus.

RevDate: 2021-08-26

Shapiro JW, C Putonti (2021)

Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies.

PeerJ, 9:e11950 pii:11950.

Background: A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools.

Methods: We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees.

Results: We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny. The Rephine.r pipeline is provided through GitHub as a single script for automated analysis and with utility functions to assist in building single-copy core genomes and predicting the sources of fragmented genes.
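
The family-merging step described in the Methods (significant homology hits are used to merge gene families into larger clusters) can be sketched as a connected-components problem. The Python below is only an illustration under stated assumptions: Rephine.r itself is an R pipeline, and the abstract does not say how merges are resolved, so single-linkage merging with a union-find structure is an assumption, as are the function name `merge_families` and the toy family labels.

```python
def merge_families(families, hits):
    """Merge gene families linked by significant homology hits.

    families: iterable of family identifiers.
    hits: iterable of (family_a, family_b) pairs judged significant
          (for example, from an HMM search; the cutoff is applied upstream).
    Returns a list of sets, one per merged cluster, in a stable order.
    """
    parent = {f: f for f in families}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        # Families seen only in hits are registered on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for f in parent:
        clusters.setdefault(find(f), set()).add(f)
    return sorted(clusters.values(), key=sorted)


# Hypothetical input: families A, B and C are linked by hits; D is not.
merged = merge_families(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Single linkage is the most permissive choice: one hit between two families is enough to merge them, so the significance threshold applied to the hits controls how aggressive the merging is.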

RevDate: 2021-08-25

Clermont O, Condamine B, Dion S, et al (2021)

The E phylogroup of Escherichia coli is highly diverse and mimics the whole E. coli species population structure.

Environmental microbiology [Epub ahead of print].

To get a global picture of the population structure of the Escherichia coli phylogroup E, encompassing the O157:H7 EHEC lineage, we analysed the whole genome of 144 strains isolated from various continents, hosts and life styles and representative of the phylogroup diversity. The strains possess 4331 to 5440 genes with a core genome of 2771 genes and a pangenome of 33,722 genes. The distribution of these genes among the strains shows an asymmetric U-shaped distribution. E phylogenetic strains have the largest genomes of the species, partly explained by the presence of mobile genetic elements. Sixty-eight lineages were delineated, some of them exhibiting extra-intestinal virulence genes and being virulent in the mouse sepsis model. Except for the EHEC lineages and the reference EPEC, EIEC and ETEC strains, very few strains possess intestinal virulence genes. Most of the strains were devoid of acquired resistance genes, but eight strains possessed extended-spectrum beta-lactamase genes. Human strains belong to specific lineages, some of them being virulent and antibiotic resistant (ST complexes (STc) 350 and 2064). The E phylogroup mimics all the features of the species as a whole, a phenomenon already observed at the STc level, arguing for a fractal population structure of E. coli. This article is protected by copyright. All rights reserved.

RevDate: 2021-08-23

Lee AHY, Porto WF, de Faria C, et al (2021)

Genomic insights into the diversity, virulence and resistance of Klebsiella pneumoniae extensively drug resistant clinical isolates.

Microbial genomics, 7(8):.

Klebsiella pneumoniae has been implicated in wide-ranging nosocomial outbreaks, causing severe infections without effective treatments due to antibiotic resistance. Here, we performed genome sequencing of 70 extensively drug resistant clinical isolates, collected from Brasília's hospitals (Brazil) between 2010 and 2014. The majority of strains (60 out of 70) belonged to a single clonal complex (CC), CC258, which has become distributed worldwide in the last two decades. Of these CC258 strains, 44 strains were classified as sequence type 11 (ST11) and fell into two distinct clades, but no ST258 strains were found. These 70 strains had a pan-genome size of 10 366 genes, with a core-genome size of ~4476 genes found in 95 % of isolates. Analysis of sequences revealed diverse mechanisms of resistance, including production of multidrug efflux pumps, enzymes with the same target function but with reduced or no affinity to the drug, and proteins that protected the drug target or inactivated the drug. β-Lactamase production provided the most notable mechanism associated with K. pneumoniae. Each strain presented two or three different β-lactamase enzymes, including class A (SHV, CTX-M and KPC), class B and class C AmpC enzymes, although no class D β-lactamase was identified. Strains carrying the NDM enzyme involved three different ST types, suggesting that there was no common genetic origin.

RevDate: 2021-08-21

Woodhouse MR, Cannon EK, Portwood JL, et al (2021)

A pan-genomic approach to genome databases using maize as a model system.

BMC plant biology, 21(1):385.

Research in the past decade has demonstrated that a single reference genome is not representative of a species' diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.

RevDate: 2021-08-20

Hudec C, Biessy A, Novinscak A, et al (2021)

Comparative Genomics of Potato Common Scab-Causing Streptomyces spp. Displaying Varying Virulence.

Frontiers in microbiology, 12:716522.

Common scab of potato causes important economic losses worldwide following the development of necrotic lesions on tubers. In this study, the genomes of 14 prevalent scab-causing Streptomyces spp. isolated from Prince Edward Island, one of the most important Canadian potato production areas, were sequenced and annotated. Their phylogenomic affiliation was determined, their pan-genome was characterized, and pathogenic determinants involved in their virulence, ranging from weak to aggressive, were compared. 13 out of 14 strains clustered with Streptomyces scabiei, while the last strain clustered with Streptomyces acidiscabies. The toxicogenic and colonization genomic regions were compared, and while some atypical gene organizations were observed, no clear correlation with virulence was observed. The production of the phytotoxin thaxtomin A was also quantified and again, contrary to previous reports in the literature, no clear correlation was found between the amount of thaxtomin A secreted, and the virulence observed. Although no significant differences were observed when comparing the presence/absence of the main virulence factors among the strains of S. scabiei, a distinct profile was observed for S. acidiscabies. Several mutations predicted to affect the functionality of some virulence factors were identified, including one in the bldA gene that correlates with the absence of thaxtomin A production despite the presence of the corresponding biosynthetic gene cluster in S. scabiei LBUM 1485. These novel findings obtained using a large number of scab-causing Streptomyces strains are challenging some assumptions made so far on Streptomyces' virulence and suggest that other factors, yet to be characterized, are also key contributors.

RevDate: 2021-08-19

Vaid RK, Thakur Z, Anand T, et al (2021)

Comparative genome analysis of Salmonella enterica serovar Gallinarum biovars Pullorum and Gallinarum decodes strain specific genes.

PloS one, 16(8):e0255612 pii:PONE-D-21-04830.

Salmonella enterica serovar Gallinarum biovar Pullorum (bvP) and biovar Gallinarum (bvG) are the etiological agents of pullorum disease (PD) and fowl typhoid (FT) respectively, which cause huge economic losses to the poultry industry, especially in developing countries including India. Vaccination and biosecurity measures are currently being employed to control and reduce the S. Gallinarum infections. High endemicity, poor implementation of hygiene and lack of effective vaccines pose challenges in prevention and control of disease in intensively maintained poultry flocks. Comparative genome analysis unravels similarities and dissimilarities, thus facilitating identification of genomic features that aid in pathogenesis, niche adaptation and in tracing of evolutionary history. The present investigation was carried out to assess the genotypic differences amongst S. enterica serovar Gallinarum strains including Indian strain S. Gallinarum Sal40 VTCCBAA614. The comparative genome analysis revealed an open pan-genome consisting of 5091 coding sequences (CDS) with 3270 CDS belonging to core-genome, 1254 CDS to dispensable genome and strain specific genes i.e. singletons ranging from 3 to 102 amongst the analyzed strains. Moreover, the investigated strains exhibited diversity in genomic features such as virulence factors, genomic islands, prophage regions, toxin-antitoxin cassettes, and acquired antimicrobial resistance genes. Core genome identified in the study can give important leads in the direction of design of rapid and reliable diagnostics, and vaccine design for effective infection control as well as eradication. Additionally, the identified genetic differences among the S. enterica serovar Gallinarum strains could be used for bacterial typing, structure based inhibitor development by future experimental investigations on the data generated.

RevDate: 2021-08-19

Simonsen AK (2021)

Environmental stress leads to genome streamlining in a widely distributed species of soil bacteria.

The ISME journal [Epub ahead of print].

Bacteria have highly flexible pangenomes, which are thought to facilitate evolutionary responses to environmental change, but the impacts of environmental stress on pangenome evolution remain unclear. Using a landscape pangenomics approach, I demonstrate that environmental stress leads to consistent, continuous reduction in genome content along four environmental stress gradients (acidity, aridity, heat, salinity) in naturally occurring populations of Bradyrhizobium diazoefficiens (widespread soil-dwelling plant mutualists). Using gene-level network and duplication functional traits to predict accessory gene distributions across environments, genes predicted to be superfluous are more likely lost in high stress, while genes with multi-functional roles are more likely retained. Genes with higher probabilities of being lost with stress contain significantly higher proportions of codons under strong purifying and positive selection. Gene loss is widespread across the entire genome, with high gene-retention hotspots in close spatial proximity to core genes, suggesting Bradyrhizobium has evolved to cluster essential-function genes (accessory genes with multifunctional roles and core genes) in discrete genomic regions, which may stabilise viability during genomic decay. In conclusion, pangenome evolution through genome streamlining is an important evolutionary response to environmental change. This raises questions about impacts of genome streamlining on the adaptive capacity of bacterial populations facing rapid environmental change.

RevDate: 2021-08-17

Belloso Daza MV, Cortimiglia C, Bassi D, et al (2021)

Genome-based studies indicate that the Enterococcus faecium Clade B strains belong to Enterococcus lactis species and lack of the hospital infection associated markers.

International journal of systematic and evolutionary microbiology, 71(8):.

Enterococcus lactis and the heterotypic synonym Enterococcus xinjiangensis from dairy origin have recently been identified as a novel species based on 16S rRNA gene sequence analysis. Enterococcus faecium type strain NCTC 7171T was used as the reference genome for determining E. lactis and E. faecium to be separate species. However, this taxonomic classification did not consider the diverse lineages of E. faecium, and the double nature of hospital-associated (clade A) and community-associated (clade B) isolates. Here, we investigated the taxonomic relationship among isolates of E. faecium of different origins and E. lactis, using a genome-based approach. Additional to 16S rRNA gene sequence analysis, we estimated the relatedness among strains and species using phylogenomics based on the core pangenome, multilocus sequence typing, the average nucleotide identity and digital DNA-DNA hybridization. Moreover, following the available safety assessment schemes, we evaluated the virulence profile and the ampicillin resistance of E. lactis and E. faecium clade B strains. Our results confirmed the genetic and evolutionary differences between clade A and the intertwined clade B and E. lactis group. We also confirmed the absence in these strains of virulence gene markers IS16, hylEfm and esp and the lack of the PBP5 allelic profile associated with ampicillin resistance. Taken together, our findings support the reassignment of the strains of E. faecium clade B as E. lactis.

RevDate: 2021-08-17

Matteoli FP, Pedrosa-Silva F, Dutra-Silva L, et al (2021)

The global population structure and beta-lactamase repertoire of the opportunistic pathogen Serratia marcescens.

Genomics pii:S0888-7543(21)00316-5 [Epub ahead of print].

Serratia marcescens is a globally spread nosocomial pathogen. This rod-shaped bacterium displays a broad host range and a worldwide geographical distribution. Here we analyze an international collection of this multidrug-resistant, opportunistic pathogen from 35 countries to infer its population structure. We show that S. marcescens comprises 12 lineages; Sm1, Sm4, and Sm10 harbor 78.3% of the environmental strains known. Sm5, Sm6, and Sm7 comprise only human-associated strains, which harbor the smallest pangenomes and genomic fluidity and the lowest levels of core recombination, indicating niche specialization. Sm7 and Sm9 lineages exhibit the most concerning resistome; the blaKPC-2 plasmid is widespread in Sm7, whereas Sm9, also an anthropogenic-exclusive lineage, presents the highest plasmid/lineage size ratio and plasmid-diversity encoding metallo-beta-lactamases comprising blaNDM-1. The heterogeneity of resistance patterns of S. marcescens lineages elucidated herein highlights the relevance of surveillance programs using whole-genome sequencing to provide insight into the molecular epidemiology of carbapenemase producing strains of this species.

RevDate: 2021-08-17

Orsi WD, Magritsch T, Vargas S, et al (2021)

Genome Evolution in Bacteria Isolated from Million-Year-Old Subseafloor Sediment.

mBio [Epub ahead of print].

Beneath the seafloor, microbial life subsists in isolation from the surface world under persistent energy limitation. The nature and extent of genomic evolution in subseafloor microbes have been unknown. Here, we show that the genomes of Thalassospira bacterial populations cultured from million-year-old subseafloor sediments evolve in clonal populations by point mutation, with a relatively low rate of homologous recombination and elevated numbers of pseudogenes. Ratios of nonsynonymous to synonymous substitutions correlate with the accumulation of pseudogenes, consistent with a role for genetic drift in the subseafloor strains but not in type strains of Thalassospira isolated from the surface world. Consistent with this, pangenome analysis reveals that the subseafloor bacterial genomes have a significantly lower number of singleton genes than the type strains, indicating a reduction in recent gene acquisitions. Numerous insertion-deletion events and pseudogenes were present in a flagellar operon of the subseafloor bacteria, indicating that motility is nonessential in these million-year-old subseafloor sediments. This genomic evolution in subseafloor clonal populations coincided with a phenotypic difference: all subseafloor isolates have a lower rate of growth under laboratory conditions than the Thalassospira xiamenensis type strain. Our findings demonstrate that the long-term physical isolation of Thalassospira, in the absence of recombination, has resulted in clonal populations whereby reduced access to novel genetic material from neighbors has resulted in the fixation of new mutations that accumulate in genomes over millions of years. IMPORTANCE The nature and extent of genomic evolution in subseafloor microbial populations subsisting for millions of years below the seafloor are unknown. Subseafloor populations have ultralow metabolic rates that are hypothesized to restrict reproduction and, consequently, the spread of new traits. 
Our findings demonstrate that genomes of cultivated bacterial strains from the genus Thalassospira isolated from million-year-old abyssal sediment exhibit greatly reduced levels of homologous recombination, elevated numbers of pseudogenes, and genome-wide evidence of relaxed purifying selection. These substitutions and pseudogenes are fixed into the population, suggesting that the genome evolution of these bacteria has been dominated by genetic drift. Thus, reduced recombination, stemming from long-term physical isolation, resulted in small clonal populations of Thalassospira that have accumulated mutations in their genomes over millions of years.

RevDate: 2021-08-13

Llamas B, Narzisi G, Schneider V, et al (2019)

A strategy for building and using a human reference pangenome.

F1000Research, 8:1751.

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

RevDate: 2021-08-13

Saco A, Rey-Campos M, Rosani U, et al (2021)

The Evolution and Diversity of Interleukin-17 Highlight an Expansion in Marine Invertebrates and Its Conserved Role in Mucosal Immunity.

Frontiers in immunology, 12:692997.

The interleukin-17 (IL-17) family consists of proinflammatory cytokines conserved during evolution. A comparative genomics approach was applied to examine IL-17 throughout evolution from poriferans to higher vertebrates. Cnidaria was highlighted as the most ancient diverged phylum, and several evolutionary patterns were revealed. Large expansions of the IL-17 repertoire were observed in marine molluscs and echinoderm species. We further studied this expansion in filter-fed Mytilus galloprovincialis, which is a bivalve with a highly effective innate immune system supported by a variable pangenome. We recovered 379 unique IL-17 sequences and 96 receptors from individual genomes that were classified into 23 and 6 isoforms after phylogenetic analyses. Mussel IL-17 isoforms were conserved among individuals and shared between closely related Mytilidae species. Certain isoforms were specifically implicated in the response to a waterborne infection with Vibrio splendidus in mussel gills. The involvement of IL-17 in mucosal immune responses could be conserved in higher vertebrates from these ancestral lineages.

RevDate: 2021-08-13

Zhang X, Liu T, Wang J, et al (2021)

Pan-genome of Raphanus highlights genetic variation and introgression among domesticated, wild and weedy radishes.

Molecular plant pii:S1674-2052(21)00318-X [Epub ahead of print].

Post-polyploid diploidization associated with descending dysploidy and interspecific introgression drives plant genome evolution by unclear mechanisms. Raphanus is an economically and ecologically important Brassiceae genus and model system for studying post-polyploidization genome evolution and introgression. Here, we have sequenced and de novo assembled eleven genomes covering most of the typical sub-species and varieties of domesticated, wild and weedy radishes from East Asia, South Asia, Europe and America. Divergence among the species, sub-species, and South/East Asian types coincided with Quaternary glaciations. A genus-level pan-genome was constructed with family-based, locus-based, and graph-based methods, and whole-genome comparisons revealed genetic variations ranging from single-nucleotide polymorphisms (SNPs) to inversions and translocations of whole ancestral karyotype (AK) blocks. Extensive gene flow occurred between wild, weedy and domesticated radishes. High frequencies of genome reshuffling, biased retention and large-fragment translocation have shaped the genomic diversity. Most variety-specific gene-rich blocks showed large structural variations. Extensive translocation and tandem duplication of dispensable genes were revealed in two large rearrangement-rich islands. Disease resistance genes mostly resided on specific and dispensable loci. Variations causing the loss of function of enzymes modulating gibberellin deactivation were identified and could play an important role in phenotype divergence and adaptive evolution. This study elucidates the genomic evolution underlying post-polyploid diploidization and the genetic improvement of radish crops, biological control of weeds and protection of wild species' germplasms.

RevDate: 2021-08-11

Baker JL (2021)

Complete Genomes of Clade G6 Saccharibacteria Suggest a Divergent Ecological Niche and Lifestyle.

mSphere [Epub ahead of print].

Saccharibacteria (formerly TM7) have reduced genomes and a small cell size and appear to have a parasitic lifestyle dependent on a bacterial host. Although there are at least 6 major clades of Saccharibacteria inhabiting the human oral cavity, complete genomes of oral Saccharibacteria were previously limited to the G1 clade. In this study, nanopore sequencing was used to obtain three complete genome sequences from clade G6. Phylogenetic analysis suggested the presence of at least 3 to 5 distinct species within G6, with two discrete taxa represented by the 3 complete genomes. G6 Saccharibacteria were highly divergent from the more-well-studied clade G1 and had the smallest genomes and lowest GC content of all Saccharibacteria. Pangenome analysis showed that although 97% of shared pan-Saccharibacteria core genes and 89% of G1-specific core genes had putative functions, only 50% of the 244 G6-specific core genes had putative functions, highlighting the novelty of this group. Compared to G1, G6 harbored divergent metabolic pathways. G6 genomes lacked an F1Fo ATPase, the pentose phosphate pathway, and several genes involved in nucleotide metabolism, which were all core genes for G1. G6 genomes were also unique compared to that of G1 in that they encoded d-lactate dehydrogenase, adenylate cyclase, limited glycerolipid metabolism, a homolog to a lipoarabinomannan biosynthesis enzyme, and the means to degrade starch. These differences at key metabolic steps suggest a distinct lifestyle and ecological niche for clade G6, possibly with alternative hosts and/or host dependencies, which would have significant ecological, evolutionary, and likely pathogenic implications. IMPORTANCE Saccharibacteria are ultrasmall parasitic bacteria that are common members of the oral microbiota and have been increasingly linked to disease and inflammation. 
However, the lifestyle and impact on human health of Saccharibacteria remain poorly understood, especially for the clades with no complete genomes (G2 to G6) or cultured isolates (G2 and G4 to G6). Obtaining complete genomes is of particular importance for Saccharibacteria, because they lack many of the "essential" core genes used for determining draft genome completeness, and few references exist outside clade G1. In this study, complete genomes of 3 G6 strains, representing two candidate species, were obtained and analyzed. The G6 genomes were highly divergent from that of G1 and enigmatic, with 50% of the G6 core genes having no putative functions. The significant difference in encoded functional pathways is suggestive of a distinct lifestyle and ecological niche, probably with alternative hosts and/or host dependencies, which would have major implications in ecology, evolution, and pathogenesis.

RevDate: 2021-08-10

Gómez-Sanz E, Haro-Moreno JM, Jensen SO, et al (2021)

The Resistome and Mobilome of Multidrug-Resistant Staphylococcus sciuri C2865 Unveil a Transferable Trimethoprim Resistance Gene, Designated dfrE, Spread Unnoticed.

mSystems [Epub ahead of print].

Methicillin-resistant Staphylococcus sciuri (MRSS) strain C2865 from a stranded dog in Nigeria was trimethoprim (TMP) resistant but lacked formerly described staphylococcal TMP-resistant dihydrofolate reductase genes (dfr). Whole-genome sequencing, comparative genomics, and pan-genome analyses were pursued to unveil the molecular bases for TMP resistance via resistome and mobilome profiling. MRSS C2865 comprised a species subcluster and positioned just above the intraspecies boundary. Lack of species host tropism was observed. S. sciuri exhibited an open pan-genome, while MRSS C2865 harbored the highest number of unique genes (75% associated with mobilome). Within this fraction, we discovered a transferable TMP resistance gene, named dfrE, which confers high-level TMP resistance in Staphylococcus aureus and Escherichia coli. dfrE was located in a novel multidrug resistance mosaic plasmid (pUR2865-34) encompassing adaptive, mobilization, and segregational stability traits. dfrE was formerly denoted as dfr_like in Exiguobacterium spp. from fish farm sediment in China but escaped identification in one macrococcal and diverse staphylococcal genomes in different Asian countries. dfrE shares the highest identity with dfr of soil-related Paenibacillus anaericanus (68%). Data analysis discloses that dfrE has emerged from a single ancestor and places S. sciuri as a plausible donor. C2865 unique fraction additionally enclosed novel chromosomal mobile islands, including a multidrug-resistant pseudo-SCCmec cassette, three apparently functional prophages (Siphoviridae), and an SaPI4-related staphylococcal pathogenicity island. Since dfrE seems not yet common in staphylococcal clinical specimens, our data promote early surveillance and enable molecular diagnosis. We evidence the genome plasticity of S. sciuri and highlight its role as a resourceful reservoir for adaptive traits. 
IMPORTANCE The discovery and surveillance of antimicrobial resistance genes (AMRG) and their mobilization platforms are critical to understand the evolution of bacterial resistance and to restrain further expansion. Limited genomic data are available on Staphylococcus sciuri; regardless, it is considered a reservoir for critical AMRG and mobile elements. We uncover a transferable staphylococcal TMP resistance gene, named dfrE, in a novel mosaic plasmid harboring additional resistance, adaptive, and self-stabilization features. dfrE is present but evaded detection in diverse species from varied sources geographically distant. Our analyses evidence that the dfrE-carrying element has emerged from a single ancestor and position S. sciuri as the donor species for dfrE spread. We also identify novel mobilizable chromosomal islands encompassing AMRG and three unrelated prophages. We prove high intraspecies heterogenicity and genome plasticity for S. sciuri. This work highlights the importance of genome-wide ecological studies to facilitate identification, characterization, and evolution routes of bacteria adaptive features.

RevDate: 2021-08-09

Hily JM, Poulicard N, Kubina J, et al (2021)

Metagenomic analysis of nepoviruses: diversity, evolution and identification of a genome region in members of subgroup A that appears to be important for host range.

Archives of virology [Epub ahead of print].

Data mining and metagenomic analysis of 277 open reading frame sequences of bipartite RNA viruses of the genus Nepovirus, family Secoviridae, were performed, documenting how challenging it can be to unequivocally assign a virus to a particular species, especially those in subgroups A and C, based on some of the currently adopted taxonomic demarcation criteria. This work suggests a possible need for their amendment to accommodate pangenome information. In addition, we revealed a host-dependent structure of arabis mosaic virus (ArMV) populations at a cladistic level and confirmed a phylogeographic structure of grapevine fanleaf virus (GFLV) populations. We also identified new putative recombination events in members of subgroups A, B and C. The evolutionary specificity of some capsid regions of ArMV and GFLV that were described previously and biologically validated as determinants of nematode transmission was circumscribed in silico. Furthermore, a C-terminal segment of the RNA-dependent RNA polymerase of members of subgroup A was predicted to be a putative host range determinant based on statistically supported higher π (substitutions per site) values for GFLV and ArMV isolates infecting Vitis spp. compared with non-Vitis-infecting ArMV isolates. This study illustrates how sequence information obtained via high-throughput sequencing can increase our understanding of mechanisms that modulate virus diversity and evolution and create new opportunities for advancing studies on the biology of economically important plant viruses.
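
The π statistic mentioned in this abstract (substitutions per site) can be illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes pre-aligned, equal-length sequences with no gaps, and the function name and toy sequences are hypothetical. It computes π as the mean number of pairwise differences divided by the alignment length.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, not taken from the study).
toy = ["ACGT", "ACGA", "ACTA"]
pi = nucleotide_diversity(toy)
print(round(pi, 4))
```

In practice such a value would be computed per window along a gene (for example, along the polymerase coding region) and compared between host groups; that windowing step is omitted here.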

RevDate: 2021-08-08

Iqbal S, Vollmers J, HA Janjua (2021)

Genome Mining and Comparative Genome Analysis Revealed Niche-Specific Genome Expansion in Antibacterial Bacillus pumilus Strain SF-4.

Genes, 12(7):.

The present study reports the isolation of Bacillus pumilus (B. pumilus) SF-4, a strain exhibiting antibacterial activity, from field soil. The genome of strain SF-4 was sequenced and analyzed to acquire in-depth genomic-level insight into functional diversity, evolutionary history, and biosynthetic potential. The genome of strain SF-4 harbors 12 Biosynthetic Gene Clusters (BGCs), including four Non-ribosomal peptide synthetases (NRPSs), two terpenes, and one each of Type III polyketide synthases (PKSs), hybrid (NRPS/PKS), lipopeptide, β-lactone, and bacteriocin clusters. Plant growth-promoting genes associated with de-nitrification, iron acquisition, phosphate solubilization, and nitrogen metabolism were also observed in the genome. Furthermore, all the available complete genomes of B. pumilus strains were used to highlight species boundaries and diverse niche adaptation strategies. Phylogenetic analyses revealed local diversification and indicate that strain SF-4 is a sister group to SAFR-032 and 150a. Pan-genome analyses of 12 targeted strains showed regions of genome plasticity which regulate function of these strains and proposed direct strain adaptations to specific habitats. The unique genome pool carries genes mostly associated with the "biosynthesis of secondary metabolites, transport, and catabolism" (Q), "replication, recombination and repair" (L), and "unknown function" (S) clusters of orthologous groups (COG) categories. Moreover, a total of 952 unique genes and 168 exclusively absent genes were prioritized across the 12 genomes, while the newly sequenced B. pumilus SF-4 genome contains 520 accessory, 59 unique, and seven exclusively absent genes. The current study demonstrates genomic differences among 12 B. pumilus strains and offers comprehensive knowledge of the respective genome architecture, which may assist in the agronomic application of this strain in future.

RevDate: 2021-08-05

Surachat K, Deachamag P, Kantachote D, et al (2021)

In silico comparative genomics analysis of Lactiplantibacillus plantarum DW12, a potential gamma-aminobutyric acid (GABA)-producing strain.

Microbiological research, 251:126833 pii:S0944-5013(21)00139-7 [Epub ahead of print].

Gamma-aminobutyric acid (GABA) is an amino acid that plays a major role as a neurotransmitter. It is commonly produced by lactic acid bacteria (LAB) naturally found in fermented food and fruit. Lactiplantibacillus plantarum DW12 is a high potential GABA-producing strain isolated from a fermented beverage. In this study, to highlight its ability to produce GABA, we sequenced the genome of L. plantarum DW12 and then performed comprehensive bioinformatics and meta-analysis to compare the genomic data of previously published genomes. Also, the evolutionary analysis among L. plantarum species was demonstrated using pan-genome analysis against 576 genomes from the database. As a result, the DW12 genome comprises one circular chromosome of 3,217,574 bp. It contains several genes that encode for the production of antimicrobial compounds including plantaricin A, E, F, J, K, and N. The glutamic acid decarboxylase (GAD) operon was found in the DW12 genome, suggesting a high potential for GABA production in this strain. Therefore, L. plantarum DW12 could be a good candidate as a starter culture in the beverage and food industries due to its safety aspects and ability to produce GABA.

RevDate: 2021-08-04

Hufnagel B, Soriano A, Taylor J, et al (2021)

Pangenome of white lupin provides insights into the diversity of the species.

Plant biotechnology journal [Epub ahead of print].

White lupin is an old crop with renewed interest due to the high protein content of its seeds and its high nutritional value. Despite a long domestication history in the Mediterranean basin, modern breeding efforts have been fairly scarce. Recent sequencing of its genome has provided tools for further description of genetic resources but detailed characterization of genomic diversity is still missing. Here, we report the genome sequencing of 39 accessions that were used to establish a white lupin pangenome. We defined 32,068 core genes that are present in all individuals and 14,822 that are absent in some and may represent a gene pool for breeding for improved productivity, grain quality and stress adaptation. We used this new pangenome resource to identify candidate genes for alkaloid synthesis, a key grain quality trait. The white lupin pangenome provides a novel genetic resource to better understand how domestication has shaped the genomic variability within this crop. Thus, this pangenome resource is an important step towards the effective and efficient genetic improvement of white lupin to help meet the rapidly growing demand for plant protein sources for human and animal consumption.
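
The split into core genes (present in all individuals) and genes absent in some can be expressed as a simple presence/absence classification. The following is a minimal sketch under stated assumptions, not the method used in the study: the input is a hypothetical mapping from gene name to the set of accessions carrying it, and the gene and accession names are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def classify_genes(
    presence: Dict[str, Set[str]], accessions: List[str]
) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in every accession) and variable (absent in some)."""
    all_accessions = set(accessions)
    core: List[str] = []
    variable: List[str] = []
    for gene, carriers in presence.items():
        # A gene is core only if every accession in the panel carries it.
        if all_accessions <= carriers:
            core.append(gene)
        else:
            variable.append(gene)
    return sorted(core), sorted(variable)


# Toy presence/absence data (hypothetical, three accessions instead of 39).
presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc3"},
    "geneC": {"acc2"},
    "geneD": {"acc1", "acc2", "acc3"},
}
core, variable = classify_genes(presence, ["acc1", "acc2", "acc3"])
print(core, variable)
```

Applied to the counts reported in the abstract, the two classes would contain 32,068 and 14,822 genes respectively.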

RevDate: 2021-08-06

Maarala AI, Arasalo O, Valenzuela D, et al (2021)

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.

PloS one, 16(8):e0255260.

Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis. Genetic variation between case-controls, ethnic groups, or species can be discovered thoroughly using pan-genomes of such subpopulations. Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important. Despite current space-efficient repetitive sequence compression and indexing methods, the deployed compression methods are often sequential, computationally time-consuming, and do not provide efficient sequence alignment performance on vast collections of genomes such as pan-genomes. For performing rapid analytics with the ever-growing genomics data, data compression and indexing methods have to exploit distributed and parallel computing more efficiently. Instead of strict genome data compression methods, we will focus on the efficient construction of a compressed index for pan-genomes. Compressed hybrid-index enables fast sequence alignments to several genomes at once while shrinking the index size significantly compared to traditional indexes. We propose a scalable distributed compressed hybrid-indexing method for large genomic data sets enabling pan-genome-based sequence search and read alignment capabilities. We show the scalability of our tool, DHPGIndex, by executing experiments in a distributed Apache Spark-based computing cluster comprising 448 cores distributed over 26 nodes. The experiments have been performed both with human and bacterial genomes. DHPGIndex built a BLAST index for n = 250 human pan-genome with an 870:1 compression ratio (CR) in 342 minutes and a Bowtie2 index with 157:1 CR in 397 minutes. For n = 1,000 human pan-genome, the BLAST index was built in 1520 minutes with 532:1 CR and the Bowtie2 index in 1938 minutes with 76:1 CR. Bowtie2 aligned 14.6 GB of paired-end reads to the compressed (n = 1,000) index in 31.7 minutes on a single node. 
Compressing the n = 13,375,031 (488 GB) GenBank database to a BLAST index resulted in a CR of 62:1 in 575 minutes. BLASTing 189,864 CRISPR-Cas9 gRNA target sequences (23 MB in total) to the compressed index of human pan-genome (n = 1,000) finished in 45 minutes on a single node. 30 MB of mixed bacterial sequences (n = 599) were blasted to the compressed index of the 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. 78 MB of mixed sequences (n = 4,167) were blasted to the compressed index of an 18 GB E. coli sequence database (n = 745,409) in 5.4 minutes on a single node.
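
To make the reported compression ratios concrete, the short sketch below converts a ratio into an approximate index size and an alignment time into throughput. It uses only figures stated in the abstract (488 GB at 62:1, and 14.6 GB of reads aligned in 31.7 minutes). The function names are hypothetical, and the calculation assumes the ratio is simply original size divided by compressed size.

```python
def compressed_size_gb(original_gb: float, ratio: float) -> float:
    """Approximate compressed size, assuming ratio = original size / compressed size."""
    return original_gb / ratio


def throughput_gb_per_min(data_gb: float, minutes: float) -> float:
    """Average amount of data processed per minute."""
    return data_gb / minutes


# GenBank database: 488 GB compressed at 62:1.
genbank_index_gb = compressed_size_gb(488, 62)

# Bowtie2 alignment: 14.6 GB of reads in 31.7 minutes on a single node.
bowtie2_rate = throughput_gb_per_min(14.6, 31.7)

print(f"{genbank_index_gb:.2f} GB index, {bowtie2_rate:.2f} GB/min")
```

Under that assumption the GenBank BLAST index would occupy a little under 8 GB.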

RevDate: 2021-08-03

Awan F, Ali MM, Dong Y, et al (2021)

In Silico Analysis of Potential Outer Membrane Beta-Barrel Proteins in Aeromonas hydrophila Pangenome.

International journal of peptide research and therapeutics [Epub ahead of print].

Outer membrane proteins (OMPs) of Aeromonas hydrophila have a variety of functional roles in virulence and pathogenesis and represent promising targets for vaccine development. The main objective of this study was to develop an in-silico model of beta-barrel OMP present among the valid A. hydrophila pangenomes (n = 22). With a program named the β-barrel Outer Membrane Protein Predictor (BOMP), total beta-barrel OMPs (n = 3127) were predicted across 22 genomes with the estimated median number of 64 per genome. In pangenome analysis, only 32 OMPs were found to be conserved. These beta-barrel OMPs also showed variations among source of isolation, COG and KEGG classes. Among 32 conserved OMPs, a highly antigenic protein was identified by utilizing Vaxijen. With B cell epitope predictions, two fragments of amino acid sequences, i.e. GLTLGAQFTGNNDPQNADRSN (21 mer) and FKPSLAYLRTDVKDNARGIDDTATEY (26 mer), bearing B-cell binding sites were selected. Further, an epitope (12 amino acids: GLTLGAQFTGNN) that complexes to maximum MHC alleles with a higher antigenicity was determined. The analysis of evolutionary forces on the identified OMP sequence and epitope indicated that none of the basic amino acid sites showed significantly different substitution ratios. This conserved protein and epitope will be helpful in developing a vaccine that may be effective against all the A. hydrophila strains. Also, this study provides a theoretical basis for vaccine design against other pathogenic bacteria.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10989-021-10259-z.

RevDate: 2021-07-30

Wang K, Hu H, Tian Y, et al (2021)

The chicken pan-genome reveals gene content variation and a promoter region deletion in IGF2BP1 affecting body size.

Molecular biology and evolution pii:6332014 [Epub ahead of print].

Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified an additional ∼66.5 Mb sequences that are absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, with higher expression levels observed in conserved genes relative to dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based GWAS identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and extra samples. This is the first report of the causal variant underlying the repeatedly reported chicken body size QTL located on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolution history of chicken domestication.

RevDate: 2021-07-30

Hu H, Scheben A, Verpaalen B, et al (2021)

Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation.

Amborella trichopoda (Amborellaceae) is the single living sister species of all other extant flowering plants and only occurs in rain forest habitats on the remote island of New Caledonia. These features make Amborella an important species in which to study genetic variation, including gene presence/absence variants (PAVs). Here, we apply the reference genome based iterative mapping and assembly strategy (Bayer et al., 2020) to assess gene diversity across ten diverse individuals.

RevDate: 2021-08-01

Davidson RM, Benoit JB, Kammlade SM, et al (2021)

Genomic characterization of sporadic isolates of the dominant clone of Mycobacterium abscessus subspecies massiliense.

Scientific reports, 11(1):15336.

Recent studies have characterized a dominant clone (Clone 1) of Mycobacterium abscessus subspecies massiliense (M. massiliense) associated with high prevalence in cystic fibrosis (CF) patients, pulmonary outbreaks in the United States (US) and United Kingdom (UK), and a Brazilian epidemic of skin infections. The prevalence of Clone 1 in non-CF patients in the US and the relationship of sporadic US isolates to outbreak clones are not known. We surveyed a reference US Mycobacteria Laboratory and a US biorepository of CF-associated Mycobacteria isolates for Clone 1. We then compared genomic variation and antimicrobial resistance (AMR) mutations between sporadic non-CF, CF, and outbreak Clone 1 isolates. Among reference lab samples, 57/147 (39%) of patients with M. massiliense had Clone 1, including pulmonary and extrapulmonary infections, compared to 11/64 (17%) in the CF isolate biorepository. Core and pan genome analyses revealed that outbreak isolates had similar numbers of single nucleotide polymorphisms (SNPs) and accessory genes as sporadic US Clone 1 isolates. However, pulmonary outbreak isolates were more likely to have AMR mutations compared to sporadic isolates. Clone 1 isolates are present among non-CF and CF patients across the US, but additional studies will be needed to resolve potential routes of transmission and spread.

RevDate: 2021-07-26

Bayer PE, Scheben A, Golicz AA, et al (2021)

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids.

Plant biotechnology journal [Epub ahead of print].

Plant genomes demonstrate significant presence/absence variation (PAV) within a species, however the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidisation, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

RevDate: 2021-08-07

Hernández-Juárez LE, Camorlinga M, Méndez-Tenorio A, et al (2021)

Analyses of publicly available Hungatella hathewayi genomes revealed genetic distances indicating they belong to more than one species.

Virulence, 12(1):1950-1964.

Hungatella hathewayi has been observed to be a member of the gut microbiome. Unfortunately, little is known about this organism in spite of being associated with human fatalities; it is important to understand virulence mechanisms and epidemiological prospective to cause disease. In this study, a patient with chronic neurologic symptoms presented to the clinic with subsequent isolation of a strain with phenotypic characteristics suggestive of Clostridium difficile. However, whole-genome sequencing found the organism to be H. hathewayi. Analysis including publicly available Hungatella genomes found substantial genomic differences as compared to the type strain, indicating this isolate was not C. difficile. We examined the whole genome of Hungatella species and related genera, using comparative genomics to fully examine species identification and toxin production. Orthogonal phylogenetic analyses using the 16S rRNA gene and entire genome analyses that included genome distance analyses using Genome-to-Genome Distance (GGDC), Average Nucleotide Identity (ANI), and a pan-genome analysis with inclusion of available public genomes determined the speciation to be Hungatella. Two clearly differentiated groups were identified, one including a reference H. hathewayi genome (strain DSM-13479) and a second group that was determined to be H. effluvii, which included our clinical isolate. Also, some genomes reported as H. hathewayi were found to belong to other genera, including Clostridium and Faecalicatena. We show that the Hungatella species have an open pan-genome reflecting high genomic diversity. This study highlights the importance of correctly assigning taxonomic identification, particularly in disease-associated strains, to better understand virulence and therapeutic options.

RevDate: 2021-07-23

Liu Z, Zhao Y, Sossah FL, et al (2021)

Characterization, Pathogenicity, Phylogeny, and Comparative Genomic Analysis of Pseudomonas tolaasii Strains Isolated from Various Mushrooms in China.

Phytopathology [Epub ahead of print].

Since 2016, devastating bacterial blotch affecting the fruiting bodies of Agaricus bisporus, Cordyceps militaris, Flammulina filiformis, and Pleurotus ostreatus in China has caused severe economic losses. We isolated 102 bacterial strains and characterized them polyphasically. We identified the causal agent as Pseudomonas tolaasii and confirmed the pathogenicity of the strains. A host range test further confirmed the pathogen's ability to infect multiple hosts. This is the first report in China of bacterial blotch in C. militaris caused by P. tolaasii. Whole-genome sequences were generated for three strains: Pt11 (6.48 Mb), Pt51 (6.63 Mb), and Pt53 (6.80 Mb), and pangenome analysis was performed with 13 other publicly accessible P. tolaasii genomes to determine their genetic diversity, virulence, antibiotic resistance, and mobile genetic elements. The pangenome of P. tolaasii is open, and many more gene families are likely to emerge with further genome sequencing. Multilocus sequence analysis using the sequences of four common housekeeping genes (glns, gyrB, rpoB, and rpoD) showed high genetic variability among the P. tolaasii strains, with 115 strains clustered into a monophyletic group. The P. tolaasii strains possess various genes for secretion systems, virulence factors, carbohydrate-active enzymes, toxins, secondary metabolites, and antimicrobial resistance genes that are associated with pathogenesis and adapted to different environments. The myriad of insertion sequences, integrons, prophages, and genome islands encoded in the strains may contribute to genome plasticity, virulence, and antibiotic resistance. These findings advance understanding of the determinants of virulence, which can be targeted for the effective control of bacterial blotch disease.

RevDate: 2021-07-21

Bayer PE, Petereit J, Danilevicz MF, et al (2021)

The application of pangenomics and machine learning in genomic selection in plants.

The plant genome [Epub ahead of print].

Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.

RevDate: 2021-07-21

Fiedoruk K, Drewnowska JM, Mahillon J, et al (2021)

Pan-Genome Portrait of Bacillus mycoides Provides Insights into the Species Ecology and Evolution.

Microbiology spectrum [Epub ahead of print].

Bacillus mycoides is poorly known despite its frequent occurrence in a wide variety of environments. To provide direct insight into its ecology and evolutionary history, a comparative investigation of the species pan-genome and the functional gene categorization of 35 isolates obtained from soil samples from northeastern Poland was performed. The pan-genome of these isolates is composed of 20,175 genes and is characterized by a strong predominance of adaptive genes (∼83%), a significant amount of plasmid genes (∼37%), and a great contribution of prophages and insertion sequences. The pan-genome structure and phylodynamic studies had suggested a wide genomic diversity among the isolates, but no correlation between lineages and the bacillus origin was found. Nevertheless, the two B. mycoides populations, one from Białowieża National Park, the last European natural primeval forest with soil classified as organic, and the second from mineral soil samples taken in a farm in Jasienówka, a place with strong anthropogenic pressure, differ significantly in the frequency of genes encoding proteins enabling bacillus adaptation to specific stress conditions and production of a set of compounds, thus facilitating their colonization of various ecological niches. Furthermore, differences in the prevalence of essential stress sigma factors might be an important trail of this process. Due to these numerous adaptive genes, B. mycoides is able to quickly adapt to changing environmental conditions. IMPORTANCE This research allows deeper understanding of the genetic organization of natural bacterial populations, specifically, Bacillus mycoides, a psychrotrophic member of the Bacillus cereus group that is widely distributed worldwide, especially in areas with continental cold climates. These thorough analyses made it possible to describe, for the first time, the B. mycoides pan-genome, phylogenetic relationship within this species, and the mechanisms behind the species ecology and evolutionary history. Our study indicates a set of functional properties and adaptive genes, in particular, those encoding sigma factors, associated with B. mycoides acclimatization to specific ecological niches and changing environmental conditions.

RevDate: 2021-07-24
CmpDate: 2021-07-22

Steidele CE, R Stam (2021)

Multi-omics approach highlights differences between RLP classes in Arabidopsis thaliana.

BMC genomics, 22(1):557.

BACKGROUND: The Leucine rich-repeat (LRR) receptor-like protein (RLP) family is a complex gene family with 57 members in Arabidopsis thaliana. Some members of the RLP family are known to be involved in basal developmental processes, whereas others are involved in defence responses. However, functional data is currently only available for a small subset of RLPs, leaving the remaining ones classified as RLPs of unknown function.

RESULTS: Using publicly available datasets, we annotated RLPs of unknown function as either likely defence-related or likely fulfilling a more basal function in plants. Then, using these categories, we can identify important characteristics that differ between the RLP subclasses. We found that the two classes differ in abundance on both transcriptome and proteome level, physical clustering in the genome and putative interaction partners. However, the classes do not differ in the genetic diversity of their individual members in accessible pan-genome data.

CONCLUSIONS: Our work has several implications for work related to functional studies on RLPs as well as for the understanding of RLP gene family evolution. Using our annotations, we can make suggestions on which RLPs can be identified as potential immune receptors using genetics tools and thereby complement disease studies. The lack of differences in nucleotide diversity between the two RLP subclasses further suggests that non-synonymous diversity of gene sequences alone cannot distinguish defence from developmental genes. By contrast, differences in transcript and protein abundance or clustering at genomic loci might also allow for functional annotations and characterisation in other plant species.

RevDate: 2021-07-27

Wu JJ, Chou HP, Huang JW, et al (2021)

Genomic and biochemical characterization of antifungal compounds produced by Bacillus subtilis PMB102 against Alternaria brassicicola.

Microbiological research, 251:126815 pii:S0944-5013(21)00121-X [Epub ahead of print].

Bacillus subtilis is ubiquitous and capable of producing various metabolites, which make the bacterium a good candidate as a biocontrol agent for managing plant diseases. In this study, a phyllosphere bacterium B. subtilis PMB102 isolated from tomato leaf was found to inhibit the growth of Alternaria brassicicola ABA-31 on PDA and suppress Alternaria leaf spot on Chinese cabbage (Brassica rapa). The genome of PMB102 (Accession no. CP047645) was completely sequenced by Nanopore and Illumina technology to generate a circular chromosome of 4,103,088 bp encoding several gene clusters for synthesizing bioactive compounds. PMB102 and the other B. subtilis strains from different sources were compared in pangenome analysis to identify a suite of conserved genes involved in biocontrol and habitat adaptation. Two predicted gene products, surfactin and fengycin, were extracted from PMB102 culture filtrates and verified by LC-MS/MS. The antifungal activity of fengycin was tested on A. brassicicola ABA-31 in bioautography to inhibit hyphae growth, and in co-culturing assays to elicit the formation of swollen hyphae. Our data revealed that B. subtilis PMB102 suppresses Alternaria leaf spot by the production of antifungal metabolites, and fengycin plays an important role to inhibit the vegetative growth of A. brassicicola ABA-31.

RevDate: 2021-08-06
CmpDate: 2021-08-06

Branford I, Johnson S, Chapwanya A, et al (2021)

Comprehensive Molecular Dissection of Dermatophilus congolensis Genome and First Observation of tet(Z) Tetracycline Resistance.

International journal of molecular sciences, 22(13):.

Dermatophilus congolensis is a bacterial pathogen mostly of ruminant livestock in the tropics/subtropics and certain temperate climate areas. It causes dermatophilosis, a skin disease that threatens food security by lowering animal productivity and compromising animal health and welfare. Since it is a prevalent infection in ruminants, dermatophilosis warrants more research. There is limited understanding of its pathogenicity, and as such, there is no registered vaccine against D. congolensis. To better understand the genomics of D. congolensis, the primary aim of this work was to investigate this bacterium using whole-genome sequencing and bioinformatic analysis. D. congolensis is a high GC member of the Actinobacteria and encodes approximately 2527 genes. It has an open pan-genome, contains many potential virulence factors and secondary metabolites, and encodes at least 23 housekeeping genes associated with antimicrobial susceptibility mechanisms; some isolates also have an acquired antimicrobial resistance gene. Our isolates contain a single CRISPR array Cas type IE with classical 8 Cas genes. Although the isolates originate from the same geographical location, there is some genomic diversity among them. In conclusion, we present the first detailed genomic study on D. congolensis, including the first observation of tet(Z), a tetracycline resistance-conferring gene.

RevDate: 2021-08-03

Basharat Z, Jahanzaib M, N Rahman (2021)

Therapeutic target identification via differential genome analysis of antibiotic resistant Shigella sonnei and inhibitor evaluation against a selected drug target.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 94:105004 pii:S1567-1348(21)00302-6 [Epub ahead of print].

Shigella sonnei has been implicated in bloody diarrhea (accompanied by abdominal pain and fever) and is an emerging pathogen of concern, especially in developing countries. The major means of transmission is the fecal-oral route while sexual transmission has also been reported. In children, the impact might be stunted growth due to life-threatening illness. Resistance has been reported in this species for several types of antibiotics. In this study, we retrieved the antibiotic-resistant labeled whole genome sequences of the species from the PATRIC database and performed a pan-genome analysis to filter out core genes. Antibiotic resistance was studied in the core, accessory and unique genome. Core genes were utilized as seed substance for essentiality analysis and drug candidate assignment. Product of the gene aroG, i.e. chorismate biosynthetic process 3-deoxy-7-phosphoheptulonate synthase enzyme, responsible for aromatic amino acid family biosynthetic process, was taken for further downstream processing. Natural product libraries of flavonoids (n = 178), ZINC database derived inhibitor compounds of the 3-deoxy-7-phosphoheptulonate synthase enzyme (n = 112), and streptomycin compounds (n = 737) were docked to find out potent inhibitors, followed by dynamics simulation of 50 ns each for top compounds. Physicochemical and ADMET profiling of the top compounds was done to analyze their safety for consumption. We propose that the top compounds, Phytoene from the streptomycin library and ZINC000036444158 (synonym: 1,16-bis[(dihydroxyphosphinyl)oxy]hexadecane) from the 3-deoxy-7-phosphoheptulonate synthase inhibitor library of the ZINC database (and used as a control in this study), should be tested in vitro against Shigella sonnei to fully determine their efficacy. This could add to the drying pipeline of potent drug molecules against emerging pathogens.

RevDate: 2021-07-18

Bornowski N, Michel KJ, Hamilton JP, et al (2021)

Genomic variation within the maize stiff-stalk heterotic germplasm pool.

The plant genome [Epub ahead of print].

The stiff-stalk heterotic group in Maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and there are highly conserved haplotypes in released public and ex-Plant Variety Protection inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.

RevDate: 2021-07-16

Verma DK, Chaudhary C, Singh L, et al (2021)

Corrigendum: Isolation and Taxonomic Characterization of Novel Haloarchaeal Isolates From Indian Solar Saltern: A Brief Review on Distribution of Bacteriorhodopsins and V-Type ATPases in Haloarchaea.

Frontiers in microbiology, 12:713942.

[This corrects the article DOI: 10.3389/fmicb.2020.554927.].

RevDate: 2021-07-29

Liao J, Guo X, Weller DL, et al (2021)

Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution.

Nature microbiology, 6(8):1021-1030.

Natural bacterial populations can display enormous genomic diversity, primarily in the form of gene content variation caused by the frequent exchange of DNA with the local environment. However, the ecological drivers of genomic variability and the role of selection remain controversial. Here, we address this gap by developing a nationwide atlas of 1,854 Listeria isolates, collected systematically from soils across the contiguous United States. We found that Listeria was present across a wide range of environmental parameters, being mainly controlled by soil moisture, molybdenum and salinity concentrations. Whole-genome data from 594 representative strains allowed us to decompose Listeria diversity into 12 phylogroups, each with large differences in habitat breadth and endemism. 'Cosmopolitan' phylogroups, prevalent across many different habitats, had more open pangenomes and displayed weaker linkage disequilibrium, reflecting higher rates of gene gain and loss, and allele exchange than phylogroups with narrow habitat ranges. Cosmopolitan phylogroups also had a large fraction of genes affected by positive selection. The effect of positive selection was more pronounced in the phylogroup-specific core genome, suggesting that lineage-specific core genes are important drivers of adaptation. These results indicate that genome flexibility and recombination are the consequence of selection to survive in variable environments.

RevDate: 2021-07-14

Norri T, Cazaux B, Dönges S, et al (2021)

Founder Reconstruction Enables Scalable and Seamless Pangenomic Analysis.

Bioinformatics (Oxford, England) pii:6321452 [Epub ahead of print].

MOTIVATION: Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration to existing workflows remains a challenge.

RESULTS: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit into a founder multiple alignment, that is then indexed using a hybrid scheme that exploits general purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.

AVAILABILITY: Our open access tools and instructions how to reproduce our experiments are available at the following address:

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2021-07-22
CmpDate: 2021-07-21

Lu TY, Human Genome Structural Variation Consortium, MJP Chaisson (2021)

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.

Nature communications, 12(1):4250.

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build a RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

RevDate: 2021-07-15
CmpDate: 2021-07-15

Jain C, Tavakoli N, S Aluru (2021)

A variant selection framework for genome graphs.

Bioinformatics (Oxford, England), 37(Suppl_1):i460-i467.

MOTIVATION: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.

RESULTS: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% SNPs and 73% SVs can be safely excluded from human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.


SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
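
The constraint described in the RESULTS above (no path of length α may lose more than δ variants) can be illustrated on a deliberately simplified case. The sketch below assumes a linear reference carrying only SNPs, so that each dropped SNP counts as exactly one difference; it is a greedy left-to-right scan written for illustration, not the authors' implementation, and the function name `select_variants` is hypothetical.

```python
from collections import deque


def select_variants(positions, alpha, delta):
    """Return the SNP positions that must be retained.

    Simplifying assumptions (not from the paper): the graph is a linear
    reference with SNPs only, and two SNPs share a length-alpha window
    when their positions differ by less than alpha.
    """
    retained = []
    dropped_in_window = deque()  # dropped SNPs within alpha of the current one
    for pos in sorted(positions):
        # Forget dropped SNPs that can no longer share a window with pos.
        while dropped_in_window and pos - dropped_in_window[0] >= alpha:
            dropped_in_window.popleft()
        if len(dropped_in_window) < delta:
            dropped_in_window.append(pos)  # dropping pos keeps the window within delta
        else:
            retained.append(pos)  # dropping pos would create delta + 1 differences
    return retained


# Toy data: two tight groups of SNPs on a linear reference.
print(select_variants([1, 2, 3, 10, 11, 12], alpha=5, delta=2))
```

In this toy run each group of three SNPs keeps one variant, because dropping all three would put three differences inside a single window of length 5.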

RevDate: 2021-08-02

Pedrós-Alió C (2021)

Time travel in microorganisms.

Systematic and applied microbiology, 44(4):126227.

RevDate: 2021-07-12

Nie S, Wang B, Ding H, et al (2021)

Genome assembly of the Chinese maize elite inbred line RP125 and its EMS mutant collection provide new resources for maize genetics research and crop improvement.

The Plant journal : for cell and molecular biology [Epub ahead of print].

Maize is an important crop worldwide, as well as a valuable model with vast genetic diversity. Accurate genome and annotation information for a wide range of inbred lines would provide valuable resources for crop improvement and pan-genome characterization. In this study, we generated a high-quality de novo genome assembly (contig N50 of 15.43 megabases) of the Chinese elite inbred line RP125 using Nanopore long-read sequencing and Hi-C scaffolding, which yielded highly contiguous, chromosome-length scaffolds. Global comparison of the RP125 genome with those of B73, W22, and Mo17 revealed a large number of structural variations. To create new germplasm for maize research and crop improvement, we carried out an EMS mutagenesis screen on RP125. We obtained a total of 5,818 independent M2 families, with 946 mutants showing heritable phenotypes. Taking advantage of the high-quality RP125 genome, we successfully cloned 10 mutants from the EMS library, including the novel kernel mutant qk1 (quekou: 'missing a small part' in Chinese), which exhibited partial loss of endosperm and a starch accumulation defect. QK1 encodes a predicted metal tolerance protein that is specifically required for iron transport. Increased accumulation of iron and ROS as well as ferroptosis-like cell death were detected in endosperm of qk1. Our study provides the community with a high-quality genome sequence and a large collection of mutant germplasm.
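
The contig N50 quoted above is a standard contiguity statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that calculation follows; the contig lengths are made-up toy values, not data from the RP125 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy assembly of seven contigs (arbitrary units), total length 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

For the toy values the two longest contigs already cover half of the total, so the N50 equals the length of the second contig.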

RevDate: 2021-07-28

Noroy C, DF Meyer (2021)

The super repertoire of type IV effectors in the pangenome of Ehrlichia spp. provides insights into host-specificity and pathogenesis.

PLoS computational biology, 17(7):e1008788.

The identification of bacterial effectors is essential to understand how obligatory intracellular bacteria such as Ehrlichia spp. manipulate the host cell for survival and replication. Infection of mammals, including humans, by the intracellular pathogenic bacteria Ehrlichia spp. depends largely on the injection of virulence proteins that hijack host cell processes. Several hypothetical virulence proteins have been identified in Ehrlichia spp., but only one so far has been experimentally shown to translocate into host cells via the type IV secretion system. However, the current challenge is to identify most of the type IV effectors (T4Es) to fully understand their role in Ehrlichia spp. virulence and host adaptation. Here, we predict the T4E repertoires of four sequenced Ehrlichia spp. and four other Anaplasmataceae as comparative models (pathogenic Anaplasma spp. and Wolbachia endosymbiont) using previously developed S4TE 2.0 software. This analysis identified 579 predicted T4Es (228 pT4Es for Ehrlichia spp. only). The effector repertoires of Ehrlichia spp. overlapped, thereby defining a conserved core effectome of 92 predicted effectors shared by all strains. In addition, 69 species-specific T4Es were predicted with non-canonical GC% mostly in gene sparse regions of the genomes and we observed a bias in pT4Es according to host-specificity. We also identified new protein domain combinations, suggesting novel effector functions. This work presenting the predicted effector collection of Ehrlichia spp. can serve as a guide for future functional characterisation of effectors and design of alternative control strategies against these bacteria.

RevDate: 2021-07-13

Cao H, Xu H, Ning C, et al (2021)

Multi-Omics Approach Reveals the Potential Core Vaccine Targets for the Emerging Foodborne Pathogen Campylobacter jejuni.

Frontiers in microbiology, 12:665858.

Campylobacter jejuni is a leading cause of bacterial gastroenteritis in humans around the world. The emergence of bacterial resistance is becoming more serious; therefore, development of new vaccines is considered to be an alternative strategy against drug-resistant pathogens. In this study, we investigated the pangenome of 173 C. jejuni strains and analyzed the phylogenesis and the virulence factor genes. In order to acquire a high-quality pangenome, genomic relatedness was first assessed with average nucleotide identity (ANI) analyses, and an open pangenome of 8,041 gene families was obtained from the genomes with correct taxonomic assignment. Subsequently, the virulence property of the core genome was analyzed and 145 core virulence factor (VF) genes were obtained. Upon functional genomics and immunological analyses, five core VF proteins with high antigenicity were selected as potential core vaccine targets for humans. Furthermore, functional annotations indicated that these proteins are involved in important molecular functions and biological processes, such as adhesion, regulation, and secretion. In addition, transcriptome analysis in human cells and pig intestinal loop proved that these vaccine target genes are important in the virulence of C. jejuni in different hosts. Comprehensive pangenome and relevant animal experiments will facilitate discovering the potential core vaccine targets with improved efficiency in reverse vaccinology. Likewise, this study provided some insights into the genetic polymorphism and phylogeny of C. jejuni and discovered potential vaccine candidates for humans. Prospective development of new vaccines using the targets will be an alternative to the use of antibiotics and prevent the development of multidrug-resistant C. jejuni in humans and even other animals.

RevDate: 2021-07-13

Banerjee R, Chaudhari NM, Lahiri A, et al (2021)

Interplay of Various Evolutionary Modes in Genome Diversification and Adaptive Evolution of the Family Sulfolobaceae.

Frontiers in microbiology, 12:639995.

The Sulfolobaceae family, comprising diverse thermoacidophilic and aerobic sulfur-metabolizing Archaea from various geographical locations, offers an ideal opportunity to infer the evolutionary dynamics across the members of this family. Comparative pan-genomics coupled with evolutionary analyses has revealed asymmetric genome evolution within the Sulfolobaceae family. The trend of genome streamlining followed by periods of differential gene gains resulted in an overall genome expansion in some species of this family, whereas there was reduction in others. Among the core genes, both Sulfolobus islandicus and Saccharolobus solfataricus showed a considerable fraction of positively selected genes and also higher frequencies of gene acquisition. In contrast, Sulfolobus acidocaldarius genomes experienced a substantial amount of gene loss and strong purifying selection, as manifested by relatively lower genome size and higher genome conservation. Central carbohydrate metabolism and sulfur metabolism coevolved with the genome diversification pattern of this archaeal family. The autotrophic CO2 fixation with three significant positively selected enzymes from S. islandicus and S. solfataricus was found to be more imperative than heterotrophic CO2 fixation for Sulfolobaceae. Overall, our analysis provides an insight into the interplay of various genomic adaptation strategies including gene gain-loss, mutation, and selection influencing genome diversification of Sulfolobaceae at various taxonomic levels and geographical locations.

RevDate: 2021-07-27

Begrem S, Jérôme M, Leroi F, et al (2021)

Genomic diversity of Serratia proteamaculans and Serratia liquefaciens predominant in seafood products and spoilage potential analyses.

International journal of food microbiology, 354:109326 pii:S0168-1605(21)00285-3 [Epub ahead of print].

Serratia spp. cause food losses and waste due to spoilage; it is noteworthy that they represent a dominant population in seafood. The main spoilage-associated species comprise S. liquefaciens, S. grimesii, S. proteamaculans and S. quinivorans, also known as S. liquefaciens-like strains. These species are difficult to discriminate since classical 16S rRNA gene-based sequences do not possess sufficient resolution. In this study, a phylogeny based on the short-length luxS gene was able to speciate 47 Serratia isolates from seafood, with S. proteamaculans being the main species from fresh salmon and tuna, cold-smoked salmon, and cooked shrimp while S. liquefaciens was only found in cold-smoked salmon. The genome of the first S. proteamaculans strain isolated from the seafood matrix (CD3406 strain) was sequenced. Pangenome analyses of S. proteamaculans and S. liquefaciens indicated high adaptation potential. Biosynthetic pathways involved in antimicrobial compounds production and in the main seafood spoilage compounds were also identified. The genetic equipment highlighted in this study contributes further insights into the predominance of Serratia in seafood products and their capacity to spoil.

RevDate: 2021-08-01

Wang S, Narsing Rao MP, Wei D, et al (2021)

Complete genome sequencing and comparative genome analysis of the extremely halophilic archaea, Haloterrigena daqingensis.

Biotechnology and applied biochemistry [Epub ahead of print].

In the present study, we report the complete genome sequence of Haloterrigena daqingensis. The genome of H. daqingensis JX313T consisted of a circular chromosome with three plasmids. The genome size and G+C content were estimated to be 3835796 bp and 61.7%, respectively. A total of 4158 genes were predicted with six rRNAs and 45 tRNAs. Metabolic pathway analysis suggests that H. daqingensis JX313T codes for all the genes needed to sustain life in a saline environment. The pan-genome analysis suggests that the number of singleton genes varied between H. daqingensis and other Haloterrigena species. The study not only helps us understand the strategy of H. daqingensis for dealing with high stress, but also provides an overview of its genomic makeup.

RevDate: 2021-07-12

Sanoussi CN, Coscolla M, Ofori-Anyinam B, et al (2021)

Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv.

Microbial genomics, 7(7):.

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered to be monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate strains of the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum) strains, from each other. However, this genome variability and gene content, especially of L5 strains, has not been fully explored and may be important for pathobiology and current approaches for genomic analysis of MTBC strains, including transmission studies. By comparing the genomes of 355 L5 clinical strains (including 3 complete genomes and 352 Illumina whole-genome sequenced isolates) to each other and to H37Rv, we identified multiple genes that were differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split in the L5.3 sub-lineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selected reference genome, and with potential overestimation of recent transmission when using H37Rv as the reference genome. We conclude that full capture of the gene diversity, especially high-resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most whole-genome sequencing data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free genome assembly (de novo) approach may be needed for particular questions.

RevDate: 2021-07-12
CmpDate: 2021-07-12

Sinha D, Sun X, Khare M, et al (2021)

Pangenome analysis and virulence profiling of Streptococcus intermedius.

BMC genomics, 22(1):522.

BACKGROUND: Streptococcus intermedius, a member of the S. anginosus group, is a commensal bacterium present in the normal microbiota of human mucosal surfaces of the oral, gastrointestinal, and urogenital tracts. However, it has been associated with various infections such as liver and brain abscesses, bacteremia, osteo-articular infections, and endocarditis. Since 2005, high throughput genome sequencing methods enabled understanding the genetic landscape and diversity of bacteria as well as their pathogenic role. Here, in order to determine whether specific virulence genes could be related to specific clinical manifestations, we compared the genomes from 27 S. intermedius strains isolated from patients with various types of infections, including 13 that were sequenced in our institute and 14 available in GenBank.

RESULTS: We estimated the theoretical pangenome size to be 4,020 genes, including 1,355 core genes, 1,054 strain-specific genes and 1,611 accessory genes shared by 2 or more strains. The pangenome analysis demonstrated that the genomic diversity of S. intermedius represents an "open" pangenome model. We identified a core virulome of 70 genes and 78 unique virulence markers. The phylogenetic clusters based upon core-genome sequences and SNPs were independent from disease types and sample sources. However, using principal component analysis based on presence/absence of virulence genes, we identified the sda histidine kinase, adhesion protein LAP and capsular polysaccharide biosynthesis protein cps4E as being associated with brain abscess or broncho-pulmonary infection. In contrast, liver and abdominal abscess were associated with presence of the fibronectin binding protein fbp54 and capsular polysaccharide biosynthesis protein cap8D and cpsB.

CONCLUSIONS: Based on the virulence gene content of 27 S. intermedius strains causing various diseases, we identified putative disease-specific genetic profiles discriminating those causing brain abscess or broncho-pulmonary infection from those causing liver and abdominal abscess. These results provide an insight into S. intermedius pathogenesis and highlight putative targets from a diagnostic perspective.
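
The three categories reported in the RESULTS above sum to the stated pangenome size (1,355 + 1,054 + 1,611 = 4,020). A minimal sketch of how such a partition can be derived from per-strain gene sets is given below. It assumes genes have already been clustered into families and that "core" means present in every strain; the strain and gene names are toy placeholders, and `partition_pangenome` is a hypothetical helper rather than the tool used in the study.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory and strain-specific sets.

    genomes maps a strain name to the set of gene families it carries.
    """
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    n_strains = len(genomes)
    core = {g for g, c in counts.items() if c == n_strains}
    # A family seen in exactly one strain is strain-specific,
    # unless that strain is the whole dataset (then it is core).
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique  # in 2 or more strains, not all
    return core, accessory, unique


# Toy input: three strains and five gene families.
toy = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b", "d"},
    "s3": {"a", "e"},
}
core, accessory, unique = partition_pangenome(toy)
print(len(core), len(accessory), len(unique))
```

The three sets are disjoint by construction, so their sizes always add up to the total number of gene families.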

RevDate: 2021-08-07

Liu C, Peng P, Li W, et al (2021)

Deciphering variation of 239 elite japonica rice genomes for whole genome sequences-enabled breeding.

Genomics, 113(5):3083-3091 pii:S0888-7543(21)00280-9 [Epub ahead of print].

Revealing genomic variation of representative and diverse germplasm is the cornerstone of deploying genomics information into genetic improvement programs of species of agricultural importance. Here we report the re-sequencing of 239 japonica rice elites representing the genetic diversity of japonica germplasm in China, Japan and Korea. A total of 4.8 million SNPs and PAV of 35,634 genes were identified. The elites from Japan and Korea are closely related and relatively less diverse than those from China. A japonica rice pan-genome was constructed, and 35 Mb non-redundant novel sequences were identified, from which 1131 novel genes were predicted. Strong selection signals of genomic regions were detected on most of the chromosomes. The heading date genes Hd1 and Hd3a have been artificially selected during the breeding process. The results from this study lay the foundation for future whole genome sequences-enabled breeding in rice and provide a paradigm for other species.

RevDate: 2021-07-06

Rijzaani H, Bayer PE, Rouard M, et al (2021)

The pangenome of banana highlights differences between genera and genomes.

The plant genome [Epub ahead of print].

Banana (Musaceae family) has a complex genetic history and includes a genus Musa with a variety of cultivated clones with edible fruits, Ensete species that are grown for their edible corm, and monospecific Musella whose generic status has been questioned. The most commonly exported banana cultivars belong to Cavendish, a subgroup of Musa triploid cultivars, which is under threat by fungal pathogens, though there are also related species M. balbisiana Colla (B genome), M. textilis Née (T genome), and M. schizocarpa N. W. Simmonds (S genome), along with hybrids of these genomes, which potentially host genes of agronomic interest. Here we present the first cross-genus pangenome of banana, which contains representatives of the Musa and Ensete genera. Clusters based on gene presence-absence variation (PAV) clearly separate Musa and Ensete, while Musa is split further based on species. These results present the first pangenome study across genus boundaries and identify genes that differentiate between Musaceae species, information that may support breeding programs in these crops.

RevDate: 2021-07-26
CmpDate: 2021-07-26

Lovell JT, Bentley NB, Bhattarai G, et al (2021)

Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding.

Nature communications, 12(1):4125.

Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence-absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the 'Pawnee' cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence-absence and functional annotation database among genomes and within the two outbred haplotypes of the 'Lakota' genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.

RevDate: 2021-07-06

Hendrickx APA, Debast S, Pérez-Vázquez M, et al (2021)

A genetic cluster of MDR Enterobacter cloacae complex ST78 harbouring a plasmid containing bla VIM-1 and mcr-9 in the Netherlands.

JAC-antimicrobial resistance, 3(2):dlab046.

Background: Carbapenemases produced by Enterobacterales are often encoded by genes on transferable plasmids and represent a major healthcare problem, especially if the plasmids contain additional antibiotic resistance genes. As part of Dutch national surveillance, 50 medical microbiological laboratories submit their Enterobacterales isolates suspected of carbapenemase production to the National Institute for Public Health and the Environment for characterization. All isolates for which carbapenemase production is confirmed are subjected to next-generation sequencing.

Objectives: To study the molecular characteristics of a genetic cluster of Enterobacter cloacae complex isolates collected in Dutch national surveillance in the period 2015-20 in the Netherlands.

Methods: Short- and long-read genome sequencing was used in combination with MLST and pan-genome MLST (pgMLST) analyses. Automated antimicrobial susceptibility testing (AST), the Etest for meropenem and the broth microdilution test for colistin were performed. The carbapenem inactivation method was used to assess carbapenemase production.

Results: pgMLST revealed that nine E. cloacae complex isolates from three different hospitals in the Netherlands differed by <20 alleles and grouped in a genetic cluster termed EclCluster-013. Seven isolates were submitted by one hospital in 2016-20. EclCluster-013 isolates produced carbapenemase and were from ST78, a globally disseminated lineage. EclCluster-013 isolates harboured a 316 078 bp IncH12 plasmid carrying the bla VIM-1 carbapenemase and the novel mcr-9 colistin resistance gene along with genes encoding resistance to different antibiotic classes. AST showed that EclCluster-013 isolates were MDR, but susceptible to meropenem (<2 mg/L) and colistin (<2 mg/L).

Conclusions: The EclCluster-013 reported here represents an MDR E. cloacae complex ST78 strain containing an IncH12 plasmid carrying both the bla VIM-1 carbapenemase and the mcr-9 colistin resistance gene.
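
The grouping criterion mentioned in the Results above (isolates differing by fewer than 20 alleles) can be sketched as a pairwise allele-difference count followed by single-linkage grouping. The linkage rule, the handling of loci, and all names below are assumptions made for illustration; the abstract does not describe how the surveillance pipeline forms its clusters.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci at which two allele profiles carry different alleles."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of pairs with distance < threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy profiles over four loci (hypothetical isolates and allele numbers).
toy_profiles = {
    "iso1": [1, 1, 1, 1],
    "iso2": [1, 1, 1, 2],
    "iso3": [5, 6, 7, 8],
}
print(single_linkage_clusters(toy_profiles, threshold=2))
```

With real pgMLST data the profiles would span thousands of loci and the threshold would be the 20-allele cutoff quoted in the abstract.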

RevDate: 2021-07-16
CmpDate: 2021-07-16

Cheng C, Zhou W, Dong X, et al (2021)

Genomic Analysis of Delftia tsuruhatensis Strain TR1180 Isolated From A Patient From China With In4-Like Integron-Associated Antimicrobial Resistance.

Frontiers in cellular and infection microbiology, 11:663933.

Delftia tsuruhatensis has become an emerging pathogen in humans. There is scant information on the genomic characteristics of this microorganism. In this study, we determined the complete genome sequence of a clinical D. tsuruhatensis strain, TR1180, isolated from a sputum specimen of a female patient in China in 2019. Phylogenetic and average nucleotide identity analysis demonstrated that TR1180 is a member of D. tsuruhatensis. TR1180 exhibited resistance to β-lactam, aminoglycoside, tetracycline and sulphonamide antibiotics, but was susceptible to phenicols, fluoroquinolones and macrolides. Its genome is a single, circular chromosome measuring 6,711,018 bp in size. Whole-genome analysis identified 17 antibiotic resistance-related genes, which match the antimicrobial susceptibility profile of this strain, as well as 24 potential virulence factors and a number of metal resistance genes. Our data showed that Delftia possessed an open pan-genome and the genes in the core genome contributed to the pathogenicity and resistance of Delftia strains. Comparative genomics analysis of TR1180 with other publicly available genomes of Delftia showed diverse genomic features among these strains. D. tsuruhatensis TR1180 harbored a unique 38-kb genomic island flanked by a pair of 29-bp direct repeats with the insertion of a novel In4-like integron containing most of the specific antibiotic resistance genes within the genome. This study reports the findings of a fully sequenced genome from clinical D. tsuruhatensis, which provide researchers and clinicians with valuable insights into this uncommon species.

RevDate: 2021-07-06

Koeksoy E, Bezuidt OM, Bayer T, et al (2021)

Zetaproteobacteria Pan-Genome Reveals Candidate Gene Cluster for Twisted Stalk Biosynthesis and Export.

Frontiers in microbiology, 12:679409.

Twisted stalks are morphologically unique bacterial extracellular organo-metallic structures containing Fe(III) oxyhydroxides that are produced by microaerophilic Fe(II)-oxidizers belonging to the Betaproteobacteria and Zetaproteobacteria. Understanding the underlying genetic and physiological mechanisms of stalk formation is of great interest based on their potential as novel biogenic nanomaterials and their relevance as putative biomarkers for microbial Fe(II) oxidation on ancient Earth. Despite the recognition of these special biominerals for over 150 years, the genetic foundation for the stalk phenotype has remained unresolved. Here we present a candidate gene cluster for the biosynthesis and secretion of the stalk organic matrix that we identified with a trait-based analysis of a pan-genome comprising 16 Zetaproteobacteria isolate genomes. The "stalk formation in Zetaproteobacteria" (sfz) cluster comprises six genes (sfz1-sfz6), of which sfz1 and sfz2 were predicted with functions in exopolysaccharide synthesis, regulation, and export, sfz4 and sfz6 with functions in cell wall synthesis manipulation and carbohydrate hydrolysis, and sfz3 and sfz5 with unknown functions. The stalk-forming Betaproteobacteria Ferriphaselus R-1 and OYT-1, as well as dread-forming Zetaproteobacteria Mariprofundus aestuarium CP-5 and Mariprofundus ferrinatatus CP-8 contain distant sfz gene homologs, whereas stalk-less Zetaproteobacteria and Betaproteobacteria lack the entire gene cluster. Our pan-genome analysis further revealed a significant enrichment of clusters of orthologous groups (COGs) across all Zetaproteobacteria isolate genomes that are associated with the regulation of a switch between sessile and motile growth controlled by the intracellular signaling molecule c-di-GMP. Potential interactions between stalk-former unique transcription factor genes, sfz genes, and c-di-GMP point toward a c-di-GMP regulated surface attachment function of stalks during sessile growth.

RevDate: 2021-07-06

Farace PD, Irazoqui JM, Morsella CG, et al (2021)

Phylogenomic analysis for Campylobacter fetus ocurring in Argentina.

Veterinary world, 14(5):1165-1179.

Background and Aim: Campylobacter fetus is one of the most important pathogens that severely affects the livestock industry worldwide. C. fetus-mediated bovine genital campylobacteriosis in cattle has been associated with significant economic losses in livestock production in the Pampas region, the most productive area of Argentina. The present study aimed to establish the genomic relationships between C. fetus strains, isolated from the Pampas region, at local and global levels. The study also explored the utility of multi-locus sequence typing (MLST) as a typing technique for C. fetus.

Materials and Methods: For pangenome and phylogenetic analysis, whole genome sequences for 34 C. fetus strains isolated from cattle in Argentina were downloaded from GenBank. A local maximum likelihood (ML) tree was constructed and linked to a Microreact project. In silico analysis based on MLST was used to obtain the sequence type (ST) of each strain. For global phylogenetic analysis, a core genome ML tree was constructed using a genomic dataset of 265 C. fetus strains, isolated from various sources and obtained from 20 countries.

Results: The local core genome phylogenetic tree analysis described the presence of two major clusters (A and B) and one minor cluster (C). The occurrence of 82% of the strains in these three clusters suggested a clonal population structure for C. fetus. The MLST analysis for the local strains revealed that 31 strains were ST4 type and one strain was ST5 type. In addition, a new variant was identified that was assigned a novel ST, ST70. In the present case, ST4 was homogeneously distributed across all the regions and clusters. The global analysis showed that most of the local strains clustered in phylogenetic groups that consisted exclusively of strains isolated from Argentina. Interestingly, three strains showed a close genetic relationship with bovine strains obtained from Uruguay and Brazil. The ST5 strain grouped in a distant cluster, with strains obtained from different sources from various geographic locations worldwide. Two local strains clustered in a phylogenetic group comprising intercontinental Campylobacter fetus venerealis strains.

Conclusion: The results of the study suggested active movement of animals, probably due to economic trade between different regions of the country as well as with neighboring countries. MLST results were partially concordant with phylogenetic analysis. Thus, this method did not qualify as a reliable subtyping method to assess C. fetus diversity in Argentina. The present study provided a basic platform to conduct future research on C. fetus, both at local and international levels.

RevDate: 2021-07-24

Carpi FM, Coman MM, Silvi S, et al (2021)

Comprehensive pan-genome analysis of Lactiplantibacillus plantarum complete genomes.

Journal of applied microbiology [Epub ahead of print].

AIMS: The aim of this work was to refine the taxonomy and the functional characterization of publicly available Lactiplantibacillus plantarum complete genomes through a pan-genome analysis. Particular attention was paid to depicting the probiotic potential of each strain.

METHODS AND RESULTS: Complete genome sequences of 127 L. plantarum strains, without detected anomalies, were downloaded from NCBI. Roary analysis of the L. plantarum pan-genome identified 1436 core, 414 soft core, 1858 shell and 13,203 cloud genes, highlighting the 'open' nature of the L. plantarum pan-genome. Identification and characterization of plasmid content, mobile genetic elements, adaptive immune system and probiotic marker genes (PMGs) revealed unique features across all the L. plantarum strains included in the present study. Considering our updated list of PMGs, we determined that approximately 70% of the PMGs belong to the core/soft-core genome.

CONCLUSIONS: The comparative genomic analysis conducted in this study provides new insights into the genomic content and variability of L. plantarum.

This study provides a comprehensive pan-genome analysis of L. plantarum, including the largest number (N = 127) of complete L. plantarum genomes retrieved from publicly available repositories. Our effort aimed to determine a solid reference panel for the future characterization of newly sequenced L. plantarum strains useful as probiotic supplements.
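The core, soft core, shell and cloud counts reported above are frequency bins over a gene presence/absence matrix. A minimal sketch of that binning follows, assuming Roary's default reporting thresholds (present in at least 99%, 95% and 15% of genomes); the abstract does not state which cut-offs were actually used, and the function names here are illustrative only, not part of Roary.

```python
from collections import Counter

# Assumed thresholds (percent of genomes carrying the gene), following
# Roary's default summary categories. The abstract does not confirm them.
THRESHOLDS = (("core", 99), ("soft core", 95), ("shell", 15))


def classify_gene(n_present: int, n_genomes: int) -> str:
    """Assign a gene to a frequency bin from the number of genomes carrying it."""
    if not 0 <= n_present <= n_genomes or n_genomes == 0:
        raise ValueError("n_present must lie between 0 and n_genomes")
    for label, percent in THRESHOLDS:
        # Integer comparison avoids floating-point edge cases at the boundary.
        if n_present * 100 >= percent * n_genomes:
            return label
    return "cloud"


def summarize(presence_counts: dict, n_genomes: int) -> Counter:
    """Count genes per bin; presence_counts maps gene name -> genomes carrying it."""
    return Counter(classify_gene(n, n_genomes) for n in presence_counts.values())


# Counts reported in the abstract for 127 complete genomes.
reported = {"core": 1436, "soft core": 414, "shell": 1858, "cloud": 13203}
pan_genome_size = sum(reported.values())
core_like_share = (reported["core"] + reported["soft core"]) / pan_genome_size
```

Under these assumed thresholds, with 127 genomes a gene would need to be present in at least 126 of them to count as core. The reported counts sum to 16,911 gene clusters, of which roughly 11% are core or soft core, which is consistent with the 'open' pan-genome the authors describe.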

RevDate: 2021-07-02

Ge T, Jiang H, Tan EH, et al (2021)

Pangenomic Analysis of Dickeya dianthicola Strains Related to the Outbreak of Blackleg and Soft Rot of Potato in USA.

Plant disease [Epub ahead of print].

Dickeya dianthicola has caused an outbreak of blackleg and soft rot of potato in the eastern half of the USA since 2015. To investigate genetic diversity of the pathogen, a comparative analysis was conducted on genomes of D. dianthicola strains. Whole genomes of 16 strains from the USA outbreak were assembled and compared to 16 previously sequenced genomes of D. dianthicola isolated from potato or carnation. Among the 32 strains, eight distinct clades were distinguished based on phylogenomic analysis. The outbreak strains were grouped into three clades, with the majority of the strains in clade I. Clade I strains were unique and homogeneous, suggesting a recent incursion of this strain into potato production from alternative hosts or environmental sources. The pangenome of the 32 strains contained 6693 genes, 3377 of which were core genes. By screening primary protein subunits associated with virulence from all USA strains, we found that many virulence-related gene clusters, such as plant cell wall degrading enzyme genes, flagellar and chemotaxis related genes, two-component regulatory genes, and type I/II/III secretion system genes, were highly conserved, but type IV and type VI secretion system genes varied. The virulent clade I strains encoded two clusters of type IV secretion systems, while clade II and III strains encoded only one cluster. Clade I and II strains encoded one more VgrG/PAAR spike protein than clade III. Thus, we predicted that the presence of additional virulence-related genes may have enabled the unique clade I strain to become the predominant source in the USA outbreak.

RevDate: 2021-07-27

Pintado A, Pérez-Martínez I, Aragón IM, et al (2021)

The Rhizobacterium Pseudomonas alcaligenes AVO110 Induces the Expression of Biofilm-Related Genes in Response to Rosellinia necatrix Exudates.

Microorganisms, 9(7):.

The rhizobacterium Pseudomonas alcaligenes AVO110 exhibits antagonism toward the phytopathogenic fungus Rosellinia necatrix. This strain efficiently colonizes R. necatrix hyphae and is able to feed on their exudates. Here, we report the complete genome sequence of P. alcaligenes AVO110. The phylogeny of all available P. alcaligenes genomes separates environmental isolates, including AVO110, from those obtained from infected human blood and oyster tissues, which cluster together with Pseudomonas otitidis. Core and pan-genome analyses showed that P. alcaligenes strains encode highly heterogenic gene pools, with the AVO110 genome encoding the largest and most exclusive variable region (~1.6 Mb, 1795 genes). The AVO110 singletons include a wide repertoire of genes related to biofilm formation, several of which are transcriptionally modulated by R. necatrix exudates. One of these genes (cmpA) encodes a GGDEF/EAL domain protein specific to Pseudomonas spp. strains isolated primarily from the rhizosphere of diverse plants, but also from soil and water samples. We also show that CmpA has a role in biofilm formation and that the integrity of its EAL domain is involved in this function. This study contributes to a better understanding of the niche-specific adaptations and lifestyles of P. alcaligenes, including the mycophagous behavior of strain AVO110.

RevDate: 2021-07-26
CmpDate: 2021-07-26

Alouane T, Rimbert H, Bormann J, et al (2021)

Comparative Genomics of Eight Fusarium graminearum Strains with Contrasting Aggressiveness Reveals an Expanded Open Pangenome and Extended Effector Content Signatures.

International journal of molecular sciences, 22(12):.

Fusarium graminearum, the primary cause of Fusarium head blight (FHB) in small-grain cereals, demonstrates remarkably variable levels of aggressiveness in its host, producing different infection dynamics and contrasted symptom severity. While the secreted proteins, including effectors, are thought to be one of the essential components of aggressiveness, our knowledge of the intra-species genomic diversity of F. graminearum is still limited. In this work, we sequenced eight European F. graminearum strains of contrasting aggressiveness to characterize their respective genome structure, their gene content and to delineate their specificities. By combining the available sequences of 12 other F. graminearum strains, we outlined a reference pangenome that expands the repertoire of the known genes in the reference PH-1 genome by 32%, including nearly 21,000 non-redundant sequences and gathering a common base of 9250 conserved core-genes. More than 1000 genes with high non-synonymous mutation rates may be under diverse selection, especially regarding the trichothecene biosynthesis gene cluster. About 900 secreted protein clusters (SPCs) have been described. Mostly localized in the fast sub-genome of F. graminearum supposed to evolve rapidly to promote adaptation and rapid responses to the host's infection, these SPCs gather a range of putative proteinaceous effectors systematically found in the core secretome, with the chloroplast and the plant nucleus as the main predicted targets in the host cell. This work describes new knowledge on the intra-species diversity in F. graminearum and emphasizes putative determinants of aggressiveness, providing a wealth of new candidate genes potentially involved in the Fusarium head blight disease.

RevDate: 2021-07-21

Ahmed O, Rossi M, Kovaka S, et al (2021)

Pan-genomic matching statistics for targeted nanopore sequencing.

iScience, 24(6):102696.

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
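For readers unfamiliar with matching statistics: for each position i of a read, the matching statistic is the length of the longest prefix of the read's suffix starting at i that occurs somewhere in the reference. The sketch below computes them by brute force on plain strings to illustrate the definition only. It is not SPUMONI's algorithm, which works on a compressed index in a streaming fashion, and the `keep_read` helper and its `min_mean` cut-off are hypothetical stand-ins for a real target/nontarget decision rule.

```python
def matching_statistics(read: str, reference: str) -> list:
    """Brute-force matching statistics (illustration only, quadratic or worse).

    ms[i] is the length of the longest prefix of read[i:] that occurs
    as a substring of reference.
    """
    ms = []
    for i in range(len(read)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(read) and read[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


def keep_read(read: str, reference: str, min_mean: float) -> bool:
    """Hypothetical decision rule: keep a read whose mean matching
    statistic reaches min_mean; an empty read is never kept."""
    ms = matching_statistics(read, reference)
    if not ms:
        return False
    return sum(ms) / len(ms) >= min_mean


example = matching_statistics("TTAGC", "GATTACA")
```

Long matching statistics indicate that the read shares long exact substrings with the indexed genomes, which is the signal a targeted-sequencing tool can use to decide whether to keep or eject a molecule.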

RevDate: 2021-07-02

Li Y, Wang M, Sun ZZ, et al (2021)

Comparative Genomic Insights Into the Taxonomic Classification, Diversity, and Secondary Metabolic Potentials of Kitasatospora, a Genus Closely Related to Streptomyces.

Frontiers in microbiology, 12:683814.

While the genus Streptomyces (family Streptomycetaceae) has been studied as a model for bacterial secondary metabolism and genetics, its close relatives have been less studied. The genus Kitasatospora is the second largest genus in the family Streptomycetaceae. However, its taxonomic position within the family remains under debate and the secondary metabolic potential remains largely unclear. Here, we performed systematic comparative genomic and phylogenomic analyses of Kitasatospora. Firstly, the three genera within the family Streptomycetaceae (Kitasatospora, Streptomyces, and Streptacidiphilus) showed common genomic features, including high G + C contents, high secondary metabolic potentials, and high recombination frequencies. Secondly, phylogenomic and comparative genomic analyses revealed phylogenetic distinctions and genome content differences among these three genera, supporting Kitasatospora as a separate genus within the family. Lastly, the pan-genome analysis revealed extensive genetic diversity within the genus Kitasatospora, while functional annotation and genome content comparison suggested genomic differentiation among lineages. This study provided new insights into genomic characteristics of the genus Kitasatospora, and also uncovered its previously underestimated and complex secondary metabolism.

RevDate: 2021-07-22
CmpDate: 2021-07-22

Köstlbacher S, Collingro A, Halter T, et al (2021)

Pangenomics reveals alternative environmental lifestyles among chlamydiae.

Nature communications, 12(1):4021.

Chlamydiae are highly successful strictly intracellular bacteria associated with diverse eukaryotic hosts. Here we analyzed metagenome-assembled genomes of the "Genomes from Earth's Microbiomes" initiative from diverse environmental samples, which almost double the known phylogenetic diversity of the phylum and facilitate a highly resolved view at the chlamydial pangenome. Chlamydiae are defined by a relatively large core genome indicative of an intracellular lifestyle, and a highly dynamic accessory genome of environmental lineages. We observe chlamydial lineages that encode enzymes of the reductive tricarboxylic acid cycle and for light-driven ATP synthesis. We show a widespread potential for anaerobic energy generation through pyruvate fermentation or the arginine deiminase pathway, and we add lineages capable of molecular hydrogen production. Genome-informed analysis of environmental distribution revealed lineage-specific niches and a high abundance of chlamydiae in some habitats. Together, our data provide an extended perspective of the variability of chlamydial biology and the ecology of this phylum of intracellular microbes.

RevDate: 2021-07-01

Zhou Q, Mai K, Yang D, et al (2021)

Comparative genomic analysis of Mycoplasma anatis strains.

Genes & genomics [Epub ahead of print].

BACKGROUND: The Gram-negative intracellular bacterium Mycoplasma anatis is a pathogen of respiratory infectious diseases in ducks and has caused significant economic losses in the poultry industry.

OBJECTIVE: This study, as the first report of the structure and function of the pan-genome of Mycoplasma anatis, may provide a valuable genetic basis for many aspects of future research on the pathogens of waterfowl.

METHODS: We sequenced the whole genomes of 15 Mycoplasma anatis isolated from ducks in China. Draft genome sequencing was carried out and whole-genome sequencing was performed by the sequencers of the PacBio Sequel and an IonTorrent Personal Genome Machine (PGM). Then the common genic elements of protein-coding genes, tRNAs, and rRNAs of Mycoplasma anatis genomes were predicted by using the pipeline Prokka v1.13.7. To investigate homologous protein clusters across Mycoplasma anatis genomes, we adopted Roary v3.13.0 to cluster orthologous genes (OGs) based on the following criteria.

RESULTS: We obtained one complete genome and 14 genome sketches. Microbial mobile genetic element analysis revealed the distribution of insertion sequences (IS30, IS3, and IS1634), prophage regions, and CRISPR arrays in the genome of Mycoplasma anatis. Comparative genomic analysis decoded the genetic components and functional classification of the pan-genome of Mycoplasma anatis, which comprised 646 core genes and 231 dispensable genes, 110 of which were strain-specific. Virulence-related gene profiles of Mycoplasma anatis were systematically identified, and the products of these genes included bacterial ABC transporter systems, iron transport proteins, toxins, and secretion systems.

CONCLUSION: A complete virulence-related gene profile of Mycoplasma anatis has been identified; most of the genes are highly conserved in all strains. Sequencing results are relevant to the molecular mechanisms of drug resistance, adaptive evolution of pathogens, population structure, and vaccine development.

RevDate: 2021-06-29

Tláskal V, Pylro VS, Žifčáková L, et al (2021)

Ecological Divergence Within the Enterobacterial Genus Sodalis: From Insect Symbionts to Inhabitants of Decomposing Deadwood.

Frontiers in microbiology, 12:668644.

The bacterial genus Sodalis is represented by insect endosymbionts as well as free-living species. While the former have been studied frequently, the distribution of the latter is not yet clear. Here, we present a description of a free-living strain, Sodalis ligni sp. nov., originating from decomposing deadwood. The favored occurrence of S. ligni in deadwood is confirmed by both 16S rRNA gene distribution and metagenome data. Pangenome analysis of available Sodalis genomes shows at least three groups within the Sodalis genus: deadwood-associated strains, tsetse fly endosymbionts and endosymbionts of other insects. This differentiation is consistent in terms of the gene frequency level, genome similarity and carbohydrate-active enzyme composition of the genomes. Deadwood-associated strains contain genes for active decomposition of biopolymers of plant and fungal origin and can utilize more diverse carbon sources than their symbiotic relatives. Deadwood-associated strains, but not other Sodalis strains, have the genetic potential to fix N2, and the corresponding genes are expressed in deadwood. Nitrogenase genes are located within the genomes of Sodalis, including S. ligni, at multiple loci represented by more gene variants. We show decomposing wood to be a previously undescribed habitat of the genus Sodalis that appears to show striking ecological divergence.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Zhao Y, Chen X, Hu X, et al (2021)

Characterization of a carbapenem-resistant Citrobacter amalonaticus coharbouring bla IMP-4 and qnrs1 genes.

Journal of medical microbiology, 70(6):.

Introduction. Members of the genus Citrobacter are facultative anaerobic Gram-negative bacilli belonging to the Enterobacterales [Janda J Clin Microbiol 1994; 32(8):1850-1854; Arens Clin Microbiol Infect 1997;3(1):53-57]. Formerly, Citrobacter species were occasionally reported as nosocomial pathogens with low virulence [Pepperell Antimicrob Agents Chemother 2002;46(11):3555-60]. Now, they are consistently reported to cause nosocomial infections of the urinary tract, respiratory tract, bone, peritoneum, endocardium, meninges, intestines, bloodstream and central nervous system. Among Citrobacter species, the most common isolates are C. koseri and C. freundii, while C. amalonaticus has seldom been isolated [Janda J Clin Microbiol 1994; 32(8):1850-1854; Marak Infect Dis (Lond) 2017;49(7):532-9]. Further, Citrobacter spp. are usually susceptible to carbapenems, aminoglycosides, tetracyclines and colistin [Marak Infect Dis (Lond) 2017;49(7):532-9].

Hypothesis/Gap Statement. As C. amalonaticus is rare, only one clinical isolate, coharbouring carbapenem resistance gene bla IMP-4 and quinolone resistance gene qnrs1, has been reported.

Aim. To characterize a carbapenem-resistant C. amalonaticus strain from PR China coharbouring bla IMP-4 and qnrs1.

Methodology. Three hundred and forty nonrepetitive carbapenem-resistant Enterobacterales (CRE) strains were collected during 2011-2018. A carbapenem-resistant C. amalonaticus strain was detected and confirmed using a VITEK mass spectrometry-based microbial identification system and 16S rRNA sequencing. Minimum inhibitory concentrations (MICs) for clinical antimicrobials were obtained by the broth microdilution method. Whole-genome sequencing (WGS) was performed for antibiotic resistance gene analysis, and a phylogenetic tree of C. amalonaticus strains was constructed using the Bacterial Pan Genome Analysis (BPGA) tool. The transferability of the resistance plasmid was verified by conjugal transfer.

Results. A rare carbapenem-resistant C. amalonaticus strain (CA71) was recovered from a patient with cerebral obstruction and the sequences of 16S rRNA gene shared more than 99 % similarity with C. amalonaticus CITRO86, FDAARGOS 165. CA71 is resistant to β-lactam, quinolone and aminoglycoside antibiotics, and even imipenem and meropenem (MICs of 2 and 4 mg l-1 respectively), and is only sensitive to polymyxin B and tigecycline. Six antibiotic resistance genes were detected via WGS, including the β-lactam genes bla IMP-4, bla CTX-M-18 and bla Sed1, the quinolone gene qnrs1, and the aminoglycoside genes AAC(3)-VIIIa, AadA24. Interestingly, bla IMP-4 and qnrs1 coexist on an IncN1-type plasmid (pCA71-IMP) and successfully transferred to Escherichia coli J53 via conjugal transfer. Phylogenetic analysis showed that CA71 is most similar to C. amalonaticus strain CJ25 and belongs to the same evolutionary cluster along with seven other strains.

Conclusion. To the best of our knowledge, this is the first report of a carbapenem-resistant C. amalonaticus isolate coharbouring bla IMP-4 and qnrs1.

RevDate: 2021-06-25

Bayer PE, Valliyodan B, Hu H, et al (2021)

Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding.

The plant genome [Epub ahead of print].

The gene content of plants varies between individuals of the same species due to gene presence/absence variation, and selection can alter the frequency of specific genes in a population. Selection during domestication and breeding will modify the genomic landscape, though the nature of these modifications is only understood for specific genes or on a more general level (e.g., by a loss of genetic diversity). Here we have assembled and analyzed a soybean (Glycine spp.) pangenome representing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection, including both wild and cultivated lineages, to assess genomewide changes in gene and allele frequency during domestication and breeding. We identified 3,765 genes that are absent from the Lee reference genome assembly and assessed the presence/absence of all genes across this population. In addition to a loss of genetic diversity, we found a significant reduction in the average number of protein-coding genes per individual during domestication and subsequent breeding, though with some genes and allelic variants increasing in frequency associated with selection for agronomic traits. This analysis provides a genomic perspective of domestication and breeding in this important oilseed crop.

RevDate: 2021-07-03

Shahid F, Zaheer T, Ashraf ST, et al (2021)

Chimeric vaccine designs against Acinetobacter baumannii using pan genome and reverse vaccinology approaches.

Scientific reports, 11(1):13213.

Acinetobacter baumannii (A. baumannii), an opportunistic, gram-negative pathogen, has evoked the interest of the medical community throughout the world because of its ability to cause nosocomial infections, majorly infecting those in intensive care units. It has also drawn the attention of researchers due to its evolving immune evasion strategies and increased drug resistance. The emergence of multi-drug-resistant strains has urged the need to explore novel therapeutic options as an alternative to antibiotics. Due to the upsurge in antibiotic resistance mechanisms exhibited by A. baumannii, the current therapeutic strategies are rendered less effective. The aim of this study is to explore novel therapeutic alternatives against A. baumannii to control the infection. In this study, a computational framework is employed involving pan genomics, subtractive proteomics and reverse vaccinology strategies to identify core promiscuous vaccine candidates. Two chimeric vaccine constructs having B-cell derived T-cell epitopes from the prioritized vaccine candidates APN, AdeK and AdeI have been designed and checked for their possible interactions with host BCR, TLRs and HLA Class I and II Superfamily alleles. These vaccine candidates can be experimentally validated and thus contribute to vaccine development against A. baumannii infections.

RevDate: 2021-06-25

Tenea GN, P Hurtado (2021)

Next-Generation Sequencing for Whole-Genome Characterization of Weissella cibaria UTNGt21O Strain Originated From Wild Solanum quitoense Lam. Fruits: An Atlas of Metabolites With Biotechnological Significance.

Frontiers in microbiology, 12:675002.

The whole genome of Weissella cibaria strain UTNGt21O isolated from wild fruits of Solanum quitoense (naranjilla) shrub was sequenced and annotated. The similarity proportions based on the genus level, as a result of the best hits for the entire contig, were 54.84% with Weissella, 6.45% with Leuconostoc, 3.23% with Lactococcus, and 35.48% no match. The closest genome was W. cibaria SP7 (GCF_004521965.1) with 86.21% average nucleotide identity (ANI) and 3.2% alignment coverage. The genome contains 1,867 protein-coding genes, among which 1,620 were assigned with the EggNOG database. On the basis of the results, 438 proteins were classified with unknown function from which 247 new hypothetical proteins have no match in the nucleotide Basic Local Alignment Search Tool (BLASTN) database. It also contains 78 tRNAs, six copies of 5S rRNA, one copy of 16S rRNA, one copy of 23S rRNA, and one copy of tmRNA. The W. cibaria UTNGt21O strain harbors several genes responsible for carbohydrate metabolism, cellular process, general stress responses, cofactors, and vitamins, conferring probiotic features. A pangenome analysis indicated the presence of various strain-specific genes encoded for proteins responsible for the defense mechanisms as well as gene encoded for enzymes with biotechnological value, such as penicillin acylase and folates; thus, W. cibaria exhibited high genetic diversity. The genome characterization indicated the presence of a putative CRISPR-Cas array and five prophage regions and the absence of acquired antibiotic resistance genes, virulence, and pathogenic factors; thus, UTNGt21O might be considered a safe strain. Besides, the interaction between the peptide extracts from UTNGt21O and Staphylococcus aureus results in cell death caused by the target cell integrity loss and the release of aromatic molecules from the cytoplasm. The results indicated that W. cibaria UTNGt21O can be considered a beneficial strain to be further exploited for developing novel antimicrobials and probiotic products with improved technological characteristics.

RevDate: 2021-06-25

Lawal OU, Barata M, Fraqueza MJ, et al (2021)

Staphylococcus saprophyticus From Clinical and Environmental Origins Have Distinct Biofilm Composition.

Frontiers in microbiology, 12:663768.

Biofilm formation has been shown to be critical to the success of uropathogens. Although Staphylococcus saprophyticus is a common cause of urinary tract infections, its biofilm production capacity, composition, genetic basis, and origin are poorly understood. We investigated biofilm formation in a large and diverse collection of S. saprophyticus (n = 422). Biofilm matrix composition was assessed in representative strains (n = 63) belonging to two main S. saprophyticus lineages (G and S) recovered from human infection, colonization, and food-related environment using biofilm detachment approach. To identify factors that could be associated with biofilm formation and structure variation, we used a pangenome-wide association study approach. Almost all the isolates (91%; n = 384/422) produced biofilm. Among the 63 representative strains, we identified eight biofilm matrix phenotypes, but the most common were composed of protein or protein-extracellular DNA (eDNA)-polysaccharides (38%, 24/63 each). Biofilms containing protein-eDNA-polysaccharides were linked to lineage G and environmental isolates, whereas protein-based biofilms were produced by lineage S and infection isolates (p < 0.05). Putative biofilm-associated genes, namely, aas, atl, ebpS, uafA, sasF, sasD, sdrH, splE, sdrE, sdrC, sraP, and ica genes, were found with different frequencies (3-100%), but there was no correlation between their presence and biofilm production or matrix types. Notably, icaC_1 was ubiquitous in the collection, while icaR was lineage G-associated, and only four strains carried a complete ica gene cluster (icaADBCR) except one that was without icaR. We provided evidence, using a comparative genomic approach, that the complete icaADBCR cluster was acquired multiple times by S. saprophyticus and originated from other coagulase-negative staphylococci. Overall, the composition of S. saprophyticus biofilms was distinct in environmental and clinical isolates, suggesting that modulation of biofilm structure could be a key step in the pathogenicity of these bacteria. Moreover, biofilm production in S. saprophyticus is ica-independent, and the complete icaADBCR was acquired from other staphylococci.

RevDate: 2021-06-24

Zhang S, Amanze C, Sun C, et al (2021)

Evolutionary, genomic, and biogeographic characterization of two novel xenobiotics-degrading strains affiliated with Dechloromonas.

Heliyon, 7(6):e07181.

Xenobiotics are generally known as man-made refractory organic pollutants widely distributed in various environments. For exploring the bioremediation possibility of xenobiotics, two novel xenobiotics-degrading strains affiliated with Azonexaceae were isolated. We report here the phylogenetics, genome, and geo-distribution of a novel and ubiquitous Azonexaceae species that primarily joins in the cometabolic process of some xenobiotics in natural communities. Strains s22 and t15 could be proposed as a novel species within Dechloromonas based on genomic and multi-phylogenetic analysis. Pan-genome analysis showed that the 63 core genes in Dechloromonas include genes for dozens of metabolisms such as nitrogen fixation protein (nifU), nitrogen regulatory protein (glnK), dCTP deaminase, C4-dicarboxylate transporter, and fructose-bisphosphate aldolase. Strains s22 and t15 have the ability to metabolize nitrogen, including nitrogen fixation, NirS-dependent denitrification, and dissimilatory nitrate reduction. Moreover, the novel species possesses the EnvZ-OmpR two-component system for controlling osmotic stress and QseC-QseB system for quorum sensing to rapidly sense environmental changes. It is intriguing that this new species has a series of genes for the biodegradation of some xenobiotics such as azathioprine, 6-Mercaptopurine, trinitrotoluene, chloroalkane, and chloroalkene. Specifically, glutathione S-transferase (GST) and 4-oxalocrotonate tautomerase (praC) in this novel species play important roles in the detoxification metabolism of some xenobiotics like dioxin, trichloroethene, chloroacetyl chloride, benzo[a]pyrene, and aflatoxin B1. Using data from GenBank, DDBJ and EMBL databases, we also demonstrated that members of this novel species were found globally in plants (e.g. rice), guts (e.g. insect), pristine and contaminated regions. Given these data, Dechloromonas sp. strains s22 and t15 take part in the biodegradation of some xenobiotics through key enzymes.

RevDate: 2021-06-22

Sahmi-Bounsiar D, Rolland C, Aherfi S, et al (2021)

Marseilleviruses: An Update in 2021.

Frontiers in microbiology, 12:648731.

The family Marseilleviridae was the second family of giant viruses that was described in 2013, after the family Mimiviridae. Marseillevirus marseillevirus, isolated in 2007 by coculture on Acanthamoeba polyphaga, is the prototype member of this family. Afterward, the worldwide distribution of marseilleviruses was revealed through their isolation from samples of various types and sources. Thus, 62 were isolated from environmental water, one from soil, one from a dipteran, one from mussels, and two from asymptomatic humans, which led to the description of 67 marseillevirus isolates, including 21 by the IHU Méditerranée Infection in France. Recently, five marseillevirus genomes were assembled from deep sea sediment in Norway. Isolated marseilleviruses have ≈250 nm long icosahedral capsids and 348-404 kilobase long mosaic genomes that encode 386-545 predicted proteins. Comparative genomic analyses indicate that the family Marseilleviridae includes five lineages and possesses a pangenome composed of 3,082 clusters of genes. The detection of marseilleviruses in both symptomatic and asymptomatic humans in stool, blood, and lymph nodes, and an up-to-30-day persistence of marseillevirus in rats and mice, raise questions concerning their possible clinical significance that are still under investigation.

RevDate: 2021-06-19

Ruperao P, Thirunavukkarasu N, Gandham P, et al (2021)

Sorghum Pan-Genome Explores the Functional Utility for Genomic-Assisted Breeding to Accelerate the Genetic Gain.

Frontiers in plant science, 12:666342.

Sorghum (Sorghum bicolor L.) is a staple food crop in the arid and rainfed production ecologies. Sorghum plays a critical role in resilient farming and is projected as a smart crop to overcome food and nutritional insecurity in the developing world. The development and characterisation of the sorghum pan-genome will provide insight into genome diversity and functionality, supporting sorghum improvement. We built a sorghum pan-genome using reference genomes as well as 354 genetically diverse sorghum accessions belonging to different races. We explored the structural and functional characteristics of the pan-genome and explain its utility in supporting genetic gain. The newly developed pan-genome has a total of 35,719 genes, a core genome of 16,821 genes and an average of 32,795 genes in each cultivar. The variable genes are enriched with environment-responsive genes and classify the sorghum accessions according to their race. We show that 53% of genes display presence-absence variation, and some of these variable genes are predicted to be functionally associated with drought adaptation traits. Using more than two million SNPs from the pan-genome, association analysis identified 398 SNPs significantly associated with important agronomic traits, of which 92 were in genes. Drought gene expression analysis identified 1,788 genes that are functionally linked to different conditions, of which 79 were absent from the reference genome assembly. This study provides comprehensive genomic diversity resources in sorghum which can be used in genome-assisted crop improvement.
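
As a rough illustration of the core/variable split described in the abstract above, the sketch below partitions genes by presence across accessions and checks that the reported totals are consistent with the stated 53% figure. It assumes that "core" means present in every accession, which the abstract does not state; the function names and the toy presence table are hypothetical and are not taken from the study.

```python
from __future__ import annotations

# Figures reported in the abstract above.
TOTAL_GENES = 35_719
CORE_GENES = 16_821


def variable_fraction(total: int, core: int) -> float:
    """Fraction of pan-genome genes that are not part of the core genome."""
    return (total - core) / total


def split_core_variable(
    presence: dict[str, set[str]], accessions: set[str]
) -> tuple[set[str], set[str]]:
    """Partition genes into core and variable sets.

    Assumption: a gene is core only if it is present in every accession.
    Real pipelines may use a softer threshold instead.
    """
    core = {gene for gene, found_in in presence.items() if found_in >= accessions}
    variable = set(presence) - core
    return core, variable


# Hypothetical toy data: gene -> accessions in which the gene was detected.
toy_accessions = {"acc1", "acc2", "acc3"}
toy_presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc3"},
    "geneC": {"acc2"},
}

toy_core, toy_variable = split_core_variable(toy_presence, toy_accessions)
reported_variable_fraction = variable_fraction(TOTAL_GENES, CORE_GENES)

print(sorted(toy_core), sorted(toy_variable))
print(f"{reported_variable_fraction:.1%}")  # 52.9%, i.e. about 53%
```

With the reported totals, 18,898 of 35,719 genes fall outside the core, which rounds to the 53% quoted for presence-absence variation.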

RevDate: 2021-06-19

Zheng L, Zhu LW, Jing J, et al (2021)

Pan-Genome Analysis of Vibrio cholerae and Vibrio metschnikovii Strains Isolated From Migratory Birds at Dali Nouer Lake in Chifeng, China.

Frontiers in veterinary science, 8:638820.

Migratory birds have only recently been recognized as Vibrio disease vectors, but they may be widespread transporters of Vibrio strains. We isolated Vibrio cholerae (V. cholerae) and Vibrio metschnikovii (V. metschnikovii) strains from migratory bird epidemic samples from 2017 to 2018 and isolated V. metschnikovii from migratory bird feces in 2019, with all bird samples taken from the Inner Mongolia autonomous region of China. To investigate the evolution of these two Vibrio species, we sequenced the genomes of 40 V. cholerae strains and 34 V. metschnikovii strains isolated from the bird samples and compared these genomes with reference strain genomes. The pan-genome of all V. cholerae and V. metschnikovii genomes was large, with strains exhibiting considerable individual differences. A total of 2,130 and 1,352 core genes were identified in the V. cholerae and V. metschnikovii genomes, respectively, while dispensable genes accounted for 16,180 and 9,178 of all genes for the two species, respectively. All V. cholerae strains isolated from the migratory birds that encoded T6SS and hlyA were non-O1/O139 serotypes without the ability to produce CTX. These strains also lacked the ability to produce the TCP fimbriae or the extracellular matrix protein RbmA and could not metabolize trimethylamine oxide (TMAO). Thus, these characteristics render them unlikely to be pandemic-inducing strains. However, a V. metschnikovii isolate encoding the complete T6SS system was isolated for the first time. These data provide new molecular insights into the diversity of V. cholerae and V. metschnikovii isolates recovered from migratory birds.

RevDate: 2021-06-16

N'Guessan A, Brito IL, Serohijos AWR, et al (2021)

Mobile gene sequence evolution within individual human gut microbiomes is better explained by gene-specific than host-specific selective pressures.

Genome biology and evolution pii:6300526 [Epub ahead of print].

Pangenomes, the cumulative set of genes encoded by a population or species, arise from the interplay of horizontal gene transfer, drift, and selection. The balance of these forces in shaping pangenomes has been debated, and studies to date focused on ancient evolutionary time scales have suggested that pangenomes generally confer niche adaptation to their bacterial hosts. To shed light on pangenome evolution on shorter evolutionary time scales, we inferred the selective pressures acting on mobile genes within individual human microbiomes from 176 Fiji islanders. We mapped metagenomic sequence reads to a set of known mobile genes to identify single nucleotide variants (SNVs) and calculated population genetic metrics to infer deviations from a neutral evolutionary model. We found that mobile gene sequence evolution varied more by gene family than by human social attributes, such as household or village. Patterns of mobile gene sequence evolution could be qualitatively recapitulated with a simple evolutionary simulation without the need to invoke adaptive value of mobile genes to either bacterial or human hosts. These results stand in contrast with the apparent adaptive value of pangenomes over longer evolutionary time scales. In general, the most highly mobile genes (i.e. those present in more distinct bacterial host genomes) tend to have higher metagenomic read coverage and an excess of low-frequency SNVs, consistent with their rapid spread across multiple bacterial species in the gut. However, a subset of mobile genes, including those involved in defense mechanisms and secondary metabolism, showed a contrasting signature of intermediate-frequency SNVs, indicating species-specific selective pressures or negative frequency-dependent selection on these genes. Together, our evolutionary models and population genetic data show that gene-specific selective pressures predominate over human or bacterial host-specific pressures during the relatively short time scales of a human lifetime.

RevDate: 2021-06-15

Huang X, Yang X, Shi X, et al (2021)

Whole-genome sequencing analysis of uncommon Shiga toxin-producing Escherichia coli from cattle: Virulence gene profiles, antimicrobial resistance predictions, and identification of novel O-serogroups.

Food microbiology, 99:103821.

Shiga toxin-producing E. coli (STEC) are major foodborne pathogens. While many studies have focused on the "top-7 STEC", little is known about minor serogroups. A total of 284 non-top-7 STEC strains isolated from cattle feces were subjected to whole-genome sequencing (WGS) to determine the serotypes and the presence of virulence genes and antimicrobial resistance (AMR) determinants. Nineteen typeable and three non-typeable serotypes with novel O-antigen loci were identified. Twenty-one AMR genes and point mutations in another six genes that conferred resistance to 10 antimicrobial classes were detected, as well as 46 virulence genes. The distribution of 33 virulence genes and 15 AMR determinants exhibited significant differences among serotypes (p < 0.05). Among all strains, 81.7% (n = 232) and 14.1% (n = 40) carried stx2 and stx1 only, respectively; only 4.2% (n = 12) carried both. Subtypes stx1a, stx1c, stx2a, stx2c, stx2d, and stx2g were identified. Forty-six strains carried eae and stx2a and therefore had the potential to cause severe disease; 47 strains were genetically related to human clinical strains, as inferred from a pan-genome phylogenetic tree. We were able to demonstrate the utility of WGS as a surveillance tool to characterize the novel serotypes, as well as the AMR and virulence profiles of uncommon STEC that could potentially cause human illness.

RevDate: 2021-07-27
CmpDate: 2021-07-27

Sutton G, Fogel GB, Abramson B, et al (2021)

A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes.

F1000Research, 10:286.

Background: Synthetic engineering of bacteria to produce industrial products is a burgeoning field of research and application. In order to optimize genome design, designers need to understand which genes are essential, which are optimal for growth, and locations in the genome that will be tolerated by the organism when inserting engineered cassettes. Methods: We present a pan-genome based method for the identification of core regions in a genome that are strongly conserved at the species level. Results: We show that the core regions determined by our method contain all or almost all essential genes. This demonstrates the accuracy of our method as essential genes should be core genes. We show that we outperform previous methods by this measure. We also explain why there are exceptions to this rule for our method. Conclusions: We assert that synthetic engineers should avoid deleting or inserting into these core regions unless they understand and are manipulating the function of the genes in that region. Similarly, if the designer wishes to streamline the genome, non-core regions and in particular low penetrance genes would be good targets for deletion. Care should be taken to remove entire cassettes with similar penetrance of the genes within cassettes as they may harbor toxin/antitoxin genes which need to be removed in tandem. The bioinformatic approach introduced here saves considerable time and effort relative to knockout studies on single isolates of a given species and captures a broad understanding of the conservation of genes that are core to a species.

RevDate: 2021-07-07

Panibe JP, Wang L, Li J, et al (2021)

Chromosomal-level genome assembly of the semi-dwarf rice Taichung Native 1, an initiator of Green Revolution.

Genomics, 113(4):2656-2674.

Here we report the 409.5 Mb chromosome-level assembly of the first bred semi-dwarf rice, Taichung Native 1 (TN1), which served as the template for the development of the Green Revolution (GR) cultivar IR8 "miracle rice". We sequenced the TN1 genome utilizing multiple platforms and produced PacBio long reads, Illumina paired-end reads, Illumina mate-pair reads and 10x Genomics linked reads. We used a hybrid approach to assemble the 226× coverage of sequences by a combination of de novo and reference-guided approaches. The assembled TN1 genome has an N50 scaffold size of 33.1 Mb with the longest measuring 45.5 Mb. We annotated 37,526 genes, of which 24,102 (64.23%) were assigned Blast2GO annotations. The genome has 4,672 (95.4%) complete BUSCOs and a repeat content of 51.52%. We developed our own method of creating a GR pangenome using the orthologous relationships of the proteins of TN1, IR8, MH63 and IR64, identifying 16,999 core orthologue groups of the Green Revolution. From the pangenome, we identified a set of shared and unique gene ontology terms for the accessory clusters, characterizing TN1, IR8, MH63 and IR64. This TN1 genome assembly and GR pangenome will be a resource for new genomic discoveries about the Green Revolution, and for improving the disease and insect resistance and the yield of rice.


