
Bibliography on: Pangenome

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.


ESP: PubMed Auto Bibliography. Created: 02 Dec 2023 at 01:32


Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
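The three definitions above map directly onto set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene identifiers; the strain and gene names are hypothetical toy data, and real pipelines first have to cluster orthologous genes, which is not shown here.

```python
# Hypothetical toy data: strain name -> set of gene identifiers found in it.
genomes = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene3", "gene5"},
}


def partition_pangenome(genomes):
    """Return (pan, core, flex) gene sets for a dict of strain -> gene set.

    Assumes at least one strain is supplied.
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes seen in any strain
    core = set.intersection(*gene_sets)  # genes seen in every strain
    flex = pan - core                    # genes seen in some, but not all
    return pan, core, flex


pan, core, flex = partition_pangenome(genomes)
print("pan-genome size:", len(pan))
print("core-genome:", sorted(core))
print("flex-genome:", sorted(flex))
```

Adding a fourth strain can only grow or preserve the pan-genome and can only shrink or preserve the core-genome, which is the behaviour the re-sequencing observation describes.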

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2023-12-01

Liu X, Wu Z, Hu T, et al (2023)

Comparative genomic analysis reveals niche adaption of Lactobacillus acidophilus.

Journal of applied microbiology pii:7457760 [Epub ahead of print].

AIMS: Lactobacillus acidophilus has been extensively applied in plentiful probiotic products. Although several studies have been performed to investigate the beneficial characteristics and genome function of L. acidophilus, comparative genomic analysis remains scarce. In this study, we collected 74 L. acidophilus genomes from our gut bacterial genome collection and the public database and conducted a comprehensive comparative genomic analysis.

METHODS AND RESULTS: This study revealed the potential correlation of the genomic diversity and niche adaptation of L. acidophilus from different perspectives. The pan-genome of L. acidophilus was found to be open, with metabolism, information storage and processing genes mainly distributed in the core genome. Phage- and peptidase-associated genes were found in the genome of the specificity of animal-derived strains, which were related to adaptation of animal gut. SNP analysis showed the differences of the utilization of vitamin B12 in cellular of L. acidophilus strains from animal gut and others.

CONCLUSIONS: This work provides new insights for the genomic diversity analysis of Lactobacillus acidophilus and uncovers the ecological adaptation of the specific strains.

RevDate: 2023-12-01

Andreace F, Lechat P, Dufresne Y, et al (2023)

Comparing methods for constructing and representing human pangenome graphs.

Genome biology, 24(1):274.

BACKGROUND: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs.

RESULTS: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb. We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci.

CONCLUSION: This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application.

RevDate: 2023-11-30

Chen J, Liu Y, Liu M, et al (2023)

Pangenome analysis reveals genomic variations associated with domestication traits in broomcorn millet.

Nature genetics [Epub ahead of print].

Broomcorn millet (Panicum miliaceum L.) is an orphan crop with the potential to improve cereal production and quality, and ensure food security. Here we present the genetic variations, population structure and diversity of a diverse worldwide collection of 516 broomcorn millet genomes. Population analysis indicated that the domesticated broomcorn millet originated from its wild progenitor in China. We then constructed a graph-based pangenome of broomcorn millet based on long-read de novo genome assemblies of 32 representative accessions. Our analysis revealed that the structural variations were highly associated with transposable elements, which influenced gene expression when located in the coding or regulatory regions. We also identified 139 loci associated with 31 key domestication and agronomic traits, including candidate genes and superior haplotypes, such as LG1, for panicle architecture. Thus, the study's findings provide foundational resources for developing genomics-assisted breeding programs in broomcorn millet.

RevDate: 2023-11-30

Muhammad SA, Guo J, Noor K, et al (2023)

Pangenomic and immunoinformatics based analysis of Nipah virus revealed CD4[+] and CD8[+] T-Cell epitopes as potential vaccine candidates.

Frontiers in pharmacology, 14:1290436 pii:1290436.

Introduction: Nipah (NiV) is the zoonotic deadly bat-borne virus that causes neurological and respiratory infections which ultimately lead to death. There are 706 infected cases reported up till now especially in Asia, out of which 409 patients died. There is no vaccine and effective treatment available for NiV infections and we have to timely design such strategies as world could not bear another pandemic situation. Methods: In this study, we screened viral proteins of NiV strains based on pangenomics analysis, antigenicity, molecular weight, and sub-cellular localization. The immunoproteomics based approach was used to predict T-cell epitopes of MHC class-I and II as potential vaccine candidates. These epitopes are capable to activate CD4[+], CD8[+], and T-cell dependent B-lymphocytes. Results: The two surface proteins including fusion glycoprotein (F) and attachment glycoprotein (G) are antigenic with molecular weights of 60 kDa and 67 kDa respectively. Three epitopes of F protein (VNYNSEGIA, PNFILVRNT, and IKMIPNVSN) were ranked and selected based on the binding affinity with MHC class-I, and 3 epitopes (VILNKRYYS, ILVRNTLIS, and VKLQETAEK) with MHC-II molecules. Similarly, for G protein, 3 epitopes each for MHC-I (GKYDKVMPY, ILKPKLISY, and KNKIWCISL) and MHC-II (LRNIEKGKY, FLIDRINWI, and FLLKNKIWC) with substantial binding energies were predicted. Based on the physicochemical properties, all these epitopes are non-toxic, hydrophilic, and stable. Conclusion: Our vaccinomics and system-level investigation could help to trigger the host immune system to prevent NiV infection.

RevDate: 2023-11-30

Feng L, Zhang M, Z Fan (2023)

Population genomic analysis of clinical ST15 Klebsiella pneumoniae strains in China.

Frontiers in microbiology, 14:1272173.

ST15 Klebsiella pneumoniae (Kpn) is a growing public health concern in China and worldwide, yet its genomic and evolutionary dynamics in this region remain poorly understood. This study comprehensively elucidates the population genomics of ST15 Kpn in China by analyzing 287 publicly available genomes. The proportion of the genomes increased sharply from 2012 to 2021, and 92.3% of them were collected from the Yangtze River Delta (YRD) region of eastern China. Carbapenemase genes, including OXA-232, KPC-2, and NDM, were detected in 91.6% of the studied genomes, and 69.2% of which were multidrug resistant (MDR) and hypervirulent (hv). Phylogenetic analysis revealed four clades, C1 (KL112, 59.2%), C2 (mainly KL19, 30.7%), C3 (KL48, 0.7%) and C4 (KL24, 9.4%). C1 appeared in 2007 and was OXA-232-producing and hv; C2 and C4 appeared between 2005 and 2007, and both were KPC-2-producing but with different levels of virulence. Transmission clustering detected 86.1% (n = 247) of the enrolled strains were grouped into 55 clusters (2-159 strains) and C1 was more transmissible than others. Plasmid profiling revealed 88 plasmid clusters (PCs) that were highly heterogeneous both between and within clades. 60.2% (n = 53) of the PCs carrying AMR genes and 7 of which also harbored VFs. KPC-2, NDM and OXA-232 were distributed across 14, 4 and 1 PCs, respectively. The MDR-hv strains all carried one of two homologous PCs encoding iucABCD and rmpA2 genes. Pangenome analysis revealed two major coinciding accessory components predominantly located on plasmids. One component, associated with KPC-2, encompassed 15 additional AMR genes, while the other, linked to OXA-232, involved seven more AMR genes. This study provides essential insights into the genomic evolution of the high-risk ST15 CP-Kpn strains in China and warrants rigorous monitoring.

RevDate: 2023-11-29

Wu F, Zhang T, Wu Q, et al (2023)

Complete genome sequence and comparative analysis of a Vibrio vulnificus strain isolated from a clinical patient.

Frontiers in microbiology, 14:1240835.

Vibrio vulnificus is an opportunistic, global pathogen that naturally inhabits sea water and is responsible for most vibriosis-related deaths. We investigated the genetic characteristics of V. vulnificus isolated from the clinical blood culture specimen of a patient with hepatitis B virus cirrhosis in 2018 (named as V. vulnificus VV2018) by whole genome sequencing (WGS). VV2018 belonged to a novel sequencing type 620 (ST620) and comprised two circular chromosomes, containing 4,389 potential coding sequences (CDSs) and 152 RNA genes. The phylogenetic tree of single nucleotide polymorphisms (SNPs) using 26 representative genomes revealed that VV2018 grouped with two other V. vulnificus strains isolated from humans. The pan-genome of V. vulnificus was constructed using 26 representative genomes to elucidate their genetic diversity, evolutionary characteristics, and virulence and antibiotic resistance profiles. The pan-genome analysis revealed that VV2018 shared a total of 3,016 core genes (≥99% presence), including 115 core virulence factors (VFs) and 5 core antibiotic resistance-related genes, and 309 soft core genes (≥95 and <99% presence) with 25 other V. vulnificus strains. The varG gene might account for the cefazolin resistance, and comparative analysis of the genetic context of varG revealed that two genes upstream and downstream of varG were conserved. The glycosylation (pgl) like genes were found in VV2018 compared with Pgl-related proteins in Neisseria that might affect the adherence of the strain in hosts. The comparative analysis of VV2018 would contribute to a better understanding of the virulence and antibiotic resistance profiles of V. vulnificus. Meanwhile much work remains to be done to better understand the function of pgl-like genes in V. vulnificus.
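The presence thresholds quoted in this abstract (core at ≥99%, soft core at ≥95% and <99%) can be illustrated with a short sketch. The function name and the catch-all label "accessory" for everything below 95% are assumptions made for illustration; the abstract does not name the remaining categories or the tool used.

```python
def classify_gene(n_present, n_genomes):
    """Classify a gene by the share of genomes that carry it.

    Thresholds follow the abstract: core >= 99%, soft core >= 95% and < 99%.
    The label "accessory" for anything lower is an assumption.
    Integer arithmetic avoids floating-point edge cases at the cut-offs.
    """
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "soft core"
    return "accessory"


# With the 26 genomes used in the study, the thresholds become whole counts.
for present in (26, 25, 24):
    print(present, "of 26 ->", classify_gene(present, 26))
```

With only 26 genomes, a gene must be present in all 26 to reach 99%, and in exactly 25 to fall in the soft core band, so the percentage thresholds are coarser than they look.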

RevDate: 2023-11-29

Cai X, Peng Y, Yang G, et al (2023)

Populational genomic insights of Paraclostridium bifermentans as an emerging human pathogen.

Frontiers in microbiology, 14:1293206.

Paraclostridium bifermentans (P.b) is an emerging human pathogen that is phylogenomically close to Paeniclostridium sordellii (P.s), while their populational genomic features and virulence capacity remain understudied. Here, we performed comparative genomic analyses of P.b and compared their pan-genomic features and virulence coding profiles to those of P.s. Our results revealed that P.b has a more plastic pangenome, a larger genome size, and a higher GC content than P.s. Interestingly, the P.b and P.s share similar core-genomic functions, but P.b encodes more functions in nutrient metabolism and energy conversion and fewer functions in host defense in their accessory-genomes. The P.b may initiate extracellular infection processes similar to those of P.s and Clostridium perfringens by encoding three toxin homologs (i.e., microbial collagenase, thiol-activated cytolysin, phospholipase C, which are involved in extracellular matrices degradation and membrane damaging) in their core-genomes. However, P.b is less toxic than the P.s by encoding fewer secretion toxins in the core-genome and fewer lethal toxins in the accessory-genome. Notably, P.b carries more toxins genes in their accessory-genomes, particularly those of plasmid origin. Moreover, three within-species and highly conserved plasmid groups, encoding virulence, gene acquisition, and adaptation, were carried by 25-33% of P.b strains and clustered by isolation source rather than geography. This study characterized the pan-genomic virulence features of P.b for the first time, and revealed that P. bifermentans is an emerging pathogen that can threaten human health in many aspects, emphasizing the importance of phenotypic and genomic characterizations of in situ clinical isolates.

RevDate: 2023-11-29

Crosby KC, Rojas M, Sharma P, et al (2023)

Genomic delineation and description of species and within-species lineages in the genus Pantoea.

Frontiers in microbiology, 14:1254999.

As the name of the genus Pantoea ("of all sorts and sources") suggests, this genus includes bacteria with a wide range of provenances, including plants, animals, soils, components of the water cycle, and humans. Some members of the genus are pathogenic to plants, and some are suspected to be opportunistic human pathogens; while others are used as microbial pesticides or show promise in biotechnological applications. During its taxonomic history, the genus and its species have seen many revisions. However, evolutionary and comparative genomics studies have started to provide a solid foundation for a more stable taxonomy. To move further toward this goal, we have built a 2,509-gene core genome tree of 437 public genome sequences representing the currently known diversity of the genus Pantoea. Clades were evaluated for being evolutionarily and ecologically significant by determining bootstrap support, gene content differences, and recent recombination events. These results were then integrated with genome metadata, published literature, descriptions of named species with standing in nomenclature, and circumscriptions of yet-unnamed species clusters, 15 of which we assigned names under the nascent SeqCode. Finally, genome-based circumscriptions and descriptions of each species and each significant genetic lineage within species were uploaded to the LINbase Web server so that newly sequenced genomes of isolates belonging to any of these groups could be precisely and accurately identified.

RevDate: 2023-11-29

Shikov AE, Merkushova AV, Savina IA, et al (2023)

The man, the plant, and the insect: shooting host specificity determinants in Serratia marcescens pangenome.

Frontiers in microbiology, 14:1211999.

INTRODUCTION: Serratia marcescens is most commonly known as an opportunistic pathogen causing nosocomial infections. It, however, was shown to infect a wide range of hosts apart from vertebrates such as insects or plants as well, being either pathogenic or growth-promoting for the latter. Despite being extensively studied in terms of virulence mechanisms during human infections, there has been little evidence of which factors determine S. marcescens host specificity. On that account, we analyzed S. marcescens pangenome to reveal possible specificity factors.

METHODS: We selected 73 high-quality genome assemblies of complete level and reconstructed the respective pangenome and reference phylogeny based on core genes alignment. To find an optimal pipeline, we tested current pangenomic tools and obtained several phylogenetic inferences. The pangenome was rich in its accessory component and was considered open according to the Heaps' law. We then applied the pangenome-wide associating method (pan-GWAS) and predicted positively associated gene clusters attributed to three host groups, namely, humans, insects, and plants.

RESULTS: According to the results, significant factors relating to human infections included transcriptional regulators, lipoproteins, ABC transporters, and membrane proteins. Host preference toward insects, in its turn, was associated with diverse enzymes, such as hydrolases, isochorismatase, and N-acetyltransferase with the latter possibly exerting a neurotoxic effect. Finally, plant infection may be conducted through type VI secretion systems and modulation of plant cell wall synthesis. Interestingly, factors associated with plants also included putative growth-promoting proteins like enzymes performing xenobiotic degradation and releasing ammonium ions. We also identified overrepresented functional annotations within the sets of specificity factors and found that their functional characteristics fell into separate clusters, thus, implying that host adaptation is represented by diverse functional pathways. Finally, we found that mobile genetic elements bore specificity determinants. In particular, prophages were mainly associated with factors related to humans, while genetic islands were associated with insects and plants.

DISCUSSION: In summary, functional enrichments coupled with pangenomic inferences allowed us to hypothesize that the respective host preference is carried out through distinct molecular mechanisms of virulence. To the best of our knowledge, the presented research is the first to identify specific genomic features of S. marcescens assemblies isolated from different hosts at the pangenomic level.
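The methods of the abstract above call a pangenome "open according to the Heaps' law". A minimal sketch of one common formulation follows, assuming pangenome size grows as P(N) = kappa * N^gamma with the number of genomes N, and that gamma > 0 is read as an open pangenome. The input values are synthetic, and the authors' actual fitting procedure is not given in the abstract.

```python
import math


def fit_heaps_exponent(pan_sizes):
    """Fit log P = log kappa + gamma * log N by ordinary least squares.

    pan_sizes[i] is the pangenome size after adding i + 1 genomes.
    Returns (kappa, gamma).
    """
    xs = [math.log(n) for n in range(1, len(pan_sizes) + 1)]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic accumulation curve generated from kappa = 4000, gamma = 0.3.
sizes = [4000 * n ** 0.3 for n in range(1, 11)]
kappa, gamma = fit_heaps_exponent(sizes)
print(f"kappa ~ {kappa:.1f}, gamma ~ {gamma:.3f}")
print("open" if gamma > 0 else "closed")
```

In practice the accumulation curve is averaged over many random orderings of the genomes before fitting, because the curve from a single ordering depends on which genome is added first.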

RevDate: 2023-11-29

Kabata F, D Thaldar (2023)

The human genome as the common heritage of humanity.

Frontiers in genetics, 14:1282515 pii:1282515.

While debate on the international regulation of human genomic research remains unsettled, the Universal Declaration on the Human Genome and Human Rights, 1997 qualifies the human genome as "heritage of humankind" in a symbolic sense. Using document analysis this article assesses whether, how and to what extent the common heritage framework is relevant in regulation of human genomic research. The article traces the history of the Human Genome Project to reveal the international community's race against privatization of the human genome and its resulting qualification as the common heritage of humanity. Further, it reviews the archival records of UNESCO's International Bioethics Committee to discover the rationale for qualifying the human genome as common heritage of humankind. The article finds that the common heritage of mankind framework remains relevant to the application of the human genome at the collective level. However, the framework is at odds with the individual dimension of the human genome based on individual personality rights. The article thus argues that the right to benefit from scientific progress and its applications offers an alternative international regulatory framework for human genomic research.

RevDate: 2023-11-29

Ghaly TM, Rajabal V, Penesyan A, et al (2023)

Functional enrichment of integrons: Facilitators of antimicrobial resistance and niche adaptation.

iScience, 26(11):108301 pii:S2589-0042(23)02378-7.

Integrons are genetic elements, found among diverse bacteria and archaea, that capture and rearrange gene cassettes to rapidly generate genetic diversity and drive adaptation. Despite their broad taxonomic and geographic prevalence, and their role in microbial adaptation, the functions of gene cassettes remain poorly characterized. Here, using a combination of bioinformatic and experimental analyses, we examined the functional diversity of gene cassettes from different environments. We find that cassettes encode diverse antimicrobial resistance (AMR) determinants, including those conferring resistance to antibiotics currently in the developmental pipeline. Further, we find a subset of cassette functions is universally enriched relative to their broader metagenomes. These are largely involved in (a)biotic interactions, including AMR, phage defense, virulence, biodegradation, and stress tolerance. The remainder of functions are sample-specific, suggesting that they confer localised functions relevant to their microenvironment. Together, they comprise functional profiles different from bulk metagenomes, representing niche-adaptive components of the prokaryotic pangenome.

RevDate: 2023-11-29

Yocca AE, Platts A, Alger E, et al (2023)

Blueberry and cranberry pangenomes as a resource for future genetic studies and breeding efforts.

Horticulture research, 10(11):uhad202 pii:uhad202.

Domestication of cranberry and blueberry began in the United States in the early 1800s and 1900s, respectively, and in part owing to their flavors and health-promoting benefits are now cultivated and consumed worldwide. The industry continues to face a wide variety of production challenges (e.g. disease pressures), as well as a demand for higher-yielding cultivars with improved fruit quality characteristics. Unfortunately, molecular tools to help guide breeding efforts for these species have been relatively limited compared with those for other high-value crops. Here, we describe the construction and analysis of the first pangenome for both blueberry and cranberry. Our analysis of these pangenomes revealed both crops exhibit great genetic diversity, including the presence-absence variation of 48.4% genes in highbush blueberry and 47.0% genes in cranberry. Auxiliary genes, those not shared by all cultivars, are significantly enriched with molecular functions associated with disease resistance and the biosynthesis of specialized metabolites, including compounds previously associated with improving fruit quality traits. The discovery of thousands of genes, not present in the previous reference genomes for blueberry and cranberry, will serve as the basis of future research and as potential targets for future breeding efforts. The pangenome, as a multiple-sequence alignment, as well as individual annotated genomes, are publicly available for analysis on the Genome Database for Vaccinium, a curated and integrated web-based relational database. Lastly, the core-gene predictions from the pangenomes will serve useful to develop a community genotyping platform to guide future molecular breeding efforts across the family.

RevDate: 2023-11-29

Jensen MG, Svraka L, Baez E, et al (2023)

Species- and strain-level diversity of Corynebacteria isolated from human facial skin.

BMC microbiology, 23(1):366.

BACKGROUND: Sequencing of the human skin microbiome revealed that Corynebacterium is a ubiquitous and abundant bacterial genus on human skin. Shotgun sequencing further highlighted the microbial "dark matter" of the skin microbiome, consisting of microorganisms, including corynebacterial species that were not cultivated and genome-sequenced so far. In this pilot project, facial human skin swabs of 13 persons were cultivated to selectively obtain corynebacteria. 54 isolates were collected and 15 of these were genome-sequenced and the pan-genome was determined. The strains were biochemically characterized and antibiotic susceptibility testing (AST) was performed.

RESULTS: Among the 15 sequenced strains, nine different corynebacterial species were found, including two so far undescribed species, tentatively named "Corynebacterium vikingii" and "Corynebacterium borealis", for which closed genome sequences were obtained. Strain variability beyond the species level was determined in biochemical tests, such as the variable presence of urease activity and the capacity to ferment different sugars. The ability to grow under anaerobic conditions on solid agar was found to be species-specific. AST revealed resistances to clindamycin in seven strains. A Corynebacterium pseudokroppenstedtii strain showed additional resistance towards beta-lactam and fluoroquinolone antibiotics; a chromosomally located 17 kb gene cluster with five antibiotic resistance genes was found in the closed genome of this strain.

CONCLUSIONS: Taken together, this pilot study identified an astonishing diversity of cutaneous corynebacterial species in a relatively small cohort and determined species- and strain-specific individualities regarding biochemical and resistance profiles. This further emphasizes the need for cultivation-based studies to be able to study these microorganisms in more detail, in particular regarding their host-interacting and, potentially, -beneficial and/or -detrimental properties.

RevDate: 2023-11-28

Williams AN, Ma A, Croxen MA, et al (2023)

Genomic analysis of Streptococcus pneumoniae serogroup 20 isolates in Alberta, Canada from 1993-2019.

Microbial genomics, 9(11):.

In the province of Alberta, Canada, invasive disease caused by Streptococcus pneumoniae serogroup 20 (serotypes 20A/20B) has been increasing in incidence. Here, we characterize provincial invasive serogroup 20 isolates collected from 1993 to 2019 alongside invasive and non-invasive serogroup 20 isolates from the Global Pneumococcal Sequencing (GPS) Project collected from 1998 to 2015. Trends in clinical metadata and geographic location were evaluated, and serogroup 20 isolate genomes were subjected to molecular sequence typing, virulence and antimicrobial resistance factor mining, phylogenetic analysis and pangenome calculation. Two hundred and seventy-four serogroup 20 isolates from Alberta were sequenced, and analysed along with 95 GPS Project genomes. The majority of invasive Alberta serogroup 20 isolates were identified after 2007 in primarily middle-aged adults and typed predominantly as ST235, a sequence type that was rare among GPS Project isolates. Most Alberta isolates carried a full-length whaF capsular gene, suggestive of serotype 20B. All Alberta and GPS Project genomes carried molecular resistance determinants implicated in fluoroquinolone and macrolide resistance, with a few Alberta isolates exhibiting phenotypic resistance to azithromycin, clindamycin, erythromycin, tetracycline and trimethoprim-sulfamethoxazole, as well as non-susceptibility to tigecycline. All isolates carried multiple virulence factors including those involved in adherence, immune modulation and nutrient uptake, as well as exotoxins and exoenzymes. Phylogenetically, Alberta serogroup 20 isolates clustered with predominantly invasive GPS Project isolates from the USA, Israel, Brazil and Nepal. Overall, this study highlights the increasing incidence of invasive S. pneumoniae serogroup 20 disease in Alberta, Canada, and provides insights into the genetic and clinical characteristics of these isolates within a global context.

RevDate: 2023-11-28

Ramsbottom KA, Prakash A, Riverol YP, et al (2023)

A meta-analysis of rice phosphoproteomics data to understand variation in cell signalling across the rice pan-genome.

bioRxiv : the preprint server for biology pii:2023.11.17.567512.

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and link motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica, but mostly absent in Aus type rice varieties - known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualise sites alongside other data on rice proteins e.g. structural models from AlphaFold2, PeptideAtlas and the PRIDE database - enabling visualisation of source evidence, including scores and supporting mass spectra.

RevDate: 2023-11-27

Chen S, Wang P, Kong W, et al (2023)

Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis.

Nature plants [Epub ahead of print].

Tea is one of the world's oldest crops and is cultivated to produce beverages with various flavours. Despite advances in sequencing technologies, the genetic mechanisms underlying key agronomic traits of tea remain unclear. In this study, we present a high-quality pangenome of 22 elite cultivars, representing broad genetic diversity in the species. Our analysis reveals that a recent long terminal repeat burst contributed nearly 20% of gene copies, introducing functional genetic variants that affect phenotypes such as leaf colour. Our graphical pangenome improves the efficiency of genome-wide association studies and allows the identification of key genes controlling bud flush timing. We also identified strong correlations between allelic variants and flavour-related chemistries. These findings deepen our understanding of the genetic basis of tea quality and provide valuable genomic resources to facilitate its genomics-assisted breeding.

RevDate: 2023-11-27

Liu H, Zhao W, Hua W, et al (2023)

Correction: A large-scale population based organelle pan-genomes construction and phylogeny analysis reveal the genetic diversity and the evolutionary origins of chloroplast and mitochondrion in Brassica napus L.

BMC genomics, 24(1):716.

RevDate: 2023-11-27

Edwards D, J Batley (2023)

Teatime for pangenomics.

Nature plants [Epub ahead of print].

RevDate: 2023-11-27

Hong A, Oliva M, Köppl D, et al (2023)

PFP-FM: An Accelerated FM-index.

Research square.

FM-indexes are a crucial data structure in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. [2] proposed parsing genomic data by induced suffix sorting, and showed the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38, and is consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it seems our method accelerates the performance of count over all state-of-the-art methods with a minor increase in the memory. The source code for PFP-FM is available at
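For readers unfamiliar with the baseline this abstract improves on, the following is a minimal sketch of a standard character-based FM-index counting query (backward search). It is not PFP-FM and uses no parsing; it only illustrates why a plain FM-index performs one lookup step per pattern character. The function names are hypothetical, the construction sorts all rotations and so suits toy inputs only, and the text is assumed not to contain the `$` sentinel.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel, sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(text):
    """Return (C, occ, n) for the text, where n includes the sentinel."""
    last = bwt(text)
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in last[:i].
    occ = {ch: [0] * (len(last) + 1) for ch in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(last)


def count(pattern, fm):
    """Count occurrences of pattern, one backward-search step per character."""
    C, occ, n = fm
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


fm = build_fm_index("GATTACAGATTACA")
print(count("GATTACA", fm))
print(count("TT", fm))
print(count("CC", fm))
```

Each loop iteration in `count` touches two positions of the `occ` table that are generally far apart in memory, which is the per-character random access the abstract refers to; word-based variants aim to take fewer such steps by consuming several characters at once.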

RevDate: 2023-11-26

Mackenzie A, Norman M, Gessese M, et al (2023)

Wheat stripe rust resistance locus YR63 is a hot spot for evolution of defence genes - a pangenome discovery.

BMC plant biology, 23(1):590.

BACKGROUND: Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), poses a threat to global wheat production. Deployment of widely effective resistance genes underpins management of this ongoing threat. This study focused on the mapping of stripe rust resistance gene YR63 from a Portuguese hexaploid wheat landrace AUS27955 of the Watkins Collection.

RESULTS: YR63 exhibits resistance to a broad spectrum of Pst races from Australia, Africa, Asia, Europe, Middle East and South America. It was mapped to the short arm of chromosome 7B, between two single nucleotide polymorphic (SNP) markers sunCS_YR63 and sunCS_67, positioned at 0.8 and 3.7 Mb, respectively, in the Chinese Spring genome assembly v2.1. We characterised YR63 locus using an integrated approach engaging targeted genotyping-by-sequencing (tGBS), mutagenesis, resistance gene enrichment and sequencing (MutRenSeq), RNA sequencing (RNASeq) and comparative genomic analysis with tetraploid (Zavitan and Svevo) and hexaploid (Chinese Spring) wheat genome references and 10+ hexaploid wheat genomes. YR63 is positioned at a hot spot enriched with multiple nucleotide-binding and leucine rich repeat (NLR) and kinase domain encoding genes, known widely for defence against pests and diseases in plants and animals. Detection of YR63 within these gene clusters is not possible through short-read sequencing due to high homology between members. However, using the sequence of a NLR member we were successful in detecting a closely linked SNP marker for YR63 and validated on a panel of Australian bread wheat, durum and triticale cultivars.

CONCLUSIONS: This study highlights YR63 as a valuable source for resistance against Pst in Australia and elsewhere. The closely linked SNP marker will facilitate rapid introgression of YR63 into elite cultivars through marker-assisted selection. The bottleneck encountered in this study reinforces the need for long-read sequencing techniques, such as PacBio or Oxford Nanopore, for accurate detection of the underlying resistance gene when it is part of a large gene cluster.

RevDate: 2023-11-25

Carter MQ, Quiñones B, He X, et al (2023)

Genomic and Phenotypic Characterization of Shiga Toxin-Producing Escherichia albertii Strains Isolated from Wild Birds in a Major Agricultural Region in California.

Microorganisms, 11(11): pii:microorganisms11112803.

Escherichia albertii is an emerging foodborne pathogen. To better understand the pathogenesis and health risk of this pathogen, comparative genomics and phenotypic characterization were applied to assess the pathogenicity potential of E. albertii strains isolated from wild birds in a major agricultural region in California. Shiga toxin genes stx2f were present in all avian strains. Pangenome analyses of 20 complete genomes revealed a total of 11,249 genes, of which nearly 80% were accessory genes. Both core gene-based phylogenetic and accessory gene-based relatedness analyses consistently grouped the three stx2f-positive clinical strains with the five avian strains carrying ST7971. Among the three Stx2f-converting prophage integration sites identified, ssrA was the most common one. Besides the locus of enterocyte effacement and type three secretion system, the high pathogenicity island, OI-122, and type six secretion systems were identified. Substantial strain variation in virulence gene repertoire, Shiga toxin production, and cytotoxicity were revealed. Six avian strains exhibited significantly higher cytotoxicity than that of stx2f-positive E. coli, and three of them exhibited a comparable level of cytotoxicity with that of enterohemorrhagic E. coli outbreak strains, suggesting that wild birds could serve as a reservoir of E. albertii strains with great potential to cause severe diseases in humans.

RevDate: 2023-11-25

Xue M, Gao Q, Yan R, et al (2023)

Comparative Genomic Analysis of Shrimp-Pathogenic Vibrio parahaemolyticus LC and Intraspecific Strains with Emphasis on Virulent Factors of Mobile Genetic Elements.

Microorganisms, 11(11): pii:microorganisms11112752.

Vibrio parahaemolyticus exhibits severe pathogenicity in humans and animals worldwide. In this study, genome sequencing and comparative analyses were conducted for in-depth characterization of the virulence factor (VF) repertoire of V. parahaemolyticus strain LC, which presented significant virulence to shrimp Litopenaeus vannamei. Strain LC, harboring two circular chromosomes and three linear plasmids, demonstrated ≥98.14% average nucleotide identities with 31 publicly available V. parahaemolyticus genomes, including 13, 11, and 7 shrimp-, human-, and non-pathogenic strains, respectively. Phylogeny analysis based on dispensable genes of pan-genome clustered 11 out of 14 shrimp-pathogenic strains and 7 out of 11 clinical strains into two distinct clades, indicating the close association between host-specific pathogenicity and accessory genes. The VFDB database revealed that 150 VFs of LC were mainly associated with the secretion system, adherence, antiphagocytosis, chemotaxis, motility, and iron uptake, whereas no homologs of the typical pathogenic genes pirA, pirB, tdh, and trh were detected. Four genes, mshB, wbfT, wbfU, and wbtI, were identified in both types of pathogenic strains but were absent in non-pathogens. Notably, a unique cluster similar to Yen-Tc, which encodes an insecticidal toxin complex, and diverse toxin-antitoxin (TA) systems, were identified on the mobile genetic elements (MGEs) of LC. Conclusively, in addition to the common VFs, various unique MGE-borne VFs, including the Yen-Tc cluster, TA components, and multiple chromosome-encoded chitinase genes, may contribute to the full spectrum of LC virulence. Moreover, V. parahaemolyticus demonstrates host-specific virulence, which potentially drives the origin and spread of pathogenic factors.

RevDate: 2023-11-25

Wang C, Mao L, Bao G, et al (2023)

Pan-Genome Analyses of the Genus Cohnella and Proposal of the Novel Species Cohnella silvisoli sp. nov., Isolated from Forest Soil.

Microorganisms, 11(11): pii:microorganisms11112726.

Two strains, designated NL03-T5[T] and NL03-T5-1, were isolated from a soil sample collected from the Nanling National Forests, Guangdong Province, PR China. The two strains were Gram-stain-positive, aerobic, rod-shaped and had lophotrichous flagellation. Strain NL03-T5[T] could secrete extracellular mucus whereas NL03-T5-1 could not. Phylogenetic analysis based on 16S rRNA gene sequences revealed that the two strains belong to the genus Cohnella, were most closely related to Cohnella lupini LMG 27416[T] (95.9% and 96.1% similarity, respectively), and both showed 94.0% similarity with Cohnella arctica NRRL B-59459[T]. The two strains showed 99.8% 16S rRNA gene sequence similarity between them. The draft genome size of strain NL03-T5[T] was 7.44 Mbp with a DNA G+C content of 49.2 mol%. The average nucleotide identities (ANI) and the digital DNA-DNA hybridization (dDDH) values between NL03-T5[T] and NL03-T5-1 were 99.98% and 100%, respectively, indicating that the two strains belong to the same species. Additionally, the ANI and dDDH values between NL03-T5[T] and C. lupini LMG 27416[T] were 76.1% and 20.4%, respectively. The major cellular fatty acids of strain NL03-T5[T] included anteiso-C15:0 and iso-C16:0. The major polar lipids and predominant respiratory quinone were diphosphatidylglycerol (DPG) and menaquinone-7 (MK-7). Based on phylogenetic analysis, phenotypic and chemotaxonomic characterization, genomic DNA G+C content, and ANI and dDDH values, strains NL03-T5[T] and NL03-T5-1 represent a novel species in the genus Cohnella, for which the name Cohnella silvisoli is proposed. The type strain is NL03-T5[T] (=GDMCC 1.2294[T] = JCM 34999[T]). Furthermore, comparative genomics revealed that the genus Cohnella had an open pan-genome. The pan-genome of 29 Cohnella strains contained 41,356 gene families, and the number of strain-specific genes ranged from 6 to 1649. The results may explain the good adaptability of the Cohnella strains to different habitats at the genetic level.

RevDate: 2023-11-25

Singh G, Singh N, Ellur RK, et al (2023)

Genetic Enhancement for Biotic Stress Resistance in Basmati Rice through Marker-Assisted Backcross Breeding.

International journal of molecular sciences, 24(22): pii:ijms242216081.

Pusa Basmati 1509 (PB1509) is one of the major foreign-exchange-earning varieties of Basmati rice; it is semi-dwarf and early maturing with exceptional cooking quality and strong aroma. However, it is highly susceptible to various biotic stresses including bacterial blight and blast. Therefore, bacterial blight resistance genes, namely, xa13 + Xa21 and Xa38, and fungal blast resistance genes Pi9 + Pib and Pita were incorporated into the genetic background of recurrent parent (RP) PB1509 using donor parents, namely, Pusa Basmati 1718 (PB1718), Pusa 1927 (P1927), Pusa 1929 (P1929) and Tetep, respectively. Foreground selection was carried out with respective gene-linked markers, stringent phenotypic selection for recurrent parent phenotype, early generation background selection with Simple sequence repeat (SSR) markers, and background analysis at advanced generations with Rice Pan Genome Array comprising 80K SNPs. This has led to the development of Near isogenic lines (NILs), namely, Pusa 3037, Pusa 3054, Pusa 3060 and Pusa 3066 carrying genes xa13 + Xa21, Xa38, Pi9 + Pib and Pita with genomic similarity of 98.25%, 98.92%, 97.38% and 97.69%, respectively, as compared to the RP. Based on GGE-biplot analysis, Pusa 3037-1-44-3-164-20-249-2 carrying xa13 + Xa21, Pusa 3054-2-47-7-166-24-261-3 carrying Xa38, Pusa 3060-3-55-17-157-4-124-1 carrying Pi9 + Pib, and Pusa 3066-4-56-20-159-8-174-1 carrying Pita were identified to be relatively stable and better-performing individuals in the tested environments. Intercrossing between the best BC3F1s has led to the generation of Pusa 3122 (xa13 + Xa21 + Xa38), Pusa 3124 (Xa38 + Pi9 + Pib) and Pusa 3123 (Pi9 + Pib + Pita) with agronomy, grain and cooking quality parameters at par with PB1509. Cultivation of such improved varieties will help farmers reduce the cost of cultivation with decreased pesticide use and improve productivity with ensured safety to consumers.

RevDate: 2023-11-25

Zhegalova IV, Vasiluev PA, Flyamer IM, et al (2023)

Trisomies Reorganize Human 3D Genome.

International journal of molecular sciences, 24(22): pii:ijms242216044.

Trisomy is the presence of one extra copy of an entire chromosome or its part in a cell nucleus. In humans, autosomal trisomies are associated with severe developmental abnormalities leading to embryonic lethality, miscarriage or pronounced deviations of various organs and systems at birth. Trisomies are characterized by alterations in gene expression level, not exclusively on the trisomic chromosome, but throughout the genome. Here, we applied the high-throughput chromosome conformation capture technique (Hi-C) to study chromatin 3D structure in human chorion cells carrying either additional chromosome 13 (Patau syndrome) or chromosome 16 and in cultured fibroblasts with extra chromosome 18 (Edwards syndrome). The presence of extra chromosomes results in systematic changes of contact frequencies between small and large chromosomes. Analyzing the behavior of individual chromosomes, we found that a limited number of chromosomes change their contact patterns stochastically in trisomic cells and that it could be associated with lamina-associated domains (LAD) and gene content. For trisomy 13 and 18, but not for trisomy 16, the proportion of compacted loci on a chromosome is correlated with LAD content. We also found that regions of the genome that become more compact in trisomic cells are enriched in housekeeping genes, indicating a possible decrease in chromatin accessibility and transcription level of these genes. These results provide a framework for understanding the mechanisms of pan-genome transcription dysregulation in trisomies in the context of chromatin spatial organization.

RevDate: 2023-11-25

Qian M, Han X, Liu J, et al (2023)

Genomic Insights on the Carbon-Negative Workhorse: Systematical Comparative Genomic Analysis on 56 Synechococcus Strains.

Bioengineering (Basel, Switzerland), 10(11): pii:bioengineering10111329.

Synechococcus, a type of ancient photosynthetic cyanobacteria, is crucial in modern carbon-negative synthetic biology due to its potential for producing bioenergy and high-value products. With its high biomass, fast growth rate, and established genetic manipulation tools, Synechococcus has become a research focus in recent years. Abundant germplasm resources have been accumulated from various habitats, including temperature and salinity conditions relevant to industrialization. In this study, a comprehensive analysis of complete genomes of the 56 Synechococcus strains currently available in public databases was performed, clarifying genetic relationships, the adaptability of Synechococcus to the environment, and its reflection at the genomic level. This was carried out via pan-genome analysis and a detailed comparison of the functional gene groups. The results revealed an open-genome pattern, with 275 core genes and variable genome sizes within these strains. The KEGG annotation and orthology composition comparisons unveiled that the cold and thermophile strains have 32 and 84 unique KO functional units in their shared core gene functional units, respectively. Each KO functional unit reflects unique gene families and pathways. In terms of salt tolerance and comparative genomics, there are 65 unique KO functional units in freshwater-adapted strains and 154 in strictly marine strains. By delving into these aspects, our understanding of the metabolic potential of Synechococcus was deepened, promoting the development and industrial application of cyanobacterial biotechnology.

RevDate: 2023-11-24

Gao G, Zhang H, Ni J, et al (2023)

Insights into genetic diversity and phenotypic variations in domestic geese through comprehensive population and pan-genome analysis.

Journal of animal science and biotechnology, 14(1):150.

BACKGROUND: Domestic goose breeds are descended from either the Swan goose (Anser cygnoides) or the Greylag goose (Anser anser), exhibiting variations in body size, reproductive performance, egg production, feather color, and other phenotypic traits. Constructing a pan-genome facilitates a thorough identification of genetic variations, thereby deepening our comprehension of the molecular mechanisms underlying genetic diversity and phenotypic variability.

RESULTS: To support comprehensive population genomic and pan-genomic analyses in geese, we collected whole-genome resequencing data from 659 geese and compiled a database of 155 RNA-seq samples. By constructing the pan-genome for geese, we generated non-reference contigs totaling 612 Mb, unveiling a collection of 2,813 novel genes and pinpointing 15,567 core genes, 1,324 softcore genes, 2,734 shell genes, and 878 cloud genes in goose genomes. Furthermore, we detected an 81.97 Mb genomic region showing signs of genome selection, encompassing the TGFBR2 gene correlated with variations in body weight among geese. Genome-wide association studies utilizing single nucleotide polymorphisms (SNPs) and presence-absence variation revealed significant genomic associations with various goose meat quality, reproductive, and body composition traits. For instance, a gene encoding the SVEP1 protein was linked to carcass oblique length, and a distinct gene-CDS haplotype of the SVEP1 gene exhibited an association with carcass oblique length. Notably, the pan-genome analysis revealed enrichment of variable genes in the "hair follicle maturation" Gene Ontology term, potentially linked to the selection of feather-related traits in geese. A gene presence-absence variation analysis suggested a reduced frequency of genes associated with "regulation of heart contraction" in domesticated geese compared to their wild counterparts. Our study provided novel insights into gene expression features and functions by integrating gene expression patterns across multiple organs and tissues in geese and analyzing population variation.

CONCLUSION: This accomplishment originates from the discernment of a multitude of selection signals and candidate genes associated with a wide array of traits, thereby markedly enhancing our understanding of the processes underlying domestication and breeding in geese. Moreover, assembling the pan-genome for geese has yielded a comprehensive apprehension of the goose genome, establishing it as an indispensable asset poised to offer innovative viewpoints and make substantial contributions to future geese breeding initiatives.
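
The core, softcore, shell and cloud categories reported in the abstract above are usually assigned from a gene presence-absence table. The Python sketch below shows one way this could be done. The abstract does not state the cutoffs it used, so the thresholds here (100% for core, at least 95% for softcore, at least 15% for shell, below that cloud) are assumptions, and the function name `classify_genes` is hypothetical.

```python
from collections import Counter


def classify_genes(presence, n_genomes, softcore_min=0.95, shell_min=0.15):
    """Label each gene by the fraction of genomes that carry it.

    presence: dict mapping gene name -> set of genome identifiers.
    The default cutoffs are illustrative assumptions, not the study's values.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    labels = {}
    for gene, genomes in presence.items():
        frequency = len(genomes) / n_genomes
        if len(genomes) == n_genomes:
            labels[gene] = "core"
        elif frequency >= softcore_min:
            labels[gene] = "softcore"
        elif frequency >= shell_min:
            labels[gene] = "shell"
        else:
            labels[gene] = "cloud"
    return labels


# Toy example with 40 hypothetical genomes named g0..g39.
genomes = [f"g{i}" for i in range(40)]
presence = {
    "geneA": set(genomes),       # present in all 40 genomes
    "geneB": set(genomes[:39]),  # present in 39 of 40
    "geneC": set(genomes[:20]),  # present in 20 of 40
    "geneD": set(genomes[:2]),   # present in 2 of 40
}
labels = classify_genes(presence, n_genomes=len(genomes))
print(labels)
print(Counter(labels.values()))
```

Different tools and studies draw these boundaries differently, so category counts are only comparable between analyses that share the same cutoffs.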

RevDate: 2023-11-24

Hyun JC, Monk JM, Szubin R, et al (2023)

Global pathogenomic analysis identifies known and candidate genetic antimicrobial resistance determinants in twelve species.

Nature communications, 14(1):7690.

Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates. Validation of two candidates in E. coli BW25113 reveals cases of conditional resistance: ΔcycA confers ciprofloxacin resistance in minimal media with D-serine, and frdD V111D confers ampicillin resistance in the presence of ampC by modifying the overlapping promoter. We expect this approach to be adaptable to other species and phenotypes.

RevDate: 2023-11-24

Gmeiner A, Njage PMK, Hansen LT, et al (2023)

Predicting Listeria monocytogenes virulence potential using whole genome sequencing and machine learning.

International journal of food microbiology, 410:110491 pii:S0168-1605(23)00408-7 [Epub ahead of print].

Contamination with food-borne pathogens, such as Listeria monocytogenes, remains a big concern for food safety. Hence, rigorous and continuous microbial surveillance is a standard procedure. At this point, however, the food industry and authorities only focus on detection of Listeria monocytogenes without characterization of individual strains into groups of more or less concern. As whole genome sequencing (WGS) gains increasing interest in the industry, this methodology presents an opportunity to obtain finer resolution of microbial traits such as virulence. Within this study, we therefore aimed to explore the use of WGS in combination with Machine Learning (ML) to predict L. monocytogenes virulence potential on a sub-species level. The WGS datasets used in this study for ML model training consisted of i) national surveillance isolates (n = 169, covering 38 MLST types) and ii) publicly available isolates acquired through the GenomeTrakr network (n = 2880, spanning 80 MLST types). We used the clinical frequency, i.e., the ratio of the number of clinical isolates to the total number of isolates, as an estimate of virulence potential. The predictive performance of input features from three different genomic levels (i.e., virulence genes, pan-genome genes, and single nucleotide polymorphisms (SNPs)) and six machine learning algorithms (i.e., Support Vector Machine with a linear kernel, Support Vector Machine with a radial kernel, Random Forest, Neural Networks, LogitBoost, and Majority Voting) were compared. Our machine learning models predicted sub-species virulence potential with nested cross-validation F1-scores up to 0.88 for the majority voting classifier trained on national surveillance data and using pan-genome genes as input features. The validation of the pre-trained ML models based on 101 previously in vivo studied isolates resulted in F1-scores up to 0.76. Furthermore, we found that the more rapid and less computationally intensive raw read alignment yields comparably accurate models as de novo assembly. The results of our study suggest that a majority voting classifier trained on pan-genome genes is the best and most robust choice for the prediction of clinical frequency. Our study contributes to more rapid and precise characterization of L. monocytogenes virulence and its variation on a sub-species level. We further demonstrated a possible application of WGS data in the context of microbial hazard characterization for food safety. In the future, predictive models may assist case-specific microbial risk management in the food industry. The Python code, pre-trained models, and prediction pipeline are deposited at (
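
The quantities named in the abstract above (clinical frequency, majority voting, and the F1-score) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' deposited pipeline: the F1-score shown is the binary form, the labels are made up, and all function names are hypothetical.

```python
from collections import Counter


def clinical_frequency(n_clinical, n_total):
    """Ratio of clinical isolates to all isolates in a group."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_clinical / n_total


def majority_vote(votes):
    """Return the label predicted by most classifiers.

    Ties fall to whichever tied label appears first in votes.
    """
    return Counter(votes).most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2*TP / (2*TP + FP + FN) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical group with 30 clinical isolates out of 120 in total.
print(clinical_frequency(30, 120))

# Three hypothetical classifiers voting on one isolate.
print(majority_vote(["high", "low", "high"]))

# Hypothetical true and predicted labels for five isolates.
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice an established library would normally be used for both the ensemble and the metric; the hand-written versions here only make the definitions explicit.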

RevDate: 2023-11-24

Li Y, Yao J, Sang H, et al (2023)

Pan-genome analysis highlights the role of structural variation in the evolution and environmental adaptation of Asian honeybees.

Molecular ecology resources [Epub ahead of print].

The Asian honeybee, Apis cerana, is an ecologically and economically important pollinator. Mapping its genetic variation is key to understanding population-level health, histories and potential capacities to respond to environmental changes. However, most efforts to date were focused on single nucleotide polymorphisms (SNPs) based on a single reference genome, thereby ignoring larger scale genomic variation. We employed long-read sequencing technologies to generate a chromosome-scale reference genome for the ancestral group of A. cerana. Integrating this with 525 resequencing data sets, we constructed the first pan-genome of A. cerana, encompassing almost the entire gene content. We found that 31.32% of genes in the pan-genome were variably present across populations, providing a broad gene pool for environmental adaptation. We identified and characterized structural variations (SVs) and found that they were not closely linked with SNP distributions; however, the formation of SVs was closely associated with transposable elements. Furthermore, phylogenetic analysis using SVs revealed a novel A. cerana ecological group not recoverable from the SNP data. Performing environmental association analysis identified a total of 44 SVs likely to be associated with environmental adaptation. Verification and analysis of one of these, a 330 bp deletion in the Atpalpha gene, indicated that this SV may promote the cold adaptation of A. cerana by altering gene expression. Taken together, our study demonstrates the feasibility and utility of applying pan-genome approaches to map and explore genetic feature variations of honeybee populations, and in particular to examine the role of SVs in the evolution and environmental adaptation of A. cerana.

RevDate: 2023-11-23

Bonnici V, Mengoni C, Mangoni M, et al (2023)

PanDelos-frags: A methodology for discovering pangenomic content of incomplete microbial assemblies.

Journal of biomedical informatics pii:S1532-0464(23)00273-3 [Epub ahead of print].

Pangenomics was originally defined as the problem of comparing the composition of genes into gene families within a set of bacterial isolates belonging to the same species. The problem requires the calculation of sequence homology among such genes. When combined with metagenomics, namely for human microbiome composition analysis, gene-oriented pangenome detection becomes a promising method to decipher ecosystem functions and population-level evolution. Established computational tools are able to investigate the genetic content of isolates for which a complete genomic sequence is available. However, there is a plethora of incomplete genomes that are available on public resources, which only a few tools may analyze. Incomplete means that the process for reconstructing their genomic sequence is not complete, and only fragments of their sequence are currently available. However, the information contained in these fragments may play an essential role in the analyses. Here, we present PanDelos-frags, a computational tool which exploits and extends previous results in analysing complete genomes. It provides a new methodology for inferring missing genetic information and thus for managing incomplete genomes. PanDelos-frags outperforms state-of-the-art approaches in reconstructing gene families in synthetic benchmarks and in a real use case of metagenomics. PanDelos-frags is publicly available at

RevDate: 2023-11-23

Vos M, Padfield D, Quince C, et al (2023)

Adaptive radiations in natural populations of prokaryotes: innovation is key.

FEMS microbiology ecology pii:7444994 [Epub ahead of print].

Prokaryote diversity makes up most of the tree of life and is crucial to the functioning of the biosphere and human health. However, the patterns and mechanisms of prokaryote diversification have received relatively little attention compared to animals and plants. Adaptive radiation, the rapid diversification of an ancestor species into multiple ecologically divergent species, is a fundamental process by which macrobiological diversity is generated. Here, we discuss whether ecological opportunity could lead to similar bursts of diversification in bacteria. We explore how adaptive radiations in prokaryotes can be kickstarted by horizontally acquired key innovations allowing lineages to invade new niche space that subsequently is partitioned among diversifying specialist descendants. We discuss how novel adaptive zones are colonised and exploited after the evolution of a key innovation and whether certain types of are more prone to adaptive radiation. Radiation into niche specialists does not necessarily lead to speciation in bacteria when barriers to recombination are absent. We propose that in this scenario, niche-specific genes could accumulate within a single lineage, leading to the evolution of an open pan-genome.

RevDate: 2023-11-23

Rice ES, Alberdi A, Alfieri J, et al (2023)

A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants.

BMC biology, 21(1):267.

BACKGROUND: The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome.

METHODS: We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines.

RESULTS: We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference.

CONCLUSIONS: We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.

RevDate: 2023-11-22

Glad HM, Tralamazza SM, D Croll (2023)

The expression landscape and pangenome of long non-coding RNA in the fungal wheat pathogen Zymoseptoria tritici.

Microbial genomics, 9(11):.

Long non-coding RNAs (lncRNAs) are regulatory molecules interacting in a wide array of biological processes. lncRNAs in fungal pathogens can be responsive to stress and play roles in regulating growth and nutrient acquisition. Recent evidence suggests that lncRNAs may also play roles in virulence, such as regulating pathogenicity-associated enzymes and on-host reproductive cycles. Despite the importance of lncRNAs, only a few model fungi have well-documented inventories of lncRNA. In this study, we apply a recent computational pipeline to predict high-confidence lncRNA candidates in Zymoseptoria tritici, an important global pathogen of wheat impacting global food production. We analyse genomic features of lncRNAs and the most likely associated processes through analyses of expression over a host infection cycle. We find that lncRNAs are frequently expressed during early infection, before the switch to necrotrophic growth. They are mostly located in facultative heterochromatic regions, which are known to contain many genes associated with pathogenicity. Furthermore, we find that lncRNAs are frequently co-expressed with genes that may be involved in responding to host defence signals, such as oxidative stress. Finally, we assess pangenome features of lncRNAs using four additional reference-quality genomes. We find evidence that the repertoire of expressed lncRNAs varies substantially between individuals, even though lncRNA loci tend to be shared at the genomic level. Overall, this study provides a repertoire and putative functions of lncRNAs in Z. tritici enabling future molecular genetics and functional analyses in an important pathogen.

RevDate: 2023-11-21

Hernández-Soto LM, Martínez-Abarca F, Ramírez-Saad H, et al (2023)

Genome analysis of haloalkaline isolates from the soda saline crater lake of Isabel Island; comparative genomics and potential metabolic analysis within the genus Halomonas.

BMC genomics, 24(1):696.

BACKGROUND: Isabel Island is a Mexican volcanic island primarily composed of basaltic stones. It features a maar known as Laguna Fragatas, which is classified as a meromictic thalassohaline lake. The constant deposition of guano in this maar results in increased levels of phosphorus, nitrogen, and carbon. The aim of this study was to utilize high-quality genomes from the genus Halomonas found in specialized databases as a reference for genome mining of moderately halophilic bacteria isolated from Laguna Fragatas. This research involved genomic comparisons employing phylogenetic, pangenomic, and metabolic-inference approaches.

RESULTS: The Halomonas genus exhibited a large open pangenome, but several genes associated with salt metabolism and homeostatic regulation (ectABC and betABC), nitrogen intake through nitrate and nitrite transporters (nasA, and narGI), and phosphorus uptake (pstABCS) were shared among the Halomonas isolates.

CONCLUSIONS: The isolated bacteria demonstrate consistent adaptation to high salt concentrations, and their nitrogen and phosphorus uptake mechanisms are highly optimized. This optimization is expected in an extremophile environment characterized by minimal disturbances or abrupt seasonal variations. The primary significance of this study lies in the dearth of genomic information available for this saline and low-disturbance environment, which makes the study important for ecosystem conservation and for exploring the environment's biotechnological potential. Additionally, the study presents the first two draft genomes of H. janggokensis.

RevDate: 2023-11-17

Cheng J, Wu S, Ye Q, et al (2023)

A novel multiplex PCR based method for the detection of Listeria monocytogenes clonal complex 8.

International journal of food microbiology, 409:110475 pii:S0168-1605(23)00392-6 [Epub ahead of print].

Listeria monocytogenes is an important foodborne pathogen worldwide, which could cause listeriosis with a 20-30% fatality rate in immunocompromised individuals. Listeria monocytogenes MLST clonal complex (CC) 8 strain is a common clone in food and clinical cases. The aim of this study was to develop multiplex PCR (mPCR) and high-resolution melting (HRM) qPCR to simultaneously detect L. monocytogenes CC8 and the other L. monocytogenes strains based on pan-genome analysis. A novel multiplex PCR and HRM qPCR targeting the genes LM5578_1180 (specific for CC8) and LM5578_2262 (for L. monocytogenes) were developed. The specificity of this multiplex PCR and HRM qPCR was verified with other CCs of L. monocytogenes and strains of other species. The detection limits of this multiplex PCR and HRM qPCR are 2.1 × 10[3] CFU/mL and 2.1 × 10[0] CFU/mL, respectively. This multiplex PCR and HRM qPCR could accurately detect CC8 strains with the interference of different ratios of L. monocytogenes CC9, CC87, CC121, CC155, and L. innocua strains. Subsequently, the detection ability of mPCR and HRM qPCR was also evaluated in spiked samples. The mPCR method could successfully detect 6.2 × 10[3] CFU/mL of CC8 L. monocytogenes after 6 h enrichment while the multiplex HRM qPCR method could successfully detect 6.2 × 10[4] CFU/mL of CC8 L. monocytogenes after 3 h enrichment. The feasibility of these methods was satisfactory in terms of sensitivity, specificity, and efficiency after evaluating 12 mushroom samples and was consistent with that of the National Standard Detection Method (GB4789.30-2016). In conclusion, the developed assays could be applied for rapid screening and detection of L. monocytogenes CC8 strains both in food and food production environments, providing accurate results to adopt monitoring measures to improve microbiological safety.

RevDate: 2023-11-17

Corut AK, JG Wallace (2023)

kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS.

G3 (Bethesda, Md.) pii:7425459 [Epub ahead of print].

Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach comes with a variety of limitations. For this reason, newer methods for GWAS have been developed, including the use of pan-genomes instead of a reference genome and the utilization of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach has especially gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adopted an existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda and eliminated the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow encompasses supplemental components such as quality control, read-trimming procedures, and generating summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub and Bioconda.
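As a rough illustration of what "k-mers as markers" means in the abstract above, the following Python sketch builds a k-mer presence/absence table for a few toy sequences. It is not the kmersGWAS or kGWASflow implementation: the function names, the sample names, the sequences, and the choice of k = 3 are all assumptions made for this example, and real workflows operate on sequencing reads with much longer k-mers and a statistical association step that is omitted here.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(samples, k):
    """Map each k-mer to the set of sample names in which it occurs."""
    table = defaultdict(set)
    for name, seq in samples.items():
        for kmer in kmers(seq, k):
            table[kmer].add(name)
    return dict(table)


# Toy sequences, invented for illustration only.
samples = {
    "s1": "ACGTAC",
    "s2": "ACGTTC",
}

table = presence_absence(samples, k=3)

# k-mers that are absent from at least one sample are the candidate markers;
# k-mers shared by every sample carry no presence/absence signal.
variable = {kmer for kmer, names in table.items() if len(names) < len(samples)}
```

In an association study, each variable k-mer would then be tested against the phenotype across samples; that step depends on the statistical model and is left out of this sketch.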

RevDate: 2023-11-17

Khan K, Jalal K, R Uddin (2023)

Pangenome diversification and resistance gene characterization in Salmonella Typhi prioritized RfaJ as a significant therapeutic marker.

Journal, genetic engineering & biotechnology, 21(1):125.

BACKGROUND: Salmonella Typhi stands as the etiological agent responsible for the onset of human typhoid fever. The pressing demand for innovative therapeutic targets against S. Typhi is underscored by the escalating prevalence of this pathogen and the severe nature of its infections. Consequently, this study employs pangenome analysis to scrutinize 119 S. Typhi-resistant strains, aiming to identify the most promising therapeutic targets originating from its core genome.

RESULTS: Subtractive genomics was employed to systematically eliminate non-homologous (n=1147), essential (n=551), drug-like (n=80), and pathogenicity-related (n=18) proteins from the initial pool of 3351 core genome proteins. Consequently, lipopolysaccharide 1,2-glucosyltransferase RfaJ was designated as the optimal pharmacological target due to its potential versatility. Furthermore, a compendium of 9000 FDA-approved compounds was repurposed for evaluation against the RfaJ drug target, with the specific intent of prioritizing novel, high-potency therapeutic candidates for combating S. Typhi. Ultimately, four compounds, namely DB00549 (Zafirlukast), DB15637 (Fluzoparib), DB15688 (Zavegepant), and DB12411 (Bemcentinib), were singled out as potential inhibitors based on the ligand-protein binding affinity (indicated by the lowest anticipated binding energy) and the overall stability of these compounds. Notably, molecular dynamics simulations, conducted over a 50 nanosecond interval, convincingly demonstrated the stability of these compounds in the context of the RfaJ protein.

CONCLUSION: In summary, the present findings hold significant promise as an initial stride in the broader drug discovery endeavor against S. Typhi infections. However, the experimental validation of the identified drug target and drug candidate is further required to increase the effectiveness of the applied methodology.

RevDate: 2023-11-17

Baril T, D Croll (2023)

A pangenome-guided manually curated library of transposable elements for Zymoseptoria tritici.

BMC research notes, 16(1):335.

OBJECTIVES: High-quality species-specific transposable element (TE) libraries are required for studies to elucidate the evolutionary dynamics of TEs and gain an understanding of their impacts on host genomes. Such high-quality TE resources are severely lacking for species in the fungal kingdom. To facilitate future studies on the putative role of TEs in rapid adaptation observed in the fungal wheat pathogen Zymoseptoria tritici, we produced a manually curated TE library. This was generated by detecting TEs in 19 reference genome assemblies representing the global diversity of the species supplemented by multiple sister species genomes. Improvements over previous TE libraries have been made on TE boundary resolution, detection of ORFs, TE domains, terminal inverted repeats, and class-specific motifs.

DATA DESCRIPTION: A TE consensus library for Z. tritici formatted for use with RepeatMasker. This data is relevant to other researchers investigating TE-host evolutionary dynamics in Z. tritici or who are interested in comparative studies of the fungal kingdom. Further, this TE library can be used to improve gene annotation. Finally, this TE library increases the number of manually curated TE datasets, providing resources to further our understanding of TE diversity.

RevDate: 2023-11-17

Ferhaoui N, Tanaka R, Sekizuka T, et al (2023)

Whole genome sequencing and pan-genome analysis of Staphylococcus/Mammaliicoccus spp. isolated from diabetic foot ulcers and contralateral healthy skin of Algerian patients.

BMC microbiology, 23(1):342.

BACKGROUND: Diabetic foot infections (DFIs) are the most common complications of diabetic foot ulcers (DFUs), and a significant cause of lower extremity amputation. In this study we used whole genome sequencing to characterize the clonal composition, virulence and resistance genetic determinants of 58 Staphylococcus/Mammaliicoccus spp. isolates from contralateral healthy skin and DFU from 44 hospitalized patients.

RESULTS: S. aureus (n = 32) and S. epidermidis (n = 10) isolates were recovered from both DFUs and healthy skin, whereas, S. haemolyticus (n = 8), M. sciuri (n = 1), S. hominis (n = 1) and S. simulans (n = 3) were recovered exclusively from healthy skin. In contrast, S. caprae (n = 2) and S. saprophyticus (n = 1) were recovered only from DFUs. Among S. aureus isolates, MRSA were present with high prevalence (27/32, 84.4%), 18 of which (66.7%) were from DFUs and 9 (33.3%) from healthy skin. In contrast, the coagulase-negative Staphylococcus (CoNS)/Mammaliicoccus isolates (n = 26), in particular S. epidermidis and S. haemolyticus were more prevalent in healthy skin, (10/26, 38.5%) and (8/26, 30.8%), respectively. MLST, spa and SCCmec typing classified the 32 S. aureus isolates into 6 STs, ST672, ST80, ST241, ST1, ST97, ST291 and 4 unknown STs (STNF); 8 spa types, t044, t037, t3841, t1247, t127, t639, t937 and t9432 and 2 SCCmec types, type IV and type III(A). Among CoNS, the S. epidermidis isolates belonged to ST54, ST35 and ST640. S. haemolyticus belonged to ST3, ST25, ST29, ST1 and ST56. The sole M. sciuri isolate was found to carry an SCCmec type III(A). A wide range of virulence genes and antimicrobial resistance genes were found among our isolates, with varying distribution between species or STs. The pan-genome analysis revealed a highly clonal population of Staphylococcus isolates, particularly among S. aureus isolates. Interestingly, the majority of S. aureus isolates including MRSA, recovered from the healthy skin and DFUs of the same patient belonged to the same clone and exhibited similar virulence/resistance genotype.

CONCLUSIONS: Our study provides clinically relevant information on the population profile, virulence and antibiotic resistance of Staphylococcus/Mammaliicoccus spp. in DFIs, which could serve as a basis for further studies on these as well as other groups of pathogens associated with DFIs.
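The isolate counts and percentages quoted in the RESULTS paragraph above can be cross-checked with simple arithmetic. The short Python sketch below only re-derives figures already stated in the abstract; the variable and function names are chosen for this example.

```python
# Per-species isolate counts as reported in the abstract.
species_counts = {
    "S. aureus": 32,
    "S. epidermidis": 10,
    "S. haemolyticus": 8,
    "M. sciuri": 1,
    "S. hominis": 1,
    "S. simulans": 3,
    "S. caprae": 2,
    "S. saprophyticus": 1,
}

total_isolates = sum(species_counts.values())
# Everything other than S. aureus is counted as CoNS/Mammaliicoccus.
cons_isolates = total_isolates - species_counts["S. aureus"]


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


mrsa_prevalence = pct(27, 32)        # MRSA among S. aureus isolates
mrsa_from_dfu = pct(18, 27)          # MRSA recovered from DFUs
mrsa_from_skin = pct(9, 27)          # MRSA recovered from healthy skin
epidermidis_share = pct(10, cons_isolates)
haemolyticus_share = pct(8, cons_isolates)
```

The per-species counts sum to the 58 isolates stated in the BACKGROUND, and the non-aureus isolates sum to the 26 CoNS/Mammaliicoccus isolates used as the denominator.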

RevDate: 2023-11-16

McLaughlin M, Fiebig A, S Crosson (2023)

XRE transcription factors conserved in Caulobacter and φCbK modulate adhesin development and phage production.

PLoS genetics, 19(11):e1011048 pii:PGENETICS-D-23-00944 [Epub ahead of print].

The xenobiotic response element (XRE) family of transcription factors (TFs), which are commonly encoded by bacteria and bacteriophage, regulate diverse features of bacterial cell physiology and impact phage infection dynamics. Through a pangenome analysis of Caulobacter species isolated from soil and aquatic ecosystems, we uncovered an apparent radiation of a paralogous XRE TF gene cluster, several of which have established functions in the regulation of holdfast adhesin development and biofilm formation in C. crescentus. We further discovered related XRE TFs throughout the class Alphaproteobacteria and its phages, including the φCbK Caulophage, suggesting that members of this cluster impact host-phage interactions. Here we show that a closely related group of XRE transcription factors encoded by both C. crescentus and φCbK can physically interact and function to control the transcription of a common gene set, influencing processes including holdfast development and the production of φCbK virions. The φCbK-encoded XRE paralog, tgrL, is highly expressed at the earliest stages of infection and can directly inhibit transcription of host genes including hfiA, a potent holdfast inhibitor, and gafYZ, an activator of prophage-like gene transfer agents (GTAs). XRE proteins encoded from the C. crescentus chromosome also directly repress gafYZ transcription, revealing a functionally redundant set of host regulators that may protect against spurious production of GTA particles and inadvertent cell lysis. Deleting the C. crescentus XRE transcription factors reduced φCbK burst size, while overexpressing these host genes or φCbK tgrL rescued this burst defect. We conclude that this XRE TF gene cluster, shared by C. crescentus and φCbK, plays an important role in adhesion regulation under phage-free conditions, and influences host-phage dynamics during infection.

RevDate: 2023-11-16

Sharma PK, Ahmed HI, Heuberger M, et al (2023)

An online database for einkorn wheat to aid in gene discovery and functional genomics studies.

Database : the journal of biological databases and curation, 2023:.

Diploid A-genome wheat (einkorn wheat) presents a nutrition-rich option as an ancient grain crop and a resource for the improvement of bread wheat against abiotic and biotic stresses. Realizing the importance of this wheat species, reference-level assemblies of two einkorn wheat accessions were generated (wild and domesticated). This work reports an einkorn genome database that provides an interface to the cereals research community to perform comparative genomics, applied genetics and breeding research. It features queries for annotated genes, the use of a recent genome browser release, and the ability to search for sequence alignments using a modern BLAST interface. Other features include a comparison of reference einkorn assemblies with other wheat cultivars through genomic synteny visualization and an alignment visualization tool for BLAST results. Altogether, this resource will help wheat research and breeding. Database URL

RevDate: 2023-11-15

Wang T, Duan S, Xu C, et al (2023)

Pan-genome analysis of 13 Malus accessions reveals structural and sequence variations associated with fruit traits.

Nature communications, 14(1):7377.

Structural variations (SVs) and copy number variations (CNVs) contribute to trait variations in fleshy-fruited species. Here, we assemble 10 genomes of genetically diverse Malus accessions, including the ever-green cultivar 'Granny Smith' and the widely cultivated cultivar 'Red Fuji'. Combining with three previously reported genomes, we assemble the pan-genome of Malus species and identify 20,220 CNVs and 317,393 SVs. We also observe CNVs that are positively correlated with expression levels of the genes they are associated with. Furthermore, we show that the noncoding RNA generated from a 209 bp insertion in the intron of mitogen-activated protein kinase homology encoding gene, MMK2, regulates the gene expression and affects fruit coloration. Moreover, we identify overlapping SVs associated with fruit quality and biotic resistance. This pan-genome uncovers possible contributions of CNVs to gene expression and highlights the role of SVs in apple domestication and economically important traits.

RevDate: 2023-11-15

Nagano DS, Taniguchi I, Ono T, et al (2023)

Systematic analysis of plasmids of the Serratia marcescens complex using 142 closed genomes.

Microbial genomics, 9(11):.

Plasmids play important roles in bacterial genome diversification. In the Serratia marcescens complex (SMC), a notable contribution of plasmids to genome diversification was also suggested by our recent analysis of >600 draft genomes. As accurate analyses of plasmids in draft genomes are difficult, in this study we analysed 142 closed genomes covering the entire complex, 67 of which were obtained in this study, and identified 132 plasmids (1.9-244.4 kb in length) in 77 strains. While the average numbers of plasmids in clinical and non-clinical strains showed no significant difference, strains belonging to clade 2 (one of the two hospital-adapted lineages) contained more plasmids than the others. Pangenome analysis revealed that of the 28 954 genes identified, 12.8 % were plasmid-specific, and 1.4 % were present in plasmids or chromosomes depending on the strain. In the latter group, while transposon-related genes were most prevalent (31.4 % of the function-predicted genes), genes related to antimicrobial resistance and heavy metal resistance accounted for a notable proportion (22.7 %). Mash distance-based clustering separated the 132 plasmids into 23 clusters and 50 singletons. Most clusters/singletons showed notably different GC contents compared to those of host chromosomes, suggesting their recent or relatively recent appearance in the SMC. Among the 23 clusters, 17 were found in only clinical or only non-clinical strains, suggesting the possible preference of their distribution on the environmental niches of host strains. Regarding the host strain phylogeny, 16 clusters were distributed in two or more clades, suggesting their interclade transmission. Moreover, for many plasmids, highly homologous plasmids were found in other species, indicating the broadness of their potential host ranges, beyond the genus, family, order, class or even phylum level. 
Importantly, highly homologous plasmids were most frequently found in Klebsiella pneumoniae and other species in the family Enterobacteriaceae, suggesting that this family, particularly K. pneumoniae, is the main source for plasmid exchanges with the SMC. These results highlight the power of closed genome-based analysis in the investigation of plasmids and provide important insights into the nature of plasmids distributed in the SMC.

RevDate: 2023-11-15

Zhou X, Kang X, Chen J, et al (2023)

Genome degradation promotes Salmonella pathoadaptation by remodeling fimbriae-mediated proinflammatory response.

National science review, 10(10):nwad228.

Understanding changes in pathogen behavior (e.g. increased virulence, a shift in transmission channel) is critical for the public health management of emerging infectious diseases. Genome degradation via gene depletion or inactivation is recognized as a pathoadaptive feature of the pathogen evolving with the host. However, little is known about the exact role of genome degradation in affecting pathogenic behavior, and the underlying molecular detail has yet to be examined. Using large-scale global avian-restricted Salmonella genomes spanning more than a century, we projected the genetic diversity of Salmonella Pullorum (bvSP) by showing increasingly antimicrobial-resistant ST92 prevalent in Chinese flocks. The phylogenomic analysis identified three lineages in bvSP, with an enhancement of virulence in the two recently emerged lineages (L2/L3), as evidenced in chicken and embryo infection assays. Notably, the ancestor L1 lineage resembles the Salmonella serovars with higher metabolic flexibilities and more robust environmental tolerance, indicating stepwise evolutionary trajectories towards avian-restricted lineages. Pan-genome analysis pinpointed fimbrial degradation from a virulent lineage. The later engineered fim-deletion mutant, and all other five fimbrial systems, revealed behavior switching that restricted horizontal fecal-oral transmission but boosted virulence in chicks. By depleting fimbrial appendages, bvSP established persistent replication with less proinflammation in chick macrophages and adopted vertical transovarial transmission, accompanied by ever-increasing intensification in the poultry industry. Together, we uncovered a previously unseen paradigm for remodeling bacterial surface appendages that supplements virulence-enhanced evolution with increased vertical transmission.

RevDate: 2023-11-15

Tahir Ul Qamar M, Sadaqat M, Zhu XT, et al (2023)

Comparative genomics profiling revealed multi-stress responsive roles of the CC-NBS-LRR genes in three mango cultivars.

Frontiers in plant science, 14:1285547.

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family is the largest group of disease resistance (R) genes in plants and is active in response to viruses, bacteria, and fungi usually involved in effector-triggered immunity (ETI). Pangenome-wide studies allow researchers to analyze the genetic diversity of multiple species or their members simultaneously, providing a comprehensive understanding of the evolutionary relationships and diversity present among them. The draft pan-genome of three Mangifera indica cultivars (Alphonso, Hong Xiang Ya, and Tommy Atkins) was constructed and presence/absence variants (PAVs) were filtered through the ppsPCP pipeline. As a result, 2823 genes and 5907 PAVs from H. Xiang Ya, and 1266 genes and 2098 PAVs from T. Atkins were added to the reference genome. For the identification of CC-NBS-LRR (CNL) genes in these mango cultivars, this draft pan-genome study has successfully identified 47, 27, and 36 members in Alphonso, H. Xiang Ya, and T. Atkins respectively. The phylogenetic analysis divided MiCNL proteins into four distinct subgroups. All MiCNL genes are unevenly distributed on chromosomes. Both tandem and segmental duplication events played a significant role in the expansion of the CNL gene family. These genes contain cis-elements related to light, stress, hormone, and development. The analysis of protein-protein interactions (PPI) revealed that MiCNL proteins interacted with other defense-responsive proteins. Gene Ontology (GO) analysis indicated that MiCNL genes play a role in defense mechanisms within the organism. The expression level of the identified genes in fruit peel was observed under disease and cold stress, which showed that Mi_A_CNL13 and 14 were up-regulated while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated in disease stress. On the other hand, Mi_A_CNL2, 14, 41, and 45 were up-regulated and Mi_A_CNL47 was down-regulated in cold stress. 
Subsequently, the Random Forest (RF) classifier was used to assess the multi-stress response of MiCNLs. It was found that Mi_A_CNL14 is a gene that responds to multiple stress conditions. The CNLs have similar protein structures which show that they are involved in the same function. The above findings provide a foundation for a deeper understanding of the functional characteristics of the mango CNL gene family.

RevDate: 2023-11-14

Hu H, Scheben A, Wang J, et al (2023)

Unravelling inversions: Technological advances, challenges, and potential impact on crop breeding.

Plant biotechnology journal [Epub ahead of print].

Inversions, a type of chromosomal structural variation, significantly influence plant adaptation and gene functions by impacting gene expression and recombination rates. However, compared with other structural variations, their roles in functional biology and crop improvement remain largely unexplored. In this review, we highlight technological and methodological advancements that have allowed a comprehensive understanding of inversion variants through the pangenome framework and machine learning algorithms. Genome editing is an efficient method for inducing or reversing inversion mutations in plants, providing an effective mechanism to modify local recombination rates. Given the potential of inversions in crop breeding, we anticipate increasing attention on inversions from the scientific community in future research and breeding applications.

RevDate: 2023-11-14

Zakeri M, Brown NK, Ahmed OY, et al (2023)

Movi: a fast and cache-efficient full-text pangenome index.

bioRxiv : the preprint server for biology pii:2023.11.04.565615.

Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the "move structure" was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of Movi's index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification - such as pseudo-matching lengths - at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure's strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi's fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.
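The quantity r in the abstract above is the number of maximal runs of identical characters in the Burrows-Wheeler transform (BWT) of the indexed text. The Python sketch below computes a BWT by sorting rotations and counts its runs. It does not implement the move structure or any part of Movi; the function names and the "$" sentinel are assumptions for this example, and the rotation-sorting construction is quadratic and suitable only for toy inputs.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    # "$" is a sentinel assumed to be absent from text and to sort
    # before every other symbol (true for ASCII letters).
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal runs of equal characters in s (the 'r' of an r-index)."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


transformed = bwt("banana")
r = count_runs(transformed)
```

For "banana" the transform is "annb$aa", which has five runs. On collections of highly similar genomes, r typically grows much more slowly than the total text length, which is why run-based indexes are attractive for pangenomes.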

RevDate: 2023-11-14

Krieger M, AbdelRahman YM, Choi D, et al (2023)

The prevalence of Fusobacterium nucleatum subspecies in the oral cavity stratifies by local health status.

bioRxiv : the preprint server for biology pii:2023.10.25.563997.

The ubiquitous inflammophilic pathobiont Fusobacterium nucleatum is widely recognized for its strong association with a variety of human dysbiotic diseases such as periodontitis and oral/extraoral abscesses, as well as multiple types of cancer. F. nucleatum is currently subdivided into four subspecies: F. nucleatum subspecies nucleatum (Fn. nucleatum), animalis (Fn. animalis), polymorphum (Fn. polymorphum), and vincentii/fusiforme (Fn. vincentii). Although these subspecies have been historically considered as functionally interchangeable in the oral cavity, direct clinical evidence is largely lacking for this assertion. Consequently, we assembled a collection of oral clinical specimens to determine whether F. nucleatum subspecies prevalence in the oral cavity stratifies by local oral health status. Patient-matched clinical specimens of both disease-free dental plaque and odontogenic abscess were analyzed with newly developed culture-dependent and culture-independent approaches using 44 and 60 oral biofilm/tooth abscess paired specimens, respectively. Most oral cavities were found to simultaneously harbor multiple F. nucleatum subspecies, with a greater diversity present within dental plaque compared to abscesses. In dental plaque, Fn. polymorphum is clearly the dominant organism, but this changes dramatically within odontogenic abscesses where Fn. animalis is heavily favored over all other fusobacteria. Surprisingly, the most commonly studied F. nucleatum subspecies, Fn. nucleatum, is only a minor constituent in the oral cavity. To gain further insights into the genetic basis for these phenotypes, we subsequently performed pangenome, phylogenetic, and functional enrichment analyses of oral fusobacterial genomes using the Anvi'o platform, which revealed significant genotypic distinctions among F. nucleatum subspecies. Accordingly, our results strongly support a taxonomic reassignment of each F. nucleatum subspecies into distinct Fusobacterium species. 
Of these, Fn. animalis should be considered as the most clinically relevant at sites of active inflammation, despite being among the least characterized oral fusobacteria.

RevDate: 2023-11-14

Pushkova EN, Borkhert EV, Novakovskiy RO, et al (2023)

Selection of Flax Genotypes for Pan-Genomic Studies by Sequencing Tagmentation-Based Transcriptome Libraries.

Plants (Basel, Switzerland), 12(21): pii:plants12213725.

Flax (Linum usitatissimum L.) products are used in the food, pharmaceutical, textile, polymer, medical, and other industries. The creation of a pan-genome will be an important advance in flax research and breeding. The selection of flax genotypes that sufficiently cover the species diversity is a crucial step for the pan-genomic study. For this purpose, we have adapted a method based on Illumina sequencing of transcriptome libraries prepared using the Tn5 transposase (tagmentase). This approach reduces the cost of sample preparation compared to commercial kits and allows the generation of a large number of cDNA libraries in a short time. RNA-seq data were obtained for 192 flax plants (3-6 individual plants from 44 flax accessions of different morphology and geographical origin). Evaluation of the genetic relationship between flax plants based on the sequencing data revealed incorrect species identification for five accessions. Therefore, these accessions were excluded from the sample set for the pan-genomic study. For the remaining samples, typical genotypes were selected to provide the most comprehensive genetic diversity of flax for pan-genome construction. Thus, high-throughput sequencing of tagmentation-based transcriptome libraries showed high efficiency in assessing the genetic relationship of flax samples and allowed us to select genotypes for the flax pan-genomic analysis.

RevDate: 2023-11-14

Dutta B, Halder U, Chitikineni A, et al (2023)

Delving into the lifestyle of Sundarban Wetland resident, biofilm producing, halotolerant Salinicoccus roseus: a comparative genomics-based intervention.

BMC genomics, 24(1):681.

BACKGROUND: Microbial communities play an essential role in ecosystem processes, be it in mangrove wetlands or other intertidal ecologies. Several enzymatic activities, such as those of hydrolases, are effective ecological indicators of soil microbial function. So far, little is known from a genomic viewpoint about the contribution and function of halophilic bacteria in the Indian Sundarban Wetland. Considering the above-mentioned issues, the aim of this study was to understand the lifestyle, metabolic functionalities and genomic features of the isolated bacterium, Salinicoccus roseus strain RF1H. A comparative genome-based study of S. roseus has not been reported yet. Hence, we included an intra-species genome comparison of S. roseus to gain insight into the high degree of variation in the genome of strain RF1H relative to the others.

RESULTS: Salinicoccus roseus strain RF1H is a pink-red pigmented, Gram-positive and non-motile coccus. The bacterium exhibited high salt tolerance (up to 15% NaCl), antibiotic resistance, biofilm formation and secretion of extracellular hydrolytic enzymes. The circular genome was approximately 2.62978 Mb in size, encoding 574 predicted genes with GC content 49.5%. Presence of genomic elements (prophages, transposable elements, CRISPR-Cas system) represented bacterial virulence and multidrug-resistance. Furthermore, genes associated with salt tolerance, temperature adaptation and DNA repair system were distributed in 17 genomic islands. Genes related to hydrocarbon degradation manifested metabolic capability of the bacterium for potential biotechnological applications. A comparative pangenome analysis revealed two-component response regulator, modified C4-dicarboxylate transport system and osmotic stress regulated ATP-binding proteins. Presence of genes encoding the arginine decarboxylase (ADC) enzyme, which is involved in biofilm formation, was reported from the genome. In silico study revealed that the protein is thermostable, composed of ~415 amino acids, and hydrophilic in nature. Three motifs appeared to be evolutionarily conserved in all Salinicoccus sequences.

CONCLUSION: The first report of whole genome analysis of Salinicoccus roseus strain RF1H provided information on metabolic functionalities, biofilm formation, resistance mechanisms and adaptation strategies to thrive in a climate-change-induced vulnerable spot like Sundarban. Comparative genome analysis highlighted the unique genome content that contributed to the strain's adaptability. The biomolecules produced during metabolism are important sources of compounds with potential beneficial applications in pharmaceuticals.

RevDate: 2023-11-13

Joglekar P, Conlan S, Lee-Lin SQ, et al (2023)

Integrated genomic and functional analyses of human skin-associated Staphylococcus reveal extensive inter- and intra-species diversity.

Proceedings of the National Academy of Sciences of the United States of America, 120(47):e2310585120.

Human skin is stably colonized by a distinct microbiota that functions together with epidermal cells to maintain a protective physical barrier. Staphylococcus, a prominent genus of the skin microbiota, participates in colonization resistance, tissue repair, and host immune regulation in strain-specific manners. To unlock the potential of engineering skin microbial communities, we aim to characterize the diversity of this genus within the context of the skin environment. We reanalyzed an extant 16S rRNA amplicon dataset obtained from distinct body sites of healthy volunteers, providing a detailed biogeographic depiction of staphylococcal species that colonize our skin. S. epidermidis, S. capitis, and S. hominis were the most abundant staphylococcal species present in all volunteers and were detected at all body sites. Pan-genome analysis of isolates from these three species revealed that the genus-core was dominated by central metabolism genes. Species-restricted-core genes encoded known host colonization functions. The majority (~68%) of genes were detected only in a fraction of isolate genomes, underscoring the immense strain-specific gene diversity. Conspecific genomes grouped into phylogenetic clades, exhibiting body site preference. Each clade was enriched for distinct gene sets that are potentially involved in site tropism. Finally, we conducted gene expression studies of select isolates showing variable growth phenotypes in skin-like medium. In vitro expression revealed extensive intra- and inter-species gene expression variation, substantially expanding the functional diversification within each species. Our study provides an important resource for future ecological and translational studies to examine the role of shared and strain-specific staphylococcal genes within the skin environment.
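The abstract above distinguishes genes shared by all genomes (core) from genes detected in only a fraction of isolates. A minimal Python sketch of that partition is given below, assuming gene content is already available as a set of gene-family identifiers per genome. The isolate names, gene names, and the function `partition_pangenome` are hypothetical and do not reflect the pipeline used in the study.

```python
def partition_pangenome(presence):
    """Split gene families into core (present in every genome) and
    accessory (present in some but not all genomes).

    presence maps a genome name to the set of gene families it carries.
    """
    all_genes = set().union(*presence.values())
    core = {
        gene for gene in all_genes
        if all(gene in genes for genes in presence.values())
    }
    accessory = all_genes - core
    return core, accessory


# Toy gene content, invented for illustration only.
presence = {
    "isolateA": {"g1", "g2", "g3"},
    "isolateB": {"g1", "g2", "g4"},
    "isolateC": {"g1", "g5"},
}

core, accessory = partition_pangenome(presence)
accessory_fraction = len(accessory) / (len(core) + len(accessory))
```

In this toy data only g1 is core, so four of the five gene families (80%) are accessory. Real analyses first cluster genes into families by sequence similarity and often use a relaxed core threshold (for example, presence in nearly all genomes) to tolerate assembly gaps.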

RevDate: 2023-11-12

Harrison PW, Amode MR, Austine-Orimoloye O, et al (2023)

Ensembl 2024.

Nucleic acids research pii:7416379 [Epub ahead of print].

Ensembl is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.

RevDate: 2023-11-12

Raney BJ, Barber GP, Benet-Pagès A, et al (2023)

The UCSC Genome Browser database: 2024 update.

Nucleic acids research pii:7416382 [Epub ahead of print].

The UCSC Genome Browser is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide. It provides annotation data on thousands of genome assemblies, ranging from human to SARS-CoV2. This year, we have introduced new data from the Human Pangenome Reference Consortium and on viral genomes including SARS-CoV2. We have added 1,200 new genomes to our GenArk genome system, increasing the overall diversity of our genomic representation. We have added support for nine new user-contributed track hubs to our public hub system. Additionally, we have released 29 new tracks on the human genome and 11 new tracks on the mouse genome. Collectively, these new features expand both the breadth and depth of the genomic knowledge that we share publicly with users worldwide.

RevDate: 2023-11-12

Li Y, Wu Y, Li D, et al (2023)

Multicenter comparative genomic study of Klebsiella oxytoca complex reveals a highly antibiotic-resistant subspecies of Klebsiella michiganensis.

Journal of microbiology, immunology, and infection = Wei mian yu gan ran za zhi pii:S1684-1182(23)00205-0 [Epub ahead of print].

BACKGROUND: The Klebsiella oxytoca complex is an opportunistic pathogen that has been recently identified as an actual complex. However, the characteristics of each species remain largely unknown. We aimed to study the clinical prevalence, antimicrobial profiles, genetic differences, and interaction with the host of each species of this complex.

METHODS: One hundred and three clinical isolates of the K. oxytoca complex were collected from 33 hospitals belonging to 19 areas in China from 2020 to 2021. Species were identified using whole genome sequencing based on average nucleotide identity. Clinical infection characteristics of the species were analyzed. Comparative genomics and pan-genome analyses were performed on these isolates and an augmented dataset, including 622 assemblies from the National Center for Biotechnology Information. In vitro assays evaluating the adhesion ability of human respiratory epithelial cells and survivability against macrophages were performed on randomly selected isolates.

RESULTS: Klebsiella michiganensis (46.6%, 48/103) and K. oxytoca (35.92%, 37/103) were the major species of the complex causing human infections. K. michiganensis had a higher genomic diversity and larger pan-genome size than did K. oxytoca. K. michiganensis isolates with blaoxy-5 had a higher resistance rate to various antibiotics, antimicrobial gene carriage rate, adhesion ability to human respiratory epithelial cells, and survival rate against macrophages than isolates of other species.

CONCLUSION: Our study revealed the genetic diversity of K. michiganensis and firstly identified the highly antimicrobial-resistant profile of K. michiganensis carrying blaoxy-5.

RevDate: 2023-11-11

Laux M, Piroupo CM, Setubal JC, et al (2023)

The Raphidiopsis (= Cylindrospermopsis) raciborskii pangenome updated: Two new metagenome-assembled genomes from the South American clade.

Harmful algae, 129:102518.

Two Raphidiopsis (=Cylindrospermopsis) raciborskii metagenome-assembled genomes (MAGs) were recovered from two freshwater metagenomic datasets sampled in 2011 and 2012 in Pampulha Lake, a hypereutrophic, artificial, shallow reservoir, located in the city of Belo Horizonte (MG), Brazil. Since the late 1970s, the lake has undergone increasing eutrophication pressure, due to wastewater input, leading to the occurrence of frequent cyanobacterial blooms. The major difference observed between PAMP2011 and PAMP2012 MAGs was the lack of the saxitoxin gene cluster in PAMP2012, which also presented a smaller genome, while PAMP2011 presented the complete sxt cluster and all essential proteins and clusters. The pangenome analysis was performed with all Raphidiopsis/Cylindrospermopsis genomes available at NCBI to date, with the addition of PAMP2011 and PAMP2012 MAGs (All33 subset), but also without the South American strains (noSA subset), and only among the South American strains (SA10 and SA8 subsets). We observed a substantial increase in the core genome size for the 'noSA' subset, in comparison to 'All33' subset, and since the core genome reflects the closeness among the pangenome members, the results strongly suggest that the conservation level of the essential gene repertoire seems to be affected by the geographic origin of the strains being analyzed, supporting the existence of a distinct SA clade. The Raphidiopsis pangenome comprised a total of 7943 orthologous protein clusters, and the two new MAGs increased the pangenome size by 11%. The pangenome based phylogenetic relationships among the 33 analyzed genomes showed that the SA genomes clustered together with 99% bootstrap support, reinforcing the metabolic particularity of the Raphidiopsis South American clade, related to its saxitoxin producing unique ability, while also indicating a different evolutionary history due to its geographic isolation.
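The subset comparison in this abstract rests on a simple set relationship: the core genome is the intersection of the gene clusters of all genomes considered, so removing a divergent group of strains can only keep the core the same size or enlarge it. A minimal Python sketch of that relationship follows. The strain names and gene cluster labels are invented toy data, not the genomes or clusters from the study, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def core_genome(genomes: Dict[str, Set[str]], names: Iterable[str]) -> Set[str]:
    """Gene clusters present in every selected genome (set intersection)."""
    selected = [genomes[n] for n in names]
    return set.intersection(*selected) if selected else set()


def pan_genome(genomes: Dict[str, Set[str]], names: Iterable[str]) -> Set[str]:
    """Gene clusters present in at least one selected genome (set union)."""
    result: Set[str] = set()
    for n in names:
        result |= genomes[n]
    return result


# Toy presence/absence data (hypothetical labels, for illustration only).
genomes = {
    "SA_1": {"g1", "g2", "sxtA"},
    "SA_2": {"g1", "g2", "sxtA", "g5"},
    "AU_1": {"g1", "g2", "g3", "g4"},
    "AU_2": {"g1", "g2", "g3"},
}

all_names = list(genomes)
no_sa = [n for n in all_names if not n.startswith("SA_")]

core_all = core_genome(genomes, all_names)
core_no_sa = core_genome(genomes, no_sa)
pan_all = pan_genome(genomes, all_names)

print(len(core_all), len(core_no_sa), len(pan_all))
```

In this toy case the core grows from two to three clusters once the "SA_" genomes are left out, which mirrors the direction of the effect the abstract reports for the 'noSA' subset.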

RevDate: 2023-11-09

Mahnoor I, Shabbir H, Nawaz S, et al (2023)

Characterization of exclusively non-commensal Neisseria gonorrhoeae pangenome to prioritize globally conserved and thermodynamically stable vaccine candidates using immune-molecular dynamic simulations.

Microbial pathogenesis pii:S0882-4010(23)00472-2 [Epub ahead of print].

Neisseria gonorrhoeae (Ngo) has emerged as a global threat leading to one of the most common sexually transmitted diseases in the world. It has also become one of the leading antimicrobial resistant organisms, resulting in fewer treatment options and an increased morbidity. Therefore, in recent years, there has been an increased focus on the development of new treatments and preventive strategies to combat its infection. In this study, we have combined the most conserved epitopes from the completely assembled strains of Ngo to develop a universal and a thermodynamically stable vaccine candidate. For our vaccine design, the epitopes were selected for their high immunogenicity, non-allergenicity and non-cytotoxicity, making them the ideal candidates for vaccine development. For the screening process, several reverse vaccinology tools were employed to rigorously extract non-homologous and immunogenic epitopes from the selected proteins. Consequently, a total number of 3 B-cell epitopes and 6 T-cell epitopes were selected and joined by multiple immune-modulating adjuvants and linkers to generate a promiscuous immune response. Additionally, the stability and flexible nature of the vaccine construct was confirmed using various molecular dynamic simulation tools. Overall, the vaccine candidate showed promising binding affinity to various HLA alleles and TLR receptors; however, further studies are needed to assess its efficacy in-vivo. In this way, we have designed a multi-subunit vaccine candidate to potentially combat and control the spread of N. gonorrhoeae.

RevDate: 2023-11-09

Lu K, Pan Y, Shen J, et al (2023)

SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data.

Nucleic acids research pii:7369802 [Epub ahead of print].

The silkworm Bombyx mori is a domesticated insect that serves as an animal model for research and agriculture. The silkworm super-pan-genome dataset, which we published last year, is a unique resource for the study of global genomic diversity and phenotype-genotype association. Here we present SilkMeta, a comprehensive database covering the available silkworm pan-genome and multi-omics data. The database contains 1082 short-read genomes, 546 long-read assembled genomes, 1168 transcriptomes, 294 phenotype characterizations (phenome), tens of millions of variations (variome), 7253 long non-coding RNAs (lncRNAs), 18 717 full length transcripts and a set of population statistics. We have compiled publications on functional genomics research and genetic stock deciphering (mutant map). A range of bioinformatics tools is also provided for data visualization and retrieval. The large batch of omics data and tools were integrated in twelve functional modules that provide useful strategies and data for comparative and functional genomics research. The interactive bioinformatics platform SilkMeta will benefit not only the silkworm but also the insect biology communities.

RevDate: 2023-11-08

Krishnan S, Sasi S, Kodakkattumannil P, et al (2023)

Cationic and anionic detergent buffers in sequence yield high-quality genomic DNA from diverse plant species.

Analytical biochemistry pii:S0003-2697(23)00337-8 [Epub ahead of print].

Because of the heterogeneity among seedlings of outbreeding species, the use of seedling tissues as a source of DNA is unsuitable for the genomic characterization of elite germplasms. High-quality DNA, free of RNA, proteins, polysaccharides, secondary metabolites, and shearing, is mandatory for downstream molecular biology applications, especially for next-generation genome sequencing and pangenome analysis aiming to capture the complete genetic diversity within a species. The study aimed to accomplish an efficient protocol for the extraction of high-quality DNA suitable for diverse plant species/tissues. We describe a reliable, and consistent protocol suitable for the extraction of DNA from 42 difficult-to-extract plant species belonging to 33 angiosperm (monocot and dicot) families, including tissues such as seeds, roots, endosperm, and flower/fruit tissues. The protocol was first optimized for the outbreeding recalcitrant trees viz., Prosopis cineraria, Conocarpus erectus, and Phoenix dactylifera, which are rich in proteins, polysaccharides, and secondary metabolites, and the quality of the extracted DNA was confirmed by downstream applications. Nine procedures were attempted to extract high-quality, impurities-free DNA from these three plant species. Extraction of the ethanol-precipitated DNA from cetyltrimethylammonium bromide (CTAB) protocol using sodium dodecyl sulfate (SDS) buffer, i.e., the extraction using a cationic (CTAB) detergent followed by an anionic (SDS) detergent was the key for high yield and high purity (1.75-1.85 against A260/280 and an A260/230 ratio of >2) DNA. A vice versa extraction procedure, i.e., SDS buffer followed by CTAB buffer, and also CTAB buffer followed by CTAB, did not yield good-quality DNA. PCR (using different primers) and restriction endonuclease digestion of the DNA extracted from these three plants validated the protocol. The accomplishment of the genome of P. cineraria using the DNA extracted using the modified protocol confirmed its applicability to genomic studies. The optimized protocol successful in extracting high-quality DNA from diverse plant species/tissues extends its applicability and is useful for accomplishing genome sequences of elite germplasm of recalcitrant plant species with quality reads.

RevDate: 2023-11-08

Gushgari-Doyle S, Lui LM, Nielsen TN, et al (2022)

Genotype to ecotype in niche environments: adaptation of Arthrobacter to carbon availability and environmental conditions.

ISME communications, 2(1):32.

Niche environmental conditions influence both the structure and function of microbial communities and the cellular function of individual strains. The terrestrial subsurface is a dynamic and diverse environment that exhibits specific biogeochemical conditions associated with depth, resulting in distinct environmental niches. Here, we present the characterization of seven distinct strains belonging to the genus Arthrobacter isolated from varying depths of a single sediment core and associated groundwater from an adjacent well. We characterized genotype and phenotype of each isolate to connect specific cellular functions and metabolisms to ecotype. Arthrobacter isolates from each ecotype demonstrated functional and genomic capacities specific to their biogeochemical conditions of origin, including laboratory-demonstrated characterization of salinity tolerance and optimal pH, and genes for utilization of carbohydrates and other carbon substrates. Analysis of the Arthrobacter pangenome revealed that it is notably open with a volatile accessory genome compared to previous pangenome studies on other genera, suggesting a high potential for adaptability to environmental niches.

RevDate: 2023-11-07

Radjasa OK, Steven R, Humaira Z, et al (2023)

Biosynthetic gene cluster profiling from North Java Sea Virgibacillus salarius reveals hidden potential metabolites.

Scientific reports, 13(1):19273.

Virgibacillus salarius 19.PP.SC1.6 is a coral symbiont isolated from Indonesia's North Java Sea; it has the ability to produce secondary metabolites that provide survival advantages and biological functions, such as ectoine, which is synthesized by an ectoine gene cluster. Apart from being an osmoprotectant for bacteria, ectoine is also known as a chemical chaperone with numerous biological activities such as maintaining protein stability, which makes ectoine in high demand in the market industry and makes it beneficial to investigate V. salarius ectoine. However, there has been no research on genome-based secondary metabolite and ectoine gene cluster characterization from Indonesian marine V. salarius. In this study, we performed a genomic analysis and ectoine identification of V. salarius. A high-quality draft genome with total size of 4.45 Mb and 4426 coding sequence (CDS) was characterized and then mapped into the Cluster of Orthologous Groups (COG) category. The genus Virgibacillus has an "open" pangenome type with total of 18 genomic islands inside the V. salarius 19.PP.SC1.6 genome. There were seven clusters of secondary metabolite-producing genes found, with a total of 80 genes classified as NRPS, PKS (type III), terpenes, and ectoine biosynthetic related genes. The ectoine gene cluster forms one operon consists of ectABC gene with 2190 bp gene cluster length, and is successfully characterized. The presence of ectoine in V. salarius was confirmed using UPLC-MS/MS operated in Multiple Reaction Monitoring (MRM) mode, which indicates that V. salarius has an intact ectoine gene clusters and is capable of producing ectoine as compatible solutes.

RevDate: 2023-11-07

Sen S, Woodhouse MR, Portwood JL, et al (2023)

Maize Feature Store: A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications.

Database : the journal of biological databases and curation, 2023:.

The big-data analysis of complex data associated with maize genomes accelerates genetic research and improves agronomic traits. As a result, efforts have increased to integrate diverse datasets and extract meaning from these measurements. Machine learning models are a powerful tool for gaining knowledge from large and complex datasets. However, these models must be trained on high-quality features to succeed. Currently, there are no solutions to host maize multi-omics datasets with end-to-end solutions for evaluating and linking features to target gene annotations. Our work presents the Maize Feature Store (MFS), a versatile application that combines features built on complex data to facilitate exploration, modeling and analysis. Feature stores allow researchers to rapidly deploy machine learning applications by managing and providing access to frequently used features. We populated the MFS for the maize reference genome with over 14 000 gene-based features based on published genomic, transcriptomic, epigenomic, variomic and proteomics datasets. Using the MFS, we created an accurate pan-genome classification model with an AUC-ROC score of 0.87. The MFS is publicly available through the maize genetics and genomics database. Database URL

RevDate: 2023-11-07

Ullah A, Rehman B, Khan S, et al (2023)

An In Silico Multi-epitopes Vaccine Ensemble and Characterization Against Nosocomial Proteus penneri.

Molecular biotechnology [Epub ahead of print].

Proteus penneri (P. penneri) is a bacillus-shaped, gram-negative, facultative anaerobe bacterium that is primarily an invasive pathogen and the etiological agent of several hospital-associated infections. P. penneri strains are naturally resistant to macrolides, amoxicillin, oxacillin, penicillin G, and cephalosporins; in addition, no vaccines are available against these strains. This warrants efforts to propose a theoretical based multi-epitope vaccine construct to prevent pathogen infections. In this research, reverse vaccinology bioinformatics and immunoinformatics approaches were adopted for vaccine target identification and construction of a multi-epitope vaccine. In the first phase, a core proteome dataset of the targeted pathogen was obtained using the NCBI database and subjected to bacterial pan-genome analysis using bacterial pan-genome analysis (BPGA) to predict core protein sequences which were then used to find good vaccine target candidates. This identified two proteins, Hcp family type VI secretion system effector and superoxide dismutase family protein, as promising vaccine targets. Afterward using the IEDB database, different B-cell and T-cell epitopes were predicted. A set of four epitopes "KGSVNVQDRE, NTGKLTGTR, IIHSDSWNER, and KDGKPVPALK" were chosen for the development of a multi-epitope vaccine construct. A 183 amino acid long vaccine design was built along with "EAAAK" and "GPGPG" linkers and a cholera toxin B-subunit adjuvant. The designed vaccine model comprised immunodominant, non-toxic, non-allergenic, and physicochemical stable epitopes. The model vaccine was docked with MHC-I, MHC-II, and TLR-4 immune cell receptors using the Cluspro2.0 web server. The binding energy score of the vaccine was -654.7 kcal/mol for MHC-I, -738.4 kcal/mol for MHC-II, and -695.0 kcal/mol for TLR-4. A molecular dynamic simulation was done using AMBER v20 package for dynamic behavior in nanoseconds. Additionally, MM-PBSA binding free energy analysis was done to test intermolecular binding interactions between docked molecules. The MM-GBSA net binding energy score was -148.00 kcal/mol, -118.00 kcal/mol, and -127.00 kcal/mol for vaccine with TLR-4, MHC-I, and MHC-II, respectively. Overall, these in silico-based predictions indicated that the vaccine is highly promising in terms of developing protective immunity against P. penneri. However, additional experimental validation is required to unveil the real immune response to the designed vaccine.

RevDate: 2023-11-07

Raghuram V, Gunoskey JJ, Hofstetter KS, et al (2023)

Comparison of genomic diversity between single and pooled Staphylococcus aureus colonies isolated from human colonization cultures.

Microbial genomics, 9(11):.

The most common approach to sampling the bacterial populations within an infected or colonized host is to sequence genomes from a single colony obtained from a culture plate. However, it is recognized that this method does not capture the genetic diversity in the population. Sequencing a mixture of several colonies (pool-seq) is a better approach to detect population heterogeneity, but it is more complex to analyse due to different types of heterogeneity, such as within-clone polymorphisms, multi-strain mixtures, multi-species mixtures and contamination. Here, we compared 8 single-colony isolates (singles) and pool-seq on a set of 2286 Staphylococcus aureus culture samples to identify features that can distinguish pure samples, samples undergoing intraclonal variation and mixed strain samples. The samples were obtained by swabbing 3 body sites on 85 human participants quarterly for a year, who initially presented with a methicillin-resistant S. aureus skin and soft-tissue infection (SSTI). We compared parameters such as sequence quality, contamination, allele frequency, nucleotide diversity and pangenome diversity in each pool to those for the corresponding singles. Comparing singles from the same culture plate, we found that 18% of sample collections contained mixtures of multiple multilocus sequence types (MLSTs or STs). We showed that pool-seq data alone could predict the presence of multi-ST populations with 95% accuracy. We also showed that pool-seq could be used to estimate the number of intra-clonal polymorphic sites in the population. Additionally, we found that the pool may contain clinically relevant genes such as antimicrobial resistance markers that may be missed when only examining singles. These results highlight the potential advantage of analysing genome sequences of total populations obtained from clinical cultures rather than single colonies.

RevDate: 2023-11-07

Sommer H, Djamalova D, M Galardini (2023)

Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers.

Microbial genomics, 9(11):.

The wide adoption of bacterial genome sequencing and encoding both core and accessory genome variation using k-mers has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes such as those linked to infection. Significant limitations still remain because of k-mers being duplicated across gene clusters and as far as the interpretation of association results is concerned, which affects the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to their gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph as well as more easily map and annotate associated variants. We tested panfeed on two independent data sets, correctly identifying previously characterized causal variants, which demonstrates the precision of the method, as well as its scalable performance. panfeed is a command line tool written in the python programming language and is available at
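The core idea in this abstract, keeping each k-mer tied to the gene cluster and position it came from instead of pooling all k-mers genome-wide, can be illustrated with a small Python sketch. This is not panfeed's code or interface; the function name, the input layout (cluster id mapped to strain sequences) and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_kmers(
    clusters: Dict[str, Dict[str, str]], k: int
) -> Dict[Tuple[str, str], Dict[str, List[int]]]:
    """Index k-mers per gene cluster.

    Keys are (cluster_id, kmer); values map each strain to the 0-based
    positions of that k-mer within the strain's copy of the gene.
    """
    index: Dict[Tuple[str, str], Dict[str, List[int]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for cluster_id, members in clusters.items():
        for strain, seq in members.items():
            for pos in range(len(seq) - k + 1):
                index[(cluster_id, seq[pos:pos + k])][strain].append(pos)
    # Convert to plain dicts so lookups do not create empty entries.
    return {key: dict(val) for key, val in index.items()}


# Toy input (hypothetical): two gene clusters, short made-up sequences.
clusters = {
    "clusterA": {"s1": "ATGCA", "s2": "ATGGA"},
    "clusterB": {"s1": "TTATG"},
}
index = cluster_kmers(clusters, k=3)

# The k-mer ATG occurs in both clusters but is kept as two separate features.
print(index[("clusterA", "ATG")])
print(index[("clusterB", "ATG")])
```

Because the key includes the cluster, a k-mer shared between unrelated genes yields one presence/absence pattern per cluster, and the stored positions allow an associated k-mer to be mapped back to a specific site in the gene.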

RevDate: 2023-11-07

Garcia J, Morales-Cruz A, Cochetel N, et al (2023)

Comparative pangenomic insights into the distinct evolution of virulence factors among grapevine trunk pathogens.

Molecular plant-microbe interactions : MPMI [Epub ahead of print].

The permanent organs of grapevines (V. vinifera L.), like other woody perennials, are colonized by various unrelated pathogenic ascomycete fungi secreting cell wall-degrading enzymes and phytotoxic secondary metabolites that contribute to host damage and disease symptoms. Trunk pathogens differ in the symptoms they induce and the extent and speed of damage. Isolates of the same species often display a wide virulence range, even within the same vineyard. This study focuses on Eutypa lata, Neofusicoccum parvum, and Phaeoacremonium minimum, causal agents of Eutypa dieback, Botryosphaeria dieback, and Esca, respectively. We sequenced fifty isolates from viticulture regions worldwide and built nucleotide-level, reference-free pangenomes for each species. Through examining genomic diversity and pangenome structure, we analyzed intraspecific conservation and variability of putative virulence factors, focusing on functions under positive selection, and recent gene-family dynamics of contraction and expansion. Our findings reveal contrasting distributions of putative virulence factors in the core, dispensable, and private genomes of each pangenome. For example, CAZymes were prevalent in the core genomes of each pangenome, whereas biosynthetic gene clusters were prevalent in the dispensable genomes of E. lata and P. minimum. The dispensable fractions were also enriched in Gypsy transposable elements and virulence factors under positive selection (polyketide synthases genes in E. lata and P. minimum glycosyltransferases in N. parvum). Our findings underscore the complexity of the genomic architecture in each species and provide insights into their adaptive strategies, enhancing our understanding of the underlying mechanisms of virulence.

RevDate: 2023-11-06

Laufer V, Glover TW, TE Wilson (2023)

Applications of advanced technologies for detecting genomic structural variation.

Mutation research. Reviews in mutation research pii:S1383-5742(23)00023-6 [Epub ahead of print].

Chromosomal structural variation (SV) encompasses a heterogenous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" and subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near-term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.

RevDate: 2023-11-06

Magome TG, Ramatla T, Mokgokong P, et al (2023)

The draft genome and pan-genome structure of Paraclostridium bifermentans strain T2 isolated from sheep faeces.

Data in brief, 51:109660.

Paraclostridium bifermentans is a Gram-positive, rod-shaped bacterium that can inhabit various mesophilic environments such as soil, marine habitats, and polluted waters. Some species of Paraclostridium are reported to cause fatal infections in humans, although mechanisms and capacity for adaptation are still unknown. We hereby present the whole genome sequence data of P. bifermentans T2 strain isolated from sheep faecal matter in Potchefstroom, South Africa. DNA libraries were sequenced on the Oxford Nanopore Mk1B platform. The generated sequence data was assembled and polished using Flye assembler. Genome data analysis yielded a genome size of 2,911,782 bp, with a G + C content of 27.8%. Rapid Annotation using Subsystem Technology (RAST) showed that the draft genome of this strain consists of 6 514 coding sequences (CDS). The pan-genome was defined by a total of 16 288 CDSs, grouping the strain with the genome of P. bifermentans SampleS7P1. The draft genome sequence has been deposited in NCBI GenBank with the accession number of JAUPET000000000.
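The G + C content reported in this abstract is the percentage of guanine and cytosine bases among the called bases of the assembly. A minimal Python sketch of that calculation is given below; the function name is hypothetical, ambiguous bases such as N are excluded from the denominator by assumption, and the example string is toy data, not the deposited sequence.

```python
def gc_content(sequence: str) -> float:
    """Return G+C as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / called


# Toy example: 2 of the 8 bases are G or C.
example = "ATGCATAT"
print(f"{gc_content(example):.1f}%")
```

For a multi-contig draft assembly the same function can be applied to the concatenation of all contigs, since the percentage depends only on base counts.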

RevDate: 2023-11-03

Bachari A, Nassar N, Schanknecht E, et al (2023)

Rationalizing a prospective coupling effect of cannabinoids with the current pharmacotherapy for melanoma treatment.

WIREs mechanisms of disease [Epub ahead of print].

Melanoma is one of the leading fatal forms of cancer, yet from a treatment perspective, we have minimal control over its reoccurrence and resistance to current pharmacotherapies. The endocannabinoid system (ECS) has recently been accepted as a multifaceted homeostatic regulator, influencing various physiological processes across different biological compartments, including the skin. This review presents an overview of the pathophysiology of melanoma, current pharmacotherapy used for treatment, and the challenges associated with the different pharmacological approaches. Furthermore, it highlights the utility of cannabinoids as an additive remedy for melanoma by restoring the balance between downregulated immunomodulatory pathways and elevated inflammatory cytokines during chronic skin conditions as one of the suggested critical approaches in treating this immunogenic tumor. This article is categorized under: Cancer > Molecular and Cellular Physiology.

RevDate: 2023-11-03

Pibiri GE, Fan J, R Patro (2023)

Meta-colored compacted de Bruijn graphs.

bioRxiv : the preprint server for biology pii:2023.07.21.550101.

MOTIVATION: The colored compacted de Bruijn graph (c-dBG) has become a fundamental tool used across several areas of genomics and pangenomics. For example, it has been widely adopted by methods that perform read mapping or alignment, abundance estimation, and subsequent downstream analyses. These applications essentially regard the c-dBG as a map from k-mers to the set of references in which they appear. The c-dBG data structure should retrieve this set -- the color of the k-mer -- efficiently for any given k-mer, while using little memory. To aid retrieval, the colors are stored explicitly in the data structure and take considerable space for large reference collections, even when compressed. Reducing the space of the colors is therefore of utmost importance for large-scale sequence indexing.

RESULTS: We describe the meta-colored compacted de Bruijn graph (Mac-dBG) -- a new colored de Bruijn graph data structure where colors are represented holistically, i.e., taking into account their redundancy across the whole collection being indexed, rather than individually as atomic integer lists. This allows the factorization and compression of common sub-patterns across colors. While optimizing the space of our data structure is NP-hard, we propose a simple heuristic algorithm that yields practically good solutions. Results show that the Mac-dBG data structure improves substantially over the best previous space/time trade-off, by providing remarkably better compression effectiveness for the same (or better) query efficiency. This improved space/time trade-off is robust across different datasets and query workloads. Code availability: A C++17 implementation of the Mac-dBG is publicly available on GitHub at:
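The view of a colored de Bruijn graph as a map from k-mers to the set of references containing them can be sketched in a few lines of Python. The sketch below is only the baseline idea, with each distinct color stored once and k-mers pointing to a color id. It is not the Mac-dBG structure, it does no compaction into unitigs, it ignores reverse complements, and all names and toy sequences are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Color = Tuple[int, ...]  # sorted tuple of reference ids


def build_color_map(
    references: List[str], k: int
) -> Tuple[Dict[str, int], List[Color]]:
    """Map every k-mer to a color id; store each distinct color once."""
    kmer_to_refs: Dict[str, set] = defaultdict(set)
    for ref_id, seq in enumerate(references):
        for i in range(len(seq) - k + 1):
            kmer_to_refs[seq[i:i + k]].add(ref_id)

    color_ids: Dict[Color, int] = {}
    colors: List[Color] = []
    kmer_to_color: Dict[str, int] = {}
    for kmer, refs in kmer_to_refs.items():
        color = tuple(sorted(refs))
        if color not in color_ids:
            color_ids[color] = len(colors)
            colors.append(color)
        kmer_to_color[kmer] = color_ids[color]
    return kmer_to_color, colors


def query(kmer: str, kmer_to_color: Dict[str, int], colors: List[Color]) -> Color:
    """Return the references containing the k-mer, or () if it is absent."""
    color_id = kmer_to_color.get(kmer)
    return colors[color_id] if color_id is not None else ()


# Toy references (made-up sequences).
references = ["ACGTAC", "CGTACG", "TTTACG"]
kmer_to_color, colors = build_color_map(references, k=3)
print(len(kmer_to_color), "k-mers share", len(colors), "distinct colors")
print(query("CGT", kmer_to_color, colors))
```

Even in this toy case, six k-mers share three distinct colors. The abstract's point is that the distinct colors themselves still overlap heavily (here two of them contain references 0 and 1), which is the redundancy a representation like the Mac-dBG aims to factor out.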

RevDate: 2023-11-02

Zhuang Z, Cheng YY, Deng J, et al (2023)

Genomic insights into the phage-defense systems of Stenotrophomonas maltophilia clinical isolates.

Microbiological research, 278:127528 pii:S0944-5013(23)00230-6 [Epub ahead of print].

Stenotrophomonas maltophilia is a rapidly evolving multidrug-resistant opportunistic pathogen that can cause serious infections in immunocompromised patients. Although phage therapy is one of the promising strategies for dealing with MDR bacteria, the main challenges of phage therapeutics include accumulation of phage resistant mutations and acquisition of the phage defense systems. To systematically evaluate the impact of (pro)phages in shaping genetic and evolutionary diversity of S. maltophilia, we collected 166 S. maltophilia isolates from three hospitals in southern China to analyze its pangenome, virulence factors, prophage regions, and anti-viral immune systems. Pangenome analysis indicated that there are 1328 saturated core genes and 26961 unsaturated accessory genes in the pangenome, suggesting existence of highly variable parts of S. maltophilia genome. The presence of genes in relation to T3SS and T6SS mechanisms suggests the great potential to secrete toxins by the S. maltophilia population, which is contrary to the conventional notion of low-virulence of S. maltophilia. Additionally, we characterized the pan-immune system maps of these clinical isolates against phage infections and revealed the co-harboring of CBASS and anti-CBASS in some strains, suggesting a never-ending arms race and the co-evolutionary dynamic between bacteria and phages. Furthermore, our study predicted 310 prophage regions in S. maltophilia with high genetic diversity. Six viral defense systems were found to be located at specific position of the S. maltophilia prophage genomes, indicating potential evolution of certain site/region similar to bacterial 'defense islands' in prophage. Our study provides novel insights of the S. maltophilia pangenome in relation to phage-defense mechanisms, which extends our understanding of bacterial-phage interactions and might guide the application of phage therapy in combating S. maltophilia infections.

RevDate: 2023-11-02

Mun SY, Lee W, Lee SY, et al (2024)

Pediococcus inopinatus with a well-developed CRISPR-Cas system dominates in long-term fermented kimchi, Mukeunji.

Food microbiology, 117:104385.

Kimchi is produced through a low-temperature fermentation without pre-sterilization, resulting in a heterogeneous microbial community. As fermentation progresses, dominant lactic acid bacteria (LAB) species emerge and undergo a transition process. In this study, LAB were isolated from Mukeunji, a long-term fermented kimchi that is in the final stage of kimchi fermentation process. It was confirmed, through culture-dependent and independent analysis, as well as metagenome analysis, that Pediococcus inopinatus are generally dominant in long-term fermented kimchi. Comparative analysis of the de novo assembled whole genome of P. inopinatus with other kimchi LAB revealed that this species has a well-developed clustered regularly interspaced short palindromic repeats (CRISPR) system. The CRISPR system of P. inopinatus has an additional copy of the csa3 gene, a transcription factor for cas genes. Indeed, this species not only highly expresses cas1 and cas2, which induce spacer acquisition, but also has many diverse spacers that are actively expressed. These findings indicate that the well-developed CRISPR-Cas system is enabling P. inopinatus to dominate in long-fermented kimchi. Overall, this study revealed that LAB with a robust defense system dominate in the final stage of kimchi fermentation and presented a model for the succession mechanism of kimchi LAB.

RevDate: 2023-11-02

Chinchilla D, Nieves C, Gutiérrez R, et al (2023)

Phylogenomics of Leptospira santarosai, a prevalent pathogenic species in the Americas.

PLoS neglected tropical diseases, 17(11):e0011733 pii:PNTD-D-23-01039 [Epub ahead of print].

BACKGROUND: Leptospirosis is a complex zoonotic disease mostly caused by a group of eight pathogenic species (L. interrogans, L. borgpetersenii, L. kirschneri, L. mayottensis, L. noguchii, L. santarosai, L. weilii, L. alexanderi), with a wide spectrum of animal reservoirs and patient outcomes. Leptospira interrogans is considered as the leading causative agent of leptospirosis worldwide and it is the most studied species. However, the genomic features and phylogeography of other Leptospira pathogenic species remain to be determined.

Here we investigated the genome diversity of the main pathogenic Leptospira species based on a collection of 914 genomes from strains isolated around the world. Genome analyses revealed species-specific genome size and GC content, and an open pangenome in the pathogenic species, except for L. mayottensis. Taking advantage of a new set of genomes of L. santarosai strains isolated from patients in Costa Rica, we took a closer look at this species. L. santarosai strains are largely distributed in America, including the Caribbean islands, with over 96% of the available genomes originating from this continent. Phylogenetic analysis showed high genetic diversity within L. santarosai, and the clonal groups identified by cgMLST were strongly associated with geographical areas. Serotype identification based on serogrouping and/or analysis of the O-antigen biosynthesis gene loci further confirmed the great diversity of strains within the species.

CONCLUSIONS/SIGNIFICANCE: In conclusion, we report a comprehensive genome analysis of pathogenic Leptospira species with a focus on L. santarosai. Our study sheds new light onto the genomic diversity, evolutionary history, and epidemiology of leptospirosis in America and globally. Our findings also expand our knowledge of the genes driving O-antigen diversity. In addition, our work provides a framework for understanding the virulence and spread of L. santarosai and for improving its surveillance in both humans and animals.

RevDate: 2023-11-01

Li Z, Liu X, Wang C, et al (2023)

The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation.

Genome research pii:gr.277638.122 [Epub ahead of print].

Structural variations have emerged as an important driving force for genome evolution and phenotypic variation in various organisms, yet their contributions to genetic diversity and adaptation in domesticated animals remain largely unknown. Here we constructed a pangenome based on 250 sequenced individuals from 32 pig breeds in Eurasia and systematically characterized coding sequence presence/absence variations (PAVs) within pigs. We identified 308.3-Mb nonreference sequences and 3438 novel genes absent from the current reference genome. Gene PAV analysis showed that 16.8% of the genes in the pangene catalog undergo PAV. A number of newly identified dispensable genes showed close associations with adaptation. For instance, several novel swine leukocyte antigen (SLA) genes discovered in nonreference sequences potentially participate in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. We delineated previously unidentified features of the pig mobilome that contained 490,480 transposable element insertion polymorphisms (TIPs) resulting from recent mobilization of 970 TE families, and investigated their population dynamics along with influences on population differentiation and gene expression. In addition, several candidate adaptive TE insertions were detected to be co-opted into genes responsible for responses to hypoxia, skeletal development, regulation of heart contraction, and neuronal cell development, likely contributing to local adaptation of Tibetan wild boars. These findings enhance our understanding of hidden layers of the genetic diversity in pigs and provide novel insights into the role of SVs in the evolutionary adaptation of mammals.

RevDate: 2023-11-01

Angelo L, Vaillant A, Blanchet M, et al (2023)

Pangenomic antiviral effect of REP 2139 in CRISPR/Cas9 engineered cell lines expressing hepatitis B virus surface antigen.

PloS one, 18(11):e0293167 pii:PONE-D-23-18377.

Chronic hepatitis B remains a global health problem with 296 million people living with chronic HBV infection and being at risk of developing cirrhosis and hepatocellular carcinoma. Non-infectious subviral particles (SVP) are produced in large excess over infectious Dane particles in patients and are the major source of Hepatitis B surface antigen (HBsAg). They are thought to exhaust the immune system, and it is generally considered that functional cure requires the clearance of HBsAg from the patient's blood. The antiviral activity of nucleic acid polymers (NAPs) leads to the inhibition of HBsAg release, resulting in rapid clearance of HBsAg from circulation in vivo. However, their efficacy has only been demonstrated in limited genotypes in small-scale clinical trials. HBV exists as nine main genotypes (A to I). In this study, the HBsAg ORFs from the most prevalent genotypes (A, B, C, D, E, G), which account for over 96% of human cases, were inserted into the AAVS1 safe-harbor of HepG2 cells using CRISPR/Cas9 knock-in. A cell line producing the D144A vaccine escape mutant was also engineered. The secretion of HBsAg was confirmed in these new genotype cell lines (GCLs) and the antiviral activity of the NAP REP 2139 was then assessed. The results demonstrate that REP 2139 exerts an antiviral effect in all genotypes and serotypes tested in this study, including the vaccine escape mutant, suggesting a pangenomic effect of the NAPs.

RevDate: 2023-11-01

English J, Newberry F, Hoyles L, et al (2023)

Genomic analyses of Bacteroides fragilis: subdivisions I and II represent distinct species.

Journal of medical microbiology, 72(11):.

Introduction. Bacteroides fragilis is a Gram-negative anaerobe that is a member of the human gastrointestinal microbiota and is frequently found as an extra-intestinal opportunistic pathogen. B. fragilis comprises two distinct groups - divisions I and II - characterized by the presence/absence of genes [cepA and ccrA (cfiA), respectively] that confer resistance to β-lactam antibiotics by either serine or metallo-β-lactamase production. No large-scale analyses of publicly available B. fragilis sequence data have been undertaken, and the resistome of the species remains poorly defined. Hypothesis/Gap Statement. Reclassification of divisions I and II B. fragilis as two distinct species has been proposed but additional evidence is required. Aims. To investigate the genomic diversity of GenBank B. fragilis genomes and establish the prevalence of division I and II strains among publicly available B. fragilis genomes, and to generate further evidence to demonstrate that B. fragilis division I and II strains represent distinct genomospecies. Methodology. High-quality (n=377) genomes listed as B. fragilis in GenBank were included in pangenome and functional analyses. Genome data were also subject to resistome profiling using The Comprehensive Antibiotic Resistance Database. Results. Average nucleotide identity and phylogenetic analyses showed B. fragilis divisions I and II represent distinct species: B. fragilis sensu stricto (n=275 genomes) and B. fragilis A (n=102 genomes; Genome Taxonomy Database designation), respectively. Exploration of the pangenome of B. fragilis sensu stricto and B. fragilis A revealed separation of the two species at the core and accessory gene levels. Conclusion. The findings indicate that B. fragilis A, previously referred to as division II B. fragilis, is an individual species and distinct from B. fragilis sensu stricto. The B. fragilis pangenome analysis supported previous genomic, phylogenetic and resistome screening analyses collectively reinforcing that divisions I and II are two separate species. In addition, it was confirmed that differences in the accessory genes of B. fragilis divisions I and II are primarily associated with carbohydrate metabolism and suggest that differences other than antimicrobial resistance could also be used to distinguish between these two species.

RevDate: 2023-11-01

Hodgeman R, Mann R, Djitro N, et al (2023)

The pan-genome of Mycobacterium avium subsp. paratuberculosis (Map) confirms ancestral lineage and reveals gene rearrangements within Map Type S.

BMC genomics, 24(1):656.

BACKGROUND: To date genomic studies on Map have concentrated on Type C strains with only a few Type S strains included for comparison. In this study the entire pan-genome of 261 Map genomes (205 Type C, 52 Type S and 4 Type B) and 7 Mycobacterium avium complex (Mac) genomes were analysed to identify genomic similarities and differences between the strains and provide more insight into the evolutionary relationship within this Mycobacterial species.

RESULTS: Our analysis of the core genome of all the Map isolates identified two distinct lineages, Type S and Type C Map that is consistent with previous phylogenetic studies of Map. Pan-genome analysis revealed that Map has a larger accessory genome than Mycobacterium avium subsp. avium (Maa) and Type C Map has a larger accessory genome than Type S Map. In addition, we found large rearrangements within Type S strains of Map and little to none in Type C and Type B strains. There were 50 core genes identified that were unique to Type S Map and there were no unique core genes identified between Type B and Type C Map strains. In Type C Map we identified an additional CE10 CAZyme class which was identified as an alpha/beta hydrolase and an additional polyketide and non-ribosomal peptide synthetase cluster. Consistent with previous analysis no plasmids and only incomplete prophages were identified in the genomes of Map. There were 45 hypothetical CRISPR elements identified with no associated cas genes.

CONCLUSION: This is the most comprehensive comparison of the genomic content of Map isolates to date and included the closing of eight Map genomes. The analysis revealed that there is greater variation in gene synteny within Type S strains when compared to Type C indicating that the Type C Map strain emerged after Type S. Further analysis of Type C and Type B genomes revealed that they are structurally similar with little to no genetic variation and that Type B Map may be a distinct clade within Type C Map and not a different strain type of Map. The evolutionary lineage of Maa and Map was confirmed as emerging after M. hominissuis.

RevDate: 2023-10-31

Manzano-Morales S, Liu Y, González-Bodí S, et al (2023)

Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses.

Genome biology, 24(1):250.

BACKGROUND: A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multicopy gene families, which are recognizable by synteny conservation, and retrieving complete sets of species-level orthologs. We studied the implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes.

RESULTS: Clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Species-wise estimates of pangenome and core genome sizes change by the same factor when using different clustering criteria, allowing robust cross-species comparisons regardless of the clustering criterion. However, cross-species comparisons of genome plasticity and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other accessory functions. In some pangenome features, the variability attributed to methodological inconsistencies can even exceed the effect sizes of ecological and phylogenetic variables.

CONCLUSIONS: Choosing an appropriate criterion for gene clustering is critical to conduct unbiased pangenome analyses. We provide practical guidelines to choose the right method depending on the research goals and the quality of genome assemblies, and a benchmarking dataset to assess the robustness and reproducibility of future comparative studies.

RevDate: 2023-10-30

Chandra G, C Jain (2023)

Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.

Journal of computational biology : a journal of computational molecular cell biology [Epub ahead of print].

A pangenome graph can serve as a better reference for genomic studies because it allows a compact representation of multiple genomes within a species. Aligning sequences to a graph is critical for pangenome-based resequencing. The seed-chain-extend heuristic works by finding short exact matches between a sequence and a graph. In this heuristic, colinear chaining helps identify a good cluster of exact matches that can be combined to form an alignment. Colinear chaining algorithms have been extensively studied for aligning two sequences with various gap costs, including linear, concave, and convex cost functions. However, extending these algorithms for sequence-to-graph alignment presents significant challenges. Recently, Makinen et al. introduced a sparse dynamic programming framework that exploits the small path cover property of acyclic pangenome graphs, enabling efficient chaining. However, this framework does not consider gap costs, limiting its practical effectiveness. We address this limitation by developing novel problem formulations and provably good chaining algorithms that support a variety of gap cost functions. These functions are carefully designed to enable fast chaining algorithms whose time requirements are parameterized in terms of the size of the minimum path cover. Through an empirical evaluation, we demonstrate the superior performance of our algorithm compared with existing aligners. When mapping simulated long reads to a pangenome graph comprising 95 human haplotypes, we achieved 98.7% precision while leaving <2% of reads unmapped.
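To make the idea of colinear chaining concrete, here is a minimal sketch for the simpler sequence-to-sequence case, not the graph algorithm of the paper above. It uses a quadratic-time dynamic program, and the gap cost is an assumption chosen for illustration: a linear penalty on the difference between the query gap and the reference gap of two consecutive anchors. The names `Anchor`, `chain_anchors`, and `gap_penalty` are hypothetical.

```python
from typing import List, Tuple

# An anchor is an exact match: (query_start, ref_start, length).
Anchor = Tuple[int, int, int]


def chain_anchors(
    anchors: List[Anchor], gap_penalty: float = 1.0
) -> Tuple[float, List[Anchor]]:
    """Return the best-scoring colinear chain and its score.

    Score = sum of anchor lengths minus, for each consecutive pair,
    gap_penalty * |query_gap - ref_gap| (an assumed linear gap cost).
    Anchors in a chain must not overlap in either sequence.
    """
    if not anchors:
        return 0.0, []
    order = sorted(anchors)  # by query start, then reference start
    n = len(order)
    score = [float(a[2]) for a in order]  # chain consisting of one anchor
    prev = [-1] * n
    for i in range(n):
        qi, ri, li = order[i]
        for j in range(i):
            qj, rj, lj = order[j]
            q_gap = qi - (qj + lj)
            r_gap = ri - (rj + lj)
            if q_gap < 0 or r_gap < 0:
                continue  # j cannot precede i colinearly
            cand = score[j] + li - gap_penalty * abs(q_gap - r_gap)
            if cand > score[i]:
                score[i] = cand
                prev[i] = j
    best = max(range(n), key=score.__getitem__)
    chain: List[Anchor] = []
    k = best
    while k != -1:  # walk back through predecessor links
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return score[best], chain


# Toy anchors; the third one lies far off the diagonal of the others.
demo = [(0, 0, 5), (6, 6, 4), (12, 30, 3), (11, 11, 6)]
best_score, best_chain = chain_anchors(demo)
print(best_score, best_chain)
```

The quadratic inner loop is the part that sparse dynamic programming replaces with range queries; on a graph, the predecessor test additionally has to account for reachability between nodes.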

RevDate: 2023-10-28

Alsubaiyel AM, SI Bukhari (2023)

Computational exploration and design of a multi-epitopes vaccine construct against Chlamydia psittaci.

Journal of biomolecular structure & dynamics [Epub ahead of print].

Chlamydia psittaci is an intracellular pathogen and causes a variety of deadly infections in humans. Antibiotics are effective against C. psittaci; however, a high percentage of resistant strains have been reported in recent times. As there is no licensed vaccine, we used in-silico techniques to design a multi-epitopes vaccine against C. psittaci. Following a step-wise protocol, the proteome of available 26 strains was retrieved and filtered for subcellular localized proteins. Five proteins were selected (2 extracellular and 3 outer membrane) and were further analyzed for B-cell and T-cell epitopes prediction. Epitopes were further checked for antigenicity, solubility, stability, toxigenicity, allergenicity, and adhesive properties. Filtered epitopes were linked via linkers and the 3D structure of the designed vaccine construct was predicted. Binding of the designed vaccine with immune receptors: MHC-I, MHC-II, and TLR-4 was analyzed, which resulted in docking energy scores of -4.37 kcal/mol, -0.20 kcal/mol and -22.38 kcal/mol, respectively. Further, the docked complexes showed stable dynamics with a maximum value of vaccine-MHC-I complex (7.8 Å), vaccine-MHC-II complex (6.2 Å) and vaccine-TLR4 complex (5.2 Å). As per the results, the designed vaccine construct reported robust immune responses to protect the host against C. psittaci infections. In the study, the C. psittaci proteomes were considered in pan-genome analysis to extract core proteins. The pan-genome analysis was conducted using bacterial pan-genome analysis (BPGA) software. The core proteins were checked further for non-redundant proteins using a CD-Hit server. Surface localized proteins were investigated using PSORTb v 3.0. The surface proteins were BLASTp against Virulence Factor Data Base (VFDB) to predict virulent factors. Antigenicity prediction of the shortlisted proteins was further done using VAXIGEN v 2.0. The epitope mapping was done using the immune epitope database (IEDB). A multi-epitopes vaccine was built and a 3D structure was generated using 3Dprot online server. The docking analysis of the designed vaccine with immune receptors was carried out using PATCHDOCK. Molecular dynamics and post-simulation analyses were carried out using AMBER v20 to decipher the dynamics stability and intermolecular binding energies of the docked complexes. Communicated by Ramaswamy H. Sarma.

RevDate: 2023-10-28

Hamed SM, Mohamed HO, Ashour HM, et al (2023)

Comparative genomic analysis of strong biofilm-forming Klebsiella pneumoniae isolates uncovers novel ISEcp1-mediated chromosomal integration of a full plasmid-like sequence.

Infectious diseases (London, England) [Epub ahead of print].

BACKGROUND: The goal of the current study was to elucidate the genomic background of biofilm formation in Klebsiella pneumoniae.

METHODS: Clinical isolates were screened for biofilm formation using the crystal violet assay. Antimicrobial resistance (AMR) profiles were assessed by disk diffusion and broth microdilution tests. Biofilm formation was correlated to virulence and resistance genes screened by PCR. Draft genomes of three isolates that form strong biofilm were generated by Illumina sequencing.

RESULTS: Only the siderophore-coding gene iutA was significantly associated with more pronounced biofilm formation. ST1399-KL43-O1/O2v1 and ST11-KL15-O4 were assigned to the multidrug-resistant strain K21 and the extensively drug-resistant strain K237, respectively. ST1999-KL38-O12 was assigned to K57. Correlated with CRISPR/Cas distribution, more plasmid replicons and prophage sequences were identified in K21 and K237 compared to K57. The acquired AMR genes (blaOXA-48, rmtF, aac(6')-Ib and qnrB) and (blaNDM-1, blaCTX-M, aph(3')-VI, qnrS, and aac(6')-Ib-cr) were found in K237 and K21, respectively. The latter showed a novel ISEcp1-mediated chromosomal integration of replicon type IncM1 plasmid-like structure harboring blaCTX-M-14 and aph(3')-VI that uniquely interrupted rcsC. The plasmid-mediated heavy metal resistance genes merACDEPRT and arsABCDR were spotted in K21, which also exclusively carried the acquired virulence genes mrkABCDF and the hypervirulence-associated genes iucABCD-iutA, and rmpA/A2. Pangenome analysis revealed NTUH-K2044 accessory genes most frequently shared with K21.

CONCLUSIONS: While less virulent to Galleria mellonella than ST1999 (K57), the strong biofilm former, multidrug-resistant, NDM-producer K. pneumoniae K21 (ST1399-KL43-O1/O2v1) carries a novel chromosomally integrated plasmid-like structure and hypervirulence-associated genes and represents a serious threat to countries in the area.

RevDate: 2023-10-28

Hu R, Li F, Chen Y, et al (2023)

AnimalMetaOmics: a multi-omics data resources for exploring animal microbial genomes and microbiomes.

Nucleic acids research pii:7332077 [Epub ahead of print].

The Animal Meta-omics landscape database (AnimalMetaOmics) is a comprehensive and freely available resource that includes metagenomic, metatranscriptomic, and metaproteomic data from various non-human animal species and provides abundant information on animal microbiomes, including cluster analysis of microbial cognate genes, functional gene annotations, active microbiota composition, gene expression abundance, and microbial protein identification. In this work, 55 898 microbial genomes were annotated from 581 animal species, including 42 924 bacterial genomes, 12 336 virus genomes, 496 archaea genomes and 142 fungi genomes. Moreover, 321 metatranscriptomic datasets were analyzed from 31 animal species and 326 metaproteomic datasets from four animal species, as well as the pan-genomic dynamics and compositional characteristics of 679 bacterial species and 13 archaea species from animal hosts. Researchers can efficiently access and acquire the information of cross-host microbiota through a user-friendly interface, such as species, genomes, activity levels, expressed protein sequences and functions, and pan-genome composition. These valuable resources provide an important reference for better exploring the classification, functional diversity, biological process diversity and functional genes of animal microbiota.

RevDate: 2023-10-28

Dimonaco NJ, Clare A, Kenobi K, et al (2023)

StORF-Reporter: finding genes between genes.

Nucleic acids research pii:7332062 [Epub ahead of print].

Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
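The notion of a stop-delimited ORF described above can be sketched in a few lines. This is only an illustration of the idea, not StORF-Reporter's implementation: it scans the forward strand of a single unannotated region, treats TAA, TAG and TGA as stop codons, and reports spans between consecutive in-frame stops. The function name `find_storfs`, the `min_len` threshold, and the demo sequence are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_storfs(region: str, min_len: int = 30) -> List[Tuple[int, int, int]]:
    """Find stop-to-stop ORFs on the forward strand of one region.

    Returns (frame, start, end) tuples, where start is the index of the
    first base of the opening stop codon and end is one past the last
    base of the closing stop codon. Spans shorter than min_len bases
    are discarded.
    """
    region = region.upper()
    found: List[Tuple[int, int, int]] = []
    for frame in range(3):
        prev_stop = None  # position of the most recent in-frame stop
        for pos in range(frame, len(region) - 2, 3):
            if region[pos:pos + 3] in STOP_CODONS:
                if prev_stop is not None and pos + 3 - prev_stop >= min_len:
                    found.append((frame, prev_stop, pos + 3))
                prev_stop = pos
    return found


# Made-up region: a stop codon, ten GCT codons, a second stop, two spare bases.
demo_region = "TAA" + "GCT" * 10 + "TGA" + "CC"
print(find_storfs(demo_region))
```

Because the span is bounded by stops rather than by a start codon, a gene that uses an alternative start codon is still captured; a fuller version would also scan the reverse complement and filter the candidates further.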

RevDate: 2023-10-28

Kurihara MNL, Santos INM, Eisen AKA, et al (2023)

Phenotypic and Genotypic Characterization of Cutibacterium acnes Isolated from Shoulder Surgery Reveals Insights into Genetic Diversity.

Microorganisms, 11(10): pii:microorganisms11102594.

Specific virulence factors that likely influence C. acnes invasion into deep tissues remain to be elucidated. Herein, we describe the frequency of C. acnes identification in deep tissue specimens of patients undergoing clean shoulder surgery and assess its phenotypic and genetic traits associated with virulence and antibiotic resistance patterns, compared with isolates from the skin of healthy volunteers. Multiple deep tissue specimens from the bone fragments, tendons, and bursa of 84 otherwise healthy patients undergoing primary clean-open and arthroscopic shoulder surgeries were aseptically collected. The overall yield of tissue sample cultures was 21.5% (55/255), with 11.8% (30/255) identified as C. acnes in 27.3% (23/84) of patients. Antibiotic resistance rates were low, with most strains expressing susceptibility to first-line antibiotics, while a few were resistant to penicillin and rifampicin. Phylotypes IB (73.3%) and II (23.3%) were predominant in deep tissue samples. Genomic analysis demonstrated differences in the pangenome of the isolates from the same clade. Even though strains displayed a range of pathogenic markers, such as biofilm formation, patients did not evolve to infection during the 1-year follow-up. This suggests that the presence of polyclonal C. acnes in multiple deep tissue samples does not necessarily indicate infection.

RevDate: 2023-10-28

Nedashkovskaya O, Otstavnykh N, Balabanova L, et al (2023)

Rhodoalgimonas zhirmunskyi gen. nov., sp. nov., a Marine Alphaproteobacterium Isolated from the Pacific Red Alga Ahnfeltia tobuchiensis: Phenotypic Characterization and Pan-Genome Analysis.

Microorganisms, 11(10): pii:microorganisms11102463.

A novel Gram-staining negative, strictly aerobic, rod-shaped, and non-motile bacterium, designated strain 10Alg 79[T], was isolated from the red alga Ahnfeltia tobuchiensis. A phylogenetic analysis based on 16S rRNA gene sequences placed the novel strain within the family Roseobacteraceae, class Alphaproteobacteria, phylum Pseudomonadota, where the nearest neighbor was Shimia sediminis ZQ172[T] (97.33% of identity). However, a phylogenomic study clearly showed that strain 10Alg 79[T] forms a distinct evolutionary lineage at the genus level within the family Roseobacteraceae combining with strains Aquicoccus porphyridii L1 8-17[T], Marimonas arenosa KCTC 52189[T], and Lentibacter algarum DSM 24677[T]. The ANI, AAI, and dDDH values between them were 75.63-78.15%, 67.41-73.08%, and 18.8-19.8%, respectively. The genome comprises 3,754,741 bp with a DNA GC content of 62.1 mol%. The prevalent fatty acids of strain 10Alg 79[T] were C18:1 ω7c and C16:0. The polar lipid profile consisted of phosphatidylethanolamine, phosphatidylglycerol, phosphatidylcholine, an unidentified aminolipid, an unidentified phospholipid and an unidentified lipid. A pan-genome analysis showed that the unique part of the 10Alg 79[T] genome consists of 13 genus-specific clusters and 413 singletons. The annotated singletons were more often related to transport protein systems, transcriptional regulators, and enzymes. A functional annotation of the draft genome sequence revealed that this bacterium could be a source of a new phosphorylase, which may be used for phosphoglycoside synthesis. A combination of the genotypic and phenotypic data showed that the bacterial isolate represents a novel species and a novel genus, for which the name Rhodoalgimonas zhirmunskyi gen. nov., sp. nov. is proposed. The type strain is 10Alg 79[T] (=KCTC 72611[T] = KMM 6723[T]).

RevDate: 2023-10-28

Covas C, Figueiredo G, Gomes M, et al (2023)

The Pangenome of Gram-Negative Environmental Bacteria Hides a Promising Biotechnological Potential.

Microorganisms, 11(10): pii:microorganisms11102445.

Secondary metabolites (SMs) from environmental bacteria offer viable solutions for various health and environmental challenges. Researchers are employing advanced bioinformatic tools to investigate less-explored microorganisms and unearth novel bioactive compounds. In this research area, our understanding of SMs from environmental Gram-negative bacteria lags behind that of its Gram-positive counterparts. In this regard, Pedobacter spp. have recently gained attention, not only for their role as plant growth promoters but also for their potential in producing antimicrobials. This study focuses on the genomic analysis of Pedobacter spp. to unveil the diversity of the SMs encoded in their genomes. Among the 41 genomes analyzed, a total of 233 biosynthetic gene clusters (BGCs) were identified, revealing the potential for the production of diverse SMs, including RiPPs (27%), terpenes (22%), hybrid SMs (17%), PKs (12%), NRPs (9%) and siderophores (6%). Overall, BGC distribution did not correlate with phylogenetic lineage and most of the BGCs showed no significant hits in the MIBiG database, emphasizing the uniqueness of the compounds that Pedobacter spp. can produce. Of all the species examined, P. cryoconitis and P. lusitanus stood out for having the highest number and diversity of BGCs. Focusing on their applicability and ecological functions, we investigated in greater detail the BGCs responsible for siderophore and terpenoid production in these species and their relatives. Our findings suggest that P. cryoconitis and P. lusitanus have the potential to produce novel mixtures of siderophores, involving bifunctional IucAC/AcD NIS synthetases, as well as carotenoids and squalene. This study highlights the biotechnological potential of Pedobacter spp. in medicine, agriculture and other industries, emphasizing the need for a continued exploration of its SMs and their applications.

RevDate: 2023-10-28

Alghamdi M, Al-Judaibi E, Al-Rashede M, et al (2023)

Comparative De Novo and Pan-Genome Analysis of MDR Nosocomial Bacteria Isolated from Hospitals in Jeddah, Saudi Arabia.

Microorganisms, 11(10): pii:microorganisms11102432.

Multidrug-resistant (MDR) bacteria are one of the most serious threats to public health, and one of the most important types of MDR bacteria are those that are acquired in a hospital, known as nosocomial. This study aimed to isolate and identify MDR bacteria from selected hospitals in Jeddah and analyze their antibiotic-resistant genes. Bacteria were collected from different sources and wards of hospitals in Jeddah City. Phoenix BD was used to identify the strains and perform susceptibility testing. Identification of selected isolates showing MDR to more than three classes of antibiotics was based on 16S rRNA gene and whole genome sequencing. Genes conferring resistance were characterized using de novo and pan-genome analyses. In total, we isolated 108 bacterial strains, of which 75 (69.44%) were found to be MDR. Taxonomic identification revealed that 24 (32%) isolates were identified as Escherichia coli, 19 (25.3%) corresponded to Klebsiella pneumoniae, and 17 (22.67%) were methicillin-resistant Staphylococcus aureus (MRSA). Among the Gram-negative bacteria, K. pneumoniae isolates showed the highest resistance levels to most antibiotics. Of the Gram-positive bacteria, S. aureus (MRSA) strains exhibited the highest degree of resistance to the tested antibiotics, which is higher than that observed for K. pneumoniae isolates. Taken together, our results illustrated that MDR Gram-negative bacteria are the most common cause of nosocomial infections, while MDR Gram-positive bacteria are characterized by a wider antibiotic resistance spectrum. Whole genome sequencing found the appearance of antibiotic resistance genes, including SHV, OXA, CTX-M, TEM-1, NDM-1, VIM-1, ere(A), ermA, ermB, ermC, msrA, qacA, qacB, and qacC.

RevDate: 2023-10-28

Yaraguppi DA, Bagewadi ZK, Patil NR, et al (2023)

Iturin: A Promising Cyclic Lipopeptide with Diverse Applications.

Biomolecules, 13(10): pii:biom13101515.

This comprehensive review examines iturin, a cyclic lipopeptide originating from Bacillus subtilis and related bacteria. These compounds are structurally diverse and possess potent inhibitory effects against plant disease-causing bacteria and fungi. Notably, Iturin A exhibits strong antifungal properties and low toxicity, making it valuable for bio-pesticides and mycosis treatment. Emerging research reveals additional capabilities, including anticancer and hemolytic features. Iturin finds applications across industries. In food, iturin as a biosurfactant serves beyond surface tension reduction, enhancing emulsions and texture. Biosurfactants are significant in soil remediation, agriculture, wound healing, and sustainability. They also show promise in Microbial Enhanced Oil Recovery (MEOR) in the petroleum industry. The pharmaceutical and cosmetic industries recognize iturin's diverse properties, such as antibacterial, antifungal, antiviral, anticancer, and anti-obesity effects. Cosmetic applications span emulsification, anti-wrinkle, and antibacterial use. Understanding iturin's structure, synthesis, and applications gains importance as biosurfactant and lipopeptide research advances. This review focuses on emphasizing iturin's structural characteristics, production methods, biological effects, and applications across industries. It probes iturin's antibacterial, antifungal potential, antiviral efficacy, and cancer treatment capabilities. It explores diverse applications in food, petroleum, pharmaceuticals, and cosmetics, considering recent developments, challenges, and prospects.

RevDate: 2023-10-27

Gould AL, Donohoo SA, Román ED, et al (2023)

Strain-level diversity of symbiont communities between individuals and populations of a bioluminescent fish.

The ISME journal [Epub ahead of print].

The bioluminescent symbiosis involving the urchin cardinalfish, Siphamia tubifer, and Photobacterium mandapamensis, a luminous member of the Vibrionaceae, is highly specific compared to other bioluminescent fish-bacteria associations. Despite this high degree of specificity, patterns of genetic diversity have been observed for the symbionts from hosts sampled over relatively small spatial scales. We characterized and compared sub-species, strain-level symbiont diversity within and between S. tubifer hosts sampled from the Philippines and Japan using PCR fingerprinting. We then carried out whole genome sequencing of the unique symbiont genotypes identified to characterize the genetic diversity of the symbiont community and the symbiont pangenome. We determined that an individual light organ contains six symbiont genotypes on average, but varied between 1-13. Additionally, we found that there were few genotypes shared between hosts from the same location. A phylogenetic analysis of the unique symbiont strains indicated location-specific clades, suggesting some genetic differentiation in the symbionts between host populations. We also identified symbiont genes that were variable between strains, including luxF, a member of the lux operon, which is responsible for light production. We quantified the light emission and growth rate of two strains missing luxF along with the other strains isolated from the same light organs and determined that strains lacking luxF were dimmer but grew faster than most of the other strains, suggesting a potential metabolic trade-off. This study highlights the importance of strain-level diversity in microbial associations and provides new insight into the underlying genetic architecture of intraspecific symbiont communities within a host.

RevDate: 2023-10-27

Bachari A, Nassar N, Telukutla S, et al (2023)

In Vitro Antiproliferative Effect of Cannabis Extract PHEC-66 on Melanoma Cell Lines.

Cells, 12(20): pii:cells12202450.

Melanoma, an aggressive form of skin cancer, can be fatal if not diagnosed and treated early. Melanoma is widely recognized to resist advanced cancer treatments, including immune checkpoint inhibitors, kinase inhibitors, and chemotherapy. Numerous studies have shown that various Cannabis sativa extracts exhibit potential anticancer effects against different types of tumours both in vitro and in vivo. This study is the first to report that PHEC-66, a Cannabis sativa extract, displays antiproliferative effects against MM418-C1, MM329 and MM96L melanoma cells. Although these findings suggest that PHEC-66 has promising potential as a pharmacotherapeutic agent for melanoma treatment, further research is necessary to evaluate its safety, efficacy, and clinical applications.

RevDate: 2023-10-26

Depuydt L, Renders L, Abeel T, et al (2023)

Pan-genome de Bruijn graph using the bidirectional FM-index.

BMC bioinformatics, 24(1):400.

BACKGROUND: Pan-genome graphs are gaining importance in the field of bioinformatics as data structures to represent and jointly analyze multiple genomes. Compacted de Bruijn graphs are inherently suited for this purpose, as their graph topology naturally reveals similarity and divergence within the pan-genome. Most state-of-the-art pan-genome graphs are represented explicitly in terms of nodes and edges. Recently, an alternative, implicit graph representation was proposed that builds directly upon the unidirectional FM-index. As such, a memory-efficient graph data structure is obtained that inherits the FM-index' backward search functionality. However, this representation suffers from a number of shortcomings in terms of functionality and algorithmic performance.

RESULTS: We present a data structure for a pan-genome, compacted de Bruijn graph that aims to address these shortcomings. It is built on the bidirectional FM-index, extending the ability of its unidirectional counterpart to navigate and search the graph in both directions. All basic graph navigation steps can be performed in constant time. Based on these features, we implement subgraph visualization as well as lossless approximate pattern matching to the graph using search schemes. We demonstrate that we can retrieve all occurrences corresponding to a read within a certain edit distance in a very efficient manner. Through a case study, we show the potential of exploiting the information embedded in the graph's topology through visualization and sequence alignment.

CONCLUSIONS: We propose a memory-efficient representation of the pan-genome graph that supports subgraph visualization and lossless approximate pattern matching of reads against the graph using search schemes. The C++ source code of our software, called Nexus, is available under the AGPL-3.0 license.
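The backward search that the abstract above attributes to the FM-index can be illustrated with a toy example. The following is a minimal sketch in Python, not the Nexus implementation: it builds a plain unidirectional FM-index for a single short string by naive suffix sorting and performs exact matching only. The function names `build_fm_index` and `backward_search` are hypothetical, and a real pan-genome index would use compressed rank structures rather than the full tables kept here.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table and Occ table.

    Naive O(n^2 log n) construction, suitable only for short strings.
    """
    assert "$" not in text, "'$' is reserved as the sentinel"
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval in the suffix array
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = build_fm_index("ACGTACGA")
print(backward_search("ACG", sa, c_table, occ))  # [0, 4]
```

Each step narrows the suffix-array interval by prepending one character, which is why this search only extends a match to the left. A bidirectional index, as described in the abstract, additionally allows extension to the right.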

RevDate: 2023-10-26

Hoover RL, Keffer JL, Polson SW, et al (2023)

Gallionellaceae pangenomic analysis reveals insight into phylogeny, metabolic flexibility, and iron oxidation mechanisms.

mSystems [Epub ahead of print].

The iron-oxidizing Gallionellaceae drive a wide variety of biogeochemical cycles through their metabolisms and biominerals. To better understand the environmental impacts of Gallionellaceae, we need to improve our knowledge of their diversity and metabolisms, especially any novel iron oxidation mechanisms. Here, we used a pangenomic analysis of 103 genomes to resolve Gallionellaceae phylogeny and explore their genomic potential. Using a concatenated ribosomal protein tree and key gene patterns, we determined Gallionellaceae has four genera, divided into two groups: iron-oxidizing bacteria (FeOB) Gallionella, Sideroxydans, and Ferriphaselus with iron oxidation genes (cyc2, mtoA) and nitrite-oxidizing bacteria (NOB) Candidatus Nitrotoga with the nitrite oxidase gene nxr. The FeOB and NOB have similar electron transport chains, including genes for reverse electron transport and carbon fixation. Auxiliary energy metabolisms, including S oxidation, denitrification, and organotrophy, were scattered throughout the FeOB. Within FeOB, we found genes that may represent adaptations for iron oxidation, including a variety of extracellular electron uptake mechanisms. FeOB genomes encoded more predicted c-type cytochromes than NOB genomes, notably more multiheme c-type cytochromes (MHCs) with >10 CXXCH motifs. These include homologs of several predicted outer membrane porin-MHC complexes, including MtoAB and Uet. MHCs efficiently conduct electrons across longer distances and function across a wide range of redox potentials that overlap with mineral redox potentials, which can expand the range of usable iron substrates. 
Overall, the results of pangenome analyses suggest that the Gallionellaceae genera Gallionella, Sideroxydans, and Ferriphaselus have acquired a range of adaptations to succeed in various environments but are primarily iron oxidizers.

IMPORTANCE: Neutrophilic iron-oxidizing bacteria (FeOB) produce copious iron (oxyhydr)oxides that can profoundly influence biogeochemical cycles, notably the fate of carbon and many metals. To fully understand environmental microbial iron oxidation, we need a thorough accounting of iron oxidation mechanisms. In this study, we show the Gallionellaceae FeOB genomes encode both characterized iron oxidases as well as uncharacterized multiheme cytochromes (MHCs). MHCs are predicted to transfer electrons from extracellular substrates and likely confer metabolic capabilities that help Gallionellaceae occupy a range of different iron- and mineral-rich niches. Gallionellaceae appear to specialize in iron oxidation, so it would be advantageous for them to have multiple mechanisms to oxidize various forms of iron, given the many iron minerals on Earth, as well as the physiological and kinetic challenges faced by FeOB. The multiple iron/mineral oxidation mechanisms may help drive the widespread ecological success of Gallionellaceae.

RevDate: 2023-10-26

Pérez Castro S, Peredo EL, Mason OU, et al (2023)

Diversity at single nucleotide to pangenome scales among sulfur cycling bacteria in salt marshes.

Applied and environmental microbiology [Epub ahead of print].

Sulfur-cycling microbial communities in salt marsh rhizosphere sediments mediate a recycling and detoxification system central to plant productivity. Despite the importance of sulfur-cycling microbes, their biogeographic, phylogenetic, and functional diversity remain poorly understood. Here, we use metagenomic data sets from Massachusetts (MA) and Alabama (AL) salt marshes to examine the distribution and genomic diversity of sulfur-cycling plant-associated microbes. Samples were collected from sediments under Sporobolus alterniflorus and Sporobolus pumilus in separate MA vegetation zones, and under S. alterniflorus and Juncus roemerianus co-occurring in AL. We grouped metagenomic data by plant species and site and identified 38 MAGs that included pathways for sulfate reduction or sulfur oxidation. Phylogenetic analyses indicated that 29 of the 38 were affiliated with uncultivated lineages. We showed differentiation in the distribution of MAGs between AL and MA, between S. alterniflorus and S. pumilus vegetation zones in MA, but no differentiation between S. alterniflorus and J. roemerianus in AL. Pangenomic analyses of eight ubiquitous MAGs also detected site- and vegetation-specific genomic features, including varied sulfur-cycling operons, carbon fixation pathways, fixed single-nucleotide variants, and active diversity-generating retroelements. This genetic diversity, detected at multiple scales, suggests evolutionary relationships affected by distance and local environment, and demonstrates differential microbial capacities for sulfur and carbon cycling in salt marsh sediments.

IMPORTANCE: Salt marshes are known for their significant carbon storage capacity, and sulfur cycling is closely linked with the ecosystem-scale carbon cycling in these ecosystems. Sulfate reducers are key for the decomposition of organic matter, and sulfur oxidizers remove toxic sulfide, supporting the productivity of marsh plants. To date, the complexity of coastal environments, heterogeneity of the rhizosphere, high microbial diversity, and uncultured majority have hindered our understanding of the genomic diversity of sulfur-cycling microbes in salt marshes. Here, we use comparative genomics to overcome these challenges and provide an in-depth characterization of sulfur-cycling microbial diversity in salt marshes. We characterize communities across distinct sites and plant species and uncover extensive genomic diversity at the taxon level and specific genomic features present in MAGs affiliated with uncultivated sulfur-cycling lineages. Our work provides insights into the partnerships in salt marshes and a roadmap for multiscale analyses of diversity in complex biological systems.

RevDate: 2023-10-25

Yu J, Jiang C, Yamano R, et al (2023)

Unveiling the early life core microbiome of the sea cucumber Apostichopus japonicus and the unexpected abundance of the growth-promoting Sulfitobacter.

Animal microbiome, 5(1):54.

BACKGROUND: Microbiome in early life has long-term effects on the host's immunological and physiological development and its disturbance is known to trigger various diseases in host Deuterostome animals. The sea cucumber Apostichopus japonicus is one of the most valuable marine Deuterostome invertebrates in Asia and a model animal in regeneration studies. To understand factors that impact on host development and holobiont maintenance, host-microbiome association has been actively studied in the last decade. However, we currently lack knowledge of early life core microbiome during its ontogenesis and how it benefits the host's growth.

RESULTS: We analyzed the microbial community in 28 sea cucumber samples from a laboratory breeding system, designed to replicate aquaculture environments, across six developmental stages (fertilized eggs to the juvenile stage) over a three-year period to examine the microbiomes' dynamics and stability. Microbiome shifts occurred during sea cucumber larval ontogenesis in every case. Application of the most sophisticated core microbiome extraction methodology, a hybrid approach with abundance-occupancy core microbiome analyses (top 75% of total reads and > 70% occupancy) and core index calculation, first revealed that the early life core microbiome consisted of Alteromonadaceae and Rhodobacteraceae, as well as a stage core microbiome consisting of the pioneer core microbe Pseudoalteromonadaceae in A. japonicus, suggesting a stepwise establishment of the microbiome related to ontogenesis and feeding behavior in A. japonicus. More interestingly, four ASVs affiliated with Alteromonadaceae and Rhodobacteraceae were extracted as early life core microbiome. One of the ASVs (ASV0007) was affiliated with Sulfitobacter strain BL28 (Rhodobacteraceae), isolated from blastula larvae in the 2019 rearing batch. Unexpectedly, a bioassay revealed that the BL28 strain has a host growth-promoting ability. A further meta-pangenomic approach revealed that BL28 genome reads were abundant in the metagenomic sequence pool, in particular in that of post-gut development in early life stages of A. japonicus.

CONCLUSION: Repeated rearing efforts of A. japonicus using laboratory aquaculture replicating aquaculture environments and a hybrid core microbiome extraction approach first revealed particular ASVs affiliated with Alteromonadaceae and Rhodobacteraceae as the A. japonicus early life core microbiome. A further bioassay revealed that one of the core microbes, Sulfitobacter strain BL28 (identified as ASV0007), promotes growth of the host sea cucumber. Genome reads of BL28 were abundant in post-gut development of A. japonicus, which suggests potential probiotic uses of this core microbiome for sea cucumber resource production and conservation. The study also emphasizes the importance of the core microbiome in influencing early life stages in marine invertebrates. Understanding these dynamics could offer pathways to improve growth, immunity, and disease resistance in marine invertebrates.
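The abundance-occupancy filter mentioned in the abstract above (top 75% of total reads and more than 70% occupancy) can be sketched as follows. This is an illustrative Python sketch under assumptions, not the authors' pipeline: it interprets "top 75% of total reads" as the most abundant taxa whose cumulative reads reach 75% of all reads, treats a taxon as present in a sample when its count is above zero, and omits the core index calculation. The function name `abundance_occupancy_core` and the example counts are hypothetical.

```python
def abundance_occupancy_core(counts, read_fraction=0.75, min_occupancy=0.70):
    """Select candidate core taxa from a table of read counts.

    counts: dict mapping taxon -> list of read counts, one per sample.
    The thresholds follow the values quoted in the abstract; how they are
    applied here is an assumption made for illustration.
    """
    totals = {taxon: sum(values) for taxon, values in counts.items()}
    grand_total = sum(totals.values())

    # Abundance step: walk taxa from most to least abundant and keep them
    # until their cumulative reads reach read_fraction of all reads.
    abundant, cumulative = [], 0
    for taxon in sorted(totals, key=totals.get, reverse=True):
        if cumulative >= read_fraction * grand_total:
            break
        abundant.append(taxon)
        cumulative += totals[taxon]

    # Occupancy step: keep abundant taxa detected in more than
    # min_occupancy of the samples.
    core = []
    for taxon in abundant:
        values = counts[taxon]
        occupancy = sum(1 for v in values if v > 0) / len(values)
        if occupancy > min_occupancy:
            core.append(taxon)
    return sorted(core)


# Hypothetical counts for four taxa across four samples.
example = {
    "ASV_A": [50, 40, 60, 50],   # abundant and present everywhere
    "ASV_B": [100, 0, 0, 0],     # abundant but found in one sample only
    "ASV_C": [5, 5, 5, 5],       # present everywhere but rare
    "ASV_D": [0, 1, 0, 1],       # rare and patchy
}
print(abundance_occupancy_core(example))  # ['ASV_A']
```

Requiring both criteria excludes taxa that bloom in a single sample as well as taxa that are widespread but rare.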

RevDate: 2023-10-24

Islam MM, Kolling GL, Glass EM, et al (2023)

Model-driven characterization of functional diversity of Pseudomonas aeruginosa clinical isolates with broadly representative phenotypes.

bioRxiv : the preprint server for biology pii:2023.10.08.561426.

UNLABELLED: Pseudomonas aeruginosa is a leading cause of infections in immunocompromised individuals and in healthcare settings. This study aims to understand the relationships between phenotypic diversity and the functional metabolic landscape of P. aeruginosa clinical isolates. To better understand the metabolic repertoire of P. aeruginosa in infection, we deeply profiled a representative set from a library of 971 clinical P. aeruginosa isolates with corresponding patient metadata and bacterial phenotypes. The genotypic clustering based on whole-genome sequencing of the isolates, multi-locus sequence types, and the phenotypic clustering generated from a multi-parametric analysis were compared to each other to assess the genotype-phenotype correlation. Genome-scale metabolic network reconstructions were developed for each isolate through amendments to an existing PA14 network reconstruction. These network reconstructions show diverse metabolic functionalities and enhance the collective P. aeruginosa pangenome metabolic repertoire. Characterizing this rich set of clinical P. aeruginosa isolates allows for a deeper understanding of the genotypic and metabolic diversity of the pathogen in a clinical setting and lays a foundation for further investigation of the metabolic landscape of this pathogen and host-associated metabolic differences during infection.

IMPACT STATEMENT: Pseudomonas aeruginosa is a leading cause of infections in immunocompromised individuals and in healthcare settings. The treatment of these infections is complicated by the presence of a variety of virulence mechanisms and metabolic uniqueness among clinically relevant strains. This study is an attempt to understand the relationships between isolate phenotypic diversity and the functional metabolic landscape within a representative group of P. aeruginosa clinical isolates. Characterizing this rich set of clinical P. aeruginosa isolates allows for a deeper understanding of genotypic and metabolic diversity of the pathogen in a clinical setting and lays a foundation for further investigation of the metabolic landscape of this pathogen and host-associated metabolic differences in infection.

RevDate: 2023-10-23

Gao Z, Bian J, Lu F, et al (2023)

Corrigendum: Triticeae crop genome biology: an endless frontier.

Frontiers in plant science, 14:1280660.

[This corrects the article DOI: 10.3389/fpls.2023.1222681.].

RevDate: 2023-10-21

Liang Y, Y Han (2023)

Pan-genome brings opportunities to revitalize ancient crop foxtail millet.

Plant communications pii:S2590-3462(23)00281-X [Epub ahead of print].

The annual grass, foxtail millet (Setaria italica), was first domesticated ∼11,000 years ago, making it one of the most ancient crops in the world, and it was the mainstay underpinning the development of Asian farming civilization. The looming food shortage crisis aggravated by climate change threatens to make current agriculture unsustainable. As a C4 photosynthetic plant, foxtail millet has attracted increasing attention from the scientific and industrial farming communities because of its drought tolerance, good adaptability and nutritional properties. Foxtail millet and green foxtail (Setaria viridis) have been developed into ideal model systems for C4 crops due to their compact diploid genomes, rich genetic diversity, self-pollination, high-throughput transformation, short life cycles and ease of laboratory culture.

RevDate: 2023-10-19

Cumsille A, Serna-Cardona N, González V, et al (2023)

Exploring the biosynthetic gene clusters in Brevibacterium: a comparative genomic analysis of diversity and distribution.

BMC genomics, 24(1):622.

Exploring Brevibacterium strains from various ecosystems may lead to the discovery of new antibiotic-producing strains. Brevibacterium sp. H-BE7, a strain isolated from marine sediments from Northern Patagonia, Chile, had its genome sequenced to study the biosynthetic potential to produce novel natural products within the Brevibacterium genus. The genome sequences of 98 Brevibacterium strains, including strain H-BE7, were selected for a genomic analysis. A phylogenomic cladogram was generated, which divided the Brevibacterium strains into four major clades. A total of 25 strains are potentially unique new species according to Average Nucleotide Identity (ANIb) values. These strains were isolated from various environments, emphasizing the importance of exploring diverse ecosystems to discover the full diversity of Brevibacterium. Pangenome analysis of Brevibacterium strains revealed that only 2.5% of gene clusters are included within the core genome, and most gene clusters occur either as singletons or as cloud genes present in fewer than ten strains. Brevibacterium strains from various phylogenomic clades exhibit diverse BGCs. Specific groups of BGCs show clade-specific distribution patterns, such as siderophore BGCs and carotenoid-related BGCs. A group of clade IV-A Brevibacterium strains possess a clade-specific polyketide synthase (PKS) BGC that connects with phenazine-related BGCs. Within the PKS BGC, five genes, including the biosynthetic PKS gene, participate in the mevalonate pathway and exhibit similarities with the phenazine A BGC. However, additional core biosynthetic phenazine genes were exclusively discovered in nine Brevibacterium strains, primarily isolated from cheese. When evaluated for antibacterial activity, strain H-BE7 exhibited antimicrobial activity against Salmonella enterica and Listeria monocytogenes. 
Chemical dereplication identified bioactive compounds, such as 1-methoxyphenazine, in the crude extracts of strain H-BE7, which could be responsible for the observed antibacterial activity. While strain H-BE7 lacks the core phenazine biosynthetic genes, it produces 1-methoxyphenazine, indicating the presence of an unknown biosynthetic pathway for this compound. This suggests the existence of alternative biosynthetic pathways or promiscuous enzymes within H-BE7's genome.
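The pangenome partitions named in the abstract above (core genome, singletons, cloud genes found in fewer than ten strains) can be illustrated with a small classification routine. This is a hedged Python sketch, not the workflow used in the study: the function names, the "shell" label for the remaining clusters, and the example data are assumptions introduced for illustration.

```python
from collections import Counter


def classify_gene_clusters(presence, n_strains, cloud_max=9):
    """Assign each gene cluster to a pangenome partition.

    presence: dict mapping cluster name -> set of strains carrying it.
    cloud_max=9 mirrors "fewer than ten strains" from the abstract; the
    "shell" category for everything in between is an assumption.
    """
    categories = {}
    for cluster, strains in presence.items():
        k = len(strains)
        if k == n_strains:
            categories[cluster] = "core"       # present in every strain
        elif k == 1:
            categories[cluster] = "singleton"  # present in exactly one strain
        elif k <= cloud_max:
            categories[cluster] = "cloud"      # present in a few strains
        else:
            categories[cluster] = "shell"      # intermediate frequency
    return categories


def partition_fractions(categories):
    """Return the fraction of gene clusters falling in each partition."""
    tally = Counter(categories.values())
    total = sum(tally.values())
    return {name: count / total for name, count in tally.items()}


# Hypothetical presence/absence data for 12 strains named s0..s11.
strains = [f"s{i}" for i in range(12)]
presence = {
    "clusterA": set(strains),       # in all 12 strains
    "clusterB": {"s0"},             # in one strain
    "clusterC": {"s0", "s1", "s2"}, # in three strains
    "clusterD": set(strains[:11]),  # in 11 of 12 strains
}
categories = classify_gene_clusters(presence, n_strains=len(strains))
print(categories)
print(partition_fractions(categories))
```

With real data, a very small core fraction such as the 2.5% reported above would show up as a small value for `"core"` in the output of `partition_fractions`.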

RevDate: 2023-10-19

Srivastava N, Shiburaj S, SK Khare (2023)

Pan-genomic comparison of a potential solvent-tolerant alkaline protease-producing Exiguobacterium sp. TBG-PICH-001 isolated from a marine habitat.

3 Biotech, 13(11):371.

UNLABELLED: The identification and applicability of bacteria are inconclusive until comprehended with genomic repositories. Our isolate, Exiguobacterium sp. TBG-PICH-001 exhibited excellent halo- and organic solvent tolerance with simultaneous production of alkaline protease/s (0.512 IU/mL). The crude protease (1 IU) showed a 43.57% degradation of whey protein. The bulk proteins in the whey were hydrolyzed to smaller peptides which were evident in the SDS-PAGE profile. With such characteristics, the isolate became interesting for its genomic studies. The TBG-PICH-001 genome was found to be 3.14 Mb in size with 17 contigs and 47.33% GC content. The genome showed 3176 coding genes, and 2699 genes were characterized for their functionality. The Next-Generation-Sequencing of the genome identified only the isolate's genus; hence we attempted to delineate its species position. The genomes of the isolate and other representative Exiguobacterium spp. were compared based on orthologous genes (Orthovenn2 server). A pan-genomic analysis revealed the match of TBG-PICH-001 with 15 uncharacterized Exiguobacterium genomes at the species level. All these collectively matched with Exiguobacterium indicum, and the results were reconfirmed through phylogenetic studies. Further, the Exiguobacterium indicum genomes were engaged for homology studies rendering 11 classes of protease genes. Two putative proteases (Zinc metalloprotease and Serine protease) obtained from homology were checked for PCR amplification using genomic DNA of TBG-PICH-001 and other Exiguobacterium genomes. The results showed amplification only in the Exiguobacterium indicum genome. These protease genes, after sequencing, were matched with the TBG-PICH-001 genome. Their presence in its whole genome experimentally validated the study.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-023-03796-5.

RevDate: 2023-10-18

Hadjifrangiskou M, Reasoner S, Flores V, et al (2023)

Defining the Infant Male Urobiome and Moving Towards Mechanisms in Urobiome Research.

Research square.

The urinary bladder harbors a community of microbes termed the urobiome, which remains understudied. In this study, we present the urobiome of healthy infant males from samples collected by transurethral catheterization. Using a combination of extended culture and amplicon sequencing, we identify several common bacterial genera that can be further investigated for their effects on urinary health across the lifespan. Many genera were shared between all samples suggesting a consistent urobiome composition among this cohort. We note that, for this cohort, early life exposures including mode of birth (vaginal vs. Caesarean section), or prior antibiotic exposure did not influence urobiome composition. In addition, we report the isolation of culturable bacteria from the bladders of these infant males, including Actinotignum schaalii, a bacterial species that has been associated with urinary tract infection in older male adults. Herein, we isolate and sequence 9 distinct strains of A. schaalii enhancing the genomic knowledge surrounding this species and opening avenues for delineating the microbiology of this urobiome constituent. Furthermore, we present a framework for using the combination of culture-dependent and sequencing methodologies for uncovering mechanisms in the urobiome.


ESP Quick Facts

ESP Origins

In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.
