  • by Huang, B., Luo, Y., Zhuang, Y., He, S., Yuan, C.
    Accurate prediction of long non-coding RNA (lncRNA) subcellular localization is crucial for understanding its biological functions. In this study, we propose a novel deep learning framework, LncMamba, which utilizes a two-layer FPN network for multi-scale feature extraction and introduces the Mamba network to lncRNA localization prediction tasks for the first time. Based on this, we improved the localization-specific attention mechanism, allowing the model to more effectively focus on key sequence motifs related to localization. Additionally, through statistical analysis of localization […]
  • by Boyle, I. A., Aquib, N. A., Kocak, M., Creasi, R., Montgomery, P. G., Campbell, C. D., Dempster, J. M.
    Overrepresentation analysis (ORA) is used to identify the biological relationships in a list of genes by testing gene sets for enrichment in the query. However, the inconsistent definition and highly overlapping nature of gene set databases can make interpreting ORA results difficult. Here, we introduce GeneTEA, a model that takes in free-text gene descriptions and incorporates several natural language processing methods to learn a sparse gene-by-term embedding, which can be treated as a de novo gene set database. We benchmark […]
  • by Hallinan, C., Ji, H. J., Salzberg, S. L., Fan, J.
    The accuracy of spatial gene expression profiles generated by probe-based in situ spatially-resolved transcriptomic technologies depends on the specificity with which probes bind to their intended target gene. Off-target binding, defined as a probe binding to something other than the target gene, can distort a gene's true expression profile, making probe specificity essential for reliable transcriptomics. Here, we investigate off-target binding in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel. We developed a software tool, Off-target Probe Tracker […]
  • by Lapin, J., Nilsson, A., Wilhelm, M., Käll, L.
    A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which in some applications is not available. Deep learning-based de novo sequencing can address this limitation by directly predicting peptide sequences from MS2 data. We have seen the application of the transformer architecture to de novo sequencing produce state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an […]
  • by Li, C., Mowlaei, M. E., Human Genome Structural Variation Consortium, HGSVC Functional Analysis Working Group, Carnevale, V., Kumar, S., Shi, X.
    High-throughput chromosome conformation capture sequencing (Hi-C) is a key technology for studying the three-dimensional (3D) structure of genomes and chromatin folding. Hi-C data reveals important patterns of genome organization such as topologically associating domains (TADs) and chromatin loops with critical roles in transcriptional regulation and disease etiology and progression. However, the relatively low resolution of existing Hi-C data often hinders robust and reliable inference of 3D structures. Hence, we propose TRUHiC, a new computational method that leverages recent state-of-the-art deep […]
  • by Davydzenka, K., Caravagna, G., Sanguinetti, G.
    Genome aneuploidy, characterized by Copy Number Variations (CNVs), profoundly alters gene expression in cancer. CNVs can directly influence transcription through gene dosage effects or indirectly through compensatory regulatory mechanisms. However, existing differential gene expression (DGE) testing methods do not differentiate between these mechanisms, conflating all expression changes and limiting biological interpretability. This misclassification can obscure key genes involved in tumor adaptation and progression, hindering biomarker discovery and leading to incomplete insights into cancer biology. To address this, we developed DeConveil, […]
  • by Buytenhuijs, F., Ankan, A., Textor, J.
    To better understand immune responses, comparing the abundance of T cell receptors (TCRs) between conditions can provide insights into which T cells have proliferated or were involved in immune activation. This requires methods that can accurately identify significant differences in TCR-seq data. For conventional RNA-seq data, well-established differential gene expression (DGE) analysis tools such as DESeq2 and edgeR have been developed. However, applying these methods to TCR sequencing (TCR-seq) data presents additional challenges. TCR-seq data is highly sparse, overdispersed, and […]
  • by Huson, D.
    Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical biology, and producing clear, informative visualizations of them is an important task. We present new algorithms for visualizing rooted phylogenetic networks as either combining or transfer networks, in both cladogram and phylogram style. In addition, we introduce a layout algorithm that aims to improve clarity by minimizing the total stretch of reticulate edges. To address the common issue that biological publications often omit machine-readable representations of depicted […]
  • by Seal, S., Neelon, B.
    Advancements in spatial omics technologies have enabled the measurement of expression profiles of different molecules, such as genes (using spatial transcriptomics), and peptides, lipids, or N-glycans (using mass spectrometry imaging), across thousands of spatial locations within a tissue. While identifying molecules with spatially variable expression is a well-studied statistical problem, robust methodologies for detecting spatially varying co-expression between molecule pairs remain limited. To address this gap, we introduce a Bayesian fused modeling framework for estimating molecular co-expression at both local […]
  • by Zhang, Y., Yu, Z., Yang, D., Chen, Q., Zhang, Y., Li, Z., Wang, Y., Wang, C.
    Gene expression is shaped by transcription regulatory networks (TRNs), where transcription regulators interact within regulatory elements in a context-specific manner. Despite significant efforts, understanding the intricate interactions of transcription regulators across different genomic regions and cell types remains a major challenge, largely due to data sparsity. Here, we introduce ChromBERT, a foundation model pre-trained on large-scale human ChIP-seq datasets. ChromBERT effectively captures the interaction syntax of approximately one thousand transcription regulators across diverse genomic contexts, generating interpretable representations of context-specific […]
  • by Dennler, O., Chenel, E., Coste, F., Blanquart, S., Belleannee, C., Theret, N.
    Summary: FUSE-PhyloTree is a phylogenomic analysis software for identifying local sequence conservation associated with the different functions of a multi-functional (e.g., paralogous or multi-domain) protein family. FUSE-PhyloTree introduces an original approach that combines advanced sequence analysis with phylogenetic methods. First, local sequence conservation modules within the family are identified using partial local multiple sequence alignment. Next, the evolution of the detected modules and known protein functions is inferred within the family's phylogenetic tree using three-level phylogenetic reconciliation and ancestral state […]
  • by Feldl, M., Abbaszade, G., Schattenberg, F., Stuckrath, K., Mueller, S., Mueller, C. L.
    Computational optimal transport-based approaches have emerged as promising tools for the integration and interpretation of complex single-cell data. In this study, we introduce an integrative Optimal Transport (OT) framework for spatiotemporal and multimodal bacterial single-cell analysis using Gaussian Mixture Model (GMM) OT, termed biscot (bacterial integrative single-cell optimal transport). We show that biscot, equipped with a novel global-to-local GMM initialization, outperforms classical OT and entropically regularized OT methods in both speed and accuracy for disentangling complex bacterial community mixtures […]
  • by Melendez-Gallardo, J., Plada-Delgado, D.
    This study investigates the impact of serotonin (5-HT) on motoneuron electrical activity and muscle force generation. Using a computational model, we explore how 5-HT receptors influence motoneuron excitability and muscle function at different stimulation frequencies. Our results demonstrate that physiological 5-HT release increases motoneuron excitability, particularly at higher frequencies (40 Hz and 100 Hz), consistent with the known excitatory role of serotonin through 5-HT2a receptor activation. However, high concentrations of 5-HT lead to decreased motoneuron excitability, potentially due to excessive […]
  • by Li, Y., Sun, M., Raaijmakers, J. M., Mommer, L., Zhang, F., Song, C., Medema, M. H.
    Plants release a substantial fraction of their photosynthesized carbon into the rhizosphere as root exudates, a mix of chemically diverse compounds that drive microbiome assembly. Deciphering how plants modulate the composition and activities of rhizosphere microbiota through root exudates is challenging, as no dedicated computational methods exist to systematically identify microbial root exudate catabolic pathways. Here we used and integrated published information on catabolic genes in bacterial taxa that contribute to their rhizosphere competence. We developed the RhizoSMASH algorithm for […]
  • by Chen, X., Zheng, J., Huang, Z., Xu, Z., Huang, J., Wei, Y., Zhang, H.
    Allosteric regulation plays a pivotal role in modulating protein function, and allosteric sites represent a promising target for drug discovery. However, identifying allosteric sites remains challenging due to their structural and evolutionary diversity. Here, we present AlloPED, a novel framework that combines protein language models and machine learning to predict allosteric sites with high accuracy. AlloPED consists of two modules: AlloPED-pocket, an ensemble model leveraging physicochemical features to predict allosteric pockets; and AlloPED-site, a dilated convolutional neural network (DCNN) augmented […]
  • by Mitrofanov, A., Beisel, C., Baumdicker, F., Alkhnbashi, O., Backofen, R.
    CRISPR-Cas systems are adaptive immune mechanisms in bacteria and archaea that protect against invading genetic elements by integrating short fragments of foreign DNA into CRISPR arrays. These arrays consist of repetitive sequences interspersed with unique spacers, guiding Cas proteins to recognize and degrade matching nucleic acids. The integrity of these repeat sequences is crucial for the proper function of CRISPR-Cas systems, yet their mutational dynamics remain poorly understood. In this study, we analyzed 56,343 CRISPR arrays across 25,628 […]
  • by Liang, H., Berger, B., Singh, R.
    The three-dimensional organization of chromatin into topologically associating domains (TADs) may impact gene regulation by bringing distant genes into contact. However, many questions about TADs' function and their influence on transcription remain unresolved due to technical limitations in defining TAD boundaries and measuring the direct effect that TADs have on gene expression. Here, we develop consensus TAD maps for human and mouse with a novel "bag-of-genes" approach for defining the gene composition within TADs. This approach enables new functional interpretations […]
  • by Degn, K., Utichi, M., Besora, P. S.-I., Tiberti, M., Papaleo, E.
    Change in protein stability, quantified as the change in Gibbs free energy of folding (ΔΔG) in kcal/mol, plays a crucial role in functional alterations of proteins, with misfolding and destabilization commonly associated with pathogenicity. The past two decades have brought the development of bioinformatics tools leveraging evolutionary knowledge, empirical force fields, and machine learning to predict stability alterations. However, existing tools are often optimized towards or trained on limited experimental data, leading to unbalanced datasets and potential overfitting. The research […]
  • by Fadaei, S., Krebs, F. S., Zoete, V.
    Human protein kinases constitute a large superfamily of about 500 genes, historically classified into subfamilies based on phylogenetic relationship. However, many kinases remain unclassified. Phylogeny is typically based on multiple sequence alignments, and neglects the physico-chemical properties of residues at each position of the sequence. By incorporating these properties, we can gain deeper insights beyond basic alignments. Here we use, for the first time, a detailed physico-chemical description of kinases to identify class-specific structural regions, supporting an unsupervised classification method […]
  • by Cooke, J., Wieder, C., Poupin, N., Frainay, C., Ebbels, T., Jourdan, F.
    Initially developed for transcriptomics data, pathway analysis (PA) methods can introduce biases when applied to metabolomics data, especially if input parameters are not chosen with care. This is particularly true for exometabolomics data, where exported metabolites may be far from internal disruptions in the organism. Experimentally evaluating PA methods is practically impossible when the sample's "true" metabolic disruption is unknown. Using in silico metabolic modelling, we simulated metabolic profiles for entire pathway knockouts, providing both a known disruption site as […]
