- by Degn, K., Utichi, M., Besora, P. S.-I., Tiberti, M., Papaleo, E. Change in protein stability, quantified as the change in Gibbs free energy of folding (ΔΔG) in kcal/mol, plays a crucial role in functional alterations of proteins, with misfolding and destabilization commonly associated with pathogenicity. The past two decades have brought the development of bioinformatics tools leveraging evolutionary knowledge, empirical force fields, and machine learning to predict stability alterations. However, existing tools are often optimized towards or trained on limited experimental data, leading to unbalanced datasets and potential overfitting. The research […]
- by Dennler, O., Chenel, E., Coste, F., Blanquart, S., Belleannee, C., Theret, N. Summary: FUSE-PhyloTree is a phylogenomic analysis software for identifying local sequence conservation associated with the different functions of a multi-functional (e.g., paralogous or multi-domain) protein family. FUSE-PhyloTree introduces an original approach that combines advanced sequence analysis with phylogenetic methods. First, local sequence conservation modules within the family are identified using partial local multiple sequence alignment. Next, the evolution of the detected modules and known protein functions is inferred within the family's phylogenetic tree using three-level phylogenetic reconciliation and ancestral state […]
- by Feldl, M., Abbaszade, G., Schattenberg, F., Stuckrath, K., Mueller, S., Mueller, C. L. Computational optimal transport-based approaches have emerged as promising tools for the integration and interpretation of complex single-cell data. In this study, we introduce an integrative Optimal Transport (OT) framework for spatiotemporal and multimodal bacterial single-cell analysis using Gaussian Mixture Model (GMM) OT, termed biscot (bacterial integrative single-cell optimal transport). We show that biscot, equipped with a novel global-to-local GMM initialization, outperforms classical OT and entropically regularized OT methods in terms of both speed and accuracy for disentangling complex bacterial community mixtures […]
- by Fadaei, S., Krebs, F. S., Zoete, V. Human protein kinases constitute a large superfamily of about 500 genes, historically classified into subfamilies based on phylogenetic relationships. However, many kinases remain unclassified. Phylogeny is typically based on multiple sequence alignments and neglects the physico-chemical properties of residues at each position of the sequence. By incorporating these properties, we can gain deeper insights beyond basic alignments. Here we use, for the first time, a detailed physico-chemical description of kinases to identify class-specific structural regions, supporting an unsupervised classification method […]
- by Melendez-Gallardo, J., Plada-Delgado, D. This study investigates the impact of serotonin (5-HT) on motoneuron electrical activity and muscle force generation. Using a computational model, we explore how 5-HT receptors influence motoneuron excitability and muscle function at different stimulation frequencies. Our results demonstrate that physiological 5-HT release increases motoneuron excitability, particularly at higher frequencies (40 Hz and 100 Hz), consistent with the known excitatory role of serotonin through 5-HT2a receptor activation. However, high concentrations of 5-HT lead to decreased motoneuron excitability, potentially due to excessive […]
- by Li, Y., Sun, M., Raaijmakers, J. M., Mommer, L., Zhang, F., Song, C., Medema, M. H. Plants release a substantial fraction of their photosynthesized carbon into the rhizosphere as root exudates, a mix of chemically diverse compounds that drive microbiome assembly. Deciphering how plants modulate the composition and activities of rhizosphere microbiota through root exudates is challenging, as no dedicated computational methods exist to systematically identify microbial root exudate catabolic pathways. Here we used and integrated published information on catabolic genes in bacterial taxa that contribute to their rhizosphere competence. We developed the RhizoSMASH algorithm for […]
- by Li, X., Whan, A. P., McNeil, M., Andrew, S. C., Dai, X., Fechner, M., Paris, C., Suchecki, R. Genome annotation is critical for understanding functional elements within genomes. Manual curation is a common practice for identifying the functions of genes, particularly those missed by automated annotation pipelines. However, this process is notoriously labour-intensive and time-consuming. In response to these challenges, we present GeneWhisperer, an innovative assistant system designed to facilitate the manual gene functional annotation process. Utilizing a large language model (LLM) agent, GeneWhisperer provides users access to tools appropriate for specific curation tasks in genome annotation. Featuring […]
- by Zhang, B., Zhang, Y., Zhao, Y. Pancreatic ductal adenocarcinoma (PDAC) is an exceptionally aggressive cancer with a 5-year survival rate of less than 10%, driven by late-stage diagnosis, limited treatment options, and a lack of reliable biomarkers for early detection and prognosis. In this study, we integrated DNA methylation data from TCGA and ICGC cohorts, categorizing samples based on survival time, and identified 684 differentially methylated CpG sites, along with 224 CpG biomarkers significantly associated with patient survival through statistical and machine learning-based analyses. We […]
- by Liu, K., Liu, J., Wang, C. A key challenge in single-cell RNA sequencing (scRNA-seq) analysis is clustering cells based on their expression profiles. Effective clustering requires selecting the most informative gene features, whose expression levels vary across cell types and can be used to discriminate between them. This study introduces DIFS, a novel statistical framework designed to enhance discriminative feature selection for scRNA-seq-based cell clustering. DIFS operates in two stages. In the first stage, a modified dip test identifies genes with significant multimodal expression […]
- by Deng, Y., Mao, J., Choi, J., Le Cao, K.-A. Gene regulatory networks (GRNs) provide a fundamental framework for understanding the molecular mechanisms that govern gene expression. Advances in single-cell RNA sequencing (scRNA-seq) have enabled GRN inference at cellular resolution; however, most existing approaches rely on predefined clusters or cell states, implicitly assuming static regulatory programs and potentially missing subtle, dynamic variation in regulation across individual cells. To address these limitations, we introduce NeighbourNet (NNet), a method that constructs cell-specific co-expression networks. NNet first applies principal component analysis to embed […]
- by Park, S. A., Kim, Y., Gurnari, D., Dlotko, P., Hahn, J. The spatial interactions between malignant and immune cells in the tumor microenvironment (TME) play a crucial role in cancer biology and treatment response. Understanding these interactions is critical for predicting prognosis and assessing immunotherapy effectiveness. Conventional methods, which focus on local spatial features, often struggle to achieve robust analysis due to the complex and heterogeneous cellular distributions. We propose a Topological Data Analysis (TDA)-based framework using both global and local spatial features between malignant and immune cells. For the global […]
- by Yuan, C., Patel, K., Shi, H., Wang, H.-L., Wang, F., Li, R., Li, Y., Corces, V., Shi, H., Das, S., Yu, J., Jin, P., Yao, B., Hu, J. Spatial transcriptomics (ST) has shown great potential for unraveling the molecular mechanisms of neurodegenerative diseases. However, most existing analyses of ST data focus on bulk or single-cell resolution, overlooking subcellular compartments such as synapses, which are fundamental structures of the brain's neural network. Here we present mcDETECT, a novel framework that integrates machine learning algorithms and in situ ST (iST) with targeted gene panels to study synapses. mcDETECT identifies individual synapses based on the aggregation of synaptic mRNAs in three-dimensional […]
- by Karimzadeh, M., Sababi, A. M., Momen-Roknabadi, A., Chen, N.-C., Cavazos, T. B., Sekhon, S., Wang, J., Hanna, R., Huang, A., Nguyen, D., Chen, S., Lam, T., Hartwig, A., Fish, L., Li, H., Behsaz, B., Hormozdiari, F., Alipanahi, B., Goodarzi, H. Cell-free RNA (cfRNA) profiling has emerged as a powerful tool for non-invasive disease detection, but its application is limited by data sparsity and complexity, especially in settings with constrained sample availability. We introduce Exai-1, a multi-modal, transformer-based generative foundation model that integrates RNA sequence embeddings with cfRNA abundance data to capture biologically meaningful representations of circulating RNAs. By leveraging both sequence and expression modalities, Exai-1 captures a biologically meaningful latent structure of cfRNA profiles. Pre-trained on over 306 billion tokens […]
- by Yoshida, K., Hisada, S., Takase, R., Okuma, A., Ishida, Y., Kawara, T., Miura-Yamashita, T., Ito, D. Chimeric antigen receptor (CAR)-T cell therapy has shown remarkable success in treating hematological malignancies; however, several challenges remain, including limited efficacy against solid tumors, T cell exhaustion, and lack of T cell persistence, which have restricted its clinical efficacy across various indications. Sequence optimization of CAR constructs offers a promising strategy to enhance therapeutic efficacy of CAR-T cells. Recent advances in machine learning, especially protein language models (PLMs), enable prediction of mutational effects based on sequence representations. Nevertheless, applying PLMs […]
- by Ajmal, H. B., Nandi, S. B., Kebabci, N. B., Ryan, C. Synthetic lethality (SL) is an extreme form of negative genetic interaction, where simultaneous disruption of two non-essential genes causes cell death. SL can be exploited to develop cancer therapies that target tumour cells with specific mutations, potentially limiting toxicity. Pooled combinatorial CRISPR screens, where two genes are simultaneously perturbed and the resulting impacts on fitness estimated, are now widely used for the identification of SL targets in cancer. Various scoring methods have been developed to infer SL genetic interactions from […]
- by Zhu, Y.-H., Zhu, S., Yu, X., Yan, H., Liu, Y., Xie, X., Yu, D.-J., Ye, R. Accurately identifying protein functions is essential to understand life mechanisms and thus advance drug discovery. Although biochemical experiments are the gold standard for determining protein functions, they are often time-consuming and labor-intensive. Here, we propose a novel composite deep-learning method, MKFGO, to infer Gene Ontology (GO) attributes by integrating five complementary pipelines built on multi-source biological data. MKFGO was rigorously benchmarked on 1522 non-redundant proteins, demonstrating superior performance over 11 state-of-the-art function prediction methods. Comprehensive data analyses revealed that the […]
- by Dash, H., Roberts, T., Weinert, M., Skene, N. Summary: MotifPeeker benchmarks epigenomic datasets where no "gold standard" reference exists, using motif enrichment as a key metric. With minimal input, users can analyse their data in a single function and receive an intuitive HTML report. Availability and Implementation: MotifPeeker is available on Bioconductor at https://bioconductor.org/packages/devel/bioc/html/MotifPeeker.html. The complete source code is available on GitHub at https://github.com/neurogenomics/MotifPeeker, with full documentation provided at https://neurogenomics.github.io/MotifPeeker. Additionally, the MotifPeeker Docker image is hosted on GitHub at https://github.com/neurogenomics/MotifPeeker/pkgs/container/motifpeeker.
- by Fawzy, M., Marsh, J. A. Intrinsically disordered protein regions (IDPRs) are central to diverse cellular processes but present unique challenges for interpreting genetic variants implicated in human disease. Unlike structured protein domains, IDPRs lack stable three-dimensional conformations and are often involved in regulation through transient interactions and post-translational modifications. These features can affect both the distribution of pathogenic variants and the performance of computational tools used to predict their effects. Here, we systematically assessed the distribution of pathogenic vs benign missense variants across disordered, intermediate, […]
- by Parks, B., Greenleaf, W. The growth of single-cell datasets to multi-million cell atlases has uncovered major scalability problems for single-cell analysis software. Here, we present BPCells, a package for high-performance single-cell analysis of RNA-seq and ATAC-seq datasets. BPCells uses disk-backed streaming compute algorithms to reduce memory requirements by nearly 70-fold compared to in-memory workflows with little to no loss of execution speed. BPCells also introduces high-performance compressed formats based on bitpacking compression for ATAC-seq fragment files and single-cell sparse matrices. These novel compression algorithms […]
- by Wan, S., Zhou, T., Ma, Y., Chen, Y., Zhou, C. C., Peng, J., Lin, L., Luo, W., Gu, W., Liu, Z., Hua, X. Background: Preeclampsia (PE) is a pregnancy-related hypertensive disorder and a leading cause of maternal and perinatal mortality. Current treatments focus primarily on symptom management, as delivery remains the only definitive cure. This underscores the urgent need for innovative therapeutic strategies. Cytokines released by placental immune cells may contribute to the progression of PE and represent promising therapeutic targets. Methods: We conducted single-cell sequencing on placental tissues obtained via cesarean section from patients with severe PE and cases of non-infectious preterm […]