- by Green, A. G., Tasmin, M., Vargas, R., Farhat, M. R. In Mycobacterium tuberculosis, a prevalent and deadly pathogen, resistance to antibiotics evolves primarily through non-synonymous mutations in proteins. Sequence-based analyses are currently used to understand the genetic basis of antibiotic resistance, either via genotype-phenotype association, or via signals of convergent evolution. These methods focus on primary sequence and usually neglect other biological signals such as protein structural information. We hypothesize that integrating the structural context of mutations improves the prediction of effects on function and phenotype. We curate high confidence […]
- by Trimbour, R., Saez-Rodriguez, J., Cantini, L. Chromatin 3D folding creates numerous DNA interactions, participating in gene expression regulation. Single-cell chromatin-accessibility assays now profile hundreds of thousands of cells, challenging existing methods for mapping cis-regulatory interactions. We present CIRCE, a fast and scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data. CIRCE re-implements the Cicero workflow to analyse single-cell atlases, cutting runtime and memory use by several orders of magnitude. We also provide new options to compute metacells, grouping similar cells to reduce […]
- by Nonchev, K., Manaiev, G., Koelzer, V., Raetsch, G. Spot-based spatial transcriptomics (ST) technologies like 10x Visium quantify genome-wide gene expression and preserve spatial tissue organization. However, their coarse spot-level resolution aggregates signals from multiple cells, preventing accurate single-cell analysis and detailed cellular characterization. Here, we present DeepSpot2Cell, a novel DeepSet neural network that leverages pretrained pathology foundation models and spatial multi-level context to effectively predict virtual single-cell gene expression from histopathological images using spot-level supervision. DeepSpot2Cell substantially improves gene expression correlations on a newly curated benchmark we specifically […]
- by Noral, M., Alhusban, S., Han, Q., Pressley, A. H., O'Keefe, S. H., Leanhart, S., Southerland, K. W., McClung, J. M., Annex, B. H. Background: Peripheral arterial disease (PAD) results from atherosclerotic occlusion(s) in leg arteries. Chronic limb-threatening ischemia (CLTI) is the most severe form of PAD. Patients with CLTI suffer from rest pain, ulcers, or gangrene. Clinical outcomes remain poor in patients with CLTI and many investigational approaches, such as promoting angiogenesis, have failed. Understanding cell-specific vs. bulk-RNA changes within muscle offers an opportunity to better understand this disease. Objective: To assess cell-specific alterations in gene and metabolism pathways in endothelial and muscle […]
- by Zsigmond, K., Surendran, A., Chen, L., Quintana, R. A. M. The concept of chemical space is critical in cheminformatics, medicinal chemistry, and machine learning applications. Despite this, the high dimensionality of molecular representations greatly complicates its sampling, analysis, and visualization. A popular approach to overcome this problem is to project these representations to a "human-manageable" subspace, usually containing only two dimensions. Non-linear dimensionality reduction techniques are by far the preferred strategy, following the reasoning that their flexibility can accommodate any arbitrary distribution originally present in the high-dimensional space. However, this ignores […]
- by Rabuzin, L., Tarnow, M., Boeva, V. Spatial omics technologies provide rich insights into biological processes by jointly capturing molecular profiles and the spatial organization of cells. The resulting high-dimensional data can be naturally represented as graphs, where Graph Neural Networks (GNNs) offer an effective framework to model interactions in the tissue. Self-supervised pretraining methods such as Bootstrapped Graph Latents (BGRL) and GRACE leverage graph augmentations to build invariances without costly labels. Yet, the design of augmentation strategies remains underexplored, particularly in the context of spatial omics. […]
- by Montero-Tena, J. A., Zanini, S. F., Yildiz, G., Kox, T., Abbadi, A., Snowdon, R. J., Golicz, A. A. Meiotic recombination is essential for generating genetic diversity, driving plant evolution, and enabling crop improvement, yet its uneven distribution across genomes constrains breeding efforts. Here, we investigated the multi-omic landmarks that shape the recombination landscape in Brassica napus by integrating epigenomic, genomic and transcriptomic data with recombination maps derived from large multi-parental rapeseed populations. Predictive machine-learning models accurately predicted recombination rates and hotspot locations using only feature information. Recombination was generally suppressed in centromeres and other repeat-rich, methylated regions and enriched […]
- by Xi, B., Wang, H., Sun, G., Zhang, B., Mao, R., Ge, Y., Wang, Y., Zhang, J., Pan, Y., Zhou, F., Wang, Y., Liu, Z., Jiang, D., Wang, H., Zhou, W., Huang, B. Deep learning shows promise in structure-based drug discovery, yet challenges persist in generating pharmacologically plausible molecules with valid 3D conformation and decent binding mode in the pocket. We introduce SE3-BiLingoMol, an SE(3)-equivariant Transformer for pocket-based 3D molecule generation, addressing two key limitations of existing language-model approaches. First, it uses Geometric Algebra Transformers for SE(3)-equivariant handling of continuous 3D coordinates. Second, a bidirectional attention mechanism mitigates conformational errors accumulated during autoregressive sampling. These innovations enable SE3-BiLingoMol to generate 2D drug-like, 3D […]
- by Withnell, E., Celik, C., Secrier, M. Cellular plasticity – the ability of cells to change phenotype in response to intrinsic and environmental cues – is central to development, regeneration, and disease, but remains difficult to quantify due to its dynamic, context-dependent nature. Here we introduce a framework that unites AI and geostatistics – graph neural networks and spatial regression models – to both predict and explain cell state variation in spatial transcriptomics data. We formalize state predictability as a quantitative proxy for plasticity, where stable states […]
- by Jain, Y., Jepson, J., Chen, R., Maier, E., Herr, B. W., Puig-Barbe, A., Quardokus, E. M., Qaurooni, D., Yapp, C., Ewing, S. L., Enninful, A., Farzad, N., Bueckle, A., Easter, Q. T., Matuck, B., Zhu, C., Monte, E. M., Purkerson, J. M., Jehrio, M., Misra, R. S., Fan, R., Ginty, F., Karunamurthy, A., Fan, J., Campbell-Thompson, M., Pryhuber, G. S., Byrd, K. M., Hickey, J. W., Börner, K. Endothelial cells are ubiquitously present in the human body and line the luminal surface of blood and lymphatic vessels. The oxygen-dependence of cells impacts their proximity to blood vessels, and consequently, to endothelial cells depending on their functional properties and priorities. This paper presents cell-to-nearest-endothelial-cell distance distributions for various cell types using 399 spatially resolved omics datasets from 14 studies comprising 12 tissue types with a total of 47,349,496 cells. Additionally, we developed an open-source web-based interactive tool, Cell Distance […]
- by Korenskaia, A., Szenei, J., Vader, L., Blin, K., Weber, T., Ziemert, N. Phylogenetic analysis is widely used to predict enzyme function, yet building annotated and reusable trees is labor-intensive and requires extensive knowledge about the specific enzymes. Existing resources rarely cover biosynthetic enzymes and lack the context needed for meaningful analysis. We present PhyloNaP, the first large-scale resource dedicated to phylogenies of biosynthetic enzymes. PhyloNaP provides ~18,500 annotated and interactive trees enriched with chemical, functional, and taxonomic information. Users can classify their own sequences via phylogenetic placement, enabling functional inference in an […]
- by Perez, M., Hong, J., Zweig, A., Azizi, E. Accurate prediction of patient-level disease status from single-cell RNA sequencing (scRNA-seq) data is critical to enabling precision diagnostics. However, study-specific artifacts induce spurious correlations that limit generalization and interpretability. We studied this problem in the context of Multiple Instance Learning (MIL), a framework where each patient is modeled as a set of single-cell profiles. To improve robustness to domain shifts, we propose an adversarial and metric-based approach that learns domain-invariant representations while preserving task-relevant biological variation. We benchmarked our method […]
- by Hu, Y., Cao, Z., Liu, Y. The Root-Mean-Square Deviation (RMSD) metric, coupled with the Kabsch algorithm for optimal superposition, represents a cornerstone of quantitative structural biology. However, its application is fundamentally limited to the comparison of structures with a pre-defined, one-to-one correspondence of equal length, precluding its direct use for the vast majority of biologically relevant comparisons involving proteins of different lengths or unknown equivalences. To overcome this limitation, we introduce OT-RMSD, a novel, parameter-light method that generalizes the RMSD for comparing unequal-length protein structures without […]
- by Mishra, D., Tiwari, A., Srivastava, S., Tripathi, M. B., Kapoor, A. Motivation: Whole-genome sequencing (WGS) is increasingly used for preventive genomics, yet rule-based ACMG engines such as InterVar were tuned for high pre-test probability diagnostics. In screening contexts, these heuristics can inflate pathogenic/likely pathogenic (P/LP) calls, prompting unnecessary follow-up. We sought an interpretable, data-driven recalibration tailored to proactive use. Results: Across 20 WGS cases, InterVar flagged 109 variants as P/LP; only 18 (16.5%) were concordant with ClinVar P/LP assertions. The remaining 83.5% were largely absent from ClinVar (n=68) or mapped […]
- by Dreval, K., Hilton, L. K., Grande, B. M., Coyle, K. M., Cruz, M., Gillis, S., Pararajalingam, P., Rushton, C. K., Shaalan, H., Thomas, N., Winata, H., Wong, J., Yiu, J., Steidl, C., Scott, D. W., Morin, R. D. The surge of genomic data from advanced sequencing technologies is outpacing current analytical pipelines. We introduce LCR-modules, an open-source suite of bioinformatics tools designed for flexible and automated cancer genome data analysis. LCR-modules enable reproducible analysis of diverse cancer genomics data at scale. The suite comprises 49 Snakemake-based workflows organized into three levels, facilitating tasks from low-level quality control to complex cohort-level analyses. LCR-modules support various sequencing types and integrate pipelines such as mutation calling, expression quantification, and cohort-level aggregation, […]
- by Bertin, A. A., Bucher, E. E., Griere, O. O., Hurtado, M. M., L. Rocha, H., Heiland, R., Sundus, A., Macklin, P., Francois-Lavet, V., Rachelson, E., Pancaldi, V. This paper presents PhysiGym, a framework that integrates agent-based biological simulation within standardized reinforcement learning environments. By integrating the agent-based modeling framework PhysiCell with the Gymnasium API, we provide a flexible tool for exploring reinforcement learning strategies to control in silico biological processes. We demonstrate PhysiGym's potential with a case study where a deep reinforcement learning algorithm guides a tumor microenvironment model toward an anti-tumoral state, ultimately achieving tumour elimination. Our results highlight PhysiGym's flexibility for AI-driven biological control and […]
- by Green, A. F., Ribas, C. E., Jandalala, I., Muston, P., O'Cathail, C., Cochrane, G., Ernst, C., Zhao, L., Madrigal, P., Attrill, H., Marygold, S., Lancet, D., Dobzinski, N., Chan, P. P., Lowe, T. M., Bruford, E. A., Seal, R. L., Hermjakob, H., Panneerselvam, K., Finn, R. D., Gurbich, T. A., Griffiths-Jones, S., Fromm, B., Peterson, K. J., Sordyl, D., Bujnicki, J. M., Velankar, S., Appasamy, S. D., Ganguly, S., Zhang, P., He, S., Rutherford, K. M., Wood, V., Lovering, R. C., Picardi, E., Ontiveros, N., Huang, L., Miao, Z., Petrov, A. S., McCann, H., Cavalleri, E., Mesiti, M., Rivas, E., Szikszai. RNAcentral was founded in 2014 to serve as a comprehensive database of non-coding RNA sequences. It began by providing a single unified interface to more specialised resources, and now contains 45 million sequences. It has grown beyond providing a single interface to many specialised resources and now provides several services and analyses. These include secondary structure prediction with R2DT, sequence search, and analysis with Rfam. Since its last publication in 2021, RNAcentral has developed two major features. First, literature integration […]
- by Ross, T. A., Pöntinen, A. K., Holsbo, E., Samuelsen, O., Hegstad, K., Kampffmeyer, M., Corander, J., Gladstone, R. A. Rising antimicrobial resistance (AMR) in Escherichia coli bloodstream infections (BSIs) in high-income settings has typically been dominated by one clone, the sequence type (ST) 131. More specifically, ST131 clade C (ST131-C) is associated with fluoroquinolone resistance and extended-spectrum β-lactamases (ESBLs). Even though urinary tract infections (UTIs) are a known common precursor to BSIs, there is currently limited knowledge on the longitudinal prevalence of ST131-C in UTIs and, therefore, the temporal link between the two infection types. Leveraging available genomic and […]
- by Kholmatov, M., Prummel, K. D., Guezguez, B., Lindberg, E. H., Moura, P. L., Zaugg, J. B. Myeloid neoplasms (MN) are clonal heterogeneous disorders initiated by somatic driver mutations in hematopoietic stem and progenitor cells (HSPCs). Among the most common are mutations in RNA splicing factors, which exert pleiotropic effects but are difficult to study due to altered hematopoietic differentiation and the inability to specifically isolate mutant HSPCs. Single-cell transcriptomics offers a powerful framework to dissect mutant cell states, yet direct genotyping is hampered by data sparsity and often requires costly, labor-intensive approaches. To overcome this limitation, […]
- by Vilicich, F., Yin, S., Su, Z., Wu, Y. Protein conformational flexibility underlies a wide range of biological functions, yet experimentally probing dynamics at atomic resolution remains costly and low-throughput. Here, we present a deep learning framework that predicts protein flexibility directly from static structural descriptors, bypassing the need for molecular dynamics (MD) simulations. Using the ATLAS database of standardized all-atom MD trajectories, we encoded 1,374 protein chains as 30-dimensional Gaussian integral (GI) vectors, global shape and topology invariants of the protein backbone. Principal component analysis of GI profiles […]
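
The OT-RMSD entry by Hu, Cao and Liu builds on the classical RMSD with Kabsch superposition, which only works when two structures have equal length and a known one-to-one correspondence. As a rough illustration of that baseline (not of OT-RMSD itself), here is a minimal NumPy sketch, assuming coordinates are given as N x 3 arrays whose rows already correspond; the function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition.

    Assumes row i of P corresponds to row i of Q (equal length, known
    one-to-one correspondence): the restriction that OT-RMSD is described
    as lifting.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays of equal length")

    # Remove translation by centering both point sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # Cross-covariance matrix H = sum_i p_i q_i^T, then its SVD H = U S Vt.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Map each row p_i to R p_i and compare with the centered targets.
    P_aligned = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q_c) ** 2, axis=1))))


if __name__ == "__main__":
    # Toy check: Q is P rotated 90 degrees about z and translated, so RMSD should be ~0.
    theta = np.pi / 2
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
    Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
    print(kabsch_rmsd(P, Q))  # near zero, up to floating-point error
```

The reflection check matters: without it, the SVD solution can return an improper rotation that superimposes a structure onto its mirror image, which is not a physically valid alignment for proteins. Note that the function rejects inputs of different lengths outright, which is exactly the limitation the OT-RMSD abstract targets.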