  • by Herault, L., Gabriel, A. A., Duc, B., Dolfi, B., Shah, A., Joyce, J. A., Gfeller, D.
    Multimodal single-cell atlases comprising hundreds of thousands of cells provide unique resources for exploring complex biological tissues and generating testable hypotheses. To streamline the analysis of such large datasets, we introduce SuperCell2.0, a robust workflow to build (semi-)supervised multimodal metacells. We demonstrate that multimodal metacells outperform metacells built with a single modality, improve inter-modality consistency, and facilitate integration of multiomic single-cell datasets. SuperCell2.0 can further leverage full or partial cell type annotations to improve metacell quality. This workflow enables us […]
  • by Song, J., Li, Q.
    Spatial transcriptomics (ST) measures gene expression while preserving spatial context within tissues, enabling detailed characterization of tissue organization. As ST technologies advance, aligning datasets across tissue sections, individuals, platforms, and developmental stages has become increasingly important but remains challenging due to sparse expression, biological heterogeneity, and geometric distortions between slices. We introduce OT-knn, a method for ST alignment that integrates local neighborhood information within an optimal transport framework. Rather than relying solely on single-spot expression, OT-knn reconstructs each spot using […]
  • by Tang, Q., Mchaourab, H., Wu, T., Soubasis, B.
    The AlphaFold3 architecture represented an important leap relative to AlphaFold2 by enabling the inclusion of protein ligands in the prediction network. Ligand-dependent structural rearrangements are inherently difficult to predict computationally as they imply transitions between states separated by large energy differences. Here we apply AlphaFold3 to predict nucleotide-dependent changes in the conformational cycle of representative ABC transporters that have been extensively investigated by experimental structural biology techniques. We show that under similar conditions, AlphaFold3 predictions sample experimentally observed conformations. Moreover, the […]
  • by Lapp, Z., Leitner, T.
    Motivation: Understanding how virus sequences are shaped by selection can inform vaccine design and transmission inference. Modeling within-host evolution to interrogate these questions requires a detailed mechanistic framework that accurately captures sequence diversification. The CD8+ cytotoxic T-lymphocyte (CTL) response plays an important role in immune-mediated selection and can leave strong signatures in virus sequences; however, existing sequence-based within-host virus modeling frameworks do not explicitly include an HLA-aware CTL response. Results: We extended our previously published within-host sequence evolution simulator, wavess, […]
  • by Adasme, M. F., Ochoa, D., Lopez, I., Do, H.-M.-A., McDonagh, E. M., O'Boyle, N. M., Leach, A. R., Zdrazil, B.
    Chemical probes are indispensable tools for validating therapeutic hypotheses, yet their broader impact on early-stage drug discovery remains unquantified. To our knowledge, this study represents the first systematic, large-scale investigation of the chemical probe literature. By screening over 18 million articles using a high-quality dictionary of 561 chemical probes, we identified 20,000 articles mentioning a chemical probe, which resulted in 5,558 unique target-disease (T-D) associations. Our analysis yields four principal findings that redefine the utility of these chemicals: First, we […]
  • by Min, J., Vishnyakova, O., Brooks-Wilson, A., Elliott, L. T.
    Identifying physiological sweet spots (optimal ranges for homeostasis) is essential for precision medicine. However, traditional statistical methods often rely on globally linear or locally jagged models that struggle to capture the smooth, non-linear nature of biological regulation in high-dimensional data. We present the Quantile Feature Selection Network (Q-FSNet), a neural network-based framework that integrates quantile regression, feature selection, and uncertainty estimation to identify biomarkers with sweet spots. Unlike traditional methods, Q-FSNet learns continuous response curves without requiring a pre-specified number of […]
  • by Xia, T., Zhao, X., Islam, S. S. M., Mohammed, K. K., Xie, Z., Zhi, D.
    Magnetic resonance imaging (MRI)-derived phenotypes (IDPs) have enabled the discovery of numerous genomic loci associated with brain structure and function. However, most existing IDPs and learned representations are derived from a single imaging modality, missing complementary information across modalities and potentially limiting the scope of genetic discovery. Here, we introduce a multimodal contrastive learning framework to derive heritable representations from paired T1- and T2-weighted MRIs. Unlike single-modality reconstruction-based models, we designed a momentum-based contrastive learning framework. As a result, our […]
  • by Ma, Z., Liu, M., Wang, S., Wang, S., Zang, C.
    Spatial organization of the genome plays a vital role in defining cell identity and regulating gene expression. The three-dimensional (3D) genome structure can be measured by sequencing-based techniques such as Hi-C, usually at the cell-population level, or by imaging-based techniques such as chromatin tracing at the single-cell level. Chromatin tracing is a multiplexed DNA fluorescence in situ hybridization (FISH)-based method that can directly map the 3D positions of genomic loci along individual chromosomes at single-molecule resolution. However, few computational […]
  • by Shahid, A., Ulrich, J.-U., Kuehnert, D.
    High genomic variability among viral species makes sequence classification highly dependent on multiple sequence alignment (MSA) methods, which are both computationally intensive and sensitive to data quality issues. To provide a more efficient and robust alternative, we developed DiCNN-UniK, a Dual-Input Convolutional Neural Network (DiCNN) utilizing unique k-mer signatures and universal k-mer libraries to generate novel and direct embeddings. Instead of relying on k-mer frequency patterns, DiCNN-UniK directly leverages k-mer embedding information, which provides a clear picture of local genomic […]
  • by Sapoval, N., Nakhleh, L.
    Gene tree parsimony (GTP) is a common approach for efficient reconciliation of multiple discordant gene tree phylogenies for the inference of a single species tree. However, despite the popularity of GTP methods due to their low computational costs, prior work has shown that some commonly employed parsimony costs are statistically inconsistent under the multispecies coalescent process. Furthermore, a fine-grained analysis of the inconsistency has indicated potentially complementary behavior of duplication and deep coalescence costs for symmetric and asymmetric species trees. […]
  • by Xia, F., Baudis, M., Anisimova, M.
    Short tandem repeats (STRs) are a major source of genetic variation, yet their potential for genome-wide population structure inference remains underexplored. Here we present a multi-modal framework for STR-based population inference, integrating unsupervised clustering, supervised population assignment, and a novel admixture inference model, Directional Non-negative Matrix Factorization (dNMF). Applying this framework to thousands of genomes from multiple global cohorts, we first demonstrate that genome-wide STR variations provide substantially finer resolution of human population structure than single-nucleotide polymorphisms (SNPs), particularly at […]
  • by Jiang, C., Zheng, R., Ji, Y., Cao, S., Fang, Y., Wang, Z., Wang, R., Liang, S., Tao, S.
    Single-cell RNA sequencing enables high-resolution characterization of cellular heterogeneity, yet integrating datasets from diverse sources remains challenging due to batch effects. Current methods rely on implicit feature disentanglement and lack geometric constraints, and often result in under-correction, over-correction, or compromised biological fidelity. Here, we present iDLC, an interpretable deep learning framework that performs dual-level correction through explicit feature disentanglement and optimal transport-regularized adversarial alignment. iDLC separates biological and technical components within a structured latent space, then leverages high-confidence mutual […]
  • by Sefa, S. M., Sarkar, J., Robin, A. H. K., Uddin, M.
    Protein function depends on interactions between structural domains and regulatory motifs. Yet current tools analyze these elements separately, hindering investigation of disease mutations affecting evolutionarily conserved, structurally constrained motifs. We present ProteoMapper, a computational framework integrating HMMER-based domain annotation with user-defined motif detection to quantify motif-domain spatial relationships in protein families. ProteoMapper introduces two discovery metrics: (1) positional conservation scoring, identifying motifs at identical alignment coordinates in ≥ N% of sequences (default 60%), indicating purifying selection; (2) Motif-Domain Coverage Score […]
  • by Bolut, C., Pacary, A., Pieruccioni, L., Ousset, M., Paupert, J., Casteilla, L., Simoncini, D.
    Machine learning (ML) models are effective at classifying images across various fields, including biology. However, their performance on biomedical images is often limited by the small size of available datasets that are constrained by the time-consuming and costly nature of experimental data collection. A review of the literature shows that many studies using biomedical images fail to follow ML best practices. This study focuses on regenerative medicine, which aims to promote tissue regeneration rather than scarring. To explore this process, […]
  • by Zhou, Y., Wei, C., Sun, M., Wang, L., Song, J., Xu, F., Li, Y., Zheng, W., Zhang, Y.
    Modeling protein conformational landscapes is essential for understanding dynamics, allostery, and drug discovery, yet existing resources lack diverse conformational coverage, energetic annotations, or benchmarking standards. ProteinConformers (https://zhanggroup.org/ProteinConformers) provides 2.7 million geometry-optimized conformations generated with a multi-seed molecular dynamics strategy, paired with 13.7 million energy evaluations and 5.5 million similarity annotations. It delivers continuous landscapes from non-native to near-native states, a benchmarking framework for multi-conformation generators, and an interactive analysis platform.
  • by Zondi, S., Mtambo, S., Buthelezi, N., Shunmugam, L., Magwenyane, A., Kumalo, H. M.
    Chikungunya virus (CHIKV) infection is one of the major public health concerns in several countries around the world. CHIKV non-structural protein 2 (nsP2) is a promising drug design target due to the enzyme's multifunctional properties that facilitate viral replication and propagation. To date, there is an evident lack of preventative and therapeutic developments that can be used against CHIKV. Drug repurposing is a time-saving and cost-effective method used for the development of new drugs. In this study, drug repurposing was […]
  • by Abdollahi, N., Kaveh, S., Shayesteh, S., Mommahed, S., Alemzadeh, Y., Zarrin, R., Chaker Hosseini Zavareh, F., Esmaeili, P., Hassanzadeh, R., Kossida, S., Eslahchi, C.
    Adaptive immune receptor repertoire sequencing (AIRR-seq) enables large-scale profiling of B and T cell receptor diversity and has become a cornerstone of modern computational immunology. However, AIRR-seq provides only a partial and lossy molecular snapshot of immune dynamics, lacking explicit ground truth for clonal ancestry, lineage trajectories, antigen specificity, and longitudinal immune evolution. This limitation complicates benchmarking, method validation, and mechanistic interpretation of repertoire analysis pipelines. Here, we introduce UnivAIRRse, a unified hierarchical framework that organizes AIRR […]
  • by Goncalves, D. M., Patricio, A., Costa, R. S., Henriques, R.
    The growing availability and complexity of omics data have driven the development of specialized algorithms for modeling molecular systems. Although graph-based learning methods effectively represent biological interactions, they often neglect the statistical information embedded in node and edge annotations. To address this limitation, we propose a novel graph-based framework that integrates structured statistical distributions into nodes and edges, capturing probabilistic characteristics of molecular relationships. We evaluate the proposed approach on omics datasets from five cancer types across multiple clinical outcomes, […]
  • by Munoz-Gacitua, D., Blamey, J.
    The LRLLR cell-penetrating motif can be transferred to confer membrane translocation activity, but only to compatible recipient peptides. Using umbrella sampling molecular dynamics simulations, we demonstrate that C-terminal LRLLR addition to the pro-apoptotic smacN peptide eliminates its translocation barrier entirely, transforming a +65 kJ/mol barrier into a -50 kJ/mol energy well. In contrast, N-terminal LRLLR addition to the neuroprotective NR2B9c peptide increases the translocation barrier from +85 to +100 kJ/mol, demonstrating that motif transfer can prove counterproductive for incompatible sequences. […]
  • by Gonzalez-Bermejo, M., Serrano-Ron, L., Garcia-Martin, S., Lapuente-Santana, O., Sanz-Portillo, I., Gonzalez-Martinez, P., Gomez-Lopez, G., Al-Shahrour, F.
    Intratumoral heterogeneity (ITH) is a major determinant of therapeutic failure, yet its impact on drug response across cancers remains incompletely understood. Here, we present the Therapeutic Cancer Cell Atlas (TCCA), a pan-cancer single-cell resource integrating ~1.8 million transcriptomes from 537 patients and 183 cancer cell lines spanning 34 tumor types. By combining single-cell transcriptomics with copy-number alteration inference and computational drug-response prediction, we systematically map therapeutic heterogeneity at subclonal resolution across cancers. Using this framework, we identify ten recurrent therapeutic […]
