  • by Medina Tretmanis, J., Avila-Arcos, M. C., Jay, F., Huerta-Sanchez, E.
    Motivation: Local Ancestry Inference (LAI) allows us to study evolutionary processes in admixed populations, uncover ancestry-specific disease risk factors, and better understand the demographic history of these populations. Many methods for LAI exist; however, these methods usually focus on cases of intercontinental admixture. In this work, we evaluate both existing and novel methods in challenging scenarios, such as downsampled reference panels, intracontinental admixture, and distant admixture events. Results: We present four novel LAI implementations based on neural network architectures, […]
  • by Da Silva, K., Naffakh, N., Rameix-Welti, M.-A., Lemoine, F.
    In the context of viral epidemic surveillance, generating accurate consensus viral genomes from sequencing data is critical for tracking the emergence of mutations of concern, evaluating the genomic diversity of circulating viruses, and anticipating which viral strains could become most prevalent. However, this task is made difficult by the presence of Deletion-containing Viral Genomes (DelVGs), which contain truncated (or rearranged) and potentially mutated versions of the full length virus genome. Because these DelVGs can outnumber the full genome in terms […]
  • by Karagöl, A., Karagöl, T.
    Amino acid substitutions are often directionally asymmetric due to underlying biophysical constraints and diverse evolutionary pressures. We introduce Tv (variant tension), a kernel regression-based metric that quantifies this directional asymmetry directly from aligned multiple sequence alignments (MSAs). Tv leverages empirical amino acid frequencies and a non-parametric Gaussian kernel to capture nonlinear substitution flows, providing a causality-inspired framework for understanding evolutionary dynamics. We also present a web-based application that implements the calculation, allowing users to input MSAs, adjust parameters (kernel bandwidth, […]
  • by Su, H., Liang, Y., Xiao, W., Li, H., Liu, X., Yang, Z., Yuan, M., Liu, X.
    The escalating crisis of antimicrobial resistance necessitates novel therapeutic strategies, among which drug combination therapy shows great promise by enhancing efficacy and reducing toxicity. However, identifying effective synergistic pairs from the vast combinatorial space remains experimentally challenging and resource-intensive. To address this, we introduce GCN-Mamba, a deep learning framework that integrates Graph Convolutional Networks (GCN) with the Mamba State Space Model. This architecture captures both local molecular topological structures and global implicit interactions by leveraging Extended 3-Dimensional Fingerprints (E3FP) and […]
  • by Wang, Z., Chen, X., Cui, X., Gao, Z., Li, Z., Li, K., Jiang, R.
    The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has emerged as a core technology for dissecting cellular epigenomic heterogeneity and gene regulatory programs. With the emergence of atlas-level scATAC-seq datasets, cell type annotation increasingly faces challenges arising from unprecedented data scale and increased cell-type diversity, which together place stringent demands on model reliability and robustness. Here, we introduce HitAnno, a hierarchical language model capable of accurate and scalable cell type annotation in atlas-level scATAC-seq data. Leveraging selected cell-type-specific peaks […]
  • by Ren, Y., Morlot, L., Andrews, J. O., Thrane Hertz, E. P., Mailand, N., Caicedo, J. C.
    Recent advances in cell segmentation have produced models that generalize across various cell lines and imaging types. However, these methods still fail to recognize subcellular structures such as micronuclei (MN), which are rare and tiny DNA-containing structures found outside of the main nucleus and observable under the microscope. While they can be hard to recognize in images, studying MN formation is of great interest because of their relationship to chromosome instability, genotoxicity, and cancer progression. Here we present a segmentation model, […]
  • by Pfennig, A., Akey, J. M.
    Accurate phasing of genetic and epigenetic variation is crucial for many downstream analyses, including association testing, clinical variant interpretation, and inference of population history. Although long-read sequencing significantly improves the continuity and completeness of genome sequencing, reconstructing chromosome-scale haplotypes remains challenging, often requiring the integration of multiple technologies, such as PacBio HiFi and Oxford Nanopore Technologies (ONT) sequencing. While these sequencing platforms detect the epigenetic modification 5-methylcytosine (5mC), current read-based phasing algorithms do not incorporate this information. We developed a […]
  • by Dahmani, L. Z., Banerjee, A.
    Recombinant human Interleukin-2 (rhIL-2, Aldesleukin) is used in immunotherapy for metastatic melanoma and renal cell carcinoma. Low-dose IL-2 has been investigated for administration after adoptive T cell transfer to enhance CAR T expansion and sustain effector function. However, systemic IL-2 can cause severe toxicities and promote expansion of regulatory T cells (Tregs). Previous attempts at mitigating cytokine-mediated side effects involved isolating CAR T cell signaling from endogenous immune responses by developing IL-2/IL-2Rβ-based selective ligand-receptor systems. Expressing these variant orthogonal […]
  • by Seckin, E., Colinet, D., Danchin, E., Sarti, E.
    Transformer-based models (TBMs) are state-of-the-art deep learning architectures that predict protein structural and functional features with high accuracy. Despite methodological differences, they all rely on large protein sequence datasets structured by homology, as homologous proteins typically share structure and function. However, 5-30% of eukaryotic proteomes consist of orphan proteins – sequences without detectable similarity to known families. Although they may share structural or functional traits with characterized proteins, their lack of homology makes them ideal for evaluating TBM generalization beyond […]
  • by Zhou, S., Zhao, T.
    Genotype-by-environment interactions are central to crop adaptation and yield stability, yet they remain difficult to model for robust prediction across heterogeneous environments. Although enviromic profiling has improved the characterization of dynamic field conditions, most existing genomic prediction methods adopt a late-fusion strategy that encodes genomic and environmental information independently before global integration, thereby limiting their ability to resolve fine-scale, context-dependent G x E effects. Here, we developed GE-BiCross, a hierarchical bidirectional cross-attention framework for maize prediction. GE-BiCross incorporates a dual-path […]
  • by Beeloo, R., Groot Koerkamp, R.
    Motivation. Searching short DNA patterns such as barcodes, primers, or CRISPR spacers within sequencing reads or genomes is a fundamental task in bioinformatics. These problems are instances of multiple approximate string matching (MASM) [Baeza-Yates and Navarro, 1997], which requires locating all occurrences with up to k errors of multiple patterns of length m in a text of length n. Classical approaches based on seeding with exact matches become inefficient for short patterns (m ≤ 64 bp) as k increases, producing either […]
  • by Slaninakova, T., Rosinec, A., Cillik, J., Krenek, A., Gresova, K., Porubska, J., Marsalkova, E., Olha, J., Prochazka, D., Hejtmanek, L., Dohnal, V., Berka, K., Svobodova, R., Antol, M.
    The availability of large-scale protein structure collections enables structure-based analysis of their function and evolution beyond what is possible from sequence alone. However, applying three-dimensional structure comparison at scale remains computationally demanding and limits practical exploration of large experimental and predicted collections. This creates a need for fast, structure-based search methods that retain biological relevance while enabling large-scale exploration. In this paper, we present AlphaFind v2, an application for finding structurally similar proteins in the AlphaFold Database (https://alphafold.ebi.ac.uk/) of predicted […]
  • by Swami, H., Eckmann, J.-P., McBride, J. M., Tlusty, T.
    Protein function is executed at the molecular surface, where shape and chemistry act together to govern interaction. Yet most comparison methods treat these aspects separately, privileging either global fold or local descriptors and missing their coupled organization. Here we introduce IFACE (Intrinsic Field–Aligned Coupled Embedding), a correspondence-based framework that aligns protein surfaces through probabilistic coupling of intrinsic geometry with spatially distributed chemical fields. From this alignment, we derive a joint geometric-chemical distance that integrates structural and physicochemical discrepancies within a […]
  • by Corradi, M., Djidrovski, I., Ladeira, L., Staumont, B., Verhoeven, A., Sanz Serrano, J., Rougny, A., Vaez, A., Hemedan, A., Mazein, A., Niarakis, A., de Carvalho e Silva, A., Auffray, C., Wilighagen, E., Kuchovska, E., Schreiber, F., Balaur, I., Calzone, L., Matthews, L., Veschini, L., Gillespie, M. E., Kutmon, M., Koenig, M., van Welzen, M., Hiroi, N., Lopata, O., Klemmer, P., Overall, R., Hofer, T., Satagopam, V., Schneider, R., Teunis, M., Geris, L., Ostaszewski, M.
    As biomedical knowledge keeps growing, resources storing available information multiply and grow in size and complexity. Such resources can take the form of molecular interaction maps, which represent cellular and molecular processes under normal or pathological conditions. However, these maps can be complex and hard to navigate, especially for novice users. Large Language Models (LLMs), particularly in the form of agentic frameworks, have emerged as a promising technology to support this exploration. In this article, we describe a user-driven […]
  • by Sapoval, N., Treangen, T., Nakhleh, L.
    Motivation: Measures of microbial diversity that can be derived directly from metagenomic sequencing data offer a valuable summary view of the underlying complex systems. Prior work has shown that both taxonomic composition and abundances that are captured by standard diversity measures (e.g., Shannon entropy), and structural variation within the metagenome due to gene duplications, losses and horizontal transfers (HGT), can correlate with the host's health. However, there are no diversity measures available that simultaneously account for the genome architecture and […]
  • by Zhou, X., Wang, Z., Ling, Y., Tian, Q., Zhang, Z., Li, Y., Zhou, P., Chen, L.
    Reconstructing cellular trajectories from time-resolved single-cell transcriptomics is fundamental to understanding processes from embryonic development to cancer progression. While single-cell foundation models (scFMs) promise universal biological representations through large-scale pretraining, their capacity to capture the non-linear dynamics governing cell-fate decisions remains uncharacterized. Here we systematically benchmark multiple scFMs across challenging biomedical scenarios involving branching lineages and continuous state transitions. By coupling zero-shot scFM embeddings with dynamic optimal transport, we evaluated their performance against a traditional highly variable gene (HVG) baseline […]
  • by Tsuchihashi, R., Kinoshita, M.
    Cyclic peptides have emerged as a pivotal modality for next-generation therapeutics, due to their superior biocompatibility, high selectivity, and structural stability. While AI-driven peptide design has advanced rapidly, conventional optimization algorithms are often constrained by initialization biases, which impede the efficient exploration of the vast chemical space. Here, we propose a novel methodology that integrates the protein language model ESM-2 with cyclic permutation averaging of embeddings to resolve this bottleneck. This approach establishes a comprehensive "peptide space", a high-dimensional vector […]
  • by Argentini, A., Fernandez Fernandez, E., Pauwels, J., Gevaert, K.
    Data-independent acquisition (DIA) has become the preferred data acquisition method for mass spectrometry-based proteomics, yet reproducible workflows for differential expression (DE) analysis and results reporting remain limited. We present DiaReport, an R package that performs precursor- and protein-level DE analysis from DIA-NN output using MSqRob and QFeatures, while generating high-quality, interactive HTML reports through Quarto. DiaReport integrates precursor data, filtering of missing values, normalization, protein summarization and statistical modeling within a single function, supporting both simple pairwise as well as […]
  • by Douglas, G. M., Bobay, L.-M.
    Physicochemically similar amino acids undergo more frequent substitutions compared to dissimilar amino acid pairs. Despite their clear potential, amino acid similarity matrices remain underused in molecular evolution, partially due to the high number of proposed amino acid distance measures and the lack of agreement on which are most accurate. In this study, we assessed the performance of 30 amino acid distance measures, including a new amino acid distance measure we developed based on recent deep mutational scanning data. We compared […]
  • by Chakraborty, D. S., Singh, P. P., Dey, C., Kaur, J.
    We have conducted all-atom molecular dynamics simulations of POPC and DPPC lipid bilayers using the AMBER Lipid21 force field with eight different water models, including SPC/E, TIP3P, TIP3P-FB, TIP4P-FB, TIP4P-Ew, TIP4P/2005, TIP4P-D, and OPC, to identify the most compatible one without any modification. A number of parameters have been computed in order to understand the structure of the lipid bilayer: area per lipid, isothermal compressibility modulus, average volume per lipid, electron density profile, bilayer thickness, X-ray and neutron scattering form […]
