BioRxiv Bioinformatics – Mass Spectrometry Blog


Improving Local Ancestry Inference through Neural Networks
March 13, 2026 by Medina Tretmanis, J., Avila-Arcos, M. C., Jay, F., Huerta-Sanchez, E.
Motivation: Local Ancestry Inference (LAI) allows us to study evolutionary processes in admixed populations, uncover ancestry-specific disease risk factors, and better understand the demographic history of these populations. Many methods for LAI exist; however, they usually focus on cases of intercontinental admixture. In this work, we evaluate both existing and novel methods in challenging scenarios, such as downsampled reference panels, intracontinental admixture, and distant admixture events. Results: We present four novel LAI implementations based on neural network architectures, […]
Accounting for Defective Viral Genomes in viral consensus genome reconstruction, application to influenza virus
March 12, 2026 by Da Silva, K., Naffakh, N., Rameix-Welti, M.-A., Lemoine, F.
In the context of viral epidemic surveillance, generating accurate consensus viral genomes from sequencing data is critical for tracking the emergence of mutations of concern, evaluating the genomic diversity of circulating viruses, and anticipating which viral strains could become most prevalent. However, this task is made difficult by the presence of Deletion-containing Viral Genomes (DelVGs), which contain truncated (or rearranged) and potentially mutated versions of the full-length virus genome. Because these DelVGs can outnumber the full genome in terms […]
Directional Variant Tension (Tv): A Causal Framework for Quantifying Substitution Asymmetry
March 12, 2026 by Karagöl, A., Karagöl, T.
Amino acid substitutions are often directionally asymmetric due to underlying biophysical constraints and diverse evolutionary pressures. We introduce Tv (variant tension), a kernel regression-based metric that quantifies this directional asymmetry directly from multiple sequence alignments (MSAs). Tv leverages empirical amino acid frequencies and a non-parametric Gaussian kernel to capture nonlinear substitution flows, providing a causality-inspired framework for understanding evolutionary dynamics. We also present a web-based application that implements the calculation, allowing users to input MSAs, adjust parameters (kernel bandwidth, […]
GCN-Mamba: Graph Convolutional Network with Mamba for Antibacterial Synergy Prediction
March 12, 2026 by Su, H., Liang, Y., Xiao, W., Li, H., Liu, X., Yang, Z., Yuan, M., Liu, X.
The escalating crisis of antimicrobial resistance necessitates novel therapeutic strategies, among which drug combination therapy shows great promise by enhancing efficacy and reducing toxicity. However, identifying effective synergistic pairs from the vast combinatorial space remains experimentally challenging and resource-intensive. To address this, we introduce GCN-Mamba, a deep learning framework that integrates Graph Convolutional Networks (GCN) with the Mamba State Space Model. This architecture captures both local molecular topological structures and global implicit interactions by leveraging Extended 3-Dimensional Fingerprints (E3FP) and […]
HitAnno: Atlas-level cell type annotation based on scATAC-seq data via a hierarchical language model
March 12, 2026 by Wang, Z., Chen, X., Cui, X., Gao, Z., Li, Z., Li, K., Jiang, R.
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has emerged as a core technology for dissecting cellular epigenomic heterogeneity and gene regulatory programs. With the emergence of atlas-level scATAC-seq datasets, cell type annotation increasingly faces challenges arising from unprecedented data scale and increased cell-type diversity, which together place stringent demands on model reliability and robustness. Here, we introduce HitAnno, a hierarchical language model capable of accurate and scalable cell type annotation in atlas-level scATAC-seq data. Leveraging selected cell-type-specific peaks […]
mnDINO: Accurate and robust segmentation of micronuclei with vision transformer networks
March 12, 2026 by Ren, Y., Morlot, L., Andrews, J. O., Thrane Hertz, E. P., Mailand, N., Caicedo, J. C.
Recent advances in cell segmentation successfully produce models that generalize across various cell-lines and imaging types. However, these methods still fail to recognize subcellular structures such as micronuclei (MN), which are rare and tiny DNA-containing structures found outside of the main nucleus and observable under the microscope. While they can be hard to recognize in images, studying MN formation is of great interest because of their relationship to chromosome instability, genotoxicity, and cancer progression. Here we present a segmentation model, […]
Harnessing methylation signals inherent in long-read sequencing data for improved variant phasing
March 12, 2026 by Pfennig, A., Akey, J. M.
Accurate phasing of genetic and epigenetic variation is crucial for many downstream analyses, including association testing, clinical variant interpretation, and inference of population history. Although long-read sequencing significantly improves the continuity and completeness of genome sequencing, reconstructing chromosome-scale haplotypes remains challenging, often requiring the integration of multiple technologies, such as PacBio HiFi and Oxford Nanopore Technologies (ONT) sequencing. While these sequencing platforms detect the epigenetic modification 5-methylcytosine (5mC), current read-based phasing algorithms do not incorporate this information. We developed a […]
Rational Design of Selective IL-2-based Activators for CAR T Cells Using AlphaFold3 and Physics-Informed Machine Learning
March 12, 2026 by Dahmani, L. Z., Banerjee, A.
Recombinant human Interleukin-2 (rhIL-2, Aldesleukin) is used in immunotherapy for metastatic melanoma and renal cell carcinoma. Low-dose IL-2 has been investigated for administration after adoptive T cell transfer to enhance CAR T expansion and sustain effector function. However, systemic IL-2 can cause severe toxicities and promote expansion of regulatory T cells (Tregs). Previous attempts at mitigating cytokine-mediated side effects involved isolating CAR T cell signaling from endogenous immune responses by developing IL-2/IL-2Rβ-based selective ligand-receptor systems. Expressing these variant orthogonal […]
Evaluating transformer-based models for structural characterization of orphan proteins
March 12, 2026 by Seckin, E., Colinet, D., Danchin, E., Sarti, E.
Transformer-based models (TBMs) are state-of-the-art deep learning architectures that predict protein structural and functional features with high accuracy. Despite methodological differences, they all rely on large protein sequence datasets structured by homology, as homologous proteins typically share structure and function. However, 5-30% of eukaryotic proteomes consist of orphan proteins – sequences without detectable similarity to known families. Although they may share structural or functional traits with characterized proteins, their lack of homology makes them ideal for evaluating TBM generalization beyond […]
GE-BiCross: A Hierarchical Bidirectional Cross-Attention Framework for Genotype-by-Environment Prediction in Maize
March 12, 2026 by Zhou, S., Zhao, T.
Genotype-by-environment interactions are central to crop adaptation and yield stability, yet they remain difficult to model for robust prediction across heterogeneous environments. Although enviromic profiling has improved the characterization of dynamic field conditions, most existing genomic prediction methods adopt a late-fusion strategy that encodes genomic and environmental information independently before global integration, thereby limiting their ability to resolve fine-scale, context-dependent G×E effects. Here, we developed GE-BiCross, a hierarchical bidirectional cross-attention framework for maize prediction. GE-BiCross incorporates a dual-path […]
Sassy2: Batch Searching of Short DNA Patterns
March 12, 2026 by Beeloo, R., Groot Koerkamp, R.
Motivation. Searching short DNA patterns such as barcodes, primers, or CRISPR spacers within sequencing reads or genomes is a fundamental task in bioinformatics. These problems are instances of multiple approximate string matching (MASM) [Baeza-Yates and Navarro, 1997], which requires locating all occurrences of multiple patterns of length m, with up to k errors, in a text of length n. Classical approaches based on seeding with exact matches become inefficient for short patterns (m ≤ 64 bp) as k increases, producing either […]
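To make the approximate matching problem in the Sassy2 abstract concrete, the sketch below finds every position where a single pattern occurs in a text with at most k edits (substitutions, insertions, deletions). It uses the classical semi-global edit-distance dynamic program (Sellers' algorithm), which runs in O(mn) time. This is a textbook baseline chosen for illustration, not Sassy2's batch-search method; the function name `approx_occurrences` is hypothetical.

```python
def approx_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Return end positions (exclusive, 0-based) in `text` where `pattern`
    matches some substring with edit distance <= k (Sellers' algorithm)."""
    m = len(pattern)
    # prev[i] = best edit distance between pattern[:i] and any substring of
    # text ending at the previous column; column 0 is distance to the empty string.
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        # curr[0] = 0: a match may start at any text position for free.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= k:
            hits.append(j)
        prev = curr
    return hits
```

For example, `approx_occurrences("ACGT", "TTACGTTT", 0)` reports the single exact occurrence ending at position 6. Running one such scan per pattern costs O(mn) each, which is why the many-pattern, small-k regime described above motivates specialized batch methods.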
AlphaFind v2: Similarity Search in AlphaFold DB and TED Domains across Structural Contexts
March 12, 2026 by Slaninakova, T., Rosinec, A., Cillik, J., Krenek, A., Gresova, K., Porubska, J., Marsalkova, E., Olha, J., Prochazka, D., Hejtmanek, L., Dohnal, V., Berka, K., Svobodova, R., Antol, M.
The availability of large-scale protein structure collections enables structure-based analysis of their function and evolution beyond what is possible from sequence alone. However, applying three-dimensional structure comparison at scale remains computationally demanding and limits practical exploration of large experimental and predicted collections. This creates a need for fast, structure-based search methods that retain biological relevance while enabling large-scale exploration. In this paper, we present AlphaFind v2, an application for finding structurally similar proteins in the AlphaFold Database (https://alphafold.ebi.ac.uk/) of predicted […]
Joint Geometric–Chemical Distance for Protein Surfaces
March 12, 2026 by Swami, H., Eckmann, J.-P., McBride, J. M., Tlusty, T.
Protein function is executed at the molecular surface, where shape and chemistry act together to govern interaction. Yet most comparison methods treat these aspects separately, privileging either global fold or local descriptors and missing their coupled organization. Here we introduce IFACE (Intrinsic Field–Aligned Coupled Embedding), a correspondence-based framework that aligns protein surfaces through probabilistic coupling of intrinsic geometry with spatially distributed chemical fields. From this alignment, we derive a joint geometric-chemical distance that integrates structural and physicochemical discrepancies within a […]
User-driven development and evaluation of an agentic framework for analysis of large pathway diagrams
March 12, 2026 by Corradi, M., Djidrovski, I., Ladeira, L., Staumont, B., Verhoeven, A., Sanz Serrano, J., Rougny, A., Vaez, A., Hemedan, A., Mazein, A., Niarakis, A., de Carvalho e Silva, A., Auffray, C., Wilighagen, E., Kuchovska, E., Schreiber, F., Balaur, I., Calzone, L., Matthews, L., Veschini, L., Gillespie, M. E., Kutmon, M., Koenig, M., van Welzen, M., Hiroi, N., Lopata, O., Klemmer, P., Overall, R., Hofer, T., Satagopam, V., Schneider, R., Teunis, M., Geris, L., Ostaszewski, M.
As biomedical knowledge keeps growing, resources storing available information multiply and grow in size and complexity. Such resources can be in the format of molecular interaction maps, which represent cellular and molecular processes under normal or pathological conditions. However, these maps can be complex and hard to navigate, especially for novice users. Large Language Models (LLMs), particularly in the form of agentic frameworks, have emerged as a promising technology to support this exploration. In this article, we describe a user-driven […]
Leveraging spectrum of graph sheaf Laplacian as a genome-architecture-aware measure of microbiome diversity
March 12, 2026 by Sapoval, N., Treangen, T., Nakhleh, L.
Motivation: Measures of microbial diversity that can be derived directly from metagenomic sequencing data offer a valuable summary view of the underlying complex systems. Prior work has shown that both the taxonomic composition and abundances captured by standard diversity measures (e.g., Shannon entropy) and the structural variation within the metagenome caused by gene duplications, losses, and horizontal gene transfers (HGT) can correlate with host health. However, there are no diversity measures available that simultaneously account for the genome architecture and […]
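For reference, the standard Shannon entropy mentioned above is H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The minimal sketch below computes it from raw abundance counts (natural logarithm assumed). It illustrates only the conventional measure, which sees abundances but not genome architecture; it is not the sheaf-Laplacian measure proposed in the paper.

```python
import math


def shannon_entropy(abundances: list[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # absent taxa contribute nothing (p * ln p -> 0 as p -> 0)
            p = a / total
            h -= p * math.log(p)
    return h
```

A community of four equally abundant taxa gives H = ln 4 ≈ 1.386, and a single-taxon community gives H = 0. Two communities with identical taxon abundances but different gene content receive the same score, which is the limitation the abstract points to.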
Benchmarking zero-shot single-cell foundation model embeddings for cellular dynamics reconstruction
March 12, 2026 by Zhou, X., Wang, Z., Ling, Y., Tian, Q., Zhang, Z., Li, Y., Zhou, P., Chen, L.
Reconstructing cellular trajectories from time-resolved single-cell transcriptomics is fundamental to understanding processes from embryonic development to cancer progression. While single-cell foundation models (scFMs) promise universal biological representations through large-scale pretraining, their capacity to capture the non-linear dynamics governing cell-fate decisions remains uncharacterized. Here we systematically benchmark multiple scFMs across challenging biomedical scenarios involving branching lineages and continuous state transitions. By coupling zero-shot scFM embeddings with dynamic optimal transport, we evaluated their performance against a traditional highly variable gene (HVG) baseline […]
Cyclic peptides space: The methodology of sequence selection to cover the comprehensive physical properties
March 12, 2026 by Tsuchihashi, R., Kinoshita, M.
Cyclic peptides have emerged as a pivotal modality for next-generation therapeutics, due to their superior biocompatibility, high selectivity, and structural stability. While AI-driven peptide design has advanced rapidly, conventional optimization algorithms are often constrained by initialization biases, which impede the efficient exploration of the vast chemical space. Here, we propose a novel methodology that integrates the protein language model ESM-2 with cyclic permutation averaging of embeddings to resolve this bottleneck. This approach establishes a comprehensive "peptide space", a high-dimensional vector […]
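The idea behind cyclic permutation averaging can be sketched as follows. A cyclic peptide has no defined first residue, so every rotation of its sequence describes the same molecule, and averaging a sequence embedding over all rotations yields a representation that does not depend on where the ring was "cut". In the sketch, `toy_embed` is a hypothetical, deliberately order-dependent stand-in for a per-sequence language-model embedding (the paper uses ESM-2, whose API is not reproduced here), and the remaining function names are also illustrative.

```python
from typing import Callable, List

Vector = List[float]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def rotations(seq: str) -> List[str]:
    """All cyclic rotations of a sequence, e.g. 'ACD' -> ['ACD', 'CDA', 'DAC']."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def cyclic_average_embedding(seq: str, embed: Callable[[str], Vector]) -> Vector:
    """Average the embeddings of every rotation, giving a rotation-invariant vector."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    vecs = [embed(r) for r in rotations(seq)]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def toy_embed(seq: str) -> Vector:
    """Hypothetical embedding: weights each residue by its 1-based position,
    so it changes when the sequence is rotated (unlike the averaged version)."""
    vec = [0.0] * len(ALPHABET)
    for pos, aa in enumerate(seq):
        vec[ALPHABET.index(aa)] += pos + 1
    return vec
```

Here `toy_embed("ACDG")` and `toy_embed("CDGA")` differ, while their cyclic averages coincide because both sequences share the same set of rotations. With a real model, this invariance holds regardless of how the underlying embedding treats position.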
DiaReport: Reproducible Workflow for Differential Expression Analysis and Interactive Reporting in DIA-based Proteomics
March 12, 2026 by Argentini, A., Fernandez Fernandez, E., Pauwels, J., Gevaert, K.
Data-independent acquisition (DIA) has become the preferred data acquisition method for mass spectrometry-based proteomics, yet reproducible workflows for differential expression (DE) analysis and result reporting remain limited. We present DiaReport, an R package that performs precursor- and protein-level DE analysis from DIA-NN output using MSqRob and QFeatures, while generating high-quality, interactive HTML reports through Quarto. DiaReport integrates precursor data, filtering of missing values, normalization, protein summarization and statistical modeling within a single function, supporting both simple pairwise as well as […]
DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling
March 12, 2026 by Douglas, G. M., Bobay, L.-M.
Physicochemically similar amino acids are substituted for one another more frequently than dissimilar amino acid pairs. Despite their clear potential, amino acid similarity matrices remain underused in molecular evolution, partially due to the high number of proposed amino acid distance measures and the lack of agreement on which are most accurate. In this study, we assessed the performance of 30 amino acid distance measures, including a new amino acid distance measure we developed based on recent deep mutational scanning data. We compared […]
Comparative Analysis of Structural and Dynamical Properties of Lipid Membranes Simulated with the AMBER Lipid21 ForceField Using SPC/E, TIP3P, TIP3P-FB, TIP4P-FB, TIP4P-Ew, TIP4P/2005, TIP4P-D, and OPC Water Models
March 12, 2026 by Chakraborty, D. S., Singh, P. P., Dey, C., Kaur, J.
We have conducted all-atom molecular dynamics simulations of POPC and DPPC lipid bilayers using the AMBER Lipid21 force field with eight different water models (SPC/E, TIP3P, TIP3P-FB, TIP4P-FB, TIP4P-Ew, TIP4P/2005, TIP4P-D, and OPC) to identify the most compatible one without any modification. Several properties were computed to characterize the structure of the lipid bilayer: area per lipid, isothermal compressibility modulus, average volume per lipid, electron density profile, bilayer thickness, X-ray and neutron scattering form […]
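Two of the bilayer properties listed above have standard textbook definitions that can be sketched briefly. For a symmetric bilayer with N lipids split equally between leaflets, the area per lipid is A_L = L_x L_y / (N/2). The area compressibility modulus can be estimated from box-area fluctuations in a constant-pressure run as K_A = k_B T ⟨A⟩ / ⟨δA²⟩. The sketch assumes that "isothermal compressibility modulus" refers to this area modulus and that box dimensions are in nm; the paper's exact estimators and analysis tools are not stated in the abstract, and the function names are illustrative.

```python
import statistics

K_B = 1.380649e-23  # Boltzmann constant, J/K


def area_per_lipid(box_x_nm: float, box_y_nm: float, n_lipids: int) -> float:
    """Area per lipid (nm^2) for a symmetric bilayer: half the lipids per leaflet."""
    return box_x_nm * box_y_nm / (n_lipids / 2)


def area_compressibility_mN_per_m(box_areas_nm2: list[float], temperature_k: float) -> float:
    """K_A = k_B T <A> / <dA^2> from a time series of lateral box areas (nm^2).

    Assumes the samples come from an equilibrated NPT trajectory with a
    fluctuating box; uses the population variance of the sampled areas.
    """
    mean_a = statistics.fmean(box_areas_nm2)
    var_a = statistics.pvariance(box_areas_nm2)
    k_a_j_per_nm2 = K_B * temperature_k * mean_a / var_a
    # 1 J/nm^2 = 1e18 J/m^2 = 1e21 mN/m
    return k_a_j_per_nm2 * 1e21
```

For example, a 128-lipid bilayer in a 6.4 nm × 6.4 nm box gives `area_per_lipid(6.4, 6.4, 128)` = 0.64 nm². In practice, K_A estimates are sensitive to trajectory length and sampling interval, so they are usually reported with block-averaged uncertainties.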
