BioRxiv Bioinformatics – Mass Spectrometry Blog

ProteoPy: an AnnData-based framework for integrated proteomics analysis
April 1, 2026 by Fichtner, I. D., Temesvari-Nagy, L., Sahm, F., Gerstung, M., Bludau, I.
Summary: ProteoPy is a lightweight Python library for protein- and peptide-level quantitative proteomics analysis, built around the AnnData class as its core data structure. It streamlines data import, preprocessing, and differential analysis while preserving all metadata within a single object. A reimplementation of our previously published COPF algorithm enables proteoform group inference directly from peptide-level data, facilitating the identification of proteoform-specific regulation and isoform usage. Designed for accessibility and flexibility, ProteoPy simplifies analysis for non-specialists and provides an extensible foundation […]
IMMREP25: Unseen Peptides
April 1, 2026 by Richardson, E., Aarts, Y. J. M., Altin, J. A., Baakman, C. A. B., Bradley, P., Chen, B., Clifford, J., Dhar, M., Diepenbroek, D., Fast, E., Gowthaman, R., He, J., Karnaukhov, V., Marzella, D. F., Meysman, P., Nielsen, M., Nilsson, J. B., Deleuran, S. N., Parizi, F. M., Pelissier, A., Pierce, B. G., Rodriguez Martinez, M., Roran A R, D., Saravanakumar, S., Shao, Y., Smit, N., Van Houcke, M., Visani, G. M., Wan, Y.-T. R., Wang, X., Woods, L., Wuyts, S., Xiao, C., Xue, L. C., IMMREP25 Participant Consortium, Barton, J., Noakes, M., May, D. H., Peters, B.
T cell receptors (TCRs) can bind to peptides presented by MHC molecules (pMHC) as a first step to trigger a T cell response. Reliable approaches to predict TCR:pMHC binding would have broad applications in clinical diagnostics, therapeutics, and the fundamental understanding of molecular interactions. IMMREP is a community-organized series of prediction contests that asks participants to predict TCR:pMHC binding on unpublished datasets. Previous iterations in 2022 and 2023 showed that multiple approaches can predict TCR:pMHC binding with significant accuracy (median […]
The human pangenome reference reduces ancestry-related biases in somatic mutation detection
April 1, 2026 by Pham, C. V. K., Abdelmalek, F. S. A., Hua, T., Apel, E., Bizjak, A., Schmidt, E. J., Houlahan, K. E.
Commonly used human reference genomes collapse extensive genetic variability into a single linear genome, of which 70% is derived from one donor. These linear genomes fail to capture the full spectrum of genetic variation, which can lead to misalignment of sequencing reads, particularly for individuals underrepresented by the linear reference genomes. To address this shortcoming, the Human Pangenome Reference Consortium released the first draft of the human pangenome reference, a graph-based reference that integrates diverse haplotypes. While the human pangenome […]
CLEAR: Concise List Enrichment Analysis Reducing Redundancy
April 1, 2026 by Jia, X., Phan, A., Dorman, K., Kadelka, C.
High-throughput experiments generate genome-wide measurements for thousands of genes, which are often tested marginally. Biological processes are driven by coordinated groups of genes rather than individual genes, making gene set enrichment analysis an essential post hoc interpretation tool. Traditional approaches such as Over-Representation Analysis and Gene Set Enrichment Analysis test gene sets independently, which ignores the hierarchical and overlapping structure of gene set collections such as the Gene Ontology, and often leads to redundant enrichment results. Set-based approaches such as […]
Combining mutation detection with fragmentomics features leads to improved tumor-informed ctDNA detection
April 1, 2026 by Lin, Y., Oroperv, C., Frydendahl, A., Rasmussen, M. H., Andersen, C. L., Besenbacher, S.
Liquid biopsy through circulating tumor DNA (ctDNA) analysis enables non-invasive detection of minimal residual disease (MRD) and early identification of cancer relapse, facilitating timely clinical intervention. However, detecting ctDNA in plasma samples with low tumor burden remains challenging due to the scarcity of mutant molecules, the background noise of sequencing errors and somatic mutations in normal cell-free DNA (cfDNA). Here, we present a mutation-informed fragmentomic framework and evaluate it on 90 stage III colorectal cancer patients with three years of […]
Resolution of recursive data corruption to transform T-cell epitope discovery
April 1, 2026 by Preibisch, G., Tyrolski, M., Kucharski, P., Gizinski, S., Grzegorczyk, P., Moon, S., Kim, S., Zaro, B., Gambin, A.
Accurate prediction of MHC class I-presented peptides is essential for any vaccine or T-cell therapy design, yet reported gains on in silico benchmarks have not translated into clinical successes. We show that this discrepancy comes from a methodological error: immunopeptidomics datasets are fundamentally contaminated by existing prediction models through prediction-based deconvolution and filtering – an iterative confirmation bias. An audit of the IEDB, the biggest database in the field, reveals that over 70% of published data was labeled by computational models […]
Introducing circStudio, a Python package for preprocessing, analyzing and modeling actigraphy data
April 1, 2026 by Marques, D., Barbosa-Morais, N. L., Reis, C. C. P.
Actigraphy is a non-invasive and cost-effective method for monitoring behavioral rhythms under real-world conditions by collecting time-resolved measurements of locomotor activity, light exposure, and temperature. Although several open-source packages support specific aspects of actigraphy analysis, aspects such as preprocessing, metric calculation, and mathematical modeling are often distributed across separate software packages, limiting interoperability and increasing programming overhead. Here we introduce circStudio, a Python package that unifies actigraphy data processing and mathematical modeling of circadian rhythms within a single framework. Built […]
Structure-Guided Computational Analysis of Linker effects in an scFv Targeting Guanylyl Cyclase C
April 1, 2026 by Melo, R., Viegas, T.
Single-chain variable fragments (scFvs) are widely used in diagnostic and therapeutic applications. These antibody fragments comprise two antibody variable domains connected by a flexible peptide linker whose properties critically influence folding, stability, oligomeric state, and antigen-binding. Therefore, careful linker selection represents a key step in scFv design. Guanylyl Cyclase C (GUCY2C) is a tumor-associated cell surface receptor expressed in gastrointestinal malignancies, including more than 90% of colorectal cancer (CRC) cases across all disease stages. Its restricted physiological expression pattern makes […]
Dynamic multimodal survival prediction in multiple myeloma integrating gene expression, longitudinal laboratories, and treatment history
April 1, 2026 by Jia, S., Lysenko, A., Boroevich, K. A., Sharma, A., Tsunoda, T.
Prognostic stratification in multiple myeloma (MM) relies on staging systems that assign patients to fixed categories at diagnosis and discard the temporal information that accumulates during treatment. We developed a dynamic multimodal framework that predicts residual overall survival using observation windows ranging from 1 to 18 months post-diagnosis. The model integrates DeepInsight-transformed gene expression representation, longitudinal laboratory measurement trajectories across 10 analytes, and treatment history for three drug classes through an adaptive fusion mechanism that accounts for missing clinical observations. […]
Baktfold: Sensitive protein functional annotation across the microbial tree of life using structural information
April 1, 2026 by Bouras, G., Lim, S. W., Durr, L., Vreugde, S., Goesmann, A., Edwards, R. A., Schwengers, O.
The functional annotation of protein sequences has undergone tremendous progress over recent years, but too many protein sequences still remain as so-called hypothetical proteins after applying state-of-the-art genome annotation software pipelines. Here, we introduce Baktfold, a new command line software tool for the ultra-sensitive but taxon-independent fast annotation of protein sequences across the microbial tree of life. Baktfold conducts sequential protein structure-based searches against four complementary structure databases. Protein sequences are transformed into Foldseek 3Di tokens via the ProstT5 protein language […]
Odon: An ultra-fast viewer for spatial proteomics
April 1, 2026 by Coulton, A., McGranahan, N.
Multiplexed spatial proteomics and spatial transcriptomics generate large, high-dimensional imaging datasets that are challenging to visualize efficiently, particularly at whole-slide and cohort scale. Visualization is an essential step for rapid detection of staining artefacts, such as protein aggregates or non-specific staining. Here, we present Odon, a native Rust desktop viewer designed for rapid, interactive exploration of multiplex imaging data on a standard laptop. Odon is primarily built around the OME-Zarr imaging format, and supports annotations via GeoJSON and GeoParquet, with […]
The PhageExpressionAtlas reveals shared and unique transcriptional patterns across phage-host interactions
April 1, 2026 by Wolfram-Schauerte, M., Trust, C., Waffenschmidt, N., Nieselt, K.
Time-resolved transcriptomic profiling has been used to study phage-host interactions for more than a decade. However, the resulting datasets are not readily accessible for custom re-analysis, and resources are lacking that provide standardized processing, storage, and analysis of transcriptomes from phage infections. Here, we present the PhageExpressionAtlas, the first bioinformatics resource for storing time-resolved dual RNA-sequencing data from phage infections. This data was processed uniformly using a custom analysis pipeline and is presented for interactive exploration through visualisation. The PhageExpressionAtlas […]
De novo design of a peptide ligand for specific affinity purification of human complement C1q
April 1, 2026 by Tsuchihashi, R., Kinoshita, M., Aino, H.
Affinity purification is an essential technique for isolating highly purified proteins; however, generating affinity ligands requires significant time and financial investment. To address these limitations, this study proposes a novel affinity chromatography method utilizing in silico-designed cyclic peptides as ligands. Targeting Complement C1q (C1q), a plasma protein that plays crucial roles in the classical complement pathway, we employed the biomolecular structure prediction model, AlphaFold2, to design specific binding cyclic peptides. Based on these designs, we synthesized lariat-type cyclic peptides characterized by […]
emb2dis: a novel protein disorder prediction tool based on ResNets, dilated convolutions & protein language models
April 1, 2026 by Duarte, S. A., Mehdiabadi, M., Bugnon, L. A., Aspromonte, M. C., Piovesan, D., Milone, D. H., Tosatto, S., Stegmayer, G.
Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to technical difficulties and the high cost of experimental determination of disorder in proteins, combined with the exponential increase of unannotated protein sequences, the development of computational methods for disorder prediction has become an active area of research in the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) […]
geneslator: an R package for comprehensive gene identifier conversion and annotation
April 1, 2026 by Cavallaro, G., Micale, G., Privitera, G. F., Pulvirenti, A., Forte, S., Alaimo, S.
Motivation: High-throughput sequencing generates large gene lists, making data interpretation challenging. Accurate gene annotation and reliable conversion between identifiers (e.g., Gene symbols, Ensembl GeneIDs, Entrez GeneIDs) are essential for integrating datasets, conducting functional analyses, and enabling cross-species comparisons. Existing tools and databases facilitate annotation but often suffer from inconsistencies, missing mappings, and fragmented workflows, limiting reproducibility and interpretability. Results: To address these limitations, we developed geneslator, an R package that unifies gene identifier conversion, ortholog mapping, and pathway annotation across […]
Explainable protein-protein binding affinity prediction via fine-tuning protein language models
April 1, 2026 by Singh, H., Singh, R. K., Srivastava, S. P., Pradhan, S., Gorantla, R.
Predicting protein-protein binding affinity from sequence alone remains a bottleneck for antibody optimization, biologics design and large-scale affinity modelling. Structure-based methods achieve high accuracy but cannot scale when complex structures are unavailable. Here we present a framework that reframes affinity prediction as metric learning: two proteins are projected into a shared latent space in which cosine similarity directly correlates with experimental binding affinity, and the protein language model encoder is adapted through parameter-efficient fine-tuning (PEFT). On the PPB-Affinity benchmark, the […]
Inferring circadian phases and quantifying biological desynchrony across single-cell transcriptomes
April 1, 2026 by Salati, A., Paychere, Y., Hahaut, V., Gobet, C., Naef, F.
Single-cell RNA sequencing (scRNA-seq) reveals heterogeneity in circadian clock states across individual cells, yet accurately inferring circadian phase and distinguishing biological desynchrony from technical noise remains challenging. Here, we introduce scRitmo, a probabilistic framework that infers single-cell circadian phases from mRNA count data, providing both a point estimate and a posterior uncertainty for each cell. A simulation-calibrated variance decomposition separates the observed phase dispersion into biological and technical components, enabling direct estimation of intercellular desynchrony. We validate scRitmo using deeply […]
STAPLE: automating spatial transcriptomics analysis and AI interpretation
April 1, 2026 by Lvovs, D., Quinn, J., Forjaz, A., Santana-Cruz, I., Stapleton, O., Vavikolanu, K., Wetzel, M., Data Science Hub TeamLab, Demystifying Pancreatic Cancer Therapies TeamLab, Pagan, V. B., Herb, B. R., Favorov, A., Kagohara, L. T., Kiemen, A. L., Maitra, A., Sidiropoulos, D. N., Tansey, W., Wood, L., Deshpande, A., Noble, M., Fertig, E. J.
Spatial transcriptomics workflows often span separate tools for cell typing, neighborhoods, and cell-cell communication, yielding fragmented outputs that hinder scalability, interpretation, and reproducibility. STAPLE systematizes analyses across distinct methods into a modular framework, unifying data structures and cross-tool interoperability. End-to-end analyses are performed unassisted with a single invocation, fostering rigorous, reproducible spatial transcriptomics analysis. Its novel, AI-enabled reporting layer synthesizes quantitative results into summaries of biological findings, facilitating analysis interpretation.
ECLIPSE: Exploring the dark proteome of ESKAPE pathogens through the sequence similarity network of the Protein Universe Atlas
April 1, 2026 by Lata, S., Heinz, D. W.
The accelerating crisis of antimicrobial resistance among the critical ESKAPE pathogens demands the urgent identification of novel molecular targets. However, a substantial fraction of bacterial proteomes remains functionally uncharacterized, with many genes annotated as encoding hypothetical proteins. These protein sequences often lack significant similarity to known protein families when using conventional homology-based annotation methods and thus remain dark. This limits our ability to explore their role in pathogenicity, and it is thus crucial to bridge this substantial gap in pathogen […]
CROWN: Curated Repository Of Well-resolved Noncovalent interactions
April 1, 2026 by Poelmans, R., Van Eynde, W., Bruncsics, B., Bruncsics, B., Arany, A., Moreau, Y., Voet, A. R.
The development of machine learning models for protein-ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein-ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Non-covalent […]
