{"id":3112,"date":"2023-01-17T20:44:50","date_gmt":"2023-01-18T02:44:50","guid":{"rendered":"https:\/\/kermitmurray.com\/msblog\/?page_id=3112"},"modified":"2023-01-17T20:44:50","modified_gmt":"2023-01-18T02:44:50","slug":"biorxiv-bioinformatics","status":"publish","type":"page","link":"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-bioinformatics\/","title":{"rendered":"BioRxiv Bioinformatics"},"content":{"rendered":"\n<div class=\"wp-block-caxton-grid relative\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-columns caxton-grid-block\" style=\"padding-top:0;padding-left:0;padding-bottom:0;padding-right:0;grid-template-columns:repeat(12, 1fr)\" data-tablet-css=\"padding-left:em;padding-right:em;\" data-mobile-css=\"padding-left:em;padding-right:em;\">\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 8\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"https:\/\/www.biorxiv.org\/alertsrss\" target=\"_blank\" rel=\"noreferrer noopener\">Journal Home<\/a><\/strong><\/p>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 4\"><div class=\"absolute 
absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"http:\/\/connect.biorxiv.org\/biorxiv_xml.php?subject=bioinformatics\" target=\"_blank\" rel=\"noreferrer noopener\">RSS<\/a><\/strong><\/p>\n<\/div><\/div>\n<\/div><\/div>\n\n\n<ul class=\"has-dates has-authors has-excerpts wp-block-rss\"><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.27.728155v1?rss=1'>Trustworthy ML\/AI for Aging Clocks: Preventing Systematic Prediction Bias in Biological Age Estimation<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Lee, H., Ye, Z., Yang, Y., Pan, Y., Maron, B., Wang, Z., Kochunov, P., Thompson, P., Hong, L. E., MA, T., Chen, C., Chen, S.<\/span><div class=\"wp-block-rss__item-excerpt\">Machine learning (ML)- and artificial intelligence (AI)-based aging clocks are increasingly used to quantify physiological and molecular aging from omics and medical imaging data as distinct from chronological age. Here, we characterize a fundamental but underappreciated computational limitation of commonly used ML\/AI regression models: systematic prediction bias and its propagation to downstream association estimates. We demonstrate that systematic prediction bias can distort, and in some cases reverse, biomedical conclusions drawn from aging-clock analyses. 
For example, it can produce spurious associations [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728453v1?rss=1'>Recursive exploration of metabolic yield space<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Mores, W., Bhonsale, S., Floros, S., Logist, F., Van Impe, J. F. M.<\/span><div class=\"wp-block-rss__item-excerpt\">Genome-scale metabolic network reconstructions contain extremely detailed and valuable information regarding cellular metabolism. For many applications such as finding genetic engineering targets and reduced kinetic model construction, metabolic network analysis techniques exist. Yield spaces based on the extreme rays of solution cones related to the metabolic network are frequently constructed for these types of analyses. However, for genome-scale networks, full enumeration of these extreme rays is not computationally feasible. In this work, a novel direct generation method for yield spaces [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.726140v1?rss=1'>Ultra-efficient High Resolution 3D Reconstruction of Spatial Omics Data with Neural Transcriptomic Field<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Gong, Y., Yuan, X., Gao, R., Chen, J., Yu, Z.<\/span><div class=\"wp-block-rss__item-excerpt\">Biological tissues are inherently three-dimensional (3D) ecosystems where spatial architecture dictates cellular function. While spatial omics technologies have revolutionized molecular profiling, they are largely restricted to isolated two-dimensional (2D) tissue sections. 
Existing computational methods attempting to reconstruct 3D volumes from sparse slices rely heavily on local slice-to-slice interpolation, struggling to balance high-fidelity reconstruction, noise reduction, and atlas-scale efficiency. Here, we present Neural Transcriptomic Field (NTF), a deep learning framework employing multi-resolution hash-grid encoding and implicit neural representations. Unlike interpolation-based approaches [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728511v1?rss=1'>GeneKnow: AI-powered literature synthesis for gene-context analysis<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Zhang, H., Zang, C.<\/span><div class=\"wp-block-rss__item-excerpt\">Interpreting gene function in specific biological contexts is essential for biomedical research, yet manual literature review is labor-intensive. We developed GeneKnow, a source-grounded framework that uses generative AI models within a controlled hybrid workflow to produce reliable, traceable literature synthesis supported by authentic citations. Through systematic benchmarking, we showed that GeneKnow outperforms mainstream web-interface AI tools in generating trustworthy context-specific gene function syntheses without fabricated citations and minimizing hallucinations.<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.719125v1?rss=1'>Reproducible and shareable bioinformatics pipelines from natural-language prompts<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Kim, H.-M., Jeong, H., Mekonnen, A. 
M., Kim, Y., Oh, Y., Lee, H., Jung, C., Park, J.<\/span><div class=\"wp-block-rss__item-excerpt\">Large language models (LLMs) are increasingly used to generate bioinformatics pipelines and to carry out analyses from natural-language prompts. However, the resulting analyses are often difficult to reproduce across sessions, owing to the non-deterministic nature of LLM-driven conversations and heterogeneity of local execution environments, and cannot run on remote high-performance computing (HPC) servers or be shared and reused. We present Autopipe, a platform that guides any Model Context Protocol (MCP) &#8211; compatible LLM to produce, execute, and publish source-preserved, re-executable [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728246v1?rss=1'>Species- and Topic-aware Representation Learning for Antimicrobial Peptide Discovery<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Padi, S., Mondal, K., Kaur, N., Hoogerheide, D. P., Heinrich, F., Mihailescu, E., Klauda, J. B., Cardone, A., Keyrouz, W.<\/span><div class=\"wp-block-rss__item-excerpt\">Antimicrobial resistance poses a major global health challenge, necessitating efficient strategies to discover potent antimicrobial peptides (AMPs). While recent generative models can produce many candidate sequences, experimentally validating all generated peptides in wet labs is impractical due to the high costs and time involved in such measurements. As a result, there is a strong demand for accurate predictions of peptide efficacy, typically measured as the minimum inhibitory concentration (MIC). 
We introduce STAMP, a framework for Species- and Topic-aware Representation Learning in [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.29.728633v1?rss=1'>UMITIC: An unsupervised framework for the joint characterization of cellular phenotypes and spatial neighborhoods in multiplex and hyperplex immunofluorescence imaging data<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Sangu\u0308esa Recalde, M., De Andrea, C. E., Ariz, M.<\/span><div class=\"wp-block-rss__item-excerpt\">Multiplexed imaging technologies enable the simultaneous measurement of dozens of protein markers while preserving context, providing a high-resolution view of tissue organization schemes. However, extracting meaningful insights from these high-dimensional datasets&#8211;particularly in hyperplex settings (&gt;20 markers)&#8211;remains a major computational challenge, especially in the absence of annotated data. Here, we present UMITIC (Unsupervised Analysis of Multiplex Images via TIssue Characterization), a modular and unsupervised computational framework for the joint characterization of cell phenotypes and tissue neighborhoods from multiplex imaging data. 
UMITIC [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.01.729228v1?rss=1'>Nucleic acid 3D structure search and alignment with GTalign<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Margelevicius, M., Rana, S.<\/span><div class=\"wp-block-rss__item-excerpt\">Structural comparison of nucleic acids, particularly RNA, is critical for understanding evolutionary and functional relationships beyond sequence similarity, yet efficient tools for large-scale 3D structure search and alignment remain scarce. We extend GTalign to support nucleic acid structures, enabling unified, high-performance alignment across macromolecules. Benchmarking on a diverse RNA dataset demonstrates improved alignment accuracy and substantially lower runtimes compared to existing methods. GTalign thus provides a scalable solution for nucleic acid structure comparison and database search. Availability and Implementation: https:\/\/github.com\/minmarg\/gtalign_alpha<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728361v1?rss=1'>A Multi-Epitope Vaccine Design for Human Pasteurellosis using Outer Membrane \u03b2-barrel Proteins of Pasteurella multocida<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Panda, A., Kapoor, J., Kumar, S., Bandyopadhyay, A.<\/span><div class=\"wp-block-rss__item-excerpt\">Pasteurella multocida is a facultative anaerobic, Gram-negative coccobacillus that causes pasteurellosis in companion animals (cats and dogs), livestock, and poultry. 
Close contact with infected animals poses a significant zoonotic risk to humans through bite wounds, scratches, licking and transfer of bodily fluids. Current treatment relies mainly on antibiotics, and the lack of a licensed human vaccine further exacerbates the challenge. In the present study, a consensus-based computational approach was employed on the P. multocida Past 9 proteome. A total of [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728367v1?rss=1'>Assessing and Optimizing Low-Frequency Somatic Mutation Detection: A Multi-Platform High-Throughput Sequencing Perspective<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Feng, B. N., Lin, Y., Liu, L., Lin, Q., Lin, Y., Liu, Y., Li, J., Lei, C., Chen, C., Yang, M., Peng, X., Zhou, Z., Yan, Q., Sun, L., Li, Q.<\/span><div class=\"wp-block-rss__item-excerpt\">The availability of multiple commercial short-read sequencing platforms necessitates systematic cross-platform performance comparisons, particularly for challenging applications such as low-frequency somatic mutation detection. Here, a large-scale targeted sequencing dataset from five Genome in a Bottle (GIAB) human genomic DNA reference standards, HG001 to HG005, alongside Twist Biosciences cfDNA reference standards featuring 1% variant allele frequency (VAF), was generated by six platforms (NovaSeq 6000, NovaSeq X, FASTASeq 300, GenoLab M, SURFSeq 5000, and MGISEQ-T7). 
To build a realistic benchmark while keeping [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728427v1?rss=1'>Monju: Multi-criteria clustering in single-cell omics<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Kaneko, T., Sakaguchi, S., Fujioka, S., Yada, Y., Kojima, R., Naoki, H.<\/span><div class=\"wp-block-rss__item-excerpt\">Clustering is a fundamental step in single-cell omics analysis. Although single-cell omics data can, in principle, be partitioned according to multiple biologically meaningful criteria, existing methods typically cluster cells using a single criterion. To address this problem, we developed Monju, a multi-criteria clustering method based on a deep generative mixture model. Monju divides cells into biologically reasonable submodels, each of which is equipped with an interpretable latent space. Furthermore, although the partitioning of cells into submodels varies across random seeds, [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728543v1?rss=1'>Morphology-robust quantification of subcellular organization in complex cells<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Hu, R., Naseri, N. N., Shalem, O., Camara, P. G.<\/span><div class=\"wp-block-rss__item-excerpt\">Quantitative analysis of subcellular protein organization is often confounded by variation in cell morphology, limiting the identification and interpretation of localization patterns in fluorescence microscopy data from morphologically complex cells, such as neurons and glia. 
We introduce CellAligner, an unsupervised framework that uses fused unbalanced Gromov-Wasserstein couplings to map protein distributions from morphologically distinct cells into shared anchor-cell geometries, enabling morphology-robust comparison of subcellular localization. In neuronal imaging benchmarks, applying current image-analysis methods (CellProfiler, Cytoself, Paired Cell Inpainting) to CellAligner&#039;s [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.27.728319v1?rss=1'>A Foundation Model for the Cancer Genome<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Sidhom, J.-W., Baras, A. S., Elemento, O., Shah, M. A.<\/span><div class=\"wp-block-rss__item-excerpt\">Cancer is a disease of the genome, in which somatic mutations and copy-number alterations determine tumour identity, clinical behaviour, and response to therapy. Consortium-scale sequencing has profiled hundreds of thousands of tumours, yet clinical interpretation still proceeds one alteration at a time against hand-curated knowledgebases, often ignoring co-occurring alterations and the genome-wide copy-number pattern. 
Self-supervised foundation models pretrained on unlabelled corpora have produced transferable representations in adjacent biological domains by learning joint structure across many features, yet no comparable model [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.27.728108v1?rss=1'>fourSynergy: Ensemble-based interaction calling on 4C-seq data using gradient-free optimization<\/a><\/div><time datetime=\"2026-06-01T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 1, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Wind, S.-M., Plagwitz, L., Dix, J., Heidtmann, G., Heider, D., Walter, C.<\/span><div class=\"wp-block-rss__item-excerpt\">Motivation: Chromatin organization plays a crucial role in gene regulation and is associated with various severe diseases like cancer. Since chromatin changes are potentially reversible, a deeper understanding of the alterations needs to be harnessed for the development of new therapies. Circular Chromosome Conformation Capture Sequencing (4C-seq) is a sequencing technique enabling the identification of chromatin interactions between genes and regulatory elements. This work aims to develop an ensemble algorithm that utilizes synergies among available 4C-seq tools, which in turn [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728581v1?rss=1'>Rare RNA Polymerase II failure modes mark the cancer-driving genes most affected by epigenetic perturbation<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Asante, Y., Gryder, B. 
E.<\/span><div class=\"wp-block-rss__item-excerpt\">RNA Polymerase II (Pol2) transcribes genes through a complex life cycle (initiation, pausing, elongation, co-transcriptional splicing, termination, and recycling). Chromatin immunoprecipitation of Pol2 before and after chemical perturbation has identified promoter-proximal accumulation (pausing) as a critical step in the transcription genome-wide. However, the full landscape of Pol2 responses has not been well characterized. Here, we introduce a tool for comparing Pol2 Activity State Shifts (compPASS), a computational pipeline which uses data from paired ChIP-based approaches to assign genes to one [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.29.728699v1?rss=1'>Predicting host-pathogen interactions using a proteome-scale language model<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Malbranke, C., Fruet, C., Bitbol, A.-F.<\/span><div class=\"wp-block-rss__item-excerpt\">ProteomeLM is a proteome-scale language model trained on proteomes spanning the tree of life to reconstruct masked protein embeddings from proteome context within each species. Its attention coefficients capture protein-protein interactions without supervision. Here, we show that this capability extends to cross-species host-pathogen interactions (HPI) across ten human pathogen taxa spanning viruses and bacteria, and can be further improved with lightweight fine-tuning. 
We introduce ProteomeLM-HPI, a parameter-efficient adaptation via LoRA, trained on concatenated host-pathogen proteomes to reconstruct masked pathogen embeddings [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.728477v1?rss=1'>Systematic In Silico Off-Target Assessment of siRNAs: Integrated Tissue-Specific Scoring and Cross-Species Preclinical Model Selection with TargetSureR<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Ni, S., Kan, K., Zhu, F., Wang, L., Wang, W., Wu, N.<\/span><div class=\"wp-block-rss__item-excerpt\">Small interfering RNAs (siRNAs) have become a transformative class of nucleic acid therapeutics for clinical disease treatment, yet sequence-dependent off-target silencing continues to pose a major safety barrier that hinders their preclinical refinement and large-scale translational application. Existing bioinformatics tools only support partial off-target evaluation, either focusing on basic sequence optimization or simple seed-region scanning, and fail to deliver systematic, multi-dimensional and reproducible safety assessment for siRNA lead screening. 
To fill this gap, we developed TargetSureR, a lightweight, modular and [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.27.728217v1?rss=1'>TRACE: a graph-based workflow for TCR-epitope prioritization and tumor-reactive T-cell identification<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Chen, Y., Giuliano, V., Dacillo, I., Lin, W., Yan, Y., Luo, P.<\/span><div class=\"wp-block-rss__item-excerpt\">Accurate prioritization of T-cell receptor (TCR)-epitope interactions and identification of tumor-reactive T cells are important but difficult steps in immunotherapy-oriented bioinformatics workflows. Existing methods typically address these tasks separately and either model TCR-epitope pairs as independent observations or rely primarily on transcriptomic signatures. In this study, we present TRACE (TCR-epitope pRioritization And T-Cell idEntification), a graph-based computational workflow that unifies both applications within a single heterogeneous graph framework. The protocol represents TCRs, epitopes, and T cells as typed nodes connected [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.27.728216v1?rss=1'>LRP2: A proteogenomics pipeline for long-read informed protein isoform analysis and discovery<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Schertzer, M. D., Lewandowski, J. T., Watts, E. F., Rosenow, W., Mehlferber, M. M., Jeffery, E. D., Adamson, S. I., Bruand, J., Tseng, E., Neelamraju, Y., Garrett-Bakelman, F. E., Dolzhenko, E., Knowles, D. 
A., Sheynkman, G.<\/span><div class=\"wp-block-rss__item-excerpt\">Most human genes produce multiple RNA isoforms, yet it remains unclear which isoforms are translated into stable, functional proteins. Long-read RNA-sequencing resolves full-length transcript structures and, when paired with mass spectrometry, can provide empirical evidence of isoform translation. Despite this opportunity, comprehensive workflows integrating isoform discovery, open reading frame prediction, peptide identification, and protein inference remain limited, leaving users to handle these steps piecemeal. Here, we present LRP2, a modular, end-to-end long-read proteogenomics pipeline built in Nextflow. LRP2 scales transcript [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.05.28.726277v1?rss=1'>Instance-Wise Contrastive Graph Neural Network Enables the Discovery of Novel Aedes aegypti Larvicidal Compounds<\/a><\/div><time datetime=\"2026-05-31T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">May 31, 2026<\/time> <span class=\"wp-block-rss__item-author\">by da Costa, K. S. L., Caldeira, G. H. G., Costa, V. A. F., Silva, A. S., Pereira, C. d. S., Batista, B. C., Manchein, L. B., Martin, H.-J., Rafique, J., Braga, R. d. C., Muratov, E., Saba, S., de Oliveira, G. A. R., Luz, C., Rodrigues, J., Neves, B. J.<\/span><div class=\"wp-block-rss__item-excerpt\">Aedes aegypti remains a major arboviral vector, making larval control a critical strategy to reduce mosquito populations. However, resistance to commercial larvicides has reduced the long-term effectiveness of current interventions, reinforcing the need for new compounds with improved potency and selectivity. Here, we present an instance-wise contrastive graph neural network (GNN) framework to accelerate the discovery of novel larvicidal compounds. 
The model was trained on a curated dataset of 556 organic compounds organized into LC50-derived multitask classification thresholds and integrated [&hellip;]<\/div><\/li><\/ul>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Related Journals<\/h4>\n\n\n<ul class=\"su-siblings\"><li class=\"page_item page-item-3099\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biochemistry\/\">BioRxiv Biochemistry<\/a><\/li>\n<li class=\"page_item page-item-3132\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biophysics\/\">BioRxiv Biophysics<\/a><\/li>\n<li class=\"page_item page-item-3188\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-cancer-biology\/\">BioRxiv Cancer Biology<\/a><\/li>\n<li class=\"page_item page-item-3190\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-pharmacology-and-toxicology\/\">BioRxiv Pharmacology and Toxicology<\/a><\/li>\n<li class=\"page_item page-item-3114\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-systems-biology\/\">BioRxiv Systems Biology<\/a><\/li>\n<li class=\"page_item page-item-3193\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-zoology\/\">BioRxiv Zoology<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Related 
Journals<\/p>\n","protected":false},"author":1,"featured_media":2652,"parent":3087,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-3112","page","type-page","status-publish","has-post-thumbnail","hentry","entry"],"_links":{"self":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/comments?post=3112"}],"version-history":[{"count":1,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions"}],"predecessor-version":[{"id":3113,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions\/3113"}],"up":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3087"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media\/2652"}],"wp:attachment":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media?parent=3112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}