{"id":3112,"date":"2023-01-17T20:44:50","date_gmt":"2023-01-18T02:44:50","guid":{"rendered":"https:\/\/kermitmurray.com\/msblog\/?page_id=3112"},"modified":"2023-01-17T20:44:50","modified_gmt":"2023-01-18T02:44:50","slug":"biorxiv-bioinformatics","status":"publish","type":"page","link":"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-bioinformatics\/","title":{"rendered":"BioRxiv Bioinformatics"},"content":{"rendered":"\n<div class=\"wp-block-caxton-grid relative\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-columns caxton-grid-block\" style=\"padding-top:0;padding-left:0;padding-bottom:0;padding-right:0;grid-template-columns:repeat(12, 1fr)\" data-tablet-css=\"padding-left:em;padding-right:em;\" data-mobile-css=\"padding-left:em;padding-right:em;\">\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 8\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"https:\/\/www.biorxiv.org\/alertsrss\" target=\"_blank\" rel=\"noreferrer noopener\">Journal Home<\/a><\/strong><\/p>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 4\"><div class=\"absolute 
absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"http:\/\/connect.biorxiv.org\/biorxiv_xml.php?subject=bioinformatics\" target=\"_blank\" rel=\"noreferrer noopener\">RSS<\/a><\/strong><\/p>\n<\/div><\/div>\n<\/div><\/div>\n\n\n<ul class=\"has-dates has-authors has-excerpts wp-block-rss\"><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732716v1?rss=1'>Benchmarking cell type annotation in spatial transcriptomics: resolving cellular hierarchies, biological fidelity, and dynamic cell states<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Zhu, Y., Hu, Y., Xie, M. B., Qin, H., Szul, Z. J., Young, D. M., Yuan, W., Wang, Q., Liu, Y. H., Shen, W., Meltzer, S., Zhou, X. M.<\/span><div class=\"wp-block-rss__item-excerpt\">Spatial transcriptomics enables the quantification of gene expression within its native tissue context, providing unprecedented insight into tissue architecture, cellular ecosystems, and local cell-cell interactions at regional and single-cell resolution. Accurate cell type annotation is a critical prerequisite for interpreting these data and is often the first and most essential step in downstream analysis. 
Despite rapid advances in computational methods, cell type annotation remains challenging and frequently requires extensive expert-driven manual curation based on marker-gene expression, spatial context, and prior [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732863v1?rss=1'>PhaseWY: A pipeline for haplotype phasing, sex chromosome identification and extraction of sex-limited sequences<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Ellerstrand, S. J., Churcher, A. M. J., Kutschera, V. E., Hansson, B.<\/span><div class=\"wp-block-rss__item-excerpt\">Sex chromosomes are central to many ecological and evolutionary processes. Evidence has accumulated that sex chromosome systems vary extensively in age, turnover and transitions, motivating renewed efforts to study the diversity of sex chromosome systems across the tree of life. However, successful genomic detection of sex chromosomes depends on several factors, including the size and divergence time, background genetic diversity, and the number of sequenced females and males. In addition, technical challenges associated with sequencing and analysing the sex-limited Y\/W [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732876v1?rss=1'>When Less Is Not More: DICEPro Mitigates the Impact of Incomplete Reference Matrices on Cellular Frequency Deconvolution.<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by BA, K., Thiebaut, R., Hinaut, X., Hejblum, B. 
P.<\/span><div class=\"wp-block-rss__item-excerpt\">Cellular deconvolution aims to estimate the frequencies of different cell populations from gene expression measurements in a biological sample. Supervised approaches, such as CIBERSORTx and DISSECT, critically depend on the reference signature matrix, which encodes the gene expression profiles of cell-types based on prior knowledge. Despite numerous deconvolution methods, the impact of missing cell populations in the reference matrix remains understudied. Here, we evaluate the robustness of state-of-the-art deconvolution approaches using simulations based on real dataset examples combined with statistical [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732933v1?rss=1'>Multivariate Random Forests for Cross-Modal Multi-Omics Integration<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Zhang, W., Wang, L., Franzmann, E. J., Chen, X. S.<\/span><div class=\"wp-block-rss__item-excerpt\">Multi-omics studies are widely used across many areas of biomedical research. In many diseases, some signals are shared across data types, while others are strongest in a single omics layer. Current multi-omics clustering methods often either merge all data types into a single representation, which can blur biology that is strong in one layer, or rely on linear structure that may miss more complex relationships across data types. 
We introduce multiRF, a random-forest-based method that handles complex data types and [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.733012v1?rss=1'>Dynamic balance of sparse flux vectors for efficient simulation of culture dynamics and metabolic network reduction<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Tapia Garc\u00eda, I., Torrealba, C., Luna, R., P\u00e9rez-Correa, J. R., Saa, P. A.<\/span><div class=\"wp-block-rss__item-excerpt\">Dynamic Flux Balance Analysis (DFBA) enables simulation of microbial culture dynamics under changing environmental conditions, but remains computationally expensive for tasks such as parameter calibration and fermentation optimization when applied using genome-scale metabolic models (GEMs). To address this challenge, we introduce Dynamic Flux Vector Balancing (DFVB), a reformulation of DFBA that solves an equivalent problem using a pre-computed, sparse basis of flux solutions that reduces the dimensionality of the internal optimization problem without information loss. 
Notably, DFVB provides a compact, [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732853v1?rss=1'>HTS-Oracle X: AI-Guided Prospective Discovery of Small Molecule Immune Checkpoint Binders<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Abdel-Rahman, S., Gabr, M.<\/span><div class=\"wp-block-rss__item-excerpt\">Targeting immune checkpoint protein-protein interactions (PPIs) using small molecules remains limited by the shallow, featureless binding surfaces of co-stimulatory and co-inhibitory receptors and the characteristically low hit rates of conventional high-throughput screening against these interfaces. Here we report HTS-Oracle X, a multimodal deep learning platform that integrates bidirectional cross-attention fusion of ChemBERTa SMILES embeddings with extended RDKit descriptors, trains on continuous biophysical binding signals rather than binary labels, and employs Monte Carlo Dropout uncertainty quantification for uncertainty-adjusted compound selection. Trained [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732914v1?rss=1'>Drug-Prot: A query system for statistical inference of drug effects and interactions in dynamic proteomic networks<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Ulmer, M., Sun, R., Qian, L., Aebersold, R., Guo, T., Buehlmann, P.<\/span><div class=\"wp-block-rss__item-excerpt\">Understanding drug effects and drug-drug interactions is essential for developing combination therapies. 
We present Drug-Prot, a computational framework that leverages large-scale perturbation proteomics to quantify causal drug effects, drug-drug interactions, and dynamic protein relationships. Using data from 63 single drugs and 59 drug combinations applied to 18 breast cancer cell lines at 6, 24, and 48 hours, Drug-Prot estimates drug effects on protein expression and reconstructs directed temporal protein dependency networks. The publicly available software enables targeted analyses of user-defined [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.22.733705v1?rss=1'>PanRes: A database of latent and acquired antimicrobial resistance allowing 3D-based protein homology search<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Vojtkova, M., Baltusis, M., Martiny, H.-M., Baral, A., Pyrounakis, N., Beleon, A., Freitag, R., Pico-Tomas, A., Kaas, R. S., Petersen, T. N., Munk, P.<\/span><div class=\"wp-block-rss__item-excerpt\">Antimicrobial resistance databases are central to genomic surveillance, but resistance determinants remain distributed across resources with different scopes, structures, and annotations. We developed PanRes, a curated resistance database of 11,717 genes integrating acquired and latent determinants of antibiotic, biocide, and metal resistance within a unified ontology. We predicted representative protein structures and clustered them by structural similarity, grouping proteins into 598 structurally conserved clusters coherent despite sequence divergence. 
Their structure-guided alignments were used to build Hidden Markov Models (HMMs) for [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732397v1?rss=1'>CellTosg2Sequence: A Unified Text-Omics-Signaling-Graph Large Language Model for Single-Cell Analysis<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by chen, w., Ye, M., Xu, T., Huang, D., Zhang, H., Li, H., Li, W., Chen, Y., Payne, P. R., Li, F.<\/span><div class=\"wp-block-rss__item-excerpt\">In single-cell (sc)-based scientific discovery, text-formatted biomedical prior knowledge and signaling graphs are essential for annotating and interpreting numeric sc-omics data and for generating novel testable hypotheses. A major limitation of existing single-cell large language models (scLLMs) is that they rely on numeric expression data with gene names as the only textual signal, while comprehensive biomedical priors &#8212; cellular localization, gene function, disease associations, and signaling interaction patterns &#8212; remain absent from the model input. We introduce CellTosg2Sequence, [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732574v1?rss=1'>Complex-valued representations of time-series gene expression profiles for network analysis<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Sun, J., Cao, W., Ikumi, K., Shimizu, K. 
K., Sese, J.<\/span><div class=\"wp-block-rss__item-excerpt\">Time-series RNA sequencing provides a powerful framework for studying dynamic gene regulation, yet conventional analyses usually represent gene expression profiles as real-valued vectors in Euclidean space and quantify similarity using correlation or distance. Inspired by quantum information theory, we present a framework for encoding time-series gene expression profiles as complex-valued vectors comprising amplitude and phase components in Hilbert space. We designed multiple encoding models to represent gene expression in the amplitude of complex-valued vectors, encode temporal differences in the phase, [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.21.733631v1?rss=1'>Few-Shot Classification of C. elegans Developmental Stages via Explainable Hierarchical Hyperbolic Graph Embeddings<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Khalid, N., Elliott, L., Obafemi-Ajayi, T., Wunsch, D., Scharf, A.<\/span><div class=\"wp-block-rss__item-excerpt\">Automated, accurate, and fast developmental-stage classification of C. elegans from microscopy-based morphological images is essential for aging research, drug screening, and disease modeling. However, it remains challenging due to morphological similarities between stages and the limited annotated data. In this work, we propose HyperDev, a hyperbolic few-shot learning framework that addresses these limitations by directly encoding developmental hierarchies in the embedding space, unlike conventional Euclidean approaches that treat stages as independent classes. 
HyperDev uses Poincare ball geometry, combined with a [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.21.733642v1?rss=1'>EMAlign: accurate alignment of cryo-EM maps through main-chain probability using deep learning<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Cao, H., Chen, J., Li, T., Huang, S.-Y.<\/span><div class=\"wp-block-rss__item-excerpt\">Accurate alignment of cryo-EM density maps is essential for comparing conformational states, searching map libraries, and guiding atomic model building, but remains challenging for noisy experimental maps and partially overlapping structures. Existing alignment methods are often based on raw maps, which may result in reduced accuracy due to the density noise, or require manual intervention for local alignment, which suffers from limited general applicability. Addressing the limitations, we present EMAlign, an automatic global and local cryo-EM map alignment with predicted [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732528v1?rss=1'>Reference-guided immune recovery matching prioritizes traditional Chinese medicine ingredients<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Hu, C., Xiao, B., Chen, C. Y.-C.<\/span><div class=\"wp-block-rss__item-excerpt\">Therapeutic prioritization from single-cell transcriptomes requires a target that is closer to treatment response than disease-signature reversal. In immune diseases, post-treatment recovery may follow patient- and cell-type-specific trajectories rather than a simple return along the pretreatment disease axis. 
We developed ImmuneNavi, a healthy-reference-anchored recovery-matching workflow for ranking traditional Chinese medicine ingredients from paired PBMC data. The workflow maps heterogeneous PBMC cohorts to a common healthy immune coordinate system, constructs patient-cell-type disease and recovery states, and processes ITCM treated-control profiles into [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732538v1?rss=1'>From hotspot dependence to distributed robustness in resistance-aware lead optimization<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Wang, Y., Xiao, B., Kang, J., Cui, H., Fu, Y., Li, W., Perea, S. E., Han, W.<\/span><div class=\"wp-block-rss__item-excerpt\">Drug resistance remains a recurrent failure mode in targeted anticancer and antiviral therapy, and resistance evidence often enters only after compound selection. ResistAgent is an evidence-constrained framework that converts mutational liabilities into design-time objectives through site- and combo-aware resistance mapping, deterministic mechanism diagnosis and robust counter-design. In EGFR-Erlotinib and HIV-RT-Rilpivirine, the framework separated residue-level liabilities from observed HIV combination liabilities and linked prioritized mutations to anchor loss, pocket rearrangement, electrostatic shifts and contact redistribution. 
Same-budget paired searches showed that robust [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.18.733197v1?rss=1'>EventHorizon: A Foundation Model for Clinical Flow Cytometry<\/a><\/div><time datetime=\"2026-06-22T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 22, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Medina Grespan, M., Morrison, M., O&#039;Fallon, B., Shean, R., Spies, N. C., Ng, D.<\/span><div class=\"wp-block-rss__item-excerpt\">Flow cytometry is an essential tool for diagnosis of hematologic malignancies, but existing clinical workflows are highly dependent on expert manual interpretation. Existing machine learning approaches typically require extensive labeled data and are sensitive to variability in panel design, instrumentation, and laboratory workflows, limiting their generalizability. We present EventHorizon, a self-supervised foundation model for clinical flow cytometry that produces unified specimen-level representations from heterogeneous multi-panel data. EventHorizon employs a two-stage hierarchical transformer architecture with marker-aware tokenization, enabling seamless integration of [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732686v1?rss=1'>GENATATORs: ab initio Gene Annotation With DNA Language Models<\/a><\/div><time datetime=\"2026-06-21T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 21, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Shmelev, A., Shadskiy, A., Kuratov, Y., Burtsev, M., Kardymon, O., Fishman, V.<\/span><div class=\"wp-block-rss__item-excerpt\">Inference of gene structure and location from genome sequences &#8211; known as de novo gene annotation &#8211; is a fundamental task in biological research. 
However, sequence grammar encoding gene structure is complex and poorly understood, often requiring costly transcriptomic data for accurate gene annotation. In this work, we benchmark current solutions and develop new methods of gene annotation. We show that pretrained DNA language model (DNA LM) embeddings do not capture the features necessary for precise gene segmentation, and that [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732859v1?rss=1'>OracleScreen-LILRB4: Machine Learning-Guided Discovery of Myeloid Immune Checkpoint Binders Validated in Patient-Derived Cells<\/a><\/div><time datetime=\"2026-06-21T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 21, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Abdel-Rahman, S., Gabr, M.<\/span><div class=\"wp-block-rss__item-excerpt\">The identification of small molecule modulators of immune checkpoint proteins remains a significant challenge in drug discovery due to the flat, featureless nature of protein-protein interaction interfaces and the characteristically low hit rates observed in conventional high-throughput screening campaigns. Here we report OracleScreen-LILRB4, an ensemble machine learning framework trained on quantitative biophysical screening data from two structurally diverse compound libraries (19,800 compounds total) screened against the myeloid immune checkpoint leukocyte immunoglobulin-like receptor B4 (LILRB4\/ILT3). 
By formulating binding prediction as a [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.16.732382v1?rss=1'>SPA-C: an hybrid tool to accurately scaffold genomes using Hi-C and Deep-Learning<\/a><\/div><time datetime=\"2026-06-21T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 21, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Mergez, A., Mourad, R., Hernandez-Raquet, G., Zytnicki, M.<\/span><div class=\"wp-block-rss__item-excerpt\">Genome assembly is a computational pipeline designed to reconstruct chromosomes from small sequencing reads. Following their assembly, contiguous sequences (contigs) are arranged into chromosome-long sequences during scaffolding. Hi-C, a long-range linkage information between regions of the genome widely used in recent large sequencing projects, is often required to correctly order contigs. Several tools have been developed to automate this task following either statistical or deep-learning approaches. Statistical approaches summarise 2D Hi-C matrices into contact densities across sequences, thus ignoring informative [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.17.732633v1?rss=1'>DeepCDS: Ab initio coding sequence prediction in prokaryotic short reads<\/a><\/div><time datetime=\"2026-06-21T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 21, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Nielsen, L. S., Nielsen, H., Winther, O.<\/span><div class=\"wp-block-rss__item-excerpt\">Accurate coding sequence prediction in short prokaryotic metagenomic reads remains challenging due to sequence fragmentation, unknown sequence origins, and sequencing errors. 
Here we introduce DeepCDS, a deep learning-based ab initio coding sequence predictor trained on short prokaryotic sequences with and without simulated Illumina-like sequencing errors. DeepCDS integrates ESM-2 protein language model embeddings with nucleotide-level information to predict complete and fragmented coding sequence regions. Benchmarking on 215 phylogenetically diverse prokaryotic organisms demonstrates that DeepCDS consistently outperforms current state-of-the-art methods in coding [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.06.20.733316v1?rss=1'>Expanding the GUSome: Structure-guided identification and characterization of gut microbial \u03b2-glucuronidases<\/a><\/div><time datetime=\"2026-06-21T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">June 21, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Singhal, T., Badgujar, C. V., Bihani, S. C.<\/span><div class=\"wp-block-rss__item-excerpt\">The gut microbiome-encoded \u03b2-glucuronidase (GUS) enzymes have a significant effect on human physiology through their deglucuronidation activity on endogenous and exogenous glucuronides. GUS activity also significantly influences the pharmacokinetics, efficacy and toxicity of various drugs including chemotherapeutic drugs. Given their crucial role in drug metabolism, GUS enzymes have emerged as promising targets for therapeutic intervention. Here, we have identified and characterized 79 unique GUS enzymes through a structure-guided approach. 
Structural modelling of these GUS enzymes revealed a conserved core and [&hellip;]<\/div><\/li><\/ul>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Related Journals<\/h4>\n\n\n<ul class=\"su-siblings\"><li class=\"page_item page-item-3099\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biochemistry\/\">BioRxiv Biochemistry<\/a><\/li>\n<li class=\"page_item page-item-3132\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biophysics\/\">BioRxiv Biophysics<\/a><\/li>\n<li class=\"page_item page-item-3188\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-cancer-biology\/\">BioRxiv Cancer Biology<\/a><\/li>\n<li class=\"page_item page-item-3190\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-pharmacology-and-toxicology\/\">BioRxiv Pharmacology and Toxicology<\/a><\/li>\n<li class=\"page_item page-item-3114\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-systems-biology\/\">BioRxiv Systems Biology<\/a><\/li>\n<li class=\"page_item page-item-3193\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-zoology\/\">BioRxiv Zoology<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Related 
Journals<\/p>\n","protected":false},"author":1,"featured_media":2652,"parent":3087,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-3112","page","type-page","status-publish","has-post-thumbnail","hentry","entry"],"_links":{"self":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/comments?post=3112"}],"version-history":[{"count":1,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions"}],"predecessor-version":[{"id":3113,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions\/3113"}],"up":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3087"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media\/2652"}],"wp:attachment":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media?parent=3112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}