{"id":3112,"date":"2023-01-17T20:44:50","date_gmt":"2023-01-18T02:44:50","guid":{"rendered":"https:\/\/kermitmurray.com\/msblog\/?page_id=3112"},"modified":"2023-01-17T20:44:50","modified_gmt":"2023-01-18T02:44:50","slug":"biorxiv-bioinformatics","status":"publish","type":"page","link":"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-bioinformatics\/","title":{"rendered":"BioRxiv Bioinformatics"},"content":{"rendered":"\n<div class=\"wp-block-caxton-grid relative\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-columns caxton-grid-block\" style=\"padding-top:0;padding-left:0;padding-bottom:0;padding-right:0;grid-template-columns:repeat(12, 1fr)\" data-tablet-css=\"padding-left:em;padding-right:em;\" data-mobile-css=\"padding-left:em;padding-right:em;\">\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 8\"><div class=\"absolute absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"https:\/\/www.biorxiv.org\/alertsrss\" target=\"_blank\" rel=\"noreferrer noopener\">Journal Home<\/a><\/strong><\/p>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-caxton-section relative\" style=\"grid-area:span 1\/span 4\"><div class=\"absolute 
absolute--fill\"><div class=\"absolute absolute--fill cover bg-center\" style=\"background-color:;background-image:linear-gradient( );\"><\/div><div class=\"absolute absolute--fill\" style=\"background-color:;background-image:linear-gradient( );opacity:1;\"><\/div><\/div><div class=\"relative caxton-section-block\" style=\"padding-top:5px;padding-left:5px;padding-bottom:5px;padding-right:5px\" data-mobile-css=\"padding-left:1em;padding-right:1em;\" data-tablet-css=\"padding-left:1em;padding-right:1em;\">\n<p><strong><a href=\"http:\/\/connect.biorxiv.org\/biorxiv_xml.php?subject=bioinformatics\" target=\"_blank\" rel=\"noreferrer noopener\">RSS<\/a><\/strong><\/p>\n<\/div><\/div>\n<\/div><\/div>\n\n\n<ul class=\"has-dates has-authors has-excerpts wp-block-rss\"><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.10.737551v1?rss=1'>ProtBLIP2-SST: Protein Function Prediction via BLIP2 with Sequence, Structure, and Text<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Chen, Z., Luo, Q.<\/span><div class=\"wp-block-rss__item-excerpt\">Protein function prediction traditionally relies on structured gene ontology (GO) labels or multi-label classifiers. However, these labels or classifiers cannot flexibly describe molecular function, biological process, cellular component, and free-text functional narratives in a single output. In comparison, generation-based approaches offer an intuitive paradigm for flexible free-text protein annotation, with large language models (LLMs) as a representative method for protein-text modeling. 
Recent efforts on utilizing LLMs for protein semantic understanding and annotation generation have adopted sequence-only encoding or sequence-text contrastive [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737273v1?rss=1'>Genomic Annotation Infrastructure (GAIn): Pipelines and Resource Repositories for Annotating Variants, Positions, and Regions<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Cokol, M., Chorbadjiev, L., Lee, Y.-h., Jamsandekar, M., Gergova, I., Todorov, I., Iossifov, I.<\/span><div class=\"wp-block-rss__item-excerpt\">Interpretation of genomic variants, positions, and regions depends on reliable annotation &#8211; adding evidence such as predicted effect, conservation, population frequency, and gene-level context &#8211; yet the underlying resources are numerous, versioned, and assembly-specific. We present the Genomic Annotation Infrastructure (GAIn), a platform that generates transparent, reproducible annotations via declarative pipelines that define annotation tasks as ordered lists of components, called annotators, that produce annotation attributes using genomic resources from Genomic Resource Repositories (GRRs). 
We provide two public GRRs: a [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737248v1?rss=1'>OCellus: A Language-Model Framework for Single-Cell, Spatial, and Perturbation Biology with Natural-Language Reasoning<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Zhang, C., Sun, J., Xu, Z., Liao, R., Yin, A., Gao, H., Liu, E., Bao, Y., Zhao, L., Wang, G.<\/span><div class=\"wp-block-rss__item-excerpt\">Computational modeling of cellular behavior &#8211; the virtual cell &#8211; has emerged as a stated grand challenge at the intersection of artificial intelligence and biology, yet existing foundation models remain specialized: single-cell models process dissociated transcriptomes only, spatial models require dedicated spatial-aware architectures, and perturbation predictors depend on manually curated knowledge bases that cap generalization. Here we introduce OCellus, a single nine-billion-parameter language model (Qwen3.5-9B) fine-tuned on twenty-two biological tasks that simultaneously addresses all three limitations through three coordinated technical [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737319v1?rss=1'>A geometric atlas of how ESM3 organizes modalities across depth<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Steenwyk, J. L.<\/span><div class=\"wp-block-rss__item-excerpt\">Protein language models learn general-purpose representations from large collections of protein sequences and structures, and have advanced the prediction of protein structure and function. 
ESM3 is a multimodal protein language model that ingests a protein through several channels at once, including amino-acid sequence, three-dimensional structure, secondary structure (SS8), solvent accessibility (SASA), and discrete functional annotations, summing their embeddings into a single residual stream. Little is known about whether these modalities occupy separate subspaces and the depth at which they fuse. [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737305v1?rss=1'>TEscape: Defining the human transposable element transcriptome using multiplatform long-read sequencing<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Mercuri, R. L. V., Mombach, D. M., dos Santos, F. R. C., Perez-Schindler, J., Huang, Y., Spealman, P., Pintacuda, G., Al&#039;Khafaji, A., Donnard, E. R., Claussnitzer, M., Galante, P. A. F.<\/span><div class=\"wp-block-rss__item-excerpt\">Transposable elements (TEs) not only account for half of the human genome sequence but also generate transcripts that contribute to transcriptomic diversity. Yet, their repetitive nature has hindered accurate quantification of the full TE-derived transcriptome, a challenge that long-read sequencing can overcome. Here, we combined multiplexed arrays isoform sequencing (MAS-ISO-seq) with a dedicated computational framework (TEscape) to perform an in-depth annotation of the human TE transcriptome. 
To capture the breadth of human transcriptome diversity, we profiled six representative cell types [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.10.737871v1?rss=1'>EcoMorph: Universal morphological trait quantification from natural language prompts for ecological research<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Amoah, E. I., Bunch, Z., Thomas, H. M., Patch, H. M., Grozinger, C.<\/span><div class=\"wp-block-rss__item-excerpt\">1. Morphological traits such as floral area and body size are fundamental to ecological research, serving as inputs for studies of pollinator-plant interactions, habitat quality, and biodiversity monitoring. However, accurately measuring these traits from images remains challenging, particularly in complex field conditions where existing tools exhibit reduced accuracy and limited generalizability across taxa. 2. We present EcoMorph, a modular morphological measurement system that leverages the Segment Anything Model 3 (SAM3) to quantify traits across diverse ecological contexts. Unlike task-specific segmentation [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737312v1?rss=1'>The YTHDC1 glutamate-rich domain docks to the ADAR1 Z\u03b1 Domain, linking the N6-methyladenosine modification of pre-mRNAs to dsRNA editing<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Gromak, D., Shaytan, A. 
K., Herbert, A., Poptsova, M.<\/span><div class=\"wp-block-rss__item-excerpt\">The p150 isoform of the double-stranded RNA editing enzyme ADAR1 binds Z-DNA and Z-RNA through the conserved winged helix-turn-helix Z\u03b1 domain. Here, we describe an inverse computational design strategy to map protein interactors of Z\u03b1. We used RFdiffusion and ProteinMPNN to generate around 10,000 synthetic binders optimized for the Z\u03b1 recognition surface, then used their sequences as structural templates for BLASTp searches against the human proteome. Multi-stage screening of around 1,200 candidate regions from 298 proteins via ColabFold pDockQ identified [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.09.737562v1?rss=1'>BioMetAll v2.0: Introducing Scores, Metal Discrimination, and Side-Chain Descriptors for Predicting Metal-Binding Sites in Proteins.<\/a><\/div><time datetime=\"2026-07-12T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 12, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Marechal, J. D., Fernandez Diaz, R., Pena Losada, R., Sanchez Aparicio, J. E., Gao, W., Alemany, M.<\/span><div class=\"wp-block-rss__item-excerpt\">Predicting the location of metal-binding sites in proteins is crucial for fundamental biological questions and biotechnological applications. Over the past decade, the rise in metal-bound protein structures in the Protein Data Bank, combined with advanced statistical models such as deep learning, has accelerated the development of metal-binding site prediction tools. Several approaches are now available, offering high-quality benchmarks and predictive performance. Our initial development in this area is BioMetAll, whose first version was based on backbone pre-organization. 
Here, we introduce [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.11.737882v1?rss=1'>ProtPen combines sequence- and structure-based approaches to facilitate protein function predictions on a proteome-wide scale<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Mathai, D., Schulze, S.<\/span><div class=\"wp-block-rss__item-excerpt\">Proteins of unknown function represent a significant gap in our understanding of biological processes, encompassing large portions of the proteomes of many organisms, especially prokaryotes. Addressing this gap is critical to understanding the biology and pathogenicity of such organisms. We introduce ProtPen, an open-source pipeline that facilitates protein function prediction by combining eggNOG-mapper for sequence-based annotation with Foldseek for rapid structural similarity searches using AlphaFold-predicted protein structures. Annotation results from both tools are merged and enriched with UniProt metadata to [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.09.736945v1?rss=1'>PKProbDesign: RNA inverse folding including pseudoknots by optimizing thermodynamic folding probability<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Otagaki, T., Iwakiri, J., Terai, G., Asai, K., Sato, K.<\/span><div class=\"wp-block-rss__item-excerpt\">Motivation: RNA inverse folding, the design of RNA sequences that fold into specified target structures, is a central problem in RNA design, with applications in functional RNA engineering, synthetic biology, and nucleic-acid therapeutics. 
This task becomes especially challenging for pseudoknotted target structures because pseudoknots disrupt the nested structure assumed by standard thermodynamic folding models. Existing pseudoknot inverse-folding methods often rely on structure-predictor-based objectives. Direct optimization of the thermodynamic folding probability of a specified pseudoknotted target remains limited. This requires an [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.736993v1?rss=1'>Metagenomic contextualization of proteins with state space models<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Azbijari, N., Wynne, J. H., David, M., Thurber, A. R.<\/span><div class=\"wp-block-rss__item-excerpt\">Since the early adoption of metagenomics (the culture-free sequencing of microbial community genomes) in 2011, sequence data has increased over 500-fold across ecosystems. This surge in data has outpaced reliable taxonomic and functional annotation, with over half of sequences lacking confident functional assignment. These unknown sequences limit our understanding of microbial processes central to planetary health and human health. Recent advances in genomic language modeling have made progress in the interpretation of metagenomics datasets. 
Most state-of-the-art models rely on transformer [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.737037v1?rss=1'>AGPI: An AI-Powered Genomic Pathogen Intelligence Platform for Integrated Classification, Visualization, and Therapeutic Targeting<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Goel, A., Mishra, P.<\/span><div class=\"wp-block-rss__item-excerpt\">Rapid and accurate pathogen detection remains a major challenge in modern bioinformatics, as existing tools are often fragmented and require multiple specialized workflows. We present AGPI (AI-powered Genomic Pathogen Intelligence), an integrated platform that combines genomic sequence classification, biological enrichment, three-dimensional structural visualization, and AI-guided therapeutic prioritization within a single interpretable pipeline. AGPI employs a hybrid convolutional Bidirectional Gated Recurrent Unit (BiGRU) architecture trained on DNA sequences from 40 pathogen classes spanning viruses, bacteria, fungi, and protozoan pathogens. The model [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.737093v1?rss=1'>Locus-Level Transposable Element Profiling Resolves Division-Coupled Transcriptional Dynamics During Human Endoderm Specification<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Gaire, A., Kummerfeld, E., Aliferis, C., Wang, J.<\/span><div class=\"wp-block-rss__item-excerpt\">Background: Transposable elements (TEs) constitute nearly half of the human genome and are now recognized as significant contributors to mammalian gene regulatory networks. 
Despite this, most transcriptomic studies quantify TE expression at the subfamily level, which may obscure meaningful variation arising from individual insertion sites. Whether resolving TE expression to individual loci can reveal biologically distinct signals during stem cell differentiation has not been systematically characterized. Results: We re-analysed a published RNA-seq time course of FUCCI-h9 human embryonic stem cells [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.737032v1?rss=1'>Single-cell RNA-seq reveals conserved and divergent cellular states across wound types and species<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Choi, D., Bakhtiari, M., Amin, A., Mann, J., Bhasin, S., Bhasin, M.<\/span><div class=\"wp-block-rss__item-excerpt\">Chronic wounds, such as diabetic foot ulcers, fail to progress through the normal healing process and impose a significant burden on healthcare systems. While previous single-cell studies have characterized specific wound conditions, a unified understanding of the shared and distinct cellular landscapes across diverse wound microenvironments has been lacking. Therefore, we integrated over 500,541 cells from patients and mice across multiple wound conditions, including acute wound, diabetic foot ulcer, and venous ulcer as well as their healing outcome. 
Fibroblast-focused analysis [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.08.737144v1?rss=1'>AptViralDB: A Repository of Experimentally Validated Antiviral Aptamers<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Bajiya, N., Singh, S., Gahlot, P. S., Raghava, G. P. S.<\/span><div class=\"wp-block-rss__item-excerpt\">In an era of increasing drug resistance, exploring alternative molecules is crucial for the efficient management and treatment of viral diseases. Nucleic acid aptamers have emerged as highly promising candidates due to their exceptional target specificity, low immunogenicity, and versatile mechanisms for viral blocking. This manuscript describes AptViralDB, a manually curated database providing comprehensive information on experimentally validated antiviral aptamers. It contains 1,768 entries of antiviral aptamers against 40 viral species and 104 molecular targets, compiled from literature and existing [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.10.737545v1?rss=1'>ProtAug: An Empirical Investigation of pLM-Guided Data Augmentation for Protein Sequence Prediction Tasks<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Chen, Z., Wang, R., Luo, Q.<\/span><div class=\"wp-block-rss__item-excerpt\">Protein language models (pLMs) offer great potential for protein sequence analysis, yet the scarcity of labeled data often limits their effectiveness in fine-tuning. 
Data augmentation is a promising remedy, but systematic evaluation of augmentation strategies for protein sequences remains limited, and the conditions under which augmentation confers downstream benefits are not well understood. In this paper, we systematically investigate pLM-guided substitution-based augmentation across seven protein prediction tasks. We propose ProtAug, a framework that leverages encoder-based (ESM-2) and autoregressive (ProtGPT2) pLMs [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.10.737715v1?rss=1'>High resolution Streptococcus pyogenes core genome MLST and LIN coding scheme for outbreak detection<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Ryan, Y., Jolley, K. A., Hearn, H., Parfitt, K. M., Platt, S., Lamagni, T., Moganeradj, K.<\/span><div class=\"wp-block-rss__item-excerpt\">Streptococcus pyogenes is a globally important pathogen responsible for at least 500,000 deaths a year, causing significant burden on healthcare systems. It is the causative agent for ailments such as impetigo and strep throat to septicaemia and necrotizing fasciitis. Assessment of genetic relatedness for the detection of outbreaks within communities or healthcare facilities is vital in decreasing the propagation of S. pyogenes within these settings, alongside epidemiological data. 
As the volume of isolates being sequenced increases year on year, more [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.736532v1?rss=1'>SPARC: A Graph-based Optimization Framework for Directional Trajectory Reconstruction Across Ordered Single-Cell Conditions<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Wu, S., Walker, W. C., Martin, C., Yustein, J. T., Samee, M. A. H.<\/span><div class=\"wp-block-rss__item-excerpt\">Single-cell transcriptomics has enabled systematic profiling of cellular states across ordered biological contexts, including developmental stages, treatment phases, disease progression, and anatomical compartments. A central challenge is to reconstruct trajectories that respect the directionality imposed by biology or experimental design. Existing trajectory inference methods reconstruct cell-state progressions from latent-space geometry but do not enforce external biological ordering during graph construction, yielding biologically inadmissible transitions. An emerging paradigm of optimal-transport (OT) approaches partially addresses this limitation by incorporating experimental ordering into [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.11.737898v1?rss=1'>ArthroVerse: mapping protein family diversity across arthropod-associated microbiomes<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Chasapi, I. N., Aplakidou, E., Chasapi, M. N., Lamari, E., Galaras, A., Diplari, S., Iliopoulos, I., Emiris, I. Z., Georgakopoulos-Soares, I., Patalano, S., Stravopodis, D. J., Karatzas, E., Baltoumas, F. 
A., Kyrpides, N., Pavlopoulos, G. A.<\/span><div class=\"wp-block-rss__item-excerpt\">Metagenomic studies of arthropod-associated microbiomes have generated vast amounts of sequence data, yet the functional and structural organization of these proteins remains largely unexplored. Here, we present ArthroVerse, the first comprehensive database of protein families derived from arthropod-associated metagenomes. Non-redundant protein families were generated after rigorous filtering, deduplication, and clustering. The protein families were further annotated with microbial taxonomy, host associations, protein structural information, and Carbohydrate-active enzymes (CAZyme) predictions. The resulting dataset integrates both metagenomic and reference genome-derived proteins, enabling [&hellip;]<\/div><\/li><li class='wp-block-rss__item'><div class='wp-block-rss__item-title'><a href='https:\/\/www.biorxiv.org\/content\/10.64898\/2026.07.07.737090v1?rss=1'>Multiscale harmonization and semantic integration of biomedical data enable biological insights through immersive exploration<\/a><\/div><time datetime=\"2026-07-11T00:00:00-05:00\" class=\"wp-block-rss__item-publish-date\">July 11, 2026<\/time> <span class=\"wp-block-rss__item-author\">by Bueckle, A., Zhu, C., Wong, A. Y. H., Enninful, A., Miao, Y., Farzad, N., Pedersen, M., Mattison, C., Sloan, N., Mares, J., Xing, C., Herr, B. W., Khare, J., Kumar, Y. R., Parekh, K., Chavan, S., Luby, P., Patel, U., Hickey, J. W., Bader, G. D., Phatnani, H., Menon, V., Fan, R., Sorger, P., Snyder, M., Boerner, K.<\/span><div class=\"wp-block-rss__item-excerpt\">The Human Reference Atlas (HRA) enables multiscale data exploration and visualization. We present &quot;HRA: Powers of Ten,&quot; a virtual reality (VR) application for integrating, harmonizing, and visualizing data within the HRA Organ Gallery. 
It enables immersive navigation from a whole-body view of 81 organs to datasets across 5 organs, 5 assay types, and 4 spatial scales using a Multiscale Elevator System. The application, data, and code are available open-source.<\/div><\/li><\/ul>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Related Journals<\/h4>\n\n\n<ul class=\"su-siblings\"><li class=\"page_item page-item-3099\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biochemistry\/\">BioRxiv Biochemistry<\/a><\/li>\n<li class=\"page_item page-item-3132\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-biophysics\/\">BioRxiv Biophysics<\/a><\/li>\n<li class=\"page_item page-item-3188\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-cancer-biology\/\">BioRxiv Cancer Biology<\/a><\/li>\n<li class=\"page_item page-item-3190\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-pharmacology-and-toxicology\/\">BioRxiv Pharmacology and Toxicology<\/a><\/li>\n<li class=\"page_item page-item-3114\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-systems-biology\/\">BioRxiv Systems Biology<\/a><\/li>\n<li class=\"page_item page-item-3193\"><a href=\"https:\/\/kermitmurray.com\/msblog\/links\/journal-feeds\/biochemistry-journal-feeds\/biorxiv\/biorxiv-zoology\/\">BioRxiv Zoology<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Related 
Journals<\/p>\n","protected":false},"author":1,"featured_media":2652,"parent":3087,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-3112","page","type-page","status-publish","has-post-thumbnail","hentry","entry"],"_links":{"self":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/comments?post=3112"}],"version-history":[{"count":1,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions"}],"predecessor-version":[{"id":3113,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3112\/revisions\/3113"}],"up":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/pages\/3087"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media\/2652"}],"wp:attachment":[{"href":"https:\/\/kermitmurray.com\/msblog\/wp-json\/wp\/v2\/media?parent=3112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}