- by O'Connell, K. A. We introduce ClaroAI-Bench, an evaluation suite for measuring AI agents' ability to reproduce computational findings from published biomedical research. The benchmark comprises 35 real NIH-funded papers spanning five modalities (genomics, imaging, clinical/EHR, epidemiology, wet-lab) scored on a five-dimension rubric: data findability (D1), data accessibility (D2), code availability (D3), environment reconstructability (D4), and results reproducibility (D5). Each task requires an agent to locate data, obtain code, reconstruct the compute environment, execute the analysis, and verify results against published claims, mirroring the […]
- by Upadhayaya, R., Pradhan, M. M., Metzger, V. T., Malec, S. A. Background: Variable selection for causal inference from observational biomedical data is challenging, as overlooking confounders or conditioning on colliders leads to biased estimates. While vast causal knowledge exists in biomedical literature, manually extracting this information for principled variable selection is impractical at scale. Methods: We developed CausalKnowledgeTrace, a Python-based computational framework with a Django web interface that systematically leverages structured causal knowledge from the Semantic MEDLINE Database (SemMedDB) to inform variable selection in causal studies. The system implements a six-stage analysis […]
- by Pawar, P., Samarasinghe, S., Kulasiri, D. Bovine tuberculosis (TB), caused by Mycobacterium bovis, has become a global concern over the last two decades. Bovine TB primarily affects cattle, but other domestic livestock are also affected, and it is more common in less developed and developing countries. The significant loss of livestock leads to trade restrictions and economic crises. The zoonotic potential of bovine TB raises health concerns for the public. Currently, no effective treatment is available and animal slaughtering is usually undertaken to reduce the burden of […]
- by Zhu, J., Zhang, Z., Gregorio, R. D., Chang, K., Dong, X., Banerjee, K., Liu, K., Rea-Moreno, M., Kizilbash, M., Alonso, A., Liu, J., Tsai, S., Chen, Y.-W., Evans, T., Chen, S. The human sinoatrial node (SAN) functions as the primary pacemaker of the heart and coordinates the hierarchical electrical activity that drives cardiac contraction. However, experimental systems capable of reconstructing pacemaker-driven cardiac organization in human tissues remain limited. Here we integrate spatial multi-omics of the human fetal SAN with stem cell engineering to generate pacemaker organoids (Sinoids) and assemble them into a pacemaker-driven human mini-heart composed of sinoatrial, atrial and ventricular cardiac modules. High-resolution spatial transcriptomics and single nucleus […]
- by Hobbs, E. E. M., Gloster, T. M., Pritchard, L. Many phytopathogenic bacteria have evolved large, diverse arsenals of Carbohydrate Active enZymes (CAZymes) that liberate simple sugars, and thus nutrition and energy, from the complex lignocellulosic matrices of their plant hosts. The CAZyme arsenals of these phytopathogens are expected to be influenced by and adapted to the cell wall composition of their plant hosts. The solutions these organisms have reached for the problem of degrading plant material may help us understand their host ranges and present a rich source of […]
- by Chen, W., Yang, X., Lu, J., Miao, M., Huang, Y., Zheng, S., Zhang, C., Xie, L., Zhang, Y. Whole-body SPECT bone scintigraphy reflects skeletal metabolic activity throughout the body and plays an indispensable role in the screening, treatment evaluation, and prognostic assessment of bone metastases in tumors. However, the automatic detection and segmentation of hypermetabolic bone lesions remain challenging due to low contrast, limited spatial resolution, and complex lesion distributions. In this study, we proposed Bone-Segnet, a dual-view guided automatic segmentation network for hypermetabolic bone lesions that integrated multi-scale feature modeling, global context modeling, and view-conditioned modulation. Pixel-level […]
- by Tian, C., Wang, J., Hou, J., Liu, W., Luo, Y., Wang, Y., Yang, L., Lin, W. Olfactory perception arises from distributed activation across hundreds of olfactory receptors (ORs), yet our understanding of this landscape remains constrained by the scarcity of OR affinity measurements. Here, we present Receptor-Anchored Metric Supervision (RAMS), a transfer learning framework using perceptual consistency as weak supervision to predict OR activation spectra. RAMS fine-tunes a pretrained drug-target affinity model by imposing constraints derived from olfactory perception, where similar odorants are encouraged to exhibit similar OR activations. It transfers protein-ligand interaction knowledge learned from […]
- by Lasch, P. Over the last two decades, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-ToF MS) has become the standard method for identifying bacteria and has found a wide range of applications, especially in clinical microbiology. The method's high taxonomic resolution, minimal sample preparation, and complete, ready-to-use commercial systems, which include instrumentation, experimental protocols, spectral databases, and identification analysis software, were key factors in the success of MALDI-ToF MS as the standard for identifying microorganisms in routine diagnostic laboratories. However, despite the availability […]
- by Griffin, P., Deganutti, G., Jadeja, K., Idigbe, C., Pipito', L., Mejuto, L., Ng, C. P., Peck, S., Greaves, J., Reynolds, C. A. In any field, unquestioningly accepting artificial intelligence (AI) results should be considered bad practise. Here, we devised a comparative modelling-based strategy for validating protein structures that exploits the well-known observation that protein folds are far more conserved than protein sequences. We identify proteins with a similar fold to the AlphaFold-generated query protein and determine their structural alignment to the query. The hypothesis is that if the sequence alignment coincides with the structural alignment, then the structure is validated. The strategy […]
- by Teng, D., Qiu, Y., Sakthivel, G., Aranganathan, A., Herron, L., Tiwary, P. While RNA language models (LMs) have served as foundation models (FMs) to advance structural prediction, their evaluation relies heavily on supervised downstream tasks. Such tasks can often mask FM inefficiencies and reflect downstream training set memorization. To address this, here we introduce REDIAL (RNA Embedding perturbation Diagnostics for Language models), a zero-shot, unsupervised framework designed to extract coevolutionary signals directly from the high-dimensional latent spaces of RNA language models. By applying REDIAL, we uncover stark, layer-wise disparities in how popular […]
- by Dohi, E. We screened a 5 receptor x 7 aptamer = 35-cell cross-target matrix with HADDOCK3 [1] under blind ambiguous-interaction-restraint (AIR) protocols on AlphaFold-modelled receptors. The screen surfaced 12 operationally distinct failure modes (collapsing to about 8 conceptual classes; Section 3.1). The K_D-calibration subset is n = 4 cells with literature K_D records under matched assay conditions; the broader cohort includes >= 6 biological cognate or intended-cognate cells. The principal case study is P01031 (complement C5, 1676 aa, >= 12 structural domains): […]
- by Spinner, A., Notin, P., Berry, S., Cortade, D., Sisson, Z., Ikonomova, S., Ross, D., Marks, D. Generative models are increasingly used for protein design, but the lack of standardized evaluation frameworks limits comparison across model classes and hinders translation to experimental success. Here, we introduce a unified sampling and benchmarking framework that enables controlled sequence generation across alignment, protein language, and structure-based models, and apply it to Tobacco etch virus (TEV) protease. Across hundreds of thousands of designed sequences, different models explore distinct regions of sequence space with no clear computational selection metrics to assess enzymatic […]
- by Bansal, N., Parsodkar, A. P., Pathak, A., Narayanan, M. Identifying causal relationships, rather than mere associations, is essential for applications such as finding genes driving diseases and guiding drug discovery towards disease mechanisms rather than symptom management. Although many studies extract biomedical relations from large literature corpora such as PubMed, fewer focus on causal relations from abstracts, and fewer still summarize corpus-level evidence for causal links. LLMs (Large Language Models) are increasingly used for biomedical summarization and relation extraction, but explicit benchmarks comparing generalized LLMs against specialized, domain-aware methods […]
- by Inda-Diaz, J. S., Adegoke, F., Löber, U., Jarquin-Diaz, V. H., Duan, Y., Bengtsson-Palme, J., Ugarcina Perovic, S., Coelho, L. P. Identifying antibiotic resistance genes (ARGs) from metagenomic data is critical for studying antimicrobial resistance across microbial communities and pathogens. However, there is no standardized methodology for ARG annotation. Here, we compare ten commonly used ARG detection pipelines by analysing over 270 million prokaryotic genes from the Global Microbial Gene Catalogue across 13 distinct habitats. We observed up to a 45-fold difference in the number of reported ARGs, with a mean Jaccard index of only 16% between pipelines. Pipeline selection profoundly […]
- by Wang, L., Xu, M., Yan, H., Zheng, Y., Feng, S., Zhang, Y., Li, C., Qiu, D., Hu, B., Wan, X., Zhang, F. Early detection of critical transitions in complex diseases is crucial for timely clinical intervention. However, as patients often provide only a single snapshot, identifying sample-specific early-warning signals (EWS) from a dynamical evolution perspective remains challenging, coupled with high-dimensional noise amplification. Here, we present TD-COM, a framework for detecting personalized EWS of critical transitions via single-sample community detection. By constructing a temporal perturbation map STDN, TD-COM captures latent dynamical perturbations inferred from static individual profiles. Synergizing these temporal-deviation signals with static […]
- by Swan, H. K., Baran, A. M., Aparicio-Puerta, E., Halushka, M. K., Jun, S.-H., McCall, M. N. MicroRNAs (miRNAs) are non-coding RNAs, approximately 18–24 nucleotides in length, with important gene regulatory functions. In small RNA sequencing (sRNA-seq), observed isoforms of miRNA, called isomiRs, arise from many biological and technical processes. Alterations in isomiR expression have been linked to a wide variety of human diseases, from cancers to neurological diseases. However, it is difficult to distinguish between technical and biological isomiRs. We present PARiS, an algorithm for the Probabilistic Assignment and Repartitioning of isomiR […]
- by Yang, Y., Yan, Z., Qian, H., Du, L., Wang, C., Peng, Y., Bu, X., Zhou, J.-G., Wang, S. Single-cell RNA sequencing has revolutionized our understanding of cellular heterogeneity, yet linking specific cell subpopulations to clinically relevant phenotypes remains a persistent challenge. Although multiple computational methods have been developed to bridge this gap, they are typically implemented as standalone packages with heterogeneous preprocessing pipelines, incompatible parameter conventions, and divergent output formats, thereby hindering rigorous cross-method benchmarking and reproducible multi-method workflows. Here, we present SigBridgeR, an extensible R framework and comprehensive toolkit that currently unifies eight state-of-the-art phenotype-associated cell screening […]
- by Steyaert, A., Van Hecke, M., Marchal, K., Fostier, J. Background: Detecting distinct bacterial strains in a mixed sample is an important, yet less well-developed aspect of metagenomic research. Several methods exist that successfully retrieve a de novo reconstruction of viral strains. However, the reconstruction of bacterial haplotypes poses its own distinct challenges, and methods that successfully reconstruct full genome-length bacterial strains de novo are scarce. Here, we develop HaploDetox, a method for de novo bacterial haplotype reconstruction from short reads. We use a de Bruijn graph representation of the […]
- by Chen, Y., Sun, M., Tadepally, L., Wang, J., Barcenilla, H., Gonzalez, L., Brodin, P. The application of artificial intelligence to biomedical research increasingly depends on iterative cycles in which AI systems analyze experimental data, propose follow-up conditions, and drive automated execution at scale, a paradigm central to Bio-AI and autonomous laboratory science. For such cycles to operate, laboratory protocols must be expressed in a form that is simultaneously human-readable and machine-executable. Natural-language descriptions, the current standard in laboratory practice, do not satisfy this dual requirement. We present Culsma, a formal language and execution framework […]
- by Gdoura-Ben Amor, M., Mathlouthi, N. E. H., Belguith, I., Derouich, R. Durum wheat (Triticum turgidum subsp. durum) is a Mediterranean dietary staple threatened by accelerating climate change, yet the genomic basis of adaptation in North African landraces remains poorly characterised. We present the first integrated whole-genome sequencing (WGS) and RNA-seq study of two contrasting Tunisian landraces: humid-adapted Chili and arid-adapted Mahmoudi. From 27,777 high-confidence SNPs, permutation-based FST outlier analysis (1,000 shuffles) identified 46 selection hotspots across six chromosomes, with a peak signal on chromosome 6B (FST = 0.833; p = 0.013). […]
