• by Brown, S. D., Dreolini, L., Minor, A., Mozel, M., Wong, N., Mar, S., Lieu, A., Khan, M., Carlson, A., Hrynchak, M., Holt, R. A., Missirlis, P. I.
    The Oxford Nanopore Technologies' sequencing platform offers a path towards bedside genomics, producing long reads that can completely cover a gene of interest, and thus detect any known or novel variant the gene contains. However, the analysis of these long reads to identify actionable genotypes remains challenging and typically requires customization depending on the target gene. Here, we describe a generic algorithm to accurately reconstruct allele sequences derived from long-reads of genomic-amplicon origin. Rather than calling variants directly from these […]
  • by Zhang, P., Han, R., Kong, X., Chen, T., Ma, J.
    Structure-based generative models often optimize single target affinity while ignoring specificity, producing candidates prone to off-target binding. We introduce SpecLig, a unified, structure-based framework that jointly generates small molecules and peptides with improved target affinity and specificity. SpecLig represents a complex as a block-based graph, combining a hierarchical SE(3)-equivariant variational autoencoder with an energy-guided geometric latent diffusion model. Chemical priors derived from block-block contact statistics are explicitly incorporated, biasing generation toward pocket-complementary fragment combinations. We benchmark SpecLig on peptide and […]
  • by Peng, K., Chen, W., Yao, T., Xia, H., Fu, G., Li, G., Bao, Y., Liu, E., Zhao, L., Wang, G.
    Next-generation sequencing (NGS) remains the most widely used sequencing technique in the field of genomics. Traditional basecalling methods face significant challenges in decoding high-density sequencing data due to inherent noise in biochemical reactions and limitations of instruments. Here, we present AICall, a multi-dimensional deep learning neural network based on a spatiotemporal attention mechanism. The network skips the computationally heavy but less effective steps of peak finding and brightness extraction/correction, and directly basecalls from the time sequence of multi-dimensional image stacks obtained […]
  • by Szikszai, M., Wang, T.-Y., Krueger, R., Mathews, D. H., Ward, M., Aviran, S.
    The diverse regulatory functions, protein production capacity, and stability of natural and synthetic RNAs are closely tied to their ability to fold into intricate structures. Determining RNA structure is thus fundamental to RNA biology and bioengineering. Among existing approaches to structure determination, computational secondary structure prediction offers a rapid and low-cost strategy and is thus widely used, especially when seeking to identify functional RNA elements in large transcriptomes or screen massive libraries of novel designs. While traditional approaches rely on […]
  • by Heydari, M. J., Lye, B., Masouri, P., Marsland, T., Lock, J., McKenna, J., Vafaee, F. G.
    Accurately predicting drug synergy is critical to accelerate the development of combination therapies for cancer and other complex diseases. Yet, the vast combinatorial drug and dose space poses a substantial challenge, even for modern deep learning approaches. Existing approaches often lack generalisability, collapse rich dose-response surfaces into single dose-averaged synergy scores, and fail to quantify predictive uncertainty. Here, we introduce AlgoraeOS, a biologically informed, attention-aware deep neural network designed to address these challenges. Trained on the largest harmonised […]
  • by Zhao, W., Sutherland, D. J., Dao Duc, K.
    Modern imaging technologies produce vast collections of cellular and subcellular structures, calling for principled methods that enable shape comparison across individuals and populations. We introduce the stratified Wasserstein framework, which treats each shape as an unstructured point cloud and embeds it into Euclidean space via ranked local distance profiles. This embedding yields an isometry-invariant Euclidean distance and a positive-definite kernel for population analysis, with a consistent sample-based estimator that supports large datasets in near-quadratic time. By leveraging kernel methods, the […]
  • by Qian, J., Yang, L., Wang, R., Qi, Y.
    Protein solubility is a critical physicochemical property influencing protein stability, therapeutic efficacy, and overall developability in drug discovery. However, traditional experimental methods for assessing solubility are often resource-intensive and time-consuming. To address these limitations, computational approaches leveraging artificial intelligence have emerged, yet current models generally treat qualitative classification and quantitative regression as separate tasks and rely predominantly on sequence-based information, neglecting crucial structural and surface characteristics. Here, we introduce Pro4S, a novel multimodal predictive model that integrates protein language models, […]
  • by Atas Guvenilir, H., Dogan, T.
    Discovering new, efficacious molecules remains slow and costly; rigorous data science-driven systems-level approaches are therefore essential to prioritise hypotheses and de-risk drug development. In this study, we present ECLIPSE, a systems-level framework for compound/ligand-protein interaction (CPI) representation and prediction, combining heterogeneous knowledge graphs (KGs), which encode large-scale entity-relation structure, with graph neural networks that exploit relational inductive biases to perform inference on graph-structured data. ECLIPSE uses our comprehensive biomedical KG-based platform, CROssBAR, incorporating genes/proteins, drugs, compounds, pathways, diseases, and phenotypes, […]
  • by Froehlich, H., Patajoshi, S., Madan, S.
    Targeted protein degradation (TPD) has transformed modern drug discovery by harnessing the ubiquitin-proteasome system to eliminate disease-driving proteins previously deemed undruggable. However, current approaches predominantly rely on a narrow set of ubiquitously expressed E3 ligases, such as Cereblon (CRBN) and von Hippel-Lindau (VHL), which limits tissue specificity, increases systemic toxicity, and fosters resistance. Here, we present an AI-driven framework for the rational identification of tissue-specific E3 ligases suitable for precision-targeted degradation. Our model leverages a BERT-based protein […]
  • by Wolf, M., Knipper, L., Schallert, K., Groba, A.-C., Hellwig, P., Bang, C., Rausch, P., Franke, A., Benndorf, D., Aden, K., Sczyrba, A., Heyer, R.
    Inflammatory bowel disease (IBD) is a chronic intestinal disorder involving recurring inflammation and pronounced microbial dysbiosis. Comprehensive studies with large patient cohorts are required to identify meaningful biomarker candidates for diagnosing and monitoring IBD. In this large-scale meta-study of over 600 samples based on fecal metaproteomics, our goal was to validate known biomarkers and discover new candidates. We performed bioinformatic reanalysis using the Mascot search engine and MMUPHin for batch effect correction as well as knowledge graph-enhanced data analysis. We […]
  • by Pielesiak, J., Niznik, K., Snioszek, P., Wachowski, G., Zurawski, M., Antczak, M., Szachniuk, M., Zok, T.
    RNApdbee 3.0 (publicly available at https://rnapdbee.cs.put.poznan.pl/) offers an advanced pipeline for comprehensive RNA structural annotation, integrating 2D and 3D data to build detailed nucleotide interaction networks. It classifies base pairs as canonical or noncanonical using the Leontis-Westhof and Saenger schemes and identifies stacking, base-ribose, base-phosphate, and base-triple interactions. The tool handles incomplete or modified residues, marking missing nucleotides and distinguishing noncanonical base pairs for accurate and effective visualization. Results are provided in standard formats – namely, extended dot-bracket notation, BPSEQ, […]
  • by Ahuja, G., Antill, A., Su, Y., Dall'Olio, G. M., Basnayake, S., Karlsson, G., Dhapola, P.
    Cell type annotation remains a critical bottleneck, with current methods often inaccurate and requiring extensive manual validation, particularly in disease contexts. While large language models (LLMs) show promise, they can be unreliable due to hallucinations. We developed CyteType, a multi-agent framework that generates competing hypotheses grounded in full expression data and study context, validates against external databases, and iteratively self-evaluates. Comprehensive benchmarking demonstrates that CyteType substantially outperforms reference-based and LLM-based methods, with self-generated confidence scores reliably identifying trustworthy annotations. CyteType […]
  • by Marques, L. L., Pinho, A. J., Pratas, D.
    Ancient DNA (aDNA) sequences present unique challenges for taxonomic classification due to extreme fragmentation (reads 20-100 bp), end-biased cytosine deamination, and high contamination rates. Conventional metagenomic classifiers based on exact k-mer matching or alignment lose discriminative power on such short and damaged reads, limiting the analysis of paleogenomic samples. We present FALCON2, a compression-based metagenomic classifier that leverages position-aware finite-context models to maintain high accuracy on degraded ancient viral sequences. FALCON2 consolidates the capabilities of its predecessor, FALCON-meta, into a […]
  • by Bunga, S., Tan, A., Roos, M., Kuersten, S.
    Metatranscriptomic (MetaT) sequencing provides critical insights into the gene expression and functional activity of microbial communities. However, its utility is limited by the overwhelming abundance of ribosomal RNA (rRNA), which typically represents ≥90% of total RNA [1-2]. A major obstacle to efficient MetaT analysis is the removal of highly abundant rRNA transcripts present in complex microbial communities, which may contain thousands of species. Although commercial rRNA depletion kits can effectively reduce rRNA content, they are typically optimized for specific host […]
  • by Lieftinck, M., Verlaan, T., Reinders, M.
    Deep Neural Networks (DNNs) are renowned for their high accuracy and versatility, which has led to their application in many fields of research, including biology. However, this accuracy often comes at the expense of interpretability, making it challenging to reason about the inner workings of most DNNs. Particularly in biological research, understanding the mechanisms behind specific outcomes is highly valuable. To elucidate the latent space of DNNs in the context of cancer biology, we introduce GONNECT: a Gene Ontology-derived Neural […]
  • by Mendoza Cantu, A., Gauglitz, J., Bittremieux, W.
    Untargeted tandem mass spectrometry (MS/MS)-based metabolomics enables broad characterization of small molecules in complex samples, yet the majority of spectra in a typical experiment remain unannotated, limiting biological interpretation. Reference data-driven (RDD) metabolomics addresses this gap by contextualizing spectra through comparison to curated, metadata-annotated reference datasets, allowing inference of spectrum origins without requiring exact structural identification. Here, we present an open-source RDD metabolomics platform comprising a user-friendly web application and a Python software package that perform RDD analyses directly from […]
  • by Pang, Y., Chen, L., Dodge, H. H., Zhou, J.
    Background: Digital language markers show promise in detecting early cognitive impairment related to Alzheimer's disease (AD), yet their relationship with cerebrospinal fluid (CSF) biomarkers of AD pathology remains unclear, mainly due to the lack of data with both CSF and language markers. Objective: This study aims to build links between digital language markers and fluid biomarkers through surrogate CSF biomarkers. Methods: Using NACC clinical data as anchor variables, language markers in the I-CONECT study were linked to NACC CSF data. […]
  • by Liu, J., Coker, M. O., Osazuwa-Peters, N., Peter, O., Idemudia, N. L., Schlecht, N. F., Obuekwe, O., Eki-Udoko, F. E., Bromberg, Y.
    Background: Whole metagenome shotgun sequencing (WMS) is widely used to profile microbial function. However, technical variability in sequencing and analysis often obscures true biological patterns. Large-scale studies are particularly susceptible to batch effects, such as differences in sequencing depth and platform and annotation strategies, as well as sample-to-flow-cell assignments. However, the relative effects of these factors on functional inference in such studies have yet to be systematically evaluated. We analyzed oral-rinse WMS data from a study cohort including 671 Nigerian youths […]
  • by Kuo, M., Le Cao, K.-A., Kodikara, S., Mao, J., Sankaran, K.
    Summary: Stacked barplots, though widely used in microbiome studies, can obscure important patterns in microbiome data. They omit rare taxa and can mask shifts that emerge at finer taxonomic levels. To address this issue, we introduce phylobar, an R package that interactively links stacked barplots with overview phylogenetic or taxonomic hierarchies. The interface allows users to collapse or expand subtrees, paint color palettes interactively, and search for specific taxa. This allows comparison across taxonomic resolutions that are hidden in static […]
  • by Bota, P. M., Picon-Pages, P., Fanlo-Ucar, H., Almabhouh, S., Bagudanch, O., Zeylan, M. E., Senyuz, S., Gohl, P., Molina-Fernandez, R., Fernandez-Fuentes, N., Barbu, E., Vicente, R., Nattel, S., Ois, A., Puig-Pijoan, A., Garcia-Ojalvo, J., Keskin, O., Gursoy, A., Munoz, F. J., Oliva, B.
    Astrocytes are central to brain homeostasis, supporting neuronal metabolism, synaptic activity, and the blood-brain barrier. With aging, these glial cells undergo molecular and functional changes that weaken support functions and promote neuroinflammation, contributing to neurodegeneration. Yet the systems-level mechanisms of astrocytic aging remain poorly defined in human models. Because aging also heightens risk for cardiovascular disease, cognitive impairment, type 2 diabetes, and systemic inflammation, clarifying shared astrocytic pathways is critical for understanding brain-body crosstalk. Using an in vitro human astrocyte […]
