- by Huson, D. Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical biology, and producing clear, informative visualizations of them is an important task. Tanglegrams, which display two phylogenies side by side with lines connecting shared taxa, are widely used for comparing evolutionary histories, host-parasite associations, and horizontal gene transfer. Existing layout algorithms have largely focused on trees and on minimizing the number of inter-taxon edge crossings. We introduce displacement-optimized tanglegrams (DO-tanglegrams), a new approach that applies equally to […]
- by Gong, K., Lu, T., Wang, X., Liu, X. Cytotoxic chemotherapy and immune checkpoint inhibitors (ICIs) have transformed the management of advanced cancers, yet durable responses remain restricted to subsets of patients and strongly depend on the tumor immune microenvironment (TIME). Distinct hot and cold TIMEs differ in pre-existing effector T-cell activity and treatment-induced immunogenicity, suggesting that combination regimens should be tailored to microenvironmental context under realistic clinical constraints. Here we develop a deterministic dynamic optimization framework that jointly designs chemotherapy and ICI dosing schedules on a mechanistic tumor-immune […]
- by Yan, J., Cai, J., Li, Y., Lin, Z., Xian, W., Wei, X., Lei, I. F., Zhou, M., Campbell-Valois, F.-X., Siu, S. W. I. Antimicrobial peptides (AMPs) represent a promising therapeutic strategy to combat the increasing challenge of multidrug-resistant pathogens, a crisis intensified by the overuse of conventional antibiotics. In addition to their broad-spectrum antimicrobial activity, low toxicity, and reduced propensity for resistance development, AMPs offer significant advantages over traditional antibiotic therapies. However, the discovery of novel AMPs through biological experiments remains constrained by high costs, labor-intensive workflows, and time-consuming procedures, underscoring the urgent need for in silico computational methods to design AMP sequences. […]
- by Markarian, N., Engelhardt, B. E., Pierce, N. A., Sternberg, P. W., Pachter, L. Principal component analysis (PCA) and k-means clustering are two seemingly different methods for dimension reduction and clustering, respectively, but can be understood as special cases of inference in a Gaussian latent variable model framework. We leverage this insight to develop a probabilistic framework and methods for simultaneous dimension reduction, clustering, and latent space learning that are efficient and interpretable, and that can replace current ad hoc combinations of PCA and clustering. The algorithm, k-spaces, has broad applicability, which we demonstrate […]
- by Azizpour, A., Rao, N., Segarra, S., Nakhleh, L., Sapoval, N. Gene regulatory networks (GRNs) capture complex regulatory relationships that govern gene expression in cells. Inference of GRNs from single-cell RNA-seq (scRNA-seq) data has been an active topic of research in the past several years. However, despite the improvements in the data quality, the GRN inference problem remains a challenging task with many approaches showing variable performance dependent on the organism and cell type. To improve the quality of GRN inference and enable more comprehensive exploratory analyses of GRNs across various […]
- MARRVEL-MCP enables natural language variant interpretation through autonomous workflow construction, by Everton, Z. J., Botas, J., Kim, S. Y., Liu, Z., Jeong, H.-H. Rare disease variant interpretation requires navigating multiple genomic databases with strict input formats and synthesizing heterogeneous evidence, creating barriers for non-experts and cognitive burdens even for specialists. MARRVEL exemplifies this challenge by requiring precise queries (e.g., HGVS notation) and returning complex, difficult-to-synthesize outputs. To address this input-output asymmetry, we developed MARRVEL-MCP, a natural language interface enabling large language models (LLMs) to perform end-to-end variant interpretation via structured tool access. This work demonstrates the impact of context engineering, the deliberate design of […]
- by Deberneh, H. M., Wilkinson, D. J., Crossland, H., Basisty, N., Smith, K., Atherton, P. J., Sadygov, R. G. We present the first application of a deuterated water metabolic labeling workflow coupled with data-independent acquisition (DIA) tandem mass spectrometry (MS/MS) for quantifying label enrichment in MS/MS to study protein turnover. The approach automates the turnover rate determination from combined precursor and fragment ions. The truncation of the observed isotope distributions of fragments is overcome by implementing an approach to determining the label enrichment from two mass isotopomers. The high redundancy of fragment ions provides a confident assessment for deuterium […]
- by Duzgun, D., Oltean, S. Serine/arginine-rich protein kinase 1 (SRPK1), which primarily regulates alternative splicing (AS), has been implicated in various malignancies. However, the comprehensive expression landscape and clinical relevance of SRPK1 across diverse tumour types have not been systematically investigated. Chemoresistance remains a formidable obstacle in cancer treatment, accounting for nearly 90% of treatment failures and leading to poor patient survival. Aberrant AS, often driven by splicing factors like SRPK1, is a mechanism cancer cells exploit to overcome chemotherapy-induced cytotoxicity. With that, this study […]
- by Ribeiro, C. A. M., Quispe Saji, G. d. R., Cerqueira e Costa, M. d. O., Viana, A. S., Carvalho, M. F., Figueiredo, A. M. S., Galan-Vasquez, E., Martinez-Hernandez, J. E., Nicolas, M. F. Small regulatory RNAs (sRNAs) are fast-acting non-coding RNAs (ncRNAs), stress-responsive regulators that fine-tune bacterial gene expression, shaping virulence, antimicrobial resistance, metabolism, and biofilm development. At the post-transcriptional level, sRNAs pair with target mRNAs to block or enhance translation, remodel secondary structures, adjust transcript stability, and act as molecular sponges for other sRNAs. Staphylococcus aureus, a leading cause of hospital-acquired infections, relies on a multiple-layered regulatory network, including post-transcriptional mechanisms, to transition between planktonic and biofilm lifestyles. Here, we expand the […]
- by McCreight, A., Cho, Y., Li, R., Nachun, D., Gan, H.-Y., Carbonetto, P., Stephens, M., Denault, W. R., Wang, G. Sum of Single Effects regression (SuSiE) has become widely adopted for genetic fine-mapping, yet its original implementation faces architectural limitations that hinder extensibility and performance. We present SuSiE 2.0, featuring a modular redesign for extensibility, up to 5x speed improvements for summary statistics applications, and several useful extensions including SuSiE-ash, a new method that improves calibration when strong signals coexist with moderate effects. Simulations and real data benchmarks demonstrate performance across diverse genetic architectures, highlighting improved calibration of SuSiE-ash for […]
- by Pantolini, L., Studer, G., Engist, L., Pudziuvelyte, I., Pommerening, F., Waterhouse, A. M., Tauriello, G., Steinegger, M., Schwede, T., Durairaj, J. Detecting remote homology with speed and sensitivity is crucial for tasks like function annotation and structure prediction. We introduce a novel approach using contrastive learning to convert protein language model embeddings into a new 20-letter alphabet, TEA, enabling highly efficient large-scale protein homology searches. Searching with our alphabet performs on par with and complements structure-based methods without requiring any structural information, and with the speed of sequence search. Ultimately, we bring the exciting advances in protein language model representation learning […]
- by Shen, X., Song, D., Zhang, H., Yang, Q., Chen, B. Unspecific peroxygenases (UPOs) are capable of catalyzing the selective oxidation of organic substrates under mild conditions, using hydrogen peroxide (H2O2) as the sole oxidant. This makes them one of the most promising biocatalysts for chemical synthesis. However, the major limitation restricting the application of UPOs to date is their difficulty in heterologous expression. Although more than 4,000 putative UPO enzymes have been recorded in databases, only about 50 of them can currently be heterologously expressed. All UPOs discovered so far […]
- by Prihoda, D., Ancona, M., Calounova, T., Kral, A., Polak, L., Hrban, H., Dickens, N. J., Bitton, D. A. The protein design field is rapidly advancing, with frequent emergence of new models and pipelines for designing de novo proteins with tailored properties and functions not found in nature. However, the current tool landscape is fragmented, tools are hard to install and deploy, and require significant computational expertise to integrate into end-to-end, scalable pipelines. A particular challenge is managing many sequences, structures, and metrics for downstream testing and retrospective analysis of input parameters. To address this need, we introduce Ovo, […]
- by Liu, Y., Su, Z., Yang, W., Li, D., Zhang, J., Zhang, Y., Zeng, T., Zhang, Y., Li, Y., Fan, G., Ma, K., Liu, S., Xu, X., Dong, Y., Wang, Z. Protein nanopores are essential molecular gateways in biology and have inspired transformative technologies in biosensing and single-molecule sequencing. While this technology has transformed genomics and biosensing, the discovery of novel nanopore scaffolds remains limited due to the scarcity of experimentally resolved pore structures. Here, we present NanoporeDB, an open-access structural resource comprising over 6,600 high-confidence multimeric models across four representative pore types. Candidate proteins were systematically mined from large-scale datasets, including the AlphaFold Protein Structure Database, UniRef90, and MGnify90, and […]
- by Dai, M., Torok, T., Sun, D., Shende, V., Wang, G., Lin, Y., Wu, S. J., Rukshin, A., Fishell, G., Chen, F. Advances in spatially resolved transcriptomics provide unprecedented opportunities to characterise intercellular communication pathways. However, robust and computationally efficient incorporation of spatial information into intercellular communication inference remains challenging. Here, we present LARIS (Ligand And Receptor Interaction analysis in Spatial transcriptomics), an accurate and scalable method that identifies cell type-specific and spatially restricted ligand-receptor (LR) interactions at single-cell or bead resolution. LARIS is compatible with all spatial transcriptomic technologies and quantifies specificity, infers sender-receiver directionality, and detects how differential interactions vary […]
- by Syed, M., Walter, C., Meyer, H. V. Motivation: Population genetic analyses rely on high quality datasets that pass rigorous controls for sample and marker quality. Many analyses also require additional processing including identification of ancestry and sample relatedness. A software package that addresses all these common, yet crucial tasks is missing. Results: We have developed plinkQC, an R/CRAN package that combines these functionalities into a single software package with detailed vignettes for example applications. plinkQC determines the ancestry of study samples via a pre-trained random forest classifier […]
- by Hassan, M. T., Gaffar, S., Zahid, H., Lee, S. J. Transcription factors (TFs) are pivotal regulators of gene expression and play essential roles in diverse cellular activities. The three-dimensional organization of the genome and transcriptional regulation are predominantly orchestrated by TFs. By recruiting the transcriptional machinery to gene enhancers or promoters, TFs can either activate or repress transcription, thereby controlling gene activity and various biological pathways. Accurate identification of TFs is vital for elucidating gene regulatory mechanisms within cells. However, experimental identification remains labor-intensive and time-consuming, highlighting the necessity for […]
- by Liao, X., Wen, L., Jing, M., Li, X., Chen, B., Zhang, B., Gao, X., Shang, X. Tandem repeats (TRs) are highly polymorphic genomic elements, associated with diverse molecular traits and implicated in numerous human diseases. However, large-scale analysis of TRs has been limited by computational challenges, including motif recognition, detection in complex regions, and excessive computational cost. Here we present FastSTR, a computationally efficient tool for precise detection and characterization of TRs. FastSTR integrates a context-aware N-gram motif model with a segmented global alignment algorithm to enable accurate motif identification and boundary definition, even for repeat […]
- by Wang, A., Geman, D., Chitra, U., Younes, L. Spatial transcriptomics (ST) technologies measure gene expression at thousands of locations within a two-dimensional tissue slice, enabling the study of spatial gene expression patterns. Spatial variation in gene expression is characterized by spatial gradients, or the collection of vector fields describing the direction and magnitude in which the expression of each gene increases. However, the few existing methods that learn spatial gradients from ST data either make restrictive and unrealistic assumptions on the structure of the spatial gradients or do […]
- by Sola, L., Bagordo, D., Carpanzano, S., Santorsola, M., Lescai, F. Moving past learning just algorithms and code is a key challenge of bioinformatics education: the ideal goal is for students to acquire higher-order knowledge such as the ability to solve biological problems with the appropriate tools, and more importantly learn to interpret the results in the broader context where bioinformatics is needed. To design such a teaching and learning experience, data simulations play a key role: however, there is a massive barrier to adoption. Different data types are produced by […]
