- by Degn, K., Utichi, M., Besora, P. S.-I., Tiberti, M., Papaleo, E. Change in protein stability, quantified as the change in Gibbs free energy of folding (ΔΔG) in kcal/mol, plays a crucial role in functional alterations of proteins, with misfolding and destabilization commonly associated with pathogenicity. The past two decades have brought the development of bioinformatics tools leveraging evolutionary knowledge, empirical force fields, and machine learning to predict stability alterations. However, existing tools are often optimized towards or trained on limited experimental data, leading to unbalanced datasets and potential overfitting. The research […]
- by Dennler, O., Chenel, E., Coste, F., Blanquart, S., Belleannee, C., Theret, N. Summary: FUSE-PhyloTree is a phylogenomic analysis software for identifying local sequence conservation associated with the different functions of a multi-functional (e.g., paralogous or multi-domain) protein family. FUSE-PhyloTree introduces an original approach that combines advanced sequence analysis with phylogenetic methods. First, local sequence conservation modules within the family are identified using partial local multiple sequence alignment. Next, the evolution of the detected modules and known protein functions is inferred within the family's phylogenetic tree using three-level phylogenetic reconciliation and ancestral state […]
- by Fawzy, M., Marsh, J. A. Intrinsically disordered protein regions (IDPRs) are central to diverse cellular processes but present unique challenges for interpreting genetic variants implicated in human disease. Unlike structured protein domains, IDPRs lack stable three-dimensional conformations and are often involved in regulation through transient interactions and post-translational modifications. These features can affect both the distribution of pathogenic variants and the performance of computational tools used to predict their effects. Here, we systematically assessed the distribution of pathogenic vs benign missense variants across disordered, intermediate, […]
- by Karimzadeh, M., Sababi, A. M., Momen-Roknabadi, A., Chen, N.-C., Cavazos, T. B., Sekhon, S., Wang, J., Hanna, R., Huang, A., Nguyen, D., Chen, S., Lam, T., Hartwig, A., Fish, L., Li, H., Behsaz, B., Hormozdiari, F., Alipanahi, B., Goodarzi, H. Cell-free RNA (cfRNA) profiling has emerged as a powerful tool for non-invasive disease detection, but its application is limited by data sparsity and complexity, especially in settings with constrained sample availability. We introduce Exai-1, a multi-modal, transformer-based generative foundation model that integrates RNA sequence embeddings with cfRNA abundance data to capture biologically meaningful representations of circulating RNAs. By leveraging both sequence and expression modalities, Exai-1 captures a biologically meaningful latent structure of cfRNA profiles. Pre-trained on over 306 billion tokens […]
- by Zhu, Y.-H., Zhu, S., Yu, X., Yan, H., Liu, Y., Xie, X., Yu, D.-J., Ye, R. Accurately identifying protein functions is essential to understand life mechanisms and thus advance drug discovery. Although biochemical experiments are the gold standard for determining protein functions, they are often time-consuming and labor-intensive. Here, we proposed a novel composite deep-learning method, MKFGO, to infer Gene Ontology (GO) attributes through integrating five complementary pipelines built on multi-source biological data. MKFGO was rigorously benchmarked on 1522 non-redundant proteins, demonstrating superior performance over 11 state-of-the-art function prediction methods. Comprehensive data analyses revealed that the […]
- by Cooke, J., Wieder, C., Poupin, N., Frainay, C., Ebbels, T., Jourdan, F. Initially developed for transcriptomics data, pathway analysis (PA) methods can introduce biases when applied to metabolomics data, especially if input parameters are not chosen with care. This is particularly true for exometabolomics data, where exported metabolites may be far from internal disruptions in the organism. Experimentally evaluating PA methods is practically impossible when the sample's "true" metabolic disruption is unknown. Using in silico metabolic modelling, we simulated metabolic profiles for entire pathway knockouts, providing both a known disruption site as […]
- by Liu, K., Liu, J., Wang, C. A key challenge in single-cell RNA sequencing (scRNA-seq) analysis is clustering cells based on their expression profiles. Effective clustering requires selecting the most informative gene features, whose expression levels vary across cell types and can therefore be used to discriminate between them. This study introduces DIFS, a novel statistical framework designed to enhance discriminative feature selection for scRNA-seq-based cell clustering. DIFS operates in two stages. In the first stage, a modified dip test identifies genes with significant multimodal expression […]
- by Deng, Y., Mao, J., Choi, J., Le Cao, K.-A. Gene regulatory networks (GRNs) provide a fundamental framework for understanding the molecular mechanisms that govern gene expression. Advances in single-cell RNA sequencing (scRNA-seq) have enabled GRN inference at cellular resolution; however, most existing approaches rely on predefined clusters or cell states, implicitly assuming static regulatory programs and potentially missing subtle, dynamic variation in regulation across individual cells. To address these limitations, we introduce NeighbourNet (NNet), a method that constructs cell-specific co-expression networks. NNet first applies principal component analysis to embed […]
- by Park, S. A., Kim, Y., Gurnari, D., Dlotko, P., Hahn, J. The spatial interactions between malignant and immune cells in the tumor microenvironment (TME) play a crucial role in cancer biology and treatment response. Understanding these interactions is critical for predicting prognosis and assessing immunotherapy effectiveness. Conventional methods, which focus on local spatial features, often struggle to achieve robust analysis due to the complex and heterogeneous cellular distributions. We propose a Topological Data Analysis (TDA)-based framework using both global and local spatial features between malignant and immune cells. For the global […]
- by Del Azodi, C. B., Dunstone, A. M., McCarthy, D. J. Motivation: The scope of many Quantitative Trait Loci (QTL) mapping studies has increased to include different cellular and environmental states. However, drawing biologically relevant conclusions from the large, high-dimensional data that come from multi-state QTL mapping studies is not straightforward. Results: To address this problem, we introduce two R packages, QTLExperiment and multistateQTL. The QTLExperiment package provides a robust container for storing and manipulating QTL summary statistics and associated metadata. Building upon existing Bioconductor infrastructure and conventions, this object class […]
- by Dohmen, E., Aubel, M., Eicholt, L. A., Roginski, P., Luria, V., Karger, A., Grandchamp, A. Motivation: De novo genes emerge from previously non-coding regions of the genome, challenging the traditional view that new genes primarily arise through duplication and adaptation of existing ones. Characterised by their rapid evolution and their novel structural properties or functional roles, de novo genes represent a young area of research. Therefore, the field currently lacks established standards and methodologies, leading to inconsistent terminology and challenges in comparing and reproducing results. Results: This work presents a standardised annotation format to document […]
- by Parks, B., Greenleaf, W. The growth of single-cell datasets to multi-million cell atlases has uncovered major scalability problems for single-cell analysis software. Here, we present BPCells, a package for high-performance single-cell analysis of RNA-seq and ATAC-seq datasets. BPCells uses disk-backed streaming compute algorithms to reduce memory requirements by nearly 70-fold compared to in-memory workflows with little to no loss of execution speed. BPCells also introduces high-performance compressed formats based on bitpacking compression for ATAC-seq fragment files and single-cell sparse matrices. These novel compression algorithms […]
- by Dash, H., Roberts, T., Weinert, M., Skene, N. Summary: MotifPeeker benchmarks epigenomic datasets where no "gold standard" reference exists, using motif enrichment as a key metric. With minimal input, users can analyse their data in a single function and receive an intuitive HTML report. Availability and Implementation: MotifPeeker is available on Bioconductor at https://bioconductor.org/packages/devel/bioc/html/MotifPeeker.html. The complete source code is available on GitHub at https://github.com/neurogenomics/MotifPeeker, with full documentation provided at https://neurogenomics.github.io/MotifPeeker. Additionally, the MotifPeeker Docker image is hosted on GitHub at https://github.com/neurogenomics/MotifPeeker/pkgs/container/motifpeeker.
- by Yoshida, K., Hisada, S., Takase, R., Okuma, A., Ishida, Y., Kawara, T., Miura-Yamashita, T., Ito, D. Chimeric antigen receptor (CAR)-T cell therapy has shown remarkable success in treating hematological malignancies; however, several challenges remain, including limited efficacy against solid tumors, T cell exhaustion, and lack of T cell persistence, which have restricted its clinical efficacy across various indications. Sequence optimization of CAR constructs offers a promising strategy to enhance therapeutic efficacy of CAR-T cells. Recent advances in machine learning, especially protein language models (PLMs), enable prediction of mutational effects based on sequence representations. Nevertheless, applying PLMs […]
- by Ajmal, H. B., Nandi, S. B., Kebabci, N. B., Ryan, C. Synthetic lethality (SL) is an extreme form of negative genetic interaction, where simultaneous disruption of two non-essential genes causes cell death. SL can be exploited to develop cancer therapies that target tumour cells with specific mutations, potentially limiting toxicity. Pooled combinatorial CRISPR screens, where two genes are simultaneously perturbed and the resulting impacts on fitness estimated, are now widely used for the identification of SL targets in cancer. Various scoring methods have been developed to infer SL genetic interactions from […]
- by Wan, S., Zhou, T., Ma, Y., Chen, Y., Zhou, C. C., Peng, J., Lin, L., Luo, W., Gu, W., Liu, Z., Hua, X. Background: Preeclampsia (PE) is a pregnancy-related hypertensive disorder and a leading cause of maternal and perinatal mortality. Current treatments focus primarily on symptom management, as delivery remains the only definitive cure. This underscores the urgent need for innovative therapeutic strategies. Cytokines released by placental immune cells may contribute to the progression of PE and represent promising therapeutic targets. Methods: We conducted single-cell sequencing on placental tissues obtained via cesarean section from patients with severe PE and cases of non-infectious preterm […]
- by Izquierdo-Lozano, C., Tholen, M. M. E., Girola, V., Swietlikowska, A., Merkx, M., Grisoni, F., Albertazzi, L. Bioinformatics and cheminformatics are established disciplines, but nanoinformatics, the development of computational tools for understanding and designing nanomaterials, is still in its infancy. In light of the new data-driven approaches for nanomaterials discovery, there is a growing need for in silico tools tailored to analyze nanomaterials datasets. This is particularly crucial for soft materials, where a crystalline structure cannot be obtained and therefore the characterization datasets are less structured, and there are no standard methods for data mining. Here we […]
- by Wang, Z., Xu, Y., Hu, Y., Gao, L. Diverse cell types within a tissue assemble into multicellular structures to shape the functions of the tissue. These structural modules typically comprise specialized subunits, each performing unique roles. We present HRCHY-CytoCommunity, a graph neural network-based framework that leverages differentiable graph pooling and graph pruning to identify hierarchical multicellular structures in cell phenotype-annotated single-cell spatial maps. HRCHY-CytoCommunity ensures the robustness of the result by employing a hierarchical majority voting strategy. To extend the utility of HRCHY-CytoCommunity in cross-sample comparative analysis, an […]
- by Zhang, B., Zhang, Y., Zhao, Y. Pancreatic ductal adenocarcinoma (PDAC) is an exceptionally aggressive cancer with a 5-year survival rate of less than 10%, driven by late-stage diagnosis, limited treatment options, and a lack of reliable biomarkers for early detection and prognosis. In this study, we integrated DNA methylation data from TCGA and ICGC cohorts, categorizing samples based on survival time, and identified 684 differentially methylated CpG sites, along with 224 CpG biomarkers significantly associated with patient survival through statistical and machine learning-based analyses. We […]
- by McGuire, C. E., Hibbs, M. A. Automated function prediction (AFP) is the process of predicting the function of genes or proteins with machine learning models trained on high-throughput biological data. Deep learning with neural networks has become the dominant machine learning architecture of contemporary AFP models. However, it is unclear what difference exists between neural networks and previous machine learning architectures for AFP. Therefore, we created a model of AFP in yeast using neural networks that is trained on gene co-expression data to predict Gene Ontology […]