  • by Yan, X., Chen, J., Zheng, R., Li, M.
    The integration of single-cell multi-omics datasets is critical for deciphering cellular heterogeneities. Mosaic integration, the most general integration task, poses a greater challenge regarding disparity in modality abundance across datasets. Here, we present ACE, a mosaic integration framework that assembles two types of strategies to handle this problem: modality-alignment based strategy (ACE-align) and regression-based strategy (ACE-spec). ACE-align utilizes a novel contrastive learning objective for explicit modality alignment to uncover the shared latent representations behind modalities. ACE-spec combines the modality-alignment results […]
  • by Martinovic, I., Vlasic, T., Li, Y., Hooi, B., Zhang, Y., Sikic, M.
    Several deep learning-based tools for RNA 3D structure prediction have recently emerged, including DRfold, DeepFoldRNA, RhoFold, RoseTTAFoldNA, trRosettaRNA, and AlphaFold3. In this study, we systematically evaluate these six models on three datasets: RNA Puzzles, CASP15 RNA targets, and a newly generated large dataset of sequentially distinct RNAs, which serves as a benchmark for generalization capabilities. To ensure a robust evaluation, we also introduce a fourth, more stringent dataset that contains both sequentially and structurally distinct RNAs. We observed that each […]
  • by Wang, X., Zhang, H., Huang, J., Qin, Z.
    We present MAPPE, a novel algorithm integrating a k-nearest neighbor (KNN) similarity network with co-occurrence matrix analysis to extract evolutionary insights from protein language model (PLM) embeddings. The KNN network captures diverse evolutionary relationships and events, while the co-occurrence matrix identifies directional evolutionary paths and potential signals of gene transfer. MAPPE overcomes the limitations of traditional sequence alignment methods in detecting structural homology and functional associations in low-similarity protein sequences. By employing sliding windows of varying sizes, it analyzes embeddings […]
  • by Dang, T., Lysenko, A., Boroevich, K. A., Tsunoda, T.
    The analysis of high-dimensional microbiome multiomics datasets is crucial for understanding the complex interactions between microbial communities and host physiological states across health and disease conditions. Despite their importance, current methods, such as the microbe-metabolite vectors (MMvec) approach, often fail to efficiently identify keystone species. This arises from the vast dimensionality of metagenomics data, which complicates the inference of significant relationships, particularly the estimation of co-occurrence probabilities between microbes and metabolites. Here we propose the variational Bayesian microbiome multiomics (VBayesMM) […]
  • by Mclaughlin, S. M., Ahanger, S. H., Lim, D. A.
    The spatial organization of the genome within the nucleus is partially determined by its interactions with distinct nuclear subcompartments, such as the nuclear lamina and nuclear speckles, which play key roles in gene regulation during development. However, whether these genome-nuclear subcompartment interactions are encoded in the underlying DNA sequence remains poorly understood. The mechanisms for gene regulation are primarily encoded in noncoding DNA sequences, but deciphering how these sequence features control gene expression remains a significant challenge in genomics. Here, […]
  • by Lu, L., McLinden, A. P., Walker, N. M., Vittal, R., Wang, Y., Combs, M. P., Welch, J. D., Lama, V. N.
    Primary graft dysfunction (PGD) and chronic lung allograft dysfunction (CLAD) are critical challenges in lung transplantation. Dysregulated gene expression and epigenomic states in lung mesenchymal cells (MCs) play a key role in these conditions, but further work is needed to elucidate the biomarkers and molecular drivers. Single-cell multi-omic technologies offer an unprecedented opportunity to address this gap by jointly measuring gene expression and chromatin accessibility in diseased and healthy cells. We performed single-cell multi-omic profiling and genetic demultiplexing on MCs […]
  • by Steenwyk, J. L., Buida, T. J.
    Phylogenomics aims to reconstruct the history of genes and genomes. However, noise or error during inference can stem from diverse sources, such as compositional biases. Here, we introduce RCVT (Relative Composition Variability among Taxa), a novel metric to quantify compositional biases among taxa. We demonstrate the utility of RCVT using example data and quantify compositional biases in 16 empirical phylogenomic datasets, revealing variation in biases across datasets and taxa therein. RCVT may help researchers diagnose and potentially ameliorate phylogenomic noise […]
  • by Molodenskiy, D., Maurer, V., Yu, D., Chojnowski, G., Bienert, S., Tauriello, G., Gilep, K., Schwede, T., Kosinski, J.
    AlphaPulldown2 streamlines protein structural modeling by automating workflows, improving code adaptability, and optimizing data management for large-scale applications. It introduces an automated Snakemake pipeline, compressed data storage, support for additional modeling backends like UniFold and AlphaLink2, and a range of other improvements. These upgrades make AlphaPulldown2 a versatile platform for predicting both binary interactions and complex multi-unit assemblies.
  • by Jofily, P., Kalyaanamoorthy, S.
    Proteolysis Targeting Chimeras (Protacs) are a new class of drugs which promote degradation of a protein of interest (POI) by hijacking the Ubiquitin-Proteasome system. Structural knowledge of an E3 ligase:Protac:POI ternary complex is required for Protac rational design, and computational modelling of such heteromeric complex structures is nontrivial. To date, few programs have been developed to address this challenge; however, there remains a need for readily accessible tools that can significantly improve ternary complex modelling accuracy. Particularly, programs that can […]
  • by Altenbuchinger, M. C., Mensching-Buhr, M., Sterr, T., Seifert, N., Voelkl, D., Tauschke, J., Rayford, A., Zacharias, H. U., Grellscheid, S. N., Beissbarth, T., Goertler, F.
    Gene expression profiles of heterogeneous bulk samples contain signals from multiple cell populations. Studying variations in their composition can help to identify cell populations relevant for disease. Moreover, analyses, such as the identification of differentially expressed genes, can be confounded by cellular composition, as differences in gene expression may arise from both variations in cellular composition and gene regulation. Here, we present Deconvolution of omics data (Deconomix) — a comprehensive toolbox for the cell-type deconvolution of bulk transcriptomics data. Deconomix […]
  • by Sun, N., Zou, S., Tao, T., Mahbub, S., Li, D., Zhuang, Y., Wang, H., Cheng, X., Song, L., Xing, E. P.
    Proteins play a fundamental role in life. Understanding the language of proteins offers significant potential for gaining mechanistic insights into biological systems and introduces new avenues for treating diseases, enhancing agriculture, and safeguarding the environment. While large protein language models (PLMs) like ESM2-15B and xTrimoPGLM-100B have achieved remarkable performance in diverse protein understanding and design tasks, these models, being dense transformer models, pose challenges due to their computational inefficiency during training and deployment. In this work, we introduce AIDO.Protein, a […]
  • by Colange, M., Appe, G., Meunier, L., Weill, S., Johnson, W. E., Nordor, A., Behdenna, A.
    We introduce InMoose, an open-source Python environment aimed at omic data analysis. We illustrate its capabilities for bulk transcriptomic data analysis. Due to its wide adoption, Python has grown as a de facto standard in fields increasingly important for bioinformatic pipelines, such as data science, machine learning, or artificial intelligence (AI). As a general-purpose language, Python is also recognized for its versatility and scalability. InMoose aims at bringing state-of-the-art tools, historically written in R, to the Python ecosystem. Our intent […]
  • by Wang, J., Crowell, H., Robinson, M. D.
    Single-cell omics approaches profile molecular constituents of individual cells. Replicated multi-condition experiments in particular aim at studying how the molecular makeup and composition of cell subpopulations change at the sample level. Two main approaches have been proposed for these tasks: firstly, cluster-based methods that group cells into (non-overlapping) subpopulations based on their molecular profiles and, secondly, cluster-free but neighborhood-based methods that identify (overlapping) groups of cells in consideration of cross-condition changes. In either approach, discrete cell groups are subjected to differential […]
  • by Kell, D. B., Pretorius, E.
    A recent analysis compared the proteome of (i) blood clots seen in two diseases, viz. sepsis and long COVID, when blood was known to have clotted into an amyloid microclot form (as judged by staining with the fluorogenic amyloid stain thioflavin T) with (ii) that of those non-amyloid clots considered to have formed normally. Such fibrinaloid microclots are also relatively resistant to fibrinolysis. The proteins that the amyloid microclots contained differed markedly both from the soluble proteome of typical plasma […]
  • by Ho, N., Ellington, C. N., Hou, J., Addagudi, S., Mo, S., Tao, T., Li, D., Zhuang, Y., Wang, H., Cheng, X., Song, L., Xing, E. P.
    Developing a unified model of cellular systems is a canonical challenge in biology. Recently, a wealth of public single-cell RNA sequencing data as well as rapid scaling of self-supervised learning methods have provided new avenues to address this longstanding challenge. However, rapid parameter scaling has been essential to the success of large language models in text and images, while similar scaling has not been attempted with Transformer architectures for cellular modeling. To produce accurate, transferable, and biologically meaningful representations of […]
  • by Luo, S., Germain, P.-L., von Meyenn, F., Robinson, M. D.
    Benchmarks are crucial to understanding the strengths and weaknesses of the growing number of tools for single-cell and spatial omics analysis. A key task is to distinguish subpopulations within complex tissues, where evaluation typically relies on external clustering validation metrics. Different metrics often lead to inconsistencies between rankings, highlighting the importance of understanding the behavior and biological implications of each metric. In this work, we provide a framework for systematically understanding and selecting validation metrics for single-cell data analysis, addressing […]
  • by Bass, A. J., Cutler, D. J., Epstein, M. P.
    Differential co-expression analysis (DCA) aims to identify genes in a pathway whose shared expression depends on a risk factor. While DCA provides insights into the biological activity of diseases, existing methods are limited to categorical risk factors and/or suffer from bias due to batch and variance-specific effects. We propose a new framework, Kernel-based Differential Co-expression Analysis (KDCA), that harnesses correlation patterns between genes in a pathway to detect differential co-expression arising from general (i.e., continuous, discrete, or categorical) risk factors. […]
  • by Hawkins, R., Balaghi, N., Rothenberg, K. E., Ly, M., Fernandez-Gonzalez, R.
    Segmenting multi-dimensional microscopy data requires high accuracy across many images (e.g. timepoints or Z slices) and is thus a labour-intensive part of biological image processing pipelines. We present ReSCU-Nets, recurrent convolutional neural networks that use the segmentation results from the previous frame as a prompt to segment the current frame. We demonstrate that ReSCU-Nets outperform state-of-the-art image segmentation models in different segmentation tasks on time-lapse microscopy sequences.
  • by Paganini, J. A., Kerkvliet, J. J., Jordan, O., Teunis, G., Plantinga, N. L., Willems, R. J. L., Arredondo-Alonso, S., Schurch, A. C.
    Plasmids play a pivotal role in the spread of antibiotic resistance genes. Accurately reconstructing plasmids often requires long-read sequencing, but bacterial genomic data in publicly accessible repositories has historically been derived from short-read sequencing technology. We recently presented an approach for reconstructing Escherichia coli antimicrobial resistance plasmids using Illumina short reads. This method consisted of combining a robust binary classification tool named plasmidEC with gplas2, which is a tool that makes use of features of the assembly graph to bin […]
  • by Zhang, J., Zheng, S., Zhang, W., Gu, J.
    Recently, CNN-based and Transformer-based networks have become the de facto standard for vessel segmentation, due to their strong feature representation capabilities. However, CNNs fall short in explicitly capturing global dependencies between different spatial locations, while Transformers require quadratic computation by virtue of the long sequences and lack fine-grained local correlations. Seeking to model both local correlations and long-range dependencies, our research revisits this issue and introduces a novel retinal vessel segmentation network based on U-Net, the Coordinate and Channel Attention Mixing […]
