- by Kuang, Z.-Y., Sun, Y.-C., Wei, N.-N., Wu, H.-J. Cross-species integration of single-cell transcriptomes requires establishing gene correspondences to enable comparative analysis of expression profiles across organisms. Current approaches predominantly rely on Ensembl homology tables, whose default many-to-many mappings often amplify gene-family effects and introduce artifactual micro-clusters that lack clear cell-type identity, thereby complicating biological interpretation. While restricting mappings to a one-to-one scheme suppresses such artifacts, it reduces the number of homology gene pairs by approximately 8% (~900 pairs). To address this limitation, we developed a protein large language […]
- by Skolnick, J., Srinivasan, B., Zhou, H. Molecular glues can drive targeted protein degradation by stabilizing ternary complexes between proteins of interest and E3 ubiquitin ligases, but rational design has lagged due to limited rules for interface recognition and an overreliance on a few ligases (e.g., VHL or Cereblon). We introduce GlueFinder, a systematic, unbiased platform that leverages structural bioinformatics to mine the Protein Data Bank for ligand-binding pockets adjacent to protein-protein interfaces, i.e., ligandable sites that can nucleate glue-mediated complex […]
- by Kishore, V., Debarnot, V., Righetto, R. D., Engel, B. D., Dokmanic, I. Cryo-electron tomography (cryo-ET) visualizes 3D cellular architecture in near-native states. Recent deep-learning methods (CryoCARE, IsoNet, DeepDeWedge, CryoLithe) improve denoising and artifact correction, but performance remains limited by very low signal-to-noise ratio, a restricted angular range ("missing wedge"), and the lack of ground truth. Here, we present Icecream, which follows the broad template of earlier self-supervised approaches but treats symmetry in a way consistent with the recent equivariant imaging framework (Chen et al., 2021). Coupled with several engineering refinements, including mixed-precision […]
- by Zinsou, K. M. S., Mahamoud, H. A., Gaye, A. M., Diop, I., Ndiaye, M., Sow, D., Diop, C. T., Korkin, D. Accurate diagnosis of infectious skin diseases remains a major challenge, particularly for neglected tropical diseases such as mycetoma, where precise pathogen identification is crucial for effective treatment. Histopathology imaging is the diagnostic gold standard, involving examination of tissue biopsies to identify characteristic inflammatory patterns, cellular changes, or microbial pathogens. However, its analysis is often limited by variability in tissue sampling and staining, subjective interpretation, inter-observer differences, and the absence of visible microbial grains in early disease stages. To alleviate these […]
- by Wragg, D., Kang, E., Morgan, M. Combinatorial barcoding technologies for single-cell nucleotide sequencing, such as split-pool ligation protocols, involve sequential rounds of cell barcoding to uniquely tag individual cells. The rapid adoption of combinatorial barcoding in recent years is due in part to its scalability across cells and samples. However, small shifts in barcode positions within sequencing reads, caused by technical artifacts (e.g., during barcode incorporation or synthesis), can compromise the accurate assignment of reads to cell barcodes. Existing processing tools typically assume barcodes contain fixed-length […]
- by Uddin, M. R., Zheng, Z., Gandhi, K., Chang, H.-C., Kozel, J., Gali, Y., Buck, S. A., Glausier, J. R., Tseng, G. C., Freyberg, Z., Xu, M. We present a computational pipeline that links nuclear morphology to mRNA expression-based cell phenotypes under diverse biological conditions, including aging, disease progression, and drug response, using RNAscope imaging. The pipeline consists of three components: nuclear segmentation from RNAscope images, nuclear morphology identification, and downstream statistical analysis. Central to our approach is a novel unsupervised method, based on deep disentangled representation learning, which effectively captures diverse nuclear morphologies in large-scale datasets, as validated on synthetic benchmarks. We applied the full pipeline […]
- by Chatterjee, B., Gorga, K., Blair, C., Ohta, Y., Hill, E. M., Boughter, C. T., Meier-Schellersheim, M., Singh, N. J. Minimizing experimental noise is integral to robust data generation in single-cell science. Processing different samples as a single pool, made possible by hashtag-assisted pooling, helps minimize batch effects, but computational demultiplexing of the data can also lead to the loss of cells whose hashtags cannot be resolved accurately. Here, we examine four alternative experimental designs that could be used instead of a single-pool approach and quantify the batch effects as well as cell loss in each case. While a […]
- by Vollmar, M., Westrip, S., Nair, S., Balasubramaniyan, B., Velankar, S., Jones, L., Strickland, P. Protein structures are crucial for understanding the function, mechanism and disease-causing variants of proteins within any living cell. Researchers use a range of experimental techniques to determine these structures. Through structure inspection in molecular viewers, combined with supporting biochemical and biophysical experiments, scientists are able to identify a protein's function, reaction mechanism and the effects of sequence variation. These findings, supported by experimental results, are documented in detail in the scientific literature and by open sourcing the […]
- by Zaytsev, K. S., Bogatyreva, N. S., Fedorov, A. N. Genomic organization and its comparative analysis across all major kingdoms of life have been studied extensively at multiple scales, ranging from individual gene-level analyses to system-wide investigations. This work introduces a novel framework for characterizing genetic architecture through a new integral genomic parameter. We propose the concept of a multidimensional Gene Space to enable holistic quantification of genome organization principles. Gene Space is a multidimensional space based on the frequencies of nucleotide tokens, such as individual nucleotides, codons, or codon pairs. […]
- by Xu, X., Bonvin, A. M. J. J. Antibodies play crucial roles in immune defense and serve as key therapeutic agents for numerous diseases. The structural and sequence diversity of their antigen recognition loops, coupled with the scarcity of high-quality data, poses significant challenges for the development of generalizable predictive models. Here, we present a sequence-specific fine-tuning strategy for antibodies that partially bypasses the need for generalization. We evaluated this approach on three biologically relevant tasks: antibody structure prediction, zero-shot prediction of beneficial mutations in antibody-antigen complexes […]
- by Chomicz, D., Dudzic, P., Wrobel, S., Gawlowski, T., Demharter, S., Spreafico, R., Minoux, H., Phillips, A., Krawczyk, K. Studying the interactions between antibodies and antigens is fundamental to the development of novel therapeutic biologics. Predictions of such interactions start with data collection. Though there exist reliable resources to identify antibody structures in the Protein Data Bank (PDB), such data still require substantial processing to be usable in predictive tasks. Sequence redundancy needs to be removed to avoid data leakage between training, test and validation sets. Descriptors such as surface accessibility, secondary structure and antibody region information need […]
- by Mehendale, N., Chikhale, A. A. Purpose: This work proposes a computer vision framework to automate the extraction of vital signs from bedside monitor systems and facilitate adaptive drug infusion in intensive care units (ICUs), with the aim of reducing manual intervention and improving accuracy in critical care settings. Methods: An 8-megapixel camera captures time-lapse images of a simulated monitor display at 3 frames per second. Images are preprocessed (grayscale conversion, histogram stretching, edge detection, color filtering) before […]
- by Mo, X., Cai, J., Siu, S. W. I. Type VI secretion system effectors (T6SEs) target the cell wall, membranes and nucleic acids, leading to the killing of bacteria or impairment of host cell defense mechanisms. Accurate identification of T6SEs will help in understanding bacterial virulence mediated by type VI secretion systems, as well as bacterial pathogenesis more broadly. Although several traditional machine learning-based and deep learning-based tools have been developed to distinguish T6SEs from non-T6SEs, we believe there is still room for improvement. To obtain the robust […]
- by Lacroix, A. C., Armstrong, G. A. B. Zebrafish (Danio rerio) are a model organism used for the study of vertebrate development, disease and drug discovery. Two-day-old larval zebrafish exhibit burst swimming behaviour that can be elicited by a light touch to the tail. Larval motor touch-responses are frequently video recorded and later analyzed. Existing approaches to analyzing these videos rely on manual tracking, which is prone to experimenter bias and error and makes robust, reproducible and time-efficient analysis difficult. Here we present ZebraTrack, a machine learning-based program, […]
- by Ravikumar, V., Kulkarni, R., Maddox, A., Rao, A., Al-Holou, W. Glioblastoma (GBM) is the most common and lethal primary malignant tumor of the central nervous system. Advances in therapy are hindered by the complex intratumoral heterogeneity of GBM, where distinct malignant and non-malignant cellular states and interactions exist in spatially defined niches of the tumor microenvironment (TME), shaping both tumor behavior and treatment response. In this work, we define GBM biological reprogramming, TME recomposition, and cell-cell interactions in relation to spatially well-defined Ivy Glioblastoma Atlas Project regions. Further, we […]
- by Correa Rojo, A., Moreau, Y., Ertaylan, G. The growing use of synthetic genomic data promises broader data access but raises unresolved concerns about privacy risk. We introduce PRISM-G, a model-agnostic framework that summarizes privacy exposure of synthetic genomes across three complementary components: (i) a proximity view that asks whether synthetic individuals lie unusually close to real genomes in genetic-coordinate space; (ii) a kinship view that detects replay of familial or population-structure patterns beyond what is expected by chance; and (iii) a trait-linked view that captures exposure through […]
- by Ben Aribi, H., Naitore, C., Ayadi, F., Guerbouj, S., Awe, O. I. Identifying differentially expressed genes associated with genetic pathologies is crucial to understanding the biological differences between healthy and diseased states and identifying potential biomarkers and therapeutic targets. However, gene expression profiles are controlled by various mechanisms, including epigenomic changes such as DNA methylation, histone modifications, and microRNA-mediated silencing. We developed a novel Shiny application for identifying and correlating transcriptomic and epigenomic changes using a combination of Bioconductor and CRAN packages. The developed package, named EMImR, is a user-friendly tool […]
- by Choi, J. M., Zhang, L. Human cancer is highly heterogeneous, resulting in variable drug resistance and clinical outcomes. This complexity hinders accurate prognosis prediction and the development of targeted therapies. Molecular subtyping addresses these challenges by grouping cancers into more homogeneous subsets based on molecular characteristics, enabling subtype-specific treatment strategies. Subtyping is crucial for early diagnosis, personalized therapy, and improved survival by capturing differential therapeutic responses. Existing approaches to cancer subtyping fall into supervised and unsupervised categories. Supervised methods, often trained on The Cancer Genome […]
- by Smith, T. Q., Rahman, A., Szpiech, Z. A. Summary: We introduce Empirical Genotype Generalizer for Samples (EGGS), which accepts empirical genotypes with missing data and replicates the underlying dispersal and distribution of missing genotypes in other replicates. In addition, EGGS can remove phase, remove polarization, simulate deamination, create pseudohaploids, and convert between VCF, ms-style replicates, and EIGENSTRAT. Availability and Implementation: EGGS is written in the C programming language. Precompiled executables, source code, and the manual are available at https://github.com/TQ-Smith/EGGS
- by Jiang, X., Christian, L., Xi, Y., Zhou, L., Ahmed, A., Zheng, X., Unen, N. V., Neubert, L., Dietrich, J., Kamp, J. C., Kayser, M., Kuehnel, M., Hohlfeld, J., Slevogt, H., Hoeper, M. M., Kaminski, N., Welte, T., Fuge, J., Ringshausen, F. C., Homer, R. J., Jonigk, D. D., Xu, C.-J., Schupp, J. C., Li, Y. Granulomas are the hallmark of mycobacterial (MB) infections, forming structured immune environments that contain bacteria but also drive disease persistence. However, their spatial and functional organization remains unclear. Using spatial RNA sequencing on 38 patient samples, we identified five distinct granuloma niches: a necrotic core, an immune-activated inner niche, an inflammatory and extracellular matrix (ECM)-remodeling middle niche, an outer structural niche, and a tertiary lymphoid structure niche supporting antigen presentation. Immune activity peaks in the inner niche, transitioning to […]
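The one-to-one restriction mentioned in the Kuang et al. entry above can be illustrated with a minimal Python sketch. It assumes the homology table has already been reduced to (species A gene, species B gene) pairs; the function name `one_to_one_pairs` and the gene identifiers are hypothetical, and this only shows the baseline filter, not the protein language model approach the entry goes on to introduce.

```python
from collections import Counter


def one_to_one_pairs(pairs):
    """Keep only homology pairs in which both genes occur exactly once.

    `pairs` is an iterable of (gene_in_species_a, gene_in_species_b) tuples,
    e.g. rows exported from a homology table (assumed input format).
    """
    pairs = list(dict.fromkeys(pairs))  # drop exact duplicate rows, keep order
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # A pair survives only if neither gene takes part in any other pair.
    return [(a, b) for a, b in pairs if count_a[a] == 1 and count_b[b] == 1]


# Hypothetical rows: "a2" maps to two genes, and "B4" receives two genes.
rows = [("a1", "B1"), ("a2", "B2"), ("a2", "B3"), ("a3", "B4"), ("a4", "B4")]
print(one_to_one_pairs(rows))  # [('a1', 'B1')]
```

Every gene involved in a one-to-many or many-to-many relationship is discarded together with all of its partners, which is why this filter shrinks the usable set of homology pairs.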
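The barcode-shift problem described in the Wragg et al. entry above can be made concrete with a small sketch of position-tolerant matching: rather than reading a barcode only at its expected offset, scan a few positions on either side and accept the first offset that gives a unique whitelist hit within a mismatch budget. This is an illustrative Python sketch under stated assumptions (a single fixed barcode length, Hamming distance, a hypothetical `match_barcode` helper with `max_shift` and `max_mismatch` parameters), not the authors' tool.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read, start, whitelist, max_shift=2, max_mismatch=1):
    """Assign a barcode by scanning offsets around an expected start position.

    Returns (barcode, shift) for the first offset with a unique whitelist hit
    within `max_mismatch`, trying small shifts first; None if nothing matches.
    Assumes every whitelist barcode has the same length.
    """
    length = len(next(iter(whitelist)))
    shifts = [0]
    for s in range(1, max_shift + 1):
        shifts += [-s, s]  # order: 0, -1, +1, -2, +2, ...
    for shift in shifts:
        pos = start + shift
        if pos < 0 or pos + length > len(read):
            continue  # window would fall outside the read
        window = read[pos:pos + length]
        hits = [bc for bc in whitelist if hamming(window, bc) <= max_mismatch]
        if len(hits) == 1:
            return hits[0], shift
    return None


# The barcode was expected at position 1 but actually starts at position 2.
print(match_barcode("NNACGTACGGG", 1, {"ACGTAC", "TTGCAA"}))  # ('ACGTAC', 1)
```

Trying the unshifted position first and then widening the search outward prefers the expected layout, and requiring a unique hit avoids assigning reads that are ambiguous between two barcodes.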
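The Zaytsev et al. entry above defines Gene Space through frequencies of nucleotide tokens (individual nucleotides, codons, codon pairs). A minimal Python sketch of computing such token frequencies for a single coding sequence is shown below; the in-frame codon reading, the choice of consecutive in-frame codons as "codon pairs", and the `token_frequencies` helper are assumptions for illustration, and the excerpt does not say how these vectors are combined into the proposed integral genomic parameter.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]       # 64 codons
CODON_PAIRS = [a + b for a in CODONS for b in CODONS]         # 4096 pairs


def normalise(counts, keys):
    """Turn raw counts into relative frequencies over a fixed key order."""
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in keys]


def token_frequencies(cds):
    """Relative nucleotide, codon and codon-pair frequencies of one CDS.

    Codons are read in frame 0; a trailing partial codon is ignored, and any
    codon containing a non-ACGT character is skipped (assumed conventions).
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    valid = set(CODONS)
    nt_counts = Counter(b for b in cds if b in BASES)
    codon_counts = Counter(c for c in codons if c in valid)
    pair_counts = Counter(
        a + b for a, b in zip(codons, codons[1:]) if a in valid and b in valid
    )
    return (normalise(nt_counts, BASES),
            normalise(codon_counts, CODONS),
            normalise(pair_counts, CODON_PAIRS))


nt, codon, pair = token_frequencies("ATGAAATAA")
print(len(nt), len(codon), len(pair))  # 4 64 4096
```

Each sequence thus maps to a point in a fixed-dimensional space (4 + 64 + 4096 coordinates in this sketch), which is what makes sequences and genomes directly comparable.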
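One of the operations listed for EGGS in the Smith et al. entry above is creating pseudohaploids, where each diploid call is reduced to a single allele drawn at random, a common convention for low-coverage ancient DNA. EGGS itself is written in C; the Python sketch below only illustrates the concept, and its genotype encoding (0/1 allele tuples, `None` for missing) and the `pseudohaploidize` name are assumptions, not EGGS's interface.

```python
import random


def pseudohaploidize(genotypes, rng=None):
    """Collapse diploid genotypes to pseudohaploid calls.

    Each genotype is an (allele1, allele2) tuple coded 0/1, or None if
    missing. One allele is drawn uniformly at random per called site;
    sites with any missing allele stay missing, preserving missingness.
    """
    if rng is None:
        rng = random.Random()
    calls = []
    for g in genotypes:
        if g is None or None in g:
            calls.append(None)
        else:
            calls.append(rng.choice(g))
    return calls


sites = [(0, 0), (1, 1), (0, 1), None, (1, None)]
print(pseudohaploidize(sites, random.Random(42)))
```

Homozygous sites are unchanged, heterozygous sites become 0 or 1 with equal probability, and passing a seeded `random.Random` makes the output reproducible.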