BioRxiv Bioinformatics – Mass Spectrometry Blog

Improving Drug Sensitivity Prediction and Inference by Multitask Learning
May 13, 2024 by Strauch, J., Asiaee, A.
The development of models to predict sensitivity to anticancer drugs is an area of significant interest, given the diverse responses to treatment among patients and the considerable expense and time involved in anticancer drug development. Leveraging "omic" data and anticancer response information from the Cancer Cell Line Encyclopedia, we propose a novel approach utilizing multitask learning to enhance prediction accuracy and inference. We extended a multitask learning framework called the Data Shared Lasso to develop the Data Shared Elastic Net. […]
stDyer enables spatial domain clustering with dynamic graph embedding
May 12, 2024 by Zhang, L., Xu, K., Xu, Y., Wang, Z.
Spatial transcriptomics data provide insights into gene expression patterns within tissue contexts, where identifying spatial domains with similar gene expression is crucial. Traditional clustering methods for spatial domain clustering often neglect spatial continuity, resulting in disjointed domains. Although recent computational approaches have integrated spatial information, they face limitations in recognizing domain boundaries, scalability, and the need for independent clustering steps. Here, we introduce stDyer, a novel end-to-end deep learning framework for spatial domain clustering in ST data. stDyer utilizes a […]
Standardized Nomenclature and Reporting for PacBio HiFi Sequencing and Analysis of rAAV Gene Therapy Vectors
May 12, 2024 by Talevich, E., Tseng, E., Diallo, A., Sellami, N., Elliott, A., Cantarel, B. L., Tonthat, N., Chatterjee, P., Tai, P. W. L., Aldridge, C.
Despite recombinant adeno-associated viruses (rAAVs) being the leading platform for gene therapy, there is a lack of standardized computational analysis methods and reporting to assess the contents of each capsid through long-read sequencing. PacBio's highly accurate long-read HiFi sequencing enables comprehensive characterization of AAV genomes but requires bioinformatics expertise for analyzing, interpreting and comparing the results. To address this need and improve the understanding of functional viral payloads, our working group established standardized nomenclature and reporting for long-read sequencing data […]
Draft genome analysis of Delftia tsuruhatensis IICT-RSP4, a strain with uricase potential isolated from soil
May 12, 2024 by Prakasham, R. S., Anumalla, M., Batchu, U. R., Bhukya, B.
Delftia tsuruhatensis IICT-RSP4, a uricase-producing bacterium, was isolated from soil using the i-chip method and characterized. Here, we report the draft genome sequence of D. tsuruhatensis IICT-RSP4. The genome comprises 6,627,718 bp (6.6 Mb) with a GC content of 66.6%, 7 protein encoding genes, 346 sub-systems with 6165 coding sequences and 112 RNAs. The genome revealed five functional secondary metabolite biosynthetic gene clusters viz. terpene, resorcinol, NRP+PKS, T2PKS, and RiPPS related to antimicrobial, anticancer and antimalarial functionality. In […]
Predicting transcriptional activation domain function using Graph Neural Networks
May 12, 2024 by Farheen, F., Broyles, B. K., Zhang, Y., Ibtehaz, N., Erkine, A. M., Kihara, D.
Analysis of factors that lead to the functionality of transcriptional activation domains remains a crucial and yet challenging task owing to the significant diversity in their sequences and their intrinsically disordered nature. Almost all existing methods that have aimed to predict activation domains have involved traditional machine learning approaches, such as logistic regression, that are unable to capture complex patterns in data or plain convolutional neural networks and have been limited in exploration of structural features. However, there is a […]
The Continuous Language of Protein Structure
May 11, 2024 by Billera, L., Oresten, A., Stalmarck, A., Sato, K., Kaduk, M., Murrell, B.
Just as language is composed of sublexical tokens that combine to form words, sentences, and paragraphs, protein backbones are composed of sub-structural elements that combine to form helices, sheets, folds, domains, and chains. Autoregressive language models operate on discrete tokens, whereas protein structure is inherently continuous, and generative approaches to protein design have borrowed more from image generation than language modeling. But autoregressive models do not inherently require their inputs and outputs to be discrete. Here we describe a generative […]
Guide assignment in single-cell CRISPR screens using crispat
May 10, 2024 by Braunger, J. M., Velten, B.
Pooled single cell CRISPR screens have emerged as a powerful tool in functional genomics to probe the effect of genetic interventions at scale. A crucial step in the analysis of the resulting data is the assignment of cells to gRNAs corresponding to a specific genetic intervention. However, this step is challenging due to a lack of systematic benchmarks and accessible software to apply and compare different guide assignment strategies. To address this, we here propose crispat (CRISPR guide assignment tool), […]
Spatial domains identification in spatial transcriptomics by domain knowledge-aware and subspace-enhanced graph contrastive learning
May 10, 2024 by Gui, Y., Li, C., Xu, Y.
Spatial transcriptomics (ST) technologies have emerged as an effective tool to identify the spatial architecture of the tissue, facilitating a comprehensive understanding of organ function and tissue microenvironment. Spatial domain identification is the first and most critical step in ST data analysis, which requires thoughtful utilization of tissue microenvironment and morphological priors. To this end, we propose a graph contrastive learning framework, GRAS4T, which combines contrastive learning and subspace module to accurately distinguish different spatial domains by capturing tissue microenvironment […]
Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin classification of variants
May 10, 2024 by Williams, C. M., O'Connell, J., Freyman, W. A., 23andMe Research Team, Gignoux, C. R., Ramachandran, S., Williams, A. L.
Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% […]
On the completeness of existing RNA fragment structures
May 10, 2024 by Hong, X., Zhan, J., Zhou, Y.
Success in protein structure prediction by the deep learning method AlphaFold 2 naturally gives rise to the question of whether we can do the same for RNA structure prediction. One reason for the success in protein structure prediction is that the structural space of proteins at the fragment level has been nearly complete for many years. Here, we examined the completeness of RNA fragment structural space at dimeric, trimeric, tetrameric, and pentameric levels. We showed that the RNA structural space is not […]
Mechanistic modeling of Mycobacterium tuberculosis β-carbonic anhydrase inhibitors using integrated systems biology and the QSAR approach
May 10, 2024 by Bhowmik, R., Manaithiya, A., Parkkinen, J., Kumar, S., Mathew, B., Parikka, M., Carta, F., Supuran, C. T., Parkkila, S., Aspatwar, A.
Mycobacterium tuberculosis (Mtb) β-carbonic anhydrases (β-CAs) are crucial enzymes responsible for regulating pH by catalyzing the conversion of CO2 to HCO3-, which is essential for its survival in acidic environments in the host. By inhibiting Mtb β-CAs, we can potentially discover new targets for anti-tuberculosis drugs with a different mechanism of action than existing FDA-approved drugs. This is crucial since Mtb has demonstrated the ability to develop different degrees of resistance to current drugs over time. This study employed machine […]
FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling
May 10, 2024 by Xiang, W., Xiong, Z., Huan, C., Xiong, J., Zhang, W., Fu, Z., Zheng, M., Liu, B., Shi, Q.
Assigning appropriate property labels, such as functional terms and catalytic activity, to proteins remains a significant challenge, particularly for the non-homologous ones. In contrast to prior approaches that mostly focused on protein sequence features, we employ a pretrained protein language model to encode the sequence features, and a natural language model for the semantic information of property descriptions. Specifically, we present FAPM, a contrastive model between natural language and protein sequence language, which combines the pretrained protein sequence model with the pretrained […]
Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat
May 10, 2024 by Li, Y., Chen, E., Xu, J., Zhang, W., Zeng, X., Liu, Y., Luo, X.
Error self-correction is a pivotal first step in the analysis of long-read sequencing data. However, most existing methods for this purpose are primarily tailored for noisy sequencing data with error rates exceeding 5%, often collapsing true variants in repeats and haplotypes. Alternatively, some methods are heavily optimized for PacBio HiFi reads, leaving a gap in methods specifically designed for Nanopore R10 reads basecalled with high accuracy or super accuracy models, which typically have error rates below 2%. Here, we introduce […]
DeepES: Deep learning-based enzyme screening to identify orphan enzyme genes
May 10, 2024 by Hirota, K., Salim, F., Yamada, T.
Progress in sequencing technology has led to determination of large numbers of protein sequences, and large enzyme databases are now available. Although many computational tools for enzyme annotation have been developed, sequence information is unavailable for many enzymes, known as orphan enzymes. These orphan enzymes hinder sequence similarity-based functional annotation, leading to gaps in understanding the association between sequences and enzymatic reactions. Therefore, we developed DeepES, a deep learning-based tool for enzyme screening to identify orphan enzyme genes, focusing on biosynthetic gene […]
Genome-Wide Identification and Expression Pattern of the N-acetylserotonin deacetylase (ASDAC) Gene Family in Orchidaceae
May 10, 2024 by Sun, E., Zhao, E., Li, Q., Lu, W., Li, Y., Yang, C., Chen, T., Mou, Z., Zhao, D.
Orchids are horticultural plants with high ornamental and medicinal value. N-acetylserotonin deacetylase (ASDAC) is the only reverse enzyme of the melatonin biosynthesis pathway, and plays an important role in regulating the balance of melatonin. Melatonin, as a multifunctional molecule, is typically involved in plant growth and development regulation, as well as abiotic stress tolerance. Here, we aimed at identifying ASDAC genes from the orchid genome to provide valuable information for further study of the role of melatonin […]
Predicting MHC-I ligands across alleles and species: How far can we go?
May 10, 2024 by Tadros, D. M., Racle, J., Gfeller, D.
CD8+ T-cell activation is initiated by the recognition of epitopes presented on class I major histocompatibility complex (MHC-I) molecules. Identifying such epitopes is useful for molecular understanding of cellular immune responses and can guide the development of personalized vaccines for various diseases including cancer. Here, we capitalize on high-quality MHC-I peptidomics data available from different species and an expanded architecture of our MHC-I ligand predictor (MixMHCpred) to carefully explore how much predictions can be extrapolated to MHC-I alleles without known […]
Spectral Cluster Supertree: fast and statistically robust merging of rooted phylogenetic trees
May 10, 2024 by McArthur, R. N., Zehmakan, A. N., Charleston, M. A., Huttley, G. A.
The algorithms for phylogenetic reconstruction are central to computational molecular evolution. The relentless pace of data acquisition has exposed their poor scalability and the conclusion that the conventional application of these methods is impractical and not justifiable from an energy usage perspective. Furthermore, the drive to improve the statistical performance of phylogenetic methods produces increasingly parameter-rich models of sequence evolution, which worsens the computational performance. Established theoretical and algorithmic results identify supertree methods as critical to divide-and-conquer strategies for improving […]
CELLama: Foundation Model for Single Cell and Spatial Transcriptomics by Cell Embedding Leveraging Language Model Abilities
May 10, 2024 by Choi, H., Park, J., Kim, S., Kim, J., Lee, D., Bae, S., Shin, H., Lee, D.
Large-scale single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have transformed biomedical research into a data-driven field, enabling the creation of comprehensive data atlases. These methodologies facilitate detailed understanding of biology and pathophysiology, aiding in the discovery of new therapeutic targets. However, the complexity and sheer volume of data from these technologies present analytical challenges, particularly in robust cell typing, integration and understanding complex spatial relationships of cells. To address these challenges, we developed CELLama (Cell Embedding Leverage Language Model […]
A method for filtering abnormal modified base calling in Oxford Nanopore Technologies sequencing
May 10, 2024 by Ma, X., Jia, R., Wang, Y., Ye, H.
Background: Gene synthesis sequencing using the long-read Oxford Nanopore Technologies (ONT) provides a cost-effective option for gene synthesis quality control. Despite the advantage of using long reads, however, accurate base calling is influenced by modified bases. Results: We introduce a method for filtering abnormal modified base calling in Oxford Nanopore Technologies sequencing. This method is based on the mapping results and performs an exact binomial test on the proportion of single base forward and reverse chain depth to determine the presence of […]
Statistical Modeling for MicroRNA Sequencing Data
May 10, 2024 by Jun, S.-H., McCall, M.
MicroRNAs play a central role in regulating gene expression and modulating diseases. Despite the importance of microRNAs, statistical methods for analyzing them have received far less attention compared to messenger RNAs. In fact, it is common practice to apply the methods developed for messenger RNA-seq data to analyze microRNA-seq data. This study critically examines and challenges the assumptions of messenger RNA-based methods when applied to microRNAs, highlighting the competitive nature of microRNA expression. We propose a […]
