- by Tao, S., Feng, Y., Yang, Y., Wu, M., Zheng, J. Synthetic lethality (SL) offers a promising approach for targeted cancer therapies. Current SL prediction models rely heavily on extensive labeled data for specific cell lines to accurately identify SL pairs. However, a major limitation is the scarcity of SL labels across most cell lines, which makes it challenging to predict SL pairs for target cell lines with limited or even no available labels in real-world scenarios. Furthermore, gene interactions could be opposite between training and test cell lines, i.e. SL […]
- by Yarnes, S. C., Palladino, N., Meidlinger, D. J., Philips, D. R., Sweeney, H. M., Mustafa, S., Mandych, M. L., Bouabane, S., Parsons, T. E., Slonecki, T. J., Zhao, D., Rife, T. W., Ellerbrock, B. J., Courtney, C., Selby, P., Flores-Gonzalez, M., Mueller, L. A., Aparicio, J. S., Al-Sham'aa, K., Raubach, S., Beil, C. T., Sheehan, M. J., Meng Lin, Edwin Reidel, Tyr Wiesner-Hanks, Arlyn J Ackerman, Alexandra M Casa, Alexander M Sandercoc […] DeltaBreed is a unified breeding data management system designed by Breeding Insight (BI, Cornell University) to serve the wide diversity of USDA-ARS specialty crop and animal breeding programs. DeltaBreed has a RESTful microservice architecture that utilizes the BrAPI Java Test Server as its primary database. The system is interoperable with many BrAPI-compliant applications (BrApps), including Field Book, and is continually aligned with the most recent BrAPI specifications. Here, we describe the features of DeltaBreed and provide several interoperability test cases […]
- by Sahu, V. K., Sand, A., Ballav, S., Raman, V., Nagar, S., Ranjan, A., Basu, S. Systematic Interaction Evaluation and Virtual Enhancement Analysis Interface (SieveAI) is an automated drug discovery pipeline developed to enhance the efficiency of virtual screening and computer-aided drug discovery processes. The molecular docking workflow encompasses acquiring, modeling, and pre-processing of molecular structure files, conducting docking with various algorithms, and subsequent analysis and interpretation of the outcomes by visualising or tabulating the results. While several open-source software tools are available to assist these operations at different steps of molecular docking, they often necessitate […]
- by Grein, S., Elschner, T., Kardinal, R., Bruder, J., Strohmeyer, A., Gunasekaran, K., Witt, J., Hermannsdottir, H., Behrens, J., U-Din, M., Yu, J., Heldmaier, G., Schreiber, R., Rozman, J., Heine, M., Scheja, L., Worthmann, A., Heeren, J., Wachten, D., Wilhelm-Juengling, K., Pfeifer, A., Hasenauer, J., Klingenspor, M. Indirect calorimetry is a cornerstone technique for metabolic phenotyping of animal models in preclinical research, with well-established experimental protocols and platforms. However, a flexible, extensible, and user-friendly software suite that enables standardized integration of data and metadata from diverse metabolic phenotyping platforms – followed by unified statistical analysis and visualization – remains absent. We present Shiny-Calorie, an open-source interactive web application for transparent data and metadata integration, comprehensive statistical data analysis, and visualization of indirect calorimetry datasets. Shiny-Calorie is compatible […]
- by Ryan, W. G., Eby, H. M., Bearss, N. R., Imami, A. S., Hamoud, A.-r., Pulvender, P., Bollinger, J. L., Wohleb, E. S., McCullumsmith, R. E. Protein kinases are central to healthy brain function, regulating critical cellular processes through complex signaling networks. However, understanding differences in kinase signaling of brain cells remains a preeminent challenge of neuroscience. This study aimed to characterize kinase pathways enriched in astrocytes and microglia isolated from male and female murine prefrontal cortex. Using the PamGene PamStation®12 platform, we discovered cell-type-specific kinomic profiles and computationally reconstructed each cell type's unique active signaling protein-protein interaction network. Notably, our analysis revealed minimal overlap between […]
- by Candir, E. B., Kuru, H. I., Rattray, M., Cicek, A. E., Tastan, O. Combinatorial drug therapy holds great promise for tackling complex diseases, but the vast number of possible drug combinations makes exhaustive experimental testing infeasible. Computational models have been developed to guide experimental screens by assigning synergy scores to drug pair-cell line combinations, taking as input structural and chemical information on drugs and molecular features of cell lines. The premise of these models is that they leverage this biological and chemical information to predict synergy measurements. In this study, we demonstrate […]
- by Forsgren, E., Rietdijk, J., Holmberg, D., Johansson, M. M., Carreras-Puigvert, J., Trygg, J., Lovell, G., Spjuth, O., Jonsson, P. Morphological profiling is a powerful method for identifying the modes of action (MOAs) of chemical compounds. However, most approaches rely on fixed-cell assays that capture only a single time point, missing the dynamic nature of cellular responses. While live-cell imaging captures these temporal effects, its potential for MOA classification using high-dimensional features remains unexplored. We strategically convert large-scale time-series image data (>82,000 images) into interpretable phenotypic trajectories and show that label-free live-cell imaging captures meaningful temporal phenotypic signatures that improve […]
- by Acquaye, F. L. N. A., Wen, B., Grant, C. E., Noble, W. S., Kertesz-Farkas, A. Ultimately, most tandem mass spectrometry (MS/MS) proteomics experiments aim to not just detect but also quantify the proteins in a given complex sample. Here, we describe an extension to the Crux MS/MS analysis toolkit to enable label-free quantification of peptides. We demonstrate that Crux's new quantification command, which is modeled after the algorithms implemented in the widely used FlashLFQ software, is both efficient and accurate. In particular, we achieve a 1.9-fold speedup while reducing the memory usage by 26%. The […]
- by Benoit, G., James, R., Raguideau, S., Alabone, G., Goodall, T., Chikhi, R., Quince, C. Third-generation long-read sequencing technologies have been shown to significantly enhance the quality of metagenome assemblies. The results obtained using the highly accurate reads generated by PacBio HiFi have been particularly notable, yielding hundreds of circularized, complete genomes as metagenome-assembled genomes (MAGs) without manual intervention. Oxford Nanopore Technologies (ONT) has recently improved the accuracy of its sequencing reads, achieving a per-base error rate of approximately 1-2%. Given the high throughput, convenience, and low cost of ONT sequencing, this could accelerate the uptake of […]
- by Trigodet, F., Sachdeva, R., Banfield, J. F., Eren, A. M. Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms that take advantage of increasingly available long-read sequencing technologies bring the recovery of complete genomes directly from metagenomes within reach. However, assessing the accuracy of the assembled long reads, especially from complex environments that often include poorly studied organisms, poses remarkable challenges. Here we show that erroneous reporting is pervasive among long-read assemblers and can take many […]
- by Yan, J., Zhu, J., Yang, Y., Liu, Q., Zhang, K., Zhang, Z., Liu, X., Zhang, B., Gao, K., Xiao, J., Chen, E. Protein-ligand bioactivity data published in literature are essential for drug discovery, yet manual curation struggles to keep pace with rapidly growing literature. Automated bioactivity extraction is challenging due to the multi-modal distribution of information (text, tables, figures, structures) and the complexity of chemical representations (e.g., Markush structures). Furthermore, the lack of standardized benchmarks impedes the evaluation and development of extraction methods. In this work, we introduce BioMiner, a multi-modal system designed to automatically extract protein-ligand bioactivity data from thousands to […]
- by Hobby, D., Lindner, R., Mbebi, A. J., Tong, H., Nikoloski, Z. The ability to accurately predict multiple growth-related traits over plant developmental trajectories has the potential to revolutionize crop breeding and precision agriculture. Despite increased availability of time-resolved data for multiple traits from high-throughput phenotyping platforms of model plants and crops, genomic prediction is largely applied to a small number of traits, often neglecting their dynamics. Here, we compared and contrasted the performance of MegaLMM and dynamicGP as well as their hybrid variants that can handle high-dimensional temporal data for multi-trait genomic […]
- by Soleymani, S., Gravel, N. M., Kochut, K., Kannan, N. The integration of large language models (LLMs) with knowledge graphs (KGs) holds significant potential for simplifying the process of querying graph databases, especially for non-technical users. KGs provide a structured representation of domain-specific data, enabling rich and precise information retrieval. However, the complexity of graph query languages, such as Cypher, presents a barrier to their effective use by non-experts. This research addresses the challenge by proposing a novel approach, Prompt2Cypher (P2C), which leverages task splitting and prompt engineering to decompose […]
- by Majidifar, S., Hooshmand, M. Computational drug repurposing is vital in drug discovery research because it significantly reduces both the cost and time involved in the drug development process. Additionally, combination therapy, which uses more than one drug for treatment, can enhance efficacy and minimize the side effects associated with individual drugs. However, there is currently limited research focused on computational approaches to combination therapy for viral diseases. This paper proposes AI-based models to predict novel drug combinations that can synergistically treat viral diseases. To achieve this, we […]
- by Jiang, Z., Pan, W., Gao, R., Hu, H., Gao, W., Zhou, M., Yin, Y.-H., Qian, Z., Jin, S., Wang, G. Population genomics using short-read resequencing captures single nucleotide polymorphisms and small insertions and deletions but struggles with structural variants (SVs), leading to a loss of heritability in genome-wide association studies. In recent years, long-read sequencing has improved pangenome construction for key eukaryotic species, addressing this issue to some extent. Sufficient-coverage high-fidelity (HiFi) data for population genomics is often prohibitively expensive, limiting its use in large-scale populations and broader eukaryotic species and creating an urgent need for robust ultra-low coverage assemblies. […]
- by Zerefa, S., Cool, J., Singh, P., Petti, S. Recent advancements in protein structure prediction methods have vastly increased the size of databases of protein structures, necessitating fast methods for protein structure comparison. Search methods that find structurally similar proteins can be applied to find remote homologs, study the functional relationships among proteins, and aid in protein engineering tasks. The structure comparison method Foldseek represents each protein structure as a sequence of "3Di" characters and uses highly optimized sequence comparison software to search with this alphabet. An alternate alphabet […]
- by Fu, M. P., Edwards, K., Navarro-Delgado, E. I., Merrill, S. M., Kitaba, N. T., Konwar, C., Mandhane, P., Simons, E., Subbarao, P., Moraes, T. J., Holloway, J. W., Turvey, S. E., Kobor, M. S. Prospective birth cohorts offer the potential to interrogate the relation between early life environment and embedded biological processes such as DNA methylation (DNAme). These association studies are frequently conducted in the context of blood, a heterogeneous tissue composed of diverse cell types. Accounting for this cellular heterogeneity across samples is essential, as it is a main contributor to inter-individual DNAme variation. Integrated blood cell deconvolution of pediatric and longitudinal birth cohorts poses a major challenge, as existing methods fail to […]
- by Ulusoy, E., Dogan, T. Motivation: The rapid accumulation of protein sequence data, coupled with the slow pace of experimental annotations, creates a critical need for computational methods to predict protein functions. Existing models often rely on limited data types, such as sequence-based features or protein-protein interactions (PPIs), failing to capture the complex molecular relationships in biological systems. To address this, we developed ProtHGT, a heterogeneous graph transformer-based model that integrates diverse biological datasets into a unified framework using knowledge graphs for accurate and interpretable […]
- by Shi, X., Ramathal, C., Dezso, Z. Large-scale multiplexed drug screening platforms such as PRISM and GDSC facilitate the screening of drug treatments across more than 1,000 cancer cell lines. These cell lines are well characterized by multiomics screening in CCLE and DepMap, enabling the application of AI and machine learning techniques to study the association between drug sensitivity and the underlying molecular profiles. The large scale and variety of data modalities enabled us to build an interpretable deep learning framework, INSIGHT, integrating the multiomics data and the […]
- by Beasley, J.-M., Schatz, K., Ding, E., DeLuca, M., Abu Zaid, N., Tucker, N., Chirkova, R., Crona, D., Tropsha, A., Muratov, E. The identification of therapeutic protein targets is fundamental to the success of drug development and repurposing. Traditional approaches for target selection require extensive preclinical evaluation for toxicity and efficacy, making the process time-intensive and resource-heavy. Computational tools that efficiently prioritize and validate novel targets are needed to streamline drug discovery workflows. To address this gap, we developed TARRAGON: Therapeutic Target Applicability Ranking and Retrieval-Augmented Generation Over Networks, a computational framework that integrates data mining and machine learning to identify, rank, […]