- by Pirrotta, S., Bonora, M., Calura, E. Mitochondria are dynamic organelles that play crucial roles in energy transformation, biosynthesis, and cellular signaling. They actively process biological information, detecting and reacting to both internal and external stimuli. Through intricate physical interactions and diffusion mechanisms within cellular networks, mitochondria integrate diverse inputs and generate signals that finely adjust cellular functions and overall physiology. As a result, the phenotypic expressions of impaired mitochondrial function can exhibit high variability. High-throughput transcriptomic data can capture these changes, but traditional pathway analyses performed […]
- by Wang, X., Zhang, H., Huang, J., Qin, Z. We present MAPPE, a novel algorithm integrating a k-nearest neighbor (KNN) similarity network with co-occurrence matrix analysis to extract evolutionary insights from protein language model (PLM) embeddings. The KNN network captures diverse evolutionary relationships and events, while the co-occurrence matrix identifies directional evolutionary paths and potential signals of gene transfer. MAPPE overcomes the limitations of traditional sequence alignment methods in detecting structural homology and functional associations in low-similarity protein sequences. By employing sliding windows of varying sizes, it analyzes embeddings […]
- by Dabbaghie, F. The representation of genomes and genomic sequences through graph structures has undergone a period of rapid development in recent years, particularly to accommodate the growing size of genome sequences that are being produced. Genome graphs have been employed extensively for a variety of purposes, including assembly, variance detection, visualization, alignment, and pangenomics. Many tools have been developed to work with and manipulate such graphs. However, the majority of these tools tend to load the complete graph into memory, which results […]
- by Asghari, M., Sabo, A. R., Barwinska, D., Melo Ferreira, R., Ferkowicz, M., Bowen, W., Cheng, Y., Gisch, D., Gulbronson, C., Phillips, C. L., Kelly, K. J., Sutton, T. A., Williams, J., Vazquez, M., O'Toole, J., Palevsky, P., Rosas, S., Waikar, S. S., Kiryluk, K., Parikh, C., Hodgins, J., Sarder, P., De Boer, I., Himmelfarb, J., Kretzler, M., Kidney Precision Medicine Project, Jain, S., Eadon, M., Winfree, S., El-Achkar, T. M., Dagher, P. C. The organizational principles of nephronal segments are based on longstanding anatomical and physiological attributes that are closely linked to the homeostatic functions of the kidney. Novel molecular approaches have recently uncovered layers of deeper signatures and states in tubular cells that arise at various timepoints on the spectrum between health and disease. For example, a dedifferentiated state of proximal tubular cells with mesenchymal stemness markers is frequently seen after injury. The persistence of such a state is associated with failed […]
- by Chen, H. The identification of an effective inhibitor is an essential starting point in drug discovery. Unfortunately, many issues arise with conventional high-throughput screening methods. Thus, new strategies are needed to filter through large compound screening libraries to create target-focused, smaller libraries. Effective computational methods in this respect have emerged in the past decade or so; among these methods is machine learning. Herein, we explore an ensemble Deep Learning model trained on MAPKAPK2 bioactivity data. This ensemble ML model consists of ten […]
- by Mayer, J. G., Delgoffe, B., Hebbring, S. Family data is a valuable data source in bioinformatic research. This is because family members often share common genetic and environmental exposures. Collecting this family data is traditionally very labor intensive, but advances in electronic health record (EHR) data mining have proven useful when identifying pedigrees linked to longitudinal health histories. These are called e-pedigrees. Unfortunately, e-pedigrees tend to miss the oldest generations, who inherently have the longest and richest health histories. A good source of family data from older […]
- by Gambardella, G. Despite the complementary strengths of short- and long-read sequencing approaches, variant calling methods still rely on a single data type. Here, leveraging harmonized Nanopore and Illumina data from the Genome in a Bottle Consortium, we explore the benefits of combining Illumina and Nanopore data through a hybrid approach based on deep learning. By reducing sequencing costs, we show that a shallow hybrid sequencing approach can improve germline variant detection accuracy. Our findings offer promising potential for molecular diagnostics in clinical […]
- by Zhang, C., Xu, Z., Lin, K., Zhang, C., Xu, W., Duan, H. Cyclic peptides are potentially therapeutic in clinical applications, due to their great stability and activity. Yet, designing and identifying potential cyclic peptide binders for specific targets remains a formidable challenge, entailing significant time and resources. In this study, we modified the powerful RFdiffusion model to allow cyclic peptide structure identification and integrated it with ProteinMPNN and HighFold to design binders for specific targets. This innovative approach, termed cycledesigner, was followed by a series of scoring functions that efficiently screen. […]
- by Haghani, M., Bhattacharya, D., Murali, T. M. Summary: A Multiple Sequence Alignment (MSA) contains fundamental evolutionary information that is useful in the prediction of structure and function of proteins and nucleic acids. The "Number of Effective Sequences" (NEFF) quantifies the diversity of sequences of an MSA. Several tools can compute the NEFF of an MSA, each offering various options. NEFFy is the first software package to integrate all these options and calculate NEFF across diverse MSA formats for proteins, RNAs, and DNAs. It surpasses existing tools in […]
- by Hall, M. B., Coin, L. J. M. Summary: Accurate genome size estimation is an important component of genomic analyses, though existing tools are primarily optimised for short-read data. We present LRGE, a novel tool that uses read-to-read overlap information to estimate genome size in a reference-free manner. LRGE calculates per-read genome size estimates by analysing the expected number of overlaps for each read, considering read lengths and a minimum overlap threshold. The final size is taken as the median of these estimates, ensuring robustness to outliers such […]
- by Chowdhury, M. Z. U. S., Any, S. S., Samee, M. A. H., Rahman, A. Gene regulatory networks are crucial for cellular function, and disruptions in transcription factor (TF) regulation often lead to diseases. However, identifying TFs to transition a source cell state to a desired target state remains challenging. We present a method to identify key TFs whose perturbation can restore gene expressions in a source state to target levels. Its effectiveness is demonstrated on datasets from yeast TF knockouts, cardiomyocytes from hypoplastic left heart syndrome patients, and mouse models of neurodegeneration. The method […]
- by Singh, N. P., Khan, J., Patro, R. Ultrafast mapping of short reads to transcriptomic and metagenomic references via lightweight mapping techniques such as pseudoalignment has demonstrated success in substantially accelerating several types of analyses without much loss in accuracy compared to alignment-based approaches. The application of pseudoalignment to large reference sequences – like the genome – is, however, not trivial, due to the large size of the references or "targets" (i.e. chromosomes) and the presence of repetitive sequences within an individual reference sequence. This can lead to […]
- by Adjavon, D.-Y., Eckstein, N., Bates, A. S., Jefferis, G. S. X. E., Funke, J. We address the problem of explaining the decision process of deep neural network classifiers on images, which is of particular importance in biomedical datasets where class-relevant differences are not always obvious to a human observer. Our proposed solution, termed quantitative attribution with counterfactuals (QuAC), generates visual explanations that highlight class-relevant differences by attributing the classifier decision to changes of visual features in small parts of an image. To that end, we train a separate network to generate counterfactual images (i.e., […]
- by Ayala Montano, S., Afolayan, A. O., Kociurzynski, R., Loeber, U., Reuter, S. Metagenomic sequencing has revolutionized our understanding of microbial communities, but the presence of contaminant DNA, particularly from the host, poses a significant challenge to accurate data interpretation. We present a methodology for the detection and removal of contaminant sequences in metagenomic datasets, focusing on microbial DNA as a primary contaminant. By integrating metrics such as the prevalence method proposed in decontam, and the coverage value per species per sample, we contributed to the remaining challenge of microbial contaminants that mislead […]
- by Ertelt, M., Schlegel, P., Beining, M., Kaysser, L., Meiler, J., Schoeder, C. T. Stability is a key factor to enable the use of recombinant proteins in therapeutic or biotechnological applications. Deep learning protein design approaches like ProteinMPNN have shown strong performance both in creating novel proteins and in stabilizing existing ones. However, it is unlikely that the stability of the designs will significantly exceed that of the natural proteins in the training set, which are biophysically only marginally stable. Therefore, we collected predicted protein structures from hyperthermophiles, which differ substantially in their amino acid […]
- by Kamal, R., Narayanan, M. Genome-wide association studies (GWAS) aimed at estimating the disease risk of genetic factors have long focused on homogeneous Caucasian populations, at the expense of other understudied non-Caucasian populations. Therefore, active efforts are underway to understand the differences and commonalities in exhibited disease risk across different populations or ethnicities. There is, consequently, a pressing need for computational methods that efficiently exploit these population-specific vs. shared aspects of the genotype-phenotype relation. We propose MultiPopPred, a novel trans-ethnic polygenic risk score […]
- by Vaughan, T. G., Stadler, T. Phylodynamic methods provide a coherent framework for the inference of population parameters directly from genetic data. They are an important tool for understanding both the spread of epidemics and long-term macroevolutionary trends in speciation and extinction. In particular, phylodynamic methods based on multi-type birth-death models have been used to infer the evolution of discrete traits, the movement of individuals or pathogens between geographic locations or host types, and the transition of infected individuals between disease stages. In these […]
- by Lim, H., Jun, S., Kim, T., Lee, J. H., Bang, D. Molecular barcoding methods enable high-sensitivity detection of circulating tumor DNA that is rarely present in liquid biopsy samples. Many methods involve ligation of molecular barcodes to DNA prior to hybridization capture, enabling recovery of starting molecules. Development of polymerase chain reaction (PCR)-based methods could facilitate more cost- and labor-effective detection; however, tracking molecular identity can be difficult, as new barcodes overwrite old barcodes in each cycle. We developed a sensitive genotyping method based on a peer-to-peer network-derived identifier for […]
- by Johnson, L. F., Casaletto, J. A., Costes, S. V., Proctor, C. R., Sanders, L. M. The genetic perturbations caused by spaceflight on biological systems tend to have a system-wide effect, which is often difficult to deconvolute into individual signals with specific points of origin. Single-cell multi-omic data can provide a profile of the perturbational effects but does not necessarily indicate the initial point of interference within a network. The objective of this project is to take advantage of large-scale and genome-wide perturbational or Perturb-Seq datasets by using them to pre-train a generalist machine […]
- by Ben Mariem, O., Coppi, L., De Fabiani, E., Eberini, I., Crestani, M. Neuronatin (NNAT) is a small transmembrane protein involved in a wide range of physiological processes, such as white adipose tissue browning and neuronal plasticity, as well as pathological ones, such as Lafora disease caused by the formation of NNAT aggregates. However, its 3D structure is unknown, and its mechanism of action is still unclear. In this study, the two most well-known NNAT isoforms (α and β) were modelled and the interaction with the SERCA2b calcium pump was assessed using computational methods. […]