Rapid Communications in Mass Spectrometry
Machine learning approach for the prediction of the number of Sulphur atoms in peptides using the theoretical aggregated isotope distribution
RATIONALE
The observed isotope distribution is an important attribute for the identification of peptides and proteins in mass spectrometry-based proteomics. Sulphur atoms have a very distinctive elemental isotope distribution, so their presence has a substantial effect on the isotope distribution of biomolecules. Knowledge of the number of Sulphur atoms can therefore improve the identification of peptides and proteins.
METHODS
In this paper, we conduct a theoretical investigation of the isotope properties of Sulphur-containing peptides. We propose a gradient boosting approach to predict the number of Sulphur atoms from the aggregated isotope distribution. We compare prediction accuracy and assess the predictive power of the features, using the mass and isotope abundance information from the first three, five, and eight aggregated isotope peaks.
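To make the abundance features concrete, the following is a minimal sketch, not the authors' implementation, of how an aggregated isotope distribution and the ratios of consecutive peaks could be computed. It assumes approximate natural isotope abundances, aggregates isotope variants by nominal mass offset, and uses two hypothetical elemental compositions that differ only by two Sulphur atoms. The function names and compositions are illustrative assumptions.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset
# from the lightest isotope (assumed values, rounded).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],  # 32S, 33S, 34S, (none), 36S
}


def convolve(a, b, n_peaks):
    """Convolve two distributions, keeping only the first n_peaks peaks."""
    out = [0.0] * min(n_peaks, len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def aggregated_distribution(composition, n_peaks=8):
    """Aggregated isotope distribution for a dict such as {"C": 50, "S": 2}."""
    dist = [1.0]
    for element, count in composition.items():
        base = ISOTOPES[element]
        power = [1.0]
        n = count
        # Raise the elemental distribution to the atom count by repeated squaring.
        while n > 0:
            if n & 1:
                power = convolve(power, base, n_peaks)
            base = convolve(base, base, n_peaks)
            n >>= 1
        dist = convolve(dist, power, n_peaks)
    return dist


def abundance_ratios(dist):
    """Ratios of consecutive aggregated peaks: second/first, third/second, ..."""
    return [dist[k + 1] / dist[k] for k in range(len(dist) - 1)]


# Hypothetical peptide-like compositions, identical except for two Sulphur atoms.
without_s = {"C": 50, "H": 80, "N": 14, "O": 15}
with_s = {"C": 50, "H": 80, "N": 14, "O": 15, "S": 2}

ratios_without_s = abundance_ratios(aggregated_distribution(without_s))
ratios_with_s = abundance_ratios(aggregated_distribution(with_s))

print("third/second peak ratio without S:", ratios_without_s[1])
print("third/second peak ratio with 2 S: ", ratios_with_s[1])
```

Because 34S is far more abundant than the heavy isotopes of the other elements, the third-to-second peak ratio rises when Sulphur is present. Ratios of this kind, together with peak masses, are the sort of input a gradient boosting model could be trained on.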
RESULTS
Mass features alone are not enough to accurately predict the number of Sulphur atoms. However, we reach near-perfect prediction when we include isotope abundance features. When using the first eight, five, and three aggregated isotope peaks, the most important abundance features are the abundance ratios of the eighth and seventh, the fifth and fourth, and the third and second peaks, respectively. Likewise, the most predictive mass features are the mass differences between the monoisotopic peak and the eighth, the fifth, and the third aggregated isotope peak, respectively.
CONCLUSIONS
Based on the validation analysis, we conclude that predicting the number of Sulphur atoms from the isotope profile fails because the isotope ratios are not measured accurately. These results indicate that future instrument development should focus more on improving spectral accuracy, so that the intensities of higher-order isotope peaks are measured more accurately.
Annelies Agten,
Jurgen Claesen,
Tomasz Burzykowski,
Dirk Valkenborg
February 17, 2023
https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/rcm.9480?af=R