Machine learning approach for the prediction of the number of Sulphur atoms in peptides using the theoretical aggregated isotope distribution

Rapid Communications in Mass Spectrometry

RATIONALE

The observed isotope distribution is an important attribute for the identification of peptides and proteins in mass spectrometry-based proteomics. Sulphur atoms have a very distinctive elemental isotope distribution (in particular, ³⁴S has a natural abundance of about 4%), so their presence has a substantial effect on the isotope distribution of biomolecules. Knowledge of the number of Sulphur atoms can therefore improve the identification of peptides and proteins.
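
To illustrate why Sulphur stands out, the following minimal Python sketch computes an aggregated isotope distribution by repeatedly convolving per-atom isotope distributions, where aggregated peak k collects all isotopologues carrying k extra neutrons. The abundance values are approximate natural abundances assumed for this sketch, and the helper names `convolve` and `aggregated_abundances` as well as the example formulas are hypothetical; this is not the authors' implementation.

```python
# Approximate natural isotope abundances (assumed values), indexed by the number of
# extra neutrons relative to the lightest isotope. Sulphur has no stable isotope
# with three extra neutrons, hence the 0.0 entry.
ISOTOPE_ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, n_peaks):
    """Convolve two neutron-offset distributions, keeping only the first n_peaks terms."""
    out = [0.0] * n_peaks
    for i, pa in enumerate(a[:n_peaks]):
        for j, pb in enumerate(b[: n_peaks - i]):
            out[i + j] += pa * pb
    return out


def aggregated_abundances(formula, n_peaks=5):
    """Hypothetical helper: abundances of the first n_peaks aggregated isotope peaks."""
    dist = [1.0] + [0.0] * (n_peaks - 1)  # start from an "empty" molecule
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPE_ABUNDANCE[element], n_peaks)
    return dist


# Two hypothetical formulas that differ only in their Sulphur content.
no_sulphur = {"C": 50, "H": 80, "N": 14, "O": 15}
two_sulphur = {"C": 50, "H": 80, "N": 14, "O": 15, "S": 2}
for label, formula in [("0 S", no_sulphur), ("2 S", two_sulphur)]:
    peaks = aggregated_abundances(formula)
    # peaks[0] is the monoisotopic (first) peak, peaks[1] the second, peaks[2] the third
    print(label, "third/second peak ratio:", round(peaks[2] / peaks[1], 3))
```

With these assumed values, the ratio of the third to the second aggregated peak is noticeably larger for the Sulphur-containing formula, mainly because of the ³⁴S contribution.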

METHODS

In this paper, we conduct a theoretical investigation of the isotope properties of Sulphur-containing peptides. We propose a gradient boosting approach to predict the number of Sulphur atoms from the aggregated isotope distribution. We compare the prediction accuracy and assess the predictive power of features derived from the mass and isotope abundance information of the first three, five, and eight aggregated isotope peaks.
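
The theoretical aggregated isotope peaks from which such features are built can be sketched by also tracking an abundance-weighted mass for each aggregated peak, which gives one center mass per peak. The sketch below assumes that the mass of an aggregated peak is the abundance-weighted mean mass of its isotopologues, and it uses assumed isotope masses and natural abundances; `aggregated_peaks` and the example formula are hypothetical, and the gradient boosting model itself is not shown.

```python
# (mass in Da, approximate natural abundance) per isotope, indexed by extra neutrons
# relative to the lightest isotope. Values are assumed for this sketch.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548378, 0.0107)],
    "H": [(1.00782503207, 0.999885), (2.0141017778, 0.000115)],
    "N": [(14.0030740048, 0.99636), (15.0001088982, 0.00364)],
    "O": [(15.99491461956, 0.99757), (16.99913170, 0.00038), (17.9991610, 0.00205)],
    # Sulphur has no stable isotope with three extra neutrons: zero-abundance placeholder.
    "S": [(31.97207100, 0.9499), (32.97145876, 0.0075), (33.96786690, 0.0425),
          (0.0, 0.0), (35.96708076, 0.0001)],
}


def aggregated_peaks(formula, n_peaks):
    """Hypothetical helper: (center mass, abundance) of the first n_peaks aggregated peaks."""
    prob = [1.0] + [0.0] * (n_peaks - 1)  # abundance per aggregated peak
    wmass = [0.0] * n_peaks  # sum of abundance * mass per aggregated peak
    for element, count in formula.items():
        atom_p = [a for _, a in ISOTOPES[element]]
        atom_w = [m * a for m, a in ISOTOPES[element]]
        for _ in range(count):
            new_p = [0.0] * n_peaks
            new_w = [0.0] * n_peaks
            for i in range(n_peaks):
                if prob[i] == 0.0:
                    continue
                for j in range(min(len(atom_p), n_peaks - i)):
                    new_p[i + j] += prob[i] * atom_p[j]
                    # combined mass = mass_i + mass_j, weighted by prob_i * prob_j
                    new_w[i + j] += wmass[i] * atom_p[j] + prob[i] * atom_w[j]
            prob, wmass = new_p, new_w
    return [(w / p if p > 0 else float("nan"), p) for p, w in zip(prob, wmass)]


formula = {"C": 50, "H": 80, "N": 14, "O": 15, "S": 2}  # hypothetical peptide formula
for n_peaks in (3, 5, 8):
    peaks = aggregated_peaks(formula, n_peaks)
    print(n_peaks, [(round(m, 4), round(a, 4)) for m, a in peaks])
```

Because ³⁴S adds slightly less than two mass units (about 1.9958 Da), Sulphur also lowers the center mass of the third aggregated peak relative to the monoisotopic peak, which is one way mass features can carry some, though limited, Sulphur information.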

RESULTS

Mass features alone are not sufficient to predict the number of Sulphur atoms accurately. However, prediction becomes near-perfect when isotope abundance features are included. For the models based on eight, five, and three aggregated isotope peaks, respectively, the most important abundance features are the abundance ratios of the eighth to the seventh, the fifth to the fourth, and the third to the second aggregated isotope peaks. Likewise, the most predictive mass features are the mass differences between the eighth, the fifth, and the third aggregated isotope peaks, respectively, and the monoisotopic peak.
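
The two feature types named above can be computed from a list of aggregated peaks as in the following minimal sketch. It assumes each peak is a (center mass, abundance) pair ordered from the monoisotopic peak upwards and numbers peaks from 1; the function name, the feature names, and the input values are hypothetical placeholders, not data from the study.

```python
def sulphur_features(peaks):
    """Hypothetical feature extractor for aggregated isotope peaks.

    peaks: list of (center_mass, abundance) tuples, peak 1 being the monoisotopic peak.
    Returns mass differences to the monoisotopic peak and abundance ratios of
    consecutive peaks, keyed by 1-based peak numbers.
    """
    mono_mass = peaks[0][0]
    features = {}
    for k in range(2, len(peaks) + 1):
        mass_k, abundance_k = peaks[k - 1]
        _, abundance_prev = peaks[k - 2]
        features[f"dmass_{k}_1"] = mass_k - mono_mass
        features[f"ratio_{k}_{k - 1}"] = abundance_k / abundance_prev
    return features


# Placeholder peaks for illustration only (not measured or computed values).
example_peaks = [(1000.0, 1.0), (1001.0, 0.6), (1002.0, 0.25)]
print(sulphur_features(example_peaks))
```

Because the ratios involve only relative abundances, they do not depend on the overall intensity scale of a spectrum. For three, five, or eight peaks this sketch yields 2, 4, or 7 features of each type.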

CONCLUSIONS

Based on the validation analysis, we conclude that predicting the number of Sulphur atoms from the isotope profile fails because the isotope ratios are not measured accurately enough. These results suggest that future instrument development should focus more on improving spectral accuracy, so that the intensities of higher-order isotope peaks are measured more accurately.

Annelies Agten,
Jurgen Claesen,
Tomasz Burzykowski,
Dirk Valkenborg
February 17, 2023
https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/rcm.9480?af=R