Research Article

Identification of Ca2+-binding residues of a protein from its primary sequence

Published: May 20, 2016
Genet. Mol. Res. 15(2): gmr7618 DOI: https://doi.org/10.4238/gmr.15027618
Cite this Article:
Z. Jiang, X.Z. Hu, G. Geriletu, H.R. Xing, X.Y. Cao (2016). Identification of Ca2+-binding residues of a protein from its primary sequence. Genet. Mol. Res. 15(2): gmr7618. https://doi.org/10.4238/gmr.15027618

Abstract

Calcium is one of the most abundant minerals in the human body and plays a critical role in many cellular activities through its interactions with various calcium ion (Ca2+)-binding proteins. Accurate identification of Ca2+-binding residues is therefore essential for research on protein function. In this study, a new method was developed to predict Ca2+-binding residues from the primary sequence alone, without using three-dimensional structural information. Through statistical analysis, four kinds of feature parameters were extracted from amino acid sequences: the increment of diversity values of amino acid composition, the matrix scoring values of position conservation, the auto-cross covariance of physicochemical properties, and the center motif. These features served as input to a support vector machine (SVM) that predicts Ca2+-binding residues. The method was evaluated on four well-established datasets using five-fold cross-validation. The accuracies and Matthews correlation coefficients (MCC) were 75.9% and 0.53 (dataset 1), 79.2% and 0.58 (dataset 2), 77.4% and 0.55 (dataset 3), and 79.1% and 0.58 (dataset 4). Comparative results show that the developed method outperforms previously published methods. Based on this study, a web server for predicting Ca2+-binding residues from any protein sequence was developed and is publicly available at http://202.207.29.245/.
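Two of the quantities named above can be made concrete: the increment of diversity (ID) feature and the accuracy/MCC figures used for evaluation. The sketch below uses the standard definitions, namely the diversity measure D = N ln N − Σ n_i ln n_i over amino acid counts, ID(X, Y) = D(X + Y) − D(X) − D(Y), and the usual confusion-matrix formulas for accuracy and MCC. It is not the authors' code. The helper names (`diversity`, `composition`, `increment_of_diversity`, `accuracy_and_mcc`) are hypothetical, and it assumes that ID compares the 20-letter composition of a query window with the pooled composition of a source set (for example, known binding or non-binding windows). The paper's window length and source sets are not specified here.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other letter in a sequence is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the count vector of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    return [counts[aa] for aa in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D = N ln N - sum_i n_i ln n_i (zero counts contribute 0)."""
    n_total = sum(counts)
    if n_total == 0:
        return 0.0
    return n_total * math.log(n_total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(source_counts, query_counts):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y).

    ID is non-negative and equals 0 when the query has the same relative
    composition as the source, so a smaller ID means a closer match.
    """
    combined = [x + y for x, y in zip(source_counts, query_counts)]
    return diversity(combined) - diversity(source_counts) - diversity(query_counts)


def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Hypothetical toy windows, not taken from the paper's datasets.
    binding_source = composition("DKDGDGTITTKE" + "DTDGSGTIDFPE")
    nonbinding_source = composition("LLKAVVLAGLSW" + "PRLLAAVHGSAA")
    query = composition("DKDGNGYISAAE")
    # Two ID features for the query window: one per source set.
    print(increment_of_diversity(binding_source, query),
          increment_of_diversity(nonbinding_source, query))
    print(accuracy_and_mcc(40, 40, 10, 10))  # (0.8, 0.6)
```

In a pipeline like the one described, each residue's window would produce a pair of ID values (one against each source set), and these would be concatenated with the other three feature groups before being passed to the SVM.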

Calcium is one of the most abundant minerals in the human body, playing a critical role in many cellular activities by interacting with different calcium ion (Ca2+)-binding proteins. Therefore, the correct identification of Ca2+-binding residues is essential for protein functional research. In this study, a new method was developed to predict Ca2+-binding residues from the primary sequence without using three-dimensional information. Through statistical analysis, four kinds of feature parameters were extracted from amino acid sequences: the increment of diversity values of amino acid composition, the matrix scoring values of position conservation, the autocross covariance of physicochemical properties, and the center motif. These features served as input for a support vector machine to predict Ca2+-binding residues. This method was tested on four well-established datasets using a five-fold cross-validation. The accuracies and Matthews correlation coefficients were 75.9% and 0.53 (dataset 1), 79.2% and 0.58 (dataset 2), 77.4% and 0.55 (dataset 3), and 79.1% and 0.58 (dataset 4). Comparative results show that the developed method outperforms previous methods. Based on this study, a web server was developed for predicting Ca2+-binding residues from any protein sequence, being publically available at http://202.207.29.245/.