Abstract
Background
Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other.
Results
We applied 25 conservation and variation scores to alignments from the Catalytic Site Atlas (CSA). We calculated distances among scores based on correlation coefficients, and constructed a dendrogram of the scores by average linking cluster analysis. The cluster analysis showed that most scores fall into one of two groups--substitution matrix based group and frequency based group respectively. We also evaluated the scores' performance in predicting catalytic sites and found that frequency based scores generally perform best.
Conclusions
Conservation and variation scores can be classified into mainly two large groups. When using a score to predict catalytic sites, frequency based scores that also consider a background distribution are most successful.