Monophyletic clustering and characterization of protein families
Zhang, Jian ; Zhao, Zhiyuan ; Evershed, Jennifer ; Li, Guoying
Journal of Integrative Bioinformatics - JIB (ISSN 1613-4516)
A protein family contains sequences that are evolutionarily related, and this relatedness is generally reflected in sequence similarity. There have been many attempts to organize the set of protein families into evolutionarily homogeneous clusters using various clustering methods. How do we characterize these clusters? How can we cluster protein families using these characterizations? In this work, these questions were addressed using a concept called group-wide co-evolution, illustrated with real and simulated protein family data. The results show that the trend of a group of monophyletic proteins might be characterized by a normal distribution, while the strength and variability of this trend can be described by the sample mean and variance of the observed correlation coefficients after a suitable transformation. To exploit this property, we have developed a clustering method called monophyletic k-medoids clustering. A software package written in R has been made available at http://www.kent.ac.uk/ims/personal/jz.
Faculty of Technology, Research Groups in Informatics
Data processing, computer science, computer systems