Our method uses predominantly the goodness of geometric superposition as similarity measurement and considers the minimum unit of a match between two protein structures is a pair of residues. Protein structure is represented, in structural comparison, by its traces of alpha carbons. Thus, in our view, protein was an object composed of connected points in a 3-dimensional space. To find the most similar part between two proteins thereby is equivalent to obtain a list of matched pairs of cartesian points. Insertion, deletion, and sequence-order-dependent problems will not exist in our method.
Our residue-based algorithm can be described in two processes. In first process, a set of superpositions based on local similarity between two proteins were obtained through an efficient exhausting searches. In second process, found local similarities were then used as a starting point in an iterative procedure to locate the best global similarity between two proteins. In overall, our method is a fast, reliable protein structure comparison tool.
Since proteins are connected chains, we can superimpose every three consecutive alpha carbon atoms of one protein onto every three consecutive alpha carbon atoms of the other protein in order to find all possible local similarities between proteins. Thus, comparing two proteins said both with N residues, we have to perform N2 operations of comparison for exaustively searching good local matches. With Geometry Hashing algorithm, we are able to accomplish the same task described above by performing only N instead of N2 operations. The Geometry Hash algorithm is composed of two procedures, namely the "hash table construction" and "voting" procedure, respectively.
For any frame pair in which vote count exceeds a preset threshold, it is called a seed match. At this stage, we are able to detect local similarity between two proteins by geometric superposition criteria but no actual superimposition calculation was performed. A translational vector and rotational matrix for every seed match was calculated by a superposition procedure via its associated list of matched alpha carbon atom pairs. If a protein was compared with itself, we are going to obtain N-2 exactly the same seed matches. This imply similar seed matches are needed to be clustered and coalesced. After similar seed matches were grouped, a set of potential matches emerged and passed to the extending process.