We have developed an automatic, reliable procedure to cut proteins into hydrophobic folding units. In our approach, the quality of a hydrophobic folding unit is evaluated by four measurements. The first two correspond to what visual inspection of a structural domain assesses and can be described by two concepts: compactness and isolatedness. We use the compactness definition proposed by Zehfus and Rose (1986) to calculate the compactness of a cut protein unit. The isolatedness of a unit is based on the solvent-accessible surface area (ASA) that was originally buried in the interior but becomes exposed to solvent after cutting. The third quantity is the hydrophobicity, defined as the fraction of buried non-polar ASA relative to the total non-polar ASA. The last factor used in evaluating a folding unit is the number of segments in the cut unit. We follow the procedure of Holm and Sander (1994) to reduce the multiple-cutting-point problem to a one-dimensional search over all reasonable trial cuts. However, to bring out the plausible hydrophobic core, the contact matrix used to obtain the first non-trivial eigenvector contains only hydrophobic contacts. Along the amplitude of the eigenvector, as many candidate units as twice the size of the protein are generated. Each candidate is then scored with a function based on the above four measurements. The candidate with a locally optimal score is cut out as a folding unit, and the algorithm iterates, alternating along the opposite amplitude of the eigenvector. Once the number of residues left in the process falls below a preset minimum number of residues required for a hydrophobic folding unit, the cutting algorithm stops. A training set of 20 single-chain proteins was chosen to obtain optimal parameters for the scoring function with the associated automatic cutting algorithm. We discuss why all four ingredients are needed for the definition of a hydrophobic folding unit.
The cutting results for all chains in the Protein Data Bank are available. The database of hydrophobic folding units generated by our procedure is useful for understanding the mechanism of protein folding, for recognizing a particular fold, and for protein engineering.
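
To make the one-dimensional search described above more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the eigenvector amplitudes (one per residue) have already been computed from the hydrophobic contact matrix, and it interprets "as many units as twice the size of the protein" as a threshold sweep that yields a low-amplitude and a high-amplitude candidate at each split point. The function names `trial_units`, `count_segments`, and `hydrophobicity` are hypothetical, and the compactness and isolatedness terms, as well as the scoring function itself, are left out because their exact forms are not given here.

```python
from typing import List, Sequence, Set


def trial_units(amplitudes: Sequence[float]) -> List[Set[int]]:
    """Generate candidate units by sweeping a threshold along the eigenvector.

    Residues are ordered by amplitude. Each split point yields two
    candidates: the residues on the low side and those on the high side.
    (Assumed interpretation; gives 2 * (N - 1) candidates for N residues.)
    """
    order = sorted(range(len(amplitudes)), key=lambda i: amplitudes[i])
    units: List[Set[int]] = []
    for k in range(1, len(order)):
        units.append(set(order[:k]))  # low-amplitude side of the split
        units.append(set(order[k:]))  # high-amplitude side of the split
    return units


def count_segments(unit: Set[int]) -> int:
    """Count contiguous runs of residue indices in a candidate unit."""
    # A residue starts a new segment when its predecessor is not in the unit.
    return sum(1 for i in unit if i - 1 not in unit)


def hydrophobicity(buried_nonpolar_asa: float, total_nonpolar_asa: float) -> float:
    """Fraction of the non-polar ASA that is buried."""
    if total_nonpolar_asa <= 0:
        raise ValueError("total non-polar ASA must be positive")
    return buried_nonpolar_asa / total_nonpolar_asa


if __name__ == "__main__":
    # Made-up amplitudes for a six-residue toy chain (illustration only).
    amps = [-0.5, -0.4, -0.3, 0.2, 0.6, -0.1]
    for unit in trial_units(amps):
        print(sorted(unit), "segments:", count_segments(unit))
```

In a full procedure, each candidate would be scored from all four measurements and the locally optimal one cut out before the search repeats on the remaining residues.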



An example: 1atnA cut into four units.


See also Hydrophobic folding units at interfaces


Please send questions or suggestions to tsai@ncifcrf.gov
Jan 28, 2005