With many of the genome initiatives now complete, the focus of bioinformatics has turned from genes to proteins. My group works on computational proteomics: the application of informatics techniques to understanding the structure, function, and interactions of proteins across the protein universe.
Such methods are particularly useful when large databases of biological information are mined for insight. We are constructing a library of interacting secondary structure motifs from known protein structures, for use in de novo protein structure prediction and protein function identification. The project mines the database of solved protein structures (the Protein Data Bank) for recurring pairs of secondary structure elements; clustering these pairs identifies recurrent interacting motifs. As an extension, we will classify the interactions important for signaling between proteins and collect them into a database that can be further mined for biological information.
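The pair-mining and clustering steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual pipeline: each secondary structure element is assumed to be already reduced to a type, a centroid, and a unit axis vector; the 10 Å contact cutoff and the bin widths are placeholder values; and simple binning on distance and angle stands in for a real clustering method. The names interacting_pairs and cluster_pairs are hypothetical.

```python
import math
from collections import defaultdict


def interacting_pairs(sses, cutoff=10.0):
    """Return (type_pair, distance, angle_deg) for element pairs within cutoff.

    Assumption: each element is a dict with "type" ("H" or "E"),
    "centroid" (x, y, z) and "axis" (a unit vector). The cutoff is a
    placeholder, not a validated contact definition.
    """
    pairs = []
    for i in range(len(sses)):
        for j in range(i + 1, len(sses)):
            a, b = sses[i], sses[j]
            dist = math.dist(a["centroid"], b["centroid"])
            if dist > cutoff:
                continue
            # Inter-axis angle; clamp the dot product against rounding error.
            dot = sum(x * y for x, y in zip(a["axis"], b["axis"]))
            angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
            # Sort the two type letters so "HE" and "EH" share one label.
            type_pair = "".join(sorted(a["type"] + b["type"]))
            pairs.append((type_pair, dist, angle))
    return pairs


def cluster_pairs(pairs, dist_bin=2.0, angle_bin=30.0):
    """Group pairs by type and by binned distance and angle.

    Binning is a simple stand-in for clustering; the bin widths are
    illustrative assumptions.
    """
    clusters = defaultdict(list)
    for type_pair, dist, angle in pairs:
        key = (type_pair, int(dist // dist_bin), int(angle // angle_bin))
        clusters[key].append((dist, angle))
    return dict(clusters)
```

Bins that collect many members across unrelated structures would be the candidates for recurrent motifs. A real implementation would need a richer geometric description of each pair and a proper clustering algorithm.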
Current models of protein folding do not yet describe the folding landscape correctly. One weakness of these models is their treatment of the solvent surrounding a protein. We are testing various models of the water solvent against data derived from nuclear magnetic resonance (NMR) experiments on small peptides. By comparing these models against experimental data, we hope to obtain a more accurate model of protein solvation and a more accurate potential describing the protein folding landscape.
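One simple way to frame the comparison is to score each solvent model by how far its predicted observables fall from the measured ones. The sketch below assumes that every model yields a list of predicted values aligned one-to-one with the NMR-derived measurements; which observable is used is left open here, and root-mean-square deviation is just one possible scoring choice. The names rmsd and rank_models are hypothetical.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between aligned value lists."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def rank_models(predictions, observed):
    """Return (model_name, rmsd) tuples, best agreement first.

    Assumption: predictions maps a model name to its predicted values,
    in the same order as the observed list.
    """
    scores = {name: rmsd(values, observed) for name, values in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

A lower score indicates closer agreement with experiment. In practice the comparison would also need to account for experimental uncertainty and for how each observable is averaged over the simulated ensemble.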