ProSeg: a database of local structures of protein segments

J Comput Aided Mol Des. 2009 Mar;23(3):163-9. doi: 10.1007/s10822-008-9248-x. Epub 2008 Oct 16.

Abstract

Integrating knowledge of the sequence-structure correlation of proteins provides a basis for the structural design of novel artificial proteins. One effective strategy is to treat a short segment, intermediate in size between a single amino acid and a domain, as the correlation unit for exploring the structure-to-sequence relationship. Here we report the development of a database called ProSeg, which consists of two sub-databases, Segment DB and Cluster DB. Segment DB contains tens of thousands of segments that were prepared by dividing the primary sequences of 370 proteins using a sliding L-residue window (L = 5, 9, 11, 15). These segments were classified into several thousand clusters according to their three-dimensional structural resemblance. Cluster DB contains cluster-related information, including an image, rank, frequency, secondary structure assignment, and sequence profile for each cluster. Users can search for a suitable cluster by entering a parameter (i.e., PDB ID, dihedral angles, or DSSP symbols) that identifies the backbone structure of a query segment. By analogy with a language, ProSeg can be regarded as a 'structure-sequence dictionary' that contains over 10,000 'protein words'. ProSeg is freely accessible through the Internet (http://riodb.ibase.aist.go.jp/proseg/).
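
The sliding-window segmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the step size of one residue is an assumption (the abstract does not state it), and the example sequence is simply the 20 standard amino acid letters rather than a real protein. Only the window lengths (5, 9, 11, 15) come from the abstract.

```python
from typing import Dict, List, Sequence

# Window lengths L reported in the abstract.
WINDOW_LENGTHS = (5, 9, 11, 15)


def sliding_segments(sequence: str, length: int) -> List[str]:
    """Return every contiguous segment of `length` residues.

    Assumption: the window advances one residue at a time.
    A sequence shorter than `length` yields an empty list.
    """
    if length <= 0:
        raise ValueError("window length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def segment_all(
    sequence: str, lengths: Sequence[int] = WINDOW_LENGTHS
) -> Dict[int, List[str]]:
    """Map each window length to the list of segments of that length."""
    return {length: sliding_segments(sequence, length) for length in lengths}


if __name__ == "__main__":
    toy_sequence = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical input, 20 residues
    for length, segments in segment_all(toy_sequence).items():
        print(length, len(segments), segments[0])
```

Under the one-residue step assumption, a protein of N residues contributes N - L + 1 segments for each window length L, so longer windows yield fewer segments per protein.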

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cluster Analysis
  • Databases, Protein*
  • Internet
  • Protein Conformation
  • Proteins / chemistry*

Substances

  • Proteins