Abstract | Research has shown that the functions of proteins are largely determined by their three-dimensional (3D) shapes. This observation is especially relevant in drug design, where knowledge of a protein's 3D structure enables pharmacologists to select the best-binding proteins when aiming to moderate protein function. However, relatively few 3D shapes are known. In contrast, amino acid sequences can be acquired through very efficient, automated, high-throughput experimental methods, so the sequences of a vast number of proteins have been identified. Closing this gap between known sequences and known structures is therefore important. To this end, this paper introduces an approach that predicts the 3D shapes of proteins using feed-forward artificial neural networks. Our solution learns a representation of the 3D shape associated with a protein directly from its amino acid sequence descriptors. Once a neural network is trained, our search engine retrieves the closest known 3D shape for a previously unseen query protein. We evaluate the performance of our approach against the Protein Data Bank (PDB), considering proteins from a diverse set of families. Our results indicate that our system accurately finds the most similar protein structures across a wide variety of protein 3D shapes and protein family sizes. |
---|
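
The pipeline summarized above (sequence descriptors in, predicted shape descriptor out, then retrieval of the closest known shape) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the method evaluated in this paper: the single tanh hidden layer, the Euclidean distance, the toy weights, and the names `forward`, `nearest_shape`, `shape_A`, and `shape_B` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> List[float]:
    """Feed-forward pass with one tanh hidden layer and a linear output layer.

    Maps a sequence descriptor x to a predicted shape descriptor.
    The architecture is an illustrative assumption, not the paper's network.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w2, b2)
    ]


def nearest_shape(predicted: Vector, database: Dict[str, Vector]) -> str:
    """Return the ID of the known shape whose descriptor is closest
    to the predicted one (Euclidean distance, assumed here)."""
    return min(database, key=lambda pid: math.dist(predicted, database[pid]))


# Toy, hand-picked weights standing in for a trained network (hypothetical).
W1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0]]
B2 = [0.0, 0.0]

# Toy database of known shape descriptors keyed by placeholder IDs.
KNOWN_SHAPES = {
    "shape_A": [0.0, 0.8],
    "shape_B": [1.0, 0.0],
}

query_descriptor = [0.0, 1.0]  # hypothetical sequence descriptor of a query protein
predicted = forward(query_descriptor, W1, B1, W2, B2)
best_match = nearest_shape(predicted, KNOWN_SHAPES)
```

In this sketch the network and the search step are deliberately decoupled: the network is trained once, and retrieval reduces to a nearest-neighbour lookup over descriptors of structures whose 3D shapes are already known.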