Collective Density Estimation of Ramachandran Distributions

Published in Research Tagged under Ramachandran, Collective Density Estimation, Nonparametric Modeling, Protein, Backbone Angles Written by Super User

Studying protein structure through angular representations has recently attracted much attention. To model these structures efficiently, we work in the continuous conformational space of protein backbones and estimate the densities of several Ramachandran plots jointly, so that the model captures what the plots have in common. It is not unusual for groups of proteins that share a common overall structure to have entirely different biological functions because of local differences in that structure. Protein structure classification, which reveals evolutionary relationships between these biochemical compounds, therefore has broad applications in understanding protein function. This general framework also provides comprehensive machinery for clustering, model assessment, and data modeling for groups of protein backbone angles. Joint estimation over a set of proteins lets us borrow the commonalities within the group when estimating the density of Ramachandran plots with small sample sizes, and it also allows outliers to be recognized.

In Maadooliat et al. (2015), we developed a comprehensive statistical model that characterizes the protein as a circular manifold and describes the protein angles in a smaller number of dimensions. The proposed method accounts for the circular nature of the angular data using a penalized spline, which is statistically less restrictive than a parametric angular distribution on a sphere or torus.
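
To make the "circular nature" of backbone angles concrete, the sketch below estimates a density on the (phi, psi) torus with a product von Mises kernel. This is a deliberately simple stand-in, not the penalized-spline estimator of Maadooliat et al. (2015); the function names, the concentration value `kappa`, and the sample angles are all hypothetical choices made for illustration.

```python
import math

def bessel_i0(kappa, terms=50):
    """Modified Bessel function I0 via its power series (normalizes the von Mises kernel)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (kappa / 2.0) ** 2 / (k * k)
        total += term
    return total

def torus_kde(samples, kappa=20.0):
    """Return f(phi, psi), a product von Mises kernel density estimate.

    samples: list of (phi, psi) pairs in radians.
    kappa:   kernel concentration (assumed value; larger means narrower kernels).
    """
    norm = len(samples) * (2.0 * math.pi * bessel_i0(kappa)) ** 2

    def density(phi, psi):
        total = 0.0
        for p, s in samples:
            # cos() depends only on the angular difference, so the estimate
            # is automatically periodic in both phi and psi.
            total += math.exp(kappa * (math.cos(phi - p) + math.cos(psi - s)))
        return total / norm

    return density

# Hypothetical (phi, psi) observations in degrees, for illustration only.
samples_deg = [(-63, -43), (-60, -45), (-57, -47),
               (-120, 130), (-135, 150), (-179, 178)]
samples = [(math.radians(a), math.radians(b)) for a, b in samples_deg]
density = torus_kde(samples)

# phi = 181 degrees and phi = -179 degrees are the same point on the circle,
# so the two printed values agree up to floating-point rounding.
print(density(math.radians(181), math.radians(178)))
print(density(math.radians(-179), math.radians(178)))
```

A Euclidean kernel on the raw angle values would treat -179 and 179 degrees as far apart and split the mass of a single cluster across the plot boundary; the periodic kernel avoids that, which is the property the spline-based model also has to respect.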

To assess the proposed model, we implemented neighbor-dependent Ramachandran estimates for amino acids in the Rosetta software and showed that loop-modeling results on a benchmark set are more consistent than those of competing techniques.

Bayesian Alignment of Proteins via Delaunay Tetrahedralization

Published in Research Tagged under shape analysis, protein Alignment, Bioinformatics, Bayesian Modeling, Machine Learning, Directional Statistics Written by Super User

Statisticians are now using the geometrical information of proteins to model the alignment of two or more proteins probabilistically. Although superimposition is based on similarity, the main challenges lie in proposing sensible inferential methods and proper prediction algorithms for structural biomolecules that help identify the function of a protein. Directional statistics, which carries statistical modeling into non-Euclidean spaces, deals with data structures that cannot be modeled with standard statistical methods. For example, the optimal superimposition of protein sets on each other by rigid-body transformations should be modeled in a non-Euclidean shape space. To this end, we enrich the Bayesian model for the matching problem in statistical shape analysis with the geometric and sequence information of proteins (Najibi et al. 2015). We proposed a Delaunay tetrahedralization method that uses the sequence and amino acid types of proteins as an adaptive empirical prior in our Bayesian model, and it achieved a significant improvement in convergence rate over previous statistical models.
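
For readers unfamiliar with rigid-body superimposition, the following sketch shows the classical least-squares (Kabsch) solution for the simplest case, where the point correspondences are already known. It is not the Bayesian matching model of Najibi et al. (2015), in which the matching itself is unknown and inferred; the function name and the synthetic coordinates are assumptions made for illustration.

```python
import numpy as np

def kabsch_superimpose(mobile, target):
    """Least-squares rigid-body superposition of `mobile` onto `target`.

    Both inputs are (N, 3) arrays of already-matched points.
    Returns (R, t, rmsd) such that mobile @ R.T + t approximates target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc          # center both point sets
    H = P.T @ Q                              # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tc - R @ mc
    aligned = mobile @ R.T + t
    rmsd = np.sqrt(np.mean(np.sum((aligned - target) ** 2, axis=1)))
    return R, t, rmsd

# Synthetic example: rotate and shift a random point cloud, then recover the motion.
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 3))
theta = np.radians(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
shift = np.array([1.0, -2.0, 0.5])
mobile = target @ Rz.T + shift

R, t, rmsd = kabsch_superimpose(mobile, target)
print(rmsd)  # close to zero, because the data are an exact rigid motion of the target
```

The set of rotations is not a Euclidean vector space, which is why the fitted `R` comes from a constrained decomposition rather than an ordinary linear regression; this is the sense in which superimposition lives in a non-Euclidean shape space.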