Virtual Quantification of Protein Stability Using Applied Kinetic and Thermodynamic Parameters

Protein stability, a central aspect of molecular dynamics and simulation studies, requires sophisticated molecular biology instrumentation to analyze its kinetic and thermodynamic background. Existing sequence- and structure-based programs for protein stability rely only on single point mutations and sequence optimality. The energy contributed by each hydrophobic amino acid in a protein paves the way for understanding its stability. To the best of our knowledge, Protein Stability is the first program of its kind, developed to explore the energy requirement of each amino acid in a protein sequence using various applied kinetic and thermodynamic quantities. The algorithm depends strongly both on kinetic quantities, such as atomic solvation energies and solvent accessible surface area, and on thermodynamic quantities, viz. enthalpy, entropy and heat capacity. The hydrophobicity pattern of the protein was considered the key component of protein stabilization.

In recent years, considerable progress has been made in understanding the molecular basis of protein stability, mainly owing to the large increase in the number of available amino acid sequences and 3-D structures. Protein stability is quantitatively described by the standard Gibbs energy change (ΔG), and ΔG values allow a quantitative comparison of the stabilities of different proteins (Him et al., 1993). Computational tools are available that predict the effect of single point mutations from a protein's 3D structure, such as I-Mutant 2.0 (Capriotti et al., 2001) and FoldX (Schymkowitz et al., 2005), or from sequence-based approaches, such as those implemented in I-Mutant 2.0 (Capriotti et al., 2001) and CUPSAT (Parthiban et al., 2006). A newer functionality called "sequence optimality", developed in PoPMuSiC 2.1, estimates the optimality of each amino acid in the sequence with respect to the stability of the structure; it can be used to detect structural weaknesses (clusters of non-optimal residues), which may be interesting sites for introducing targeted mutations. However, this optimality predictor is derived only from large-scale protein catalytic site data (Dehouck et al., 2011). The term "protein stability" as used by these servers and programs reflects the goals of protein engineering and relies on evolutionary protein sequence dynamics, statistical potentials extracted from datasets of protein structures, empirical potentials built from optimized combinations of physical energy terms, etc. (Capriotti et al., 2001; Schymkowitz et al., 2005; Parthiban et al., 2006; Dehouck et al., 2011).
Here, by "protein stability" we mean stability in the sense of protein dynamics and of the distribution of the hydrophobic amino acids that drive protein folding. Hydrophobic interaction is a major force contributing to the structural stability of proteins, nucleic acids and membranes. Gibbs free energy is additive, i.e. the total is the sum of the contributions of all the components (amino acids) in the system and of their types of interaction (Herzfeld, 1991). It is also generally agreed that the hydrophobic effect, i.e. the stabilization energy provided by transferring hydrocarbon surfaces from the solvent to the interior of the protein, is about 25-30 cal mol⁻¹ Å⁻² (Matthews, 1993). Mutagenesis studies on the destabilization of T4 lysozyme strongly suggested that the stability of the protein is dominated by its rigid parts, whereas the flexible solvent-exposed parts contribute little (Albert et al., 1987).
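As a worked illustration of the 25-30 cal mol⁻¹ Å⁻² figure quoted above (Matthews, 1993), the short sketch below converts a buried nonpolar surface area into an approximate stabilization range. The function name and the 100 Å² example are illustrative assumptions and are not part of the Protein Stability program.

```python
from __future__ import annotations


def hydrophobic_stabilization_kcal(buried_area_a2: float,
                                   low_cal_per_a2: float = 25.0,
                                   high_cal_per_a2: float = 30.0) -> tuple[float, float]:
    """Return the (low, high) stabilization in kcal/mol for a buried nonpolar area in Å^2.

    The default range of 25-30 cal/mol per Å^2 is the value quoted from Matthews (1993).
    """
    if buried_area_a2 < 0:
        raise ValueError("buried area must be non-negative")
    # Multiply area by energy per area, then convert cal/mol to kcal/mol.
    return (low_cal_per_a2 * buried_area_a2 / 1000.0,
            high_cal_per_a2 * buried_area_a2 / 1000.0)


# Burying a hypothetical 100 Å^2 of hydrocarbon surface.
print(hydrophobic_stabilization_kcal(100.0))  # (2.5, 3.0)
```

In other words, each 100 Å² of buried hydrocarbon surface is worth roughly 2.5-3.0 kcal/mol under this estimate.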
To understand protein stability, kinetic and thermodynamic quantities expressed as Gibbs free energy terms for all twenty natural amino acids were used to yield a more precise description of this process. The distribution of Gibbs free energy over the hydrophobic amino acids indicates a strong correlation among the frequency of hydrophobic amino acids, hydrophobicity, energy consumption, equilibrium and stability. We propose a new equation for Gibbs free energy calculation that takes into account all the important thermodynamic quantities. Benchmarking against site-directed mutagenesis data demonstrated its ability to predict the overall protein stability in terms of hydrophobic amino acids (Matthews, 1993; Albert et al., 1987).
An octanol-to-water partitioning model was chosen to derive ΔG values based on kinetic parameters such as the solvation energies of amino acid side chains and backbone in the pentapeptide AcWL-X-LL (Wimley et al., 1996). This pentapeptide was chosen for two reasons: i) it provides neighboring nonpolar side chains of moderate size, and ii) its average solvent-accessible surface areas (ASAs) were computed by hard-sphere Monte Carlo simulations that account for the occlusion of nonpolar surface area by neighboring residues. Thermodynamic quantities were derived from the thermal denaturation of a protein. Cytochrome-c denaturation experiments (Taneja and Ahmed, 1994) were chosen for two reasons: i) denaturation can be monitored efficiently in the visible region, and ii) microcalorimetric measurements suggest that its denaturation follows a two-state mechanism (native to denatured), so the conformational transition can be scrutinized efficiently. Small differences in amino acid sequence have been shown to change protein stability. For example, ferredoxin from Clostridium thermosaccharolyticum differs from its less stable relative from Clostridium tartarivorum at only two positions, where glutamines 31 and 44 are replaced by glutamates (Perutz and Raidt, 1975). The partitioning model helps to explore the kinetics, while thermal denaturation contributes most, through thermodynamics, to understanding the transition between the native and denatured structures. Both require the estimation of Gibbs free energy to study the contribution of each amino acid to maintaining the native structure, kinetically and thermodynamically.
To the best of our knowledge, Protein Stability is the first program of its kind: it takes a raw amino acid sequence as input and reports the energy distribution over the individual amino acids and the overall stability. The main objective of the program is to give a clear picture of protein stability from the sequence alone, without requiring the 3D structure; this can help in studying protein dynamics and folding patterns, which are prerequisites for protein characterization experiments. The program is intended as a tool for understanding protein stability in the context of molecular dynamics and for identifying the amino acids in domains that drive folding. It was written in PERL (Practical Extraction and Report Language) and is distributed as a Windows executable file. Academic and non-academic users can download it free of charge from http://virtualprotstab.webs.com.

Algorithm Development
Algorithm on Protein Kinetics
Kinetic parameters such as atomic solvation parameters (ASPs) were derived from the octanol-to-water transfer free energies of the twenty natural amino acids (X) in the pentapeptide AcWL-X-LL (Wimley et al., 1996). The Gibbs free energy (ΔG) of each amino acid was calculated as
$$\Delta G = \sum_i \sigma_i A_i \qquad (1)$$
where $A_i$ is the solvent accessible surface area of atomic group $i$ and $\sigma_i$ is the ASP of atomic group $i$ (Eisenberg and McLachlan, 1986; Wesson and Eisenberg, 1992). From the kinetic point of view, this Gibbs free energy is also referred to as the "Gibbs free energy of activation".
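The solvation sum ΔG = Σ σi Ai described above is a weighted sum over atomic groups. The sketch below shows that computation, assuming the areas Ai (Å²) and the parameters σi (cal mol⁻¹ Å⁻²) are already available per atomic group. The group labels and numbers are hypothetical placeholders, not the Wimley et al. (1996) or Eisenberg and McLachlan (1986) values.

```python
from __future__ import annotations


def solvation_free_energy(areas: dict[str, float], asps: dict[str, float]) -> float:
    """Return ΔG = sum_i sigma_i * A_i over the atomic groups present in `areas`.

    areas: solvent accessible surface area per atomic group (Å^2)
    asps:  atomic solvation parameter per atomic group (cal/mol/Å^2)
    The result is in cal/mol.
    """
    missing = set(areas) - set(asps)
    if missing:
        raise KeyError(f"no ASP for atomic group(s): {sorted(missing)}")
    return sum(asps[group] * area for group, area in areas.items())


# HYPOTHETICAL areas and ASPs, for illustration only.
areas = {"C": 120.0, "O": 20.0}
asps = {"C": 16.0, "O": -6.0}
print(solvation_free_energy(areas, asps))  # 1800.0
```

Keeping the areas and parameters in separate mappings lets the same function be reused with a different ASP set without touching the area data.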

Algorithm on Protein Thermodynamics
Thermodynamic quantities were obtained for 13 amino acids from isothermal denaturation experiments on cytochrome-c (globular state), and they followed a parabolic distribution (Taneja and Ahmed, 1994).
This parabolic function was used to explore the thermodynamic properties of the remaining 7 amino acids, based on two amino acid properties: solvent exposed area (SEA) > 30 Å² relative to individual amino acids (Bordo and Argos, 1991) and the hydrophobicity scale (Wolfenden et al., 1981). Five blocks were constructed based on hydrophobicity (GLIVA, FCM, TSWYP, NKQEHD and R). Within each block, the quantities of the amino acids lacking data were assigned the experimental values of a single amino acid of that block (termed the base).
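The block construction described above amounts to a lookup from a one-letter amino acid code to its block number. A minimal sketch follows; the tuple `BLOCKS` and the function `block_number` are hypothetical names chosen for illustration, while the block contents are those listed in the text.

```python
# The five hydrophobicity blocks listed in the text; block 1 = GLIVA, ..., block 5 = R.
BLOCKS = ("GLIVA", "FCM", "TSWYP", "NKQEHD", "R")


def block_number(residue: str) -> int:
    """Return the 1-based block number b_n of a one-letter amino acid code."""
    code = residue.strip().upper()
    # Guard against "" or multi-letter strings, which would otherwise match as substrings.
    if len(code) != 1:
        raise ValueError(f"expected a single one-letter code, got {residue!r}")
    for bn, members in enumerate(BLOCKS, start=1):
        if code in members:
            return bn
    raise ValueError(f"not one of the 20 standard amino acids: {residue!r}")


# Sanity check: the five blocks together cover each of the 20 amino acids exactly once.
assert sorted("".join(BLOCKS)) == sorted("ACDEFGHIKLMNPQRSTVWY")

print(block_number("W"))  # 3
```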

BIOCHEMISTRY
The base for each block was selected from a 1-D plot of the SEA property. The five blocks were delineated on this plot, and in each block the amino acid at the steepest descent that had experimental data (the base) was used to assign thermodynamic quantities. To penalize such assignments, a nearest-neighbor approach was used (Figure 1). The formalism is as follows:
$$p(aa_u)_{b_n} = \mathrm{SEA}(\mathrm{base}_{b_n}) - \mathrm{SEA}(aa_u) \qquad (2)$$
where $aa_u$ is the unassigned amino acid, $p(aa_u)_{b_n}$ is the penalty for the unassigned amino acid, and $b_n$ is the block number ($b_n$ = 1 to 5). The penalty was multiplied by the thermodynamic quantities mentioned above for the newly assigned amino acids so that they follow the parabolic distribution.
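The base selection and the penalty of equation (2) can be sketched as below, under two assumptions that the text does not spell out: the base is the residue with experimental data whose SEA is closest to that of the unassigned residue within the same block, and the penalized value is the base's measured quantity multiplied by the penalty. All function names are hypothetical, and the SEA values and measured quantities are placeholders, not the Bordo and Argos (1991) or Taneja and Ahmed (1994) data.

```python
from __future__ import annotations


def nearest_base(unassigned: str, candidates: list[str], sea: dict[str, float]) -> str:
    """Return the candidate residue (one with experimental data) whose SEA is closest."""
    if not candidates:
        raise ValueError("block has no residue with experimental data")
    return min(candidates, key=lambda aa: abs(sea[aa] - sea[unassigned]))


def penalty(base: str, unassigned: str, sea: dict[str, float]) -> float:
    """Equation (2): p(aa_u)_bn = SEA(base_bn) - SEA(aa_u)."""
    return sea[base] - sea[unassigned]


def assign_quantity(unassigned: str, candidates: list[str],
                    sea: dict[str, float], measured: dict[str, float]) -> tuple[str, float]:
    """Copy the base's measured quantity to the unassigned residue, scaled by the penalty."""
    base = nearest_base(unassigned, candidates, sea)
    return base, measured[base] * penalty(base, unassigned, sea)


# Block III (TSWYP): T, S and P have data; W and Y do not.
# SEA values (Å^2) and measured quantities are HYPOTHETICAL placeholders.
sea = {"T": 45.0, "S": 70.0, "P": 35.0, "W": 60.0, "Y": 62.0}
measured = {"T": 1.2, "S": 0.5, "P": 0.8}
print(assign_quantity("W", ["T", "S", "P"], sea, measured))  # ('S', 5.0)
```

With these placeholder values serine is selected as the base for both tryptophan and tyrosine, mirroring the situation shown in Figure 1. The text does not say how a negative penalty (an unassigned residue with a larger SEA than its base) is handled, so the sketch simply applies the signed difference.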
Several studies have formulated Gibbs free energy using only a few quantities, omitting important thermodynamic details of protein stability such as entropy and heat capacity (Juffer et al., 1995). In this study, we propose a new equation for Gibbs free energy calculation that includes prominent thermodynamic quantities: the midpoint of thermal transition (Tm), the standard enthalpy change at Tm (ΔHm), the standard entropy change at Tm (ΔSm), the change in heat capacity during the transition (ΔCp), and the temperature at which the solubility of liquid hydrocarbons is minimal (Th). First, ΔCp for each amino acid was calculated using equation (3). The Gibbs energy of thermal denaturation of the protein (ΔGD) was then computed using equation (6), and the results are shown in Table 1. Hereafter, ΔGD is denoted ΔGt, the Gibbs free energy term in which "t" stands for thermodynamics.
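Equation (6) is not spelled out in the text above. For orientation only, the sketch below evaluates the standard two-state Gibbs-Helmholtz expression that is widely used to obtain ΔGD at a temperature T from Tm, ΔHm and ΔCp: ΔGD(T) = ΔHm(1 - T/Tm) - ΔCp[(Tm - T) + T ln(T/Tm)]. This is a swapped-in textbook form and is not necessarily identical to the authors' equation (6), which also involves ΔSm and Th. The input values are hypothetical, and sign conventions differ between sources: here ΔGD > 0 means the native state is favored.

```python
import math


def delta_g_unfolding(T: float, Tm: float, dHm: float, dCp: float) -> float:
    """Standard two-state Gibbs-Helmholtz estimate of ΔG_D at temperature T.

    T and Tm are in kelvin; dHm is ΔH at Tm; dCp is ΔCp in the same energy unit per kelvin.
    The first term equals ΔHm - T*ΔSm with ΔSm = ΔHm / Tm (two-state assumption).
    """
    if T <= 0 or Tm <= 0:
        raise ValueError("temperatures must be positive kelvin values")
    return dHm * (1.0 - T / Tm) - dCp * ((Tm - T) + T * math.log(T / Tm))


# HYPOTHETICAL protein: Tm = 350 K, ΔHm = 400 kJ/mol, ΔCp = 6 kJ/(mol*K).
for temp in (298.15, 350.0):
    print(temp, round(delta_g_unfolding(temp, 350.0, 400.0, 6.0), 2))
```

At T = Tm the expression is zero by construction, which is a quick check that the inputs are consistent.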

PERL Programming
Kinetic and thermodynamic quantities were evaluated for each amino acid in order to study protein stability in terms of hydrophobic amino acids and its correlation with the frequency of occurrence, the properties (both kinetic and thermodynamic) and the Gibbs free energy contribution to stabilizing the protein's 3-D structure. Several parameters were taken into account to minimize approximations in the computational analysis (Table 1). The "Protein Stability" program, written in PERL, reports the frequency of each amino acid; its Gibbs energy from kinetic calculations (also called the Gibbs free energy of activation) and from thermodynamic calculations; the hydrophobic trend (kinetics: R>G>Y>F>K>L>I; thermodynamics: R>G>H>A>K>S); the frequency of hydrophobic amino acids; the protein stability; and the Gibbs energy contribution of hydrophobic amino acids to stabilizing the protein structure, from both kinetic and thermodynamic calculations (Figure 2).
Table 1 notation: ΔGk, Gibbs free energy of activation term (kinetic calculations); ΔGt, Gibbs free energy term (thermodynamic calculations); Tm, midpoint of thermal transition; ΔHm, standard enthalpy change at Tm; ΔSm, standard entropy change at Tm; ΔCp, change in heat capacity during transition; $p(aa_u)$, penalty of unassigned amino acids (calculated using the nearest-neighbor approach); "-", no penalty levied. Note: ΔCp and ΔGt were calculated using equations (3) and (6), respectively.
Protein stability and the Gibbs energy in terms of hydrophobic amino acids were computed using equations (7) and (8), respectively.
$$\text{Protein stability} = \frac{\sum aa_h}{\sum f(aa_{1-20})} \qquad (7)$$
where h denotes hydrophobic; $aa_h$, the hydrophobic amino acids (counted over the sequence); $\Delta G_h$, the Gibbs energy of the hydrophobic amino acids; $f(aa_{1-20})$, the total frequency of amino acids in the protein; and $h_k$ and $h_t$, the hydrophobic amino acids according to the hydrophobic trend for kinetics and thermodynamics, respectively. The descriptor "protein stability" and its numerical values reflect the distribution of hydrophobic amino acids across the protein sequence: the more hydrophobic residues a sequence contains, the higher its protein stability value. It is therefore highly recommended that the protein stability value and the Gibbs energy in terms of hydrophobic amino acids be analyzed and compared for the kinetic and thermodynamic calculations together.
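A minimal sketch of equation (7), assuming the hydrophobic sets hk and ht are the residues named in the hydrophobic trends listed above (kinetics: R, G, Y, F, K, L, I; thermodynamics: R, G, H, A, K, S) and that the input is a raw one-letter sequence, possibly preceded by a FASTA header line. Equation (8) is not reconstructed here. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

# Hydrophobic sets taken from the hydrophobic trends reported in the text.
H_KINETIC = frozenset("RGYFKLI")
H_THERMO = frozenset("RGHAKS")
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and all whitespace; return upper-case one-letter codes."""
    lines = [ln for ln in raw.splitlines() if not ln.lstrip().startswith(">")]
    seq = "".join("".join(lines).split()).upper()
    unknown = set(seq) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return seq


def protein_stability(seq: str, hydrophobic: frozenset) -> float:
    """Equation (7): number of hydrophobic residues divided by total residue count."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return sum(counts[aa] for aa in hydrophobic) / len(seq)


seq = clean_sequence(">toy example\nLLIGKR\nSTWD\n")
print(protein_stability(seq, H_KINETIC), protein_stability(seq, H_THERMO))  # 0.6 0.4
```

Computing the ratio once per hydrophobic set makes the kinetic and thermodynamic values directly comparable, as the text recommends.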

RESULTS AND DISCUSSIONS
Most globular proteins rely on their packing for stability, and hydrophobicity is one force that drives the molecule toward a more condensed structure by reducing unfavorable contacts between hydrophobic residues and water molecules (Lins and Brasseur, 1995). This process requires hydrophobic residues to spend free energy for proper folding. The graphical relationship between the kinetic and thermodynamic energy terms showed a relationship between a protein's hydrophobicity and its stability (Figure 2). The higher the count of hydrophobic amino acids, the higher the protein stability value and the Gibbs energy terms from the kinetic and thermodynamic calculations. To maintain its stability, a protein needs sufficient hydrophobic residues to utilize free energy in guiding proper folding. Graphically, high and/or moderate free energy utilization by individual amino acids appears as peaks, whereas low energy appears as descents. From the kinetic point of view, more peaks represent more free energy utilization by hydrophobic amino acids. Hence, more peaks indicate increased hydrophobicity of a protein, and the related Gibbs free energy of activation utilized by hydrophobic residues also tends to increase.
In the thermodynamic context, the requirement for such low energy (graphically represented as descents or valleys) quantifies the stability, because the parameters were taken from a protein stability experiment; the Gibbs free energy in terms of hydrophobic residues tends to decrease because these energies are numerically negative. Therefore, peaks are balanced by descents; in other words, the high energy expense of hydrophobic amino acids is offset by the low energy of other amino acids, maintaining the equilibrium needed to establish a compact structure with less energy.
One may ask why protein stability depends on hydrophobicity. The explanation is that establishing stronger interactions with the solvent consumes more Gibbs free energy, and hydrophobic domains are mainly responsible for this consumption.
To maintain an energy equilibrium, buried residues utilize low energy because their surface exposed area is relatively small and their interaction with the solvent is limited (Herzfeld, 1991). Therefore, protein stability is largely attributed to a high frequency of hydrophobic amino acids. The program estimates the protein stability descriptor from the frequency of hydrophobic amino acids (see equation 7).
As a protein's 3-D structure is determined by its amino acid sequence, we analyzed protein stability from the sequence itself and developed a program named "Protein Stability", written in PERL (Tisdall, 2001), to address such questions. The user runs the program from a command-line interpreter and supplies the raw sequence in a plain text file (Figure 3). The cytochrome-b protein sequence (NCBI Ac. No. AAA31851) was analyzed with this program (Figure 4), which showed that leucine and phenylalanine contribute most to its stability. The result is interpreted as follows. First, find the two residues with the highest Gibbs free energy of activation in the kinetics column. Leucine and phenylalanine scored 114.5 and 50.92 kcal/mol, respectively; thus these two amino acids consume the most free energy of activation to promote folding and contribute most to protein stability.
Next, inspect the Gibbs energy terms of these two amino acids in the thermodynamics column. Their energy values were -2757.49 and -978.58 kJ/mol, respectively, indicating that both are stable and can partly promote the thermal stability of the protein. Finally, consider the frequency of these two amino acids. Notably, both occur relatively often (50 and 19 times, respectively). Hence, leucine and phenylalanine may partly drive the folding mechanism and stability of the protein. This analysis gives an overall idea of the importance of individual amino acids from the kinetic and thermodynamic points of view and of the major forces promoting protein stability.
Benchmarking was carried out against site-directed mutagenesis data for "cavity-creating" leucine-to-alanine replacements in T4 lysozyme and their relation to the hydrophobic effect (Eriksson et al., 1992), to assess the prediction accuracy of the "protein stability" descriptor. The mutations L46A, L99A, L118A, L121A and L133A were analyzed using the program. Compared with the normal protein, the mutated protein had a lower protein stability value (in terms of hydrophobic amino acids) in the kinetic calculation and a higher value in the thermodynamic calculation (kinetic calculation: normal protein = 0.4573 kcal/mol, mutated protein = 0.4294 kcal/mol; thermodynamic calculation: normal protein = 0.3719 kJ/mol, mutated protein = 0.4049 kJ/mol) (Table 2). To interpret these descriptor values, the difference between the kinetic and thermodynamic values should be normalized and evaluated. The normalized difference in protein stability was 85.4 for the normal protein; a single-point mutation such as L46A dropped it to 73.2, and when all the single-point mutations were considered together, it dropped to 24.5. The primary reason for this drastic variation is that leucines participate in the kinetic hydrophobic trend and promote the hydrophobic effect, whereas alanines participate in the thermodynamic hydrophobic trend and are known to promote the thermal stability of the protein. Thus, the hydrophobic amino acids in the protein sequence contribute to stabilization and promote folding.
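The three-step reading of the program output described above (rank residues by kinetic Gibbs energy, then check the thermodynamic term and the frequency of the top two) can be sketched as follows. The leucine and phenylalanine numbers are those reported for cytochrome-b; the third row is a hypothetical placeholder included only so there is something to rank against, and the `ResidueRow` layout is an assumption about how the per-residue output might be tabulated.

```python
from __future__ import annotations

from typing import NamedTuple


class ResidueRow(NamedTuple):
    """One row of per-residue output (layout assumed for illustration)."""
    residue: str
    dg_kinetic: float  # ΔGk, kcal/mol (kinetics column)
    dg_thermo: float   # ΔGt, kJ/mol (thermodynamics column)
    frequency: int


def top_contributors(rows: list[ResidueRow], n: int = 2) -> list[ResidueRow]:
    """Step 1: the n residues with the highest Gibbs free energy of activation."""
    return sorted(rows, key=lambda r: r.dg_kinetic, reverse=True)[:n]


rows = [
    ResidueRow("L", 114.5, -2757.49, 50),  # cytochrome-b values reported in the text
    ResidueRow("F", 50.92, -978.58, 19),   # cytochrome-b values reported in the text
    ResidueRow("G", 10.0, -100.0, 5),      # HYPOTHETICAL values, not from the text
]
for row in top_contributors(rows):
    # Steps 2 and 3: report ΔGt and frequency for each selected residue.
    print(f"{row.residue}: dGk={row.dg_kinetic} kcal/mol, "
          f"dGt={row.dg_thermo} kJ/mol, frequency={row.frequency}")
```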
Table 2. Protein stability value interpretation.
*The same results were obtained when the single-point mutations were introduced one at a time, because replacing a single "L" with "A" does not substantially affect the results.
† However, when all the single-point mutations were introduced together, the protein stability values and the corresponding normalized difference were affected.
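The normalized differences quoted in the benchmarking text are consistent with subtracting the thermodynamic value from the kinetic value and multiplying by 1000 (for example, 0.4573 - 0.3719 = 0.0854, reported as 85.4). The text does not state the normalization, so the scale factor in the sketch below is an inference from the reported numbers rather than the program's documented formula.

```python
def normalized_difference(s_kinetic: float, s_thermo: float, scale: float = 1000.0) -> float:
    """Scaled difference between the kinetic and thermodynamic stability values.

    scale=1000 is inferred from the reported figures, not stated in the text.
    """
    return round((s_kinetic - s_thermo) * scale, 1)


print(normalized_difference(0.4573, 0.3719))  # normal protein: 85.4
print(normalized_difference(0.4294, 0.4049))  # all five L->A mutations together: 24.5
```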
The main advantage of this program is that its algorithm is built on prominent kinetic and thermodynamic quantities. The program script has been converted into a Windows executable, which eliminates the need to install a PERL interpreter; the program is therefore distributed as a standalone application for the Windows operating system. A limitation of the program is that it takes the kinetic and thermodynamic quantities solely from the pentapeptide partitioning model and from cytochrome-c thermal denaturation experiments. Hence, it is applicable only to globular proteins and not to soluble and membrane proteins. The Gibbs energy term depends on the nature of the experiments and can vary considerably. Nevertheless, our intention is to give a better understanding of protein stability with applied parameters, and the approach can be extended to other experiments by approximation and/or optimization of the quantities. Finally, we emphasize the importance of hydrophobicity for protein stabilization; no other interactions were considered in this regard.

CONCLUSION
Molecular dynamics simulations only arrive at a lowest-energy conformer of a protein, which need not be a stabilized structure. Hence, there is a strong need to integrate kinetic and thermodynamic studies to understand protein stability. In this study, we examined prominent kinetic and thermodynamic quantities to explore the energy, and its equilibrium, required to stabilize a structure. A program named "Protein Stability" was developed to study the Gibbs free energy distribution from the protein sequence itself. The program aims to support the study of protein dynamics and folding patterns, which are prerequisites for protein characterization experiments, and to give a clear understanding of protein stability from the sequence alone, without the need for the 3D structure. It is intended as a tool for understanding protein stability in the context of molecular dynamics and for identifying the amino acids in domains that drive folding.

Figure 1. Nearest-neighbor approach applied to block III (TSWYP). Serine (S; violet circle) forms the base for the block TSWYP because it is the nearest neighbor of the unassigned amino acids tryptophan (W; green circle) and tyrosine (Y; green circle). The differences in surface exposed area (SEA) between the base and the unassigned amino acids were used to penalize the assigned values:
$$p(aa_u)_{b_n} = \mathrm{SEA}(\mathrm{base}_{b_n}) - \mathrm{SEA}(aa_u) \qquad (2)$$

Figure 3. Perl command-line interface showing the results for cytochrome-b, whose protein sequence was given as input. Leucine and phenylalanine scored 114.5 and 50.92 kcal/mol, respectively, as Gibbs free energy of activation (Kinetics column) and -2757.49 and -978.58 kJ/mol, respectively, as Gibbs free energy (Thermodynamics column).

Figure 4. Results of protein stability for cytochrome-b. Leucine and phenylalanine contribute most to protein stability in terms of hydrophobicity (ΔGk for leucine = 114.5 kcal/mol; ΔGk for phenylalanine = 50.92 kcal/mol) and stability (ΔGt for leucine = -2757.49 kJ/mol; ΔGt for phenylalanine = -978.58 kJ/mol). The peaks and descents corresponding to L and F demonstrate that these amino acids are crucial for protein stability.

algorithms and tools development pertaining to structural bioinformatics, molecular modeling and cheminformatics.
Muthusamy Meenachi, MSc, MPhil, PGDBI. She is currently working as Assistant Professor and Head of the Department of Bioinformatics, Achariya Arts and Science College, Pondicherry. With more than 7 years of teaching experience, she has contributed to the scientific community by providing practical training to students in genetic engineering and microbiology. She is interested in genomics, proteomics, microbial biotechnology and genetic engineering.
This journal is published by the University Library System of the University of Pittsburgh as part of its D-Scribe Digital Publishing Program, and is cosponsored by the University of Pittsburgh Press.

Table 1. Parameters devised in this study.