BIO 3356
MOLECULAR MODELING
NAME:
CITY TECH, CUNY
DATE:
Lab06 – Literature summary on Homology Modeling (50 pts)
The purpose of this assignment is to summarize and evaluate existing “literature” (or published
material) in order to establish current knowledge on the subject of homology modeling of
proteins. You can use the paper that I posted or use some other from the literature but do not
forget to cite them.
What format should you use for the review?
A literature review of formal academic writing includes:
Introduction
Body
Conclusion
Introduction
• It defines or identifies the general topic and provides an appropriate context for describing
the available methods found in the literature.
Body
• It summarizes individual studies or articles with as much or as little detail as each merits
according to its comparative importance in the literature, remembering that space
(length) denotes significance.
• You should not keep the title Body in your report but instead create different titles
relative to the content.
Conclusion
• It summarizes major contributions of significant studies and articles to the body of
knowledge under review, maintaining the focus established in the introduction.
References
• Identify at least two sources that could be useful for your summary. They may be news
articles, non-review journal articles, websites, software, or any other material from the
Internet. Write down the reference citation in a proper APA format.
To guide you, you will find two articles about homology modeling and a very recent research
paper (from 2016) on the modeling of Zika virus proteins.
Your summary should answer the following questions in the body:
1. Define Comparative/Homology Modeling.
2. Enumerate the steps in Comparative/Homology Modeling.
3. What tools/software are needed for each step?
4. Draw or provide a flow chart of the steps involved in homology protein structure
modeling. (you can use an image found online (cite the source) or draw it yourself.)
5. Which are the most commonly used homology-modeling tools/software/websites?
6. What does template structure mean?
7. Can we trust the models obtained blindly?
8. What are the possible applications of Comparative/Homology Modeling?
9. Discuss one example of a homology modeling study.
BONUS ACTIVITY: MODELLER with GUI INTERFACE (10 pts)
For this bonus activity, your task is to search for a homology modeling program that uses
Modeller within a graphical user interface (GUI), allowing friendlier use of the Python scripts.
Once you find one, describe briefly what it does, and use the target sequence below to build
models. Provide a snapshot and explanation of your steps.
Target Sequence:
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDL
STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLH
VDPENFR
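Most modeling front ends expect the target as a single sequence record rather than as wrapped lines. The following is a minimal sketch, assuming the tool accepts FASTA input; the record name "target" and the helper name to_fasta are hypothetical choices for illustration, not part of the assignment.

```python
# Target sequence from the assignment, joined from its three lines.
TARGET = (
    "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDL"
    "STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLH"
    "VDPENFR"
)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record with the sequence wrapped at `width` columns."""
    # Remove any whitespace left over from copy and paste, then upper-case.
    cleaned = "".join(sequence.split()).upper()
    chunks = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


if __name__ == "__main__":
    # "target" is a hypothetical record name.
    print(to_fasta("target", TARGET), end="")
```

The printed record can be saved to a text file and loaded into whichever GUI is chosen; some tools may require a different format, in which case only the header line needs to change.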
NIH Public Access
Author Manuscript
Methods Mol Biol. Author manuscript; available in PMC 2014 July 23.
NIH-PA Author Manuscript
Published in final edited form as:
Methods Mol Biol. 2010 ; 673: 73–94. doi:10.1007/978-1-60761-842-3_6.
Template-Based Protein Structure Modeling
Andras Fiser
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the
fraction of experimentally known 3D models is currently less than 1% due to the inherently
time-consuming and complicated nature of structure determination techniques. Computational
approaches are employed to bridge the gap between the number of known sequences and that of
3D models. Template-based protein structure modeling techniques rely on the study of principles
that dictate the 3D structure of natural proteins from the theory of evolution viewpoint. Strategies
for template-based structure modeling will be discussed with a focus on comparative modeling, by
reviewing techniques available for all the major steps involved in the comparative modeling
pipeline.
Keywords
Homology modeling; Comparative protein structure modeling; Template-based modeling; Loop
modeling; Side chain modeling; Sequence-to-structure alignment
1. Introduction
The class of methods referred to as template-based modeling includes both the threading
techniques that return a full 3D description for the target and comparative modeling (1). This
class of protein structure modeling relies on detectable similarity spanning most of the
modeled sequence and at least one known structure. Comparative modeling refers to those
template-based modeling cases where not only the fold is determined from a possible set of
available templates, but a full atom model is also built (2). In practice, it means that if the
structure of at least one protein in the family has been determined by experimentation, the
other members of the family can be modeled based on their alignment to the known
structure. It is possible because a small change in the protein sequence usually results in a
small change in its 3D structure (3). It is also facilitated by the fact that 3D structure of
proteins from the same family is more conserved than their amino-acid sequences (4).
Therefore, if similarity between two proteins is detectable at the sequence level, then
structural similarity can usually be assumed. The increasing applicability of template-based
modeling is owing to the observation that the number of different folds that proteins adopt is
rather limited and because worldwide Structural Genomics projects are aggressively
mapping out the universe of possible folds (5–7).
Template-based approaches to structure prediction have their advantages and limitations.
Comparative protein structure modeling usually provides high-quality models that are
comparable with low-resolution X-ray crystallography or medium-resolution NMR solution
structures. However, the applicability of these approaches is limited to those sequences that
can be confidently mapped to known structures. Currently, the probability of finding related
proteins of known structure for a sequence picked randomly from a genome ranges
approximately from 30 to 80%, depending on the genome. Approximately 70% of all known
sequences have at least one domain that is detectably related to at least one protein of known
structure (8). This fraction is more than an order of magnitude larger than the number of
experimentally determined protein structures deposited in the Protein Data Bank (PDB) (9).
As we will see, in practice, template-based modeling always includes information that is
independent from the template, in the form of various force restraints from general statistical
observations or molecular mechanical force fields. As a consequence of improving force
fields and search algorithms, the most successful approaches often explore more and more
template-independent conformational space (10, 11).
2. Methods
All current comparative modeling methods consist of five sequential steps: (1) to search for
proteins with known 3D structures that are related to the target sequence, (2) to pick those
structures that will be used as templates, (3) to align their sequences with the target
sequence, (4) to build the model for the target sequence given its alignment with the
template structures, and (5) to evaluate the model, using a variety of criteria.
There are several computer programs and web servers that automate the comparative
modeling process (Table 1). While the web servers are convenient and useful (10, 12–14),
the best results are still obtained by nonautomated, expert use of the various modeling tools
(15). Complex decisions for selecting the structurally and biologically most relevant
templates, optimally combining multiple template information, refining alignments in
nontrivial cases, selecting segments for loop modeling, including cofactors and ligands in
the model, or specifying external restraints require an expert knowledge that is difficult to
fully automate (16), although more and more efforts on automation point to this direction
(17, 18).
2.1. Searching for Structures Related to the Target Sequence
Comparative modeling usually starts by searching the PDB (9) for known protein structures
using the target sequence as the query. This search is generally done by comparing the target
sequence with the sequence of each of the structures in the database.
There are two main classes of protein comparison methods that are useful in fold
identification. The first class compares the sequences of the target with each of the database
templates by using pairwise sequence–sequence comparisons (such as FASTA and BLAST
(19)) (20–22) and fold assignments (23). To improve the sensitivity of the sequence-based
searches, evolutionary information can be incorporated in the form of multiple sequence
alignment (24–28). These approaches begin by finding all sequences in a sequence database
that are clearly related to the target and easily aligned with it (29, 30). The multiple
alignment of these sequences is the target sequence profile, which implicitly carries
additional information about the location and pattern of evolutionarily conserved positions
of the protein. The most well-known program in this class is PSI-BLAST (27), which
implements a heuristic search algorithm for short motifs. A further step to increase the
sensitivity of this approach is to precalculate sequence profiles for all the known structures
and then use a pairwise dynamic programming algorithm to compare the two profiles. This
has been implemented, among other programs, in COACH (31) and FFAS03 (32, 33). The
construction of profile-based Hidden Markov Models (HMM) is another sensitive way to
locate universally conserved motifs among sequences (34). A substantial improvement in
HMM approaches was achieved by incorporating information about predicted secondary
structural elements (35, 36). Another development in this group of methods is the
phylogenetic tree-driven HMM, which selects a different subset of sequences for profile
HMM analysis at each node in the evolutionary tree (37). Locating sequence intermediates
that are homologous to both sequences may also enhance the template searches (22, 38).
These more sensitive fold identification techniques are especially useful for finding
significant structural relationships when sequence identity between the target and the
template drops below 25%. More accurate sequence profiles and structural alignments can
be constructed with consistency-based approaches such as T-Coffee (39), PROMALS (and
PROMALS3D for structures) (40, 41), and ProbCons (42).
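To make the idea of a target sequence profile concrete, the sketch below derives per-column residue frequencies from a small multiple alignment and flags fully conserved columns. It is a simplified illustration only: the toy alignment is invented, and production tools such as PSI-BLAST additionally apply sequence weighting and pseudocounts, which are omitted here.

```python
from collections import Counter
import math


def build_profile(msa):
    """Return per-column residue frequencies for equal-length aligned sequences."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        # Gap characters carry no residue information, so they are skipped.
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def column_entropy(freqs):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Invented toy alignment, for illustration only.
toy_msa = ["MKVLA", "MRVLG", "MKV-A", "MKILA"]
profile = build_profile(toy_msa)
conserved = [i for i, f in enumerate(profile) if f and column_entropy(f) == 0.0]

if __name__ == "__main__":
    print("conserved columns:", conserved)
```

Columns with low entropy mark the evolutionarily conserved positions that the profile carries implicitly, which is what lets profile-based searches detect relationships that a single sequence misses.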
The second class of methods relies on pairwise comparison of a protein sequence and a
protein structure; the target sequence is matched against a library of 3D profiles or threaded
through a library of 3D folds. These methods are also called fold assignment, threading, or
3D template matching (32, 43–47). These methods are especially useful when sequence
profiles are not possible to construct because there are not enough known sequences that are
clearly related to the target or potential templates.
Template search methods “outperform” the needs of comparative modeling in the sense that
they are able to locate sequences that are so remotely related as to render construction of a
reliable comparative model impossible. The reason for this is that sequence relationships are
often established on short conserved segments, while a successful comparative modeling
exercise requires an overall correct alignment for the entire modeled part of the protein.
2.2. Selecting Templates
Once a list of potential templates is obtained using searching methods, it is necessary to
select one or more templates that are appropriate for the particular modeling problem.
Several factors need to be taken into account when selecting a template.
2.2.1. Considerations in Template Selection—The simplest template selection rule is
to select the structure with the highest sequence similarity to the modeled sequence. The
construction of a multiple alignment and a phylogenetic tree (48) can help in selecting the
template from the subfamily that is closest to the target sequence. The similarity between the
“environment” of the template and the environment in which the target needs to be modeled
should also be considered. The term “environment” is used here in a broad sense, including
everything that is not the protein itself (e.g., solvent, pH, ligands, quaternary interactions). If
possible, a template bound to the same or similar ligands as the modeled sequence should
generally be used. The quality of the experimentally determined structure is another
important factor in template selection. Resolution and R-factor of a crystal structure and the
number of restraints per residue for an NMR structure are indicative of their accuracy. The
criteria for selecting templates also depend on the purpose of a comparative model. For
example, if a protein–ligand model is to be constructed, the choice of the template that
contains a similar ligand is probably more important than the resolution of the template.
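The selection considerations above can be sketched as a simple ranking rule. This is an illustrative assumption rather than a published scoring scheme: the Template fields, the candidate identifiers, and the tie-breaking order (ligand first when a protein-ligand model is needed, then sequence identity, then resolution) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str                 # hypothetical label, not a real PDB entry
    identity: float           # percent sequence identity to the target
    resolution: float         # crystal resolution in angstroms (lower is better)
    has_similar_ligand: bool  # bound to the same or a similar ligand


def rank_templates(templates, need_ligand=False):
    """Sort candidate templates best-first.

    When a protein-ligand model is the goal, templates with a similar bound
    ligand are ranked ahead of the others regardless of resolution.
    """
    def key(t):
        ligand_missing = (not t.has_similar_ligand) if need_ligand else False
        return (ligand_missing, -t.identity, t.resolution)

    return sorted(templates, key=key)


# Invented candidates, for illustration only.
candidates = [
    Template("tmplA", identity=45.0, resolution=1.8, has_similar_ligand=False),
    Template("tmplB", identity=38.0, resolution=2.5, has_similar_ligand=True),
    Template("tmplC", identity=45.0, resolution=2.2, has_similar_ligand=False),
]

if __name__ == "__main__":
    print([t.name for t in rank_templates(candidates)])
    print([t.name for t in rank_templates(candidates, need_ligand=True)])
```

The same candidate list produces a different first choice depending on the purpose of the model, which is the point the text makes about ligand-bound templates.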
2.2.2. Advantage of Using Multiple Templates—It is not necessary to select only one
template. In fact, the optimal use of several templates increases the model accuracy (13, 17,
49, 50); however, not all modeling programs are designed to accept more than one template.
The benefit of combining multiple template structures can be twofold. First, multiple
template structures may be aligned with different domains of the target, with little overlap
between them, in which case, the modeling procedure can construct a homology-based
model of the whole target sequence. Second, the template structures may be aligned with the
same part of the target, in which case the model can be built on the locally best template.
An elaborate way to select suitable templates is to generate and evaluate models for each
candidate template structure and/or their combinations. The optimized all-atom models can
then be evaluated by an energy or scoring function, such as the Z-score of PROSA (46) or
VERIFY3D (51). These scoring methods are often sufficiently accurate to allow selection of
the most accurate of the generated models (52). This trial-and-error approach can be viewed
as limited threading (i.e., the target sequence is threaded through similar template
structures). However, these approaches are good only at selecting various templates on a
global level.
A recently developed method, M4T (Multiple Mapping Method with Multiple Templates),
selects and combines multiple template structures through an iterative clustering approach
that takes into account the “unique” contribution of each template, their sequence similarity
among themselves and to the target sequence, and their experimental resolution (13, 17).
The resulting models systematically outperformed models that were based on the single best
template.
Another important observation from the same study was that below 40% sequence identity,
models built using multiple templates are more accurate than those built using a single
template only, and this trend is accentuated as one moves into more remote target–template
pair cases. Meanwhile, the advantage of using multiple templates gradually disappears
above 40% target–template sequence identity. This suggests that in this range, the
average differences between the template and target structures are smaller than the average
differences among alternative template structures that are all highly similar to the target
(17).
2.3. Sequence-to-Structure Alignment
To build a model, all comparative modeling programs depend on a list of assumed structural
equivalences between the target and template residues. This list is defined by the alignment
of the target and template sequences. Many template search methods will produce such an
alignment, and these sometimes can directly be used as the input for modeling. Often,
however, especially in the difficult cases, this initial alignment is not the optimal target–
template alignment. This is because search methods may be tuned for detection of remote
relationships, which is often realized on a local motif and not on a full-length, optimal
alignment. Therefore, once the templates are selected, an alignment method should be used
to align them with the target sequence. When the target–template sequence identity is lower
than 40%, the alignment accuracy becomes the most important factor affecting the quality of
the resulting model. A misalignment by only one residue position will result in an error of
approximately 4 Å in the model.
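As a rough illustration of what a target–template alignment and its sequence identity are, the sketch below implements a basic global alignment by dynamic programming (the Needleman-Wunsch scheme) and computes percent identity over the aligned positions. The scoring values (match, mismatch, linear gap penalty) are arbitrary assumptions; practical tools use substitution matrices and more elaborate gap models.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical positions as a percentage of aligned (non-gap) positions."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    target_aln, template_aln = needleman_wunsch("ACGT", "AGT")
    print(target_aln)
    print(template_aln)
    print(percent_identity(target_aln, template_aln))
```

Shifting any aligned segment by one position changes which template residue each target residue is assumed to be equivalent to, which is why a one-residue misalignment translates directly into a coordinate error in the model.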
2.3.1. Taking Advantage of Structural Information in Alignments—Alignments in
comparative modeling represent a unique class because on one side of the alignment there is
always a 3D structure, the template. Therefore, alignments can be improved by including
structural information from the template. For example, gaps should be avoided in secondary
structure elements, in buried regions, or between two residues that are far in space. Some
alignment methods take such criteria into account (47, 53, 54).
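One simple way to discourage gaps in secondary structure elements and buried regions is to make the gap-opening penalty depend on the template position. The sketch below is a hypothetical illustration of that idea; the base penalty and the multipliers are invented values and are not taken from any of the cited methods.

```python
def gap_open_penalties(secondary_structure, buried, base=10.0,
                       sse_factor=2.0, buried_factor=1.5):
    """Return a per-position gap-opening penalty for a template.

    secondary_structure: string of 'H' (helix), 'E' (strand) or 'C' (coil)
    buried: sequence of booleans, True where the residue is buried
    All numeric parameters are illustrative assumptions.
    """
    if len(secondary_structure) != len(buried):
        raise ValueError("inputs must describe the same number of residues")
    penalties = []
    for ss, is_buried in zip(secondary_structure, buried):
        penalty = base
        if ss in ("H", "E"):
            penalty *= sse_factor      # gaps inside helices or strands cost more
        if is_buried:
            penalty *= buried_factor   # gaps in the buried core cost more
        penalties.append(penalty)
    return penalties


if __name__ == "__main__":
    print(gap_open_penalties("CHE", [False, True, False]))
```

A dynamic programming aligner can then look up the penalty for the template position where a gap would open instead of using one constant value, so that insertions and deletions are pushed toward exposed loops.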
When multiple template structures are available, a good strategy is to superpose them with
each other first, to obtain a multiple structure-based alignment highlighting structurally
conserved residues (55–57). In the next step, the target sequence is aligned with this
multiple structure-based alignment. The benefits of using multiple structures and multiple
sequences are that they provide evolutionary and structural information about the templates,
as well as evolutionary information about the target sequence, and they often produce a
better alignment for modeling than the pairwise sequence alignment methods (22, 58).
Multiple Mapping Method (MMM) directly relies on information from the 3D structure (14,
59). MMM minimizes alignment errors by selecting and optimally splicing differently
aligned fragments from a set of alternative input alignments. This selection is guided by a
scoring function that determines the preference of each alternatively aligned fragment of the
target sequence in the structural environment of the template. The scoring function has four
terms, which are used to assess the compatibility of alternative variable segments in the
protein environment: (a) environment-specific substitution matrices from FUGUE (47), (b)
a residue substitution matrix, BLOSUM (60), (c) a 3D–1D substitution matrix, H3P2, that
scores the matches of predicted secondary structure of the target sequence to the observed
secondary structures and accessibility types of the template residues (61), and (d) a
statistically derived residue–residue contact energy term (62). MMM essentially performs a
limited and inverse threading of short fragments: in this exercise the actual question is not
the identification of a right fold, but identification of the correct alignment mapping, among
many alternatives, for sequence segments that are threaded on the same fold. These local
mappings are evaluated in the context of the rest of the model, where alignments provide a
consistent solution and framework for the evaluation.
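The fragment selection described above can be pictured as choosing, for each variable segment, the alternative alignment with the best combined score. The sketch below is only a schematic: the text does not state how the four terms are weighted, so equal weights are assumed, each term is assumed to be already scaled so that higher means more compatible, and all numbers are invented.

```python
# Names for the four terms listed in the text; the identifiers are hypothetical.
TERM_NAMES = ("environment", "substitution", "ss_accessibility", "contact")


def composite_score(terms, weights=None):
    """Weighted sum of the four compatibility terms for one aligned fragment."""
    if weights is None:
        # Equal weighting is an assumption made for this sketch.
        weights = {name: 1.0 for name in TERM_NAMES}
    return sum(weights[name] * terms[name] for name in TERM_NAMES)


def pick_best_fragment(alternatives, weights=None):
    """Return the label of the alternative mapping with the highest score."""
    return max(alternatives,
               key=lambda label: composite_score(alternatives[label], weights))


# Invented scores for two alternative mappings of the same target segment.
alternatives = {
    "alignment_1": {"environment": 1.0, "substitution": 0.5,
                    "ss_accessibility": 1.0, "contact": 0.5},
    "alignment_2": {"environment": 1.5, "substitution": 1.0,
                    "ss_accessibility": 1.0, "contact": 1.0},
}

if __name__ == "__main__":
    print(pick_best_fragment(alternatives))
```

Repeating this choice for every variable segment and splicing the winners together gives a single alignment assembled from the locally preferred pieces of the input alignments.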
2.4. Model Building
When discussing the model building step within comparative protein structure modeling, it
is useful to distinguish two parts: template-dependent and template-independent modeling.
This distinction is necessary because certain p …