Instructions for Web-based Gene/Protein Analysis.
Your assignment: follow the
instructions below for your assigned gene, and submit everything I've
highlighted in bold.
Your laboratory has isolated the cDNA
for a gene. You want to know things like
"What protein is made from this gene?" and "What does the
protein do?" Following the protocol
below, you will be able to determine the hypothetical structure of the protein,
related proteins, and possibly the function of the protein from knowing the
primary nucleotide sequence. This is
Bioinformatics.
1. You will be
emailed a DNA sequence that is your hypothetical cDNA clone.
2. Select the entire sequence, and
"copy" it (ctrl-C).
3. Paste (ctrl-V) your sequence into the box
located at the following website: http://mathcs.jcu.edu:8080/translatortool/faces/inputpage.jsp
The tool will translate your sequence in all six reading frames,
forward and reverse. Likely ORFs will be
highlighted in green. Possible, but less
likely, ORFs will be highlighted in red.
Click the most likely one. A new page
will pop up that gives the protein sequence. Copy that sequence for the next operations
and print it for your report.
(As a backup plan,
you could also try: http://www.expasy.ch/tools/dna.html
Below the sequence box on that page
is a drop-down menu. Choose the “compact” option and click “Translate Sequence.” The algorithm searches for open reading
frames in all six possible reading frames.
You must figure out which one makes the most
sense. (This will be explained in more
detail in class.) Highlight the protein
sequence you chose. You should also copy
that sequence electronically for use later in this assignment.)
4.
Now, point your browser at the
Click on the “DNA and RNA” link. On the next page scroll
down to the "BLAST" button under the tools section. Scroll down and click "Nucleotide BLAST." When the next page loads, paste (ctrl-V) your
DNA sequence into the form box and then click "BLAST!" A new page will pop up telling you how long
it should take for your request to be filled.
Click on "Format!"
When your results
arrive, look them over to see which known sequences resemble yours. This will give you an idea of whether your gene is
related to any known genes and what its function might be. Print out the first few summary
pages (NOT the sequence
alignment pages!).
5. Go back to your translated protein sequence
and copy it and go to the "Expasy" homepage: http://www.expasy.org/
Click on “full list” under the
“Databases” section. Scroll down to "Protein related databases" and click “NCBI Protein Resources.” Once there, you’ll be on a slightly different
part of the NCBI site. Click on the “BLAST”
button on the left and proceed similarly to what you did with your DNA sequence
(but obviously use your protein sequence for this part). You will get results from this search in a
few minutes. Again, print out the first few summary pages of similar protein
sequences. Do these homologies agree
with the DNA homologies? Discuss.
6. Go "back" a few pages to www.expasy.ch
and click the “tools” link in the upper right corner of the page. Scroll down to the Pattern and Profile area, and
click on "MOTIFScan." This
will take you to the Swiss Institute of Bioinformatics (SIB). Paste your protein sequence into the field as
before, click the boxes for “PROSITE patterns, PROSITE patterns (frequent match
producers), and PROSITE profiles”, then "Search." This will return a
page that informs you of any potential post-translational modification sites,
based on homology to other protein motifs.
(The search may take a few minutes to complete.) Print this page.
7. Return to the Proteomics tools page, scroll to the “Other prediction”
tools section and click on "ProtParam." Paste in your sequence as
above, and click "Compute Parameters." Your results will come back to
you quickly. It should tell you the number of residues, molecular weight,
isoelectric point, amino acid composition, extinction coefficient, and
predicted stability (some proteins have "tell-tale" instability
sequences) of your protein. Print this
out.
8. Return to the Proteomics tools page
and click on “TMPred” (found in the Topology Prediction section). Paste in
your sequence as before and "Submit." The results will come back with
a hydropathy plot, plus evaluation of that plot. Print this page out. Is your
protein expected to be transmembrane or cytoplasmic?
9. Go back to "Proteomics Tools" and click on "Phyre"
(in the “threading” area within the "tertiary structure" area). Type your e-mail address to tell the program
where to send your results, give the sequence a name, paste your protein
sequence into the box and click "Quick Phyre Search." You will get an e-mail telling you where to
retrieve your results. (While you're
waiting, the computer will be doing a primary, secondary and tertiary structure
alignment with all known proteins!)
Your results will be too cumbersome to
print out, but you can look through the names of the matches to see how this
search compares to the others. Discuss. Also, you can click on the proteins to
"see" how they might fold. The
"normal" size lines indicate well-matched structure alignments, thick
lines indicate that your sequence is missing some sequence found in the
structure that it's being compared to, and thin lines indicate that your
sequence has extra sequence not found in the structure that it's being compared
to.
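
If you are curious what the translation tool in step 3 is doing behind the scenes, the six-frame translation can be sketched in a few lines of Python. This is only an illustration, not the code the website actually runs: the function names are made up for this handout, it assumes the standard genetic code, and it treats an "ORF" simply as the longest stretch that starts with Met (M) and runs to the next stop codon (or to the end of the sequence).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate both strands in all three offsets: +1..+3 and -1..-3."""
    dna = "".join(dna.upper().split())  # drop whitespace and line breaks
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(dna):
    """Return (frame name, protein) for the longest M-initiated stretch."""
    best = ("", "")
    for name, protein in six_frames(dna).items():
        for segment in protein.split("*"):  # "*" marks a stop codon
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (name, segment[start:])
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTAAATGACC"))  # ('+3', 'MAK')
```

The longest ORF is usually, but not always, the right choice, which is why you still need to look at all six frames yourself.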
You should now be able to tell me what
your DNA encodes and what the protein product probably does. You may have to look up data for homologous
proteins in textbooks or in PubMed to answer the "what does it do"
parts.
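
To get a feel for the hydropathy plot in step 8, here is a minimal Python sketch of a sliding-window hydropathy profile. Note the assumptions: TMPred uses its own scoring method, so this sketch swaps in the simpler Kyte-Doolittle scale instead. The function names, the 19-residue window, and the 1.6 cutoff are illustrative choices (a commonly quoted rule of thumb for spotting membrane-spanning stretches), not TMPred's settings.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every window of `window` residues.

    Residues not in the table (e.g. X) are scored as 0.0, which is an
    assumption made here for simplicity.
    """
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions (0-based) of windows at or above the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch followed by a charged stretch.
    toy = "L" * 19 + "D" * 19
    print(candidate_tm_windows(toy))
```

A run of high-scoring windows suggests a possible membrane-spanning segment. A protein with no windows above the cutoff is more likely to be soluble, but compare this with the TMPred evaluation rather than relying on the sketch.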