Instructions for Web-based Gene/Protein Analysis

Your assignment: follow the instructions below for your assigned gene, and submit everything I've highlighted in bold.

Your laboratory has isolated the cDNA for a gene. You want to know things like "What protein is made from this gene?" and "What does the protein do?" Following the protocol below, you will be able to determine the hypothetical structure of the protein, related proteins, and possibly the function of the protein from knowing the primary nucleotide sequence. This is Bioinformatics.

1. You will be emailed a DNA sequence that is your hypothetical cDNA clone.

2. Select the entire sequence, and "copy" it (ctrl-C).

3. Paste (ctrl-V) your sequence into the box located at the following website: http://mathcs.jcu.edu:8080/translatortool/faces/inputpage.jsp . The tool translates your sequence in all six reading frames (three forward, three reverse). Likely ORFs will be highlighted in green. Possible, but less likely, ORFs will be highlighted in red. Click the most likely ORF. A new page will open that gives the protein sequence. Copy that sequence for the next operations and print it for your report.

(As a backup plan, you could also try http://www.expasy.ch/tools/dna.html . Below the box is a drop-down menu. Use the “compact” feature and click “Translate Sequence.” The algorithm translates your sequence in all six possible reading frames, and you must figure out which one makes the most sense; this will be explained in more detail in class. Highlight the protein sequence you chose, and copy that sequence electronically to be used later in this assignment.)
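
If you are curious what a six-frame translation involves, here is a minimal Python sketch. It is not the code behind either website; the function names (clean, six_frames, longest_orf) are my own, and it assumes the standard genetic code and a simple rule that an ORF starts at a methionine and runs to the next stop codon (or the end of the sequence).

```python
import re

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def clean(dna):
    """Uppercase the input and drop spaces, digits, and line breaks."""
    return "".join(ch for ch in dna.upper() if ch in "ACGTN")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' is a stop, 'X' is a codon containing N."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    dna = clean(dna)
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def longest_orf(dna):
    """Return (frame, protein) for the longest Met-to-stop stretch found."""
    best = ("", "")
    for name, protein in six_frames(dna).items():
        for match in re.finditer(r"M[^*]*", protein):
            if len(match.group()) > len(best[1]):
                best = (name, match.group())
    return best
```

Calling longest_orf on your pasted sequence returns the frame label and the candidate protein. The longest ORF is usually, but not always, the right choice, so compare it with what the website highlights.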

4. Now, point your browser at the National Center for Biotechnology Information (NCBI) homepage: http://www.ncbi.nlm.nih.gov/

Click on the “DNA and RNA” link. On the next page scroll down to the "BLAST" button under the tools section. Scroll down and click "Nucleotide BLAST.” When the next page loads, paste (ctrl-v) your DNA sequence into the form box and then click "BLAST!" A new page will pop up telling you how long it should take for your request to be filled. Click on "Format!"

When your results arrive, look them over to see which known sequences resemble yours. This will give you an idea of whether your gene is related to any known genes and what its function might be. Print out the first few summary pages (NOT the sequence alignment pages!).

5. Go back to your translated protein sequence and copy it and go to the "Expasy" homepage: http://www.expasy.org/

Click on “full list” under the “Databases” section. Scroll down to "Protein related databases" and click “NCBI Protein Resources.” Once there, you’ll be on a slightly different part of the NCBI. Click on the “BLAST" button on the left and proceed similarly to what you did with your DNA sequence (but obviously use your protein sequence for this part). You will get results from this search in a few minutes. Again, print out the first few summary pages of similar protein sequences. Do these homologies agree with the DNA homologies? Discuss.

6. Go "back" a few pages to www.expasy.ch and click the “tools” link in the upper right corner of the page. Scroll down to the Pattern and Profile area, and click on "MOTIFScan." This will take you to the Swiss Institute of Bioinformatics (SIB). Paste your protein sequence into the field as before, click the boxes for “PROSITE patterns, PROSITE patterns (frequent match producers), and PROSITE profiles”, then "Search." This will return a page that informs you of any potential post-translational modification sites, based on homology to other protein motifs. (This may take a few minutes for the search to complete.) Print this page.
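
To get a feel for what a PROSITE pattern search does, here is a rough Python sketch that converts a pattern into a regular expression and scans a protein with it. This is my own simplified illustration, not the MotifScan code: the helper names are hypothetical, it handles only the basic pattern syntax, and it ignores PROSITE profiles entirely. The example pattern, N-{P}-[ST]-{P}, is the PROSITE N-glycosylation site pattern (PS00001).

```python
import re


def prosite_to_regex(pattern):
    """Convert basic PROSITE pattern syntax into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")      # elements are dash-separated
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} means "anything but P"
    regex = regex.replace("(", "{").replace(")", "}")   # x(2) means "repeat twice"
    regex = regex.replace("x", ".")                     # x means "any residue"
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex


def scan(protein, pattern):
    """Return (1-based position, matched residues) for every hit, overlaps included."""
    # The lookahead lets us report hits that overlap one another.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]
```

For example, scan("MNGTSNPSA", "N-{P}-[ST]-{P}") reports the site beginning at residue 2 and skips the asparagine at residue 6, because it is followed by a proline. Remember that a pattern match is only a potential site; short patterns like this one match many proteins by chance.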

7. Return to the Proteomics tools page, scroll to the “Other prediction” tools section and click on "ProtParam." Paste in your sequence as above, and click "Compute Parameters." Your results will come back to you quickly. It should tell you the number of residues, molecular weight, isoelectric point, amino acid composition, extinction coefficient, and predicted stability (some proteins have "tell-tale" instability sequences) of your protein. Print this out.
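
A few of these numbers are simple enough to check by hand or with a short script. The Python sketch below is my own approximation, not ProtParam's code. It assumes average masses for the free amino acids (subtracting one water per peptide bond), and it estimates the extinction coefficient at 280 nm from tyrosine (1490) and tryptophan (5500) only, assuming all cysteines are reduced. Expect small differences from the website in the last decimal places.

```python
from collections import Counter

# Average masses (Da) of the free amino acids; treat these as approximate.
MASS = {
    "A": 89.0932, "R": 174.2010, "N": 132.1179, "D": 133.1027, "C": 121.1582,
    "E": 147.1293, "Q": 146.1445, "G": 75.0666, "H": 155.1546, "I": 131.1729,
    "L": 131.1729, "K": 146.1876, "M": 149.2113, "F": 165.1887, "P": 115.1305,
    "S": 105.0926, "T": 119.1192, "W": 204.2252, "Y": 181.1883, "V": 117.1463,
}
WATER = 18.0153  # one water is lost for each peptide bond formed


def residues(protein):
    """Keep only the 20 standard amino acids (drops '*', spaces, line breaks)."""
    return [aa for aa in protein.upper() if aa in MASS]


def composition(protein):
    """Count how many times each amino acid occurs."""
    return Counter(residues(protein))


def molecular_weight(protein):
    """Approximate average molecular weight of the polypeptide in daltons."""
    seq = residues(protein)
    if not seq:
        return 0.0
    return sum(MASS[aa] for aa in seq) - (len(seq) - 1) * WATER


def extinction_coefficient(protein):
    """Estimated molar extinction coefficient at 280 nm (per M per cm), Cys reduced."""
    counts = composition(protein)
    return 1490 * counts["Y"] + 5500 * counts["W"]
```

As a sanity check, molecular_weight("GG") gives about 132.12, which matches the known mass of glycylglycine. The isoelectric point and the instability index need more elaborate tables, so leave those to the website.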

8. Return to the Proteomics tools page and click on “TMPred" (found in the Topology Prediction section). Paste in your sequence as before and "Submit." The results will come back with a hydropathy plot, plus evaluation of that plot. Print this page out. Is your protein expected to be transmembrane or cytoplasmic?
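
To see where a hydropathy plot comes from, here is a small Python sketch of a sliding-window calculation. Be aware that this is NOT the TMpred method; it is the simpler Kyte-Doolittle approach, swapped in for illustration. The window of 19 residues and the cutoff of 1.6 are the values commonly used with that scale to flag possible membrane-spanning segments; the function names are my own.

```python
# Kyte-Doolittle hydropathy scale: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every window of the given length along the protein."""
    # Assumption: unrecognized characters are scored as 0.0 (neutral).
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows hydrophobic enough to span a membrane."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]
```

If candidate_tm_windows returns an empty list, this crude test found no stretch hydrophobic enough to cross a membrane. Plotting hydropathy_profile against position gives you a curve to compare with the TMpred plot; the two methods will not agree exactly.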

9. Go back to "Proteomics Tools" and click on "Phyre" (in the “threading” area within the "tertiary structure" area). Type your e-mail address to tell the program where to send your results, give the sequence a name, paste your protein sequence into the box and click "Quick Phyre Search." You will get an e-mail telling you where to retrieve your results. (While you're waiting, the computer will be doing a primary, secondary and tertiary structure alignment with all known proteins!)

Your results will be too cumbersome to print out, but you can look through the names of the matches to see how this search compares to the others. Discuss. Also, you can click on the proteins to "see" how they might fold. The "normal" size lines indicate well-matched structure alignments, thick lines indicate that your sequence is missing some sequence found in the structure that it's being compared to, and thin lines indicate that your sequence has extra sequence not found in the structure that it's being compared to.

You should now be able to tell me what your DNA encodes and what the protein product probably does. You may have to look up data for homologous proteins in textbooks or in PubMed to answer the "what does it do" parts.