Part IV. Using Bioinformatics Databases to Explore Sickle Cell Anemia

Bioinformatics is the use of computers to make sense of biological data. In particular, the Human Genome Project has generated huge amounts of DNA sequence from many different organisms, not just humans. These sequences are stored in public databases, and there are so many of them that computers are needed to store, analyze, and work with the data. One major gateway to these public databases is NCBI, the National Center for Biotechnology Information, which is part of the National Library of Medicine at the National Institutes of Health. These databases are not just for scientists; students can explore them and learn from them too. Let’s mine these databases for more information about sickle cell anemia.


Go to the NCBI sickle cell anemia “Genes and Disease” page (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=gnd.section.98&ref=toc) and click on “Genome view” to see where on the chromosome this gene is located. Would you describe its location as near the middle or near the end of the chromosome?
 
Go back to the sickle cell anemia “Genes and Disease” page. Let’s look more closely at the HBB gene (the symbol for the beta globin gene of hemoglobin). Click on “Entrez Gene” to reach the Entrez Gene entry for HBB, and on that page scroll down to “genomic regions, transcripts, products”; this section tells you a lot about the gene. For example, it shows that the DNA for this gene lies on the chromosome between base #5,203,272 and ____________________. How many nucleotides are in this stretch of DNA? __________ Within this stretch of DNA, how many coding regions are there? ________ The segments of a gene that are kept in the mature mRNA are called exons; they carry the code that is translated into protein (the first and last exons also contain some untranslated sequence at their outer ends). The noncoding segments between the exons are called introns. How many introns are shown for this gene? ____
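
Counting the nucleotides in a stretch of DNA is an inclusive count: both the first and the last base belong to the gene, so the length is end - start + 1, not end - start. The short Python sketch below parses a range written the way NCBI writes it in the DNA FASTA header further down this page (c5204877-5203272, where the leading "c" means the sequence was taken from the complementary strand) and returns the strand and the length. The function name parse_range is made up for this example; it is not part of any NCBI tool.

```python
import re


def parse_range(location):
    """Parse an NCBI-style range such as 'c5204877-5203272'.

    Returns (strand, low, high, length). A leading 'c' means the sequence
    was read from the complementary (minus) strand, so the first
    coordinate given is the larger one.
    """
    match = re.fullmatch(r"(c?)(\d+)-(\d+)", location)
    if match is None:
        raise ValueError(f"unrecognized range: {location!r}")
    strand = "-" if match.group(1) == "c" else "+"
    a, b = int(match.group(2)), int(match.group(3))
    low, high = min(a, b), max(a, b)
    # Inclusive count: both end bases are part of the region.
    length = high - low + 1
    return strand, low, high, length


if __name__ == "__main__":
    strand, low, high, length = parse_range("c5204877-5203272")
    print(f"strand {strand}, bases {low:,} to {high:,}: {length:,} nt")
```

Use it to check the number you wrote in the blank above.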

Click on the black link NC_000011.8 (FASTA) to get the entire nucleotide sequence of the gene; this contains ALL the nucleotides of the gene, coding and noncoding. Next, go back one page and click on the blue link NM_000518 (FASTA) to get the mRNA sequence for this gene. What do you notice about this sequence? Is it the same length, longer, or shorter? What has happened to account for this difference? _________________________________________________
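
To model what happens between the gene and its mRNA, here is a minimal Python sketch of RNA splicing: the introns are cut out and the exons are joined end to end. It assumes you already know the exon coordinates (1-based and inclusive, counted from the first base of the gene sequence). The toy gene and its coordinates are invented for illustration and are not the real HBB values.

```python
def splice(gene, exons):
    """Join the exons of a gene sequence, dropping the introns between them.

    `exons` is a list of (start, end) pairs, 1-based and inclusive,
    listed in order along the gene.
    """
    return "".join(gene[start - 1:end] for start, end in exons)


# Hypothetical toy gene (not real HBB data): exons in upper case, introns in
# lower case. Like most real introns, these begin with GT and end with AG.
toy_gene = "ATGGCC" + "gtaaag" + "AAAGGG" + "gttttcag" + "CCCTAA"
toy_exons = [(1, 6), (13, 18), (27, 32)]

mrna_like = splice(toy_gene, toy_exons)
print("gene:", len(toy_gene), "nt")
print("spliced:", mrna_like, len(mrna_like), "nt")
print("removed as introns:", len(toy_gene) - len(mrna_like), "nt")
```

Lower-case letters are used only to make the introns easy to spot; the function itself does not depend on letter case.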

Now go back one page and click on the red link CCDS7753.1 to get to the Consensus Coding Sequence (CCDS) page. Scroll down to the nucleotide sequence, which is _____ nts (nucleotides) long; below it is the amino acid sequence, which is ______ amino acids long. Look at the seventh amino acid in the sequence (mouse over the symbol to get the amino acid name): it is ____________. You now know that in sickle cell anemia, this amino acid is replaced with ______ when the codon is changed from ____ to ____. The chromosomal locations chart above the CCDS sequence data shows the exact positions of the three exons on the DNA.
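
To see why a one-letter change in the DNA changes the protein, you can translate the codons yourself. The Python sketch below uses the standard genetic code packed into a 64-letter string, in the layout NCBI uses for its standard code table (bases ordered T, C, A, G), and translates the first nine codons of the HBB coding sequence, copied from the mRNA sequence given later on this page starting at its ATG start codon. The helper names translate and substitute are made up for this example; use them to check the codon change you wrote in the blanks above.

```python
BASES = "TCAG"
# Standard genetic code: one amino acid letter per codon, in the order
# TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(cds, position, new_base):
    """Return a copy of `cds` with the base at 1-based `position` replaced."""
    return cds[:position - 1] + new_base + cds[position:]


# First nine codons of the HBB coding sequence, starting at the ATG.
hbb_start = "ATGGTGCATCTGACTCCTGAGGAGAAG"
codon7 = hbb_start[18:21]
print("codon 7:", codon7, "->", CODON_TABLE[codon7])
print("normal:", translate(hbb_start))

# Sickle cell allele: the middle base of codon 7 (nucleotide 20 of the
# coding sequence) changes from A to T.
sickle = substitute(hbb_start, 20, "T")
print("sickle:", translate(sickle))
```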

 More Bioinformatics Practice:

Go to http://www.ebi.ac.uk/, pull down the “Tools” menu, and select “sequence analysis – align”; the EMBOSS Pairwise Alignment Algorithms page opens. Copy and paste the DNA sequence below into sequence 1, and then copy and paste the mRNA sequence (also given below, after the DNA sequence) into sequence 2. Click the RUN button. In the results, you should see three stretches of perfect alignment separated by big gaps with no alignment. What does this illustrate?
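
The EMBOSS pairwise tools find an optimal alignment by dynamic programming; Needle, for example, performs a global alignment in the Needleman-Wunsch style with separate gap-opening and gap-extension penalties. The Python sketch below is a simplified Needleman-Wunsch alignment with a single linear gap penalty (match +1, mismatch -1, gap -1, values chosen only for illustration, not EMBOSS's defaults), run on a made-up "gene" with three exons and two introns.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # a[i-1] over b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up gene: exon ATGGCC, intron TTTTTT, exon AAAGGG, intron TTTTTTTT, exon CCCAAA.
toy_gene = "ATGGCC" + "TTTTTT" + "AAAGGG" + "TTTTTTTT" + "CCCAAA"
toy_mrna = "ATGGCC" + "AAAGGG" + "CCCAAA"

top, bottom = needleman_wunsch(toy_gene, toy_mrna)
bars = "".join("|" if x == y else " " for x, y in zip(top, bottom))
print(top)
print(bars)
print(bottom)
```

Running it prints the gene on top and the mRNA-like sequence below, with "|" under each matching base: three blocks of six matches separated by runs of six and eight gap characters, one block per exon and one gap per intron. On real sequences a plain linear gap penalty can scatter gaps into many small pieces; penalties that charge more for opening a gap than for extending it, as EMBOSS uses, favor fewer, longer gaps.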

 

Complete nucleotide sequence of the HBB gene (DNA) in FASTA format:
>ref|NC_000011.8|NC_000011:c5204877-5203272 Homo sapiens chromosome 11, reference assembly, complete sequence (1606 nt)
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAGAAACTGGGCATGTGGAGACAGAGAAG
ACTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCTTAGGCTGCTGG
TGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGG
CAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC
AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACT
TCAGGGTGAGTCTATGGGACGCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAG
GAAGGGGATAAGTAACAGGGTACAGTTTAGAATGGGAAACAGACGAATGATTGCATCAGTGTGGAAGTCT
CAGGATCGTTTTAGTTTCTTTTATTTGCTGTTCATAACAATTGTTTTCTTTTGTTTAATTCTTGCTTTCT
TTTTTTTTCTTCTCCGCAATTTTTACTATTATACTTAATGCCTTAACATTGTGTATAACAAAAGGAAATA
TCTCTGAGATACATTAAGTAACTTAAAAAAAAACTTTACACAGTCTGCCTAGTACATTACTATTTGGAAT
ATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTTAATTGATACATAAT
CATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACACATATTGACCAAATCAGGGTAA
TTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAATATACTTTTTTGTTTATCTTATTTCTAATA
CTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTTTGCACCATTCTAAAG
AATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATCTCTGCATATAAATATTTCTGCATATAAAT
TGTAACTGATGTAAGAGGTTTCATATTGCTAATAGCAGCTACAATCCAGCTACCATTCTGCTTTTATTTT
ATGGTTGGGATAAGGCTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGCTAATCATGTTCATACCTCTT
ATCTTCCTCCCACAGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

 

HBB mRNA in FASTA format: (626 nt)
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

 

Protein sequence in FASTA format:  (147 aa)
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
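
The three records above are in FASTA format: a header line starting with ">", followed by the sequence wrapped over several lines. A minimal Python reader is sketched below; it assumes well-formed input and keeps the whole header line as the record name (read_fasta is a name made up for this example). Applied to the protein record, it counts the amino acids, and from that count you can predict the length of the coding sequence: three nucleotides per amino acid, plus three more for the stop codon, which is not translated.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


protein_fasta = """>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
"""

for header, seq in read_fasta(protein_fasta):
    aa = len(seq)
    cds_nt = 3 * (aa + 1)  # one codon per amino acid, plus the stop codon
    print(f"{header}\n  {aa} amino acids -> coding sequence of {cds_nt} nt")
```

The same reader works on the DNA and mRNA records; compare the lengths it reports with the numbers you found on the NCBI pages.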


Professional Development