The NIH (National Institutes of Health) sponsored the HGP (Human Genome Project), which began in 1990 with a 15-year plan. The plan was step by step: cut the genome into sections, chromosome by chromosome, and sequence each section.
For each chromosome:
A former NIH employee, Craig Venter, wasn’t too fond of the methodology being used for the HGP. He went out and formed his own company, Celera, funded by pharmaceutical firms, to sequence the genome using a different method called shotgun sequencing. Venter started by sequencing the genomes of bacteria and fruit flies, and then moved on to humans. Celera started sequencing the human genome in late 1999 and already had a draft sequence by June 2000.
Celera had around 100 sequencers running 24/7 to sequence the human genome. Celera built a factory for research supplies near the sequencing building just to get things done faster.
The HGP took around 10 years; Celera took about 2. Both came out with a draft sequence by June 2000.
Well, is it accurate? The NIH HGP sequenced the genome around 12 times over, and Celera did so 35.6 times.
How complete is the sequence? Proofreading was still needed, and gap filling had not been completed.
Whole Genome Sequencing, also known as Shotgun Sequencing, essentially cuts the genomic DNA with various restriction enzymes and then uses software to assemble each chromosome. The fragments are aligned wherever they share identical overlapping DNA sequences, creating assemblies of contiguous fragments, aka “contigs”.
The alignment of contigs is then used to assemble a sequence.
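The overlap-and-merge idea can be sketched in a few lines of Python. This is only a toy illustration, not the software Celera or the HGP actually used: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example fragments are all made up for these notes, and it assumes error-free reads from a single strand with no repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and
    returns whatever contigs are left.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical fragments that overlap end to end.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "ACCTTAG"]))
```

A fragment that shares no overlap with the others is simply left as its own contig, which is roughly what a gap in a draft sequence looks like.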
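What is an ORF? An ORF (open reading frame) is a continuous run of triplets (codons), each encoding an amino acid, that begins with a start codon and ends at a stop codon.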
Look for ORFs to find potential coding sequences.
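A minimal sketch of an ORF scan in Python is shown here. It makes several simplifying assumptions that are not from the notes: it checks only the three forward reading frames (not the reverse strand), it treats ATG as the only start codon, and it uses the standard stop codons TAA, TAG, and TGA. The function name `find_orfs` and the `min_codons` cutoff are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna: str, min_codons: int = 2) -> list[str]:
    """Return ORFs (start codon through stop codon) in the 3 forward frames.

    `min_codons` counts every codon in the ORF, including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(dna[start:i + 3])
                start = None  # close this ORF and look for the next start
    return orfs


# Hypothetical sequence with one ORF in the third reading frame.
print(find_orfs("CCATGGCGTACCTTTAGGA"))
```

Real gene finders also scan the reverse complement and apply a much larger minimum length, since short ORFs turn up by chance all the time.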