What is DNA sequencing?
DNA sequencing, the process of determining the exact order of the 3
billion chemical building blocks (called bases and abbreviated A, T, C,
and G) that make up the DNA of the 24 different human chromosomes, was the
greatest technical challenge in the Human Genome Project. Achieving this
goal has helped reveal the estimated 20,000-25,000 human genes within our DNA as
well as the regions controlling them. The resulting DNA sequence maps are
being used by 21st Century scientists to explore human biology and other
Meeting Human Genome Project sequencing goals by 2003 required continual
improvements in sequencing speed, reliability, and costs. Previously,
standard methods were based on separating DNA fragments by gel electrophoresis,
which was extremely labor intensive and expensive. Total sequencing output
in the community was about 200 million base pairs for 1998. In January 2003, the DOE Joint
Genome Institute alone sequenced 1.5 billion bases for the month.
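The scale of that improvement can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the annualized comparison assumes, purely for illustration, that the January 2003 rate held for a full year, which the text does not state.

```python
# Figures quoted in the text above.
community_output_1998 = 200_000_000   # base pairs, whole community, all of 1998
jgi_output_jan_2003 = 1_500_000_000   # bases, DOE Joint Genome Institute, one month


def fold_increase(new_output, old_output):
    """Return how many times larger new_output is than old_output."""
    return new_output / old_output


# One month at one institute versus a full year for the whole community.
monthly_vs_yearly = fold_increase(jgi_output_jan_2003, community_output_1998)

# Assumption (not from the text): the January 2003 rate holds for 12 months.
annualized = fold_increase(12 * jgi_output_jan_2003, community_output_1998)

print(monthly_vs_yearly, annualized)  # 7.5 90.0
```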
Gel-based sequencers use multiple tiny (capillary) tubes to run standard
electrophoretic separations. These separations are much faster because
the tubes dissipate heat well and allow the use of much higher electric
fields to complete sequencing in shorter times.
See a figure
depicting this technology.
Whose genome was sequenced in the public (HGP) and private projects?
The human genome reference sequences do not represent any one person’s
genome. Rather, they serve as a starting point for broad comparisons across
humanity. The knowledge obtained from the sequences applies to everyone because all humans
share the same basic set of genes and genomic regulatory regions that control
the development and maintenance of their biological structures and processes.
In the international public-sector Human Genome Project (HGP), researchers
collected blood (female) or sperm (male) samples from a large number of donors.
Only a few samples were processed as DNA resources. Thus donors' identities were protected so neither they nor scientists could know
whose DNA was sequenced. DNA clones from many libraries were used
in the overall project.
Technically, it is much easier to prepare DNA cleanly from sperm than from
other cell types because of the much higher ratio of DNA to protein in sperm
and the much smaller volume in which purifications can be done. Sperm samples contain all the chromosomes necessary for study, with equal numbers of cells carrying either
the X (female) or the Y (male) sex chromosome. However, HGP scientists also used
white cells from female donors' blood to include samples originating from women.
In the Celera Genomics private-sector project, DNA from a few different genomes
was mixed and processed for sequencing. DNA for these
studies came from anonymous donors of European, African, American (North, Central,
South), and Asian ancestry. The lead scientist of Celera Genomics at that time,
Craig Venter, has since acknowledged that his DNA was among those sequenced.
Many polymorphisms—small regions of DNA that vary among individuals—also were identified during the HGP, mostly single nucleotide polymorphisms
(SNPs). Most SNPs have no physiological effect, although a minority contribute
to the beneficial diversity of humanity. A much smaller minority
of polymorphisms affect an individual’s susceptibility to disease and
response to medical treatments.
Although the HGP has been completed, SNP studies continue in the International
HapMap Project, whose goal is to identify patterns of SNP groups (called
haplotypes, or “haps”). The DNA samples for the HapMap Project came from
270 individuals, including Yoruba people in Ibadan, Nigeria; Japanese in Tokyo;
Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme
Humain (CEPH) resource.
[Answer supplied by Dr. Marvin Stodolsky, U.S. DOE Office of Biological and
Environmental Research, Office of Science]
Who sequenced the human genome?
Human Genome Project research was funded at many laboratories across
the U.S. by the Department of Energy (DOE), the National Institutes of Health
(NIH), or both. A list of the major U.S. Human Genome Project research sites
can be found here.
Other researchers at numerous colleges, universities, and laboratories
throughout the United States also have received DOE and NIH funding for
human genome research. At any given time, the DOE Human Genome Project
has funded about 100 principal investigators. For DOE-funded
projects, see Research.
To see a list of NIH-funded projects, visit the agency's grants database.
In addition, many large and small private U.S. companies are conducting
genome research. For more on the genomics research partnership between
the public and private sectors, see the Human
Genome Project and the Private Sector Fact Sheet. At least 18 other
countries have participated in the Human Genome Project. See the
How is DNA sequencing done?
Download a PDF illustration courtesy of the U.S. Department
of Energy's Joint Genome Institute.
- Chromosomes, which range in size from 50 million to 250 million bases, must first be
broken into much shorter pieces (subcloning step).
- Each short piece is used as a template to generate a set of fragments that
differ in length from each other by a single base that will be identified
in a later step (template preparation and sequencing reaction steps).
See a figure depicting the sequencing reaction.
- The fragments in a set are separated by gel electrophoresis (separation step).
New fluorescent dyes allow all four sets of fragments (those ending in A, T, C, or G)
to be separated in a single lane on the gel.
See an example
of an electropherogram using fluorescent dyes. Click on the image for a caption.
- The final base at the end of each fragment is identified (base-calling step). This
process recreates the original sequence of As, Ts, Cs, and Gs for each short piece generated in the
first step.
Automated sequencers analyze the resulting electropherograms,
and the output is a four-color chromatogram showing peaks that represent
each of the four DNA bases.
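The logic of the separation and base-calling steps can be illustrated with a small sketch. It is a simplified model, not the software used by any actual sequencer: each fragment is represented as a hypothetical (length, terminal base) pair, and the sketch ignores signal noise and the fact that the synthesized strand is complementary to the template. Sorting fragments by length, as electrophoresis does physically, and reading off each terminal base recovers the sequence.

```python
import random


def make_fragments(strand):
    """Simulate the fragment set for a strand (hypothetical helper).

    One fragment ends at every position, so fragments differ in length by a
    single base. Each is recorded as (length, base at its end).
    """
    return [(i + 1, base) for i, base in enumerate(strand)]


def read_sequence(fragments):
    """Order fragments by length, then read the terminal base of each."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return "".join(base for _, base in ordered)


strand = "GATTACA"
fragments = make_fragments(strand)
random.Random(0).shuffle(fragments)   # fragments start out mixed together
print(read_sequence(fragments))       # GATTACA
```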
After the bases are "read," computers are used to assemble the short sequences
(in blocks of about 500 bases each, called the read length) into long continuous
stretches that are analyzed for errors, gene-coding regions, and other characteristics.
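As a rough illustration of the assembly idea, the sketch below merges short reads wherever the end of one matches the start of another. It is a toy greedy approach on made-up reads, assuming error-free data with no repeats; the function names are hypothetical, and the assembly software used in the Human Genome Project was far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps remain; the leftover pieces stay separate
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


reads = ["GTACGTTA", "GTTAGC", "ATGCGTAC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Reads that share no sufficiently long overlap are returned as separate pieces, which mirrors the gaps that remain in a draft sequence.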
To read about all the trouble researchers go through to "finish" this raw sequence from
automated sequencers, click here
(and scroll to bottom that begins "Here are our definitions of . . . ").
Finished sequences are submitted to major public sequence databases,
such as GenBank. Human Genome Project sequence
data are thus freely available to anyone around the world.
In May 2006, Human Genome Project (HGP) researchers announced the completion
of the DNA sequence for the last of the 24 human chromosomes. How does
this differ from the finished human genome announced by HGP researchers in 2003?
The DNA sequences announced in 2003 were only rough drafts for each
human chromosome. While this draft already has advanced medical research,
more detail was needed. The draft genomic sequences can be compared broadly
to a cross-country road excavated by a bulldozer that leaves behind
many gaps across difficult terrain that will require bridges and other
So, too, with charting the landscape of the human genome. Researchers
have now filled in the gaps and provided far more detail for each chromosome.
Much of this was accomplished by comparing particular DNA sequences across
populations in genomic areas that may have contained anomalies in the
initial samples. For example, some DNA segments have proven unstable during
the process of copying them (cloning) for use in sequencing machines.
(See an example.) Correcting minor errors (estimated at 1 error in every
10,000 DNA subunits) and cataloging of mutations will continue for some
time to come.
The entire collection of human chromosome DNA sequences is freely available
to the worldwide research community.
For more details, see the Nature
What is the difference between draft sequence and finished sequence?
In generating the draft sequence (released in June 2000), scientists
determined the order of base pairs in each chromosomal area at least 4
to 5 times (4x to 5x) to ensure data accuracy and to help with reassembling
DNA fragments in their original order. This repeated sequencing is known
as genome "depth of coverage." Draft sequence data are mostly
in the form of fragments about 10,000 base pairs long whose approximate chromosomal
locations are known.
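Depth of coverage can be estimated as the total number of bases sequenced divided by the length of the region. The sketch below is illustrative only: it assumes reads of about 500 bases and a genome of about 3 billion bases (both figures appear elsewhere on this page), treats every read as the same length, and uses hypothetical function names.

```python
import math


def depth_of_coverage(num_reads, read_length, region_length):
    """Average number of times each base in the region is sequenced."""
    return num_reads * read_length / region_length


def reads_needed(target_coverage, read_length, region_length):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * region_length / read_length)


GENOME_LENGTH = 3_000_000_000  # approximate number of bases in the human genome
READ_LENGTH = 500              # approximate read length

print(reads_needed(5, READ_LENGTH, GENOME_LENGTH))                # 30000000
print(depth_of_coverage(30_000_000, READ_LENGTH, GENOME_LENGTH))  # 5.0
```

This is an average: some regions will be covered more often and others less, which is one reason a draft still contains gaps.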
To generate a high-quality reference sequence, completed in April
2003, additional sequencing was done to close gaps, reduce ambiguities,
and allow for only a single error every 10,000 bases, the agreed-upon
standard for the HGP. Investigators believe a high-quality sequence
is critical for recognizing gene-regulatory components important in understanding human biology and disorders such as heart disease,
cancer, and diabetes. The finished version provides an estimated 8x to
9x coverage of each chromosome.
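To put the accuracy standard in perspective, the short calculation below applies the rate of one error per 10,000 bases to a genome of about 3 billion bases and to a single read of about 500 bases. It is a back-of-the-envelope sketch that assumes errors are spread evenly, which real data need not satisfy; the function name is hypothetical.

```python
def expected_errors(num_bases, bases_per_error=10_000):
    """Expected number of errors at a rate of 1 error per bases_per_error."""
    return num_bases / bases_per_error


print(expected_errors(3_000_000_000))  # 300000.0 across the whole genome
print(expected_errors(500))            # 0.05 in a single 500-base stretch
```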
What genomes have been sequenced completely?
The small genomes of several viruses and bacteria and the much
larger genomes of three higher organisms have been completely sequenced.
The three higher organisms are bakers' or brewers' yeast (Saccharomyces cerevisiae), the
roundworm (Caenorhabditis elegans), and the fruit fly (Drosophila
melanogaster). In October 2001, the draft sequence of the pufferfish
Fugu rubripes, the first vertebrate after the human, was completed;
and scientists finished the first genetic sequence of a plant, that of the
weed Arabidopsis thaliana, in December 2000. Many more genome sequences have
been completed since then.
For information on published and unpublished genomes, see Genomes Online
What nonhuman genome sequencing projects are supported by the U.S. Department
of Energy?
A list of microbial genome sequencing projects supported by the U.S. Department
of Energy Microbial Genome Program is available
What happens now that the human genome sequence is completed?
The working-draft DNA sequence and the more polished 2003 version represent
an enormous achievement, akin in scientific importance, some say, to developing
the periodic table of elements. And, as in most major scientific advances,
much work remains to realize the full potential of the accomplishment.
Early explorations of the human genome, now joined by projects on
the genomes of several other organisms, are generating data whose
volume and complex analyses are unprecedented in biology. Genomic-scale
technologies will be needed to study and compare entire genomes, sets
of expressed RNAs or proteins, gene families from a large number of species,
variation among individuals, and the classes of gene regulatory elements.
Deriving meaningful knowledge from DNA sequences will define biological
research through the coming decades and require the expertise and creativity
of teams of biologists, chemists, engineers, and computational scientists,
among others. A sampling follows of some research challenges in genetics--what
we still don't know, even with the full human DNA sequence in hand.
- Gene number, exact locations, and functions
- Gene regulation
- DNA sequence organization
- Chromosomal structure and organization
- Noncoding DNA types, amount, distribution, information content, and functions
- Coordination of gene expression, protein synthesis, and post-translational
- Interaction of proteins in complex molecular machines
- Predicted vs experimentally determined gene function
- Evolutionary conservation among organisms
- Protein conservation (structure and function)
- Proteomes (total protein content and function) in organisms
- Correlation of SNPs (single-base DNA variations among individuals) with
health and disease
- Disease-susceptibility prediction based on gene sequence variation
- Genes involved in complex traits and multigene diseases
- Complex systems biology, including microbial consortia useful for environmental
- Developmental genetics, genomics