Although the completion of the Human Genome Project was celebrated in April
2003 and sequencing of the human chromosomes is essentially "finished,"
the exact number of genes encoded by the genome is still unknown. October 2004
findings from The International Human Genome Sequencing Consortium, led in the
United States by the National Human Genome Research Institute (NHGRI) and the
Department of Energy (DOE), reduce the estimated number of human protein-coding
genes from 35,000 to only 20,000-25,000, a surprisingly low number for our species
(7). Consortium researchers have confirmed the existence
of 19,599 protein-coding genes in the human genome and identified another 2,188
DNA segments that are predicted to be protein-coding genes.
In 2003, estimates from gene-prediction programs suggested there might be 24,500
or fewer protein-coding genes (1). The Ensembl genome-annotation
system estimates them at 23,299.
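As a quick arithmetic check on the figures above, the confirmed and predicted counts can be added together and compared with the revised 20,000-25,000 range. The short Python sketch below does only that; the variable names are illustrative and do not come from the Consortium's paper.

```python
# Figures quoted in the text above.
confirmed = 19_599         # protein-coding genes confirmed by the Consortium
predicted = 2_188          # further DNA segments predicted to be protein-coding
ensembl_estimate = 23_299  # Ensembl genome-annotation system estimate

total = confirmed + predicted

# Revised range reported in the October 2004 findings.
low, high = 20_000, 25_000

def in_range(count):
    """Return True if a gene count falls inside the revised range."""
    return low <= count <= high

print(f"confirmed + predicted = {total:,}")
print(f"Consortium total within range: {in_range(total)}")
print(f"Ensembl estimate within range: {in_range(ensembl_estimate)}")
```

Both the combined Consortium figure and the Ensembl estimate fall inside the revised range, even though they differ from each other by more than a thousand genes.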
When analysis of the draft human genome sequence was published by the
International Human Genome Sequencing Consortium on February 15, 2001,
the paper estimated only about 30,000 to 40,000 protein-coding genes,
much lower than previous estimates of about 100,000. This lower estimate
came as a shock to many scientists because counting genes was viewed as
a way of quantifying genetic complexity. With about 30,000, the human
gene count would be only about one-half greater than that of the simple roundworm
C. elegans, which has about 20,000 genes (2).
Studies since the publication of the draft genome sequence have generated
widely different estimates. An analysis by scientists at Ohio State University
suggested between 65,000 and 75,000 human genes (3),
and another study published in Cell in August 2001 predicted
a total of 42,000 (4).
Although the exact number of human genes is still uncertain, a winner of GeneSweep
was announced in May 2003. GeneSweep was an informal gene-count betting
pool that began at the 2000 Cold Spring Harbor Laboratory Genome Meeting.
Bets ranged from around 26,000 to more than 150,000 genes. Since most
gene-prediction programs were estimating the number of protein-coding
genes at fewer than 30,000, GeneSweep officials decided to declare the
contestant with the lowest bet (25,947 by Lee Rowen of the Institute for
Systems Biology in Seattle) the winner (1).
It could be years before a truly reliable gene count can be established.
The reason for so much uncertainty is that predictions are derived from
different computational methods and gene-finding programs. Some programs
detect genes by looking for distinct patterns that define where a gene
begins and ends ("ab initio" gene finding). Other programs look
for genes by comparing segments of sequence with those of known genes
and proteins (comparative gene finding). While ab initio gene finding
tends to overestimate gene numbers by counting any segment that looks
like a gene, comparative gene finding tends to underestimate since it
is limited to recognizing only those genes similar to what scientists
have seen before. Defining a gene is itself problematic: small genes can
be difficult to detect, one gene can code for several protein products,
some genes code only for RNA, two genes can overlap, and there are many
other complications (5).
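The difference between the two approaches can be illustrated with a toy Python sketch. It is a deliberate simplification and not how any production gene finder works: the "ab initio" function only looks for a start codon followed by an in-frame stop codon on one strand, and the "comparative" function uses exact string matching as a stand-in for similarity searching. The function names, the example sequence, and the `min_codons` threshold are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def ab_initio_orfs(seq, min_codons=3):
    """Toy ab initio finder: report every ATG...stop open reading frame
    on the forward strand. `min_codons` counts codons from the start
    codon up to, but not including, the stop codon."""
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk forward codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    found.append(seq[start:pos + 3])
                break
    return found

def comparative_hits(seq, known_genes):
    """Toy comparative finder: report only previously known genes
    that occur exactly in the sequence."""
    return [gene for gene in known_genes if gene in seq]

# Hypothetical example: one segment already in the reference set,
# one gene-like segment that has never been seen before.
known_gene = "ATGAAACCCTAA"
novel_segment = "ATGGGGTTTTGA"
seq = "CC" + known_gene + "GG" + novel_segment + "TT"

ab_initio = ab_initio_orfs(seq)
comparative = comparative_hits(seq, [known_gene])

print("ab initio candidates:", len(ab_initio))
print("comparative hits:", len(comparative))
```

In this toy case the pattern-based scan flags both segments, with no way of telling whether the second is a real gene, while the comparison-based scan recovers only the segment already in its reference set. Real programs also have to handle introns, both DNA strands, and inexact similarity, which this sketch ignores.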
Even with improved genome analysis, computation alone is simply not enough
to generate an accurate gene number. Clearly, gene predictions will have
to be verified by labor-intensive work in the laboratory before the scientific
community can reach any real consensus (6).
NCBI Human Genome
- Release notes for the most current build of the human genome from the National Center for Biotechnology Information (NCBI) used in its genome browser called Map Viewer.
UCSC Human Genome Browser Gateway
- Genome browser maintained by the Genome Bioinformatics
Group of the University of California, Santa Cruz. Human genome data
based on the most recent build available from NCBI.
Ensembl Human Genome
- The most current human genome release available from the European
Bioinformatics Institute's human genome browser. The Ensembl release
is derived from the NCBI human genome build.
Focus: The Human Genome
- Nature Publishing Group maintains this
website that links to scientific articles reporting the finished genome sequence
for each human chromosome.
Updated Summaries of Public Draft of Human Genome Sequence
- International Human Genome Sequencing Consortium. 2004. "Finishing the
Euchromatic Sequence of the Human Genome," Nature 431,
931-945. Available online.
- Schmutz, J., et al. 2004. "Human Genome: Quality Assessment of the Human Genome Sequence,"
Nature 429, 365-368. Available online.
Summary of Public Draft of Human Genome Sequence
Summary of Celera's Draft of Human Genome Sequence
Lander, E., et al. 2001. "Initial Sequencing and Analysis of the
Human Genome," Nature 409, 860-921. Available online.
Venter, J. Craig, et al. 2001. "The Sequence of the Human Genome,"
"The Nature of the Number," 2000. Nature Genetics
25, 127-28. (Editorial).
Aparicio, S. 2000. "How to Count...Human Genes," Nature
Genetics 25, 129-30.
Ewing, B. and P. Green. 2000. "Analysis of Expressed Sequence Tags Indicates
35,000 Human Genes," Nature Genetics 25,
Crollius, H. R., et al. 2000. "Estimate of Human Gene Number Provided
by Genome-Wide Analysis Using Tetraodon nigroviridis DNA Sequence,"
Nature Genetics 25, 235-38.
Liang, F., et al. 2000. "Gene Index Analysis of the Human Genome Estimates
Approximately 120,000 Genes," Nature Genetics 25,
- Pennisi, E. 2003. "A Low Number Wins the
GeneSweep Pool," Science 300, 1484.
- Claverie, J. 2001. "Gene Number. What
if There are Only 30,000 Human Genes?" Science
- Briggs, H. 2001. "Dispute Over Number of Human
Genes," BBC News Online.
- Wright, F., et al. 2001. "A Draft Annotation
and Overview of the Human Genome," Genome Biology 2,
- Pennisi, E. 2003. "Gene Counters
Struggle to Get the Right Answer," Science 301,
- Hollon, T. 2001. "Human Genes: How
Many?" The Scientist 15, 1.
- Stein, L. D. 2004. "Human Genome: End
of the Beginning," Nature 431, 915-916.