Shotgun sequencing is a method of DNA sequencing in which a long stretch of DNA is physically broken into small fragments (approximately 2,000 base pairs), which are then cloned, sequenced, and assembled by computer analysis. Its application to whole genomes was developed and made famous by Craig Venter, later of Celera Genomics, while he was working at The Institute for Genomic Research (TIGR) in the mid-1990s.
Venter founded Celera in 1998 with the goal of sequencing the human genome within three years. This put the company in direct competition with the Human Genome Project, an already-operating consortium of universities that was sequencing the human genome using an older strategy called map-based, or BAC-to-BAC, sequencing. That method first breaks the genome into pieces of about 150,000 base pairs, each cloned into a bacterial artificial chromosome (BAC). The BAC clones are then arranged in their order along the genome, and each BAC is sequenced in detail.
Whole-genome shotgun sequencing bypasses the creation and mapping of BACs and begins directly with DNA sequencing. The process starts with a sample of high-molecular-weight DNA from the organism of interest, which is physically broken into small pieces either by forcing it through a narrow-gauge syringe needle or by sonication, which breaks the DNA with sound waves. Because shearing is random and the sample contains many copies of the genome, each copy breaks at different points, so fragments from different copies overlap one another. Shearing does not produce only the 2,000 base-pair fragments needed for sequencing; fragments of the desired size must be purified from the mixture.
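The shearing and size-selection steps can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the "genome" is a random 50,000-base string, break points are uniformly random, and the function names (shear, size_select), the 20 genome copies, and the 1,500 to 2,500 base window are all hypothetical choices made for illustration, not laboratory parameters.

```python
import random

def shear(genome, n_copies, mean_len, rng):
    """Break n_copies of the genome at random positions and pool the fragments."""
    fragments = []
    n_breaks = max(1, len(genome) // mean_len)  # gives pieces of about mean_len
    for _ in range(n_copies):
        # Each copy gets its own random break points, so fragments from
        # different copies overlap one another.
        cuts = sorted(rng.sample(range(1, len(genome)), n_breaks))
        bounds = [0] + cuts + [len(genome)]
        fragments.extend(genome[a:b] for a, b in zip(bounds, bounds[1:]))
    return fragments

def size_select(fragments, lo, hi):
    """Keep only fragments whose length falls in the desired window."""
    return [f for f in fragments if lo <= len(f) <= hi]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(50_000))
fragments = shear(genome, n_copies=20, mean_len=2000, rng=rng)
selected = size_select(fragments, 1500, 2500)
print(len(fragments), "fragments sheared;", len(selected), "kept after size selection")
```

Only a minority of the sheared fragments fall inside the window, which is why the size-purification step is needed.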
The next step is to join the DNA fragments to carrier DNA called a vector. This process is known as cloning, and it creates a sequencing library from which the sequence of the entire genome will be reconstructed. The sequence of each clone in the library is determined, and computer analysis is used to find sequences that overlap between fragments. Assembling the overlapping sequences creates a “contig,” a long contiguous stretch of DNA sequence.
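The overlap-and-merge idea behind contig assembly can be sketched with a simple greedy procedure: repeatedly find the two reads with the longest suffix-to-prefix overlap and join them. This is an illustrative toy, not the algorithm used by any particular assembler. It assumes error-free reads that all come from the same strand, and the names overlap, greedy_assemble, and min_len are hypothetical. Real assemblers must also cope with sequencing errors, reverse-complement reads, and repeated sequences.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads by their best overlaps until none remain; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads  # no overlaps left: each remaining string is a contig
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)

# Three overlapping reads taken from the sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

If a read that overlaps nothing else is added to the input, the function returns more than one contig, which corresponds to a gap in the assembly.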
Shotgun sequencing will usually leave some gaps between contigs because some sequences are missing from the library by chance. Gaps can be filled by making a new library or by using known sequences at the ends of contigs to extend outward. Because shotgun sequencing samples DNA fragments at random, most regions of the genome are sequenced several times, which gives greater certainty that the sequence is correct than if each region had been sequenced only once or twice.
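The relationship between redundancy and gaps can be estimated with the Lander-Waterman model, an idealized calculation that is not described in the text above and is introduced here only for illustration. It assumes reads of equal length that start at uniformly random positions and ignores the minimum overlap needed to join two reads. The genome size, read length, and function names below are hypothetical values chosen for the example.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced (c = N * L / G)."""
    return n_reads * read_len / genome_len

def fraction_uncovered(c):
    """Expected fraction of the genome missed by every read, e^(-c)."""
    return math.exp(-c)

def expected_contigs(n_reads, c):
    """Expected number of contigs, N * e^(-c), ignoring minimum overlap."""
    return n_reads * math.exp(-c)

genome_len = 1_000_000  # hypothetical genome size in bases
read_len = 500          # hypothetical read length in bases
for target in (1, 5, 8, 10):
    n_reads = target * genome_len // read_len
    c = coverage(n_reads, read_len, genome_len)
    print(f"coverage {c:.0f}x: "
          f"{fraction_uncovered(c):.5%} of bases unsequenced, "
          f"about {expected_contigs(n_reads, c):.1f} contigs expected")
```

Under these assumptions the unsequenced fraction falls off exponentially with coverage, so a modest increase in redundancy closes most chance gaps, although real genomes with repeats and cloning bias behave less favorably.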
The human genome was sequenced both by the Human Genome Project, using map-based sequencing, and by Celera, using shotgun sequencing. Shotgun sequencing has since become the preferred method for sequencing other genomes. The full genomes of many organisms, such as the plant Arabidopsis thaliana, rice, the cow, dog, chicken, chimpanzee, rat, mouse, pufferfish, and many microorganisms, have been sequenced this way.