Talk:Shotgun sequencing

This article is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the importance scale.
This article is supported by the Molecular and Cell Biology task force (assessed as High-importance).

Name origin

According to the book Genome War (specific reference needed), the technique gets its name from the dispersive mechanism used to break DNA strands into small pieces. I am not an expert, but someone who is should either correct this or offer an alternative reference.

David Hollman (talk) 22:20, 28 April 2008 (UTC)

Meaningless example

I have tabulated the strands so that the sequences line up. You should notice that every sequence is now the same.

I'm not a biologist or bioinformatician. Someone needs to fix this, because it's either wrong, unexplained, or both. I'm simply not knowledgeable here.

Josephholsten (talk) —Preceding comment was added at 02:14, 24 November 2007 (UTC)

Deleted

The DNA is first cut into small pieces by restriction enzymes.

because I feel certain that's wrong. If the DNA is cut, it's more likely fractionated, because it's important to get random samples. The article as a whole is pretty vague and somewhat suspect, but unfortunately I don't know that much about the details either...

Zashaw 23:11, 21 Jun 2004 (UTC)

So how do you do the sequencing then?

There's something odd about this article: Sequencing lists this as a method used to sequence DNA, but this article says "...and then these clones are sequenced". How can the last step of sequencing be sequencing? Does it mean, "these clones are sequenced using sequencing method X"? Of the three methods mentioned under sequencing, only Chain termination method seems to actually be a method of sequencing, the other two just refer back to some unexplained "sequencing" process. Anyone care to clarify? - IMSoP 00:19, 22 Oct 2004 (UTC)

Fixed (I hope I got it right). - IMSoP 21:21, 14 Nov 2004 (UTC)

Whole genome shotgun sequencing

What is the difference between this and Whole genome shotgun sequencing? Jmeppley 21:26, 26 May 2005 (UTC)

The two articles should really be merged. An alternative to WGS is the BAC-based method...the difference is in how careful you are about splitting up the original DNA into pieces to be sequenced. But the basic idea of assembly is the same with either method, and the term "shotgun" is in contrast to chromosome walking. The simplified information presently in this article applies to both WGS and BAC approaches. (see for example [1]) --Mike Lin 08:54, 13 October 2005 (UTC)

RESOLVED Josephholsten (talk) 02:18, 24 November 2007 (UTC)

Double Barrel Shotgun Sequencing

Some information about Double Barrel Shotgun Sequencing would be nice. T_P

Reply (to "Double Barrel Shotgun Sequencing"): LOL man, I'm a grad school student reading up about this stuff for a bioinformatics project and this really made my day!! haha -Kris

Wait until you read about 'machine gun sequencing'! (nah, I just made that up.) --Dan|^(talk) 15:01, 7 August 2008 (UTC)

External Link to Scientist Article

Why is there a link to material that is not freely available? I don't think we should link to articles that require registration. neffk 17:36, 9 August 2006 (UTC)

Most books need to be bought, or require membership at a library which has paid for access to the material. Because it is unreasonable to exclude books from citations, and digital content behind paywalls is no less accessible than a book, we should not discriminate against such content. For example, Nature and Science are certainly not free, but scientific articles demand citing them because of their quality. Josephholsten (talk) 02:26, 24 November 2007 (UTC)

Craig's argument is that SPEED MATTERS

But that's just not true: we have known for a long time that even a single nucleotide pair can make a dramatic difference; in fact, a methylation alone can have an enormous effect. So even the Human Genome Project is barely enough in terms of quality. Whole genome shotgun sequencing is a top technology, no question about that, but only to provide a first glimpse, similar to getting a thumbnail of a page before the actual scanning process. In the future, as software and our understanding of biology improve, all this might change in favor of shotgun sequencing, but until then it remains as I pointed out. Slicky 12:26, 24 August 2006 (UTC)

Please provide reference

Please provide a reference for "For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.". Thank you. —The preceding unsigned comment was added by 130.212.235.23 (talk) 21:29, 20 March 2007 (UTC).

The answer to your question is present in the question itself..."most of the human genome was sequenced at 12X." Most of the sections that aren't complete have actually been sequenced, but because these regions are highly repetitive it is difficult to determine exactly how many repeats are present.

For example, suppose you sequence two fragments

Frag 1: AGCGATTAATTAATTAATTAATTAATT

Frag 2: AATTAATTAATTAATTAATTGCGCCAG

In this case, it's very difficult to know how many repeats of AATT are between the two unique sequences at the ends. This isn't an easy problem to solve because chain termination sequencing can only read about 300-1000 bases accurately, and it's therefore difficult to bracket repetitive sequences with unique sequences. Mrestko 02:26, 13 April 2007 (UTC)
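
A minimal sketch of the ambiguity in the two-fragment example, in Python. The helper names (`candidate_genome`, `consistent_repeat_counts`) and the upper bound of 10 repeats are illustrative assumptions, not part of any real assembler. It builds hypothetical "true" sequences with different numbers of AATT repeats and checks which of them contain both fragments:

```python
# Fragments copied from the example above.
FRAG1 = "AGCGATTAATTAATTAATTAATTAATT"
FRAG2 = "AATTAATTAATTAATTAATTGCGCCAG"


def candidate_genome(repeats: int) -> str:
    """Hypothetical true sequence with `repeats` copies of AATT between the unique ends."""
    return "AGCGATT" + "AATT" * repeats + "GCGCCAG"


def consistent_repeat_counts(max_repeats: int) -> list[int]:
    """Return the repeat counts whose candidate sequence contains both fragments."""
    return [
        n
        for n in range(1, max_repeats + 1)
        if FRAG1 in candidate_genome(n) and FRAG2 in candidate_genome(n)
    ]


print(consistent_repeat_counts(10))  # [5, 6, 7, 8, 9, 10]
```

Every count from 5 upwards is consistent with the two reads, so the reads alone cannot pin down the repeat number; a read (or mate pair) that spans the whole repeat with unique sequence on both sides would be needed.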

Omission?

The following two paragraphs are written on the topic of paired end sequencing:

To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method yielding two short sequences. Each sequence is called an end-read or read and two reads from the same clone are referred to as mate pairs. Since the chain termination method usually can only produce reads between 500 and 1000 bases long, in all but the smallest clones, mate pairs will rarely overlap.

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can be linked together into scaffolds by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and has a narrow window of deviation.

This applies specifically to the preparation of a 'paired end sequence library'. Am I correct in thinking that this step implicitly comes after shotgun sequencing of (at least some of) another clone library?

If that is the case, it would be nice to make it clear. It would also be nice to see the typical process of generating and sequencing the original clone library, i.e. in what way does generating the initial library of clones for sequencing differ from creating the paired end clone library? In this context, should the page describe the so-called "map as you go" strategies?

Hopefully this makes sense, and you can see my confusion over the implicit information above (else I need to keep reading!) --Dan|^(talk) 14:59, 7 August 2008 (UTC)
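
For what it's worth, the scaffolding step in the second quoted paragraph ("the distance between contigs can be inferred from the mate pair positions") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: contig A lies before contig B in the same orientation, the clone insert size is known, and the function names and numbers below are made up for the example rather than taken from any assembler:

```python
from statistics import mean


def estimate_gap(insert_size: int, contig_a_len: int,
                 read_a_start: int, read_b_end: int) -> int:
    """Estimate the gap between contig A and contig B from one mate pair.

    read_a_start: 0-based position on contig A where the clone begins.
    read_b_end:   number of bases from the start of contig B to where the clone ends.
    """
    covered_on_a = contig_a_len - read_a_start  # part of the clone lying on contig A
    covered_on_b = read_b_end                   # part of the clone lying on contig B
    return insert_size - covered_on_a - covered_on_b


def estimate_gap_from_pairs(insert_size: int, contig_a_len: int,
                            pairs: list[tuple[int, int]]) -> float:
    """Average the per-pair estimates; tighter insert-size windows give better estimates."""
    return mean(estimate_gap(insert_size, contig_a_len, a, b) for a, b in pairs)


# Hypothetical numbers: a 10 kb library, a 50 kb contig A, two linking mate pairs.
pairs = [(46000, 3500), (47000, 4400)]
print(estimate_gap_from_pairs(10000, 50000, pairs))
```

The first pair leaves 4000 bases of the clone on contig A and 3500 on contig B, so the gap estimate is 10000 - 4000 - 3500 = 2500; the second pair gives 2600, and the average is 2550.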