Statements in which the resource exists as a subject.
Predicate / Object
rdf:type
lifeskim:mentions
pubmed:issue
5-6
pubmed:dateCreated
2004-12-28
pubmed:abstractText
We estimate DNA sequence error rates in Genbank records containing protein-coding and non-coding DNA sequences by comparing sequences of the inbred mouse strain C57BL/6J, sequenced as part of the mouse genome project and independently by other laboratories. C57BL/6J was produced by more than 100 generations of brother-sister mating, and can be assumed to be virtually free of residual polymorphism and mutational variation, so differences between independent sequences can be attributed to error. The estimated single nucleotide error rate for coding DNA is 0.10% (SE 0.012%), which is substantially lower than previous estimates for error rates in Genbank accessions. The estimated single nucleotide error rate for intronic DNA sequences (0.22%; SE 0.051%) is significantly higher than the rate for coding DNA. Since error rates for the mouse genome sequence are very low, the vast majority of the errors we detected are likely to be in individual Genbank accessions. The frequency of insertion-deletion (indel) errors in non-coding DNA approaches that of single nucleotide errors in non-coding DNA, whereas indel errors are uncommon in coding sequences.
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:issn
1042-5179
pubmed:author
pubmed:issnType
Print
pubmed:volume
15
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
362-4
pubmed:dateRevised
2006-11-15
pubmed:meshHeading
pubmed:articleTitle
DNA sequence error rates in Genbank records estimated using the mouse genome as a reference.
pubmed:affiliation
University of Edinburgh, School of Biological Sciences, Ashworth Laboratories, UK.
pubmed:publicationType
Journal Article, Comparative Study
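The abstract's statement that the intronic error rate is significantly higher than the coding rate can be roughly sanity-checked from the reported estimates and standard errors alone. The sketch below assumes the two estimates are independent and approximately normal, and applies a simple two-sided z-test. The record does not say which test the authors used, so this is an illustration and not a reproduction of their method; the function name `z_test_difference` is hypothetical.

```python
import math


def z_test_difference(rate_a, se_a, rate_b, se_b):
    """Two-sided z-test for the difference of two independent estimates.

    Assumes both estimates are approximately normal and independent
    (an assumption made here, not stated in the record).
    """
    # Standard error of the difference of two independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (rate_a - rate_b) / se_diff
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values as reported in the abstract, in percent:
# intronic 0.22 (SE 0.051), coding 0.10 (SE 0.012).
z, p = z_test_difference(0.22, 0.051, 0.10, 0.012)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions z comes out at roughly 2.3, which is in line with the abstract's wording at the conventional 5% level.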