Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
2
pubmed:dateCreated
2011-2-22
pubmed:abstractText
The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ~2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.
pubmed:grant
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-10581034, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-11017085, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-11779843, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-12136410, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-12466850, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-12529310, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-12529312, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-1316617, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-1358801, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-14500911, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-14744981, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15057822, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15059994, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15060014, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15459287, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15479945, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15574820, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-15778292, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-16136131, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-16341006, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-16645093, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-17567995, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-17571346, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-17975171, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-17981928, 
http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-17989253, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-18714091, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-19056694, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-19411606, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-19858363, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-20305016, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-20333182, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-2062834, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-271968, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-2734106, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-3294162, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-7288891, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-8165143, http://linkedlifedata.com/resource/pubmed/commentcorrection/21340033-9521922
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:issn
1932-6203
pubmed:author
pubmed:issnType
Electronic
pubmed:volume
6
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
e17034
pubmed:meshHeading
pubmed:year
2011
pubmed:articleTitle
Error and error mitigation in low-coverage genome assemblies.
pubmed:affiliation
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America.
pubmed:publicationType
Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't; Evaluation Studies; Research Support, N.I.H., Extramural