pubmed-article:10380198 | pubmed:abstractText | Genomic science and structural biology meet in the relationship between the sequence and the structure of nucleic acids. The structure that supports each function is conserved through evolution as a specific sequence. In particular, a sequence that recurs at a different location, such as a palindromic or repetitive sequence, has biophysical significance: it can serve as a recognition site for dimers, form stem-loops, and contribute to the global structure of nucleic acids. Genetic networks, transduction pathways, and tissue specificity also depend largely on such sequences. Although these relationships can be found experimentally, demand for automated analysis is increasing. It is especially desirable to extract identical character sequences of arbitrary length (in particular, very long ones) that co-occur at an arbitrary separation. We propose an algorithm that identifies the maximum matching sequence at each position with a computational cost of O(N log N) and memory usage of O(N). Applying it to several sequences, we found unexpectedly large palindromes and repeats in DNA. | lld:pubmed |
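The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the problem it states: for each position, find the length of the longest substring starting there that also starts at some other position. It assumes a naive sorted-suffix approach, which is not the authors' method and does not reach the stated O(N log N) cost. The function name `longest_repeat_at_each_position` is hypothetical, and overlapping matches are allowed.

```python
def longest_repeat_at_each_position(seq):
    """For each index i, return the length of the longest substring starting
    at i that also starts at some other index (overlaps allowed).

    Illustrative only: a naive suffix sort, NOT the algorithm from the
    abstract, and slower than O(N log N) in the worst case.
    """
    n = len(seq)

    # Naive suffix array: suffix start indices in lexicographic order.
    suffix_order = sorted(range(n), key=lambda i: seq[i:])

    def common_prefix_length(a, b):
        # Length of the longest common prefix of seq[a:] and seq[b:].
        k = 0
        while a + k < n and b + k < n and seq[a + k] == seq[b + k]:
            k += 1
        return k

    best = [0] * n
    # The longest match for a suffix is always with one of its neighbours
    # in sorted order, so comparing adjacent pairs is sufficient.
    for rank in range(1, n):
        a, b = suffix_order[rank - 1], suffix_order[rank]
        length = common_prefix_length(a, b)
        best[a] = max(best[a], length)
        best[b] = max(best[b], length)
    return best


if __name__ == "__main__":
    print(longest_repeat_at_each_position("ACGTACGA"))
```

Palindromes in the DNA sense (a sequence matching its own reverse complement) could be approached the same way by appending the reverse complement to the input, although that extension is likewise an assumption and not taken from the abstract.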