pubmed:abstractText |
The Arabidopsis Genome Initiative has so far released more than 80% of the genome sequence of Arabidopsis thaliana. About 70% of the identified genes have at least one paralogue. To understand the biological function of an individual gene, it is essential to study the structure, expression and organization of the entire multigene family to which it belongs. A systematic analysis of multigene families, made possible by the amount of genomic sequence data now available, provides important clues to genome evolution and plasticity. In this paper, four multigene families of A. thaliana are characterized, namely LCAD, HD-GL2, LGT and MYST. Members of HD-GL2 and LCAD have already been reported in plants. The LGT genes encode proteins containing glycosyl transferase motifs. No plant genes similar to the LGT genes have been reported to date. The novel MYST family, most likely plant-specific, encodes proteins with no identified function. Sequencing and in silico analysis led to the characterization of 29 novel genes belonging to these four gene families. The organization, structure and evolution of all members of the four families are discussed, as well as their chromosomal locations. Expression data for some of the paralogues in each family are also presented.
|