The nucleotide sequence of the celZ gene coding for a thermostable endo-beta-1,4-glucanase (Avicelase I) of Clostridium stercorarium was determined. The structural gene consists of an open reading frame of 2958 bp which encodes a preprotein of 986 amino acids with an Mr of 109,000. The signal peptide cleavage site was identified by comparison with the N-terminal amino acid sequence of Avicelase I purified from C. stercorarium culture supernatants. The recombinant protein expressed in Escherichia coli is proteolytically cleaved into catalytic and cellulose-binding fragments of about 50 kDa each. Sequence comparison revealed that the N-terminal half of Avicelase I is closely related to avocado (Persea americana) cellulase. Homology was also observed with Clostridium thermocellum endoglucanase D and Pseudomonas fluorescens cellulase. The cellulose-binding region was located in the C-terminal half of Avicelase I. It consists of a reiterated domain of 88 amino acids flanked by a repeated sequence of about 140 amino acids. The C-terminal flanking sequence is highly homologous to the non-catalytic domain of Bacillus subtilis endoglucanase and Caldocellum saccharolyticum endoglucanase B. It is proposed that the enhanced cellulolytic activity of Avicelase I is due to the presence of multiple cellulose-binding sites.