Bioinformatics Analysis of Cellulose Synthase CesA Gene from Miscanthus lutarioriparius

Based on third-generation transcriptome sequencing (PacBio SMRT single-molecule real-time sequencing), supplemented by second-generation sequencing, a complete and accurate unigene library was constructed. After comparison with closely related species, the sequences of the cellulose synthase (CesA) genes CesA4, CesA7 and CesA9 of Miscanthus lutarioriparius were assembled and named MlCesA4, MlCesA7 and MlCesA9. Bioinformatics software was then used to construct protein phylogenetic trees, to predict post-translational phosphorylation sites, and to predict conserved protein domains. The sequences of MlCesA4, MlCesA7 and MlCesA9 were 2980 bp, 3310 bp and 3208 bp long, respectively, and were most closely related to those of Sorghum bicolor and Zea mays. MlCesA4 and MlCesA9 each have 43 predicted phosphorylation sites, and MlCesA7 has 48. MlCesA4 is an unstable protein with 6 transmembrane domains, with 4 regions outside the membrane and 3 inside. MlCesA7 and MlCesA9 are stable proteins, each with 8 transmembrane domains, 5 regions outside the membrane and 4 inside. All three are hydrophilic proteins whose secondary structure is dominated by α-helices and random coils.

Cellulose is an important component of the cell walls of higher plants and usually occurs as fine microfibrils. Its basic unit is D-glucose in the pyranose form, and the glucose residues are linked into long chains by β-1,4 glycosidic bonds. Many genes are involved in plant cellulose biosynthesis, the most important of which is the cellulose synthase (CesA) gene. Cellulose synthase belongs to the glycosyltransferase-2 superfamily. Cellulose synthases are usually located in the plasma membrane and have eight transmembrane domains, two near the amino terminus and six near the carboxyl terminus. The amino terminus contains four cysteine-containing CxxC motifs that favor the formation of a zinc finger structure (Taylor, 2008). In addition, cellulose synthases contain two conserved substrate-binding sites and catalytically active sites (McFarlane et al., 2014). These domains and active sites interact with other enzymes or proteins during cellulose biosynthesis and participate in different steps of the synthesis process. In 1996, Pear et al. first cloned a β-1,4-glucosyltransferase gene encoding the CesA catalytic subunit from cotton by random sequencing of a cDNA library and sequence analysis (Pear et al., 1996). Since then, cellulose synthase genes have been reported in other species one after another, and research on these genes is most advanced in the model plant Arabidopsis thaliana (Richmond, 2000). Arabidopsis has 10 known cellulose synthase genes. AtCesA1, AtCesA3, AtCesA6 and AtCesA10 are involved in primary cell wall formation; AtCesA4, AtCesA7 and AtCesA8 are expressed only in secondary cell walls and act together to form them (Taylor et al., 2003); and AtCesA2, AtCesA5, AtCesA9 and AtCesA6 are partially redundant in function (Desprez et al., 2007).
Subsequently, CesA genes have also been cloned from Oryza sativa (Hazen et al., 2002), Zea mays (Appenzeller et al., 2004), Hordeum vulgare (Burton et al., 2004), Populus trichocarpa (Djerbi et al., 2005), Boehmeria nivea (Tian et al., 2008), Phyllostachys edulis (Zhang et al., 2010) and other species. Miscanthus lutarioriparius, a perennial C4 plant of the genus Miscanthus, grows fast, yields highly, is inexpensive to cultivate, reproduces easily and has a broad genetic range. It conserves soil and water, prevents erosion and helps maintain the surrounding ecological environment, making it a plant resource with both ecological and economic value (Yi, 2012). Fiber cells account for about half of all cells in M. lutarioriparius stalks, and their cellulose content is high, so the stalks are widely used as a high-quality papermaking material (Liu et al., 2001). As a special biomass feedstock crop in China, M. lutarioriparius is attracting increasing research attention.
1 Results and Analysis
1.1 Acquisition of MlCesA gene fragments
Based on third-generation transcriptome sequencing (PacBio SMRT single-molecule real-time sequencing), supplemented by second-generation sequencing, a complete and accurate unigene library was constructed. After comparison with closely related species, the CesA4, CesA7 and CesA9 gene sequences of Miscanthus lutarioriparius were assembled and named MlCesA4, MlCesA7 and MlCesA9. Using the transcriptome data, the CesA4, CesA7 and CesA9 sequences were extended with k-mers, with the k-mer length set to 57. The resulting MlCesA4, MlCesA7 and MlCesA9 genes were 2 980 bp, 3 310 bp and 3 208 bp long, respectively. Total RNA was extracted from M. lutarioriparius samples and assessed by agarose gel electrophoresis and ultraviolet spectrophotometry. The 28S and 18S rRNA bands were clear (Figure 1) and the OD260/OD280 ratio was 2.18, indicating that the extracted RNA was of good quality and high integrity and was suitable for library construction. Based on the effective library concentration and the required data output, transcriptome sequencing was performed on the PacBio Sequel platform. The longest open reading frame (ORF) of MlCesA4 lies in frame +3; its coding region spans positions 12 to 2 960 (2 949 bp in total) and encodes 982 amino acid residues. The longest ORF of MlCesA7 lies in frame +1; its coding region spans positions 22 to 3 285 (3 264 bp) and encodes 1 087 amino acid residues. The longest ORF of MlCesA9 lies in frame +2; its coding region spans positions 35 to 3 190 (3 156 bp) and encodes 1 051 amino acid residues.
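The ORF figures above follow simple coordinate arithmetic: with 1-based inclusive coordinates, the length is end − start + 1, the frame follows from (start − 1) mod 3, and, if the stop codon is counted inside the coding region, the residue count is length/3 − 1. The Python sketch below checks that arithmetic and adds a minimal forward-strand longest-ORF scan. The function names, the ATG-only start rule and the restriction to frames +1 to +3 are assumptions for illustration, not the software used in this study.

```python
"""Illustrative ORF helpers (hypothetical; not the pipeline used in the study).

Assumptions: 1-based inclusive coordinates, the stop codon is included in
the coding region, ORFs start at ATG, and only the forward strand is scanned.
"""
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(start: int, end: int) -> tuple:
    """Return (length_nt, length_aa, frame) for a coding region start..end."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    length_aa = length_nt // 3 - 1      # last codon is the stop codon
    frame = (start - 1) % 3 + 1         # frame +1, +2 or +3
    return length_nt, length_aa, frame


def longest_orf(seq: str) -> Optional[dict]:
    """Find the longest ATG...stop ORF in forward frames +1, +2 and +3."""
    seq = seq.upper()
    best = None
    for offset in range(3):
        i = offset
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            # Walk codon by codon until an in-frame stop codon is reached.
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop codon downstream in this frame
            start, end = i + 1, j + 3   # convert to 1-based, inclusive
            length_nt, length_aa, frame = orf_summary(start, end)
            if best is None or length_nt > best["length_nt"]:
                best = {"frame": frame, "start": start, "end": end,
                        "length_nt": length_nt, "length_aa": length_aa}
            i = j + 3  # resume scanning after this ORF's stop codon
    return best
```

For the coordinates reported above, orf_summary(12, 2960) returns (2949, 982, 3), orf_summary(22, 3285) returns (3264, 1087, 1) and orf_summary(35, 3190) returns (3156, 1051, 2), which agrees with the stated lengths, residue counts and frames.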

Construction of phylogenetic trees and motif analysis of Miscanthus lutarioriparius cellulose synthase (MlCesA)
The three gene sequences were translated into protein sequences. Blastp homology searches showed that MlCesA4 is highly similar to Sorghum bicolor CesA4 (XP_002456361.1), Zea mays CesA11 (NP_001105236.2) and Panicum miliaceum CesA4 (RLM93083.1); MlCesA7 is highly similar to Miscanthus × giganteus CesA7 (AMQ81247.1), Sorghum bicolor CesA3 (XP_021309425.1) and Zea mays CesA7 (PWZ13382.1); and MlCesA9 is highly similar to Sorghum bicolor CesA9 (XP_002460229.1) and Zea mays CesA12 (NP_001105532.1). A phylogenetic tree was constructed in MEGA from the sequences identified by the Blast amino acid homology searches (Figure 2). In the tree, the M. lutarioriparius CesA proteins are very similar to the CesA proteins of sorghum and maize, which preliminarily suggests that M. lutarioriparius is closely related to sorghum and maize. Based on the phylogenetic tree, sequences similar to MlCesA4, 7 and 9, together with CesA sequences from the model plant Arabidopsis, were selected for motif analysis. MEME identified 15 motifs, and MlCesA4, 7 and 9 all contain Motif1 to Motif15 (Figure 3). Compared with Arabidopsis, the motifs of MlCesA4, 7 and 9 vary little, indicating that their structures are relatively conserved. MlCesA4 contains a repeated Motif3, which may reflect some functional overlap, while the motif structures of MlCesA7 and MlCesA9 are very similar.
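The text does not state which distance model or tree-building method was chosen in MEGA. As a minimal illustration of the idea behind distance-based trees, the Python sketch below computes pairwise p-distances (the proportion of differing sites) from already aligned protein sequences, skipping gap columns pairwise. The function names and toy sequences are hypothetical, and MEGA offers substitution models and tree methods that would normally be used instead.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return diffs / compared


def distance_matrix(alignment: dict) -> dict:
    """Map each pair of sequence names to their p-distance."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(alignment, 2)}


# Hypothetical toy alignment, not the CesA sequences used in the study.
toy = {"a": "MAST", "b": "MASK", "c": "MPSK"}
print(distance_matrix(toy))
```

A matrix like this would then be handed to a clustering method such as neighbor joining or UPGMA to produce the tree.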

Prediction and analysis of the basic physicochemical properties and post-translational modifications of MlCesA proteins
The basic physicochemical properties of the MlCesA4, 7 and 9 proteins were obtained with the ProtParam tool (Table 1). According to Table 1, MlCesA4 is an unstable protein, whereas MlCesA7 and MlCesA9 are stable proteins. All three have a negative grand average of hydropathicity (GRAVY) and are therefore hydrophilic proteins.
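ProtParam reports GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and its documentation treats an instability index above 40 as indicating a potentially unstable protein. The Python sketch below reproduces only the GRAVY calculation and the two classification rules used in the text; it does not compute the instability index, which requires ProtParam's dipeptide weight table. The function names are hypothetical, and skipping non-standard residues is an assumption of this sketch.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Characters outside the 20 standard residues (e.g. '*', 'X') are skipped,
    which is an assumption of this sketch.
    """
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acid residues found")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def classify(gravy_score: float, instability_index: float) -> tuple:
    """Apply the rules used in the text: II > 40 unstable, GRAVY < 0 hydrophilic."""
    stability = "unstable" if instability_index > 40 else "stable"
    hydropathy = "hydrophilic" if gravy_score < 0 else "hydrophobic"
    return stability, hydropathy
```

The instability index itself would still come from ProtParam (or an equivalent implementation); classify only turns the reported numbers into the labels used above.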
Protein phosphorylation is one of the most common and important covalent modifications in organisms and can regulate protein activity and function. Phosphorylation can occur on a variety of amino acids, mainly on the side chains of serine (Ser), threonine (Thr) and tyrosine (Tyr) residues (Yuan et al., 2020). Post-translational modification sites were predicted across the entire polypeptide chains of MlCesA4, 7 and 9 with the NetPhos 2.0 Server (Table 2). MlCesA4 had 43 residues scoring above the threshold of 0.5, indicating 43 potential phosphorylation sites. MlCesA7 had 48 residues above the threshold, and thus 48 potential phosphorylation sites, and MlCesA9 had 43 residues above the threshold, and thus 43 potential phosphorylation sites.
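Counting predicted phosphorylation sites amounts to keeping the residues whose score exceeds the 0.5 threshold. The Python sketch below assumes the NetPhos predictions have already been parsed into (position, residue, score) records; it does not parse the server's output format, and the record type, function name and example scores are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SitePrediction(NamedTuple):
    position: int   # 1-based residue position in the protein
    residue: str    # 'S', 'T' or 'Y'
    score: float    # prediction score between 0 and 1


def count_phosphosites(predictions: Iterable[SitePrediction],
                       threshold: float = 0.5) -> tuple:
    """Return (total, per-residue counts) for sites scoring above threshold."""
    hits = [p for p in predictions if p.score > threshold]
    return len(hits), dict(Counter(p.residue for p in hits))


# Hypothetical example scores, not NetPhos output for MlCesA.
example = [
    SitePrediction(12, "S", 0.99),
    SitePrediction(40, "T", 0.51),
    SitePrediction(77, "Y", 0.50),   # equal to the threshold, so not counted
    SitePrediction(90, "S", 0.30),
    SitePrediction(120, "S", 0.80),
]
print(count_phosphosites(example))  # (3, {'S': 2, 'T': 1})
```

Using a strict "greater than" comparison mirrors the wording "above the threshold"; a score exactly equal to 0.5 is therefore not counted.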

Prediction and analysis of conserved domains
The conserved domains of the proteins encoded by the MlCesA4, 7 and 9 genes were predicted with the NCBI Conserved Domain tool. The MlCesA4 protein contains a conserved region of the PLN02195 superfamily (cl33434), the MlCesA7 protein contains a conserved region of the PLN02436 superfamily (cl33490), and the MlCesA9 protein contains a conserved region of the PLN02189 superfamily (cl33433). All of these conserved regions belong to the cellulose synthase family.

Prediction and analysis of MlCesA protein transmembrane domains and signal peptides
Transmembrane domain analysis with the online TMHMM Server v.2.0 (Figure 4) showed that MlCesA4 has 6 transmembrane domains, with 4 regions outside the membrane and 3 inside, while MlCesA7 and MlCesA9 each have 8 transmembrane domains, with 5 regions outside the membrane and 4 inside. Potential signal peptide cleavage sites in the MlCesA4, 7 and 9 amino acid sequences were predicted with SignalP-5.0, and no signal peptide sequence was found.
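With n transmembrane helices, a polypeptide is divided into n + 1 alternating regions, so 6 helices give 7 regions and 8 helices give 9. The reported 4/3 and 5/4 splits therefore both imply that the N- and C-termini were predicted on the same (outside) side. The Python sketch below shows this arithmetic and a small parser for a compact topology string of the form `o10-32i50-72o`, which we assume resembles TMHMM's one-line output; the function names and example string are hypothetical.

```python
import re


def expected_regions(n_tm: int, n_term_side: str) -> dict:
    """Count inside ('i') and outside ('o') regions for n_tm helices.

    n_tm helices split the chain into n_tm + 1 regions that alternate
    sides, starting from the side of the N-terminus.
    """
    if n_term_side not in ("i", "o"):
        raise ValueError("n_term_side must be 'i' or 'o'")
    other = "o" if n_term_side == "i" else "i"
    n_regions = n_tm + 1
    return {n_term_side: (n_regions + 1) // 2, other: n_regions // 2}


def parse_topology(topology: str) -> dict:
    """Parse a compact topology string such as 'o10-32i50-72o'."""
    topology = topology.split("=", 1)[-1]  # drop a 'Topology=' prefix if present
    helices = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    sides = re.findall(r"[io]", topology)
    if len(sides) != len(helices) + 1:
        raise ValueError("expected one side label around each helix")
    return {"tm_helices": len(helices), "outside": sides.count("o"),
            "inside": sides.count("i"), "helices": helices}
```

For example, expected_regions(6, "o") gives {'o': 4, 'i': 3} and expected_regions(8, "o") gives {'o': 5, 'i': 4}, matching the counts reported for MlCesA4 and for MlCesA7 and MlCesA9.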

Effects of CesA genes on plant growth and development
In recent years, with the rapid development of genetic engineering and bioinformatics, new CesA genes have been discovered one after another, and studies of plant CesA genes have become increasingly comprehensive. Studies have shown that the enzymes involved in cellulose synthesis differ among plant tissues and developmental stages. In rice, a grass, OsCesA4, OsCesA7 and OsCesA9 are co-expressed in seedlings, stems, immature panicles and roots and jointly participate in secondary cell wall formation (Tanaka et al., 2003). In Miscanthus × giganteus, MgCesA4 and MgCesA7 are involved in primary cell wall formation, whereas MgCesA10, 11 and 12 are involved in secondary cell wall biosynthesis and form a cellulose synthase complex in a 1:1:1 ratio (Zeng et al., 2020). The sorghum CesA1, CesA3 and CesA6 genes are involved in primary cell wall formation and play the same role as their Arabidopsis counterparts, while the sorghum CesA4, CesA7 and CesA9 genes are highly homologous to the rice CesA genes involved in secondary cell wall formation and to Arabidopsis CesA4, CesA7 and CesA8 (Brian et al., 2016). Based on these evolutionary relationships, MlCesA7 may be involved in primary cell wall formation, while MlCesA4 and MlCesA9 may be involved in secondary cell wall formation.

Regulation of CesA gene expression
Cellulose synthesis can be regulated at the transcriptional level through CesA genes, and the main difference among CesA genes is whether introns are present in the coding sequence and where they are located (Richmond, 2000). In Triticum aestivum, TaCesA1, 2 and 6, which are involved in primary cell wall (PCW) formation, contain 13 introns, whereas TaCesA4, 7 and 8, which are involved in secondary cell wall (SCW) formation, contain 7, 12 and 9 introns, respectively (Kaur et al., 2016). Thus, wheat CesA genes involved in PCW formation have more introns than those involved in SCW formation. Previous studies have shown that introns can enhance gene expression in animal and plant genomes (Shaul, 2017) and that intron-containing genes are transcribed at higher levels (Tang and Gou, 2019). These observations suggest that intron number may contribute to differences in transcription level between the CesA genes involved in PCW formation and those involved in SCW formation.
Besides transcriptional regulation, post-transcriptional regulation of CesA genes also affects cellulose synthesis. In Hordeum vulgare, small RNAs derived from HvCesA6 can selectively down-regulate CesA gene expression, which may in turn affect the expression of other cell wall biosynthesis genes and thereby markedly change barley cellulose content (Michael et al., 2008). In the plant cellulose synthesis pathway, besides CesA genes, the expression and mutation of CSL genes (Cao et al., 2019), KORRIGAN (Aggarwa et al., 2015), COBRA (Et al., 2020) and SUSY (Ahmed et al., 2019) can also affect cellulose synthesis.

Experimental materials
The experimental material, Miscanthus lutarioriparius (04081), was collected in Changsha, Hunan (113.07°E, 28.18°N), and is now grown in the Miscanthus germplasm resource nursery of Hunan Agricultural University.

Extraction of total RNA from Miscanthus lutarioriparius
Young Miscanthus lutarioriparius plants were selected, frozen in liquid nitrogen and ground, and total RNA was extracted with an RNA extraction kit. A 1 μL aliquot of the total RNA was electrophoresed on a 1.0% agarose gel at 180 V for 16 min and examined. The purity, concentration and integrity of the RNA samples were then measured with a NanoDrop, Qubit 2.0 and Agilent 2100, respectively.
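Spectrophotometric RNA QC of the kind done here with the NanoDrop rests on two simple calculations: the A260/A280 ratio as a purity indicator and the conventional conversion of one A260 unit to about 40 µg/mL of RNA (1 cm path length). The Python sketch below illustrates both. The 1.8 to 2.2 acceptance window is a common rule of thumb that we assume here, not a criterion stated by the authors, and the function name and example readings are hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # approx. µg/mL RNA per A260 unit, 1 cm path


def rna_qc(a260: float, a280: float, dilution: float = 1.0,
           ratio_window: tuple = (1.8, 2.2)) -> dict:
    """Return A260/A280, estimated concentration and a purity flag.

    µg/mL is numerically equal to ng/µL, so the concentration is
    reported in ng/µL.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    conc_ng_per_ul = a260 * RNA_UG_PER_ML_PER_A260 * dilution
    low, high = ratio_window
    return {"a260_a280": ratio,
            "conc_ng_per_ul": conc_ng_per_ul,
            "ratio_ok": low <= ratio <= high}


# Hypothetical readings chosen to give the ratio reported in the text (2.18).
print(rna_qc(1.09, 0.50))
```

A ratio inside the window says nothing about integrity; that is why the protocol also relies on the gel and the Agilent 2100 trace.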

Acquisition of gene sequences
Based on third-generation transcriptome sequencing (PacBio SMRT single-molecule real-time sequencing), supplemented by second-generation sequencing, a complete and accurate Miscanthus lutarioriparius unigene library was constructed (Altschul et al., 1997). After comparison with closely related species, k-mers were used to