Bioinformatic Analysis of Catalase Gene Family of Arabidopsis and Maize

Catalase (CAT) is an antioxidant enzyme that plays a key role in plant development and abiotic stress response. In this study, three maize CAT genes (ZmCATs) and three Arabidopsis CAT genes (AtCATs) were identified by screening the NCBI and Phytozome databases. Their protein characteristics, evolutionary relationships, gene structures, protein secondary and tertiary structures, and expression at different developmental stages or under abiotic stress treatments were analyzed. The results showed several similarities between the CAT families of the two species. The proteins encoded by the ZmCATs and AtCATs genes were hydrophobic. According to phylogenetic tree analysis, the CAT gene family was divided into two subfamilies. ZmCATs and AtCATs both contain two conserved domains, and the secondary structures of the ZmCAT and AtCAT proteins consist mainly of α-helices and random coils. Cluster analysis of ZmCATs expression at different growth and developmental stages showed that ZmCAT1, ZmCAT2 and ZmCAT3 were highly expressed during maize growth and development. Under abiotic stress, the expression of ZmCATs genes was significantly affected by temperature stress. The results of this study provide a theoretical basis for exploring the function of CAT in maize and other crops.

Catalase (CAT), which converts H2O2 to H2O and O2, is a common antioxidant enzyme in almost all organisms and plays an important role in protecting plants against abiotic and oxidative stresses (Zhou et al., 2018). Among the radical and non-radical reactive oxygen species (ROS), CAT mainly scavenges hydrogen peroxide, thus maintaining a balance between the production and scavenging of ROS and maintaining stability in plants (Sofo et al., 2015). An appropriate amount of H2O2 in an organism can act as a second messenger in cellular signaling pathways, causing activation of immune cells, cell proliferation, cellular senescence and apoptosis (Dash et al., 2012). In contrast, excess ROS can damage various cellular components, such as proteins, lipids and DNA (Umeno et al., 2017). Therefore, CAT plays a vital role in plants as a scavenger. The CAT gene family that encodes catalase in plants is a small family that has been reported in different species. Barley (Hordeum vulgare L.) (Kendall et al., 1983) has two CAT genes, whereas there are three CAT genes in Arabidopsis (Arabidopsis thaliana) (Du et al., 2008), tobacco (Nicotiana L.) (Willekens et al., 1994) and rice (Oryza sativa L.) (Willekens et al., 2014), and four CAT genes in cucumber (Cucumis sativus L.) (Hu et al., 2016) and cotton (Gossypium spp.) (Wang et al., 2019). In cotton, the CAT proteins of two different varieties both contained a catalase core domain and a catalase immune response domain (Wang et al., 2019).
In ginseng, different expression profiles of PgCAT1 were found in the leaves, stems and roots of seedlings (Purev et al., 2010). Stresses such as heavy metals, plant hormones, osmotic agents, high light and abiotic stress can all induce the expression of PgCAT1 (Purev et al., 2010). In tobacco, the mRNA transcripts of CAT1 and CAT2 were detected only in non-senescent leaves, whereas the expression of CAT3 was detected in both non-senescent and senescent leaves (Niewiadomska et al., 2009). Overexpression of ScCAT1 enhanced the growth of recombinant E. coli under CuCl2, CdCl2 and NaCl stress, indicating that ScCAT1 could improve the tolerance of recombinant E. coli to these stresses (Su et al., 2014). In rice seedlings, the CATA, CATB and CATC genes are highly expressed in leaf sheaths, roots and leaves, respectively (Iwamoto et al., 2013). In Arabidopsis, the expression of CAT genes shows obvious tissue specificity. CAT1 and CAT2 are mainly expressed in leaves and siliques, while CAT3 is mainly expressed in stems and roots. The expression of CAT2 and CAT3 was found to be controlled by circadian rhythms; CAT2 can be activated by drought stress, while CAT3 can be activated by abscisic acid, oxidative treatments and senescence (Du et al., 2008).
Maize (Zea mays L.) is a widely grown food crop and an important energy and fodder crop that plays an important role in the national economy. Abiotic stresses such as salt (Iwamoto et al., 2013), drought (Du et al., 2008), low temperature (Hackenberg et al., 2013) and high temperature (Li et al., 2013) are the main factors affecting its yield. For example, osmotic and other abiotic stresses cause ROS accumulation, which affects plant growth and yield (Zou et al., 2015). In this study, the ZmCATs and AtCATs gene families were identified and subjected to bioinformatic analysis. The protein characteristics, evolutionary relationships, structural domains, and secondary and tertiary structures of the ZmCATs and AtCATs gene families, together with their expression levels at different developmental stages, were predicted and analyzed. This study provides a theoretical basis for the further cloning and functional identification of ZmCATs and AtCATs.

Genome-wide identification of the ZmCATs and AtCATs gene families
To retrieve catalase (CAT) genes from the maize genome, Arabidopsis CAT protein sequences, obtained using the accession numbers given in published articles, were used as queries, and three maize CAT genes were retrieved from the NCBI database and the maize genome database. The characteristics of ZmCATs and AtCATs are shown in Table 1. The three ZmCATs gene family members were named ZmCAT1 to ZmCAT3 according to their chromosomal distribution. The coding regions of ZmCATs range from 1628 to 2500 bp, the encoded proteins contain 492 to 704 amino acids (aa), the molecular weights range from 56 to 80 kDa, and the isoelectric points range from 6.45 to 9.33. The coding regions of AtCATs range from 1900 to 2023 bp, the encoded proteins each contain 492 aa, the molecular weights range from 56.69 to 56.93 kDa, and the predicted isoelectric points range from 6.63 to 7.31. The proteins encoded by both ZmCATs and AtCATs genes are hydrophobic. Chromosome mapping indicated that the three ZmCATs genes are located on three of the ten maize chromosomes, while the three AtCATs genes are located on two of the five Arabidopsis chromosomes (Figure 1). Subcellular localization prediction showed that the three ZmCAT and three AtCAT proteins are localized in different subcellular compartments, where they may perform different functions.

Phylogenetic tree of ZmCATs and AtCATs gene families
The CAT protein sequences of Arabidopsis thaliana, wheat, rice, sorghum and upland cotton were retrieved from online databases and compared with ZmCATs. The phylogenetic tree was constructed using MEGA 7.0 software. The results show that subfamily I contains all members of the maize CAT family and one member of the Arabidopsis family (AtCAT3), as well as SbCATs, OsCATs and TaCATs. Subfamily II is composed of two members of the Arabidopsis family (AtCAT1 and AtCAT2) and members of the GhCAT family (Figure 2). In Figure 2, the CATs of monocotyledonous and dicotyledonous plants cluster into different branches, and Arabidopsis AtCAT3 is more closely related to the monocotyledonous CATs than to those of dicotyledonous plants. Most Arabidopsis CATs and their upland cotton homologues form phylogenetic branches with higher support values, which indicates a high degree of sequence homology. Similarly, maize CATs have a high degree of sequence homology with those of sorghum, indicating their close kinship.

Structure and functional domain analysis of ZmCATs and AtCATs
The online analysis tool GSDS was used to analyze the structure of the maize and Arabidopsis CAT family genes. The number of exons in ZmCATs is between 4 and 8, whereas the number of exons in AtCATs is between 6 and 7 (Figure 3A). To further verify the characteristics of the CAT families of maize and Arabidopsis, a conserved domain analysis was conducted. The results showed that the two species are highly similar in their domains: both contain the catalase core domain (Catalase, Pfam: PF00199) and the catalase-related immune response domain (Catalase-rel, Pfam: PF06628) (Figure 3B). This suggests that these two domains are necessary for CAT protein function.

Analysis of the secondary and tertiary structure of ZmCATs and AtCATs
Secondary structure prediction for the ZmCAT and AtCAT family proteins shows that, in ZmCATs, the proportion of α-helical amino acids is between 26.42% and 27.87%, the proportion of extended chain amino acids is between 14.43% and 15.34%, the proportion of β-sheet amino acids is between 5.40% and 5.65%, and the proportion of random coil amino acids is between 51.41% and 53.66%. In AtCATs, the proportion of α-helical amino acids is between 26.02% and 26.42%, the proportion of extended chain amino acids is between 15.45% and 15.65%, the proportion of β-sheet amino acids is between 4.885 and 5.49%, and the proportion of random coil amino acids is between 52.44% and 53.25%. Both ZmCAT and AtCAT proteins therefore have α-helices and random coils as their main structural elements (Table 2; Figure 4A). Tertiary structure prediction showed that the ZmCAT3 and AtCAT1 proteins have similar tertiary structures, as do the ZmCAT1, ZmCAT2, AtCAT2 and AtCAT3 proteins (Figure 4B).
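The proportions reported above are simple per-residue tallies. As a minimal sketch, assuming the prediction server returns one state letter per residue (the letters used in the usage note, such as h for helix and c for coil, are illustrative assumptions and not taken from this study), the percentages could be computed as follows:

```python
from collections import Counter


def state_composition(states: str) -> dict:
    """Return the percentage of residues assigned to each predicted state.

    `states` holds one letter per residue; which letter stands for which
    structural class depends on the prediction tool (assumption).
    """
    states = states.strip()
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    return {state: round(100.0 * n / total, 2) for state, n in sorted(counts.items())}


def main_states(states: str, top: int = 2) -> list:
    """Return the `top` most frequent states, most frequent first."""
    return [state for state, _ in Counter(states.strip()).most_common(top)]
```

For a toy string such as "hhhcccce", main_states reports c and h as the two dominant states, which mirrors the kind of statement made in the text about α-helices and random coils.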

Expression analysis of ZmCATs and AtCATs genes in different tissues and developmental stages
To determine the expression profiles of CAT genes in maize and Arabidopsis in different tissues and developmental stages, the MaizeGDB database was searched and expression analysis was performed. As shown in Figure 5, ZmCATs genes are expressed in different tissues, and their expression levels differ among the developmental stages of vegetative and reproductive growth. The expression level of the same ZmCAT gene also differs between developmental stages. Compared with ZmCAT1 and ZmCAT2, ZmCAT3 has a higher expression level in various tissues in both the reproductive and vegetative growth phases. The expression level of ZmCAT2 in various tissues during the reproductive and vegetative growth stages is also relatively high. Compared with ZmCAT2 and ZmCAT3, the expression level of ZmCAT1 in various tissues during the reproductive growth period is lower. AtCATs genes are also expressed in different tissues. The expression level of AtCAT1 in mature pollen is higher than that in other growth and developmental stages, and the expression level of AtCAT1 in developmental stages other than mature pollen is lower than that of the other AtCATs genes. Compared with AtCAT3 and AtCAT2, AtCAT1 has a higher expression level in the growth and development stage of Arabidopsis.

Expression analysis of ZmCATs gene under abiotic stress
Under high temperature stress, ZmCAT1 and ZmCAT3 are up-regulated, whereas ZmCAT2 is down-regulated. Under cold stress, the up-regulation of ZmCAT2 is greater in magnitude than the down-regulation of ZmCAT1 and ZmCAT3. Under salt and ultraviolet irradiation stress, ZmCAT3 is up-regulated, but the magnitude of the up-regulation is small (Figure 6).

Discussion
Catalase (CAT) plays an important role in plant growth and development, stress response, oxidative senescence and other physiological processes. Its activity is affected by various biotic and abiotic factors, such as light, temperature, high salt, drought, plant hormones and pathogenic microorganisms (Gondim et al., 2012; Luna et al., 2005). Catalase (CAT), peroxidase (POD) and superoxide dismutase (SOD) are effective scavengers of ROS, which can prevent cell damage (He et al., 2005).
In this study, the chromosome location map showed that AtCAT1 and AtCAT3 are located on the same chromosome, but they fall into different subfamilies in the phylogenetic tree. The reason why AtCAT1 and AtCAT2 are in the same subfamily may be that their nucleotide and amino acid sequences are more similar to each other than to those of AtCAT3. Although they are not on the same chromosome, they remain tightly linked to promote co-regulation (Frugoli et al., 1996). In the study of cucumber CAT genes by Hu et al., phylogenetic tree analysis divided the Arabidopsis CATs into three subfamilies, with AtCAT1, AtCAT2 and AtCAT3 in different subfamilies, and the Arabidopsis and cucumber CAT families were relatively closely related (Hu et al., 2015). The present study based its classification on the CATs of monocotyledonous plants and divided them into two subfamilies. This difference may be related to the different members included in the phylogenetic tree and the different methods of classification.
In recent years, researchers have cloned CAT genes from a variety of plants and studied the functions and expression characteristics of these genes. Chen et al. predicted and analyzed the secondary structure of the sugarcane S-CAT protein and found that random coils accounted for the highest proportion (60.77%), while α-helices and extended strands accounted for smaller proportions of 17.07% and 22.15%, respectively (Chen et al., 2012). This is consistent with the results of the secondary structure prediction for the Arabidopsis and maize CAT proteins in this study. It is speculated that random coils are the largest structural element in the secondary structure of CAT proteins, and that α-helices and extended chains are dispersed throughout the protein.
The research of Du et al. showed that different stresses may trigger different signal transduction pathways and activate the transcription of different CAT genes (Du et al., 2008). Plant catalase plays a variety of roles in germination, photorespiration and resistance to oxidative stress, and may mediate signal transduction involving H2O2 as a second messenger (McClung, 1997; Yang et al., 1997). This conclusion is reflected in the expression patterns of maize CAT genes under different abiotic stresses in this study. In Frugoli's research, Arabidopsis CAT2 and CAT3 showed high expression patterns in leaves (Joo et al., 2014). In this study, the FPKM expression levels of the CAT genes retrieved from the Phytozome database showed that only Arabidopsis CAT3 was highly expressed in leaves. In rice, OsCATA, OsCATB and OsCATC participate in the environmental stress response, the regulation of root growth, and the control of ROS levels or homeostasis during photorespiration, and the overexpression of OsCATA and OsCATC could enhance rice drought tolerance (Leung et al., 2018).
This study explored the evolutionary relationship of CAT genes in monocotyledonous and dicotyledonous plants and found that they are relatively conserved across species. The gene structures, exon/intron distribution patterns and conserved motifs of CAT genes are relatively similar in maize and Arabidopsis. The gene expression profiles showed that the CAT genes of the two species were specifically expressed in different tissues. The expression patterns of the maize CAT genes also differed under different abiotic stresses. In particular, the expression of the CAT genes was clearly induced by temperature stress, which indicates that the CAT genes may play important roles in the response to temperature stress, although the specific regulatory mechanism needs further study.

Basic characteristics of CATs in maize and Arabidopsis
The accession numbers, coding lengths and amino acid numbers of ZmCATs genes were obtained from the Maize Sequence database (https://www.maizegdb.org/). The molecular formula, molecular weight, isoelectric point and hydrophilicity of AtCATs and ZmCATs were obtained using the Expasy ProtParam website (https://web.expasy.org/protparam/). Subcellular localization analysis was performed using WoLF PSORT Prediction (https://wolfpsort.hgc.jp/).
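Hydrophilicity of a protein sequence is commonly summarized as the grand average of hydropathy (GRAVY), the mean Kyte-Doolittle hydropathy value over all residues, where a positive value suggests a hydrophobic protein and a negative value a hydrophilic one. The following is a minimal sketch of that calculation, not the code used in this study; the function names and the zero cut-off used for classification are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues of a protein."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def hydropathy_class(sequence: str) -> str:
    """Label a protein by the sign of its GRAVY score (assumed cut-off: 0)."""
    return "hydrophobic" if gravy(sequence) > 0 else "hydrophilic"
```

Applied to a full-length CAT protein sequence, gravy would return a single score that can be compared with the value reported by the web tool.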

Protein evolutionary analysis of CATs
To analyze the genetic and phylogenetic relationships between the CATs of different species, the CAT proteins of sorghum, Arabidopsis, rice, maize, wheat and upland cotton were analyzed using MEGA6.0 software. The Neighbor-Joining (NJ) method was used to construct a phylogenetic tree, and the bootstrap value was set to 1000 replicates for testing.
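Neighbor-Joining works on a matrix of pairwise distances, and bootstrap testing repeats the tree construction on alignments whose columns have been resampled with replacement. The sketch below illustrates only these two ingredients on an aligned set of sequences. It is not MEGA's implementation; the p-distance measure and all function names are assumptions chosen for illustration.

```python
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name_a, name_b) with name_a < name_b."""
    names = sorted(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement; every row uses the same columns."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

In a bootstrap test with 1000 replicates, bootstrap_replicate would be called 1000 times, a tree would be built from each replicate's distance matrix, and the support of a branch would be the fraction of replicate trees that contain it.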

Analysis of gene structure and protein domains of CATs in maize and Arabidopsis
GSDS (http://gsds.cbi.pku.edu.cn/) was used to analyze the gene structure of ZmCATs and AtCATs. The conserved domains were analyzed through the SMART website (http://smart.embl-heidelberg.de/) and IBS 1.0.2 software.
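Gene structure displays of this kind are usually drawn from genome annotation records, and exon numbers can be tallied directly from such records. The sketch below assumes the annotation is available in GFF3 format with exon features that carry a Parent attribute; the input format and the function name are assumptions, since the source does not state which input was supplied to GSDS.

```python
from collections import defaultdict


def count_exons(gff3_lines) -> dict:
    """Count exon features per parent transcript in GFF3-formatted lines."""
    exons = defaultdict(int)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only well-formed exon records are counted
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attributes.get("Parent")
        if parent:
            exons[parent] += 1
    return dict(exons)
```

Passing the lines of an annotation file restricted to the CAT gene models would give one exon count per transcript, which can be checked against the numbers read from the gene structure figure.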

Advanced structural prediction of CATs protein in maize and Arabidopsis
The secondary structure of the ZmCAT and AtCAT proteins was predicted through the NPS@ online server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html). The tertiary structure of the ZmCAT and AtCAT proteins was predicted using the SWISS-MODEL website (https://swissmodel.expasy.org/). The predicted protein secondary structures were combined with the phylogenetic tree constructed by MEGA6.0 into a single figure.

Expression analysis of CATs genes in maize and Arabidopsis at different tissues and development stages
Expression analysis of annotated ZmCATs was performed using the MaizeGDB database (https://www.maizegdb.org/), and expression analysis of annotated AtCATs was performed using the Phytozome plant genome database (https://phytozome.jgi.doe.gov). HemI software was used to cluster the data and draw the heat map.
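Before clustering, expression values are often log-transformed and scaled per gene so that genes with different absolute expression levels can be compared by pattern. The sketch below shows one common preparation, log2(FPKM + 1) followed by row-wise z-scores, together with the Euclidean distance on which hierarchical clustering can be based. These choices are assumptions made for illustration and do not describe the settings used in HemI for this study.

```python
import math
from statistics import mean, pstdev


def log_transform(fpkm_values):
    """Apply log2(x + 1) so that zero expression maps to zero."""
    return [math.log2(value + 1.0) for value in fpkm_values]


def row_zscore(values):
    """Center and scale one gene's profile; a flat profile maps to all zeros."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sd for value in values]


def euclidean(profile_a, profile_b):
    """Euclidean distance between two equally long expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))
```

Each gene's FPKM values across tissues would be passed through log_transform and row_zscore, and the resulting profiles compared pairwise with euclidean to decide which genes cluster together.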

Expression analysis of maize CATs under abiotic stresses
The transcriptome data of maize under various abiotic stresses were downloaded from the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra), including heat (SRA accession numbers SRR1238715, SRR1819196 and SRR1819198), cold (SRR1238717, SRR1819204 and SRR1819205), salt (SRR1238719) and UV (SRR1238720) treatment conditions (Makarevitch et al., 2015). Maize seedlings were grown under 24 natural light conditions until the two-leaf and one-heart stage; the cold treatment was set at 5°C for 16 h, the heat treatment at 50°C for 4 h, the salt treatment at 300 mmol for 20 h and the UV treatment for 2 h. The differentially expressed maize CAT genes were screened, and the data were compiled and plotted using Excel software.
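Up- and down-regulation under a treatment is typically expressed as a log2 fold change relative to the control. The sketch below is one simple way to compute and label such changes from two expression values. The pseudocount of 1 and the twofold threshold are assumptions, since the source does not state the screening criteria that were applied.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control expression, stabilized by a pseudocount."""
    if treated < 0 or control < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a gene as up, down or unchanged (assumed threshold: twofold)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"
```

For example, a gene whose expression value rises from 1 to 3 under heat treatment has a log2 fold change of 1 with these settings and would be labeled as up-regulated.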

Authors' contributions
GSN designed and carried out the experiments, completed the data analysis and wrote the first draft of the paper; ZJJ and WYL participated in the experimental design and the analysis of the experimental results; SWJ and LJX participated in the revision of the paper; DD and LM participated in checking and revising the format of the paper; XJY conceived and led the project, directing the experimental design, data analysis, and the writing and revision of the paper. All authors read and approved the final manuscript.