Identification and Expression Profiles of the WRKY Gene Family in Pecan (Carya illinoinensis)

The WRKY gene family encodes a large family of transcription factors that play critical roles in various physiological processes. However, a systematic analysis of the WRKY transcription factor family has not been reported in pecan (Carya illinoinensis). In this study, a total of 89 putative pecan WRKY genes (CiWRKYs), named CiWRKY1-89, were identified from the whole genome of pecan. Most WRKY domain sequences in the CiWRKY proteins were “WRKYGQK”, but the variant “WRKYGKK” occurred in CiWRKY19, CiWRKY27, and CiWRKY88, and “WKKYGQK” occurred in CiWRKY85. The CiWRKYs were unevenly distributed on 16 chromosomes, with the largest number of genes on chromosome 3. Based on phylogenetic analysis, the 89 putative CiWRKYs could be classified into three major groups. CiWRKY genes within the same subgroup shared similar exon-intron structures and conserved motifs. Expression profiles indicated that CiWRKY16, CiWRKY30, CiWRKY42, CiWRKY62, and CiWRKY69 were involved in flower differentiation and development, and that the majority of CiWRKY genes were differentially expressed during embryo development. The present study provides a reference for further comparative genomics and functional studies of this important class of transcriptional regulators in pecan.


Identification of WRKY gene family in the pecan genome
To identify CiWRKYs comprehensively, the HMM profile of the WRKY domain (PF03106) and the Arabidopsis WRKY protein sequences were used as queries to search for putative CiWRKY genes. In total, 89 CiWRKY genes were obtained from the pecan genome and named CiWRKY1 to CiWRKY89 based on their chromosome positions. Detailed information on each CiWRKY gene is listed in Table 1, including gene ID, group, gene length, molecular weight (MW), isoelectric point (pI), and subcellular localization. The deduced length of the CiWRKY proteins ranged from 152 aa (CiWRKY72) to 747 aa (CiWRKY77). The predicted molecular weight ranged from 17.655 kD (CiWRKY72) to 80.303 kD (CiWRKY77), and the pI varied from 4.89 (CiWRKY69) to 9.94 (CiWRKY47). The majority of CiWRKY proteins (95.5%) were predicted to be located in the nucleus, whereas CiWRKY5 and CiWRKY85 were predicted to be located in chloroplasts, and CiWRKY12 and CiWRKY51 in peroxisomes.

Phylogenetic analysis and chromosomal distribution of CiWRKYs
The conserved domains of the WRKY proteins in pecan were evaluated. Of the 89 CiWRKY members, 85 carried the fully conserved 'WRKYGQK' sequence (Figure 1). Based on the phylogenetic tree of WRKY proteins from pecan and Arabidopsis, the 89 CiWRKY proteins could be divided into three major groups (Figure 2). There were 16 CiWRKY proteins in Group I, each of which contained two WRKY domains and C2H2-type zinc-finger motifs. Sixty CiWRKY proteins were assigned to Group II, each harboring one WRKY domain and a C2H2-type zinc-finger motif. The members of Group II were further classified into five subgroups, comprising 10, 27, 15, and 16 members, respectively. Finally, 10 CiWRKY proteins, each with a single WRKY domain and a C2HC zinc-finger structure, were assigned to Group III. CiWRKY19, CiWRKY27, and CiWRKY88 exhibited sequence divergence in the WRKY domain. The remaining three CiWRKY proteins (CiWRKY62, CiWRKY69, and CiWRKY85) were not classified into any group. The 89 candidate CiWRKYs were unevenly distributed on the sixteen pecan chromosomes (Figure 3). Chromosome 1 had the largest number of CiWRKYs (9, 10.11%), whereas chromosomes 14 and 16 had the fewest, with only CiWRKY86 and CiWRKY89, respectively. Chromosome 7 contained seven CiWRKYs, all of which belonged to Group II.
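The heptapeptide screen described above can be illustrated with a short Python sketch. It is not the procedure used in the study (the alignments were made in BioEdit); the loose pattern `W..YG.K` and the function names are assumptions chosen so that the canonical WRKYGQK and the two reported variants (WRKYGKK, WKKYGQK) are all matched. Such a loose pattern may also produce false positives outside a true WRKY domain, so hits would still need domain-level verification.

```python
import re

# Canonical heptapeptide reported in the text.
CANONICAL = "WRKYGQK"

# Loose illustrative pattern (an assumption, not from the paper):
# W, any two residues, Y, G, any residue, K.
HEPTAPEPTIDE = re.compile(r"W[A-Z]{2}YG[A-Z]K")


def find_heptapeptides(protein_seq):
    """Return all non-overlapping heptapeptide-like matches, in sequence order."""
    return HEPTAPEPTIDE.findall(protein_seq.upper())


def classify_domains(protein_seq):
    """Summarize how many heptapeptides were found and which are non-canonical."""
    hits = find_heptapeptides(protein_seq)
    return {
        "n_domains": len(hits),
        "variants": [h for h in hits if h != CANONICAL],
    }
```

Under this sketch, a Group I-like toy sequence with two WRKYGQK copies would report `n_domains == 2` and no variants, whereas a sequence carrying WRKYGKK or WKKYGQK would list that heptapeptide under `variants`.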

Motif analysis and exon-intron organization of CiWRKY genes
Fifteen conserved motifs in the full-length CiWRKY proteins were identified using the MEME online tool (http://meme.sdsc.edu/meme/intro.html) (Figure 4). Motifs 1 and 2, which correspond to the WRKY domains, were widely distributed among the 89 members. Some motifs were shared by specific groups, such as motif 9 in Group IIb. Group I contained the largest number of motifs, and motifs 5, 13, 15, and 61 existed only in Group I. As expected, members of the same family shared similar motif compositions, suggesting functional similarities. The exon-intron structure of all CiWRKY genes was analyzed to gain more insight into the evolution of the WRKY family in pecan (Figure 4). Of the 89 CiWRKY genes, 39 contained two introns, 22 possessed four introns, 12 had three introns, and ten had only one intron. CiWRKY76 contained the largest number of introns. All the Group III CiWRKYs contained two introns. Members of the same subgroup shared similar gene structures.
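The intron tally reported above can be reproduced from exon coordinates with a few lines of Python. This is a minimal sketch under stated assumptions: each gene is represented by the exon intervals of a single transcript, the number of introns is taken as the number of exons minus one, and the gene names and coordinates in any usage are hypothetical placeholders rather than values from the pecan annotation.

```python
from collections import Counter


def intron_count(exons):
    """Number of introns implied by the (start, end) exon intervals of one transcript."""
    return max(len(exons) - 1, 0)


def intron_distribution(gene_exons):
    """Map intron number -> how many genes have that many introns."""
    return Counter(intron_count(exons) for exons in gene_exons.values())
```

For example, a dictionary with two three-exon genes and one two-exon gene would give `Counter({2: 2, 1: 1})`, which is the same kind of summary as "39 genes contained two introns".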

Expression profiles of CiWRKYs during flower and embryo development process
To further understand the function of CiWRKYs, their global expression patterns at different stages of flower development were systematically analyzed. The expression profiles of CiWRKYs could be divided into four types (Figure 5). Type 1 included 20 genes that were barely expressed during flower development. Type 2 genes (17 genes) displayed high expression at all five stages; in particular, the expression levels of CiWRKY16, CiWRKY30, CiWRKY42, CiWRKY62, and CiWRKY69 were the highest. The other CiWRKYs exhibited varied expression levels. The expression of CiWRKYs was also investigated during embryo development in pecan (Figure 6). Of the 89 CiWRKYs, 76 (85.4%) were expressed during embryo development. CiWRKY14, CiWRKY58, CiWRKY68, and CiWRKY70 were expressed only during the early stage of cotyledon development, indicating that they mainly participate in organ differentiation. Five CiWRKY genes (CiWRKY47, CiWRKY36, CiWRKY79, CiWRKY55, and CiWRKY73) showed higher expression levels in the fully matured stage of the embryos. CiWRKY41, CiWRKY9, CiWRKY42, CiWRKY80, CiWRKY21, and CiWRKY29 were highly expressed throughout embryo development; these genes may be closely related to nutrient accumulation and embryonic tissue development.
Molecular Plant Breeding 2023, Vol.14, No.11, 1-14 http://genbreedpublisher.com/index.php/mpb

Analysis of cis-acting elements in the promoter regions of CiWRKY genes
Ten CiWRKYs that were highly expressed during flower and embryo development were selected for further cis-element analysis (Figure 7). Nine meristem expression elements (CAT-box) were identified in the promoters of CiWRKY41, CiWRKY62, CiWRKY69, and CiWRKY80. All four of these CiWRKYs also had abscisic acid responsiveness elements (ABRE). Seed-specific regulation elements (RY-element) were found in the promoter regions of CiWRKY62 and CiWRKY80, indicating that these two genes are very likely to participate in embryo development. Additionally, MeJA-responsiveness and salicylic acid responsiveness (TCA-element) regulatory elements were located in the promoter regions of seven and five CiWRKYs, respectively.
(Ramamoorthy et al., 2008; International Rice Genome Sequencing, 2005; Tomato Genome, 2012; Huang et al., 2012; Wang et al., 2014). The conserved domains of the WRKY proteins in pecan were evaluated. Of the 89 CiWRKY members, 85 carried the fully conserved 'WRKYGQK' sequence. However, in three CiWRKY proteins belonging to Group IIc (CiWRKY19, CiWRKY27, and CiWRKY88), the "Q" was replaced by "K", giving WRKYGKK. WRKYGKK is a common variant in previous studies and is usually present in Group IIc (Song et al., 2014; Song et al., 2016a; Song et al., 2016b). In a few WRKY proteins, the WRKY sequence is replaced by WKKY, WRRY, WSKY, WKRY, WVKY, WRIC, WRMC, WIKY, or WKRY (Jiang et al., 2017). In this study, the WKKYGQK variant appeared in CiWRKY85.

Molecular
The CiWRKY genes were categorized into three groups (I, II, and III), and Group II was further classified into five distinct subgroups (IIa-e). Chen et al. (2017) proposed that IIa and IIb could be merged into a single subfamily, and that IId and IIe could also be merged into one subgroup. The phylogenetic analysis in this study showed that the CiWRKY genes in Group IIa were closely related to those in IIb, and that Group IIe genes clustered with genes in IId, which supports this classification.

CiWRKY genes function in flower and embryo development
Numerous studies have shown that WRKY genes regulate plant growth and development. This study focused on the expression of WRKY genes during flower and embryo development. CiWRKY21 clustered with Arabidopsis AtWRKY71, which promotes flowering via the direct modulation of AtFT and AtLFY expression. In this study, CiWRKY21 was highly expressed throughout female flower development, suggesting that this gene is related to flower bud differentiation and flower development. Arabidopsis AtWRKY75 is a positive regulator of flowering through the GA signaling pathway (Zhang et al., 2018). CiWRKY42 exhibited relatively high expression throughout flower development and was closely related to Arabidopsis AtWRKY75, indicating that CiWRKY42, like its Arabidopsis homolog, may be a key regulator of flower development. CiWRKY42, CiWRKY21, CiWRKY80, CiWRKY12, and CiWRKY41, all belonging to Group IIc, were also highly expressed; we therefore speculate that these Group IIc WRKY proteins may play a role in flower development. Embryo development is a very important stage in pecan research. The expression of CiWRKYs varied greatly across the three stages of embryo development. CiWRKY68 exhibited higher expression in the early stage of cotyledon development, which indicates its potential role in embryo development. AtWRKY2, a CiWRKY68 homolog, mediates seed germination and post-germination developmental arrest by ABA (Jiang et al., 2009). CiWRKY36 clustered with AtWRKY41, which positively regulates ABA signaling and seed maturation genes during early post-germination seedling growth (Ding et al., 2014). In the expression profile, CiWRKY36 was highly expressed in the fully matured stage of the embryos, suggesting that this gene may have functions similar to those of AtWRKY41.

Identification and annotation of WRKY genes in pecan genome
The genome sequences of pecan and Arabidopsis were downloaded from Phytozome 13 (https://phytozome-next.jgi.doe.gov/info/CillinoinensisPawnee_v1_1) (Lovell et al., 2021) and TAIR (http://www.arabidopsis.org), respectively. The Hidden Markov Model (HMM) profile for the WRKY domain (PF03106) was downloaded from the Pfam database (http://pfam.xfam.org/). The HMMER 3.0 program was then used to search the pecan protein database with an E-value ≤ 1e-5. In parallel, the Arabidopsis WRKY proteins were used as queries in a local BLASTp search of the pecan genome with BioEdit, with the E-value set to 1e-2. The two data sets were merged and redundant sequences were removed, and the remaining candidates were further verified with NCBI-CDD (https://www.ncbi.nlm.nih.gov/cdd). The characteristics of the pecan WRKY proteins were analyzed using the ExPASy software (https://web.expasy.org/protparam/), and WoLF PSORT (https://www.genscript.com/tools/wolf-psort) was used to predict subcellular localization.
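The merging of the two candidate sets can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: it assumes the HMMER search was saved with `hmmsearch --tblout` (full-sequence E-value in the fifth column) and that the BLASTp hits are available in the standard 12-column tabular layout (subject ID in column 2, E-value in column 11), whereas the study ran BLASTp inside BioEdit. The function names and any protein IDs used with them are hypothetical.

```python
def hmmer_hits(tblout_lines, max_evalue=1e-5):
    """Collect target IDs from hmmsearch --tblout lines passing the E-value cutoff."""
    hits = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        # Column 1: target name; column 5: full-sequence E-value.
        if float(fields[4]) <= max_evalue:
            hits.add(fields[0])
    return hits


def blast_hits(tabular_lines, max_evalue=1e-2):
    """Collect subject IDs from 12-column tabular BLAST lines passing the cutoff."""
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 2: subject ID; column 11: E-value.
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def merge_candidates(hmm_ids, blast_ids):
    """Union of both hit sets with duplicates removed, sorted for reproducibility."""
    return sorted(set(hmm_ids) | set(blast_ids))
```

The merged list is only a set of candidates; in the workflow described above, each candidate would still be checked for an intact WRKY domain (for example with NCBI-CDD) before being accepted.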

Phylogenetic tree analysis and classification of the pecan WRKY family
Multiple sequence alignments of the WRKY domains of CiWRKY proteins were performed using BioEdit software. The WRKY proteins from pecan and Arabidopsis were aligned using the ClustalW tool in MEGA 5.0, and the phylogenetic tree was constructed with the neighbor-joining (NJ) method (bootstrap = 1000). The phylogenetic tree of the full-length pecan WRKY protein sequences was built with the same method. The chromosome distribution map of the pecan WRKY gene family was drawn with TBtools software (Chen et al., 2020).

Motif analysis and exon-intron structures
The conserved motifs in the 89 CiWRKY proteins were detected by MEME (http://meme.nbcr.net/meme/cgi-bin/meme.cgi), with a maximum motif number of 15 and an optimum motif width of 6-50 amino acid residues. The phylogenetic tree, gene structures, and conserved motifs of the WRKY family genes in pecan were visualized with TBtools software (Chen et al., 2020).

Expression analysis of CiWRKY genes during flower and fruit development
To reveal the expression patterns of CiWRKY genes during flower development, transcriptome data from our previous research were used, covering the early stage of female flower differentiation, the female inflorescence differentiation stage, the female flower involucre formation stage, the bud stage, and female flowers in full bloom. The CiWRKY expression data (fragments per kilobase of transcript per million mapped fragments, FPKM) during embryo development were obtained from RNA transcriptome data (BioProject ID PRJNA435846). The FPKM values were used to estimate the expression level of each gene. The log2(FPKM) values of the CiWRKY genes were used to draw heat maps with TBtools (Chen et al., 2020).
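The log2 transformation applied before drawing the heat maps can be sketched in Python. The text specifies log2(FPKM) but does not say how genes with an FPKM of zero were handled; the pseudocount of 1 used here is therefore an assumption added only to keep the logarithm defined, and the function name and gene labels are hypothetical.

```python
import math


def log2_fpkm(matrix, pseudocount=1.0):
    """Transform a {gene: [FPKM per stage]} mapping to log2(FPKM + pseudocount).

    The pseudocount is an illustrative assumption; the source only states
    that log2(FPKM) values were used.
    """
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in matrix.items()
    }
```

With the default pseudocount, FPKM values of 0, 1, and 3 map to 0, 1, and 2, so unexpressed genes sit at the bottom of the color scale instead of producing undefined values.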

Category and number of cis-acting elements in the promoters of CiWRKYs
The 1,500 bp sequences upstream of the start codon of each CiWRKY, extracted from the pecan genome data with TBtools, were treated as putative promoter regions. The online program PlantCARE (https://bioinformatics.psb.ugent.be/webtools/plantcare/html/) was used to analyze the cis-acting elements of the ten selected CiWRKYs (Lescot et al., 2002).
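The promoter extraction step can be illustrated with a strand-aware Python sketch. The study performed this step in TBtools; the code below is only an assumed equivalent, with hypothetical function names, 1-based inclusive CDS coordinates given on the forward strand, and truncation at chromosome ends when fewer than 1,500 bp are available.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=1500):
    """Return up to `length` bp upstream of the start codon, in gene orientation.

    cds_start and cds_end are 1-based inclusive forward-strand coordinates.
    """
    if strand == "+":
        # The start codon begins at cds_start; take the bases just before it.
        begin = max(cds_start - 1 - length, 0)
        return chrom_seq[begin:cds_start - 1]
    if strand == "-":
        # The start codon ends at cds_end; upstream lies to the right on the
        # forward strand and must be reverse-complemented.
        end = min(cds_end + length, len(chrom_seq))
        return reverse_complement(chrom_seq[cds_end:end])
    raise ValueError("strand must be '+' or '-'")
```

Handling the minus strand explicitly matters because a naive "take the 1,500 bp to the left of the gene" would return the downstream region for roughly half of the genes.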

Authors' contributions
WM designed and carried out the research, completed the data analysis, and wrote the first draft of the manuscript. CJX, TX, and ZXW collected the data. BTY and ZCC guided the experimental design and manuscript revision. All authors read and approved the final manuscript.