Research Article
Genomic Variation Detection Analysis of 3 Self-pruning Tomato Lines Based on Resequencing
2 Hebei Key Laboratory of Horticultural Germplasm Excavation and Innovative Utilization, Hebei Normal University of Science & Technology, Qinhuangdao, 066004, China
Molecular Plant Breeding, 2022, Vol. 13, No. 9 doi: 10.5376/mpb.2022.13.0009
Received: 15 Mar., 2022 Accepted: 21 Mar., 2022 Published: 31 Mar., 2022
Du H.D., You X., Li Y.F., Wang S., Mao X.J., and Zhang N., 2022, Genomic variation detection analysis of 3 self-pruning tomato lines based on resequencing, Molecular Plant Breeding, 13(9): 1-8 (doi: 10.5376/mpb.2022.13.0009)
The yield of self-pruning tomato varieties is closely related to the number of flowers on the main stem. Many studies have reported genes controlling the inflorescence capping trait, but few have examined what controls the node at which capping occurs. To study the genetic variation among self-pruning tomatoes that differ in the capping node of the main stem, and to screen candidate genes involved in regulating that node, three tomato lines (GXF, AXF and 815) were analyzed by resequencing. A total of 5 968 501 SNPs and 485 114 Indels were detected in the three samples, and 33 473 mutated genes were identified by comparison with the reference genome. GO and KEGG annotation of the mutations in the CDS region showed that they were mainly associated with basic metabolism and zeatin biosynthesis. Genome sequence alignment analysis identified 16 genes that may be involved in regulating the terminal node of the tomato main stem. These results provide mutation locus information for the related genes and lay a foundation for further research on the genetic mechanism underlying the inflorescence position of self-pruning tomatoes.
Inflorescence architecture is a major component of crop yield, and improving it is a recurring goal of domestication and variety improvement (Meyer and Purugganan, 2013). For example, selection on inflorescence structure in wheat (Boden et al., 2015), maize (Doebley et al., 1997) and rice (Huang et al., 2009) has been widely used to improve yield. Although tomato has been domesticated for thousands of years, its inflorescence structure closely resembles that of its wild ancestors, which makes it difficult to exploit in breeding. It was not until the 1920s that a rare natural variant was found in the field; the plant was reduced in size and looked as if it had been pruned (Molinero-Rosales et al., 2004). This type of plant has been used in large-scale field production because of its short growth cycle, early flowering and fruiting, more concentrated maturity and suitability for dense planting.
With the rapid development of sequencing technology, high-throughput sequencing is increasingly used to study crop genes and to address biological problems. Zhang (2014) used high-throughput sequencing to resequence the whole genomes of two new apple varieties, 'Su Shuai' and 'India', and obtained a total of 4 328 755 SNPs (single nucleotide polymorphisms) and 109 690 SVs (structural variations). Bioinformatics analysis associated a total of 17 resistance genes and 19 genes related to fruit flavor. Yin et al. (2020) conducted genome-wide resequencing analysis of alpine potato and local farm potato in Huaiyu Mountain, detected 6 803 829 SNPs, 314 088 Indels and 1 490 CNVs (copy number variations), and confirmed that the functions of the mutated genes were related to high-altitude habitat, defense response and photosynthesis. Zhang et al. (2019) detected genetic variation in 24 samples of Nyssa yunnanensis using reduced-representation genome sequencing and found a total of 98 498 SNP loci. After filtering, 6 309 SNP markers were retained and used to analyze the population structure of the 24 samples, revealing the genetic structure and genetic pedigree of the population at the genome level.
The inflorescence arrangement of tomato plants is closely related to yield and is one of its most important determinants, and the node position of the capping inflorescence on the main stem directly affects the length of the harvest period and the yield of tomato (Mao et al., 2018). The capping-type varieties used in actual production cap at a low inflorescence node, so their yield is significantly lower. In addition, differences in the capping node affect how concentrated fruit set is, which directly affects high yield and uniformity, hinders mechanized operation, and lowers the overall economic benefit (Nie, 2004, Northern Horticulture, (6): 55; Silva et al., 2018). At present, research on self-capping tomato mainly focuses on variety selection, and there are few reports on the genetics of, or molecular markers for, the node height of the capping inflorescence on the main stem, which greatly restricts the speed of selecting and breeding new varieties. Therefore, this study used whole-genome resequencing to detect and analyze genetic variation among self-capping tomatoes that differ in the capping node. The results help to screen candidate genes that regulate the capping node and to clone and functionally analyze them, thus providing theoretical guidance for tomato molecular breeding and making it possible to use genetic engineering and transgenic methods in variety improvement.
1 Results and Analysis
1.1 Quality assessment of sequencing data
The total amount of sequencing data was 24.39 G, and the clean data remaining after filtering was 24.22 G. The raw data of each sample ranged from 8 012 642 M to 8 260 038 M, with Q20≥97.63% and Q30≥94.83%, and the GC content ranged from 36.22% to 36.47%. In summary, the data volume of all samples was sufficient, the GC distribution was normal, and the sequencing quality was acceptable (Table 1).
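The quality indicators above (Q20, Q30 and GC content) are simple per-base proportions. The following is a minimal sketch of how they can be computed from reads and their quality strings; it assumes Phred+33 quality encoding, and the function name and toy reads are illustrative only, not taken from the study.

```python
def read_quality_stats(records, offset=33):
    """Return Q20, Q30 and GC percentages over all bases in `records`.

    `records` is an iterable of (sequence, quality_string) pairs.
    `offset` is the ASCII offset of the quality encoding (Phred+33 is assumed).
    """
    total = q20 = q30 = gc = 0
    for seq, qual in records:
        for base, qchar in zip(seq, qual):
            phred = ord(qchar) - offset
            total += 1
            q20 += phred >= 20      # base call error rate <= 1%
            q30 += phred >= 30      # base call error rate <= 0.1%
            gc += base in "GCgc"
    if total == 0:
        raise ValueError("no bases supplied")
    return {
        "Q20": 100.0 * q20 / total,
        "Q30": 100.0 * q30 / total,
        "GC": 100.0 * gc / total,
    }


# Toy reads (not data from the study): 'I' = Phred 40, '5' = Phred 20, '#' = Phred 2.
toy_reads = [("ACGT", "II5#"), ("GGCA", "IIII")]
stats = read_quality_stats(toy_reads)
print(stats)
```

In practice these values are usually reported by the sequencing provider or a quality-control tool; the sketch only makes the definitions explicit.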
Table 1 Statistics of quality evaluation of sequencing data
1.2 Statistics of comparison with reference genome
'Heinz 1706' was used as the reference genome (Table 2). The mapping rate of all samples ranged from 98.59% to 98.98%. At an average sequencing depth of 10X, the coverage at 1X was above 99.33% and the coverage at 4X was 89.65% (Table 3). In summary, the mapping results were normal and could be used for subsequent analysis.
Table 2 Basic statistics of reference genome
Table 3 Statistics of sequencing depth and coverage
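Coverage at a given depth (for example 1X or 4X) is the share of reference positions covered by at least that many reads. A minimal sketch of the calculation is given below; the per-position depths are toy values, and the function name is hypothetical.

```python
def coverage_summary(depths, thresholds=(1, 4)):
    """Return mean depth and the percentage of positions with depth >= each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / n
    coverage = {
        t: 100.0 * sum(d >= t for d in depths) / n
        for t in thresholds
    }
    return mean_depth, coverage


# Toy per-position read depths (not data from the study).
toy_depths = [0, 2, 5, 10, 12, 9, 4, 3, 15, 20]
mean_depth, coverage = coverage_summary(toy_depths)
print(mean_depth, coverage)
```

For a real genome the per-position depths would come from the alignment files rather than a Python list, but the definition is the same.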
1.3 SNP detection and annotation
Comparison with the reference genome detected a total of 5 968 501 SNPs in the three samples. There were 931 773, 2 496 666 and 2 540 062 SNPs in 815, GXF and AXF, accounting for 15.61%, 41.83% and 42.56% of the total, respectively. The number of SNPs in 815 was far lower than in GXF and AXF, indicating that 815 differs markedly from the other two lines, whereas the difference between GXF and AXF was small (43 396 SNPs). To identify SNP variation between AXF and 815, GXF and 815, and GXF and AXF, the samples were compared in pairs (Table 4). A total of 2 466 563 SNPs were detected between AXF and 815, 2 436 748 between GXF and 815, and 1 012 519 between GXF and AXF.
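The sample shares quoted above follow directly from the per-sample counts. As a quick arithmetic check, using only the figures given in the text:

```python
# Per-sample SNP counts as reported in the text.
snp_counts = {"815": 931_773, "GXF": 2_496_666, "AXF": 2_540_062}

total = sum(snp_counts.values())
shares = {name: round(100.0 * n / total, 2) for name, n in snp_counts.items()}
gap = snp_counts["AXF"] - snp_counts["GXF"]

print(total)   # total SNPs across the three samples
print(shares)  # percentage of the total per sample
print(gap)     # difference between AXF and GXF
```

The computed total, percentages and GXF/AXF difference agree with the reported values.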
Table 4 SNP statistical annotation results of three samples
SNP variation was mainly concentrated in intergenic regions, accounting for 66.02%~70.45% of the total, followed by the 1 kb upstream and downstream regions, accounting for 22.87%~24.78%. Synonymous and non-synonymous mutations in the CDS region accounted for about 1.45%~1.89%. The ratio of non-synonymous to synonymous mutations in the CDS region was 1.169~1.970.
Among all SNPs, 1 137 528 to 2 758 988 were transitions and 881 348 to 2 152 540 were transversions; the ratio of transitions to transversions ranged from 1.281 to 1.291 (Table 5). Further analysis grouped the SNPs into six types. Across 815, GXF and AXF, T:A>C:G was the most numerous type, C:G>T:A was second, and C:G>G:C was the least numerous (Figure 1).
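The transition-to-transversion ratio and the six SNP types can be derived from the reference and alternative bases of each SNP. The sketch below illustrates one common way to do this; the helper names and toy SNPs are illustrative, and pairing the smallest and largest reported counts with each other is an assumption, since the per-sample values are only given in Table 5.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_transition(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine changes are transitions."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def collapse(ref, alt):
    """Map a substitution onto one of six strand-independent types, e.g. 'C:G>T:A'."""
    if ref in PURINES:  # describe the change from the pyrimidine strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def ts_tv_ratio(snps):
    ts = sum(is_transition(r, a) for r, a in snps)
    tv = len(snps) - ts
    return ts / tv


# Toy SNPs (not data from the study).
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("C", "G"), ("A", "T")]
classes = Counter(collapse(r, a) for r, a in toy_snps)
toy_ratio = ts_tv_ratio(toy_snps)

# Ratios from the reported count ranges (assumed pairing: smallest with smallest,
# largest with largest).
ratio_small = 1_137_528 / 881_348
ratio_large = 2_758_988 / 2_152_540
print(classes, toy_ratio, ratio_small, ratio_large)
```

Under the assumed pairing, the two ratios fall at the ends of the reported range of 1.281 to 1.291.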
Table 5 Types of SNPs in the three samples
Figure 1 Distribution of different types of SNPs
1.4 InDel detection and annotation
Insertions and deletions of small fragments shorter than 50 bp were detected with SAMtools. Based on the comparison with the reference genome, a total of 485 114 Indels were found in the three samples: 89 529 in 815, 197 549 in GXF and 198 036 in AXF. To identify Indel variation between AXF and 815, GXF and 815, and GXF and AXF, the samples were compared in pairs (Table 6). A total of 201 087 Indels were detected between AXF and 815, 200 500 between GXF and 815, and 88 325 between GXF and AXF.
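The text does not state how the pairwise comparison was carried out. One simple possibility, shown here only as an assumption, is to treat each sample's calls against the reference as a set and to keep the variants present in exactly one of the two samples. The variant keys and toy calls below are hypothetical.

```python
def pairwise_differences(calls_a, calls_b):
    """Variants present in exactly one of the two call sets (symmetric difference)."""
    return calls_a ^ calls_b


# Toy Indel calls keyed as (chromosome, position, ref, alt); not data from the study.
calls_815 = {("ch01", 100, "A", "AT"), ("ch02", 250, "GTC", "G")}
calls_axf = {("ch01", 100, "A", "AT"), ("ch03", 75, "C", "CAA")}

diff = pairwise_differences(calls_axf, calls_815)
print(sorted(diff))
```

A real comparison may also need to account for genotype (heterozygous versus homozygous calls) and for different alleles at the same position, so this sketch should not be read as the procedure behind Table 6.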
Table 6 Indel statistical annotation results of three samples
Genome-wide Indel variation in the three samples was annotated. Indel variation was mainly concentrated in intergenic regions, accounting for 48.02%~48.30% of the total, followed by the 1 kb upstream and downstream regions, accounting for 37.95%~38.71%. Frameshift mutations in the CDS region accounted for 0.24%~0.27%. The distribution of Indel lengths in the coding regions of the three samples was also analyzed. Across the whole genome, the longer the Indel, the fewer the mutations; single-base Indels were the most common, accounting for 42.31%~49.87% of the total (Figure 2).
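The length distribution and the frameshift classification both follow from the reference and alternative alleles of each Indel. A minimal sketch, assuming VCF-style alleles and toy data, is:

```python
from collections import Counter


def indel_length(ref, alt):
    """Signed length: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


def is_frameshift(ref, alt):
    """Within a CDS, an Indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length(ref, alt) % 3 != 0


# Toy Indels as (ref, alt) alleles; not data from the study.
toy_indels = [("A", "AT"), ("GTC", "G"), ("C", "CAAA"), ("TA", "T"), ("G", "GC")]

lengths = Counter(abs(indel_length(r, a)) for r, a in toy_indels)
single_base_share = 100.0 * lengths[1] / len(toy_indels)
frameshifts = [is_frameshift(r, a) for r, a in toy_indels]
print(lengths, single_base_share, frameshifts)
```

The frameshift test is only meaningful for Indels located inside coding sequence; annotation software decides that from the gene model.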
Figure 2 CDS Indel length distribution
1.5 Genetic analysis of variation at DNA level of 3 samples
Mutations in the CDS region may change gene function. By comparison with the reference genome, 33 473 mutated genes carrying non-synonymous, synonymous or frameshift mutations were detected in the genomes of the three samples, and these genes were compared by BLAST against the GO and KEGG databases. A total of 14 510 genes were annotated in the GO database and divided into three categories: biological process, cellular component and molecular function. In the biological process category, metabolic process was the largest subcategory, accounting for 46.39%. Among cellular components, membrane was the most abundant, accounting for 28.23%. In the molecular function category, binding and catalytic activity were the largest subcategories, accounting for 50.64% and 38.91%, respectively (Figure 3).
Figure 3 GO annotation classification of mutant genes Note: (1): Metabolic process; (2): Cellular process; (3): Localization; (4): Biological regulation; (5): Response to stimulus; (6): Cellular component organization or biogenesis; (7): Signaling; (8): Replication; (9): Developmental process; (10): Growth; (11): Multicellular organismal process; (12): Multi-organism process; (13): Biological adhesion; (14): Reproductive process; (15): Membrane part; (16): Cell component; (17): Complex protein; (18): Cell; (19): Organelle; (20): Cell membrane; (21): Extracellular region; (22): Organelle; (23): Membrane-enclosed lumen; (24): Binding; (25): Catalytic activity; (26): Transport activity; (27): Transcriptional regulatory factor activity; (28): Structural molecule activity; (29): Molecular regulatory function activity; (30): Nutrient receptor activity; (31): Antioxidant activity; (32): Activity of active molecular sensor; (33): Activity of molecular carrier
To further understand the biological functions of the mutated genes, the KEGG database was used for gene function annotation. A total of 6 pathways were significantly enriched (p<0.05, Figure 4): zeatin biosynthesis, phenylalanine metabolism, stilbenoid, diarylheptanoid and gingerol biosynthesis, RNA polymerase, photosynthesis, and pyrimidine metabolism.
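The text does not state which statistical test produced the p-values. A hypergeometric test is a common choice for pathway enrichment, and the sketch below shows it under that assumption; the gene counts are hypothetical and chosen only to illustrate the calculation.

```python
from math import comb


def hypergeom_enrichment_p(background, in_pathway, mutated, mutated_in_pathway):
    """P(X >= mutated_in_pathway) for X ~ Hypergeometric(background, in_pathway, mutated).

    background         : number of annotated genes overall
    in_pathway         : annotated genes belonging to the pathway
    mutated            : mutated genes that are annotated
    mutated_in_pathway : mutated genes that belong to the pathway
    """
    upper = min(in_pathway, mutated)
    total = comb(background, mutated)
    tail = sum(
        comb(in_pathway, i) * comb(background - in_pathway, mutated - i)
        for i in range(mutated_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 mutated genes of which 3 fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value, p_value < 0.05)
```

With many pathways tested at once, a multiple-testing correction is normally applied as well; whether one was used here is not stated.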
Figure 4 Top 30 differential pathways in KEGG annotation Note: (1): Oxidative phosphorylation; (2): Photosynthesis; (3): Purine metabolism; (4): Pyrimidine metabolism; (5): Tyrosine metabolism; (6): Phenylalanine metabolism; (7): Neomycin biosynthesis; (8): Starch and sucrose metabolism; (9): N-glycan biosynthesis; (10): Amino sugar and nucleotide sugar metabolism; (11): Linolenic acid metabolism; (12): Sphingolipid metabolism; (13): Glycophospholipid biosynthesis; (14): Monoterpenoid biosynthesis; (15): Carotenoid biosynthesis; (16): Zeatin biosynthesis; (17): Phenylpropanoid biosynthesis; (18): Flavonoid biosynthesis; (19): Stilbenoid, diarylheptanoid and gingerol biosynthesis; (20): Isoquinoline alkaloid biosynthesis; (21): Tropane, piperidine and pyridine alkaloid biosynthesis; (22): Metabolism of xenobiotics by cytochrome P450; (23): Drug metabolism - cytochrome P450; (24): ABC transporters; (25): RNA polymerase; (26): Mismatch repair; (27): Homologous recombination; (28): Protein digestion and absorption; (29): Chemical carcinogenesis; (30): MicroRNAs in cancer
The non-synonymous, synonymous and frameshift mutations of the three tomato lines, which differ in the capping node of the main stem inflorescence, were analyzed. All three mutation types were found in genes of the zeatin biosynthesis pathway. Candidate genes were therefore mined, and 16 genes related to zeatin biosynthesis were screened out (Table 7); these may be a key cause of the differences in the capping node of tomato inflorescences.
Table 7 Variation genes related to zeatin biosynthesis
2 Discussion
In 2012, publication of the whole-genome sequence of tomato gave the species its first reference genome, promoted research on tomato evolution and the utilization of wild germplasm resources, and accelerated tomato genetic improvement and breeding (Lin et al., 2014, 36(12): 1275-1276). With the rapid development of high-throughput sequencing, which is low in cost, fast and high in throughput, resequencing has become an effective means of detecting genetic variation across the whole genome (Chakravorty and Hegde, 2018). The variants that can be detected include SNPs and Indels, and more complex variants such as SVs and CNVs can also be detected with deeper sequencing.
In this study, we selected representative tomato lines with high and low capping nodes and used resequencing to detect genetic variation between lines that differ in the capping node of the main stem. The total number of SNPs per sample ranged from about 930 000 to 2 540 000, and the total number of Indels from about 90 000 to 198 000. The results showed abundant genome-wide genetic variation among tomato plants with different capping nodes of the main stem inflorescence. The number of SNPs was much higher than the number of Indels, indicating that the genomic changes were mainly caused by single nucleotide variation, which is consistent with results in rape (Hu et al., 2018), tobacco (Zhou et al., 2019) and rice (Zhang et al., 2014).
In this study, genes carrying non-synonymous, synonymous and frameshift mutations in the genomes of the 3 samples were compared against the GO and KEGG databases, and all three mutation types were found in the zeatin biosynthesis pathway. Genome sequence alignment analysis screened out 16 genes related to zeatin biosynthesis. During plant growth and development, endogenous hormones, which are trace organic substances produced by plant metabolism, play an important role in flower bud differentiation (Davis, 2009). Studies have shown that cytokinins promote cell division in plants and are an important class of plant hormones necessary for flower bud differentiation (Duan et al., 2015). Zeatin is a naturally occurring cytokinin in plants and is likely to be involved in flower bud differentiation. When important regulatory genes are mutated, their function may change, which may in turn change the node position of the capping inflorescence on the tomato main stem. In this study, whole-genome resequencing was used to detect and analyze mutations in three samples, the mutation loci were analyzed, and key mutated genes were identified. This lays a good foundation for further gene mapping, gene cloning and functional verification, and for the use of valuable mutation resources in tomato breeding.
3 Materials and Methods
3.1 Test materials
In this experiment, GXF, AXF and 815, bred by the Tomato Research Group of Hebei Normal University of Science and Technology, were used as experimental materials. Over the whole growing period, GXF formed 14 inflorescences on the main stem; near the growing point the internodes shortened, no new leaves formed, and flower clusters formed around the growing point, capping the plant. AXF and 815 capped after forming two to four inflorescences.
3.2 DNA extraction and quality inspection
GXF, AXF and 815 were planted in the solar greenhouse of the Experimental Station of Hebei Normal University of Science and Technology and managed routinely. After the main stem inflorescence of each plant was completely capped, inflorescences and leaves were sampled from each of the 3 lines, DNA was extracted using the E.Z.N.A. Plant DNA Kit, and its concentration and integrity were checked.
3.3 Library construction and sequencing
Qualified DNA samples were randomly fragmented to a length of 350 bp, and libraries were built with the TruSeq Library Construction Kit. After construction, each library was diluted to 1 ng/μL and checked. Once the results met expectations, the effective concentration of the library was accurately quantified by Q-PCR to ensure library quality. Finally, sequencing was performed on the HiSeq 2000 platform.
3.4 Data statistics
After high-throughput sequencing of the 3 samples, the clean sequencing data were aligned to the reference genome with BWA, and SNPs and Indels were annotated with SnpEff to identify genes that were mutated in the 3 samples relative to the reference genome. The mutated genes were compared against the GO and KEGG databases by BLAST for annotation analysis.
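The workflow described above can be sketched as a sequence of command lines. The exact commands and parameters used in the study are not given, so everything below is an assumption: the file names, the SnpEff database name and the thread count are placeholders, and variant calling is shown with bcftools, which is swapped in here because current SAMtools releases delegate calling to it. The sketch only builds the commands; it does not run them.

```python
def build_pipeline(reference, fq1, fq2, sample, snpeff_db, threads=4):
    """Return a list of (argv, stdout_path) steps for align -> call -> annotate.

    stdout_path is None unless the tool writes its result to standard output.
    All file names are derived from `sample` and are placeholders.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    bcf = f"{sample}.bcf"
    vcf = f"{sample}.vcf"
    ann = f"{sample}.ann.vcf"
    return [
        (["bwa", "index", reference], None),                               # index the reference
        (["bwa", "mem", "-t", str(threads), reference, fq1, fq2], sam),    # align paired reads
        (["samtools", "sort", "-o", bam, sam], None),                      # coordinate-sort
        (["samtools", "index", bam], None),                                # index the BAM
        (["bcftools", "mpileup", "-f", reference, "-Ou", "-o", bcf, bam], None),  # pileup
        (["bcftools", "call", "-m", "-v", "-Ov", "-o", vcf, bcf], None),   # call SNPs and Indels
        (["java", "-jar", "snpEff.jar", snpeff_db, vcf], ann),             # annotate variants
    ]


# Placeholder inputs; "TOMATO_DB_PLACEHOLDER" is not a real SnpEff database name.
steps = build_pipeline("reference.fa", "GXF_1.fq.gz", "GXF_2.fq.gz",
                       "GXF", "TOMATO_DB_PLACEHOLDER")
for argv, stdout_path in steps:
    line = " ".join(argv)
    print(line if stdout_path is None else f"{line} > {stdout_path}")
```

Each step could be executed with `subprocess.run`, redirecting standard output to the given path where one is listed; filtering thresholds and other parameters would need to be chosen for the data at hand.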
Authors’ contributions
DHD, YX and LYF designed and carried out the experiments, completed the data analysis and wrote the first draft of the paper; ZN and MXJ participated in experimental design and analysis of the experimental results; WS conceived and led the project and directed the experimental design, data analysis, and the writing and revision of the paper. All authors read and approved the final manuscript.
Acknowledgments
This study was supported by the Natural Science Foundation of Hebei Province (C2019407077), the Key Research and Development Program of Hebei Province (20326325D), and the Doctoral Fund of Hebei Normal University of Science and Technology (2018YB021).
References
Boden S.A., Cavanagh C., Cullis B.R., Ramm K., Greenwood J., Finnegan E.J., Trevaskis B., and Swain S.M., 2015, Ppd-1 is a key regulator of inflorescence architecture and paired spikelet development in wheat, Nature Plants, 1(2): 14016
https://doi.org/10.1038/nplants.2014.16
PMid:27246757
Chakravorty S., and Hegde M., 2018, Inferring the effect of genomic variation in the new era of genomics, Human Mutation, 39(9): 756-773
https://doi.org/10.1002/humu.23427
PMid:29633501
Davis S.J., 2009, Integrating hormones into the floral-transition pathway of Arabidopsis thaliana, Plant Cell Environ., 32(9): 1201-1210
https://doi.org/10.1111/j.1365-3040.2009.01968.x
PMid:19302104
Doebley J., Stec A., and Hubbard L., 1997, The evolution of apical dominance in maize, Nature, 386(6624): 485-488
https://doi.org/10.1038/386485a0
PMid:9087405
Duan N., Jia Y.K., Xu J., Chen H.L., and Sun P., 2015, Research progress on plant endogenous hormones, Zhongguo Nongxue Tongbao (Chinese Agricultural Science Bulletin), 31(2): 159-165
Hu M., Yao S.L., Cheng X.H., Liu Y.Y., Ma L.X., Xiang Y., Huang J.Y., Tong C.B., and Liu S.Y., 2018, Genomic variation of spring, semi-winter and winter Brassica napus by high-depth DNA re-sequencing, Zhongguo Youliao Zuowu Xuebao (Chinese Journal of Oil Crop Sciences), 40(4): 469-478
Huang X.Z., Qian Q., Liu Z.B., Sun H.Y., He S.Y., Luo D., Xia G.M., Chu C.C., Li J.Y., and Fu X.D., 2009, Natural variation at the DEP1 locus enhances grain yield in rice, Nature Genetics, 41(4): 494-497
https://doi.org/10.1038/ng.352
PMid:19305410
Mao X.J., Wang S., Wang Y., Cao X., Sun Z.F., and Wu C.C., 2018, Genetic difference of plant type characteristics and ssr markers associated with pruning inflorescence number of two self-pruning cultivar tomatoes, Beifang Yuanyi (Northern Horticulture), (21): 13-16
Meyer R.S., and Purugganan M.D., 2013, Evolution of crop species: genetics of domestication and diversification, Nat. Rev. Genet., 14(12): 840-852
https://doi.org/10.1038/nrg3605
PMid:24240513
Molinero-Rosales N., Latorre A., Jamilena M., and Lozano R., 2004, SINGLE FLOWER TRUSS regulates the transition and maintenance of flowering in tomato, Planta, 218(3): 427-434
https://doi.org/10.1007/s00425-003-1109-1
PMid:14504922
Silva Ferreira D., Kevei Z., Kurowski T., de Noronha Fonseca M.E., Mohareb F., Boiteux L.S., and Thompson A.J., 2018, BIFURCATE FLOWER TRUSS: a novel locus controlling inflorescence branching in tomato contains a defective MAP kinase gene, J. Exp. Bot., 69(10): 2581-2593
https://doi.org/10.1093/jxb/ery076
PMid:29509915 PMCid:PMC5920302
Yin M.H., Wang Q., Zhang H.L., Cai X.H., Xu C.Q., Chen F.L., Liu S.Y., Zhang Q.W., Cai H., and Chen R.H., 2020, Whole genome re-sequencing analysis of alpine potato and local farm potato in Huaiyu Mountain under high altitude habitats, Yijinzuxue Yu Yingyong Shengwuxue (Genomics and Applied Biology), 39(3): 1198-1207
Zhang S.J., 2014, Analysis of genetic variation associated with three agronomic traits in ‘Su Shuai’ apple, Thesis for M.S., Nanjing Agricultural University, Supervisor: Qu S.C., pp.35-45
Zhang S.S., Kang H.M., and Yang W.Z., 2019, Population genetic analysis of Nyssa yunn-anensis by reduced-representation sequencing technique, Zhiwu Yanjiu (Bulletin of Botanical Research), 39(6): 899-907
Zhang Z.Y., Pu Z.G., Wang P., Xiang Y.W., Cai P.Z., and Zhang Z.X., 2014, Sequencing research on the whole genome of rice mutant induced by space, Xinan Nongye Xuebao (Southwest China Journal of Agricultural Sciences), 27(2): 469-475
Zhou S.Q., Liu D.Y., Pan X.H., Qu J.K., Cheng L.R., Ren M., Chao J.T., Zhang Y., and Luo C.G., 2019, Sequence Modification analysis of tobacco mutant derived from space mutagenesis, Zhiwu Yichuan Ziyuan Xuebao (Journal of Plant Genetic Resources), 20(2): 377-386