2 Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou 350013
Plant Gene and Trait, 2020, Vol. 11, No. 10
Received: 20 Sep., 2020 Accepted: 23 Sep., 2020 Published: 30 Sep., 2020
A set of highly specific and stable SNP loci was developed using SLAF-seq, providing a theoretical basis for molecular-assisted breeding, genetic map construction and studies of species evolution in loquat. In the present study, 294 loquat accessions were collected, and their genomic DNA was sequenced and analyzed by SLAF-seq. The pear reference genome was used for in silico prediction of enzyme digestion. Genomic DNA was double-digested with HaeIII and Hpy166II, and SLAF-seq libraries were then constructed. In total, 526.63 M reads were generated for the 294 accessions; the number of reads per library ranged from 119,893 to 9,305,152, with an average of 1,787,581. The sequencing quality value Q30 ranged from 88.26% to 95.67%, with an average of 93.61%. GC content ranged from 38.79% to 42.92%, with an average of 40.35%. Rice, sequenced to 0.16 M reads, served as the control; its paired-end mapping efficiency was 91.28%, indicating that SLAF library construction was normal. Bioinformatics analysis yielded 623,356 SLAF tags, of which 123,498 were polymorphic, a polymorphism rate of 19.81%. A total of 1,604,434 population SNPs were initially called from the polymorphic SLAF tags, and 95,960 SNPs remained for further analysis after filtering at MAF > 0.05 and completeness > 0.8.
Loquat (Eriobotrya japonica Lindl.) originated in China, where it has a long history of cultivation. Long-term selection and domestication have produced a wide variety of loquat genetic resources. To date, 26 species, varieties or forms of the genus Eriobotrya have been described, more than 20 of which occur in China. These include not only common loquat, which has high cultivation value, abundant varieties and a high degree of evolution, but also wild loquat resources that retain primitive characters, such as Eriobotrya malipoensis Kuan, Eriobotrya prinoides Rehd. et Wils., Eriobotrya obovata W.W. Smith and Eriobotrya bengalensis (Roxb.) Hook. f. (Lin, 2017).
Loquat is widely distributed in Henan, Anhui, Jiangsu, Hubei, Hunan, Jiangxi, Zhejiang, Fujian, Guangxi, Yunnan, Guizhou, Sichuan, Taiwan, Hainan and Guangdong, and also occurs in Vietnam, Myanmar, Thailand and Indonesia. Common loquat arose through long-term cultivation and breeding in China and comprises many varieties. Outside China, it is also cultivated in Japan, the United States, South Africa, New Zealand, Spain and elsewhere. Loquat was first introduced into Japan from southern China during the Tang and Song Dynasties and then spread to other countries from Japan (Lin, 2019).
Chinese loquat has distinct agronomic and varietal characteristics and is a valuable genetic resource for studies of origin and evolution, resource evaluation, variety selection and production. RAPD (Fukuda et al., 2016), SSR (Wu et al., 2015), genic SSR (Sun et al., 2018), genic SNP (Li et al., 2015), RAD-SNP (Yang et al., 2017) and other markers have been applied to loquat germplasm evaluation, genetic diversity analysis, genetic map construction, variety identification and domestication studies. However, agronomic traits changed markedly during the evolution from wild loquat to common loquat, and most of these economically important traits are quantitative. Only with a large number of molecular markers can favorable alleles be identified and linked closely to important agronomic traits. Such markers therefore provide important theoretical guidance for analyzing the evolutionary relationships of loquat and for improving important traits through molecular-assisted selection breeding.
In recent years, high-density, accurate markers derived from high-throughput sequencing have provided a new strategy for studying plant genetic traits. SLAF-seq (specific-locus amplified fragment sequencing) has been widely used for SNP marker development and the study of evolutionary relationships in many crops, including sweet potato (Su et al., 2016), cassava (Yu et al., 2018), Ammopiptanthus mongolicus (Duan et al., 2018), raspberry (Yang et al., 2018), pitaya (Yu et al., 2017), grape (Li et al., 2019), Camellia nitidissima (Liu et al., 2019), ancient tea tree (Geng et al., 2019), Zicaitai (Brassica rapa) (Li et al., 2019), Forsythia suspensa (Jiang et al., 2020) and Perilla (Jiang et al., 2020), with remarkable results. However, there are few reports on the development of SNP markers in loquat and its related species by high-throughput sequencing. In this study, we developed SNP markers for loquat and its related species using SLAF-seq and obtained markers with a high coverage rate, providing a theoretical basis for understanding changes in population structure, genetic evolution and genome-wide association analysis (GWAS) of important botanical traits in cultivated loquat.
1 Results and Analysis
1.1 Database Construction Evaluation
Based on in silico digestion prediction, the enzyme combination HaeIII + Hpy166II was selected for this experiment, and 116,171 SLAF tags were predicted for the defined SLAF tag length (Table 1). Using the rice sequencing data as the control (0.16 M reads), paired-end alignment was performed with SOAP (Li et al., 2009). The paired-end mapping efficiency was 91.28% (Table 2), which is within the normal range. Digestion efficiency was assessed from the restriction sites remaining within the inserts of the reads: the digestion efficiency was 91.82% and the proportion of partial digestion was 8.18% (Table 2), indicating that SLAF library construction was normal. In addition, Figure 1 shows that the insert lengths of the control fall within the expected range, indicating that the sequencing method used in this study was accurate and that sequencing quality was normal.
Figure 1 Distribution of insert fragment lengths of control reads Note: The abscissa is the length of the insert fragment, and the ordinate is the percentage of reads of the corresponding length
Table 1 Statistical results of enzyme digestion prediction
Table 2 Alignment results between the reads obtained from the control and its genome sequence
1.2 Quality assessment of sequencing data
In this study, paired-end reads of 125 bp × 2 were used for data evaluation and analysis to ensure analysis quality. Sequencing on the Illumina HiSeq 2500 platform produced a total of 526.63 M reads; the number of reads per sample ranged from 119,893 to 9,305,152, with an average of 1,787,581.
1.2.1 Sequencing quality value distribution check
In this study, Q30, the percentage of bases with a sequencing quality value of at least 30, was used to assess sequencing quality. In Figure 2, the first 125 bp and the last 125 bp show the distribution of quality values for the first and second ends of the paired-end reads, respectively; the darker the color at a given position, the higher the proportion of bases with that quality value. Q30 ranged from 88.26% to 95.67%, with an average of 93.61%. A Q30 above 88% indicates a low base error rate, so the sequencing data obtained were qualified.
Figure 2 Distribution of sequencing quality values of a representative sample Note: The abscissa is the position in the reads, and the ordinate is the quality score of the single base
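The Q30 statistic described above can be illustrated with a minimal sketch. It assumes FASTQ quality strings in the Phred+33 encoding, which the article does not state, and the function name `q30_fraction` is hypothetical; a quality value of 30 corresponds to a base-calling error probability of 0.1%.

```python
def q30_fraction(quality_strings, phred_offset=33):
    """Return the fraction of bases whose Phred quality is >= 30.

    quality_strings: iterable of FASTQ quality lines (one per read).
    phred_offset: ASCII offset of the encoding (assumed Phred+33 here).
    """
    total = 0
    high = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - phred_offset >= 30:
                high += 1
    return high / total if total else 0.0


# Toy input: '?' encodes Q30, 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
example = ["??II", "55II"]
print(f"Q30 = {q30_fraction(example):.2%}")  # 6 of 8 bases reach Q30
```

Applied per sample, a value of this kind is what the 88.26% to 95.67% range reported above summarizes.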
1.2.2 Base distribution check
The base distribution of SLAF-seq reads is affected by PCR amplification and by the restriction sites. The first two bases of each read show base separation consistent with the restriction site, and the subsequent bases fluctuate to varying degrees (Yang et al., 2018). Inspecting the base distribution therefore reveals any separation of G/C and A/T and allows sequencing quality to be judged. The distribution of sequenced bases is shown in Figure 3, where the first 125 bp and the last 125 bp are the base distributions of the first and second ends of the paired-end reads; the individual G and C contents fluctuate between 15% and 20%. GC content analysis showed that G and C bases accounted for 38.79%~42.92% of total bases across samples, with an average of 40.35% (Figure 4). The GC content was generally not high, indicating that the sequencing requirements were met.
Figure 3 Base content distribution of the paired-end 125 bp reads Note: The abscissa is the position in the reads, and the ordinate is the quality score of the single base
Figure 4 Distribution of Q30 sequencing quality values and GC content Note: The abscissa is the position in the reads, and the ordinate is the quality score of the single base
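As an illustration of the GC content check, the following sketch computes the share of G and C among unambiguous bases in a set of reads. Excluding N calls from the denominator is an assumption made here, not a detail given in the article, and `gc_content` is a hypothetical helper name.

```python
def gc_content(reads):
    """Return the fraction of G and C among A/C/G/T bases in the reads.

    Ambiguous bases such as N are ignored (an assumption of this sketch).
    """
    gc = 0
    total = 0
    for seq in reads:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    return gc / total if total else 0.0


# Toy reads: 6 G/C bases out of 14 unambiguous bases.
reads = ["GGCCAATT", "ATATGCNN"]
print(f"GC content = {gc_content(reads):.2%}")
```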
1.3 SLAF tag and SNP site statistics
In this study, the number of SLAF tags per sample ranged from 78,764 to 163,817, with an average of 103,248. A total of 623,356 SLAF tags were obtained, of which 123,498 were polymorphic, giving a polymorphism rate of 19.81%. The total sequencing depth of SLAF tags per sample ranged from 650,577 to 5,594,798, with an average of 1,285,202; the average sequencing depth per sample ranged from 6.19× to 38.58×, with a mean of 12.27× (Table 3).
Table 3 Statistics of the SLAF tag number, total depth and average depth obtained by sequencing of the samples
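The polymorphism rate follows directly from the two tag counts reported above. The short sketch below reproduces that arithmetic; the helper `average_depth` assumes that a sample's average depth is its total depth divided by its number of SLAF tags, which is an interpretation rather than a definition given in the article, and its example inputs are hypothetical.

```python
def polymorphism_rate(polymorphic_tags, total_tags):
    """Share of SLAF tags that are polymorphic."""
    return polymorphic_tags / total_tags


def average_depth(total_depth, n_tags):
    """Per-sample average depth, assumed here to be total depth / tag count."""
    return total_depth / n_tags


# Tag counts reported in the text.
rate = polymorphism_rate(123_498, 623_356)
print(f"Polymorphism rate: {rate:.2%}")  # Polymorphism rate: 19.81%

# Hypothetical sample, for illustration only.
print(f"Average depth: {average_depth(1_200_000, 100_000):.2f}x")
```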
In this study, GATK (McKenna et al., 2010) and SAMtools (Li et al., 2009) were used to develop SNPs by aligning the sequenced reads against the highest-depth sequence of each SLAF tag as the reference, so as to obtain a reliable SNP marker dataset. SNP markers were developed from the 123,498 polymorphic SLAF tags. After filtering by completeness (> 0.8) and minor allele frequency (MAF > 0.05), a total of 95,960 SNPs were retained. The completeness of SNPs detected per sample ranged from 18.5% to 98.53%, with an average of 90.75%; the SNP heterozygosity rate per sample ranged from 4.51% to 35.98%, with an average of 19.4%. Detailed SNP statistics are shown in Table 4.
Table 4 Statistics of SNP number, completeness and heterozygosity obtained by sequencing of different individuals
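The filtering step can be sketched as follows. This is not the authors' pipeline: it assumes biallelic SNPs in a diploid, with genotypes coded as 0, 1 or 2 copies of the alternate allele and `None` for a missing call, and all function names are hypothetical. Only the two thresholds (completeness > 0.8, MAF > 0.05) come from the text.

```python
def snp_stats(genotypes):
    """Return (completeness, MAF) for one SNP across all samples."""
    called = [g for g in genotypes if g is not None]
    completeness = len(called) / len(genotypes)
    if not called:
        return completeness, 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid call
    return completeness, min(alt_freq, 1 - alt_freq)


def filter_snps(matrix, min_completeness=0.8, min_maf=0.05):
    """Return indices of SNP rows passing both thresholds (strictly greater)."""
    kept = []
    for idx, row in enumerate(matrix):
        completeness, maf = snp_stats(row)
        if completeness > min_completeness and maf > min_maf:
            kept.append(idx)
    return kept


def sample_heterozygosity(matrix, sample_idx):
    """Fraction of a sample's called genotypes that are heterozygous."""
    calls = [row[sample_idx] for row in matrix if row[sample_idx] is not None]
    return sum(g == 1 for g in calls) / len(calls) if calls else 0.0


# Toy matrix: rows are SNPs, columns are 10 samples.
matrix = [
    [0, 1, 2, 0, 1, 0, 0, 1, 0, 0],           # complete, MAF 0.25: kept
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic: dropped
    [0, 1, None, None, None, 1, 0, 2, 0, 0],  # completeness 0.7: dropped
    [2, 2, 2, 2, 2, 2, 2, 2, 1, None],        # completeness 0.9, MAF 1/18: kept
]
print(filter_snps(matrix))
print(sample_heterozygosity(matrix, 1))
```

Per-sample completeness and heterozygosity, as summarized in Table 4, are column-wise statistics over a matrix of this kind.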
2 Discussion and Conclusion
Loquat has been cultivated in China for 3000 years, and its germplasm resources are abundant. Rich germplasm resources are decisive for increasing yield and improving the quality and resistance of loquat. Compared with conventional RAPD, SSR and genic SSR markers, SNP markers offer high efficiency, short turnaround time and high genome coverage. Because of these advantages, SNPs have become an important tool for studies of plant origin and evolution, resource evaluation and marker-assisted breeding.
In recent years, with the rapid development of high-throughput sequencing, reduced-representation genome sequencing technologies such as SLAF-seq have effectively overcome the problems of genome complexity and the lack of sequence and marker information, and are well suited to large-scale SNP marker development and analysis (Li et al., 2018; Yu et al., 2018; Li et al., 2019). For species without a reference genome, SLAF-seq was applied early and achieved the desired results, for example in the development of SNP markers on chromosome 14 of cotton (Chen et al., 2014), specific molecular markers for Elytrigia elongata (Chen et al., 2013) and SNP loci in sweet potato (Su et al., 2016). In the absence of a reference genome, all of these markers were developed from SLAF tags, indicating that this technology has become a focus of research and application in fields such as plant molecular-assisted breeding.
In this study, a total of 526.63 M reads were obtained from 294 loquat accessions by high-throughput sequencing for the development of loquat-specific molecular markers. The average number of reads per sample was 1,787,581, and the average Q30 and GC content were 93.61% and 40.35%, respectively. The paired-end mapping efficiency of the control was 91.28%. A total of 623,356 SLAF tags were obtained, of which 123,498 were polymorphic, a polymorphism rate of 19.81%. A total of 1,604,434 SNP markers were developed from the polymorphic SLAF tags, and 95,960 highly consistent SNPs were retained after filtering by completeness and MAF. The average sequencing depth was relatively high and a large number of SNPs were obtained. Combined with leaf traits (Shan et al., 2017), fruit traits (Deng et al., 2009) and other quantitative traits (Chen et al., 2011) of loquat, these SNPs provide important genetic information and markers for the evaluation and utilization of loquat resources, including parent selection, heterosis utilization, association mapping and evolutionary analysis.
3 Materials and Methods
3.1 Plant materials
In this study, leaf samples of 294 loquat accessions were collected from the Loquat National Germplasm Resources Nursery of Fujian Academy of Agricultural Sciences and the Yangdu innovation base of Zhejiang Academy of Agricultural Sciences. The samples were frozen in liquid nitrogen and stored in a -70 °C freezer. The accessions came from different regions: 95 from Fujian, 44 from Yunnan, 37 from Zhejiang, 36 from Guizhou, 15 from Jiangsu, 13 from Japan, 12 from Sichuan, 9 from Guangdong, 6 from Anhui, 6 from Guangxi, 5 from Spain, 4 from Hubei, 3 from the United States, 3 from New Zealand, 2 from Hainan, and 1 each from Hunan, Jiangxi, Shanghai and South Africa. The materials were divided into cultivated or local accessions, wild or semi-wild accessions, and related species; 244 were cultivated, 31 were wild or semi-wild, and 19 were related species (Table 5).
Table 5 Regional distribution and category of the 294 collected loquat accessions
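As a quick consistency check on the sampling description, the regional and category counts can be tallied against the stated total of 294. The sketch assumes that the garbled entry in the regional list refers to 6 accessions from Guangxi, which is the only reading under which the counts add up.

```python
# Accessions per region, as listed in the text (Guangxi = 6 is an inferred reading).
origin_counts = {
    "Fujian": 95, "Yunnan": 44, "Zhejiang": 37, "Guizhou": 36, "Jiangsu": 15,
    "Japan": 13, "Sichuan": 12, "Guangdong": 9, "Anhui": 6, "Guangxi": 6,
    "Spain": 5, "Hubei": 4, "United States": 3, "New Zealand": 3, "Hainan": 2,
    "Hunan": 1, "Jiangxi": 1, "Shanghai": 1, "South Africa": 1,
}

# Accessions per category, as listed in the text.
category_counts = {"cultivated": 244, "wild or semi-wild": 31, "related species": 19}

print(sum(origin_counts.values()))    # 294
print(sum(category_counts.values()))  # 294
```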
3.2 DNA extraction
Genomic DNA was extracted from loquat leaves by the CTAB method and checked by 1% agarose gel electrophoresis. DNA concentration and purity were then measured with a Bio-Photometer nucleic acid detector (Eppendorf) to ensure that the genomic DNA met the requirements for library construction (OD260/OD280 between 1.8 and 2.0). Samples were diluted to the standard working amount according to the following rules: when the concentration X of the stock solution was ≥ 120 ng/μL, the diluted volume was 15~120 μL; when 20 ng/μL < X ≤ 120 ng/μL, the stock solution was used directly, and the volume taken did not exceed half of the stock volume. After dilution, 2 μL was taken for NanoDrop or LabChip DS testing, and the samples were returned to their original boxes and stored at -20 °C until use.
3.3 Determination and sequencing of enzyme digestion scheme
Since loquat genome data had not yet been published, the genome of pear (Pyrus spp.) was selected as the reference for in silico digestion prediction. The genome of Eriobotrya japonica is about 654 Mb, with a GC content of 40.35%; the pear reference genome is 508 Mb, with a GC content of 37.28% (download address: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/315/295/GCA_000315295.1_Pbr_v1.0/). The most suitable digestion scheme was determined from the predicted restriction sites, taking into account the proportion and distribution of digested fragments, compatibility with the specific experimental system, the number of tags and other factors (Davey et al., 2013). The DNA was then digested, followed by addition of A to the 3′ end, adapter ligation, PCR amplification, purification, sample pooling, gel excision and library construction. After passing quality inspection, the libraries were sent to Beijing Baimaike Biotechnology Co., Ltd. for paired-end sequencing. Oryza sativa indica (http://rapdb.dna.affrc.go.jp/) was used as the control to evaluate the accuracy and effectiveness of the enzyme digestion experiment.
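The idea behind in silico digestion prediction can be sketched in a few lines: scan a reference sequence for the recognition sites of both enzymes, cut at each site, and count the fragments that fall within a target length window. The sketch assumes the commonly listed blunt-end sites GG^CC for HaeIII and GTN^NAC for Hpy166II; the length window and all function names are hypothetical, since the article does not state the SLAF tag length range, and the actual prediction software is not named.

```python
import re

# Recognition pattern and cut offset within the site for each enzyme.
# Lookaheads are used so that overlapping sites are all reported.
ENZYMES = {
    "HaeIII": (re.compile(r"(?=GGCC)"), 2),              # GG^CC
    "Hpy166II": (re.compile(r"(?=GT[ACGT]{2}AC)"), 3),   # GTN^NAC
}


def cut_positions(seq):
    """Return the sorted cut coordinates of both enzymes on the top strand.

    Both sites are palindromic, so scanning one strand is sufficient.
    """
    seq = seq.upper()
    cuts = set()
    for pattern, offset in ENZYMES.values():
        for match in pattern.finditer(seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq):
    """Split the sequence at every cut position and return the fragments."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def count_tags(fragments, min_len, max_len):
    """Count fragments whose length lies in the (hypothetical) target window."""
    return sum(min_len <= len(f) <= max_len for f in fragments)


# Toy sequence with one HaeIII site and one Hpy166II site.
toy = "AAAAGGCCTTTTTTGTACACAAAA"
fragments = digest(toy)
print(fragments)
print(count_tags(fragments, 7, 11))
```

Run over a whole reference genome with a realistic length window, a count of this kind corresponds to the predicted number of SLAF tags used to compare candidate enzyme combinations.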
3.4 Data statistics and quality evaluation
After sequencing, adapters and contaminating sequences were removed from the raw reads and low-quality reads were filtered out to obtain clean reads. Sequencing quality, base distribution and data output were then evaluated, and the control data were used to assess enzyme digestion efficiency, fragment selection and sequencing quality.
3.5 SLAF tag and SNP tag development
Based on sequence differences in polymorphic SLAF tags among accessions, genome-wide SNP markers were developed across the whole loquat population by bioinformatics analysis, and representative, highly consistent SNPs from the natural loquat population were used for population genetic polymorphism analysis (Duan et al., 2018).
Authors’ contributions
Li Xiaoying and Zheng Shaoquan were responsible for the design and implementation of the experiment; Li Xiaoying completed the data analysis and wrote the paper; Xu Hongxia, Hu Wenshun, Deng Chaojun and Chen Xiuping participated in the selection of experimental materials; Chen Junwei guided the experimental design and the writing and revision of the paper. All authors read and approved the final manuscript.
Acknowledgments
This research was jointly funded by the National Youth Fund Project (31601734), the major science and technology project of new agricultural varieties breeding in Zhejiang Province (2016C02052-3), the Taizhou Institute local cooperation project (TZ2018006) and the key R&D program of Zhejiang Province (2018C02011).
Chen S.Q., Qin S.W., Huang Z.F., Dai Y., Zhang L.L., Gao Y.Y., Gao Y., Chen J.M., 2013, Development of specific molecular markers for Thinopyrum elongatum chromosome using SLAF-seq technique, Acta Agronomica Sinica, 39(4): 727-734
https://doi.org/10.3724/SP.J.1006.2013.00727
Chen W., Yao J. B., Chu L., Li Y., Guo X. M., Zhang Y. S., 2014, The development of specific SNP markers for chromosome 14 in cotton using next-generation sequencing, Plant Breeding, 133(2): 256-261
https://doi.org/10.1111/pbr.12144
Chen X.P., Huang A.P., Jiang J.M., Zheng S.Q., Deng C.J., Wei X.Q., Hu W.S., Jiang F., 2011, Numerical classification of the loquat germplasm, Acta Horticulturae Sinica, 38(4): 644-656
Davey J.W., Cezard T., Fuentes-Utrilla P., Eland C., Gharbi K., Blaxter M.L., 2013, Special features of RAD Sequencing data: implications for genotyping, Molecular Ecology, 22(11): 3151-3164
https://doi.org/10.1111/mec.12084
PMid:23110438 PMCid:PMC3712469
Duan Y.Z., Wang J.W., Du Z.Y., Kang F.R., 2018, SNP sites developed by specific length amplification fragment sequencing (SLAF-seq) and genetic analysis in Ammopiptanthus mongolicus, Bulletin of Botanical Research, 38(1): 141-147
Fukuda S., Ishimoto K., Sato S., Terakami S., Hiehata N., Yamamoto T., 2016, A high-density genetic linkage map of bronze loquat based on SSR and RAPD markers, Tree Genetics & Genomes, 12: 80
https://doi.org/10.1007/s11295-016-1040-9
Geng G.D., Chen L.J., Zhang S.Q., 2019, SNP sites development in old Camellia sinensis based on specific length amplification fragment sequencing (SLAF-seq) technology, Non-wood Forest Research, 37(2): 7-12
Jiang T.T., Liu L., Tian W., Xie X.L., Wen C.X., 2020, SNP Molecular Markers Development and Genetic Diversity Analysis of Perilla frutescens (L.), Molecular Plant Breeding, http://kns.cnki.net/kcms/detail/46.1068.S.20200323.2056.010.html
Jiang T., Wen C.X., Tian W., Xie X.L., Lu R.K., Wen S.Q., Liu L.D., 2020, SNP molecular markers development and genetic diversity analysis of Forsythia suspensa based on SLAF-seq technology, Molecular Plant Breeding
Kozich J.J., Westcott S.L., Baxter N.T., Highlander S.K., Schloss P.D., 2013, Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform, Applied and Environmental Microbiology, 79(17): 5112-5120
https://doi.org/10.1128/AEM.01043-13
PMid:23793624 PMCid:PMC3753973
Li B.B., Zhang H., Jiang J.F., Zhang Y., Fan X.C., Liu C.H., 2019, Analysis of genetic diversity of grape germplasms using SLAF-seq technology, Acta Horticulturae Sinica, 46(11): 2109-2118
Li G.H., Chen H.C., Liu J.L., Luo W.L., Xie D.S., Luo S.B., Wu T.Q., Akram W. and Zhong Y.J., 2019, A high-density genetic map developed by specific-locus amplified fragment (SLAF) sequencing and identification of a locus controlling anthocyanin pigmentation in stalk of Zicaitai (Brassica rapa L. ssp. chinensis var. purpurea), BMC Genomics, 20: 343
https://doi.org/10.1186/s12864-019-5693-2
PMid:31064320 PMCid:PMC6503552
Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 2009, The sequence alignment/map format and SAMtools, Bioinformatics, 25(16): 2078-2079
https://doi.org/10.1093/bioinformatics/btp352
PMid:19505943 PMCid:PMC2723002
Li H.F., Huang Y.M., Li Y.Q., Hua J.F., Wu C.R., Fan J.Z., Chen T.Y., 2019, Phylogenetic analysis of sweetpotato germplasm resources based on SLAF-seq technology, Chinese Journal of Tropical Crops, 40(12): 2390-2396
Li M., Guo C., Wang Y., Li Y.J., Tan F., Zhang J., 2018, SNP sites developed by SLAF-seq technology in arbor willow, Southwest China Journal of Agricultural Sciences, 31(5): 891-895
Li R., Yu C., Li Y., Lam T.W., Yiu S.M., Kristiansen K., Wang J., 2009, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, 25(15): 1966-1967
https://doi.org/10.1093/bioinformatics/btp336
PMid:19497933
Li X.Y., Xu H.X., Feng J.J., Zhou X.Y., Chen J.W., 2015, Mining of genic SNPs and diversity evaluation of landraces in loquat, Scientia Horticulturae, 195: 82-88
https://doi.org/10.1016/j.scienta.2015.08.040
Lin S.Q., 2017, A review on research of the wild species in genus Eriobotrya germplasm and their innovative utilization, Acta Horticulturae Sinica, 44(9): 1704-1716
Lin S.Q., 2019, Analysis of historical data on two groups of words: loquat and "Pipa" (lute instrument), loquat and "Luju" plant, Journal of Fruit Science, 36(7): 922-927
Liu K., Li K.X., Wei X.J., Liang W.H., Wang K., 2019, Development and genetic analysis on SNP sites from Camellia nitidissima based on SLAF-seq technology, Non-wood Forest Research, 37(3): 79-83
McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A., 2010, The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Research, 20(9): 1297-1303
https://doi.org/10.1101/gr.107524.110
PMid:20644199 PMCid:PMC2928508
Shan Y.X., Deng C.J., Hu W.S., Chen J.W., Chen X.P., Qin Q.P., Zheng S.Q., 2017, Diversity analysis of loquat (Eriobotrya) defoliation color, Acta Horticulturae Sinica, 44(4): 755-767
Su W.J., Zhao N., Lei J., Wang L.J., Chai S.S., Yang X.S., 2016, SNP sites developed by specific length amplification fragment sequencing (SLAF-seq) in sweetpotato, Scientia Agricultura Sinica, 49(1): 27-34
Sun J., Li X.Y., Xu H.X., Zhang L., Chen J.W., 2018, Identification of white flesh loquat germplasms of Zhejiang province with MCID strategy using genic-SSR markers, Journal of Fruit Science, 35(5): 539-547
Sun X., Liu D., Zhang X., Li W., Liu H., Hong W., Jiang C., Guan N., Ma C., Zeng H., Xu C., Song J., Huang L., Wang C., Shi J., Wang R., Zheng X., Lu C., Wang X., Zheng H., 2013, SLAF-seq: an efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing, PLoS ONE, 8(3): e58700
https://doi.org/10.1371/journal.pone.0058700
PMid:23527008 PMCid:PMC3602454
Wu D., Fan W., He Q., Guo Q.G., Spano A.J., Wang Y., Timko M.P., Liang G.L., 2015, Genetic diversity of loquat [Eriobotrya japonica (Thunb.) Lindl.] native to Guizhou province (China) and its potential in the genetic improvement of domesticated cultivars, Plant Molecular Biology Reporter, 33: 952-961
https://doi.org/10.1007/s11105-014-0809-y
Yang G., Sun L., Li P., Duan Y., 2018, SNP sites development based on SLAF-seq technology in raspberry, Chinese Agricultural Science Bulletin, 34(36): 58-64
Yang X.H., Najafabadi S.K., Shahid M.Q., Zhang Z.K., Jing Y., Wei W.L., Wu J.C., Gao Y.S., Lin S.Q., 2017, Genetic relationships among Eriobotrya species revealed by genome-wide RAD sequence data, Ecology and Evolution, 7(8): 2861-2867
https://doi.org/10.1002/ece3.2902
PMid:28428875 PMCid:PMC5395450
Yu B.C., Wei L.J., Lei K.W., Song E.L., Ma C.X., 2018, Development of SNP markers in cassava (Manihot esculenta Crantz) based on SLAF-seq technology, Plant Physiology Journal, 54(6): 1029-1037
Yu C., Chen Y., Wang C., Chen J.Y., Jin C.B., 2017, SNP sites development by specific length amplification fragment sequencing (SLAF-seq) and genetic analysis in red pitaya, Chinese Journal of Tropical Crops, 38(4): 591-596