SNP Mining by Genome Resequencing of 30 Apple Varieties in Shandong Province
1 Shandong Centre of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Ji'nan, 250101, China
2 Fruit Research Institute, Chinese Academy of Agricultural Sciences, Xingcheng, 125100, China
Molecular Plant Breeding, 2020, Vol. 11, No. 27 doi: 10.5376/mpb.2020.11.0027
Received: 19 Nov., 2020 Accepted: 30 Nov., 2020 Published: 31 Dec., 2020
© 2020 BioPublisher Publishing Platform
This article was first published in Chinese in Molecular Plant Breeding and is translated and published here in English with authorization under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Duan N.B., Ma Y.M., Wang K., Wang X.M., Xie K., Bai J., Yang Y.Y., Pu Y.Y., and Gong Y.C., 2020, SNP mining by genome resequencing of 30 apple varieties in Shandong Province, Molecular Plant Breeding, 11(27): 1-12 (doi:10.5376/mpb.2020.11.0027)
In this study, we carried out genome resequencing and SNP mining of cultivated apples in Shandong Province to support the rapid identification of apple varieties and the evaluation and utilization of apple germplasm. Genomic DNA was extracted immediately from the leaves of each sample, and paired-end Illumina genomic libraries were prepared and sequenced on an Illumina HiSeq 4000 platform following the manufacturer's instructions. Resequencing of the 31 apple genomes generated a total of 363 Gb of high-quality clean sequence, an average of 12.5 Gb per accession, representing approximately 15.9× coverage of the apple genome. This data volume fully meets the requirements of downstream analysis and SNP mining. As the nucleotide mismatch parameter was increased from 1 to 12, the mapping rate gradually increased until it reached saturation. Both the total mapping rate and the paired-end mapping rate were highly significantly correlated with the mismatch parameter (p < 0.0001), and univariate fourth-order regression equations were fitted to these relationships (r > 0.99). As the mismatch parameter increases, mapping accuracy decreases, whereas genome coverage and the accuracy of heterozygous-site calls gradually increase. Two algorithms were used for SNP mining, and the intersection of their results was then taken, using 'chromosome + site' information as the matching key, to obtain a highly reliable single-nucleotide variant dataset. A total of 374 404 SNP loci were detected, corresponding to an average of one variant every 1 896 bp, and Sanger sequencing validation showed an accuracy of 98.1%. Annotation analysis showed that, among the 373 763 SNPs, 25 047 (6.7%) are located in gene coding regions, 143 269 (38.27%) in intergenic regions, and 179 426 (47.92%) within 2 kb upstream or downstream of the corresponding genes. Among the coding-region SNPs, 13 422 are non-synonymous and 11 625 are synonymous.
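The consensus step described above, which keeps only the variants reported by both algorithms and matches them on chromosome plus position, can be sketched as a keyed set intersection. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the two SNP callers, so the `Snp` record layout, the function names, and the example coordinates below are hypothetical, and each caller's output is assumed to have already been parsed into records.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """One SNP call. Hypothetical minimal record layout for illustration."""
    chrom: str
    pos: int  # 1-based position on the chromosome
    ref: str
    alt: str


def site_key(snp: Snp) -> tuple[str, int]:
    # 'Chromosome + site' key used to match calls between the two algorithms
    return (snp.chrom, snp.pos)


def consensus_snps(calls_a: Iterable[Snp], calls_b: Iterable[Snp]) -> list[Snp]:
    """Return calls from caller A whose chromosome + position also occurs in caller B."""
    keys_b = {site_key(s) for s in calls_b}
    shared = [s for s in calls_a if site_key(s) in keys_b]
    # Sort by chromosome name, then position, for stable output
    return sorted(shared, key=site_key)


if __name__ == "__main__":
    # Hypothetical toy calls from two callers
    a = [Snp("Chr01", 1200, "A", "G"), Snp("Chr01", 5300, "C", "T"), Snp("Chr02", 88, "G", "A")]
    b = [Snp("Chr02", 88, "G", "A"), Snp("Chr01", 1200, "A", "G"), Snp("Chr03", 10, "T", "C")]
    print(consensus_snps(a, b))  # keeps Chr01:1200 and Chr02:88
```

Because the key is chromosome plus position only, as stated in the abstract, two calls at the same site with different alternative alleles still count as a match; a stricter variant could add `ref` and `alt` to the key.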
The ratio of non-synonymous to synonymous SNPs is 1.15:1. Population clustering based on the filtered 4DTV sites, constructed with the neighbor-joining algorithm, was consistent with the classification of cultivated apples in Shandong Province.
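The clustering step can be illustrated with the standard neighbor-joining algorithm applied to a pairwise distance matrix. This is a minimal sketch, not the authors' pipeline: the abstract does not state how distances were derived from the 4DTV sites, so the 0/1/2 genotype coding, the simple allele-difference distance, and all function names are assumptions, and missing genotypes are not handled.

```python
from itertools import combinations


def genotype_distance(g1: list[int], g2: list[int]) -> float:
    # Hypothetical distance: genotypes coded 0/1/2 (copies of the alternative
    # allele); mean absolute difference scaled to [0, 1]. Assumes no missing data.
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def distance_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    return [[genotype_distance(x, y) for y in genotypes] for x in genotypes]


def neighbor_joining(names: list[str], dmat: list[list[float]]) -> str:
    """Standard neighbor-joining on a symmetric distance matrix; returns Newick."""
    labels = {i: name for i, name in enumerate(names)}  # active cluster id -> Newick fragment
    d = {(i, j): dmat[i][j] for i in labels for j in labels if i != j}
    next_id = len(names)
    while len(labels) > 2:
        active = list(labels)
        n = len(active)
        # Row sums of the current distance matrix
        r = {a: sum(d[a, b] for b in active if b != a) for a in active}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u = next_id
        next_id += 1
        # Distances from u to every remaining cluster
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        labels[u] = f"({labels.pop(i)}:{li:.3f},{labels.pop(j)}:{lj:.3f})"
    # Connect the last two clusters; the whole edge length goes on the first
    a, b = labels
    return f"({labels[a]}:{d[a, b]:.3f},{labels[b]}:0.000);"


if __name__ == "__main__":
    # Classic five-taxon toy matrix (not data from this study)
    D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    print(neighbor_joining(list("abcde"), D))
    # ((c:4.000,(a:2.000,b:3.000):3.000):2.000,(d:2.000,e:1.000):0.000);
```

Ties in the Q criterion are broken by the first pair encountered, so other tools may draw an equally valid tree differently; neighbor-joining can also yield negative branch lengths, which this sketch leaves as-is.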
Keywords: Cultivated apple; Genome resequencing; Development of SNP markers