Association Analysis of Resistance to Soybean Mosaic Virus and Variation in Resistance Allelic Excavation

Soybean Mosaic Virus (SMV) is one of the most important soybean diseases in the world, and it seriously affects soybean yield and quality. In this paper, 405 SSR molecular markers were used in genome-wide scan to analysis 327 soybean varieties in the North of China. The purpose was to search the genetic loci and excellent resistance genes related to SMV1 and SMV3, and to reveal the differences between the genetic models of varieties. The results confirmed that the genetic differences among the tested varieties were large. Through the association analysis of the population genome-wide association (GWA) and linkage group association (LGA), 14 highly significant sites related to SN1 and SN3 were found. There are three sites related to SN1 (Sat-297, sat-296, sat-154) which are coincident with or adjacent to the resistance sites found by predecessors; while there are two at SN3 (Sat-154 and sat-726) which are coincident with previous studies, and sat-154 is a resistance site shared by SN1 and SN3. 7 genes related to resistance were obtained, of which Glyma13g25420 and Glycma13g25440, Glyma14g08700 and Glyma14g08710 has resistance genes on the NBS-LRR domain; Glyma14g08900, Glyma14g08910, and Glyma14g08920 have PLP2 (Patation-like Protein) resistance genes. This study provides an important reference for SMV resistance research and technical support for soybean resistance breeding.


Background
SMV is one of the major diseases affecting the world's main soybean-producing areas. Soybeans infected with SMV have wrinkled leaves; are dwarfed; have fewer pods, which are wrinkled and speckled; have a significantly decreased number of root nodules, 100-seed weight and yield per plant; and exhibit decreased protein and oil levels. As a result, SMV suppresses soybean yield and quality and can even result in a failed harvest (Hill et al., 1987;Steinlage et al., 2002;Zhi et al., 2005;Liu, 2007;Hui et al., 2019;Gao and Shu, 2020;Chen et al., 2020).
SMV is divided into different strains based on the host in various incidence regions. Cho and Goodman divided American SMV materials into 7 strains (G1-G7) (Cho and Goodman, 1979). Wang divided common SMV in southern and northern China into 6 strains (Sa-Sf) and 3 strains (N1-N3), respectively (Wang et al., 2004. Northeastern China is the main production area for quality soybeans. Kai Li found that no two strains from these studies were the same; strains from different regions were specific (Li et al., 2014). SMV accounts for more than 9% of the total incidence of disease in some parts of this area. The main pathogens are SMV strain N1 (SN1) and SMV strain N3 (SN3). At present, it is believed that the cultivation of SMV-resistant material is a relatively effective method for preventing and controlling the virus. In early research into SN1 and SN3, scholars had proposed different genetic models based on the resistance separation ratio of offspring in genetic groups. Sun et al. (1990) demonstrated that SN1 resistance was controlled by 2 pairs of complementary recessive genes and SN3 resistance by 1 pair of recessive genes. Zheng et al. (2000a) demonstrated that SN3 resistance was controlled by 1 pair of recessive genes. The limited data on SMV genetics have contributed to the variation in scholars' views on genetic models of the virus. Therefore, it is difficult to devise an accurate genetic model simply by analyzing the separation ratio, as the heredity of soybean SMV resistance is complex. The descriptive statistics indicated that the experimental materials exhibit a wide variation in resistance to SN1 and SN3, and the DI was normally distributed (Figure 2).

Diversity of population genetics
In this paper, 1616 allelic variations were detected by 405 SSR markers; each locus had four allelic variations on average with a range of 2~9, among which Sct_033 had the most alleles (9). The Shannon index distribution of SSR markers was 0.089-4.136, with an average of 1.76. The PIC value of the SSR markers varied from 0.01 to 0.84, with an average of 0.51. The Shannon index and the PIC value (Table 1) showed that the SSR markers used in this experiment were sufficiently abundant for association analysis. The genetic similarity coefficient (genetic similarity, GS) of 327 materials varied from 0.668 to 0.926, with an average of 0.76 (Figure 3), and the GS of the experimental materials showed a large genetic difference and high genetic diversity.

The group structure and the linkage disequilibrium analysis
The experimental materials were partitioned by the population genetic structure based on the mathematic model with the help of STRUCTURE 2.3.4, and the Q-value was calculated. The maximum likelihood 1n(D) was found to increase with the increase in K-value. Therefore, the ΔK value must be calculated to confirm the ideal K value (G. EVANNO, 2005) ( Figure 4).
Using this method, the K-value was confirmed as 3 ( Figure 4) and the experimental materials were divided into 3 sub-populations.
The numbers of materials in the 3 sub-groups, respectively, were 123, 124, and 54, with rates of the total group of 38%, 38% and 17%. The other 26 materials belonged to the mixed subpopulation ( Figure 5). In the LGA analysis, the experimental materials were divided into two sub-populations, with material numbers of 170 and 157. The population structure of 326 soybean materials. The population structure of 326 soybean materials were calculated by the Software STRUCTURE. The association mapping population (AMP) was divided into three subgroups, which were represented by Arabic numerals (1-3). The vertical axis shows the probability that the material belongs to the population http://genbreedpublisher.com/index.php/mpb 14 The linkage disequilibrium analysis was the basis of the association analysis. In the experiment, a linkage disequilibrium existed between collinear SSR markers (in the same chromosome) and non-collinear (in different chromosomes) SSR markers to a different extent (Table 2). Among them, the linkage groups A1 and C1 had the highest linkage disequilibrium (D`>0.60), while the linkage group F had the lowest.

The analysis of resistance genes correlated with the soybean locus
GWA and LGA were conducted under the MLM model to map the loci related to SN1 resistance. Seven extremely significantly (p<0.01) associated loci and 24 significantly associated loci were obtained from GWA. Nine (p<0.05) significantly associated loci were obtained from LGA. There were five significantly associated loci common to both the GWA and LGA results.
In GWA, the accumulated phenotypic variation explanation ratio of seven extremely significantly associated SSR loci was 32.1%. Sat_150 in linkage group L exhibited the maximal explanation (6.8%), while Satt150 in linkage group M exhibited the minimal explanation (3%). All 7 loci were obtained for the first time (Table 3). In LGA, the accumulated phenotypic variation explanation ratio of 9 significant associated SSR loci was 24.4%. Satt296 exhibited the maximal explanation (3.34%), while Satt149 exhibited the minimal explanation (2.3%).
Candidate genes were mined from 3 cM sections on both sides of 15 associated SSR loci obtained from GWA and LGA. As shown in table 4, associated genes can generally be categorized into four types. In the first type, most resistance genes have NB-ARC and LRR domains, which identifies them as R resistance genes with the NBS-LRR domain. In the second type, most genes have the S_TKc domain, which identifies them as receptor-like protein kinase or MAPK (mitogen-activated protein kinase). The third type is related to jasmonate (JA) and the synthesis process of JAs, such as linoleate 9S-lipoxygenase and glyceraldehyde-3-phosphate dehydrogenase. The forth type of genes is related to biotic stress response proteins, such as SACPD (stearoyl-ACP desaturase) and cysteine protease (Table 4).
Taking Sat_150 (P=2.78E-05) positioned at 53.67 cM of the L Linkage group from the GWA analysis as an example, 9 resistance-related genes were obtained by gene mining. Among them, Glyma19g31950, Glyma19g32080, Glyma19g32110, and Glyma19g32150 are R resistance genes with NB-ARC and LRR domains. Glyma19g33290, Glyma19g33300, Glyma19g33310, Glyma19g33320, and Glyma19g33330 are the DIR (Dirigent-like) genes that participate in the synthesis of lignin and the defense response to fungi Riggleman et al., 1985). Taking Satt296 (P=5.12E-03) positioned at 52.61 cM of the D1b linkage group from LGA analysis as an example, 6 resistance-related genes were obtained by gene mining (Table 4). Among them, Glyma02g13460 and Glyma02g13470 are receptor-like protein kinase genes. Gene Glyma02g15600 encodes for SACPD (stearoyl-ACP desaturase), which may participate in the plant defense response to viruses. Silencing SACPD-encoded genes could increase the content of stearic acid (SA) and the gene expression of constitutive pathogenicity-related genes, which leads to the improvement of resistance (Kachroo et al., 2008). The gene Glyma02g14940 encodes for a type of ethylene-responsive transcription factor (ERF), which could regulate the expression of the RP gene by combining the ethylene-responsive transcription factor GCC-box with the RP gene promoter. Others have found that overexpressing ERF in Arabidopsis and tobacco can induce the expression of RP genes and improve the resistance to fungi, bacteria and viruses (Park and Twell, 2001;Yi, 2004;Zuo et al., 2007). The gene Glyma02g15690 encodes for a type of MAPK protein that belongs to the last protein kinase in the MAPK cascade, which has an important role in plant systemic defense responses (Meng et al., 2012). The gene Glyma02g15831 encodes for a type of cysteine protease that participates in the hypersensitivity caused by the invasion of germs and the PCD (programed cell death) of organ failure .

Related locus analysis of soybean SN3 resistance
GWA and LGA were conducted under the MLM model to map the loci related to SN3 resistance. Seven extremely significantly (p<0.01) associated loci and 39 significantly associated loci were obtained from GWA. Twelve significantly (p<0.05) associated loci were obtained from LGA. There were 5 significantly associated loci common to the GWA and LGA results.
According to GWA, the accumulated phenotypic variation explanation ratio of 7 extremely significantly associated SSR loci was 24.27%. Satt252 in linkage group F had the maximal explanation (4.61%), and Satt020 had the minimal explanation (2.96%) ( Table 5).
Candidate genes from a 3 cM section on both sides of 15 associated SSR loci obtained from GWA and LGA were mined. According to Table 8, the associated genes can generally be categorized into three types. In the first type, most resistance genes have NB-ARC and LRR domains, which identifies them as R resistance genes with an NBS-LRR domain. The second type is related to JA and the synthesis process of JAs, such as linoleate 13S-lipoxygenase, glyceraldehyde-3-phosphate dehydrogenase and 12-oxo-phytodienoic acid reductase. The third type of gene contains the S_TKc domain, which identifies them as MAPKKK (mitogen-activated protein kinase kinase kinase).
Taking Satt726 (P=4.20E-02) positioned at 100.55 cM of the B2 linkage group from the LGA analysis as an example, 9 resistance-related genes were obtained by gene mining. Among them, Glyma14g37860 belongs to the RPP13-like gene, which inhibits the growth of bacteria and viruses. The other 8 genes (Glyma14g36511, Glyma14g38500, Glyma14g38516, Glyma14g38533, Glyma14g38561, Glyma14g38586, Glyma14g38700, and Glyma14g38741) are all R resistance genes with an NBS-LRR domain. Taking Satt252 positioned at 43.01 cM of the F Linkage group from the GWA and LGA analyses as an example, 6 resistance-related genes, Glyma13g04200, Glyma13g04230, Glyma13g04410, Glyma13g04540, Glyma13g03790 and Glyma13g06010, were obtained by gene mining. Among them, Glyma13g04200 and Glyma13g04230 encode two RPP13-like antiviral proteins with NB-ARC and LRR domains, respectively. Glyma13g04410 encodes an alpha-dioxygenase that belongs to the peroxidase superfamily, which catalyzes the oxidation of fatty acids to form 2-hydroperoxy fatty acid. Its expression levels in different plants were associated with plant resistance to the invasion of pathogens (Hamberg et al., 2003). Glyma13g03790 encodes for a linoleate 13S-lipoxygenase, which is a key enzyme of the fatty acid metabolic pathway. Its activity enhancement and rapid peroxidation of fatty acid play a role in the stress response mechanism for plants in response to biotic and abiotic stress (Hwang and Hwang 2010;Vicente et al., 2012). Glyma13g06010 and Glyma13g04540 encode histone deacetylase and carboxylesterase, respectively.

Comprehensive analysis of soybean SN1 and SN3 resistance
The comprehensive analysis of soybean SN1 and SN3 resistance revealed that three loci are associated with two characters simultaneously. These loci are Sat_154 from the F linkage group, Satt168 from the B2 linkage group and Satt376 from the C2 linkage group.
Two resistance-related genes, Glyma13g25420 and Glyma13g25440, were obtained by gene mining around Sat_154. These genes are resistance genes with an NBS-LRR domain.
Locus Satt168 at 68.91 cM of the B2 linkage group is related to anti-fungal activity (Wang and Ghabrial, 2002) and Heterodera glycines (Yue et al., 2001) resistance in soybean. Soybeans with this locus exhibit broad-spectrum resistance that is related to the pest control response. Five resistance-related genes were obtained by gene mining. Among them, Glyma14g08700 and Glyma14g08710 are resistance genes with an NBS-LRR domain. Glyma14g08900, Glyma14g08910 and Glyma14g08920 are PLP2 (patatin-like protein) resistance genes.
Locus Satt376 is related to the tocopherol content of soybean. The tocopherol cyclase (TC) gene, which controls the synthesis of tocopherol, could be over-expressed to strengthen the resistance of plants (Li and Shi, 2010). Four genes related to resistance were obtained by gene mining. Among them, Glym06g19890 and Glyma06g19920 encode an EDS1K-like protein and regulate a resistant protein. Glyma06g18110 and Glyma06g18120 encodes glyceraldehyde-3-phosphate dehydrogenase (GAPDH), which may play a role in the viral infection process.

Analysis and comparison of GWA and LGA
This experiment adopted two association analysis strategies, genome-wide association (GWA) and linkage group association (LGA). GWA used SSR loci distributed in the whole genome to mine soybean SMV resistance-related loci with a threshold of 0.001.
LGA used SSR loci, distributed mainly on the linkage groups (B1, D1b, F) gathering known SMV resistance-related loci, to mine related loci with a threshold of 0.05.
In association analysis, the P value (threshold value) was selected to meet the requirements for finding significantly associated loci and to reduce false-positive associations. In GWA, when the P value was increased from 0.001 to 0.05, the corresponding associated loci number for SN1 and SN3 increased from 7 and 7 to 24 and 39, respectively. In addition, the degree of significant increase in false positives was greater than that of association. Therefore, to ensure the accuracy of associated loci finding, we selected the threshold value of 0.001.
Compared with the GWA, LGA had fewer molecular data, but three linkage groups contain most of the known loci related to SMV resistance. Setting the P value to 0.05 ensured valid mining of correlated loci when there were fewer molecular data, but the determination of final candidate-associated loci should be combined with the bioinformatics analysis of loci.
Comparing the results from GWA and LGA revealed that 5 loci from LGA (Satt296, Sat_154, Sat_240, Satt149, and Satt157) related to SN1 resistance showed significant but not extremely significant correlations in GWA (0.001<p<0.05). In the association analysis of SN3, there were 5 loci with significant correlation and extremely significant correlation by LGA and GWA separately. Therefore, the results obtained by the mutual authentication of two methods can prevent the omission of less intensely associated loci and provide secondary verification of a portion of the results. However, the LGA method is applicable to associated traits with a certain basis of molecular research, and related loci should distribute to a small number of linkage groups or chromosomes.

The associated loci and known RSV gene locations
Among the loci associated with SN1 resistance, 3 (Sat_297, Satt296 and Sat_154) were the same as or adjoining to those found in previous studies. Two relevant loci (Sat_154 and Satt726) associated with SN3 resistance that were the same as found in previous studies. Locus Sat_154 was associated with both SN1 and SN3.
Among those resistance-associated loci, Sat_297 (59.597) was mapped by . Shi et al. (2008) found that Satt726 was linked closely to the Rsv3 gene of cultivar J05, with a 2.0 cM genetic distance.
The locus Satt296, related to SN1 resistance according to Luan, was associated with SN3 resistance in our research (Luan et al., 2006). In the public genetic map, this locus has distances of 0.41 cM and 6 cM from Satt542 (53.02) related to gene Rsv4 from Hayes' research and Satt634 (46.61) related to gene Rsc7 from Fu's research (Hayes et al., 2000;Fu et al., 2006).
The research results showed that there were one or more genes related to SMV resistance surrounding these 4 associated loci. However, fine mapping and further gene mining are still required to specify these genes. The same and adjoining loci also showed that our research method has reliability and reference value.

The influence of populations and strains used in this research
In previous studies, mining the SMV resistance locus was uncommon; most studies used recombinant inbred lines as experimental materials, which led to fewer linkage loci identifications and limited application to the corresponding genetic population. Compared with the genetic population used in composite interval mapping, the resource population accumulated a large amount of recombinant information during long-term evolution. This research successfully structured a large resource population containing 327 breeding materials commonly used in northeastern China and took SMV resistance as the target trait. The primary popular strains N1 and N3 of SMV in northeastern China were chosen as the pathogenic strains. Because northeastern China is the main production area and source land for Chinese soybean and local cultivars and SMV pathogenic strains are regional, the resistance studies on other strains from different areas are difficult to apply. The present research focused on studying SMV resistance in northeastern China among local breeding materials, and the associated loci identified in the research may have limited application. The present findings also provide a partial molecular background for all the materials in this resource population, which would benefit breeding materials selection and SMV resistance marker breeding in the northeast region.
In the survey of SN1 infection, the DI of the entire population was mainly in the low acceptance area of 0.20-0.30, accounting for 49.84%, and its variance range was 0.1, defined as hypovirulence. For SN3 infection, the DI was mainly in the mid-acceptance area of 0.40-0.50, accounting for 34.58%, and its variance range was 0.1 with strong pathogenicity. The results of the surveys on inflection were the same as previously reported (Zheng et al., 2000b).
The resource population containing 327 materials used in this research covered most of the breeding materials in northeastern China. These materials were all susceptible to SMV but with different degrees of resistance, which means that SMV is a common disease in the northeast with a high incidence rate, leading to yield decrease. Soybean materials with full resistance were not found. Because the materials collected in the population were mainly breeding materials, which had all undergone resistance selection for SMV acceptance during their cultivation, material exhibiting extreme acceptance did not appear in the population. However, cultivars with obviously extreme resistance were also not found in this study. This result highlights the importance of identifying key resistance genes and cultivating resistance cultivars.
The association analysis method has a greater ability to explain the recombinant information in the resource population. In the genome-wide scan, a number of molecular markers were needed to obtain more precise and associated loci for the target trait. This research used SSR markers with the convenience of obtaining data and higher repeatability, but as its marker density was lower than for SNP markers, the precision of our research was somewhat diminished. In later studies, based on the current results, we could specifically develop a large number of SNP markers to improve the density. Combining these results with soybean genomic sequencing results, we could further define the molecular markers that are closely associated with SMV resistance genes and clone and validate the related candidate genes, providing a corresponding theoretical basis for SMV resistance breeding and the selection of breeding materials with higher resistance to SMV.

Test materials
The soybean mosaic virus SN1 and SN3 strains were provided by the Jilin Academy of Agricultural Sciences, China.

Identification of resistance to soybean diseases SN1 and SN3
The disease index detection for SN1 and SN3 infection was performed by Jia Liu from the Jilin Academy of Agricultural Sciences. The procedure was as follows: 150 plants of each cultivar or line were planted in a 6-meter-long ridge with 3 repetitions, and evaluations were made after artificial inoculation under an anti-aphid net. After inoculation for 20 to 30 days, when the plant was diseased, the states of illness were divided into five grades according to leaf color changes and the extent of shrinkage. The incidence grade for each plant was determined and recorded, and the disease index was calculated using the formula below, which was used as the phenotype of correlation analysis.

SSR markers genome scans
Soybean leaf DNA was extracted using the AxyPrep kit (Axygen Scientific Inc, Silicon Valley, USA). The DNA concentrations were 100~200 ng/µL as determined with a NANODROP 2000C instrument. The SSR primer information was obtained from the USDA soybean genetic database (http://www.soybase.org/), and 1015 pairs of primers were prepared by Sangon Biotech Co., Ltd. Ten random soybean materials were used to screen the primers. Four hundred and five pairs of SSR primers distributed evenly on 20 chromosomes with good polymorphisms were selected in this research. The PCR amplification products were detected by 6% polyacrylamide gel electrophoresis and developed by silver staining.

Data analysis
The software package Popgene32 was for the statistical analysis of allele variation number and the Shannon index. The software PIC6.0 was used to calculate and obtain the PIC (polymorphism information content) value of the SSR markers. The software NTSYS was used to determine the genetic similarity coefficient. The software STRUCTURE 2.3.4 was used to analyze the group structure of the material. The calculation assumed that the sub-group number K was between 2 to 15 and the calculation repetition number was from 10,000 to 100,000, adding 10,000 repetition times for each calculation, with 5 iterations. The LnP (D) value of each assumed population was obtained for each calculation, and the K value was confirmed through ΔK. The software SPAGeDi-1.3d was used to process SSR marker data to obtain a kinship matrix between different materials. The software TASSEL3.0 was used with a Q value as the covariate and a mixed linear model (MLM, Q+K) for regression analysis of SSR marker data and phenotypic data.

Gene mining around the SSR loci
Based on the whole-genome information for the soybean cultivar Williams82 provided by the SoyBase database (http://www.soybase.org/), the genomic positions of relevant SSR loci were obtained. In this research, the candidate-related genes were searched for in a 3 cM section on both sides of associated SSR loci and the candidate genes were sorted by GO (Gene ontology) enrichment analysis. Based on the GO enrichment results, the genes relevant to the plant-resistance response were preliminarily screened; these genes may be involved in defense reactions against bacteria, fungi and viruses or may be involved in the plant hormone synthetic process related to anti-pathogenic infection and systematic acquired disease resistance. By comparing the gene-encoded protein sequence through the NCBI BlastP online server, the structural domain information was obtained, and a second screening was conducted. Based on the enrichment information, structural domain information and previous research, the candidate genes were defined if they were homologous to known anti-disease genes, exhibited typical structural features of anti-disease genes, or participated in the synthesis process of plant hormones related to anti-phytopathogenic infection.