Sequencing of Chloroplast Genome of Cucumis melo L. var. agrestis Naud.

Cucumis melo L. var. agrestis Naud. belongs to the genus Cucumis in the gourd family and is rich in oil. High-quality edible oil can be extracted from it, and it provides raw materials for the medicine and food industries. The chloroplast genome of Cucumis melo L. var. agrestis Naud. was sequenced for the first time. The results showed that its chloroplast genome was similar to that of common melon. This study can provide a theoretical reference for breeding.


Statistics and annotation of chloroplast genome sequencing data
Sequencing showed that the GC content of the chloroplast genome was 37.45% and Q20 was 96.83% (Table 1). A total of 133 genes were annotated: 37 tRNA genes, 8 rRNA genes and 88 mRNA genes (Table 2). By function, chloroplast genes are mainly divided into photosynthesis genes, self-replication genes and other genes; there are also some genes of unknown function (Table 3).

Translation initiation factor infA
Other -

RSCU codon preference
Codon preference analysis showed that Trp is encoded by the single codon TGG, whereas Met corresponds to the codons GTG, ATT, ATG and ATC (Figure 2). Arg, Leu and Ser are each encoded by six codons (Figure 2). The RSCU values of UAA and GCU were relatively high, while those of GCG, UGC, UAG and UGA were lower (Table 4).

Dispersed repeats
The results showed that forward and palindromic repeats were more numerous, reverse repeats were fewer, and there were no complementary repeats (Figure 3). Among the simple sequence repeats (SSRs), mononucleotide SSRs were the most abundant, accounting for 66.55% (Table 5; Table 6). There were 71 dinucleotide SSRs (24.74%) and 11 trinucleotide SSRs; tetra-, penta- and hexanucleotide SSRs were fewer, numbering 9, 3 and 2, respectively (Table 5). Among the mononucleotide repeats, A and T repeats were the most numerous, whereas C and G repeats were few (Figure 4; Table 7).
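The SSR classes counted above can be illustrated with a minimal Python sketch. It is not the tool used in this study; the minimum repeat numbers passed to the function are hypothetical thresholds chosen for illustration only, and the function name `find_ssrs` is likewise an assumption.

```python
import re

def find_ssrs(seq, min_repeats):
    """Return (start, motif, repeat_count) for perfect SSRs.

    min_repeats maps motif length -> minimum number of repeats.
    These thresholds are illustrative assumptions, not the study's settings.
    """
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # a k-base motif followed by at least n-1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            # skip motifs that are repeats of a shorter unit (e.g. "AA")
            compound = any(
                motif == motif[:d] * (k // d)
                for d in range(1, k) if k % d == 0
            )
            if compound:
                pos = m.start() + 1
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
            pos = m.end()
    return hits

example = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "C"
print(find_ssrs(example, {1: 10, 2: 5}))
```

Counting the returned hits by motif length gives per-class totals of the kind reported in Table 5.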

Ka/Ks analysis results
The ratio of the non-synonymous mutation rate (Ka) to the synonymous mutation rate (Ks) indicates the type of selection. A ratio greater than 1 indicates positive selection, while a ratio less than 1 indicates purifying selection. The Ka/Ks values were between 0.4 and 0.8, indicating purifying selection (Table 8).
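The interpretation rule above can be written as a small Python sketch. The function name `classify_selection` is hypothetical, and the "neutral" label for a ratio of exactly 1 is a standard convention added here as an assumption, since the text only describes the two inequalities.

```python
def classify_selection(ka: float, ks: float) -> str:
    """Interpret a Ka/Ks ratio using the thresholds stated in the text."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    # ratio == 1 is not discussed in the text; labelled neutral by convention
    return "neutral"

print(classify_selection(0.06, 0.1))  # ratio 0.6, within the 0.4-0.8 range reported
```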

Nucleotide diversity (Pi) analysis
Pi values were higher in the SSC and IR regions, indicating a larger degree of variation in these regions, and lower in the LSC region, indicating a smaller degree of variation (Figure 5).

IRscope test results
The chloroplast genome is circular, and the IRs form four boundaries with the LSC and SSC, namely LSC-IRb, IRb-SSC, SSC-IRa and IRa-LSC. During genome evolution, the IR boundaries expand and contract, bringing certain genes into the IR or single-copy regions. The boundary information was visualized with the SVG module in Perl (Figure 6). There were only slight differences among the melon species: genome length varied slightly, and the lengths around the four boundaries differed slightly. These subtle differences may be related to the different traits of the melon species (Figure 6).

Comparison of chloroplast structures
The chloroplast genome has a circular structure. Comparison with melon varieties of closely related genera gave similar results: except for a few sites, sequence similarity was high (Figure 7).

Phylogenetic tree clustering results
Cluster analysis showed that common melon, melon and snake melon grouped together, and that Field Muskmelon was closely related to these three but distant from bitter gourd, pumpkin and loofah (Figure 8).

Homologous and collinearity analysis of chloroplast sequences
Genomes were aligned with the Mauve software (http://darlinglab.org/mauve) using default parameters. Sequence homology analysis indicated that Field Muskmelon is homologous with known melon, Yue melon, cucumber and other crops; except for a few loci, most of the sequences are similar. It is especially similar to Yue melon and common melon, and distant from bitter gourd and loofah (Figure 9).

Discussion and Conclusions
By sequencing the genome of Field Muskmelon, we obtained the complete chloroplast sequence and genome map of the plant. The genome length of Field Muskmelon was close to those of members of the same species: Conomon melon (156 017 bp), melon (Cucumis melo var. makuwa) (156 016 bp) and the melon subspecies Cucumis melo subsp. agrestis (156 016 bp). Homology and collinearity analysis of chloroplast sequences showed that Field Muskmelon was consistent with melon and common melon, but differed from watermelon and pumpkin in tRNA and a few genes. The results of cluster analysis showed that Field Muskmelon was closest to common melon, and that melon was closest to common melon, snake melon and Yue melon. The number of introns in Field Muskmelon was 21, which was basically consistent with that in common melon (Wang, 2017). Among the Cucurbitaceae, Field Muskmelon and common melon have the same origin, which is consistent with previous research showing that Field Muskmelon is the wild ancestor of Chinese thin-skinned melon (Ma, 2020a). The chloroplast genome is relatively conserved and can better explain the evolution and kinship of species, and this study shows that the genetic and taxonomic results for Field Muskmelon are consistent with those of previous studies (Ma, 2020a).
Melon has a GC content of 36.9%, as does cucumber (Zhu et al., 2016), and the GC content of Field Muskmelon is 37.45%. The LSC, SSC and IR regions account for 55.34%, 11.59% and 33.1%, respectively, which is basically the same as in melon (Zhu et al., 2016).

Plant materials
Fresh leaves of Field Muskmelon plants were used. They were collected at the Science and Technology Park of Fuyang Academy of Agricultural Sciences.

Bioinformatics analysis
The Clean Data were assembled with reference to the chloroplast genome sequence of the reference species to obtain the chloroplast sequence assembly. The assembly was annotated for gene structure, the chloroplast genome map was drawn, and basic analyses such as chloroplast genome SSR analysis were carried out, followed by advanced analyses including Ka/Ks analysis and phylogenetic analysis. Raw data were filtered with fastp (version 0.20.0, https://github.com/OpenGene/fastp). The filtering criteria were as follows: (1) sequencing adapters and primer sequences in the reads were removed; (2) reads with an average quality below Q5 were filtered out; (3) reads with more than 5 N bases were filtered out. Chloroplast genome maps were drawn with OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).
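Filtering criteria (2) and (3) can be illustrated with a short Python sketch. This is not fastp itself, and adapter and primer removal (criterion 1) is not covered. The function names and the Phred+33 quality encoding are assumptions made for the example.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)

def keep_read(seq: str, qual: str, min_mean_q: float = 5, max_n: int = 5) -> bool:
    """Apply criteria (2) and (3): mean quality >= Q5 and at most 5 N bases."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if mean_phred(qual) < min_mean_q:
        return False  # criterion (2): average quality below Q5
    if seq.upper().count("N") > max_n:
        return False  # criterion (3): more than 5 N bases
    return True

# (sequence, quality) pairs; 'I' encodes Q40 and '#' encodes Q2 in Phred+33
reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "########"), ("NNNNNNAC", "IIIIIIII")]
print([keep_read(s, q) for s, q in reads])
```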
Because of codon degeneracy, codon usage differs considerably among species. This unequal use of synonymous codons is called codon preference and is measured by Relative Synonymous Codon Usage (RSCU). The preference is considered to result from a combination of natural selection, mutation and genetic drift. RSCU is calculated as (the number of occurrences of a codon encoding a given amino acid / the number of occurrences of all codons encoding that amino acid) / (1 / the number of codon types encoding that amino acid), that is, the actual frequency of the codon divided by its theoretical frequency. The calculation was done with a self-written Perl script.
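The RSCU formula can be sketched in Python as follows. This is not the authors' Perl script; it is a minimal illustration that assumes the standard codon-to-amino-acid assignments and treats the three stop codons as one synonymous family. The function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard codon table in TCAG order (assumed here for chloroplast genes)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds_list):
    """Return {codon: RSCU} for the codons of every amino acid that occurs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed share of the codon / expected share under equal usage
            result[c] = (counts[c] / total) / (1 / len(codons))
    return result

values = rscu(["GCTGCTGCTGCC"])  # four Ala codons: GCT x3, GCC x1
print(values["GCT"], values["GCC"], values["GCA"])
```

An RSCU above 1 means a codon is used more often than expected under equal usage of its synonymous codons, and a value below 1 means it is used less often.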
Pi (nucleotide diversity) can reveal the variation of nucleic acid sequences among different species, and regions with high variation can provide potential molecular markers for population genetics. The homologous gene sequences of different species were globally aligned with the MAFFT software (auto mode), and the Pi value of each gene was calculated with DnaSP5.
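As an illustration of what the Pi statistic measures, the sketch below computes the average number of pairwise differences per site from already aligned sequences. It is not DnaSP; the function name is hypothetical, and excluding every alignment column that contains a gap or ambiguous base is an assumption of this example.

```python
from itertools import combinations

def nucleotide_diversity(aligned):
    """Average pairwise differences per site over gap-free alignment columns."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal aligned length")
    seqs = [s.upper() for s in aligned]
    # keep only columns where every sequence has an unambiguous base
    valid = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not valid:
        raise ValueError("no comparable sites")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(1 for i in valid if a[i] != b[i]) for a, b in pairs)
    return diffs / (len(pairs) * len(valid))

# three sequences, four sites, two pairwise differences in total: 2 / (3 * 4)
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```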
By default, whole genomes were used for phylogenetic tree analysis. The circular sequences were set to the same starting point, and the sequences of different species were aligned with the MAFFT software (v7.427, --auto mode). The aligned data were trimmed with trimAl (v1.4.rev15), and the RAxML v8.2.10 software (https://cme.h-its.org/exelixis/software.html) was then used to construct the maximum likelihood tree with the GTRGAMMA model and rapid bootstrap analysis, bootstrap=1.

Genome assembly
To reduce the complexity of the subsequent sequence assembly, bowtie2 (v2.2.4, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) was used in very-sensitive-local mode to align the sequences against the database. The core assembly module used SPAdes [1] v3.10.1 (http://cab.spbu.ru/software/spades/) to assemble the chloroplast genome. Three aspects of genome quality control were performed to ensure the accuracy of the assembly results: (1) reads were mapped to the genome to compute genome coverage, insert size and other statistics; (2) the genome was aligned with the reference sequence for collinearity analysis, to assess genome conservation and rearrangement; (3) the genome was compared with the structural information of the reference sequence to identify differences between the two.
The reference sequence used for this project was Cucumis_melo: MT240857.1.

Author's contributions
Ma Zongxin was responsible for project design, data processing and gene sequencing, and completed the writing and revision of the paper. The author has read and approved the final manuscript.