Development of polymorphic microsatellites for Sillago sihama based on next-generation sequencing and transferability to Sillago japonica
Sillago sihama (Forsskål, 1775), a commercially important marine fishery species in the Indo-West Pacific, is being developed as a target species for aquaculture and stock enhancement in China. However, due to the limitations of traditional isolation methods, the available microsatellite loci, or simple sequence repeats (SSRs), of S. sihama lack diversity. We used a stepwise approach including Illumina sequencing, primer screening, and SSR marker validation to develop diverse SSRs for S. sihama. A total of 853.48 Mb clean sequences were assembled with high coverage and sequencing depth, and 27,288 potential SSRs were identified. A set of 18 novel SSR markers with four type motifs including 11 di-, 1 tri-, 5 tetra-, and 1 hexanucleotide repeats were successfully isolated. The ranges of number of alleles per locus and observed and expected heterozygosities were 5-24, 0.226-0.968, and 0.319-0.950, respectively. The diversity parameters exhibited high levels of polymorphism in these 18 loci. Three loci with the presence of both null alleles and inbreeding showed significant deviation from Hardy-Weinberg equilibrium after Bonferroni correction. Moreover, 13 loci developed in S. sihama showed high transferability to the closely related species Sillago japonica. The polymorphic SSR markers developed in this study may serve as valuable tools for further basic and applied research on the genetic resources of S. sihama as well as S. japonica. Our results indicate that this approach, based on next-generation sequencing technology, is convenient, cost-effective, and suitable for SSR marker isolation in other sillaginid fishes.
Due to their ubiquitous occurrence and high allelic polymorphism, microsatellite loci, or simple sequence repeats (SSRs), have been widely applied in various research fields, especially in population genetics, conservation biology, molecular ecology, and genetic breeding (Ma et al., 2012; Zhang et al., 2016). However, traditional approaches for SSR isolation are laborious, time-consuming, and costly (Abdelkrim et al., 2009). Recently, with the rapid development of next-generation sequencing (NGS) technologies and their novel biological applications (Grover and Sharma, 2016), it is becoming more and more convenient to generate a large number of SSR reads from the genome in a short time and at a relatively low cost (Duan et al., 2014; Yang et al., 2015). Furthermore, genome de novo sequencing, genome re-sequencing, and RNA sequencing can also be realized in model or non-model species based on NGS technologies (Schuster 2008; Yang et al., 2015). Indeed, NGS technologies have been successfully applied to develop SSR markers for a large variety of organisms without reference sequences (Addamo et al., 2015; Villanova et al., 2015), and even for species such as birds, which have few SSRs (Page et al., 2014), and the extinct moa, which has a small quantity of ancient DNA (Allentoft et al., 2009).
The family Sillaginidae (Teleostei, Percoidei) contains four genera in approximately 33 species (Shao, 2016). They are small to moderate in size and typically inhabit the inshore coastal waters with sandy substrates or estuaries (Gao et al., 2011). As a representative species of this family, Sillago sihama (Forsskål, 1775) is the most widespread and abundant member of the sillaginids and is distributed in the Indo-West Pacific region from Knysna (South Africa) to Japan (Liu et al., 2012). S. sihama can be found in all Chinese coastal waters, supporting a commercially important marine fishery. Currently, the supply of this fish mainly relies on wild capture (Guo et al., 2012). However, S. sihama has been overexploited in the East and South China Seas during the last 20 years (Lu et al., 2008; Liu et al., 2010). This requires us to pay close attention to the genetic diversity conservation of wild S. sihama. Furthermore, due to the successful conduction of artificial propagation in the South China Sea (Huang et al., 2013), S. sihama is being developed as a target species for aquaculture and stock enhancement. Thus, escapes or releases of hatchery-produced S. sihama may pose a potential genetic risk to local populations. Accordingly, it is necessary to develop polymorphic SSR markers in order to provide an essential basis for future genetic resource study and aquaculture practice in S. sihama.
The SSRs of S. sihama and two other Sillago species (Sillago japonica Temminck & Schlegel, 1843, and Sillago parvisquamis Gill, 1861) have been isolated using different molecular techniques (Guo et al., 2012; Ueno et al., 2013; Umino et al., 2013; Wang et al., 2014), but none of them were developed from NGS technologies. The 66 SSRs developed in S. sihama using the FIASCO method, however, were all dinucleotide repeats, and 42 of them showed low polymorphism, with no more than six alleles (Guo et al., 2012). Moreover, traditional methods based on the enriched genomic library usually obtain a specific microsatellite type. For example, only dinucleotide repeat microsatellites of both S. sihama and S. japonica were isolated in previous reports (Guo et al., 2012; Ueno et al., 2013; Wang et al., 2014). This may limit both the diversity of SSR markers and the number of polymorphic SSRs obtained from the genome (Restrepo et al., 2015). Given these deficiencies, we utilized the high-throughput Illumina sequencing platform to develop a set of diverse SSR markers for S. sihama.
The aim of this study was to i) identify genomic SSRs and characterize 18 polymorphic SSR markers in S. sihama, ii) test the transferability of these new markers to the closely related species S. japonica, and iii) establish the first complete SSR marker isolation procedure in S. sihama using NGS technology. These results will significantly contribute to understanding the population genetic resources and conservation of S. sihama as well as S. japonica, and could also provide an alternative approach to the SSR marker isolation of other sillaginid fishes.
MATERIAL AND METHODS
Sample collection and DNA extraction
Fresh S. sihama (32 individuals) and S. japonica (24 individuals) were sampled from the east and west coasts of Leizhou Peninsula, Guangdong Province, the South China Sea, respectively. Samples were preserved in 95% alcohol, and total genomic DNA was extracted from muscle as described by Sambrook and Russell (1989).
Illumina paired-end sequencing, data filtering and genome assembly
One specimen of S. sihama was subjected to high-throughput sequencing using an Illumina Hiseq 2000 platform (Illumina Inc., San Diego, CA, USA) at BGI Genomics Co., Ltd. (Shenzhen, China), and a 100-bp read length paired-end sequencing library was constructed with insert sizes of about 500 bp. Illumina sequencing generated 1632 Mb of raw genomic data with 9.80X sequencing depth. Through a stringent filtering process (Li et al., 2010), which included removing reads containing adapter sequences, low quality reads (quality score of Q ≤ 5), reads containing poly-N, and duplicate reads (Duan et al., 2014), about 8% of raw reads were cleaned. The remaining 1479.99 Mb reads were high quality with 9.06X sequencing depth. Due to the low frequency (< 3) K-mer value (K = 17 bp) (Li et al., 2010), a small dataset of error sequencing reads was then removed from the raw reads (about 2.11%). Genome de novo assembly was performed using the SOAPdenovo software package with default settings (Duan et al., 2014). After contig assembly (contig longer than 100 bp were used), scaffolding, and gap closure (Li et al., 2010), the resulting 853.48 Mb assembled genome datasets were used for SSR discovery.
SSR identification and primer design
The potential SSRs were searched using the Tandem Repeats Finder tool (Benson, 1999) (
Primer polymorphism test
A total of 133 primer pairs containing di- to hexanucleotide repeats were randomly selected. Polymorphism detection was conducted in six S. sihama individuals. PCR was carried out in 15 µL volumes containing 1X PCR buffer [100 mM Tris-HCl pH 8.8, 25°C, 500 mM KCl, 0.8% Nonidet P40 (v/v), 15 mM MgCl2], 1 µL extracted DNA (10-50 ng), 0.3 µL each primer (10 µM), 0.3 µL dNTP mixture (10 mM), and 1.7 U Taq DNA polymerase (Transgen Biotech Co., Ltd., Beijing, China). The PCR amplifications were performed on a VeritiTM 96-Well Thermal Cycler (Applied Biosystems, USA) under the following conditions: an initial denaturation at 95°C for 4 min, a touchdown PCR protocol for 25 cycles at 95°C for 30 s, 62°-52°C for 30 s (decreasing 1°C per cycle for the first 10 cycles), 72°C for 30 s, and a final extension at 72°C for 10 min. PCR products were separated on 8% non-denaturing polyacrylamide gel and visualized by 0.1% silver nitrate staining. The allele size was identified roughly according to the pBR322 Marker/MspI marker (Tiangen, Beijing, China).
PCR amplification and genotyping
According to the M13 labeling method (Schuelke, 2000), an M13 tail (5'-AGGGTTTTCCCAGTCACG-3' or 5'-GAGCGGATAACAATTTCACAC-3') was added to the 5' end of each forward primer of selected polymorphic SSRs (Table 1), while the M13 universal primer (the sequence was same as for the M13 tail) was labeled with FAM or HEX (Generay Biotech Co., Ltd., Shanghai, China) at its 5' end. The optimal annealing temperature was identified for each pair of primers via gradient PCR (51°, 53°, 55°, 57°, 59°, and 61°C) using the three-primer PCR amplification protocol described by Boutin-Ganache et al. (2001) and Schuelke (2000). PCR amplifications were performed in a total volume of 15 µL, which included 1X PCR buffer [100 mM Tris-HCl, pH 8.8, 25°C, 500 mM KCl, 0.8% Nonidet P40 (v/v), 15 mM MgCl2], 1 µL extracted DNA (10-50 ng), 0.3 µL forward primer (10 µM) with M13 tail, 0.1 µL labeled M13 universal primer (10 µM), 0.3 µL reverse primer (10 µM), 0.3 µL dNTP mixture (10 mM), and 1.7 U Taq DNA polymerase (Transgen Biotech Co., Ltd., Beijing, China). The PCR programs were conducted as follows: an initial denaturation at 95°C for 5 min, followed by 25 cycles at 95°C for 30 s, 30 s at the locus-specific annealing temperature (51°-61°C), 30 s at 72°C, and a final extension for 10 min at 72°C. All sets of PCR included a negative control reaction tube in which all reagents were included except the template DNA. The successful PCR products were sent to Shanghai Generay Biotech Co., Ltd. (Shanghai, China) for genotyping by an Applied Biosystems 3730 DNA Analyzer and GeneMapper® software version 4.0.
Eighteen microsatellite primers developed in Sillago sihama and 13 primer pairs from successful cross-amplification of Sillago japonica.
|Locus||Primer sequences(5'-3')||Repeat motif||Allele size (bp)||Ta (°C)||GenBank accession No.|
|S. sihama||S. japonica|
Ta = annealing temperature; “–” = not applicable.
Cross-species amplification of developed SSRs
Cross-species amplifications of polymorphic SSRs developed from S. sihama were tested in the closely related species S. japonica (Tables 1 and 2
Characterization of 18 polymorphic microsatellite loci in Sillago sihama and 13 transferable loci in Sillago japonica.
|Locus||S. sihama||S. japonica|
NA = number of alleles; HO = observed heterozygosity; HE = expected heterozygosity; PIC = polymorphism information content; HWE = Hardy-Weinberg Equilibrium; Fis = inbreeding coefficient; Fua = frequency of null alleles; “–” = not applicable. Asterisks indicate significant departure from HWE after Bonferroni correction (α = 0.05) for S. sihama (P ≤ 0.0028) and S. japonica (P ≤ 0.0038).
The number of alleles (NA), observed heterozygosity (HO), expected heterozygosity (HE), and polymorphism information content (PIC) at each locus were calculated using Cervus 3.0.7 (Kalinowski et al., 2007). Tests for deviation from Hardy-Weinberg equilibrium (HWE) at each locus were carried out in GenePop 4.3 (Rousset, 2008) with the default settings. Significant P values were adjusted for multiple comparisons using the sequential Bonferroni correction (α = 0.05) (Rice, 1989). Frequency of null alleles (Fua) was evaluated by Micro-Checker v.2.2.3 (Van Oosterhout et al., 2004) based on the Brookfield-1 method. In addition, the inbreeding coefficient (Fis) of Weir and Cockerham’s version was calculated using the FSTAT 2.9.3 program package (Goudet, 2001).
Quality assessment of Illumina sequencing
A total of 1632 Mb of paired-end raw sequencing data was generated using the Illumina Hiseq 2000 platform. After data filtering and error sequencing removal, 1363.48 Mb high-quality clean sequences were obtained for further analysis. The final assembled S. sihama genome was 853.48 Mb with a scaffold N50 of 1870 bp, and average GC content was 43.3%. The scatterplot of the GC content and sequencing depth showed a concentrated distribution, indicating that the assembled genome was free of contamination. The coverage of the contigs reached 90.92% with 9.51X sequencing depth.
Identification of genomic SSRs
Based on microsatellite searching, a total of 27,288 SSRs of longer than 22 bp were identified in the S. sihama genome. Among them, more frequent repeats were dinucleotide SSRs (73.55%), followed by trinucleotide (9.87%), tetranucleotide (9.53%), hexanucleotide (3.89%), and pentanucleotide (2.94%). Although the longer repeat motifs (4-6 mers) shared relatively small proportions, the number of this type of motifs (4521) was considerable in the genome studied.
Polymorphism of detected SSRs
Of 133 primer sets tested, 88 pairs (66.2%) yielded specific PCR products and were selected to assess the polymorphism and PCR characterization of 32 individuals of S. sihama. In total, 18 SSR markers (20.5%) produced reliable PCR amplicons and were shown to be polymorphic, with allele sizes ranging from 102 to 348 bp. These 18 SSR sequences were submitted to the NCBI (GenBank accession Nos.: KX074059-KX074076, Table 1).
The NA per locus varied from 5 to 24 (mean = 13.94, same as below, Table 2). HO and HE ranged from 0.226 to 0.968 (0.752) and from 0.319 to 0.950 (0.817), respectively. The PIC ranged from 0.301 to 0.931 with an average of 0.788. After Bonferroni correction (adjusted P value = 0.0028), significant deviations from HWE were detected at three loci (SS2-3, SS2-15, SS4-6), which revealed notable heterozygote deficiency. Meanwhile, the presence of both null alleles and inbreeding was detected at these three loci, with Fua and Fis ranging from 0.067 to 0.191 and from 0.287 to 0.418, respectively.
Cross-species amplification results show that 13 of 18 loci were successfully amplified across 24 S. japonica individuals (Tables 1 and 2
Characterization and mining of genomic SSRs in S. sihama
Based on NGS, a total of 853.48 Mb of clean genome was assembled with high coverage and sequencing depth, which provided high single-base accuracy and effective assembly for the S. sihama genome. This large genomic dataset produced 27,288 potential SSRs with various microsatellite motifs. Consistent with other fish genomes (Villanova et al., 2015), the majority of SSRs in S. sihama were shorter repeat motifs (22,763 SSRs of 2-3 mers), suggesting that it could be convenient and effective for isolating shorter motifs. Moreover, a considerable number of the longer repeat motifs (4521 SSRs of 4-6 mers) were also identified in the present study, whereas this type of SSRs are difficult to isolate using the traditional method. It has been demonstrated that longer motifs show more accuracy and efficiency of allele assignment than shorter motifs (Restrepo et al., 2015). Thus, in the genomic era, it is necessary for researchers to develop longer SSR marker motifs rather than focusing on shorter motifs. Given these considerations, primer screening and polymorphism testing for longer motifs as well as shorter motifs were conducted in S. sihama.
Development of microsatellite markers and population genetic parameters for S. sihama
As a result, 18 polymorphic SSRs with four type motifs, including 11 di-, 1 tri-, 5 tetra- and 1 hexanucleotide repeats, were successfully isolated for S. sihama. The NA per locus varied from 5 to 24 (13.94), with HO ranging from 0.226 to 0.968 (0.752). These values were comparable to those reported in 32 other fishes (mean NA = 13.5, mean HO = 0.66) (DeWoody and Avise, 2000) but notably higher than those previously reported for S. sihama (NA, 2-16, 6; HO, 0.083-1.000, 0.631) (Guo et al., 2012), indicating considerable variability in these 18 SSR markers. According to the criteria defined by Botstein et al. (1980), 17 of 18 loci were highly informative (PIC = 0.715-0.925 > 0.5), and 1 locus (SS4-6) was reasonably informative (0.25 < PIC = 0.301 < 0.5). The presence of such highly polymorphic SSR markers (especially the six longest motifs) in the present study indicates that these loci can provide new insights for further research on the population genetic resources and conservation of S. sihama.
It has been reported that HWE deviation relative to heterozygote deficiency in microsatellites is commonly found in marine fishes (Yu et al., 2002; Lin et al., 2012). This deviation is usually caused by several major factors, including null alleles, inbreeding, the Wahlund effect, selection against heterozygotes, and population degradation (Yu et al., 2002). Our results show that the presence of both null alleles and inbreeding in three loci of S. sihama might cause heterozygote disequilibrium. With only one population tested, the Wahlund effect (subpopulations existing in samples) could not be considered as a reason for the HWE deviation in this study. However, other two possible causes (natural selection or population degradation) for HWE deviation could not be excluded in S. sihama, even though the tested population exhibited a high level of genetic diversity, as the natural resource of S. sihama has been significantly decreasing over the last 20 years (Lu et al., 2008; Liu et al., 2010). For further investigation, it is necessary to increase sample sizes and expand sample ranges to better understand the population evolution of S. sihama. In spite of this, the genetic parameter analyses of this study can provide particular information for the assessment of population genetic resources.
Transferability of the developed microsatellite markers to S. japonica
The success rate of cross-species amplification from S. sihama to S. japonica (13 of 18 loci, 72.2%) was much higher than that between other Sillago species; for instance, that of S. japonica to S. parvisquamis was 45.5% (Ueno et al., 2013) and that of S. parvisquamis to S. japonica was 63.6% (Umino et al., 2013). A significant negative correlation was found between cross-species amplification success and genetic divergence in the source species of SSRs (Addamo et al., 2015). Previous molecular studies demonstrate that the genetic distance between S. sihama and S. japonica is less than the distance between S. japonica and S. parvisquamis (Gao et al., 2011; Niu et al., 2016). Therefore, the high transferability of our SSR markers might result from the closer phylogenetic relationship between S. sihama and S. japonica. In addition, the different types of SSRs developed in the current study may increase their transferability.
In the present study, the microsatellite diversity values (NA = 3-19, 12.23; HO = 0.130-0.958, 0.625) of S. japonica were slightly lower than those of S. sihama but higher than those reported for 13 freshwater fishes (mean NA = 9.1, mean HO = 0.54) (DeWoody and Avise, 2000). Moreover, the PIC values showed high levels of polymorphism information for most of the 13 transferable loci. These results suggest that our transferable loci exhibited a high level of genetic variation. As analyzed, due to the null alleles and inbreeding, the five transferable loci detected in S. japonica showed a significant departure from HWE, with two loci having a null allele frequency threshold of >0.2 (0.243 for SS2-4 and 0.250 for SS2-6). According to Dakin and Avise (2004), if the frequency of a null allele is >0.2, the mean exclusion probability of parentage analysis will be significantly overestimated. For this reason, SS2-4 and SS2-6 should be ruled out for future parentage analysis or individual identification in S. japonica. It follows that the other 11 transferable loci have great potential to be used for genetic diversity and population structure studies in S. japonica. Because of the lack of other Sillaginidae species in our study, the transferability of the 18 developed loci to them is unknown. Thus, future study is needed to perform further cross-amplification testing in Sillaginidae fishes.
In this study, a stepwise approach including Illumina sequencing, primer screening, and SSR marker validation was first established to isolate SSR markers for sillaginid fishes. Using this method, we easily identified a large number of various SSRs from the genome and rapidly developed a set of polymorphic SSR markers for S. sihama. Moreover, the majority of isolated loci also showed high transferability to the closely related species S. japonica. The novel and high polymorphic SSRs (especially the longer motifs) discovered in this study will facilitate future estimation of genetic diversity, population structure and evolution, and molecular breeding for S. sihama. They also provide potential markers for S. japonica. Furthermore, the results indicate that our established approach is convenient and cost-effective, and is also suitable for SSR marker isolation of other sillaginid fishes.