Divergence and polymorphism analysis of IGF1Ra and IGF1Rb from orange-spotted grouper, Epinephelus coioides (Hamilton)
Orange-spotted grouper (Epinephelus coioides) is one of the most important marine fish and has a high market value. The insulin-like growth factor type 1 receptor (IGF1R) is a component of the insulin-like growth factor signaling system, and demonstrates important roles during growth. Based on information from livestock, we used IGF1R as a candidate gene to survey single nucleotide polymorphisms. In the present study, the sequences of IGF1Ra and IGF1Rb from orange-spotted grouper were obtained from the genome sequences and their clustering in clades a and b, respectively, was confirmed by phylogenetic analysis. Fourteen critical amino acids underlying functional divergence were detected between the two clades, revealing the molecular basis of their functional differences. Nearly one-fourth (22 kbp) of the genomic sequence of IGF1Ra was sequenced in a mass cross population, and nucleotide diversity and linkage disequilibrium levels were investigated. Nucleotide diversity was 0.00328 for π and 0.00344 for θw. The half decay of the squared allele-frequency correlation was 10,835 base pairs. Comparatively, the relatively high level of linkage and the significant deviation from neutrality-based codons in IGR1R showed that this gene was under selection. A site (KR269824.1:g. 63762C>T), located in the sixth intron, was significantly associated with eyeball diameter (P = 1.39 x 10-4, Q-value: 2.33 x 10-2), which accounted for 11.1% of phenotypic variance. These results highlight the important function of IGF1R in orange-spotted grouper and may be beneficial in the breeding of this species.
Orange-spotted grouper (Epinephelus coioides), which belongs to the Serranidae family (Froese and Pauly, 2011), is treated as a model species and is used to determine the genetic and physiological features of grouper, including the phenomenon of sex reversal. Moreover, this species has been a popular stock in aquaculture, particularly in Southeast Asia, due to its robustness in overcrowded conditions, fast growth at high temperatures, relatively high price in the market, and the insufficient supply of wild captured fish (Pierre et al., 2008). Global production of this species, including aquaculture and capture production, has rapidly increased, reaching 9357 tons in 2014, with aquaculture production also growing fast in recent years (FAO, 2016). Although the reproduction of orange-spotted grouper has been fully controlled, the fry are collected directly or indirectly from their natural habitat, which increases the risk in fish culture due to the unstable germplasm (Pierre et al., 2008). With the rapid development of economic growth in Southeast Asia, extensive culture is not sufficient to meet the demand of the market. There is a demand for high-quality fry to improve the quality of broodstock. Marker-assisted breeding has become an effective method for selecting high-quality stock, and linkage disequilibrium mapping (LD-mapping), also known as association mapping or association analysis, has been used to dissect the relationship between genotype and phenotype (Thavamanikumar et al., 2011). LD-mapping utilizes historically accumulated crossover, resulting in high resolution, and can be applied to random populations and breeding populations, while family-based quantitative trait locus mapping is limited to family population. A unified mixed model has been introduced to account for multiple levels of relatedness, including kinship and patterns of population structure (Yu et al., 2006). This method has been widely used to determine the association between quantitative traits and genotypes of candidate genes (Thavamanikumar et al., 2011). In addition, we have used this method to detect the association between growth traits and genotypes in growth hormone-releasing hormone (GHRH), the GHRH receptor, and PACAP-related peptide/pituitary adenylate cyclase activating polypeptide type a (PRP-PACAPa), and type b (PRP-PACAPb) in this species (Guo et al., 2015a).
Due to its central importance in growth regulation, the insulin-like growth factor (IGF) signaling system plays an important role in determining growth patterns (Maures et al., 2002). The function of the IGF type 1 receptor (IGF1R) as an important member of this signaling system has been explored, particularly in humans. This gene is associated with height in human (Fontenele et al., 2015). In mice, null mutants of this gene die invariably at birth due to respiratory failure and exhibit severe growth deficiency (Liu et al., 1993). Information on the important functions of this gene in growth and development is based on evidence from mammals, which suggests that this gene may be a target in agriculture. A large number of association analyses have been performed in livestock. This gene is associated with growth (Szewczuk et al., 2013) in cattle, reduced size (Hoopes et al., 2012) in dogs, body weight in Japanese quail (Moe et al., 2007), and early growth traits, carcass traits (Lei et al., 2008), and feed conversion ratio (Jin et al., 2014) in chicken. Accumulated evidence to date suggests that the major components of the IGF signaling system in teleost are structurally and functionally similar to those in mammals. Nevertheless, two subtypes of this gene, IGF1Ra and IGF1Rb, exist in teleost (Maures et al., 2002), which originated from gene duplication (Schlueter et al., 2006). Evidence from knockdown models in zebrafish showed that these two subtypes are required for proper embryonic growth, development, and survival (Schlueter et al., 2006), and diverge in their expression patterns (Maures et al., 2002; Li et al., 2014) and function, including early germline development (Schlueter et al., 2007) and the promotion of motor neuron innervation (Schlueter et al., 2006). Full-length cDNA sequences of both subtypes have also been cloned in Japanese flounder (Paralichthys olivaceus) (Nakao et al., 2002) and common carp (Cyprinus carpio). The full-length sequence of one of these subtypes has been cloned in orange-spotted grouper, although the clade has not been identified and the sequence has not been submitted to a gene bank (Kuang et al., 2005). This subtype has been proposed as subtype b following a primary search of the gene bank at the National Center for Biotechnology Information (NCBI). Unlike other genes in the IGF signaling system, this gene has not been targeted for association analyses in fish.
This study was performed to survey single nucleotide polymorphisms (SNPs) in the IGF1R gene and to investigate their association with growth traits. First, we obtained DNA and cDNA sequences of the IGF1Ra gene from the orange-spotted grouper genome sequences and the cDNA sequence of IGF1Rb from previous publication (Kuang et al., 2005). Second, the clades of these two IGF1R subtypes were confirmed and critical amino acid residues involved in their function were identified. Third, the IGF1Ra gene was selected for sequencing in a breeding population and nucleotide diversity, LD, and association analysis between genotypes and phenotypes in IGF1Ra were assessed. The estimated state of SNPs in the breeding population is expected to be meaningful for breeding in this species.
MATERIAL AND METHODS
The samples and phenotypic records studied have been described previously (Guo et al., 2015a). Briefly, the parents were captured in the South China Sea near Hainan Island (18°N, 109°E). First generation progenies were produced from a mass cross. Once the first generation progenies had matured, the second generation progenies were generated from a mass cross of 29 males and 12 females. The second generation fry, which numbered approximately 15,000, were fed according to the management practices of the fishery farm and cultured for 8 months in the same net cage in Haitang Bay of Sanya (Hainan Province, China). Additionally, 159 fry were randomly selected and 11 phenotypes were recorded, including body weight (BWT), maximum body depth, total length, standard length (SL), head length, body width, caudal peduncle length, caudal peduncle depth, snout length, eyeball diameter (ED), and interorbital distance. The condition factor (K) was calculated using the following formula: K = 100 x BWT / SL3. Tissues were sectioned from the caudal fin of each individual and preserved in 95% ethanol at -20°C immediately thereafter. DNA was extracted and the quality was confirmed on 1% agarose gels and a UV spectrophotometer (Nanodrop2000/2000c, Thermo Scientific, Pittsburgh, PA, USA). All animal experiments were approved and conducted in accordance with the guidelines of the Animal Research and Ethics Committees of Sun Yat-Sen University.
Phylogenetic analysis of IGF1R
An IGF1R mRNA sequence in orange-spotted grouper was obtained from the previously published paper (Kuang et al., 2005), which was predicted to be subtype b. The genomic and mRNA sequences of presumed IGF1Ra (accession No.: KR269824.1) were obtained from the genome sequences of orange-spotted grouper (Zhang Y, unpublished data). To confirm the clade of the two genes, additional full length mRNA sequences were downloaded from the NCBI database, including IGF1Ra (zebrafish: AF400275.1, Japanese flounder: AB065098.1, and common carp: AY144591.1), IGF1Rb (zebrafish: AF400276, Japanese flounder: AB065099.1, and common carp: AY144592.1), IGF1R (human: X04434.1, mouse: AF056187.1, chicken: AJ223164.1, and Xenopus: AF055980.1), and insulin receptor (IR) (human: M10051.1), in which IR was treated as the outgroup. The translated protein sequences were aligned using MUSCLE and evolutionary trees were constructed using the MEGA6 software (Tamura et al., 2013). Neighbor-joining (NJ) and maximum likelihood (ML) methods were used with the default parameters.
To detect the divergent sites of the two IGF1R clades, the protein sequences of zebrafish IGF1Ra and IGF1Rb were used to search the database of non-redundant protein sequences at NCBI with an e-value set to 10-15. In total, 250 of the closest related protein sequences of each search batch were downloaded. Sequences from teleost and those detailed above were selected for alignment, and a protein sequence from Callorhinchus milii was included as the outgroup (Table 1). NJ and ML trees were constructed as above. The DIVERGE 3.0 program (Gu and Vander Velden, 2002) was used to analyze functional divergence between the two clades of the IGF1R gene. DIVERGE estimates the ML of the theta (θ) Type-I and Type-II coefficients of functional divergence based on the occurrence of altered selective constraints and radical shifts of physicochemical properties, respectively. The θ value indicates the extent of functional divergence, ranging from 0 (no functional divergence) to 1 (maximum divergence). θ (Type-I) was calculated using the model-free method and θ (Type-II) was calculated using the simplified ML method. Statistical testing to determine whether θ was significantly larger than 0, was based on the estimate (θ) and its standard error (SE), in which the Z-score (θ/SE) follows a normal distribution. Amino acid positions with posterior probabilities (Q) greater than 0.8 were selected as candidates. The candidates may have functional roles in the selective divergence of subclasses and may represent the residues responsible for subclass-specific activity.
List of investigated insulin like growth factor 1 receptor (IGF1R) sequences.
|Species||Subtype||GenBank and RefSeq|
|Fundulus heteroclitus||a (isoform X2)||gi|831553915|ref|XP_012729198.1|
|Oryzias latipes||a (isoform X2)||gi|432861730|ref|XP_004069709.1|
|a (isoform X3)||gi|765122778|ref|XP_011474451.1|
|a (isoform X1)||gi|765122775|ref|XP_011474450.1|
|b (isoform X2)||gi|908429027|ref|XP_013132306.1|
|b (isoform X1)||gi|542213912|ref|XP_005447746.1|
|a (isoform X2)||gi|929123899|ref|XP_013979595.1|
|a (isoform X1)||gi|929123897|ref|XP_013979594.1|
|Larimichthys crocea||b (1)||gi|808882895|gb|KKF30174.1|
|Larimichthys crocea||b (2)||gi|734608293|ref|XP_010731390.1|
|IR Homo sapiens||gi|307070|gb|AAA59174.1|
Target sequencing and SNP determination
Four pairs of primers (Table 2) were previously designed (Guo et al., 2015b). DNA amplification, sequencing and SNP calling were performed according to previous methods (Guo et al., 2015a). High-fidelity DNA polymerase, PrimeSTAR GXL DNA Polymerase (TaKaRa, Dalian, China), was used for amplifications. The amplicons were sequenced on an ion torrent personal genomic machine using chip 318. High-quality SNPs were identified using the following criteria: 1) base quality levels, map quality levels, and SNP quality levels should be greater than or equal to 20 (Phred scale); 2) SNP coverage level should be greater than or equal to 10-fold; and 3) a mutation with a ratio of variant coverage (variant/total) from 10 to 90% should be defined as a heterozygous variant. Physical coverage was defined as the percentage of sites with a mean coverage depth above 10.
Primers specific for the IGF1-Ra gene (four fragments).
|Description||Forward 5' to 3'||Reverse 5' to 3'||Length (bp)|
Association analysis between genotypes and phenotypes and estimates of nucleotide diversity
Levels of LD between each pair of SNPs with a minor allele frequency (MAF) >10% were estimated as squared allele-frequency correlation (r2) using the Haploview 4.2 program (Barrett et al., 2005). The LD decay with distance was evaluated using a mutation-recombination drift model (Sved, 1971). Association analysis was performed using a mix linear model, which was made available through the TASSEL software program version 5.1.0 (Bradbury et al., 2007), in which kinship and population structure were considered to improve the true positive rate. SNPs with a MAF of >5% and a missing rate of <20% were selected, and the pairwise kinship coefficients and population structure matrix were previously estimated (Guo et al., 2015a). Multiple comparisons were adjusted using the false discovery ratio (Q) (Storey and Tibshirani, 2003). The variation explained by each SNP (R2) was estimated as: R2 = SSt / SST x 100%, where SSt is the variance explained by the marker and SST is the total variance. The gene action model was qualified using the ratio of dominance (d) to additive (a) effects (Tian et al., 2014).
A codon-based Z-test of the selection of all sequences was performed using the MEGA6 software (Tamura et al., 2013). The test statistic of the difference in the number of synonymous and nonsynonymous substitutions per site was calculated using the Nei-Gojobori method (Nei and Gojobori, 1986) and the variance of the difference was computed using the bootstrap method (1000 replicates). Haplotypes and missing genotypes were inferred using the PHASE 2.1.1 software (Stephens et al., 2001). Varying rates of recombination were applied, and burn-in and iteration quantities were both set to 100. Nucleotide diversity levels were estimated as π (Nei, 1987) and θw (Watterson, 1975), and allele frequency spectra was estimated as Tajima’s D using the DnaSP software version 5.10 (Librado and Rozas, 2009).
The genomic sequence of the putative IGF1Ra was obtained from the orange-spotted grouper genome sequence (Zhang Y, unpublished data) and the gene was found to contain 21 exons (Figure 1). Phylogenetic analysis supported the two clades of IGF1R in teleost with 100% of bootstrap replicates using both the NJ and ML methods. The IGF1R gene obtained from the orange-spotted grouper genome sequence was clustered in subtype a, and the previously reported IGF1R gene was clustered in subtype b, which confirmed the clade of the two IGF1R in orange-spotted grouper (Figure 2). Previous physiological experiments in zebrafish have demonstrated that the two subtypes have different roles during development (Schlueter et al., 2006). We collected 47 IGF1R protein sequences from teleost in order to determine the molecular basis of their functional divergence, which included almost all of the sequences from teleost. Sequences from teleost were grouped into two clades with 100% of the bootstrap replicates in both methods (NJ and ML), and all of these gene subtypes were clustered into the corresponding clade independently of the method used (NJ and ML). Figure 3 shows the cluster using the NJ method. The functional divergence between the two subtypes was estimated using the DIVERGE 3.0 program (Gu and Vander Velden, 2002) (Table 3). Type I functional divergence showed θ = 0.294 ± 0.031, which was significantly greater than 0 (P < 0.00001). Thus, the functional divergence at the primary structure level of protein between the two clades was significant, and 14 residues had a stringent threshold of posterior probability higher than 0.80. Type II functional divergence showed θ = 0.0023 ± 0.036, which was not significant according to a statistical test. Illustration of the insulin-like growth factor a (IGF1Ra) gene and the position of the four amplicons. The IGF1Ra gene contains 21 exons and 87.9 kbps from the start codon to the stop codon. This depiction shows the structure of the entire gene, including 551 bp upstream of the gene. The amplicons covered the most exons, except for the 2nd, 12th, 13th, 14th, 15th, and 21st exons. The four amplicons (IGR1R.H1, IGR1R.H2, IGR1R.H3, and IGR1R.H4), excluding the length of the primers contained 5231, 5313, 6176, and 5581 base pairs, respectively, in which the first amplicon (H1) contained 551 bp upstream of the start codon (pink). Phylogenetic relationships of the IGF1R gene inferred using the neighbor-joining (NJ) and maximum likelihood (ML) methods. Numbers at the nodes indicate the bootstrap values (%), in which the numbers present before the slash indicate the values (%) determined using the NJ method and those after the slash indicate the values using the ML method. The two IGF1R genes from orange-spotted grouper were clustered with subtype a and b, respectively, which confirmed that the corresponding genes were IGF1Ra and IGF1Rb. Phylogenetic relationships of IGF1R inferred using the NJ method. Numbers at the nodes indicate the bootstrap value (%). The blue branch indicates the gene clade IGF1Ra and the red branch indicates the gene clade IGF1Rb. P, posterior probability values; SE, standard error. Critical amino acid sites related to functional divergence with posterior possibility (Q) > 0.50 (*Q > 0.80) are listed. Numbering refers to the human IGF1R protein sequence (GenBank and RefSeq: gi|4557665|ref|NP_000866.1). Statistics of the mean coverage depths and physical coverage of the four amplicons.
Analysis of functional divergence between IGF1Ra and IGF1Rb.
Coefficient θ ± SE
Critical amino acid sites
0.2944 ± 0.031 (P < 0.00001)
155, 174, 187*, 223, 238, 287, 290*, 320, 331, 339*, 344, 357, 366, 370, 412*, 516, 565, 590, 685, 720, 765*, 767*, 828, 845, 903*, 917, 918*, 1089, 1143, 1287*, 1288, 1290*, 1292, 1298*, 1299*, 1300*, 1319, 1320, 1321, 1335*, 1337
0.0023 ± 0.036 (P = 0.47)
Illustration of the insulin-like growth factor a (IGF1Ra) gene and the position of the four amplicons. The IGF1Ra gene contains 21 exons and 87.9 kbps from the start codon to the stop codon. This depiction shows the structure of the entire gene, including 551 bp upstream of the gene. The amplicons covered the most exons, except for the 2nd, 12th, 13th, 14th, 15th, and 21st exons. The four amplicons (IGR1R.H1, IGR1R.H2, IGR1R.H3, and IGR1R.H4), excluding the length of the primers contained 5231, 5313, 6176, and 5581 base pairs, respectively, in which the first amplicon (H1) contained 551 bp upstream of the start codon (pink).
Phylogenetic relationships of the IGF1R gene inferred using the neighbor-joining (NJ) and maximum likelihood (ML) methods. Numbers at the nodes indicate the bootstrap values (%), in which the numbers present before the slash indicate the values (%) determined using the NJ method and those after the slash indicate the values using the ML method. The two IGF1R genes from orange-spotted grouper were clustered with subtype a and b, respectively, which confirmed that the corresponding genes were IGF1Ra and IGF1Rb.
Phylogenetic relationships of IGF1R inferred using the NJ method. Numbers at the nodes indicate the bootstrap value (%). The blue branch indicates the gene clade IGF1Ra and the red branch indicates the gene clade IGF1Rb.
P, posterior probability values; SE, standard error. Critical amino acid sites related to functional divergence with posterior possibility (Q) > 0.50 (*Q > 0.80) are listed. Numbering refers to the human IGF1R protein sequence (GenBank and RefSeq: gi|4557665|ref|NP_000866.1).
Statistics of the mean coverage depths and physical coverage of the four amplicons.
Nucleotide polymorphisms in the IGF1-Ra gene (KR269824.1).
|Description||No. of sites||No. of polymorphic sites||Nucleotide diversity||Tajima’s Dc|
aSilent substitutions were considered to be those in the coding and noncoding regions. bTotal number of sites of the fragment was assessed. cQuality of Tajima’s D for all types was not significant (P > 0.10).
In this study, the genomic and mRNA sequences of the IGF1Ra gene were obtained, and the clades of the two cDNA sequences in orange-spotted grouper were confirmed. Based on knowledge of their functional divergence from physiological experiments, we collected almost all of the protein sequences from teleost. Fourteen critical amino acids were detected based on the different evolutionary rates, which may explain the physical basis of the functional divergence at the primary structure level of the protein. Due to the important functions of the IGF signaling system, IGF1Ra was selected as a target gene, and nucleotide diversity was investigated in the breeding population. The level of LD within the gene was estimated and found to be relatively low. The associated site (KR269824.1:g. 63762C>T) with ED was detected. Because little is known about the genetics of this species, evaluation of the nucleotide diversity and level of LD may aid in breeding practices.
The level of nucleotide diversity in the gene regions was similar to that in GHRH, GHRH receptor, PRP-PACAPa, and PRP-PACAPb, which were analyzed based on the same population (Guo et al., 2015a). Furthermore, a region that included the IGF1Ra gene was inferred to be under selection. First, the large difference in nucleotide diversities between non-synonymous and synonymous sites showed that a mutation in the non-synonymous sites was swept. Second, a codon-based neutrality test indicated that variation in IGF1Ra significantly deviated from neutrality and was under purifying selection. Third, the half decay of r2 was much larger than that in the GHRH, PRP-PACPAa, PRP-PACAPb, and GHRHR regions, which was limited within 1 kbp (Guo et al., 2015a). In practice, the culture of this species is difficult due to the occurrence of serious diseases (Nagasawa and Cruz-Lacierda, 2004). Usually, diseases can result in the death of fish batches, and until recently, the scale of the culture was small. Thus, these factors may enlarge the effect of drift and the strength of selection. The broodstock was captured from the wild ocean, and the wild population is overfished (IUCN, 2012). These two factors may contribute to selection. However, confirmation of selection and driving factors require further evidence.
A significant association between the SNP KR269824.1:g. 63762C>T and ED was detected. However, this evidence was not strong. According to Yan et al. (2011), the statistical power was only 26% with a false positive rate of 1.39 x 10-4. Thus, this association was not sufficiently convincing within the background of the scale of the entire genome. Although no significant association was detected between harvest traits and genotypes, the identification of effective markers for detection should be encouraged. Orange-spotted grouper is an important aquaculture species, particularly in Southeast Asia. With economic development, the culture style is expected to be sustainable and environmentally friendly. Survey and utilization of the germplasm resource of this species is at the preliminary stage. Although this species is widely distributed in the open sea, resource investigations are limited within small regions, mainly in Southeast Asia (Antoro et al., 2006; Wang et al., 2009; Meng et al., 2012). In recent years, next generation sequencing (NGS) has revolutionized nucleotide sequencing (Helyar et al., 2011). The construction of genetic linkage maps for this species and association analyses between genes (GHRH, PRP-PACAPa, PRP-PACAPb, and GHRHR) and growth traits (Guo et al., 2015a) have benefited from the improvement of NGS. With the aid of NGS, whole genomic association has been successfully applied to search for powerful markers (Campbell et al., 2014). With this expectation, the next phase of investigation on orange-spotted grouper will be based on the whole genome considering large scale population resources.