Systems genetics and genome-wide association approaches for analysis of feed intake, feed efficiency, and performance in beef cattle
Feed intake, feed efficiency, and weight gain are important economic traits of beef cattle in feedlots. In the present study, we investigated the physiological processes underlying such traits from the point of view of systems genetics. Firstly, using data from 1334 Nellore (Bos indicus) cattle and 943,577 single nucleotide polymorphisms (SNPs), a genome-wide association analysis was performed for dry matter intake, average daily weight gain, feed conversion ratio, and residual feed intake with a Bayesian Lasso procedure. Genes within 50-kb SNPs, most relevant for explaining the genomic variance, were annotated and the biological processes underlying the traits were inferred from Database for Annotation, Visualization and Integrated Discovery (DAVID) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our results indicated several putative genomic regions associated with the target phenotypes and showed that almost all genomic variances were in the SNPs located in the intergenic and intronic regions. We further identified five main metabolic pathways related to ion transport, body composition, and feed intake control, which influenced the four phenotypes simultaneously. The systems genetics approach used in this study revealed novel pathways related to feed efficiency traits in beef cattle.
The importance of feed in economical production of livestock has been long recognized. Feed costs typically represent the largest share of variable costs affecting the long-term profitability of cattle feedlot operations (Anderson et al., 2005). In addition, when considering sustainable intensification, cattle finished in confinement might offer an opportunity to increase the production using less land. Moreover, it could be an economically sound development strategy to reduce greenhouse gas emissions, particularly when reducing the reliance on pasture for finishing animals and when the use of by-products in animal diets are favorable (Havlík et al., 2014). Thus, feed intake, weight gain, and feed efficiency - the three metrics required for optimizing the resource use, together with meat quality are important variables to facilitate the improvement of the overall economic and environmental sustainability of livestock production systems (Nkrumah et al., 2007).
Several studies have used genomic information for understanding the biological processes underlying these production traits in beef cattle (Moore et al., 2009; Rolf et al., 2012). Genome-wide association studies (GWAS) with dense panels of genetic markers have potential to identify the genomic regions that govern such phenotypes (Rolf et al., 2012). More recently, systems genetics (SG) has been proposed as an ideal approach for unifying phenotypic and genotypic databases to get a single conceptual mechanistic view on how phenotypes emerge as an outcome of long and complex dynamic biological systems. Kadarmideen et al. (2006) introduced the concept of SG to animal science, highlighting that it could combine the already established GWAS methods with information about gene expression and metabolic pathways. Since then, SG has been crucial in understanding the biological role of genetic variants and genes identified with GWAS. Additionally, understanding of how genomic regions contribute to the variance of complex traits might help in selection of candidate markers; however, this has not been sufficiently studied so far. Using high-density single nucleotide polymorphism (SNP) chips, Koufariotis et al. (2014) found that SNPs in the upstream and downstream regions of genes in cattle were enriched for complex traits. Do et al. (2015) found that in pigs, SNPs in different annotated genomic regions contributed almost equally to the total genomic variance. However, their study was based on relatively few markers (30K) and was performed in a population with high linkage disequilibrium (LD).
SG has been used to search for genes and biological processes involved in the regulation of cattle production traits, providing insights on biological processes associated with feed intake, feed efficiency, and weight gain. However, as of date, these biological processes are not well described for Bos indicus cattle. To enhance the understanding of the genetics that defines the production efficiency of Nellore cattle, we performed GWAS. Herein, we discuss the outcomes of our study with regard to feed intake, feed efficiency, and weight gain, within the SG framework. We investigated genetic variants, genomic variance partitioning, and physiological processes determined by the genomic regions mapped for these phenotypes.
MATERIAL AND METHODS
No approval from the local Ethics Committee was required for the present study because we used the data previously collected in other experiments. In all the experiments, animals were handled in accordance with the guidelines of the Brazilian College of Animal Experimentation (COBEA). Phenotypic data and DNA samples were taken from each of the experiments, which had been approved by the respective Ethics Committees for each study (Santana et al., 2012; Gomes et al., 2013; Alexandre et al., 2015).
We collected data for 1334 Nellore bulls and steers (484 ± 95 days old and 357 ± 60.34 kg) from 16 experiments conducted in Brazil from 2007 to 2013. These trials were carried out in individual pens with either GrowSafe or Calan Gates for recording the feed intake. Further details about the animals, phenotypes, facilities, experimental diets, and management procedures are available in Santana et al. (2012), Gomes et al. (2013), and Alexandre et al. (2015).
The animals were weighed every 21 days. Average daily gain (ADG) was estimated by linear regression of weight measurements over time. Dry matter intake (DMI) was recorded daily, and feed conversion ratio (FCR) was calculated by dividing DMI by ADG. Residual feed intake (RFI) was assumed to be the residual of the DMI regression on ADG, mid-body weight, and contemporary group. ADG, DMI, FCR, and RFI were tested for normality (Shapiro-Wilk) and homoscedasticity of residuals (Breusch-Pagan). Data points with standard deviations of ±3 from the mean were excluded from the phenotypic dataset.
Genomic data and imputation
Animals with phenotypic records were genotyped using three different commercially available panels: 388 with Illumina BovineHD® BeadChip (777,962 SNPs), 636 with GGP Indicus Neogen HD® (84,379 SNPs), and 310 with Illumina BovineSNP50® version 2 BeadChip (54,609 SNPs). Only the genotype calls (standard cluster quality) greater than 0.70 and samples with a call rate higher than 90% were kept in the dataset. Duplicate samples, ascertained when more than 95% alleles were identical by state based on 20,000 randomly obtained markers, were excluded from further analysis.
Genotypic information of 279 Nellore cattle genotyped with Illumina BovineHD® BeadChip and Affymetrix Axiom® Genome-Wide BOS 1 (648,874 SNPs) were combined and 1,261,128 SNPs were used. Imputation of the missing genotypes in the low-density SNP panel was performed using FImpute v2.2 (Sargolzaei et al., 2014) for the animals with phenotypic records. The accuracy of imputed genotypes was tested by cross-validation for all the procedures and concordance rate between the real and imputed genotypes was always higher than 97.5%. As is the standard practice, genotypic data were submitted to quality control SNPs with minor allele frequency <2%, call rate <95%, or a significant deviation from Hardy-Weinberg equilibrium (Fisher exact test P value <1 x 10-5), and those that are located on allosomes were excluded. After imputation and quality control, 943,577 SNPs from 1334 phenotyped animals remained in the final dataset for the association test.
Considering the challenges of the “small n, large P” problem in GWAS, we used Lasso procedure implemented with a Gibbs sampler for parameter estimation within a Bayesian hierarchical framework, as described by Li et al. (2011). This approach considers covariate effects and proposes a more general penalization of the gene effect allowing sparsity, which means that the estimates of lower importance tended towards zero. The Gibbs sampler also estimates tuning parameters for the Lasso penalties, and thus, the degree of shrinkage of the marker effect is obtained from the data. A prior gamma distribution is assigned to the Lasso parameters and the conditional prior of the additive gene effect is assumed to follow a Laplace distribution.
ADG, DMI, FCR, and RFI were adjusted for contemporary groups (from the same experiment, fed the same diet, and administered under same management and environment) before the association test and the population substructures were evaluated. The animal’s age was considered as a potential covariate for adjusting the responses, except for RFI, previous analysis of which did not show any correlation with age. We first filtered out the SNP database by performing GWAS with the single-SNP analysis described by Das et al. (2011), and only those markers with P value ≤0.1 for the trait were selected for performing the Lasso procedure. Subsequently, the SNPs that passed through this last variable selection step were resubmitted to the GWAS method to refit the parameters in order to get lower biased estimates. We set a piecewise ratio (i.e., ratio of grouping SNPs) equal to three in order to obtain samples from the joint posterior distributions of additive gene effect with 5000 Markov chain Monte Carlo (MCMC) iterations after convergence, which was monitored by a potential scale reduction factor. Convergence was assumed to have been achieved when this factor was lower than 1.1. For claiming significance of the gene effects, the 95% posterior credible interval should not be zero. As a criterion for selecting SNPs with significant effects that were more relevant for the phenotype, we used the heritability estimate (i.e., portion of explained phenotypic variance) computed for each marker by using the posterior samples of the additive gene effect as recommended by Li et al. (2011). All computations and statistical analyses were performed using the gwas.lasso package for R, which is available at
Genomic variance partitioning
The Ensembl variant effect predictor was used to annotate SNPs obtained by the Lasso procedure; five classes were used: downstream (variants found 5 kb downstream of a gene), exonic (variants found in coding region), intergenic (variants that occurred in-between genes), intronic, and upstream (variants found 5 kb upstream of a transcription start site). The total genomic variance of each annotation group was computed according to the gbayz function using the BayZ package (
Systems genetics analysis
The gene set associated with each trait was determined from the Ensembl Genes 81 Database using the Biomart software by selecting genes located within 50 kb of each SNP-flanking region with percentage of explained genomic variance higher than 1%. Functional annotation of the gene sets was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 (Huang et al., 2009) against a bovine background (Bos taurus). Metabolic pathway analyses were based on the data available from the Kyoto Encyclopedia of Genes and Genomes (KEGG). To determine KEGG pathways significantly associated with all the traits, we implemented the Fisher exact test for assessing whether the genes in the gene set for each trait were overrepresented among all the genes in any given pathway.
ADG was associated with significant SNPs on chromosomes 1, 5, 6, 9, and 29. For DMI, chromosomes 2, 9, 19, 24, and 27 harbored significant SNPs. Among the feed efficiency traits, significant SNPs were observed on chromosomes 3, 4, 6, 7, 12, and 16 for FCR and on chromosomes 2, 3, 5, 15, and 22 for RFI (Table 1). The most associated genomic regions accounted for 14.67, 13.88, 17.55, and 16.83% of the genomic variance for ADG, DMI, FCR, and RFI, respectively. ADG = average daily gain, DMI = dry matter intake, FCR = feed conversion ratio, RFI = residual feed intake.
Single nucleotide polymorphisms that explain more than 1% genomic variance in feed efficiency-related traits.
Genomic variance (%)
ADG = average daily gain, DMI = dry matter intake, FCR = feed conversion ratio, RFI = residual feed intake.
Genomic variance partitioning
Five different classes were annotated for SNPs; the mean was 63.74% in the intergenic regions and 28.17% in the intron (Table 2). In general, the contribution of different annotated classes and the number of SNPs in these classes was linearly associated. ADG = average daily gain, DMI = dry matter intake, FCR = feed conversion ratio, RFI = residual feed intake.
Percentage variance in different genomic regions, grouped by feed efficiency-related traits in Nellore cattle.
The 50-kb criterium for selecting the flanking regions around significant SNPs was based on the known extent of LD blocks in cattle populations. A recent study on LD structure by Villa-Angulo et al. (2009) revealed that there were regions of high LD extending up to 100 kb and the size of haplotype blocks ranged between 30 bases and 75 kb (the average being 10.3 kb). Previous studies also suggested that a similar distance could be used in the SG approach to capture the causal genes/SNPs affecting a quantitative trait like feed efficiency (Do et al., 2014). The over-representation analyses identified five KEGG pathways associated with feed efficiency-related traits (Table 3).
Pathways involved in enrichment analysis of feed intake, efficiency, and performance traits in Nellore cattle, obtained from KEGG.
|Pathway||Frequency in Geneset||Frequency in genome||P value|
|Aldosterone-regulated sodium reabsorption||3||39||0.006|
|T cell receptor signaling pathway||3||106||0.011|
|Systemic lupus erythematosus||4||25||0.018|
|Ion transporter inhibitors||7||41||0.027|
One of the main dilemmas of GWAS with high-density panels of genetic markers is addressing the problem of the “small n, large P”, which is the observed scenario when the number of predictors far exceeds the sample size. This problem exists because not only such high dimensionality requires appropriate modeling for stable computations but also because of the fact that the more SNP markers we take into account, the greater is the need to control false positives. To overcome these challenges in the association analysis, we used Lasso procedure implemented with a Gibbs sampler for parameter estimation within a Bayesian hierarchical framework, as proposed by Li et al. (2011). Although more computationally expensive, this approach has advantages over methods based solely on single-SNP analysis in terms of parameter estimates.
Before variable selection, we pre-selected the markers with non-negligible effects on the trait by conducting a single-SNP analysis. We chose the cut-off criteria P value <0.1 to keep at least 10% of the full genotypic dataset for further analyses. Feed efficiency-related traits are complex and polygenic traits; they do not follow the Mendelian model of inheritance and are affected by several molecular processes with many potential influences, mainly the environmental factors. In ADG, 10.31% of all markers were selected for inclusion in the Lasso step, and 11.53, 11.79, and 13% were selected for DMI, FCR, and RFI, respectively. In an attempt to identify the SNPs that would affect the traits, we applied a variable selection approach where the Lasso penalization was placed on the size of the additive effect. We set a piecewise ratio equal to three because it yielded more biologically relevant estimates without affecting the detection power when compared to computations that used a ratio of two. The animal’s age at slaughter was treated as a covariate for ADG, DMI, and FCR, but it was significant only for ADG. All the SNPs selected by the Lasso procedure for ADG (588), DMI (604), FCR (559), and RFI (752) were resubmitted to the Bayesian model, but without imposing penalizations. This approach has the potential to improve the bias of parameter estimates (Li et al., 2011).
Genomic variance partitioning
The partitioning of genomic regions implies diversity in regulation of complex traits. Koufariotis et al. (2014) showed that, in cattle, the regulatory regions were significant for complex traits and the regions near genes might explain the high variance in human height and body mass index (Yang et al., 2011). The SNPs farther from the transcription start site were less likely to be significantly associated with the trait (Kindt et al., 2013). However, the annotation depends on many other factors. Perhaps, a common pattern for genomic variance does not exist; it depends on the biology of the traits, the population structure, and the LD among SNPs as well as on the relationship among the animals. The genomic partitioning, considering the LD among SNPs, and the partitioning based on the biological class might provide better insights.
Aldosterone-regulated sodium reabsorption is associated with mineral reabsorption, specifically the renal sodium that is regulated by aldosterone from the adrenal glands. These physiological processes (mineral reabsorption) are energetically expensive and account for 10% of the maintenance body energy (Kies et al., 2005) and expenditure and could regulate the feed efficiency indirectly.
The T-cell receptors are related to cell fate and regulate the cell-mediated immunity. Apparently, these two mechanisms are not directly linked to the phenotypes studied here. However, T-cell receptor genes were observed to be upregulated in adipose tissue of pig lines that differed in feed efficiency (Lkhagvadorj et al., 2010) and were associated with RFI in Angus cattle (Rolf et al., 2012).
The molecular relationship between systemic lupus erythematosus (SLE) and the phenotypes studied here is not directly understandable. An SNP in the STAT4 gene was considered a genetic marker for fat metabolism in dairy cattle (Zhang et al., 2010) and this SNP was also related to SLE in humans (Ji et al., 2010). Interestingly, the protein encoded by the STAT family of genes is related to growth factors and the STAT6 gene polymorphisms were associated with growth efficiency and carcass traits in beef cattle (Rincon et al., 2009).
Previously, proteoglycans were reported to be associated with feed intake. The syndecan-3 (cell surface proteoglycan, mainly receptors) can influence feed intake (appetite) by the expression of MC3R and MC4R on hypothalamic neurons (Gantz and Fong, 2003). Therefore, other proteoglycans like hyaluronan, glypican, perlecan, and syndecans could also be related to feed intake.
The relationship between feed efficiency traits and the ion transport has been described in several studies (Richardson et al., 2004; Moore et al., 2009) and ion transport has been shown to contribute to the physiological variation of RFI in beef cattle (Richardson and Herd, 2004). Specifically, the Na+/K+-ATPase metabolism, which is related to homeostasis, can contribute up to 12% of the total body energy expenditure in cattle (McBride and Kelly, 1990). Homeostasis requires more energy expenditure for animal maintenance and it involves a series of metabolic pathways, mainly related to ion transport, energy metabolism, and protein turnover (Richardson et al., 2004). In addition, GWAS have also pointed to ion transport as one of the main mechanisms related to feed efficiency (Serão et al., 2013; Santana et al., 2014); global expression profile analysis in high and low feed efficiency animals showed a regulator of ion transport as one of the differentially expressed genes in Angus cattle (Chen et al., 2011).
Bayesian Lasso approach for GWAS, as adopted here, uncovered the genomic regions with biologically relevant genes and pathways for feed intake, feed efficiency, and performance. Metabolic pathway analysis using the SG approach may provide additional perspectives on the metabolic processes related to feed efficiency-related traits. Moreover, it also reinforces the earlier studies that indicated that ion transport is one of the most important physiological processes for the variation of these traits in beef cattle. These metabolic pathways are mainly related to the energy expended on maintenance of the ionic transport, body composition, and possible regulatory appetite processes. The genomic variance effected by the SNPs in the genome-wide association study was low; most of this variance was in the intergenic and intronic regions. These results could help in the genetic improvement of beef feed efficiency.