Genome-wide classification of dairy cows using decision trees and artificial neural network algorithms
We compared two machine learning techniques for identifying, from genome-wide information, cows that will be good milk producers. Genotypes from a genome-wide panel of 164,312 single nucleotide polymorphism (SNP) markers located on the 29 autosomal chromosomes were obtained for 1,092 Holstein cows. The cows were classified as high or low milk producers according to their estimated breeding value (EBV) for 305-day average milk yield. Seven data sets were generated for prediction, grouping the chromosomes with the highest numbers of SNPs related to milk production. Decision tree and artificial neural network algorithms were trained and tested on these data sets, and their prediction performance was evaluated. The decision tree algorithm achieved a mean prediction accuracy of 92.44% (maximum 94.5%), whereas the artificial neural network algorithm achieved a mean prediction accuracy of 82.19% (maximum 87.3%). The decision tree algorithm also made it possible to identify the most informative SNP for prediction, which lies within a milk-related quantitative trait locus (QTL) on chromosome 14. Overall, our results add evidence that machine learning algorithms may be used to manage genome-wide SNP markers and to build classification and prediction tools for the cattle industry.
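To make concrete how a decision tree can single out one SNP as the most informative for prediction, the following Python sketch computes the root split of an ID3-style tree (a one-level "decision stump") on a genotype matrix. It is a minimal illustration under stated assumptions, not the method used in the study: genotypes are assumed to be coded 0/1/2 (count of the alternative allele), cows are assumed to be labeled high (1) or low (0) producers by splitting their EBVs at the median (the abstract does not state the cut-off), and each SNP is scored by information gain with one branch per genotype. The function names (`label_by_ebv`, `best_snp`, `fit_stump`, and so on) are hypothetical.

```python
import math
from collections import Counter


def label_by_ebv(ebvs, threshold=None):
    """Hypothetical labeling: 1 = high producer if EBV >= threshold, else 0.
    The threshold defaults to the median EBV (an assumption, not from the study)."""
    if threshold is None:
        s = sorted(ebvs)
        n = len(s)
        threshold = (s[(n - 1) // 2] + s[n // 2]) / 2
    return [1 if e >= threshold else 0 for e in ebvs]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, labels, snp):
    """Entropy reduction from splitting cows by their genotype (0/1/2) at one SNP."""
    groups = {}
    for row, y in zip(genotypes, labels):
        groups.setdefault(row[snp], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_snp(genotypes, labels):
    """Index and gain of the SNP with the highest information gain (the root split)."""
    n_snps = len(genotypes[0])
    gains = [information_gain(genotypes, labels, j) for j in range(n_snps)]
    best = max(range(n_snps), key=lambda j: gains[j])
    return best, gains[best]


def fit_stump(genotypes, labels):
    """One-level tree: split on the best SNP, predict the majority class per genotype."""
    snp, _ = best_snp(genotypes, labels)
    default = Counter(labels).most_common(1)[0][0]  # used for unseen genotypes
    by_genotype = {}
    for row, y in zip(genotypes, labels):
        by_genotype.setdefault(row[snp], []).append(y)
    leaves = {g: Counter(ys).most_common(1)[0][0] for g, ys in by_genotype.items()}
    return snp, leaves, default


def predict(stump, row):
    snp, leaves, default = stump
    return leaves.get(row[snp], default)


def accuracy(stump, genotypes, labels):
    """Fraction of cows whose predicted class matches the label."""
    correct = sum(predict(stump, r) == y for r, y in zip(genotypes, labels))
    return correct / len(labels)
```

On a toy example with six cows and three SNPs, where the second SNP (index 1) separates high from low producers perfectly, `best_snp` returns `(1, 1.0)`; these data are invented for illustration and are not from the study. On a real panel of this size, a library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="entropy")` would be the more practical choice, and its fitted `feature_importances_` attribute ranks SNPs by their contribution to the tree's splits.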