Acta Veterinaria et Zootechnica Sinica ›› 2020, Vol. 51 ›› Issue (9): 2068-2078. doi: 10.11843/j.issn.0366-6964.2020.09.004

• ANIMAL GENETICS AND BREEDING •

Study on the Strategies of Genotype Imputation

DENG Tianyu, DU Lixin*, WANG Lixian, ZHAO Fuping*   

  1. Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  • Received: 2020-03-12 Online: 2020-09-25 Published: 2020-09-25

Abstract: Genomic data are increasingly used in livestock breeding. Genotype imputation is an important tool for handling missing values in genotypic data, and the quality of the imputation results directly affects subsequent analyses. Obtaining good imputation results requires a comprehensive imputation strategy. We used simulation to study the effects of several factors on genotype imputation: reference population size, the genetic relationship (distance) between the target and reference populations, the number (proportion) of target sites, the minor allele frequency (MAF), and the imputation algorithm. The results showed that the number of target sites was the main factor affecting genotype imputation and was significantly positively correlated with imputation quality (P<0.05). Reference population size was the main factor affecting the imputation error rate of Beagle5.1, whereas the number of target sites was the main factor for Minimac4. The genetic distance between the target and reference populations affected the imputation quality of Beagle5.1 more strongly than that of Minimac4. In general, the imputation error rate increased with the MAF of a site. When the reference population contained few individuals and the number of target sites was large, Minimac4 was faster than Beagle5.1, but the trend reversed as the reference population size increased. For a given level of imputation quality, Beagle5.1 placed relatively lower demands on the above factors. When the number of target sites was low and the reference population was large, Beagle5.1 imputed better, whereas Minimac4 was better suited to a small reference population and a larger number of target sites. In this study, different strategies were formulated for different imputation purposes, and the results provide a reference for genotype imputation.
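
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not define the error rate, so the sketch assumes it is the proportion of masked genotypes that the imputed data gets wrong, with genotypes coded as 0/1/2 allele counts. The function names `mask_sites`, `error_rate`, and `minor_allele_frequency` are hypothetical, and the imputation step itself (Beagle5.1 or Minimac4) is left outside the sketch.

```python
import random


def mask_sites(genotypes, keep_fraction, seed=0):
    """Keep a random fraction of sites and blank out the rest.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    Returns (masked copy, list of masked site indices). Masked entries
    are None in every individual, mimicking a lower-density panel.
    """
    n_sites = len(genotypes[0])
    rng = random.Random(seed)
    n_keep = max(1, round(keep_fraction * n_sites))
    kept = set(rng.sample(range(n_sites), n_keep))
    masked_sites = [j for j in range(n_sites) if j not in kept]
    masked = [
        [g if j in kept else None for j, g in enumerate(individual)]
        for individual in genotypes
    ]
    return masked, masked_sites


def error_rate(true_genotypes, imputed_genotypes, masked_sites):
    """Assumed definition: share of masked genotypes imputed incorrectly."""
    total = len(true_genotypes) * len(masked_sites)
    if total == 0:
        return 0.0
    wrong = sum(
        1
        for truth, imputed in zip(true_genotypes, imputed_genotypes)
        for j in masked_sites
        if truth[j] != imputed[j]
    )
    return wrong / total


def minor_allele_frequency(genotypes, site):
    """Frequency of the rarer allele at one site (diploid 0/1/2 coding)."""
    alt_count = sum(individual[site] for individual in genotypes)
    freq = alt_count / (2 * len(genotypes))
    return min(freq, 1 - freq)
```

In such a setup, the true genotypes would be masked with `mask_sites`, imputed by the external tool, and then scored with `error_rate`; grouping sites by `minor_allele_frequency` would allow the error rate to be compared across MAF classes.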

Key words: genotype imputation, simulation data, reference population size, imputation method, error rate
