Acta Veterinaria et Zootechnica Sinica ›› 2024, Vol. 55 ›› Issue (6): 2431-2440. doi: 10.11843/j.issn.0366-6964.2024.06.015

• Genetics and Breeding •

Methods of Genotype Feature Extraction Affecting the Prediction Accuracy of Genomic Selection

Huaxuan WU, Zhiqiang DU*

  1. College of Animal Science and Technology, Yangtze University, Jingzhou 434025, China
  • Received: 2023-11-08 Online: 2024-06-23 Published: 2024-06-28
  • Contact: Zhiqiang DU E-mail: 2021710855@yangtzeu.edu.cn; zhqdu@yangtzeu.edu.cn
  • About the author: Huaxuan WU (1998-), male, from Shangrao, Jiangxi, master's student, mainly engaged in research on animal genetics and breeding, E-mail: 2021710855@yangtzeu.edu.cn
  • Funding:
    Anhui Province Livestock and Poultry Joint Breeding Improvement Project (2021-2025)

Abstract:

This study explored and evaluated six methods for extracting genotype features from single nucleotide polymorphisms (SNPs): principal component analysis (PCA), gene-principal component analysis (gene-PCA), the Pearson correlation coefficient between SNP loci (SNP-PCC), linkage disequilibrium (LD), genome-wide association study (GWAS), and random sampling (RS). The prediction accuracy of genomic estimated breeding values (GEBV) was compared on two datasets (Beijing duck: 542 samples, 39 932 SNP loci; Duroc pig: 2 549 samples, 230 884 SNP loci) and three phenotypes (body length in Beijing duck; backfat thickness and teat number in Duroc pig). SNP-PCC combined with five genomic selection (GS) methods (GBLUP, BayesA, BayesB, BayesC, and Bayesian Lasso) achieved relatively reliable prediction accuracy on the Beijing duck data, achieved the highest average prediction accuracy for the pig backfat thickness and teat number phenotypes (an increase of 5%, reaching 32.3%), and significantly improved computational efficiency (on average 5-7 times faster). In summary, selecting an appropriate feature extraction method can effectively improve both the prediction accuracy and the computational efficiency of GS. These results lay a foundation for further research on how feature extraction methods affect GS prediction accuracy and provide a reference for their application in breeding practice.

Key words: genomic selection, feature extraction, prediction accuracy
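
The abstract names the feature extraction methods but does not give their parameters or exact procedures. As a rough illustration only, the sketch below shows what three of them (RS, PCA, and a correlation-based filter in the spirit of SNP-PCC) could look like on a samples-by-SNPs genotype matrix coded 0/1/2. The function names, the greedy pruning rule, the 0.8 threshold, and the simulated genotypes are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def random_sampling(genotypes, n_keep, seed=0):
    """RS baseline: keep n_keep SNP columns chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    cols = np.sort(rng.choice(genotypes.shape[1], size=n_keep, replace=False))
    return genotypes[:, cols], cols


def pca_features(genotypes, n_components):
    """PCA: project column-centered genotypes onto the top principal axes."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def pcc_prune(genotypes, threshold=0.8):
    """Hypothetical SNP-PCC-style filter (assumed rule, not from the paper):
    scan SNPs in order and drop one if its absolute Pearson correlation
    with any already kept SNP exceeds the threshold."""
    corr = np.corrcoef(genotypes, rowvar=False)
    corr = np.nan_to_num(corr)  # monomorphic SNPs yield NaN; treat as 0
    kept = []
    for j in range(genotypes.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    kept = np.array(kept)
    return genotypes[:, kept], kept


# Simulated genotypes (assumption): 50 samples, 200 SNPs coded 0/1/2.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(50, 200)).astype(float)
X[:, 1] = X[:, 0]  # make SNP 1 a perfect copy of SNP 0

X_rs, rs_idx = random_sampling(X, n_keep=20)
X_pca = pca_features(X, n_components=10)
X_pcc, kept = pcc_prune(X, threshold=0.8)

print("RS:", X_rs.shape, "PCA:", X_pca.shape, "PCC-pruned:", X_pcc.shape)
```

Each function returns a reduced feature matrix that could then be passed to a GS model such as GBLUP in place of the full SNP matrix. The greedy scan is quadratic in the number of SNPs, so on panels the size of those in the study it would need to be restricted to windows or chromosomes.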

CLC Number: