Acta Veterinaria et Zootechnica Sinica ›› 2025, Vol. 56 ›› Issue (9): 4410-4421. doi: 10.11843/j.issn.0366-6964.2025.09.023

• Animal Genetics and Breeding •

Improving Genomic Prediction Accuracy via Auto-encoder-based Compression of Transcriptome Data

QIAN Li1, LIANG Mang1, DENG Tianyu1,2, DU Lili1, LI Keanning1, QIU Shiyuan1, XUE Qingqing1,3, ZHANG Lupei1, GAO Xue1, XU Lingyang1, ZHENG Caihong1, LI Junya1, GAO Huijiang1,*

    1. Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
    2. College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
    3. Heilongjiang Bayi Agricultural University, Daqing 163319, China
  • Received: 2025-02-26 Online: 2025-09-23 Published: 2025-09-30
  • Contact: GAO Huijiang E-mail: pbli0201@163.com; gaohuijiang@caas.cn

Abstract:

This study aimed to address the limitations of traditional linear regression models in capturing the complex relationships between genotype and phenotype, and to improve the accuracy of genomic prediction by integrating omics data with machine learning. Two datasets containing both genotype and transcriptome information were used: 1) the Huaxi cattle dataset, covering 3 economically important traits (live weight, carcass weight, and net meat weight); and 2) the rice dataset, covering 3 agronomic traits (yield, grain, and thousand-grain weight (KGW)). Five-fold cross-validation was employed, and the Pearson correlation coefficient was used to evaluate the accuracy of estimated breeding values. Prediction performance was first compared using single-omics data as input; an autoencoder was then applied for dimensionality reduction to construct latent matrices, which served as new relationship matrices for model training. Using transcriptomic data instead of genomic data as model input improved prediction performance, with accuracy increases of 44.2% and 27.4% in the rice and Huaxi cattle datasets, respectively. Incorporating latent matrices extracted via autoencoders further enhanced prediction accuracy by 4.10% in rice and 6.81% in Huaxi cattle compared with the traditional genomic relationship matrix. Correlation analysis revealed that the latent matrix exhibited strong nonlinear relationships with the original omics data. Using transcriptomic data as model input and incorporating relationship matrices constructed via autoencoders can improve the accuracy of selection and provide valuable insights for sustained genetic improvement in breeding programs.
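
The workflow summarized in the abstract (autoencoder compression, a relationship matrix built from the latent features, five-fold cross-validation scored by Pearson correlation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the network architecture, the prediction model, or any hyperparameters, so the one-hidden-layer tanh autoencoder, the ridge-type kernel predictor (GBLUP-like), the simulated data, and every function name below are assumptions made for the example.

```python
import numpy as np

def train_autoencoder(X, n_latent=8, epochs=400, lr=0.005, seed=0):
    """Toy autoencoder (tanh encoder, linear decoder) trained by gradient
    descent on mean squared reconstruction error. Returns the latent matrix.
    Architecture and hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.1, size=(p, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(scale=0.1, size=(n_latent, p))
    b2 = np.zeros(p)
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)                  # encode
        err = (Z @ W2 + b2 - X) / n               # d(loss)/d(reconstruction)
        dZ = (err @ W2.T) * (1.0 - Z**2)          # backprop through tanh
        W2 -= lr * (Z.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return np.tanh(X @ W1 + b1)

def relationship_matrix(F):
    """Individuals-by-individuals similarity from standardized features."""
    Fc = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
    return Fc @ Fc.T / F.shape[1]

def cv_accuracy(K, y, n_folds=5, lam=1.0, seed=0):
    """Five-fold CV; accuracy = mean Pearson r between predicted and observed."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu = y[train].mean()
        # Ridge-type kernel prediction: solve (K_tt + lam*I) a = y_t - mu
        K_tt = K[np.ix_(train, train)] + lam * np.eye(len(train))
        alpha = np.linalg.solve(K_tt, y[train] - mu)
        y_hat = mu + K[np.ix_(test, train)] @ alpha
        accs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(accs))

# Simulated stand-in for an expression matrix and one phenotype (not real data).
rng = np.random.default_rng(1)
n, p = 100, 40
factors = rng.normal(size=(n, 3))
X = factors @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = factors[:, 0] + 0.5 * rng.normal(size=n)

Z = train_autoencoder(X)                 # latent matrix
K_latent = relationship_matrix(Z)        # relationship matrix from latent features
K_raw = relationship_matrix(X)           # relationship matrix from raw features
acc_latent = cv_accuracy(K_latent, y)
acc_raw = cv_accuracy(K_raw, y)
print(f"latent: {acc_latent:.3f}  raw: {acc_raw:.3f}")
```

On simulated data the two accuracies carry no biological meaning; the sketch only shows where a latent matrix would replace the genomic or transcriptomic feature matrix when building the relationship matrix.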

Key words: multi-omics data, features prescreening, machine learning, genomic prediction
