Acta Veterinaria et Zootechnica Sinica (畜牧兽医学报), 2009, Vol. 40, Issue 2: 180-184

• Genetics and Breeding •

A Study of Cluster Analysis Methods for Gene Expression Data from Different Experiment Types

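LIU Tian-fei (刘天飞), TANG Guo-qing (唐国庆), LI Xue-wei (李学伟)*

  1. College of Animal Science and Technology, Sichuan Agricultural University, Ya’an 625014
  • Received: 1900-01-01 Revised: 1900-01-01 Publication date: 2009-02-24 Release date: 2009-02-24
  • Corresponding author: LI Xue-wei (李学伟)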

Effects of Data Preprocessing and Measuring Metrics for Different Gene Expression Data

LIU Tian-fei, TANG Guo-qing, LI Xue-wei*

  1. College of Animal Science and Technology, Sichuan Agricultural University, Ya’an 625014, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-02-24 Published: 2009-02-24

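Abstract (translated from the Chinese): This study examined how the K-means algorithm, which is widely used in cluster analysis of gene chip (microarray) data, performs on two common types of microarray data. The results showed that different types of microarray data are suited to different preprocessing methods and different similarity measures. For the time-series dataset, the best results were obtained with log transformation followed by covariance as the similarity measure. For the non-time-series dataset, log transformation was the best preprocessing, and Euclidean distance, squared Euclidean distance and Mahalanobis distance (马氏距离) all performed well as similarity measures.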

Abstract: The effects of different distance metrics and data preprocessing methods on K-means clustering were studied for different types of gene expression data. The results showed that the choice of preprocessing made a significant difference to the clustering results under different metrics. For the time-course gene expression dataset, log transformation combined with the covariance metric gave the best K-means clustering results. For the other datasets, log transformation was also the best preprocessing, and three metrics (Euclidean distance, squared Euclidean distance and Manhattan distance) led to better results.
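The pipeline described in the abstract (log transformation followed by K-means under a chosen metric) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `init_idx` parameter, the toy data and the choice of log base 2 are assumptions made for illustration. Covariance is treated as a similarity by negating it, and centroids are updated as cluster means for every metric, which is a simplification.

```python
import numpy as np

def log_transform(X):
    """Log2-transform expression values (assumes strictly positive input)."""
    return np.log2(X)

# Each function returns an (n_genes, k) matrix; smaller means "more similar".
def euclidean(X, C):
    return np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))

def squared_euclidean(X, C):
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def manhattan(X, C):
    return np.abs(X[:, None, :] - C[None, :, :]).sum(axis=2)

def neg_covariance(X, C):
    # Sample covariance between each gene profile and each centroid,
    # negated so that argmin picks the centroid with the highest covariance.
    Xc = X - X.mean(axis=1, keepdims=True)
    Cc = C - C.mean(axis=1, keepdims=True)
    return -(Xc @ Cc.T) / (X.shape[1] - 1)

def kmeans(X, k, dissimilarity, init_idx=None, n_iter=100, seed=0):
    """Plain K-means loop with a pluggable dissimilarity function."""
    if init_idx is None:
        rng = np.random.default_rng(seed)
        init_idx = rng.choice(len(X), size=k, replace=False)
    centroids = X[np.asarray(init_idx)].astype(float)
    labels = None
    for _ in range(n_iter):
        new_labels = dissimilarity(X, centroids).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Made-up toy data: three rising and three falling profiles over four time points.
raw = np.array([
    [1.0, 2.0, 4.0, 8.0], [1.0, 2.0, 4.0, 16.0], [2.0, 4.0, 8.0, 16.0],
    [16.0, 8.0, 4.0, 2.0], [16.0, 8.0, 4.0, 1.0], [8.0, 4.0, 2.0, 1.0],
])
labels, _ = kmeans(log_transform(raw), k=2, dissimilarity=neg_covariance,
                   init_idx=(0, 3))
print(labels)
```

Swapping `dissimilarity` between the four functions is enough to compare metrics on the same preprocessed data; comparing preprocessing methods would mean replacing `log_transform` in the same way.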