Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data
Biomedical Statistics and Informatics
Volume 2, Issue 4, December 2017, Pages: 166-171
Received: Jul. 29, 2017; Accepted: Aug. 30, 2017; Published: Oct. 20, 2017
Harun or Rashid, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Arefin Mowla, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siddikur Rahman, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siraj-Ud-Doulah, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Bipul Hossen, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Gene expression assays provide a fast and systematic way to identify disease markers relevant to clinical trials. In microarray experiments, differentially expressed genes, or discriminator genes, are genes whose expression patterns differ considerably between two user-defined groups. Microarray data typically contain a very large number of genes, and the task is to determine which of them are responsible for, or discriminate, a particular disease. Identifying differentially expressed genes across multiple conditions has therefore become an important statistical problem in the analysis of large-scale microarray data. To study it, we consider simulated data and real data sets (head and neck cancer). This paper applies three statistical methods, the t-test, the Wilcoxon signed-rank sum test and the renewed approach, to detect differential expression of genes between conditions and to find the number of differentially expressed genes. In addition, Principal Component Analysis (PCA) is used to visualize outliers, and the largest difference from mean and data methods are used to find numerical outliers. Artificial outliers are introduced into the simulated and real data sets in such a way that they are unrelated to the differentially expressed genes. The results show that 25, 126 and 385 differentially expressed genes are identified by the t-test, the Wilcoxon rank sum test and the renewed approach, respectively. The three methods share 23 genes in common, which may be responsible for the cancer. This paper shows that the two-sample mean test (t-test) is well suited to identifying differentially expressed genes in microarray data.
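To make the gene-by-gene testing idea concrete, the following is a minimal sketch and not the authors' code. It assumes a hypothetical toy data set (200 genes, 10 samples per group, the first 10 genes shifted in the second group) and compares a Welch two-sample t statistic with a Wilcoxon rank-sum statistic under a normal approximation. The names `simulate`, `top_genes`, `welch_t` and `rank_sum_z` are illustrative. Ranking genes by absolute statistic instead of computing p-values is a simplification, and the renewed approach is not covered.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic that does not assume equal variances."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def rank_sum_z(x, y):
    """Wilcoxon rank-sum statistic of x as a normal-approximation z score.

    Ties receive midranks; the variance omits the tie correction.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (w - mu) / sigma


def simulate(n_genes=200, n_de=10, n_per_group=10, shift=3.0, seed=0):
    """Hypothetical toy data: the first n_de genes are shifted in group 2."""
    rng = random.Random(seed)
    data = []
    for g in range(n_genes):
        group1 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        offset = shift if g < n_de else 0.0
        group2 = [rng.gauss(offset, 1) for _ in range(n_per_group)]
        data.append((group1, group2))
    return data


def top_genes(data, stat, k):
    """Indices of the k genes with the largest absolute test statistic."""
    scores = [abs(stat(g1, g2)) for g1, g2 in data]
    return set(sorted(range(len(data)), key=lambda i: -scores[i])[:k])


data = simulate()
by_t = top_genes(data, welch_t, k=10)
by_w = top_genes(data, rank_sum_z, k=10)
print("genes selected by both tests:", sorted(by_t & by_w))
```

In a real analysis the statistics would be converted to p-values and adjusted for multiple testing across genes before any gene is declared differentially expressed; the fixed `k` here only stands in for that step.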
Microarray Gene Expression Data, T-Test, Renewed Approach, Wilcoxon Signed Rank Test, Differentially Expressed Genes, Outlier
To cite this article
Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen, Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data, Biomedical Statistics and Informatics. Vol. 2, No. 4, 2017, pp. 166-171. doi: 10.11648/j.bsi.20170204.16
Copyright © 2017 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.