Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data
Biomedical Statistics and Informatics
Volume 2, Issue 4, December 2017, Pages: 166-171
Received: Jul. 29, 2017;
Accepted: Aug. 30, 2017;
Published: Oct. 20, 2017
Harun or Rashid, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Arefin Mowla, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siddikur Rahman, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siraj-Ud-Doulah, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Bipul Hossen, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Gene expression assays provide a fast and systematic way to identify disease markers relevant to clinical trials. In microarray experiments, differentially expressed genes, or discriminator genes, are genes whose expression patterns differ considerably between two user-defined groups. Microarray data typically cover a very large number of genes, and the central question is which of them are responsible for, or differentially expressed in, a particular disease. Identifying differentially expressed genes across multiple conditions has therefore become an active statistical problem in the analysis of large-scale microarray data. To study it, we consider simulated data and real data (head and neck cancer). Three statistical methods, the t-test, the Wilcoxon rank-sum test and the renewed approach, are used to detect differential expression between conditions and to determine the number of differentially expressed genes. In addition, Principal Component Analysis (PCA) is used to visualize outliers, and the largest difference from mean and data methods are used to find outliers numerically. Artificial outliers are introduced into both the simulated and the real data; these outliers neither affect nor are related to the differentially expressed genes. The t-test, the Wilcoxon rank-sum test and the renewed approach identify 25, 126 and 385 differentially expressed genes, respectively. Of these, 23 genes are common to all three methods and may be associated with the cancer. The results indicate that the two-sample mean test (t-test) is well suited to identifying differentially expressed genes in microarray data.
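As a rough illustration of the gene-by-gene screening described in the abstract, the sketch below computes a Welch two-sample t statistic and a Wilcoxon rank-sum z score for each gene and intersects the two sets of flagged genes. It is not the authors' implementation: the gene names, expression values and cutoffs are hypothetical, the rank-sum test uses a normal approximation without tie correction, the renewed approach is omitted, and no multiple-testing adjustment is applied.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t statistic (unequal variances allowed)."""
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:  # both groups constant
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def rank_sum_z(x, y):
    """Wilcoxon rank-sum statistic as a normal-approximation z score.

    Ties receive midranks; the variance is not tie-corrected.
    """
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    k = 0
    while k < len(pooled):
        j = k
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[k][0]:
            j += 1
        midrank = (k + j) / 2 + 1  # average of 1-based ranks k+1 .. j+1
        for m in range(k, j + 1):
            ranks[pooled[m][1]] = midrank
        k = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma

# Hypothetical toy data: gene -> (group 1 values, group 2 values).
expr = {
    # clear shift between groups
    "gene_A": ([5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9],
               [7.0, 7.2, 6.9, 7.1, 7.3, 6.8, 7.0, 7.1]),
    # no real difference
    "gene_B": ([5.0, 5.3, 4.7, 5.1, 4.9, 5.2, 5.0, 4.8],
               [5.1, 4.8, 5.2, 5.0, 4.9, 5.3, 5.1, 4.9]),
    # modest shift masked by one artificial outlier (9.0) in group 1
    "gene_C": ([5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 4.8, 9.0],
               [5.6, 5.7, 5.5, 5.8, 5.9, 5.4, 5.6, 5.7]),
}

# Illustrative cutoffs on the raw statistics, not calibrated p-values.
T_CUTOFF = 2.5
Z_CUTOFF = 1.96

t_hits = {g for g, (a, b) in expr.items() if abs(welch_t(a, b)) > T_CUTOFF}
w_hits = {g for g, (a, b) in expr.items() if abs(rank_sum_z(a, b)) > Z_CUTOFF}
common = t_hits & w_hits

print("t-test:", sorted(t_hits))
print("rank-sum:", sorted(w_hits))
print("common:", sorted(common))
```

In this toy example the two tests disagree on the gene with the outlier, because the single extreme value inflates the variance used by the t statistic while the rank-based statistic is less sensitive to it. On real microarray data, per-gene p-values with a multiple-testing correction would normally replace the fixed cutoffs used here.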
Harun or Rashid, Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data, Biomedical Statistics and Informatics. Vol. 2, No. 4, 2017, pp. 166-171.