On the Detection of Influential Outliers in Linear Regression Analysis
American Journal of Theoretical and Applied Statistics
Volume 3, Issue 4, July 2014, Pages: 100-106
Received: Jul. 11, 2014; Accepted: Jul. 21, 2014; Published: Jul. 30, 2014
Authors
Arimiyaw Zakaria, Department of Mathematics and Statistics, University of Cape Coast, Cape Coast, Ghana
Nathaniel Kwamina Howard, Department of Mathematics and Statistics, University of Cape Coast, Cape Coast, Ghana
Bismark Kwao Nkansah, Department of Mathematics and Statistics, University of Cape Coast, Cape Coast, Ghana
Abstract
In this paper, we propose a measure for detecting influential outliers in linear regression analysis. The performance of the proposed measure, called the Coefficient of Determination Ratio (CDR), is compared with that of some standard measures of influence, namely Cook’s distance, studentised deleted residuals, leverage values, the covariance ratio (CVR), and the standardised difference in fits (DFFITS). Two existing datasets, one artificial and one real, are used for the comparison and to illustrate the efficiency of the proposed measure. The proposed measure appears to be more responsive in detecting influential outliers in both simple and multiple linear regression analyses. The CDR thus provides a useful alternative to existing methods for detecting outliers in structured datasets.
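For orientation, the standard measures named in the abstract can be computed from a single least-squares fit. The following is a minimal sketch using the usual textbook formulas. It is not the authors' code and does not implement the CDR, whose definition is not given in the abstract. The function name `influence_measures` and the small dataset are hypothetical and chosen only for illustration.

```python
import numpy as np

def influence_measures(x, y):
    """Textbook influence diagnostics for an OLS fit with intercept.

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    X = np.column_stack([np.ones(n), x])   # design matrix with intercept
    p = X.shape[1]                          # number of fitted parameters

    # Leverage values: diagonal of the hat matrix, via a reduced QR.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)

    # Ordinary residuals and residual variance of the full fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Residual variance with observation i deleted (closed form).
    s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)

    r = e / np.sqrt(s2 * (1 - h))           # internally studentised residuals
    t = e / np.sqrt(s2_del * (1 - h))       # studentised deleted residuals

    return {
        "leverage": h,
        "deleted_residual": t,
        "cooks_distance": r ** 2 * h / (p * (1 - h)),
        "dffits": t * np.sqrt(h / (1 - h)),
        "covratio": (s2_del / s2) ** p / (1 - h),
    }

# Made-up data: ten points near y = 1 + 2x and one point far from that
# line at a remote x value.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25], dtype=float)
noise = np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.0])
y = 1 + 2 * x + noise
y[-1] = 10.0  # planted influential outlier

result = influence_measures(x, y)
most_influential = int(np.argmax(result["cooks_distance"]))
```

Here `most_influential` indexes the observation with the largest Cook’s distance. No cut-off values are applied, since the abstract does not state the ones used in the paper.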
Keywords
Coefficient of Determination Ratio, Cook’s Distance, DFFITS, CVR, Studentised Deleted Residuals, Leverage Values
To cite this article
Arimiyaw Zakaria, Nathaniel Kwamina Howard, Bismark Kwao Nkansah, On the Detection of Influential Outliers in Linear Regression Analysis, American Journal of Theoretical and Applied Statistics. Vol. 3, No. 4, 2014, pp. 100-106. doi: 10.11648/j.ajtas.20140304.14