
Emerging Research Fronts Comments


ESI Special Topics, October 2002
Citing URL: http://www.esi-topics.com/erf/comments/october02-GaryKing.html


Gary King answers a few questions about this month's Emerging Research Front in the field of Psychiatry/Psychology:

Psychiatry/Psychology, general
Article: "Analyzing incomplete political science data: An alternative algorithm for multiple imputation"
Authors: King, G; Honaker, J; Joseph, A; Scheve, K
Journal: AMER POLIT SCI REV, 95: (1) 49-69 MAR 2001
Addresses:
Harvard Univ, Ctr Basic Res Social Sci, World Hlth Org, Global Programme Evidence Hlth Policy, Cambridge, MA 02138 USA.
Harvard Univ, Ctr Basic Res Social Sci, World Hlth Org, Global Programme Evidence Hlth Policy, Cambridge, MA 02138 USA.
Harvard Univ, Ctr Basic Res Social Sci, Dept Govt, Cambridge, MA 02138 USA.
Yale Univ, Inst Social & Policy Studies, Dept Polit Sci, New Haven, CT 06520 USA.


ST:  Why do you think your paper is highly cited?

Our paper provides a way around a discrepancy between how almost all social scientists analyze data with missing values (such as public opinion surveys where respondents refuse to answer some questions) and the recommendations of the statistics community. With few exceptions, methodologists and statisticians agree that a technique called "multiple imputation" is superior to the way social scientists commonly treat missing data. The technique has been known for two decades, but it had rarely been used in real research settings (i.e., by few people other than statisticians, their students, and their consulting clients). The discrepancy occurred because the only algorithms available to implement the technique were slow, extremely difficult to implement, impossible to run in existing statistical packages, and usable only by researchers with expertise in arcane techniques they would otherwise have little need for and did not know. We adapted an algorithm in a new way to implement a general-purpose multiple imputation model for missing data (known as EMis) that is considerably easier to use and much faster. We also showed that the risks of existing missing data practices were substantial (i.e., on par with the much better known bias that can occur when omitting appropriate controls). Our article also gave examples where our approach led to more informative and less biased substantive conclusions. As a companion to the paper, we also offered easy-to-use, open-source software that implements our methods (see "Amelia: A Program for Missing Data," available at http://GKing.Harvard.edu).
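
Editorial illustration: multiple imputation produces several completed copies of a dataset, analyzes each one, and then pools the results. The sketch below shows only that pooling step, using the standard combining rules usually attributed to Rubin. It is not the EMis algorithm and not code from Amelia; the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's combining rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up results from m = 3 imputed datasets, for illustration only.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty due to the missing values: the more the estimates disagree across imputed datasets, the larger the pooled variance.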

ST:  Does it describe a new discovery or new methodology that's useful to others?

The idea seems to have proven useful to others, and indeed many thousands of copies of our software have been downloaded. Our survey of the literature indicated that about half of the respondents who participate in sample surveys refuse to give answers to one or more questions researchers need in the average article. Almost all analysts contaminate their data at least partially by filling in educated guesses for some of these nonresponses (such as by coding "don't know" on issue positions as the middle category of Likert scales), and approximately 94% of researchers use "listwise deletion" to eliminate entire observations (losing about one-third of their data on average) when any one variable remains missing after the first procedure. At best (when respondents choose randomly which questions they will answer), these procedures cause scholarly analyses of survey and other data to discard a substantial quantity of information. At worst (when respondents choose not to answer survey questions for a reason related to the research questions), these procedures induce massive bias. Our algorithm reduces the likelihood of both problems.
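
Editorial illustration: the two common practices described above can be sketched on a toy survey. The data, the 1-to-5 scale, and the function names are invented for this example; `None` stands for a refused or "don't know" answer.

```python
def listwise_delete(rows):
    """Drop every row that has at least one missing (None) answer."""
    return [row for row in rows if all(value is not None for value in row)]


def midpoint_fill(rows, midpoint):
    """Replace each missing answer with the scale midpoint (an educated guess)."""
    return [[midpoint if value is None else value for value in row] for row in rows]


# Six hypothetical respondents answering three questions on a 1-5 scale.
survey = [
    [3, 5, 1],
    [None, 4, 2],   # refused question 1
    [2, 4, 5],
    [4, 4, 4],
    [1, 2, None],   # refused question 3
    [5, 3, 3],
]

complete = listwise_delete(survey)
share_lost = 1 - len(complete) / len(survey)   # fraction of respondents discarded
filled = midpoint_fill(survey, midpoint=3)
```

In this toy case two of six respondents are dropped, so one-third of the observations disappear even though only 2 of the 18 individual answers are missing. Midpoint filling keeps every row but treats a guess as if it were an observed answer.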

ST:  Could you summarize the significance of your paper in layman's terms?

When asked by survey researchers for their income, political opinions, health status, or other sensitive information, some citizens understandably refuse to answer. This is of course their right, but if researchers need this information to understand the world (and perhaps to design policies to reduce unemployment, improve democracy, or advance health), researchers have to do something to fill in the missing information. Before our article, most political scientists and many others dropped all information from any respondent who did not answer every question of interest. For the average research article, our approach amounts to a way of using about 50% more information from the data than had previously been used, making research funds and investigator effort go farther. For example, consider a graduate student writing a dissertation and needing to collect about eight months' worth of complete data in uncomfortable circumstances far from home. Ideally every datum collected would be complete, but even the best researchers lose approximately one-third of their observations to item nonresponse and listwise deletion. So nonresponse must be anticipated as part of any realistic research plan. However, instead of booking a trip for 12 months and planning to lose a third of the data (and four months of his or her life), it probably makes more sense to collect data for eight months and take a few days to learn and implement our methodology.
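
Editorial illustration: the figures in this answer are consistent with a one-third loss rate, which the short calculation below checks. It is one reading of the arithmetic, not a formula from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def collection_months_needed(complete_months, loss_fraction):
    """Months of fieldwork needed if a fraction of observations is later discarded."""
    return complete_months / (1 - loss_fraction)


def extra_information(loss_fraction):
    """Relative gain from keeping all observations instead of only the complete ones."""
    return 1 / (1 - loss_fraction) - 1


loss = Fraction(1, 3)                                    # about one-third lost to listwise deletion
months = collection_months_needed(Fraction(8), loss)    # fieldwork needed for 8 usable months
gain = extra_information(loss)                           # information recovered by using every case
```

With a one-third loss, eight usable months require twelve months of collection, and using every observation yields one-half (50%) more data than the complete cases alone.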

ST:  How did you become involved in this research?

I set out to study missing data with my coauthors-to-be, who were then graduate students: James Honaker (now Assistant Professor at UCLA), Anne Joseph (now law clerk to U.S. Supreme Court Justice Ruth Bader Ginsburg), and Kenneth Scheve (now Assistant Professor at Yale University). The problem of missing data arises in almost every quantitative social science study, and we had all confronted the problem in our research and frequently been asked for methodological advice on the subject by other researchers. We also knew of the discrepancy between the missing data methods that are recommended and those that are actually used, and we set out to find a way to address the problem.

Gary King
David Florence Professor of Government
Center for Basic Research in the Social Sciences
34 Kirkland Street
Harvard University
Cambridge, MA 02138
