Ulisses Braga-Neto and Edward R. Dougherty answer a
few questions about this month's fast-breaking paper in the field of
Computer Science.
From February 2006 [LATE ENTRY]
Field:
Computer Science
Article Title: Is cross-validation valid for small-sample microarray classification?
Authors: Braga-Neto, UM; Dougherty, ER
Journal: BIOINFORMATICS
Volume: 20
Issue: 3
Page: 374-380
Year: FEB 12 2004
* Texas A&M Univ, Dept Elect Engn, 214 Zachry Engn Ctr, College Stn, TX 77840 USA.
* Texas A&M Univ, Dept Elect Engn, College Stn, TX 77840 USA.
* Univ Texas, MD Anderson Canc Ctr, Sect Clin Canc Genet, Houston, TX 77030 USA.
* Univ Texas, MD Anderson Canc Ctr, Dept Pathol, Houston, TX 77030 USA.
Why do you think your paper is highly cited?
The field of functional genomics, which depends to a great
extent on DNA microarray technology, is highly successful and
active, engaging a vast number of research groups all over the
world and promising big breakthroughs in both basic science and
clinical applications. Our paper deals with the issue of
classifier error estimation in DNA microarray analysis.
Cross-validation has been taken for granted as the error
estimator of choice in most functional genomic applications, but
what about its own validity, especially in the small-sample
settings prevalent in functional genomics applications? We argue
in our paper, through both mathematical argument and
extensive simulations with synthetic and real patient
data, that unqualified confidence in cross-validation is in fact
misplaced. This is a critical scientific issue, because
imprecise error estimation can lead to the inference of false
scientific hypotheses.
Does it describe a new discovery or a new methodology that's useful to others?
We provide a fairly comprehensive review of several error
estimation techniques that are applicable to microarray
analysis, commenting on the strong and weak points of each
technique. We also provide a thorough discussion of variance and
outlier issues in the context of error estimation for microarray
classification.
The fact that cross-validation can be unreliable due to its
variance has been known in the field of statistical pattern
recognition; however, it appears not to be well known in
biostatistics, and even less so among
functional genomics practitioners. Our paper provides a valuable
contribution in pointing out pitfalls that arise from the use of
imprecise error estimation.
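The variance issue can be illustrated with a small simulation. The sketch below is not the experimental setup of the paper; it assumes a deliberately simple synthetic model (two one-dimensional Gaussian classes with unit variance, a nearest-mean classifier, and leave-one-out cross-validation), and all function names and the `DELTA` separation are hypothetical choices made for illustration. Because the class distributions are known, the true error of each trained classifier can be computed exactly and compared with its cross-validation estimate.

```python
import random
from statistics import NormalDist, mean, stdev

DELTA = 1.0                # assumed distance between the two class means
STD_NORMAL = NormalDist()  # standard normal, used for the exact error


def train(samples):
    """Nearest-mean classifier: return the sample mean of each class."""
    xs0 = [x for x, y in samples if y == 0]
    xs1 = [x for x, y in samples if y == 1]
    return sum(xs0) / len(xs0), sum(xs1) / len(xs1)


def predict(model, x):
    """Assign x to the class whose sample mean is closer."""
    m0, m1 = model
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def true_error(model):
    """Exact error of a trained classifier under the known class models
    N(0, 1) and N(DELTA, 1) with equal priors."""
    m0, m1 = model
    t = (m0 + m1) / 2.0                         # decision threshold
    p0_above = 1.0 - STD_NORMAL.cdf(t)          # class 0 point lands above t
    p1_above = 1.0 - STD_NORMAL.cdf(t - DELTA)  # class 1 point lands above t
    if m0 <= m1:  # points above t are labelled 1
        return 0.5 * p0_above + 0.5 * (1.0 - p1_above)
    return 0.5 * (1.0 - p0_above) + 0.5 * p1_above  # labels are swapped


def loo_estimate(samples):
    """Leave-one-out cross-validation estimate of the error."""
    errors = 0
    for i, (x, y) in enumerate(samples):
        model = train(samples[:i] + samples[i + 1:])
        errors += predict(model, x) != y
    return errors / len(samples)


def draw_sample(rng, n_per_class):
    """Draw n_per_class labelled points from each class."""
    sample = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    sample += [(rng.gauss(DELTA, 1.0), 1) for _ in range(n_per_class)]
    return sample


def deviation_spread(n_per_class, trials=500, seed=0):
    """Mean and standard deviation of (estimate - true error) over trials."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(trials):
        sample = draw_sample(rng, n_per_class)
        deviations.append(loo_estimate(sample) - true_error(train(sample)))
    return mean(deviations), stdev(deviations)


if __name__ == "__main__":
    for n in (10, 50):
        bias, spread = deviation_spread(n)
        print(f"n per class = {n}: mean deviation {bias:+.3f}, std {spread:.3f}")
```

In a setup like this, the quantity to watch is the standard deviation of the difference between the estimate and the true error, rather than its mean: an estimator can be nearly unbiased on average and still be far from the truth on any single small sample.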
Could you summarize the significance of your paper in layman's terms?
DNA microarrays make it possible to screen the expression of
the whole human genome in a single experiment. This can lead to
the discovery of genes whose expression can be used in disease
diagnosis and prognosis, as well as targets for drug
development. Error estimation is a crucial component of this
process, as it assesses the probability of making incorrect
predictions on future data, and thus directly affects the
accuracy of the scientific hypotheses obtained with microarray
technology.
Our paper provides a comprehensive review of error estimation
techniques in the context of microarray analysis, pointing out
pitfalls that can arise from the careless application of
imprecise error estimators.
How did you become involved in this research, and were there successes
or failures along the way?
Ulisses Braga-Neto: After obtaining my Ph.D. in
Electrical and Computer Engineering a few years ago, I decided
to apply my background in signal processing and statistics to
the study of Computational Biology and Bioinformatics. I took a
two-year post-doctoral position at the University of Texas M.D.
Anderson Cancer Center in Houston, TX, where I was presented
with interesting and challenging scientific problems regarding
the use of microarray technology in chemo-prevention studies of
hereditary cancer.
During that same time, I was a visiting scholar with the
Genomic Signal Processing lab, headed by Prof. Edward Dougherty at
Texas A&M University, where I found an environment highly
conducive to critical thinking about the basic statistical
methodology assumed in functional genomics.
If applicable, what are the social or political implications of your
research?
The social implications of this work have to do with its
applicability in the field of medical research. Using the
correct statistical methodology for the analysis of microarray
data has the potential to accelerate drug/vaccine development,
alleviating the suffering of millions of people. On the other
hand, using imprecise statistical methodology has the opposite
effect.
Ulisses Braga-Neto, Assistant Researcher
Laboratory of Virology and Experimental Therapeutics
Aggeu Magalhães Research Center
Oswaldo Cruz Foundation (FIOCRUZ)
Recife, Brazil
Edward R. Dougherty, Professor
Department of Electrical Engineering
Texas A&M University
College Station, TX, USA
ESI Special Topics,
February 2006
Citing URL - http://www.esi-topics.com/fbp/2006/february06-Braga-Neto_Dougherty.html