By Leming Shi
ESI Special Topics,
March 2007
Citing URL - http://www.esi-topics.com/nhp/2007/march-07-LemingShi.html
Leming Shi
answers a few questions about this month's
new hot paper in the field of Computer Science.
The author has also
sent along images of their work.
Field: Computer Science
Article Title: Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential
Authors: Shi, LM; Tong, WD; Fang, H; Scherf, U; Han, J; Puri, RK; Frueh, FW; Goodsaid, FM; Guo, L; Su, ZQ; Han, T; Fuscoe, JC; Xu, ZA; Patterson, TA; Hong, HX; Xie, Q; Perkins, RG; Chen, JJ; Casciano, DA
Journal: BMC BIOINFORMATICS
Volume: 6
Issue: Suppl. 2
Page: art. no. S12
Year: JUL 15 2005
* US FDA, Natl Ctr Toxicol Res, 3900 NCTR Rd, Jefferson, AR 72079 USA.
* US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA.
* Z Tech Corp, Jefferson, AR 72079 USA.
* US FDA, Ctr Devices & Radiol Hlth, Rockville, MD 20850 USA.
* US FDA, Ctr Biol Evaluat & Res, Bethesda, MD 20892 USA.
* US FDA, Ctr Drug Evaluat & Res, Bethesda, MD 20892 USA.
Why do you think your paper is highly cited?
The paper was highly cited because it examined the issue of
microarray cross-platform comparability, which was being widely
questioned at the time. Reported lack of cross-platform
comparability was being attributed solely to platform differences.
Our work demonstrated that low-quality experiments and a
poor choice of data analysis methods for selecting differentially
expressed genes were primarily responsible for the apparent
disagreement between microarray platforms. Our reanalysis led us to
conclude that DNA microarray results had far more reproducibility
and reliability than the growing negative perception had implied.
Importantly, our findings showed a clear and pressing need to
launch an ambitious, community-wide effort, the MicroArray
Quality Control (MAQC) project [1].
The first phase of MAQC has been completed and the results were
published in Nature Biotechnology [2].
The aim of MAQC was to systematically address concerns about the
reliability and performance of microarray technology, along with
issues of standards, quality, and data analysis.
Does it describe a new discovery, methodology, or synthesis of knowledge?
The paper describes a reanalysis of the study by PK Tan et al.
(Nucleic Acids Res. 31: 5676-84, 2003) that was cited in a
high profile article by E Marshall (Science 306: 630-1,
2004). The Tan and Marshall articles raised serious concerns about
the disagreement of DNA microarray results obtained using different
experimental platforms. Marshall’s paper, in turn, apparently
fostered additional papers leading to widespread concerns about the
reliability of DNA microarray technology.
Our paper addresses a critical question or concern about DNA
microarray technology: Are microarray results reproducible and
reliable? We found that the reported irreproducibility was due
to poor data quality and an inappropriate data analysis approach,
rather than to the cross-platform incomparability concluded by Tan
et al. We reanalyzed the data set using several different
statistical approaches and demonstrated that selecting
differentially expressed genes by simple t-test P values
while ignoring fold change (the magnitude of the difference in gene
expression levels) could be a major source of irreproducibility of
microarray results.
Could you summarize the significance of your paper in layman’s terms?
The DNA microarray is a highly parallel measurement technology
through which expression levels of tens of thousands of genes can be
simultaneously measured in one single experiment. One powerful
application of microarray technology is the identification of a
subset of "interesting" genes whose expression levels differ between
two biological conditions (e.g., normal vs. disease).
Usually, replicate samples from each condition are tested.
Consequently, we can calculate an average expression change, i.e.,
fold change, for each gene between the two conditions. In addition,
we can calculate a t-statistic (or the corresponding P value)
to indicate the statistical significance of the measured fold
change. Therefore, for a microarray platform with 30,000 genes, we
have to deal with 30,000 fold changes and P values.
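As a rough illustration of the two quantities just described, the following minimal sketch computes a fold change and a t-statistic for two genes. The function names and the intensity values are made up for illustration, and Welch's form of the t-statistic is an assumption here; the paper may have used a different variant of the t-test.

```python
import math
import statistics

def fold_change(cond_a, cond_b):
    """Ratio of mean expression in condition B to mean expression in condition A."""
    return statistics.mean(cond_b) / statistics.mean(cond_a)

def t_statistic(cond_a, cond_b):
    """Welch's two-sample t-statistic (assumed variant; unequal variances allowed)."""
    var_a = statistics.variance(cond_a)  # sample variance of the replicates
    var_b = statistics.variance(cond_b)
    std_err = math.sqrt(var_a / len(cond_a) + var_b / len(cond_b))
    return (statistics.mean(cond_b) - statistics.mean(cond_a)) / std_err

# Hypothetical replicate intensities: (condition A, condition B) for each gene.
genes = {
    "gene_1": ([100.0, 130.0, 70.0], [400.0, 340.0, 460.0]),  # large change, noisy
    "gene_2": ([100.0, 101.0, 99.0], [110.0, 111.0, 109.0]),  # small change, tight
}

for name, (a, b) in genes.items():
    print(name, round(fold_change(a, b), 2), round(t_statistic(a, b), 2))
# gene_1 4.0 7.75
# gene_2 1.1 12.25
```

In this toy data, gene_2 changes by only 10% yet has the larger t-statistic, because its replicates happen to agree closely. With only a few replicates per condition, the variance estimate in the denominator is itself noisy, which is one way the two criteria can disagree.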
The controversy starts when different yardsticks are used to
identify a subset of "interesting" or differentially expressed
genes. For people like me, with training in analytical chemistry,
the obvious ranking criterion should be fold change, i.e., the
magnitude of the actual quantity being measured by microarrays.
However, a commonly employed ranking criterion is the t-statistic
(or its equivalent P value), i.e., the statistical
significance of fold change. An inconvenient truth is that the
apparent lack of microarray reproducibility reported by Tan et
al. was caused, at least in part, by the common practice of
applying stringent statistical criteria to select genes without
considering fold change.
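The effect of the ranking criterion can be sketched with a toy example. The code below selects the top-ranked genes at two hypothetical test sites, once by fold change and once by t-statistic, and reports the percentage of genes common to the two lists. All gene names and numbers are invented to show the mechanism and are not data from the paper; ranking by the absolute log2 fold change is an assumption made so that up- and down-regulation are treated symmetrically.

```python
import math

def top_k(scores, k):
    """Return the set of k gene names with the largest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def percent_overlap(list_a, list_b):
    """Percentage of genes common to two equally sized gene lists."""
    return 100.0 * len(list_a & list_b) / len(list_a)

def score_by_fold_change(site):
    # |log2 FC| treats a 4-fold increase and a 4-fold decrease alike.
    return {gene: abs(math.log2(fc)) for gene, (fc, _) in site.items()}

def score_by_t(site):
    return {gene: abs(t) for gene, (_, t) in site.items()}

# Invented (fold change, t-statistic) pairs for the same genes at two sites.
# Fold changes are similar across sites; t-statistics fluctuate more.
site_1 = {"g1": (4.0, 6.0), "g2": (3.5, 9.0), "g3": (0.3, -5.0),
          "g4": (1.1, 12.0), "g5": (1.2, 3.0), "g6": (0.9, -8.0)}
site_2 = {"g1": (3.8, 10.0), "g2": (3.6, 9.8), "g3": (0.28, -11.0),
          "g4": (1.1, 2.0), "g5": (1.2, 9.5), "g6": (0.95, -1.0)}

k = 3
fc_overlap = percent_overlap(top_k(score_by_fold_change(site_1), k),
                             top_k(score_by_fold_change(site_2), k))
t_overlap = percent_overlap(top_k(score_by_t(site_1), k),
                            top_k(score_by_t(site_2), k))
print(f"fold-change ranking: {fc_overlap:.0f}% overlap")
print(f"t-statistic ranking: {t_overlap:.0f}% overlap")
```

Because the invented fold changes are stable across the two sites while the invented t-statistics are not, the fold-change lists agree fully and the t-statistic lists share only one gene in three. The numbers are constructed to make the point visible; how large the gap is on real data depends on the data set.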
A general lesson we learned from this exercise is that, when
dealing with high-dimensional microarray data, we need to be mindful
of what is actually being measured. Statistical significance (e.g.,
the t-statistic) is intended to provide a confidence assessment of the
measured quantity (i.e., fold change). DNA microarray technology has
been criticized as irreproducible on the basis of stringent and
conservative statistical measures that are inherently
less reproducible than the actual quantity being measured. The same
lesson applies to the analysis of high-dimensional proteomics and
metabonomics data.
How did you become involved in this research, and were there obstacles along the way?
Microarray technology has been identified by the U.S. FDA’s
Critical Path Initiative [3]
as a key tool for advancing medical product development and
personalized medicine. Reproducibility is an immutable principle of
science, and the concerns raised in the literature demanded
investigation and response.
I had been an ardent "fan" of microarray technology for some ten
years, and my hands-on experience had convinced me that it was a
reliable technology in good hands with great potential. Reanalysis
quickly revealed the simple reasons for Tan et al.’s negative
results.
It proved much more challenging, even contentious, to show that a
spreading negative image of microarrays was largely an artifact of
low-quality data and a poor choice of statistical methods. It is
commonplace for scientists to apply standard statistical methods to
support validity and explain limitations of research. Less often are
the validity and limitations of the statistical methods themselves
given the same scrutiny.
The high dimensionality of microarray (and all omics) data
requires statistical approaches that are themselves still a subject
of research, and that are neither broadly understood by nor
available to the scientific community. The conclusions we reached
and the consequent value of the paper were not appreciated by
reviewers until it was submitted to BMC Bioinformatics.
Are there any social or political implications for your research?
Microarray technology has become ubiquitous and the excitement
about its prospects seems unprecedented. The benefits realized in
terms of hypotheses generated are already immense. Discovery of
biomarkers, faster and cheaper medical product development, and
personalized medicine are among the realistic goals that lie in the
future, provided the scientific community can develop and
disseminate standard methods and tools for reliable data analysis.
The appropriate way of identifying differentially expressed genes
remains a matter of disagreement (L Klebanov et al., Nat
Biotechnol 25: 25-26 and L Shi et al., Nat Biotechnol
25: 26-27, 2007), which I predict will continue unabated well into the
future. Disagreements among scientists should provide part of the
energy and process to move to consensus on the "best practices" for
the generation, analysis, and application of microarray data. This
is exactly the goal of the MAQC project that recently entered its
second phase.
Leming Shi, Ph.D.
Principal Investigator
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, Arkansas, USA
Disclaimer:
The views presented in this Commentary do
not necessarily reflect those of the U.S. Food and Drug Administration.
A Closer Look...
Below are images sent in by Leming Shi which correspond with the featured paper, or current research.
Figure 1: The
level of concordance of "interesting" genes
identified by different microarray platforms
largely depends on the selection of different
data analysis procedures. A: Poor cross-platform
concordance was reported by PK Tan et al.
(2003); B and C: Much higher cross-platform
concordance was observed by our reanalysis of
the same data set (L Shi et al., 2005).
Figure 2: The
reproducibility of fold changes between two test
sites using the same microarray platform is much
higher than that of the t-statistic P
values between the two test sites. Therefore,
the lack of reproducibility of P values
should not be used as evidence to criticize
microarray technology that tries to measure fold
changes (FC) instead of P values.
Figure 3: The
concordance of lists of "interesting"
(differentially expressed) genes depends on the
choice of gene selection methods and the
threshold of the selection criterion (L Shi
et al., 2006). The x-axis represents the
number of genes selected as differentially
expressed (corresponding to different
thresholds), and the y-axis is the percentage
(%) of genes common to the two gene lists
derived from two test sites. Concordance between
genes selected completely at random is shown in
red and reaches only 50% when all candidate
genes (about 10,000) are declared as
differentially expressed. Results of the popular
SAM method (pink line), although greatly
improved over those of simple t-test statistic
(purple line), approached, but did not exceed,
the level of concordance based on fold-change
ranking (green line).
Related Links:
[1] http://edkb.fda.gov/MAQC
[2] http://www.nature.com/nbt/focus/maqc
[3] http://www.fda.gov/oc/initiatives/criticalpath/