
New Hot Paper Comments

By Leming Shi

ESI Special Topics, March 2007
Citing URL - http://www.esi-topics.com/nhp/2007/march-07-LemingShi.html

A closer look at the work of Leming Shi
Leming Shi answers a few questions about this month's new hot paper in the field of Computer Science. The author has also sent along images of their work.


From March 2007

Field: Computer Science
Article Title: Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential
Authors: Shi, LM; Tong, WD; Fang, H; Scherf, U; Han, J; Puri, RK; Frueh, FW; Goodsaid, FM; Guo, L; Su, ZQ; Han, T; Fuscoe, JC; Xu, ZA; Patterson, TA; Hong, HX; Xie, Q; Perkins, RG; Chen, JJ; Casciano, DA
Journal: BMC BIOINFORMATICS
Volume: 6
Issue: Suppl. 2
Page: art. no. S12
Year: JUL 15 2005
* US FDA, Natl Ctr Toxicol Res, 3900 NCTR Rd, Jefferson, AR 72079 USA.
* US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA.
* Z Tech Corp, Jefferson, AR 72079 USA.
* US FDA, Ctr Devices & Radiol Hlth, Rockville, MD 20850 USA.
* US FDA, Ctr Biol Evaluat & Res, Bethesda, MD 20892 USA.
* US FDA, Ctr Drug Evaluat & Res, Bethesda, MD 20892 USA.

ST:  Why do you think your paper is highly cited?

The paper was highly cited because it examined the issue of microarray cross-platform comparability, which was being widely questioned at the time. Reported lack of cross-platform comparability was being attributed solely to platform differences.

Our work demonstrated that low-quality experiments and a poor choice of data analysis methods for selecting differentially expressed genes were primarily responsible for the apparent disagreement between microarray platforms. Our reanalysis led us to conclude that DNA microarray results were far more reproducible and reliable than the growing negative perception had implied.


Importantly, our findings showed a clear and pressing need to launch an ambitious, community-wide effort, the MicroArray Quality Control (MAQC) project [1]. The first phase of MAQC has been completed, and the results were published in Nature Biotechnology [2]. The aim of MAQC was to systematically address concerns about the reliability and performance of microarray technology, along with issues of standards, quality, and data analysis.

ST:  Does it describe a new discovery, methodology, or synthesis of knowledge?

The paper describes a reanalysis of the study by PK Tan et al. (Nucleic Acids Res. 31: 5676-84, 2003) that was cited in a high-profile article by E Marshall (Science 306: 630-1, 2004). The Tan and Marshall articles raised serious concerns about the disagreement of DNA microarray results obtained using different experimental platforms. Marshall’s article, in turn, apparently prompted additional papers that led to widespread concerns about the reliability of DNA microarray technology.

Our paper addresses a critical question about DNA microarray technology: Are microarray results reproducible and reliable? We found that the reported irreproducibility was due to poor data quality and an inappropriate data analysis approach, in contrast to the cross-platform incomparability concluded by Tan et al. We reanalyzed the data set using several different statistical approaches and demonstrated that selecting differentially expressed genes by simple t-test P values, while ignoring fold change (the magnitude of the difference in gene expression levels), could be a major source of irreproducibility of microarray results.

ST:  Could you summarize the significance of your paper in layman’s terms?

The DNA microarray is a highly parallel measurement technology through which the expression levels of tens of thousands of genes can be measured simultaneously in a single experiment. One powerful application of microarray technology is the identification of a subset of "interesting" genes whose expression levels differ between two biological conditions (e.g., normal vs. disease).

Usually, replicate samples from each condition are tested. Consequently, we can calculate an average expression change, i.e., fold change, for each gene between the two conditions. In addition, we can calculate a t-statistic (or the corresponding P value) to indicate the statistical significance of the measured fold change. Therefore, for a microarray platform with 30,000 genes, we have to deal with 30,000 fold changes and P values.
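As a rough illustration of the two per-gene quantities just described, the sketch below computes a fold change and a t-statistic from replicate measurements. It is not taken from the paper: the gene names, intensity values, and function names are hypothetical, the fold change is assumed to be a difference of mean log2 intensities, and the pooled-variance (equal-variance) form of the two-sample t-test is an assumption, since the commentary does not say which variant was used.

```python
import math
from statistics import mean, variance

def log2_fold_change(control, treated):
    """Mean log2 intensity of treated minus control (a log2 fold change)."""
    return mean(treated) - mean(control)

def t_statistic(control, treated):
    """Two-sample Student t-statistic, assuming equal variances (pooled)."""
    n1, n2 = len(control), len(treated)
    pooled_var = (
        (n1 - 1) * variance(control) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / std_err

# Hypothetical log2 intensities for two genes, three replicates per condition.
data = {
    "gene_a": ([8.0, 8.1, 7.9], [10.0, 10.3, 9.7]),      # large but noisy change
    "gene_b": ([8.00, 8.01, 7.99], [8.10, 8.11, 8.09]),  # tiny but very consistent change
}

# One (fold change, t-statistic) pair per gene.
results = {
    gene: (log2_fold_change(ctrl, trt), t_statistic(ctrl, trt))
    for gene, (ctrl, trt) in data.items()
}

for gene, (fc, t_value) in results.items():
    print(f"{gene}: log2 fold change = {fc:.2f}, t = {t_value:.2f}")
```

With these made-up numbers, gene_b receives the larger t-statistic even though its fold change is twenty times smaller than that of gene_a, so the two quantities need not rank genes in the same order.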

The controversy starts when different yardsticks are used to identify a subset of "interesting" or differentially expressed genes. For people like me, with training in analytical chemistry, the obvious ranking criterion should be fold change, i.e., the magnitude of the actual quantity being measured by microarrays. However, a commonly employed ranking criterion is the t-statistic (or its equivalent P value), i.e., the statistical significance of fold change. An inconvenient truth is that the apparent lack of microarray reproducibility reported by Tan et al. was caused, at least in part, by the common practice of applying stringent statistical criteria to select genes without considering fold change.
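To make the "different yardsticks" point concrete, here is a minimal sketch that ranks the same genes once by the magnitude of fold change and once by the magnitude of the t-statistic. The per-gene values and the helper name `top_by` are invented for illustration and do not come from the Tan et al. data set or from our reanalysis.

```python
# Hypothetical per-gene summaries: (gene, log2 fold change, t-statistic).
genes = [
    ("gene_a", 2.0, 4.1),
    ("gene_b", 0.1, 12.2),
    ("gene_c", -1.5, -3.8),
    ("gene_d", 0.2, 9.5),
    ("gene_e", 1.2, 2.9),
]

def top_by(records, column, k):
    """Return the names of the k genes with the largest absolute value in one column."""
    ranked = sorted(records, key=lambda rec: abs(rec[column]), reverse=True)
    return [rec[0] for rec in ranked[:k]]

top_fc = top_by(genes, 1, 2)  # ranked by magnitude of fold change
top_t = top_by(genes, 2, 2)   # ranked by magnitude of the t-statistic

print("Top 2 by fold change:", top_fc)
print("Top 2 by t-statistic:", top_t)
```

In this toy table the two criteria select entirely different genes: fold-change ranking picks gene_a and gene_c, while t-statistic ranking picks gene_b and gene_d, whose changes are small but highly consistent across replicates.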

A general lesson we learned from this exercise is that, when dealing with high-dimensional microarray data, we need to be mindful of what is actually being measured. Statistical significance (e.g., the t-statistic) is intended to provide a confidence assessment of the measured quantity (i.e., fold change). DNA microarray technology has been criticized as irreproducible on the basis of stringent and conservative statistical measures that are inherently less reproducible than the actual quantity being measured. The same lesson applies to the analysis of high-dimensional proteomics and metabonomics data.

ST:  How did you become involved in this research, and were there obstacles along the way?

Microarray technology has been identified by the U.S. FDA’s Critical Path Initiative [3] as a key tool for advancing medical product development and personalized medicine. Reproducibility is an immutable principle of science, and the concerns raised in the literature demanded investigation and response.

I had been an ardent "fan" of microarray technology for some ten years, and my hands-on experience had convinced me that it was a reliable technology in good hands with great potential. Reanalysis quickly revealed the simple reasons for Tan et al.’s negative results.

It proved much more challenging, even contentious, to show that a spreading negative image of microarrays was largely an artifact of low-quality data and a poor choice of statistical methods. It is commonplace for scientists to apply standard statistical methods to support the validity and explain the limitations of their research. Less often are the validity and limitations of the statistical methods themselves given the same scrutiny.

The high dimensionality of microarray (and all omics) data requires statistical approaches that are themselves still a subject of research, and that are neither broadly understood by nor available to the scientific community. The conclusions we reached and the consequent value of the paper were not appreciated by reviewers until it was submitted to BMC Bioinformatics.

ST:  Are there any social or political implications for your research?

Microarray technology has become ubiquitous and the excitement about its prospects seems unprecedented. The benefits realized in terms of hypotheses generated are already immense. Discovery of biomarkers, faster and cheaper medical product development, and personalized medicine are among the realistic goals that lie in the future, provided the scientific community can develop and disseminate standard methods and tools for reliable data analysis.

The appropriate way of identifying differentially expressed genes is a continuing disagreement (L Klebanov et al., Nat Biotechnol 25: 25-26 and L Shi et al., Nat Biotechnol 25: 26-27, 2007), which I predict will continue unabated well into the future. Disagreements among scientists should provide part of the energy and process to move to consensus on the "best practices" for the generation, analysis, and application of microarray data. This is exactly the goal of the MAQC project, which recently entered its second phase.

Leming Shi, Ph.D.
Principal Investigator
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, Arkansas, USA

Disclaimer: The views presented in this Commentary do not necessarily reflect those of the U.S. Food and Drug Administration.


A Closer Look...

Below are images sent in by Leming Shi that correspond to the featured paper or to current research.

Figure 1: The level of concordance of "interesting" genes identified by different microarray platforms largely depends on the selection of different data analysis procedures. A: Poor cross-platform concordance was reported by PK Tan et al. (2003); B and C: Much higher cross-platform concordance was observed by our reanalysis of the same data set (L Shi et al., 2005).


Figure 2: The reproducibility of fold changes between two test sites using the same microarray platform is much higher than that of the t-statistic P values between the two test sites. Therefore, the lack of reproducibility of P values should not be used as evidence to criticize microarray technology, which is designed to measure fold changes (FC), not P values.


Figure 3: The concordance of lists of "interesting" (differentially expressed) genes depends on the choice of gene selection methods and the threshold of the selection criterion (L Shi et al., 2006). The x-axis represents the number of genes selected as differentially expressed (corresponding to different thresholds), and the y-axis is the percentage (%) of genes common to the two gene lists derived from two test sites. Concordance between genes selected completely at random is shown in red and reaches only 50% when all candidate genes (about 10,000) are declared as differentially expressed. Results of the popular SAM method (pink line), although greatly improved over those of simple t-test statistic (purple line), approached, but did not exceed, the level of concordance based on fold-change ranking (green line).  
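The concordance plotted in Figure 3 can be sketched as the percentage of genes shared by the top-ranked entries of two gene lists, evaluated at increasing list sizes. The code below is an illustrative assumption rather than the MAQC implementation: the function names, gene labels, and rankings are hypothetical, and each entry pairs a gene with its direction of change so that a gene counts as shared only when both sites agree on direction (one plausible reason why randomly selected genes would level off near 50% rather than 100%).

```python
def percent_overlap(ranked_site1, ranked_site2, top_n):
    """Percentage of (gene, direction) entries shared by the top_n of two ranked lists."""
    shared = set(ranked_site1[:top_n]) & set(ranked_site2[:top_n])
    return 100.0 * len(shared) / top_n

def concordance_curve(ranked_site1, ranked_site2):
    """Overlap percentage for every list size from 1 up to the shorter list's length."""
    max_n = min(len(ranked_site1), len(ranked_site2))
    return [percent_overlap(ranked_site1, ranked_site2, n) for n in range(1, max_n + 1)]

# Hypothetical ranked gene lists from two test sites (most "interesting" first).
site1 = [("g1", "up"), ("g2", "down"), ("g3", "up"), ("g4", "up")]
site2 = [("g2", "down"), ("g1", "up"), ("g4", "down"), ("g3", "up")]

curve = concordance_curve(site1, site2)
for n, pct in enumerate(curve, start=1):
    print(f"top {n}: {pct:.1f}% in common")
```

In this toy example the two sites never agree on g4 because they report opposite directions for it, so the overlap stays at 75% even when all four genes are selected.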

  

Related Links:

  1. http://edkb.fda.gov/MAQC
  2. http://www.nature.com/nbt/focus/maqc
  3. http://www.fda.gov/oc/initiatives/criticalpath/
