By Sorin Draghici
ESI Special Topics,
October 2006
Citing URL - http://www.esi-topics.com/fbp/2006/october06-SorinDraghici.html
Sorin Draghici answers a few questions about this month's
fast-breaking paper in the field of Computer Science.
Field: Computer Science
Article Title: Ontological analysis of gene expression data: current tools, limitations, and open problems
Authors: Khatri, P; Draghici, S
Journal: BIOINFORMATICS
Volume: 21
Issue: 18
Page: 3587-3595
Date: September 15, 2005
* Wayne State Univ, Dept Comp Sci, 431 State Hall, Detroit, MI 48202 USA.
* Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA.
Why do you think your paper is highly cited?
I think the number of citations reflects the importance of
this area of research. High-throughput methods have become
ubiquitous in modern life sciences research. Our data-gathering
capabilities have greatly surpassed the available data analysis
techniques. The current challenge is to analyze these vast
amounts of data and translate them into biological knowledge. In
many gene or protein expression profiling experiments,
independently of the platform and the analysis methods used, the
result is a list of genes or proteins found to be differentially
expressed between two or more conditions under study. The
challenge faced by the researcher is to translate such lists
into a better understanding of the underlying biological
phenomena.
One approach to this is to translate the list of
differentially expressed genes or proteins into a functional
profile that identifies the biological processes, cellular
locations, and molecular functions which are significantly
different in the condition under study. However, for the long
gene lists typical of these experiments, such functional
profiling cannot practically be performed by hand.
The area of ontological analysis includes all computerized
techniques and methods developed to perform such profiling. In
recent years, as it has become apparent that this type of
analysis can be useful in most, if not all, high-throughput
experiments, many tools have been developed to perform this
task. As more researchers discover the need for and the
potential of this type of analysis, they might find this paper
very useful.
Does it describe a new discovery, methodology, or synthesis of knowledge?
The paper attempts to synthesize our current knowledge in
this area. This detailed analysis of the capabilities of these
tools will hopefully help researchers understand the most
important problems associated with this type of analysis: the
scope of the analysis, visualization capabilities, the
statistical model(s) used, correction for multiple comparisons,
the reference gene lists used, some installation issues, and the
sources of annotation data.
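To make one item on this list concrete, consider correction for multiple comparisons: a functional profile tests many categories at once, so some will look significant by chance alone. The minimal Python sketch below assumes the Benjamini-Hochberg false discovery rate procedure, which is one common choice; the interview does not say which correction any particular tool uses, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Assumption: this is one widely used false discovery rate correction,
    shown for illustration; other tools may use Bonferroni or other methods.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four categories.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

With these inputs the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02. A Bonferroni correction would instead multiply each p-value by the number of tests, which is simpler but more conservative.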
More importantly, although this type of analysis has been
generally adopted, it has several important intrinsic
drawbacks. These drawbacks affect all the tools discussed and
represent conceptual limitations of the current state of the art
in ontological analysis. We propose them as challenges for the
next generation of secondary data analysis tools.
Could you summarize the significance of your paper in layman's terms?
This paper examined ontological analysis, for the first time,
from a broader perspective, trying to unify existing efforts in
the field. We tried to identify the main ideas used in this type
of analysis, as well as the most common mistakes and the
problems that remain unsolved. This approach has become
immensely popular in recent years, with new tools being
published almost every month. However, most tools use the same
approach and only a handful of statistical models, which differ
little from one another.
In spite of its popularity and proven usefulness, this
approach is also severely limited in certain regards. It would
be more beneficial if developers of future tools tried to
expand the current ontological analysis approach by addressing
some of the limitations, rather than providing endless
variations of the same idea. If this paper could contribute to
this shift, it would be a significant contribution. Time will
tell. I am hopeful.
How did you become involved in this research, and were any problems encountered along the way?
Sometime in 2000, a colleague from a different department
approached us with this problem: given a list of differentially
expressed genes and a database of annotations using gene
ontology terms, could we build a software tool that would
automatically retrieve all the annotations for all genes and
then show the number of genes in each category? My master's
student at the time, Purvesh Khatri, managed to implement
something very quickly. We then thought this might be useful to
more people and we developed a more refined tool,
"Onto-Express" (OE), able to perform this analysis
over the web (Khatri et al., "Profiling Gene
Expression Utilizing Onto-Express," Genomics
79[2]:266-270, February 2002).
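The original task described above, retrieving the annotations for every gene in a list and counting the genes in each category, can be sketched in a few lines of Python. This is a minimal illustration, not Onto-Express itself: the dictionary-based annotation table, the function name `count_genes_per_category`, and the gene and category names are all hypothetical stand-ins for a real gene ontology annotation database.

```python
from collections import Counter


def count_genes_per_category(gene_list, annotations):
    """Count how many genes from gene_list are annotated with each category.

    annotations maps a gene identifier to a set of category terms
    (hypothetical stand-ins for gene ontology terms). Genes with no
    annotations are skipped.
    """
    counts = Counter()
    for gene in set(gene_list):  # ignore duplicate entries in the input list
        for term in annotations.get(gene, ()):
            counts[term] += 1
    return counts


# Hypothetical toy annotation table and list of differentially expressed genes.
annotations = {
    "GENE_A": {"apoptosis", "nucleus"},
    "GENE_B": {"apoptosis"},
    "GENE_C": {"cell cycle", "nucleus"},
}
differentially_expressed = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
print(count_genes_per_category(differentially_expressed, annotations))
```

Here "apoptosis" and "nucleus" each receive two genes and "cell cycle" one, while the unannotated GENE_D contributes nothing. Raw counts like these are only a starting point for interpretation.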
At the same time, it occurred to me that the mere number of
genes in each category is completely insufficient, because
different categories are represented to different extents among
the genes measured in each experiment. In many cases, the number
of genes can actually be misleading: a category that covers half
of the genes on an array would be expected to capture roughly
half of any randomly chosen list. Rather than looking at the
number of genes in each category, one should compare the
observed number of genes with what is expected in each category
just by chance.
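One common way to compare the observed count with the count expected by chance is the hypergeometric distribution (equivalent to a one-sided Fisher's exact test). The interview does not state which model(s) Onto-Express or other tools use, so the Python sketch below is only an illustration under that assumption; the function name `enrichment` and its parameter names are hypothetical. Here N is the number of genes in the reference set (for example, all genes on the array), K the number of those annotated with the category, n the number of differentially expressed genes, and k the number of differentially expressed genes in the category.

```python
from math import comb


def enrichment(k, n, K, N):
    """Compare observed and chance-expected gene counts for one category.

    k: differentially expressed genes annotated with the category (observed)
    n: total differentially expressed genes
    K: reference genes annotated with the category
    N: total reference genes
    Returns (expected count, one-sided hypergeometric p-value P(X >= k)).
    """
    expected = n * K / N
    # Sum the exact hypergeometric probabilities of seeing k or more category genes.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return expected, tail / comb(N, n)


# Hypothetical numbers for a 20-gene list drawn from a 10,000-gene array.
print(enrichment(10, 20, 5000, 10000))  # large category: about 10 genes expected anyway
print(enrichment(4, 20, 20, 10000))     # small category: 4 genes is far above chance
```

The larger category contains more of the selected genes, yet its p-value is above 0.5 because about 10 genes would land there by chance. The smaller category contains only 4 genes, but only about 0.04 were expected, so its p-value is very small.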
We then developed a statistical approach for this type of
analysis, which we published in a subsequent paper (Draghici et
al., "Global functional profiling of gene
expression," Genomics 81[2]:98-104, February 2003).
Since then, this analysis approach has become the de facto
standard in the second-stage analysis of microarray experiments.
Currently, while OE continues to have a devoted user base of
several thousand researchers worldwide, over 20 similar tools
are available from other groups.
Are there any social or political implications for your research?
Today, the life sciences are at the center of attention from many
points of view. Cutting-edge research, in everything from cancer
detection and treatment to chronic illnesses and age-related
conditions, is performed with high-throughput techniques. The
functional profiling approach discussed here is used in most
modern high-throughput experiments.
Sorin Draghici, Ph.D.
Director of the Bioinformatics Core, Karmanos Cancer Institute
and Associate Professor
Dept. of Computer Science
Wayne State University
Detroit, MI, USA