By Dr. Prakash Nadkarni
ESI Special Topics, June 2002
Citing URL - http://www.esi-topics.com/fbp/comments/june02-PrakashNadkarni.html
Dr. Prakash Nadkarni answers a few questions about this month's fast-breaking
paper in the field of Social Sciences.
Field: Social Sciences, general
Article Title:
"UMLS concept indexing for production databases: A feasibility study"
Authors: Nadkarni, P; Chen, R; Brandt, C
Journal: J AMER MED INFORM ASSOC
Volume: 8
Page: 80-91
Year: JAN-FEB 2001
* Yale Univ, Sch Med, Ctr Med Informat, POB 208009, New Haven, CT 06520 USA.
* Yale Univ, Sch Med, Ctr Med Informat, New Haven, CT 06520 USA.
Why do you think your paper is highly cited?
I guess the field of concept indexing and the UMLS is
currently a hot one. I don't rate this paper as one of my very
best, though the work was definitely fun to do and describe, and
it did have the side effect of putting a lot of previous work
(by other researchers) into proper perspective.
Does it describe a new discovery or new methodology that's useful to
others?
Based on an experiment, the paper discusses the possible
pitfalls that a researcher encounters when trying to match
phrases in electronic text to terms in a controlled vocabulary,
such as the National Library of Medicine's Unified Medical
Language System (UMLS). It does describe a computer
program that attempts this task.
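To make the matching task concrete, here is a minimal Python sketch of one common approach: greedy longest-match lookup of word sequences against a term list. It is not the program described in the paper. The vocabulary, the concept IDs (such as C_MI), and the function names are hypothetical, and matching against a real vocabulary such as the UMLS would likely also require normalizing word order, inflections, and punctuation.

```python
import re

# Hypothetical mini-vocabulary: normalized term -> concept ID.
# The IDs are invented for illustration; they are not real UMLS identifiers.
VOCAB = {
    "myocardial infarction": "C_MI",
    "heart attack": "C_MI",  # synonym mapped to the same concept
    "infarction": "C_INF",
    "chest pain": "C_CP",
}
MAX_TERM_WORDS = max(len(term.split()) for term in VOCAB)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_terms(text):
    """Greedy left-to-right longest match of word n-grams against VOCAB.

    Returns a list of (matched_phrase, concept_id) pairs in text order.
    """
    words = tokenize(text)
    matches = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then progressively shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in VOCAB:
                matches.append((phrase, VOCAB[phrase]))
                i += n  # skip past the words consumed by this match
                break
        else:
            i += 1  # no vocabulary term starts at this word; move on
    return matches
```

For example, `match_terms("Patient admitted with chest pain; rule out myocardial infarction.")` returns `[("chest pain", "C_CP"), ("myocardial infarction", "C_MI")]`. Preferring the longest match means "myocardial infarction" is recognized as one term instead of also yielding the shorter "infarction".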
Can you give us some background on this research?
Information Retrieval (IR) is the field of computer science
that is concerned with general methods of processing text so as
to facilitate its subsequent search. Large bibliographic
databases such as MedLine (and ISI's own Current
Contents) use this technology to allow users to search
these databases by keywords. In most cases, keywords (or search
terms) are the individual words that are part of the abstract
(or the full text of the article). The drawback of using
individual words is the problem of synonyms: a user must specify
all synonyms of a particular word to make sure the search does
not miss articles of interest.
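The synonym problem shows up even in a toy word-level inverted index. In this Python sketch (the documents and function names are made up for illustration), a search for "tumor" misses an abstract that uses only "neoplasm" unless the user lists both words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_word_index(docs):
    """Inverted index: map each word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_any(index, words):
    """OR query: return documents containing at least one of the given words."""
    result = set()
    for word in words:
        result |= index.get(word.lower(), set())
    return result


# Hypothetical toy abstracts.
DOCS = {
    1: "Tumor growth in murine models",
    2: "Neoplasm incidence after radiation exposure",
    3: "Cardiac output during exercise",
}
index = build_word_index(DOCS)
```

Here `search_any(index, ["tumor"])` returns `{1}`; only `search_any(index, ["tumor", "neoplasm"])` returns `{1, 2}`, because the burden of supplying synonyms falls on the user.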
Controlled vocabularies are really thesauri, which contain
CONCEPTS in a particular domain (e.g., medicine and the life
sciences) and their synonyms. If one is able to scan the text of
an article and match the text to concepts in the thesaurus, then
the IDs of the concepts can be used for indexing instead of
words. This way, a user who specifies a single keyword for
searching can have the thesaurus expand the query by matching
the keyword to a concept and then searching for all articles
indexed by that concept.
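A concept index can be sketched the same way, assuming a tiny hypothetical thesaurus whose concept IDs are invented rather than taken from the UMLS. Each document is indexed by concept ID, and a one-word query is expanded through the thesaurus, so "tumor" also retrieves the abstract that says "neoplasm". For brevity, the sketch matches single words only; multi-word terms would need phrase matching.

```python
import re
from collections import defaultdict

# Hypothetical thesaurus: concept ID -> synonymous terms (IDs are invented).
THESAURUS = {
    "C_TUMOR": ["tumor", "tumour", "neoplasm"],
    "C_HEART": ["heart", "cardiac"],
}

# Invert the thesaurus so each term points to the concept(s) it names.
TERM_TO_CONCEPTS = defaultdict(set)
for concept_id, terms in THESAURUS.items():
    for term in terms:
        TERM_TO_CONCEPTS[term].add(concept_id)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_concept_index(docs):
    """Index each document by the IDs of the concepts its words mention."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            for concept_id in TERM_TO_CONCEPTS.get(word, ()):
                index[concept_id].add(doc_id)
    return index


def concept_search(index, keyword):
    """Expand one keyword to its concept(s), then fetch documents indexed by them."""
    result = set()
    for concept_id in TERM_TO_CONCEPTS.get(keyword.lower(), ()):
        result |= index.get(concept_id, set())
    return result


# Hypothetical toy abstracts.
DOCS = {
    1: "Tumor growth in murine models",
    2: "Neoplasm incidence after radiation exposure",
    3: "Cardiac output during exercise",
}
index = build_concept_index(DOCS)
```

With this index, `concept_search(index, "tumor")` returns `{1, 2}` from a single keyword, and `concept_search(index, "Heart")` finds the "Cardiac" abstract.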
In practice, concept indexing is not
foolproof. One problem is polysemy: words that have multiple
meanings and can therefore match multiple concepts. For example,
"anesthesia" can be a procedure ancillary to surgery, or
a loss of sensation in part of the body (e.g., following a nerve
injury). Also, cryptic abbreviations, neologisms, and elisions,
all of which occur in dictated medical text, can foil the process
of concept recognition. Our work concluded that concept indexing
by itself could not substitute for traditional word indexing, but
could be ancillary to it.
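The following Python sketch, again with invented concept IDs and hypothetical function names, shows how polysemy leaks into results, and one possible way to keep concept indexing ancillary to word indexing: word matches are returned as the primary result, and documents found only through concepts are reported separately. Because "anesthesia" maps to both the surgical procedure and the loss of sensation, a query for "numbness" pulls in an abstract about general anesthesia during surgery, which is a false positive. This is an illustration, not the method used in the paper.

```python
import re

# Hypothetical term -> concept mapping (IDs invented for this sketch).
# "anesthesia" is deliberately polysemous: procedure vs. loss of sensation.
TERM_TO_CONCEPTS = {
    "anesthesia": {"C_ANES_PROCEDURE", "C_ANES_SENSORY_LOSS"},
    "numbness": {"C_ANES_SENSORY_LOSS"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ambiguous_terms(text):
    """Return the words in text that map to more than one concept."""
    return sorted({w for w in tokenize(text)
                   if len(TERM_TO_CONCEPTS.get(w, ())) > 1})


def hybrid_search(docs, keyword):
    """Return (word_hits, concept_only_hits) for a single-word query.

    Word matches are primary; concept matching only contributes extra
    candidates, kept separate so they can be reviewed or ranked lower.
    """
    keyword = keyword.lower()
    concepts = TERM_TO_CONCEPTS.get(keyword, set())
    word_hits, concept_hits = set(), set()
    for doc_id, text in docs.items():
        words = tokenize(text)
        if keyword in words:
            word_hits.add(doc_id)
        # A document matches by concept if any of its words shares a concept with the query.
        if any(TERM_TO_CONCEPTS.get(w, set()) & concepts for w in words):
            concept_hits.add(doc_id)
    return word_hits, concept_hits - word_hits


# Hypothetical toy abstracts.
DOCS = {
    1: "Numbness of the hand after ulnar nerve injury",
    2: "General anesthesia during hip surgery",
}
```

Here `hybrid_search(DOCS, "numbness")` returns `({1}, {2})`: document 2 arrives only through the ambiguous concept mapping, and `ambiguous_terms(DOCS[2])` flags `["anesthesia"]` as the cause.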
Could you summarize the significance of your paper in layman's terms?
If the kinks in concept indexing can be removed by very
sophisticated natural language processing (no guarantee that
this will happen), this technology could benefit all users of
bibliographic databases.
Prakash Nadkarni, MD
Associate Professor
Yale University School of Medicine,
New Haven, CT