Please tell us a little about your educational background and early research.
My first degree was in Biochemistry and Biological Chemistry,
from the University of Nottingham. I stayed in Nottingham to do a
Ph.D. in the Chemistry department, in Mark Searle's group. We were
interested in designing short peptides to fold into beta-sheet
structures, and mainly used NMR to determine our success.
As a side project, I got interested in mining protein 3D
structures bioinformatically, initially to get some idea of the
propensities of different residues in proteins to adopt different
conformations. At the end of my Ph.D., I thought I was probably
pretty well suited to bioinformatics, and got a job at the Sanger
Institute, working on the Pfam database of protein families (despite
knowing embarrassingly little about genomics, the Sanger Institute,
or bioinformatics!).
What drew you to this field of study?
It became obvious to us that we could apply the techniques that
Pfam used on proteins to families of structural RNA. We started the
Rfam database in 2002. MicroRNAs were only recently identified as a
major RNA class, and the fast-growing body of researchers determined
a need for a centrally (and sensibly) managed nomenclature for these
genes. We were happy to help out, without really realizing how
important this was to become for the microRNA field.
Your 2004 Nucleic Acids Research paper, "The microRNA
Registry," has been singled out as a highly cited recent paper on
gene silencing. Would you please sum up this paper and its
significance for the field?
The paper principally describes a service to name novel microRNA
genes. The primary aim is to ensure that different groups don't
inadvertently use the same names to describe different microRNA
sequences—the speed of growth of the field was and is so rapid
that this is a real possibility. I therefore confidentially deal out
names prior to publication of novel microRNAs.
The secondary aim is to make all microRNA sequence and annotation
data available after it is published. The microRNA Registry (now
rebranded the miRBase database) therefore became the central
microRNA database. I guess most citations for the paper are for the
"secondary" function as a microRNA data resource. This has
become more and more important as the quantity of data has
increased, and the combination of a curated nomenclature and a
sensible and clean database resource has really helped the microRNA
field to grow at the incredible rate that it has.
What is the process you use to amass a database such as the microRNA
Registry? What is involved in keeping the project current?
Much of the primary data is submitted to us, so we rely largely
on that. But of course, there is a considerable effort of curation:
reading papers, and incorporating new data types. Some of this is in
response to requests to be able to get data in a certain way, but
the aim is to drive what it's possible to do with the data.
If you are free to discuss them, please tell us about your current
projects.
I currently manage Rfam and miRBase, and have research interests
in the field of RNA computational biology. For example, I use these
resources to annotate homologs of RNA genes in whole genomes, and to
understand RNA gene evolution and function. In January 2007, I'm
moving to the University of Manchester to start a research group
there. I will continue to look after microRNA gene names, and will
remain heavily involved with the Rfam project.
Dr. Sam Griffiths-Jones
Faculty of Life Sciences
University of Manchester
Manchester, UK