RDoQ is an extended interface for using PubMed,
a vast digital library of the biomedical literature.
RDoQ allows creation of literature maps,
2-dimensional visual summaries of associations present in this literature.
RDoQ is a high-level interface, permitting people to ask shotgun-style queries
that look for associations between two sets of terms.
These terms can be any PubMed query,
and the two sets can be fairly large;
RDoQ can handle term sets that include several hundred queries.
Of course, as the number of terms grows, the visibility of the results
declines, so it should be thought of mainly as a tool for exploring
modest sets of terms.
For people familiar with bioinformatics,
it may help to think of RDoQ as a kind of
BLAST for PubMed,
performing simultaneous searches and reporting the results in a coherent way.
Overview of RDoQ presented at AMIA STB'09
A Quick Walkthrough of RDoQ
RDoQ is a program that acts as a front-end to PubMed.
The initial RDoQ query form looks like this:
As input RDoQ takes two sets of "terms"
and analyzes them in a table of associations.
Terms are text lines having one of two formats:
PubMed Query
concept phrase : PubMed Query
Any PubMed query is valid here, and the more sophisticated the better, since
the results are typically more precise.
Comments are also permitted; they can appear on any line,
and anywhere on the line.
Comments start with the # character.
Blank lines (or comment-only lines)
are also fine; they produce borders in the output association table.
An example could be:
# Project Direction
Bob Bilder: Bilder Robert [FAU] OR Bilder R [AU] # Director of CNP
Roberto Peccei: Peccei Roberto [FAU] OR Peccei R [AU]
Leonard Rome: Rome Leonard [FAU] OR Rome LH [AU]
Fred Sabb: Sabb Fred [FAU] OR Sabb FW [AU]
# Cognitive Neuroscience
Carrie Bearden: Bearden Carrie [FAU] OR Bearden CE [AU]
Bob Bilder: Bilder Robert [FAU] OR Bilder R [AU]
Ty Cannon: Cannon Tyrone [FAU] OR Cannon TD [AU]
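To make the term format concrete, here is a minimal sketch of how such a term set could be parsed. It is not RDoQ's actual parser: the function name parse_term_set is hypothetical, and the sketch assumes that # always starts a comment and that the first colon on a line separates the concept phrase from the query (a bare query that itself contains a colon would need extra care).

```python
from typing import List, Optional, Tuple

# A parsed term is (label, query); None marks a border (blank or comment-only line).
Term = Optional[Tuple[str, str]]


def parse_term_set(text: str) -> List[Term]:
    """Parse a term set written one term per line (hypothetical helper)."""
    terms: List[Term] = []
    for raw in text.splitlines():
        # Assumption: '#' always starts a comment, so drop it and what follows.
        line = raw.split("#", 1)[0].strip()
        if not line:
            terms.append(None)  # blank or comment-only line: border in the table
            continue
        # Assumption: the first ':' separates the concept phrase from the query.
        label, sep, query = line.partition(":")
        if sep and label.strip() and query.strip():
            terms.append((label.strip(), query.strip()))
        else:
            terms.append((line, line))  # bare query: it serves as its own label
    return terms


example = """# Project Direction
Bob Bilder: Bilder Robert [FAU] OR Bilder R [AU] # Director of CNP
Fred Sabb: Sabb Fred [FAU] OR Sabb FW [AU]

schizophrenia
"""
parsed = parse_term_set(example)
```

Here the comment-only line and the blank line each become a border marker, and the bare query on the last line is used as its own label.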
If we do not give a second set of terms, RDoQ assumes the two sets are the same,
and looks for associations between all terms in this set.
If you look carefully, you can see that the figure above actually selects
a predefined term set -- a list of people in the CNP
(Consortium for Neuropsychiatric Phenomics).
Using the predefined term set achieves the same effect as typing
all this into the web form.
It also shows us setting the relevance level to 5,
asking that only people who have 5 or more publications with another person
be included in the result.
(This kind of thresholding is useful — the screen is filled more densely with relevant information.)
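One plausible reading of the relevance level can be sketched as a filter over pairwise counts. This is an illustration only, not RDoQ's code: the function name apply_relevance_level is hypothetical, the counts are made up, and the sketch assumes a term is kept when it shares at least the given number of publications with some other term.

```python
from typing import Dict, List, Tuple


def apply_relevance_level(
    counts: Dict[Tuple[str, str], int], level: int
) -> List[str]:
    """Return terms sharing at least `level` publications with some other term."""
    kept = set()
    for (a, b), n in counts.items():
        if a != b and n >= level:  # ignore a term's association with itself
            kept.update((a, b))
    return sorted(kept)


# Made-up counts, for illustration only.
counts = {("A", "B"): 7, ("A", "C"): 2, ("B", "C"): 4, ("A", "A"): 50}
relevant = apply_relevance_level(counts, 5)  # C drops out: its best count is 4
```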
If we click on submit, we get a result page that has results from PubMed
for everyone in the CNP list:
Scrolling down further, we can explore the resulting table:
We can mouse over table entries and get a breakdown of the
corresponding co-occurrences of publications:
Also, clicking on the table entry for Bob Bilder and Arthur Toga
that is highlighted above
produces a PubMed summary of these publications:
By moving the sliders on the left, we can change the size of the display —
and for example go back in time to see what the
associations in PubMed between these people looked like at the end of 2000:
At the very bottom of the page is a Revise input button:
Clicking on this button produces the RDoQ query form, permitting us to revise our query:
By choosing one of the predefined lists of genes for our second set of terms
we can explore associations between people and genes:
This produces a table of associations between people and genes in the literature:
This is just a quick sketch! Have fun!
A 2-Minute Tour: Some natural uses for RDoQ
Learning about People
First, RDoQ lets us learn things about people at CNP.
Here, we find associations between the predefined set of
people at CNP (grouped by research field)
and itself (so, associations between people at CNP).
The output, much like what was shown earlier, highlights two surprising
patterns of research interests:
There are essentially two large clusters of people,
ranging over technical neuroscientific and informatics-related fields at one end,
and over genetic and psychiatric fields at the other.
There are a few key interdisciplinary members that span both clusters.
To make the output easier to digest,
this query asked only for associations between investigators
who had at least 10 publications with another investigator here.
By varying the query we could drill down and better understand
how people interact within the CNP.
Who Works on What
Who knows about the gene I am interested in?
Questions like this can be answered quickly with RDoQ,
simply by finding associations between a set of people
and the predefined set of
genes that may be relevant to projects at CNP
available in RDoQ:
RDoQ also includes a list of
faculty in the UCLA Neuroscience Interdisciplinary program.
This can make it easy for RDoQ to find interdisciplinary research connections
between other people at UCLA and
Automatically generating Reviews of the Literature
Meta-analyses are formal systematic reviews that aggregate individual statistical results
into results with greater significance.
These systematic reviews can be better sources of evidence than the open literature,
so in an initial study it can help to see what is available in these reviews.
PubMed has features for specific retrieval of meta-analyses,
and RDoQ can take advantage of this.
RDoQ provides the ability to set a context on any search.
If we set the context to "Meta-Analyses",
limiting all retrievals from PubMed to be of this kind,
then we can get a detailed summary of available reviews
for all terms of interest to us.
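A context can be thought of as an extra filter ANDed onto every query. The sketch below illustrates that idea and is not RDoQ's implementation: the helper with_context is hypothetical, and it assumes the "Meta-Analyses" context corresponds to PubMed's publication-type filter Meta-Analysis[pt].

```python
def with_context(query: str, context: str) -> str:
    """AND a context filter onto a PubMed query, parenthesizing both parts."""
    return f"({query}) AND ({context})"


# Assumption: the "Meta-Analyses" context maps to this publication-type filter.
META_ANALYSIS = "Meta-Analysis[pt]"

restricted = with_context("schizophrenia OR psychosis", META_ANALYSIS)
# restricted is "(schizophrenia OR psychosis) AND (Meta-Analysis[pt])"
```

Parenthesizing the original query keeps an OR inside it from being regrouped by the added AND.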
For example, we can do this for the relationships between established
Another feature RDoQ provides is the ability to download any table like this as either
a spreadsheet or a graph in VUE format.
VUE is a graph editor and information visualization tool; it can be either downloaded or run as an applet in a new browser window.
In the downloaded graph, all links are active and can be used to retrieve PubMed records.
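For the spreadsheet side, a table of associations maps naturally onto rows and columns. The sketch below assumes a plain CSV layout, which may differ from the file RDoQ actually produces; the function table_to_csv and the sample labels and counts are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def table_to_csv(
    rows: List[str], cols: List[str], counts: Dict[Tuple[str, str], int]
) -> str:
    """Render an association table as CSV text; missing pairs count as 0."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + cols)  # header row; top-left corner cell left empty
    for r in rows:
        writer.writerow([r] + [counts.get((r, c), 0) for c in cols])
    return buf.getvalue()


# Made-up labels and counts, for illustration only.
csv_text = table_to_csv(["A", "B"], ["X", "Y"], {("A", "X"): 3, ("B", "Y"): 1})
```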
RDoQ can handle not just sets of terms, but hierarchies of terms.
This permits it to analyze vocabularies, lexicons, taxonomies (is-a hierarchies),
part-of hierarchies, and even some kinds of ontologies.
For example, one of the predefined sets of terms is a
hierarchy of 330 anatomical regions in the human brain.
RDoQ also includes many hierarchies from MeSH,
the Medical Subject Headings "ontology" of keywords used by PubMed.
For example, if we use the
MeSH hierarchy of neurotransmitter receptor terms
we can find a comprehensive picture of associations in the literature between
neuroanatomical regions and neurotransmitter receptors.
RDoQ can adjust the viewing scale also, so we can get two images —
an overall picture (from 30,000 feet, in 1-point font), and an up-close picture
(from 3 feet, with 14-point font),
that show associations between regions and receptors:
Sophisticated use: Knowing how to use PubMed and MeSH
To use RDoQ, it helps to be familiar with querying PubMed.
There is an extensive
online tutorial/reference/help system for PubMed,
whose page has links to information about important features.
This knowledge is important; being aware of how PubMed
interprets queries permits you to find associations (and relevant publications)
that otherwise are likely to be missed.
It also helps to know MeSH,
the Medical Subject Headings "ontology" used by PubMed.
MeSH is a kind of "keyword" index, or set of "indexing terms", or "topics" —
as articles are entered into PubMed they are tagged by curators as
being relevant to certain MeSH terms, so that one can search articles by topic.
How RDoQ Works
RDoQ asks PubMed for documents matching
the queries in your sets of terms,
and then analyzes the extent to which these sets intersect.
The size of the intersection is used as a measure of association.
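The intersection measure described above can be sketched in a few lines. This is an illustration rather than RDoQ's code: the function association_table is hypothetical, and the PubMed IDs and labels are made up.

```python
from typing import Dict, Set, Tuple


def association_table(
    rows: Dict[str, Set[str]], cols: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], int]:
    """Map each (row term, column term) pair to the number of shared PubMed IDs."""
    return {
        (r, c): len(r_ids & c_ids)  # size of the set intersection
        for r, r_ids in rows.items()
        for c, c_ids in cols.items()
    }


# Made-up PubMed IDs, for illustration only.
people = {"person 1": {"101", "102", "103"}, "person 2": {"103", "104"}}
genes = {"gene X": {"102", "103"}, "gene Y": {"999"}}

table = association_table(people, genes)
self_table = association_table(people, people)  # one set compared with itself
```

Passing the same set twice corresponds to the case where no second set of terms is given.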
PubMed requires automated programs like RDoQ to make queries at least 3 seconds apart.
This is significant, since if you have 100 terms in your vocabulary,
to process it RDoQ needs at least 100*3 seconds = 5 minutes.
As a result, the process of running the program on a large vocabulary
is slow the first time, as the cache is getting initialized with
the PubMed results for each term in the vocabulary (every 3 seconds).
Once these results are in the cache, however, PubMed does not need
to be queried again, and execution becomes much faster.
In other words, any subsequent RDoQ analyses of these terms should be fast.
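The combination of a minimum delay between queries and a cache can be sketched as follows. This is not RDoQ's implementation: the class CachedFetcher is hypothetical, the cache is a simple in-memory dictionary (RDoQ stores cache files), and the fetch function is passed in so that the sketch does not assume any particular PubMed API call.

```python
import time
from typing import Callable, Dict, List, Optional


class CachedFetcher:
    """Fetch PubMed IDs per query, waiting between uncached requests."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # hypothetical: query -> PubMed IDs
        min_interval: float = 3.0,          # seconds between real requests
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, List[str]] = {}
        self._last_call: Optional[float] = None

    def get(self, query: str) -> List[str]:
        if query in self._cache:
            return self._cache[query]  # cached: no request, no waiting
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)  # keep requests at least min_interval apart
        self._last_call = self._clock()
        result = self._fetch(query)
        self._cache[query] = result
        return result


# Demo with a stand-in fetch function (no network access, no real waiting).
fetcher = CachedFetcher(lambda q: ["101", "102"], min_interval=0.0)
first = fetcher.get("term A")
second = fetcher.get("term A")  # served from the cache
```

Only uncached queries pay the delay, which is why a second analysis of the same terms is fast.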
The PubMed cache files can get rather large. If a query
gets 200,000 hits, and each hit is a 10-character identifier (PubMed ID),
then 200,000 * 10 = 2MBytes of identifiers are
downloaded from PubMed and stored in a local cache.
If there were 1000 queries like this one,
this would require downloading 1000 * 2MBytes = 2GBytes.
Imposing this kind of load on PubMed should be avoided;
if it cannot be avoided, it should be done on weekends or during off-hours.
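The back-of-the-envelope estimates above are easy to reproduce for your own term sets. The helpers below are hypothetical and simply restate the arithmetic in the text: 3 seconds per query and 10 bytes per PubMed ID.

```python
def first_run_seconds(n_terms: int, seconds_between_queries: float = 3.0) -> float:
    """Lower bound on the time for a first (uncached) run over n_terms queries."""
    return n_terms * seconds_between_queries


def cache_bytes(n_queries: int, hits_per_query: int, bytes_per_id: int = 10) -> int:
    """Approximate size of the identifiers downloaded and cached."""
    return n_queries * hits_per_query * bytes_per_id


minutes_for_100_terms = first_run_seconds(100) / 60   # 5.0 minutes
one_large_query = cache_bytes(1, 200_000)             # 2,000,000 bytes (2 MB)
thousand_large_queries = cache_bytes(1000, 200_000)   # 2,000,000,000 bytes (2 GB)
```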
The size of the RDoQ output (.html) files can be large.
Expect file sizes of 5MB – 25MB.
So: be thoughtful in your choice of queries, and avoid loading down
both RDoQ and PubMed with work that isn't really useful.
Do unto PubMed as you would have it do unto you.
To print in color (even to PDF), your browser must be set properly:
|Firefox||Page Setup → Print Background (Colors & Images)|
|Safari||Print; then, in print options: Safari → Print backgrounds|
|IE||Tools → Internet Options → Advanced → Printing → Print Background Colors.|
Copyright © 2007–2015 D.S. Parker
All Rights Reserved.