Generate a Corpus of all available documents at NCBI matching a given Query

Query that selects articles to include in the Corpus

Specify an NCBI query (such as a PubMed Central query) to select the documents to include in the corpus:

Note: PubMed Central is essentially a free full-text subset of PubMed, and it also supports full-text search.
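
For readers who want to try a query outside this form, the sketch below builds a search URL for NCBI's public E-utilities ESearch endpoint. It is only an illustration and not necessarily how this tool submits queries; the function name `build_esearch_url`, the example query, and the default `retmax` value are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, db="pmc", retmax=20):
    """Return an ESearch URL for `query` against the NCBI database `db`.

    Hypothetical helper for illustration; it only builds the URL and
    does not contact NCBI.
    """
    params = {"db": db, "term": query, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example: a simple Boolean query against PubMed Central.
url = build_esearch_url("breast cancer AND BRCA1")
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record IDs, which is a quick way to check how many documents a query selects before building a corpus from it.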

Option: Corpus format

Specify the archive format for the corpus of articles (XML files):

.tar.gz (gzipped tar archive of XML documents)
.zip (zip archive of XML documents)
.xml.gz (gzipped concatenation of XML documents)
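
To make the difference between the three formats concrete, here is a minimal sketch that packs the same set of XML documents each way using only the Python standard library. The function names and the tiny sample documents are assumptions for illustration; the archives this tool produces may differ in file naming and layout.

```python
import gzip
import io
import tarfile
import zipfile


def to_tar_gz(docs):
    """Pack {filename: xml_text} into a gzipped tar archive (bytes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, xml in docs.items():
            data = xml.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def to_zip(docs):
    """Pack {filename: xml_text} into a zip archive (bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in docs.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def to_xml_gz(docs):
    """Concatenate all documents into one stream and gzip it (bytes)."""
    joined = "\n".join(docs.values()).encode("utf-8")
    return gzip.compress(joined)


# Hypothetical sample documents.
docs = {"a.xml": "<article>A</article>", "b.xml": "<article>B</article>"}
tar_bytes = to_tar_gz(docs)
zip_bytes = to_zip(docs)
xml_gz_bytes = to_xml_gz(docs)
```

The first two formats keep one file per article, so individual documents can be extracted by name. The .xml.gz format is a single stream, so a consumer has to split it back into documents itself.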

Option: NCBI Database from which to extract the Corpus

NCBI has a large set of searchable databases.
PMC (PubMed Central) full-text articles is the default and preferred choice, since it runs quickly against locally cached documents. The other selections run a delayed sequence of small batch searches against the NCBI databases.
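
The "delayed sequence of small batch searches" can be pictured with the sketch below: record IDs are split into small groups, and a pause is inserted between requests so the remote service is not overloaded. This is an assumed illustration, not this tool's implementation; `fetch` is a hypothetical caller-supplied function, and the batch size and delay are placeholder values.

```python
import time


def batches(ids, size):
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(ids, fetch, batch_size=100, delay=0.5):
    """Call `fetch(batch)` for each batch, sleeping `delay` seconds between calls.

    `fetch` is a hypothetical callable that takes a list of IDs and
    returns a list of documents for them.
    """
    results = []
    for i, batch in enumerate(batches(ids, batch_size)):
        if i > 0:
            time.sleep(delay)  # pause between requests, not before the first
        results.extend(fetch(batch))
    return results


# Usage with a stand-in fetch function that makes no network calls.
fake_docs = fetch_in_batches(
    ["1", "2", "3", "4", "5"],
    fetch=lambda batch: ["<doc id='%s'/>" % i for i in batch],
    batch_size=2,
    delay=0,
)
```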

The corpus can be downloaded after conversion completes, usually within a few minutes.

PubMed Central Online help: searching PubMed Central
PubMed Central FAQ
Sharpening queries with PubMed search field tags
Sharpening queries with MeSH: MeSH Browser, MeSH DB


UCLA Hypothesis Web Project

Copyright © 2008-2010    All Rights Reserved.