Reprinted with permission. Please cite as:
Greenhouse, S. (2000) The future of database indexing. Key Words (bulletin of the American Society of Indexers) 8(4):125-126, 132.
The Future of Database Indexing
by Shelley Greenhouse
I went through an identity crisis a few years ago, right after I discovered that what I had been doing was called indexing. Being a proper indexer, I wanted to know what kind I was. I certainly wasn’t a back-of-the-book indexer, and I didn’t index journal issues, or newspapers, or genealogies. Then I heard the term “database indexer.” I interpreted database indexing as indexing material for inclusion in searchable databases, rather than working directly with database software to design queries or construct tables. I could also call myself a periodical indexer, but I think of periodical indexing more as producing annual or semiannual indexes for journals. No matter what label I assign myself, whenever I identify myself as an indexer, I get questions about what I do and how I do it.
What do I do?
I work for database producers as an outsourced, freelance indexer. I index journal articles, newspaper articles, books (whole books [metatopic] or individual chapters), pamphlets, manuals, etc. for inclusion in searchable databases compiled and maintained by the publishers. My education is in biology, physiology, and neuroscience, so I concentrate on medical and scientific articles. I read or scan the entire article or abstract and attempt to describe the content in ten words or fewer. Publishers (or database producers) send me the material as published journal issues, photocopies of published articles, or abstracts and their bibliographic information in a text-only (ASCII) file. I apply index terms from controlled vocabularies (thesauri) specific to each publisher. I either handwrite descriptors on data forms or enter them into the ASCII file after a specified field name. A typical record might look like this:
AN  accession number of the article, for tracking purposes
AU  author names
TI  article title
SO  source information (bibliographic information)
PT  publication type (journal article, letter, abstract)
AF  author affiliation (where the author was when he did what he did)
IS  ISBN or ISSN
YR  year of publication
LG  language(s) of publication
FS  grant info
CG  grant info
NT  other pertinent info
AB  abstract
LT  important, thesaurus-only descriptors
UC  non-thesaurus terms, for searching only
ID  less important, thesaurus-only descriptors
CY  copyright info, if necessary
DA  date I created the record
AD  address for correspondence or reprint requests
Depending on the contract, some fields may already be filled in, and others may not be required. There are strict guidelines on how to complete a record, including the spacing after the field names (those two-letter abbreviations) and how to handle wrapped lines within a single field.
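The exact spacing and wrapping rules vary by publisher and are not spelled out here, so the Python sketch below simply assumes one plausible layout: a two-letter field code, two spaces, then the value, with wrapped lines indented to line up under the start of the value. The constants (`GAP`, `LINE_WIDTH`) and function names are hypothetical; the point is only that a rule like this can be applied mechanically and consistently to a text-only file.

```python
import textwrap

# Hypothetical layout rules (the real ones differ by publisher and are not
# given in the article): a two-letter field code, two spaces, then the value;
# wrapped lines are indented so they line up under the start of the value.
GAP = "  "          # assumed spacing after the field code
LINE_WIDTH = 72     # assumed maximum line length, indent included


def format_field(code: str, value: str) -> str:
    """Format one field, wrapping a long value onto indented continuation lines."""
    if len(code) != 2 or not code.isalpha() or not code.isupper():
        raise ValueError(f"field code must be two capital letters: {code!r}")
    if not value.strip():
        return code  # empty field: keep the code so it can be filled in later
    prefix = code + GAP
    return textwrap.fill(
        value,
        width=LINE_WIDTH,
        initial_indent=prefix,
        subsequent_indent=" " * len(prefix),
    )


def format_record(fields: list[tuple[str, str]]) -> str:
    """Join the formatted fields of one record, in the order given."""
    return "\n".join(format_field(code, value) for code, value in fields)


if __name__ == "__main__":
    record = [
        ("AU", "Greenhouse S"),
        ("TI", "The future of database indexing"),
        ("LT", ""),
    ]
    print(format_record(record))
```

Keeping an empty field as a bare code mirrors a template in which some fields are left for the indexer to fill in.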
How do I do it?
I don’t use dedicated indexing software. I can use any word processing program that can save my work as a text-only file, provided my husband allows it on our computer. Two of my contracts send me a database “dump” of records similar to the record shown above.
For the first of these, I never see the original article anymore. Bibliographic data and abstracts are already scanned in, and field names/codes are already provided. All I do is look at the title and read the abstract (if one is provided in the electronic file), then type in descriptors and common names after the field names/codes. I use a thesaurus, but the database also accepts non-thesaurus terms as major descriptors. I can enter identifier terms as well as any taxonomic terms (if the abstract provides them). I also provide subject classification codes that designate which database the article belongs in and the specific subject grouping within that database. I get paid per article, based on the number of databases the article is designated for. I can't make corrections to material already present in the file, but I can alert my editors by including a note in a special field. I receive and return the file via e-mail.
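Working from a dump like this comes down to splitting the file into records and fields, filling in the empty descriptor fields, and leaving existing data alone. The Python sketch below assumes a layout the article does not specify (blank lines between records, a two-letter code at the start of each field, indented continuation lines) and a semicolon as the term separator; `parse_dump` and `add_descriptors` are hypothetical names. Refusing to overwrite a filled field mirrors the rule that corrections go into a note rather than into the existing data.

```python
import re

# Hypothetical dump layout (an assumption; the article does not give one):
# records are separated by blank lines, each field starts with a two-letter
# capitalized code, and an indented line continues the previous field.
FIELD_LINE = re.compile(r"^([A-Z]{2})(?:\s+(.*))?$")


def parse_dump(text: str) -> list[dict[str, str]]:
    """Split a text-only dump into records, each a dict of field code -> value.

    Simplification: if a code repeats within a record, the last one wins.
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_code = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_code = {}, None
        elif line[0].isspace() and last_code is not None:
            # Continuation of a wrapped field: rejoin with a single space.
            current[last_code] = (current[last_code] + " " + line.strip()).strip()
        else:
            match = FIELD_LINE.match(line.rstrip())
            if match is None:
                raise ValueError(f"unrecognized line: {line!r}")
            last_code = match.group(1)
            current[last_code] = (match.group(2) or "").strip()
    if current:
        records.append(current)
    return records


def add_descriptors(record: dict[str, str], code: str, terms: list[str]) -> None:
    """Fill an empty field with terms, joined with an assumed '; ' separator.

    Existing data is never overwritten; a correction would go in a note field.
    """
    if record.get(code):
        raise ValueError(f"{code} already has data; leave a note instead")
    record[code] = "; ".join(terms)
```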
The second contract sends shipments of journals, books, or articles, as well as a database template on disk as a text-only file. The template contains titles, authors, and source information (journal name, volume/issue, and pagination). I proofread the information already scanned in and make any necessary corrections. I type in an abstract, either copying the one provided or writing one of 100 to 250 words. I have to key in field codes and data for addresses, correspondence/reprint addresses, grant providers and numbers, descriptors (major, minor, and identifiers), and a subject classification keyword. The thesaurus is cumbersome and unintuitive: it is primarily a numerical hierarchy of terms, only a small percentage of which are allowed, and narrower terms turn up in a rather random fashion (at least for me). I can add non-thesaurus identifier terms. This publisher also pays per article, but more than my other contracts do, because I write abstracts and do much of the data input. They do not reimburse shipping costs.
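To make the complaint about a numerical hierarchy concrete, here is a small Python sketch built on an invented miniature thesaurus (the codes, labels, and “allowed” flags are made up for illustration and are not this publisher’s). Narrower terms are found by code prefix. Because an allowed term can sit beneath a term that is not allowed, browsing only allowed terms one level at a time can miss it entirely, which is one way narrower terms end up being discovered at random.

```python
# An invented miniature thesaurus, for illustration only: each term has a
# dotted numerical code, and only some terms are allowed as descriptors.
THESAURUS: dict[str, tuple[str, bool]] = {
    "10": ("Nervous system", False),
    "10.20": ("Brain", True),
    "10.20.05": ("Cerebral cortex", True),
    "10.20.07": ("Hippocampus", True),
    "10.30": ("Spinal cord", False),
    "10.30.02": ("Spinal cord injuries", True),
}


def narrower(code: str, allowed_only: bool = True) -> list[str]:
    """Return the immediate narrower terms of `code` (one level down)."""
    depth = code.count(".") + 1
    result = []
    for child, (label, allowed) in sorted(THESAURUS.items()):
        if child.startswith(code + ".") and child.count(".") == depth:
            if allowed or not allowed_only:
                result.append(label)
    return result


def allowed_below(code: str) -> list[str]:
    """Return every allowed term at any depth below `code`."""
    return [
        label
        for child, (label, allowed) in sorted(THESAURUS.items())
        if child.startswith(code + ".") and allowed
    ]


if __name__ == "__main__":
    # Browsing allowed terms one level at a time skips "Spinal cord",
    # so "Spinal cord injuries" never appears from the top level.
    print(narrower("10"))        # ['Brain']
    print(allowed_below("10"))
```

Searching the whole subtree, as `allowed_below` does, is the mechanical fix; a printed thesaurus rarely offers that view.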
A third contract also sends shipments of journals, books, or articles (the UPS man and I are old friends). I read or scan the entire article and index it in depth using National Library of Medicine (NLM) standards and the MeSH (Medical Subject Headings) thesaurus. I write descriptors on data forms that someone else types in; the bibliographic data and abstracts are also entered by someone else. I get paid per article, and they send me a prepaid label for returning the shipment.
Good internet links for further reading:
http://www.slais.ubc.ca/courses/arstlibr512/winter2000/database1.htm
School of Library, Archival, and Information Studies, The University of British Columbia
Their page on database indexing contains links to articles and information on constructing and using thesauri; a comprehensive set of links to online thesauri, term lists, and schemes; and links to articles on database indexing guidelines and discussions.
http://bioweb.pasteur.fr/docs/seqio/idxseq_doc.html
Site maintained by the Pasteur Institute, a private, nonprofit organization devoted to fighting infectious diseases. This link is to “idxseq – A Database Indexing Program,” a report that describes a software program for indexing databases.
http://www.public.iastate.edu/~CYBERSTACKS/Morning.htm
This article, “Morning becomes electric: Post-modern scholarly information access, organization and navigation,” includes a section on indexing and abstracting services with links to database producers in biology, engineering, geology, mathematics, medicine, and citation services.
How did I learn to do it?
Training for each publisher's database, if there is any, takes place at that publisher's office. Each publisher has different rules for what to pick up from an article, which thesaurus terms match which text terms, and how to index specifically for that publisher. The exception is MeSH indexing. The National Library of Medicine runs a two-week, full-time training class followed by a six-week on-site internship for MeSH indexing, but to take it you have to be sponsored by one of NLM's contractors. You can find out who they are by contacting NLM. Indexing Specialties: Medicine (L. Pilar Wyman, ed.) includes a chapter by Helen Ochej, reprinted from her article in Key Words, Sept/Oct 1998, vol. 6(5). It contains the most accurate description of MeSH indexing at the National Library of Medicine that I have seen. (PubMed, http://www.ncbi.nlm.nih.gov/PubMed/, is a good example of where the kind of indexing I do winds up.)
Where do I find work?
With few exceptions, I think, I found my jobs either through postings on Index-L or by networking with other indexers. Each publisher I work with has in-house indexers as well as freelancers, and I've seen ads seeking only in-house folks. I've applied for some real jobs as a full-time, in-house database indexer (from ads posted to Index-L) but have always decided against going back to work (my main objections are having to wear real clothes and not being able to set my own hours).
Check the Index-L archives:
http://www.indexpup.com/index-list/index.html (before April 1999 and after April 2000)
http://listserv.binghamton.edu/archives/index-l.html (between April 1999 and March 2000)
There's a lot of great stuff in there, including a long discussion on how to find database work. I also periodically check the want ads in the Washington Post, which are searchable on the Web at http://www.washingtonpost.com/wl/jobs/home?nav=left (updated URL added 1/7/04).
There are a few places on the web where you can find databases listed. Go to one of these sites, find the databases whose subject areas match your areas of expertise, and search through the information on each database to find its publisher. Once you find a publisher, check whether they produce print journals containing similar information. Print journals will give you the names of editors to write to (one of my contracts is really a secondary publisher that started out printing abstract journals and moved into web-based databases when subscriptions to the print journal started falling off). I discovered the hard way that Human Resources departments don't know what “outsource,” “freelance,” or “work-at-home” means.
I’ve included a few good places to start looking for databases in a
particular field.
http://www.mclennan.library.mcgill.ca/cdroms/databases.htm
McGill Libraries Database list
http://library.dialog.com/bluesheets/html/bl0230.html
The Gale Directory of Online, Portable, and Internet Databases
http://www.fcla.edu/LUISinfo/WLdoc/about_fs.html
FirstSearch Databases Index and descriptions
http://www.austlii.edu.au/links/World
Australasian Legal Information Institute databases
http://www.epnet.com/database.html
EBSCO Information Services provides a list of the databases they publish, along with a brief description of each. The site also includes information on employment opportunities.
How long will I be able to do this?
In terms of hours per day, right now I try to work no more
than four hours. I can earn a nice
supplemental income this way.
In terms of years into the future, I find myself awake at 3 a.m. worrying about this. With the advent of full-text searching, electronic journals published on the Web, and all the folks out there developing automatic indexing software, I am concerned about how sustainable this work is. Glenda Browne’s article “Automated Indexing” (http://www.aussi.org/conferences/papers/browneg.htm) kept me up for weeks. I am heartened by her assertion that, even if computer database indexing becomes the norm, human quality control will still be necessary. Automatic indexing tools may be able to tell that a thesaurus term appears 64 times in an article, which may trigger the inclusion of the term in the descriptor list, but only a human will be able to tell whether the term reflects the “aboutness” of the article or merely the verbosity of the author.
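The frequency trigger described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not how any particular automatic indexing product works: it counts whole-word, case-insensitive matches of each thesaurus term and proposes any term that reaches an arbitrary threshold (`min_count` is a hypothetical parameter). Nothing in it can tell whether a frequent term reflects what the article is about or just the author’s habits, which is exactly the gap human review fills.

```python
import re
from collections import Counter


def count_terms(text: str, thesaurus_terms: list[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each thesaurus term."""
    lowered = text.lower()
    counts: Counter = Counter()
    for term in thesaurus_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def suggest_descriptors(text: str, thesaurus_terms: list[str], min_count: int = 3) -> list[str]:
    """Naive rule: propose every term that occurs at least `min_count` times.

    Terms are returned most frequent first. The rule cannot tell whether a
    frequent term reflects what the article is about or only how often the
    author repeats it.
    """
    counts = count_terms(text, thesaurus_terms)
    return [term for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    abstract = "Sleep affects memory. Sleep loss impairs memory. Poor sleep. Caffeine."
    print(suggest_descriptors(abstract, ["Sleep", "Memory", "Caffeine"]))  # ['Sleep']
```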
Other light reading to keep me worrying at night includes:
http://www.gslis.utexas.edu/~ssoy/organizing/l391d2b.htm
References to the early years of automatic indexing and
information retrieval (a bibliography of articles)
http://www.pitt.edu/~korfhage/indexing.htm
Another bibliography, tracking how automatic indexing is coming along
http://ai.bpa.arizona.edu/papers/wcs96/wcs96.html
“A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System.” The authors design and test term selection and thesaurus-building algorithms for scientific information retrieval.
A brief note on how I found the links I’ve included: I search with either Google (www.google.com) or AltaVista. I usually start with Google because it gives more relevant results, but I often switch to AltaVista, where I can group words into phrases with quotes and generally fiddle with things to get different results depending on which word I put first.
In this reprint, I’ve activated those links that are still
relevant. For more information, and updated links to some sites, please go to Database
Indexing Resources on this site.