Reprinted with permission. Please cite as:

Greenhouse, S. (2000) The future of database indexing. Key Words (bulletin of the American Society of Indexers) 8(4):125-126, 132.

 

The Future of Database Indexing

by Shelley Greenhouse

 

I went through an identity crisis a few years ago, right after I discovered what I had been doing was called indexing. Being a proper indexer, I wanted to know what kind I was. I certainly wasn’t a back-of-the-book indexer, and I didn’t index journal issues, or newspapers, or genealogies. Then I heard the term “database indexer.”  I interpreted database indexing as indexing material for inclusion in searchable databases, rather than working directly with the database software designing queries or constructing tables. I could also call myself a periodical indexer, but I think of periodical indexing more as producing annual or semiannual indexes for journals. No matter what label I assign myself, whenever I identify myself as an indexer, I get queries about what I do and how I do it.

 

What do I do?

 

I work for database producers as an outsourced, freelance indexer. I index journal articles, newspaper articles, books (whole books [metatopic] or individual chapters), pamphlets, manuals, etc. for inclusion in searchable databases compiled and maintained at the publishers. My education is in biology, physiology, and neuroscience, so I concentrate on medical and scientific articles. I read or scan the entire article or abstract, and attempt to describe the content in ten words or fewer. Publishers, or database producers, send me the material as published journal issues, photocopies of published articles, or abstracts and their bibliographic information in a text-only (ASCII) file. I apply index terms from controlled vocabularies (thesauri) specific to each publisher. I either handwrite descriptors on data forms, or enter them into the ASCII file after a specified field name. A typical record might look like this:

 

AN       accession number of article for tracking purposes

AU       author names

TI         article title

SO       source information (bibliographic information)

PT       publication type (journal article, letter, abstract)

AF        author affiliation (where the author was when he did what he did)

IS         ISBN or ISSN number

YR       year of publication

LG       language(s) of publication

FS       grant info

CG       grant info

NT       other pertinent info

AB       Abstract

LT        important, thesaurus-only descriptors

UC       non-thesaurus terms for searching only

ID         less important, thesaurus-only descriptors

CY       copyright info if necessary

DA       date record was created by me

AD       address for correspondence or reprint requests

 

Depending on the contract, some fields may already be filled in, or some may not be required. There are strict guidelines on how to complete a record, including spacing after the field names (those two-letter abbreviations) and how to handle wrapped lines in a single field.
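
As a rough illustration of that record layout, the Python sketch below reads a text-only record into a dictionary. The actual spacing and line-wrapping rules differ by publisher and are not spelled out here, so the sketch assumes a two-letter field name followed by whitespace, with wrapped lines indented by at least one space. The function name parse_record and the sample values are hypothetical.

```python
import re

# Assumed layout: a two-letter uppercase field name, whitespace, then the value.
FIELD_LINE = re.compile(r"^([A-Z]{2})\s+(.*)$")


def parse_record(text):
    """Parse one tagged record into a dict mapping field name -> value.

    Assumption: a wrapped line begins with whitespace and continues the
    previous field. Real publisher guidelines may differ.
    """
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between fields
        match = FIELD_LINE.match(line)
        if match:
            current = match.group(1)
            record[current] = match.group(2).strip()
        elif current is not None and line[0].isspace():
            # continuation of a wrapped field: join with a single space
            record[current] += " " + line.strip()
    return record


# Hypothetical sample record, not taken from any real database.
sample = (
    "AN   000123\n"
    "TI   A hypothetical article title that is long enough\n"
    "     to wrap onto a second line\n"
    "YR   2000\n"
    "LT   neurons; physiology\n"
)
record = parse_record(sample)
print(record["TI"])
```

A dictionary keyed by field name makes it easy to check which fields a given contract has already filled in and which still need descriptors.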

 

How do I do it?

 

I don’t use dedicated indexing software. I can use any word processing program my husband allows on our computer that can save my work as a text-only file. Two of my contracts send me a database "dump" of records, similar to the record shown above. For one, I never see the original article anymore. Bibliographic data and abstracts are already scanned in, and field names/codes are already provided. All I do is look at the title and read the abstract (if one is provided in the electronic file), then type in descriptors and common names after the field names/codes. I use a thesaurus, but the database also accepts non-thesaurus terms as major descriptors. I can enter identifier terms as well as any taxonomic terms (if provided by the abstract). I also provide subject classification codes that designate the database the article belongs in and the specific subject grouping within that database. I get paid per article, based on the number of databases the article is designated for. I can't make any corrections in material already present in the file but can alert my editors by including a note in a special field. I receive and return the file via e-mail.

 

The second contract sends shipments of journals, books or articles, as well as a database template on disk as a text-only file. The template contains titles, authors, and source information (journal name, volume/issue and pagination). I proof the information already scanned in and make necessary corrections. I type in an abstract, either copying the one provided or creating one of at least 100 words (up to a maximum of 250). I have to key in field codes and data for addresses, correspondence/reprint addresses, grant providers & numbers, descriptors (major, minor and identifiers), and a subject classification keyword. The thesaurus is cumbersome and non-intuitive to use, as it is primarily a numerical hierarchy of terms, only a small percentage of which are allowed, and narrower terms may be discovered in a rather random fashion (at least for me). I can add non-thesaurus identifier terms. They also pay per article, but more than other contracts because I write abstracts and do much of the data input.  They do not reimburse shipping costs.
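
The abstract length rule for this contract (at least 100 words, at most 250) is easy to check mechanically. The Python sketch below is one way to do it, assuming that a "word" is any run of characters separated by whitespace; the publisher's own counting rule is not stated, and the function name abstract_length_ok is hypothetical.

```python
def abstract_length_ok(abstract, minimum=100, maximum=250):
    """Return (ok, word_count) for an abstract.

    Assumption: words are whitespace-separated tokens. The limits default
    to the 100-250 word range described in the text.
    """
    count = len(abstract.split())
    return minimum <= count <= maximum, count


# A placeholder abstract of 99 words falls just short of the minimum.
ok, count = abstract_length_ok("word " * 99)
print(ok, count)
```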

 

A third contract  also sends shipments of journals, books or articles (the UPS man and I are old friends). I read/scan the entire article, and index in depth using NLM standards and the MeSH thesaurus. I write descriptors on data forms that get typed in by someone else, and bibliographic data and abstracts are input by someone else.  I get paid per article, and they send me a prepaid label for returning the shipment.

 

Good internet links for further reading:

 

http://www.slais.ubc.ca/courses/arstlibr512/winter2000/database1.htm

School of Library, Archival, and Information Studies, The University of British Columbia

Their page on database indexing contains links to articles and information on constructing and using thesauri; a comprehensive set of links to online thesauri, term lists and schemes; and links to articles on database indexing guidelines and discussions.

 

http://bioweb.pasteur.fr/docs/seqio/idxseq_doc.html

Site maintained by The Pasteur Institute, a private, nonprofit organization devoted to fighting infectious diseases. This link is to “idxseq – A Database Indexing Program,” a report which describes a software program for indexing databases.

 

http://www.public.iastate.edu/~CYBERSTACKS/Morning.htm

This article, “Morning becomes electric: Post-modern scholarly information access, organization and navigation,” contains a section on indexing and abstracting services, which contains links to database producers in biology, engineering, geology, mathematics, medicine, and citation services.

 

How did I learn to do it?

 

Training for each publisher/database takes place at that publisher's office, if at all. Each publisher has different rules for what to pick up from an article, what thesaurus terms match what text terms, and how to index specifically for that publisher. The exception is MeSH indexing. The National Library of Medicine runs a two-week, full-time training class followed by six weeks of on-site internship for MeSH indexing, but to take it you have to be sponsored by one of NLM's contractors. You can find out who they are by contacting NLM. Indexing Specialties: Medicine (L. Pilar Wyman, ed.) includes a chapter by Helen Ochej, which is a reprint of her article from Key Words, Sept/Oct 1998, vol 6(5). It contains the most accurate description of MeSH indexing at the National Library of Medicine that I have seen. (PubMed http://www.ncbi.nlm.nih.gov/PubMed/ is a good example of where the kind of indexing I do winds up.)

 

Where do I find work?

 

I think, with few  exceptions, the jobs I found were either posted on Index-L, or  I learned of them by networking with other indexers.   Each  publisher I work with has in-house indexers as well as freelancers, and I've seen ads that only want in-house folks. I've applied for some real jobs as a full-time, in-house  database indexer  (from ads posted to Index-L) but have always decided against going back to work (most of my objections are having to wear real clothes and not being able to set my own hours). 

Check the Index-L archives: 

http://www.indexpup.com/index-list/index.html  (before April 1999 and after April 2000)

http://listserv.binghamton.edu/archives/index-l.html  (between April 1999 and March 2000)

There's lots of great stuff in there, including a long discussion on how to find database work. I also periodically check the want ads in the Washington Post, which are searchable on the Web at http://www.washingtonpost.com/wl/jobs/home?nav=left (updated URL added 1/7/04).

 

There are a few places on the web where you can find databases listed. Go to one of these sites, find the databases whose subject areas match your areas of expertise, and search through the information on that database to find the publisher. Once you find a publisher, check to see if they produce print journals containing similar information. Print journals would tell you the names of editors to write to (one of my contracts is really a secondary publisher that started out printing abstract journals and moved into web-based databases when subscriptions for the print journal started falling off). I discovered the hard way that Human Resources departments don't know what outsource or freelance or work-at-home means. I’ve included a few good places to start looking for databases in a particular field.

 

http://www.mclennan.library.mcgill.ca/cdroms/databases.htm

McGill Libraries Database list

 

http://library.dialog.com/bluesheets/html/bl0230.html

The Gale Directory of Online, Portable, and Internet Databases

 

http://www.fcla.edu/LUISinfo/WLdoc/about_fs.html

FirstSearch Databases Index and descriptions

 

http://www.austlii.edu.au/links/World

Australasian Legal Information Institute databases

 

http://www.epnet.com/database.html

EBSCO Information Services provides a list of the databases they publish along with a brief description of each. Within this site is information on employment opportunities. 

 

How long will I be able to do this?

 

In terms of hours per day, right now I try to work no more than four hours.  I can earn a nice supplemental income this way. 

 

In terms of years into the future, I find myself awake at 3 a.m. worrying about this. With the advent of full text searching, electronic journals published on the Web, and all the folks out there generating automatic indexing software, I am concerned about how sustainable this work is. Glenda Browne’s article entitled “Automated Indexing” (http://www.aussi.org/conferences/papers/browneg.htm) kept me up for weeks. I am heartened by her assertion that, even in the event that computer database indexing becomes the norm, human quality control will still be necessary. Automatic indexing tools may be able to tell that a thesaurus term appears 64 times in an article, which may trigger the inclusion of the term in the descriptor list, but only a human will be able to tell if the use of the term is related to the “aboutness” of the article or the verbosity of the author.
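
To make the frequency point concrete, here is a minimal Python sketch of the kind of counting such a tool might do. It is not how any particular product works: the function name suggest_descriptors, the threshold of 3, and the sample text are all assumptions for illustration, and the matching is limited to single whole words.

```python
import re
from collections import Counter


def suggest_descriptors(text, thesaurus_terms, threshold=3):
    """Return thesaurus terms that occur at least `threshold` times in text.

    Matching is case-insensitive on whole single words only; multiword
    terms, plurals, and synonyms are ignored in this sketch.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        term: words[term.lower()]
        for term in thesaurus_terms
        if words[term.lower()] >= threshold
    }


# Hypothetical article text: wordy about stress, but actually about sleep in mice.
article = (
    "The author says stress again: stress, stress, and more stress. "
    "The study itself measured sleep in mice."
)
print(suggest_descriptors(article, ["stress", "sleep", "mice"]))
```

In this made-up example the counter flags "stress" and misses "sleep" and "mice", which is the gap between frequency and aboutness that a human indexer fills.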

 

Other light reading to keep me worrying at night includes:

 

http://www.gslis.utexas.edu/~ssoy/organizing/l391d2b.htm

References to the early years of automatic indexing and information retrieval (a bibliography of articles)

 

http://www.pitt.edu/~korfhage/indexing.htm

Another bibliography of how automatic indexing is coming along

 

http://ai.bpa.arizona.edu/papers/wcs96/wcs96.html

“A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System.” The authors design and test term selection and thesaurus building algorithms concerned with scientific information retrieval.

 

A brief note on how I found the links I’ve included: I search either with Google (www.google.com) or AltaVista. I preferentially start with Google because it gives more relevant results, but I often switch to AltaVista, where I can group words into phrases with quotes and generally fiddle with things to generate different results depending on which word I use first.

 

 

In this reprint, I’ve activated those links that are still relevant. For more information, and updated links to some sites, please go to Database Indexing Resources on this site.