Willpower Information
Information Management Consultants
The simple answer is that we give each item a "name", and then we can create a file of index cards, or a computer file, in which we can search for these names and expect to find all the appropriate items. This is the concept of the Object name field in the Spectrum data structure. It is straightforward at first, and seems intuitive, but once you have documentation which has been built up over time, perhaps by many different people, problems creep in unless there are rules and guidelines to maintain consistency.
The word thesaurus is a rather fancy name, which has acquired a certain mystique, because it is often bandied about as something necessary for effective information retrieval, but something which sounds as though it will involve a lot of work. I have often heard curators say "That's all very well if you have the time and resources, but I have this great backlog of cataloguing to do, and I would never get through the half of it if I had to spend time setting up anything as complicated as a thesaurus. What I need is a simple list of names which I can use to index my objects."
My main purpose in this paper is to make three points:
What are these rules?
This is no problem if the two words are really synonyms, and even if they do differ slightly in meaning it may still be preferable to choose one and index everything under that. I do not know the difference between dresses and frocks, but I am fairly sure that someone searching a modern clothing collection who was interested in the one would also want to see what had been indexed under the other. We normally do this by linking the terms with the relationships USE and USE FOR, thus:
Dresses | USE FOR | Frocks |
Frocks | USE | Dresses |
This may be shown in a printed list, or it may be held in a computer system, which can make the substitution automatically. If an indexer assigns the term Frocks, the computer will change it to Dresses, and if someone searches for Frocks the computer will search for Dresses instead, so that the same items will be retrieved whichever term is used. A friendly computer will explain what it is doing, so that the user is not puzzled by being given items with terms different from those asked for.
USE and USE FOR relationships are thus used between synonyms or pairs of terms which are so nearly the same that they do not need to be distinguished in the context of a particular collection. Other examples might be:
Cloaks | USE | Capes |
Capes | USE FOR | Cloaks |
Nuclear energy | USE | Nuclear power |
Nuclear power | USE FOR | Nuclear energy |
Baby carriages | USE | Perambulators |
Perambulators | USE FOR | Baby carriages |
Perambulators | USE FOR | Prams |
Prams | USE | Perambulators |
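The automatic substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the object identifiers such as "1985.12" are hypothetical, and the USE references are simply the examples listed above held in a dictionary.

```python
# USE references from the examples above: non-preferred term -> preferred term.
USE = {
    "Frocks": "Dresses",
    "Cloaks": "Capes",
    "Nuclear energy": "Nuclear power",
    "Baby carriages": "Perambulators",
    "Prams": "Perambulators",
}

# The index maps each preferred term to the objects carrying it.
index = {}


def preferred(term):
    """Return the preferred term and a note explaining any substitution."""
    target = USE.get(term)
    if target is None:
        return term, None
    return target, f"{term} USE {target}"


def use_for(preferred_term):
    """Derive the reciprocal USE FOR entries for a preferred term."""
    return sorted(t for t, p in USE.items() if p == preferred_term)


def assign(object_id, term):
    """Index an object, replacing a non-preferred term automatically."""
    term, _ = preferred(term)
    index.setdefault(term, []).append(object_id)


def search(term):
    """Search under the preferred term and say what was substituted."""
    term, note = preferred(term)
    return index.get(term, []), note


# Hypothetical object identifiers, for illustration only.
assign("1985.12", "Frocks")
assign("1990.3", "Dresses")
items, note = search("Frocks")
print(items, note)
```

Because indexing and searching pass through the same substitution, the same items are retrieved whichever term is used, and the returned note lets a "friendly" system explain what it has done.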
If we name objects, we want to be as specific as possible. If we have worked hard to discern subtle distinctions in nature, type or style, we certainly want to record these. The point is that the thesaurus is not the place to do this. Detailed description of an object is the job of the catalogue record; the job of the thesaurus, and the index which is built by allocating thesaurus terms to objects, is to provide useful access points by which that record can be retrieved.
USE and USE FOR relationships can also be used to group similar items together, because too much specificity is as bad as too little. If we have a small clothing collection, containing ten jackets, it is more useful to give them all the index term jackets than to create many specific categories. Anyone searching our catalogue will then be able to search on the single term jackets and see a list of the ten items, each with a description of exactly what kind of jacket it is, as follows:
Jackets: | |
---|---|
1. | Anorak in green cotton, England, 1985. |
2. | Tweed sports jacket, Hawick, Scotland |
3. | Silk bolero with floral embroidery, Spanish, 1930. |
If we used all the possible specific names, each of which would have only one or two items in it, such as blazers, dinner jackets, boleros, donkey jackets, anoraks, flying jackets, sports jackets, and so on, enquirers would have to search the catalogue under each name in turn in order to find all the jackets in the collection, and they would never be sure that there was not a kind of jacket that they had overlooked.
To help enquirers who approach the system by one of these terms, we therefore create the references:
Blazers | USE | Jackets |
Dinner jackets | USE | Jackets |
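Jackets | NT | Anoraks |
Jackets | NT | Blazers |
Jackets | NT | Boleros |
Jackets | NT | Dinner jackets |
Jackets | NT | Donkey jackets |
Jackets | NT | Flying jackets |
Jackets | NT | Kagouls |
Jackets | NT | Sports jackets |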
We could just invert terms and rely on the alphabet to bring them together, in a list such as
Jackets, dinner
Jackets, donkey
Jackets, flying
Jackets, sports
but this is unreliable and subject to the vagaries of the language, which does not always describe a specific type of item by an adjective preceding the generic name. We have to accommodate types of jacket which have their own distinctive names such as Anoraks or Blazers.
In both the above cases, it is important that the terms
which are linked are of the same type. That is to say
that any narrower term must be a specific case of the
broader term, and able to inherit its characteristics.
(The developers of Object Oriented Programming have
recently discovered this idea, which has been known to
the worlds of information science and biological taxonomy
for a very long time.) Thus if we say that
Blazers is a narrower term of Jackets,
we mean that every blazer is, whatever else it may be,
inherently a jacket, and that it has the characteristics
which define a jacket.
Mice can properly be said to be a narrower term of Rodents, because all mice are inherently rodents, but it is not correct to list Mice as a narrower term of Pests, because some mice, such as laboratory mice and pet mice, are not pests. The idea is to have relationships in the thesaurus which are always true, irrespective of context. In the same way, it would not be correct to list Buses as a narrower term of Diesel-engined vehicles, although many of them are; if we have a diesel-engined bus in our collection, we can show this by giving it the two terms Buses and Diesel-engined vehicles.
Good computer software should allow you to search for "Jackets and all its narrower terms" as a single operation, so that it will not be necessary to type in all the possibilities if you want to do a generic search.
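A generic search of this kind might be sketched as follows. The sketch assumes that narrower-term links are held in a simple dictionary; the top term "Clothing", the function names and the item labels are hypothetical and serve only to make the example runnable.

```python
# Narrower-term (NT) links; "Clothing" is an assumed top term.
NT = {
    "Clothing": ["Jackets", "Dresses"],
    "Jackets": ["Anoraks", "Blazers", "Boleros", "Sports jackets"],
}

# Hypothetical index: thesaurus term -> items carrying that term.
index = {
    "Anoraks": ["item 1"],
    "Sports jackets": ["item 2"],
    "Boleros": ["item 3"],
    "Dresses": ["item 4"],
}


def with_narrower(term):
    """Return the term followed by its narrower terms at every level."""
    found, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current in found:
            continue  # guards against a term reached by more than one route
        found.append(current)
        queue.extend(NT.get(current, []))
    return found


def generic_search(term):
    """Collect the items indexed under the term or any narrower term."""
    hits = set()
    for t in with_narrower(term):
        hits.update(index.get(t, []))
    return sorted(hits)


print(with_narrower("Jackets"))
print(generic_search("Jackets"))
```

The enquirer types one term; the expansion to Anoraks, Blazers and the rest is done by the software from the relationships already recorded in the thesaurus.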
It is also possible to use the RELATED TERM relationship between terms which are of the same kind, not hierarchically related, but where someone looking for one ought also to consider searching under the other, e.g. Beds RT Bedding; Quilts RT Feathers; Floors RT Floor coverings.
A thesaurus is not a dictionary, and it does not normally
contain authoritative definitions of the terms which it
lists. It could perfectly well do this, but a lot more
work would be required to develop it in this way. In an
automated system, however, the thesaurus would be a
logical place to record information which is common to
all objects to which a term might be applied, for example
notes on the history and origin of Anoraks or the
identifying characteristics and lifestyle of Mice (or
perhaps Mus musculus in a taxonomic thesaurus).
Where there is any doubt about the meaning of a term, or the types of objects which it is to represent, a SCOPE NOTE (SN) is attached to it. For example,
6 Form of the thesaurus
A list based on these relationships can be arranged in various ways; alphabetical and hierarchical sequences are usually required, and thesaurus software is generally designed to give both forms of output from a single input. A typical simple thesaurus of a few clothing terms is shown in Tables 1 and 2.
Table 2: Sample thesaurus - alphabetical sequence
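The idea of producing both sequences from a single input can be illustrated with a short Python sketch. It is not a reproduction of the tables: the handful of terms, the top term "Clothing" and the function names are assumptions chosen for the example. The only stored data is each term's broader term; narrower terms and both displays are derived from it.

```python
# Single input: each term with its broader term (None marks a top term).
# "Clothing" is an assumed top term; the rest are examples used earlier.
BT = {
    "Clothing": None,
    "Jackets": "Clothing",
    "Anoraks": "Jackets",
    "Blazers": "Jackets",
    "Dresses": "Clothing",
}


def narrower(term):
    """Derive the narrower terms of a term from the BT links."""
    return sorted(t for t, b in BT.items() if b == term)


def alphabetical():
    """List every term in alphabetical order with its BT and NT entries."""
    lines = []
    for term in sorted(BT):
        lines.append(term)
        if BT[term]:
            lines.append(f"  BT {BT[term]}")
        for n in narrower(term):
            lines.append(f"  NT {n}")
    return lines


def hierarchical(term=None, depth=0):
    """List terms by hierarchy, indenting each level of narrower terms."""
    lines = []
    for t in narrower(term):
        lines.append("  " * depth + t)
        lines.extend(hierarchical(t, depth + 1))
    return lines


print("\n".join(alphabetical()))
print("\n".join(hierarchical()))
```

Since both displays are generated from the same links, a correction made once in the input appears consistently in each.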
You may well wish to allocate abstract and discipline terms to objects too, so that you can retrieve all the objects to do with Dentistry, Laundry, Warfare or Food preparation. These terms can also be included in the thesaurus, so long as they are not given hierarchical relationships to names of objects. They should be given RT relationships to an appropriate level of object terms.
Some thesauri interfile terms of different types in their hierarchical display. Indentation in such cases does not necessarily indicate a BT/NT relationship. The relationships may be shown in the alphabetical sequence of thesaurus terms, and it is misleading if they are not distinguished in the hierarchical one.
Because abstract terms do not describe what the object is, they could be put into a field in the catalogue record labelled concept or subject, distinct from the field containing terms which name the object. I do not think that such a distinction will generally be helpful to users, however, and there seems to be no disadvantage in putting both types of term into a single field so that they can easily be searched as alternatives or in combination. Such a field would not be correctly called name and I therefore prefer to call it simply indexing terms or subject indexing terms.
The point is not a trivial one, because as discussed in section 2 above there is a conceptual difference between naming or describing an object and grouping it with others so that it can be found. Both are essential steps, but an information retrieval thesaurus is primarily concerned with grouping.
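Searching a single field of indexing terms "as alternatives or in combination" amounts to simple set operations. The following Python sketch assumes each record holds one set of terms, object names and discipline terms together; the record identifiers and the object names "Dental drills" and "Washboards" are hypothetical.

```python
# Hypothetical records: identifier -> set of indexing terms in one field.
records = {
    "A1": {"Buses", "Diesel-engined vehicles"},
    "A2": {"Buses"},
    "A3": {"Dental drills", "Dentistry"},
    "A4": {"Washboards", "Laundry"},
}


def search_any(*terms):
    """Alternatives: records carrying at least one of the terms."""
    wanted = set(terms)
    return sorted(r for r, t in records.items() if t & wanted)


def search_all(*terms):
    """Combination: records carrying every one of the terms."""
    wanted = set(terms)
    return sorted(r for r, t in records.items() if wanted <= t)


print(search_all("Buses", "Diesel-engined vehicles"))
print(search_any("Dentistry", "Laundry"))
```

With two separate fields the same searches would need to know in advance which field each term belongs to, which is the burden on users that a single field avoids.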
Singular or plural terms?
The cataloguer thinks: "This is a clock".
The enquirer asks: "What clocks do you have?"
Prefer plural terms because:
The International Standard for thesaurus construction (ISO 25964-1:2011) recommends that plural terms should be used, except for a few well-defined cases, and my view is that this practice should be followed. Unfortunately, there are many records in museum collections which have been given singular "object names", and the work of changing these to plurals in a move to a thesaurus structure may be so great as to require some compromise.
In the thesaurus, BT/NT relationships can be used for parts and wholes in only four special cases: parts of the body, places, disciplines and hierarchical social structures.
With a polyhierarchical thesaurus it would take more space to repeat full hierarchies under each of several broader terms in a printed version, but this can be overcome by using references. There is no difficulty in displaying polyhierarchies in a computerised version of a thesaurus.
Even when using an authoritative thesaurus, some care is needed. It is still much easier to base your work on an existing thesaurus than to build your own from scratch, unless you have a very specialised collection, and it will also be easier to share your data with other organisations if you use the same standards and terms.
Authority files and thesauri are two examples of a generalised data structure, or ontology, which can allow the indication of any type of relationship between two concepts, and modern computer software should allow different types of relationship to be included if needed.
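One minimal way to picture such a generalised structure is as a set of (source, relationship, target) statements, in which the thesaurus relationships are just particular relationship types. The Python sketch below is an assumption-laden illustration, not a description of any particular software: the names are hypothetical, and "located in" is an invented extra relationship type of the kind an authority file of places might use.

```python
from collections import namedtuple

# One statement linking two concepts by a named relationship.
Relation = namedtuple("Relation", "source kind target")

# Thesaurus relationships have reciprocals; other types need not.
RECIPROCAL = {
    "BT": "NT",
    "NT": "BT",
    "USE": "USE FOR",
    "USE FOR": "USE",
    "RT": "RT",
}

relations = set()


def relate(source, kind, target):
    """Record a relationship and, where one is defined, its reciprocal."""
    relations.add(Relation(source, kind, target))
    if kind in RECIPROCAL:
        relations.add(Relation(target, RECIPROCAL[kind], source))


def related(source, kind):
    """Return the targets linked to a concept by one relationship type."""
    return sorted(
        r.target for r in relations if r.source == source and r.kind == kind
    )


relate("Blazers", "BT", "Jackets")
relate("Frocks", "USE", "Dresses")
relate("Beds", "RT", "Bedding")
relate("Hawick", "located in", "Scotland")  # hypothetical relationship type

print(related("Jackets", "NT"))
print(related("Hawick", "located in"))
```

Adding a new type of relationship requires no change to the structure, only a new label and, if wanted, an entry for its reciprocal.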
Revised 2021-10-03 21:10
Comments and feedback on content or presentation are welcome and should be sent to Leonard Will at L.Will@willpowerinfo.co.uk
Copyright © Leonard Will, 1998-2021.