Problem indicators (“bad smells”)

As is the case for component and profile definitions, “smells” can also be identified for metadata records. The patterns described in this section serve as indicators of potential issues with the metadata at hand. A smell in a record can also indicate an issue with the design of the profile it is based on, or of its constituent components, i.e. a modelling issue. Identifying a specific “smell” in a record does not necessarily mean that the metadata is broken. Instead, it should be seen as a warning sign that there may be a problem which should be checked.

  • The record file is very large (several megabytes). A finer granularity may improve the usability of the metadata. Large collection records can be divided into subcollections. Non-collection records should only describe a single resource or a set of strongly related resources. In any case, be aware that large files are not ideal for processing by humans or machines.

  • There is no single title or description for the record as a whole. If the record can only be described using multiple titles, this indicates that it should perhaps be split up into multiple records or turned into a hierarchical collection. Of course, if a single title could be imagined but is simply missing, it should just be added.

  • The record contains very little information. Sparseness can be a sign of assumed information ‘percolation’. In such cases, records ‘higher up’ in the hierarchy may contain relevant information, but this information is not included in the records that describe the actual data. The CMD infrastructure provides no mechanism for inferring information across hierarchy levels.

  • Many of the available fields are omitted or left empty. This could indicate that potentially relevant information is lacking from the description. Note that this could also be the result of a conscious decision, i.e. an existing ‘broad’ profile was chosen that fits the description but allows for additional information that is irrelevant in the particular use case.

  • Low “information entropy”. If there are many highly similar records, this could be an indication that relevant, distinguishing information is lacking. If the profile used does not allow for richer descriptions, reconsider your choice of profile and/or modelling decisions.

  • Occurrence of multiple values in a single string, for example multiple element values separated by commas, semicolons or other characters. In general it is better to use repeating elements, or to use separate elements if the individual values have different semantics (for example resource type “English literature -- Middle English”).

  • Metadata content in multiple languages but no ‘xml:lang’ attributes. It is highly recommended to provide multilingual metadata if applicable (for example original and translated titles, see Multilingual metadata [To be written]). However, when doing so, the elements concerned should be annotated with the @xml:lang attribute. If there are no such attributes, or another (custom) attribute is used to indicate the description language, use a profile (version) that allows for this attribute instead.

  • URLs or persistent identifiers (PIDs) in the payload. Online resources that are described through a metadata record or relate to it in another way are generally accommodated by the Resource Proxies (see E5). In many cases URLs or PIDs, such as handle URIs, that appear in elements and attributes outside the envelope do in fact represent a described resource, related metadata, a landing page or a search page, and should therefore be embedded in a resource proxy. Exceptions are, for example, pointers to related literature, or web pages of projects, researchers, actors or organisations that are not landing pages for the described resource(s). Note that resource proxies can be referred to (using the @cmd:ref attribute) from any point in the record that is at the component level (see CS2).

  • The only resource proxy is of type ‘Resource’ but it points to a web page. Such a proxy should probably be a landing page instead (see E11); ‘Resource’ type proxies should lead to machine-processable files.
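
Some of the indicators above lend themselves to a rough automated check. The following is a minimal sketch in Python, not part of any CMDI tooling. It assumes that the payload sits in an element whose local name is ‘Components’. The function name find_smells, the 2 MB size threshold and the two regular expressions are hypothetical choices made for illustration. Commas are deliberately not treated as separators here, because they are too common in ordinary prose; only semicolons and vertical bars are flagged.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic patterns; both are assumptions made for this sketch.
URL_OR_PID = re.compile(r"(?:https?://|hdl:)\S+")
MULTI_VALUE = re.compile(r"\S\s*[;|]\s*\S")  # commas left out: too common in prose


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_smells(xml_text, payload_name="Components", max_bytes=2_000_000):
    """Return (smell, element name, value) tuples for one record."""
    smells = []
    size = len(xml_text.encode("utf-8"))
    if size > max_bytes:
        smells.append(("large-record", "", str(size)))

    root = ET.fromstring(xml_text)
    # Look only inside the payload (assumed local name: 'Components').
    payload = next(
        (el for el in root.iter() if local_name(el.tag) == payload_name), None
    )
    if payload is None:
        return smells

    for el in payload.iter():
        if len(el) > 0:
            continue  # only leaf elements carry values
        name = local_name(el.tag)
        text = (el.text or "").strip()
        if not text:
            smells.append(("empty-field", name, ""))
        elif URL_OR_PID.search(text):
            smells.append(("url-or-pid-in-payload", name, text))
        elif MULTI_VALUE.search(text):
            smells.append(("multiple-values", name, text))
    return smells


# A toy record without namespaces; element names are made up for illustration.
record = """
<CMD>
  <Header/>
  <Resources/>
  <Components>
    <Corpus>
      <Title>Example corpus</Title>
      <Language>English; French</Language>
      <Website>https://example.org/corpus</Website>
      <Licence></Licence>
    </Corpus>
  </Components>
</CMD>
"""

for smell in find_smells(record):
    print(smell)
```

As with the smells themselves, a hit from such a script is only a warning sign. A project web page in the payload, for instance, is a legitimate exception, so each reported value still needs to be checked by hand.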
