Metadata Database Design Information and Background

The metadata database was initially created in 2007 and has evolved over time as our understanding of our needs became clearer.  We knew that the metadata database should support our user and data needs.  We tried, at first, to use a metadata standard but found them to be either overly complex or not sufficient for our needs.  Eventually, we realized that we could support any well designed standard by mapping our metadata database entries to the standard, and indeed we have already done this for the ISO19115-2 metadata standards. Data that we archive at NODC are bundled with the corresponding metadata formatted using the ISO19115-2 standard.

This page will contain additional notes and comments that might help people (including ourselves!) better understand how the database evolved, what decisions dictated certain choices, how the design is evolving.

A PDF image of the metadata database table organization is located at http://www.bco-dmo.org/files/bcodmo/schema_BCO-DMO.pdf and it contains several pages of notes describing the changes made to the database over time.

Migration to Drupal 7

On November 14, 2013 we migrated our metadata database, together with the normal website content, to Drupal Version 7. Drupal is a public domain content management system. Up until this time, we were using Drupal Version 6 to manage only the normal web content, while the metadata were stored in a separate MySQL-based database. Before the migration, we used ColdFusion software to provide the public and data manager view of the metadata content. Migrading the metadata to Drupall was not a straightforward task as Drupal does not easily support the many to many relationships that are common in our metadata database.

Parameters table and the Dataset_parameters table:

The U.S. GLOBEC thesaurus and the U.S. JGOFS thesaurus were the starting points for the BCO-DMO "official" Parameters table.  The Parameters table is a MySQL table defining the BCO-DMO "official" parameter (data variable or data column) names.  This is an evolving list which also seeks input from evolving ontologies (see for example the MMI web site http://marinemetadata.org/).  There is also a Dataset_parameters table containing the parameter names as used by each contributing scientist on each dataset (data object) basis.  There is an attempt to make a connection between the Dataset_parameters table and the Parameters table, and (eventually) an entry in the Dataset_parameters table pointing to some kind of conversion routine.  This is a long range goal, but it gives us the beginning for establishing "standard" parameter names and yet we are still able to support the reality that we get data with user supplied parameter names that do not conform to any standard. A more recent enhancement has been the addition of a "parameter_external_identifier" column in the Parameters table. This allows us to link our BCO-DMO parameter name to a community standard name, such as is evolving at the British Oceanographic Data Centre.

Lookup and Person_role tables:

The Lookup table and the Person_role tables are connected to too many other tables to include the lines in the schema diagram. The details of how these tables are used is described below.

We are using the Lookup table as a replacement for separate tables in one to many relationships. For example, people have honorifics, like Mr. Ms., Dr., etc. and contact methods include email, phone, in person, etc. The Lookup table contains all of this information so that the People table can use a lookup_id to reference the honorific value instead of having a separate table for honorific values. Similarly, the Tracking_ status table references the tracking_contact_method which uses the lookup_id to point to one of the contact methods.

The Person_role table maintains information about a person's role within a program, project, dataset, dataset_deployment and deployment. For example, it maintains the information that someone is

    1. A chief scientist for cruise AL9906
    2. A PI in the US GLOBEC Program
    3. A Scientist in the US GLOBEC Southern Ocean Project
    4. A contact person for the XYZ dataset
    5. A Scientist for the ABC dataset on the AL9906 cruise.

So, this one table replaces five intersection tables between the People table and the Dataset, Deployment, Dataset_deployment, Project, and Program tables. Again, this isn't the recommended way to show the relationships since SQL retrievals have to be done differently than expected, but it works.