Scientific Variables Ontology

Please be aware that this documentation is out of date and will soon be archived. New documentation will be posted shortly.

What are the Geoscience Standard Names?

The Geoscience Standard Names (GSN) are a set of variable names for labeling concepts in the geosciences. The GSN are derived from, and substantially extend, the CSDMS Standard Names (CSN), a set of variable names generated using the rules and controlled vocabularies described in Peckham (2014a). The goal of the CSN and GSN is to formalize the concepts needed to provide a deep description of a resource. This information can then be used to discover, compare, use and connect geoscience resources into workflows.

The GSN are a list of variable name strings standardized and encoded through an RDF ontology and a standard names generation engine. The ontology is split into an upper ontology that describes classes and class properties and a lower ontology that contains instances of the ontology. The most recent upper ontology can be found here and the accompanying lower ontology files can be found here.

We have made available a SPARQL endpoint that can be accessed over HTTP at http://35.194.43.13:3030/ds/query. Note that this is not a browser-compatible site. For your convenience, we have also created a free-form web tool for searching terms in the ontology. This tool is available by clicking the 'Search' tab in the menu at the top of this page.
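As a rough sketch of what querying such an endpoint over HTTP could look like, the Python snippet below builds a GET request URL following the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter. The example query and the helper name `build_query_url` are illustrative assumptions, not part of the GSN documentation, and the endpoint itself may no longer be reachable.

```python
from urllib.parse import urlencode

# Endpoint address as given in the text above; it may no longer be live.
ENDPOINT = "http://35.194.43.13:3030/ds/query"

# Illustrative query (an assumption): list a few triples from the default graph.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint: str, sparql: str) -> str:
    """Return a GET URL that carries the SPARQL text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(ENDPOINT, EXAMPLE_QUERY)
print(url)
```

The snippet only constructs the URL and does not contact the server. To send the request, the URL could be passed to `urllib.request.urlopen`, typically with an `Accept: application/sparql-results+json` header to ask for JSON results.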

The CSN, GSN, and ontology projects have been funded almost entirely within NSF EarthCube projects, including:

             
Earth System Bridge: EarthCube website, Main website
OntoSoft / GeoSoft: EarthCube website, Main website
GeoSemantics: EarthCube website, Main website

Standardized Metadata for Models

Standardized metadata for models is the key to reliable and greatly simplified coupling in model coupling frameworks like CSDMS (Community Surface Dynamics Modeling System). This model metadata also helps model users to understand the important details that underpin computational models and to compare the capabilities of different models. These details include simplifying assumptions on the physics, governing equations and the numerical methods used to solve them, discretization of space (the grid) and time (the time-stepping scheme), state variables (input or output), and model configuration parameters. This kind of metadata provides a "deep description" of a computational model that goes well beyond simple discovery/citation metadata (e.g. author, purpose, scientific domain, programming language, digital rights, provenance, execution) and captures the science that underpins a model. Basic metadata for discovery and citation is already well served by projects like Dublin Core and DataCite.

The Model Component Metadata App

While having this kind of standardized metadata for each model in a repository opens up a wide range of exciting possibilities, it is difficult to collect this information and a carefully conceived "data model" or schema (like the GSN) is needed to store it. Automated harvesting and scraping methods can provide some useful information, but they often result in metadata that is inaccurate or incomplete, and this is not sufficient to enable the desired capabilities. In order to address this problem, we have developed a tool (tentatively) called the MCM App (Model Component Metadata) which runs on notebooks, tablets and smart phones. This tool was partially inspired by the TurboTax software, which greatly simplifies the necessary task of preparing tax documents. It allows a model developer or advanced user to provide a standardized, deep description of virtually any computational geoscience model. Under the hood, the tool uses the Geoscience Standard Names ontology for models, expressed as a collection of RDF files (Resource Description Framework). This ontology is based on core concepts such as variables, objects, quantities, operations, processes and assumptions, as described below.

The MCM App is being developed with Ionic 2 (an open-source app development framework built on Google's Angular 2) and will be available on this page soon. It will be accessible in a desktop browser window and also as an app for mobile devices, including iOS and Android.

Variables — The Currency of Science

Variables are the fundamental currency of science. Values of variables are the items that are measured and saved in data sets of all kinds. They are the inputs and outputs of predictive models and the items exchanged between coupled models. They also appear in the equations that summarize our scientific knowledge. But what are they? Variables are symbols, names or labels that refer to the pairing of a particular object and a particular attribute of that object. Geospatial variables have associated grids, and require geo-referencing with ellipsoids, datums and map projections.

Geoscience resources, such as data sets, models and even papers typically use their own internal vocabulary to refer to variables, which may use domain jargon, abbreviations and an over-reliance on context to provide omitted information. When resources are connected into some kind of workflow, the values of variables need to be reliably passed between these resources. This gives rise to the problem of semantic mediation or reconciliation. Hub-and-spoke mappings from internal names to standard variable names provide an elegant solution to this problem if they are: unambiguous, rules-based, expressive, cross-domain and widely recognized.
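The hub-and-spoke idea can be sketched in a few lines of Python. In this minimal illustration, each resource declares one mapping from its internal names to standard names, and two resources are then connected by matching on the standard names alone. All variable names, the `__` separator and the helper `match_variables` are hypothetical and chosen only for illustration.

```python
# Each resource maps its own internal names to shared standard names (the hub).
# All names below are hypothetical and used only for illustration.
MODEL_A_OUTPUTS = {
    "Q": "channel_water__volume_flow_rate",
    "T_air": "atmosphere_air__temperature",
}
MODEL_B_INPUTS = {
    "discharge": "channel_water__volume_flow_rate",
}


def match_variables(outputs: dict, inputs: dict) -> dict:
    """Pair a consumer's internal input names with a provider's internal
    output names by looking both up through their standard names."""
    by_standard = {std: internal for internal, std in outputs.items()}
    return {
        inp: by_standard[std]
        for inp, std in inputs.items()
        if std in by_standard
    }


links = match_variables(MODEL_A_OUTPUTS, MODEL_B_INPUTS)
print(links)
```

With this layout, each of N resources needs only one mapping to the hub, rather than a separate mapping to every other resource it might be connected to.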

Why Do We Need Standard Names?

There is a wide variety of computational models and data sets in use all over the world. Unfortunately, they differ in many ways, which makes it labor-intensive to connect them into workflows. For example, each has its own internal vocabulary for referring to its variables. However, automated interoperability becomes possible once these are mapped to unambiguous, cross-domain, rules-based standard names. For a more in-depth discussion of the semantic mediation problem, what kinds of variable names are needed and prior work, see Peckham (2014a).

The Eight Core Concepts of the GSN Ontology

The GSN Ontology is based on 8 core concepts or entities that are inter-related. These are variables, objects, quantities, operations, processes, grids, assumptions and science domains. Objects and quantities are also considered fundamental in the International System of Quantities (ISO 80000).

1. Variable Names

Variables are the fundamental currency of science. Values of variables are what scientists measure and save in data sets of all kinds. They are the inputs and outputs of predictive models and the items exchanged between coupled models. They also appear in the equations that summarize our scientific knowledge. But what are they? Variables are symbols, names or labels that refer to the pairing of an object and one of its attributes.

2. Object Names

In our context, an object is any physical thing that we can observe (body, substance, etc.). We are often interested in a particular part of something larger, or an object contained in another object. For context and alphabetical grouping, it is therefore helpful to use hierarchical object names. Objects may have both numerical and string attributes. In the GSN, a word after a tilde '~' in an object name is an adjective.
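As a small sketch of the tilde convention, the Python function below splits an object name into its parts and separates each noun from any adjectives that follow a '~'. It assumes that the levels of a hierarchical object name are joined by single underscores, which is not stated in the text above, and the example name `atmosphere_air~dry` is hypothetical.

```python
def parse_object_name(name: str) -> list:
    """Split an object name into (noun, [adjectives]) pairs.

    Assumes (for illustration only) that hierarchy levels are joined
    by '_' and that each '~' introduces one adjective.
    """
    parts = []
    for part in name.split("_"):
        noun, *adjectives = part.split("~")
        parts.append((noun, adjectives))
    return parts


parsed = parse_object_name("atmosphere_air~dry")
print(parsed)
```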

3. Quantity Names

A quantity is an attribute of an object that has a numerical value. It will often have measurement units but can also be dimensionless (e.g. [m/m]). It may be represented as a scalar, vector or tensor. Many distinct quantities may have the same root quantity, such as constant, exponent and angle. Good quantity names are object-free, so that they can be applied to many different objects. For example, volume flow rate is preferable to streamflow.

4. Operation Names

When a mathematical operation is applied to a quantity, it simply creates a new quantity, often with new units. Quantity names may therefore contain zero, one or a chain of operations. In the GSN, all operation names end in the word "of". Examples include: time_derivative_of, area_integral_of, x_component_of, log_of and divergence_of.
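Because operation names share the "_of" ending, a chain of operations can be peeled off the front of a quantity name. The Python sketch below does this against the five operation names quoted above. It assumes that operations are joined to the rest of the name by an underscore, and the example name `time_derivative_of_x_component_of_velocity` is hypothetical rather than an actual GSN entry.

```python
# Operation names quoted in the text above; a real list would be longer.
KNOWN_OPERATIONS = [
    "time_derivative_of",
    "area_integral_of",
    "x_component_of",
    "log_of",
    "divergence_of",
]


def split_operations(quantity_name: str):
    """Peel leading operation names off a quantity name, outermost first.

    Returns (operations, root), where root is whatever remains once no
    known operation prefix matches.
    """
    operations = []
    rest = quantity_name
    found = True
    while found:
        found = False
        for op in KNOWN_OPERATIONS:
            prefix = op + "_"
            if rest.startswith(prefix):
                operations.append(op)
                rest = rest[len(prefix):]
                found = True
                break
    return operations, rest


ops, root = split_operations("time_derivative_of_x_component_of_velocity")
print(ops, root)
```

Matching against a known list, rather than splitting on every "_of_", avoids misreading a root quantity name that happens to contain the word "of".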

5. Process Names

A process is an action that an object can do or that can happen to it. For example, a glacier can advance, calve, melt, sublimate, slide, or deform. Process names are nouns derived from verbs. For example, water can infiltrate into soil, and this process is called infiltration.

6. Grids

Variables can be associated with a fixed location or can vary in space and time, such as temperature within a room. As appropriate, they may then be treated as scalar, vector or tensor fields. A grid is a subdivision or discretization of space into grid cells. Grids for geospatial variables require geo-referencing with ellipsoids, datums and map projections.

7. Assumption Names

In the GSN ontology, the term assumption is used broadly to refer to any type of qualifier, such as a simplification, limitation, convention, exclusion, condition, approximation, clarification or restriction. Scientists refer to assumptions with standard phrases, such as incompressible flow. Any of the other 7 entities in the GSN can be tagged and qualified with an assumption.

8. Science Domain Names

The GSN currently uses the UNESCO Nomenclature for Fields of Science and Technology, which is expressed in SKOS (Simple Knowledge Organization System). This is a hierarchical classification of different science and technology domains. These domains can be used to tag the other 7 entities, as appropriate, so that they can be filtered based on the most relevant science domain.

Relationships between the Core Concepts

Models can have Objects, Assumptions and (science) Domains. Objects can have Quantities, Processes and Assumptions. Quantities can have Grids and Assumptions. Processes can have Assumptions. Operations act on Quantities to create new ones.
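The "can have" relationships listed above can be written down as a small lookup table. The Python sketch below is only an illustration of that paragraph, not the actual RDF encoding used by the ontology; the names `CAN_HAVE` and `can_have` are hypothetical. Operations are left out of the table because they act on Quantities rather than owning other entities.

```python
# "Can have" relationships exactly as listed in the paragraph above.
CAN_HAVE = {
    "Model": {"Object", "Assumption", "Domain"},
    "Object": {"Quantity", "Process", "Assumption"},
    "Quantity": {"Grid", "Assumption"},
    "Process": {"Assumption"},
}


def can_have(owner: str, item: str) -> bool:
    """Return True if the paragraph above lists `item` under `owner`."""
    return item in CAN_HAVE.get(owner, set())


print(can_have("Object", "Quantity"))   # listed above
print(can_have("Process", "Grid"))      # not listed above
```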

For More Information and Applications

David, C.H., Y. Gil, C. Duffy, S.D. Peckham and S.K. Venayagamoorthy (2016) An introduction to the Earth and Space Science, special issue: "Geoscience Papers of the Future", American Geophysical Union, 1-4, http://dx.doi.org/10.1002/2016EA000201.

Elag, M.M., P. Kumar, L. Marini, S.D. Peckham (2015) Semantic interoperability of long-tail geoscience resources over the Web, In: Large-Scale Machine Learning in the Earth Sciences, Eds. A.N. Srivastava, R. Nemani and K. Steinhaeuser, Taylor and Francis (book chapter, accepted)

Jiang, P., M. Elag, P. Kumar, S.D. Peckham, L. Marini, L. Riu (2016) A service-oriented architecture for coupling web service models using the Basic Model Interface (BMI), (submitted to Environmental Modeling and Software)

Laniak, G.F., G. Olchin, J. Goodall, A. Voinov, M. Hill, P. Glynn, G. Whelan, G. Geller, N. Quinn, M. Blind, S. Peckham, S. Reaney, N. Gaber, R. Kennedy and A. Hughes (2013) Integrated environmental modeling: A vision and roadmap for the future, Environmental Modelling & Software, 39, 3-23, http://dx.doi.org/10.1016/j.envsoft.2012.09.006.

Peckham, S.D., E.W.H. Hutton and B. Norris (2013) A component-based approach to integrated modeling in the geosciences: The Design of CSDMS, Computers & Geosciences, special issue: Modeling for Environmental Change, 53, 3-12. http://dx.doi.org/10.1016/j.cageo.2012.04.002.

Peckham, S.D. (2014a) The CSDMS Standard Names: Cross-domain naming conventions for describing process models, data sets and their associated variables, Proceedings of the 7th Intl. Congress on Env. Modelling and Software, International Environmental Modelling and Software Society (iEMSs), San Diego, CA. (Eds. D.P. Ames, N.W.T. Quinn, A.E. Rizzoli). http://scholarsarchive.byu.edu/iemssconference/2014/Stream-A/12/.

Peckham, S.D. (2014b) EMELI 1.0: An experimental smart modeling framework for automatic coupling of self-describing models, Proceedings of HIC 2014, 11th International Conf. on Hydroinformatics, New York, NY. http://academicworks.cuny.edu/cc_conf_hic/464/.

Peckham, S.D., A. Kelbert, M.C. Hill and E.W.H. Hutton (2016) Towards uncertainty quantification and parameter estimation for Earth system models in a component-based modeling framework, Computers & Geosciences, special issue: Uncertainty and Sensitivity in Surface Dynamics Modeling, 90(B), 152-161, http://dx.doi.org/10.1016/j.cageo.2016.03.005.

Peckham, S.D. and J.L. Goodall (2013) Driving plug-and-play models with data from web-services: A demonstration of interoperability between CSDMS and CUAHSI-HIS, Computers & Geosciences, special issue: Modeling for Environmental Change, 53, 154-161, http://dx.doi.org/10.1016/j.cageo.2012.04.019.

Syvitski, J.P., E. Hutton, M. Piper, I. Overeem, A. Kettner and S. Peckham (2014) Plug and play component modeling — The CSDMS 2.0 approach, Proceedings of the 7th Intl. Congress on Env. Modelling and Software, International Environmental Modelling and Software Society (iEMSs), San Diego, CA. (Eds. D.P. Ames, N.W.T. Quinn, A.E. Rizzoli), Paper 4. http://scholarsarchive.byu.edu/iemssconference/2014/Stream-B/4/

Voinov, A.A., C. Deluca, R.R. Hood, S.D. Peckham, C.R. Sherwood, J.P.M. Syvitski (2010) A community approach to earth systems modeling, EOS, Transactions American Geophysical Union, 91(13), p. 117, 30 March 2010, http://dx.doi.org/10.1029/2010EO130001.