This document presents a series of examples of a proposed new way to model quantitative and qualitative values in the context of the Open Bio Ontologies (OBO).
Technical Stuff
This document is a Quarto notebook. First let’s get some technical bits out of the way, such as specifying some classes, object properties, and data properties to reuse below (in YAML format).
Code
from src.display import displaycontext ="""classes: Adelie Penguin (Pygoscelis adeliae): NCBITaxon:9238 attribute: :attribute assay: OBI:0000070 averaging data transformation: OBI:0200170 characteristic: PATO:0000001 data item: IAO:0000027 entity: BFO:0000001 g: unit:g genotypic sex: PATO:0020000 male genotypic sex: PATO:0020001 mass: PATO:0000125 mass measurement assay: OBI:0000445 normal mass: PATO:0045030 value: :valueobject properties: conversion of: :conversionOf has attribute: :hasAttribute has characteristic: RO:0000053 has specified output: OBI:0000299 has value: :hasValue is about: IAO:0000136 measures attribute: :measuresAttribute measures entity: :measuresEntity specifies value: :specifiesValuedata properties: has quantity: :hasQuantity has unit: :hasUnitshort: Adelie Penguin (Pygoscelis adeliae): Adelie Penguinreverse: - conversion of - has specified output - measures attribute - measures entity - specifies valueloose: - is about"""
You can view each example as a diagram, in Turtle syntax, or in a human-readable YAML syntax that was used to generate both the diagram and the Turtle.
Characteristics
One of the simplest things we can say in OBO is that an entity has a characteristic, using the ‘has characteristic’ relation. Here we have
two very general classes: ‘entity’ and ‘characteristic’
an instance of the ‘entity’ class named ‘e’
an anonymous instance of the ‘characteristic’ class
one relation between the instances: ‘has characteristic’
Some characteristics are fundamental to what an entity is. Every penguin has body mass. Every penguin has a genotypic sex. By knowing that something is a penguin, we automatically know that it has these characteristics. Let’s call these characteristics ‘attributes’, and introduce a ‘has attribute’ relation to relate an entity to one of its attributes.
Other characteristics can be different at different times or under different conditions. This particular penguin might be male, and it might have a body mass of 3750g. We expect that its genotypic sex won’t change, but we know that its body mass will change. Let’s call these characteristics ‘values’.
Every value is a more specific form of an attribute. A penguin always has one body mass attribute, and that attribute has values at different times and under different conditions.
We’ll intoduce a ‘has value’ relation between an attribute and one of its values.
Say we’re studying penguins on an island. We give names to the penguins we’re studying, and we name one of the “N1A1”. N1A1 is male (i.e. has a male genotype). We say that N1A1 ‘has attribute’ a ‘genotpic sex’, and that attribute ‘has value’ a ‘male genotypic sex’. Since we don’t need to say anything more about these characteristics, we’ll leave them anonymous.
Code
display(context, """subject: N1A1type: Adelie Penguin (Pygoscelis adeliae)has attribute: - type: genotypic sex has value: - type: male genotypic sex""")
We say that N1A1 has a body mass of 3750g like this:
Code
display(context, """subject: N1A1type: Adelie Penguin (Pygoscelis adeliae)has attribute: - type: mass has value: - type: mass has quantity: 3750 has unit: g""")
Note that the attribute and the value belong to the same class ‘mass’, but we use the ‘has quality’ and ‘has unit’ data properties to make the value more specific than the attribute.
Values Without Attributes
In all our examples, the value is still a characteristic the entity. The benefit of distinguishing attributes from values is in linking values to each other (discussed below), in grouping values together when they share an attribute, in connecting a measurement process to the attribute it measures (and thus to the possible values), in validating data, etc.
In all our examples here we will consistently distinguish attributes and values. If you don’t want to make this distinction, you can just use ‘has characteristic’ to connect an entity to a specific value.
Code
display(context, """subject: etype: entityhas characteristic: - type: mass has quantity: 3750 has unit: g - type: normal mass - type: male genotypic sex""")
In our example we measured N1A1’s body mass. In OBI we call that measurement process a ‘mass measurement assay’, which outputs an IAO ‘data item’. A ‘data item’ is an ‘information content entity’ (ICE), and ICEs are about things. So we can say that there is a data item about the value.
Code
display(context, """- subject: N1A1 type: Adelie Penguin (Pygoscelis adeliae) has attribute: _:N1A1_mass_attribute- subject: _:N1A1_mass_attribute type: mass has value: _:N1A1_mass_value- subject: _:N1A1_mass_value type: mass- subject: _:N1A1_data_item type: data item is about: _:N1A1_mass_value""")
Currently, OBI does not work like this. An OBI assay has a material entity as its “evaluant”, and the output data item ‘is about’ the evaluant. So the diagram looks like this:
Code
display(context, """- subject: N1A1 type: Adelie Penguin (Pygoscelis adeliae) has attribute: _:N1A1_mass_attribute- subject: _:N1A1_mass_attribute type: mass has value: _:N1A1_mass_value- subject: _:N1A1_mass_value type: mass- subject: _:N1A1_data_item type: data item is about: N1A1- subject: _:N1A1_mass_assay type: mass measurement assay has specified output: _:N1A1_data_item""")
Nothing connects the data item to the relevant value. An ICE can have multiple ‘is about’ relations, but it may be better to introduce a ‘specifies value’ relation.
OBI assays have material entities as inputs but in many cases the point of an assay is to measure a process. We never measure everything about a target entity, but we can often specify which attribute an assay is measuring. The OBI assay template has a ‘target entity’ column which contains a mixture of material entities, processes, and characteristics. OBI assay modelling could be improved by introducing ‘measures entity’ and ‘measures attribute’ relations. The range for ‘measures attribute’ would simply be any attribute. The range for ‘measures entity’ would be ‘material entity’ or ‘process’ or (arguably) ‘spatial region’ – the same as the domain for ‘has attribute’.
assay
input
measures entity
measures attribute
mass measurement assay
material entity
material entity
mass
binding assay
material entity
binding
binding strength/time ?
basophil assay
blood
basophil
count ?
cell-killing
cell population
cell
cell killing
We would then say that the assay outputs a data item, and that data item ‘specifies value’ some value, which is a value of the measured attribute, which is an attribute of the measured entity.
Including all those links at once, we get:
Code
display(context, """- subject: N1A1 type: Adelie Penguin (Pygoscelis adeliae) has attribute: _:N1A1_mass_attribute- subject: _:N1A1_mass_attribute type: mass has value: _:N1A1_mass_value- subject: _:N1A1_mass_value type: mass measured by: _:N1A1_mass_assay- subject: _:N1A1_data_item type: data item specifies value: _:N1A1_mass_value- subject: _:N1A1_mass_assay type: mass measurement assay has specified output: _:N1A1_data_item measures attribute: _:N1A1_mass_attribute measures entity: N1A1""")
We often want to convert a scalar value from one unit system to another. In our RDF representation, a value cannot have more than one ‘has quantity’ or ‘has unit’ property, because we wouldn’t keep track of which quantity goes with which unit. Instead we can use multiple values, and keep track of the connection between them using a new ‘conversion of’ relation.
Code
display(context, """- subject: N1A1 type: Adelie Penguin (Pygoscelis adeliae) has attribute: - type: mass has value: - _:N1A1_mass_value_g - _:N1A1_mass_value_kg- subject: _:N1A1_mass_value_g type: mass has quantity: 3750 has unit: g- subject: _:N1A1_mass_value_kg type: mass has quantity: 3.750 has unit: kg conversion of: _:N1A1_mass_value_g""")
From an ontological perspective, there might only be one value here, and we just refer to it by different names practical reasons.
Penguin Mass Ascertained
When talking about a value, we might be interested in what caused that value, or we might be interested in how we know what that value is. The measurement and conversion processes discussed above tell us how we know a value, but there are many other ways.
For example, we might want to talk about the average weight of N1A1, accounting for multiple mass measurements. Our data and diagrams should have the same shape as in the measurement case.
Code
display(context, """subject: N1A1type: Adelie Penguin (Pygoscelis adeliae)has attribute: - type: mass has value: - type: mass ascertained by: - type: averaging data transformation""")
:N1A1aNCBITaxon:9238;# 'Adelie Penguin (Pygoscelis adeliae)':hasAttribute[aPATO:0000125;# 'mass':hasValue[aPATO:0000125;# 'mass':ascertainedby[aOBI:0200170# 'averaging data transformation']]].
The ‘ascertained by’ relation is not causal. Calculating the average did not cause the mass value. Calculating the average is how we know the mass value. ‘measured by’ and ‘conversion of’ are subproperties of ‘ascertained by’. They are important enough that they deserve special treatment. The general ‘ascertained by’ relation could be used when we:
calculate a value
estimate a value
curate a value from the literature
In each case we can point to a planned process that ascerained the value. We can say more about the inputs, ouputs, participants, timing, etc. of those planned processes.
For the practical reasons discussed above for unit conversion, it is best for one value to have no more than one ascertaining process, including measurement and conversion processes. Even if five processes come up with the same scalar “3750g”, for practical purposes we should use five different values.
NOTE: In previous drafts I used the word ‘determine’, but changed to ‘ascertain’ to avoid any connotations of causation or choice, and to avoid other confusion with “determinate” and “determinable” terminology.