Dapper in-situ conventions spec available
John Caron
caron at unidata.ucar.edu
Fri Oct 6 09:45:48 PDT 2006
Hi All:
Here are some specific comments on the spec:
1. "The inner sequence contains all of the measurement variables".
Generalization: It could be reasonable to allow measurements in the outer sequence, to save space and clarify invariants.
2. "The outer sequence must have an Int32 variable named _id and this variable must have a unique value for each entry of the outer sequence".
Generalization: _id could be any type, including String.
3. "The outer sequence must have two variables that specify the x (longitude) axis and y (latitude) axis of the dataset. These variables are identified by the axis attribute and have values of "X" and "Y" respectively."
Clarity: ... have values of "lat" and "lon" respectively.
4. "The x,y, t, and z axes must have a Float32 or Float64 type."
Generalization: t could alternately be an ISO 8601 formatted String.
5. "The sequence may have an attributes variable (of type Structure) that contains per-profile or per-time series metadata as a set of name-value pairs... All members of the structure must have a String type."
- Are these handled in any special way? Is there a need to separate the measurements from the metadata? Otherwise, these could just be variables in the inner sequence.
- Can there be other data in the outer sequence, e.g., a station id string? Is this attribute structure a way to eliminate that?
6. "The sequence may have a variable_attributes variable of type Structure that contains per-profile or per-time series metadata for a specific data variable". The example, though, seems to describe variable metadata as a whole, not per-profile. In any case, I wonder if these can't just be variables in the inner sequence?
7. Why do the inner sequence variables have to be floats?
8. How does the structure variable constrained_ranges differ from the global attributes lon_range, lat_range, etc?
9. max_profiles_per_request could be unlimited?
10. total_profiles_in_dataset could be unknown?
11. "A server must support all selection constrains on coordinate variables."
- Does this mean any combination of constraints, even very long and complex ones? A useful restriction might be just space/time bounding boxes.
- On all coordinate variables? A useful restriction might be only on outer coordinate variables.
12. The DAS example implies that _id, lat, lon, time, depth can have missing values. It seems like one might want to disallow that?
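To make comments 2, 4, and 12 a bit more concrete, here is a rough Python sketch of the checks a client or server might run on outer-sequence entries. Everything in it is an assumption for illustration: the function names are made up, entries are modelled as plain dicts, and a numeric t is taken to be seconds since 1970-01-01 UTC (the spec would have to say what the units really are).

```python
from datetime import datetime, timezone
import math


def time_to_epoch(t):
    """Accept t as a number (assumed: seconds since 1970-01-01 UTC)
    or as an ISO 8601 string, and return seconds since the epoch."""
    if isinstance(t, (int, float)):
        return float(t)
    # fromisoformat() in older Pythons does not accept a trailing "Z"
    dt = datetime.fromisoformat(t.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # assumption: strings without an offset are UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def check_outer(entries, coordinates=("lat", "lon", "time")):
    """Report duplicate _id values (of any hashable type, including str)
    and missing coordinate values in a list of outer-sequence entries."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if entry["_id"] in seen:
            problems.append(f"entry {i}: duplicate _id {entry['_id']!r}")
        seen.add(entry["_id"])
        for name in coordinates:
            value = entry.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"entry {i}: missing {name}")
    return problems
```

The point is only that a String _id and an ISO 8601 time cost very little on the client side, while missing coordinates need an explicit rule.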
A. Stepping back, this spec allows:
- sets of variable length structures (i.e. 2D ragged arrays)
- constraint selections that subset the dataset by space and time.
- the _id allows a smart client to first get the ids of the inner sequences that have been subsetted by space and time, and then retrieve those inner sequences by id.
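The two-step access pattern in the last point might look roughly like this from the client side. This is a sketch only: the base URL and the outer sequence name "location" are hypothetical, the constraint-expression syntax is DAP-2 as I understand it (projection first, then "&"-separated selections), and a real client would also URL-encode the expression.

```python
def ids_in_box_url(base, seq, lat_min, lat_max, lon_min, lon_max):
    """Step 1: project only _id from the outer sequence, selecting
    entries that fall inside a lat/lon bounding box."""
    selections = [
        f"{seq}.lat>={lat_min}",
        f"{seq}.lat<={lat_max}",
        f"{seq}.lon>={lon_min}",
        f"{seq}.lon<={lon_max}",
    ]
    return f"{base}.dods?{seq}._id&" + "&".join(selections)


def sequence_by_id_url(base, seq, entry_id):
    """Step 2: request the whole entry (outer and inner sequence)
    for one _id returned by step 1."""
    return f"{base}.dods?{seq}&{seq}._id={entry_id}"


if __name__ == "__main__":
    base = "http://example.org/dapper/my_dataset"  # hypothetical server
    print(ids_in_box_url(base, "location", -10, 10, 120, 160))
    print(sequence_by_id_url(base, "location", 42))
```

A time range would be added to step 1 in the same way as the lat/lon bounds.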
B. One type of data we try to deal with is track/trajectory data where x, y, z, and t can all vary from observation to observation (e.g., data taken during an aircraft flight). So something like:
netcdf trajectory_one.nc {
  dimensions:
    traj = 5;
    time = 2848; // (has coord.var)
  variables:
    String traj(traj);
    double time(time, traj);
    double depth(time, traj);
    double latitude(time, traj);
    double longitude(time, traj);
    double temperature(time, traj);
}
A Dapper-ish view might look like:
Dataset {
    Sequence {
        Int32 _id;
        Sequence {
            Float64 time;
            Float64 lat;
            Float64 lon;
            Float64 elev;
            Float32 temp;
        } observation;
    } trajectory;
    ...
} my_trajectory_collection;
In this case, the x, y, z, and t coordinates are all in the inner sequence. One could imagine other datasets that want some coordinates in the inner and some in the outer sequence.
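As a rough illustration of the mapping from the (time, traj) arrays to nested sequences, here is a small Python sketch. The function name, the dict-of-2D-lists input, and the idea that shorter trajectories are padded with a fill value in the time array are all assumptions for the example; renaming variables (e.g. depth to elev, with whatever sign change that implies) is left out.

```python
def to_trajectories(traj_names, variables, fill=None):
    """Turn rectangular arrays indexed [time_index][traj_index] into one
    outer entry per trajectory, each holding a list of observations.

    variables maps a variable name to its 2D list and must include "time".
    Observations whose time equals `fill` are treated as padding and
    skipped, which is what makes the result ragged.
    """
    times = variables["time"]
    out = []
    for j, name in enumerate(traj_names):
        observations = []
        for i in range(len(times)):
            if times[i][j] == fill:
                continue  # padding, not a real observation
            observations.append(
                {var: values[i][j] for var, values in variables.items()}
            )
        out.append({"_id": j, "name": name, "observation": observations})
    return out


if __name__ == "__main__":
    data = {
        "time": [[0.0, 5.0], [1.0, None]],
        "temp": [[10.0, 20.0], [11.0, None]],
    }
    for trajectory in to_trajectories(["a", "b"], data):
        print(trajectory)
```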
C. Providing all possible selections on a sequence is a real implementation problem. One could state it as: how do you let the client know what selections are efficient? It might be interesting to provide some metadata conventions that do that in the general case, e.g.:
NC_GLOBAL {
    String selection_efficient "x y _id inner.depth";
    String selection_forbidden "inner.*";
}
says "selections on x, y, _id, and inner.depth are efficient, anything else on inner sequence is forbidden, anything else is allowed but may be slow". Of course, clients would have to get smart to use that info. This is equivalent to examining a database schema. In that sense, the info might belong in the DDS somehow. OTOH, the DAP-2 spec says that an opendap server must allow projections, so does that make a dapper server non-compliant?
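A "smart" client could interpret those two attributes along these lines. This is just a sketch of the proposed convention, which does not exist anywhere yet: the function name is made up, the patterns are treated as shell-style wildcards, and I am assuming that an explicit selection_efficient entry wins over a selection_forbidden pattern (so inner.depth stays usable despite "inner.*").

```python
from fnmatch import fnmatchcase


def classify_selection(variable, efficient, forbidden):
    """Classify a selection on `variable` given the space-separated
    pattern lists from the two hypothetical global attributes."""
    if any(fnmatchcase(variable, p) for p in efficient.split()):
        return "efficient"
    if any(fnmatchcase(variable, p) for p in forbidden.split()):
        return "forbidden"
    return "allowed but may be slow"


if __name__ == "__main__":
    efficient = "x y _id inner.depth"
    forbidden = "inner.*"
    for name in ("x", "inner.depth", "inner.temp", "time"):
        print(name, "->", classify_selection(name, efficient, forbidden))
```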
D. BTW, we've been working on a netCDF-3 file convention for storing observation data, and we are studying if we can map these files into this spec. A lot of the problems have to do with efficiently implementing the projections. See
http://www.unidata.ucar.edu/software/netcdf-java/formats/UnidataObsConvention.html
if you are interested.
John and Ethan