Dapper in-situ conventions spec available
Daniel Holloway
d.holloway at webhost.opendap.org
Wed Oct 11 13:33:55 PDT 2006
On Oct 11, 2006, at 3:41 PM, Steve Hankin wrote:
> All,
>
> I'd like to second Roy's point about the "has_data attribute". A
> real weakness of the current OPeNDAP/DAPPER formulation is that
> there is no standard way defined to request (say)
>
> "give me all of the profiles in <space-time region> that contain
> measurements of TEMPERATURE AND SALINITY"
Steve,
I'm not convinced the 'has_data' attribute is the answer.
First, my experience with Dapper is that it can (and does) expose a
different DDS for each dataset it serves. The underlying data that
Dapper represents as a dataset should generally contain all of the
variables listed in that dataset's DDS; that's not an absolute, but
it's generally true for homogeneous collections. I think the point
Benno raised about extending the constraint mechanism for Sequences
is the solution to your problem, and would be better than adding the
'has_data' attribute as you're suggesting, though implementing this
constraint operation might be non-trivial.
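As a rough illustration of what such a Sequence constraint would have
to evaluate, here is a minimal Python sketch. It is not Dapper or
OPeNDAP code: the nested list-of-dicts layout, the names has_data and
select_profiles, the field names, and the use of None as the missing
value are all assumptions made for illustration only.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a nested Sequence: each outer record is a
# profile (lat/lon plus an inner list of observation records).
Profile = Dict[str, Any]

MISSING = None  # assumed missing-value marker for this sketch


def has_data(profile: Profile, variable: str) -> bool:
    """True if any inner record has a non-missing value for `variable`."""
    return any(
        record.get(variable) is not MISSING
        for record in profile["observations"]
    )


def select_profiles(
    profiles: List[Profile],
    required: Sequence[str],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
) -> List[Profile]:
    """Keep profiles inside the box that contain ALL required variables."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        p
        for p in profiles
        if lat_min <= p["lat"] <= lat_max
        and lon_min <= p["lon"] <= lon_max
        and all(has_data(p, v) for v in required)
    ]


# Made-up example data, not taken from any real server.
profiles = [
    {"id": 1, "lat": 10.0, "lon": -150.0,
     "observations": [{"depth": 5.0, "TEMPERATURE": 20.1, "SALINITY": 35.0}]},
    {"id": 2, "lat": 12.0, "lon": -148.0,
     "observations": [{"depth": 5.0, "TEMPERATURE": 19.8, "SALINITY": None}]},
    {"id": 3, "lat": 60.0, "lon": -150.0,
     "observations": [{"depth": 5.0, "TEMPERATURE": 4.0, "SALINITY": 33.0}]},
]

selected = select_profiles(
    profiles, ["TEMPERATURE", "SALINITY"], (0.0, 20.0), (-160.0, -140.0)
)
print([p["id"] for p in selected])
```

The sketch runs this test on the client after the data has been
fetched. Extending the constraint mechanism would mean the server
evaluates an equivalent test over the inner Sequence before sending
anything.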
With respect to the discussion on this convention, I think we need
to be careful with the terminology: a 'collection of observations'
could be interpreted in different ways.
Dan
>
> In real world practice collections of observations very frequently
> have lists of variables that vary from one site to the next. The
> ability to constrain requests to only observations that contain the
> variables that are of interest seems pretty fundamental.
>
> - Steve
>
> ===========================================
>
> Roy Mendelssohn wrote:
>> First, it is nice to see some efforts to agree on conventions for
>> in-situ data. It is long overdue and will simplify a lot of the
>> work we (ERD) do.
>>
>> I have been trying to follow this discussion as best as I can, and
>> would like to add several comments that may be somewhat orthogonal
>> to previous discussion points. I would like to add that some of
>> the points raised in previous emails appear more about how to
>> store in-situ data, rather than a formal convention for
>> transmitting them in OPeNDAP. I assume that this is the primary
>> purpose of the specification.
>>
>> We now have several Dapper servers serving a fairly large amount
>> of data using the precursor to this specification (though it is
>> essentially the same), and my comments are directed at the types
>> of in-situ data that we have found do not make for a natural fit
>> with this convention. It may be that for a first pass we do not
>> want to include this in the specification, as no one spec will
>> always please everybody.
>>
>> 1. Station data with an inexact station location. In many
>> fisheries and oceanographic surveys data are taken at "stations"
>> but the location is inexact, so that it is necessary to have
>> changing lat/lon information with the observations. You can do
>> this by having a separate "file" for each profile, and having the
>> station number as a variable in the inner sequence, or including
>> the lat/lon in the inner sequence (which to some extent would
>> violate the convention), but since most programs will look to the
>> outer sequence for coordinate type information, neither of these
>> solutions work that well. A possibility would be to have an option
>> to have station number in the inner sequence in a set way, and
>> that server/clients know to look for this (the present Dapper
>> server actually does this).
>>
>> 2. Ragged arrays and either z or t in the inner sequence.
>> Netcdf-4 will have ragged arrays - though I haven't had a chance
>> yet to see how they handle the dimensioning for the ragged array.
>> Do we want something that can handle that in one file? Again,
>> suppose we have a set of profiles at depth at a "station" with
>> inexact positions, and we would like to send all the data from
>> that station together. When you have subsurface
>> data, the biggest problem is that the depths vary with each
>> profile. So to combine the profiles, you either have a depth
>> dimension with all possible depths and a lot of missing data, or
>> else you do one "file" per profile. The latter, combined with
>> having either 't' or 'z' in the inner sequence, tends to make it so
>> we can't readily do time series from the same station, though that
>> is an obvious thing to want to do (to be more precise - clearly
>> one could do that if they know what to look for in our files and
>> that they have that structure - but there is nothing in the spec
>> per se that would make this a general solution). So do we want
>> something in the spec that describes ragged arrays?
>>
>> 3. has_data attribute. To use the spec effectively, particularly
>> in a time series sense, we have found that any parameter in the
>> inner sequence should always be there, but often it will not be
>> observed while other parameters were. Rather than having to look
>> at the data itself to see if it is totally missing, do we want a
>> "has_data" attribute required?
>>
>> I hope these comments are at least somewhat clear. I am a little
>> fuzzy-headed normally and a bad cold hasn't helped. May have more
>> comments but those are my initial ones.
>>
>> BTW - for others on my staff that are not on the mail-list - is
>> the sequence of emails being archived somewhere that they can view
>> them? The discussion has been very interesting.
>>
>> -Roy M.
>
> --
>
> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
> 7600 Sand Point Way NE, Seattle, WA 98115-0070
> ph. (206) 526-6080, FAX (206) 526-6744