Dapper in-situ conventions spec available
Steve Hankin
Steven.C.Hankin at noaa.gov
Wed Oct 11 15:00:04 PDT 2006
Daniel Holloway wrote:
>
> On Oct 11, 2006, at 3:41 PM, Steve Hankin wrote:
>
>> All,
>>
>> I'd like to second Roy's point about the "has_data attribute". A
>> real weakness of the current OPeNDAP/DAPPER formulation is that there
>> is no standard way defined to request (say)
>>
>> "give me all of the profiles in <space-time region> that contain
>> measurements of TEMPERATURE AND SALINITY"
>
> Steve,
>
> I'm not convinced the 'has_data' attribute is the answer. First,
> my experience with Dapper was that it can (and does) expose a
> different DDS for each dataset it serves. Basically the underlying
> data represented via Dapper as a dataset should contain all of the
> variables represented, granted that's not an absolute but it's
> generally true for homogeneous collections. Anyway, I think the point
> Benno raised about extending the constraint-mechanism for Sequences is
> the solution to your problem and would be better than adding the
> 'has_data' attribute as you're suggesting. Though implementing this
> constraint operation might be non-trivial.
>
Hi Dan,
Full agreement and sorry if I was unclear. I intended only to describe
a problem -- not to propose a specific solution. (The idea of some
improved form of "constraint operator" was exactly what I was
envisioning based upon previous discussions with you.) Next step is to
get a specific proposal on the table. This topic has been bounced around
for several years, but as far as I am aware proposals have never taken
concrete form.
- Steve
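
To make the problem concrete, here is a minimal client-side sketch of the
request quoted above ("all of the profiles in <space-time region> that
contain measurements of TEMPERATURE AND SALINITY"). It is an illustration
only: the Profile class, the has_data() helper and select_profiles() are
hypothetical names, not part of OPeNDAP, Dapper or any proposed constraint
syntax, and longitude wrap-around is ignored. A server-side constraint
operator would aim to do the equivalent before any data are sent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Profile:
    """Hypothetical stand-in for one outer-sequence record (one profile)."""
    lat: float
    lon: float
    time: float  # e.g. days since an arbitrary epoch
    # Inner-sequence variables: name -> values by depth; None marks missing.
    data: Dict[str, List[Optional[float]]] = field(default_factory=dict)


def has_data(profile: Profile, name: str) -> bool:
    """True if the variable exists and has at least one non-missing value."""
    return any(v is not None for v in profile.data.get(name, []))


def select_profiles(
    profiles: Sequence[Profile],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    time_range: Tuple[float, float],
    required: Sequence[str],
) -> List[Profile]:
    """Keep profiles inside the region that observed every required variable."""
    (lat0, lat1), (lon0, lon1), (t0, t1) = lat_range, lon_range, time_range
    return [
        p for p in profiles
        if lat0 <= p.lat <= lat1
        and lon0 <= p.lon <= lon1
        and t0 <= p.time <= t1
        and all(has_data(p, name) for name in required)
    ]


profiles = [
    # In region, both variables observed.
    Profile(10.0, -150.0, 100.0,
            {"TEMPERATURE": [20.1, 18.3], "SALINITY": [34.9, 35.0]}),
    # In region, but SALINITY is entirely missing.
    Profile(11.0, -149.0, 101.0,
            {"TEMPERATURE": [19.8, 17.9], "SALINITY": [None, None]}),
    # Both variables observed, but outside the latitude range.
    Profile(50.0, -150.0, 100.0,
            {"TEMPERATURE": [5.0], "SALINITY": [33.0]}),
]

selected = select_profiles(
    profiles, (0.0, 20.0), (-160.0, -140.0), (90.0, 110.0),
    ["TEMPERATURE", "SALINITY"],
)
print(len(selected))  # prints 1
```

Without server support, a client has to fetch every profile in the region
and discard the ones that fail the has_data() test, which is the cost the
thread is trying to avoid.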
> With respect to the discussion on this convention I think we need
> to be careful with the terminology, a 'collection of observations'
> could be interpreted in different ways.
>
> Dan
>
>
>>
>> In real world practice collections of observations very frequently
>> have lists of variables that vary from one site to the next. The
>> ability to constrain requests to only observations that contain the
>> variables that are of interest seems pretty fundamental.
>>
>> - Steve
>>
>> ===========================================
>>
>> Roy Mendelssohn wrote:
>>> First, it is nice to see some efforts to agree on conventions for
>>> in-situ data. It is long overdue and will simplify a lot of the
>>> work we (ERD) do.
>>>
>>> I have been trying to follow this discussion as best as I can, and
>>> would like to add several comments that may be somewhat orthogonal
>>> to previous discussion points. I would like to add that some of the
>>> points raised in previous emails appear more about how to store
>>> in-situ data, rather than a formal convention for transmitting them
>>> in OPeNDAP. I assume that this is the primary purpose of the
>>> specification.
>>>
>>> We now have several Dapper servers serving a fairly large amount of
>>> data using the precursor to this specification (though it is
>>> essentially the same), and my comments are directed at the types of
>>> in-situ data that we have found do not make for a natural fit with
>>> this convention. It may be that for a first pass we do not want to
>>> include this in the specification, as no one spec will always please
>>> everybody.
>>>
>>> 1. Station data with an inexact station location. In many
>>> fisheries and oceanographic surveys data are taken at "stations" but
>>> the location is inexact, so that it is necessary to have changing
>>> lat/lon information with the observations. You can do this by
>>> having a separate "file" for each profile, and having the station
>>> number as a variable in the inner sequence, or including the lat/lon
>>> in the inner sequence (which to some extent would violate the
>>> convention), but since most programs will look to the outer sequence
>>> for coordinate type information, neither of these solutions works
>>> that well. A possibility would be to have an option to have station
>>> number in the inner sequence in a set way, and that server/clients
>>> know to look for this (the present Dapper server actually does this).
>>>
>>> 2. Ragged arrays and either z or t in the inner sequence. Netcdf-4
>>> will have ragged arrays - though I haven't had a chance yet to see
>>> how they handle the dimensioning for the ragged array. Do we want
>>> something that can handle that in one file? Again staying with the
>>> idea that we have a set of profiles at depth at a "station" with
>>> inexact positions, and we would like to send all the data from that
>>> station together. When you have subsurface data, the biggest
>>> problem is that the depths vary with each profile. So to combine
>>> the profiles, you either have a depth dimension with all possible
>>> depths and a lot of missing data, or else you do one "file" per
>>> profile. The latter, combined with either the 't' or 'z' axis in the
>>> inner sequence tends to make it so we can't readily do time series
>>> from the same station, though that is an obvious thing to want to
>>> do (to be more precise - clearly one could do that if they know
>>> what to look for in our files and that they have that structure -
>>> but there is nothing in the spec per se that would make this a
>>> general solution). So do we want something in the spec that
>>> describes ragged arrays?
>>>
>>> 3. has_data attribute. To use the spec effectively, particularly in
>>> a time series sense, we have found that any parameter in the inner
>>> sequence should always be there, but often it will not be observed
>>> while other parameters were. Rather than having to look at the data
>>> itself to see if it is totally missing, do we want a "has_data"
>>> attribute required?
>>>
>>> I hope these comments are at least somewhat clear. I am a little
>>> fuzzy-headed normally and a bad cold hasn't helped. May have more
>>> comments but those are my initial ones.
>>>
>>> BTW - for others on my staff that are not on the mail-list - is the
>>> sequence of emails being archived somewhere that they can view them?
>>> The discussion has been very interesting.
>>>
>>> -Roy M.
>>
>> ----
>>
>> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>> ph. (206) 526-6080, FAX (206) 526-6744
>
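
Roy's point 2 above contrasts two ways of combining profiles whose depths
differ: one "file" per profile, or a single depth dimension holding all
possible depths with a lot of missing data. The sketch below shows the
second option on made-up numbers, so the padding cost is visible. The
station data and the pad_to_common_depths() helper are hypothetical and
say nothing about how netCDF-4 or Dapper actually store ragged arrays.

```python
from typing import Dict, List, Optional, Tuple

# Two hypothetical temperature profiles from one station.
# Each maps depth in metres -> temperature; the depth sets differ.
ragged: List[Dict[float, float]] = [
    {0.0: 20.0, 10.0: 19.5, 50.0: 15.2},
    {0.0: 20.3, 25.0: 18.1},
]


def pad_to_common_depths(
    profiles: List[Dict[float, float]],
) -> Tuple[List[float], List[List[Optional[float]]]]:
    """Build one shared depth axis and fill unobserved depths with None."""
    depths = sorted({d for p in profiles for d in p})
    grid = [[p.get(d) for d in depths] for p in profiles]
    return depths, grid


depths, grid = pad_to_common_depths(ragged)
missing = sum(v is None for row in grid for v in row)

print(depths)   # shared depth axis
for row in grid:
    print(row)  # one padded row per profile
print(missing, "of", len(depths) * len(grid), "cells are padding")
```

Here 5 real observations occupy an 8-cell grid. With many profiles at
irregular depths the padded share grows quickly, which is the trade-off
against keeping each profile as its own ragged record.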