Dapper in-situ conventions spec available

Steve Hankin Steven.C.Hankin at noaa.gov
Wed Oct 11 15:00:04 PDT 2006



Daniel Holloway wrote:
>
> On Oct 11, 2006, at 3:41 PM, Steve Hankin wrote:
>
>> All,
>>
>> I'd like to second Roy's point about the "has_data attribute".   A 
>> real weakness of the current OPeNDAP/DAPPER formulation is that there 
>> is no standard way defined to request (say)
>>
>>    "give me all of the profiles in <space-time region> that contain 
>> measurements of TEMPERATURE AND SALINITY"
>
>   Steve,
>
>    I'm not convinced the 'has_data' attribute is the answer.   First, 
> my experience with Dapper was that it can (and does) expose a 
> different DDS for each dataset it serves.  Basically the underlying 
> data represented via Dapper as a dataset should contain all of the 
> variables represented, granted that's not an absolute but it's 
> generally true for homogeneous collections.  Anyway, I think the point 
> Benno raised about extending the constraint-mechanism for Sequences is 
> the solution to your problem and would be better than adding the 
> 'has_data' attribute as you're suggesting.  Though implementing this 
> constraint operation might be non-trivial.
>
Hi Dan,

Full agreement and sorry if I was unclear.  I intended only to describe 
a problem -- not to propose a specific solution.   (The idea of some 
improved form of "constraint operator" was exactly what I was 
envisioning based upon previous discussions with you.)  Next step is to 
get a specific proposal on the table. This topic has been bounced around 
for several years, but as far as I am aware proposals have never taken 
concrete form.
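
To make the problem statement concrete, here is a minimal Python sketch of
the selection semantics described above. It only illustrates what such a
request should return; it is not a proposal for constraint syntax and does
not use any existing DAP or Dapper API. The Profile class and the has_data()
and select_profiles() helpers are hypothetical names, and NaN is assumed as
the missing-value marker.

```python
from dataclasses import dataclass, field
from datetime import datetime
import math


@dataclass
class Profile:
    """One outer-sequence record: position and time plus inner-sequence columns."""
    lat: float
    lon: float
    time: datetime
    # variable name -> values down the profile; NaN marks a missing value (assumption)
    variables: dict = field(default_factory=dict)


def has_data(profile, name):
    """True if the variable is present and has at least one non-missing value."""
    values = profile.variables.get(name, [])
    return any(not math.isnan(v) for v in values)


def select_profiles(profiles, lat_range, lon_range, time_range, required):
    """Keep profiles inside the space-time box that have data for every required variable."""
    (lat0, lat1), (lon0, lon1), (t0, t1) = lat_range, lon_range, time_range
    return [
        p for p in profiles
        if lat0 <= p.lat <= lat1
        and lon0 <= p.lon <= lon1
        and t0 <= p.time <= t1
        and all(has_data(p, name) for name in required)
    ]


# Made-up example data, for illustration only.
nan = float("nan")
profiles = [
    Profile(10.0, -150.0, datetime(2006, 1, 5),
            {"TEMPERATURE": [20.1, 18.4], "SALINITY": [35.0, 35.2]}),
    Profile(11.0, -149.0, datetime(2006, 1, 6),
            {"TEMPERATURE": [19.8, 17.9], "SALINITY": [nan, nan]}),  # salinity all missing
    Profile(12.0, -148.0, datetime(2006, 1, 7),
            {"TEMPERATURE": [19.5]}),                                # no salinity variable
    Profile(60.0, -148.0, datetime(2006, 1, 7),
            {"TEMPERATURE": [5.0], "SALINITY": [33.0]}),             # outside the region
]

selected = select_profiles(
    profiles,
    lat_range=(0.0, 20.0),
    lon_range=(-160.0, -140.0),
    time_range=(datetime(2006, 1, 1), datetime(2006, 2, 1)),
    required=["TEMPERATURE", "SALINITY"],
)
```

Today a client has to fetch the candidate profiles and apply a filter like
this itself; the point of a constraint operator would be to let the server
do the equivalent work before any data are transmitted.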

    - Steve
>    With respect to the discussion on this convention I think we need 
> to be careful with the terminology, a 'collection of observations' 
> could be interpreted in different ways.
>
>     Dan
>
>
>>
>> In real world practice collections of observations very frequently 
>> have lists of variables that vary from one site to the next.  The 
>> ability to constrain requests to only observations that contain the 
>> variables that are of interest seems pretty fundamental.
>>
>>    - Steve
>>
>> ===========================================
>>
>> Roy Mendelssohn wrote:
>>> First, it is nice to see some efforts to agree on conventions for 
>>> in-situ data.  It is long overdue and will simplify a lot of the 
>>> work we  (ERD) do.
>>>
>>> I have been trying to follow this discussion as best as I can, and 
>>> would like to add several comments that may be somewhat orthogonal 
>>> to previous discussion points.  I would like to add that some of the 
>>> points raised in previous emails appear more about how to store 
>>> in-situ data, rather than a formal convention for transmitting them 
>>> in OPeNDAP.  I assume that this is the primary purpose of the 
>>> specification.
>>>
>>> We now have several Dapper  servers serving a fairly large amount of 
>>> data using the precursor to this specification (though it is 
>>> essentially the same),  and my comments are directed at the types of 
>>> in-situ data that we have found do not make for a natural fit with 
>>> this convention.  It may be that for a first pass we do not want to 
>>> include this in the specification, as no one spec will always please 
>>> everybody.
>>>
>>> 1.  Station data with an inexact station location.  In many 
>>> fisheries and oceanographic surveys data are taken at "stations" but 
>>> the location is inexact, so that it is necessary to have changing 
>>> lat/lon information with the observations.  You can do this by 
>>> having a separate  "file" for each profile, and having the station 
>>> number as a variable in the inner sequence, or including the lat/lon 
>>> in the inner sequence (which to some extent would violate the 
>>> convention), but since most programs will look to the outer sequence 
>>> for coordinate type information, neither of these solutions works 
>>> that well. A possibility would be to have an option to have station 
>>> number in the inner sequence in a set way, and that server/clients 
>>> know to look for this  (the present Dapper server actually does this).
>>>
>>> 2.  Ragged arrays and either z or t in the inner sequence.  NetCDF-4 
>>> will have ragged arrays - though I haven't had a chance yet to see 
>>> how they handle the dimensioning for the ragged array.  Do we want 
>>> something that can handle that in one file?  Again staying with the 
>>> idea that we have set of profiles at depth at a "station" with 
>>> inexact positions, and we would like to send all the data from that 
>>> station together.  When you have subsurface data, the biggest 
>>> problem is that the depths vary with  each profile.  So to combine 
>>> the profiles, you either have a depth dimension with all possible 
>>> depths and a lot of missing data, or else you do one "file" per 
>>> profile. The latter, combined with the either 't' or 'z' axis in the 
>>> inner sequence tends to make it so we can't readily do time series 
>>> from the same station, though that is an obvious thing to want to 
>>> do  (to be more precise - clearly one could do that if they know 
>>> what to look for in our files and that they have that structure - 
>>> but there is nothing in the spec per se that would make this a 
>>> general solution). So do we want something in the spec that 
>>> describes ragged arrays?
>>>
>>> 3. has_data attribute.  To use the spec effectively, particularly in 
>>> a time series sense, we have found that any parameter in the inner 
>>> sequence should always be there, but often it will not be observed 
>>> while other parameters were.  Rather than having to look at the data 
>>> itself to see if it is totally missing, do we want a "has_data" 
>>> attribute required?
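
The trade-off in point 2 can be sketched in a few lines of Python. This is
only an illustration with made-up numbers, not how netCDF-4 or Dapper store
anything: to_padded() builds the "all possible depths" layout with NaN for
missing cells, and to_ragged() builds a flat layout with one row count per
cast. All names here (casts, to_padded, to_ragged, cast_slice) are
hypothetical.

```python
import math

# Made-up data: depth (m) -> temperature for three casts at one station.
casts = [
    {5.0: 20.1, 10.0: 19.7, 50.0: 15.2},
    {7.5: 20.0, 25.0: 18.1},
    {5.0: 19.9, 100.0: 11.4},
]


def to_padded(casts):
    """Shared depth axis holding every depth seen; absent values become NaN."""
    depths = sorted({d for cast in casts for d in cast})
    grid = [[cast.get(d, math.nan) for d in depths] for cast in casts]
    return depths, grid


def to_ragged(casts):
    """Flat depth/value lists plus a per-cast row count; nothing is padded."""
    row_size, depth, value = [], [], []
    for cast in casts:
        row_size.append(len(cast))
        for d in sorted(cast):
            depth.append(d)
            value.append(cast[d])
    return row_size, depth, value


def cast_slice(row_size, i):
    """Slice of the flat lists that belongs to cast number i."""
    start = sum(row_size[:i])
    return slice(start, start + row_size[i])


depths, grid = to_padded(casts)
row_size, depth, value = to_ragged(casts)

# Count how many cells of the padded grid are filler.
missing = sum(math.isnan(v) for row in grid for v in row)
second_cast_values = value[cast_slice(row_size, 1)]
```

With these three casts the padded grid has 18 cells of which 11 are filler,
while the ragged form stores only the 7 observed values plus three row
counts, and all casts from the station still travel together.
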
>>>
>>> I hope these comments are at least somewhat clear.  I am a little 
>>> fuzzy-headed normally and a bad cold hasn't helped.  May have more 
>>> comments but those are my initial ones.
>>>
>>> BTW - for others on my staff that are not on the mail-list - is the 
>>> sequence of emails being archived somewhere that they can view it? 
>>> The discussion has been very interesting.
>>>
>>> -Roy M.
>>
>> ----
>>
>> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>> ph. (206) 526-6080, FAX (206) 526-6744
>


