Dapper in-situ conventions spec available

Daniel Holloway d.holloway at webhost.opendap.org
Wed Oct 11 13:33:55 PDT 2006


On Oct 11, 2006, at 3:41 PM, Steve Hankin wrote:

> All,
>
> I'd like to second Roy's point about the "has_data attribute".   A  
> real weakness of the current OPeNDAP/DAPPER formulation is that  
> there is no standard way defined to request (say)
>
>    "give me all of the profiles in <space-time region> that contain  
> measurements of TEMPERATURE AND SALINITY"

   Steve,

    I'm not convinced the 'has_data' attribute is the answer.
First, in my experience Dapper can (and does) expose a different DDS
for each dataset it serves.  In general, each dataset Dapper serves
should contain data for all of the variables its DDS declares.  That's
not an absolute, but it's generally true for homogeneous collections.
Second, I think the point Benno raised about extending the constraint
mechanism for Sequences is the solution to your problem, and would be
better than adding the 'has_data' attribute as you're suggesting,
though implementing this constraint operation might be non-trivial.
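
    A minimal Python sketch of the selection Steve describes, evaluated
over a hypothetical in-memory model of a nested Sequence (an outer
sequence of profiles, each with an inner sequence of observations).
The names Profile, Region, has_data and select_profiles are
assumptions for illustration; they are not part of Dapper, the DAP
constraint expression syntax, or any server API.  The has_data helper
computes at request time what Roy's proposed attribute would store
ahead of time, which is roughly the difference between the two
approaches.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model of a Dapper-style nested Sequence (not a real API):
# outer sequence = profiles, inner sequence = observations.
# None marks a missing value.

@dataclass
class Profile:
    profile_id: int
    lat: float
    lon: float
    time: float  # e.g. seconds since some epoch
    observations: list[dict[str, Optional[float]]] = field(default_factory=list)

@dataclass
class Region:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    t_min: float
    t_max: float

    def contains(self, p: Profile) -> bool:
        """True if the profile's outer-sequence coordinates fall in the region."""
        return (self.lat_min <= p.lat <= self.lat_max
                and self.lon_min <= p.lon <= self.lon_max
                and self.t_min <= p.time <= self.t_max)

def has_data(p: Profile, var: str) -> bool:
    """True if at least one observation has a non-missing value for var."""
    return any(obs.get(var) is not None for obs in p.observations)

def select_profiles(profiles, region, required_vars):
    """Profiles inside the region that have data for every required variable."""
    return [p for p in profiles
            if region.contains(p) and all(has_data(p, v) for v in required_vars)]

profiles = [
    Profile(1, 10.0, -150.0, 0.0,
            [{"depth": 5.0, "TEMPERATURE": 25.1, "SALINITY": 35.0}]),
    Profile(2, 11.0, -151.0, 10.0,
            [{"depth": 5.0, "TEMPERATURE": 24.8, "SALINITY": None}]),
    Profile(3, 50.0, -151.0, 10.0,
            [{"depth": 5.0, "TEMPERATURE": 12.0, "SALINITY": 33.0}]),
]
region = Region(0.0, 20.0, -160.0, -140.0, 0.0, 100.0)
hits = select_profiles(profiles, region, ["TEMPERATURE", "SALINITY"])
print([p.profile_id for p in hits])  # [1]
```

    Profile 2 is dropped because all of its SALINITY values are missing,
and profile 3 because it lies outside the region.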

    With respect to the discussion of this convention, I think we need
to be careful with terminology: a 'collection of observations' could
be interpreted in different ways.

     Dan


>
> In real-world practice, collections of observations very frequently
> have lists of variables that vary from one site to the next.  The
> ability to constrain requests to only those observations that contain
> the variables of interest seems pretty fundamental.
>
>    - Steve
>
> ===========================================
>
> Roy Mendelssohn wrote:
>> First, it is nice to see some efforts to agree on conventions for  
>> in-situ data.  It is long overdue and will simplify a lot of the  
>> work we  (ERD) do.
>>
>> I have been trying to follow this discussion as best I can, and
>> would like to add several comments that may be somewhat orthogonal
>> to previous discussion points.  Some of the points raised in
>> previous emails appear to be more about how to store in-situ data
>> than about a formal convention for transmitting them via OPeNDAP,
>> which I assume is the primary purpose of the specification.
>>
>> We now have several Dapper servers serving a fairly large amount
>> of data using the precursor to this specification (which is
>> essentially the same), and my comments are directed at the types
>> of in-situ data that we have found do not fit naturally into this
>> convention.  It may be that for a first pass we do not want to
>> include these in the specification, as no one spec will always
>> please everybody.
>>
>> 1.  Station data with an inexact station location.  In many
>> fisheries and oceanographic surveys, data are taken at "stations"
>> but the location is inexact, so the lat/lon information has to
>> change along with the observations.  You can handle this by having
>> a separate "file" for each profile with the station number as a
>> variable in the inner sequence, or by including the lat/lon in the
>> inner sequence (which would to some extent violate the
>> convention).  But since most programs look to the outer sequence
>> for coordinate information, neither of these solutions works that
>> well.  One possibility would be an option to put the station
>> number in the inner sequence in a defined way that servers and
>> clients know to look for (the present Dapper server actually does
>> this).
>>
>> 2.  Ragged arrays and either z or t in the inner sequence.
>> NetCDF-4 will have ragged arrays, though I haven't had a chance
>> yet to see how they handle the dimensioning for the ragged array.
>> Do we want something that can handle that in one file?  Again,
>> staying with the idea that we have a set of profiles at depth at a
>> "station" with inexact positions, we would like to send all the
>> data from that station together.  When you have subsurface data,
>> the biggest problem is that the depths vary with each profile.  So
>> to combine the profiles, you either have a depth dimension with all
>> possible depths and a lot of missing data, or else you do one
>> "file" per profile.  The latter, combined with having either the
>> 't' or the 'z' axis in the inner sequence, tends to mean we can't
>> readily do time series from the same station, though that is an
>> obvious thing to want to do.  (To be more precise: clearly one
>> could do that if one knows what to look for in our files and that
>> they have that structure, but there is nothing in the spec per se
>> that would make this a general solution.)  So do we want something
>> in the spec that describes ragged arrays?
>>
>> 3. has_data attribute.  To use the spec effectively, particularly
>> in a time series sense, we have found that every parameter in the
>> inner sequence should always be present, but often it will not
>> have been observed while other parameters were.  Rather than
>> having to look at the data itself to see if it is totally missing,
>> do we want a "has_data" attribute to be required?
>>
>> I hope these comments are at least somewhat clear.  I am a little  
>> fuzzy-headed normally and a bad cold hasn't helped.  I may have
>> more comments, but those are my initial ones.
>>
>> BTW, for others on my staff who are not on the mailing list: is
>> this sequence of emails being archived somewhere they can view it?
>> The discussion has been very interesting.
>>
>> -Roy M.
>
> -- 
> --
>
> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
> 7600 Sand Point Way NE, Seattle, WA 98115-0070
> ph. (206) 526-6080, FAX (206) 526-6744
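
    Roy's item 2 describes the trade-off for profiles whose depths
differ.  A minimal Python sketch of one ragged layout, assuming a
hypothetical station with three profiles: concatenate all samples
into flat arrays and keep a per-profile count (row_size), so profile
boundaries can be recovered without padding.  The names and layout
here are assumptions for illustration, not part of the Dapper
convention or of netCDF-4's own variable-length types.

```python
from itertools import accumulate

# Hypothetical profiles from one station; depths (m) differ per profile.
station_profiles = [
    {"time": 0.0, "depth": [0.0, 10.0, 20.0], "temp": [20.0, 18.5, 15.2]},
    {"time": 1.0, "depth": [0.0, 5.0], "temp": [21.0, 20.4]},
    {"time": 2.0, "depth": [0.0, 10.0, 25.0, 50.0], "temp": [19.8, 18.0, 14.1, 10.3]},
]

def pack_ragged(profiles):
    """Concatenate per-profile samples; row_size[i] counts profile i's samples."""
    time = [p["time"] for p in profiles]
    row_size = [len(p["depth"]) for p in profiles]
    depth = [d for p in profiles for d in p["depth"]]
    temp = [t for p in profiles for t in p["temp"]]
    return time, row_size, depth, temp

def profile_slice(row_size, i):
    """Index range of profile i within the flattened sample arrays."""
    offsets = [0] + list(accumulate(row_size))
    return slice(offsets[i], offsets[i + 1])

time, row_size, depth, temp = pack_ragged(station_profiles)

# Storage: ragged layout vs. padding every profile to the union of all depths.
ragged_cells = len(depth)
padded_cells = len(row_size) * len(set(depth))

# Time series at one depth: the 0 m value from each profile, or None if absent.
surface = []
for i, t in enumerate(time):
    s = profile_slice(row_size, i)
    values = dict(zip(depth[s], temp[s]))
    surface.append((t, values.get(0.0)))

print(ragged_cells, padded_cells)  # 9 18
print(surface)  # [(0.0, 20.0), (1.0, 21.0), (2.0, 19.8)]
```

    With only three profiles the padded form already needs twice the
cells, half of them missing; the station's time series still comes
out of the ragged form with one pass over the profiles.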


