Dapper in-situ conventions spec available

Joe Sirott Joe.Sirott at noaa.gov
Thu Oct 12 12:43:53 PDT 2006


Hi Roy,

Roy Mendelssohn wrote:
>
> 1.  Station data with an inexact station location.  In many fisheries 
> and oceanographic surveys data are taken at "stations" but the 
> location is inexact, so that it is necessary to have changing lat/lon 
> information with the observations.  You can do this by having a 
> separate  "file" for each profile, and having the station number as a 
> variable in the inner sequence, or including the lat/lon in the inner 
> sequence (which to some extent would violate the convention), but 
> since most programs will look to the outer sequence for coordinate 
> type information, neither of these solutions work that well. A 
> possibility would be to have an option to have station number in the 
> inner sequence in a set way, and that server/clients know to look for 
> this  (the present Dapper server actually does this).
A couple of potential solutions:

   1. The outer sequence could have a latitude and longitude (perhaps
      the centroid of the inexact lats and lons) that would be "good
      enough" for queries. The inner sequence could then carry the more
      precise values as variables.
   2. The outer sequence could be used to store a bounding box for
      x,y,(z|t) rather than a point. Again, the inner sequence could
      then be used to store precise values.

Note that both of these schemes might also be used to represent 
trajectories.

1) has the advantage of simplicity and compatibility with the current 
Dapper implementation but might break down if the trajectory covered a 
large geographical region. 2) offers more generality at the expense of 
increased client complexity.
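
To make the two schemes concrete, here is a rough sketch of how the outer-sequence values might be derived from the precise positions held in the inner sequence. It is only an illustration: the Observation layout, the function names, and the sample station are all made up for this example and are not part of Dapper or the conventions spec.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """One inner-sequence row with its own precise position (hypothetical layout)."""
    lat: float
    lon: float
    depth: float
    value: float


def centroid(obs: List[Observation]) -> Tuple[float, float]:
    """Scheme 1: a single representative (lat, lon) for the outer sequence.

    Assumes a plain arithmetic mean is good enough; this is not valid for
    stations that straddle the 180-degree meridian.
    """
    n = len(obs)
    return (sum(o.lat for o in obs) / n, sum(o.lon for o in obs) / n)


def bounding_box(obs: List[Observation]) -> Tuple[float, float, float, float]:
    """Scheme 2: (lat_min, lat_max, lon_min, lon_max) for the outer sequence."""
    lats = [o.lat for o in obs]
    lons = [o.lon for o in obs]
    return (min(lats), max(lats), min(lons), max(lons))


def boxes_overlap(a: Tuple[float, float, float, float],
                  b: Tuple[float, float, float, float]) -> bool:
    """True if two (lat_min, lat_max, lon_min, lon_max) boxes intersect."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


# Made-up station whose position drifts between observations.
station = [
    Observation(lat=10.0, lon=-150.0, depth=5.0, value=18.2),
    Observation(lat=10.5, lon=-150.5, depth=5.0, value=18.0),
]
```

With scheme 1 a client keeps doing point-in-region tests against centroid(station); with scheme 2 it has to do a box-against-box test such as boxes_overlap, which is the extra client complexity mentioned above.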
>
> 2.  Ragged arrays and either z or t in the inner sequence.  Netcdf-4 
> will have ragged arrays - though I haven't had a chance yet to see how 
> they handle the dimensioning for the ragged array.  Do we want 
> something that can handle that in one file?  Again staying with the 
> idea that we have a set of profiles at depth at a "station" with inexact 
> positions, and we would like to send all the data from that station 
> together.  When you have subsurface data, the biggest problem is that 
> the depths vary with  each profile.  So to combine the profiles, you 
> either have a depth dimension with all possible depths and a lot of 
> missing data, or else you do one "file" per profile. The latter, 
> combined with the either 't' or 'z' axis in the inner sequence tends 
> to make it so we can't readily do time series from the same station, 
> though that is an obvious thing to want to do  (to be more precise - 
> clearly one could do that if they know what to look for in our files 
> and that they have that structure - but there is nothing in the spec 
> per se that would make this a general solution). So do we want 
> something in the spec that describes ragged arrays?
This, I believe, is more of a Dapper implementation issue (e.g. how 
Dapper aggregates existing file formats) and file formatting issue than 
an issue with the Dapper OPeNDAP protocol, correct?
>
> 3. has_data attribute.  to use the spec effectively, particularly in a 
> time series sense, we have found that any parameter in the inner 
> sequence should always be there, but often it will not be observed while 
> other parameters were.  Rather than having to look at the data itself 
> to see if it is totally missing, do we want a "has_data" attribute 
> required?
Yes, this is an issue that we too have encountered with our Web 
interface to Dapper. The problem is that the DDS contains the union of 
all of the variables in the dataset, even if a given variable is only 
measured in a few time series or profiles. One example is the World 
Ocean Database 2005. A chlorophyll variable is contained in the DDS, but 
very few of the millions of profiles in the database actually contain 
measurements for this variable. So any user who is looking for 
chlorophyll measurements will have to sift through many profiles before 
finding a profile with valid data.

Adding has_data as an optional variable attribute sounds like a good 
idea to me.
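
As a rough sketch of what computing such an attribute could look like on the server side, assuming each profile is available as a mapping from variable name to a list of values: the MISSING fill value, the function names, and the sample profile below are invented for illustration and are not taken from Dapper or the spec.

```python
import math
from typing import Dict, List, Optional

# Assumed fill value for this example only; real datasets declare their own.
MISSING = -1.0e34


def has_data(values: List[Optional[float]], missing: float = MISSING) -> bool:
    """True if at least one value is neither None, NaN, nor the fill value."""
    return any(
        v is not None and not math.isnan(v) and v != missing
        for v in values
    )


def has_data_flags(profile: Dict[str, List[Optional[float]]]) -> Dict[str, bool]:
    """Per-variable has_data flags for one profile or time series."""
    return {name: has_data(values) for name, values in profile.items()}


# Made-up profile: temperature was observed, chlorophyll was not.
profile = {
    "temperature": [18.2, 17.9, MISSING],
    "chlorophyll": [MISSING, float("nan"), None],
}
flags = has_data_flags(profile)
```

A client looking for chlorophyll could then skip any profile whose chlorophyll flag is false without reading the values themselves.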
>
> I hope these comments are at least somewhat clear.  I am a little 
> fuzzy-headed normally and a bad cold hasn't helped.  May have more 
> comments but those are my initial ones.
Hope the cold is better ...
>
> BTW - for others on my staff that are not on the mail-list - is the 
> sequence of emails being archived somewhere that they can view them? 
> The discussion has been very interesting.
Looks like it's here:

http://www.unidata.ucar.edu/support/help/MailArchives/opendap-tech/maillist.html

>
> -Roy M.


