Cut and hung threads
James Gallagher
jhrg at mac.com
Thu Feb 15 08:51:22 PST 2007
On Feb 14, 2007, at 7:14 PM, Jennifer Adams wrote:
>
> On Feb 14, 2007, at 6:06 PM, James Gallagher wrote:
>
>>
>> On Feb 14, 2007, at 2:21 PM, Jennifer Adams wrote:
>>
>>> Dear Experts,
>>>
>>> I've got a couple of questions about the connection between a
>>> server (in this case a GDS) and a DODS client (in this case,
>>> GrADS linked with DODS core 3.4.8 and DODS netcdf 3.4.7).
>>>
>>> 1. When the client sends a request to the server, it can take
>>> some time for the data to come back -- especially if the server
>>> is busy helping other customers or if the request is a
>>> particularly large subset. During that period, while the client
>>> is waiting and data are streaming over the internet from the
>>> server, the server sometimes gets rebooted (an administrative
>>> necessity) and the 'thread' is cut. The problem is that the
>>> client doesn't seem to notice or care, and junk gets written to
>>> the local file. There's never any evidence of failure -- no bad
>>> return codes -- just funky data. And by funky, I don't mean
>>> outlandishly wrong, just something subtle like grid indices out
>>> of order. Very, very difficult to detect unless you're explicitly
>>> debugging for it. It's almost as if there's a 'retry' built into
>>> the library call, so if it fails it tries again and finds the
>>> server up so it continues where it left off but gets it wrong
>>> anyway.
>>
>> I'm not sure if you mean 'thread' in the programming sense or in
>> the sense of 'connection between the client and server.' If the
>> latter, let's call it a 'socket' to avoid confusion with other
>> meanings of thread.
> Right. We're talking about sockets.
>
>> If the socket is being dropped by the server, the client should be
>> notified by the OS using the PIPE signal. If the client doesn't
>> catch that signal, the OS will stop the client's process. So maybe
>> something is happening here because the GDS uses Servlets and
>> maybe those don't work quite this way. We might need a Java expert
>> for this one, although this behavior is something Dan Holloway has
>> reported, too, and using a C++/Perl server. The problem may also
>> be that the exact behavior of SIGPIPE varies between different
>> Unix versions...
>>
>> It would be great if we could isolate this so that it was a
>> repeatable problem. Maybe we could do that with GDS?
> I will set up a test bed with a GrADS client built from source with
> libnc-dap-3.7.0 on a Linux box in Peru (one of our regional
> forecasting prototypes where they're experiencing these problems)
> and the GDS at COLA. Stay tuned.
OK. The latest libdap (just released but not yet announced while I
work on the web pages) and libnc-dap 3.7.0 (the newest libnc-dap)
seem to work fine together. That is, they build on Linux and OS X
and pass the tests run by the nightly builds.
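
On the dropped-socket question quoted above, here is a minimal sketch
(in Python, not the C++ of libdap) of how a closed peer can look to a
client that is only reading. It uses a local socket pair as a stand-in
for a real GDS connection, and recv_exact is a hypothetical helper, not
a libdap or libnc-dap function. Under those assumptions the reader
typically sees end-of-file (or a reset error) rather than a signal, so
the client has to compare bytes received against bytes expected to
notice a truncated response.

```python
import socket


def recv_exact(sock, nbytes):
    """Read exactly nbytes, or raise if the peer closes early.

    Hypothetical helper for illustration; not part of any OPeNDAP library.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:  # empty bytes object: the peer closed the connection
            raise ConnectionError(
                f"peer closed after {nbytes - remaining} of {nbytes} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo_truncated_transfer():
    """Simulate a server that goes away partway through a response."""
    client, server = socket.socketpair()
    server.sendall(b"x" * 100)   # only 100 of the promised 400 bytes
    server.close()               # stand-in for the server being rebooted
    try:
        recv_exact(client, 400)
    except ConnectionError as err:
        return str(err)
    finally:
        client.close()
    return "no error"
```

A client that skips the length check would hand the 100 bytes it did
receive to the caller as if they were the full array.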
>
>
>>> 2. This is related to problem #1. The scenario is similar,
>>> although this time the client places a call to the server with
>>> ncvarget and the server responds by fulfilling the request and
>>> sending the data on their way back to the client. Then ...
>>> nothing. The ncvarget call hangs and the size of the local data
>>> file freezes and there's no recovery except to kill the GrADS
>>> process and start again. This problem is not readily
>>> reproducible, but it is happening often enough to undermine a
>>> regional desktop forecasting project that is dependent on OPeNDAP
>>> subsetting.
>>
>> Does this happen only with ncvarget() or with other functions too?
> It doesn't seem to happen with the metadata calls, just ncvarget.
Well, the metadata requests are pretty small, so it might be a
size/time/timing problem. I was wondering about other data requests.
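
If timing is involved, one way a client can turn a silent hang into a
visible error is a read timeout on the socket. The following is only a
sketch in Python under assumed names (recv_with_timeout and the 0.2
second limit are illustrative); it does not show how libnc-dap or
ncvarget actually read data.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes, raising TimeoutError if nothing arrives in time.

    Hypothetical helper for illustration only.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        raise TimeoutError(f"no data within {timeout_s} s")


def demo_stalled_server():
    """Simulate a server that accepts the request and then sends nothing."""
    client, server = socket.socketpair()
    try:
        recv_with_timeout(client, 1024, 0.2)
    except TimeoutError:
        return "timed out"
    finally:
        client.close()
        server.close()
    return "got data"
```

With a timeout in place the caller gets an error it can report or
retry on, instead of a process that has to be killed by hand.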
>
>>> Has anyone else experienced this behavior with these or other
>>> clients/servers? Given that this is all happening inside a call
>>> to ncvarget from GrADS, I'm wondering whether the problem is with
>>> the DODS netcdf library instead of GrADS. I could probably write
>>> a stand-alone program with the ncvarget calls and factor out
>>> GrADS. Maybe it's known behavior that might easily be solved by
>>> upgrading to a later version of the OPeNDAP netcdf library? Maybe
>>> the server is failing to do something the client library needs?
>>> Any other suggestions?
>>
>> It would be great to get the GDS and GrADS working with the newer
>> code. It's _much_ newer at this point. In particular, libnc-dap is
>> really different on the inside than the version you're using. How
>> can I help with this? Even if it does not clear up the problem, we
>> should do this as a first step.
> OK, thanks James. I will start working with libnc-dap-3.7 and let
> you know if I have any problems building. If I can readily
> duplicate the problems with the new build, I'll be back.
OK. Let me know either way ...
James
>
> Jennifer
>
>
>
>
--
James Gallagher jgallagher at opendap.org
OPeNDAP, Inc 406.723.8663