Cut and hung threads
Jennifer Adams
jma at cola.iges.org
Wed Feb 14 13:21:31 PST 2007
Dear Experts,
I've got a couple of questions about the connection between a server
(in this case a GDS) and a DODS client (in this case, GrADS linked
with DODS core 3.4.8 and DODS netcdf 3.4.7).
1. When the client sends a request to the server, it can take some
time for the data to come back -- especially if the server is busy
helping other customers or if the request is a particularly large
subset. During that period, while the client is waiting and data are
streaming over the internet from the server, the server sometimes
gets rebooted (an administrative necessity) and the 'thread' is cut.
The problem is that the client doesn't seem to notice or care, and
junk gets written to the local file. There's never any evidence of
failure -- no bad return codes -- just funky data. And by funky, I
don't mean outlandishly wrong, just something subtle like grid
indices out of order. Very, very difficult to detect unless you're
explicitly debugging for it. It's almost as if there's a 'retry'
built into the library call: when the transfer fails, it tries again,
finds the server back up, continues where it left off, and gets it
wrong anyway.
2. This is related to problem #1. The scenario is similar, although
this time the client places a call to the server with ncvarget and
the server responds by fulfilling the request and sending the data on
their way back to the client. Then ... nothing. The ncvarget call
hangs, the local data file stops growing, and there's no recovery
except to kill the GrADS process and start again. This
problem is not readily reproducible, but it is happening often enough
to undermine a regional desktop forecasting project that is dependent
on OPeNDAP subsetting.
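For comparison, the behavior one would want from the client in both cases is to bound every read with a timeout and to treat a connection that closes before the full hyperslab has arrived as an error, not as data. The sketch below is a minimal illustration in Python, assuming the caller can see the raw socket and knows the request's count vector and element size. It is not how the DODS netcdf library or GrADS is implemented, and the names `expected_bytes` and `recv_exact` and the 60-second default are hypothetical. It catches the short transfer of problem 1 and the stall of problem 2, but not values that arrive complete and in the wrong order.

```python
import socket
from math import prod


def expected_bytes(count, itemsize):
    """Hypothetical helper: size in bytes of a hyperslab request.

    `count` plays the role of the count vector passed to ncvarget;
    `itemsize` is the size of one value (e.g. 4 for a float).
    """
    if any(c < 0 for c in count):
        raise ValueError("negative count in hyperslab request")
    return prod(count) * itemsize


def recv_exact(sock, nbytes, timeout=60.0):
    """Hypothetical helper: read exactly nbytes or raise, never hang forever.

    The timeout applies to each recv() call: a stalled stream raises
    socket.timeout, but a slow stream that keeps trickling does not.
    """
    sock.settimeout(timeout)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            # Peer closed early (e.g. the server was rebooted mid-transfer):
            # report it instead of handing back a partial buffer.
            raise ConnectionError(
                f"connection closed with {remaining} of {nbytes} bytes unread"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Used as `recv_exact(sock, expected_bytes((1, 73, 144), 4))`, a cut connection surfaces as `ConnectionError` and a frozen one as `socket.timeout`, so the caller gets a bad return instead of a silently wrong or never-finishing file.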
Has anyone else experienced this behavior with these or other clients
or servers? Given that this is all happening inside a call to ncvarget
from GrADS, I'm wondering whether the problem is with the DODS netcdf
library instead of GrADS. I could probably write a stand-alone
program with the ncvarget calls and factor out GrADS. Maybe this is a
known problem that could easily be solved by upgrading to a later
version of the OPeNDAP netcdf library? Maybe the server is failing to
do something the client library needs? Any other suggestions?
Thanks in advance for your help,
Jennifer
--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma at cola.iges.org
More information about the Opendap-tech
mailing list