Cut and hung threads

Jennifer Adams jma at cola.iges.org
Wed Feb 14 13:21:31 PST 2007


Dear Experts,

I've got a couple of questions about the connection between a server  
(in this case a GDS) and a DODS client (in this case, GrADS linked  
with DODS core 3.4.8 and DODS netcdf 3.4.7).

1. When the client sends a request to the server, it can take some  
time for the data to come back -- especially if the server is busy  
helping other customers or if the request is a particularly large  
subset. During that period, while the client is waiting and data are  
streaming over the internet from the server, the server sometimes  
gets rebooted (an administrative necessity) and the 'thread' is cut.  
The problem is that the client doesn't seem to notice or care, and  
junk gets written to the local file. There's never any evidence of  
failure -- no bad return codes -- just funky data. And by funky, I  
don't mean outlandishly wrong, just something subtle like grid  
indices out of order. Very, very difficult to detect unless you're  
explicitly debugging for it. It's almost as if there's a 'retry'  
built into the library call, so if it fails it tries again and finds  
the server up so it continues where it left off but gets it wrong  
anyway.
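
Since the failure described above leaves no bad return code, one possible client-side guard is to fetch the same subset more than once and accept it only when two consecutive fetches agree byte for byte. The following is a minimal sketch of that idea, not a fix for the library. The `fetch` callable and the name `fetch_verified` are hypothetical stand-ins for whatever code performs the actual request, and the approach assumes the corruption is transient, so a repeated request would not reproduce it identically.

```python
import hashlib
from typing import Callable, Optional


def fetch_verified(fetch: Callable[[], bytes], attempts: int = 3) -> bytes:
    """Call fetch() until two consecutive results have the same SHA-256 digest.

    `fetch` is a hypothetical callable that returns the raw bytes of one
    subset request. At most `attempts` calls are made in total.
    """
    previous: Optional[str] = None
    for _ in range(attempts):
        data = fetch()
        digest = hashlib.sha256(data).hexdigest()
        if digest == previous:
            # Two back-to-back fetches returned identical bytes.
            return data
        previous = digest
    raise RuntimeError("no two consecutive fetches agreed")
```

The cost is at least double the transfer, so this is probably only worth doing for requests that are small or that feed something critical.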

2. This is related to problem #1. The scenario is similar, although  
this time the client places a call to the server with ncvarget and  
the server responds by fulfilling the request and sending the data on  
their way back to the client. Then ... nothing. The ncvarget call  
hangs and the size of the local data file freezes and there's no  
recovery except to kill the GrADS process and start again. This  
problem is not readily reproducible, but it is happening often enough  
to undermine a regional desktop forecasting project that is dependent  
on OPeNDAP subsetting.
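
Until the cause of the hang is found, one workaround is to run each download as a separate process under a watchdog, so that a stuck call is killed automatically instead of by hand. The sketch below uses only the Python standard library. The name `run_with_timeout` is hypothetical, and `cmd` is assumed to be whatever command line already launches the batch job. It does not diagnose the hang, it only bounds how long one can last.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd; return True if it exits with status 0 within timeout_s seconds.

    On timeout, subprocess.run kills the child process before raising
    TimeoutExpired, so a hung download does not linger.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Demonstration with a trivial child process that exits immediately.
    print(run_with_timeout([sys.executable, "-c", "pass"], 30))
```

A caller could retry the job a bounded number of times when this returns False, discarding any partially written local file first.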

Has anyone else experienced this behavior with these or other
clients/servers? Given that this is all happening inside a call to ncvarget
from GrADS, I'm wondering whether the problem is with the DODS netcdf  
library instead of GrADS. I could probably write a stand-alone  
program with the ncvarget calls and factor out GrADS. Maybe it's  
known behavior that might easily be solved by upgrading to a later  
version of the OPeNDAP netcdf library? Maybe the server is failing to  
do something the client library needs? Any other suggestions?

Thanks in advance for your help,
Jennifer

--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma at cola.iges.org


