Cut and hung threads

James Gallagher jhrg at mac.com
Wed Feb 14 15:06:42 PST 2007


On Feb 14, 2007, at 2:21 PM, Jennifer Adams wrote:

> Dear Experts,
>
> I've got a couple of questions about the connection between a  
> server (in this case a GDS) and a DODS client (in this case, GrADS  
> linked with DODS core 3.4.8 and DODS netcdf 3.4.7).
>
> 1. When the client sends a request to the server, it can take some  
> time for the data to come back -- especially if the server is busy  
> helping other customers or if the request is a particularly large  
> subset. During that period, while the client is waiting and data  
> are streaming over the internet from the server, the server  
> sometimes gets rebooted (an administrative necessity) and the  
> 'thread' is cut. The problem is that the client doesn't seem to  
> notice or care, and junk gets written to the local file. There's  
> never any evidence of failure -- no bad return codes -- just funky  
> data. And by funky, I don't mean outlandishly wrong, just something  
> subtle like grid indices out of order. Very, very difficult to  
> detect unless you're explicitly debugging for it. It's almost as if  
> there's a 'retry' built into the library call, so if it fails it  
> tries again and finds the server up so it continues where it left  
> off but gets it wrong anyway.

I'm not sure if you mean 'thread' in the programming sense or in the
sense of 'connection between the client and server.' If the latter,
let's call it a 'socket' to avoid confusion with other meanings of
thread. If the socket is being dropped by the server, the client
should be notified by the OS using the SIGPIPE signal. If the client
doesn't catch that signal, the OS will stop the client's process. So
maybe something is happening here because the GDS uses Servlets and
maybe those don't work quite this way. We might need a Java expert
for this one, although this behavior is something Dan Holloway has
reported too, and he was using a C++/Perl server. The problem may
also be that the exact behavior of SIGPIPE varies between different
Unix versions...
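
One way to check what a client actually sees when the server goes away
mid-transfer is a small socket test. The sketch below is only an
illustration in Python, not code from the DODS/OPeNDAP libraries; the
names read_exact, TruncatedResponse and demo are made up for the
example, and it assumes the client knows how many bytes to expect. It
relies on one general fact about sockets: SIGPIPE is raised on a
write to a closed connection, while a process that is only reading
sees an end-of-file (a zero-length read) and has to compare the byte
count with what it expected.

```python
import socket


class TruncatedResponse(Exception):
    """Raised when the peer closes before the expected byte count arrives."""


def read_exact(sock, nbytes):
    """Read exactly nbytes from sock, or raise TruncatedResponse on early EOF."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            # No signal is delivered to a reader; this check is the only notice.
            raise TruncatedResponse(
                f"expected {nbytes} bytes, got {nbytes - remaining}"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    # A connected socket pair stands in for the server/client connection.
    server, client = socket.socketpair()
    server.sendall(b"x" * 100)  # server sends only part of a 1000-byte reply
    server.close()              # ... and then goes away, as in a reboot
    try:
        read_exact(client, 1000)
        return "complete"
    except TruncatedResponse as exc:
        return str(exc)
    finally:
        client.close()


if __name__ == "__main__":
    print(demo())
```

A client that skips the length check would simply hand the short
buffer on to the caller, which would match the 'no error, funky data'
symptom, though that is a guess about the cause and not a diagnosis.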

It would be great if we could isolate this so that it was a  
repeatable problem. Maybe we could do that with GDS?

>
> 2. This is related to problem #1. The scenario is similar, although  
> this time the client places a call to the server with ncvarget and  
> the server responds by fulfilling the request and sending the data  
> on their way back to the client. Then ... nothing. The ncvarget  
> call hangs and the size of the local data file freezes and there's  
> no recovery except to kill the GrADS process and start again. This  
> problem is not readily reproducible, but it is happening often  
> enough to undermine a regional desktop forecasting project that is  
> dependent on OPeNDAP subsetting.

Does this happen only with ncvarget() or with other functions too?
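
For the hang itself, a read timeout turns 'waits forever' into an
error the caller can act on. The following is a generic socket-level
sketch in Python, not a description of how the netCDF client library
reads its data; recv_with_timeout and demo_timeout are hypothetical
names, and whether the library exposes any comparable timeout setting
is not something this example assumes.

```python
import socket


def recv_with_timeout(sock, nbytes, seconds):
    """Return up to nbytes from sock, or None if nothing arrives in time."""
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer is still connected but silent: report it instead of hanging.
        return None


def demo_timeout():
    # The "server" end stays open but never sends anything.
    server, client = socket.socketpair()
    try:
        return recv_with_timeout(client, 1024, 0.2)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo_timeout())
```

Without the timeout, the same recv() call would block until the
process was killed, which is the behavior described above.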

>
> Has anyone else experienced this behavior with these or other  
> clients/servers? Given that this is all happening inside a call to  
> ncvarget from GrADS, I'm wondering whether the problem is with the  
> DODS netcdf library instead of GrADS. I could probably write a  
> stand-alone program with the ncvarget calls and factor out GrADS.  
> Maybe it's known behavior that might easily be solved by upgrading  
> to a later version of the OPeNDAP netcdf library? Maybe the server  
> is failing to do something the client library needs? Any other  
> suggestions?

It would be great to get the GDS and GrADS working with the newer
code. It's _much_ newer at this point. In particular, libnc-dap is
really different on the inside from the version you're using. How can
I help with this? Even if it does not clear up the problem, we should
do this as a first step.

James
>
> Thanks in advance for your help,
> Jennifer
>
> --
> Jennifer M. Adams
> IGES/COLA
> 4041 Powder Mill Road, Suite 302
> Calverton, MD 20705
> jma at cola.iges.org
>
>
>

--
James Gallagher                jgallagher at opendap.org
OPeNDAP, Inc                   406.723.8663
