[Opendap-tech] SWAMP 0.1 beta client release

Daniel Wang wangd at uci.edu
Fri Dec 7 11:55:52 PST 2007


SWAMP: A System for Server-side Geoscience Data Analysis
             http://code.google.com/p/swamp

We are pleased to announce the beta release of the SWAMP system.
SWAMP augments data servers with a data analysis service.
Current data access services focus on providing managed and
highly-available access to data. Many have realized the value of
providing computational services as well. This combined
data+computation approach is natural in the geosciences, where
datasets are often too large and bulky to ship to standard
computational grids.

SWAMP answers this need by providing a way to specify a rich set of
analysis operations to a server and merely download the results.
It leverages scientists' knowledge of shell scripts.
SWAMP can process almost any POSIX-compliant data analysis script
whose commands it "understands" (i.e., can parse). SWAMP currently
understands the full suite of NCO (netCDF Operator) commands and
it can be made to understand any other command-line operator (CLO).
Using SWAMP is easy once it understands your CLO syntax.
In most cases a slight modification makes the same data analysis
script run remotely via a SWAMP service (i.e., the server with the
data does the analysis) or locally on the scientist's own machine
(aka the traditional method).

Request for testers:

We are currently looking for both scientists (clients) and data center
administrators (servers) to test SWAMP. We have a server you can try.
Our server contains a small subset of IPCC AR4 climate simulation data
from ~17 different climate models. You can use our demo script
or write your own to test remote analysis of these large datasets.
Our demo computes a time series of the predicted ensemble-average 21st
century temperature change.
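
To make "ensemble-average temperature change" concrete, here is a
minimal Python sketch of the arithmetic. It is for explanation only:
the demo itself does this with NCO operators on the server, the
function name is hypothetical, and the numbers are made-up toy values,
not IPCC data.

```python
def ensemble_mean_change(series_by_model):
    """Average equal-length time series across models, then subtract the first value."""
    runs = list(series_by_model.values())
    n_models = len(runs)
    # Ensemble average: mean across models at each time step.
    mean = [sum(step) / n_models for step in zip(*runs)]
    # Change relative to the first time step.
    return [value - mean[0] for value in mean]


# Toy temperatures for two hypothetical models, three time steps each.
toy = {
    "model_a": [14.0, 14.5, 15.0],
    "model_b": [13.0, 14.0, 15.0],
}
change = ensemble_mean_change(toy)
print(change)  # [0.0, 0.75, 1.5]
```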

Everything needed to test our SWAMP service is provided in the
swamp-client package. Installing the client and running the IPCC test  
on our server takes only five commands:

wget http://swamp.googlecode.com/files/swamp-client-0.1.tar.gz
tar xzvf swamp-client-0.1.tar.gz
cd swamp-client-0.1
export SWAMPURL='http://pbs.ess.uci.edu:8080/SOAP'
python swamp_client.py ipcctest.swamp

Administration, while relatively straightforward, may require our
help initially. Download the swamp-server package and let us know if
we can be of assistance. We are actively looking for large test sites.

Announcements:

This beta announcement is being sent to several lists (netcdf, opendap,
nco, swamp, esg) that potential SWAMP users may read.
Future announcements will be restricted to the swamp and nco lists.
Sign up for one of these to stay apprised of SWAMP development:

For releases and other major announcements:
http://groups.google.com/group/swamp-announce

For discussion, help, bug reports, comments, and test server status:
http://groups.google.com/group/swamp-users


Interesting Features:

- Supports most common shell syntax, including for-loops, if-branching, 
and variables.

- Detects dependencies and output files in your script -- no need to 
specify them manually.

- Detects intermediate files in your workflow -- no wasted time or space 
transferring or storing them.

- Exploits parallelism in systems with multiple cores, multiple CPUs, 
and compute clusters.

- Supports NCO-based data processing and reduction.

- Saves bandwidth: Transfers only output data, which is typically a few 
times to tens of thousands of times smaller than the input data.

- Simple logging: know what sorts of analyses your users are interested in.

- Overall time speedup ranges from 1X to 1000X.  In rare cases,
  SWAMP may slow things down, by at most 10%.
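
As a rough illustration of the dependency, intermediate-file, and
parallelism features listed above, here is a minimal Python sketch.
It is not SWAMP's actual implementation. It assumes every command has
the form `cmd [flags] input... output`, that flags start with '-' and
take no separate value, and that each file is written only once. The
function name and the .nc file names are hypothetical.

```python
import shlex


def analyze(script_lines):
    """Classify files and group commands into stages that can run concurrently."""
    producer_stage = {}  # file -> stage of the command that writes it
    consumed = set()     # files read by at least one command
    stages = []          # stages[i] = commands with no dependencies on each other
    for line in script_lines:
        tokens = shlex.split(line, comments=True)
        # Drop the command name and flag tokens; the rest are file names.
        files = [t for t in tokens[1:] if not t.startswith("-")]
        if len(files) < 2:
            continue  # blank line, comment, or nothing to classify
        *inputs, output = files
        # A command must wait for every command that writes one of its inputs.
        stage = 1 + max(producer_stage.get(f, -1) for f in inputs)
        while len(stages) <= stage:
            stages.append([])
        stages[stage].append(line)
        producer_stage[output] = stage
        consumed.update(inputs)
    produced = set(producer_stage)
    return {
        "inputs": sorted(consumed - produced),         # must already exist on the server
        "intermediates": sorted(consumed & produced),  # never need to be transferred
        "outputs": sorted(produced - consumed),        # what the client downloads
        "stages": stages,
    }


script = [
    "ncra -O jan.nc feb.nc mar.nc q1.nc",
    "ncra -O apr.nc may.nc jun.nc q2.nc",
    "ncdiff -O q2.nc q1.nc change.nc",
]
result = analyze(script)
print(result["intermediates"])  # ['q1.nc', 'q2.nc']
print(result["outputs"])        # ['change.nc']
print(len(result["stages"]))    # 2
```

In this toy script the two ncra commands share no files, so they land
in the same stage and could run in parallel; only change.nc would be
sent back to the client.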

Coming Soon:

- Integration with Grid Engine: Dynamic, on-demand allocation of compute 
nodes in response to changing computational load.

- Better performance for complex scripts by coarser-grained work 
distribution.

- Support for workflows operating on data at multiple sites.

- Support for "standalone" mode: Take advantage of SWAMP's parallelism 
and optimization on a single workstation.


Known Issues:

- Not all shell syntax is supported.  SWAMP implements the most
  commonly used syntax, but every user has her own style.  Let us know
  if you think something's missing.

- Log files are messy.  SWAMP is under constant development, and a
  little mess and sawdust is inevitable.  Let us know if there's
  information you'd like logged or if you have specific ideas on how
  to reduce the clutter.

- Only supports NCO binaries and a few common shell programming
  helpers (e.g. printf, seq).  While this already provides a rich set
  of data reduction and analysis functionality, we understand that
  other tools are desired.  Let us know which ones you use.  Please
  keep in mind that SWAMP is focused on programs that work with large
  data sets, and that, for security reasons, not all programs/binaries
  should be supported.

- Beta release roughness.  

Learning more:

Visit the homepage to learn more about SWAMP including how it works:

http://code.google.com/p/swamp

Learn more at the 2007 AGU Fall Meeting: visit our SWAMP poster, 
IN11B-0469, on Monday.

