Welcome to our round table! Each participant writes one blog post about his or her experiences with distributing scientific software. You are invited to post. More information here.

2011-10-02

Matt T

My name is Matt Turk, and I'm a computational astrophysicist working on structure formation in the early universe.  I am the original author of the yt code and a developer on the Enzo project.  I'm at Columbia University, on an NSF postdoctoral fellowship to drive forward my studies of the First Stars while developing infrastructure for these simulations, targeting simulations from laptop- to peta-scale.  yt is a python project, while Enzo is largely C/C++/Fortran.  yt is designed to target the output of multiple different simulation codes, and has a growing user- and developer-base.

My primary interest with respect to software is ensuring that communities of users can easily deploy appropriate analysis packages on multiple systems.  While the majority of yt's users utilize XSEDE resources, many also work on laptops and local computing clusters.


yt started out very, very difficult to install.  The software stack was quite large and installation was not automated.  For the most part, we have addressed this in two ways.  First, the dependency stack has been whittled away substantially; we are extremely conservative about adding new dependencies to yt, and the core dependencies for most simulation input types are simply numpy, hdf5, and Python itself.  Second, we provide a hand-written installer script, which handles installation of the following dependencies into an isolated directory structure:

  • zlib
  • bzlib
  • libpng
  • freetype (optional)
  • sqlite (optional)
  • Python
  • numpy
  • matplotlib (optional)
  • ipython (optional)
  • hdf5
  • h5py (optional)
  • Cython (optional)
  • Forthon (optional)
  • mercurial
  • yt
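The isolated-prefix idea behind such an installer script can be sketched roughly as follows.  This is illustrative only, not yt's actual install script: the package names, paths, and helper functions are hypothetical, but the pattern (configure each dependency with `--prefix` pointing at one private directory, then point the environment at that directory) is the standard one.

```python
import os

def configure_commands(tarball, prefix):
    """Sketch the autoconf-style steps an installer script runs for one
    dependency; names here are illustrative, not yt's actual script."""
    src = tarball.replace(".tar.gz", "")
    return [
        "tar xzf %s" % tarball,
        "cd %s && ./configure --prefix=%s" % (src, prefix),
        "cd %s && make install" % src,
    ]

def isolated_env(prefix):
    """Environment overrides pointing the shell and Python at the
    isolated stack, so nothing outside `prefix` is picked up."""
    return {
        "PATH": os.path.join(prefix, "bin") + os.pathsep
                + os.environ.get("PATH", ""),
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib"),
        "PYTHONPATH": os.path.join(prefix, "lib", "python2.7",
                                   "site-packages"),
    }

# A hypothetical prefix in the user's home directory; no root required.
prefix = "/home/me/yt-dest"
cmds = configure_commands("zlib-1.2.5.tar.gz", prefix)
print(cmds[1])
```

Because everything lands under one user-owned prefix, the whole stack can be built on systems without root / sudo and removed with a single `rm -rf`.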
This seems like a large stack, but the trickiest libraries are usually matplotlib and numpy.  We have also reached out to XSEDE, and modules are now available on several HPC installations.  The install script takes care of the rest.  We are currently attempting to make yt available as a component of both ParaView's superbuild and VisIt's build_visit script, both of which also handle dependency stacks.  I'm extremely concerned with ensuring that yt's installation works everywhere, especially on systems where root / sudo is not available.

Easily the hardest problem, and the one that I hope we can solve in some way, is that of static builds.  Building a statically linked stack (for use, for instance, on Compute Node Linux on some Cray systems) is difficult; starting from the GPAW instructions, we at one time attempted to maintain static builds of yt, but the inclusion of C++ components (and the lack of C++ ABI interoperability) became too much of a burden and we no longer do so.  Now we are faced with the issue of needing one because file systems typically cannot keep up with every MPI task importing a Python stack (which becomes burdensome at as few as 256 processes, and essentially impossible above a couple thousand).  While egg imports and zipped file systems alleviate this problem for pure-Python libraries, they do not work for shared libraries.  Neither I nor my fellow developers have found a simple way to generate static builds that are easy to update, but this is a primary concern for me.
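The reason zip-style imports help only pure Python is that the interpreter can execute .py/.pyc files straight out of an archive on sys.path, whereas a compiled extension module must exist as a real file for the OS dynamic loader.  A minimal demonstration of the zip-import path (module name and contents are made up for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Pack a pure-python module into a single archive; at scale, one archive
# read replaces thousands of small-file opens and stats per import.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "pystack.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mymod.py", "ANSWER = 42\n")

# Python's built-in zipimport hook treats the archive as a path entry.
sys.path.insert(0, archive)
import mymod
print(mymod.ANSWER)  # prints 42
```

Had `mymod` been a shared-library extension (a .so), this import would fail: zipimport cannot hand a memory-mapped archive member to the dynamic loader, which is exactly why compiled components push us back toward static builds.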

I don't have a particular takeaway or call to action; we have lately simply come to terms with the time it takes to load shared libraries, and we'll probably have another go at a unified static builder at some point in the future.  But for now, our install script works reasonably well, and we will probably continue using it while still reaching out to system administrators for assistance building on individual supercomputer installations.

1 comment:

  1. "Now we are faced with the issue of needing one because file systems typically cannot keep up with importing a python stack from every MPI task (which becomes burdensome even at as few as 256 processes, and essentially impossible above a couple thousand). While egg imports and zipped file systems alleviate this problem for pure-python libraries, this will not work for shared libraries. Neither I nor my fellow developers have found a simple and easy way to generate static builds that are easily updated, but this is a primary concern for me."

    Jed Brown (ANL) and I (KAUST) are looking at a solution to this that involves intercepting file system access from the dynamic loader and putting in our own collective file access over the interconnect fabric. Will you be at SC? I'd be happy to discuss this with you there.
