There are several ways to program efficiently with IDL. The most common is to rely on the IDL Thread Pool, which is used by a number of routines and by operations on large arrays. But sometimes matrix or large-vector operations are not possible, or not sufficient, and we would like to split a simple FOR loop across several parallel IDL sessions.


This is the purpose of IDL_IDLBridge, which allows you to create sub-instances of IDL and control them from within the original IDL instance. A nice tutorial can be found here, and Robert da Silva wrote the routine split_for, which lets you split simple loops without dealing with the specific syntax of IDL_IDLBridge. Unfortunately, IDL_IDLBridge only creates new IDL processes on the same computer, as child processes, and sometimes this is not enough.
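
For reference, the basic IDL_IDLBridge workflow looks roughly like the sketch below: create a child session, copy a variable into it, run a command asynchronously, then fetch the result. The variable names (bridge, n, total_n) are purely illustrative.

```idl
;; Minimal sketch: run one statement in a child IDL session
bridge = OBJ_NEW('IDL_IDLBridge')

;; Copy a variable into the child session
bridge->SetVar, 'n', 1000L

;; Execute a command in the child; /NOWAIT returns immediately
bridge->Execute, 'total_n = TOTAL(LINDGEN(n), /DOUBLE)', /NOWAIT

;; Poll while the child is still executing (Status() EQ 1 means busy)
WHILE bridge->Status() EQ 1 DO WAIT, 0.1

;; Copy the result back and clean up
PRINT, bridge->GetVar('total_n')
OBJ_DESTROY, bridge
```

Without /NOWAIT, Execute blocks until the child is done, which defeats the purpose as soon as more than one child session is involved.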

When dealing with a cluster of computers, the most convenient way to use several nodes is through MPI. At the expense of losing the real purpose of MPI, i.e. passing messages, it is possible to launch several instances of IDL in parallel, on one or several nodes of a cluster. It is thus possible to split a long and complex FOR loop into several processes, potentially on several nodes.


It is possible to launch several instances of IDL using MPI:

mpirun -n $nMPI idl

will launch several independent IDL processes, which will all run the same program. This would produce the exact same output several times. However, at least with the Open MPI implementation, it is possible to retrieve the MPI_COMM_WORLD_RANK and MPI_COMM_WORLD_SIZE values from the environment (Open MPI exports them as OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE) and thus split any big FOR loop. The results will then depend on the rank and can be saved using the rank as a unique identifier.
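
As a rough sketch of that idea, the rank and size can be read with GETENV. This assumes Open MPI, which exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to every process it launches; other MPI implementations use different variable names. The names rank and n_ranks are illustrative.

```idl
;; Sketch, assuming Open MPI: read the rank and size exported by mpirun
rank_str = GETENV('OMPI_COMM_WORLD_RANK')
size_str = GETENV('OMPI_COMM_WORLD_SIZE')

;; GETENV returns an empty string when the variable is not set,
;; so fall back to a single-process run outside of mpirun
rank = (rank_str EQ '') ? 0L : LONG(rank_str)
n_ranks = (size_str EQ '') ? 1L : LONG(size_str)

PRINT, rank, n_ranks, FORMAT='("rank ", I0, " of ", I0)'
```

The fallback keeps the same program usable in an ordinary interactive session, where it simply behaves as rank 0 of 1.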

I wrote a few simple scripts to help parallelize a long FOR loop. Note, however, that combining the results of the FOR loop requires disk I/O, so each iteration should be rather CPU-intensive for the approach to be efficient. The following example computes the sum of all indexes between two values.

;; Let's compute the sum of all indexes between start and stop

;; Retrieve the MPI rank set by mpirun
mpi_rank, rank

start = 0
stop = 11
sum = 0

;; Divide the task among the different MPI ranks
mpi_helper, start, stop, i_start, i_stop

;; This is the main loop, should be an intensive task....
FOR index=i_start, i_stop-1 DO BEGIN $
   PRINT, rank, index, FORMAT='("rank: ", I2, " doing ", I2)' &$
   WAIT, start &$
   sum += index &$
ENDFOR

;; Save the result in a unique output file
SAVE, FILENAME=STRING(rank, FORMAT='("poc_",I2.2,".sav")'), sum

;; Do not forget to exit at the end
EXIT
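
The splitting step itself takes little code. The sketch below is not the mpi_helper used in the example but a hypothetical stand-in (split_range_sketch), to be saved in its own .pro file. It assumes Open MPI's environment variables and a half-open range [range_start, range_stop), so that the loop runs from i_start to i_stop-1.

```idl
;; Hypothetical helper, NOT the author's mpi_helper: split the half-open
;; range [range_start, range_stop) into contiguous chunks, one per MPI rank.
PRO split_range_sketch, range_start, range_stop, i_start, i_stop
  COMPILE_OPT IDL2

  ;; Rank and size as exported by Open MPI; default to a serial run
  rank_str = GETENV('OMPI_COMM_WORLD_RANK')
  size_str = GETENV('OMPI_COMM_WORLD_SIZE')
  rank = (rank_str EQ '') ? 0L : LONG(rank_str)
  n_ranks = (size_str EQ '') ? 1L : LONG(size_str)

  n_iter = LONG(range_stop) - LONG(range_start)
  base = n_iter / n_ranks        ; iterations that every rank gets
  extra = n_iter MOD n_ranks     ; the first "extra" ranks get one more

  ;; "<" is IDL's minimum operator: ranks below "extra" shift by their rank
  i_start = LONG(range_start) + rank * base + (rank < extra)
  i_stop = i_start + base + (rank LT extra)
END
```

With 11 iterations and 5 ranks, this gives the first rank 3 iterations and the other four 2 each.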

Launching 5 instances of the summation program with

mpirun -n 5 idl

will result in 5 files

poc_00.sav  poc_01.sav  poc_02.sav  poc_03.sav  poc_04.sav

which can be combined using

sum_combined = 0
files = FILE_SEARCH('poc_*.sav')
FOR iFile=0, N_ELEMENTS(files)-1 DO BEGIN $
   RESTORE, files[iFile] &$
   sum_combined += sum &$
ENDFOR

PRINT, sum_combined


When using a cluster of computers, it is thus possible to allocate several nodes and launch one instance of IDL per node, each of which can efficiently use all the cores of its node through IDL_IDLBridge. For example, with the Slurm scheduler, one could set up a batch script such as


#!/bin/bash
#SBATCH -n 64
#SBATCH --exclusive
#SBATCH --job-name "MPI IDL"

mpirun -pernode idl

which, on nodes with 32 cores each, will launch two IDL processes on two different nodes, for a total of 64 cores.

Except that the IDL_IDLBridge documentation states that:

Note: On UNIX systems, the IDL_IDLBridge requires that the DISPLAY environment variable be set to a valid X Windows display. If no display is available, execution of the IDL_IDLBridge will halt, with error messages that include:

'IDL_IDLBridge: Unable to establish X Connection.'

which makes sbatch very impractical, as one needs to be sure that X11 is forwarded on all the allocated nodes of the cluster, which are not known in advance. Thanks again, IDL... Fortunately, Xvfb provides a virtual X11 environment, and thus

srun xvfb-run idl

will be able to run.
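
Inside each per-node IDL instance, the share of the loop assigned to that rank can in turn be spread over the cores with IDL_IDLBridge. The following is a minimal sketch for the summation example, written as a procedure to be saved in a .pro file. The names node_sum_sketch and n_workers, and the child variables lo, hi and part, are hypothetical, and error handling is left out.

```idl
;; Hypothetical sketch: sum the indexes in [i_start, i_stop) using
;; n_workers child IDL sessions on the current node.
PRO node_sum_sketch, i_start, i_stop, n_workers, total_sum
  COMPILE_OPT IDL2

  ;; Boundaries of n_workers contiguous sub-ranges
  n_iter = LONG(i_stop) - LONG(i_start)
  edges = LONG(i_start) + n_iter * LINDGEN(n_workers + 1) / n_workers

  ;; Start one child per sub-range; /NOWAIT lets them run concurrently
  bridges = OBJARR(n_workers)
  FOR w = 0, n_workers - 1 DO BEGIN
    bridges[w] = OBJ_NEW('IDL_IDLBridge')
    bridges[w]->SetVar, 'lo', edges[w]
    bridges[w]->SetVar, 'hi', edges[w + 1]
    bridges[w]->SetVar, 'part', 0L
    bridges[w]->Execute, 'FOR i=lo, hi-1 DO part += i', /NOWAIT
  ENDFOR

  ;; Wait for each child (Status() EQ 1 means still executing), then gather
  total_sum = 0L
  FOR w = 0, n_workers - 1 DO BEGIN
    WHILE bridges[w]->Status() EQ 1 DO WAIT, 0.05
    total_sum += bridges[w]->GetVar('part')
    OBJ_DESTROY, bridges[w]
  ENDFOR
END
```

Each OBJ_NEW('IDL_IDLBridge') call is subject to the DISPLAY requirement quoted above, which is why the IDL instance itself has to be started under xvfb-run.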