There are several ways to program efficiently in IDL. The most common is to rely on the IDL Thread Pool, which is used by several built-in routines operating on large arrays. But sometimes matrix or large-vector operations are not possible, or not sufficient, and we would like to break a simple FOR loop into several parallel IDL sessions.
This is the purpose of IDL_IDLBridge, which allows creating sub-instances of IDL and controlling them from the original IDL session. A nice tutorial can be found here, and Robert da Silva wrote the routine split_for, which allows splitting simple loops without dealing with the specific syntax of IDL_IDLBridge. Unfortunately, IDL_IDLBridge only creates new IDL processes on the same computer, as child processes, and sometimes this is not enough.
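For reference, the basic IDL_IDLBridge workflow looks like the following sketch (Execute, SetVar, GetVar and Status are part of the documented class API; the variable names are just illustrative):

```idl
;; Minimal IDL_IDLBridge sketch: run one statement in a child IDL process
bridge = OBJ_NEW('IDL_IDLBridge')
bridge->SetVar, 'x', FINDGEN(10)          ;; copy an array to the child
bridge->Execute, 'y = TOTAL(x)', /NOWAIT  ;; run asynchronously in the child
WHILE bridge->Status() EQ 1 DO WAIT, 0.1  ;; status 1 means still executing
PRINT, bridge->GetVar('y')                ;; retrieve the result, here 45.0
OBJ_DESTROY, bridge
```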
When dealing with a cluster of computers, the most convenient way to use several nodes is through MPI. At the expense of losing the real purpose of MPI, i.e. passing messages, it is possible to launch several instances of IDL in parallel, on one or several nodes of a cluster. It is thus possible to split a long and complex FOR loop into several processes on potentially several nodes. For example,
nMPI=#ntasks
mpirun -n $nMPI idl mpi_poc.pro
will launch several independent IDL processes, which will all run the same program mpi_poc.pro. This would produce the exact same output several times. However, at least with the OpenMPI implementation, it is possible to retrieve MPI_COMM_WORLD_SIZE from environment variables and thus split any big FOR loop among the processes. The results will then depend on the MPI_COMM_WORLD_RANK value and can be saved using the rank as a unique identifier.
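With OpenMPI, each process launched by mpirun sees its rank and the world size in the OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE environment variables, so a routine such as the mpi_rank used below could be sketched as follows (a guess at the implementation, not necessarily the actual script):

```idl
;; Hypothetical mpi_rank: read rank and size from OpenMPI's environment
PRO mpi_rank, rank, size
   ;; GETENV returns '' when a variable is unset and LONG('') is 0,
   ;; so a serial run (no mpirun) falls back to rank 0 out of 1
   rank = LONG(GETENV('OMPI_COMM_WORLD_RANK'))
   size = LONG(GETENV('OMPI_COMM_WORLD_SIZE')) > 1
END
```

Note that other MPI implementations use different variable names, so a portable helper would have to test several of them.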
I wrote a few simple scripts to help parallelize a long FOR loop. Note, however, that combining the results of the FOR loop will require disk I/O; thus, the iterations of the FOR loop should be rather CPU intensive for this approach to be efficient. The following example computes the sum of all indexes between two values.
;; mpi_poc.pro
;; Let's compute the sum of all indexes between start and stop

;; Retrieve the MPI RANK from system variables
mpi_rank, rank

start = 0
stop  = 11
sum   = 0

;; Divide the task among the different MPI ranks
mpi_helper, start, stop, i_start, i_stop

;; This is the main loop, should be an intensive task....
FOR index=i_start, i_stop-1 DO BEGIN $
   PRINT, rank, index, FORMAT='("rank: ", I2, " doing ", I2)' &$
   WAIT, start &$
   sum += index &$
ENDFOR

;; save the result in a unique output file
SAVE, FILENAME=STRING(rank, FORMAT='("poc_",I2.2,".sav")'), sum

;; Do not forget to exit at the end
exit
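The mpi_helper call above divides the global index range among the ranks; a possible implementation (again a sketch, not necessarily the author's script) is a simple block decomposition:

```idl
;; Hypothetical mpi_helper: split [start, stop) into one chunk per rank
PRO mpi_helper, start, stop, i_start, i_stop
   rank    = LONG(GETENV('OMPI_COMM_WORLD_RANK'))
   size    = LONG(GETENV('OMPI_COMM_WORLD_SIZE')) > 1
   chunk   = CEIL(FLOAT(stop - start) / size)
   i_start = start + rank * chunk
   i_stop  = (i_start + chunk) < stop   ;; last rank(s) may get fewer iterations
END
```

With start=0, stop=11 and 5 ranks, the chunk size is 3 and the ranks process 3, 3, 3, 2 and 0 iterations respectively; summed over all ranks, this still covers exactly the indexes 0 to 10.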
Launching 5 instances of this program with
mpirun -n 5 idl mpi_poc.pro
will result in 5 files
poc_00.sav poc_01.sav poc_02.sav poc_03.sav poc_04.sav
which can be combined using
sum_combined = 0
files = FILE_SEARCH('poc_*.sav')
FOR iFile=0, N_ELEMENTS(files)-1 DO BEGIN $
   RESTORE, files[iFile] &$
   sum_combined += sum &$
ENDFOR

PRINT, sum_combined
PRINT, TOTAL(INDGEN(11))
IDL & MPI & IDL_IDLBridge
When using a cluster of computers, it is thus possible to allocate several nodes and launch one instance of IDL per node, each of which can efficiently use all the cores of its node through IDL_IDLBridge. For example, using the slurm scheduler, one could set up a batch script as
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 64
#SBATCH --exclusive
#SBATCH --job-name "MPI IDL"

mpirun -pernode idl mpi_poc.pro
which will launch 2 IDL processes on two different nodes with a total of 64 cores.
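Inside mpi_poc.pro, each of these per-node IDL instances can then distribute its own chunk of the loop over the local cores; a hybrid sketch (assuming a hypothetical worker procedure do_chunk compiled in the child sessions) could look like:

```idl
;; Hypothetical hybrid scheme: one IDL_IDLBridge child per local core,
;; each processing a sub-chunk of this rank's [i_start, i_stop) range
PRO hybrid_sketch, i_start, i_stop
   n_cpu   = !CPU.HW_NCPU               ;; number of cores on this node
   bridges = OBJARR(n_cpu)
   chunk   = CEIL(FLOAT(i_stop - i_start) / n_cpu)
   FOR i=0, n_cpu-1 DO BEGIN
      bridges[i] = OBJ_NEW('IDL_IDLBridge')
      i0 = i_start + i*chunk
      i1 = (i0 + chunk) < i_stop
      ;; do_chunk is a placeholder for the actual per-core work
      bridges[i]->Execute, STRING(i0, i1, $
         FORMAT='("do_chunk, ",I0,", ",I0)'), /NOWAIT
   ENDFOR
   ;; wait until every child is done (status 1 means still running)
   FOR i=0, n_cpu-1 DO WHILE bridges[i]->Status() EQ 1 DO WAIT, 0.5
   OBJ_DESTROY, bridges
END
```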
Except that the IDL_IDLBridge documentation states that:
Note: On UNIX systems, the IDL_IDLBridge requires that the DISPLAY environment variable be set to a valid X Windows display. If no display is available, execution of the IDL_IDLBridge will halt, with error messages that include:
'IDL_IDLBridge: Unable to establish X Connection.'
which renders the sbatch approach very impractical, as one needs to make sure that X11 is forwarded to all the allocated nodes of the cluster, which are not known in advance. Thanks again IDL... Fortunately, Xvfb allows running a virtual X11 environment, and thus
srun xvfb-run idl mpi_poc.pro
will run even without a real X display.