How do I run MMC simulations across a cluster?

As you may already know, MMC supports multi-threaded computing. If you launch MMC on a single PC with a multi-core CPU, it runs several parallel threads to use all available cores and accelerate the computation. However, this approach is limited to shared-memory (SMP, symmetric multiprocessing) systems such as a stand-alone PC. On a distributed-memory system, such as a cluster, you need to launch MMC on multiple nodes across the network.

GNU parallel is a free software tool that helps you run multiple MMC simulations in parallel on a multi-core PC or on distributed servers (including a cluster). Here we give examples of how to use this tool for parallel computing on a cluster.

install GNU parallel

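To use GNU parallel, you first need to install it, as it is not shipped by most Linux distributions. You don't need to be an administrator to install it. Download the source package (here, parallel-20110205.tar.bz2) from the GNU Parallel website into /tmp, then unpack it with the following commands: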

 cd /tmp
 tar jxvf parallel-20110205.tar.bz2
 cd parallel-20110205

You will find an executable named "parallel" under the src/ directory. It is a Perl script, so it can be copied and used directly without compilation.

Now copy the parallel script to a folder of your own, for example "~/bin/", and add this folder to your PATH environment variable. If you are not sure how to do so, read this page for instructions.
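A minimal sketch of these steps for bash, assuming you run it from inside the unpacked parallel-20110205 folder and that new bash sessions read ~/.bashrc (check which start-up file your cluster's login shell actually uses):

```shell
# Create a personal bin folder (no administrator rights needed)
mkdir -p "$HOME/bin"

# Copy the parallel Perl script there and make it executable
if [ -f src/parallel ]; then
    cp src/parallel "$HOME/bin/" && chmod +x "$HOME/bin/parallel"
else
    echo "src/parallel not found: run this from the unpacked parallel folder" >&2
fi

# Put ~/bin on the PATH of the current shell ...
export PATH="$HOME/bin:$PATH"

# ... and of future bash sessions; the single quotes keep $HOME and $PATH
# unexpanded until login, and grep avoids adding the same line twice
line='export PATH="$HOME/bin:$PATH"'
grep -qxF "$line" "$HOME/.bashrc" 2>/dev/null || echo "$line" >> "$HOME/.bashrc"
```

Prepending ~/bin means your own copy of parallel is found before any older system-wide version.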

After updating your PATH variable, you need to start a new console window, and type

  parallel --help
If you see the usage information, parallel has been installed successfully.

prepare your MMC session

Of course, you first have to prepare the input files for the MMC session that you want to run, including the mesh files and the input (.inp) file. It is recommended to put all files related to each simulation in a separate folder, so that output files are not mixed up or overwritten. You can find more details in this page.
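As a sketch, assuming all nodes share a common file system and that MMCROOT holds the full path of your MMC root directory (as in the commands used later on this page), you could give the built-in sph1 example its own session folder like this. The folder name mmc_sessions/sph1_cluster is hypothetical; if you use such a folder, change to it instead of $MMCROOT/examples/meshtest when launching the jobs.

```shell
# Hypothetical session folder; it must be on a file system shared by all nodes
SESSION_DIR="$HOME/mmc_sessions/sph1_cluster"
mkdir -p "$SESSION_DIR"

# Copy the mesh and input files of the built-in example into the session folder.
# MMCROOT is assumed to hold the full path of your MMC root directory.
if [ -d "$MMCROOT/examples/meshtest" ]; then
    cp -r "$MMCROOT/examples/meshtest/." "$SESSION_DIR/"
else
    echo "MMCROOT does not point to an MMC root directory: '$MMCROOT'" >&2
fi
```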

prepare the server list

GNU parallel can launch jobs on remote computers using ssh remote execution. You can specify the list of servers with the command-line option --sshlogin or --sshloginfile. In the latter case, you simply create a plain-text (ASCII) file in which each row is the hostname of a computer you want to use. If a computer requires a separate login, you can also specify the protocol and username. An example host file can be found in the GNU parallel manual.

If you don't know the node names in your cluster, look at the "/etc/hosts" file, where the hostnames are often listed.
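A small sketch that lists candidate node names from a hosts file. It assumes the usual /etc/hosts layout (an IP address followed by one or more names), skips comments and loopback entries, and prints the first name on each remaining line; list_hosts is a hypothetical helper name. Review the result by hand, since the file may also contain entries that are not compute nodes (for example, other IPv6 addresses or the head node).

```shell
# Print candidate host names from a hosts file (default: /etc/hosts).
# Blank lines, comments and loopback entries (127.x.x.x, ::1) are skipped;
# the first name after the IP address on each line is printed.
list_hosts() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }          # skip blank lines and comments
        $1 ~ /^127\./ || $1 == "::1" { next }  # skip loopback addresses
        NF >= 2 { print $2 }                   # column 2 is the host name
    ' "${1:-/etc/hosts}"
}

# Save the candidates, then edit nodes.txt to keep only your compute nodes
list_hosts > nodes.txt
```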

run parallel simulations

You are now ready to launch a distributed MMC simulation. Here is a set of example commands (assuming your shell is bash) that runs one of the built-in examples in the MMC package:

 MMCROOT=/full/path/to/mmc    # replace with the full path of your MMC root directory
 seq -f "%02g" 1 10 | parallel -j1 --sshloginfile nodes.txt --progress \
 "cd $MMCROOT/examples/meshtest && $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T"

Let me explain each part of the above command. The first line defines a variable named "MMCROOT" pointing to the full path of the MMC root directory. This serves two purposes: 1) you don't have to repeat the path in the later commands, and 2) using the full path ensures that the same folder and executable are found when the command runs on the remote servers (here we assume all nodes in your cluster share a common file system).

The first part of the second line

 seq -f "%02g" 1 10
generates a list of 10 zero-padded strings, "01" to "10". Each string corresponds to one job.

The second part of the second line

 parallel -j1 --sshloginfile nodes.txt --progress ...
launches the parallel jobs. The "-j1" option tells parallel to run at most 1 job per server at any given time. Because the mmc binary is multi-threaded, running 1 job per node is enough to use all of its CPU cores. If you compiled a single-threaded mmc, use "-j+0" instead, which tells parallel to run as many jobs on each server as it has CPU cores. Note that "-j0" is different: it means "run as many jobs as possible".

The "--sshloginfile nodes.txt" option tells parallel to read the list of servers from the file nodes.txt. You can display its content with cat:

 fangq@launchpad:[meshtest]$ cat nodes.txt 

Each row in the file is a computer name. You may decide how many servers you want to use; here we use only 5 servers for this demonstration.
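As an illustration, a nodes.txt for 5 servers could be created as follows. The host names node01 to node05 are hypothetical placeholders. Because parallel logs in with ssh, each listed server should accept your ssh login without a password prompt (for example, via key-based authentication).

```shell
# One server per line; node01..node05 are hypothetical host names.
# A line may also read username@hostname if that server needs a
# different login name.
cat > nodes.txt <<'EOF'
node01
node02
node03
node04
node05
EOF
```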

The --progress flag is optional. It tells parallel to print overall progress while executing the jobs.

The third line is the actual command for the simulation:

 "cd $MMCROOT/examples/meshtest && $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T"
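The command must be enclosed in double quotes so that parallel receives it as a single argument, which is then sent to and executed on each server. Because double quotes are used, $MMCROOT is expanded by your local shell before the command is sent. The "cd $MMCROOT/examples/meshtest" part changes to the simulation folder on the remote server. The mmc command itself is the same as usual: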
 $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T
The only difference in the above line is the "{}" placeholder. When running with parallel, each occurrence of {} is replaced by one element of the list piped into parallel. In this case, it is replaced by the strings "01" to "10" generated by the seq command.

Putting everything together, you are telling parallel to run 10 jobs (as defined by the seq output) on 5 remote servers (as listed in nodes.txt). You can print the full job list without running it by adding the --dry-run option to the parallel command:

 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_01 -n 1000000 -b 1 -E 38918101 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_02 -n 1000000 -b 1 -E 38918102 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_03 -n 1000000 -b 1 -E 38918103 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_04 -n 1000000 -b 1 -E 38918104 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_05 -n 1000000 -b 1 -E 38918105 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_06 -n 1000000 -b 1 -E 38918106 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_07 -n 1000000 -b 1 -E 38918107 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_08 -n 1000000 -b 1 -E 38918108 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_09 -n 1000000 -b 1 -E 38918109 -D T

Here, the simulation defined by the input file sph1.inp is executed 10 times. Each run writes its output under a different name (set by the -s option) to avoid conflicts, and its random number generator (RNG) is seeded explicitly by the -E option. The "..." above only saves space; in practice, these must be full path names. You also need to make sure the seed is different for each job; otherwise the 10 jobs produce (nearly) identical results, which does not improve the statistics. Alternatively, you can set the seed in sph1.inp to -1, which lets MMC generate a seed automatically at run time from the system clock. However, you will then not know the actual seed, so you cannot reproduce the simulation.
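If you change the seed scheme, a quick way to confirm that every job gets a distinct seed is to build the seed list the same way the {} substitution does and look for duplicates. This sketch mirrors the "-E 12345678{}" pattern used above; adjust the prefix and job count if yours differ.

```shell
# Build the seed list that "-E 12345678{}" yields for jobs 01..10
seeds=$(seq -f "%02g" 1 10 | sed 's/^/12345678/')

# Print any seed that appears more than once; no output means all are distinct
printf '%s\n' "$seeds" | sort | uniq -d
```

Note that the seeds are formed by string concatenation, so job "01" gets 1234567801, not 12345679.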

Now let's see how parallel executes this command. Because we have 5 servers and 10 jobs in total, parallel first launches 5 jobs, one on each server (as specified by -j1). Whenever a job completes, parallel starts the next pending job on that server, until no jobs are left. Each time a job finishes, parallel prints its command-line output, so you can check for errors. When all jobs are complete, parallel prints a short summary. Meanwhile, the output files of the finished jobs appear in the work directory:

 fangq@launchpad:[meshtest]$ ls -S sph1_*
 sph1_01.dat  sph1_03.dat  sph1_05.dat  sph1_07.dat  sph1_09.dat
 sph1_01.mch  sph1_03.mch  sph1_05.mch  sph1_07.mch  sph1_09.mch
 sph1_02.dat  sph1_04.dat  sph1_06.dat  sph1_08.dat  sph1_10.dat
 sph1_02.mch  sph1_04.mch  sph1_06.mch  sph1_08.mch  sph1_10.mch
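Before combining the results, you may want to confirm that every job produced its files. A small sketch, assuming the naming used above (sessions sph1_01 to sph1_10, each with a .dat and a .mch file); check_outputs is a hypothetical helper name.

```shell
# Report expected output files that are missing from the current folder.
# Arguments: $1 = session name prefix (e.g. sph1), $2 = number of jobs.
# Returns 0 only if all files are present.
check_outputs() {
    local prefix=$1 njobs=$2 missing=0 id ext
    for id in $(seq -f "%02g" 1 "$njobs"); do
        for ext in dat mch; do
            if [ ! -f "${prefix}_${id}.${ext}" ]; then
                echo "missing: ${prefix}_${id}.${ext}"
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}
```

For the example on this page, run "check_outputs sph1 10" in the work directory; any job listed as missing can simply be rerun.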

The outputs include the flux data (.dat files) and the detected photon data (.mch files) from the 10 completed jobs. You can now load the flux data in MATLAB/Octave and average them to produce the final solution. For the detected photons, you can load the .mch files in MATLAB and concatenate their data sections.
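If you prefer to average the flux files outside MATLAB/Octave, here is a sketch with awk. It assumes each .dat file is a plain-text table of whitespace-separated numbers with the same number of rows and columns (please check this against the output of your MMC version); average_dat is a hypothetical helper name, and it computes the element-wise mean over all files given.

```shell
# Element-wise mean of several plain-text numeric tables with identical layout.
# ASSUMPTION: the MMC .dat files are whitespace-separated numbers.
average_dat() {
    awk '
        {
            # accumulate every field, keyed by (row, column)
            for (i = 1; i <= NF; i++) sum[FNR, i] += $i
            ncol[FNR] = NF
            if (FNR > nrow) nrow = FNR
        }
        END {
            nfile = ARGC - 1    # number of input files passed to awk
            for (r = 1; r <= nrow; r++) {
                for (i = 1; i <= ncol[r]; i++)
                    printf "%s%.8g", (i > 1 ? "\t" : ""), sum[r, i] / nfile
                printf "\n"
            }
        }' "$@"
}
```

For example, "average_dat sph1_*.dat > sph1_mean.dat" writes the averaged table. If a column holds an index that is identical in every file, its mean is simply the index itself.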
