ITS currently operates and maintains a 42-node Beowulf cluster. The cluster was built to augment the computational infrastructure that ITS supports for the research needs of the Brock community.

 

Hardware

Main Node -> beowulf.ac.brocku.ca

  • Dell OptiPlex GX270
  • Intel 3.00 GHz Pentium 4 Processor, 800 MHz FSB, 512K Cache
  • 1 GB DDR 400 MHz RAM
  • 40 GB EIDE HD
  • Intel 100/1000 Mbit Ethernet Adapter
  • 3COM 3C980 100 Mbit Ethernet Adapter

Compute Nodes -> node2.brockwulf.ca to node42.brockwulf.ca

  • Dell OptiPlex GX270
  • Intel 3.00 GHz Pentium 4 Processor, 800 MHz FSB, 512K Cache
  • 1 GB DDR 400 MHz RAM
  • 40 GB EIDE HD
  • Intel 100/1000 Mbit Ethernet Adapter

All nodes are interconnected through a private Cisco Catalyst 2948G switch, which has a non-blocking 24 Gbps architecture.

 


File System

All Beowulf compute nodes have a local file system (CentOS release 5.4) with an NFS-mounted /home directory from the main node.


/home/login_id

  • permanent disk space where login_id is your login name
  • NFS mounted on all nodes from main node
  • nightly backups

swap

  • on main node 2 GB of swap is available
  • on compute nodes 1 GB of swap is available
  • note that swapping on compute nodes should be avoided, as it slows down computation
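
One way to check whether a node is swapping is to compare SwapTotal and SwapFree in /proc/meminfo. The following is a minimal sketch, assuming a Linux node; swap_used_mb is a hypothetical helper name, not a command installed on the cluster.

```shell
#!/bin/sh
# Print the amount of swap in use, in MiB.
# Reads SwapTotal and SwapFree (both reported in kB) from a meminfo-style file.
swap_used_mb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "%d\n", (total - free) / 1024 }' "${1:-/proc/meminfo}"
}

# Report on the current node if /proc/meminfo is readable (Linux only).
if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_mb) MiB"
fi
```

A value that keeps growing while your job runs suggests the job needs more than the 1 GB of RAM on a compute node.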

Access

Secure shell (SSH) access to the main node is available from the local .brocku.ca network.

eg:
newton 1% ssh kaizaad@beowulf.ac.brocku.ca

Access to each compute node is through the main node.
eg:
[kaizaad@beowulf kaizaad]$ ssh node2
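
The two steps above can also be done in one command from your workstation. This is a sketch only: node_ssh is a hypothetical helper name, and it assumes your login name is the same on the main node and the compute nodes.

```shell
#!/bin/sh
# Open a shell on a compute node by hopping through the main node.
# -t asks ssh for a terminal so that the inner ssh session is interactive.
node_ssh() {
    user=$1
    node=$2
    ssh -t "${user}@beowulf.ac.brocku.ca" ssh "$node"
}

# Usage (not run here):
#   node_ssh kaizaad node2
```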


PBS

Brock's beowulf uses PBS (Portable Batch System) as its resource manager to ensure a balanced use of the available resources and also to submit jobs. MAUI is used to schedule the jobs through PBS to ensure fair usage.

The main node (beowulf.ac.brocku.ca) is used for logging in and compiling programs. The compute nodes are used for CPU intensive computations (node2.brockwulf.ca to node42.brockwulf.ca).

PLEASE DO NOT RUN CPU-intensive programs on the main node, as this affects other users and their ability to log on and use Brock's Beowulf cluster.

The PBS environment is configured to use a "whole pool" approach where all jobs are submitted to PBS through a single queue (workq). The MAUI scheduler then evaluates each job and decides when it can run.

The following example scripts can be used as templates for requesting resources through PBS.

PBS Example Scripts

A single pbs job -> single_job.pbs

A pvm pbs job -> pvm_job.pbs

A MPICH MPI pbs job -> mpich-mpi_job.pbs

A LAM MPI pbs job -> lam-mpi_job.pbs

An interactive pbs session that brings up an xterm window -> xterm_ssh.pbs
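
The linked scripts are not reproduced on this page. As a rough idea of what a single-processor script might contain, here is a minimal sketch; it is not the actual single_job.pbs. The job name, the output file names and ./myprog are placeholders, and workq is the queue described above.

```shell
#!/bin/bash
#PBS -N single_job
#PBS -q workq
#PBS -l nodes=1:ppn=1
#PBS -o single_job.out
#PBS -e single_job.err

# PBS starts a job in $HOME; move to the directory qsub was run from.
# The fallback to $PWD only matters when trying the script outside PBS.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-none} on $(uname -n), working in $workdir"

# Replace the next line with your own program (./myprog is a placeholder).
# ./myprog > results.txt
```

The #PBS lines are comments to the shell, so the script can be run by hand on the main node for a quick syntax check before it is submitted.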

PBS Commands

You can then submit your jobs to PBS using the qsub command.

For a single processor job you can use the following:
qsub -V single_job.pbs
For a 7 processor LAM-MPI job use:
qsub -l nodes=7:ppn=1 lam-mpi_job.pbs
To see all the jobs in the queue use:
qstat -n
To see a list of jobs you have submitted:
qstat -nu login_id
Or a specific job:
qstat -n 2533
To delete a job, use qdel and specify the job number:
qdel 2533
Type man qsub for more information, or look at the PBS documentation.
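
The commands above can be combined in a small wrapper that submits a script and immediately shows where the job stands. This is a sketch under assumptions: it assumes qsub prints an identifier of the form number.server (for example 2533.beowulf), and job_number and submit_and_show are hypothetical helper names.

```shell
#!/bin/sh
# Strip the server suffix from a PBS job identifier: 2533.beowulf -> 2533
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Submit a PBS script, then list the new job and the nodes assigned to it.
submit_and_show() {
    job_id=$(qsub -V "$1") || return 1
    qstat -n "$(job_number "$job_id")"
}

# Usage (not run here):
#   submit_and_show single_job.pbs
```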

Job Stdout and Stderr

PBS directs the stdout and stderr from running jobs into temporary files in the user's $HOME directory.

job#.beowulf.OU
job#.beowulf.ER

Upon job completion these are copied to $PBS_O_WORKDIR (the directory the job was submitted from), into the files specified in the PBS submit script by the #PBS -o and #PBS -e directives.

If you would like to see the stdout and stderr of a running job you can use the qjob_ou and qjob_er commands and specify the job number:

qjob_ou 2533
qjob_er 2533
These commands are equivalent to the standard less text paging command. Type man less for more information.
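
If you would rather follow the output as it grows, the temporary files can also be read directly. This is a sketch, assuming the files follow the job#.beowulf.OU and job#.beowulf.ER naming shown above; job_spool_file is a hypothetical helper name.

```shell
#!/bin/sh
# Print the path of the temporary stdout (OU) or stderr (ER) file of a job.
#   $1 = job number, $2 = OU or ER
job_spool_file() {
    printf '%s/%s.beowulf.%s\n' "$HOME" "$1" "$2"
}

# Follow the stdout of job 2533 as it is written (Ctrl-C to stop); not run here:
#   tail -f "$(job_spool_file 2533 OU)"
```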



Utilities

C3 tools

The Cluster Command and Control (C3) tools are a suite of cluster tools developed at Oak Ridge National Laboratory. They are useful for executing commands on cluster nodes, distributing and gathering files, and querying and terminating processes.

cexec - general utility that enables the execution of any standard command on all cluster nodes
eg: To get a listing of all processes you are running on the nodes
cexec "ps -fu kaizaad"
eg: To kill all processes named myprog
cexec "killall -KILL myprog"
eg: To make a directory on each node, one level below the /scratch directory to store temporary files used by your program
cexec "mkdir /scratch/kaizaad"
type man cexec for more information

cget - retrieves files or directories from all cluster nodes

eg: To retrieve a copy of a file from all nodes on the cluster into the current directory
cget /scratch/kaizaad/data1
eg: To retrieve a copy of a file from nodes 3 through 5 into a specific directory
cget :3-5 /scratch/kaizaad/output /home/kaizaad/data
type man cget for more information

cpush - distribute files or directories to all cluster nodes

eg: To copy a single file to all nodes on the cluster
cpush /home/kaizaad/data /scratch/kaizaad/data
type man cpush for more information

ckill - terminates a user specified process on all cluster nodes

eg: To send a process named myprog on nodes 3 through 5 a signal 9
ckill -s 9 :3-5 myprog
type man ckill for more information

crm - remove files or directories from all cluster nodes

eg: To remove all .dat files on nodes 3 through 5
crm :3-5 '/scratch/kaizaad/*.dat'
type man crm for more information
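
The C3 commands above are often used together: create a scratch directory, push input to it, and later collect and remove the results. The following sketch wraps those steps in two functions. stage_in and collect_and_clean are hypothetical names, and the sketch assumes that crm, like cget, acts on all nodes when no node range is given.

```shell
#!/bin/sh
# Create /scratch/<user> on every node and copy one input file into it.
# mkdir -p is used so that an existing directory is not an error.
stage_in() {
    user=$1
    file=$2
    cexec "mkdir -p /scratch/$user"
    cpush "$file" "/scratch/$user/$(basename "$file")"
}

# Fetch one result file from every node into the current directory,
# then remove it from the nodes' scratch space.
collect_and_clean() {
    user=$1
    name=$2
    cget "/scratch/$user/$name"
    crm "/scratch/$user/$name"
}

# Usage (not run here):
#   stage_in kaizaad /home/kaizaad/data
#   collect_and_clean kaizaad output
```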

Compilers and Libraries

Brockwulf has Intel's C/C++ (icc) and Fortran (ifc) compilers installed along with the GNU family of compilers (gcc, g77).

You can use the following optimization flags with the Intel compilers

-O2 -xW -tpp7
which will optimize your code for the P4 processor

For the GNU family of compilers you can use

-O2 -march=pentium4
Intel's Math Kernel Library is also available and provides vendor-optimized LAPACK and BLAS libraries. They can be linked using
-L/opt/intel/mkl70cluster/lib/32/ -lmkl_lapack -lmkl_p4 -lguide -lpthread
Both versions 3.0.1 and 2.1.5 of the FFTW implementation of the DFT are available on Brockwulf.
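
To keep the flags above in one place, a build script could select them by compiler name. This is a sketch only: opt_flags is a hypothetical helper name and solve.c is a placeholder source file; the flags and the MKL link line are the ones listed above.

```shell
#!/bin/sh
# Print the Pentium 4 optimization flags for a given compiler.
opt_flags() {
    case "$1" in
        icc|ifc) echo "-O2 -xW -tpp7" ;;
        gcc|g77) echo "-O2 -march=pentium4" ;;
        *) echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}

# Link line for the Intel Math Kernel Library (LAPACK and BLAS).
MKL_LIBS="-L/opt/intel/mkl70cluster/lib/32/ -lmkl_lapack -lmkl_p4 -lguide -lpthread"

# Usage (not run here; solve.c is a placeholder):
#   icc $(opt_flags icc) -o solve solve.c $MKL_LIBS
#   gcc $(opt_flags gcc) -o solve solve.c
```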



Additional Information

The official Beowulf FAQ: http://www.canonical.org/~kragen/beowulf-faq.txt

also http://www.beowulf.org/


Programming environments

MPI (Message Passing Interface, both the LAM/MPI and MPICH implementations)

LAM/MPI: http://www.lam-mpi.org/
MPICH: http://www-unix.mcs.anl.gov/mpi/mpich/

PVM (Parallel Virtual Machine): http://www.epm.ornl.gov/pvm/

Local Intel PDFs containing information on:

Parallelism on Pentium Processor Based Systems
Optimizing Applications with Intel Compilers
Intel C/C++ and Fortran 90 Compilers User's Guides
Intel Fortran Language Reference Manual
Intel Math Kernel Library Technical User Notes
Intel Math Kernel Library Reference Manual
Intel Debugger (IDB) Manual


Workload Management:

The Portable Batch System Software: http://www.OpenPBS.org
Maui scheduler: http://www.supercluster.org/maui

Local additional PBS documentation: pbs


Monitoring:

system monitoring: http://beowulf.ac.brocku.ca/ganglia

disk usage monitoring: http://beowulf.ac.brocku.ca/usage