The U.S. Army Engineer Research and Development Center (ERDC), headquartered in Vicksburg, MS, is the premier research and development laboratory complex of the Corps of Engineers. The Army Supercomputer Center at ERDC was established in 1989. In 1993, ERDC began operations as the first of the Department of Defense (DoD) High Performance Computing (HPC) Major Shared Resource Centers (MSRCs). The ERDC MSRC was formed under the auspices of the DoD HPC Modernization Program and is located in the ERDC Information Technology Laboratory (ITL) in Vicksburg, MS. The name was changed to the ERDC DoD Supercomputing Resource Center (DSRC) in 2009. The ERDC DSRC mission is to deliver HPC leadership, service, education, and technical expertise to achieve research and engineering objectives vital to the Nation. Access to the ERDC DSRC systems is available through multiple, nationwide high‑speed data communications networks. In addition, training and consultation are provided to local and remote DoD users of these systems.
Questions, comments, and suggestions about this guide are welcome. Comments may be sent in the following ways:
Toll‑free long distance: 800-500-HPCC (4722)
Local ERDC telephone: 601-634-4400, Option 1
E-mail: dsrchelp@erdc.hpc.mil
Facsimile: ERDC HPC Service Center, 601-634-3808
U.S. Postal Mail:
U.S. Army Engineer Research and Development Center
ATTN: CEERD‑IH (HPC Service Center)
3909 Halls Ferry Road
Vicksburg, MS 39180‑6199
Users requiring printed copies of Center documentation may contact our Service Center.
Scope
This document provides an overview and introduction to the use of the Cray XT3
(sapphire) located at the ERDC DSRC and a description of the
specific computing environment on sapphire. The intent of this guide is to provide
information that will enable the average user to perform computational tasks on the system.
To receive the most benefit from the information provided here, you should be proficient
in the following areas:
Font Conventions
The following font conventions will be used in this manual:
| Style | Meaning |
|---|---|
| Boldface | Indicates acronyms, abbreviations, or terms that will be used later in the text (e.g., Portable Batch System (PBS)). |
| Italic | Indicates items that are especially important or noteworthy (e.g., Kerberized ftp is required...). |
| Constant Width | Indicates environment variables, file names, and command‑line output. (e.g., /etc/motd) |
| Constant Width Italic | Indicates items that you should replace with your own actual values. (e.g., login: my_user_name) |
System Access
Before you can access our systems, you must have Kerberos version 5 installed on your PC
or workstation. This software must be used to acquire a Kerberos ticket before a
connection to the ERDC DSRC systems is allowed. More information on Kerberos can be
found below. After acquiring a Kerberos ticket, sapphire can be
accessed via Kerberized ssh as follows:
ssh login_node.erdc.hpc.mil
where login_node is one of sapphire01 through sapphire06.
If a Kerberized version of rlogin or telnet is available on your local machine, connection is also available via those commands.
Six login nodes, sapphire01 through sapphire06, provide login access to the XT3. These nodes provide the look and feel of a Linux-based environment with full access to the standard Linux utilities, commands, and shells that make program development easy and portable. Subsequent sequential processes, such as system commands or sequential user programs, run on the same node as your login shell. All jobs, both parallel and serial, are to be submitted to the batch queuing system. See the Job Submission section for more information on batch processing. Production jobs found running on the login nodes will be unilaterally terminated because of the negative impact those jobs have on the response time of the login nodes.
Service Center
The Consolidated Customer Assistance Center (CCAC)
is available to help users with any problems, questions, or training requirements for our
HPC systems. Analysts are on duty Monday through Friday, 7:00 a.m. to 10:00 p.m. Central
Time. After‑hours support is provided by ERDC ITL operations staff.
You can contact us in any of the following ways:
U.S. Mail:
U.S. Army Engineer Research and Development Center
ATTN: CEERD-IH HPC Service Center
3909 Halls Ferry Road
Vicksburg, MS 39180-6199
The ERDC DSRC provides application support for the following Computational Technology Areas (CTAs):
Obtaining an Account
Authorized DoD and contractor personnel may obtain computer accounts on our systems
through their site's Service/Agency Approval Authority (S/AAA). Please
see the instructions for obtaining
accounts prior to contacting your S/AAA.
Security
We have implemented a three‑phase transition to a secure computing environment. The
overlapping transition began with the implementation of Secure Shell and has been
enhanced by Kerberos version 5. A hardware preauthentication step is performed using
Security Dynamics' SecurID card. This card implements a one‑time password
mechanism and requires you to enter a personal identification number (PIN) in order to
generate a passcode.
Using Kerberos
To use our computer systems, you must have Kerberos version 5 installed on your PC or
workstation. Kerberos client kits and documentation are available from the Kerberos &
SecurID Information Center.
From that page, you can download client kits by clicking the "Software" link and then selecting from the end-user clients listed under the Binary section.
If you need help installing, configuring, or using the Kerberos Client Kit or your SecurID card, click "Documentation" and select the document that you need.
Other information, such as the HPCMP Kerberos Ticket Lifetimes and Required Minimum
Versions, is available only via Kerberized login.
If you still need help, contact our Service Center.
For information on required port configurations for Kerberized services, please see the Kerberos Port Configuration Document.
For those unfamiliar with Kerberos, a few commonly used commands are shown below.
To obtain a Kerberos ticket on a UNIX‑based system
%kinit
Password for user@WES.HPC.MIL: enter Kerberos password
Passcode: enter your PIN into the SecurID card, press
the diamond, and then enter the six-digit passcode that the card displays.
Note: Remember to press the "P" to delete the passcode from the SecurID card.
To verify that you have received a Kerberos ticket
%klist
Ticket cache: PIPE:1023
Default principal: user@WES.HPC.MIL

Valid starting       Expires              Service principal
10/17/05 16:04:16    10/18/05 02:04:16    krbtgt/WES.HPC.MIL@WES.HPC.MIL
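The klist listing above shows when the ticket became valid and when it expires. As a small illustration (the helper names are hypothetical and not part of the Kerberos kit), the following C sketch parses two timestamps in the MM/DD/YY HH:MM:SS layout shown above and returns the ticket lifetime in seconds; for the sample output it yields 36,000 seconds, or 10 hours. It assumes that layout, two-digit years in 2000-2099, and that both times are in the same local time zone; the date format klist prints may vary with locale and version.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse a klist-style timestamp such as "10/17/05 16:04:16" into a time_t.
 * Assumes the MM/DD/YY HH:MM:SS layout and a year in 2000-2099.
 * Returns (time_t)-1 if the string cannot be parsed. */
static time_t parse_klist_time(const char *s)
{
    struct tm tm;
    int mon, day, yy, hh, mm, ss;

    if (sscanf(s, "%d/%d/%d %d:%d:%d", &mon, &day, &yy, &hh, &mm, &ss) != 6)
        return (time_t)-1;

    memset(&tm, 0, sizeof tm);
    tm.tm_year = 100 + yy;   /* years since 1900, so 05 -> 2005 */
    tm.tm_mon = mon - 1;     /* tm_mon counts months from 0 */
    tm.tm_mday = day;
    tm.tm_hour = hh;
    tm.tm_min = mm;
    tm.tm_sec = ss;
    tm.tm_isdst = -1;        /* let mktime decide whether DST applies */
    return mktime(&tm);
}

/* Seconds between the "Valid starting" and "Expires" columns,
 * or -1.0 if either timestamp is malformed. */
double ticket_lifetime_seconds(const char *valid_starting, const char *expires)
{
    time_t start = parse_klist_time(valid_starting);
    time_t end = parse_klist_time(expires);

    if (start == (time_t)-1 || end == (time_t)-1)
        return -1.0;
    return difftime(end, start);
}
```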
For security reasons, passwords are "aged" on all of our systems. As a result, your password will eventually expire, and you will have to change it. To change your password, use the kpasswd command. NOTE: kpasswd is not the same as the typical UNIX passwd command. The UNIX passwd command will not change your Kerberos password.
To change your Kerberos password
%kpasswd
Password for user@WES.HPC.MIL: enter your password
SAM Authentication
Challenge for Security Dynamics mechanism
SecurID Passcode: enter passcode from SecurID card
Enter new password: enter new password
Enter it again: re‑enter new password
Password changed.
To establish a login session using the ssh command
ssh login_node.erdc.hpc.mil
where login_node is one of sapphire01 through sapphire06.
Login sessions may also be established using the rlogin and telnet commands provided in the Kerberos kits.
For more complete instructions on using Kerberos, contact our Service Center.
Services and Information
Users of our systems are provided with information through the toll‑free Service
Center hotline, workshops and seminars, the Web site, and on‑line documentation. A
brief discussion of some on‑line services follows:
An informative message of the day (motd) is displayed upon login to the ERDC DSRC systems. The motd contains important information about imminent events that will affect the immediate usage of the system. The UNIX more command is used with motd to prevent longer messages from scrolling off the monitor. The message is located in the file /etc/motd and may be viewed at any time by issuing the command "more /etc/motd" at the system prompt. Please read this information carefully.
An on‑line bulletin system is available on each system and can be used to obtain important information about the system. The bulletins are usually brief and contain information on a variety of topics. To display the list of available bulletins, use the bull command. A menu with a list of available bulletins will be displayed on the screen. Enter the number of the bulletin you wish to display. The bulletin will be displayed at your terminal one screen at a time. Press <spacebar> to display each additional screen until you have reached the end of the list of bulletins. Press "q <return>" to exit the bulletin utility from the main menu. For more information about the bull command, type "man bull".
Training
The ERDC DSRC also supports an extensive training schedule through our User
Productivity Enhancement and Technology Transfer (PET)
function. Most training is conducted in the PET Training Facility in Room 1205 in the ERDC ITL. Training at remote facilities and specialized training courses will be considered
upon request. Please contact our Service Center for more information or to submit training
requests. Also, for users migrating their code to sapphire, the Computational Science and
Engineering (CS&E) group is available for consultation.
The training schedule is updated regularly on the Online Knowledge Center. You can also
contact our Service Center for additional information.
System Configuration
Sapphire is a Massively Parallel Processor (MPP)
supercomputer that is a successor to the Cray T3D and T3E. Sapphire contains
4,160 nodes, each containing one 2.6‑GHz AMD Opteron 64‑bit
dual‑core processor and dedicated memory. The node pool is partitioned into compute
and service partitions that are composed of 4,096 nodes (8,192 computational cores)
and 64 nodes, respectively. The compute nodes contain 4 GBytes of RAM. The
service nodes contain 4 GBytes of RAM except for the 6 login nodes, which contain
16 GBytes. The compute nodes, which perform computation only, run a microkernel OS
called Compute Node Linux (CNL) that has limited UNIX capabilities.
The service nodes run SUSE Linux and perform support functions for
application and system services. There are four types of predefined service nodes on
sapphire: login, I/O, boot, and database. All nodes are connected to each other in a
three‑dimensional torus using a HyperTransport link to a dedicated Cray SeaStar
communications engine. Sapphire has a theoretical peak performance of 42.6 TFLOPS and contains
374 TBytes of Fibre Channel RAID disk space.
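The 42.6-TFLOPS peak rating can be derived from the compute-partition figures above: peak = nodes × cores per node × clock rate × floating-point operations per cycle. The C sketch below assumes 2 double-precision operations per clock cycle per core and counts only the 4,096 compute nodes; neither assumption is stated in this guide, but together they reproduce the published figure.

```c
/* Theoretical peak in TFLOPS:
 *   nodes x cores per node x clock (GHz) x flops per cycle per core.
 * The flops-per-cycle value is an assumption for this Opteron generation
 * (one double-precision add and one multiply per cycle). */
double peak_tflops(long nodes, int cores_per_node, double clock_ghz,
                   int flops_per_cycle)
{
    double gflops = (double)nodes * cores_per_node * clock_ghz * flops_per_cycle;
    return gflops / 1000.0;   /* convert GFLOPS to TFLOPS */
}
```

Calling peak_tflops(4096, 2, 2.6, 2) gives about 42.6 TFLOPS (8,192 cores at 5.2 GFLOPS each). Counting all 4,160 nodes would give about 43.3 TFLOPS, which does not match the rating, so the service nodes are presumably excluded.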
Users are allocated 1 GByte of permanent disk space for their home directories on sapphire. You can reference this area with the $HOME environment variable.
Users are also assigned temporary work areas on the /work and /work2 file systems. These directories are seen by all the processors on the system and may be referenced by the environment variables $WORKDIR and $WORKDIR2, respectively. Please review the Temporary File Storage and Managing Temporary File Storage sections of this document for more information about using $WORKDIR and $WORKDIR2.
Operating System
Sapphire's operating system is Cray Linux Environment (CLE), which consists of a
full-featured SUSE Linux kernel for the service nodes (including the login nodes) and a
Compute Node Linux (CNL) microkernel for the compute nodes. The heritage of the CNL
microkernel is partially Linux, but it resembles the distributed UNICOS/mk operating system
of the Cray T3E. The microkernel interacts with a user's application process in a very
limited way by managing virtual memory addressing, providing memory protection, and
performing basic scheduling. The microkernel architecture is designed to provide reproducible run times
for MPP jobs, fine-grained synchronization at scale, and
high-performance, high-bandwidth MPI and SHMEM communication. Service nodes run
a full SUSE Linux distribution with specific Cray XT3 modifications.
Login Node Abuse Policy
The login nodes, sapphire01 through sapphire06, provide login access for sapphire and support
such activities as compiling, editing, and general interactive use by all users.
Consequently, memory- or CPU-intensive programs running on the login nodes can
significantly affect all users of the system. Therefore, only small applications requiring
less than 10 minutes of runtime and less than 2 GBytes of memory are allowed on the
login nodes. Any job running on the login nodes that exceeds these limits may be
unilaterally terminated.
The preferred method to run interactive jobs is to use the Interactive Batch Environment. Jobs submitted to the batch queuing system from the Interactive Batch Environment are executed on compute nodes.
Data Storage
Home Directory Storage
Each user is allocated a home directory (the current working directory immediately after login) with an initial disk quota of 1 GByte of permanent nonmigrated storage. Your home directory can be referenced locally with the $HOME environment variable from all nodes in the system.
Requests to increase disk space quotas may be submitted by contacting our Service Center. You must supply the following information for evaluation of the request by the system administrators and the ERDC DSRC management:
Temporary File Storage
Sapphire has two large file systems, /work and /work2, for the temporary storage of data files needed for executing programs. You may access your personal working directories under these file systems by using the $WORKDIR and $WORKDIR2 environment variables, which are set for every user upon login. These WORKDIR directories have no disk quotas, and files stored there do not affect your permanent file quota usage. Because there are no disk quotas for the WORKDIRs, each user's interactive and batch jobs may consume large amounts of disk space. This fact, compounded by the large number of sapphire users, makes these file systems prone to space shortages and necessitates regular purges of unaccessed files. Remember that the WORKDIRs are "scratch" file systems: they are not backed up, and files may be deleted at any time. Always copy important working files to the DMS for safekeeping, during your jobs when possible and upon their completion.
It is important to note that these are parallel, striped file systems. This means that as files are written, they are automatically divided into chunks and written across multiple disk sets, or "OSTs," simultaneously. This process, called "striping," plays a vital role in running very large jobs because it significantly improves file I/O speed, thereby reducing the time required to read or write a file. Without parallel striping, large jobs, many of which require hundreds of gigabytes of disk space, would spend much of their time just reading from and writing to disk.
The default stripe size for both /work and /work2 is 1 MByte. The default stripe count is two for /work and six for /work2. The higher stripe count makes /work2 an especially convenient place for the temporary staging of very large files, such as tar files. When creating such files (> 20 GBytes) on /work, however, remember to increase the stripe count for those files.
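To see why a higher stripe count spreads the I/O load, consider how round-robin striping maps a byte offset in a file to one of the file's stripes. The C sketch below is a simplified model (it ignores which physical OSTs are selected and other file-system-specific layout details) using the defaults quoted above: 1-MByte stripes, with a stripe count of 2 on /work and 6 on /work2. The function name is hypothetical.

```c
#include <stdint.h>

/* Simplified round-robin striping model: the file is cut into
 * stripe_size-byte chunks, and chunk k is stored on stripe
 * k % stripe_count. Returns the index (0 .. stripe_count-1) of the
 * stripe that holds the byte at `offset`. */
unsigned stripe_for_offset(uint64_t offset, uint64_t stripe_size,
                           unsigned stripe_count)
{
    uint64_t chunk = offset / stripe_size;     /* which chunk the byte is in */
    return (unsigned)(chunk % stripe_count);   /* round-robin placement */
}
```

Under this model, consecutive 1-MByte chunks of a file on /work alternate between its two stripes, while on /work2 they cycle through six. A 24-GByte tar file would then place roughly 12 GBytes on each stripe with the /work default, versus roughly 4 GBytes on each with the /work2 default, so more disk sets share the work of reading or writing it.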
Please note that all of your jobs should execute from one of your WORKDIRs. Jobs that are run from $HOME are subject to disk space quotas and have a greater chance of failing if problems occur with that resource. Jobs that are run entirely from a WORKDIR directory are more likely to complete, even if all other resources are temporarily unavailable.
If you use $WORKDIR or $WORKDIR2 in your batch scripts, you must be careful to avoid having one job accidentally contaminate the files of another job. If two different batch jobs use the same names for temporary files, unusual errors can arise if the two jobs happen to run at the same time. By having each job create and use its own subdirectory underneath one of your WORKDIRs, this problem can be avoided.
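One way to follow this advice is to name each job's subdirectory after its batch job ID, which is unique. The C sketch below applies that idea from inside a program: it builds $WORKDIR/<job ID> and creates the directory if needed. It assumes that $WORKDIR is set as described above and that PBS exports the job identifier in the PBS_JOBID environment variable; the helper name make_job_dir is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build "$WORKDIR/$PBS_JOBID" in `path` and create that directory
 * (mode 0700) if it does not already exist.
 * Returns 0 on success, or -1 if either variable is unset, the path
 * does not fit in `len` bytes, or mkdir fails for a reason other
 * than the directory already existing. */
int make_job_dir(char *path, size_t len)
{
    const char *workdir = getenv("WORKDIR");
    const char *jobid = getenv("PBS_JOBID");
    int n;

    if (workdir == NULL || jobid == NULL)
        return -1;

    n = snprintf(path, len, "%s/%s", workdir, jobid);
    if (n < 0 || (size_t)n >= len)
        return -1;                       /* path would be truncated */

    if (mkdir(path, 0700) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

A program would call make_job_dir once at startup and then write all of its temporary files under the returned path, so two jobs running at the same time never share file names.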
Managing Temporary File Storage
Close management of your temporary storage is a very high priority, because when disk space becomes too low, the system halts processing and manual intervention is required to restart it. Users are responsible for managing their own files in their WORKDIRs by transferring needed files to the DMS and deleting unneeded files when their processes end. If available space becomes critically low, a manual purge may be run, and all files in the WORKDIRs are eligible for deletion.
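Because purges target files that have not been accessed recently, it can help to check which of your WORKDIR files are at risk before archiving them to the DMS. The C sketch below lists and counts the regular files in a single directory whose last-access time is older than a given number of days. The age threshold is a parameter because this guide does not state the purge interval, and the function name is hypothetical; some file systems also update access times lazily, so treat the result as a guide rather than a prediction of what a purge will remove.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Print and count regular files directly inside `dir` whose last access
 * time (st_atime) is more than `days` days before `now`.
 * Returns the count, or -1 if the directory cannot be opened. */
int count_stale_files(const char *dir, int days, time_t now)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    struct stat st;
    char path[4096];
    int count = 0;
    double limit = (double)days * 24.0 * 60.0 * 60.0;   /* days -> seconds */

    if (d == NULL)
        return -1;

    while ((ent = readdir(d)) != NULL) {
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                       /* skip over-long names */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                       /* skip ., .., subdirectories, errors */
        if (difftime(now, st.st_atime) > limit) {
            printf("stale: %s\n", path);
            count++;
        }
    }
    closedir(d);
    return count;
}
```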
All of our systems share an on‑line DMS that currently includes more than 32 TBytes of high‑speed disk cache, 18.2 TBytes of Tier 1 archival storage, and 3 PBytes of Tier 2 high‑speed archival storage utilizing a robotic tape library. The DMS should be used for all long‑term storage (more than 90 days).
Every user is given an account and an archival directory on one of the two partitions (gold and silver) of the DMS, a Sun Enterprise 15000. The command getarchost can be used to determine your host DMS partition. Kerberized login and ftp are allowed into the DMS. Locally developed utilities may be used to transfer files to and from the DMS as well as to create and delete directories, rename files, and list directory contents. For convenience, the environment variable $ARCHIVE_HOME can be used to reference your DMS archive directory when using DMS commands. The command getarchome can be used to display the value of $ARCHIVE_HOME for any user.
The ERDC DSRC provides the user with the option to place files in a subdirectory specifically designated for the subproject under which the files were created.
A synopsis of the archival utilities is listed below. For additional information, read the on‑line man pages that are available on each system.
Change file and directory permissions on the DMS
msfchmod [-d] -m mode file1 [file2 ...]
Copy one or more files from the DMS
archive get [-C path] [-s] file1 [file2 ...]
List files and directory contents on the DMS
archive ls [lsopts] [file/dir ...]
Create directories on the DMS
archive mkdir [-C path] [-m mode] [-p] [-s] dir1 [dir2 ...]
Rename a file on the DMS
msfmv old-filename new-filename
Copy one or more files to the DMS
archive put [-C path] [-D] [-s] file1 [file2 ...]
Delete files on the DMS
msfrm [-i] file1 [file2 ...]
Delete directories on the DMS
msfrmdir dir1 [dir2 ...]
Check the status and availability of the DMS
archive stat [-s]
For a sample batch script using these archival commands, see below.
Network Connectivity
The ERDC DSRC is a critical node of the Defense Research and Engineering
Network (DREN) and has direct, redundant connectivity to the DREN.
The internal DSRC networks are built on redundant Gigabit Ethernet technology. Sapphire
can be accessed via Kerberized ssh as follows:
ssh login_node.erdc.hpc.mil
where login_node is one of sapphire01 through sapphire06.
You may also connect using Kerberized telnet or rlogin. For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect. Use of the hostnames (for example, sapphire01.erdc.hpc.mil) is preferred over Internet Protocol (IP) addresses, which are subject to change.
Application Support Software
All of our systems run UNIX or UNIX-like operating systems (sapphire runs Linux) with
vendor‑specific enhancements. A large variety of compiler environments, numerical
libraries, graphics libraries, and third‑party analysis applications is available on
the systems. Additional applications can be added to accommodate the diverse needs of the
user communities that we serve. Please contact our
Service Center for more information.
A list of third‑party software licensed for sapphire is available at http://www.erdc.hpc.mil/hardSoft/Software/XT3.
Application/Utilities File Systems
In addition to the software mentioned previously, applications and utilities on sapphire
include Cray‑proprietary and supported programs, commercial packages, and privately
written and supported programs. All non‑Cray applications software, libraries,
utilities, and documentation are stored in one of the following important subdirectories
of the /usr/local/ directory structure:
| Subdirectories | Description of contents |
|---|---|
| applic | Third‑party applications. |
| bin | Third‑party software executable files and system shell scripts. |
| info | Bulletin files for user access. |
| man | Locally developed ERDC DSRC man pages. |
| usp | Contents of these subdirectories are not supported by our systems personnel, but rather by the owner/user. Subdirectories named bin, lib, and applic contain the programs. |
Overview of Compilers and Development Tools
Sapphire provides a full complement of programming development tools. These tools include
assemblers, compilers, parallelizing compilers, and programming utilities. The following
sections describe these elements of the Cray XT3 programming environment.
Sapphire has three programming environments for compiling: Portland Group (PGI), PathScale, and GNU. The PGI Programming Environment is the default programming environment on sapphire. To switch from the PGI default programming environment, use one of the following commands:
To switch to PathScale:
module swap PrgEnv-pgi PrgEnv-pathscale
To switch to GNU:
module swap PrgEnv-pgi PrgEnv-gnu
Optimization flags are different for each programming environment and can be found in the man pages for each: for PGI, man pgf90, pgf77, pgcc, or pgCC; for PathScale, man pathf90, pathcc, or pathCC; for GNU, man gfortran, g77, gcc, or g++.
To compile your code to run on the compute nodes using any of the three programming environments, use the compilers listed in the following table, which will invoke your loaded programming environment. Note that you will still need to issue an aprun command in a batch job to run the compiled code on the compute nodes.
| Compiler | Description |
|---|---|
| ftn | Fortran 90/95 |
| f77 | FORTRAN 77 |
| cc | C |
| CC | C++ |
You may run small applications on sapphire's login nodes if they do not run for more than a few minutes. Alternatively, you may use PBS to schedule a batch job on a batch interactive node. The OS on the batch interactive nodes is full SUSE Linux, which can run serial or threaded applications. The batch interactive nodes contain a single dual-core 2.6-GHz Opteron processor with about 14 GBytes of usable memory. Any of the three programming environments may be used, but you must issue the same "module swap" commands listed above if you want to compile with PathScale or GNU. To schedule a single batch interactive node, use the PBS option "-l ncpus=0".
To compile a serial or threaded code to run on a login node or on a batch interactive node, use the compilers listed in the following table.
| Compiler | Description |
|---|---|
| pgf90 | PGI Fortran 90/95 |
| pgf77 | PGI FORTRAN 77 |
| pgcc | PGI C |
| pgCC | PGI C++ |
| pathf90 | PathScale Fortran 77/90/95 |
| pathcc | PathScale C |
| pathCC | PathScale C++ |
| gfortran | GNU Fortran 90/95 |
| g77 | GNU FORTRAN 77 |
| gcc | GNU C |
| g++ | GNU C++ |
Some useful compiler options on the Cray XT3 are
presented in the following table. For additional information on these options, see the
compiler man pages or the PGI User's Guide.
| Useful PGI Compiler Options (Fortran & C/C++) | Purpose |
|---|---|
| -yod | When used with the -target=linux (Deferred) flag, allows the compiled program to execute under yod. |
| -ON | Specifies a level of optimization from 0 to 3. As "N" increases, compilation time increases and execution time generally decreases. N=3 may generate results that differ from those obtained at lower levels. |
| -fastsse | An aggregate option that includes a number of individual PGI compiler options. The actual included options depend on the compilation target. |
| -Mipa=fast | Invokes interprocedural analysis (IPA), including several IPA suboptions. |
| -dryrun | Displays the commands that the compiler driver would execute, without actually executing them. |
| -Minfo=all | Causes the PGI compilers to issue informational messages to stdout as compilation proceeds. From these messages, you can determine which loops are optimized using unrolling, SSE/SSE2 instructions, vectorization, parallelization, interprocedural optimizations, and other optimizations. |
AMD Core Math Library (ACML) and Cray LibSci
In addition to the AMD Core Math Library (ACML), sapphire provides Cray's LibSci library as
part of the Cray Programming Environment. This library is a collection of
single-processor and parallel numerical routines that have been tuned for optimal
performance on Cray XT systems. Both libraries are loaded by default. The
LibSci library contains optimized versions of many of the BLAS routines, and in general,
the LibSci routines take precedence over their ACML counterparts. Users should call
these routines, instead of public-domain or user-written versions, to optimize
application performance on sapphire.
The ACML includes the following:
Cray LibSci includes the following:
C and C++ Compilers
Sapphire provides both C and C++ compilers. The C compiler conforms to the ANSI C standard
as well as "traditional C," the dialect of C defined by Kernighan and Ritchie in "The C
Programming Language." Compiler options allow compilation of programs written in
either "traditional C" or pure ANSI C.
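By default, the cc and CC commands invoke the Portland Group (PGI) C and C++ compilers; if you have swapped to another programming environment, they invoke that environment's compilers instead. The C and C++ command-line syntax is as follows: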
cc [option(s)] filename[...]
CC [option(s)] filename[...]
where option(s) is one or more command-line options (see the compiler options table above), and filename is the name of the source file, assembly-language file, object file, or library to be processed by the compilation system. More than one filename may be specified.
The following table lists the C and C++ file extensions that are supported on sapphire:
| Filename | Assumed Type |
|---|---|
| file.s | Assembly language source file |
| file.o | Object file |
| file.a | Library input file |
| file.lst | Listing file |
| a.out | Default name for the C or C++ executable output file |
| file.c | C source file |
| file.C, file.c++, file.cc, file.cxx, file.cpp | C++ source files |
Each object file produced by compilation is named after its source file, with the extension replaced by ".o" (for example, file.c produces file.o). For further information and a list of options, use "man cc" or "man CC".
Fortran Compilers
The full ANSI-standard capabilities of FORTRAN 77, Fortran 90, and
Fortran 95 are available on sapphire, along with a comprehensive set of Fortran extensions.
The FORTRAN 77, Fortran 90, and Fortran 95 command‑line syntax is as follows:
f77 [option(s)] filename[...]
ftn [option(s)] filename[...]
where option(s) is one or more command-line options (see the compiler options table above), and filename is the name of the source file, assembly-language file, object file, or library to be processed by the compilation system. More than one filename may be specified.
The following table lists the Fortran file extensions that are supported on sapphire.
| Filename | Assumed Type |
|---|---|
| file.a | Library file to be searched for external references. |
| file.f, file.F | Input Fortran source file in fixed source form. If the filename extension is .F, the Fortran preprocessor is invoked. |
| file.f90, file.F90, file.f95, file.F95 | Input Fortran source file in free source form. If the filename extension is .F90 or .F95, the Fortran preprocessor is invoked. |
| file.i | Preprocessor output file |
| file.lst | Listing file. |
| file.o | Object file. |
| file.s | Assembly language file |
| a.out | Default name for a binary (executable) file |
Each object file produced by compilation is named after its source file, with the extension replaced by ".o" (for example, file.f90 produces file.o). For further information and a list of options, use "man f77" or "man ftn".
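The extension rules in the Fortran table above can be summarized in a few lines of code. The following C sketch (a hypothetical helper, not part of the compilers) classifies a filename by source form and reports whether the Fortran preprocessor would be invoked, following the table: .f and .F are fixed form; .f90, .F90, .f95, and .F95 are free form; and an uppercase F in the extension triggers preprocessing.

```c
#include <string.h>

enum fortran_form { FORM_UNKNOWN, FORM_FIXED, FORM_FREE };

/* Classify a Fortran source filename using the extension rules in the
 * table above. Sets *preprocess to 1 for .F, .F90, and .F95 files,
 * and to 0 otherwise. */
enum fortran_form classify_fortran(const char *filename, int *preprocess)
{
    const char *dot = strrchr(filename, '.');   /* last '.' in the name */

    *preprocess = 0;
    if (dot == NULL)
        return FORM_UNKNOWN;

    if (strcmp(dot, ".f") == 0 || strcmp(dot, ".F") == 0) {
        *preprocess = (dot[1] == 'F');
        return FORM_FIXED;
    }
    if (strcmp(dot, ".f90") == 0 || strcmp(dot, ".F90") == 0 ||
        strcmp(dot, ".f95") == 0 || strcmp(dot, ".F95") == 0) {
        *preprocess = (dot[1] == 'F');
        return FORM_FREE;
    }
    return FORM_UNKNOWN;
}
```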
Sapphire supports three programming models: Message Passing Interface (MPI), SHared-MEMory (SHMEM), and Open Multi-Processing (OpenMP). MPI and SHMEM are message- or data-passing models, while OpenMP uses threads that share memory within a single node.
Message Passing Interface (MPI)
The MPI package on sapphire is derived from MPICH‑2 and implements the MPI‑2
standard except for spawn support. It also implements the MPI 1.2 standard, as
documented by the MPI Forum in the spring 1997 release of MPI: A Message Passing
Interface Standard.
For more information on included MPI-2 features, see the Cray XT Series Programming
Environment User's Guide, available on-line from Cray.
On CLE systems, the Cray Message Passing Toolkit (MPT) supports only the single-system image model. MPI is a component of the MPT, which is a software package that supports parallel programming across a network of computer systems through a technique known as message passing. MPI establishes a practical, portable, efficient, and flexible standard for message passing that makes use of the most attractive features of a number of existing message‑passing systems, rather than selecting one of them and adopting it as the standard. See "man intro_mpi" for additional information.
When creating an MPI program on sapphire, ensure that the following actions are taken:
INCLUDE "mpif.h"     (if written in Fortran), or
#include <mpi.h>     (if written in C)
To compile an MPI program, use the following examples:
cc -o mpi_program mpi_program.c
ftn -o mpi_program mpi_program.f
To run an MPI program within a batch script, use the following command:
aprun -n N mpi_program [user_arguments]
where N is the number of processes to start. The aprun command launches executables across a set of CNL compute nodes. File operations performed by the compute node processes (if not directed to a parallel I/O facility) are transparently forwarded to aprun, which executes the operations and returns the results to the application. When each member of the parallel application has exited, aprun exits. For more information about aprun, see the aprun man page.
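When sizing a job for aprun -n N, it helps to know how many compute nodes N processes occupy. Assuming one process per core with both cores of each dual-core compute node in use (the placement actually used depends on your aprun options and batch request), the node count is N divided by 2, rounded up. A minimal C sketch, with a hypothetical function name:

```c
/* Number of compute nodes needed for `nprocs` processes when
 * `cores_per_node` processes are placed on each node (2 on sapphire's
 * dual-core compute nodes). Uses integer ceiling division. */
long nodes_needed(long nprocs, int cores_per_node)
{
    return (nprocs + cores_per_node - 1) / cores_per_node;
}
```

For example, nodes_needed(8192, 2) is 4096, the entire compute partition, while nodes_needed(3, 2) is 2, leaving one core idle.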
SHared MEMory (SHMEM)
The SHMEM library's logically shared, distributed-memory access routines provide high-performance,
high-bandwidth communication for use in highly parallelized, scalable programs. The
SHMEM data‑passing library routines are similar to the MPI library routines: they
pass data between cooperating parallel processes. The SHMEM data‑passing routines
can be used in programs that perform computations in separate address spaces and that
explicitly pass data to and from different processes in the program.
The SHMEM routines minimize the overhead associated with data‑passing requests, maximize bandwidth, and minimize data latency. Data latency is the length of time between a process initiating a transfer of data and that data becoming available for use at its destination.
SHMEM routines support remote data transfer through put operations that transfer data to a different process and get operations that transfer data from a different process. Other supported operations are work‑shared broadcast and reduction, barrier synchronization, and atomic memory updates. An atomic memory operation is an atomic read and update operation, such as a fetch and increment, on a remote or local data object. The value read is guaranteed to be the value of the data object just prior to the update. See "man intro_shmem" for details on the SHMEM library.
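The guarantee that "the value read is the value of the data object just prior to the update" is the same contract that C11's atomic_fetch_add provides for local memory. The sketch below uses <stdatomic.h> only as a local analogy to clarify the semantics; it does not call the SHMEM library, whose remote atomic routines are described in "man intro_shmem". The function name is hypothetical.

```c
#include <stdatomic.h>

/* Local analogue of an atomic fetch-and-increment: atomically adds 1 to
 * *counter and returns the value it held immediately before the update.
 * Concurrent callers each receive a distinct prior value, which is why
 * such operations suit shared work counters. */
long fetch_and_increment(atomic_long *counter)
{
    return atomic_fetch_add(counter, 1);
}
```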
When creating a SHMEM program on sapphire, include the SHMEM header file in your source code:
INCLUDE 'mpp/shmem.fh'    (Fortran)
#include <mpp/shmem.h>    (C)
To compile a SHMEM program, use one of the following commands:
cc -o shmem_program shmem_program.c -lsma       # C
ftn -o shmem_program shmem_program.f90 -lsma    # Fortran
Before running a SHMEM program, you may want to set the following environment variables:
setenv XT_LINUX_SHMEM_STACK_SIZE 24m
setenv XT_LINUX_SHMEM_HEAP_SIZE 120m
setenv XT_SYMMETRIC_HEAP_SIZE 20m
The program can then be launched using the aprun command as follows:
aprun -n N shmem_program [user_arguments]
where N is the number of processes to start. The aprun command launches and manages the SHMEM program in the same way as described above for MPI programs. For more information about aprun, see the aprun man page.
For more information on the performance and use of SHMEM calls, see the Cray XT Series
Programming Environment User's Guide, available on‑line from Cray.
Open Multi-Processing (OpenMP)
OpenMP is a shared‑memory parallel programming model that consists of a set of
compiler directives (Fortran directives, C and C++ pragmas), library routines, and
environment variables.
When creating an OpenMP program on sapphire, make the OpenMP runtime library interface available to your source code:
USE omp_lib    (Fortran; alternatively, INCLUDE 'omp_lib.h')
#include <omp.h>    (C)
To compile an OpenMP program, use one of the following commands, depending on your compiler:
# For C codes:
cc -o OpenMP_program -mp=nonuma OpenMP_program.c    # PGI compiler
cc -o OpenMP_program -mp OpenMP_program.c           # PathScale compiler
cc -o OpenMP_program -fopenmp OpenMP_program.c      # GNU compiler
# For Fortran codes:
ftn -o OpenMP_program -mp=nonuma OpenMP_program.f   # PGI compiler
ftn -o OpenMP_program -mp OpenMP_program.f          # PathScale compiler
ftn -o OpenMP_program -fopenmp OpenMP_program.f     # GNU compiler
To run an OpenMP program within a batch script, you must also set the OMP_NUM_THREADS environment variable to the number of threads in the team. For example:
setenv OMP_NUM_THREADS 2
aprun -n 1 -d 2 OpenMP_program [user_arguments]
In this example, aprun starts a single instance of OpenMP_program on one node and reserves two cores for it (-d 2 sets the depth, the number of cores per process); the OpenMP runtime then spawns one additional thread, giving a team of two threads.
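The minimal C sketch below is not from the original guide; the function names `team_size` and `parallel_sum` are hypothetical. It shows the two pieces the run example depends on: reading the team size that OMP_NUM_THREADS controls, and a parallel loop with a reduction. The `#ifdef _OPENMP` guards let the same file compile serially when no OpenMP flag is given.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads OpenMP will use for the next parallel
 * region (set by OMP_NUM_THREADS); returns 1 when built without OpenMP. */
int team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Sums the integers 1..n. Each thread accumulates a private partial sum,
 * and the reduction clause combines the partial sums when the loop ends,
 * so the result does not depend on the number of threads. */
long parallel_sum(long n)
{
    long sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```

Called from the serial part of a main program compiled with one of the commands above and launched with OMP_NUM_THREADS set to 2 and "aprun -n 1 -d 2", team_size() would be expected to report 2, while parallel_sum(1000) returns 500500 for any team size.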
Applications built with the hybrid parallel programming model, which combines OpenMP and MPI, can also run on sapphire. In a hybrid OpenMP/MPI application, MPI calls may be made only outside OpenMP parallel (threaded) regions, never from inside them.
The Portable Batch System (PBS) is currently running on sapphire. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single‑processor and multiprocessor jobs.
Available Queues
Currently, the XT3 batch environment consists of seven queues.
For a complete description of job queue limits for sapphire, see the Cray XT3 Queue Limits Summary.
Interactive Environment
When you log in to sapphire, you will be running in an interactive shell on a login node.
These nodes are for compiling, editing, and general interactive use by all users. You may
run small applications on these nodes only if they complete in less than 10 minutes and
use less than 2 GBytes of memory. In the interest of all users, any job running on
the login nodes that exceeds these limits may be unilaterally terminated.
The preferred method to run interactive jobs is to use the Interactive Batch Environment. Jobs submitted to the batch queuing system from the Interactive Batch Environment will be submitted to compute nodes for execution.
Interactive Batch Environment
In order to use the interactive batch environment, you must first acquire an interactive
batch shell. This is done by executing a qsub command with the "-I"
option from within the interactive environment. For example,
qsub -l ncpus=# -A project_name -q queue_name -l walltime=wall_time -I
Your batch shell request will be placed in the requested queue and scheduled for execution. Depending on system load, this may take a few minutes or longer. Once your shell starts, you will be logged in to one of the PBS host nodes. From there you can run or debug interactive applications, execute job scripts, launch executables on the compute nodes with the aprun command, postprocess data, and so on.
Batch Request Submission
An alternative to using the interactive batch environment is to submit batch requests
directly to PBS from within the interactive environment. This is done by using the
qsub command to hand off a job script to the PBS scheduler. The scheduler will
determine when the job is eligible for execution based on job resource requirements and
available system resources.
Creating a Batch Script
While it is possible to include all PBS directives at the qsub command‑line, the
preferred method is to embed the PBS directives within the batch request script using
"#PBS". Such a script might look like the following:
# This is a sample PBS batch script.
# Declare the project under which this job run will be charged.
# (required)
# Users can find eligible projects by typing "show_usage" on the command line.
#PBS -A project_name
# Request 1 hour of wallclock time for execution (required).
#PBS -l walltime=01:00:00
# Request 4 cores (required).
#PBS -l ncpus=4
# Submit job to debug queue (required).
#PBS -q debug
# Declare a jobname.
#PBS -N myjob
# Send standard output (stdout) and error (stderr) to the same file.
#PBS -j oe
# Make a new subdirectory in working storage space.
mkdir $WORKDIR/projA-7
# Change to the new directory.
cd $WORKDIR/projA-7
# Check DMS availability. If not available, then wait.
archive stat -s
# Retrieve executable program from the DMS.
archive get -C $ARCHIVE_HOME/project_name program.exe
# Retrieve input data file from the DMS.
archive get -C $ARCHIVE_HOME/project_name/input data.in
# Execute the parallel program retrieved from the DMS on 4 cores.
aprun -n 4 ./program.exe < data.in > projA-7.out
# Check DMS availability. If not available, then wait.
archive stat -s
# Create a new subdirectory on the DMS.
archive mkdir -C $ARCHIVE_HOME/project_name output7
# Transfer output file back to the DMS.
archive put -C $ARCHIVE_HOME/project_name/output7 projA-7.out
# Clean up unneeded files from working storage.
cd $WORKDIR
rm -r projA-7
Submitting a Batch Script
To submit the batch request script, use the following command:
qsub scriptname
When the script (above) begins execution, it first copies the executable program and input files from your $ARCHIVE_HOME directory to your working directory, $WORKDIR. It then runs the executable on four cores and returns the results to your $ARCHIVE_HOME.
The "-j oe" option creates a file named myjob.o<jobid> that contains both stdout and stderr from the job. The file name combines the name that you supplied with the "-N" option and the numeric job ID assigned by PBS.
You can monitor batch jobs by using the qstat, qview, or qhist commands. You can delete a job by using "qdel job_ID". The job_ID can be obtained from the output of the qstat command.
For more information on qsub and other PBS commands, see their respective man pages or the PBS qsub Quick Reference Guide.
Single-System View Commands
Standard Unix commands operate only on the node you are logged in to. The following
commands operate on, and provide information about, the entire system. Further
information and command syntax can be found in each command's man page.
| Command | Description |
|---|---|
| xtshowmesh | Shows information about compute and service partition processors and the jobs running in each partition. |
| xtshowcabs | Shows information about compute and service nodes organized by chassis and cabinet. |
| xt_ps | Provides process information for all login nodes of the system. |
| xthostname | Displays or sets the xthostname value. |
| xt_who | Shows users logged onto the Cray XT3 system. |
| xt_free | Shows free and used physical memory for all login nodes. |
Last update: July 13, 2009