SuperMicro SuperServer MLA V100 (Vulcanite)
User Guide
Table of Contents
- 1. Introduction
- 1.1. Document Scope and Assumptions
- 1.2. Obtaining an Account
- 1.3. Requesting Assistance
- 2. System Configuration
- 2.1. System Summary
- 2.2. Operating System
- 2.3. File Systems
- 2.3.1. /home
- 2.3.2. /gpfs/cwfs
- 2.3.3. /tmp
- 3. Accessing the System
- 3.1. Kerberos
- 3.2. Logging In
- 4. User Environment
- 4.1. Modules
- 4.2. Archive Usage
- 4.3. Available Compilers
- 4.4. Programming Models
- 5. Batch Scheduling
- 5.1. Scheduler
- 5.2. Queue Information
- 5.3. Interactive Batch Sessions
- 5.4. Batch Resource Directives
- 5.5. Launch Commands
- 5.6. Sample Script
- 5.7. PBS Commands
1. Introduction
1.1. Document Scope and Assumptions
This document provides an overview and introduction to the use of the SuperMicro SuperServer MLA V100 (Vulcanite) located at the ERDC DSRC, along with a description of the specific computing environment on Vulcanite. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:
- Use of the UNIX operating system
- Use of an editor (e.g., vi or emacs)
- Remote usage of computer systems via network or modem access
- A selected programming language and its related tools and libraries
1.2. Obtaining an Account
To get an account on Vulcanite, you must first submit a Vulcanite Project Proposal. You will also require an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." Once you have submitted your proposal, if you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.
1.3. Requesting Assistance
The ERDC DSRC HPC Service Center is available to help users with problems, issues, or questions. Analysts are on duty 8:00 a.m. - 5:00 p.m. Central, Monday - Friday (excluding Federal holidays).
To request assistance, contact the ERDC DSRC directly in any of the following ways:
- E-mail: dsrchelp@erdc.hpc.mil
- Phone: 1-800-500-4722 or (601) 634-4400
For more detailed contact information, please see our Contact Page.
2. System Configuration
2.1. System Summary
Vulcanite is an exploratory system meant to provide users access to a variety of high-density GPU node configurations. Each node type has a different number of processors, amount of memory, number of GPUs, amount of SSD storage, and number of network interfaces. Because of this, users should take care when migrating between node types.
| | Login Nodes | Accelerator Nodes (2 GPU) | Accelerator Nodes (4 GPU) | Accelerator Nodes (8 GPU) |
|---|---|---|---|---|
| Total Nodes | 2 | 26 | 8 | 5 |
| Operating System | RHEL 7 | RHEL 7 | RHEL 7 | RHEL 7 |
| Cores/Node | 12 | 12 | 24 | 48 |
| Core Type | Intel Gold 6126T Skylake (12 cores) | Intel Gold 6126T Skylake (12 cores) + NVIDIA V100 PCIe | Dual Intel Gold 6136 Skylake (12 cores/socket) + NVIDIA V100 SXM2 | Dual Intel Platinum 8160 Skylake (24 cores/socket) + NVIDIA V100 SXM2 |
| Core Speed | 2.6 GHz | 2.6 GHz | 3.0 GHz | 2.1 GHz |
| Memory/Node | 192 GBytes DDR4-2666 | 192 GBytes DDR4-2666 + 2 x 32 GBytes | 384 GBytes DDR4-2666 + 4 x 32 GBytes | 768 GBytes DDR4-2666 + 8 x 32 GBytes |
| Accessible Memory/Node | 8 GBytes | 206 GBytes + 2 x 32 GBytes | 284 GBytes + 4 x 32 GBytes | 764 GBytes + 8 x 32 GBytes |
| Interconnect Type | EDR InfiniBand 1x | EDR InfiniBand 1x | EDR InfiniBand 2x | EDR InfiniBand 4x |
| Local SSD on Node | 2 TBytes NVMe | 2 TBytes NVMe | 4 TBytes NVMe | 8 TBytes NVMe |
| Path | Capacity | Type |
|---|---|---|
| /home | 4 TBytes | SSD |
| /gpfs/cwfs | 3 PBytes | GPFS |
2.2. Operating System
Vulcanite's operating system is Red Hat Enterprise Linux 7.
2.3. File Systems
Vulcanite has the following file systems available for user storage:
2.3.1. /home
/home is a locally mounted SSD file system with an unformatted capacity of 4 TBytes. All users have a home directory located on this file system, which can be referenced by the environment variable $HOME. /home has a 30 GByte quota.
2.3.2. /gpfs/cwfs
The Center-Wide File System (CWFS) provides file storage that is accessible from all Vulcanite's nodes. The environment variable $CENTER refers to this directory.
2.3.3. /tmp
The /tmp directory gives users write access to the local SSD on each node. The size of the SSD depends on the configuration of the node (see the table above). Note: any files placed in /tmp are removed when the batch job ends. Users should copy any necessary files from $CENTER to /tmp near the beginning of their batch script and copy any desired results from /tmp back to $CENTER before the end of the script, as shown in the sketch below.
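For example, a job might stage data through the local SSD as follows (a minimal csh sketch; the file names are placeholders, not actual files on the system):
# Stage input data from the Center-Wide File System to the node-local SSD.
cp $CENTER/my_input_data /tmp
# ... run the application against the copy in /tmp ...
# Copy results back before the job ends; /tmp is cleared when the job completes.
cp /tmp/my_results $CENTER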
3. Accessing the System
3.1. Kerberos
A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Vulcanite. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.
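Once a client kit is installed, a ticket is normally obtained and checked with the standard Kerberos commands shown below (a sketch only; the exact commands and prompts provided by the HPCMP client kit may differ):
% kinit user_name     # request a Kerberos ticket (you will be prompted to authenticate)
% klist               # list current tickets to confirm one was granted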
3.2. Logging In
The login nodes for the Vulcanite cluster are vulcanite01 and vulcanite02.
The preferred way to log in to Vulcanite is via ssh, as follows:
% ssh vulcanite.erdc.hpc.mil
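If your local username differs from your HPCMP (Kerberos) username, specify it explicitly (shown here with a placeholder username):
% ssh user_name@vulcanite.erdc.hpc.mil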
4. User Environment
4.1. Modules
A number of modules are loaded automatically as soon as you log in. To see the modules which are currently loaded, use the "module list" command. To see the entire list of available modules, use the "module avail" command. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
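For example (the module name shown with "module load" is a placeholder; use "module avail" to see the exact names available on Vulcanite):
% module list                 # show modules currently loaded in your session
% module avail                # show all modules available on the system
% module load module_name     # add a module to your environment
% module unload module_name   # remove a module from your environment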
4.2. Archive Usage
Vulcanite does not have direct access to the MSAS archive server, but does have access to the CWFS.
4.3. Available Compilers
Vulcanite has the GNU and Intel compilers.
Vulcanite has several MPI suites:
- OpenMPI (GCC)
- MPICH (GCC)
- MVAPICH2 (GCC)
- IMPI (INTEL)
4.4. Programming Models
Vulcanite supports two base programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A hybrid MPI/OpenMP programming model is also supported.
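As a sketch of how these models are typically built (the source and executable names are placeholders, and the exact compiler wrappers and flags depend on which compiler and MPI modules you load):
# Pure MPI: compile with an MPI compiler wrapper and launch with mpiexec (see section 5.5).
mpicc -O2 -o mpi_job.exe mpi_job.c
# Hybrid MPI/OpenMP: add the OpenMP flag (-fopenmp for GNU, -qopenmp for Intel)
# and control the threads per MPI process with OMP_NUM_THREADS at run time.
mpicc -O2 -fopenmp -o hybrid_job.exe hybrid_job.c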
5. Batch Scheduling
5.1. Scheduler
The Portable Batch System (PBS) is currently running on Vulcanite.
5.2. Queue Information
Vulcanite only has the Standard queue. The maximum wall clock time is 168 hours.
5.3. Interactive Batch Sessions
To get an interactive batch session, you must first submit an interactive batch job through PBS. This is done by executing a qsub command with the "-I" option from within the interactive login environment. For example:
qsub -l select=N1:ncpus=12:mpiprocs=N2 -A Project_ID -q standard -l walltime=HHH:MM:SS -I
You must specify the number of nodes requested (N1), the number of processes per node (N2), the desired maximum walltime, your project ID, and a job queue.
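For example, to request one 2-GPU node (12 cores, 12 MPI processes) for one hour in the standard queue (substitute your actual project ID):
qsub -l select=1:ncpus=12:mpiprocs=12 -A Project_ID -q standard -l walltime=01:00:00 -I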
Your interactive batch session is scheduled just as normal batch jobs are, so depending on the other queued batch jobs, it may take some time to start. Once your interactive batch shell starts, you can run or debug interactive applications, post-process data, etc.
At this point, you can run parallel applications on your assigned set of compute nodes. You can also run interactive commands or scripts on this node.
5.4. Batch Resource Directives
Batch resource directives allow you to specify to PBS how your batch jobs should be run and what resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.
The basic syntax of PBS directives is as follows:
#PBS option[[=]value]
where some options may require values to be included. For example, to start an 8-process job, you would request one node of 12 cores and specify that you will be running 8 processes per node:
#PBS -l select=1:ncpus=12:mpiprocs=8:ngpus=2
The following directives are required for all jobs:
| Directive | Value | Description |
|---|---|---|
| -A | Project_ID | Name of the project |
| -q | queue_name | Name of the queue |
| -l | select=N1:ncpus=12:mpiprocs=N2:ngpus=2 | For 2-GPU nodes: N1 = number of nodes; N2 = MPI processes per node |
| -l | select=N1:ncpus=24:mpiprocs=N2:ngpus=4 | For 4-GPU nodes: N1 = number of nodes; N2 = MPI processes per node |
| -l | select=N1:ncpus=48:mpiprocs=N2:ngpus=8 | For 8-GPU nodes: N1 = number of nodes; N2 = MPI processes per node |
| -l | walltime=HHH:MM:SS | Maximum wall time |
5.5. Launch Commands
To launch an MPI executable use mpiexec. For example:
mpiexec -n #_of_MPI_tasks ./mpijob.exe
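The number of MPI tasks is typically the number of nodes requested multiplied by the mpiprocs value. For example, for a hypothetical two-node request of select=2:ncpus=12:mpiprocs=12:
mpiexec -n 24 ./mpijob.exe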
For OpenMP executables, no launch command is needed.
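The number of OpenMP threads is usually controlled with the OMP_NUM_THREADS environment variable (shown here in csh syntax with a placeholder executable name):
setenv OMP_NUM_THREADS 12
./openmp_job.exe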
5.6. Sample Script
While it is possible to include all PBS directives at the qsub command line, the preferred method is to embed the PBS directives within the batch request script using "#PBS". The following is a sample batch script:
#!/bin/csh
# Declare the project under which this job run will be charged. (required)
# Users can find eligible projects by typing "show_usage" on the command line.
#PBS -A Project_ID
# Request 1 hour of wallclock time for execution.
#PBS -l walltime=01:00:00
# Request nodes.
#PBS -l select=1:ncpus=12:mpiprocs=12:ngpus=2
# Submit job to standard queue.
#PBS -q standard
# Declare a jobname.
#PBS -N myjob
# Send standard output (stdout) and error (stderr) to the same file.
#PBS -j oe
# Change to the working directory.
cd $PBS_O_WORKDIR
# Execute a parallel program.
???
5.7. PBS Commands
The following commands provide the basic functionality for using the PBS batch system:
qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script
qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs
qdel: Used to kill queued or running jobs.
qdel PBS_JOBID