Onyx - Hardware Maintenance
To allow repair of a Lustre OSS node, OSTs 62 and 63 of /p/work will be taken offline for emergency maintenance beginning at 1500 Central Time on Thursday, 16 March 2023. Onyx will lose access to these 2 OSTs for approximately one hour.
During this short outage, jobs that are already running will be allowed to continue, but new jobs will be prevented from starting. Files on these 2 OSTs are currently read-only.
While the 2 OSTs are offline, approximately 3% of files in $WORKDIR will produce errors such as "Cannot send after transport endpoint shutdown" when accessed, and their permissions will display as ??????????. These errors should resolve once the maintenance is complete.
NOTE: Users can run the script ost_62_63_check.sh to see whether any files under their work directory will be impacted. "cd" into a directory containing the files you would like to check, then pass the list of files to the script as arguments (wildcards are allowed). The script outputs only the files that reside on OST 62 or 63 and will therefore be impacted. If any of these files are needed by runs occurring during the 16 March outage, copy them; the copies will be created on other OSTs. The original files on OSTs 62 and 63 can then be removed and the copies renamed to the original names.
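The copy-and-rename step described in the note could be scripted roughly as follows. This is only a sketch: the function name recreate_file and the temporary-file suffix are illustrative and not part of any Onyx tooling, and it assumes that a newly written copy is placed on other OSTs, as the notice states. It uses "mv -f" to replace the original with the copy in a single step instead of a separate remove and rename.

```shell
# Hypothetical helper (not provided on Onyx): replace a file with a fresh
# copy of itself so that its data is written to newly allocated storage.
recreate_file() {
    src=$1
    tmp="$src.ostcopy.$$"    # temporary name; the suffix is an arbitrary choice

    # Copy the data; -p preserves mode, ownership (where permitted), and timestamps.
    # If the copy fails, remove the partial copy and leave the original untouched.
    cp -p -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Rename the copy over the original (removes the old file, keeps the name).
    mv -f -- "$tmp" "$src"
}
```

A possible session, assuming ost_62_63_check.sh is on your PATH and the file names are placeholders: run "ost_62_63_check.sh *.dat" in the directory of interest, then call "recreate_file input.dat" for each file the script lists. Make sure you have enough quota for the temporary copy before running this on large files.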