System News

02.06.23

Onyx - System Maintenance

To repair a Lustre OSS node, Onyx will temporarily lose access to two OSTs for approximately one hour. OSTs 70 and 71 of /p/work will be taken offline for scheduled maintenance beginning at 0900 Central Time on Monday, 6 February 2023.

During this short outage, running jobs will be allowed to continue, but new jobs will be prevented from starting. Files on these two OSTs are currently read-only; users have not been able to write to OST 70 or 71 for a few weeks.

While the two OSTs are offline, approximately 3% of files in $WORKDIR will return errors such as "Cannot send after transport endpoint shutdown", and the permissions on those files will display as ??????????. These errors should resolve once the maintenance is complete.

NOTE: Users can check whether files under their work directory will be impacted by using the script ost_70_71_check.sh. "cd" into a directory containing the files you would like to check, then pass a list of files as arguments (wildcards are allowed). The script outputs only the files that reside on OST 70 or 71 and will therefore be impacted. If these files are needed by runs occurring during the 6 Feb outage, users can copy them, which creates the copies on other OSTs. The original files on OSTs 70-71 can then be removed and the new files renamed to the original names.
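
The copy, remove, and rename steps described in the note could be scripted roughly as follows. This is only a sketch, not part of ost_70_71_check.sh: the function name recreate_file and the temporary-file suffix are hypothetical, and it assumes that a freshly written copy is placed on OSTs that currently accept writes, as the note describes.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the center): recreate one file so that
# its data is written out again as a new file.
# Assumption: the new copy is allocated on OSTs other than 70 and 71, since
# those OSTs are currently read-only.
recreate_file() {
    src=$1
    tmp="${src}.recreate.$$"   # temporary name in the same directory

    # Copy the data; -p keeps mode and timestamps where permitted.
    cp -p -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Make sure the copy is identical before touching the original.
    cmp -s -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Renaming over the original removes the old file and gives the
    # new copy the original name in a single step.
    mv -f -- "$tmp" "$src"
}
```

A possible use is to call recreate_file once for each file that the check script reports, for example `recreate_file input.dat` (input.dat is a placeholder name). On a Lustre client, `lfs getstripe <file>` can be run afterwards to see which OSTs hold the new file.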

