MPI Common Error Messages on Cray XT Systems
Taken From: Cray XT Series Programming Environment User's Guide S-2396-20
If you see this error message:
internal ABORT - process 0: Other MPI error, error stack: MPIDI_PortalsU_Request_PUPE###: exhausted unexpected receive queue buffering increase via env. var. MPICH_UNEX_BUFFER_SIZE
It means:
The application is sending too many short, unexpected messages to a particular receiver. An unexpected message is an MPI message that has been sent to the receiver, but for which the receiver has yet to post its corresponding MPI receive request. As a result, the message is buffered in the receiver's MPI buffer for unexpected messages.
Try this to work around the problem:
Increase the amount of memory for MPI buffering using the MPICH_UNEX_BUFFER_SIZE variable (default is "60M") and/or decrease the short message size threshold using the MPICH_MAX_SHORT_MSG_SIZE variable (default is 128000 bytes).
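As a rough sketch of this workaround in a job script, the lines below double the unexpected-message buffer and halve the short-message threshold before the application is launched. The values 120M and 64000, the process count, and the executable name ./my_app are illustrative assumptions, not recommendations; the "M" suffix simply mirrors the form of the default quoted above.

```shell
# Illustrative values only; tune them for your application.
export MPICH_UNEX_BUFFER_SIZE=120M      # default is 60M
export MPICH_MAX_SHORT_MSG_SIZE=64000   # default is 128000 bytes

# Then launch with your usual launcher, for example (hypothetical names):
# aprun -n 64 ./my_app
```

Lowering the short-message threshold reduces how much data each unexpected message can occupy in the buffer, so the two settings can be adjusted together or separately.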
If you see this error message:
[0] MPIDI_PortalsU_Request_..._...: dropped event on unexpected receive queue, increase
[0] queue size by setting the environment variable MPICH_PTL_UNEX_EVENTS
It means:
The event queue entries associated with the unexpected messages queue have been exhausted. The default value is 20480 events.
Try doing this to work around the problem:
Increase the size of this queue by setting the environment variable MPICH_PTL_UNEX_EVENTS to some value higher than 20480.
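A minimal sketch of this setting, assuming a POSIX shell job script. The value 40960 is an arbitrary example (twice the default), not a tuned recommendation.

```shell
# Example value: twice the default of 20480 events.
export MPICH_PTL_UNEX_EVENTS=40960
```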
If you see this error message:
[0] MPIDI_Portals_Progress: dropped event on "other" queue, increase
[0] queue size by setting the environment variable MPICH_PTL_OTHER_EVENTS
aborting job: Dropped Portals event
It means:
The event queue entries associated with the "other" queue have been exhausted. This can happen if the application is posting many nonblocking sends, or if a large number of pre-posted receives are being posted, or if many MPI-2 RMA operations are posted in a single epoch. The default size of the "other" event queue is 2048 events.
Try doing this to work around the problem:
Increase the size of this queue by setting the environment variable MPICH_PTL_OTHER_EVENTS to some value higher than 2048.
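One way to express this in a job script is to scale the documented default rather than hard-code a number. This is only a sketch: the helper variable DEFAULT_OTHER_EVENTS and the factor of 4 are assumptions chosen for illustration.

```shell
# DEFAULT_OTHER_EVENTS is a hypothetical helper variable, not read by MPI.
DEFAULT_OTHER_EVENTS=2048   # default size of the "other" event queue
# Scale the default by an example factor of 4.
export MPICH_PTL_OTHER_EVENTS=$((DEFAULT_OTHER_EVENTS * 4))
echo "MPICH_PTL_OTHER_EVENTS=$MPICH_PTL_OTHER_EVENTS"
```

Raising the queue in modest steps keeps the total event queue space lower than jumping straight to a very large value.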
If you see this error message:
0:(/notbackedup/users/rsrel/rs64.REL_._._....... Wed/pe/computelibs/mpich2/src/mpid/portals32/src/portals_progress.c:642) PtlEQAlloc failed : PTL_NO_SPACE
It means:
You have requested so much event queue space for MPI (and possibly SHMEM, if both are used in the same application) that there are not sufficient Portals resources to satisfy the request.
Try doing this to work around the problem:
Decrease the size of the event queues by setting the environment variables MPICH_PTL_UNEX_EVENTS and MPICH_PTL_OTHER_EVENTS to smaller values.
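As a sketch, a job script that had previously raised both queues could step them back down, here to the default sizes quoted earlier in this list. The unexpected-event variable is written as MPICH_PTL_UNEX_EVENTS, the name that appears in the runtime's own error message above; the values are examples only.

```shell
# Back off from previously raised values (example: the documented defaults).
export MPICH_PTL_UNEX_EVENTS=20480
export MPICH_PTL_OTHER_EVENTS=2048
```

If the application then hits the dropped-event errors again, an intermediate value between the default and the failing setting is a reasonable next try.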