Friday, October 23, 2009

Resolving MPI deadlocks

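Deadlock is a problem with blocking communication: each process sits in a call that cannot complete until another process makes progress, so none of them can proceed.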

Example 1: always deadlocks. Both ranks call MPI_Recv first, so neither one ever reaches its MPI_Send.

if (rank == 0) then
call MPI_Recv(..., 1, tag, MPI_COMM_WORLD, status, ierr)
call MPI_Send(..., 1, tag, MPI_COMM_WORLD, ierr)
elseif (rank == 1) then
call MPI_Recv(..., 0, tag, MPI_COMM_WORLD, status, ierr)
call MPI_Send(..., 0, tag, MPI_COMM_WORLD, ierr)
endif


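Example 2: sometimes deadlocks. MPI may use internal system buffers to cache outgoing messages, so MPI_Send can return before the matching receive has been posted. A blocking communication pattern like this one may therefore work for small values of count, and then deadlock once count grows beyond what the implementation is willing to buffer.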

if (rank == 0) then
call MPI_Send(..., 1, tag, MPI_COMM_WORLD, ierr)
call MPI_Recv(..., 1, tag, MPI_COMM_WORLD, status, ierr)
elseif (rank == 1) then
call MPI_Send(..., 0, tag, MPI_COMM_WORLD, ierr)
call MPI_Recv(..., 0, tag, MPI_COMM_WORLD, status, ierr)
endif


There are a couple of ways to fix this problem.
Method 1: reverse the order of one of the send/receive pairs, so that one rank sends first while the other receives first

if (rank == 0) then
call MPI_Send(..., 1, tag, MPI_COMM_WORLD, ierr)
call MPI_Recv(..., 1, tag, MPI_COMM_WORLD, status, ierr)
elseif (rank == 1) then
call MPI_Recv(..., 0, tag, MPI_COMM_WORLD, status, ierr)
call MPI_Send(..., 0, tag, MPI_COMM_WORLD, ierr)
endif
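
For reference, here is Method 1 filled out into a complete program. This is only a minimal sketch: the buffer names (sendbuf, recvbuf), the message length n, the tag value, and the use of MPI_DOUBLE_PRECISION are my own assumptions standing in for the "..." above, and it assumes the MPI library provides the Fortran "mpi" module.

```fortran
program ordered_exchange
  use mpi
  implicit none
  integer, parameter :: n = 100000      ! hypothetical message length
  integer, parameter :: tag = 0         ! hypothetical tag value
  integer :: rank, nprocs, ierr
  integer :: status(MPI_STATUS_SIZE)
  double precision, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(n), recvbuf(n))
  sendbuf = dble(rank)                  ! each rank sends its own rank number
  recvbuf = -1.0d0

  if (nprocs < 2) then
    if (rank == 0) print *, 'run with at least 2 processes'
  else if (rank == 0) then
    ! rank 0 sends first, then receives
    call MPI_Send(sendbuf, n, MPI_DOUBLE_PRECISION, 1, tag, &
                  MPI_COMM_WORLD, ierr)
    call MPI_Recv(recvbuf, n, MPI_DOUBLE_PRECISION, 1, tag, &
                  MPI_COMM_WORLD, status, ierr)
  else if (rank == 1) then
    ! rank 1 receives first, so rank 0's send always has a matching receive
    call MPI_Recv(recvbuf, n, MPI_DOUBLE_PRECISION, 0, tag, &
                  MPI_COMM_WORLD, status, ierr)
    call MPI_Send(sendbuf, n, MPI_DOUBLE_PRECISION, 0, tag, &
                  MPI_COMM_WORLD, ierr)
  end if

  if (nprocs >= 2 .and. rank <= 1) then
    print *, 'rank', rank, 'received', recvbuf(1)
  end if

  deallocate(sendbuf, recvbuf)
  call MPI_Finalize(ierr)
end program ordered_exchange
```

Because the ordering does not rely on buffering, this should complete for any n, not just small ones. On many installations it can be built with a wrapper such as mpif90 and run on two processes with mpirun, but the exact command names depend on the MPI distribution.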


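Method 2: use non-blocking communication. MPI_Isend returns immediately, so rank 0 can go on to post its receive; MPI_Wait then completes the send.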

if (rank == 0) then
call MPI_Isend(..., 1, tag, MPI_COMM_WORLD, req, ierr)
call MPI_Recv(..., 1, tag, MPI_COMM_WORLD, status, ierr)
call MPI_Wait(req, status, ierr)
elseif (rank == 1) then
call MPI_Recv(..., 0, tag, MPI_COMM_WORLD, status, ierr)
call MPI_Send(..., 0, tag, MPI_COMM_WORLD, ierr)
endif
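
A symmetric variation on Method 2 is to have both ranks post a non-blocking receive and a non-blocking send, then wait on both requests at once. Note that this is not exactly the code above (which only uses MPI_Isend on rank 0); it is a sketch of the same idea, and the names sendbuf, recvbuf, n, other, and the tag value are assumptions for illustration. It also assumes the Fortran "mpi" module is available.

```fortran
program nonblocking_exchange
  use mpi
  implicit none
  integer, parameter :: n = 100000      ! hypothetical message length
  integer, parameter :: tag = 0         ! hypothetical tag value
  integer :: rank, nprocs, other, ierr
  integer :: reqs(2)
  integer :: statuses(MPI_STATUS_SIZE, 2)
  double precision, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(n), recvbuf(n))
  sendbuf = dble(rank)                  ! each rank sends its own rank number
  recvbuf = -1.0d0

  if (nprocs < 2) then
    if (rank == 0) print *, 'run with at least 2 processes'
  else if (rank <= 1) then
    other = 1 - rank                    ! rank 0 talks to 1, rank 1 talks to 0
    ! both calls return immediately, so their order cannot cause a deadlock
    call MPI_Irecv(recvbuf, n, MPI_DOUBLE_PRECISION, other, tag, &
                   MPI_COMM_WORLD, reqs(1), ierr)
    call MPI_Isend(sendbuf, n, MPI_DOUBLE_PRECISION, other, tag, &
                   MPI_COMM_WORLD, reqs(2), ierr)
    ! the buffers must not be touched until both requests have completed
    call MPI_Waitall(2, reqs, statuses, ierr)
    print *, 'rank', rank, 'received', recvbuf(1)
  end if

  deallocate(sendbuf, recvbuf)
  call MPI_Finalize(ierr)
end program nonblocking_exchange
```

The advantage of this form is that both ranks run the same code, so there is no need to decide which side sends first.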
