In particular, suppose that we have an `n0` by `n1` array in
row-major order, block-distributed across the `n0` dimension. To
transpose this into an `n1` by `n0` array block-distributed across
the `n1` dimension, we would create a plan by calling the following
function:

fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                  double *in, double *out,
                                  MPI_Comm comm, unsigned flags);

The input and output arrays (`in` and `out`) can be the same. The
transpose is actually executed by calling `fftw_execute` on the plan,
as usual.
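
For example, assuming that buffers `in` and `out` of sufficient size
have already been allocated (see the sizing function below), planning
and executing an out-of-place transpose might look like this sketch:

fftw_plan plan = fftw_mpi_plan_transpose(n0, n1, in, out,
                                         MPI_COMM_WORLD, FFTW_MEASURE);
fftw_execute(plan); /* out now holds this process's block of the transpose */
fftw_destroy_plan(plan);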

The `flags` are the usual FFTW planner flags, but support two
additional flags: `FFTW_MPI_TRANSPOSED_OUT` and/or
`FFTW_MPI_TRANSPOSED_IN`. What these flags indicate, for transpose
plans, is that the output and/or input, respectively, are *locally*
transposed. That is, on each process, the input data is normally
stored as a `local_n0` by `n1` array in row-major order, but for an
`FFTW_MPI_TRANSPOSED_IN` plan the input data is stored as `n1` by
`local_n0` in row-major order. Similarly, `FFTW_MPI_TRANSPOSED_OUT`
means that the output is `n0` by `local_n1` instead of `local_n1` by
`n0`.
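
To make these layouts concrete, here is a hypothetical accessor (not
part of FFTW) showing how the global element at row `i1` and column
`i0` of the transposed array might be read from this process's
portion of `out`; `local_n1` and `local_1_start` come from the sizing
function described next:

/* Hypothetical helper: element (i1, i0) of the n1-by-n0 transposed
   array, where this process owns rows local_1_start to
   local_1_start + local_n1 - 1. */
double get_out(const double *out, ptrdiff_t n0, ptrdiff_t local_n1,
               ptrdiff_t local_1_start, ptrdiff_t i1, ptrdiff_t i0)
{
     /* Default layout: a local_n1 by n0 block in row-major order: */
     return out[(i1 - local_1_start) * n0 + i0];
     /* With FFTW_MPI_TRANSPOSED_OUT, the same block is stored as
        n0 by local_n1, so the index would instead be
        out[i0 * local_n1 + (i1 - local_1_start)]. */
}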

To determine the local size of the array on each process before and
after the transpose, as well as the amount of storage that must be
allocated, one should call `fftw_mpi_local_size_2d_transposed`, just
as for a 2d DFT as described in the previous section:

ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1,
                                            MPI_Comm comm,
                                            ptrdiff_t *local_n0,
                                            ptrdiff_t *local_0_start,
                                            ptrdiff_t *local_n1,
                                            ptrdiff_t *local_1_start);

Again, the return value is the local storage to allocate, which in
this case is the number of *real* (`double`) values rather than
complex numbers as in the previous examples.
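
Putting these pieces together, a minimal sketch of an in-place
distributed transpose over `MPI_COMM_WORLD` might look like the
following (the 1024 by 2048 dimensions are arbitrary placeholders,
and error checking is omitted):

#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
     const ptrdiff_t n0 = 1024, n1 = 2048; /* placeholder dimensions */
     ptrdiff_t alloc_local, local_n0, local_0_start, local_n1, local_1_start;
     double *data;
     fftw_plan plan;

     MPI_Init(&argc, &argv);
     fftw_mpi_init();

     /* storage to allocate, counted in real (double) values: */
     alloc_local = fftw_mpi_local_size_2d_transposed(n0, n1, MPI_COMM_WORLD,
                                                     &local_n0, &local_0_start,
                                                     &local_n1, &local_1_start);
     data = fftw_alloc_real(alloc_local);

     /* in == out requests an in-place transpose: */
     plan = fftw_mpi_plan_transpose(n0, n1, data, data,
                                    MPI_COMM_WORLD, FFTW_MEASURE);

     /* ... initialize the local_n0 by n1 block of input in data ... */

     fftw_execute(plan); /* data now holds a local_n1 by n0 block
                            of the n1 by n0 transposed array */

     fftw_destroy_plan(plan);
     fftw_free(data);
     fftw_mpi_cleanup();
     MPI_Finalize();
     return 0;
}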