pyfftw.interfaces - Drop in replacements for other FFT implementations

The pyfftw.interfaces package provides interfaces to pyfftw that implement the API of other, more commonly used FFT libraries; specifically numpy.fft, scipy.fft and scipy.fftpack. The intention is to satisfy two clear use cases:

  1. Simple, clean and well-established interfaces to pyfftw, so that users need not know how to create and use pyfftw.FFTW objects, whilst still gaining most of the speed of FFTW.

  2. A library that can be dropped into code that is already written to use a supported FFT library, with no significant change to the existing code. Python's dynamic nature allows this to be done at runtime to a third-party library, without changing any of that library’s code.

The pyfftw.interfaces implementation sacrifices a small amount of flexibility compared with accessing the pyfftw.FFTW object directly, but provides a reasonable set of defaults and optional tweaks that should satisfy most situations.

The precision of the transform that is used is selected from the array that is passed in, defaulting to double precision if any type conversion is required.

This module works by generating a pyfftw.FFTW object behind the scenes using the pyfftw.builders interface, which is then executed. There is therefore a potentially substantial overhead when a new plan needs to be created. This is down to FFTW’s internal planner process. After a specific transform has been planned once, subsequent calls in which the input array is equivalent will be much faster, though still not without potentially significant overhead. This overhead can be largely alleviated by enabling the pyfftw.interfaces.cache functionality. However, even when the cache is used, very small transforms may suffer a significant relative slow-down not present when accessing pyfftw.FFTW directly (because the transform time can be negligibly small compared to the fixed pyfftw.interfaces overhead).

In addition, extra copies of the input array may be made.

If speed or memory conservation is of absolutely paramount importance, the suggestion is to use pyfftw.FFTW (which provides better control over copies and so on), either directly or through pyfftw.builders. As always, experimentation is the best guide to optimisation.

In practice, this means something like the following (taking numpy_fft as an example):

>>> import pyfftw, numpy
>>> a = pyfftw.empty_aligned((128, 64), dtype='complex64', n=16)
>>> a[:] = numpy.random.randn(*a.shape) + 1j*numpy.random.randn(*a.shape)
>>> fft_a = pyfftw.interfaces.numpy_fft.fft2(a) # Will need to plan
>>> b = pyfftw.empty_aligned((128, 64), dtype='complex64', n=16)
>>> b[:] = a
>>> fft_b = pyfftw.interfaces.numpy_fft.fft2(b) # Already planned, so faster
>>> c = pyfftw.empty_aligned(132, dtype='complex128', n=16)
>>> c[:] = numpy.random.randn(*c.shape) + 1j*numpy.random.randn(*c.shape)
>>> fft_c = pyfftw.interfaces.numpy_fft.fft(c) # Needs a new plan
>>> pyfftw.interfaces.cache.enable()
>>> fft_a = pyfftw.interfaces.numpy_fft.fft2(a) # Still builds an FFTW object, which is now cached
>>> fft_b = pyfftw.interfaces.numpy_fft.fft2(b) # much faster, from the cache

The usual wisdom import and export functions work well for the case where the initial plan might be prohibitively expensive. Just use pyfftw.export_wisdom() and pyfftw.import_wisdom() as needed after having performed the transform once.

Implemented Functions

The implemented functions are listed below. numpy.fft is implemented by pyfftw.interfaces.numpy_fft, scipy.fftpack by pyfftw.interfaces.scipy_fftpack and scipy.fft by pyfftw.interfaces.scipy_fft. All the implemented functions are extended by the use of additional arguments, which are documented below.

Not all the functions provided by numpy.fft, scipy.fft and scipy.fftpack are implemented by pyfftw.interfaces. In the case where a function is not implemented, the function is imported into the namespace from the corresponding library. This means that all the documented functionality of the library is provided through pyfftw.interfaces.

One known caveat is that repeated axes are handled differently. Axes that are repeated in the axes argument are considered only once and without error. By contrast, numpy.fft takes the DFT along a repeated axis as many times as that axis occurs, and scipy raises an error.

numpy_fft

scipy_fft

scipy_fftpack

dask_fft

Additional Arguments

In addition to the equivalent arguments in numpy.fft, scipy.fft and scipy.fftpack, all these functions accept several additional arguments for finer control over the FFT. These are largely a subset of the keyword arguments in pyfftw.builders, with a few exceptions and with different defaults.

  • overwrite_input: Whether or not the input array can be overwritten during the transform. This sometimes results in a faster algorithm being made available. It causes the 'FFTW_DESTROY_INPUT' flag to be passed to the intermediate pyfftw.FFTW object. Unlike with pyfftw.builders, this argument is included with every function in this package.

    In scipy_fftpack and scipy_fft, this argument is replaced by overwrite_x, which is equivalent and occupies the same position.

    The default is False to be consistent with numpy.fft.

  • planner_effort: A string dictating how much effort is spent in planning the FFTW routines. It is passed to the intermediate pyfftw.FFTW object as an entry in its flags list, so the valid values correspond to the planner flags accepted by pyfftw.FFTW.

    The valid strings, in order of their increasing impact on the time to compute are: 'FFTW_ESTIMATE' (default), 'FFTW_MEASURE', 'FFTW_PATIENT' and 'FFTW_EXHAUSTIVE'.

    The Wisdom that FFTW has accumulated or has loaded (through pyfftw.import_wisdom()) is used during the creation of pyfftw.FFTW objects.

    Note that the first time planning stage can take a substantial amount of time. For this reason, the default is to use 'FFTW_ESTIMATE', which potentially results in a slightly suboptimal plan being used, but with a substantially quicker first-time planner step.

  • threads: The number of threads used to perform the FFT.

    In scipy_fft, this argument is replaced by workers, which serves the same purpose, but is also compatible with the scipy.fft.set_workers() context manager.

    The default is 1.

  • auto_align_input: Correctly byte align the input array for optimal usage of vector instructions. This can lead to a substantial speedup.

    This argument being True makes sure that the input array is correctly aligned. It is possible to correctly byte align the array prior to calling this function (using, for example, pyfftw.byte_align()). If and only if a realignment is necessary is a new array created.

    It’s worth noting that just being aligned may not be sufficient to create the fastest possible transform. For example, if the array is not contiguous (i.e. certain axes have gaps in memory between slices), it may be faster to plan a transform for a contiguous array, and then rely on the array being copied in before the transform (which pyfftw.FFTW will handle for you). The auto_contiguous argument controls whether this function also takes care of making sure the array is contiguous or not.

    The default is True.

  • auto_contiguous: Make sure the input array is contiguous in memory before performing the transform on it. If the array is not contiguous, it is copied into an interim array. This is because it is often faster to copy the data before the transform and then transform a contiguous array than it is to try to take the transform of a non-contiguous array. This is particularly true in conjunction with the auto_align_input argument which is used to make sure that the transform is taken of an aligned array.

    The default is True.

Caching

During calls to functions implemented in pyfftw.interfaces, a pyfftw.FFTW object is necessarily created. Although the time to create a new pyfftw.FFTW is short (assuming that the planner possesses the necessary wisdom to create the plan immediately), it may still take longer than a short transform.

This module implements a method by which objects that are created through pyfftw.interfaces are temporarily cached. If an equivalent transform is then performed within a short period, the object is acquired from the cache rather than a new one created. The equivalency is quite conservative and in practice means that if any of the arguments change, or if the properties of the array (shape, strides, dtype) change in any way, then the cache lookup will fail.

The cache temporarily stores a copy of any interim pyfftw.FFTW objects that are created. If they are not used for some period of time, which can be set with pyfftw.interfaces.cache.set_keepalive_time(), then they are removed from the cache (liberating any associated memory). The default keepalive time is 0.1 seconds.

Enable the cache by calling pyfftw.interfaces.cache.enable(). Disable it by calling pyfftw.interfaces.cache.disable(). By default, the cache is disabled.

Note that even with the cache enabled, there is a fixed overhead associated with lookups. This means that for small transforms, the overhead may exceed the transform. At this point, it’s worth looking at using pyfftw.FFTW directly.

When the cache is enabled, the module spawns a new thread to keep track of the objects. If threading is not available, then the cache is not available and trying to use it will raise an ImportError exception.

The actual implementation of the cache is liable to change, but the documented API is stable.

pyfftw.interfaces.cache.disable()

Disable the cache.

pyfftw.interfaces.cache.enable()

Enable the cache.

pyfftw.interfaces.cache.set_keepalive_time(keepalive_time)

Set the minimum time in seconds for which any pyfftw.FFTW object in the cache is kept alive.

When the cache is enabled, the interim objects that are used through a pyfftw.interfaces function are cached for the time set through this function. If the object is not used for that time, it is removed from the cache. Using the object resets the timer.

The time is not precise, and sets a minimum time to be alive. In practice, it may be quite a bit longer before the object is deleted from the cache (due to implementation details, e.g. contention from other threads).