FFT Benchmarks

Polynomial interpolation methods comparison

Three methods of polynomial interpolation were benchmarked, with different input sizes each time:

  • CPU Lagrange: Finding the Lagrange polynomial of a set of random points via a naive algorithm (see math/src/polynomial.rs:interpolate())
  • CPU FFT: Finding the lowest degree polynomial that interpolates pairs of twiddle factors and Fourier coefficients (the results of applying the Fourier transform to the coefficients of a polynomial) (see math/src/polynomial.rs:interpolate_fft()).
  • GPU FFT (Metal): Same as CPU FFT but the FFT algorithm is run on the GPU via the Metal API (Apple).

these were run with criterion-rs in a MacBook Pro M1 (18.3), statistically measuring the total run time of one iteration. The field used was a 256 bit STARK-friendly prime field.

All values of time are in milliseconds. Those cases which were greater than 30 seconds were marked respectively as they're too slow and weren't worth to be benchmarked. The input size refers to d + 1 where d is the polynomial's degree (so size is amount of coefficients).

Input sizeCPU LagrangeCPU FFTGPU FFT (Metal)
2^42.2 ms0.2 ms2.5 ms
2^59.6 ms0.4 ms2.5 ms
2^642.6 ms0.8 ms2.5 ms
2^7200.8 ms1.7 ms2.9 ms
..........
2^21>30000 ms28745 ms574.2 ms
2^22>30000 ms>30000 ms1144.9 ms
2^23>30000 ms>30000 ms2340.1 ms
2^24>30000 ms>30000 ms4652.9 ms

NOTE: Metal FFT execution includes the Metal state setup, twiddle factors generation and a bit-reverse permutation.