Extreme Filters: cache-efficient implementation of long IIR and FIR filters

Modern general-purpose processors have powerful vector processing units that can be used for real-time infinite impulse response (IIR) and finite impulse response (FIR) filtering. In practice, the calculations are efficient only if the filter parameters fit into the processor's cache. With long filters, the computation slows down because the filter data do not fit into the cache at once. By computing the filter in multiple segments, more efficient cache utilization can be achieved. The presented optimization method works when multiple samples are processed in a row. The improved cache efficiency increases performance by up to almost one order of magnitude over a direct filter implementation on modern hardware.
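As a rough illustration of the segmentation idea for the FIR case, the following minimal sketch contrasts a direct loop with a segmented one. It is not the implementation described here: the function names `fir_direct` and `fir_segmented`, the parameter `seg_len`, and the assumption of zero input history before the block are all hypothetical choices made for the example. Pure Python does not expose cache effects, so the sketch only shows the loop ordering, in which one coefficient segment is reused across every sample of the block before the next segment is touched.

```python
def fir_direct(h, x):
    """Direct form: every output sample sweeps the whole coefficient list."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        # Assumes zero input history before x[0], so k stops at n.
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y


def fir_segmented(h, x, seg_len):
    """Same filter with the loop over coefficient segments outermost.

    seg_len is a hypothetical tuning parameter; in a real implementation it
    would be chosen so that one segment plus the input it needs fits in cache.
    """
    y = [0.0] * len(x)
    for start in range(0, len(h), seg_len):
        stop = min(start + seg_len, len(h))
        # Reuse h[start:stop] for every sample in the block.
        for n in range(len(x)):
            acc = 0.0
            for k in range(start, min(stop, n + 1)):
                acc += h[k] * x[n - k]
            y[n] += acc  # add this segment's partial sum to the output
    return y
```

Both functions compute the same convolution; only the order in which coefficients are visited differs. The segmented order pays off only when a block of several samples is available, since a single sample gives no opportunity to reuse a segment.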