S4 Structured State Spaces for Sequence Modeling

Part 2 of my study notes on the video presented by Albert Gu

1 Motivation

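HiPPO maps an input to a state, and now we need to project the state to an output. This gives the full equations for S4, which have the same form as the state space model (SSM) proposed by R. E. Kalman in 1960, where u is the input, x the state, and y the output:
\(\begin{aligned} x' &= Ax + Bu \\ y &= Cx + Du \end{aligned}\)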
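To make the two equations concrete, here is a minimal sketch that evaluates them and integrates a toy system over time. The 1-state system, the constant input, and the forward Euler integrator are my own assumptions for illustration; none of them come from the video.

```python
import numpy as np

def ssm_rhs(A, B, C, D, x, u):
    """Evaluate x' = Ax + Bu and y = Cx + Du for a single-input, single-output SSM."""
    dx = A @ x + B * u
    y = C @ x + D * u
    return dx, y

def simulate_euler(A, B, C, D, u_fn, t_end, dt=1e-3):
    """Integrate the SSM with forward Euler (chosen only for simplicity)."""
    x = np.zeros(A.shape[0])
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        dx, _ = ssm_rhs(A, B, C, D, x, u_fn(k * dt))
        x = x + dt * dx
    # Project the final state to an output.
    _, y = ssm_rhs(A, B, C, D, x, u_fn(t_end))
    return y

# Hypothetical 1-state system: x' = -x + u, y = x, driven by a constant input u = 1.
A = np.array([[-1.0]])
B = np.array([1.0])
C = np.array([1.0])
D = 0.0
y_final = simulate_euler(A, B, C, D, lambda t: 1.0, t_end=10.0)
print(y_final)  # should be close to the steady state y = 1
```

The state x carries the memory of past inputs, and C (plus the skip term D) is the projection from state to output mentioned above.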

SSM + HiPPO + Structured Matrices = S4

2 General Properties of SSMs

Continuous representation
- SSMs are parameterized signal models.
Recurrent representation
- The output depends on the entire input, but each step can be computed in constant time from the previous state.
Convolutional representation
- The output can be computed without computing the state.
- An EMA (exponential moving average) is a convolution with an infinitely long kernel.
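The recurrent and convolutional views above can be checked against each other numerically. The sketch below assumes the SSM has already been discretized into matrices I call `A_bar` and `B_bar` (my names, with random placeholder values), and it builds the kernel the naive way, one matrix-vector product per kernel entry.

```python
import numpy as np

def run_recurrent(A_bar, B_bar, C, u):
    """x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k: one fixed-cost update per step."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(A_bar, B_bar, C, length):
    """Naive kernel K_j = C A_bar^j B_bar, built with `length` matrix-vector products."""
    K = []
    v = B_bar.copy()
    for _ in range(length):
        K.append(C @ v)
        v = A_bar @ v
    return np.array(K)

def run_convolutional(A_bar, B_bar, C, u):
    """Causal convolution of u with the SSM kernel; the state is never materialized."""
    K = ssm_kernel(A_bar, B_bar, C, len(u))
    return np.convolve(u, K)[: len(u)]

# Placeholder parameters, for illustration only.
rng = np.random.default_rng(0)
N, L = 4, 64
A_bar = 0.9 * np.eye(N) + 0.02 * rng.standard_normal((N, N))
B_bar = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

y_rec = run_recurrent(A_bar, B_bar, C, u)
y_conv = run_convolutional(A_bar, B_bar, C, u)
print(np.allclose(y_rec, y_conv))  # checks that both views give the same output

# An EMA y_k = alpha * y_{k-1} + (1 - alpha) * u_k is a 1-state SSM,
# so its kernel is (1 - alpha) * alpha**j, which never reaches exactly zero.
alpha = 0.9
ema_kernel = ssm_kernel(np.array([[alpha]]), np.array([1 - alpha]), np.array([1.0]), 8)
print(ema_kernel)
```

Note that `ssm_kernel` here does as much work as running the recurrence itself, which is why the naive convolutional view is not automatically faster.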

3 S4 resolving limitations of SSM

The HiPPO state compresses the history of the input, so introducing it can boost the performance of a naive SSM.
SSM computation is slow: computing the kernel is as slow as computing the state. Using structured matrices makes the kernel fast to compute. Why this works is not explained here.
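For reference, here is a sketch of how a HiPPO matrix can be built. The notes above do not say which HiPPO variant the video uses, so I am assuming the HiPPO-LegS form as given in the HiPPO and S4 papers; the function name `hippo_legs` is my own.

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS matrices (assumed variant):
    A[n, k] = -sqrt(2n+1) * sqrt(2k+1)  if n > k
            = -(n + 1)                  if n == k
            = 0                         if n < k
    B[n]    = sqrt(2n+1)
    """
    n = np.arange(N)
    scale = np.sqrt(2 * n + 1)
    # Strictly lower triangle from the outer product, plus the diagonal term.
    A = -np.tril(np.outer(scale, scale), k=-1) - np.diag(n + 1)
    B = scale.copy()
    return A, B

A, B = hippo_legs(3)
print(A)
print(B)
```

Plugging this A and B into the SSM equations, instead of a random initialization, is the "SSM + HiPPO" part of the S4 recipe. The structured-matrix trick that makes the kernel fast is not shown here.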

4 Addressing signal data with S4

For multi-dimensional input, apply S4 independently per channel (like a depthwise-separable CNN).
Traditional DL/ML models do not work well on biological data and break when the sampling frequency changes.
But S4 can be trained and run at different resolutions with the same model thanks to its continuous representation.
It even works on NLP tasks, where it gets really close to Transformers while being much faster.
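The resolution claim above can be illustrated with a toy example: keep the continuous parameters fixed and only change the step size used to discretize them. This sketch assumes a scalar SSM and the bilinear discretization (the one I believe the S4 paper uses; the notes do not say). The system, the sine input, and the step sizes are all made up for illustration.

```python
import numpy as np

def discretize_bilinear(a, b, step):
    """Bilinear discretization of the scalar SSM x' = a x + b u with step size `step`."""
    denom = 1.0 - step / 2.0 * a
    a_bar = (1.0 + step / 2.0 * a) / denom
    b_bar = step * b / denom
    return a_bar, b_bar

def run(a_bar, b_bar, c, u):
    """Scalar recurrence x_k = a_bar x_{k-1} + b_bar u_k, y_k = c x_k."""
    x, ys = 0.0, []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return np.array(ys)

def sample(step, t_end=10.0):
    """Sample the hypothetical input signal sin(t) every `step` seconds."""
    n = int(round(t_end / step))
    return np.sin(step * np.arange(n))

a, b, c = -1.0, 1.0, 1.0             # one set of continuous parameters
coarse_step, fine_step = 0.02, 0.01  # two sampling resolutions
u_coarse, u_fine = sample(coarse_step), sample(fine_step)

# Same continuous model, re-discretized for each resolution.
y_coarse = run(*discretize_bilinear(a, b, coarse_step), c, u_coarse)
y_fine = run(*discretize_bilinear(a, b, fine_step), c, u_fine)
same_model_gap = np.max(np.abs(y_coarse - y_fine[::2]))

# Discrete parameters frozen at the coarse rate, then fed the fine-rate signal.
y_stale = run(*discretize_bilinear(a, b, coarse_step), c, u_fine)
stale_gap = np.max(np.abs(y_coarse - y_stale[::2]))

print(same_model_gap, stale_gap)
```

The first gap should be small, because both runs approximate the same continuous system, while the second should be much larger, because the frozen discrete parameters silently assume the old sampling rate. That is the sense in which the continuous representation lets one model serve several resolutions.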