S4 Structured State Spaces for Sequence Modeling
Part 2 of study notes from the video presented by Albert Gu
1 Motivation
HiPPO maps an input to a state, and now we need to project the state to an output. This gives the full equations for S4, which have the same form as the state space model (SSM) proposed by R.E. Kalman in 1960, where u is the input, x is the state, and y is the output: \(x'=Ax+Bu \\ y=Cx+Du\)
SSM + HiPPO + Structured Matrices = S4
2 General Properties of SSMs
- Continuous representation
  - SSMs are parameterized signal models
- Recurrent representation
  - Output depends on the entire input but can be computed in constant time per step.
- Convolutional representation
  - Output can be computed without computing the state
  - An exponential moving average (EMA) is a convolution with an infinitely long kernel
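The recurrent and convolutional views above can be checked against each other numerically. The following is a minimal sketch, assuming an already-discretized single-input, single-output SSM with the D term omitted; the names `A_bar`, `B_bar`, `ssm_recurrent`, `ssm_kernel`, and `ssm_convolutional` are illustrative and do not come from the talk. The kernel here is built naively, one matrix-vector product per step.

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, u):
    """Recurrent view: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k  # constant work per step
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(A_bar, B_bar, C, length):
    """Naive kernel K[j] = C A_bar^j B_bar for j = 0..length-1."""
    K = []
    v = B_bar.copy()
    for _ in range(length):
        K.append(C @ v)
        v = A_bar @ v
    return np.array(K)

def ssm_convolutional(A_bar, B_bar, C, u):
    """Convolutional view: y = causal convolution of u with K (no state kept)."""
    K = ssm_kernel(A_bar, B_bar, C, len(u))
    return np.convolve(u, K)[: len(u)]

# Random toy system (assumed values, for illustration only).
rng = np.random.default_rng(0)
N, L = 4, 32
A_bar = 0.9 * np.eye(N) + 0.05 * rng.standard_normal((N, N))
B_bar = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

y_rec = ssm_recurrent(A_bar, B_bar, C, u)
y_conv = ssm_convolutional(A_bar, B_bar, C, u)

# EMA as a one-dimensional SSM: y_k = (1 - alpha) y_{k-1} + alpha u_k.
alpha = 0.3
ema_kernel = ssm_kernel(np.array([[1 - alpha]]), np.array([alpha]), np.array([1.0]), 5)
```

Under these assumptions `y_rec` and `y_conv` agree up to floating-point error, and `ema_kernel` equals `alpha * (1 - alpha) ** j`, the first few taps of a kernel that never reaches exactly zero.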
3 How S4 resolves the limitations of SSMs
- The HiPPO state compresses the history of the input, so introducing it can boost the performance of a naive SSM.
- SSM computation is slow, and computing the kernel is as slow as computing the state. But we can use a structured kernel to make it fast.
Why the structured kernel works is not explained here.
4 Addressing signal data with S4
For multi-dimensional input, apply S4 independently to each channel (like a depthwise-separable CNN). Traditional DL/ML models do not work well on biological signal data and break when the sampling frequency changes.
But S4 can be trained at different resolutions with the same model, thanks to its continuous representation.
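The two ideas in this section, one SSM per channel and one set of continuous parameters reused across resolutions, can be sketched together. This is an illustration under stated assumptions, not the method from the talk: it assumes a bilinear discretization with step size `dt` (one common choice), omits the D term, and uses small made-up `(A, B, C)` triples and a toy two-channel signal. All function names are hypothetical.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear discretization of x' = Ax + Bu with sampling step dt."""
    I = np.eye(A.shape[0])
    left = I - (dt / 2.0) * A
    A_bar = np.linalg.solve(left, I + (dt / 2.0) * A)
    B_bar = np.linalg.solve(left, dt * B)
    return A_bar, B_bar

def run_ssm(A, B, C, u, dt):
    """Run one single-channel SSM over samples u spaced dt apart."""
    A_bar, B_bar = discretize_bilinear(A, B, dt)
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A_bar @ x + B_bar * u_k
        y[k] = C @ x
    return y

def run_per_channel(params, U, dt):
    """Apply an independent SSM to each channel; U has shape (channels, length)."""
    return np.stack([run_ssm(A, B, C, u, dt) for (A, B, C), u in zip(params, U)])

def sample_signal(dt, n):
    """Sample a toy two-channel signal at n points spaced dt apart."""
    t = dt * np.arange(n)
    return np.stack([np.sin(t), np.sin(2.0 * t)])

# Hypothetical continuous-time parameters, one (A, B, C) triple per channel.
params = [
    (np.array([[-1.0, 1.0], [0.0, -2.0]]), np.array([1.0, 1.0]), np.array([1.0, 0.5])),
    (np.array([[-3.0, 0.0], [1.0, -1.0]]), np.array([0.5, 1.0]), np.array([1.0, 1.0])),
]

# Same signal and same parameters at two sampling rates; only dt changes.
Y_coarse = run_per_channel(params, sample_signal(0.02, 250), dt=0.02)
Y_fine = run_per_channel(params, sample_signal(0.01, 500), dt=0.01)

# Every second fine sample falls on a coarse sample time.
max_gap = np.max(np.abs(Y_fine[:, ::2] - Y_coarse))
```

In this sketch `max_gap` stays small because the parameters describe a continuous-time system and the sampling rate enters only through `dt`. A model whose weights are tied to a fixed sample spacing has no equivalent knob to adjust.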
It even works on NLP tasks, where it can get really close to Transformers while being much faster.