S4 Structured State Spaces for Sequence Modeling

Part 2 of the study notes on the video presented by Albert Gu

1 Motivation

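HiPPO maps an input to a state; now we need to project that state to an output. Adding the output projection gives the full equations for S4, which have the same form as the state space model (SSM) proposed by R. E. Kalman in 1960, where \(u\) is the input, \(x\) the state, and \(y\) the output: \(x'=Ax+Bu \\ y=Cx+Du\)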

SSM + HiPPO + Structured Matrices = S4
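
To make the two equations concrete, here is a minimal Python sketch that integrates them with forward-Euler steps. The function name `simulate_ssm`, the one-state toy system, and the step size are illustrative assumptions, not something taken from the talk; NumPy is assumed to be available.

```python
import numpy as np

def simulate_ssm(A, B, C, D, u, dt):
    """Simulate x' = Ax + Bu, y = Cx + Du with forward-Euler steps of size dt."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        ys.append(C @ x + D * u_k)       # read out y from the current state
        x = x + dt * (A @ x + B * u_k)   # Euler step: x(t + dt) ~ x(t) + dt * x'(t)
    return np.array(ys)

# Hypothetical 1-state system x' = -x + u, y = x, driven by a unit step input.
A = np.array([[-1.0]])
B = np.array([1.0])
C = np.array([1.0])
D = 0.0
dt = 1e-3
t = dt * np.arange(5000)
y = simulate_ssm(A, B, C, D, np.ones_like(t), dt)

# For this particular system the exact solution is y(t) = 1 - exp(-t).
max_err = np.max(np.abs(y - (1.0 - np.exp(-t))))
print(f"max error vs. exact solution: {max_err:.2e}")
```

Forward Euler is only the simplest way to step the equations; it is used here to show what A, B, C and D do, not as the discretization S4 relies on.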

2 General Properties of SSMs

  1. Continuous representation
    • SSMs are parameterized signal models.
  2. Recurrent representation
    • The output depends on the entire input, but each step can be computed in constant time.
  3. Convolutional representation
    • The output can be computed without computing the state.
    • An exponential moving average (EMA) is a convolution with an infinitely long kernel.
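
The recurrent and convolutional views can be checked against each other numerically. The sketch below assumes a bilinear discretization with step size `step` (one common choice; these notes do not specify one), sets D = 0, and uses random toy matrices. All function names are hypothetical.

```python
import numpy as np

def discretize(A, B, step):
    """Bilinear discretization of x' = Ax + Bu (an assumed choice of method)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - step / 2.0 * A)
    return inv @ (I + step / 2.0 * A), inv @ B * step

def run_recurrent(Ab, Bb, C, u):
    """x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k: one fixed-cost update per step."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(Ab, Bb, C, L):
    """Convolution kernel K = (C Bb, C Ab Bb, ..., C Ab^(L-1) Bb)."""
    K, v = [], Bb
    for _ in range(L):
        K.append(C @ v)
        v = Ab @ v
    return np.array(K)

def run_convolutional(K, u):
    """Causal convolution y_k = sum_j K[j] * u[k-j]; the state is never formed."""
    return np.convolve(u, K)[: len(u)]

rng = np.random.default_rng(0)
N, L = 4, 64
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))  # arbitrary toy matrix
B = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

Ab, Bb = discretize(A, B, step=0.1)
y_rec = run_recurrent(Ab, Bb, C, u)
y_conv = run_convolutional(ssm_kernel(Ab, Bb, C, L), u)
print(np.allclose(y_rec, y_conv))

# EMA as a 1-state SSM: y_k = a*y_{k-1} + (1-a)*u_k has kernel K_j = (1-a) * a**j.
a = 0.9
K_ema = ssm_kernel(np.array([[a]]), np.array([1 - a]), np.array([1.0]), L)
print(np.allclose(K_ema, (1 - a) * a ** np.arange(L)))
```

Unrolling the recurrence from a zero state gives exactly the kernel K, which is why the two outputs agree. The EMA kernel never reaches zero, so it is only truncated to length L here, not finite.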

3 How S4 resolves the limitations of SSMs

  • The HiPPO state compresses the history of the input, so introducing it can boost the performance of a naive SSM.
  • SSM computation is slow, and computing the kernel is as slow as computing the state. But we can use a structured kernel to make it fast. Why the kernel works is not explained here.
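
As a rough illustration of why structure helps, the sketch below compares a naive kernel computation with a closed form that becomes available when the discrete state matrix is diagonal. This is an assumption made for illustration only: it is the simplest structured case, not the structured-matrix algorithm S4 itself uses, and the names `naive_kernel`, `diagonal_kernel` and `fft_conv` are hypothetical.

```python
import numpy as np

def naive_kernel(Ab, Bb, C, L):
    """K_j = C Ab^j Bb via L successive matrix-vector products: O(L * N^2)."""
    K, v = [], Bb
    for _ in range(L):
        K.append(C @ v)
        v = Ab @ v
    return np.array(K)

def diagonal_kernel(lam, Bb, C, L):
    """Same kernel when Ab = diag(lam): K_j = sum_n C_n * Bb_n * lam_n**j, O(L * N)."""
    powers = lam[None, :] ** np.arange(L)[:, None]   # (L, N) Vandermonde matrix
    return powers @ (C * Bb)

def fft_conv(K, u):
    """Causal convolution in O(L log L) using zero-padded FFTs."""
    n = 2 * len(u)
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[: len(u)]

rng = np.random.default_rng(0)
N, L = 8, 256
lam = rng.uniform(0.5, 0.99, N)   # hypothetical diagonal entries, |lam| < 1
Bb = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

K_naive = naive_kernel(np.diag(lam), Bb, C, L)
K_diag = diagonal_kernel(lam, Bb, C, L)
print(np.allclose(K_naive, K_diag))
print(np.allclose(fft_conv(K_diag, u), np.convolve(u, K_diag)[:L]))
```

Once the kernel is available, the convolution itself can be done with FFTs, so the cost is dominated by building the kernel. That is the step the structure is meant to speed up.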

4 Addressing signal data with S4

For multi-dimensional input, apply S4 per channel (like a depthwise-separable CNN). Traditional DL/ML methods do not work well on bio data and break when the sampling frequency changes.
But S4 can work at different resolutions with the same model, thanks to its continuous representation.
It even works on NLP tasks, where it gets really close to transformers while being much faster.
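
The resolution claim can be sketched with a toy example: keep the continuous parameters fixed and change only the discretization step when the sampling rate changes. The 2-state system, the 1 Hz test signal, and the bilinear discretization below are all assumptions chosen for illustration, not values from the talk.

```python
import numpy as np

def discretize(A, B, step):
    """Bilinear discretization of x' = Ax + Bu (an assumed choice of method)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - step / 2.0 * A)
    return inv @ (I + step / 2.0 * A), inv @ B * step

def run(Ab, Bb, C, u):
    """Recurrent evaluation: x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb * u_k
        ys.append(C @ x)
    return np.array(ys)

# Hypothetical continuous-time parameters, shared across sampling rates.
A = np.array([[-1.0, 1.0], [-1.0, -1.0]])
B = np.array([1.0, 0.0])
C = np.array([1.0, 1.0])

n, dt = 1000, 0.01
signal = lambda t: np.sin(2 * np.pi * t)      # 1 Hz test signal
u_coarse = signal(dt * np.arange(n))          # sampled at 100 Hz
u_fine = signal(dt / 2 * np.arange(2 * n))    # same signal sampled at 200 Hz

y_coarse = run(*discretize(A, B, dt), C, u_coarse)
y_fine = run(*discretize(A, B, dt / 2), C, u_fine)   # only the step size changes
y_stale = run(*discretize(A, B, dt), C, u_fine)      # step size left at the old rate

# Compare outputs at the time points the two sampling grids share.
err_rescaled = np.max(np.abs(y_fine[::2] - y_coarse))
err_stale = np.max(np.abs(y_stale[::2] - y_coarse))
print(f"rescaled step: {err_rescaled:.4f}, stale step: {err_stale:.4f}")
```

With the step size rescaled, the two sampling rates give nearly the same output at shared time points. Reusing the old discrete parameters on the faster-sampled signal gives a visibly different output, which is one way to picture a fixed-rate model breaking at a new frequency.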
