DeepSeek Conditional Memory: Engram


N-grams are coming back to life with this paper. A detailed code walkthrough of Engram already exists, but I will defer reading it until later. Here is a high-level overview.

0 N-gram

N-gram models are good at capturing local, fixed patterns in text.
But they are bad at modeling long-distance dependencies, and they suffer from data sparsity.
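To make this trade-off concrete, here is a minimal bigram (n = 2) sketch using maximum-likelihood counts on a toy corpus I made up for illustration. The function names `ngrams` and `bigram_prob` are hypothetical and have nothing to do with Engram's code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_prob(corpus, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts."""
    pair_counts = Counter(ngrams(corpus, 2))
    # Every token except the last one starts exactly one bigram.
    context_counts = Counter(corpus[:-1])
    if context_counts[prev] == 0:
        return 0.0  # context never seen at all
    return pair_counts[(prev, word)] / context_counts[prev]


corpus = "new york is big . new york is busy . the city is big .".split()

print(bigram_prob(corpus, "new", "york"))   # fixed local pattern: 1.0
print(bigram_prob(corpus, "is", "big"))     # 2 of the 3 "is" contexts: 2/3
print(bigram_prob(corpus, "city", "busy"))  # plausible but unseen: 0.0
```

The first line shows the strength: a fixed local pattern like "new york" is captured perfectly. The last line shows data sparsity: a perfectly reasonable pair gets zero probability only because it never occurred. And since a bigram conditions on just one previous token, it cannot relate words that are further apart, which is the long-distance limitation.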

1 Over Tokenized Transformer

This work comes from ByteDance's Seed team. It decouples the input and output vocabularies to improve language modeling performance. In one line: Over Encoding + Over Decoding (MTP, multi-token prediction) = Over Tokenized Transformer.
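Over Encoding, as I understand it, enriches each input token's embedding with embeddings of the n-grams that end at that token. Below is a minimal sketch under my own assumptions, not the paper's implementation: n-gram IDs are built by treating token IDs as base-`vocab_size` digits and folded into a fixed-size table by modulo, token ID 0 is reserved for left padding, and the unigram and n-gram embeddings are simply summed. All names (`make_table`, `ngram_id`, `over_encode`, `table_size`) are hypothetical. Over Decoding (MTP) on the output side is not covered.

```python
import random


def make_table(rows, dim, seed):
    """Hypothetical embedding table: rows x dim, randomly initialised."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def ngram_id(tokens, i, n, vocab_size, table_size):
    """Row index for the n-gram ending at position i.

    Token ids are treated as digits in base vocab_size (most recent token is the
    lowest digit), then folded into table_size rows by modulo. Positions before
    the start of the sequence are padded with id 0, assumed reserved for padding.
    """
    value = 0
    for k in range(n):
        j = i - k
        tok = tokens[j] if j >= 0 else 0
        value += tok * vocab_size ** k
    return value % table_size


def over_encode(tokens, vocab_size, table_size, dim, max_n=3):
    """Sum the unigram embedding with hashed 2..max_n-gram embeddings per position."""
    unigram = make_table(vocab_size, dim, seed=0)
    ngram_tables = {n: make_table(table_size, dim, seed=n) for n in range(2, max_n + 1)}
    outputs = []
    for i, tok in enumerate(tokens):
        vec = list(unigram[tok])
        for n, table in ngram_tables.items():
            row = table[ngram_id(tokens, i, n, vocab_size, table_size)]
            vec = [a + b for a, b in zip(vec, row)]
        outputs.append(vec)
    return outputs


tokens = [5, 17, 42, 17]
embeddings = over_encode(tokens, vocab_size=50, table_size=97, dim=8)
print(len(embeddings), len(embeddings[0]))  # 4 8
print(ngram_id(tokens, 1, 2, vocab_size=50, table_size=97))  # (17 + 5*50) % 97 = 73
```

The output still has one vector per input token, so in this sketch it can stand in for a plain embedding lookup. The n-gram table size stays fixed no matter how many distinct n-grams occur; the price is hash collisions between n-grams that land on the same row.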
