BERT
I think I need to review BERT more closely to understand the encoder structure for dLLM, so I checked out this video.
0 Encoder
It’s called an encoder structure because it outputs an embedding code (CAE).
It helps cluster not just words and sentences but also documents.
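A minimal sketch of that clustering idea (my own example, not from the video; it assumes the Hugging Face transformers library, scikit-learn, and bert-base-uncased): take the [CLS] vector from the encoder output as a sentence/document embedding and cluster those vectors.

```python
# Sketch: use BERT's encoder output as sentence embeddings, then cluster them.
# Assumes the `transformers` and `scikit-learn` packages are installed.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "The cat sat on the mat.",
    "A kitten was sleeping on the rug.",
    "The stock market fell sharply today.",
    "Shares dropped after the earnings report.",
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    out = model(**batch)
    # [CLS] token embedding as a crude sentence-level code
    embeddings = out.last_hidden_state[:, 0, :]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings.numpy())
print(labels)  # e.g. [0 0 1 1] if the two topics separate
```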

Now let’s take a closer look at the Transformer encoder architecture used in BERT (video)
1 BERT Pre-Training
The pre-training phase has two tasks:
- Masked LM: fill in the blanks
- Next Sentence Prediction: predict whether two sentences are related (follow each other)
The input is the sum of 3 embeddings:
- Token embedding: from the WordPiece vocabulary (~30k tokens)
- Segment embedding: sentence A or B
- Position embedding
The output at each (masked) position is mapped to 30k neurons (the token vocab size) and compared with the one-hot encoding of the true token for the loss calculation.
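A rough sketch of those two pieces in PyTorch (the layer names and sizes are my assumptions, roughly bert-base-like with a 768-dim hidden size and a 30k vocab): the encoder input is the sum of token, segment, and position embeddings, and the masked-LM head projects each hidden state back to vocab-size logits scored against the true token.

```python
# Sketch of BERT's input embedding sum and masked-LM loss (PyTorch).
# Sizes are bert-base-like assumptions: 30k WordPiece vocab, 768 hidden dims.
import torch
import torch.nn as nn

vocab_size, hidden, max_len = 30000, 768, 512

tok_emb = nn.Embedding(vocab_size, hidden)   # WordPiece token embedding
seg_emb = nn.Embedding(2, hidden)            # segment A (0) or B (1)
pos_emb = nn.Embedding(max_len, hidden)      # learned position embedding
mlm_head = nn.Linear(hidden, vocab_size)     # maps each position to 30k logits

token_ids   = torch.randint(0, vocab_size, (1, 16))   # fake batch of 16 tokens
segment_ids = torch.zeros(1, 16, dtype=torch.long)    # all sentence A
positions   = torch.arange(16).unsqueeze(0)

# Input to the encoder: sum of the three embeddings
x = tok_emb(token_ids) + seg_emb(segment_ids) + pos_emb(positions)

# (The Transformer encoder layers would go here; skipped in this sketch.)
hidden_states = x

# Masked-LM loss: logits over the 30k vocab vs. the true token ids.
# Cross-entropy against the true id is the "compare with one-hot" step.
logits = mlm_head(hidden_states)                 # shape (1, 16, 30000)
labels = token_ids.clone()                       # pretend every position is masked
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
print(loss.item())
```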

3 BERT Fine Tuning
- The fine-tuning shown is for a Q/A task
- Only the output layer is fine-tuned, so the process is fast

- The input is a question plus a passage that contains the answer
- The output is the start/end words (positions) of the answer in the passage
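A minimal sketch of that span-prediction head (PyTorch; my assumption is that a small linear layer on top of the encoder output gives two scores per token, one for "start" and one for "end", with the question and passage packed as segments A and B).

```python
# Sketch of the QA fine-tuning head: predict answer start/end token positions.
# Assumes encoder hidden states of shape (batch, seq_len, 768); only this
# 2-output linear head is new for the fine-tuning task.
import torch
import torch.nn as nn

hidden, seq_len = 768, 128
qa_head = nn.Linear(hidden, 2)   # per-token scores: one for "start", one for "end"

encoder_out = torch.randn(1, seq_len, hidden)      # stand-in for BERT's output
start_logits, end_logits = qa_head(encoder_out).split(1, dim=-1)
start_logits = start_logits.squeeze(-1)            # (1, seq_len)
end_logits   = end_logits.squeeze(-1)

# Training: cross-entropy against the true start/end token indices in the passage
true_start, true_end = torch.tensor([42]), torch.tensor([45])
loss = (nn.functional.cross_entropy(start_logits, true_start)
        + nn.functional.cross_entropy(end_logits, true_end)) / 2

# Inference: the predicted answer span runs from argmax(start) to argmax(end)
pred_start = start_logits.argmax(dim=-1)
pred_end   = end_logits.argmax(dim=-1)
print(loss.item(), pred_start.item(), pred_end.item())
```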
