Posts by Category

Study

TensorRT-LLM

October 30 2024

To understand NIM, you cannot avoid a deep understanding of TRT-LLM, Triton, and even vLLM. Those will be my focus for the near future.

Lookahead

October 24 2024

The last blog about Medusa ran longer than I expected, so I will write separate blogs about Lookahead and EAGLE1/2.

Boltzmann Machine

October 20 2024

I started learning ML with Andrew Ng’s course, and at the same time, I also took Hinton’s Neural Networks course. The second one is actually very hard for me and ...

Avoid secret leaks in Git

October 16 2024

Frankly speaking, security is the area I care about the least. I am not interested in any security-related topics except for RSA, which is only because the algorithm beh...

Hopfield Network

October 11 2024

Hopfield and Hinton won the Nobel Prize in Physics this year. Big surprise! I found this video that explains Hopfield’s work. It got me thinking about how NN is in...

Medusa and EAGLE

October 08 2024

Both are speculative decoding techniques used to accelerate LLM decoding. There are Lookahead and ReDrafter as well.

Helm and Operators

October 04 2024

Helm is to K8s what apt is to Ubuntu: a package management system. It defines pod YAML, deployment YAML, ServiceAccounts, Secrets, etc.

Knowledge Distillation

September 15 2024

Distillation was introduced by Hinton and Dean in 2015, another masterpiece from Google.

Git Merges

September 13 2024

Merge vs. Rebase: great explanations from this tutorial and this video.

Git Undo

September 10 2024

I will start with a simple cheat sheet before diving into reset/revert/checkout.

LLM Router

August 30 2024

LLM Router was introduced by LMSYS and Anyscale. It is the first open-source LLM routing framework and introduces 4 different routing policies.

Kubernetes 101

August 19 2024

I finally started learning K8s, and followed this official doc to get my pods listed, a deployment done, and a service created.

Proxy and Reverse Proxy

August 10 2024

Every time I see the word “Proxy” I feel somewhat uneasy, not to mention how I feel when I see “Reverse Proxy”. Now I have looked it up in this intro.

Bradley Terry and Elo Score

July 31 2024

Worked on the SW (Similarity Weighted) routing policy in LLM Router, and learned about Bradley-Terry and Elo scores. Very interesting topics.
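
As a rough illustration of how the two models relate, here is a minimal sketch of the standard Elo update and the Bradley-Terry win probability. The function names, the K-factor of 32, and the starting ratings are assumptions chosen for the example, not values from the post or from LLM Router.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo's predicted probability that A beats B (base-10 logistic, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k is the assumed step size (K-factor).
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def bradley_terry_prob(strength_a: float, strength_b: float) -> float:
    """Bradley-Terry probability that A beats B, given positive strengths."""
    return strength_a / (strength_a + strength_b)


if __name__ == "__main__":
    # Two equally rated players; A wins, so A gains what B loses.
    print(elo_update(1500.0, 1500.0, 1.0))
    # Elo's expected score is Bradley-Terry with strength = 10 ** (rating / 400).
    print(expected_score(1600, 1500), bradley_terry_prob(10 ** 4.0, 10 ** 3.75))
```

The main difference in practice is that Elo updates ratings online, one game at a time, while Bradley-Terry strengths are usually fitted to the whole set of pairwise results at once.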

Elo ranking and Bradley-Terry

July 29 2024

1. try/catch/else/finally Encountered an interesting piece of code for try/catch ```python def no_env_var(var: str): try: # If you have this var, remove it i...
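
The clause order can be sketched as below. Python spells the handler clause `except` rather than `catch`. The `run` helper and its `fail` flag are hypothetical names for illustration, not code from the post.

```python
def run(fail: bool) -> list:
    """Record which clauses execute, in order."""
    steps = []
    try:
        steps.append("try")
        if fail:
            raise ValueError("boom")
    except ValueError:
        # Runs only when the try body raised a matching exception.
        steps.append("except")
    else:
        # Runs only when the try body finished without raising.
        steps.append("else")
    finally:
        # Always runs, whether or not an exception occurred.
        steps.append("finally")
    return steps


if __name__ == "__main__":
    print(run(fail=False))  # ['try', 'else', 'finally']
    print(run(fail=True))   # ['try', 'except', 'finally']
```

Keeping follow-up work in `else` rather than at the end of `try` means exceptions raised by that follow-up work are not caught by the `except` clause by accident.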

SAM and BLIP

June 16 2024

Segment Anything Model and Bootstrapping Language-Image Pre-training

CLIP

June 12 2024

Dr Vlog gave a talk on CLIP to math PhDs and summarized it in a 50-minute video.

YOLO v4-v9

June 08 2024

Continuing to finish YOLO v4-v9 with this video. Just curious what they did to ship so many new versions of YOLO.

Something about Docker

June 03 2024

Almost every 6 months, I look up the differences between Docker ENTRYPOINT and CMD. Now it goes into my blog.

SSD and YOLO

June 01 2024

There is really no need to know the details of implementation from YOLO V1 to V9. But considering it’s the one model that helped me quite a lot on AWS projects, I...

Image Text Fusion

May 31 2024

Jumping into multimodal models before June. This video talks about 6 different ways to fuse text and images together.

Diffusion Models

May 22 2024

The original motivation of this tech blog was to understand diffusion models. It’s such a beautiful algorithm that I spent lots of time reading from Lilian...

GenAI by Hung-yi Lee 2024-03

May 21 2024

It’s been a while since I followed Dr Lee’s 2024 lectures. But I watched his video about GPT-4o yesterday and would like to continue this series. 0 RLHF First let m...

Andrej Karpathy-Tokenizer

May 18 2024

Tokenizer sounds trivial but plays such an important role in LLMs. It actually explains quite simply why LLMs are not good at arithmetic.

Spline 2

May 15 2024

The second part of this video talks about splines again, and I think I found some clues to the 1/3 of vel in the previous spline.

Bezier Curves

May 14 2024

This is one of the most amazing videos for introducing a math concept, together with a great primer.

Mamba

May 12 2024

I finally reached the last part of this Mamba intro after going through HiPPO and this video from Umar Jamil.

Llama2 tricks (2)

May 09 2024

This is a summary of the explanation of KV cache and RoPE in this video. I really like how Bai explained RoPE. 1 KV Cache First review the meaning of Q/K/V

Speculative Decoding

May 08 2024

A discussion around tokenizers in Slack led to the following comment: Would be the natural progression after you arrive at the fact taking three steps to get ...

Spline

May 06 2024

I reviewed some basics about Spline in the previous blog about KAN, and happened to find this tutorial. The most amazing part of this video is to show you wh...

Kolmogorov-Arnold Network

May 05 2024

It’s all over the internet about how KAN will revolutionize ML by replacing MLP. Here is my first read about KAN.

GPU and related techs 101

May 02 2024

Last time I touched CUDA and GPU was in 2018, when I was preparing for job hunting at CGG. It’s time to review some basics about GPU now

Andrej Karpathy-WaveNet

April 25 2024

WaveNet was published by Google in 2016: a waveform-generating DNN with dilated causal convolutions.

Andrej Karpathy-MLP

April 18 2024

It’s based on Bengio’s paper on MLP, A Neural Probabilistic Language Model in 2003.

Andrej Karpathy-GPT from scratch

April 06 2024

If there is one video you should watch about GPT, this is it. Karpathy’s deep dive explains GPT at the code level; it’s a blessing to all GenAI engineers.

Object Detection Summary

April 03 2024

I feel I was reading a lot of LLM related topics recently but getting far away from CV. I happened to read this post from v_JULY_v and it’s a good review for...

State Space Machine

March 30 2024

Structured State Space for Sequence Modeling (S4), paper by Albert Gu, 2021

RL 2024-3

March 29 2024

Focus on PPO in this post

RL 2024-2

March 28 2024

Focus on Policy Gradient in this post from Cameron Wolfe, with the implementation from Spinning Up. It’s essentially a feedforward network (MLP with 3 layer...

Knowledge Distillation

March 26 2024

Distillation was introduced by Hinton and Dean in 2015, another masterpiece from Google.

GenAI by Hung-yi Lee 2024-02

March 24 2024

Continue with Part 1, the 5th way of prompt engineering. 5 Model Cooperation Model cooperation could be due to cost. This is similar to MoE but no LLM arch...

LLM Pre-Training and Inference

March 20 2024

This is from Cameron Wolfe’s website and discusses LLM pretraining and inference in detail with code. Very educational and I will write multiple study notes...

Theorema Egregium

March 16 2024

Gauss’s Theorema Egregium, which is Latin for “Remarkable Theorem”, is a major result of differential geometry. I found a good introduction here and video by...

Kaiming’s ML overview at MIT

March 13 2024

Kaiming He joined MIT as an associate professor in Feb 2024 and delivered “Deep Learning Bootcamp” as his first public talk as an MIT professor. It’s very pleasant...

Entropy and Perplexity

March 12 2024

Understanding Shannon’s entropy is crucial to understanding concepts like cross entropy and KL divergence. But perplexity is a concept that comes with NLP. Here i...
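
To make the relationship between these quantities concrete, here is a minimal sketch using only the standard library. The function names and the toy distributions are assumptions for illustration. Perplexity is computed as the exponential of the average negative log-likelihood of the observed tokens.

```python
import math


def entropy(p, base=2.0):
    """Shannon entropy H(p) = -sum p_i * log(p_i); zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


def cross_entropy(p, q, base=2.0):
    """Cross entropy H(p, q) = -sum p_i * log(q_i), assuming q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q, base=2.0):
    """KL(p || q) = H(p, q) - H(p); non-negative, and zero when p equals q."""
    return cross_entropy(p, q, base) - entropy(p, base)


def perplexity(token_probs):
    """exp of the mean negative log-probability the model gave to each observed token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]
    biased_coin = [0.9, 0.1]
    print(entropy(fair_coin))                     # about 1 bit
    print(kl_divergence(fair_coin, biased_coin))  # positive: the distributions differ
    print(perplexity([0.25, 0.25, 0.25, 0.25]))   # about 4: like a uniform 4-way guess
```

A model that assigns probability 1/4 to every observed token has perplexity 4, which is why perplexity is often read as an effective branching factor.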

Prompts

March 07 2024

Prompt methods Notes for this prompt engineer website 1 Automatic Reasoning and Tool-use (ART) Key idea: Add code executing results in the chaining prompt ...

Nvidia GenAI Stack

February 25 2024

Nvidia NeMo GenAI framework on DGX Cloud/Kubernetes Clusters AutoConfigurator SFT and PEFT

Git

February 21 2024

Migrate my notes on Git from Google Keep to here

SQL 101

February 17 2024

The first new thing I learnt in 2024 is SQL. I gave it a try back during my 2022 job hunt, and was asked to write SQL queries in a Databricks SA interview. I never writ...

LLM overview in 2024

February 10 2024

Why SFT is before RLHF: RLHF needs basic capability from the model. 2 steps in RLHF: If it only learns from HF, then you know the answer to a...

Python tips

February 08 2024

Random Python tips I collected recently.

Distributed System comparison

February 05 2024

Again, I summarized the comparison between distributed systems here. A couple of interesting points about Dask

PyTorch - 3 Distributed Computing

January 30 2024

PyTorch has its own distributed implementation, either through MPI backend, point to point, inspiration for torch.distributed Meta’s own [GLOO](https://gi...

PyTorch - 2 Model Parallel

January 29 2024

Continue PyTorch notes with Single-Machine Model Parallel. Model Parallel When the model is too large to fit into a single GPU, model parallel is necessary. ...

PyTorch - 1 Data Parallel

January 28 2024

My ML journey started with building NN layer by layer with Tensorflow in 2016. Keras was invented but I still wanted to know more details of Tensorflow but s...

Llama2 tricks

January 26 2024

A good YouTube video explains several tricks applied in Llama2. Here are the study notes. 1. Layer normalization Batch norm: normalized by columns (same feat...

AlphaGeometry

January 19 2024

I have been sick since the trip to China and haven’t fully recovered even till today. I totally underestimated the damage of 雾霾(smog, a word combining smoke ...

Nvidia GPUs

December 10 2023

Nvidia’s server GPUs have been evolving. I was quite familiar with GPU types back at CGG, but after 4 or 5 years, I am totally out of sync.

Tokenizers in LLM

November 25 2023

Tokenizer is a basic concept in NLP, and basically it generates tokens from a sentence. A token is a bit “less” than a word, so the common ratio between toke...

Poisson and Exponential Distribution

November 02 2023

马同学 (Student Horse) is a great source of math concept clarifications, both in linear algebra and statistics. I came across this explanation for both Poisson a...

Continuous Batch

November 01 2023

Read the blog about continuous batching, from Cade and Shen.

Flash Attention

October 30 2023

It’s time to dig into some LLM optimization algorithms. My first googled question was “Flash Attention vs Paged Attention”, which are two popular optimization...

RAG Fusion

October 25 2023

This is a typical example of how we can enrich RAG with more advanced methods, and it does NOT require more complicated algorithms or theories. RAG Fusion is ver...

FireAct and LLM Datasets

October 15 2023

There are multiple LLM-related datasets that I didn’t really pay attention to until I started working on the FireAct demo.

LLM functions

September 29 2023

Open Interpreter was one of the fastest-growing repos on GitHub, and it got 26k stars in a month. I played with it last weekend and had quite some fun with O...

From LLM to Agents

September 19 2023

Yao Shunyu, the original author of the ReAct paper, talks about LLMs and agents.

ML101 -3

August 12 2023

Start with Regression vs Classification, and introduce softmax. It seems there are long stories behind softmax, rather than normalization (Answer: Use Sigmoi...

ML101 -2

August 02 2023

This blog is mainly about optimizers. It’s good to review them all. Overall problem to be solved, different parameters need different learning rate 1 AdaGra...

ML101 -1

July 28 2023

I was recommending some online materials for ML101 to a friend, and videos from Hung-yi Lee are always my first choices. I selected ML courses from his 2021 ...

Reinforcement Learning - Q Learning

June 12 2023

Normally we learn RL starting from Q learning, which seems the easiest to understand. But this lecture goes PD and PPO first, then Q learning. Interesting, reminds me of line...

Reinforcement Learning - PPO

June 11 2023

PPO, Proximal Policy Optimization. One of the most powerful RL algorithms, and the default RL training algorithm at OpenAI.

Reinforcement Learning - Policy Gradient

June 09 2023

Learning RL was actually my very first ML project after I joined AWS. DeepRacer was released at re:Invent 2018, and our prototype team got hands on it in early 201...


Application

Andrej Karpathy-MicroGrad

April 15 2024

A 2.5 hr video on micrograd. I wish I could’ve watched this video 5 yrs earlier! It clears up so many questions about loss.backward()!

Workshops and Buildouts

January 24 2024

One interesting part of my job is to create workshops and buildouts (one type of workshop focusing on AWS building). Here are some screenshots of my previous...

Pinecone Canopy, tokenizer, poetry…

November 20 2023

Pinecone released Canopy, which is a framework for RAG. It originally has OpenAI as the LLM and embedding model provider and wants to cooperate with Anyscale for o...

Poe and Modal

November 15 2023

Poe, a chatbot hosting service backed by Quora, is getting popular. I tried to add AE as one of the chatbots, with the Zephyr 7B model. (It seems that first model...

OpenAI API v1 change

November 10 2023

OpenAI’s Dev Day was quite exciting, right? But do you know they quietly released API v1 with breaking changes in it? This will keep me busy for the next coup...

OpenAI Dev Day

November 07 2023

Assistant is the killer API, for quite some startups, and even for big and popular projects like Pinecone and Open Interpreter.

Anyscale Endpoint Integration - LangChain

October 09 2023

First things first, it took me a couple of commits to pass the formatting check in this PR with ruff and black, so I’d better write them down first. Ruff ...

CORS setup on AWS APIGateway

September 15 2023

It has also been two months since my last update. There is no excuse, but I do want to claim that I was away from home from Aug 1st to 22nd, and got totall...

Create Workshop by Hugo Part 2

July 09 2023

This part will focus on how to host the static webpage created by Hugo online, mainly leveraging cloud services like AWS S3.

Create Workshop by Hugo Part 1

July 03 2023

Hugo is a great tool for workshop instructions: easy to create, with good-looking templates that look professional. I so regret that my early workshops in AWS w...

Building Slack Bot for LLM on AWS - Part 1

May 28 2023

Kind of a hackathon/weekend project. Built a Slack bot with its backend running on Anyscale. Actually, my very first engineering project at Amazon was using slack bot...


Code

Concurrency Execution

October 21 2024

It’s all over the internet about how KAN will revolutionize ML by replacing MLP. Here is my first read about KAN.

Torchtune

August 22 2024

A customer request: show an OSS solution for LoRA finetuning. Open-sourced NeMo is the backup plan and PyTorch PEFT is preferred.

Httpx

July 18 2024

HTTPX is another HTTP client similar to Requests. It’s used in the OpenAI client constructor via the http_client option.

Pydantic Validators

June 28 2024

Read a good introduction to validators in Pydantic here. Even though Pydantic is going to deprecate the validator and root_validator decorators in v3, it’s stil...

Tetris

December 22 2023

I was creating a Tetris game for the first time. Even though it was a popular programming task during college, I never actually tried to implement i...

Ray Data

December 15 2023

This is not quite an ML topic, but more about using the Ray Data library to run batch processing. Yes, the core function of Ray Data is batch processing, and her...

JSON Schema

December 08 2023

I was playing with function calling with AE; yes, AE is now the first host to enable function calling/JSON mode on open source models, Mistral and Mixtral seri...

Pytest

November 28 2023

Unit testing is something I ignored for a long time. I know of its existence but barely initiate one. If it’s already in the system, I don’t mind adding one, like for ...

argparse and Namespace

May 23 2023

Learned something really trivial today, but kind of interesting and useful, which is related to the argparse library. We all use argparse for argument parsing. The s...
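
A minimal sketch of how argparse and Namespace relate, using only the standard library. The `--lr` and `--epochs` flags are hypothetical and not taken from the post. `parse_args` returns an `argparse.Namespace`, which is a plain attribute container that can also be built by hand or converted to a dict with `vars()`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags, chosen only for illustration.
    parser = argparse.ArgumentParser(description="Namespace demo")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    return parser


# Passing an explicit list avoids reading sys.argv, which is handy in tests.
args = build_parser().parse_args(["--lr", "0.1"])

# A Namespace built by hand compares equal to the parsed one.
manual = argparse.Namespace(lr=0.1, epochs=3)

if __name__ == "__main__":
    print(args.lr, args.epochs)  # attribute access
    print(vars(args))            # dict view of the same values
    print(args == manual)
```

Building a Namespace directly is a convenient way to call a function that expects parsed arguments without going through the command line.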

Create python packages Part 1

May 15 2023

This blog records my tests on how to create a Python package. 1. Simplest demo of creating a Python package Creating a Python package is not hard, but there ...
