TensorRT-LLM
To understand NIM, you cannot avoid a deep understanding of TRT-LLM, Triton, and even vLLM. Those will be my focus for the near future.
My last blog about Medusa ran longer than I expected, so I will write separate blogs about lookahead and EAGLE1/2
I started learning ML with Andrew Ng’s course, and at the same time, I also took Neural Networks from Hinton. The second one was actually very hard for me and ...
Frankly speaking, security is the area I care about the least. I am not interested in any security-related topics except RSA, and that is only because the algorithm beh...
Hopfield and Hinton won the Nobel Prize in Physics this year. Big surprise! I found this video, which explains Hopfield’s work. It got me thinking about how NN is in...
Both are speculative decoding techniques used to accelerate decoding. There are lookahead and ReDrafter as well.
Helm is to K8s what apt is to Ubuntu: a package management system. It defines the pod YAML, deployment YAML, ServiceAccount, Secrets, etc.
Distillation was introduced by Hinton and Dean in 2015, another masterpiece from Google.
Merge vs Rebase: great explanation from this tutorial and this video
I will start with a simple cheat sheet before diving into reset/revert/checkout
LLM Router was introduced by LMSYS and Anyscale. It is the first open-source LLM routing project and introduced 4 different routing policies.
I finally started learning K8S, and followed this official doc to get my pods listed, deployment done, and service created.
Every time I see the word “Proxy” I feel somewhat uneasy, not to mention how I feel when I see “Reverse Proxy”. Now I have looked it up in this intro
Worked on the SW (Similarity Weighted) routing policy in LLM Router, and learned about Bradley-Terry and Elo scores. Very interesting topics
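To make the Bradley-Terry and Elo connection concrete, here is a minimal sketch in plain Python. The function names are made up for illustration, and the K-factor of 32 and the 400-point scale are the conventional chess defaults, assumed here rather than taken from LLM Router.

```python
def bradley_terry_prob(strength_a: float, strength_b: float) -> float:
    """Bradley-Terry: P(A beats B) = s_a / (s_a + s_b) for positive strengths."""
    return strength_a / (strength_a + strength_b)


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b); score_a is 1 (win), 0.5 (draw), 0 (loss)."""
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)  # positive when A does better than expected
    return rating_a + delta, rating_b - delta


# Elo is Bradley-Terry with strength s = 10 ** (rating / 400).
ra, rb = 1600.0, 1500.0
bt = bradley_terry_prob(10 ** (ra / 400.0), 10 ** (rb / 400.0))
print(abs(bt - elo_expected(ra, rb)) < 1e-12)
print(elo_update(1500.0, 1500.0, 1.0))
```

The sketch shows the two models agreeing on the win probability once ratings are mapped to strengths; Elo just adds an online update rule on top.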
1. try/catch/else/finally Encountered an interesting piece of code for try/catch ```python def no_env_var(var: str): try: # If you have this var, remove it i...
Summary from this post
Segment Anything Model (SAM) and Bootstrapping Language-Image Pre-training (BLIP)
Basic ideas in Contrastive Learning and Kaiming’s improvement in MoCo.
Dr Vlog gave a talk on CLIP to Math PhDs and summarized it in a 50-minute video.
Continuing on to finish YOLO v4-v9 in this video. Just curious what they did to ship this many new versions of YOLO
Almost every 6 months, I look up the differences between Docker ENTRYPOINT and CMD. Now it goes into my blog
There is really no need to know the implementation details from YOLO V1 to V9. But considering it’s the one model that helped me quite a lot on AWS projects, I...
Jumping into multimodal models before June. This video covers 6 different ways to fuse text and image together.
PhD Vlog talked about some OD (object detection) networks, and this is the development timeline of OD
The original motivation of this tech blog was to understand diffusion models. It’s such a beautiful algorithm that I spent lots of time reading from Lilian...
It’s been a while since I last followed Dr Lee’s 2024 lectures, but I watched his video about GPT-4o yesterday and would like to continue this series. 0 RLHF First let m...
The tokenizer sounds trivial but plays an important role in LLMs. It largely explains why LLMs are not good at arithmetic.
The second part of this video talks about splines again, and I think I found a clue to the 1/3 of vel in the previous spline
This is one of the most amazing videos for introducing a math concept, together with a great primer
I finally reached the last part of this Mamba intro after going through HiPPO and this video from Umar Jamil.
Part 2 of Study notes from the video presented by Albert Gu
This is a summary of the explanation of KV cache and RoPE in this video. I really like how Bai explained RoPE. 1 KV Cache First review the meaning of Q/K/V
A discussion around tokenizers in Slack led to the following comment: Would be the natural progression after you arrive at the fact taking three steps to get ...
I reviewed some basics about Spline in the previous blog about KAN, and happened to find this tutorial. The most amazing part of this video is to show you wh...
It’s all over the internet how KAN will revolutionize ML by replacing MLP. Here is my first read on KAN
The last time I touched CUDA and GPUs was in 2018, when I was preparing for job hunting at CGG. It’s time to review some GPU basics now
WaveNet, published by Google in 2016, is a waveform-generating DNN with dilated causal convolutions.
Andrej explains his own blog in this lecture.
This part goes deep into some training tricks. Very insightful!
It’s based on Bengio’s paper on MLP, A Neural Probabilistic Language Model in 2003.
2.5 hr video of MakeMore.
Andrej took a sleep break and here is the second part of his intro.
If there is one video you should watch about GPT, this is it. Karpathy’s deep dive into a code-level explanation of GPT is a blessing to all GenAI engineers.
I feel I have been reading a lot about LLM-related topics recently but getting far away from CV. I happened to read this post from v_JULY_v and it’s a good review for...
Study notes from the video presented by Albert Gu on S4, Structured State Space Sequence model
Structured State Space for Sequence Modeling: S4 paper by Albert Gu, 2021
Focus on PPO in this post
Focus on Policy Gradient in this post from Cameron Wolfe. With the implementation from Spinning Up, it’s essentially a feedforward network (MLP with 3 layer...
Distillation was introduced by Hinton and Dean in 2015, another masterpiece from Google.
Continuing from Part 1: the 5th way of prompt engineering. 5 Model Cooperation Model cooperation could be due to cost. This is similar to MoE but no LLM arch...
Summary from this MoE link and this Decoder-only transformer link
This is from Cameron Wolfe’s website and discusses LLM pretraining and inference in detail with code. Very educational and I will write multiple study notes...
Gauss’s Theorema Egregium, which is Latin for “Remarkable Theorem”, is a major result of differential geometry. I found a good introduction here and video by...
Kaiming He joined MIT as an associate professor in Feb 2024 and delivered “Deep Learning Bootcamp” as his first public talk as an MIT professor. It’s very pleasant...
Understanding Shannon’s entropy is crucial to understanding concepts like cross entropy and KL divergence. But perplexity is the concept that comes with NLP. Here i...
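As a quick refresher on how these four quantities relate, here is a small sketch in plain Python. The function names are my own for illustration, everything is measured in bits (log base 2), and distributions are plain lists of probabilities.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p), the extra bits paid for using q instead of p."""
    return cross_entropy(p, q) - entropy(p)


def perplexity(token_probs):
    """Perplexity = 2 ** (average negative log2 probability the model gave each token)."""
    avg_nll = -sum(math.log2(prob) for prob in token_probs) / len(token_probs)
    return 2 ** avg_nll


p = [0.5, 0.5]
q = [0.9, 0.1]
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that assigns probability 0.25 to every observed token is as uncertain as a uniform choice among 4 options, which is one way to read a perplexity value.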
It’s time to learn ML/GenAI with Dr Lee AGAIN in 2024.
Prompt methods: notes for this prompt engineering website. 1 Automatic Reasoning and Tool-use (ART) Key idea: Add code execution results in the chaining prompt ...
A variety of prompt engineering tricks for Anthropic Claude
Nvidia NeMo GenAI framework on DGX Cloud/Kubernetes Clusters AutoConfigurator SFT and PEFT
Migrate my notes on Git from Google Keep to here
The first new thing I learnt in 2024 is SQL. I gave it a try back in 2022 while job hunting, and was asked to write SQL queries in a Databricks SA interview. I never writ...
Why SFT comes before RLHF: RLHF needs basic capability from the model. 2 steps in RLHF: If it only learns from HF, then you know the answer to a...
Random Python tips I collected recently.
Again, I summarized the comparison between the distributed systems here. A couple of interesting points about DASK
PyTorch has its own distributed implementation, either through MPI backend, point to point, inspiration for torch.distributed Meta’s own [GLOO](https://gi...
Continue Pytorch notes with Single-Machine Model Parallel. Model Parallel When the model is too large to fit into a single GPU, model parallel is necessary. ...
My ML journey started with building NN layer by layer with Tensorflow in 2016. Keras was invented but I still wanted to know more details of Tensorflow but s...
A good YouTube video explained several tricks applied in Llama2. Here are the study notes. 1. Layer normalization Batch norm: normalized by columns (same feat...
I have been sick since the trip to China and haven’t fully recovered even till today. I totally underestimated the damage of 雾霾(smog, a word combining smoke ...
Nvidia’s server GPUs have been evolving. I was quite familiar with GPU types back in CGG but after 4 or 5 years, I am totally out of sync
Tokenizer is a basic concept in NLP, and basically it generates tokens from a sentence. A token is a bit “less” than a word, so the common ratio between toke...
马同学(Student Horse) is a great source of math concept clarifications, both in linear algebra and statistics. I came across this explanation for both Poisson a...
Read the blog about continuous batching, from Cade and Shen.
It’s time to dig into some LLM optimization algorithms. My first googled question was “Flash Attention vs Paged Attention”, which are two popular optimization...
This is a typical example of how we can enrich RAG with more advanced methods, and it does NOT require more complicated algorithms or theories. RAG Fusion is ver...
Came across another agent training paper from Tsinghua, AgentTuning.
There are multiple LLM-related datasets that I didn’t really pay attention to until I started working on the FireAct demo.
Open Interpreter was one of the fastest-growing repos on GitHub, and it got 26k stars in a month. I played with it last weekend and had quite some fun with O...
Yao Shunyu, the original author of the ReAct paper, talks about LLMs and agents.
One more good resource for this introduction is here
Start with Regression vs Classification, and introduce softmax. It seems there are long stories behind softmax, rather than normalization (Answer: Use Sigmoi...
This blog is mainly about optimizers. It’s good to review them all. Overall problem to be solved: different parameters need different learning rates. 1 AdaGra...
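The "different parameters need different learning rates" idea can be sketched in a few lines of plain Python. The excerpt is cut off, so this assumes the first optimizer on its list is AdaGrad; the function name, learning rate, and epsilon below are illustrative choices, not values from the post.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update; accum holds each parameter's running sum of squared gradients."""
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g  # accumulate squared gradient for this parameter only
        new_params.append(p - lr * g / (math.sqrt(a) + eps))  # per-parameter step size
        new_accum.append(a)
    return new_params, new_accum


# Two parameters whose gradients differ by a factor of 100.
params, accum = [1.0, 1.0], [0.0, 0.0]
params, accum = adagrad_step(params, [0.1, 10.0], accum)
print(params, accum)
```

Because each gradient is divided by the root of its own accumulated squares, both parameters move by roughly `lr` on the first step even though their raw gradients are very different.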
I was recommending some online materials for ML101 to a friend, and videos from Hung-yi Lee are always my first choices. I selected ML courses from his 2021 ...
Normally we learn RL starting from Q learning, which seems easiest to understand. But this lecture goes PD and PPO first, then Q learning. Interesting, reminds me of line...
PPO, Proximal Policy Optimization. One of the most powerful RL algorithms, and the default RL training algorithm at OpenAI.
Learning RL was actually my very first ML project after joining AWS. DeepRacer was released at re:Invent 2018, and our prototype team got hands on it in early 201...
2.5 hr video of micrograd. I wish I could’ve watched this video 5 yrs earlier! It clears up so many questions about loss.backward()!
One interesting part of my job is to create workshops and buildouts (one type of workshop focusing on AWS building). Here are some screenshots of my previous...
Pinecone released Canopy, which is a framework for RAG. It originally had OpenAI as the LLM and embedding model provider and wants to cooperate with Anyscale for o...
Poe, a chatbot hosting service backed by Quora, is getting popular. I tried to add AE as one of the chatbots, with the Zephyr 7B model. (It seems that first model...
OpenAI’s Dev Day was quite exciting, right? But do you know they quietly released API v1 with breaking changes in it? This will keep me busy for the next coup...
Assistant is the killer API, for quite some startups, and even for big and popular projects like Pinecone and Open Interpreter.
First things first: it took me a couple of commits to pass the formatting check in this PR with ruff and black, so I’d better write them down first. Ruff ...
It has also been two months since my last update. There is no excuse, but I do want to claim that I was away from home from Aug 1st to 22nd, and got totall...
This part will focus on how to host a static webpage created by Hugo online, mainly leveraging cloud services like AWS S3.
Hugo is a great tool for workshop instructions. Easy to create, with good-looking templates that look professional. I so regret that my early workshops in AWS w...
“How to create Slack bot which interact with AWS Lambda” This is the prompt I entered for GPT4, and here are its answers
Kind of a hackathon/weekend project. Built a Slack bot with its backend running on Anyscale. Actually, my very first engineering project at Amazon was using a Slack bot...
Ray is an open-source framework that provides a simple and flexible way to build and scale distributed applications. It is designed to enable efficient and h...
It’s all over the internet how KAN will revolutionize ML by replacing MLP. Here is my first read on KAN
A customer request: show an OSS solution for LoRA fine-tuning. Open-source NeMo is the backup plan and PyTorch PEFT is preferred
HTTPX is another HTTP client similar to Requests. It’s used for the http_client option of the OpenAI client constructor.
Read a good introduction to validators in Pydantic here. Even though Pydantic is going to deprecate the validator and root_validator decorators in v3, it’s stil...
I was creating a Tetris game for the first time. Even though it was a popular programming task during college, I never actually tried to implement i...
This is not quite an ML topic, but more about using the Ray Data library to run batch processing. Yes, the core function of Ray Data is batch processing, and her...
I was playing with function calling on AE. Yes, AE is now the first host to enable function calling/JSON mode on open-source models, Mistral and Mixtral seri...
Unit testing is something I ignored for a long time. I knew of its existence but barely ever initiated one. If it’s already in the system, I don’t mind adding one, like for ...
Learned something really trivial today, but kind of interesting and useful, related to the argparse library. We all use argparse for argument parsing. The s...
This blog records my tests on how to create a Python package 1. Simplest demo of creating a python package Creating a Python package is not hard, but there ...