Albert is the Founder and CEO of Compute Labs, launched in March 2024 to realize his GPU RWA vision. He was a founding team member at Delysium, a core member at rct.AI (YC19), and Product Owner at ...
This project is a step-by-step learning journey where we implement various types of Triton kernels—from the simplest examples to more advanced applications—while exploring GPU programming with Triton.
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, ...