General Generatives

This blog post is a step-by-step guide for running Llama-2 7B model using llama.cpp, with NVIDIA CUDA and Ubuntu 22.04. llama.cpp is an C/C++ library for the inference of Llama/Llama-2 models. It has grown insanely popular along with the booming of large language model applications. Throughout this guide, we assume the user home directory (usually /home/username) is the working directory. Install NVIDIA CUDA To start, let's install NVIDIA CUDA on Ubuntu 22.04. The guide presented here is the same as the CUDA Toolkit download page provided by NVIDIA, but I deviate a little bit by installing CUDA 11.8 instead of the latest version. At the time of writing, PyTorch 2.0 stable is released for CUDA 11.8 and I find it convenient to keep my deployed CUDA version in sync with that. $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb $ sudo dpkg -i cuda-keyring_1.1-1_all.deb $ sudo apt update $ sudo apt install cuda-11-8 After ...

General Generatives

Posts

Serving Llama-2 7B using llama.cpp with NVIDIA CUDA on Ubuntu 22.04

A Perplexity Benchmark of llama.cpp