LLama-2 – jakubpolec.com

LLama-2

24 July 2023

LLaMA 2 is a large language model developed by Meta and is the successor to LLaMA 1. LLaMA 2 is available for free for research and commercial use through providers like AWS, Hugging Face, and others. The LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1. Its fine-tuned models have been trained on over 1 million human annotations.

The official announcement from Meta can be found here: https://ai.meta.com/llama/

What is LLaMA 2?

Meta has released LLaMA 2, a new state-of-the-art open large language model (LLM). It is the next iteration of LLaMA and comes with a commercially permissive license. LLaMA 2 comes in three sizes: 7B, 13B, and 70B parameters. Improvements over the original LLaMA include:

Trained on 2 trillion tokens of text data
Allows commercial use
Uses a default context window of 4096 tokens (can be expanded)
The 70B model adopts grouped-query attention (GQA)
Available on Hugging Face Hub
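
To make the grouped-query attention point more concrete: in GQA, several query heads share a single key/value head, which shrinks the key/value cache the model has to keep in memory during generation. The sketch below is a back-of-the-envelope calculation, not Meta's code. The 70B shape it uses (80 layers, 64 query heads, 8 key/value heads, head dimension 128) is an assumption taken from the publicly distributed model configuration rather than from this post, and the function names are made up for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 assumes 16-bit floats.
    """
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for_query_head(q_head, n_query_heads, n_kv_heads):
    """Index of the shared key/value head that a given query head reads from."""
    group_size = n_query_heads // n_kv_heads
    return q_head // group_size


# Assumed 70B shape (from the published model config, not from this post).
N_LAYERS, N_QUERY_HEADS, N_KV_HEADS, HEAD_DIM = 80, 64, 8, 128
CONTEXT = 4096  # default context window mentioned above

# Classic multi-head attention would keep one key/value head per query head.
mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, CONTEXT)
# Grouped-query attention keeps only the shared key/value heads.
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)

print(f"multi-head cache:    {mha / 2**30:.2f} GiB")  # 10.00 GiB
print(f"grouped-query cache: {gqa / 2**30:.2f} GiB")  # 1.25 GiB
```

Under these assumptions, a full 4096-token sequence needs roughly an eighth of the cache memory it would need with one key/value head per query head, which is the main practical benefit of GQA at this model size.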

LLaMA 2 Playgrounds: Test It

There are a few different playgrounds available to test out interacting with LLaMA 2 Chat:

HuggingChat allows you to chat with the LLaMA 2 70B model through Hugging Face’s conversational interface. This provides a simple way to see the chatbot in action.
Hugging Face Spaces has LLaMA 2 models in 7B, 13B and 70B sizes available to test. The interactive demos let you compare different model sizes.
Perplexity has both the 7B and 13B LLaMA 2 models accessible through their conversational AI demo. You can chat with the models and provide feedback on the responses.

Research Behind LLaMA 2

LLaMA 2 is a base LLM pretrained on publicly available online data. In addition, Meta released a chat version, LLaMA-2-chat. Its first version was a supervised fine-tuned (SFT) model, which was then iteratively improved through Reinforcement Learning from Human Feedback (RLHF). The RLHF process used techniques such as rejection sampling and proximal policy optimization (PPO) to further refine the chatbot. Meta released only the latest RLHF version (v5) of the model. If you are curious about the process behind it, check out:

How Good Is LLaMA 2? Benchmarks

Meta claims that “Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.” You can find more insights into its performance at:

How to Prompt LLaMA 2 Chat

LLaMA 2 Chat is an open conversational model. Interacting with LLaMA 2 Chat effectively requires providing the right prompts and questions to produce coherent and useful responses. Meta didn’t choose the simplest prompt. Below is the prompt template for single-turn and multi-turn conversations. This template follows the model’s training procedure, as described in the LLaMA 2 paper. You can also take a look at LLaMA 2 Prompt Template.

Single-turn

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]

Multi-turn

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s>\
<s>[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} </s>\
<s>[INST] {{ user_msg_3 }} [/INST]
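
As a rough illustration, the two templates above can be produced by one small helper. This is a minimal sketch and not Meta's reference implementation: the function name and its arguments are hypothetical, it follows the line breaks exactly as the template is printed here (some published versions put a blank line after `<</SYS>>`), and it writes `<s>` and `</s>` as literal text, whereas many tokenizers add the beginning-of-sequence token themselves.

```python
from typing import List, Optional, Tuple


def build_llama2_chat_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    history: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Render a conversation into the LLaMA 2 Chat template shown above.

    history holds earlier (user_message, model_answer) pairs, oldest first.
    """
    past = list(history or [])
    # All user messages in order; the last one is still waiting for an answer.
    user_messages = [user for user, _ in past] + [user_message]
    answers = [answer for _, answer in past]

    # The system prompt sits in <<SYS>> tags inside the first [INST] block only.
    if system_prompt:
        user_messages[0] = (
            f"<<SYS>>\n{system_prompt}\n<</SYS>>\n{user_messages[0]}"
        )

    parts = []
    # Completed turns: each one is closed with the end-of-sequence marker.
    for user, answer in zip(user_messages, answers):
        parts.append(f"<s>[INST] {user} [/INST] {answer} </s>")
    # Final turn: left open so the model generates the next answer.
    parts.append(f"<s>[INST] {user_messages[-1]} [/INST]")
    return "".join(parts)


if __name__ == "__main__":
    print(
        build_llama2_chat_prompt(
            "And what about the 13B model?",
            system_prompt="You are a helpful assistant.",
            history=[("How large is the smallest model?", "It has 7B parameters.")],
        )
    )
```

Passing an empty history yields the single-turn form; each completed exchange in the history adds one closed `<s>[INST] ... [/INST] ... </s>` segment before the final open turn.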

How to train LLaMA 2

LLaMA 2 is openly available, which makes it easy to fine-tune using techniques such as PEFT. There are great resources available for training your own versions of LLaMA 2:

How to Deploy LLaMA 2

LLaMA 2 can be deployed in a local environment (llama.cpp), through managed services like Hugging Face Inference Endpoints, or on cloud platforms like AWS, Google Cloud, and Microsoft Azure.

Deploy LLaMA 2 Using text-generation-inference and Inference Endpoints
Deploy LLaMA 2 70B using Amazon SageMaker (coming soon)
Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference

Other Sources

Llama 2 Resources

