LLaMA 2 is a large language model developed by Meta and is the successor to LLaMA 1. LLaMA 2 is available for free for research and commercial use through providers like AWS, Hugging Face, and others. LLaMA 2 pretrained models are trained on 2 trillion tokens, and have double the context length than LLaMA 1. Its fine-tuned models have been trained on over 1 million human annotations.
The official announcement from Meta can be found here: https://ai.meta.com/llama/
Meta released LLaMA 2, the new state-of-the-art open large language model (LLM). LLaMA 2 represents the next iteration of LLaMA and comes with a commercially-permissive license. LLaMA 2 comes in 3 different sizes – 7B, 13B, and 70B parameters. New improvements compared to the original LLaMA include:
There are a few different playgrounds available to test out interacting with LLaMA 2 Chat:
LLaMA 2 is a base LLM model and pretrained on publicly available data found online. Additionally Meta released a CHAT version. The first version of the CHAT model was SFT (Supervised fine-tuned) model. After that, LLaMA-2-chat was iteratively improved through Reinforcement Learning from Human Feedback (RLHF). The RLHF process involved techniques like rejection sampling and proximal policy optimization (PPO) to further refine the chatbot. Meta only released the latest RLHF (v5) versions of the model. If you curious how the process was behind checkout:
Meta claims that “Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.”. You can find more insights over the performance at:
LLaMA 2 Chat is an open conversational model. Interacting with LLaMA 2 Chat effectively requires providing the right prompts and questions to produce coherent and useful responses. Meta didn’t choose the simplest prompt. Below is the prompt template for single-turn and multi-turn conversations. This template follows the model’s training procedure, as described in the LLaMA 2 paper. You can also take a look at LLaMA 2 Prompt Template.
Single-turn
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
Multi-turn
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s>\
<s>[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} </s>\
<s>[INST] {{ user_msg_3 }} [/INST]
LLaMA 2 is openly available making it easy to fine-tune using techniques, .e.g. PEFT. There are great resources available for training your own versions of LLaMA 2:
LLaMA 2 can be deployed in local environment (llama.cpp), using managed services like Hugging Face Inference Endpoints or through or cloud platforms like AWS, Google Cloud, and Microsoft Azure.
Let me know if you would like me to expand on any section or add additional details. I aimed to provide a high-level overview of key information related to LLaMA 2’s release based on what is publicly known so far.