{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Qwen Quick Start Notebook\n", "\n", "This notebook shows how to train and infer the Qwen-7B-Chat model on a single GPU. Similarly, Qwen-1.8B-Chat, Qwen-14B-Chat can also be leveraged for the following steps. We only need to modify the corresponding `model name` and hyper-parameters. The training and inference of Qwen-72B-Chat requires higher GPU requirements and larger disk space." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Requirements\n", "- Python 3.8 and above\n", "- Pytorch 1.12 and above, 2.0 and above are recommended\n", "- CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)\n", "We test the training of the model on an A10 GPU (24GB)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extra\n", "If you need to speed up, you can install `flash-attention`. The details of the installation can be found [here](https://github.com/Dao-AILab/flash-attention)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!git clone https://github.com/Dao-AILab/flash-attention\n", "!cd flash-attention && pip install .\n", "# Below are optional. Installing them might be slow.\n", "# !pip install csrc/layer_norm\n", "# If the version of flash-attn is higher than 2.1.1, the following is not needed.\n", "# !pip install csrc/rotary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 0: Install Package Requirements" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install transformers>=4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed modelscope" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Download Model\n", "When using `transformers` in some regions, the model cannot be automatically downloaded due to network problems. We recommend using `modelscope` to download the model first, and then use `transformers` for inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from modelscope import snapshot_download\n", "\n", "# Downloading model checkpoint to a local dir model_dir.\n", "model_dir = snapshot_download('Qwen/Qwen-7B-Chat', cache_dir='.', revision='master')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Direct Model Inference \n", "We recommend two ways to do model inference: `modelscope` and `transformers`.\n", "\n", "#### 2.1 Model Inference with ModelScope" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecutionIndicator": { "show": true }, "tags": [] }, "outputs": [], "source": [ "from modelscope import AutoModelForCausalLM, AutoTokenizer\n", "from modelscope import GenerationConfig\n", "\n", "# Note: The default behavior now has injection attack prevention off.\n", "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n", "\n", "# use bf16\n", "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n", "# use fp16\n", "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n", "# use cpu only\n", "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n", "# use auto mode, automatically select precision based on the device.\n", "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True).eval()\n", "\n", "# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n", "# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True) # 可指定不同的生成长度、top_p等相关超参\n", "\n", "# 第一轮对话 1st dialogue turn\n", "response, history = model.chat(tokenizer, \"你好\", history=None)\n", "print(response)\n", "# 你好!很高兴为你提供帮助。\n", "\n", "# 第二轮对话 2nd dialogue turn\n", "response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n", "print(response)\n", "# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n", "# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。\n", "# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。\n", "# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。\n", "# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。\n", "# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。\n", "\n", "# 第三轮对话 3rd dialogue turn\n", "response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n", "print(response)\n", "# 《奋斗创业:一个年轻人的成功之路》" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2 Model Inference with transformers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecutionIndicator": { "show": true }, "tags": [] }, "outputs": [], "source": [ "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "from transformers.generation import GenerationConfig\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n", "\n", "# use bf16\n", "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n", "# use fp16\n", "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n", "# use cpu only\n", "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n", "# use auto mode, automatically select precision based on the device.\n", "model = AutoModelForCausalLM.from_pretrained(\n", " \"Qwen/Qwen-7B-Chat/\",\n", " device_map=\"auto\",\n", " trust_remote_code=True\n", ").eval()\n", "\n", "# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n", "# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n", "\n", "# 1st dialogue turn\n", "response, history = model.chat(tokenizer, \"你好\", history=None)\n", "print(response)\n", "# 你好!很高兴为你提供帮助。\n", "\n", "# 2nd dialogue turn\n", "response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n", "print(response)\n", "# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n", "# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。\n", "# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。\n", "# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。\n", "# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。\n", "# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。\n", "\n", "# 3rd dialogue turn\n", "response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n", "print(response)\n", "# 《奋斗创业:一个年轻人的成功之路》" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: LoRA Fine-Tuning Model (Single GPU)\n", "\n", "#### 3.1 Download Example Training Data\n", "Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can refer to this format to prepare the dataset. Below is a simple example list with 1 sample:\n", "\n", "```json\n", "[\n", " {\n", " \"id\": \"identity_0\",\n", " \"conversations\": [\n", " {\n", " \"from\": \"user\",\n", " \"value\": \"你好\"\n", " },\n", " {\n", " \"from\": \"assistant\",\n", " \"value\": \"我是一个语言模型,我叫通义千问。\"\n", " }\n", " ]\n", " }\n", "]\n", "```\n", "\n", "You can also use multi-turn conversations as the training set. Here is a simple example:\n", "\n", "```json\n", "[\n", " {\n", " \"id\": \"identity_0\",\n", " \"conversations\": [\n", " {\n", " \"from\": \"user\",\n", " \"value\": \"你好\"\n", " },\n", " {\n", " \"from\": \"assistant\",\n", " \"value\": \"你好!我是一名AI助手,我叫通义千问,有需要请告诉我。\"\n", " },\n", " {\n", " \"from\": \"user\",\n", " \"value\": \"你都能做什么\"\n", " },\n", " {\n", " \"from\": \"assistant\",\n", " \"value\": \"我能做很多事情,包括但不限于回答各种领域的问题、提供实用建议和指导、进行多轮对话交流、文本生成等。\"\n", " }\n", " ]\n", " }\n", "]\n", "```\n", "\n", "#### 3.2 Fine-Tune the Model\n", "\n", "You can directly run the prepared training script to fine-tune the model. Remember to check `model_name_or_path`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!python ../finetune/deepspeed/finetune.py \\\n", " --model_name_or_path \"Qwen/Qwen-7B-Chat/\"\\\n", " --data_path \"Belle_sampled_qwen.json\"\\\n", " --bf16 \\\n", " --output_dir \"output_qwen\" \\\n", " --num_train_epochs 5 \\\n", " --per_device_train_batch_size 1 \\\n", " --per_device_eval_batch_size 1 \\\n", " --gradient_accumulation_steps 16 \\\n", " --evaluation_strategy \"no\" \\\n", " --save_strategy \"steps\" \\\n", " --save_steps 1000 \\\n", " --save_total_limit 10 \\\n", " --learning_rate 1e-5 \\\n", " --weight_decay 0.1 \\\n", " --adam_beta2 0.95 \\\n", " --warmup_ratio 0.01 \\\n", " --lr_scheduler_type \"cosine\" \\\n", " --logging_steps 1 \\\n", " --report_to \"none\" \\\n", " --model_max_length 512 \\\n", " --gradient_checkpointing \\\n", " --lazy_preprocess \\\n", " --use_lora" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Merge Weights\n", "\n", "LoRA training only saves the adapter parameters. You can load the fine-tuned model and merge weights as shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoModelForCausalLM\n", "from peft import PeftModel\n", "import torch\n", "\n", "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", torch_dtype=torch.float16, device_map=\"auto\", trust_remote_code=True)\n", "model = PeftModel.from_pretrained(model, \"output_qwen/\")\n", "merged_model = model.merge_and_unload()\n", "merged_model.save_pretrained(\"output_qwen_merged\", max_shard_size=\"2048MB\", safe_serialization=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The tokenizer files are not saved in the new directory in this step. You can copy the tokenizer files or use the following code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoTokenizer\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\n", " \"Qwen/Qwen-7B-Chat/\",\n", " trust_remote_code=True\n", ")\n", "\n", "tokenizer.save_pretrained(\"output_qwen_merged\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4 Test the Model\n", "\n", "After merging the weights, we can test the model as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "from transformers.generation import GenerationConfig\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"output_qwen_merged\", trust_remote_code=True)\n", "model = AutoModelForCausalLM.from_pretrained(\n", " \"output_qwen_merged\",\n", " device_map=\"auto\",\n", " trust_remote_code=True\n", ").eval()\n", "\n", "response, history = model.chat(tokenizer, \"你好\", history=None)\n", "print(response)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" }, "vscode": { "interpreter": { "hash": "2d58e898dde0263bc564c6968b04150abacfd33eed9b19aaa8e45c040360e146" } } }, "nbformat": 4, "nbformat_minor": 4 }