{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Qwen Quick Start Notebook\n",
"\n",
"This notebook shows how to train and infer the Qwen-7B-Chat model on a single GPU. Similarly, Qwen-1.8B-Chat, Qwen-14B-Chat can also be leveraged for the following steps. We only need to modify the corresponding `model name` and hyper-parameters. The training and inference of Qwen-72B-Chat requires higher GPU requirements and larger disk space."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Requirements\n",
"- Python 3.8 and above\n",
"- Pytorch 1.12 and above, 2.0 and above are recommended\n",
"- CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)\n",
"We test the training of the model on an A10 GPU (24GB)."
]
},
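{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before going further, you can optionally run the sanity check below (an addition to the original steps) to confirm that your Python, PyTorch, and CUDA setup matches the requirements above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import torch\n",
"\n",
"# Report the interpreter and framework versions listed in the requirements.\n",
"print(f\"Python: {sys.version.split()[0]}\")\n",
"print(f\"PyTorch: {torch.__version__}\")\n",
"print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
"if torch.cuda.is_available():\n",
"    # bf16 support matters for the bf16 loading options used later.\n",
"    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
"    print(f\"bf16 supported: {torch.cuda.is_bf16_supported()}\")"
]
},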
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extra\n",
"If you need to speed up, you can install `flash-attention`. The details of the installation can be found [here](https://github.com/Dao-AILab/flash-attention)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!git clone https://github.com/Dao-AILab/flash-attention\n",
"!cd flash-attention && pip install .\n",
"# Below are optional. Installing them might be slow.\n",
"# !pip install csrc/layer_norm\n",
"# If the version of flash-attn is higher than 2.1.1, the following is not needed.\n",
"# !pip install csrc/rotary"
]
},
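{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you installed `flash-attention`, the quick check below (an optional addition) confirms that the package imports correctly; Qwen's remote code will use it automatically when it is available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: verify that flash-attn imports correctly after installation.\n",
"import flash_attn\n",
"\n",
"print(flash_attn.__version__)"
]
},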
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 0: Install Package Requirements"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install transformers>=4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed modelscope"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Download Model\n",
"When using `transformers` in some regions, the model cannot be automatically downloaded due to network problems. We recommend using `modelscope` to download the model first, and then use `transformers` for inference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from modelscope import snapshot_download\n",
"\n",
"# Downloading model checkpoint to a local dir model_dir.\n",
"model_dir = snapshot_download('Qwen/Qwen-7B-Chat', cache_dir='.', revision='master')"
]
},
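{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (an optional addition), you can list the downloaded directory to confirm that the checkpoint files are in place before loading the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# model_dir is the path returned by snapshot_download above; listing a few\n",
"# files simply confirms that the download completed.\n",
"print(model_dir)\n",
"print(sorted(os.listdir(model_dir))[:10])"
]
},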
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Direct Model Inference \n",
"We recommend two ways to do model inference: `modelscope` and `transformers`.\n",
"\n",
"#### 2.1 Model Inference with ModelScope"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"from modelscope import AutoModelForCausalLM, AutoTokenizer\n",
"from modelscope import GenerationConfig\n",
"\n",
"# Note: The default behavior now has injection attack prevention off.\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
"\n",
"# use bf16\n",
"# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n",
"# use fp16\n",
"# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n",
"# use cpu only\n",
"# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n",
"# use auto mode, automatically select precision based on the device.\n",
"model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True).eval()\n",
"\n",
"# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n",
"# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True) # 可指定不同的生成长度、top_p等相关超参\n",
"\n",
"# 第一轮对话 1st dialogue turn\n",
"response, history = model.chat(tokenizer, \"你好\", history=None)\n",
"print(response)\n",
"# 你好!很高兴为你提供帮助。\n",
"\n",
"# 第二轮对话 2nd dialogue turn\n",
"response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n",
"print(response)\n",
"# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n",
"# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。\n",
"# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。\n",
"# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。\n",
"# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。\n",
"# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。\n",
"\n",
"# 第三轮对话 3rd dialogue turn\n",
"response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n",
"print(response)\n",
"# 《奋斗创业:一个年轻人的成功之路》"
]
},
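{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want finer control over decoding, you can adjust the model's generation config before calling `model.chat()`. The sketch below is an added example using standard `GenerationConfig` attributes such as `top_p` and `max_new_tokens`; the values are illustrative, not tuned recommendations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative values only; choose hyperparameters that suit your task.\n",
"model.generation_config.top_p = 0.8\n",
"model.generation_config.max_new_tokens = 512\n",
"\n",
"response, _ = model.chat(tokenizer, \"你好\", history=None)\n",
"print(response)"
]
},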
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2 Model Inference with transformers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"from transformers.generation import GenerationConfig\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
"\n",
"# use bf16\n",
"# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n",
"# use fp16\n",
"# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n",
"# use cpu only\n",
"# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n",
"# use auto mode, automatically select precision based on the device.\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" \"Qwen/Qwen-7B-Chat/\",\n",
" device_map=\"auto\",\n",
" trust_remote_code=True\n",
").eval()\n",
"\n",
"# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n",
"# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
"\n",
"# 1st dialogue turn\n",
"response, history = model.chat(tokenizer, \"你好\", history=None)\n",
"print(response)\n",
"# 你好!很高兴为你提供帮助。\n",
"\n",
"# 2nd dialogue turn\n",
"response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n",
"print(response)\n",
"# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n",
"# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。\n",
"# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。\n",
"# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。\n",
"# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。\n",
"# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。\n",
"\n",
"# 3rd dialogue turn\n",
"response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n",
"print(response)\n",
"# 《奋斗创业:一个年轻人的成功之路》"
]
},
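{
"cell_type": "markdown",
"metadata": {},
"source": [
"Qwen's remote code also exposes a streaming interface, `model.chat_stream()`, which yields the partial response as it is generated; this is what the `transformers_stream_generator` package installed in Step 0 is used for. A minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# chat_stream yields the accumulated response text as generation proceeds;\n",
"# here each partial string is simply printed on its own line.\n",
"for partial_response in model.chat_stream(tokenizer, \"你好\", history=None):\n",
"    print(partial_response)"
]
},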
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: LoRA Fine-Tuning Model (Single GPU)\n",
"\n",
"#### 3.1 Download Example Training Data\n",
"Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can refer to this format to prepare the dataset. Below is a simple example list with 1 sample:\n",
"\n",
"```json\n",
"[\n",
" {\n",
" \"id\": \"identity_0\",\n",
" \"conversations\": [\n",
" {\n",
" \"from\": \"user\",\n",
" \"value\": \"你好\"\n",
" },\n",
" {\n",
" \"from\": \"assistant\",\n",
" \"value\": \"我是一个语言模型,我叫通义千问。\"\n",
" }\n",
" ]\n",
" }\n",
"]\n",
"```\n",
"\n",
"You can also use multi-turn conversations as the training set. Here is a simple example:\n",
"\n",
"```json\n",
"[\n",
" {\n",
" \"id\": \"identity_0\",\n",
" \"conversations\": [\n",
" {\n",
" \"from\": \"user\",\n",
" \"value\": \"你好\"\n",
" },\n",
" {\n",
" \"from\": \"assistant\",\n",
" \"value\": \"你好我是一名AI助手我叫通义千问有需要请告诉我。\"\n",
" },\n",
" {\n",
" \"from\": \"user\",\n",
" \"value\": \"你都能做什么\"\n",
" },\n",
" {\n",
" \"from\": \"assistant\",\n",
" \"value\": \"我能做很多事情,包括但不限于回答各种领域的问题、提供实用建议和指导、进行多轮对话交流、文本生成等。\"\n",
" }\n",
" ]\n",
" }\n",
"]\n",
"```\n",
"\n",
"#### 3.2 Fine-Tune the Model\n",
"\n",
"You can directly run the prepared training script to fine-tune the model. Remember to check `model_name_or_path`."
]
},
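{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before launching training, you can optionally sanity-check the dataset layout with the added sketch below; the field names follow the examples in 3.1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Check that every sample has an id and a list of user/assistant turns,\n",
"# matching the format shown in 3.1.\n",
"with open(\"Belle_sampled_qwen.json\", encoding=\"utf-8\") as f:\n",
"    data = json.load(f)\n",
"\n",
"for sample in data:\n",
"    assert \"id\" in sample and \"conversations\" in sample\n",
"    for turn in sample[\"conversations\"]:\n",
"        assert turn[\"from\"] in (\"user\", \"assistant\")\n",
"        assert isinstance(turn[\"value\"], str)\n",
"print(f\"{len(data)} samples look well-formed.\")"
]
},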
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!python ../finetune/deepspeed/finetune.py \\\n",
" --model_name_or_path \"Qwen/Qwen-7B-Chat/\"\\\n",
" --data_path \"Belle_sampled_qwen.json\"\\\n",
" --bf16 \\\n",
" --output_dir \"output_qwen\" \\\n",
" --num_train_epochs 5 \\\n",
" --per_device_train_batch_size 1 \\\n",
" --per_device_eval_batch_size 1 \\\n",
" --gradient_accumulation_steps 16 \\\n",
" --evaluation_strategy \"no\" \\\n",
" --save_strategy \"steps\" \\\n",
" --save_steps 1000 \\\n",
" --save_total_limit 10 \\\n",
" --learning_rate 1e-5 \\\n",
" --weight_decay 0.1 \\\n",
" --adam_beta2 0.95 \\\n",
" --warmup_ratio 0.01 \\\n",
" --lr_scheduler_type \"cosine\" \\\n",
" --logging_steps 1 \\\n",
" --report_to \"none\" \\\n",
" --model_max_length 512 \\\n",
" --gradient_checkpointing \\\n",
" --lazy_preprocess \\\n",
" --use_lora"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Merge Weights\n",
"\n",
"LoRA training only saves the adapter parameters. You can load the fine-tuned model and merge weights as shown below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoModelForCausalLM\n",
"from peft import PeftModel\n",
"import torch\n",
"\n",
"model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", torch_dtype=torch.float16, device_map=\"auto\", trust_remote_code=True)\n",
"model = PeftModel.from_pretrained(model, \"output_qwen/\")\n",
"merged_model = model.merge_and_unload()\n",
"merged_model.save_pretrained(\"output_qwen_merged\", max_shard_size=\"2048MB\", safe_serialization=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tokenizer files are not saved in the new directory in this step. You can copy the tokenizer files or use the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\n",
" \"Qwen/Qwen-7B-Chat/\",\n",
" trust_remote_code=True\n",
")\n",
"\n",
"tokenizer.save_pretrained(\"output_qwen_merged\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4 Test the Model\n",
"\n",
"After merging the weights, we can test the model as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"from transformers.generation import GenerationConfig\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"output_qwen_merged\", trust_remote_code=True)\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" \"output_qwen_merged\",\n",
" device_map=\"auto\",\n",
" trust_remote_code=True\n",
").eval()\n",
"\n",
"response, history = model.chat(tokenizer, \"你好\", history=None)\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"vscode": {
"interpreter": {
"hash": "2d58e898dde0263bc564c6968b04150abacfd33eed9b19aaa8e45c040360e146"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}