## Qwen Quick Start Notebook

This notebook shows how to run inference with and fine-tune the Qwen-7B-Chat model on a single GPU. The same steps apply to Qwen-1.8B-Chat and Qwen-14B-Chat; only the `model name` and hyper-parameters need to change. Training and inference with Qwen-72B-Chat require more GPU memory and more disk space.

## Requirements
- Python 3.8 or later
- PyTorch 1.12 or later (2.0 or later is recommended)
- CUDA 11.4 or later is recommended (relevant for GPU users, flash-attention users, etc.)

We tested training of the model on an A10 GPU (24GB).
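
The requirements above can be checked before running the rest of the notebook. The following is a minimal sketch, not part of the original notebook: `version_tuple` and `check_environment` are hypothetical helper names, and the check degrades gracefully when PyTorch is not installed.

```python
import re
import sys


def version_tuple(version: str) -> tuple:
    """Turn a version string such as '2.1.0+cu118' into (2, 1, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment() -> dict:
    """Compare the running environment with the minimum versions listed above."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
    except ImportError:
        report["torch_ok"] = None  # PyTorch is not installed
        return report
    report["torch_ok"] = version_tuple(torch.__version__) >= (1, 12)
    report["cuda_available"] = torch.cuda.is_available()
    cuda = torch.version.cuda  # None for builds without CUDA support
    report["cuda_ok"] = cuda is not None and version_tuple(cuda) >= (11, 4)
    return report


print(check_environment())
```

A value of `False` or `None` in the printed report points at the requirement to look into first.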

## Extra
If you need a speedup, you can optionally install `flash-attention`. Installation details are available [here](https://github.com/Dao-AILab/flash-attention).

In [None]:
!git clone https://github.com/Dao-AILab/flash-attention
!cd flash-attention && pip install .
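# The following installs are optional and may be slow.
# Each `!` command runs in its own shell, so `cd` has to be repeated.
# !cd flash-attention && pip install csrc/layer_norm
# If the version of flash-attn is higher than 2.1.1, the following is not needed.
# !cd flash-attention && pip install csrc/rotary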

### Step 0: Install Package Requirements

In [None]:
# Quote the version specifier so the shell does not treat `>` as an output redirect.
!pip install "transformers>=4.32.0" accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed modelscope
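
After the install finishes, it can help to confirm which versions were actually picked up. The sketch below uses only the standard library's `importlib.metadata`; the `installed_versions` helper is a hypothetical name introduced here, and the package list simply mirrors the `pip install` line above.

```python
from importlib import metadata

# Package names copied from the pip install command above.
PACKAGES = [
    "transformers", "accelerate", "tiktoken", "einops", "scipy",
    "transformers_stream_generator", "peft", "deepspeed", "modelscope",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```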

### Step 1: Download Model
In some regions, `transformers` cannot download the model automatically because of network problems. We recommend downloading the model with `modelscope` first and then using `transformers` for inference.

In [None]:
from modelscope import snapshot_download

# Download the model checkpoint to a local directory; its path is returned as model_dir.
model_dir = snapshot_download('Qwen/Qwen-7B-Chat', cache_dir='.', revision='master')
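
Before moving on, you may want to confirm that the download produced a usable directory. This is a rough sketch under stated assumptions: `looks_like_checkpoint` is a hypothetical helper, and it assumes a checkpoint directory contains a `config.json` plus at least one `*.safetensors` or `*.bin` weight file, which is the usual `transformers` layout but is not stated in this notebook.

```python
from pathlib import Path


def looks_like_checkpoint(model_dir: str) -> bool:
    """Heuristic check: config.json plus at least one weight file (assumed layout)."""
    root = Path(model_dir)
    if not root.is_dir():
        return False
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    return has_config and has_weights


# The path below matches the one used in the later cells of this notebook.
print(looks_like_checkpoint("Qwen/Qwen-7B-Chat"))
```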

### Step 2: Direct Model Inference 
We recommend two ways to do model inference: `modelscope` and `transformers`.

#### 2.1 Model Inference with ModelScope

In [None]:
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat/", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation. This is not needed with transformers>=4.32.0.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat/", trust_remote_code=True) # You can set a different generation length, top_p, and other related hyperparameters.

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。

# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。

# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》
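
The cell above passes the returned `history` back into each `model.chat` call so that later turns can see earlier ones. The sketch below illustrates that threading pattern without loading a model. It assumes `history` is a list of `(query, response)` pairs; the actual structure is defined by the model's remote code, and `echo_chat` is a hypothetical stand-in, not the real `model.chat`.

```python
from typing import List, Optional, Tuple

History = List[Tuple[str, str]]


def echo_chat(query: str, history: Optional[History] = None) -> Tuple[str, History]:
    """Hypothetical stand-in for model.chat that returns a canned reply."""
    history = list(history or [])  # copy so the caller's list is not mutated
    response = f"(reply {len(history) + 1} to: {query})"
    history.append((query, response))
    return response, history


history = None
for query in ["Hello", "Tell me a story.", "Give the story a title."]:
    # Feed the history returned by the previous turn into the next one.
    response, history = echo_chat(query, history=history)

print(len(history), "turns recorded")
```

Passing `history=None` again at any point starts a fresh conversation.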

#### 2.2 Model Inference with transformers

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat/", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained(
 "Qwen/Qwen-7B-Chat/",
 device_map="auto",
 trust_remote_code=True
).eval()

# Specify hyperparameters for generation. This is not needed with transformers>=4.32.0.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat/", trust_remote_code=True)

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。

# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。

# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》
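
The precision options in the cell above (bf16, fp16, or the default) mainly affect how much memory the weights occupy. As a back-of-the-envelope sketch, weight memory is roughly the parameter count times the bytes per parameter. The figure of 7e9 parameters is an assumption taken from the nominal "7B" in the model name, not an exact count, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NOMINAL_PARAMS = 7e9  # assumption: nominal size from the model name, not the exact count

for name, nbytes in [("fp32", 4), ("bf16 / fp16", 2)]:
    print(f"{name}: ~{weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB for weights")
```

Under these assumptions, half-precision weights fit within the 24GB of the A10 mentioned in the requirements, while full-precision weights would not.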

### Step 3: LoRA Fine-Tuning Model (Single GPU)

#### 3.1 Download Example Training Data
Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE).

In [None]:
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json

Prepare your own dataset in the same format. Below is a simple example list with one sample:

```json
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是一个语言模型,我叫通义千问。"
      }
    ]
  }
]
```

You can also use multi-turn conversations as the training set. Here is a simple example:

```json
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "你好!我是一名AI助手,我叫通义千问,有需要请告诉我。"
      },
      {
        "from": "user",
        "value": "你都能做什么"
      },
      {
        "from": "assistant",
        "value": "我能做很多事情,包括但不限于回答各种领域的问题、提供实用建议和指导、进行多轮对话交流、文本生成等。"
      }
    ]
  }
]
```
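
If you build your own dataset, a quick structural check can catch mistakes before training starts. The sketch below is one possible validator, not part of the training script. It assumes, based only on the two examples above, that turns alternate between `user` and `assistant`, start with `user`, and end with `assistant`; `validate_sample` is a hypothetical helper name.

```python
import json


def validate_sample(sample) -> list:
    """Return a list of problems found in one sample (empty list means it looks fine)."""
    if not isinstance(sample, dict):
        return ["sample must be a JSON object"]
    problems = []
    if not isinstance(sample.get("id"), str):
        problems.append("missing string 'id'")
    turns = sample.get("conversations")
    if not isinstance(turns, list) or not turns:
        return problems + ["'conversations' must be a non-empty list"]
    for i, turn in enumerate(turns):
        # Assumed convention: even positions are user turns, odd ones assistant turns.
        expected = "user" if i % 2 == 0 else "assistant"
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: must be a JSON object")
            continue
        if turn.get("from") != expected:
            problems.append(f"turn {i}: expected from={expected!r}, got {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' must be a string")
    if len(turns) % 2 != 0:
        problems.append("conversation should end with an assistant turn")
    return problems


raw = """
[{"id": "identity_0",
  "conversations": [{"from": "user", "value": "Hello"},
                    {"from": "assistant", "value": "Hi, how can I help?"}]}]
"""
dataset = json.loads(raw)
for sample in dataset:
    print(sample.get("id"), validate_sample(sample) or "OK")
```

To check a file on disk, replace `json.loads(raw)` with `json.load(open(path, encoding="utf-8"))`.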

#### 3.2 Fine-Tune the Model

You can run the prepared training script directly to fine-tune the model. Make sure `model_name_or_path` points to the model directory downloaded in Step 1.

In [None]:
!python ../finetune/deepspeed/finetune.py \
 --model_name_or_path "Qwen/Qwen-7B-Chat/" \
 --data_path "Belle_sampled_qwen.json" \
 --bf16 \
 --output_dir "output_qwen" \
 --num_train_epochs 5 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 16 \
 --evaluation_strategy "no" \
 --save_strategy "steps" \
 --save_steps 1000 \
 --save_total_limit 10 \
 --learning_rate 1e-5 \
 --weight_decay 0.1 \
 --adam_beta2 0.95 \
 --warmup_ratio 0.01 \
 --lr_scheduler_type "cosine" \
 --logging_steps 1 \
 --report_to "none" \
 --model_max_length 512 \
 --gradient_checkpointing \
 --lazy_preprocess \
 --use_lora
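
The batch-size and schedule flags in the command above interact, so it can be useful to work out the resulting numbers. The sketch below is an approximation: the defaults mirror the flags above, the dataset size of 1000 is a hypothetical value chosen for illustration, and the trainer's exact rounding of steps may differ slightly.

```python
import math


def training_schedule(num_samples, per_device_batch_size=1, grad_accum_steps=16,
                      num_gpus=1, num_epochs=5, warmup_ratio=0.01):
    """Estimate optimizer steps from the flags used in the fine-tuning command."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


# Hypothetical dataset size; substitute the length of your own JSON list.
schedule = training_schedule(num_samples=1000)
print(schedule)
```

With these hypothetical numbers the total step count stays below `--save_steps 1000`, so no step-based checkpoint would be triggered during the run; lowering `--save_steps` is one option if you want intermediate checkpoints.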

#### 3.3 Merge Weights

LoRA training only saves the adapter parameters. You can load the fine-tuned model and merge weights as shown below:

In [None]:
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat/", torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(model, "output_qwen/")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("output_qwen_merged", max_shard_size="2048MB", safe_serialization=True)
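
For intuition about what merging does, the standard LoRA formulation adds a scaled low-rank product to each adapted weight matrix: `W_merged = W + (lora_alpha / r) * B @ A`. The sketch below reproduces that arithmetic on a tiny example in plain Python. It is an illustration of the formula only, not the `peft` implementation, and the matrices and `merge_lora` helper are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A for one weight matrix."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, shape (out=2, in=2)
A = [[1.0, 2.0]]              # LoRA A, shape (r=1, in=2)
B = [[0.5], [0.0]]            # LoRA B, shape (out=2, r=1)

merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)
```

Once merged, the model has the same shape as the base model and no longer needs the adapter at inference time.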

The tokenizer files are not saved in the new directory in this step. You can copy the tokenizer files or use the following code:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
 "Qwen/Qwen-7B-Chat/",
 trust_remote_code=True
)

tokenizer.save_pretrained("output_qwen_merged")

#### 3.4 Test the Model

After merging the weights, we can test the model as follows:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("output_qwen_merged", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
 "output_qwen_merged",
 device_map="auto",
 trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "你好", history=None)
print(response)