# LoRA Fine-Tuning Qwen-Chat Large Language Model (Single GPU)

Tongyi Qianwen (Qwen) is a large language model developed by Alibaba Cloud. It is based on the Transformer architecture and pre-trained on a large, diverse corpus that includes internet text, specialized books, and code. Qwen-Chat is an AI assistant built on top of the pre-trained model using alignment mechanisms.

This notebook uses Qwen-1.8B-Chat as an example to show how to fine-tune the Qwen model with LoRA using DeepSpeed.

## Environment Requirements

Please refer to **requirements.txt** to install the required dependencies.

## Preparation

### Download Qwen-1.8B-Chat

First, download the model files. You can download them directly from ModelScope.

In [None]:
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('Qwen/Qwen-1_8B-Chat', cache_dir='.', revision='master')

### Download Example Training Data

Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE).

Disclaimer: the dataset may be used for research purposes only.

In [None]:
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json

You can also prepare your own dataset in the same format. Below is a minimal example containing one sample:

```json
[
 {
 "id": "identity_0",
 "conversations": [
 {
 "from": "user",
 "value": "你好"
 },
 {
 "from": "assistant",
 "value": "我是一个语言模型,我叫通义千问。"
 }
 ]
 }
]
```

You can also use multi-turn conversations as the training set. Here is a simple example:

```json
[
 {
 "id": "identity_0",
 "conversations": [
 {
 "from": "user",
 "value": "你好,能告诉我遛狗的最佳时间吗?"
 },
 {
 "from": "assistant",
 "value": "当地最佳遛狗时间因地域差异而异,请问您所在的城市是哪里?"
 },
 {
 "from": "user",
 "value": "我在纽约市。"
 },
 {
 "from": "assistant",
 "value": "纽约市的遛狗最佳时间通常在早晨6点至8点和晚上8点至10点之间,因为这些时间段气温较低,遛狗更加舒适。但具体时间还需根据气候、气温和季节变化而定。"
 }
 ]
 }
]
```
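
Before training on your own data, it can help to check that every sample follows the structure shown above. The following is a minimal sketch, not part of the training script: `validate_dataset` is a hypothetical helper, and the rule that turns strictly alternate between `user` and `assistant`, starting with `user`, is an assumption inferred from the two examples.

```python
import json

VALID_ROLES = ("user", "assistant")


def validate_dataset(samples):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    problems = []
    for index, sample in enumerate(samples):
        label = sample.get("id", f"sample #{index}")
        turns = sample.get("conversations")
        if not isinstance(turns, list) or not turns:
            problems.append(f"{label}: 'conversations' must be a non-empty list")
            continue
        for position, turn in enumerate(turns):
            # Assumed convention: the user speaks first and roles alternate strictly.
            expected = VALID_ROLES[position % 2]
            if turn.get("from") != expected:
                problems.append(f"{label}: turn {position} should be from '{expected}'")
            if not isinstance(turn.get("value"), str) or not turn.get("value"):
                problems.append(f"{label}: turn {position} has an empty 'value'")
        if len(turns) % 2 != 0:
            problems.append(f"{label}: conversation should end with an assistant turn")
    return problems


sample_data = json.loads("""
[
  {
    "id": "identity_0",
    "conversations": [
      {"from": "user", "value": "你好"},
      {"from": "assistant", "value": "我是一个语言模型,我叫通义千问。"}
    ]
  }
]
""")
print(validate_dataset(sample_data))  # []
```

To check the downloaded file instead of the inline sample, load it with `json.load` (opened with `encoding="utf-8"`) and pass the resulting list to `validate_dataset`.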

## Fine-Tune the Model

You can directly run the prepared training script to fine-tune the model.

In [None]:
# Each "!" command runs in its own subshell, so a separate "!export" would not affect the next command.
!CUDA_VISIBLE_DEVICES=0 python ../../finetune.py \
 --model_name_or_path "Qwen/Qwen-1_8B-Chat/" \
 --data_path "Belle_sampled_qwen.json" \
 --bf16 \
 --output_dir "output_qwen" \
 --num_train_epochs 5 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 16 \
 --evaluation_strategy "no" \
 --save_strategy "steps" \
 --save_steps 1000 \
 --save_total_limit 10 \
 --learning_rate 1e-5 \
 --weight_decay 0.1 \
 --adam_beta2 0.95 \
 --warmup_ratio 0.01 \
 --lr_scheduler_type "cosine" \
 --logging_steps 1 \
 --report_to "none" \
 --model_max_length 512 \
 --gradient_checkpointing \
 --lazy_preprocess \
 --use_lora
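
The batch-related flags above interact: with `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 16` on one GPU, each optimizer step accumulates gradients over 16 samples. The sketch below estimates the number of optimizer steps for a run. `training_schedule` is a hypothetical helper, the dataset size of 1000 is made up for illustration, and the rounding is an approximation that may differ slightly from what the trainer actually does.

```python
import math


def training_schedule(num_samples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_gpus=1):
    """Estimate the effective batch size and the number of optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    # Rounded up here; the trainer's handling of a final partial batch may differ.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    return effective_batch, steps_per_epoch, total_steps


# num_samples=1000 is a hypothetical dataset size, not the size of the example file.
effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_samples=1000,
    per_device_batch_size=1,
    grad_accum_steps=16,
    num_epochs=5,
)
print(effective_batch, steps_per_epoch, total_steps)  # 16 63 315
```

Under these assumed numbers the run finishes in fewer than 1000 optimizer steps, so `--save_steps 1000` would not trigger an intermediate checkpoint. Lower `--save_steps` if you want checkpoints during a short run.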

## Merge Weights

LoRA and Q-LoRA training save only the adapter parameters. You can load the fine-tuned adapter and merge it into the base model weights as shown below:

In [None]:
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat/", torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(model, "output_qwen/")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("output_qwen_merged", max_shard_size="2048MB", safe_serialization=True)

The merge step does not save the tokenizer files to the new directory. You can copy them manually or use the following code:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
 "Qwen/Qwen-1_8B-Chat/",
 trust_remote_code=True
)

tokenizer.save_pretrained("output_qwen_merged")
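
Conceptually, merging folds the low-rank update into the base weight, following the standard LoRA formulation W' = W + (alpha / r) · B · A. The sketch below illustrates this on toy matrices in pure Python. It is not PEFT's implementation: `merge_lora` and `matmul` are hypothetical helpers, and all values are chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy values, chosen only for illustration.
weight = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
lora_a = [[1.0, 2.0]]               # rank x in_features  (1x2)
lora_b = [[0.5], [1.0]]             # out_features x rank (2x1)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged directory can be loaded as an ordinary model.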

## Test the Model

After merging the weights, we can test the model as follows:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("output_qwen_merged", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
 "output_qwen_merged",
 device_map="auto",
 trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "你好", history=None)
print(response)
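
The `history` value returned by `model.chat` can be passed back in on the next call to continue a conversation. The sketch below shows how turns in the training data format could be converted into such a history. `conversations_to_history` is a hypothetical helper, and it assumes the history is a list of `(query, response)` pairs, which should be verified against the model's chat implementation.

```python
def conversations_to_history(turns):
    """Pair consecutive user/assistant turns into (query, response) tuples."""
    history = []
    for user_turn, assistant_turn in zip(turns[0::2], turns[1::2]):
        if user_turn["from"] != "user" or assistant_turn["from"] != "assistant":
            raise ValueError("turns must alternate user/assistant, starting with user")
        history.append((user_turn["value"], assistant_turn["value"]))
    return history


turns = [
    {"from": "user", "value": "你好"},
    {"from": "assistant", "value": "我是一个语言模型,我叫通义千问。"},
]
history = conversations_to_history(turns)
print(history)
```

Under that assumption, a follow-up question would then be asked with `model.chat(tokenizer, query, history=history)`, where `query` is the new user message.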