Isekai-Qwen/recipes/finetune/deepspeed/finetune_qlora_multi_gpu.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6e6981ab-2d9a-4280-923f-235a166855ba",
   "metadata": {},
   "source": [
    "# QLoRA Fine-Tuning Qwen-Chat Large Language Model (Multiple GPUs)\n",
    "\n",
    "Tongyi Qianwen is a large language model developed by Alibaba Cloud based on the Transformer architecture, trained on an extensive set of pre-training data. The pre-training data is diverse and covers a wide range, including a large amount of internet text, specialized books, code, etc. In addition, an AI assistant called Qwen-Chat has been created based on the pre-trained model using alignment mechanism.\n",
    "\n",
    "This notebook uses Qwen-1.8B-Chat as an example to introduce how to QLoRA fine-tune the Qianwen model using Deepspeed.\n",
    "\n",
    "## Environment Requirements\n",
    "\n",
    "Please refer to **requirements.txt** to install the required dependencies.\n",
    "\n",
    "## Preparation\n",
    "\n",
    "### Download Qwen-1.8B-Chat\n",
    "\n",
    "First, download the model files. You can choose to download directly from ModelScope."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "248488f9-4a86-4f35-9d56-50f8e91a8f11",
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2023-12-31T08:42:52.842315Z",
     "iopub.status.busy": "2023-12-31T08:42:52.841665Z",
     "iopub.status.idle": "2023-12-31T08:44:19.832661Z",
     "shell.execute_reply": "2023-12-31T08:44:19.832193Z",
     "shell.execute_reply.started": "2023-12-31T08:42:52.842295Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from modelscope.hub.snapshot_download import snapshot_download\n",
    "model_dir = snapshot_download('Qwen/Qwen-1_8B-Chat-Int4', cache_dir='.', revision='master')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7b2a92b1-f08e-4413-9f92-8f23761e6e1f",
   "metadata": {},
   "source": [
    "### Download Example Training Data\n",
    "\n",
    "Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE).\n",
    "\n",
    "Disclaimer: the dataset can be only used for the research purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce195f08-fbb2-470e-b6c0-9a03457458c7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7226bed0-171b-4d45-a3f9-b3d81ec2bb9f",
   "metadata": {},
   "source": [
    "You can also refer to this format to prepare the dataset. Below is a simple example list with 1 sample:\n",
    "\n",
    "```json\n",
    "[\n",
    "  {\n",
    "    \"id\": \"identity_0\",\n",
    "    \"conversations\": [\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"你好\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"我是一个语言模型，我叫通义千问。\"\n",
    "      }\n",
    "    ]\n",
    "  }\n",
    "]\n",
    "```\n",
    "\n",
    "You can also use multi-turn conversations as the training set. Here is a simple example:\n",
    "\n",
    "```json\n",
    "[\n",
    "  {\n",
    "    \"id\": \"identity_0\",\n",
    "    \"conversations\": [\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"你好，能告诉我遛狗的最佳时间吗？\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"当地最佳遛狗时间因地域差异而异，请问您所在的城市是哪里？\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"我在纽约市。\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"纽约市的遛狗最佳时间通常在早晨6点至8点和晚上8点至10点之间，因为这些时间段气温较低，遛狗更加舒适。但具体时间还需根据气候、气温和季节变化而定。\"\n",
    "      }\n",
    "    ]\n",
    "  }\n",
    "]\n",
    "```\n",
    "\n",
    "## Fine-Tune the Model\n",
    "\n",
    "You can directly run the prepared training script to fine-tune the model. **nproc_per_node** refers to the number of GPUs used fro training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ab0581e-be85-45e6-a5b7-af9c42ea697b",
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2023-12-31T08:45:37.959631Z",
     "iopub.status.busy": "2023-12-31T08:45:37.958961Z",
     "iopub.status.idle": "2023-12-31T08:46:19.501657Z",
     "shell.execute_reply": "2023-12-31T08:46:19.500854Z",
     "shell.execute_reply.started": "2023-12-31T08:45:37.959609Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!torchrun --nproc_per_node 2 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6601 ../../finetune.py \\\n",
    "    --model_name_or_path \"Qwen/Qwen-1_8B-Chat-Int4/\" \\\n",
    "    --data_path \"Belle_sampled_qwen.json\" \\\n",
    "    --bf16 True \\\n",
    "    --output_dir \"output_qwen\" \\\n",
    "    --num_train_epochs 5 \\\n",
    "    --per_device_train_batch_size 1 \\\n",
    "    --per_device_eval_batch_size 1 \\\n",
    "    --gradient_accumulation_steps 16 \\\n",
    "    --evaluation_strategy \"no\" \\\n",
    "    --save_strategy \"steps\" \\\n",
    "    --save_steps 1000 \\\n",
    "    --save_total_limit 10 \\\n",
    "    --learning_rate 1e-5 \\\n",
    "    --weight_decay 0.1 \\\n",
    "    --adam_beta2 0.95 \\\n",
    "    --warmup_ratio 0.01 \\\n",
    "    --lr_scheduler_type \"cosine\" \\\n",
    "    --logging_steps 1 \\\n",
    "    --report_to \"none\" \\\n",
    "    --model_max_length 512 \\\n",
    "    --gradient_checkpointing True \\\n",
    "    --lazy_preprocess True \\\n",
    "    --deepspeed \"../../finetune/ds_config_zero2.json\" \\\n",
    "    --use_lora \\\n",
    "    --q_lora"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Merge Weights\n",
    "\n",
    "The training of both LoRA and Q-LoRA only saves the adapter parameters. Note that you can not merge weights into quantized models. Instead, we can merge the weights based on the original chat model.\n",
    "\n",
    "You can load the fine-tuned model and merge weights as shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from modelscope.hub.snapshot_download import snapshot_download\n",
    "snapshot_download('Qwen/Qwen-1_8B-Chat', cache_dir='.', revision='master')\n",
    "\n",
    "from transformers import AutoModelForCausalLM\n",
    "from peft import PeftModel\n",
    "import torch\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-1_8B-Chat/\", torch_dtype=torch.float16, device_map=\"auto\", trust_remote_code=True)\n",
    "model = PeftModel.from_pretrained(model, \"output_qwen/\")\n",
    "merged_model = model.merge_and_unload()\n",
    "merged_model.save_pretrained(\"output_qwen_merged\", max_shard_size=\"2048MB\", safe_serialization=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tokenizer files are not saved in the new directory in this step. You can copy the tokenizer files or use the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "    \"Qwen/Qwen-1_8B-Chat-Int4/\",\n",
    "    trust_remote_code=True\n",
    ")\n",
    "\n",
    "tokenizer.save_pretrained(\"output_qwen_merged\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test the Model\n",
    "\n",
    "After merging the weights, we can test the model as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "from transformers.generation import GenerationConfig\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"output_qwen_merged\", trust_remote_code=True)\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"output_qwen_merged\",\n",
    "    device_map=\"auto\",\n",
    "    trust_remote_code=True\n",
    ").eval()\n",
    "\n",
    "response, history = model.chat(tokenizer, \"你好\", history=None)\n",
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}