{
"cells": [
{
"cell_type": "markdown",
"id": "0e7993c3-3999-4ac5-b1dc-77875d80e4c8",
"metadata": {},
"source": [
"# Fine-tuning Qwen-7B-Chat on Your Own Domain-Specific Data\n",
"\n",
"This notebook uses Qwen-7B-Chat as an example to introduce how to LoRA fine-tune the Qwen model on a specific domain.\n",
"\n",
"Qwen is a pretrained conversational model that supports English and Chinese. It is suitable for universal scenarios, but may lack some specialized knowledge in certain specific domain. If you would like to fine-tune it for a specific domain, or on your own private dataset, you can refer to this tutorial.\n",
"\n",
"Here is an example showing the differences before and after fine-tuning.\n",
"Download the data required for training; here, we provide a medical conversation dataset for training. It is sampled from [MedicalGPT repo](https://github.com/shibing624/MedicalGPT/) and we have converted this dataset into a format that can be used for fine-tuning.\n",
"\n",
"Disclaimer: the dataset can be only used for the research purpose."
"You can prepare your dataset in JSON format following the format below, and then modify the `--data_path` parameter in the training command to point to your JSON file.\n",
"\n",
"These data instances can be conversations in the real world or include domain knowledge QA pairs. Besides, fine-tuning allows Qwen-chat to play like some specific roles. As Qwen-chat is a dialogue model for general scenarios, your fine-tuning can customize a chatbot to meet your requirements.\n",
"\n",
"We recommend that you prepare 50~ data instances if you want to fine-tune Qwen-chat as a roleplay model.\n",
"\n",
"You may prepare much more data instances if you want to infuse the domain knowledge of your field into the model.\n",
"\n",
"In this tutorial, we have prepared a medical domain fine-tuning dataset consisting of 1000 data instancess as an example. You can refer to our example to fine-tune on your own domain-specific dataset.\n",
"\n",
"Below is a simple example list with 1 sample:\n",
"You can directly run the prepared training script to fine-tune the model. \n",
"\n",
"For parameter settings, you can modify `--model_name_or_path` to the location of the model you want to fine-tune, and set `--data_path` to the location of the dataset.\n",
"\n",
"You should remove the `--bf16` parameter if you are using a non-Ampere architecture GPU, such as a V100. \n",
"\n",
"For `--model_max_length` and `--per_device_train_batch_size`, we recommend the following configurations, ,you can refer to [this document](../../finetune/deepspeed/readme.md) for more details:\n",
"You can use our recommended saving parameters, or you can save by epoch by just setting `--save_strategy \"epoch\"` if you prefer to save at each epoch stage. `--save_total_limit` means the limit on the number of saved checkpoints.\n",
"\n",
"For other parameters, such as `--weight_decay` and `--adam_beta2`, we recommend using the values we provided blow.\n",
"\n",
"Setting the parameters `--gradient_checkpointing` and `--lazy_preprocess` is to save GPU memory.\n",
"\n",
"The parameters for the trained Lora module will be saved in the **output_qwen** folder."
"The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to \"AutoModelForCausalLM.from_pretrained\".\n",
"Try importing flash-attention for faster inference...\n",
"Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm\n",