From da5b44f9343cc725535d36722bfef1162d2c2612 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:36:02 +0800
Subject: [PATCH 01/12] README.md: install vLLM from PyPI instead of vllm-gptq

---
 README.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

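Note: after this change the README offers the one-line `pip install vllm` only for CUDA 12.1 and PyTorch 2.1. Below is a minimal sketch for checking those versions locally before installing. It assumes `python3` is on `PATH`, and `report_torch_env` is a hypothetical helper that is not part of this repository.

```shell
# Hypothetical helper: print the local PyTorch and CUDA versions so they can be
# compared with the CUDA 12.1 / PyTorch 2.1 condition stated in the README.
report_torch_env() {
  if python3 -c 'import torch' 2>/dev/null; then
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    python3 -c 'import torch; print("torch", torch.__version__, "cuda", torch.version.cuda)'
  else
    echo "PyTorch not found; see the official vLLM installation instructions"
  fi
}

report_torch_env
```

If the printed versions differ from PyTorch 2.1 and CUDA 12.1, the README points to the official vLLM installation instructions.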
diff --git a/README.md b/README.md
index d3f19ce..2b132d1 100644
--- a/README.md
+++ b/README.md
@@ -889,18 +889,13 @@ The statistics are listed below:
 
 For deployment and fast inference, we suggest using vLLM. 
 
-If you use cuda 12.1 and pytorch 2.1, you can directly use the following command to install vLLM.
+If you use **CUDA 12.1 and PyTorch 2.1**, you can directly use the following command to install vLLM.
 
 ```bash
-# pip install vllm  # This line is faster but it does not support quantization models.
-
-# The below lines support int4 quantization (int8 will be supported soon). The installation are slower (~10 minutes).
-git clone https://github.com/QwenLM/vllm-gptq
-cd vllm-gptq
-pip install -e .
+pip install vllm
 ```
 
-Otherwise, please refer to the official vLLM [Installation Instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html), or our [vLLM repo for GPTQ quantization](https://github.com/QwenLM/vllm-gptq).
+Otherwise, please refer to the official vLLM [Installation Instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html).
 
 #### vLLM + Transformer-like Wrapper
 

From 5ff919d6f0dd52c1b308b544a1935b502d484b6f Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:37:18 +0800
Subject: [PATCH 02/12] README_CN.md: install vLLM from PyPI instead of vllm-gptq

---
 README_CN.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/README_CN.md b/README_CN.md
index 77963d5..1f2fa5d 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -879,18 +879,13 @@ print(response)
 ### vLLM
 如希望部署及加速推理，我们建议你使用vLLM。
 
-如果你使用cuda12.1和pytorch2.1，可以直接使用以下命令安装vLLM。
+如果你使用**CUDA 12.1和PyTorch 2.1**，可以直接使用以下命令安装vLLM。
 
 ```bash
-# pip install vllm  # 该方法安装较快，但官方版本不支持量化模型
-
-# 下面方法支持int4量化 (int8量化模型支持将近期更新)，但安装更慢 (约~10分钟)。
-git clone https://github.com/QwenLM/vllm-gptq
-cd vllm-gptq
-pip install -e .
+# pip install vllm
 ```
 
-否则请参考vLLM官方的[安装说明](https://docs.vllm.ai/en/latest/getting_started/installation.html)，或者安装我们[vLLM分支仓库](https://github.com/QwenLM/vllm-gptq)。
+否则请参考vLLM官方的[安装说明](https://docs.vllm.ai/en/latest/getting_started/installation.html)。
 
 #### vLLM + 类Transformer接口
 

From 490f7480e0f73608b86d8f84255e771f1e58c9a4 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:38:40 +0800
Subject: [PATCH 03/12] README_JA.md: install vLLM from PyPI instead of vllm-gptq

---
 README_JA.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/README_JA.md b/README_JA.md
index c4a939e..a955213 100644
--- a/README_JA.md
+++ b/README_JA.md
@@ -782,17 +782,12 @@ Qwen-72B については、2 つの方法で実験します。1) 4 つの A100-S
 ### vLLM 
 デプロイメントと高速推論のためには、vLLMを使用することをお勧めします。
 
-cuda 12.1 および pytorch 2.1 を使用している場合は、次のコマンドを直接使用して vLLM をインストールできます。
+**CUDA 12.1** および **PyTorch 2.1** を使用している場合は、次のコマンドを直接使用して vLLM をインストールできます。
 ```bash
-# pip install vllm  # この行はより速いですが、量子化モデルをサポートしていません。
-
-# 以下のはINT4の量子化をサポートします（INT8はまもなくサポートされます）。 インストールは遅くなります（〜10分）。
-git clone https://github.com/QwenLM/vllm-gptq
-cd vllm-gptq
-pip install -e .
+pip install vllm
 ```
 
-それ以外の場合は、公式 vLLM [インストール手順](https://docs.vllm.ai/en/latest/getting_started/installation.html) 、または[GPTQの量子化 vLLM レポ](https://github.com/QwenLM/vllm-gptq)を参照してください。
+それ以外の場合は、公式 vLLM [インストール手順](https://docs.vllm.ai/en/latest/getting_started/installation.html) を参照してください。
 
 #### vLLM + Transformer Wrapper
 

From ae78a97db9c42b6d069a75b4bda654856c717753 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:41:39 +0800
Subject: [PATCH 04/12] README_CN.md: uncomment the vLLM pip install command

---
 README_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_CN.md b/README_CN.md
index 1f2fa5d..92d74e7 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -882,7 +882,7 @@ print(response)
 如果你使用**CUDA 12.1和PyTorch 2.1**，可以直接使用以下命令安装vLLM。
 
 ```bash
-# pip install vllm
+pip install vllm
 ```
 
 否则请参考vLLM官方的[安装说明](https://docs.vllm.ai/en/latest/getting_started/installation.html)。

From 43656ed69c414a35314db6476ddebbbae0212a30 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:44:25 +0800
Subject: [PATCH 05/12] requirements.txt: relax transformers pin to >=4.32.0

---
 requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index 7351c4b..1d20fd9 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-transformers==4.32.0
+transformers>=4.32.0
 accelerate
 tiktoken
 einops

From d330013599d59ed4490465d94d382c434b18a331 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:45:53 +0800
Subject: [PATCH 06/12] requirements.txt: cap transformers below 4.38.0

---
 requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

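Note: the new specifier `transformers>=4.32.0,<4.38.0` admits every release from 4.32.0 up to, but not including, 4.38.0. The sketch below makes those boundaries explicit. `ver_cmp` and `satisfies_pin` are hypothetical helpers, and the comparison assumes plain numeric `X.Y.Z` versions; it ignores PEP 440 pre-release, dev, post, and local tags, which pip itself does handle.

```shell
# Hypothetical helper: compare two dotted numeric versions and print -1, 0, or 1.
# Assumption: release segments only (e.g. 4.37.2); PEP 440 suffixes are not handled.
ver_cmp() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "[.]"); nb = split(b, y, "[.]")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing segments count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) { print "-1"; exit }
      if (xi > yi) { print "1"; exit }
    }
    print "0"
  }'
}

# Hypothetical helper: succeed if $1 satisfies transformers>=4.32.0,<4.38.0,
# the specifier this patch writes to requirements.txt.
satisfies_pin() {
  [ "$(ver_cmp "$1" 4.32.0)" -ge 0 ] && [ "$(ver_cmp "$1" 4.38.0)" -lt 0 ]
}

for v in 4.31.0 4.32.0 4.37.2 4.38.0; do
  if satisfies_pin "$v"; then echo "$v: allowed"; else echo "$v: rejected"; fi
done
```

In practice `pip install -r requirements.txt` enforces the range directly; the helper only illustrates where the lower and upper bounds fall. When typing the specifier on a command line, quote it (`pip install "transformers>=4.32.0,<4.38.0"`) so the shell does not treat `>` and `<` as redirections.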
diff --git a/requirements.txt b/requirements.txt
index 1d20fd9..9c35591 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-transformers>=4.32.0
+transformers>=4.32.0,<4.38.0
 accelerate
 tiktoken
 einops

From 79e166e7b7cb61cd4837c4ce657db41847258fd9 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:50:23 +0800
Subject: [PATCH 07/12] README.md: pin peft below 0.8.0 in the finetuning install command

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2b132d1..dd5c64d 100644
--- a/README.md
+++ b/README.md
@@ -620,7 +620,7 @@ We also measure the inference speed and GPU memory usage with different settings
 ### Usage
 Now we provide the official training script, `finetune.py`, for users to finetune the pretrained model for downstream applications in a simple fashion. Additionally, we provide shell scripts to launch finetuning with no worries. This script supports the training with [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/). The shell scripts that we provide use DeepSpeed (Note: this may have conflicts with the latest version of pydantic and you should use make sure `pydantic<2.0`) and Peft. You can install them by:
 ```bash
-pip install peft deepspeed
+pip install peft<0.8.0 deepspeed
 ```
 
 To prepare your training data, you need to put all the samples into a list and save it to a json file. Each sample is a dictionary consisting of an id and a list for conversation. Below is a simple example list with 1 sample:

From 1c5691dc51ed4f85fe42c850af1d332b2392e4b2 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:50:48 +0800
Subject: [PATCH 08/12] README_CN.md: pin peft below 0.8.0 in the finetuning install command

---
 README_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_CN.md b/README_CN.md
index 92d74e7..fef3db1 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -611,7 +611,7 @@ model = AutoModelForCausalLM.from_pretrained(
 ### 使用方法
 我们提供了`finetune.py`这个脚本供用户实现在自己的数据上进行微调的功能，以接入下游任务。此外，我们还提供了shell脚本减少用户的工作量。这个脚本支持 [DeepSpeed](https://github.com/microsoft/DeepSpeed) 和 [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/) 。我们提供的shell脚本使用了DeepSpeed，因此建议您确保已经安装DeepSpeed和Peft（注意：DeepSpeed可能不兼容最新的pydantic版本，请确保`pydantic<2.0`）。你可以使用如下命令安装：
 ```bash
-pip install peft deepspeed
+pip install "peft<0.8.0" deepspeed
 ```
 
 首先，你需要准备你的训练数据。你需要将所有样本放到一个列表中并存入json文件中。每个样本对应一个字典，包含id和conversation，其中后者为一个列表。示例如下所示：

From d8c0abb0474f9634ca906783fc409d6e46399f6a Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:51:07 +0800
Subject: [PATCH 09/12] README.md: quote the peft version specifier for the shell

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

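Note: without quotes, the shell parses `<0.8.0` in `pip install peft<0.8.0 deepspeed` as an input redirection from a file named `0.8.0`, so pip never receives the version constraint. The sketch below demonstrates this in a scratch directory, with `echo` standing in for `pip` (an assumption made so it runs without network access).

```shell
# Work in an empty scratch directory so no file named 0.8.0 exists.
cd "$(mktemp -d)" || exit 1

# Unquoted: "<0.8.0" is an input redirection; opening the missing file fails,
# so the command (pip in the README, echo here) never runs.
if ! sh -c 'echo pip install peft<0.8.0 deepspeed' 2>/dev/null; then
  echo "unquoted: redirection failed, command not run"
fi

# Quoted: the specifier reaches the command intact as a single argument.
sh -c 'echo pip install "peft<0.8.0" deepspeed'
```

If a file named `0.8.0` did exist in the working directory, the unquoted form would instead run `pip install peft deepspeed` with that file as stdin, silently installing an unconstrained peft. Quoting avoids both failure modes.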
diff --git a/README.md b/README.md
index dd5c64d..3494ae6 100644
--- a/README.md
+++ b/README.md
@@ -620,7 +620,7 @@ We also measure the inference speed and GPU memory usage with different settings
 ### Usage
 Now we provide the official training script, `finetune.py`, for users to finetune the pretrained model for downstream applications in a simple fashion. Additionally, we provide shell scripts to launch finetuning with no worries. This script supports the training with [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/). The shell scripts that we provide use DeepSpeed (Note: this may have conflicts with the latest version of pydantic and you should use make sure `pydantic<2.0`) and Peft. You can install them by:
 ```bash
-pip install peft<0.8.0 deepspeed
+pip install "peft<0.8.0" deepspeed
 ```
 
 To prepare your training data, you need to put all the samples into a list and save it to a json file. Each sample is a dictionary consisting of an id and a list for conversation. Below is a simple example list with 1 sample:

From b4c8693001175c9ea02e372f1fbf9217ca3a26a8 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:51:39 +0800
Subject: [PATCH 10/12] README_ES.md: pin peft below 0.8.0 in the finetuning install command

---
 README_ES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_ES.md b/README_ES.md
index 1103e22..15d59c9 100644
--- a/README_ES.md
+++ b/README_ES.md
@@ -612,7 +612,7 @@ También medimos la velocidad de inferencia y el uso de memoria de la GPU con di
 ### Utilización
 Ahora proporcionamos el script de entrenamiento oficial, `finetune.py`, para que los usuarios puedan ajustar el modelo preentrenado para aplicaciones posteriores de forma sencilla. Además, proporcionamos scripts de shell para lanzar el ajuste fino sin preocupaciones. Este script soporta el entrenamiento con [DeepSpeed](https://github.com/microsoft/DeepSpeed) y [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/). Los shell scripts que proporcionamos utilizan DeepSpeed (Nota: esto puede tener conflictos con la última versión de pydantic y debe utilizar make sure `pydantic<2.0`) y Peft. Puede instalarlos de la siguiente manera:
 ```bash
-pip install peft deepspeed
+pip install "peft<0.8.0" deepspeed
 ```
 
 Para preparar tus datos de entrenamiento, necesitas poner todas las muestras en una lista y guardarla en un archivo json. Cada muestra es un diccionario que consiste en un id y una lista para la conversación. A continuación se muestra una lista de ejemplo simple con 1 muestra:

From f08a039d4638e1c4d8c72a7f2a07a6453332bef1 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:51:58 +0800
Subject: [PATCH 11/12] README_FR.md: pin peft below 0.8.0 in the finetuning install command

---
 README_FR.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_FR.md b/README_FR.md
index e323a76..b34aeca 100644
--- a/README_FR.md
+++ b/README_FR.md
@@ -614,7 +614,7 @@ Nous mesurons également la vitesse d'inférence et l'utilisation de la mémoire
 ### Utilisation
 Nous fournissons maintenant le script d'entraînement officiel, `finetune.py`, pour que les utilisateurs puissent ajuster le modèle pré-entraîné pour les applications en aval de manière simple. De plus, nous fournissons des scripts shell pour lancer le finetune sans soucis. Ce script prend en charge l'entraînement avec [DeepSpeed](https://github.com/microsoft/DeepSpeed) et [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/). Les scripts que nous fournissons utilisent DeepSpeed (Note : il peut y avoir des conflits avec la dernière version de pydantic et vous devriez utiliser make sure `pydantic<2.0`) et Peft. Vous pouvez les installer en procédant comme suit :
 ```bash
-pip install peft deepspeed
+pip install "peft<0.8.0" deepspeed
 ```
 
 Pour préparer vos données d'entraînement, vous devez rassembler tous les échantillons dans une liste et l'enregistrer dans un fichier json. Chaque échantillon est un dictionnaire composé d'un identifiant et d'une liste de conversation. Voici un exemple simple de liste avec 1 échantillon :

From 82d6256cdc4072e87d63a6a98f686cd4ab8cfef0 Mon Sep 17 00:00:00 2001
From: Ren Xuancheng <jklj077@users.noreply.github.com>
Date: Tue, 12 Mar 2024 15:53:01 +0800
Subject: [PATCH 12/12] README_JA.md: install peft<0.8.0 and deepspeed instead of requirements_finetune.txt

---
 README_JA.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_JA.md b/README_JA.md
index a955213..7663d97 100644
--- a/README_JA.md
+++ b/README_JA.md
@@ -608,7 +608,7 @@ BF16、Int8、および Int4 のモデルを使用して 2048 を生成する際
 ### 使用方法
 現在、公式のトレーニングスクリプト `finetune.py` を提供しています。さらに、finetune.pyのシェルスクリプトを提供し、finetune.pyを実行することで、finetune.pyを起動することができる。さらに、安心してファインチューニングを開始するためのシェルスクリプトも提供しています。このスクリプトは、[DeepSpeed](https://github.com/microsoft/DeepSpeed) (注意：これはpydanticの最新バージョンとコンフリクトする可能性があるので、`pydantic<2.0`にする必要があります) および [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/) を使用したトレーニングをサポートします。弊社が提供するシェル・スクリプトは DeepSpeed と Peft を使用するため、事前に DeepSpeed と Peft をインストールすることをお勧めします：
 ```bash
-pip install -r requirements_finetune.txt
+pip install "peft<0.8.0" deepspeed
 ```
 
 学習データを準備するには、すべてのサンプルをリストにまとめ、jsonファイルに保存する必要があります。各サンプルはidと会話リストで構成される辞書です。以下は1つのサンプルを含む単純なリストの例です：