Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud.
In general, Qwen-7B outperforms baseline models of a similar size, and even outperforms larger models of around 13B parameters, on a series of benchmark datasets (e.g., MMLU, C-Eval, GSM8K, HumanEval, and WMT22) that evaluate the models' capabilities in natural language understanding, mathematical problem solving, coding, and more. See the results below.
Now you can start with ModelScope or Transformers.
#### 🤗 Transformers
To use Qwen-7B-Chat for inference, all you need to do is input a few lines of code as demonstrated below. However, **please make sure that you are using the latest code.**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
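# A minimal sketch of loading the chat model and running one turn of chat;
# `model.chat` is the chat interface exposed by Qwen's remote code
# (hence trust_remote_code=True), and the model ID is the official Hub name.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "Hello! Who are you?", history=None)
print(response)
```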
We provide examples to show how to load models in `NF4` and `Int8`. For starters, note the requirements of `bitsandbytes`:
```
**Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
```
Then run the following command to install `bitsandbytes`:
```
pip install bitsandbytes
```
Windows users should look for an alternative option, such as [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).
Then you only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained`. See the example below:
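Below is a minimal sketch of such a configuration, assuming the `BitsAndBytesConfig` API from `transformers` (the quantization settings shown are common NF4/Int8 choices, not an officially published configuration):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 (4-bit) quantization; use load_in_8bit=True instead for Int8.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
).eval()
```

Switching between `NF4` and `Int8` only changes the quantization arguments; the rest of the loading code stays the same.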
With this method, you can load Qwen-7B in `NF4` or `Int8`, which reduces memory usage. We provide the related statistics of model performance below. We find that quantization slightly degrades effectiveness but significantly increases inference efficiency and reduces memory costs.
| Precision | MMLU | Memory |
| :---------: | :------: | :------: |
| BF16 | 56.7 | 16.2G |
| Int8 | 52.8 | 10.1G |
| NF4 | 48.9 | 7.4G |
## Demo
### CLI Demo
We provide a CLI demo example in `cli_demo.py`, which supports streaming output of the generation. Users can interact with Qwen-7B-Chat by entering prompts, and the model returns its outputs in streaming mode. Run the command below:
```
python cli_demo.py
```
### Web UI
We provide code for users to build a web UI demo (thanks to @wysaid). Before you start, make sure you install the following packages:
```
pip install gradio mdtex2html
```
Then run the command below and click on the generated link:
```
python web_demo.py
```
## Tool Usage
Qwen-7B-Chat is specifically optimized for tool usage, including APIs, databases, and models, so that users can build their own Qwen-7B-based LangChain applications, agents, and code interpreters. In our evaluation [benchmark](eval/EVALUATION.md) for assessing tool usage capabilities, we find that Qwen-7B reaches stable performance.
For how to write and use prompts for ReAct Prompting, please refer to [the ReAct examples](examples/react_prompt.md). Using tools enables the model to perform tasks more effectively.
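As a rough illustration only (the authoritative template lives in the linked examples), a ReAct-style prompt interleaves `Thought`, `Action`, `Action Input`, and `Observation` steps; the tool name and question below are hypothetical:

```python
# Hypothetical tool ("search") and question, for illustration only;
# see examples/react_prompt.md for the template actually used with Qwen-7B-Chat.
react_prompt = """Answer the following questions as best you can. You have access to the following tools:

search: useful for looking up facts. Input: a search query.

Use the following format:

Question: the input question you must answer
Thought: think about what to do next
Action: the tool to use, one of [search]
Action Input: the input to the tool
Observation: the result of the tool
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: Who won the 2022 FIFA World Cup?
"""
```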
Additionally, we provide experimental results demonstrating the model's capabilities of acting as an agent. See [Hugging Face Agent](https://huggingface.co/docs/transformers/transformers_agents) for more information. Its performance on the run-mode benchmark provided by Hugging Face is as follows:
Researchers and developers are free to use the codes and model weights of both Qwen-7B and Qwen-7B-Chat.
## Contact Us
If you are interested in leaving a message to either our research team or product team, feel free to send an email to qianwen_opensource@alibabacloud.com.