From 783b0538e840049d5df67ef251d66a1522859cde Mon Sep 17 00:00:00 2001
From: JustinLin610
Date: Thu, 10 Aug 2023 11:29:03 +0800
Subject: [PATCH 1/3] add faq files

---
 FAQ.md       | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 FAQ_zh.md    | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md    |  6 ++++-
 README_CN.md |  7 ++++-
 README_JA.md |  6 ++++-
 5 files changed, 168 insertions(+), 3 deletions(-)
 create mode 100644 FAQ.md
 create mode 100644 FAQ_zh.md

diff --git a/FAQ.md b/FAQ.md
new file mode 100644
index 0000000..b4ffe0e
--- /dev/null
+++ b/FAQ.md
@@ -0,0 +1,76 @@
+# FAQ
+
+## Installation & Environment
+
+#### Failure in installing flash attention
+
+Flash attention is an option for accelerating training and inference. Only NVIDIA GPUs of the Turing, Ampere, Ada, and Hopper architectures, e.g., H100, A100, RTX 3090, T4, and RTX 2080, support flash attention. You can use our models without installing it.
+
+#### Which version of transformers should I use?
+
+4.31.0 is preferred.
+
+#### I downloaded the codes and checkpoints but I can't load the model locally. What should I do?
+
+Please check if you have updated the code to the latest version, and correctly downloaded all the sharded checkpoint files.
+
+#### `qwen.tiktoken` is not found. What is it?
+
+This is the merge file of the tokenizer. You have to download it. Note that if you just git clone the repo without [git-lfs](https://git-lfs.com), you cannot download this file.
+
+#### transformers_stream_generator/tiktoken/accelerate not found
+
+Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
+
+## Demo & Inference
+
+#### Is there any demo? CLI demo and Web UI demo?
+
+Yes, see `web_demo.py` for the web UI demo and `cli_demo.py` for the CLI demo. See the README for more information.
+
+#### Can I use CPU only?
+
+Yes, running `python cli_demo.py --cpu_only` will load the model and run inference on the CPU only.
+
+#### Can Qwen support streaming?
+
+Yes. See the function `chat_stream` in `modeling_qwen.py`.
+
+#### Gibberish in result when using `chat_stream()`
+
+This is because tokens represent bytes, so a single token may be a meaningless string until it is decoded together with its neighbors. We have updated the default setting of our tokenizer to avoid such decoding results. Please update the code to the latest version.
+
+#### It seems that the generation is not related to the instruction...
+
+Please check if you are loading Qwen-7B-Chat instead of Qwen-7B. Qwen-7B is the base model without alignment, which behaves differently from the SFT/Chat model.
+
+#### Is quantization supported?
+
+Yes, quantization is supported via `bitsandbytes`. We are working on an improved version and will release the quantized model checkpoints.
+
+#### Errors in running quantized models: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
+
+For Linux users, running `pip install bitsandbytes` directly solves the problem. For Windows users, run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui` instead.
+
+#### Slow when processing long sequences
+
+We have solved this problem. Updating the code to the latest version can help.
+
+#### Unsatisfactory performance in processing long sequences
+
+Please ensure that NTK is applied. `use_dynamic_ntk` and `use_logn_attn` in `config.json` should be set to `true` (`true` by default).
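+
+As a quick sanity check that ties the answers above together, the following is a minimal sketch of loading the chat model locally and streaming a reply. It assumes the sharded Qwen-7B-Chat checkpoint has been fully downloaded (replace the model ID with your local path if needed), and that `chat_stream` yields progressively longer partial responses, as implemented in `modeling_qwen.py`:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# trust_remote_code is required because Qwen ships its own modeling and tokenizer code.
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
+).eval()
+
+# Each iteration receives the partial response decoded so far.
+for partial_response in model.chat_stream(tokenizer, "Hello!", history=None):
+    print(partial_response)
+```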
+
+## Finetuning
+
+#### Can Qwen support SFT or even RLHF?
+
+We do not provide finetuning or RLHF codes for now. However, some projects have supported finetuning, see [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc. We will soon update the relevant codes.
+
+## Tokenizer
+
+#### bos_id/eos_id/pad_id not found
+
+In our training, we only use `<|endoftext|>` as the separator and padding token. You can set bos_id, eos_id, and pad_id to `tokenizer.eod_id`. Learn more from our documentation on the tokenizer.
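+
+For example, a minimal sketch (`Qwen/Qwen-7B` stands in for your checkpoint path; `eod_id` is the id of `<|endoftext|>`):
+
+```python
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
+
+# Qwen defines no separate bos/eos/pad tokens; wherever a framework
+# asks for these ids, pass the <|endoftext|> id instead.
+bos_id = eos_id = pad_id = tokenizer.eod_id
+print(bos_id, eos_id, pad_id)
+```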
+
diff --git a/FAQ_zh.md b/FAQ_zh.md
new file mode 100644
index 0000000..23205fd
--- /dev/null
+++ b/FAQ_zh.md
@@ -0,0 +1,76 @@
+# FAQ
+
+## 安装&环境
+
+#### flash attention 安装失败
+
+flash attention是一个用于加速模型训练推理的可选项,且仅适用于Turing、Ampere、Ada、Hopper架构的NVIDIA GPU显卡(如H100、A100、RTX 3090、T4、RTX 2080),您可以在不安装flash attention的情况下正常使用模型进行推理。
+
+#### 我应该用哪个transformers版本?
+
+建议使用4.31.0。
+
+#### 我把模型和代码下到本地,按照教程无法使用,该怎么办?
+
+别着急,先检查你的代码是否已更新到最新版本,然后确认你是否完整地将模型checkpoint下载到本地。
+
+#### `qwen.tiktoken`这个文件找不到,怎么办?
+
+这个是我们的tokenizer的merge文件,你必须下载它才能使用我们的tokenizer。注意,如果你使用git clone却没有使用git-lfs,这个文件不会被下载。如果你不了解git-lfs,可点击[官网](https://git-lfs.com/)了解。
+
+#### transformers_stream_generator/tiktoken/accelerate,这几个库提示找不到,怎么办?
+
+运行如下命令:`pip install -r requirements.txt`。相关依赖库在[https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt) 可以找到。
+
+## Demo & 推理
+
+#### 是否提供Demo?CLI Demo及Web UI Demo?
+
+`web_demo.py`和`cli_demo.py`分别提供了Web UI以及CLI的Demo。请查看README相关内容了解更多。
+
+#### 我没有GPU,只用CPU运行CLI demo可以吗?
+
+可以的,运行`python cli_demo.py --cpu_only`命令即可将模型读取到CPU并使用CPU进行推理。
+
+#### Qwen支持流式推理吗?
+
+Qwen当前支持流式推理。见位于`modeling_qwen.py`的`chat_stream`函数。
+
+#### 使用`chat_stream()`生成混乱的内容及乱码,为什么?
+
+这是由于模型生成过程中输出的部分token需要与后续token一起解码才能输出正常文本,单个token的解码结果可能是无意义的字符串。我们已经更新了tokenizer解码时的默认设置,避免这些字符串在生成结果中出现,如果仍有类似问题请更新模型至最新版本。
+
+#### 模型的输出看起来与输入无关/没有遵循指令/看起来呆呆的
+
+请检查是否加载的是Qwen-7B-Chat模型进行推理。Qwen-7B模型是未经align的预训练基模型,不期望具备响应用户指令的能力。我们在模型最新版本已经在`chat`及`chat_stream`接口内进行了检查,避免您误将预训练模型作为SFT/Chat模型使用。
+
+#### 是否有量化版本模型
+
+目前Qwen支持基于`bitsandbytes`的8-bit和4-bit的量化推理。后续我们将进一步更新提供更加高效的量化推理实现,并提供对应的量化模型。
+
+#### 运行量化推理报错:`importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
+
+对于Linux用户,直接运行`pip install bitsandbytes`即可。对于Windows用户,可运行`python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`。
+
+#### 生成序列较长后速度显著变慢
+
+这一问题已经在最新版本中修复。请更新到最新代码。
+
+#### 处理长序列时效果有问题**
+
+请确认是否开启NTK。若要启用这些技巧,请将`config.json`里的`use_dynamic_ntk`和`use_logn_attn`设置为`true`。最新代码默认为`true`。
+
+
+## 微调
+
+#### 当前是否支持SFT和RLHF?
+
+我们目前未提供SFT和RLHF代码。当前有多个外部项目已实现支持,如[FastChat](https://github.com/lm-sys/FastChat)、[Firefly](https://github.com/yangjianxin1/Firefly)、[LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning)等。我们会尽快更新这部分代码和说明。
+
+
+## Tokenizer
+
+#### bos_id/eos_id/pad_id,这些token id不存在,为什么?
+
+在训练过程中,我们仅使用`<|endoftext|>`这一token作为sample/document之间的分隔符及padding位置占位符,你可以将bos_id、eos_id、pad_id均指向`tokenizer.eod_id`。请阅读我们关于tokenizer的文档,了解如何设置这些id。
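+
+例如,可参考如下最小示例(`Qwen/Qwen-7B`可替换为你的本地checkpoint路径;`eod_id`即`<|endoftext|>`对应的id):
+
+```python
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
+
+# Qwen本身不单独定义bos/eos/pad token,框架需要这些id时统一传入<|endoftext|>的id即可
+bos_id = eos_id = pad_id = tokenizer.eod_id
+print(bos_id, eos_id, pad_id)
+```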
+
diff --git a/README.md b/README.md
index c7a819b..a5803d7 100644
--- a/README.md
+++ b/README.md
@@ -311,9 +311,13 @@ To extend the context length and break the bottleneck of training sequence lengt
 
 For your reproduction of the model performance on benchmark datasets, we provide scripts for you to reproduce the results. Check [eval/EVALUATION.md](eval/EVALUATION.md) for more information. Note that the reproduction may lead to slight differences from our reported results.
 
+## FAQ
+
+If you meet problems, please refer to the [FAQ](FAQ.md) and the existing issues first to search for a solution before you launch a new issue.
+
 ## License Agreement
 
-Researchers and developers are free to use the codes and model weights of both Qwen-7B and Qwen-7B-Chat. We also allow their commercial use. Check our license at [LICENSE](LICENSE) for more details.
+Researchers and developers are free to use the codes and model weights of both Qwen-7B and Qwen-7B-Chat. We also allow their commercial use. Check our license at [LICENSE](LICENSE) for more details. If you have requirements for commercial use, please fill out the [form](https://dashscope.console.aliyun.com/openModelApply/qianwen) to apply.
 
 ## Contact Us

diff --git a/README_CN.md b/README_CN.md
index 39291b3..126d101 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -316,9 +316,14 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct
 
 我们提供了评测脚本以供复现我们的实验结果。注意,由于内部代码和开源代码存在少许差异,评测结果可能与汇报结果存在细微不一致。请阅读[eval/EVALUATION.md](eval/EVALUATION.md)了解更多信息。
 
+## FAQ
+
+如遇到问题,敬请查阅[FAQ](FAQ_zh.md)以及issue区,如仍无法解决再提交issue。
+
+
 ## 使用协议
 
-研究人员与开发者可使用Qwen-7B和Qwen-7B-Chat或进行二次开发。我们同样允许商业使用,具体细节请查看[LICENSE](LICENSE)。
+研究人员与开发者可使用Qwen-7B和Qwen-7B-Chat或进行二次开发。我们同样允许商业使用,具体细节请查看[LICENSE](LICENSE)。如需商用,请填写[问卷](https://dashscope.console.aliyun.com/openModelApply/qianwen)申请。
 
 ## 联系我们

diff --git a/README_JA.md b/README_JA.md
index d5c0e2b..fdf3101 100644
--- a/README_JA.md
+++ b/README_JA.md
@@ -320,9 +320,13 @@ ReAct プロンプトの書き方や使い方については、[ReAct の例](ex
 
 ベンチマークデータセットでのモデル性能の再現のために、結果を再現するスクリプトを提供しています。詳しくは [eval/EVALUATION.md](eval/EVALUATION.md) を確認してください。なお、再現の結果は、我々の報告結果と若干異なる場合があります。
 
+## FAQ
+
+問題が発生した場合は、[FAQ](FAQ.md)やissueを参照し、新しいissueを立ち上げる前に解決策を探してください。
+
 ## ライセンス契約
 
-Qwen-7B と Qwen-7B-Chat のコードとモデルウェイトは、研究者や開発者が自由に使用することができます。また、商用利用も可能です。詳しくは [LICENSE](LICENSE) をご覧ください。
+Qwen-7B と Qwen-7B-Chat のコードとモデルウェイトは、研究者や開発者が自由に使用することができます。また、商用利用も可能です。詳しくは [LICENSE](LICENSE) をご覧ください。商用利用を希望される方は、[リクエストフォーム](https://dashscope.console.aliyun.com/openModelApply/qianwen)に必要事項をご記入の上、お申し込みください。
 
 ## お問い合わせ

From a380678a004feb5e2beb260aaeed54ca319b44e6 Mon Sep 17 00:00:00 2001
From: JustinLin610
Date: Thu, 10 Aug 2023 11:36:56 +0800
Subject: [PATCH 2/3] update faq

---
 FAQ.md    | 9 +++++++++
 FAQ_zh.md | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/FAQ.md b/FAQ.md
index b4ffe0e..c645286 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -21,6 +21,9 @@ This is the merge file of the tokenizer. You have to download it. Note that if y
 
 #### transformers_stream_generator/tiktoken/accelerate not found
 
 Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
+<br><br>
+
+
 
 ## Demo & Inference
 
@@ -61,12 +64,18 @@ We solved this problem. Updating the code to the latest version can help.
 
 #### Unsatisfactory performance in processing long sequences
 
 Please ensure that NTK is applied. `use_dynamic_ntk` and `use_logn_attn` in `config.json` should be set to `true` (`true` by default).
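+
+A quick way to verify the two flags is the minimal sketch below (the checkpoint directory `Qwen-7B-Chat` is an assumed local path; point it at the `config.json` of your downloaded checkpoint):
+
+```python
+import json
+
+# Both flags default to true; older checkpoints may need them switched on by hand.
+with open("Qwen-7B-Chat/config.json") as f:
+    config = json.load(f)
+
+print(config.get("use_dynamic_ntk"), config.get("use_logn_attn"))
+```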
+<br><br>
+
+
 
 ## Finetuning
 
 #### Can Qwen support SFT or even RLHF?
 
 We do not provide finetuning or RLHF codes for now. However, some projects have supported finetuning, see [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc. We will soon update the relevant codes.
+<br><br>
+
+
 
 ## Tokenizer

diff --git a/FAQ_zh.md b/FAQ_zh.md
index 23205fd..de29bec 100644
--- a/FAQ_zh.md
+++ b/FAQ_zh.md
@@ -21,6 +21,8 @@ flash attention是一个用于加速模型训练推理的可选项,且仅适
 
 #### transformers_stream_generator/tiktoken/accelerate,这几个库提示找不到,怎么办?
 
 运行如下命令:`pip install -r requirements.txt`。相关依赖库在[https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt) 可以找到。
+<br><br>
+
 
 ## Demo & 推理
@@ -59,6 +61,7 @@ Qwen当前支持流式推理。见位于`modeling_qwen.py`的`chat_stream`函数
 
 #### 处理长序列时效果有问题**
 
 请确认是否开启NTK。若要启用这些技巧,请将`config.json`里的`use_dynamic_ntk`和`use_logn_attn`设置为`true`。最新代码默认为`true`。
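+
+可用如下示意代码快速检查这两个开关(`Qwen-7B-Chat`为假设的本地checkpoint目录,请替换为实际路径):
+
+```python
+import json
+
+# 两个开关默认均为true;旧版本checkpoint可能需要手动开启
+with open("Qwen-7B-Chat/config.json") as f:
+    config = json.load(f)
+
+print(config.get("use_dynamic_ntk"), config.get("use_logn_attn"))
+```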
+<br><br>
 
 ## 微调
 
 #### 当前是否支持SFT和RLHF?
 
 我们目前未提供SFT和RLHF代码。当前有多个外部项目已实现支持,如[FastChat](https://github.com/lm-sys/FastChat)、[Firefly](https://github.com/yangjianxin1/Firefly)、[LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning)等。我们会尽快更新这部分代码和说明。
+<br><br>
 
 ## Tokenizer

From 81829da0d0787bb3c6f6dd571abcfda50440393a Mon Sep 17 00:00:00 2001
From: Yang An
Date: Thu, 10 Aug 2023 12:13:36 +0800
Subject: [PATCH 3/3] Update FAQ_zh.md

---
 FAQ_zh.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FAQ_zh.md b/FAQ_zh.md
index de29bec..174ae69 100644
--- a/FAQ_zh.md
+++ b/FAQ_zh.md
@@ -58,7 +58,7 @@ Qwen当前支持流式推理。见位于`modeling_qwen.py`的`chat_stream`函数
 
 这一问题已经在最新版本中修复。请更新到最新代码。
 
-#### 处理长序列时效果有问题**
+#### 处理长序列时效果有问题
 
 请确认是否开启NTK。若要启用这些技巧,请将`config.json`里的`use_dynamic_ntk`和`use_logn_attn`设置为`true`。最新代码默认为`true`。