Merge pull request #144 from QwenLM/add_faq

add faq files
Yang An committed by GitHub
commit 8af13b9706

@ -0,0 +1,85 @@
# FAQ
## Installation & Environment
#### Failure in installing flash attention
Flash attention is an option for accelerating training and inference. It is supported only on NVIDIA GPUs with the Turing, Ampere, Ada, or Hopper architecture (e.g., H100, A100, RTX 3090, T4, RTX 2080). You can use our models without installing it.
#### Which version of transformers should I use?
4.31.0 is preferred.
#### I downloaded the codes and checkpoints but I can't load the model locally. What should I do?
Please check whether you have updated the code to the latest version and correctly downloaded all the sharded checkpoint files.
#### `qwen.tiktoken` is not found. What is it?
This is the merge file of the tokenizer, and you have to download it. Note that if you just git clone the repo without [git-lfs](https://git-lfs.com), this file will not be downloaded.
#### transformers_stream_generator/tiktoken/accelerate not found
Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
<br><br>
## Demo & Inference
#### Is there any demo? CLI demo and Web UI demo?
Yes, see `web_demo.py` for the web demo and `cli_demo.py` for the CLI demo. See the README for more information.
#### Can I use CPU only?
Yes, running `python cli_demo.py --cpu_only` will load the model and run inference on CPU only.
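For reference, a minimal sketch of what CPU-only loading looks like with `transformers` (roughly what `cli_demo.py --cpu_only` does for you; the `chat` call follows the usage shown in the README):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because Qwen ships custom modeling code
# alongside the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="cpu",          # keep all weights on the CPU
    trust_remote_code=True,
).eval()

response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)
```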
#### Can Qwen support streaming?
Yes. See the function `chat_stream` in `modeling_qwen.py`.
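A rough sketch of consuming the stream, assuming the model and tokenizer are loaded as in the CPU example above and that each yielded string is the full response generated so far:

```python
query = "Tell me about large language models."

# chat_stream is a generator; each yielded string is the response so far,
# so print only the newly generated suffix at each step.
printed = 0
for partial in model.chat_stream(tokenizer, query, history=None):
    print(partial[printed:], end="", flush=True)
    printed = len(partial)
print()
```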
#### Gibberish in results when using `chat_stream()`
This is because tokens represent bytes, and a single token may decode to a meaningless string. We have updated the default settings of our tokenizer to avoid such decoding results. Please update the code to the latest version.
#### It seems that the generation is not related to the instruction...
Please check if you are loading Qwen-7B-Chat instead of Qwen-7B. Qwen-7B is the base model without alignment, which behaves differently from the SFT/Chat model.
#### Is quantization supported?
Yes, quantization is supported via `bitsandbytes`. We are working on an improved version and will release the quantized model checkpoints.
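As a rough sketch, 8-bit loading goes through the standard `transformers` + `bitsandbytes` path (the repo's built-in setup may differ in its exact flags):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weights via bitsandbytes; requires a CUDA GPU and `pip install bitsandbytes`.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
).eval()
```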
#### Errors in running quantized models: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
For Linux users, running `pip install bitsandbytes` directly solves the problem. For Windows users, run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.
#### Slow when processing long sequences
We have solved this problem. Please update the code to the latest version.
#### Unsatisfactory performance in processing long sequences
Please ensure that NTK is applied: `use_dynamic_ntk` and `use_logn_attn` in `config.json` should be set to `true` (they are `true` by default).
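A quick way to check, assuming the flags are exposed on the model config object (field names as in `config.json`):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# Both should be True for good long-sequence behavior.
print(config.use_dynamic_ntk, config.use_logn_attn)
```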
<br><br>
## Finetuning
#### Can Qwen support SFT or even RLHF?
We do not provide finetuning or RLHF code for now. However, some projects already support finetuning, e.g., [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning). We will soon update the relevant code.
<br><br>
## Tokenizer
#### bos_id/eos_id/pad_id not found
In our training, we only use `<|endoftext|>` as the separator and padding token. You can set bos_id, eos_id, and pad_id to `tokenizer.eod_id`. Learn more from our documentation about the tokenizer.
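For example, a minimal sketch for libraries that insist on these ids, assuming the tokenizer exposes `eod_id` as described above and accepts the standard `transformers` id setters:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# Qwen uses a single <|endoftext|> token (eod_id) for separation and padding,
# so point bos/eos/pad at it when a downstream library requires these ids.
tokenizer.bos_token_id = tokenizer.eod_id
tokenizer.eos_token_id = tokenizer.eod_id
tokenizer.pad_token_id = tokenizer.eod_id
```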

@ -0,0 +1,80 @@
# FAQ
## Installation & Environment
#### Failure in installing flash attention
Flash attention is an optional component for accelerating training and inference, and it only supports NVIDIA GPUs with the Turing, Ampere, Ada, or Hopper architecture (e.g., H100, A100, RTX 3090, T4, RTX 2080). You can run inference with our models without installing it.
#### Which version of transformers should I use?
4.31.0 is recommended.
#### I downloaded the model and code but cannot use them as described in the tutorial. What should I do?
Don't worry. First check whether your code is up to date, and then confirm that you have downloaded the complete model checkpoint.
#### The file `qwen.tiktoken` cannot be found. What is it?
This is the merge file of our tokenizer, and you must download it to use the tokenizer. Note that if you git clone the repo without git-lfs, this file will not be downloaded. If you are not familiar with git-lfs, visit the [official site](https://git-lfs.com/) to learn more.
#### transformers_stream_generator/tiktoken/accelerate cannot be found. What should I do?
Run `pip install -r requirements.txt`. The dependencies are listed at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
<br><br>
## Demo & Inference
#### Are demos provided? A CLI demo and a Web UI demo?
`web_demo.py` and `cli_demo.py` provide a Web UI demo and a CLI demo, respectively. See the README for more information.
#### I have no GPU. Can I run the CLI demo on CPU only?
Yes. Running `python cli_demo.py --cpu_only` loads the model onto the CPU and runs inference on CPU only.
#### Does Qwen support streaming inference?
Yes. See the `chat_stream` function in `modeling_qwen.py`.
#### Why does `chat_stream()` generate garbled content?
Some tokens emitted during generation represent raw bytes and only decode to readable text together with subsequent tokens; decoding a single such token yields a meaningless string. We have updated the tokenizer's default decoding settings to keep these strings out of the generated results. If the problem persists, please update the model to the latest version.
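To see why this happens (a generic illustration in plain Python, not Qwen-specific code): byte-level BPE can split one multi-byte UTF-8 character across tokens, and a lone token then decodes to replacement characters:

```python
# One Chinese character is three UTF-8 bytes; split it as a byte-level
# tokenizer might split it across two tokens.
data = "你".encode("utf-8")        # b'\xe4\xbd\xa0'
first, rest = data[:1], data[1:]

print(first.decode("utf-8", errors="replace"))  # '�' -- meaningless alone
print((first + rest).decode("utf-8"))           # '你' -- decodes once complete
```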
#### The output looks unrelated to the input / does not follow the instruction / seems dull...
Please check whether you are running inference with Qwen-7B-Chat. Qwen-7B is the pretrained base model without alignment and is not expected to respond to user instructions. In the latest version we have added checks inside the `chat` and `chat_stream` interfaces to keep you from mistakenly using the pretrained model as the SFT/Chat model.
#### Are quantized models available?
Qwen currently supports 8-bit and 4-bit quantized inference based on `bitsandbytes`. We will later provide a more efficient quantized inference implementation together with the corresponding quantized models.
#### Errors in running quantized inference: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
For Linux users, `pip install bitsandbytes` is sufficient. For Windows users, run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.
#### Generation slows down noticeably as the sequence grows longer
This issue has been fixed in the latest version. Please update to the latest code.
#### Unsatisfactory results when processing long sequences
Please make sure NTK is enabled. To enable these tricks, set `use_dynamic_ntk` and `use_logn_attn` in `config.json` to `true`. The latest code sets them to `true` by default.
<br><br>
## Finetuning
#### Are SFT and RLHF supported?
We do not provide SFT or RLHF code for now. Several external projects already support finetuning, e.g., [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning). We will update the relevant code and documentation as soon as possible.
<br><br>
## Tokenizer
#### bos_id/eos_id/pad_id do not exist. Why?
During training we only use `<|endoftext|>` as the separator between samples/documents and as the padding placeholder. You can point bos_id, eos_id, and pad_id all to `tokenizer.eod_id`. Please read our documentation about the tokenizer to learn how to set these ids.

@ -311,9 +311,13 @@ To extend the context length and break the bottleneck of training sequence lengt
We provide scripts to reproduce the model performance on benchmark datasets. Check [eval/EVALUATION.md](eval/EVALUATION.md) for more information. Note that the reproduction may lead to slight differences from our reported results.
## FAQ
If you meet problems, please refer to the [FAQ](FAQ.md) and existing issues to search for a solution before you open a new issue.
## License Agreement
Researchers and developers are free to use the codes and model weights of both Qwen-7B and Qwen-7B-Chat. We also allow their commercial use. Check our license at [LICENSE](LICENSE) for more details. If you have requirements for commercial use, please fill out the [form](https://dashscope.console.aliyun.com/openModelApply/qianwen) to apply.
## Contact Us

@ -316,9 +316,14 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct
We provide evaluation scripts so that you can reproduce our experimental results. Note that due to minor differences between our internal code and the open-source code, the results may differ slightly from the reported ones. See [eval/EVALUATION.md](eval/EVALUATION.md) for more information.
## FAQ
If you run into problems, please consult the [FAQ](FAQ_zh.md) and the issue area first, and open a new issue only if they do not solve your problem.
## License Agreement
Researchers and developers are free to use Qwen-7B and Qwen-7B-Chat or build upon them. Commercial use is also allowed. See [LICENSE](LICENSE) for details. For commercial use, please fill out the [form](https://dashscope.console.aliyun.com/openModelApply/qianwen) to apply.
## Contact Us

@ -320,9 +320,13 @@ ReAct プロンプトの書き方や使い方については、[ReAct の例](ex
To reproduce the model performance on benchmark datasets, we provide scripts to reproduce the results. See [eval/EVALUATION.md](eval/EVALUATION.md) for details. Note that the reproduction may lead to slight differences from our reported results.
## FAQ
If you run into problems, please refer to the [FAQ](FAQ.md) and existing issues to look for a solution before opening a new issue.
## License Agreement
Researchers and developers are free to use the code and model weights of Qwen-7B and Qwen-7B-Chat. Commercial use is also permitted. See [LICENSE](LICENSE) for details. If you wish to use the models commercially, please fill out the [request form](https://dashscope.console.aliyun.com/openModelApply/qianwen) to apply.
## Contact Us
