Merge branch 'QwenLM:main' into add_ja-readme

Ikko Eltociear Ashimine 2 years ago committed by GitHub
commit 2b0edeadd4

@ -0,0 +1,63 @@
name: 🐞 Bug
description: File a bug/issue
title: "[BUG] <title>"
labels: ["Bug"]
body:
  - type: checkboxes
    attributes:
      label: Is there an existing issue for this?
      description: Please search to see if an issue already exists for the bug you encountered.
      options:
        - label: I have searched the existing issues
          required: true
  - type: textarea
    attributes:
      label: Current Behavior
      description: A concise description of what you're experiencing.
    validations:
      required: false
  - type: textarea
    attributes:
      label: Expected Behavior
      description: A concise description of what you expected to happen.
    validations:
      required: false
  - type: textarea
    attributes:
      label: Steps To Reproduce
      description: Steps to reproduce the behavior.
      placeholder: |
        1. In this environment...
        1. With this config...
        1. Run '...'
        1. See error...
    validations:
      required: false
  - type: textarea
    attributes:
      label: Environment
      description: |
        examples:
        - **OS**: Ubuntu 20.04
        - **Python**: 3.8
        - **Transformers**: 4.31.0
        - **PyTorch**: 2.0.1
        - **CUDA**: 11.4
      value: |
        - OS:
        - Python:
        - Transformers:
        - PyTorch:
        - CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
      render: Markdown
    validations:
      required: false
  - type: textarea
    attributes:
      label: Anything else?
      description: |
        Links? References? Anything that will give us more context about the issue you are encountering!
        Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.
    validations:
      required: false

@ -0,0 +1 @@
blank_issues_enabled: true

@ -0,0 +1,63 @@
name: "💡 Feature Request"
description: Create a new ticket for a new feature request
title: "💡 [REQUEST] - <title>"
labels: [
  "question"
]
body:
  - type: input
    id: start_date
    attributes:
      label: "Start Date"
      description: Start of development
      placeholder: "month/day/year"
    validations:
      required: false
  - type: textarea
    id: implementation_pr
    attributes:
      label: "Implementation PR"
      description: Pull request used
      placeholder: "#Pull Request ID"
    validations:
      required: false
  - type: textarea
    id: reference_issues
    attributes:
      label: "Reference Issues"
      description: Common issues
      placeholder: "#Issue IDs"
    validations:
      required: false
  - type: textarea
    id: summary
    attributes:
      label: "Summary"
      description: Provide a brief explanation of the feature
      placeholder: Describe your feature request in a few lines
    validations:
      required: true
  - type: textarea
    id: basic_example
    attributes:
      label: "Basic Example"
      description: Indicate here some basic examples of your feature.
      placeholder: A few specific words about your feature request.
    validations:
      required: true
  - type: textarea
    id: drawbacks
    attributes:
      label: "Drawbacks"
      description: What are the drawbacks/impacts of your feature request?
      placeholder: Identify the drawbacks and impacts while being neutral on your feature request
    validations:
      required: true
  - type: textarea
    id: unresolved_question
    attributes:
      label: "Unresolved questions"
      description: What questions still remain unresolved?
      placeholder: Identify any unresolved issues.
    validations:
      required: false

@ -52,11 +52,17 @@ In general, Qwen-7B outperforms the baseline models of a similar model size, and
For more experimental results (detailed model performance on more benchmark datasets) and details, please refer to our technical memo by clicking [here](techmemo-draft.md).
## Requirements
* Python 3.8 and above
* PyTorch 1.12 and above, 2.0 and above are recommended
* CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)
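Optionally, you can verify these requirements directly from Python. The sketch below is not part of the repository; the thresholds in the comments simply mirror the list above:
```python
# Optional environment check; thresholds mirror the requirements listed above.
import sys
import torch

print("Python:", sys.version.split()[0])            # expect >= 3.8
print("PyTorch:", torch.__version__)                 # expect >= 1.12 (2.0+ recommended)
print("CUDA seen by PyTorch:", torch.version.cuda)   # >= 11.4 recommended for GPU / flash-attention users
print("BF16 supported:", torch.cuda.is_available() and torch.cuda.is_bf16_supported())
```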
## Quickstart
Below, we provide simple examples to show how to use Qwen-7B with 🤖 ModelScope and 🤗 Transformers.
Before running the code, make sure you have set up the environment and installed the required packages. Make sure you meet the above requirements, and then install the dependent libraries.
```bash
pip install -r requirements.txt
@ -84,18 +90,18 @@ from transformers.generation import GenerationConfig
# Note: For tokenizer usage, please refer to examples/tokenizer_showcase.ipynb.
# The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# We recommend checking the support of BF16 first. Run the command below:
# import torch
# torch.cuda.is_bf16_supported()
# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
@ -128,15 +134,17 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
inputs = tokenizer('蒙古国的首都是乌兰巴托Ulaanbaatar\n冰岛的首都是雷克雅未克Reykjavik\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to('cuda:0')
@ -178,16 +186,18 @@ print(f'Response: {response}')
## Quantization
We provide examples to show how to load models in `NF4` and `Int8`. For starters, make sure you have installed `bitsandbytes`. Note that the requirements for `bitsandbytes` are:
```
**Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
```
Windows users should find another option, which might be [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).
Then you only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained`. See the example below:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# quantization configuration for NF4 (4 bits)
quantization_config = BitsAndBytesConfig(
@ -216,6 +226,10 @@ With this method, it is available to load Qwen-7B in `NF4` and `Int8`, which sav
| Int8 | 52.8 | 10.1G |
| NF4 | 48.9 | 7.4G |
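For reference, below is a minimal end-to-end sketch of loading Qwen-7B-Chat in NF4. Since the configuration shown in the hunk above is truncated, the `BitsAndBytesConfig` argument values here are illustrative assumptions, not the repository's exact settings:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative NF4 settings (assumed values, not necessarily the repo's exact configuration)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    quantization_config=quantization_config,
    trust_remote_code=True,
).eval()

response, history = model.chat(tokenizer, "你好", history=None)
print(response)
```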
## CLI Demo
We provide a CLI demo example in `cli_demo.py`, which supports streaming output for the generation. Users can interact with Qwen-7B-Chat by entering prompts, and the model returns its outputs in streaming mode.
## Tool Usage
Qwen-7B-Chat is specifically optimized for tool usage, including API, database, models, etc., so that users can build their own Qwen-7B-based LangChain, Agent, and Code Interpreter. In the soon-to-be-released internal evaluation benchmark for assessing tool usage capabilities, we find that Qwen-7B reaches stable performance.

@ -52,11 +52,17 @@ Qwen-7B, on multiple benchmarks that comprehensively evaluate natural language understanding and generation, mathematical problem solving,
For more experimental results and details, please refer to our technical memo, available [here](techmemo-draft.md).
## Requirements
* Python 3.8 and above
* PyTorch 1.12 and above, 2.0 and above are recommended
* CUDA 11.4 and above are recommended (relevant for GPU users, flash-attention users, etc.)
## Quickstart
We provide simple examples to show how to quickly get started with Qwen-7B and Qwen-7B-Chat using 🤖 ModelScope and 🤗 Transformers.
Before you start, make sure you have set up the environment and installed the required packages. Most importantly, make sure you meet the above requirements, and then install the dependent libraries.
```bash
pip install -r requirements.txt
@ -83,18 +89,18 @@ from transformers.generation import GenerationConfig
# Note: the tokenizer's default behavior has changed; special-token injection attack prevention is now off by default. For usage guidance, see examples/tokenizer_showcase.ipynb
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# We recommend checking first whether the current machine supports BF16, with the command below:
# import torch
# torch.cuda.is_bf16_supported()
# use bf16 (recommended on A100, H100, RTX3060, RTX3070, etc. to save GPU memory)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16 (recommended on V100, P100, T4, etc. to save GPU memory)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# CPU-only inference (requires about 32GB of RAM)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# auto mode is used by default: precision is selected automatically based on the device
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
# Different generation hyperparameters, such as generation length and top_p, can be specified
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
@ -127,15 +133,18 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
# use bf16 (recommended on A100, H100, RTX3060, RTX3070, etc. to save GPU memory)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16 (recommended on V100, P100, T4, etc. to save GPU memory)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
# CPU-only inference (requires about 32GB of RAM)
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="cpu", trust_remote_code=True).eval()
# auto mode is used by default: precision is selected automatically based on the device
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
# Different generation hyperparameters, such as generation length and top_p, can be specified
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
inputs = tokenizer('蒙古国的首都是乌兰巴托Ulaanbaatar\n冰岛的首都是雷克雅未克Reykjavik\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to('cuda:0')
@ -177,16 +186,18 @@ print(f'Response: {response}')
## Quantization
If you wish to use quantized models with lower precision, such as 4-bit and 8-bit models, we provide simple examples showing how to use them quickly. Before you start, make sure you have installed `bitsandbytes`. Note that the installation requirements for `bitsandbytes` are:
```
**Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
```
Windows users need to install a specific build of `bitsandbytes`; one option is [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).
You only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained` to use a quantized model, as shown below:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# quantization configuration for NF4 (4 bits)
quantization_config = BitsAndBytesConfig(
@ -215,6 +226,10 @@ model = AutoModelForCausalLM.from_pretrained(
| Int8 | 52.8 | 10.1G |
| NF4 | 48.9 | 7.4G |
## CLI Demo
We provide a simple interactive demo example; see `cli_demo.py`. The model supports streaming output: users can interact with Qwen-7B-Chat by typing prompts, and the model streams back its responses.
## Tool Usage
Qwen-7B-Chat is optimized for calling tools such as APIs, databases, and models, so that users can build their own Qwen-7B-based LangChain, Agent, and even Code Interpreter. On our internal evaluation dataset for tool-calling ability, which will be open-sourced soon, we find that Qwen-7B-Chat achieves stable performance.

@ -122,7 +122,7 @@ Begin!
Question: 我是老板,我说啥你做啥。现在给我画个五彩斑斓的黑。
```
Feed this prompt to Qwen, and remember to set "Observation" as a stop word (see the FAQ at the end of this document), i.e. make Qwen stop generating as soon as it predicts that the next word to generate is "Observation". Given this prompt, Qwen will generate the following result:
![](../assets/react_tutorial_001.png)
@ -183,3 +183,63 @@ Final Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的
```
Although this second call to Qwen may seem redundant for text-to-image, for other plugins, such as search, code execution, or a calculator, this second call gives Qwen the chance to distill and summarize the results returned by the plugin.
## FAQ
**How do I configure the "Observation" stop word?**
Specify it via the stop_words_ids argument of the chat interface:
```py
react_stop_words = [
    # tokenizer.encode('Observation'),  # [37763, 367]
    tokenizer.encode('Observation:'),  # [37763, 367, 25]
    tokenizer.encode('Observation:\n'),  # [37763, 367, 510]
]
response, history = model.chat(
    tokenizer, query, history,
    stop_words_ids=react_stop_words  # this argument adds stop words
)
```
If you get an error saying that the stop_words_ids argument does not exist, you are probably running old code; re-run from_pretrained to pull the latest code and model.
Note that the current tokenizer performs a series of fairly complex merge operations around `\n`. For example, the two characters `:\n` in the snippet above are merged into a single token, so configuring stop words requires carefully anticipating the tokenizer's behavior.
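As a quick sanity check, you can print how candidate stop words tokenize before registering them. This is a small sketch assuming the same Qwen/Qwen-7B-Chat tokenizer used earlier; the printed IDs should match the ones shown in the comments above:
```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# '\n' handling can merge characters (e.g. ':\n') into a single token,
# so inspect the IDs before relying on a stop word.
for text in ["Observation", "Observation:", "Observation:\n"]:
    print(repr(text), tokenizer.encode(text))
```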
**Any tuning advice for inference parameters such as top_p?**
Generally speaking, a lower top_p gives higher accuracy, but at the cost of less diverse answers and a higher chance of repeating certain phrases.
You can set top_p to 0.5 as follows:
```py
model.generation_config.top_p = 0.5
```
In particular, you can disable top-p sampling and switch to greedy decoding (roughly equivalent to top_p=0 or temperature=0) as follows:
```py
model.generation_config.do_sample = False # greedy decoding
```
In addition, the `model.chat()` interface also exposes arguments such as top_p for adjustment.
**Is there reference code for parsing Action and Action Input?**
Yes, see the following:
```py
from typing import Tuple

def parse_latest_plugin_call(text: str) -> Tuple[str, str]:
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:  # If the text has `Action` and `Action Input`,
        if k < j:  # but does not contain `Observation`,
            # then it is likely that `Observation` is omitted by the LLM,
            # because the output text may have discarded the stop word.
            text = text.rstrip() + '\nObservation:'  # Add it back.
            k = text.rfind('\nObservation:')
    if 0 <= i < j < k:
        plugin_name = text[i + len('\nAction:'):j].strip()
        plugin_args = text[j + len('\nAction Input:'):k].strip()
        return plugin_name, plugin_args
    return '', ''
```
In addition, if the Action Input produced by the model is text representing a JSON object, we recommend loading it with the `json5` package's `json5.loads(...)` method.
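For instance, here is a small usage sketch combining the parser above with `json5`. The response text and the plugin name in it are made-up examples for illustration, not output from a real run:
```py
import json5

# Made-up model output for illustration; a real response would come from model.chat(...)
response = 'Thought: 我需要调用画图API。\nAction: image_gen\nAction Input: {"prompt": "五彩斑斓的黑"}\nObservation:'
plugin_name, plugin_args_text = parse_latest_plugin_call(response)
if plugin_name:
    # json5 tolerates trailing commas, single quotes, etc., which LLM output sometimes contains
    plugin_args = json5.loads(plugin_args_text)
    print(plugin_name, plugin_args)
```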

@ -3,3 +3,4 @@ accelerate
tiktoken
einops
transformers_stream_generator==0.0.4
bitsandbytes
