<br>

<p align="center">
    <img src="assets/logo.jpg" width="400"/>
<p>
<br>

<p align="center">
        Qwen-7B <a href="https://modelscope.cn/models/qwen/Qwen-7B/summary">🤖</a> | <a href="https://huggingface.co/Qwen/Qwen-7B">🤗</a>  | Qwen-7B-Chat <a href="https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary">🤖</a> | <a href="https://huggingface.co/Qwen/Qwen-7B-Chat">🤗</a>  | Qwen-7B-Chat-Int4 <a href="https://huggingface.co/Qwen/Qwen-7B-Chat-Int4">🤗</a>  |  <a href="https://modelscope.cn/studios/qwen/Qwen-7B-Chat-Demo/summary">Demo</a>  |  <a href="https://github.com/QwenLM/Qwen-7B/blob/main/tech_memo.md">Report</a>  |  <a href="https://discord.gg/Dds6qKaK">Discord</a>
</p>
<br>

<p align="center">
        <a href="README_CN.md">中文</a>  |  <a href="README.md">English</a>  |  日本語
</p>
<br><br>

<p align="right">
    Japanese document maintainer: Ikko Eltociear Ashimine
</p>
<br><br>

We open-source **Qwen-7B** and **Qwen-7B-Chat** on both **🤖 ModelScope** and **🤗 Hugging Face** (click the logos above to visit the repositories with code and checkpoints). This repo includes a brief introduction to Qwen-7B, instructions for usage, and a technical memo ([link](tech_memo.md)) that provides more details.

Qwen-7B is the 7B-parameter version of the large language model series Qwen (abbr. Tongyi Qianwen) proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web texts, books, code, and more. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. The features of the Qwen-7B series include:

1. **Trained with high-quality pretraining data.** Qwen-7B is pretrained on a large-scale, high-quality dataset of over 2.2 trillion tokens. The dataset covers a wide range of domains, both general and specialized, and includes plain text as well as code.
2. **Strong performance.** Qwen-7B outperforms competitors of similar model size on a series of benchmark datasets that evaluate natural language understanding, mathematics, coding, and more.
3. **Improved language support.** The tokenizer of Qwen-7B, based on a vocabulary of over 150K tokens, is more efficient than other tokenizers. It is friendly to many languages and helps users further fine-tune Qwen-7B to understand particular languages.
4. **Support for 8K context length.** Both Qwen-7B and Qwen-7B-Chat support a context length of 8K, which allows inputs with long contexts.
5. **Support for plugins.** Qwen-7B-Chat is trained with plugin-related alignment data, so it can use tools such as APIs, models, and databases, and it can play as an agent.

The following sections include information that you might find helpful. In particular, we advise you to read the FAQ section before you launch an issue.

## News

* 2023.8.21 We released the Int4 quantized model for Qwen-7B-Chat (**Qwen-7B-Chat-Int4**), which reduces memory cost while improving inference speed, with no significant performance degradation in benchmark evaluation.
* 2023.8.3 We released Qwen-7B and Qwen-7B-Chat on ModelScope and Hugging Face. We also provide a technical memo with details about the model, including training details and model performance.

## Performance

In general, Qwen-7B outperforms baseline models of similar size on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, HumanEval, WMT22, and CMMLU, which evaluate a model's capabilities in natural language understanding, mathematical problem solving, coding, and more. It even outperforms larger models of around 13B parameters. See the results below.

| Model | MMLU | C-Eval | GSM8K | HumanEval | WMT22 (en-zh) | CMMLU |
| :---------------- | :------------: | :------------: | :------------: | :------------: | :------------: |:------------: |
| LLaMA-7B | 35.1 | - | 11.0 | 10.5 | 8.7 | - |
| LLaMA 2-7B | 45.3 | - | 14.6 | 12.8 | 17.9 | - |
| Baichuan-7B | 42.3 | 42.8 | 9.7 | 9.2 | 26.6 | 44.4 |
| ChatGLM2-6B | 47.9 | 51.7 | 32.4 | 9.2 | - | 48.8 |
| InternLM-7B | 51.0 | 52.8 | 31.2 | 10.4 | 14.8 | - |
| Baichuan-13B | 51.6 | 53.6 | 26.6 | 12.8 | 30.0 | 55.8 |
| LLaMA-13B | 46.9 | 35.5 | 17.8 | 15.8 | 12.0 | - |
| LLaMA 2-13B | 54.8 | - | 28.7 | 18.3 | 24.2 | - |
| ChatGLM2-12B | 56.2 | **61.6** | 40.9 | - | - | - |
| **Qwen-7B** | **56.7** | 59.6 | **51.6** | **24.4** | **30.6** | **58.8** |

<p align="center">
    <img src="assets/performance.png" width="1000"/>
<p>
<br>

Additionally, according to the third-party evaluation of large language models conducted by [OpenCompass](https://opencompass.org.cn/leaderboard-llm), Qwen-7B and Qwen-7B-Chat are the top 7B-parameter models. This evaluation consists of a large number of public benchmarks for assessing language understanding and generation, coding, mathematics, reasoning, and more.

For more detailed experimental results (detailed model performance on more benchmark datasets), please refer to the technical memo by clicking [here](tech_memo.md).

## Requirements

* python 3.8 and above
* pytorch 1.12 and above, 2.0 and above are recommended
* CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)

## Quickstart

Below, we provide simple examples to show how to use Qwen-7B with 🤖 ModelScope and 🤗 Transformers.

Before running the code, make sure you have set up the environment and installed the required packages. Make sure you meet the above requirements, and then install the dependent libraries.

```bash
pip install -r requirements.txt
```

If your device supports fp16 or bf16, we recommend installing [flash-attention](https://github.com/Dao-AILab/flash-attention) for higher efficiency and lower memory usage. (**flash-attention is optional and the project can run normally without it.**)

```bash
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# The installs below are optional and might be slow.
# pip install csrc/layer_norm
# pip install csrc/rotary
```

Now you can start with ModelScope or Transformers.

#### 🤗 Transformers

To use Qwen-7B-Chat for inference, all you need to do is to input a few lines of code as shown below. **Make sure you are using the latest code.**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: the default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode: precision is selected automatically based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# First dialogue turn
response, history = model.chat(tokenizer, "䜠奜", history=None)
print(response)
# Hello! Glad to be of help.

# Second dialogue turn
response, history = model.chat(tokenizer, "给我讲䞀䞪幎蜻人奋斗创䞚最终取埗成功的故事。", history=history)
print(response)
# (Sample output, translated:) This is a story about a young man who strove to start his own business and eventually succeeded.
# The protagonist is Li Ming, an ambitious young man born into an ordinary family of ordinary workers. Since childhood, his goal has been to succeed as an entrepreneur.
# To achieve this goal, Li Ming studied hard and got into university. During his university years, he actively took part in various entrepreneurship contests and won many prizes. He also used his spare time for internships, accumulating valuable experience.
# After graduating, Li Ming decided to start a business. He began looking for investors, but was turned down again and again. Yet he did not give up.
# He kept working hard, improving his business plan and searching for new investment opportunities.
# Eventually, Li Ming succeeded in securing investment and started his own business. He founded a technology company focused on developing a new type of software. Under his leadership, the company grew rapidly and became a successful technology enterprise.
# Li Ming's success was no accident. He is hard-working, tenacious, and adventurous, and he never stops learning and improving himself. His success also proves that anyone can succeed through hard work.

# Third dialogue turn
response, history = model.chat(tokenizer, "给这䞪故事起䞀䞪标题", history=history)
print(response)
# "Striving for Entrepreneurship: A Young Man's Road to Success"
```

Running the pretrained base model Qwen-7B is also simple.

<details>
  <summary>Running Qwen-7B</summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="cpu", trust_remote_code=True).eval()
# use auto mode: precision is selected automatically based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

inputs = tokenizer('モンゎルの銖郜はりランバヌトルUlaanbaatar\nアむスランドの銖郜はレむキャビクReykjavik\n゚チオピアの銖郜は', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# モンゎルの銖郜はりランバヌトルUlaanbaatar\nアむスランドの銖郜はレむキャビクReykjavik\n゚チオピアの銖郜はアディスアベバAddis Ababa...
```

</details>

#### 🤖 ModelScope

ModelScope is an open-source platform for Model-as-a-Service (MaaS), which provides flexible and cost-effective model services to AI developers. Similarly, you can run the models with ModelScope as shown below:

```python
import os
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope import snapshot_download

model_id = 'QWen/qwen-7b-chat'
revision = 'v1.0.0'

model_dir = snapshot_download(model_id, revision)

pipe = pipeline(
    task=Tasks.chat,
    model=model_dir,
    device_map='auto')
history = None

text = '浙江省の省郜はどこですか'
results = pipe(text, history=history)
response, history = results['response'], results['history']
print(f'Response: {response}')
text = '䜕がそんなに面癜いのか'
results = pipe(text, history=history)
response, history = results['response'], results['history']
print(f'Response: {response}')
```

## Tokenizer

Our tokenizer, based on tiktoken, is different from other tokenizers such as sentencepiece tokenizers. In particular, you need to pay attention to special tokens during fine-tuning. For more detailed information on the tokenizer, and on how to handle it in fine-tuning, please refer to the [documentation](tokenization_note.md).

## Quantization

### Usage

**Note: we provide a new solution based on [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) and have released the Int4 quantized model for Qwen-7B-Chat [(click here)](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4). Compared with the previous solution, it achieves nearly lossless model quality while improving both memory cost and inference speed.**

Here we demonstrate how to use the quantized model for inference. Before you start, make sure you meet the requirements of AutoGPTQ and install it from source (temporarily, the codes for Qwen are not yet released in the latest version of the PyPI package):

```bash
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .
```

Then you can easily load the quantized model as shown below:

```python
from auto_gptq import AutoGPTQForCausalLM
model = AutoGPTQForCausalLM.from_quantized("Qwen/Qwen-7B-Chat-Int4", device_map="auto", trust_remote_code=True, use_safetensors=True).eval()
```

To run inference, it is similar to the basic usage demonstrated above, but remember to pass in the generation configuration explicitly:

```python
from transformers import GenerationConfig
config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4", trust_remote_code=True)
response, history = model.chat(tokenizer, "Hi", history=None, generation_config=config)
```

### Performance

We illustrate the model performance of both the BF16 and Int4 models on the benchmarks. Results are shown below:

| Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
| ------------- | :--------: | :----------: | :----: | :--------: |
| BF16 | 53.9 | 54.2 | 41.1 | 24.4 |
| Int4 | 52.6 | 52.9 | 38.1 | 23.8 |

### Inference Speed

We measured the average inference speed (tokens/s) of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization, respectively.

| Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
| ------------- | :------------------:| :------------------:|
| BF16 | 30.53 | 28.51 |
| Int4 | 45.60 | 33.83 |

In detail, the profiling setting is generating 8192 new tokens with 1 context token. The profiling runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.4. The inference speed is averaged over the 8192 generated tokens.

### GPU Memory Usage

We also profiled the peak GPU memory usage for encoding 2048 tokens as context (and generating a single token) and for generating 8192 tokens (with a single token as context), under BF16 precision and the Int4 quantization level, respectively. The results are shown below.

| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
| ------------------ | :---------------------------------: | :-----------------------------------: |
| BF16 | 18.99GB | 24.40GB |
| Int4 | 10.20GB | 15.61GB |

The above speed and memory profiling were conducted using [this script](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py).

## Demo

### Web UI

We provide code for users to build a web UI demo (thanks to @wysaid). Before you start, make sure you install the following packages:

```
pip install -r requirements_web_demo.txt
```

Then run the command below and click on the generated link:

```
python web_demo.py
```

<p align="center">
  <br>
  <img src="assets/web_demo.gif" width="600" />
  <br>
<p>

### CLI Demo

We provide a CLI demo example in `cli_demo.py`. Users can interact with Qwen-7B-Chat by entering prompts, and the model returns its output in streaming mode. Run the command below:

```
python cli_demo.py
```

<p align="center">
  <br>
  <img src="assets/cli_demo.gif" width="600" />
  <br>
<p>

## API

We provide a method to deploy a local API based on the OpenAI API (thanks to @hanpenggit). Before you start, install the required packages:

```bash
pip install fastapi uvicorn openai pydantic sse_starlette
```

Then run the command to deploy the API:

```bash
python openai_api.py
```

You can change the arguments, e.g., `-c` for the checkpoint name or path, `--cpu-only` for CPU deployment, etc. If you meet problems launching the API deployment, updating the packages to their latest versions will probably solve them.

Using the API is also simple. See the example below:

```python
import openai
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "none"

# create a request activating streaming response
for chunk in openai.ChatCompletion.create(
    model="Qwen-7B",
    messages=[
        {"role": "user", "content": "䜠奜"}
    ],
    stream=True
):
    if hasattr(chunk.choices[0].delta, "content"):
        print(chunk.choices[0].delta.content, end="", flush=True)

# create a request not activating streaming response
response = openai.ChatCompletion.create(
    model="Qwen-7B",
    messages=[
        {"role": "user", "content": "䜠奜"}
    ],
    stream=False
)
print(response.choices[0].message.content)
```

<p align="center">
  <br>
  <img src="assets/openai_api.gif" width="600" />
  <br>
<p>

## Tool Usage

Qwen-7B-Chat is specifically optimized for tool usage, including APIs, databases, and models, so that users can build their own Qwen-7B-based LangChain applications, agents, and code interpreters. In our evaluation [benchmark](eval/EVALUATION.md) for assessing tool usage capabilities, Qwen-7B
reaches stable performance.

| Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
|:------------|:----------------------:|:----------------------:|:----------------------:|
| GPT-4 | 95% | **0.90** | 15% |
| GPT-3.5 | 85% | 0.88 | 75% |
| **Qwen-7B** | **99%** | 0.89 | **9.7%** |

For how to write and use prompts for ReAct prompting, please refer to [the ReAct examples](examples/react_prompt.md). The use of tools can enable the model to better perform tasks.

Additionally, we provide experimental results that demonstrate the model's capability of playing as an agent. See [Hugging Face Agent](https://huggingface.co/docs/transformers/transformers_agents) for more information. Its performance on the run-mode benchmark provided by Hugging Face is as follows:

| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
|:---------------|:---------------:|:-----------:|:---------:|
|GPT-4 | **100** | **100** | **97.41** |
|GPT-3.5 | 95.37 | 96.30 | 87.04 |
|StarCoder-15.5B | 87.04 | 87.96 | 68.89 |
| **Qwen-7B** | 90.74 | 92.59 | 74.07 |

## Long-Context Understanding

To extend the context length and break the bottleneck of the training sequence length, we introduce several techniques, including NTK-aware interpolation, window attention, and LogN attention scaling, extending the context length to over 8K tokens. We conducted language modeling experiments on the arXiv dataset with PPL evaluation and found that Qwen-7B achieves outstanding performance in long-context scenarios. Results are shown below:

<table>
    <tr>
        <th rowspan="2">Model</th><th colspan="5" align="center">Sequence Length</th>
    </tr>
    <tr>
        <th align="center">1024</th><th align="center">2048</th><th align="center">4096</th><th align="center">8192</th><th align="center">16384</th>
    </tr>
    <tr>
        <td>Qwen-7B</td><td align="center"><b>4.23</b></td><td align="center"><b>3.78</b></td><td align="center">39.35</td><td align="center">469.81</td><td align="center">2645.09</td>
    </tr>
    <tr>
        <td>+ dynamic_ntk</td><td align="center"><b>4.23</b></td><td align="center"><b>3.78</b></td><td align="center">3.59</td><td align="center">3.66</td><td align="center">5.71</td>
    </tr>
    <tr>
        <td>+ dynamic_ntk + logn</td><td align="center"><b>4.23</b></td><td align="center"><b>3.78</b></td><td align="center"><b>3.58</b></td><td align="center">3.56</td><td align="center">4.62</td>
    </tr>
    <tr>
        <td>+ dynamic_ntk + logn + window_attn</td><td align="center"><b>4.23</b></td><td align="center"><b>3.78</b></td><td align="center"><b>3.58</b></td><td align="center"><b>3.49</b></td><td align="center"><b>4.32</b></td>
    </tr>
</table>

## Reproduction

We provide scripts for you to reproduce the model performance on benchmark datasets. Check [eval/EVALUATION.md](eval/EVALUATION.md) for more information. Note that the reproduction may lead to slight differences from our reported results.

## FAQ

If you meet problems, please refer to the [FAQ](FAQ.md) and the existing issues to search for a solution before you launch a new issue.

## License Agreement

Researchers and developers are free to use the code and model weights of both Qwen-7B and Qwen-7B-Chat. We also allow their commercial use. Check [LICENSE](LICENSE) for more details. If you would like to use them commercially, please fill out the [request form](https://dashscope.console.aliyun.com/openModelApply/qianwen) to apply.

## Contact Us

If you are interested in leaving a message to either our research team or our product team, feel free to send an email to qianwen_opensource@alibabacloud.com.
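As a closing note on the Long-Context Understanding section above: the NTK-aware interpolation and LogN attention scaling it mentions can be sketched in a few lines of plain Python. This is an illustrative sketch of one common formulation, not Qwen's actual implementation; the function names, the RoPE base of 10000, and the default training length of 2048 are our assumptions here.

```python
import math

def ntk_scaled_base(base: float, head_dim: int, scale: float) -> float:
    """NTK-aware interpolation (one common formulation): rather than linearly
    compressing position indices, enlarge the RoPE frequency base so that the
    lowest frequencies stretch to cover a context `scale` times longer, while
    the highest frequencies are left almost untouched."""
    return base * scale ** (head_dim / (head_dim - 2))

def rope_inv_freq(base: float, head_dim: int) -> list:
    """Standard rotary-embedding inverse frequencies for one attention head."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def logn_scale(seq_len: int, train_len: int = 2048) -> float:
    """LogN attention scaling: scale queries by log_{train_len}(seq_len) once
    the sequence exceeds the training length, keeping attention entropy stable
    at extrapolated lengths."""
    return max(1.0, math.log(seq_len) / math.log(train_len))

# Doubling the usable context (scale=2) with 128-dim heads roughly doubles the
# base, which shrinks every inverse frequency and stretches the rotations:
print(ntk_scaled_base(10000.0, 128, 2.0))
print(rope_inv_freq(10000.0, 128)[:2])
print(logn_scale(8192))   # > 1.0: queries are scaled up beyond 2048 tokens
print(logn_scale(1024))   # 1.0: no change within the training length
```

In the released model, these adjustments are enabled through the model configuration rather than called directly; the `dynamic_ntk` and `logn` rows of the perplexity table above correspond to turning on exactly this kind of rescaling.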