From 19474456d80f787fd941e31c57b118a06e894628 Mon Sep 17 00:00:00 2001
From: Yang An
Date: Thu, 3 Aug 2023 23:13:02 +0800
Subject: [PATCH] Update tech_memo.md

---
 tech_memo.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tech_memo.md b/tech_memo.md
index 0939521..6e41e6e 100644
--- a/tech_memo.md
+++ b/tech_memo.md
@@ -312,7 +312,7 @@ ReAct is also one of the main approaches used by the [LangChain](https://python.
 For how to write and use prompts for ReAct Prompting, please refer to [the ReAct examples](examples/react_prompt.md).
 In the soon-to-be-released evaluation benchmark for assessing tool usage capabilities, Qwen's performance is as follows:
-| Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
+| Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
 | :---------- | --------------------------: | -------------------------: | -------------------------: |
 | GPT-4 | 95% | **0.90** | 15.0% |
 | GPT-3.5 | 85% | 0.88 | 75.0% |
@@ -325,7 +325,7 @@ In the soon-to-be-released evaluation benchmark for assessing tool usage capabil
 Qwen also has the capability to be used as a [HuggingFace Agent](https://huggingface.co/docs/transformers/transformers_agents). Its performance on the benchmark provided by HuggingFace is as follows:
-| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
+| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
 | :-------------- | -------------------: | --------------: | ---------: |
 | GPT-4 | **100.00** | **100.00** | **97.41** |
 | GPT-3.5 | 95.37 | 96.30 | 87.04 |