Merge pull request #38 from QwenLM/add_chat_eval

update EVALUATION.md
main
Yang An 2 years ago committed by GitHub
commit 7d8ddbe8d9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -8,7 +8,13 @@ mkdir data/ceval
 mv ceval-exam.zip data/ceval
 cd data/ceval; unzip ceval-exam.zip
 cd ../../
+# Qwen-7B
 python evaluate_ceval.py -d data/ceval/
+# Qwen-7B-Chat
+pip install thefuzz
+python evaluate_chat_ceval.py -d data/ceval/
 ```
 - MMLU
@@ -19,7 +25,13 @@ mkdir data/mmlu
 mv data.tar data/mmlu
 cd data/mmlu; tar xf data.tar
 cd ../../
+# Qwen-7B
 python evaluate_mmlu.py -d data/mmlu/data/
+# Qwen-7B-Chat
+pip install thefuzz
+python evaluate_chat_mmlu.py -d data/mmlu/data/
 ```
 - HumanEval
@@ -27,19 +39,28 @@ python evaluate_mmlu.py -d data/mmlu/data/
 Get the HumanEval.jsonl file from [here](https://github.com/openai/human-eval/tree/master/data)
 ```Shell
-python evaluate_humaneval.py -f HumanEval.jsonl -o HumanEval_res.jsonl
 git clone https://github.com/openai/human-eval
 pip install -e human-eval
+# Qwen-7B
+python evaluate_humaneval.py -f HumanEval.jsonl -o HumanEval_res.jsonl
 evaluate_functional_correctness HumanEval_res.jsonl
+# Qwen-7B-Chat
+python evaluate_chat_humaneval.py -f HumanEval.jsonl -o HumanEval_res_chat.jsonl
+evaluate_functional_correctness HumanEval_res_chat.jsonl
 ```
 When installing package human-eval, please note its following disclaimer:
 This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions.
 - GSM8K
 ```Shell
+# Qwen-7B
 python evaluate_gsm8k.py
+# Qwen-7B-Chat
+python evaluate_chat_gsm8k.py # zeroshot
+python evaluate_chat_gsm8k.py --use-fewshot # fewshot
 ```
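
Reviewer note: the Qwen-7B-Chat commands for C-Eval and MMLU add `pip install thefuzz`, which suggests the chat scripts use fuzzy string matching to map a free-form reply onto one of the multiple-choice options. The diff does not show how the scripts do this, so the following is only a minimal sketch of the general idea. It uses the standard-library `difflib` as a stand-in for `thefuzz`, and the function name `pick_choice` and the sample options are hypothetical, not taken from the repository.

```python
import re
from difflib import SequenceMatcher
from typing import Mapping


def pick_choice(response: str, choices: Mapping[str, str]) -> str:
    """Map a free-form model reply to one option label (hypothetical helper).

    Assumption: `choices` maps labels such as "A".."D" to the option text.
    """
    # Step 1: look for a standalone option label, e.g. "The answer is B."
    labels = "|".join(re.escape(label) for label in choices)
    match = re.search(rf"\b({labels})\b", response)
    if match:
        return match.group(1)

    # Step 2: no explicit label, so fall back to fuzzy matching and return
    # the label whose option text is most similar to the reply.
    # difflib stands in for thefuzz here; the scoring differs in detail.
    def similarity(label: str) -> float:
        return SequenceMatcher(
            None, response.lower(), choices[label].lower()
        ).ratio()

    return max(choices, key=similarity)


if __name__ == "__main__":
    options = {
        "A": "nucleus",
        "B": "mitochondria",
        "C": "ribosome",
        "D": "chloroplast",
    }
    print(pick_choice("The answer is B.", options))
    print(pick_choice("it should be the mitochondria", options))
```

A two-step design like this keeps exact label matches cheap and deterministic, and only falls back to similarity scoring when the reply restates the option text instead of naming a letter.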