Sean | 2b565da220 | Update evaluate_plugin.py: change Old Evaluation Dataset (Version 20230803) to new version | 11 months ago |
兼欣 | 7eb9016908 | update agent benchmarks and add qwen-72b results | 1 year ago |
yangapku | e8e15962d8 | add 72B and 1.8B Qwen models, add Ascend 910 and Hygon DCU support, add docker support | 1 year ago |
yangapku | c00209f932 | update evaluate scripts | 1 year ago |
yangapku | f076e2fa42 | specify repetition penalty | 1 year ago |
yangapku | fc57dea277 | release latest models | 1 year ago |
yangapku | b86a0f2c8a | update EVALUATION.md | 1 year ago |
feihu.hf | 4864f7b278 | fix format problems in evaluation code; update ceval extraction rules | 1 year ago |
Yang An | 677180a653 | Merge pull request #185 from Owen-Qin/fix_ceval: fix bug for ceval | 1 year ago |
qinxy3 | 543ffaf617 | fix code | 1 year ago |
qinxy3 | bff91b3305 | fix bug for ceval | 1 year ago |
Haonan Li | e7072a49c0 | add CMMLU evaluation results | 1 year ago |
兼欣 | 9139fbdf99 | release the evaluation benchmark for tool use; update tool use results to that of the hf version | 1 year ago |
feihu.hf | 680a3e8bb8 | update EVALUATION.md | 1 year ago |
feihu.hf | 1134e08be7 | add evaluation code for Qwen-7B-Chat | 1 year ago |
JustinLin610 | ba2d85a13b | first commit | 1 year ago |