Models

Large Model

OpenAI o1: https://openai.com/index/introducing-openai-o1-preview/
GPT-4o: https://openai.com/index/hello-gpt-4o/
Claude 3.5: https://docs.anthropic.com/zh-CN/docs/intro-to-claude#claude-3-5
Qwen: https://tongyi.aliyun.com/
ERNIE: https://wenxin.baidu.com/wenxin
NVIDIA Cosmos: https://www.nvidia.com/en-us/ai/cosmos/
KTransformers: https://kvcache-ai.github.io/ktransformers/
Veo: https://deepmind.google/models/veo/
Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html

Benchmark

AGI-Eval: https://agi-eval.cn/mvp/home?sourcePage=aihub.cn
AI-Ceping: https://ai-ceping.com/
WebWalker: https://alibaba-nlp.github.io/WebWalker/

Jade Cong
