ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning

Official codebase for the paper "ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning".

Webpage | Paper | Dataset (Hugging Face) | Dataset (ModelScope)

ChangeLog

2025.04

Added local data loader. Users can now load custom queries locally. When a non-default splits_name value (e.g., "abc") is specified in run_exp, the system automatically loads the corresponding file evaluation/default_splits/abc.txt, a TXT file that lists the target query filenames.
Detailed constraints classification. See the Evaluation README for details.
Introduced LLM-modulo baseline. Implements the LLM-modulo pipeline with a ground-truth symbolic verifier, based on the methodology of the paper "Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach". Codebase: https://github.com/Atharva-Gundawar/LLM-Modulo-prompts
Added support for local LLM inference with Qwen3-8B/4B.
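The local data loader mentioned in the changelog can be pictured with the minimal sketch below. It assumes the split file holds one query filename per line; the helper name `load_split_queries` and the blank-line handling are illustrative assumptions, not the repository's actual implementation.

```python
from pathlib import Path


def load_split_queries(splits_name, splits_dir="evaluation/default_splits"):
    """Return the query filenames listed in <splits_dir>/<splits_name>.txt.

    Hypothetical helper: assumes one query filename per line.
    """
    split_file = Path(splits_dir) / f"{splits_name}.txt"
    with split_file.open(encoding="utf-8") as f:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in f if line.strip()]
```

Under these assumptions, `load_split_queries("abc")` would read evaluation/default_splits/abc.txt and return the filenames it lists.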

Quick Start

Setup

Create a conda environment and install dependencies:

conda create -n chinatravel python=3.9  
conda activate chinatravel  
pip install -r requirements.txt

Download the database and unzip it into the chinatravel/environment/ directory.

Download Links: Google Drive, NJU Drive
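If you prefer to script the unzip step, a minimal sketch could look like the following. It assumes the download is a standard .zip archive; the function name `unzip_database` is hypothetical, and the archive path is whatever file you downloaded.

```python
import zipfile
from pathlib import Path


def unzip_database(archive_path, target_dir="chinatravel/environment"):
    """Extract a downloaded .zip archive into target_dir.

    Hypothetical helper: the archive name is not specified in this README.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    # Return the top-level entries now present so the result can be inspected.
    return sorted(p.name for p in target.iterdir())
```

Run it from the project root so that the default `target_dir` resolves to chinatravel/environment/.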

Running

We support deepseek (the official DeepSeek API), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with qwen (Qwen2.5-7B-Instruct).

export OPENAI_API_KEY=""

python run_exp.py --splits easy --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits medium --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation


python run_exp.py --splits human --agent LLMNeSy --llm deepseek

Note: for local inference with qwen, please download the model weights to "project_root_path/chinatravel/open_source_llm/Qwen2.5-7B-Instruct/".
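To run several splits in one go, the commands above can be wrapped in a small driver. This is a sketch only: it uses just the flags shown in this README (--splits, --agent, --llm, --oracle_translation), and the helper names `build_run_command` and `run_splits` are hypothetical.

```python
import subprocess
import sys


def build_run_command(split, agent="LLMNeSy", llm="deepseek", oracle_translation=False):
    """Build the argument list for one run_exp.py invocation."""
    cmd = [sys.executable, "run_exp.py", "--splits", split, "--agent", agent, "--llm", llm]
    if oracle_translation:
        cmd.append("--oracle_translation")
    return cmd


def run_splits(splits, **kwargs):
    """Run run_exp.py once per split, stopping at the first failure."""
    for split in splits:
        subprocess.run(build_run_command(split, **kwargs), check=True)
```

For example, `run_splits(["easy", "medium", "human"], oracle_translation=True)` would issue the three oracle-translation commands listed above, provided it is called from the project root with the API key already exported.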

Evaluation

python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
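Judging only from the two commands above, the --method value appears to join the agent name, the LLM name, and an "oracletranslation" suffix when --oracle_translation was used. The sketch below encodes that inferred pattern; it is an assumption drawn from these examples, not a documented rule, and `method_name` is a hypothetical helper.

```python
def method_name(agent, llm, oracle_translation=False):
    """Compose an eval_exp.py --method value from run settings.

    Assumed pattern, inferred from the README examples only.
    """
    parts = [agent, llm]
    if oracle_translation:
        parts.append("oracletranslation")
    return "_".join(parts)
```

If a run used a different agent or LLM, verify the actual name of the results directory rather than relying on this pattern.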

Docs

Environment Constraints

Contact

If you have any problems, please contact Jie-Jing Shao, Bo-Wen Zhang, Xiao-Wen Yang.

Citation

If our paper or related resources prove valuable to your research, we kindly ask that you cite the paper.

@misc{shao2024chinatravelrealworldbenchmarklanguage,
      title={ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning}, 
      author={Jie-Jing Shao and Xiao-Wen Yang and Bo-Wen Zhang and Baizhi Chen and Wen-Da Wei and Guohao Cai and Zhenhua Dong and Lan-Zhe Guo and Yu-feng Li},
      year={2024},
      eprint={2412.13682},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.13682}, 
}
