This paper introduces MotionLLaMA, a unified framework for motion synthesis and comprehension, along with a novel full-body motion tokenizer called the HoMi Tokenizer. MotionLLaMA is built on three core principles. First, it establishes a powerful unified representation space through the HoMi Tokenizer. Using a single codebook, the HoMi Tokenizer achieves reconstruction accuracy comparable to residual vector quantization tokenizers that use six codebooks, and it outperforms all existing single-codebook tokenizers. Second, MotionLLaMA integrates a large language model to tackle a range of motion-related tasks. This integration bridges modalities and supports both comprehensive and intricate motion synthesis and comprehension. Third, MotionLLaMA introduces the MotionHub dataset, currently the most extensive multimodal, multitask motion dataset, which enables fine-tuning of large language models. Extensive experiments show that MotionLLaMA not only covers the widest range of motion-related tasks but also achieves state-of-the-art (SOTA) performance in motion completion, interactive dual-person text-to-motion, and all comprehension tasks, while performing comparably to SOTA on the remaining tasks.
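The contrast the abstract draws between a single-codebook tokenizer and a six-codebook residual vector quantization (RVQ) tokenizer can be illustrated with a minimal sketch. This is generic nearest-neighbor quantization in NumPy, not the HoMi Tokenizer itself: the function names, the random codebooks, the feature dimension, and the codebook size are all assumptions chosen for illustration.

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index of the nearest codebook entry for each row of x."""
    # Squared Euclidean distances, shape (num_vectors, codebook_size).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def rvq_encode(x, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = x.copy()
    recon = np.zeros_like(x)
    indices = []
    for codebook in codebooks:
        idx = vq_encode(residual, codebook)
        quantized = codebook[idx]
        recon += quantized      # running reconstruction
        residual -= quantized   # what the next stage still has to explain
        indices.append(idx)
    return np.stack(indices, axis=1), recon

# Hypothetical data: 8 motion-frame feature vectors of dimension 4.
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 4))

# Hypothetical codebooks (random here; learned in a real tokenizer).
single_codebook = rng.normal(size=(16, 4))
rvq_codebooks = [rng.normal(size=(16, 4)) * 0.5 ** i for i in range(6)]

single_ids = vq_encode(frames, single_codebook)        # one token per frame
single_recon = single_codebook[single_ids]
rvq_ids, rvq_recon = rvq_encode(frames, rvq_codebooks)  # six tokens per frame
```

In this sketch the single-codebook path emits one token per frame, while the RVQ path emits six, one per codebook. A single-codebook tokenizer therefore yields a shorter, flat token sequence, which is generally simpler to feed to a language model.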
a person quickly sidesteps to the left and regains balance.
the person is hopping up-and-down.
a person puts hands together, then places them on his thighs and lastly outstretched from his sides.
running from side to side.
A woman is dancing a House Heel Step.
A person is pushed by the other in front of him/her.
A person is stepping forward and raising his/her hands to hug the other's back
person standing bends forward slightly to left to pick something up with both hands and moves item slightly to the right
a person walks forward unbalanced and almost falls over.
Two people stand facing each other, extending their hands simultaneously to push each other's shoulders.
One person bends down towards the other, while the other sits in a chair and waves his/her right hand to signal for him/her to stand up.
the first person raises her right leg while the second disappears to the left.
@misc{ling2024motionllama,
title={MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension},
author={Zeyu Ling and Bo Han and Shiyang Li and Hongdeng Shen and Jikang Cheng and Changqing Zou},
year={2024},
archivePrefix={arXiv},
}

This website draws heavy design inspiration from the excellent EDGE site.