Abstract: Many recent Transformer-based 3D human pose estimation (HPE) methods treat each frame of a video as an individual pose token and feed it into the model for computation, which leads to ...