Abstract: Many recent Transformer-based 3D human pose estimation (HPE) methods treat each video frame as an individual pose token and feed it into the model for computation, which leads to ...