Abstract
Egocentric action anticipation consists of predicting, from egocentric video,
a future action that the camera wearer will perform. While the task has recently
attracted the attention of the research community, current approaches assume
that the input videos are "trimmed", meaning that a short video sequence is
sampled a fixed time before the beginning of the action. We argue that, despite
the recent advances in the field, trimmed action anticipation has limited
applicability in real-world scenarios, where it is important to deal with
"untrimmed" video inputs and it cannot be assumed that the exact moment at
which the action will begin is known at test time. To overcome such
limitations, we propose an untrimmed action anticipation task, which, similarly
to temporal action detection, assumes that the input video is untrimmed at test
time, while still requiring predictions to be made before the actions actually
take place. We design an evaluation procedure for methods that address
this novel task, and compare several baselines on the EPIC-KITCHENS-100
dataset. Experiments show that current models designed for trimmed action
anticipation perform poorly on the untrimmed task, and that more research on
it is required.
Citation
ID: 282431
Ref Key: farinella2022untrimmed