Abstract
Most of the top-performing action recognition methods use optical flow as a
"black box" input. Here we take a deeper look at the combination of flow and
action recognition, and investigate why optical flow is helpful, what makes a
flow method good for action recognition, and how we can make it better. In
particular, we investigate the impact of different flow algorithms and input
transformations to better understand how these affect a state-of-the-art action
recognition method. Furthermore, we fine-tune two neural-network flow methods
end-to-end on the most widely used action recognition dataset (UCF101). Based
on these experiments, we make the following five observations: 1) optical flow
is useful for action recognition because it is invariant to appearance, 2)
optical flow methods are optimized to minimize end-point-error (EPE), but the
EPE of current methods is not well correlated with action recognition
performance, 3) for the flow methods tested, accuracy at boundaries and at
small displacements is most correlated with action recognition performance, 4)
training optical flow to minimize classification error instead of minimizing
EPE improves recognition performance, and 5) optical flow learned for the task
of action recognition differs from traditional optical flow, especially inside
the human body and at the boundary of the body. These observations may
encourage optical flow researchers to look beyond EPE as a goal and guide
action recognition researchers to seek better motion cues, leading to a tighter
integration of the optical flow and action recognition communities.
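
For reference, end-point error (EPE) is conventionally computed as the mean Euclidean distance between estimated and ground-truth flow vectors. The sketch below illustrates that standard definition only; it is not code from this work, and the function name `end_point_error` and the (H, W, 2) array layout are assumptions made for the example.

```python
import numpy as np


def end_point_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow.

    Both inputs are assumed to have shape (H, W, 2), holding the horizontal
    and vertical displacement of each pixel (a hypothetical layout chosen
    for this illustration).
    """
    pred = np.asarray(flow_pred, dtype=np.float64)
    gt = np.asarray(flow_gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("flow fields must have the same shape")
    # Per-pixel length of the error vector, then averaged over all pixels.
    per_pixel = np.sqrt(((pred - gt) ** 2).sum(axis=-1))
    return float(per_pixel.mean())


# Toy example: the prediction is off by (3, 4) at every pixel.
gt = np.zeros((4, 5, 2))
pred = gt + np.array([3.0, 4.0])
epe = end_point_error(pred, gt)
```

Note that this average weights every pixel equally, whereas observation 3 concerns accuracy in specific regions (motion boundaries and small displacements).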