Abstract: In most of the existing work for activity recognition, 3D ConvNets show promising performance for learning spatiotemporal features of videos. However, most methods sample fixed-length frames ...