The customer is a North American start-up that provides automated security solutions for
businesses and government institutions. The company uses machine learning for real-time
object detection and action recognition in video streams from its customers' security cameras.
The customer was developing a brand-new security solution for outdoor
surveillance. The MERA team was tasked with creating a PoC (proof of concept) for action recognition using machine
learning techniques. MERA experts had to deliver the solution at short notice and with limited resources,
including both design resources for implementation and hardware resources on the selected platform.
Action recognition remains an unsolved problem: even state-of-the-art solutions achieve limited
accuracy. One reason is the lack of good datasets for such tasks. In this project, our specialists had
less than 30 minutes of video available for labeling. The other problem is that the system has to
recognize actions from different camera angles. The MERA team had to cope with both issues.
Bearing in mind the limited hardware capabilities and the limited dataset provided by the customer,
MERA engineers conducted a brief study to identify approaches that could deliver acceptable results with
limited data. Several neural network architectures were selected as candidates for the PoC, including
single-frame activity detection using a CNN and activity detection using multiple frames. The multi-frame option
showed more promising results in experiments, so the MERA team also checked a few further options, including a 3D CNN.
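
To illustrate the multi-frame idea, the sketch below cuts a video into fixed-length, overlapping clips of frame indices. It is only a minimal illustration and not the MERA implementation: the function name `sample_clips` and the `clip_len` and `stride` values are assumptions chosen for the example. A single-frame model would classify each index on its own, while a multi-frame model receives one whole clip at a time.

```python
def sample_clips(num_frames, clip_len=16, stride=8):
    """Return one list of frame indices per fixed-length clip.

    All names and default values are hypothetical, chosen for illustration.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    clips = []
    # Slide a window of clip_len frames over the video in steps of stride.
    # A stride smaller than clip_len makes consecutive clips overlap.
    for start in range(0, num_frames - clip_len + 1, stride):
        clips.append(list(range(start, start + clip_len)))
    return clips


if __name__ == "__main__":
    # A 10-frame video, 4-frame clips, window moved 3 frames at a time.
    for clip in sample_clips(10, clip_len=4, stride=3):
        print(clip)
```

With a small amount of labeled video, overlapping clips are one common way to obtain more training samples from the same footage, at the cost of samples that are strongly correlated.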
Eventually, the most encouraging results came from an approach that combines transfer
learning (using a pre-trained Inception v3 model) with an LSTM. This model was selected as the final solution for the PoC.
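
One way such a combination can be put together is sketched below with Keras: a frozen, ImageNet-pre-trained Inception v3 extracts a feature vector from every frame, and an LSTM classifies the resulting sequence. This is a hedged sketch and not the actual PoC code. The function name `build_action_model`, the clip length, the LSTM size, the dropout rate, and the number of classes are all assumptions made for the example.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3


def build_action_model(num_classes, seq_len=16, image_size=299,
                       weights="imagenet"):
    """Frozen Inception v3 per frame, followed by an LSTM over the clip.

    All hyperparameters here are illustrative assumptions.
    """
    # Pre-trained CNN without its classification head; global average
    # pooling turns each frame into a single 2048-dimensional vector.
    base = InceptionV3(include_top=False, weights=weights, pooling="avg",
                       input_shape=(image_size, image_size, 3))
    base.trainable = False  # transfer learning: keep the CNN weights fixed

    # Input is a clip: seq_len RGB frames of image_size x image_size pixels.
    clip = layers.Input(shape=(seq_len, image_size, image_size, 3))
    # Apply the same CNN to every frame of the clip.
    features = layers.TimeDistributed(base)(clip)
    # The LSTM summarizes the sequence of per-frame features.
    x = layers.LSTM(256)(features)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(clip, outputs)


if __name__ == "__main__":
    model = build_action_model(num_classes=5)
    model.summary()
```

Freezing the convolutional base means only the LSTM and the final dense layer are trained, which keeps the number of trainable parameters small. That is a common choice when very little labeled video is available.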
MERA provided the customer with the reliable proof of concept they needed for further
solution development. Our experts accomplished the task in just 3 months, despite resource
constraints and the limited dataset. The models were integrated into a web server, allowing the customer to run
live demonstrations for their clients.