Human Activity Recognition (HAR): Fundamentals, Models, Datasets

Table of Contents

  1. Introduction
  2. What is Pose Estimation?
  3. How Does AI-Based Human Activity Recognition Work?
  4. Some Important Datasets for Human Activity Recognition
  5. Real-Life Applications of Human Activity Recognition
  6. Conclusion
  7. Frequently Asked Questions (FAQ)

Introduction

Human activity recognition (HAR) refers to using computer and machine vision technology to interpret and understand human motion. HAR involves analyzing sensor-recorded data to interpret various forms of human motion, including activities, gestures, and behaviors.

This data is then translated into actionable commands that computers can execute and analyze using HAR algorithms.

Human activity recognition (HAR) has numerous applications across various domains. In healthcare, HAR can monitor and assess patients' movements and activities to detect abnormalities, track rehabilitation progress, or provide personalized care.

           Figure: Human Activity Recognition

In sports and athletics, HAR can analyze athletes' performance, provide feedback on technique, and prevent injuries by identifying improper movements.

HAR also finds application in surveillance systems, which can automatically detect and classify suspicious or abnormal activities for enhanced security.

Vision-based HAR systems often employ pose estimation techniques, which provide valuable insights into human behavior.

Pose estimation is crucial in tasks like HAR, content extraction, and semantic comprehension. Deep learning approaches, particularly convolutional neural networks, are commonly used in pose estimation.

One of the significant challenges in HAR is considering various factors such as physical attributes, cultural markers, direction, and pose types. For instance, distinguishing between a person falling and attempting a handstand can be difficult.

Addressing this uncertainty requires the development of novel methods within the artificial intelligence framework.

Researchers are exploring techniques such as multi-modal and graph-based learning to improve the accuracy and robustness of HAR systems.

These approaches involve incorporating more complex features, utilizing multiple data sources, and capturing the spatial and temporal relationships between different body parts.

In addition to pose estimation and model complexity, HAR faces other challenges. Disparities in sensor data due to sensor placement, variations in human movement patterns, overlapping activities that interfere with accurate recognition, noisy data that causes distortions, and the time-consuming and expensive nature of data collection are some of the most prominent challenges in the field.

What is Pose Estimation?

Pose estimation is a task in computer vision that involves determining the position and orientation of a person or object in an image or video. It can be thought of as the process of inferring the pose based on the given visual data.

This is achieved by identifying and tracking specific points, known as key points, on the object or person of interest.

             Figure: Pose Estimation

These key points can be significant features or joints, such as corners for objects or major joints like elbows or knees for humans. By analyzing these key points' spatial relationships and movements, pose estimation algorithms can estimate the pose accurately.
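To make the idea concrete, here is a minimal sketch (using hypothetical 2D key point coordinates rather than the output of any particular detector) of how a spatial relationship between joints, such as the elbow angle, can be computed once key points are available:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by the segments b->a and b->c."""
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    v1, v2 = a - b, c - b
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

# Hypothetical (x, y) key points for one arm, as a pose estimator might return.
shoulder, elbow, wrist = (0.42, 0.30), (0.48, 0.45), (0.60, 0.50)

print(f"Elbow angle: {joint_angle(shoulder, elbow, wrist):.1f} degrees")
```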

How Does AI-Based Human Activity Recognition Work?

AI-based human activity recognition utilizes advanced machine learning and computer vision techniques to analyze sensor data and identify and classify human activities. The major steps involved include:

  1. Data Collection
  2. Data Preprocessing
  3. Model Selection and Training

This section walks through the general pipeline used to develop a Human Activity Recognition system.

1.  Data Collection

HAR data is commonly gathered using sensors attached to or worn by the user. These sensors include accelerometers, gyroscopes, magnetometers, and GPS sensors.

Accelerometers detect changes in motion and direction and measure acceleration along three axes (x, y, and z). Magnetometers, on the other hand, sense magnetic fields and their orientation, while gyroscopes measure rotation and angular velocity.

GPS sensors can provide information about the user's location and movement, although they are not frequently used in HAR due to their high power consumption and limited accuracy indoors.

The sensor data collected is typically recorded as time-series data, where each sample represents the sensor measurements at a specific point in time (e.g., every second).

2. Data Preprocessing

Data preprocessing is a critical stage in Human Activity Recognition (HAR) as it plays a fundamental role in cleaning, transforming, and preparing raw sensor data for subsequent analysis and modeling. The following are key processes involved in data preparation:

i) Filtering

Filtering is a signal processing technique that removes noise and undesirable signals from raw sensor data. In HAR, various filters are applied depending on the frequency range of the signals of interest.

Commonly used filters include low-pass filters, which allow low-frequency components to pass while attenuating high-frequency noise; high-pass filters, which suppress low-frequency noise and emphasize high-frequency variations; and band-pass filters, which selectively allow a specific range of frequencies to pass, effectively filtering out unwanted signals and enhancing the desired signals.
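As an illustration, a low-pass Butterworth filter (a common choice, though not the only one) can be applied to a noisy accelerometer signal with SciPy. The 5 Hz cutoff and 50 Hz sampling rate below are assumed values for the sketch:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0       # assumed sampling rate of the accelerometer (Hz)
cutoff = 5.0    # assumed cutoff: keep body motion, attenuate high-frequency noise

# Synthetic signal: slow body motion plus high-frequency noise.
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.random.randn(t.size)

# 4th-order low-pass Butterworth filter applied forward and backward (zero phase shift).
b, a = butter(N=4, Wn=cutoff, btype="low", fs=fs)
filtered = filtfilt(b, a, raw)

print(raw.std(), filtered.std())  # the filtered signal has less high-frequency variance
```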

ii) Feature extraction

The choice of features depends on the specific actions and the type of sensors used. For example, features such as mean, standard deviation, and frequency-domain properties (e.g., Fourier transformation and wavelet transformation parameters) can be extracted with accelerometer data.

These features capture essential characteristics of the motion patterns and provide relevant information for activity recognition.
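A minimal sketch of extracting such features from a single accelerometer window is shown below; the window length and the specific feature set are illustrative choices, not a fixed recipe:

```python
import numpy as np

def extract_features(window):
    """Compute simple time- and frequency-domain features for one window.

    window: array of shape (n_samples, 3) holding x, y, z accelerometer readings.
    """
    feats = {}
    for i, axis in enumerate("xyz"):
        sig = window[:, i]
        feats[f"mean_{axis}"] = sig.mean()
        feats[f"std_{axis}"] = sig.std()
        # Dominant frequency bin from the magnitude spectrum (DC bin excluded).
        spectrum = np.abs(np.fft.rfft(sig))
        feats[f"domfreq_{axis}"] = int(spectrum[1:].argmax()) + 1
    return feats

window = np.random.randn(128, 3)   # one 128-sample window of synthetic data
print(extract_features(window))
```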

iii) Feature selection

Feature selection aims to reduce the dimensionality of the feature space while retaining the most informative and discriminative features. The performance and efficiency of activity identification algorithms can be improved by selecting the most relevant features.

Features are evaluated based on their ability to distinguish between different activities, association with activity labels, and redundancy with other features.
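One common way to do this is univariate selection, for example with scikit-learn's SelectKBest. A sketch on synthetic features follows; the value k=10 is an arbitrary choice here:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a HAR feature matrix: 200 windows x 30 features, 4 activities.
X = np.random.randn(200, 30)
y = np.random.randint(0, 4, size=200)

selector = SelectKBest(score_func=f_classif, k=10)  # keep the 10 most discriminative features
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                     # (200, 10)
print(selector.get_support(indices=True))  # indices of the retained features
```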

iv) Data Segmentation

Data Segmentation involves dividing the continuous stream of sensor data into smaller segments or windows to capture the temporal aspects of activities. The size and overlap of the windows depend on the duration and intensity of the activities being monitored.

Segmentation enables the analysis of activity patterns within shorter time intervals, facilitating the extraction of meaningful features from specific activity segments.
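A sliding-window segmentation can be implemented in a few lines. The 2.56-second window with 50% overlap below mirrors a common choice, but the right values depend on the activities being monitored:

```python
import numpy as np

def sliding_windows(data, window_size, step):
    """Split a (n_samples, n_channels) time series into overlapping windows."""
    windows = []
    for start in range(0, len(data) - window_size + 1, step):
        windows.append(data[start:start + window_size])
    return np.stack(windows)

fs = 50                             # assumed sampling rate (Hz)
stream = np.random.randn(1000, 3)   # synthetic 20 s accelerometer stream
windows = sliding_windows(stream, window_size=int(2.56 * fs), step=int(1.28 * fs))
print(windows.shape)                # (n_windows, 128, 3)
```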

v) Data Normalization

Data Normalization is a process that scales the features to have a standardized mean and variance, typically aiming for a mean of zero and a variance of one.

This step ensures that the features from different sensors or participants are on a comparable scale, preventing any biases introduced by variations in sensor sensitivity or participant characteristics.
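In practice this is often done with z-score scaling, for example via scikit-learn's StandardScaler. Note that the scaler should be fit on training data only so that test statistics do not leak into training; a sketch with synthetic features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.randn(300, 12) * 5 + 2   # synthetic training features
X_test = np.random.randn(100, 12) * 5 + 2    # synthetic test features

scaler = StandardScaler().fit(X_train)       # learn mean and variance from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(2))  # ~0 per feature
print(X_train_scaled.std(axis=0).round(2))   # ~1 per feature
```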

vi) Dimensionality reduction

Dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are applied to reduce the dimensionality of the feature space.

PCA identifies the most significant components of the data, capturing the maximum variance and allowing for a lower-dimensional representation.

t-SNE is a nonlinear technique that aims to preserve the local structure of the data, enabling visualization of high-dimensional data in a lower-dimensional space.
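Both techniques are available in scikit-learn. The sketch below reduces a synthetic feature matrix with PCA and then projects it to 2-D with t-SNE for visualization; the component counts and perplexity are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.randn(300, 50)                  # synthetic high-dimensional features

pca = PCA(n_components=10)                    # keep the 10 directions of largest variance
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained

X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X_pca)
print(X_2d.shape)                             # (300, 2), suitable for a scatter plot
```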

vii) Missing Value Imputation

Missing value imputation addresses the issue of incomplete sensor data, which can occur due to device malfunctions or data transmission faults. Simple imputation approaches, such as mean or median interpolation, can estimate missing values based on the available data, ensuring the data is complete and ready for analysis.
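A minimal sketch with scikit-learn's SimpleImputer, filling gaps with the per-feature mean (median works the same way):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Synthetic feature matrix with a few missing sensor readings (NaN).
X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

imputer = SimpleImputer(strategy="mean")   # or strategy="median"
X_filled = imputer.fit_transform(X)
print(X_filled)
```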

Taken together, careful data preparation is a crucial stage in HAR, as it directly impacts the accuracy and reliability of activity recognition models.

By effectively preprocessing the sensor data, removing noise, extracting informative features, and reducing dimensionality, researchers and practitioners can enhance the precision and dependability of HAR systems, leading to more robust and accurate activity recognition results.

3. Model Selection

Several machine learning models have been successfully applied to Human Activity Recognition (HAR) tasks. Let's look at some of the most popular ones in detail:

i) Decision Trees

Decision Trees are simple yet effective models for classification tasks in HAR. They create a tree-like structure where each internal node represents a feature or attribute, and each leaf node corresponds to a class label. Decision trees can handle continuous and categorical data and capture non-linear interactions among features.

They provide interpretability, allowing us to understand the decision-making process. However, decision trees can be prone to overfitting when the data is complex or noisy.
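Here is a short sketch of training a decision tree on synthetic HAR-style features and printing its rules, which is where the interpretability comes from; the feature names and data are made up for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.random.randn(200, 3)                 # synthetic features, e.g. mean/std/dominant frequency
y = np.random.randint(0, 3, size=200)       # synthetic labels: 0=walk, 1=sit, 2=run

tree = DecisionTreeClassifier(max_depth=3)  # limiting depth helps against overfitting
tree.fit(X, y)

# The learned rules can be inspected directly, one reason trees are easy to interpret.
print(export_text(tree, feature_names=["mean_acc", "std_acc", "dom_freq"]))
```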

ii) Random Forest

Random Forest is an ensemble model that combines multiple decision trees to improve performance and reduce overfitting. It creates a collection of decision trees, each trained on a different subset of the data with random feature subsets.

The final prediction is made by aggregating the predictions from individual trees. Random forests can handle noisy and high-dimensional data and are robust against overfitting.

They are computationally efficient and can handle missing values. However, they may require more computational resources compared to decision trees.
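A sketch of the same idea with a random forest, also showing the per-feature importances that the ensemble exposes (again on synthetic data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.randn(300, 6)             # synthetic feature matrix
y = np.random.randint(0, 4, size=300)   # synthetic activity labels

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt")
print(cross_val_score(forest, X, y, cv=5).mean())  # estimated accuracy

forest.fit(X, y)
print(forest.feature_importances_)      # relative contribution of each feature
```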

iii) Support Vector Machines (SVMs)

SVMs are potent models for linear and non-linear classification tasks in HAR. They aim to find an optimal hyperplane separating different classes by maximizing the margin between them.

SVMs can handle high-dimensional data and are less prone to overfitting. They work well even with small to medium-sized datasets and can handle both continuous and categorical features.
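Because SVMs are sensitive to feature scale, they are usually combined with the normalization step described earlier. A sketch using an RBF-kernel SVC inside a scikit-learn pipeline, on synthetic data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randn(250, 10)            # synthetic features
y = np.random.randint(0, 3, size=250)   # synthetic activity labels

# Scaling and classification chained together so the scaler is fit only on the training data.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X[:200], y[:200])
print(model.score(X[200:], y[200:]))    # accuracy on the held-out portion
```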

iv) Hidden Markov Models (HMMs)

HMMs are statistical models widely used in HAR for recognizing sequential patterns in sensor data. They are particularly suitable for time-series data where the temporal dependencies between observations are important.

HMMs consist of hidden states representing different activities and observed emissions corresponding to sensor measurements. They can capture the dynamics and transitions between different activities, making them effective for modeling complex activities with multiple steps.
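The hmmlearn library provides Gaussian HMMs that fit this setting. The sketch below, with synthetic observations and an assumed four hidden states, fits a model and decodes the most likely state sequence:

```python
import numpy as np
from hmmlearn import hmm   # assumes the hmmlearn package is installed

# Synthetic stand-in for a sequence of 2-D sensor features over time.
observations = np.random.randn(500, 2)

# One hidden state per (assumed) activity; diagonal Gaussian emissions.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(observations)

states = model.predict(observations)   # most likely hidden state at each time step
print(states[:20])
```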

v) Neural Networks

Activity classification involves learning patterns from sequences of past sensor observations, a task to which statistical and deep learning techniques are well suited.

In human activity recognition, neural networks have shown great effectiveness. Two widely used approaches for this task are Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models. Below we discuss both of these approaches in brief.

  1. Recurrent neural networks (RNNs)

RNN models are particularly well-suited for handling time-series data. They can process sequences of variable lengths, making them ideal for activity recognition.

Classifying activities using RNN models involves vectorizing video files, calculating descriptors to represent activity characteristics, forming a visual bag of words, feeding the descriptors into input layers, analyzing and classifying the data using RNN layers, and obtaining the final result.

     Figure: General Architecture for Recurrent Neural Networks

RNNs have been successfully employed in various applications, such as predicting pedestrian movements using camera and GPS data.
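As a concrete sketch, an LSTM-based classifier over fixed-length sensor windows can be defined in a few lines of Keras; the shapes and layer sizes here are illustrative assumptions, not values from the article:

```python
import numpy as np
import tensorflow as tf

timesteps, channels, n_classes = 128, 3, 6   # assumed window length, sensor axes, activities

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, channels)),
    tf.keras.layers.LSTM(64),                # summarizes the temporal sequence
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Synthetic windows and labels just to show the expected shapes.
X = np.random.randn(100, timesteps, channels).astype("float32")
y = np.random.randint(0, n_classes, size=100)
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```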

2. Convolutional Neural Networks (CNNs)

On the other hand, CNN models are specialized neural networks known for their effectiveness in processing visual data.

They are resilient to changes in scale, rotation, and other variations. CNNs have been widely used in image recognition, automatic number plate reading, and self-driving car software.

An example of their application in human activity recognition is a 3D CNN algorithm that accurately reconstructs the three-dimensional pose of animals without the need for attached markers.

This method proves beneficial for observing animals in both laboratory and wildlife settings. The training process involves assembling a dataset of synchronized video frames with labeled anatomical landmarks and training the CNN using this data.
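For HAR over windowed sensor data, the same convolutional idea is often applied along the time axis with 1-D convolutions rather than over images. A minimal Keras sketch of that variant, with illustrative layer sizes:

```python
import numpy as np
import tensorflow as tf

timesteps, channels, n_classes = 128, 3, 6   # assumed window length, sensor axes, activities

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, channels)),
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),  # local motion patterns
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

X = np.random.randn(100, timesteps, channels).astype("float32")
y = np.random.randint(0, n_classes, size=100)
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```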

Some Important Datasets for Human Activity Recognition

In this section, we discuss various datasets that can be used to train a computer vision model for Human Activity Recognition.

  1. UCI Human Activity Recognition Using Smartphones Dataset: This dataset contains data from smartphones' accelerometers and gyroscope sensors, capturing various activities performed by different subjects. It is widely used as a benchmark for HAR algorithms.
  2. KTH Human Activity Recognition Dataset: This dataset comprises videos demonstrating six human activities, including walking, jogging, running, boxing, handwaving, and handclapping. It is commonly utilized for action recognition and activity classification tasks.

         Figure: KTH Human Activity Recognition Dataset

3. UCF101: UCF101 is a large-scale video dataset consisting of 101 action classes. It features real-world videos from YouTube, covering various human activities such as sports, dancing, playing musical instruments, and more.

          Figure: Sample Images of UCF101

4. HMDB51: The HMDB51 dataset is a video dataset widely employed in human activity recognition. It contains videos from diverse sources, encompassing 51 action classes, including walking, jumping, cooking, brushing teeth, and more.

        Figure: Sample Images of the HMDB51 dataset

5.  ActivityNet: ActivityNet is a large-scale video dataset containing diverse human activities. It includes untrimmed videos annotated with activity labels, enabling research in activity recognition and temporal localization tasks.

These datasets are widely utilized resources for researchers and developers in the field of human activity recognition, facilitating the evaluation and advancement of HAR algorithms and techniques.

Real-Life Applications of Human Activity Recognition

In this section, we discuss some use cases of Human Activity Recognition along with their real-life implementations.

1. Human Activity Recognition in Health Monitoring Application

Healthcare and Wellness: HAR techniques can monitor patient activities and detect abnormal behavior or changes in daily routines.

Figure: Human Activity Recognition in Health Monitoring Application

For example, HAR is widely used to monitor medication adherence by analyzing patient activities through smartphone sensors.

Through remote patient engagement and assessment methods, which include the measurement of digital biomarkers and real-time monitoring of medication dosing, adherence to prescribed medications can be ensured.

The dosing support solution operates through a smartphone application, where patients receive alerts reminding them to take their medication and are guided through the correct administration process.

One such company is AiCure, which utilizes artificial intelligence (AI) and advanced data analytics to monitor patient behavior and facilitate remote engagement in clinical trials.

2) Applications of Human Activity Recognition in the Sports and Fitness Industry

HAR is employed in the sports and fitness industry for sports performance analysis to track and evaluate athletes' movements and techniques. HAR is mostly utilized to provide insights into player performance, injury prevention, and training optimization.

This is primarily achieved using wearable tracking devices and data analytics. These devices, such as GPS trackers and inertial sensors, are designed to collect various metrics during training or competition, including player movement, speed, acceleration, deceleration, and positional data. This data is then transmitted wirelessly to a central system for analysis.

The central system employs advanced data analytics algorithms to process and interpret the collected data. It provides valuable insights into athlete performance, workload, and injury risk.

Coaches, trainers, and sports scientists can access this information through user-friendly dashboards and visualizations to make data-driven decisions and optimize training strategies.

       Figure: Sports Analysis using Catapult’s Software

Catapult Sports' technology is widely used in professional sports leagues, including soccer, basketball, American football, and rugby, as well as in collegiate and Olympic-level programs.

3) Applications of Human Activity Recognition in Security and Surveillance

HAR is utilized in security systems to identify suspicious activities and enhance surveillance. Camio, a video surveillance company, uses HAR algorithms to detect and classify human actions in real time, enabling proactive security measures.

The goal of introducing HAR in the surveillance industry is to make video footage valuable and actionable by transforming it into real-time insights and alerts. This is done by leveraging advanced computer vision, machine learning, and artificial intelligence algorithms to extract valuable information from video streams.

Camio's platform is designed to be flexible and scalable, capable of processing video streams from various sources such as IP cameras, smartphones, and drones. Using cloud infrastructure, Camio provides its clients with real-time video analysis and monitoring capabilities.

4) Application of HAR in Manufacturing Industries

In industrial settings, Human Activity Recognition (HAR) is implemented to monitor worker activities and effectively ensure safety protocol adherence. HAR technology plays a vital role in detecting and preventing hazardous movements, offering real-time feedback to workers for enhanced safety measures.

Various companies, including WearKinetic, specialize in wearable technology and data analytics. They focus on developing innovative wearable devices and software solutions that empower individuals and organizations to track and analyze human movement data.

Data reported for these wearable devices shows a significant reduction in manual injuries of 50-60% alongside an increase in working efficiency of up to 72%. These figures highlight the positive impact of wearable technology on workplace safety and productivity.

5) Application of HAR in the Gaming Industry

Human activity recognition (HAR) has several applications in the gaming industry, enhancing the gaming experience and enabling more immersive gameplay.

With the help of HAR, systems accurately track and recognize various human activities, such as running, jumping, punching, or swinging a sword.

This data is then used to control and animate the in-game characters, allowing players to engage in virtual environments using their own body movements.

Xsens is a leading provider of 3D motion capture technology and solutions. They offer a range of products and software that enable real-time human motion tracking and analysis. In the gaming industry, Xsens technology captures players' movements and translates them into in-game actions.

Conclusion

Human activity recognition (HAR) is a field that uses computer and machine vision technology to interpret and understand human motion. It involves analyzing sensor-recorded data to interpret various human activities, gestures, and behaviors. HAR has applications in healthcare, sports, surveillance, and other domains.

HAR relies on techniques such as pose estimation, which determines the position and orientation of a person or object in an image or video by identifying key points or joints. Deep learning approaches, particularly convolutional neural networks (CNNs), are commonly used for pose estimation.

Data collection for HAR involves using accelerometers, gyroscopes, magnetometers, and GPS sensors, which capture motion, direction, and location information.

The collected sensor data is then preprocessed by filtering out noise, extracting relevant features, segmenting the data, normalizing the features, and reducing dimensionality.

HAR model selection includes decision trees, random forests, support vector machines (SVMs), hidden Markov models (HMMs), and neural networks.

Decision trees and random forests provide interpretability and handle complex data, SVMs handle high-dimensional data, HMMs are suitable for time-series data, and neural networks, such as recurrent neural networks (RNNs) and CNNs, are effective for handling time-series and visual data, respectively.

Several datasets are commonly used for HAR research, including the UCI Human Activity Recognition Using Smartphones Dataset, KTH Human Activity Recognition Dataset, UCF101, HMDB51, and ActivityNet.

Real-life applications of HAR include healthcare and wellness monitoring, sports performance analysis, security and surveillance systems, industrial safety monitoring, and enhancing the gaming experience.

Frequently Asked Questions (FAQ)

What are the different types of human activity recognition?

Different types of human activity recognition include Sensor-Based Activity Recognition, Single-User Activity Recognition, Multi-User Activity Recognition, and Group Activity Recognition.

What is the meaning of activity recognition?

Activity recognition refers to predicting human movement or activities based on sensor data, typically from devices like smartphone accelerometers. This involves analyzing streams of sensor data, which are divided into smaller segments known as windows.

Each window is then associated with a specific activity, following a sliding window approach. The goal is to accurately identify and classify different activities based on the patterns and information captured by the sensors.

What is the aim of human activity recognition?

The aim of human activity recognition is to analyze video sequences or still images and accurately classify the input data into different activity categories. The goal is to develop systems that can correctly identify and categorize the underlying activities based on the visual information captured in the input data.