A Public Domain Dataset For Real-life Human Activity Recognition Using Smartphone Sensors

Abstract

In recent years, human activity recognition has become a hot topic within the scientific community. The reason it is under the spotlight is its direct application in multiple domains, such as healthcare and fitness. Additionally, the current worldwide use of smartphones makes it particularly easy to collect this kind of data from people in a non-intrusive and inexpensive way, without the need for other wearables. In this paper, we introduce our orientation-independent, placement-independent and subject-independent human activity recognition dataset. The dataset contains measurements from the accelerometer, gyroscope, magnetometer and GPS of the smartphone. Additionally, each measurement is associated with one of four registered activities: inactive, active, walking and driving. This work also proposes an SVM model to perform some preliminary experiments on the dataset. Since this dataset was collected from smartphones in their actual everyday use, unlike other datasets, developing a good model on such data is an open problem and a challenge for researchers. Solving it would close the gap between the model and a real-life application.
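The preliminary SVM experiments mentioned above could be reproduced along these lines with scikit-learn. Note that the feature matrix, labels and hyperparameters below are illustrative assumptions (random placeholder data), not the exact configuration used in the paper; real features would come from the extracted-feature files in the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder feature matrix: 400 windows x 12 features (e.g. per-axis
# mean/std for accelerometer, gyroscope and magnetometer).
X = rng.normal(size=(400, 12))
# Placeholder labels: 0=inactive, 1=active, 2=walking, 3=driving.
y = rng.integers(0, 4, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features before the SVM, as SVMs are sensitive to feature ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

With the real extracted features, `X` and `y` would be loaded from the feature-split files instead of being randomly generated.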

Data collection was made through an Android app, for each smartphone of every individual from the 19 that got part in the study. The data collected, by focusing such gathering on a real-life environment, does not have a fixed orientation or placement of the smartphone, leaving each individual to use their device as they would for each of the specified actions.

The activities performed were four:

  • Inactive: not carrying the mobile phone. For example, the device is on the desk while the individual performs other kinds of activities.
  • Active: carrying the mobile phone and moving, but not going to a particular place. For example, making dinner, attending a concert, buying groceries or doing the dishes count as "active" activities.
  • Walking: moving to a specific place. Running or jogging also count as a "walking" activity.
  • Driving: moving in a means of transport powered by an engine. This includes cars, buses, motorbikes, trucks and similar vehicles.
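For training a classifier, the four activity classes map naturally onto integer labels. The numeric codes below are an assumption for illustration only; the dataset files define the authoritative labels.

```python
# Hypothetical label encoding for the four activities; the actual codes
# used in the dataset files may differ.
ACTIVITY_LABELS = {
    "inactive": 0,
    "active": 1,
    "walking": 2,
    "driving": 3,
}

def encode(activity: str) -> int:
    """Map an activity name (case-insensitive) to its integer label."""
    return ACTIVITY_LABELS[activity.lower()]
```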

The data collected come from four different sensors: accelerometer, gyroscope, magnetometer and GPS. For the accelerometer, gyroscope and magnetometer, we saved the tri-axial values. For GPS, we stored the device's increments in latitude, longitude and altitude, as well as the bearing, speed and accuracy of each measurement.

Note that these measurements do not have a fixed frequency, as Android does not allow one to be set. In addition, some of the people who took part in the study did not have all of these sensors on their smartphone, so certain sessions lack gyroscope data, or both gyroscope and magnetometer data.
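Because of the variable sampling rate, consumers of the raw data typically resample each sensor stream onto a regular time grid before windowing. A minimal pandas sketch, assuming timestamped tri-axial accelerometer readings (the column names here are hypothetical, not the dataset's actual headers):

```python
import pandas as pd

# Hypothetical raw accelerometer stream with irregular timestamps (ms).
raw = pd.DataFrame(
    {
        "timestamp": [0, 18, 41, 59, 83, 102, 118, 143],
        "acc_x": [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2],
        "acc_y": [9.7, 9.8, 9.8, 9.6, 9.7, 9.8, 9.7, 9.9],
        "acc_z": [0.0, -0.1, 0.1, 0.0, 0.2, -0.1, 0.0, 0.1],
    }
)
raw["timestamp"] = pd.to_timedelta(raw["timestamp"], unit="ms")
raw = raw.set_index("timestamp")

# Resample onto a fixed 20 ms (50 Hz) grid, averaging readings that fall
# into the same bin and interpolating empty bins linearly.
regular = raw.resample("20ms").mean().interpolate(method="linear")
```

The 50 Hz target rate is an arbitrary choice for the example; any fixed rate close to the device's average sampling rate would work.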

Inside each .zip file, there is a README file with details for each part.

Updates

  • 01/07/2020: updated all data files except the raw and cleaned ones, due to errors found in the related scripts, which were updated as well.
  • 02/04/2020: updated some data files whose headers did not use the correct names.

Source Code

Datasets

  • Raw data: data_raw.zip. Data extracted directly from the Android app and the sensors of each smartphone involved in the study.
  • Data cleaned: data_cleaned.zip. Data obtained after the application of our Anomaly_Detector.py script to preprocess raw data.
  • Data cleaned and adapted (full): data_cleaned_adapted_full.zip. Data obtained after the application of our Data_Adapter.py script.
  • Data cleaned and adapted (splits): data_cleaned_adapted_splits.zip. Cleaned and adapted data, divided into splits obtained after the application of our Data_Splitter.py script.
  • Data cleaned, adapted and with features extracted (splits): data_cleaned_adapted_features_splits.zip. Data obtained after the application of our Feature_Extraction.py script, divided into splits.
  • Data cleaned, adapted and with features extracted (full): data_cleaned_adapted_features_full.zip. Data obtained after joining all feature extraction splits.
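The feature-extraction step in the pipeline above can be illustrated with a small sketch: statistics such as the per-axis mean and standard deviation computed over fixed-size windows are typical for this kind of pipeline. The window size and the specific statistics below are assumptions for illustration, not necessarily the ones computed by Feature_Extraction.py.

```python
import numpy as np

def window_features(signal: np.ndarray, window: int) -> np.ndarray:
    """Split a (samples, axes) signal into non-overlapping windows and
    compute the per-axis mean and standard deviation of each window."""
    n_windows = signal.shape[0] // window
    # Drop trailing samples that do not fill a complete window.
    trimmed = signal[: n_windows * window].reshape(n_windows, window, -1)
    means = trimmed.mean(axis=1)
    stds = trimmed.std(axis=1)
    return np.hstack([means, stds])  # shape: (n_windows, 2 * axes)

# Example: 100 tri-axial samples, windows of 25 samples -> 4 windows x 6 features.
acc = np.random.default_rng(0).normal(size=(100, 3))
feats = window_features(acc, window=25)
```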

Publications

This work has been submitted to Sensors (MDPI).

About us

We are a group of researchers from the University of A Coruña (Spain).

Funding

This research was partially funded by Xunta de Galicia/FEDER-UE (ConectaPeme, GEMA: IN852A 2018/14) and MINECO-AEI/FEDER-UE (Flatcity: TIN2016-77158-C4-3-R).

Acknowledgments

First of all, we want to thank the CESGA for its support in executing the code related to this paper. We would also like to thank all of the participants who took part in our data collection experiment.