• Home   /  
  • Archive by category "1"

Haibing Lu Dissertation Sample


Dissertations from 2017


Method for Enabling Causal Inference in Relational Domains, David Arbour


Problems in Graph-Structured Modeling and Learning, James Atwood


Spreadsheet Tools for Data Analysts, Daniel W. Barowy


Deep Energy-Based Models for Structured Prediction, David Belanger


Graph Construction for Manifold Discovery, CJ Carey


Automatic Derivation of Requirements for Components Used in Human-Intensive Systems, Heather Conboy




Controversy Analysis and Detection, Shiri Dori-Hacohen




On Leveraging Multi-Path Transport in Mobile Networks, Yeon-sup Lim


Inference in Networking Systems with Designed Measurements, Chang Liu


Style-driven Shape Analysis and Synthesis, Zhaoliang Lun


Temporal and Relational Models for Causality: Representation and Learning, Katerina Marazopoulou


The Complexity of Resilience, Cibele Matos Freire


Knowledge Representation and Reasoning with Deep Neural Networks, Arvind Ramanathan Neelakantan


Belief-Space Planning for Resourceful Manipulation and Mobility, Dirk Ruiken


Database Usability Enhancement in Data Exploration, Yue Wang



Dissertations from 2016


Effective Performance Analysis and Debugging, Charles M. Curtsinger


Detecting Anomalously Similar Entities in Unlabeled Data, Lisa D. Friedland


Elastic Resource Management in Distributed Clouds, Tian Guo


Application-Aware Resource Management for Cloud Platforms, Xin He


Extending Faceted Search to the Open-Domain Web, Weize Kong


Efficient Inference, Search and Evaluation for Latent Variable Models of Text with Applications to Information Retrieval and Machine Translation, Kriste Krstovski


Combining Static and Dynamic Analysis for Bug Detection and Program Understanding, Kaituo Li


Algorithms for First-order Sparse Reinforcement Learning, Bo Liu


Applications of Sampling and Estimation on Networks, Fabricio Murai Ferreira


An Incremental Approach to Identifying Causes of System Failures using Fault Tree Analysis, Huong T. Phan


Specification and Analysis of Resource Utilization Policies for Human-Intensive Systems, Seung Yeob Shin


Intrinsically Motivated Exploration in Hierarchical Reinforcement Learning, Christopher M. Vigorito


Stochastic Network Design: Models and Scalable Algorithms, Xiaojian Wu


Leveraging Backscatter for Ultra-low Power Wireless Sensing Systems, PENGYU ZHANG


Shape Design and Optimization for 3D Printing, Yahan Zhou

Dissertations from 2015


Reconstructing Geometric Structures from Combinatorial and Metric Information, Md Ashraful Alam


An Analysis of Student Learning Strategies in High Enrollment Computer-Based College Courses, gordon c. anderson


Fundamental Limits of Covert Communication, Boulat A. Bash


Automated Style Feedback for Advanced Beginner Java Programmers, Hannah Blau


Skeleton Structures and Origami Design, John C. Bowers


Exploiting Concepts In Videos For Video Event Detection, Ethem Can


Learning Parameterized Skills, Bruno Castro da Silva


Robust Mobile Data Transport: Modeling, Measurements, and Implementation, Yung-Chih Chen


Model-Based Guidance for Human-Intensive Processes, Stefan Christov


General Program Synthesis from Examples Using Genetic Programming with Parent Selection Based on Random Lexicographic Orderings of Test Cases, Thomas Helmuth


Network Characteristics and Dynamics: Reciprocity, Competition and Information Dissemination, Bo Jiang


Application of Techniques for MAP Estimation to Distributed Constraint Optimization Problem, Yoonheui Kim


Exploiting Social Media Sources for Search, Fusion and Evaluation, Chia-Jung Lee


A Platform for Scalable Low-Latency Analytics using MapReduce, Boduo Li


Energy-Efficient Content Delivery Networks, Vimal Mathew


Energy Optimizations for Smart Buildings and Smart Grids, Aditya K. Mishra


Learning with Joint Inference and Latent Linguistic Structure in Graphical Models, Jason Narad


Long Range Motion Estimation and Applications, Laura Sevilla-Lara


Content Placement as a Key to a Content-Dominated, Highly mobile Internet, Abhigyan Sharma


Variation in Human-Intensive Systems: a Conceptual Framework for Characterizing, Modeling, and Analyzing Families of Systems, Borislava I. Simidchieva


Informed Search for Learning Causal Structure, Brian J. Taylor


Safe Reinforcement Learning, Philip S. Thomas


Epistemological Databases for Probabilistic Knowledge Base Construction, Michael Louis Wick


An Opportunistic Service Oriented Approach for Robot Search, Dan Xie


Forensic and Management Challenges in Wireless and Mobile Network Environment, sookhyun yang


Universal Schema for Knowledge Representation from Text and Structured Data, Limin Yao

Dissertations from 2014


Integrating Non-Topical Aspects Into Information Retrieval, Elif Aktolga


Subtyping with Generics: A Unified Approach, John G. Altidor


Model-Driven Analytics of Energy Meter Data in Smart Homes, Sean K. Barker


Streaming Algorithms Via Reductions, Michael S. Crouch




Entity-based Enrichment for Information Extraction and Retrieval, Jeffrey Dalton


A Proportionality-based Approach to Search Result Diversification, Van Bac Dang


Improving Text Recognition in Images of Natural Scenes, Jacqueline Feild


Making Networks Robust to Component Failures, Daniel Gyllstrom


Indexing Proximity-based Dependencies for Information Retrieval, Samuel Huston




Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos, Andrew Kae




A Probabilistic Model of Hierarchical Music Analysis, Phillip Benjamin Kirlin


Reliable and Efficient Multithreading, Tongping Liu


Privacy-preserving Sanitization in Data Sharing, Wentian Lu


Causal Discovery for Relational Domains: Representation, Reasoning, and Learning, Marc Maier


Unsupervised Joint Alignment, Clustering and Feature Learning, Mohamed Marwan Mattar


Probabilistic Models for Motion Segmentation in Image Sequences, Manjunath Narayana


Using Formal Methods to Verify Transactional Abstract Concurrency Control, Trek S. Palmer


Designing Efficient and Accurate Behavior-Aware Mobile Systems, Abhinav Parate


Retrieval Models based on Linguistic Features of Verbose Queries, Jae Hyun Park


Efficient Routing and Scheduling in Wireless Networks, Anand Seetharam


Scaling MCMC Inference and Belief Propagation to Large, Dense Graphical Models, Sameer Singh






Efficient Representation and Matching of Texts and Images in Scanned Book Collections, Ismet Zeki Yalniz

Dissertations from 2013


Query-Time Optimization Techniques for Structured Queries in Information Retrieval, Marc-Allen Cartright


The Security and Privacy Implications of Energy-Proportional Computing, Shane S. Clark


The Impact of Integrated Coaching and Collaboration Within an Inquiry Learning Environment, Toby Dragon


Exploring Privacy and Personalization in Information Retrieval Applications, Henry A. Feild


Evolving Expert Knowledge Bases: Applications of Crowdsourcing and Serious Gaming to Advance Knowledge Development for Intelligent Tutoring Systems, Mark Floryan


Accurate and Robust Mechanical Modeling of Proteins, Naomi Fox


Exploiting Domain Structure in Multiagent Decision-Theoretic Planning and Reasoning, Akshat Kumar


Multiscale Modeling of Human Addiction: a Computational Hypothesis for Allostasis and Healing, Yariv Z. Levy


Optimizing Linear Queries Under Differential Privacy, Chao Li


Privacy-Aware Collaboration Among Untrusted Resource Constrained Devices, Andres David Molina-Markham


Semantically Grounded Learning from Unstructured Demonstrations, Scott D. Niekum


Characterization and Network Consequences Of Low Spreading Loss in Underwater Acoustic Networks, James W Partan


Transiently Powered Computers, Benjamin Ransford


Software Techniques to Reduce the Energy Consumption of Low-Power Devices at the Limits of Digital Abstractions, Mastooreh Salajegheh



In this paper, we propose a novel energy-efficient approach for mobile activity recognition system (ARS) to detect human activities. The proposed energy-efficient ARS, using low sampling rates, can achieve high recognition accuracy and low energy consumption. A novel classifier that integrates hierarchical support vector machine and context-based classification (HSVMCC) is presented to achieve a high accuracy of activity recognition when the sampling rate is less than the activity frequency, i.e., the Nyquist sampling theorem is not satisfied. We tested the proposed energy-efficient approach with the data collected from 20 volunteers (14 males and six females) and the average recognition accuracy of around 96.0% was achieved. Results show that using a low sampling rate of 1Hz can save 17.3% and 59.6% of energy compared with the sampling rates of 5 Hz and 50 Hz. The proposed low sampling rate approach can greatly reduce the power consumption while maintaining high activity recognition accuracy. The composition of power consumption in online ARS is also investigated in this paper.

Keywords: activity recognition, low power consumption, low sampling rate, energy-efficient classifier

1. Introduction

Human activity recognition plays a crucial role in pervasive computing. Many applications for healthcare, sports, security agencies and context-aware services applications have emerged [1,2]. For example, life logs collected by smart mobile phone sensors (such as accelerometers) have been used to provide personalized health care [3]. Vermeulen et al. [4] developed a smartphone-based falls detection application to help elderly people. Zhou et al. [5] implemented a phone system for indoor pedestrian localization. Google Now is one of the emerging smart applications that provide context-aware services. It calculates and pushes relevant information automatically to mobile users based on their current locations [6].

The history of human activity recognition can be traced back to the late 1990s [7]. Four sensors (accelerometers) were placed on different positions of body to detect human activities (lying, sitting, sitting/talking, sitting/operating, standing, walking, upstairs, downstairs, and cycling). Randle and Muller [8] used a single wired biaxial accelerometer to classify six activities (sitting, standing, walking, running, upstairs, and downstairs) in 2000. However, the early systems are not easy to use.

Thanks to the development of microelectronics and computer systems, the sensors and mobile devices are now with higher computational capability, smaller size and more acceptable usability. The studies on activity recognition systems (ARS), especially smartphone based activity recognition system (ARS), have been set off a booming in recent years [9,10,11,12,13]. The accelerometers and gyroscopes embedded in smart phones have been used to collect raw activity data in ARS. Smart phones have become one of the most indispensable parts of life when comparing with other special devices [13,14]. It is now a relative low-cost device for both developers and users.

The smart phone based ARS can be divided into two types. One is online activity recognition systems, i.e., data collection, data processing and classification are carried out locally on the mobile phones [11,15]. The other is offline activity recognition systems, i.e., the classification is carried out non-real-time, or offline. Similar to other research [16], we consider an ARS in which the classification is carried out in a remote server or cloud as an offline ARS because the classification becomes non-real-time when the phone has no internet connection.

An online ARS can recognize the user’s behavior and provide the feedback in real time to support user’s daily life [17]. A number of studies on online ARS have been carried out. Anjum et al. [18] developed an application for recognizing a number of activities, including driving and cycling, with an average accuracy of greater than 95%. Kose et al. [15] investigated the performance of different classifiers and used accelerometer of the smartphone to classify four activities (sitting, standing, walking, and running). Schindhelm et al. [19] explored the capability of using smartphone (HTC hero) sensors for the detection of steps and movement/activity types. Martín et al. [20] presented the work of using smartphone for activity recognition without interfering user’s life.

Although the previous research work has achieved good accuracy in activity detection, there are few reports on power consumption. The power consumption is one of the main challenges [11], especially for the online ARS. The mobile phones are usually used for making phone calls or Internet surfing, so the power consumption of the online ARS must be reduced as low as possible. In order to solve this issue, one straightforward method is to reduce the number of sensors—for example, turning off the Global Positioning System (GPS) while the user is indoors or applying some energy-efficient sensors (such as accelerometers, gyroscopes) [10,21,22] instead. The other approach is to lower the sampling rate. However, in most studies [13], the sampling rates were still high because they followed the Nyquist theorem, i.e., the sampling rate must be equal to or higher than twice the signal frequency so that no actual information will be lost during the sampling process.

In this paper, we propose an energy-efficient ARS that uses low sampling rate and can still achieve high accuracy. A theoretically proof of the rationale of using low sampling rate in ARS is presented. A novel classifier is also proposed and developed to improve the performance of activity recognition. The proposed system consists of three components: (a) sensors using the proposed low sampling rate for data collection; (b) feature extraction for training and classification; (c) the proposed classifier which integrates hierarchical support vector machine (H-SVM) with context-based classification (HSVMCC) to detect user’s activities.

The rest of the paper is organized as follows. In Section 2, we briefly describe the related work. Section 3 details the proposed energy-efficient system, including data collection using low sampling rate, feature extraction method, the proposed HSVMCC algorithm, and the composition of power consumption in online ARS. The discussion of experiments and results is presented in Section 4. The paper is concluded by the summary of merits, limitations and future work in Section 5.

2. Related Work

The common approach of saving energy consumption in online ARS is to detect the mobile status and user’s location or activities and then turn on/off some unused sensors [22,23,24]. Wang et al. [24] presented a novel design framework for an Energy Efficient Mobile Sensing System (EEMSS). Using a hierarchical sensor management strategy to recognize user states and detect state transitions, the EEMSS significantly improved battery life of the device.

Adopting one tri-axial accelerometer, Lius et al. [13] detected activities (including walking, jumping, immobile, running, up, down, cycling and driving) using the sampling rate varying from 32 Hz to 50 Hz. An average accuracy of 98% was achieved. Discrete variables were used to reduce the calculation costs and save the energy. However, the sampling rate is still too high to maximize energy savings.

Reddy et al. [25] employed the GPS and accelerometer to detect activities (including stationary, walking, running, biking or motorized transport). Although the sampling rate was set to 1 Hz only, the GPS is not an energy-efficient sensor and cannot be used indoors.

Similarly, aiming to detect human mobility states, Oshin et al. [10] used an accelerometer at a sampling rate of 4 Hz. The results showed an overall average accuracy of 92%. However, it failed to detect some regular indoor activities, such as climbing stairs, in contrast to other studies [12,18,26]. The rationale of using the sampling rate of 4 Hz was not presented in the paper.

Applying a tri-accelerometer at the sampling rate of 2 Hz, Liang et al. [27] managed to obtain the average accuracy of 89% detecting human activities (standing, sitting, lying, walking, running, jumping, ascending, descending, cycling and driving). However, no justification was provided for the choice of sampling rate. It also lacked the accurate evaluation of power performance.

Activity recognition plays an important role in the area of pervasive healthcare. Liang et al. [28] proposed a hierarchical method to recognize user activities based on a single tri-axial accelerometer in smart phones for health monitoring.

Li et al. [29] proposed to leverage machine learning technologies for improving the energy efficiency of multiple high-energy-consuming context sensors by trading off the sensing accuracy.

For the purpose of utilizing available energy efficiently while achieving a desired activity recognition accuracy, Zappi et al. [30] investigated the benefits of dynamic sensor selection. It introduced and characterized an activity recognition method with the help of an underlying run-time sensor selection scheme.

Mortazavi et al. [31] presented a multiple model approach to classifying movements in exergame environment with fine-grain motions. Expert knowledge was applied to identify similar movements. Each submodel was modeled using a one to many support vector machine (SVM) with nonlinear kernel. Although the multiple model approach achieved a good classification performance, the study didn’t consider the power consumption either in algorithms or in data sampling, where a sampling rate of 50 Hz was used.

In our previous work [32], we have experimentally tested and confirmed that the sampling rate of 1 Hz could achieve high performance for detecting activities (including sitting, standing, walking, and running) in an offline ARS.

Considering the above-related work, we have carried out a theoretical analysis on why using a low sampling rate in ARS can also achieve high performance. Experiments based on the smartphone have also been undertaken to evaluate the power consumption of the online ARS with different sampling rates. Furthermore, the recognition of climbing stairs activities (upstairs and downstairs) is also included in the proposed ARS.

3. Energy-Efficient Activity Recognition System

The aim of this research is to build a user-independent and energy-efficient online ARS with high accuracy. The proposed system, as shown in Figure 1, includes data collection, feature extraction, and the training and classification (H-SVM and context-based classification). The system does not contain data processing before the feature extraction because the data obtained have been preprocessed by the phone’s built-in filters.

3.1. Data Collection at a Low Sampling Rate

The types of sensors and the sampling rates are two factors that must be considered during data collection in ARS. The barometer, accelerometer and gyroscope are the sensors usually used in ARS.

Barometer is the sensor for measuring altitude or height. Using low sampling rate of barometer can detect whether the user is climbing stairs or not.

Inertial measurement units (IMUs), such as accelerometer and gyroscope, are used to measure the user motion. In previous studies [1], the sampling rate of these sensors (such as accelerometers) in ARS was set between 10 Hz and 100 Hz. It is a general view that a high sampling rate can avoid the information loss of signals. Some research also claimed that high sampling rate could achieve high accuracy of recognition [33]. However, the higher the sampling rate is, the more energy consumed. The trade-off between the sampling rate and the power consumption has become one important concern in most energy-efficient ARS.

In our research, we proposed a solution to solve the contradiction between the sampling rate and the power consumption, that is, using a low sampling rate of IMU in ARS to achieve a similar recognition accuracy, compared with using high sampling rates.

For human activity recognition, the purpose of sampling is not to restore the raw signals of activities, but to detect different activities according to the statistical properties of signals, such as means, variance, and maximum. It is considered that using high sampling rate can capture all the details of the person’s movements, and this would benefit the recognition of activity [34]. However, the signal information would not be lost if the sampling rate agreed with the Nyquist theorem, which means that the statistical properties of using low or high sampling rate are consistent. When the sampling rate is less than the frequency required by the Nyquist theorem, we suggested that adding more sampling periods can acquire the consistent statistical properties, which is demonstrated as follows:

Set the frequency of activity as Fa, the sampling rate as FS, and the sampling period as T.

For different sampling rates of FS1 and FS2, if they agree with the following conditions:

There exists Equation (2):

FS1 × T1FS2 × T2T1 ∈ {1, 2, …, n}, FS1 × T1 ∈ {1, 2, …, n}


where T1 and T2 are sampling periods.

The elements of dataset X1 = {x1x2, …, xn} obtained at the sampling rate of FS1 and sampling period of T1 are the same with dataset Y1 = {y1y, …, yn} obtained at the sampling rate of FS2 and sampling period of T2. Thus, the statistical properties of the dataset X1 and the dataset Y1 are the same, if the sampling period is long enough.

If sampling rate FS3 is less than the frequency required by the Nyquist theorem Fa, which is:

Human activity signal is a non-strict period, so Fa is not a determined value, but a fluctuating value. The relation between 2Fa and FS3 satisfied the Equation (1). The same as above, the elements of dataset D = {d1d2, …, dn} obtained at the sampling rate of 2Fa and the sampling period of T has the same statistical properties as the dataset obtained at the sampling rate of FS3 and the sampling period of T3.

Combined with the formulas above, time period T3 is calculated as Equation (4):

Therefore, when we use a low sampling rate that does not agree with the Nyquist theorem in ARS, we can add the sampling time to ensure the same statistical properties.

3.2. Hierarchical Support Vector Machine (H-SVM)

Support vector machine (SVM) is a supervised learning algorithm. The basic SVM model is the probability of a binary classification. In order to deal with multiple classes, Liu et al. [35] proposed an adaptive hierarchical multi-class SVM classification scheme at the training stage.

In this paper, the k-means clustering algorithm was used in training the H-SVM classifier.

The training algorithm is summarized in Algorithm 1.

Algorithm 1 Training algorithm
  • 1: Construct a feature set ({f1f2, …, fn, (fi1,fj1), (fi2,fj2), …, (f1fi, …, fm)}). The feature set is the combination of all features.

  • 2: Sorting features according to the power consumption of the sensor and the computational cost of feature extraction. The lower power consumption of the sensor ranks the higher. Features with lower computational cost have in higher priority when the sensor is the same.

  • 3: Selecting m top features with the higher priority from the feature set.

  • 4: For each of these m features, set the whole dataset as DataSets_In.

  • 5: Input DataSets_In. do

  • 6: {

  • 7:  Set parameter of clustering k = 2

  • 8:  Use k-means clustering algorithm to get two cluster A and B. The two clusters are the subsets of DataSets_In. Evaluate the performance of k-means clustering, select the optimal ones according the accuracy and the equilibrium of the two subset.

  • 9:  Input subsets A and B

  • 10:  Training a binary classifier SVM

  • 11:  SVM_n = SVM (n = 1, 2, 3……)

  • 12:  DataSets_In = subsets A or B respectively

  • 13: } while DataSets_In only contains data from a single class

  • 14: The resulting classifier (i.e., final classifier) is a one-node SVM classifier, as an N-class classification needs an N-1 node classifier.

3.3. Context-Based Classification

During the study, we found that there are two reasons that some errors may occur in the activity recognition: (1) features were similar between two activities; (2) some measurement errors. To correct these errors, we proposed a context-based classification approach. Context-based classification is a method that combining the previous variables’ information and the following variables’ information during the analysis of the current variable. It can effectively eliminate the individual errors.

Human activities are continuous processes. Therefore, the previous recognition results and the following recognition results can be used to check and correct the current recognition result. The process is archived by using a sliding window (the window length is 2k + 1) to correct the result at time t, as shown in Figure 2.

Figure 2

The context-based classification.

In Figure 2, the variable Rt is the result of the H-SVM model at time t, and RWt is the corrected result after context-based classification. RWt is defined as the mode (the most frequent result) among {RtkRtk+1, …, Rt+k}. We assume that the probability of recognition errors ψ is independent identically distributed, the accuracy at time t (Accuracyt) with values of k is showed as follows:


The algorithm is summarized in Algorithm 2:

Algorithm 2 Context-based classification Algorithm
  • 1: Initialize the data buffer {Result1Result2, …, Result2k+1}

  • 2: while R is input from H-SVM do

  • 3: {

  • 4:  If n < 2k + 2 then

  • 5:   ResultnR(n = 1, 2, …, 2k + 1)

  • 6:   RWTRt

  • 7:  else

  • 8:   {Mode1Mode2, …} = MajorityOf{Result1Result2, …, Result2k+1}

  • 9:     if Resultk+1 ∈ {{Mode1}, {Mode2}, …} then

  • 10:       RWtResultk+1Rt

  • 11:     else

  • 12:       RWtMode1

  • 13: }

  • 14: end while

Firstly, variables {Result1Result2, …, Result2k+1} are initilised as data buffer. Then, for each R received, it is placed to the data buffer {Result1Result2, …, Result2k+1}. For the first R (which equals to Rtk) input from H-SVM, the RWt is set as Rt. For the following R, RWt does not change. When the 2k + 1 of R are all input, the majority {Mode1Mode2, …} of the data buffer is calculated. If Resultk+1 (here, Resultk+1 equals to Rt) belongs to {Mode1Mode2, …}, and the RWt is set as Resultk+1; otherwise, the RWt is set as Mode1. Then, the data buffer is shifted right by one to store the next Rt. For the next RW, the steps of calculating RW are the same as RWt. The algorithm is stopped when R is no longer received.

Time delay is the main defect of the context-based classification algorithm. It is equal to the value of k. It increases with the growth of the sliding window size. Thus, it is important to choose a suitable sliding time window size (that is, the values of 2k + 1) in an online ARS.

4. Experiments and Discussion

The experiments include four parts: (1) data collection; (2) parameter selections; (3) classification performance; and (4) the power consumption of the proposed energy-efficient online ARS.

4.1. Data Collection

For data collection, a smartphone (Nexus 5, Google Inc., Mountain View, CA, United States of America) was placed in the right-front pocket of the pants, showed in Figure 3. The sensors used in the experiments were barometer and accelerometer inside the smartphone. As shown in Table 1, four independent data collections were carried out separately in the study.

Figure 3

Illustration of the placement of the phone on the participant’s body.

Table 1

Data collection of four experiments.

Data collection 1 (Training datasets): the sampling rate was set to 1 Hz, the time window was 5 s, without overlap. One volunteer (male, 23 years old, healthy student) was asked to perform six types of activities: sitting, standing, walking, running, climbing upstairs and going downstairs. The volunteer sat and stood indoors, walked in the corridor or in the room, ran on a treadmill at 9 km/h, climbed upstairs and went downstairs in our lab building, which is six-floors. Each activity was carried out 15 times (15 samples collected). In total, 90 samples were collected as the training data-sets.

Data collection 2: The sampling rate was set to 1 Hz, the time window was 5 s, without overlap. Twenty volunteers (14 males and six females, ages between 22 to 25) participated in the data collection. The volunteers were asked to undertake the six activities as described in the Data collection 1. Each activity (sitting, standing, walking and running) was consecutively carried out for 5 min. The volunteers were asked to climb upstairs from the second floor to the sixth floor and go downstairs from the sixth floor to the second floor, five times repeatedly. All these data collected in Data collection 2 are only used for testing, as shown in Table 2. For each activity, we removed the data of the first time window (5 s) and the last time window (5 s) to ensure that the data obtained only contained one type of activity.

Table 2

Testing datasets collected from 20 participants (Data collection 2).

To compare the accuracy of activity recognition at different sampling rates, five volunteers (from the above 20 volunteers in the Data collection 2) participated again in the following two new data collections.

Data collection 3: The purpose of this data collection is to verify that using a low sampling rate (1 Hz) can also achieve high accuracy of recognition, compared with the sampling rate agreed with the Nyquist theorem. Thus, the sampling rate was set as 5 Hz, which agreed with the Nyquist theorem and was close to twice the frequency of human activity obtained by phone sensor. The time window was 1 s.

Data collection 4: The aim of this data collection is to verify that different sampling rates that agreed with the Nyquist theorem achieve almost the same accuracy. Activity data were collected at the sampling rates of 10 Hz and then 50 Hz. The time window of different sampling rates was 1 s.

4.2. The K-Means Clustering and H-SVM

In general, the feature extraction aims to identify the main characteristics that accurately represented the original data [36]. The process is to find the most useful, valid and meaningful information to recognize activities with high accuracy. In previous studies [1,10,13], the common features include time domains and frequency domains, such as means, standard deviation, magnitude of acceleration and FFT (Fast Fourier Transform). There are no fixed features that are suitable for all ARS.

In this paper, we firstly constructed a feature set. The feature set is the combination of all features, that is Pd (pressure difference), Pdabs (absolute value of pressure difference), XmeansYmeansZmeans (the means of X-/Y-/Z-axis accelerometer values), and Twaves(the sum of root mean squares of the difference of adjacent points in a time window).

After constructing the feature set, algorithm 1 was applied to feature selections and classification. Based on the power consumption of the sensor and the computational cost of feature extraction, m features (PdPdabsYmeansTwaves) with the higher priority from the set of optimal features were selected.

(1) Pressure difference Pd: The difference of pressure value is measured by barometer built-in the mobile phone, as shown in Equation (6). The barometer value is considered as height changing. When the altitude increases, the pressure value decreases, and vice versa:

where pn is the last pressure value and p0 is the first pressure value in the time window (sampling period). The Pd value is negative when the user climbs upstairs, and it is positive when going downstairs.

(2) The absolute value of pressure difference (Pdabs): The Pdabs is calculated as follows:

(3) X-/Y-/Z-axis accelerometer value (XmeansYmeansZmeans): This is the means of the X-/Y-/Z-axis accelerometer values. The values of tri-axial accelerometer we got from the smartphone (Android API) contained the gravity values. The following is the calculation for XmeansYmeansZmeans:


(4) The wave of three-axis accelerometer (Twaves): this is the sum of the RMS (Root Mean Square) of the difference of adjacent points in a time window, and can be calculated using Equation (9):


where AccXiAccYiAccZi are the three-axis values of accelerometer at time stamp i, respectively.

The training carried out on the whole dataset (Data collection 1) using algorithm 1. As shown in Figure 4, for the feature Pd (Figure 4a,b), the whole training dataset was divided into subset A (downstairs) and B (upstairs, sitting, standing, walking, running). For the feature Pdabs (Figure 4c,d), the whole training dataset was divided into two subsets A (upstairs and downstairs) and subset B (sitting, standing, walking, running) using the k-means clustering algorithm. For the feature Ymeans (Figure 4e,f), the whole training dataset was divided into subset A (sitting) and B (downstairs, upstairs, standing, walking, running). For the feature Twaves (Figure 4g,h), the whole training dataset was divided into subset A (running) and B (downstairs, upstairs, sitting, standing, walking).

Figure 4

(a) Pd value; (b) results of k-means clustering of Pd feature; (c) Pdabs value; (d) results of k-means clustering of Pdabs feature; (e) Ymeans value; (f) results of k-means clustering of Ymeans feature; (g) Twaves value; (h) results of k-means clustering...

As shown in Figure 4, the accuracy and partition degree of k-means clustering of different features were assessed. The equilibrium of two subsets was also considered. Figure 4b,d,f shows the results that k-means clustering can get good performance using the selected features, but only the feature Pdabs can meet the equilibrium requirement. Thus, the selected feature Pdabs is the optimal feature for training the first SVM classifier (SVM1). In the Algorithm 1, the data were randomly divided into 80% for training and 20% for testing in order to select the optimal features and build the SVM classification models.

According to Algorithm 2, for subset (upstairs or downstairs), feature Pd is the optimal feature for training SVM classifier (SVM2) to partition the upstairs or downstairs because it is the most accuracy ones. For subset (sitting, standing, walking, running), feature Ymeans is suitable for classification (SVM3), dividing dataset (sitting, standing, walking, running) into subset (sitting) and subset (standing, walking, running) with the highest accuracy. In addition, Ymeans is also the optimal feature for dividing dataset (standing, walking, running) into subset (standing, walking) and running (SVM4). Finally, the Twaves is used for partition standing and walking (SVM5). The whole H-SVM classification model is shown as follows.

The whole training dataset (Data collection 1) contains 90 samples. After training, the five-node SVM classifier was built. As illustrated in Figure 5, the dataset is divided into two sets whether the activity is climbing stairs or not. If the classification result of classifier SVM1 is climbing stairs, classifier SVM2 is used to judge if the activity is climbing upstairs or going downstairs. If the result of SVM1 is not stair climbing, classifier SVM3 is applied to classify sitting or standing, walking, running and then classifier SVM4 will contribute to recognize standing, walking or running. Finally, classifier SVM5 is used to differentiate standing or walking. Considering the k-means clustering results discussed before, Pdabs can be used as the input feature for SVM1 to detect climbing stairs or not. Furthermore, Pd, Twaves can be used as input feature for SVM2 and SVM5, respectively, and Ymeans can be input as feature for SVM3 and SVM4. These may reflect the body movement efforts and acceleration patterns when carrying out different types of activities, that is: (1) the pressure used in climbing stairs activities is different from activities on flat ground; (2) changing of acceleration on the Y-axis can be used to differentiate the sitting status from standing/walking/running and further differentiate running from walking/standing; (3) information of changes in all three directions needs to be taken into account in order to classify standing from walking.

Figure 5

The whole H-SVM classification model.

4.3. The Parameter Settings of Proposed ARS.

The sampling rate and time window of accelerometer during data collection and sliding window size of context-based classification are three crucial parameters that may affect the power consumption and accuracy of proposed ARS.

The frequency of human activity is about 2 Hz. For example, the frequency of going downstairs with fast speed is less than 2 Hz, and the step time of fast walking is 0.35 s/step [37]. The time windows were usually 1 s in previous studies [38]. In our research, the sampling rate of accelerometer was 1 Hz. According to Equation (4), the time window was about 5 s.

In Section 3.3, we proposed context-based classification to improve the accuracy of recognition. For different values of the probability of recognition error ψ (0.3, 0.2, 0.1, and 0.05), the Accuracyt of different sliding window size (2k + 1) is shown in Figure 6:

Figure 6

The Accuracyt of different sliding window length k.

Figure 6 shows that the Accuracyt has improved with the increase of k-values. For example, the Accuracyt has improved 8% with the change of value k from 0 to 1 when ψ = 0.3. However, the time delay will also increase, which may be harmful to the online ARS. Especially when the recognition error ψ is becoming closer to 0, with the increase of value k, the improvement of Accuracyt is becoming smaller, but the time delay is becoming greater.

The classification performance of our proposed ARS shows that the largest recognition error ψ is less than 0.2 and the average recognition error is less than 0.1. As shown in Figure 6, no matter ψ = 0.2 or ψ = 0.1 or ψ = 0.05, the accuracy is improved quickly when k is increased from 0 to 1. However, the improvement of Accuracyt slows down when k ≥ 1, but the time delay became greater. Therefore, the slide window size is set as 3 (k = 1).

4.4. The Classification Performance of Proposed ARS

In this section, we assess the performance of proposed classifier and compare it with other classifiers. The classification accuracy of different sampling rates is also discussed.

4.4.1. Performance of Different Classifiers

In our research, we used the H-SVM model and context-based classification. In order to analyze the performance of H-SVM, we used the training dataset (Data collection 1), testing dataset (Data collection 2) and features obtained from the mobile phone. The training and classification were carried out in Matlab 2014a (MathWorks Inc., Natick, MA, USA), using Libsvm library [39]. The training used the linear kernel, cost and without cross-validation. The features of H-SVM and the parameters of SVM used in Matlab were the same as those on the phone. The classification results are shown in Table 3.

Table 3

The performance of the H-SVM classification.

As shown in Table 3, the average accuracy of six activities is 90.9% and the weakest performance (the accuracy is only 83.8%) occurred when recognizing climbing upstairs. Upon close examination, we found that it was caused by the noise of the signals, which led to the misclassification in some discrete time windows. We randomly selected 200 continuous recognition results of climbing upstairs shown in Figure 7. In Figure 7, it can be seen that some activities of climbing upstairs were misclassified as other activities, such as standing, walking and running.

Figure 7

The classification results of climbing upstairs activity after H-SVM.

To reduce the impact of the noise and to improve the accuracy, we applied context-based classification after H-SVM. The results are shown in Table 4. The process of data collection, processing, training, and classification are all done by the phone. The accuracy values of six activities are increased by 1.8%, 3.1%, 5.5%, 4.7%, 8.3% and 6.6%, respectively, and the average accuracy of six activities is increased by 5.1%. The average accuracy of six activities of the proposed ARS is 96.0%, which is high enough for most applications.

Table 4

The performance of H-SVM and Context-based classification (HSVMCC).

We compared our method with other classification algorithms such as J48 Naive Bayes (NB) and Random Forest (RF). The machine learning tool weka [40] was used in the study and the results are shown in Figure 8. We used the same training datasets (Data collection 1) described before to obtain the model of other classification algorithms. The universal parameters were selected for these classification algorithms. For J48, the parameter C was set as 0.25 and M was set as 2. For Random Forest, the parameter I was set as 100, K was set as 0 and S was set as 1. Then, we used all testing datasets (Data collection 2) in Table 2 to test the classifiers.

Figure 8

Comparison of the proposed ARS vs. classification models, J48, NB and Random Forest.

Figure 8 shows that the accuracies of the proposed method (HSVMCC) are more than 90% for all six activities. However, the accuracies of other algorithms vary between different activities. For sitting, the Random Forest (RF) achieves a high accuracy of 98.9%, while J48 obtained the lowest accuracy of 29.5%, but, for the ‘going downstairs’ activity, the accuracy of J48 is 94.8%, while the accuracy of Random Forest only achieves 76.1%.

Figure 9 shows the average accuracy of six activities of HSVMCC in comparison to Naive Bayes (NB), J48, and Random Forest (RF). The average classification accuracy for HSVNCC, NB, J48 and RF are 96%, 82.6%, 73.9%, and 85.6%, respectively. It can be concluded that the proposed HSVMCC outperformed other classifiers in terms of the average accuracy.

Figure 9

The comparison of average accuracy of the proposed AR system vs. other classifications.

4.4.2. The Accuracy of Different Sampling Rates

As mentioned before, we can use the sampling rate, which is less than the frequency required by the Nyquist theorem. Figure 10 shows the recognition results of ARS using the sampling rate of 1 Hz (less than the frequency required by the Nyquist theorem), 5 Hz (agreed with the Nyquist theorem), 10 Hz and 50 Hz. It can be observed that the accuracy of using 1 Hz sampling rate and using 5 Hz sampling rate are comparable, or similar. This means that, if the sampling rate is less than the frequency required by the Nyquist theorem, we can add the sampling period to achieve the similar accuracy of using the higher sampling rate that agrees with the Nyquist theorem.

Figure 10

Recognition accuracy of activities at different sampling rates.

Figure 10 also shows the accuracy has only improved slightly with the increase of the sampling rate from 1 Hz to 50 Hz, i.e., (1 Hz: 96.2%, 5 Hz: 97.2%, 10 Hz: 97.6%, 50 Hz: 98.0%). The accuracy of 1 Hz (96.2%) is sufficiently high for practical applications.

4.5. The Power Consumption of the Energy-Efficient ARS

The research about the compositions of energy consumption in ARS can help us to assess whether the proposed energy-efficient strategies are effective or not. Furthermore, the analysis of the composition of energy consumption in ARS can provide guidance for the researchers in energy-efficient fields.

An online ARS consists of data collection, data processing and activity recognition. Thus, the main composition of energy consumption in ARS can be divided into three parts. The first part is the power consumption used by the sensors. In our research, this part does not contain the data collection. The second part is the power consumption used in data processing, including data collection, feature extraction and data storage. The last part is the power consumption used by the activity recognition algorithm.

In our previous work [32], we proposed that the low sampling rate can decrease the power consumption. Power consumption for ARS is caused by the sensor running [10] or the total power consumption [13]. In this paper, we carried out the experiments to analyze the composition of power consumption in ARS.

We use the other mobile phone (Nexus 5, with an Android 4.4.2 system) for experiments. Firstly, we restored the phone to factory data to avoid power consumption caused by other applications, and we installed the requiring applications in the phone. Then, we put the phone in a shaker to do the experiments.

The experiments can be divided into two categories.

Category 1: we carried out the experiments in the shaker with the setting of 5 mm amplitude and 5–10 Hz variant-frequency vibration and a total of 17 experiments were undertaken.

Category 2: we carried out the experiments in the shaker with the static state and a total of 17 experiments were undertaken.

The purpose of contrasting two states (shaker and static state) is to simulate the real situations. We used the shaker to simulate the status of moving such as walking. Similarly, the static state was used to simulate standing and sitting status.

For each category, we conducted four experiments (sampling rate of 1 Hz, 5 Hz, 10 Hz, 50 Hz respectively) with the setting of running the whole ARS, four experiments with the setting of only running sensors, four experiments with the setting of running the ARS without activity recognition and result processing, four experiments with the setting of running the ARS without result processing and one experiment when the phone was on standby. The details are listed in Table 5.

Table 5

Experiment setting of each case of study.

For each experiment, we fixed the mobile phone in the shaker (Figure 11a) and connected an external signal generator (3.8 V) to the mobile phone (Figure 11b), and then connected the signal generator with computer to collect data of current (the time is set as 20 min). We turned on the phone and started the application (five experiment settings shown in Table 5) under the experiment condition (shaker or static state). In the end, we clicked the button (“start to save data”) to collect the data of current.

Figure 11

(a) Power test environments; (b) the phone was fixed in the shaker; (c) the external power; (d) computer used to control the experiments.

Figure 12 illustrates the average current of ARS at different sampling rates. It shows that the average current increases with the increase of the sampling rate. The average current is 20.3 mA at 1 Hz, and it is 42.7 mA at 50 Hz when the phone is in the shaker state. The average current is 20.1 mA at 1 Hz, and it is 41.1 mA at 50 Hz when the phone is in the static state. It also infers that the power consumption of ARS at rate of 10 Hz has slightly increased compared with 5 Hz. There is a large increase of power consumption when the sampling rate changes from 10 Hz to 50 Hz. There are two main reasons. One reason is that the sensor running has a great increase when the rate changes from 10 Hz to 50 Hz (as shown in Figure 13). The other reason is the amount of data increase greatly when the sampling rate increases from 10 Hz to 50 Hz, which causes more power consumption in data processing.

Figure 12

The average currents of the ARS at different sampling rates.

Figure 13

The average current in the sensor running, data processing and activity recognition. (a) ARS with a sampling rate of 1 Hz; (b) ARS with a sampling rate of 5 Hz; (c) ARS with a sampling rate of 10 Hz; (d) ARS with a sampling rate of 50 Hz.

Figure 13 also shows the average current of different parts in ARS. The data processing consumes most of the power in the online ARS. The second large power consumption is the sensor running. The power consumption of the proposed recognition algorithm is very small and can even be negligible. With the decrease of the sampling rate, the energy is saved in the sensor running and data processing for the reason that the amount of data is smaller.

We carried out another experiment to evaluate the power performance with different sampling rates. We turned on the phone when the phone was fully charged, and started phone application at four different sampling rates (1 Hz, 5 Hz, 10 Hz, 50 Hz) or idle state, respectively. Then, put the phone statically on the table, unplugged the charging cable and turned off the screen. After 24 h, we turned on the screen, stopped the application and recorded the data. Experiments on each sampling rate and the idle state were repeated four times. We also installed an external application called Battery Monitor Weight [41

One thought on “Haibing Lu Dissertation Sample

Leave a comment

L'indirizzo email non verrà pubblicato. I campi obbligatori sono contrassegnati *