The data consist of readings from sensors attached to subjects performing a variety of exercises. The task is to build a model that correctly classifies these activities from unlabelled sets of readings.
Load the caret package and set the random number generator's seed to ensure reproducibility:
library(caret)
set.seed(1234)
Read the raw training data, treating empty strings as missing values:
rawData <- read.csv("pml-training.csv", na.strings=c("NA",""), strip.white=TRUE)
dim(rawData)
## [1] 19622 160
Count the NAs in each column, then keep only the complete columns, also dropping the bookkeeping columns (row index, user name, timestamps and window markers), which are not sensor readings:
na <- apply(rawData, 2, function(x) { sum(is.na(x)) })
cleandata <- subset(rawData[, which(na == 0)],
                    select=-c(X, user_name, new_window, num_window,
                              raw_timestamp_part_1, raw_timestamp_part_2, cvtd_timestamp))
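This filter is safe because the NA counts are bimodal: each column is either fully observed or almost entirely missing. A quick check (not part of the original run):
table(na)   # tally the columns by their NA count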
dim(cleandata)
## [1] 19622 53
Partition the data into training (70%) and test (30%) sets:
inTrain <- createDataPartition(cleandata$classe, p=0.7, list=FALSE)
training <- cleandata[inTrain,]
testing <- cleandata[-inTrain,]
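createDataPartition samples within each level of classe, so the class distribution is preserved in both partitions; a quick check (not part of the original run):
prop.table(table(training$classe))   # class proportions in the training set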
Now train the model on the training set, using 4-fold cross-validation and allowing parallel processing:
ctrl <- trainControl(allowParallel=TRUE, method="cv", number=4)
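Note that allowParallel=TRUE only takes effect if a parallel backend has been registered for foreach; the original does not show one, but a minimal sketch using doParallel (the worker count is illustrative) would be:
library(doParallel)
cl <- makePSOCKcluster(4)   # start four local worker processes
registerDoParallel(cl)      # register them as the foreach backend
# after train() finishes, release the workers with stopCluster(cl)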
Load the 20 unlabelled test cases and apply the same column filter that was used on the training data:
rawdata2 <- read.csv("pml-testing.csv", na.strings=c("NA",""), strip.white=TRUE)
testdata <- subset(rawdata2[, which(na == 0)],
                   select=-c(X, user_name, new_window, num_window,
                             raw_timestamp_part_1, raw_timestamp_part_2, cvtd_timestamp))
Reduce the feature set to ten predictors plus the outcome classe:
validdata <- subset(cleandata,
                    select=c(roll_belt, pitch_forearm, yaw_belt, magnet_dumbbell_y,
                             pitch_belt, magnet_dumbbell_z, roll_forearm, accel_dumbbell_y,
                             roll_dumbbell, magnet_dumbbell_x, classe))
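A predictor list like this is typically obtained by ranking variable importance from a preliminary fit on all 52 predictors; a sketch of that step (the preliminary model below is an assumption, not part of the original analysis):
prelimModel <- train(classe ~ ., data=training, method="rf", trControl=ctrl)   # slow: uses all predictors
imp <- varImp(prelimModel)$importance
head(rownames(imp)[order(imp$Overall, decreasing=TRUE)], 10)   # ten most important predictors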
model <- train(classe ~ ., data=validdata[inTrain,], method="rf", trControl=ctrl)
Predict the classes of the 20 unlabelled test cases:
predict(model, newdata=testdata)
## Loading required package: randomForest
## randomForest 4.6-7
## Type rfNews() to see new features/changes/bug fixes.
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
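If each prediction must be submitted as its own file, they can be written out with base R; a minimal sketch (the file-name pattern is illustrative):
answers <- predict(model, newdata=testdata)
for (i in seq_along(answers)) {
    # one plain-text file per case, e.g. problem_id_1.txt
    write.table(as.character(answers[i]), file=paste0("problem_id_", i, ".txt"),
                quote=FALSE, row.names=FALSE, col.names=FALSE)
}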
Estimate accuracy on the held-out test set:
pred <- predict(model, newdata=testing)
sum(pred == testing$classe) / length(pred)
## [1] 0.9842
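The estimated out-of-sample error rate is one minus this accuracy, roughly 1.6%:
1 - sum(pred == testing$classe) / length(pred)   # estimated out-of-sample error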
The confusion matrix on the test set shows the few misclassifications:
confusionMatrix(pred, testing$classe)$table
##           Reference
## Prediction    A    B    C    D    E
##          A 1662   13    1    0    0
##          B    2 1100   10    4    7
##          C    9   11 1008    2    5
##          D    1   13    7  956    4
##          E    0    2    0    2 1066
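confusionMatrix also reports summary statistics, such as overall accuracy and Cohen's kappa:
confusionMatrix(pred, testing$classe)$overall[c("Accuracy", "Kappa")]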
I chose this approach because the random forest trains quickly on the reduced feature set and reaches 98.4% accuracy on the held-out test set.