\(\newcommand{\vect}[1]{\boldsymbol{#1}}\) \(\newcommand{\transp}{^{\text{T}}}\) \(\newcommand{\mat}[1]{\boldsymbol{\mathcal{#1}}}\) \(\newcommand{\sign}{\text{sign}}\)
These code samples are mainly from Chapter 3, Section 7 of Deep Learning with R.
Aim: predict the median price of homes in a given Boston suburb in the mid-1970s, given a few data points about the suburb at the time, such as the crime rate, the local property tax rate, etc.
The dataset:

- 506 samples in total, split between 404 training samples and 102 test samples.
- Each “feature” in the input data (e.g. the crime rate) has a different scale.
library(keras)

# Load the Boston Housing dataset that ships with Keras
dataset <- dataset_boston_housing()

# Unpack the training and test splits with the multi-assignment
# operator %<-% (from the zeallot package, re-exported by keras)
c(c(train_data, train_targets), c(test_data, test_targets)) %<-% dataset
str(train_data)
num [1:404, 1:13] 1.2325 0.0218 4.8982 0.0396 3.6931 ...
str(test_data)
num [1:102, 1:13] 18.0846 0.1233 0.055 1.2735 0.0715 ...
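The features take values on very different scales. As a quick confirmation (a small check, not from the book), we can look at the minimum and maximum of each column:

# Min and max of each of the 13 features: the ranges differ wildly
apply(train_data, 2, range)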
The 13 features in the input data are as follows:

1. Per capita crime rate.
2. Proportion of residential land zoned for lots over 25,000 square feet.
3. Proportion of non-retail business acres per town.
4. Charles River dummy variable (= 1 if the tract bounds the river; 0 otherwise).
5. Nitric oxides concentration (parts per 10 million).
6. Average number of rooms per dwelling.
7. Proportion of owner-occupied units built prior to 1940.
8. Weighted distances to five Boston employment centres.
9. Index of accessibility to radial highways.
10. Full-value property-tax rate per \(\$10,000\).
11. Pupil-teacher ratio by town.
12. \(1000 (Bk - 0.63)^2\), where \(Bk\) is the proportion of Black residents by town.
13. Percentage of lower-status population.
The targets are the median values of owner-occupied homes, in thousands of dollars:
str(train_targets)
num [1:404(1d)] 15.2 42.3 50 21.1 17.7 18.5 11.3 15.6 15.6 14.4 ...
The prices are typically between \(\$ 10,000\) and \(\$ 50,000\). If that sounds cheap, remember this was the mid-1970s, and these prices are not inflation-adjusted.
# Feature-wise normalization: for each feature, subtract the mean and
# divide by the standard deviation. Note that the statistics are
# computed on the training data only and then reused for the test data.
mean <- apply(train_data, 2, mean)
std <- apply(train_data, 2, sd)
train_data <- scale(train_data, center = mean, scale = std)
test_data <- scale(test_data, center = mean, scale = std)
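As a sanity check (not part of the book's code), the scaled training features should now have mean approximately 0 and standard deviation approximately 1; the test features will only be close to these values, since they were scaled with the training statistics:

# Per-column mean and sd after scaling; the training set gives ~0 and ~1
round(colMeans(train_data), 3)
round(apply(train_data, 2, sd), 3)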
# Because we will need to instantiate the same model multiple times,
# we use a function to construct it.
build_model <- function() {
  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu",
                input_shape = dim(train_data)[[2]]) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 1)

  model %>% compile(
    optimizer = "rmsprop",
    loss = "mse",
    metrics = c("mae")
  )
}
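To double-check the architecture, we can instantiate the model once and print its summary (a quick inspection, not in the original code):

# Two hidden layers of 64 units each, plus a single linear output unit
model <- build_model()
summary(model)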
The model ends with a single unit and no activation function, as is typical for scalar regression, and it is compiled to monitor mean absolute error (MAE) during training. For instance, an MAE of 0.5 on this problem would mean that our predictions are off by \(\$500\) on average. Because the dataset is so small, validating on a single held-out split would give noisy estimates, so we use k-fold cross-validation instead: we train k = 4 models, each on k - 1 of the folds, and validate each on the remaining fold.
k <- 4

# Shuffle the sample indices, then assign each sample to one of k folds
indices <- sample(1:nrow(train_data))
folds <- cut(indices, breaks = k, labels = FALSE)
num_epochs <- 100
all_scores <- c()
for (i in 1:k) {
  cat("processing fold #", i, "\n")

  # Prepare the validation data: data from partition #i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]

  # Prepare the training data: data from all other partitions
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]

  # Build the Keras model (already compiled)
  model <- build_model()

  # Train the model (in silent mode, verbose = 0)
  model %>% fit(partial_train_data, partial_train_targets,
                epochs = num_epochs, batch_size = 1, verbose = 0)

  # Evaluate the model on the validation data
  results <- model %>% evaluate(val_data, val_targets, verbose = 0)
  all_scores <- c(all_scores, results["mae"])
}
all_scores
mae mae mae mae
2.252327 2.576473 2.563274 2.586149
mean(all_scores)
[1] 2.494556
The four runs do indeed show noticeably different validation scores, from about 2.25 to 2.59. Their average (about 2.49) is a much more reliable metric than any single score; that is the entire point of k-fold cross-validation. We are off by roughly \(\$2,500\) on average, which is still significant considering that the prices range from \(\$10,000\) to \(\$50,000\).
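The fold-to-fold spread can also be quantified directly, as a small check:

# Spread of the four fold scores
range(all_scores)
sd(all_scores)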
Let's train the network a bit longer, for 500 epochs, and keep a log of how well it does after each epoch by saving the per-epoch validation MAE for every fold.
# Some memory clean-up
k_clear_session()

num_epochs <- 500
all_mae_histories <- NULL

for (i in 1:k) {
  cat("processing fold #", i, "\n")

  # Prepare the validation data: data from partition #i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]

  # Prepare the training data: data from all other partitions
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]

  # Build the Keras model (already compiled)
  model <- build_model()

  # Train the model (in silent mode, verbose = 0), keeping the
  # per-epoch validation history
  history <- model %>% fit(
    partial_train_data, partial_train_targets,
    validation_data = list(val_data, val_targets),
    epochs = num_epochs, batch_size = 1, verbose = 0
  )
  mae_history <- history$metrics$val_mae
  all_mae_histories <- rbind(all_mae_histories, mae_history)
}
We can then compute the average of the per-epoch MAE scores for all folds:
average_mae_history <- data.frame(
  epoch = seq_len(ncol(all_mae_histories)),
  validation_mae = apply(all_mae_histories, 2, mean)
)
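Before plotting, we can also read off the epoch at which the averaged validation MAE is lowest:

# Index (epoch) of the minimum averaged validation MAE
which.min(average_mae_history$validation_mae)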
Let’s plot this:
library(ggplot2)
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_line()
It may be a bit hard to see the plot due to scaling issues and relatively high variance. Let’s use geom_smooth()
to try to get a clearer picture:
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_smooth()
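If you prefer not to rely on geom_smooth()'s default LOESS fit, a hand-rolled alternative (a sketch, not from the book; the 10-epoch window is an arbitrary choice) is a trailing moving average via stats::filter():

# Trailing moving average of the validation MAE over 10 epochs;
# the first 9 values are NA because the window is incomplete there
smooth_mae <- stats::filter(average_mae_history$validation_mae,
                            rep(1 / 10, 10), sides = 1)
plot(as.numeric(smooth_mae), type = "l",
     xlab = "epoch", ylab = "smoothed validation MAE")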
According to this plot, validation MAE stops improving significantly after roughly 80 epochs; past that point we start overfitting. Once we are done tuning, we can train a final model on the entirety of the training data with these settings, and then look at its performance on the test data:

# Get a fresh, compiled model.
model <- build_model()

# Train it on the entirety of the data.
model %>% fit(train_data, train_targets,
              epochs = 80, batch_size = 16, verbose = 0)

result <- model %>% evaluate(test_data, test_targets)
result
loss mae
19.405186 2.737695
We are still off by about \(\$2,740\) on the test set.
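Finally, the trained model can produce price predictions (in thousands of dollars) for new samples, for instance on the test set:

# Predicted median home prices for the first five test suburbs
predictions <- model %>% predict(test_data)
predictions[1:5, 1]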