
Raw TensorFlow Trials and Tribulations (4): Classifying a simplified MNIST with an MLP

Between business trips and a bout of plantar fasciitis, my study of raw TensorFlow had completely stalled. Anyway, let's press on. As usual, and for no particular reason, I'll cite this book as the textbook.

深層学習 (機械学習プロフェッショナルシリーズ)

This time we move to deep learning with two or more hidden layers. That said, nothing full-scale yet: for now I'll settle for trying an MLP (multi-layer perceptron).


Trying it in raw TensorFlow


This time I'll use the simplified MNIST dataset I've left lying around on GitHub (5,000 training rows and 1,000 test rows).


import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

d_train = pd.read_csv("short_prac_train.csv", sep=',')
d_test = pd.read_csv("short_prac_test.csv", sep=',')

# First column is the digit label; the remaining 784 columns are pixel values scaled to [0, 1]
train_X = d_train.iloc[:, 1:785] / 255
train_Y = d_train.iloc[:, [0]]
test_X = d_test.iloc[:, 1:785] / 255
test_Y = d_test.iloc[:, [0]]

x = tf.placeholder(tf.float32, [None, 784])

# Hidden layer 1: 784 -> 512, ReLU
W1 = tf.Variable(tf.random_normal([784, 512], mean=0.0, stddev=0.5))
b1 = tf.Variable(tf.random_normal([512], mean=0.0, stddev=0.1))
y1 = tf.matmul(x, W1) + b1
y1 = tf.nn.relu(y1)

# Hidden layer 2: 512 -> 256, ReLU
W2 = tf.Variable(tf.random_normal([512, 256], mean=0.0, stddev=0.5))
b2 = tf.Variable(tf.random_normal([256], mean=0.0, stddev=0.1))
y2 = tf.matmul(y1, W2) + b2
y2 = tf.nn.relu(y2)

# Output layer: 256 -> 10 logits
W3 = tf.Variable(tf.random_normal([256, 10], mean=0.0, stddev=0.5))
b3 = tf.Variable(tf.random_normal([10], mean=0.0, stddev=0.1))
y3 = tf.matmul(y2, W3) + b3

# Integer labels fed as [None, 1], expanded to one-hot vectors
y = tf.placeholder(tf.int64, [None, 1])
y_ = tf.one_hot(indices = y, depth = 10)
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.01
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           10000, 1 - 1e-6, staircase=True)


cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y3))
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum = 0.9, use_nesterov=True).minimize(cost, global_step = global_step)

# With minibatch
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

num_epochs = 10
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
  # Shuffle the training set each epoch, then sweep it in minibatches of 100
  s_idx = np.random.permutation(num_data)
  for idx in range(0, num_data, batch_size):
    batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(),:]
    batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
    sess.run(optimizer, feed_dict={x:batch_x, y:batch_y})

pred = sess.run(tf.nn.softmax(y3), feed_dict = {x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d), accuracy_score(test_Y, pred_d))
[[94  1  1  0  0  1  1  1  1  0]
 [ 0 96  0  0  0  1  0  1  2  0]
 [ 0  1 88  2  2  1  2  1  1  2]
 [ 1  0  4 78  0  3  1  1  4  8]
 [ 1  1  1  0 78  1  0  2  2 14]
 [ 5  1  0  7  0 78  2  1  5  1]
 [ 1  0  7  0  1  3 84  2  2  0]
 [ 0  1  2  0  3  2  0 87  0  5]
 [ 0  1  2  3  2  2  1  0 89  0]
 [ 0  0  1  1 11  1  1  0  1 84]] 0.856

An accuracy of 0.856 is... a bit low, I feel...
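
Incidentally, the learning-rate schedule above barely decays at all: with decay_steps=10000, decay_rate=(1 - 1e-6) and staircase=True, tf.train.exponential_decay computes lr = 0.01 * (1 - 1e-6) ** (global_step // 10000). A minimal check, using the values from the code above:

starter_learning_rate = 0.01
for step in [0, 500, 10000, 25000]:
    # staircase=True means integer division of global_step by decay_steps
    print(step, starter_learning_rate * (1 - 1e-6) ** (step // 10000))
# 10 epochs x 50 minibatches = 500 steps, so the rate stays at 0.01 here

So in this run the optimizer is effectively Nesterov momentum at a fixed learning rate of 0.01.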


Trying it with Keras for R


In fact, I've done exactly the same thing with Keras for R before.


> library(keras)
> train <- read.csv('short_prac_train.csv', header=T, sep=',')
> test <- read.csv('short_prac_test.csv', header=T, sep=',')
> train.x <- as.matrix(train[,-1]/255)
> test.x <- as.matrix(test[,-1]/255)
> train.y <- train[,1] %>% to_categorical(num_classes = 10)
> test.y <- test[,1] %>% to_categorical(num_classes = 10)

> model <- keras_model_sequential()
> model %>%
+ layer_dense(units = 128, input_shape=c(784)) %>%
+ layer_activation(activation = 'relu') %>%
+ layer_dense(units = 64) %>%
+ layer_activation(activation = 'relu') %>%
+ layer_dense(units = 10) %>%
+ layer_activation(activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_sgd(lr = 0.07, decay = 1e-6, momentum = 0.9, nesterov = TRUE)
+ )
> t <- proc.time()
> model %>% fit(train.x, train.y, epochs = 10, batch_size = 100)
Epoch 1/10
5000/5000 [==============================] - 0s - loss: 0.7911     
Epoch 2/10
5000/5000 [==============================] - 0s - loss: 0.2737     
Epoch 3/10
5000/5000 [==============================] - 0s - loss: 0.1813     
Epoch 4/10
5000/5000 [==============================] - 0s - loss: 0.1218     
Epoch 5/10
5000/5000 [==============================] - 0s - loss: 0.0807     
Epoch 6/10
5000/5000 [==============================] - 0s - loss: 0.0583     
Epoch 7/10
5000/5000 [==============================] - 0s - loss: 0.0358     
Epoch 8/10
5000/5000 [==============================] - 0s - loss: 0.0238     
Epoch 9/10
5000/5000 [==============================] - 0s - loss: 0.0162     
Epoch 10/10
5000/5000 [==============================] - 0s - loss: 0.0104     
> proc.time() - t
   user  system elapsed 
  5.216   0.538   3.332 
> pred_class <- model %>% predict(test.x, batch_size=100)
> pred_label <- t(max.col(pred_class))
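> # NB: max.col returns 1-based column indices, so pred_label k corresponds to digit k-1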
> table(test[,1], pred_label)
   pred_label
     1  2  3  4  5  6  7  8  9 10
  0 94  0  0  0  0  1  2  1  1  1
  1  0 97  2  0  0  0  0  0  1  0
  2  0  0 96  1  0  0  1  2  0  0
  3  0  1  2 91  0  3  0  0  2  1
  4  0  0  0  0 95  0  2  1  0  2
  5  0  1  0  1  0 95  2  0  1  0
  6  0  0  0  0  0  2 97  0  1  0
  7  0  0  0  0  1  0  0 96  0  3
  8  0  0  1  1  1  0  0  0 97  0
  9  0  0  0  0  2  1  1  1  0 95
> sum(diag(table(test[,1], pred_label)))/nrow(test)
[1] 0.953 # varies with the random seed

So the TensorFlow version should have room to improve further... but I can't tell what to fix. If anyone more knowledgeable could enlighten me, I'd be grateful m(_ _)m


Addendum 1


Based on the comments I received, I changed the model part as follows.

x = tf.placeholder(tf.float32, [None, 784])

# Before: W1 = tf.Variable(tf.random_normal([784, 512], mean=0.0, stddev=0.5))
W1 = tf.Variable(tf.truncated_normal([784, 512], mean=0.0, stddev=tf.sqrt(2.0 / (784.0 + 512.0))))
# Before: b1 = tf.Variable(tf.random_normal([512], mean=0.0, stddev=0.1))
b1 = tf.Variable(tf.zeros([512]))
y1 = tf.matmul(x, W1) + b1
y1 = tf.nn.relu(y1)

# Before: W2 = tf.Variable(tf.random_normal([512, 256], mean=0.0, stddev=0.5))
W2 = tf.Variable(tf.truncated_normal([512, 256], mean=0.0, stddev=tf.sqrt(2.0 / (512.0 + 256.0))))
# Before: b2 = tf.Variable(tf.random_normal([256], mean=0.0, stddev=0.1))
b2 = tf.Variable(tf.zeros([256]))
y2 = tf.matmul(y1, W2) + b2
y2 = tf.nn.relu(y2)

# Before: W3 = tf.Variable(tf.random_normal([256, 10], mean=0.0, stddev=0.5))
W3 = tf.Variable(tf.truncated_normal([256, 10], mean=0.0, stddev=tf.sqrt(2.0 / (256.0 + 10.0))))
# Before: b3 = tf.Variable(tf.random_normal([10], mean=0.0, stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
y3 = tf.matmul(y2, W3) + b3
y3 = tf.matmul(y2, W3) + b3

y = tf.placeholder(tf.int64, [None, 1])
y_ = tf.one_hot(indices = y, depth = 10)
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.001
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           10000, 1 - 1e-6, staircase=True)


cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y3))
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum = 0.9, use_nesterov=True).minimize(cost, global_step = global_step)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

num_epochs = 50
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
  s_idx = np.random.permutation(num_data)
  for idx in range(0, num_data, batch_size):
    batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(),:]
    batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
    sess.run(optimizer, feed_dict={x:batch_x, y:batch_y})

pred = sess.run(tf.nn.softmax(y3), feed_dict = {x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d), accuracy_score(test_Y, pred_d))
[[93  0  0  2  0  1  3  0  1  0]
 [ 0 96  1  0  0  0  0  1  2  0]
 [ 0  1 89  3  1  0  2  2  2  0]
 [ 0  0  3 87  0  2  0  1  4  3]
 [ 0  2  0  0 89  0  3  0  0  6]
 [ 0  1  0  5  2 81  5  1  4  1]
 [ 0  0  3  0  3  2 91  0  1  0]
 [ 0  1  0  0  2  0  0 94  0  3]
 [ 0  1  1  1  1  3  0  0 93  0]
 [ 0  0  0  0  5  1  0  2  0 92]] 0.905

Accuracy is now up to 0.905.
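
For reference, the new stddev is the Glorot (Xavier) heuristic, sqrt(2 / (fan_in + fan_out)). A quick NumPy check, purely for illustration, of what that works out to for each layer:

import numpy as np

def glorot_stddev(fan_in, fan_out):
    # Glorot/Xavier initialization: Var(W) = 2 / (fan_in + fan_out)
    return np.sqrt(2.0 / (fan_in + fan_out))

for shape in [(784, 512), (512, 256), (256, 10)]:
    print(shape, glorot_stddev(*shape))
# Roughly 0.039, 0.051 and 0.087 -- an order of magnitude smaller than the
# original stddev=0.5, so the ReLU pre-activations start out well scaled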


Addendum 2


While I was at it, I also increased the number of epochs.

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

num_epochs = 500
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
  s_idx = np.random.permutation(num_data)
  for idx in range(0, num_data, batch_size):
    batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(),:]
    batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
    sess.run(optimizer, feed_dict={x:batch_x, y:batch_y})

pred = sess.run(tf.nn.softmax(y3), feed_dict = {x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d), accuracy_score(test_Y, pred_d))
[[93  0  1  1  0  1  2  0  1  1]
 [ 0 99  1  0  0  0  0  0  0  0]
 [ 0  1 94  2  1  0  0  2  0  0]
 [ 0  0  3 91  0  2  0  0  3  1]
 [ 0  0  1  0 93  0  2  1  0  3]
 [ 0  1  0  2  0 95  1  0  1  0]
 [ 1  0  0  0  0  2 97  0  0  0]
 [ 0  0  0  0  3  0  0 95  0  2]
 [ 0  0  1  1  0  0  1  0 97  0]
 [ 0  0  0  0  3  1  1  1  0 94]] 0.948

Accuracy climbed to 0.948.
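
With 500 epochs the loop runs silently for quite a while; an optional tweak (reusing the variable names above) is to fetch the cost in the same run call as the weight update and print it every 50 epochs to watch convergence:

for epoch in range(num_epochs):
  s_idx = np.random.permutation(num_data)
  for idx in range(0, num_data, batch_size):
    batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(),:]
    batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
    # Fetching cost alongside the optimizer adds no extra forward pass
    _, c = sess.run([optimizer, cost], feed_dict={x:batch_x, y:batch_y})
  if (epoch + 1) % 50 == 0:
    print("epoch %d: last minibatch cost %.4f" % (epoch + 1, c))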