Between business trips and a bout of plantar fasciitis, my study of raw TensorFlow had completely stalled. Time to get moving again. As usual, and for no particular reason beyond habit, I'll list this book as the textbook:
- Author: Takayuki Okatani
- Publisher: Kodansha
- Release date: 2015/04/08
- Format: paperback (softcover)
This time the topic is deep learning with two or more hidden layers. That said, I won't attempt anything serious yet; for now I'll just try a plain MLP (multi-layer perceptron).
Trying it in raw TensorFlow
This time I'll use the simplified MNIST data I have lying around on GitHub (5,000 training rows & 1,000 test rows).
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

# Load the simplified MNIST set (5,000 training rows, 1,000 test rows)
d_train = pd.read_csv("short_prac_train.csv", sep=',')
d_test = pd.read_csv("short_prac_test.csv", sep=',')
train_X = d_train.iloc[:, 1:785] / 255  # scale pixels to [0, 1]
train_Y = d_train[[0]]
test_X = d_test.iloc[:, 1:785] / 255
test_Y = d_test[[0]]

# Three-layer MLP: 784 -> 512 -> 256 -> 10
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.random_normal([784, 512], mean=0.0, stddev=0.5))
b1 = tf.Variable(tf.random_normal([512], mean=0.0, stddev=0.1))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.random_normal([512, 256], mean=0.0, stddev=0.5))
b2 = tf.Variable(tf.random_normal([256], mean=0.0, stddev=0.1))
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)
W3 = tf.Variable(tf.random_normal([256, 10], mean=0.0, stddev=0.5))
b3 = tf.Variable(tf.random_normal([10], mean=0.0, stddev=0.1))
y3 = tf.matmul(y2, W3) + b3

# Integer labels, one-hot encoded inside the graph
y = tf.placeholder(tf.int64, [None, 1])
y_ = tf.one_hot(indices=y, depth=10)

# Nesterov momentum SGD with a (very slowly) decaying learning rate
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.01
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           10000, 1 - 1e-6, staircase=True)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y3))
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9,
                                       use_nesterov=True).minimize(cost, global_step=global_step)

# Minibatch training
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
num_epochs = 10
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
    s_idx = np.random.permutation(num_data)
    for idx in range(0, num_data, batch_size):
        batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(), :]
        batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

# Evaluate on the test set
pred = sess.run(tf.nn.softmax(y3), feed_dict={x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d))
print(accuracy_score(test_Y, pred_d))
[[94  1  1  0  0  1  1  1  1  0]
 [ 0 96  0  0  0  1  0  1  2  0]
 [ 0  1 88  2  2  1  2  1  1  2]
 [ 1  0  4 78  0  3  1  1  4  8]
 [ 1  1  1  0 78  1  0  2  2 14]
 [ 5  1  0  7  0 78  2  1  5  1]
 [ 1  0  7  0  1  3 84  2  2  0]
 [ 0  1  2  0  3  2  0 87  0  5]
 [ 0  1  2  3  2  2  1  0 89  0]
 [ 0  0  1  1 11  1  1  0  1 84]]
0.856
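Looking at the off-diagonal cells, most of the damage comes from 4 and 9 being confused with each other (14 and 11 misclassifications). Per-class recall makes this easier to see; a minimal sketch, reusing test_Y and pred_d from the block above:

# Per-class recall = diagonal / row sums of the confusion matrix
# (reuses test_Y and pred_d from the evaluation block above)
cm = confusion_matrix(test_Y, pred_d)
recall = np.diag(cm) / cm.sum(axis=1).astype(float)
for digit, r in enumerate(recall):
    print('%d: %.3f' % (digit, r))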
ACC 0.856... that seems a bit low...
Trying the same thing in Keras for R
In fact, I've done exactly the same thing in Keras for R before.
> train <- read.csv('short_prac_train.csv', header=T, sep=',')
> test <- read.csv('short_prac_test.csv', header=T, sep=',')
> train.x <- as.matrix(train[,-1]/255)
> test.x <- as.matrix(test[,-1]/255)
> train.y <- train[,1] %>% to_categorical(num_classes = 10)
> test.y <- test[,1] %>% to_categorical(num_classes = 10)
> library(keras)
> model <- keras_model_sequential()
> model %>%
+     layer_dense(units = 128, input_shape=c(784)) %>%
+     layer_activation(activation = 'relu') %>%
+     layer_dense(units = 64) %>%
+     layer_activation(activation = 'relu') %>%
+     layer_dense(units = 10) %>%
+     layer_activation(activation = 'softmax') %>%
+     compile(
+         loss = 'categorical_crossentropy',
+         optimizer = optimizer_sgd(lr = 0.07, decay = 1e-6, momentum = 0.9, nesterov = TRUE)
+     )
> t <- proc.time()
> model %>% fit(train.x, train.y, epochs = 10, batch_size = 100)
Epoch 1/10
5000/5000 [==============================] - 0s - loss: 0.7911
Epoch 2/10
5000/5000 [==============================] - 0s - loss: 0.2737
Epoch 3/10
5000/5000 [==============================] - 0s - loss: 0.1813
Epoch 4/10
5000/5000 [==============================] - 0s - loss: 0.1218
Epoch 5/10
5000/5000 [==============================] - 0s - loss: 0.0807
Epoch 6/10
5000/5000 [==============================] - 0s - loss: 0.0583
Epoch 7/10
5000/5000 [==============================] - 0s - loss: 0.0358
Epoch 8/10
5000/5000 [==============================] - 0s - loss: 0.0238
Epoch 9/10
5000/5000 [==============================] - 0s - loss: 0.0162
Epoch 10/10
5000/5000 [==============================] - 0s - loss: 0.0104
> proc.time() - t
   user  system elapsed
  5.216   0.538   3.332
> pred_class <- model %>% predict(test.x, batch_size=100)
> pred_label <- t(max.col(pred_class))
> table(test[,1], pred_label)
   pred_label
      1  2  3  4  5  6  7  8  9 10
  0  94  0  0  0  0  1  2  1  1  1
  1   0 97  2  0  0  0  0  0  1  0
  2   0  0 96  1  0  0  1  2  0  0
  3   0  1  2 91  0  3  0  0  2  1
  4   0  0  0  0 95  0  2  1  0  2
  5   0  1  0  1  0 95  2  0  1  0
  6   0  0  0  0  0  2 97  0  1  0
  7   0  0  0  0  1  0  0 96  0  3
  8   0  0  1  1  1  0  0  0 97  0
  9   0  0  0  0  2  1  1  1  0 95
> sum(diag(table(test[,1], pred_label)))/nrow(test)
[1] 0.953  # varies depending on the seed
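For readers following along in Python, the same model translated to Python Keras would look roughly like this; a sketch I haven't actually run, mirroring the R code and reusing train_X / train_Y from the raw TensorFlow section:

# Rough Python-Keras equivalent of the R model above (a sketch, not the run I did)
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.utils import to_categorical

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.07, decay=1e-6, momentum=0.9, nesterov=True))
model.fit(train_X.values, to_categorical(train_Y.values, num_classes=10),
          epochs=10, batch_size=100)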
So, given that Keras reaches 0.953 on the same data, the raw TensorFlow version should have plenty of room for improvement... but where to start fixing? If anyone more knowledgeable has advice, I'd be grateful m(_ _)m
Addendum 1
Based on a comment I received, I changed the model part as follows: the weights now use Xavier/Glorot-style initialization (truncated normal with stddev = sqrt(2 / (fan_in + fan_out))), the biases start at zero, the initial learning rate drops to 0.001, and the number of epochs goes up to 50.
x = tf.placeholder(tf.float32, [None, 784])

# Weights switched to Xavier/Glorot-style init; biases start at zero
# W1 = tf.Variable(tf.random_normal([784, 512], mean=0.0, stddev=1.0))
W1 = tf.Variable(tf.truncated_normal([784, 512], mean=0.0, stddev=tf.sqrt(2.0 / (784.0 + 512.0))))
# b1 = tf.Variable(tf.random_normal([512], mean=0.0, stddev=0.5))
b1 = tf.Variable(tf.zeros([512]))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)
# W2 = tf.Variable(tf.random_normal([512, 256], mean=0.0, stddev=1.0))
W2 = tf.Variable(tf.truncated_normal([512, 256], mean=0.0, stddev=tf.sqrt(2.0 / (512.0 + 256.0))))
# b2 = tf.Variable(tf.random_normal([256], mean=0.0, stddev=0.5))
b2 = tf.Variable(tf.zeros([256]))
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)
# W3 = tf.Variable(tf.random_normal([256, 10], mean=0.0, stddev=1.0))
W3 = tf.Variable(tf.truncated_normal([256, 10], mean=0.0, stddev=tf.sqrt(2.0 / (256.0 + 10.0))))
# b3 = tf.Variable(tf.random_normal([10], mean=0.0, stddev=0.5))
b3 = tf.Variable(tf.zeros([10]))
y3 = tf.matmul(y2, W3) + b3

y = tf.placeholder(tf.int64, [None, 1])
y_ = tf.one_hot(indices=y, depth=10)

# Initial learning rate lowered from 0.01 to 0.001
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.001
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           10000, 1 - 1e-6, staircase=True)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y3))
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9,
                                       use_nesterov=True).minimize(cost, global_step=global_step)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
num_epochs = 50
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
    s_idx = np.random.permutation(num_data)
    for idx in range(0, num_data, batch_size):
        batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(), :]
        batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

pred = sess.run(tf.nn.softmax(y3), feed_dict={x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d))
print(accuracy_score(test_Y, pred_d))
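As an aside, the repeated init pattern above can be factored into a small helper so each layer is one line; a minimal sketch (the glorot_dense name is my own, not part of TensorFlow):

import tensorflow as tf

def glorot_dense(inputs, fan_in, fan_out, activation=None):
    # One dense layer with Glorot-style weight init and zero biases
    W = tf.Variable(tf.truncated_normal([fan_in, fan_out], mean=0.0,
                                        stddev=tf.sqrt(2.0 / (fan_in + fan_out))))
    b = tf.Variable(tf.zeros([fan_out]))
    h = tf.matmul(inputs, W) + b
    return activation(h) if activation is not None else h

# The three layers above would then collapse to:
# y1 = glorot_dense(x,  784, 512, tf.nn.relu)
# y2 = glorot_dense(y1, 512, 256, tf.nn.relu)
# y3 = glorot_dense(y2, 256, 10)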
[[93  0  0  2  0  1  3  0  1  0]
 [ 0 96  1  0  0  0  0  1  2  0]
 [ 0  1 89  3  1  0  2  2  2  0]
 [ 0  0  3 87  0  2  0  1  4  3]
 [ 0  2  0  0 89  0  3  0  0  6]
 [ 0  1  0  5  2 81  5  1  4  1]
 [ 0  0  3  0  3  2 91  0  1  0]
 [ 0  1  0  0  2  0  0 94  0  3]
 [ 0  1  1  1  1  3  0  0 93  0]
 [ 0  0  0  0  5  1  0  2  0 92]]
0.905
That brought ACC up to 0.905.
Addendum 2
While I was at it, I also increased the number of epochs (50 → 500).
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
num_epochs = 500  # 50 -> 500
num_data = train_X.shape[0]
batch_size = 100
for epoch in range(num_epochs):
    s_idx = np.random.permutation(num_data)
    for idx in range(0, num_data, batch_size):
        batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(), :]
        batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

pred = sess.run(tf.nn.softmax(y3), feed_dict={x: test_X})
pred_d = np.argmax(pred, 1)
print(confusion_matrix(test_Y, pred_d))
print(accuracy_score(test_Y, pred_d))
[[93  0  1  1  0  1  2  0  1  1]
 [ 0 99  1  0  0  0  0  0  0  0]
 [ 0  1 94  2  1  0  0  2  0  0]
 [ 0  0  3 91  0  2  0  0  3  1]
 [ 0  0  1  0 93  0  2  1  0  3]
 [ 0  1  0  2  0 95  1  0  1  0]
 [ 1  0  0  0  0  2 97  0  0  0]
 [ 0  0  0  0  3  0  0 95  0  2]
 [ 0  0  1  1  0  0  1  0 97  0]
 [ 0  0  0  0  3  1  1  1  0 94]]
0.948
ACC climbed to 0.948.
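That said, 500 epochs was a fairly arbitrary pick. Logging the mean minibatch cost per epoch, the way the Keras run did, would make it easier to see where the loss plateaus and stop there; a minimal sketch of the same training loop with logging added, using the variables defined above:

# Same training loop as above, but logging the mean minibatch cost per epoch
# so you can see when the loss plateaus (variables as defined earlier)
for epoch in range(num_epochs):
    s_idx = np.random.permutation(num_data)
    epoch_cost = []
    for idx in range(0, num_data, batch_size):
        batch_x = train_X.iloc[s_idx[idx: idx + batch_size].tolist(), :]
        batch_y = train_Y.iloc[s_idx[idx: idx + batch_size].tolist()]
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        epoch_cost.append(c)
    print('epoch %d: cost %.4f' % (epoch, np.mean(epoch_cost)))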