安装好了tensorflow(TensorFlow 安装笔记),接下来就在他的官网指导下进行Mnist手写数字识别实验。

softmax 实验过程

进入tfgpu虚拟环境后,首先进入目录:/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow/examples/tutorials/mnist/,然后进入IPython交互终端。

In [4]: from tensorflow.examples.tutorials.mnist import input_data
   ...: mnist = input_data.read_data_sets(\"MNIST_data/\", one_hot=True)
   ...: 
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

In [5]: import tensorflow as tf

In [6]: x = tf.placeholder(tf.float32, [None, 784])

In [7]: W = tf.Variable(tf.zeros([784, 10]))
   ...: b = tf.Variable(tf.zeros([10]))
   ...: 

In [8]: y = tf.nn.softmax(tf.matmul(x, W) + b)

In [9]: y_ = tf.placeholder(tf.float32, [None, 10])

In [10]: cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

In [11]: train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In [12]: init = tf.initialize_all_variables()

In [13]: sess = tf.Session()
    ...: sess.run(init)
    ...: 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)

In [14]: for i in range(1000):
    ...:   batch_xs, batch_ys = mnist.train.next_batch(100)
    ...:   sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    ...:   

In [15]: correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

In [16]: accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    ...: 

In [17]: print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
0.9186

softmax实验说明

In [4]:主要是下载数据

In [6]:意思是先分配输入x,None即输入图片数量稍后运行时确定,784即28*28,把一张28*28的图片拉长成为一维向量,保证每张图片拉长方式相同即可

In [7]:分配权重w和偏置b

In [8]:实现softmax模型,获得输出判断值y

In [9]: 分配实际判断值y_

In [10]:获得交叉熵形式的代价函数

In [11]:每一步使用0.5的学习率(步长)来进行梯度下降算法

In [12]:初始化所有变量

In [13]:开启一个会话,启动模型

In [14]:进行1000次随机梯度下降算法

In [15]:比较输出判断值y和真实判断值y_

In [16]:获得准确率

In [17]:获得测试集上的准确率:91.86%

神经网络实验过程

In [1]: from tensorflow.examples.tutorials.mnist import input_data
   ...: mnist = input_data.read_data_sets(\'MNIST_data\', one_hot=True)
   ...: 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

In [2]: import tensorflow as tf
   ...: sess = tf.InteractiveSession()
   ...: 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)

In [3]: def weight_variable(shape):
   ...:   initial = tf.truncated_normal(shape, stddev=0.1)
   ...:   return tf.Variable(initial)
   ...: 
   ...: def bias_variable(shape):
   ...:   initial = tf.constant(0.1, shape=shape)
   ...:   return tf.Variable(initial)
   ...: 

In [4]: 

In [4]: def conv2d(x, W):
   ...:   return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding=\'SAME\')
   ...: 
   ...: def max_pool_2x2(x):
   ...:   return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
   ...:                         strides=[1, 2, 2, 1], padding=\'SAME\')
   ...: 

In [5]: W_conv1 = weight_variable([5, 5, 1, 32])
   ...: b_conv1 = bias_variable([32])
   ...: 

In [6]: 

In [7]: x = tf.placeholder(tf.float32, shape=[None, 784])
   ...: y_ = tf.placeholder(tf.float32, shape=[None, 10])
   ...: 

In [8]: x_image = tf.reshape(x, [-1,28,28,1])
   ...: 

In [9]: h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
   ...: h_pool1 = max_pool_2x2(h_conv1)
   ...: 

In [10]: W_conv2 = weight_variable([5, 5, 32, 64])
    ...: b_conv2 = bias_variable([64])
    ...: 
    ...: h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    ...: h_pool2 = max_pool_2x2(h_conv2)
    ...: 

In [11]: W_fc1 = weight_variable([7 * 7 * 64, 1024])
    ...: b_fc1 = bias_variable([1024])
    ...: 
    ...: h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    ...: h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    ...: 

In [12]: keep_prob = tf.placeholder(tf.float32)
    ...: h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    ...: 

In [13]: W_fc2 = weight_variable([1024, 10])
    ...: b_fc2 = bias_variable([10])
    ...: 
    ...: y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    ...: 

In [14]: cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
    ...: train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    ...: correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    ...: accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    ...: sess.run(tf.initialize_all_variables())
    ...: for i in range(2000):
    ...:   batch = mnist.train.next_batch(50)
    ...:   if i%100 == 0:
    ...:     train_accuracy = accuracy.eval(feed_dict={
    ...:         x:batch[0], y_: batch[1], keep_prob: 1.0})
    ...:     print(\"step %d, training accuracy %g\"%(i, train_accuracy))
    ...:   train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    ...: 
    ...: print(\"test accuracy %g\"%accuracy.eval(feed_dict={
    ...:     x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
    ...: 
step 0, training accuracy 0.04
step 100, training accuracy 0.86
step 200, training accuracy 0.92
step 300, training accuracy 0.88
step 400, training accuracy 0.96
step 500, training accuracy 0.9
step 600, training accuracy 1
step 700, training accuracy 0.98
step 800, training accuracy 0.92
step 900, training accuracy 0.98
step 1000, training accuracy 0.94
step 1100, training accuracy 0.96
step 1200, training accuracy 1
step 1300, training accuracy 0.98
step 1400, training accuracy 0.94
step 1500, training accuracy 0.96
step 1600, training accuracy 1
step 1700, training accuracy 0.92
step 1800, training accuracy 0.92
step 1900, training accuracy 0.96
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (512):     Total Chunks: 1, Chunks in use: 0 768B allocated for chunks. 6.4KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1024):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
...
...
...
...
Limit:                   836280320
InUse:                    83845120
MaxInUse:                117678336
NumAllocs:                  246915
MaxAllocSize:             45883392

W tensorflow/core/common_runtime/bfc_allocator.cc:270] *****_******________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 957.03MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:936] Resource exhausted: OOM when allocating tensor with shape[10000,28,28,32]
E tensorflow/core/client/tensor_c_api.cc:485] OOM when allocating tensor with shape[10000,28,28,32]
     [[Node: Conv2D = Conv2D[T=DT_FLOAT, data_format=\"NHWC\", padding=\"SAME\", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device=\"/job:localhost/replica:0/task:0/gpu:0\"](Reshape, Variable/read)]]


In [20]: cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
    ...: train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    ...: correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    ...: accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    ...: sess.run(tf.initialize_all_variables())
    ...: for i in range(20000):
    ...:   batch = mnist.train.next_batch(50)
    ...:   if i%100 == 0:
    ...:     train_accuracy = accuracy.eval(feed_dict={
    ...:         x:batch[0], y_: batch[1], keep_prob: 1.0})
    ...:     print(\"step %d, training accuracy %g\"%(i, train_accuracy))
    ...:   train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    ...: 
    ...: print(\"test accuracy %g\"%accuracy.eval(feed_dict={
    ...:     x: mnist.test.images[0:200,:], y_: mnist.test.labels[0:200,:], keep_prob: 1.0}))
    ...: 
step 0, training accuracy 0.12
step 100, training accuracy 0.78
step 200, training accuracy 0.88
step 300, training accuracy 0.96
step 400, training accuracy 0.9
step 500, training accuracy 0.96
step 600, training accuracy 0.94
step 700, training accuracy 0.92
step 800, training accuracy 0.92
step 900, training accuracy 0.96
step 1000, training accuracy 0.94
step 1100, training accuracy 0.98
step 1200, training accuracy 0.96
step 1300, training accuracy 1
step 1400, training accuracy 0.98
...
...
...
test accuracy 0.995

In [21]: cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
...: train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
...: correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
...: accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))
...: sess.run(tf.initialize_all_variables())
...: for i in range(20000):
...:   batch = mnist.train.next_batch(50)
...:   if i%100 == 0:
...:     train_accuracy = accuracy.eval(feed_dict={
...:         x:batch[0], y_: batch[1], keep_prob: 1.0})
...:     print \"step %d, training accuracy %g\"%(i, train_accuracy)
...:   train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
...: 
...: print \"test accuracy %g\"%accuracy.eval(feed_dict={
...:     x: mnist.test.images[200:400,:], y_: mnist.test.labels[200:400,:], keep_prob: 1.0})
...:     
...: 
step 0, training accuracy 0.12
step 100, training accuracy 0.94
step 200, training accuracy 0.86
step 300, training accuracy 0.96
step 400, training accuracy 0.9
step 500, training accuracy 1
step 600, training accuracy 0.96
step 700, training accuracy 0.88
step 800, training accuracy 1
step 900, training accuracy 0.98
step 1000, training accuracy 0.96
step 1100, training accuracy 0.94
step 1200, training accuracy 0.96
step 1300, training accuracy 0.96
step 1400, training accuracy 0.94
step 1500, training accuracy 0.98
step 1600, training accuracy 0.96
step 1700, training accuracy 0.98
...
...
...
test accuracy 0.975

In [22]: for i in range(20000):
...:   batch = mnist.train.next_batch(50)
...:   if i%100 == 0:
...:     train_accuracy = accuracy.eval(feed_dict={
...:         x:batch[0], y_: batch[1], keep_prob: 1.0})
...:     print \"step %d, training accuracy %g\"%(i, train_accuracy)
...:   train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
...: 
...: print \"test accuracy %g\"%accuracy.eval(feed_dict={
...:     x: mnist.test.images[400:1000,:], y_: mnist.test.labels[400:1000,:], keep_prob: 1.0})
...: 
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 0.98
step 300, training accuracy 0.98
step 400, training accuracy 0.98
step 500, training accuracy 1
step 600, training accuracy 0.96
step 700, training accuracy 0.96
...
...
...
test accuracy  0.983333

神经网络实验说明

In [1]: 导入数据,即测试集和验证集

In [2]: 引入 tensorflow 启动InteractiveSession(比session更灵活)

In [3]: 定义两个初始化w和b的函数,方便后续操作

In [4]: 定义卷积和池化函数,这里卷积采用padding,使得输入输出图像一样大,池化采取2x2,那么就是4格变一格

In [5]: 定义第一层卷积的w和b

In [7]: 分配输入x和y_

In [8]: 修改x的shape

In [9]: 把x_image和w进行卷积,加上b,然后应用ReLU激活函数,最后进行max-pooling

In [10]: 第二层卷积,和第一层卷积类似

In [11]: 全连接层

In [12]: 为了减少过拟合,可以在输出层之前加入dropout。(但是本例子比较简单,即使不加,影响也不大)

In [13]: 由一个softmax层来得到输出

In [14]: 定义代价函数,训练步骤,用ADAM来进行优化,可以看出,最后测试集太大了,我得显存不够    

In [20]: 只使用1~200个图片作为测试集,正确率是 0.995

In [21]: 只使用201~400个图片作为测试集,正确率是 0.975

In [22]: 只使用401~1000个图片作为测试集,正确率是 0.983333

这个CNN的结构如下图所示:

修改CNN

现在尝试修改这个CNN结构,增加特征数量以期获得更好的效果,修改后的CNN结构如图:

实验过程如下:

In [23]:     def weight_variable(shape):
    ...:       initial = tf.truncated_normal(shape, stddev=0.1)
    ...:       return tf.Variable(initial)
    ...:     
    ...:     def bias_variable(shape):
    ...:       initial = tf.constant(0.1, shape=shape)
    ...:       return tf.Variable(initial)
    ...:     def conv2d(x, W):
    ...:       return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding=\'SAME\')
    ...:     def max_pool_2x2(x):
    ...:       return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
    ...:                             strides=[1, 2, 2, 1], padding=\'SAME\')
    ...:     W_conv1 = weight_variable([5, 5, 1, 64])
    ...:     b_conv1 = bias_variable([64])
    ...:     h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    ...:     h_pool1 = max_pool_2x2(h_conv1)
    ...:     W_conv2 = weight_variable([5, 5, 64, 128])
    ...:     b_conv2 = bias_variable([128])
    ...:     
    ...:     h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    ...:     h_pool2 = max_pool_2x2(h_conv2)
    ...:     W_fc1 = weight_variable([7 * 7 * 128, 1024])
    ...:     b_fc1 = bias_variable([1024])
    ...:     
    ...:     h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*128])
    ...:     h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    ...:     keep_prob = tf.placeholder(\"float\")
    ...:     h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    ...:     W_fc2 = weight_variable([1024, 10])
    ...:     b_fc2 = bias_variable([10])
    ...:     
    ...:     y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    ...:     cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
    ...:     train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    ...:     correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    ...:     accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))
    ...:     sess.run(tf.initialize_all_variables())
    ...:     for i in range(20000):
    ...:       batch = mnist.train.next_batch(50)
    ...:       if i%100 == 0:
    ...:         train_accuracy = accuracy.eval(feed_dict={
    ...:             x:batch[0], y_: batch[1], keep_prob: 1.0})
    ...:         print \"step %d, training accuracy %g\"%(i, train_accuracy)
    ...:       train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    ...:     
    ...:     print \"test accuracy %g\"%accuracy.eval(feed_dict={
    ...:         x: mnist.test.images[0:200,:], y_: mnist.test.labels[0:200,:], keep_prob: 1.0})
    ...: 
    ...
    ...
    ...
    test accuracy 1
In [24]: for i in range(20000):
    ...:    batch = mnist.train.next_batch(50)
    ...:    if i%100 == 0:
    ...:       train_accuracy = accuracy.eval(feed_dict={
    ...:          x:batch[0], y_: batch[1], keep_prob: 1.0})
    ...:       print \"step %d, training accuracy %g\"%(i, train_accuracy)
    ...:    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    ...: 
    ...: print \"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images[200:400,:], y_: mnist.test.labels[200:400,:], keep_prob: 1.0})
    ...: 
    ...
    ...
    ...
    test accuracy 0.975
In [25]: for i in range(20000):
    ...:    batch = mnist.train.next_batch(50)
    ...:    if i%100 == 0:
    ...:       train_accuracy = accuracy.eval(feed_dict={
    ...:          x:batch[0], y_: batch[1], keep_prob: 1.0})
    ...:       print \"step %d, training accuracy %g\"%(i, train_accuracy)
    ...:    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    ...: 
    ...: print \"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images[400:1000,:], y_: mnist.test.labels[400:1000,:], keep_prob: 1.0})
    ...: 
    ...
    ...
    ...
    W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 717.77MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
In [26]: print \"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images[400:600,:], y_: mnist.test.labels[400:600,:], keep_prob: 1.0})
    ...:
    test accuracy 0.985
In [28]: print \"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images[600:800,:], y_: mnist.test.labels[600:800,:], keep_prob: 1.0})
    ...: 
    ...: 
    ...: 
test accuracy 0.985
In [29]: print \"test accuracy %g\"%accuracy.eval(feed_dict={x: mnist.test.images[800:1000,:], y_: mnist.test.labels[800:1000,:], keep_prob: 1.0})
    ...: 
    ...: 
test accuracy 0.995

修改前的平均准确率是:

( 0.995*2 + 0.975*2 + 0.9833*6 )/ 10 = 0.98398

修改后的平均准确率是:

(1*2 + 0.975*2+ 0.985*4 + 0.995*2)/ 10 = 0.98800

可以看出增加特征过后,准确率提高了,但是内存消耗也变大了(400~1000的图片验证出现了OOM错误),而且实验过程中也感受到时间消耗更大,怎么取舍就取决于具体需求和具体的硬件配置了。