# Training a Classifier

## What About the Data?

• For images, packages such as Pillow and OpenCV are useful.
• For audio, packages such as scipy and librosa are useful.
• For text, either raw Python or Cython-based loading works, as do NLTK and SpaCy.

## Training an Image Classifier

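1. Load the CIFAR10 training and test datasets with torchvision and normalize them
2. Define a convolutional neural network
3. Define a loss function and an optimizer
4. Train the network on the training data
5. Test the network on the test data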

### 1. Load and Normalize CIFAR10

    import torch
    import torchvision
    import torchvision.transforms as transforms


torchvision datasets yield PILImage images with values in the range [0, 1]. We convert them to tensors normalized to the range [-1, 1].
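As a quick check of the arithmetic: `Normalize` with a mean of 0.5 and a standard deviation of 0.5 computes `(x - 0.5) / 0.5` per channel, which sends 0 to -1 and 1 to 1. The sketch below reproduces that in plain Python. The helper names `normalize` and `unnormalize` are illustrative only and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1], as Normalize does per channel."""
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    """Invert normalize; with the defaults this equals y / 2 + 0.5."""
    return y * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The inverse mapping is the same `img / 2 + 0.5` step that the display function uses before plotting.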

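    # ToTensor converts a PIL image to a float tensor in [0, 1];
    # Normalize then maps each of the three channels to [-1, 1].
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # CIFAR10 class names, indexed by label.
    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')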


Output:

    Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz



    import matplotlib.pyplot as plt
    import numpy as np

    # Function to display an image
    def imshow(img):
        img = img / 2 + 0.5     # unnormalize
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))  # CHW -> HWC for matplotlib
        plt.show()

    # Get a batch of random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    # Show the images
    imshow(torchvision.utils.make_grid(images))
    # Print the labels
    print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output:

    horse horse horse   car


### 2. Define a Convolutional Neural Network

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)    # 3 input channels -> 6, 5x5 kernel
            self.pool = nn.MaxPool2d(2, 2)     # 2x2 max pooling, stride 2
            self.conv2 = nn.Conv2d(6, 16, 5)   # 6 channels -> 16, 5x5 kernel
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)       # one output per class

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)         # flatten for the linear layers
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    net = Net()
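The `16 * 5 * 5` input size of `fc1` follows from how each layer shrinks a 32x32 CIFAR10 image. The sketch below traces that arithmetic using the standard output-size formula for convolution and pooling without padding or dilation; `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                      # CIFAR10 images are 32x32
size = conv_out(size, 5)       # conv1, 5x5 kernel        -> 28
size = conv_out(size, 2, 2)    # 2x2 max pool, stride 2   -> 14
size = conv_out(size, 5)       # conv2, 5x5 kernel        -> 10
size = conv_out(size, 2, 2)    # 2x2 max pool, stride 2   -> 5

flat_features = 16 * size * size   # 16 channels come out of conv2
print(size, flat_features)         # prints: 5 400
```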


### 3. Define a Loss Function and Optimizer

    import torch.optim as optim

    # Cross-entropy classification loss and SGD with momentum
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
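To make the loss concrete: for a single sample with a class-index target, cross-entropy is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below illustrates that definition; it is not PyTorch's implementation, and the function name `cross_entropy` here is just a local helper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# A network that scores all 10 classes equally is guessing uniformly.
uniform = [0.0] * 10
print(round(cross_entropy(uniform, 3), 3))   # 2.303, i.e. ln(10)
```

This gives a useful reference point: a 10-class model that has learned nothing scores about 2.30, so the first loss reported in the training log below (2.182) is only slightly better than guessing.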


### 4. Train the Network

    for epoch in range(2):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:    # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0

    print('Finished Training')

Output:

    [1,  2000] loss: 2.182
    [1,  4000] loss: 1.819
    [1,  6000] loss: 1.648
    [1,  8000] loss: 1.569
    [1, 10000] loss: 1.511
    [1, 12000] loss: 1.473
    [2,  2000] loss: 1.414
    [2,  4000] loss: 1.365
    [2,  6000] loss: 1.358
    [2,  8000] loss: 1.322
    [2, 10000] loss: 1.298
    [2, 12000] loss: 1.282
    Finished Training
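PyTorch adds each new gradient to whatever is already stored in `.grad`, which is why a training loop normally clears the gradients with `optimizer.zero_grad()` before every `backward()` call. The small sketch below shows the accumulation on a single scalar parameter; the values are made up for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # 3.0

(3 * w).backward()         # no reset in between
second = w.grad.item()     # 6.0: the new gradient was added to the old one

w.grad.zero_()             # clear the accumulated gradient by hand
(3 * w).backward()
third = w.grad.item()      # 3.0 again

print(first, second, third)
```

Without the reset, each optimizer step would use the sum of all gradients seen so far rather than the gradient of the current mini-batch.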


### 5. Test the Network on the Test Data

Okay, first step. Let us display some images from the test set to get familiar with them.

    dataiter = iter(testloader)
    images, labels = next(dataiter)

    # Show the images
    imshow(torchvision.utils.make_grid(images))
    print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output:

    GroundTruth:    cat  ship  ship plane


Okay, now let us see what the neural network thinks the examples above are:

    outputs = net(images)

    # The index of the largest output is the predicted class
    _, predicted = torch.max(outputs, 1)

    print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

Output:

    Predicted:    dog  ship  ship plane
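The network returns one score per class for each image, and `torch.max(outputs, 1)` returns both the largest score in each row and its index; the index is the predicted class. A small sketch with made-up scores for two images and three classes:

```python
import torch

# Hypothetical scores for 2 images over 3 classes (made-up numbers).
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.3, 0.2]])

# Reduce along dimension 1 (the class dimension).
values, predicted = torch.max(outputs, 1)
print(predicted.tolist())   # [1, 0]
```

When only the indices are needed, `outputs.argmax(dim=1)` gives the same result.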


    correct = 0
    total = 0
    # Evaluate on the whole test set; no gradients are needed here
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: %d %%' % (
        100 * correct / total))

Output:

    Accuracy of the network on the 10000 test images: 55 %


    # Count correct predictions separately for each class
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels).squeeze()
            for i in range(4):  # one entry per image in the batch of 4
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1

    for i in range(10):
        print('Accuracy of %5s : %2d %%' % (
            classes[i], 100 * class_correct[i] / class_total[i]))

Output:

    Accuracy of plane : 70 %
    Accuracy of   car : 70 %
    Accuracy of  bird : 28 %
    Accuracy of   cat : 25 %
    Accuracy of  deer : 37 %
    Accuracy of   dog : 60 %
    Accuracy of  frog : 66 %
    Accuracy of horse : 62 %
    Accuracy of  ship : 69 %
    Accuracy of truck : 61 %


Okay, so what next?

## Training on GPU

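    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Assuming that we are on a CUDA machine, this should print a CUDA device:
    print(device)

Output:

    cuda:0

    # Move the network's parameters and buffers to the device
    net.to(device)

    # The inputs and labels must also be sent to the device at every step
    inputs, labels = inputs.to(device), labels.to(device)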
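Putting the two calls together, a single training step on the selected device might look like the sketch below. The small `nn.Linear` model and the random batch are stand-ins chosen so the example runs on its own (they are not the tutorial's `Net` or CIFAR10 data); the pattern of moving the model once and each batch every iteration is the point.

```python
import torch
import torch.nn as nn

# Falls back to the CPU when no CUDA device is available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model and batch (hypothetical; any nn.Module works the same way).
model = nn.Linear(8, 10).to(device)      # move the parameters once
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

inputs = torch.randn(4, 8)
labels = torch.randint(0, 10, (4,))

# Each batch must be on the same device as the model before the forward pass.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

print(outputs.device, loss.item())
```

If the batch were left on the CPU while the model sits on the GPU, the forward pass would raise a device-mismatch error, which is why the `.to(device)` call belongs inside the training loop.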


Goals achieved:

• Understanding PyTorch's Tensor library and neural networks at a high level
• Training a small neural network to classify images

## Where Do I Go Next?
