Migrating TensorFlow Model Parameters to PyTorch

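Question 1: TensorFlow uses with tf.variable_scope(): to share parameters. How should this be handled when porting to PyTorch?

For example, the following TensorFlow code uses a convolutional layer.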

# char embedding
## [batch_size*max_utter_num*max_utter_len, emb]

utterances_cnn_char_emb = cnn_layer(utterances_char_embedded,
                                    filter_sizes=[3, 4, 5],
                                    num_filters=50,
                                    scope="CNN_char_emb",
                                    scope_reuse=False)
cnn_char_dim = utterances_cnn_char_emb.get_shape()[1].value

def cnn_layer(inputs, filter_sizes, num_filters, scope=None, scope_reuse=False):
    with tf.variable_scope(scope, reuse=scope_reuse):
        input_size = inputs.get_shape()[2].value
        outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.variable_scope("conv_{}".format(i)):
                w = tf.get_variable("w", [filter_size, input_size, num_filters])
                b = tf.get_variable("b", [num_filters])
            conv = tf.nn.conv1d(inputs, w, stride=1, padding="VALID")  # [num_words, num_chars - filter_size + 1, num_filters]
            h = tf.nn.relu(tf.nn.bias_add(conv, b))  # [num_words, num_chars - filter_size + 1, num_filters]
            pooled = tf.reduce_max(h, 1)  # [num_words, num_filters]
            outputs.append(pooled)
        return tf.concat(outputs, 1)  # [num_words, num_filters * len(filter_sizes)]

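Solution:

TensorFlow implements parameter sharing through variable_scope: with two images, for example, the second image is processed with the same weights used for the first (see tf.variable_scope). PyTorch does not have this problem, because a convolutional (or any other) layer must first be instantiated and the network is then built from that instance, so the weights are shared every time the same instance is called. (This paragraph draws on the blog post at https://www.freesion.com/article/67161238755/)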
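As a rough illustration of the point above, the cnn_layer function could be sketched as a PyTorch module along the following lines. The class name CharCNN, the feature size of 16, and the tensor shapes are assumptions made for this example and do not come from the original code. Calling the same instance on two inputs reuses the same weights, which is what scope_reuse=True achieves in TensorFlow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCNN(nn.Module):
    """Hypothetical PyTorch counterpart of cnn_layer (illustrative only)."""

    def __init__(self, input_size, filter_sizes=(3, 4, 5), num_filters=50):
        super().__init__()
        # One Conv1d per filter size; each owns its own weight and bias.
        self.convs = nn.ModuleList(
            [nn.Conv1d(input_size, num_filters, kernel_size=k) for k in filter_sizes]
        )

    def forward(self, inputs):
        # inputs: [num_words, num_chars, input_size] (TensorFlow layout)
        x = inputs.transpose(1, 2)  # [num_words, input_size, num_chars]
        # ReLU, then max over the length dimension, as in tf.reduce_max(h, 1)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)  # [num_words, num_filters * len(filter_sizes)]


torch.manual_seed(0)
cnn_char_emb = CharCNN(input_size=16)   # 16 is an assumed embedding size
utterances = torch.randn(8, 12, 16)
responses = torch.randn(4, 12, 16)
utterances_out = cnn_char_emb(utterances)
responses_out = cnn_char_emb(responses)  # same instance, therefore same weights
print(utterances_out.shape, responses_out.shape)
```

Creating a second CharCNN instance would give an independent set of weights, which corresponds to using a different scope name in TensorFlow.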

Question 2: Replacing tf.get_variable() in PyTorch

In TensorFlow:

w = tf.get_variable("w", [filter_size, input_size, num_filters])
b = tf.get_variable("b", [num_filters])

In PyTorch:

# Variable is deprecated since PyTorch 0.4; nn.Parameter marks a tensor as trainable
w = torch.nn.Parameter(torch.randn(filter_size, input_size, num_filters))
b = torch.nn.Parameter(torch.randn(num_filters))
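A minimal sketch of how such trainable tensors are usually registered, assuming they live inside an nn.Module. The class name ConvParams and the initialization (normal with std 0.1 for w, zeros for b) are arbitrary choices for this example and are not meant to reproduce the TensorFlow default initializer.

```python
import torch
import torch.nn as nn


class ConvParams(nn.Module):
    """Hypothetical container for the w and b tensors of one convolution."""

    def __init__(self, filter_size, input_size, num_filters):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it with the module,
        # so it shows up in .parameters() and is updated by the optimizer.
        self.w = nn.Parameter(torch.empty(filter_size, input_size, num_filters))
        self.b = nn.Parameter(torch.zeros(num_filters))
        nn.init.normal_(self.w, std=0.1)  # assumed initialization, for illustration


params = ConvParams(filter_size=3, input_size=16, num_filters=50)
names = [name for name, _ in params.named_parameters()]
print(names)
```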

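Question 3: Replacing tf.nn.conv1d in PyTorch

In TensorFlow

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

1. input: the input image to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels].

2. filter: the convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels].

3. strides: the stride along each dimension of the input, a one-dimensional vector of length 4.

4. padding: a string, either "SAME" or "VALID".

5. use_cudnn_on_gpu: a bool that controls whether cuDNN acceleration is used; defaults to True.

The result is a Tensor, the output commonly called the feature map.

tf.nn.conv1d follows the same conventions with one fewer spatial dimension: the input has shape [batch, in_width, in_channels] and the filter has shape [filter_width, in_channels, out_channels].

conv = tf.nn.conv1d(inputs, w, stride=1, padding="VALID")

In PyTorch
torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
1. in_channels (int): number of channels in the input signal. In text classification this is the word-vector dimension.
2. out_channels (int): number of channels produced by the convolution. Each output channel corresponds to one 1-D filter.
3. kernel_size (int or tuple): size of the convolution kernel. The kernel has size (k,); its second dimension is determined by in_channels, so each filter actually has size kernel_size * in_channels.
4. stride (int or tuple, optional): stride of the convolution.
5. padding (int or tuple, optional): number of zeros added to each side of the input.

Note that nn.Conv1d expects input of shape [batch, in_channels, length], whereas tf.nn.conv1d defaults to [batch, in_width, in_channels], so the last two dimensions must be swapped when porting.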
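The layout difference can be sketched with torch.nn.functional.conv1d, which takes an explicit weight tensor much like tf.nn.conv1d does. The helper name conv1d_tf_layout is hypothetical; it assumes stride 1 and "VALID" padding, as in the TensorFlow call above, and takes tensors in TensorFlow's layout.

```python
import torch
import torch.nn.functional as F


def conv1d_tf_layout(inputs, w, b=None):
    """Hypothetical helper: 1-D convolution on TensorFlow-layout tensors.

    inputs: [batch, width, in_channels]
    w:      [filter_width, in_channels, out_channels]
    b:      [out_channels] or None
    """
    x = inputs.permute(0, 2, 1)   # [batch, in_channels, width]
    weight = w.permute(2, 1, 0)   # [out_channels, in_channels, filter_width]
    y = F.conv1d(x, weight, bias=b, stride=1, padding=0)  # padding=0 matches "VALID"
    return y.permute(0, 2, 1)     # [batch, width - filter_width + 1, out_channels]


torch.manual_seed(0)
inputs = torch.randn(2, 10, 4)
w = torch.randn(3, 4, 5)
b = torch.randn(5)
conv = conv1d_tf_layout(inputs, w, b)
print(conv.shape)
```

When the weights come from a TensorFlow checkpoint, the same permutation of w would be applied once while loading them into an nn.Conv1d layer, rather than on every forward pass.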


Question 4: The PyTorch counterpart of TensorFlow's h = tf.nn.bias_add(conv, b)

In PyTorch, simply add the two Tensors; broadcasting aligns b with the last dimension of a.

import torch

torch.manual_seed(2019)
a = torch.rand(2, 3, 4, 5)
print(a)
b = torch.ones(5)
print(b)
print(a + b)
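One caveat worth sketching: the addition above works because the bias matches the last dimension, which is where TensorFlow keeps the channels. The output of nn.Conv1d is channels-first, [batch, channels, length], so the bias has to be reshaped before it broadcasts over the channel dimension. The shapes below are assumed for illustration.

```python
import torch

conv = torch.zeros(2, 5, 7)                # [batch, channels, length], as produced by nn.Conv1d
b = torch.arange(5, dtype=torch.float32)   # one bias value per channel

# Reshape b to [1, channels, 1] so it broadcasts along batch and length.
h = conv + b.view(1, -1, 1)
print(h[0, :, 0])
```

In practice nn.Conv1d(bias=True) already adds the bias internally, so this step is only needed when the weight and bias are handled by hand.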

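Question 5: How to replace TensorFlow's tf.nn.relu() in PyTorch

PyTorch offers two ways to apply an activation function:

import torch.nn.functional as F
out = F.relu(input)

import torch.nn as nn
relu = nn.ReLU()
out = relu(input)

Both apply the ReLU activation; they differ only in where they are used. F.relu() is a function call, typically used inside the forward function. nn.ReLU() is a module, typically created when defining the network layers.
When the network is printed with print(net), an nn.ReLU() layer appears in the output, whereas F.relu() does not.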
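The difference in print(net) output can be seen with two small networks. The class names WithModule and WithFunction and the layer sizes are made up for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WithModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.act = nn.ReLU()          # registered as a submodule

    def forward(self, x):
        return self.act(self.fc(x))


class WithFunction(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return F.relu(self.fc(x))     # plain function call, nothing registered


net_module = WithModule()
net_function = WithFunction()
print(net_module)     # lists both the Linear layer and the ReLU layer
print(net_function)   # lists only the Linear layer
```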

Question 6: The PyTorch counterpart of tf.reduce_max

h_tf = tf.reduce_max(d_tf, axis=1)  # maximum along axis 1
print(h_tf.shape)
h_th, _ = th.max(d_th, dim=1)  # from version 1.7 on, h_th = th.amax(d_th, dim=1) is also supported
print(h_th.size())
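The reason for the h_th, _ unpacking is that torch.max with a dim argument returns both the maximum values and their indices, while torch.amax returns only the values. A small sketch with a made-up tensor:

```python
import torch

d_th = torch.tensor([[1.0, 5.0, 3.0],
                     [4.0, 2.0, 6.0]])

values, indices = torch.max(d_th, dim=1)   # values and argmax positions
only_values = torch.amax(d_th, dim=1)      # values only (PyTorch 1.7 or later)
print(values, indices, only_values)
```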

Question 7: The PyTorch counterpart of tf.concat

a_tf = tf.random.normal((3, 4, 1))
b_tf = tf.random.normal((3, 4, 1))
c_tf = tf.concat((a_tf, b_tf), axis=1)
print(c_tf.shape)
a_th = th.randn(3, 4, 1)
b_th = th.randn(3, 4, 1)
c_th = th.cat((a_th, b_th), dim=1)
print(c_th.shape)

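Question 8: The PyTorch counterpart of TensorFlow's tf.placeholder

PyTorch is a dynamic framework, whereas TensorFlow (1.x) is a static one. In TensorFlow the network structure must be built first, and a few placeholders reserved in advance are then used to feed in the samples and labels. This is like interacting through a board with a few holes cut in it: the computation flow cannot be changed along the way.
PyTorch, by contrast, builds the computation graph dynamically as the code runs, so no placeholder is needed: tensors are passed directly to the model, and input sizes such as the batch size do not have to be declared in advance. PyTorch also computes the gradients of variables automatically, which makes the underlying workings of the network easier to follow while programming.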
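A short sketch of what this means in practice: the same model is called directly on tensors of different batch sizes, with no placeholder or feed_dict. The model and sizes are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

shapes = []
for batch_size in (1, 4, 32):
    x = torch.randn(batch_size, 10)   # real data takes the place of a placeholder
    y = model(x)                      # the graph is built while this line runs
    shapes.append(tuple(y.shape))
print(shapes)
```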

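Question 9: The PyTorch counterpart of tf.nn.dropout

PyTorch has two forms of dropout: the function torch.nn.functional.dropout and the module class torch.nn.Dropout.
torch.nn.Dropout(p=0.5, inplace=False)
1. p is the probability of zeroing each element of the input, so with p=1 the output is all zeros. This is the opposite of TensorFlow's keep_prob, so p = 1 - keep_prob.
2. inplace controls whether the operation modifies the input tensor itself.

# In TensorFlow
utterances_embedded = tf.nn.dropout(utterances_embedded, keep_prob=self.dropout_keep_prob)
# In PyTorch (p is the drop probability, hence 1 - keep_prob)
dropout = nn.Dropout(1 - dropout_keep_prob)
utterances_embedded = dropout(utterances_embedded)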
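The sketch below illustrates two points that are easy to miss when porting: nn.Dropout takes the drop probability rather than the keep probability, and it is only active in training mode. The keep probability of 0.8 and the input tensor are assumed values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout_keep_prob = 0.8
dropout = nn.Dropout(p=1.0 - dropout_keep_prob)   # drop probability = 1 - keep_prob

x = torch.ones(1000)

dropout.train()          # training mode: elements are zeroed, the rest are scaled
y_train = dropout(x)

dropout.eval()           # evaluation mode: dropout is a no-op
y_eval = dropout(x)

print(y_train[:10], y_eval[:10])
```

In training mode the surviving elements are scaled by 1 / keep_prob, which matches the behaviour of tf.nn.dropout. In TensorFlow 1.x, evaluation is usually handled by feeding keep_prob=1.0, whereas in PyTorch it is handled by calling model.eval().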

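Question 10: Replacing tf.reshape() in PyTorch

tf.reshape(
    tensor, shape, name=None
)

When shape contains -1, for example shape=(-1, 2), the -1 dimension is inferred from the total number of elements.

torch.reshape(input, shape) → Tensor

tf.reshape adjusts the dimensions of the input tensor without changing the number of elements or their order.
In PyTorch, both view() and reshape() can be used.
They serve a similar purpose: changing the shape of a tensor.
Differences:
1. view() always returns a tensor that shares its data with the original, while reshape() shares the data when the new shape meets certain conditions and otherwise makes a copy.
2. They differ in their contiguity requirements. reshape() succeeds whether or not the tensor is contiguous. For a non-contiguous tensor, view() succeeds only when the new shape meets certain conditions and raises an error otherwise. Operations such as transpose and permute change a tensor's contiguity, so a later view() fails when the new shape does not meet those conditions.

A view here is conceptually very close to a view in a database: no data is copied, and the view shares the same data as the original tensor or the original table.
As noted above, view() on a non-contiguous tensor requires the new shape to meet certain conditions. What are these conditions?

First, consider what view() actually changes. It shares its data with the original tensor, so the storage order does not change. What changes is the tensor's stride, which gives the new tensor its own way and order of accessing the data.
reshape() adjusts the shape on the principle of saving memory wherever possible.

If the new shape satisfies the conditions of view() (that is, the unchanged memory can still be accessed with a fixed new stride), then reshape() is equivalent to view() and no data is copied.

If the new shape does not satisfy the conditions of view() (that is, no valid new stride exists), reshape() still works: it copies the non-contiguous tensor in logical order into new memory (so that the tensor's logical and physical data order agree) and then changes the shape of the copy.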
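The behaviour described above can be checked on a small tensor. This is only a sketch: comparing data_ptr() values is used here as a simple way to see whether two tensors share the same memory.

```python
import torch

a = torch.arange(6).reshape(2, 3)      # contiguous tensor

# Contiguous case: view() succeeds and shares memory with the original.
v = a.view(3, 2)
shares_memory = v.data_ptr() == a.data_ptr()

# transpose makes the tensor non-contiguous (same memory, different strides).
t = a.t()
try:
    t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() still works, but has to copy the data into new memory.
r = t.reshape(6)
copied = r.data_ptr() != t.data_ptr()

print(shares_memory, view_failed, copied, r)
```

Calling t.contiguous().view(6) would give the same result as t.reshape(6), at the cost of the same copy.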

Question 11: Implementing a BiLSTM in PyTorch instead of TensorFlow

The overall TensorFlow implementation:

def lstm_layer(inputs,input_seq_len,rnn_size,dropout_keep_prob, scope, scope_reuse=False):
    with tf.variable_scope(scope, reuse=scope_reuse) as vs:
        fw_cell = tf.nn.rnn_cell.LSTMCell(rnn_size, forget_bias=1.0, state_is_tuple=True, reuse=scope_reuse, name='fw_cell')
        fw_cell = tf.nn.rnn_cell.DropoutWrapper(fw_cell, output_keep_prob=dropout_keep_prob)
        bw_cell = tf.nn.rnn_cell.LSTMCell(rnn_size, forget_bias=1.0, state_is_tuple=True, reuse=scope_reuse, name='bw_cell')
        bw_cell = tf.nn.rnn_cell.DropoutWrapper(bw_cell, output_keep_prob=dropout_keep_prob)
        #[batch_size,max_len,2*hiddens_num]
        rnn_outputs, rnn_states = tf.nn.bidirectional_dynamic_rnn(cell_fw=fw_cell, cell_bw=bw_cell,
                                                                  inputs=inputs,
                                                                  sequence_length=input_seq_len,
                                                                  dtype=tf.float32)
        return rnn_outputs, rnn_states

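This raises several sub-questions. 1. The difference between TensorFlow's tf.nn.rnn_cell.LSTMCell and PyTorch's nn.LSTM

tf.nn.rnn_cell.LSTMCell
Constructor:
__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None,
    **kwargs
)

num_units: int. The number of units in the LSTM cell, i.e. the number of hidden nodes.
use_peepholes: bool, default False. True enables peephole connections. With peepholes the gate layers also receive the cell state as input; that is, on top of the basic LSTM, the cell state is added to the input of every gate. The f and i gates receive the cell state Ct-1, and the o gate receives the cell state Ct.

cell_clip: (optional) a float. If given, the cell state is clipped by this value before the output.
initializer: (optional) the initializer for the weight and projection matrices.
num_proj: (optional) int. The output dimension of the projection matrix. If None, no projection is performed.
proj_clip: (optional) a float. If num_proj > 0 and proj_clip is given, the projected values are clipped elementwise to the range [-proj_clip, proj_clip].
num_unit_shards: Deprecated.
num_proj_shards: Deprecated.
forget_bias: the bias of the forget gate is initialized to 1.0 by default in order to reduce the scale of forgetting at the start of training. It must be set to 0 manually when restoring from checkpoints of a trained CudnnLSTM.
state_is_tuple: if True, the accepted and returned states are 2-tuples (c, h), where c is the current cell state and h is the output of the current time step as well as part of the input to the next time step. If False, they are concatenated. The False case will be deprecated.
activation: the activation function of the inner states; defaults to tanh.
reuse: (optional) bool. Whether to reuse variables that already exist in the scope. If False and the scope already contains the same variable, an error is raised.
name: String. The name of the layer. Layers with the same name share weights, but to avoid mistakes reuse=True must be set in that case.
dtype: the default dtype of the layer. The default None means the dtype of the first input is used.
**kwargs: Dict. Keyword arguments for common layer attributes.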

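nn.LSTM
class torch.nn.LSTM(*args, **kwargs)

Parameters

input_size: feature dimension of x
hidden_size: feature dimension of the hidden state
num_layers: number of stacked LSTM layers; default 1
bias: if False, the bias terms b_ih and b_hh are not used; default True
batch_first: if True, input and output tensors are laid out as (batch, seq, feature)
dropout: applies dropout to the output of every layer except the last; default 0
bidirectional: if True, the LSTM is bidirectional; default False
Inputs: input, (h0, c0)
Outputs: output, (hn, cn)
Input shapes:
input(seq_len, batch, input_size)
h0(num_layers * num_directions, batch, hidden_size)
c0(num_layers * num_directions, batch, hidden_size)

Output shapes:
output(seq_len, batch, hidden_size * num_directions)
hn(num_layers * num_directions, batch, hidden_size)
cn(num_layers * num_directions, batch, hidden_size)

By default batch_first=False.
The main difference lies in the number of hidden layers.
tf.nn.rnn_cell.LSTMCell creates one single-layer LSTM cell at a time. To build a multi-layer LSTM, stack several cells with tf.nn.rnn_cell.MultiRNNCell; the cell must then be passed to tf.nn.dynamic_rnn to be used.
With nn.LSTM, multiple layers only require passing num_layers.

TensorFlow's time_major and PyTorch's batch_first control the same thing, namely whether the batch dimension comes first, but with opposite polarity: time_major=False corresponds to batch_first=True.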
First, a look at tf.nn.dynamic_rnn

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

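1) cell: the recurrent cell, such as an LSTM or GRU cell. For example, cell = tf.nn.rnn_cell.LSTMCell(num_units), where num_units is the number of units in the RNN cell, i.e. cell.output_size below. The resulting LSTM or GRU cell is passed in as this argument.
2) inputs: the training or test data, usually shaped [batch_size, max_time, embed_size], where batch_size is the number of samples in the batch, max_time is the length of the longest sequence in the batch, and embed_size is the word-embedding dimension.
3) sequence_length: a list. If three sentences of lengths 5, 10, and 25 are fed in, then sequence_length=[5,10,25].
4) time_major: determines the layout of the input and output tensors. If True, the output tensor has shape [max_time, batch_size, cell.output_size]. If False, it has shape [batch_size, max_time, cell.output_size]. cell.output_size is the number of units in the RNN cell.
5) Return value: a tuple (outputs, states)
outputs: the output of the cell at every time step.
states: the final state, i.e. the state output at the last step of the sequence. In general states has shape [batch_size, cell.output_size]. When the cell is a BasicLSTMCell, however, state has shape [2, batch_size, cell.output_size], where the 2 corresponds to the LSTM's cell state and hidden state.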

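Parameters used when instantiating an nn.LSTM:

For example, lstm = nn.LSTM(10, 20, 2) instantiates an LSTM with input_size=10, hidden_size=20, and num_layers=2: the input dimension is 10, each hidden layer has 20 units, and there are 2 hidden layers in total.
**How is the instantiated LSTM used?** Its inputs are input, h0, and c0; h0 and c0 are optional (zeros by default). The key one is input, a tensor of sequence features with shape (seq_len, batch, input_size). Continuing the example, x = torch.randn(5, 3, 10) means that each sentence has 5 words, each word is a 10-dimensional vector (matching the LSTM's input_size), and 3 sentences are fed in together as one batch.
h0 and c0 are the initial hidden and cell states, with shape (num_layers * num_directions, batch, hidden_size).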
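The shapes from this example can be checked directly. The second half of the sketch, which uses batch_first=True, is an addition for comparison with TensorFlow's default time_major=False layout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# input_size=10, hidden_size=20, num_layers=2
lstm = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)               # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)              # h0 and c0 default to zeros
print(output.shape, hn.shape, cn.shape)

# With batch_first=True the input and output put the batch dimension first;
# hn and cn keep the (num_layers * num_directions, batch, hidden_size) layout.
lstm_bf = nn.LSTM(10, 20, 2, batch_first=True)
output_bf, (hn_bf, cn_bf) = lstm_bf(x.transpose(0, 1))
print(output_bf.shape, hn_bf.shape)
```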

With the theory covered, the code that follows is based on [Pytorch和Tensorflow对比（二）]：LSTM (PyTorch vs. TensorFlow, Part 2: LSTM).

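Bidirectional LSTM

**In TensorFlow**
# Define the forward and backward LSTM cells
lstmFwCell = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size)
lstmBwCell = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size)

## outputs is a tuple of the form (output_fw, output_bw)
## both have shape [batch_size, max_time, hidden_size]
## tf.concat(outputs, 2) joins them into [batch_size, max_time, hidden_size*2]

## state has the form (state_fw, state_bw)
## state_fw holds (c_f, h_f); state_bw is analogous
## c_f has shape [batch_size, hidden_size]; the others are analogous

# Pass sequence_length, dtype and scope by keyword: the positional slots after
# inputs and sequence_length are initial_state_fw and initial_state_bw, not scope
outputs, state = tf.nn.bidirectional_dynamic_rnn(lstmFwCell, lstmBwCell, inputs,
                                                 sequence_length=sequence_length,
                                                 dtype=tf.float32,
                                                 scope=scope)

**In PyTorch**
# batch_first must be passed by keyword; the third positional argument is num_layers
bilstm = torch.nn.LSTM(input_size, hidden_size, batch_first=batch_first, bidirectional=True)
# inputs must have shape [max_time, batch_size, input_size] by default
## with batch_first=True it must be [batch_size, max_time, input_size]
## outputs has shape [max_time, batch, 2*hidden_size]
## with batch_first=True the batch dimension comes first
## h_n and c_n have shape [2, batch_size, hidden_size]

outputs, (h_n, c_n) = bilstm(inputs)
# input_size: feature dimension of the input, usually embedding_dim (the word-vector dimension)
# hidden_size: dimension of the LSTM hidden state
# num_layers: number of recurrent layers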
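The PyTorch snippet above has no counterpart for the sequence_length argument of tf.nn.bidirectional_dynamic_rnn. One common way to handle variable-length sequences, sketched here under assumed sizes, is to pack the padded batch with pack_padded_sequence before the LSTM and unpack it afterwards. The variable names and dimensions are chosen for illustration only.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, max_time, input_size, hidden_size = 3, 6, 8, 16
inputs = torch.randn(batch_size, max_time, input_size)
seq_len = torch.tensor([6, 4, 2])        # true lengths, kept on the CPU

bilstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

# Pack so the LSTM skips the padded time steps of the shorter sequences.
packed = pack_padded_sequence(inputs, seq_len, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = bilstm(packed)

# Unpack to [batch_size, max_time, 2 * hidden_size]; padded steps are zeros.
outputs, out_len = pad_packed_sequence(packed_out, batch_first=True, total_length=max_time)

# h_n[0] is the final forward state, h_n[1] the final backward state.
final_state = torch.cat([h_n[0], h_n[1]], dim=1)   # [batch_size, 2 * hidden_size]
print(outputs.shape, h_n.shape, final_state.shape)
```

Unlike the TensorFlow version, forward and backward outputs arrive already concatenated along the last dimension, so no tf.concat(outputs, 2) step is needed. The DropoutWrapper on the cell outputs in the TensorFlow code would roughly correspond to applying nn.Dropout to outputs afterwards.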

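Disclaimer: This article was reposted or adapted from the web. The original author could not be identified, and copyright belongs to the original author. If there is any copyright concern, please contact us for removal.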
