Understanding emotions: from Keras to pyTorch
 
Source: CSDN    Published: 2017-10-16
 

Sentiment and emotion detection is a key element of natural language understanding. We recently migrated our original project to a new setup built on the NLP model released by the MIT Media Lab.

The code is now open source! (See GitHub: https://github.com/huggingface/torchMoji)

The model was originally built with TensorFlow, Theano and Keras, and we then ported it to pyTorch. Compared with Keras, pyTorch gives us more freedom to develop and test custom neural network modules, and lets us write code in an easy-to-read numpy style. In this post I will walk through several interesting problems that came up during the port:

  • How to customize a pyTorch LSTM with a custom activation function
  • How PackedSequence objects work and how to build them
  • How to convert an attention layer from Keras to pyTorch
  • How to load data in pyTorch: DataSets and smart batching
  • How to reproduce Keras weight initialization in pyTorch

First, let's look at the torchMoji/DeepMoji model. It is a fairly standard and powerful NLP neural network, with two bi-LSTM layers followed by an attention layer and a classifier:

[Figure: the torchMoji/DeepMoji model]
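To make the structure concrete, here is a minimal, hand-written sketch of this kind of architecture in pyTorch. It is not the actual torchMoji code (see the GitHub repository for that): the layer sizes are placeholders, and the Attention module is the one discussed later in this post.

import torch
import torch.nn as nn

class DeepMojiSketch(nn.Module):
    """Illustrative skeleton only: embedding -> 2 bi-LSTM layers -> attention -> classifier.
    The dimensions below are placeholders, not the real torchMoji hyper-parameters."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, nb_classes=64):
        super(DeepMojiSketch, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm_0 = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.lstm_1 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        # the attention layer sees the concatenation of the embedding and both LSTM outputs
        self.attention = Attention(attention_size=embed_dim + 4 * hidden_dim)
        self.classifier = nn.Linear(embed_dim + 4 * hidden_dim, nb_classes)

    def forward(self, input_seqs, input_lengths):
        x = self.embed(input_seqs)               # (batch, seq_len, embed_dim)
        lstm_0_out, _ = self.lstm_0(x)           # (batch, seq_len, 2 * hidden_dim)
        lstm_1_out, _ = self.lstm_1(lstm_0_out)  # (batch, seq_len, 2 * hidden_dim)
        # skip-connection style concatenation along the feature dimension
        features = torch.cat((lstm_1_out, lstm_0_out, x), dim=2)
        representation, _ = self.attention(features, input_lengths)
        return self.classifier(representation)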

How to build a custom pyTorch LSTM module

A nice feature of DeepMoji is that Bjarke Felbo and his collaborators were able to train the model on a massive dataset of 1.6 billion records. The pre-trained model therefore carries very rich sentiment and emotion representations from that training set, and it is very convenient to reuse it.

The model was trained with Theano/Keras' default activation for the recurrent kernel of the LSTM, a hard sigmoid, while pyTorch's LSTM is built on NVIDIA's cuDNN library, which gives native GPU acceleration but only supports the standard sigmoid recurrent activation:

[Figure: Keras' default LSTM versus pyTorch's default LSTM]

So I wrote a custom LSTM cell with a hard sigmoid recurrent activation:

import torch.nn.functional as F

def LSTMCell(input, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
    """
    A modified LSTM cell with hard sigmoid activation on the input, forget and output gates.
    """
    hx, cx = hidden
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
    ingate = hard_sigmoid(ingate)
    forgetgate = hard_sigmoid(forgetgate)
    cellgate = F.tanh(cellgate)
    outgate = hard_sigmoid(outgate)
    cy = (forgetgate * cx) + (ingate * cellgate)
    hy = outgate * F.tanh(cy)
    return hy, cy

def hard_sigmoid(x):
    """
    Computes element-wise hard sigmoid of x.
    See e.g. https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/sigm.py#L279
    """
    x = (0.2 * x) + 0.5
    x = F.threshold(-x, -1, -1)
    x = F.threshold(-x, 0, 0)
    return x

This LSTM cell has to be integrated in a full module so that it can make use of all the pyTorch machinery. That integration code is rather long, so I recommend reading the relevant source code directly on GitHub.
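For a sense of how such a cell is driven, here is a naive sketch of my own (not from the repository) that simply loops over the timesteps of a padded batch with the LSTMCell defined above, assuming a recent pyTorch where tensors and Variables are merged; the real module additionally handles PackedSequences, bidirectionality and a weight layout compatible with nn.LSTM.

import torch

def naive_lstm_forward(inputs, w_ih, w_hh, b_ih, b_hh, hidden_size):
    """Naive sketch: run the custom LSTMCell over a (batch, seq_len, input_size) tensor.
    For illustration only; see the torchMoji repository for the full module."""
    batch_size, seq_len, _ = inputs.size()
    hx = torch.zeros(batch_size, hidden_size)  # initial hidden state
    cx = torch.zeros(batch_size, hidden_size)  # initial cell state
    outputs = []
    for t in range(seq_len):
        hx, cx = LSTMCell(inputs[:, t, :], (hx, cx), w_ih, w_hh, b_ih, b_hh)
        outputs.append(hx)
    # all hidden states stacked over time, plus the final (h, c) pair
    return torch.stack(outputs, dim=1), (hx, cx)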

The attention layer in Keras and pyTorch

Ä£Ð͵ĹØ×¢²ãÊÇÒ»¸öÓÐȤµÄÄ£¿é£¬ÎÒÃÇ¿ÉÒÔ·Ö±ðÔÚKerasºÍpyTorchµÄ´úÂëÖнøÐбȽϣº

import torch
from torch.autograd import Variable
from torch.nn import Module, Parameter

class Attention(Module):
    """
    Computes a weighted average of channels across timesteps (1 parameter pr. channel).
    """
    def __init__(self, attention_size, return_attention=False):
        """ Initialize the attention layer
        # Arguments:
            attention_size: Size of the attention vector.
            return_attention: If true, output will include the weight for each input token
                              used for the prediction
        """
        super(Attention, self).__init__()
        self.return_attention = return_attention
        self.attention_size = attention_size
        self.attention_vector = Parameter(torch.FloatTensor(attention_size))

    def __repr__(self):
        s = '{name}({attention_size}, return attention={return_attention})'
        return s.format(name=self.__class__.__name__, **self.__dict__)

    def forward(self, inputs, input_lengths):
        """ Forward pass.
        # Arguments:
            inputs (Torch.Variable): Tensor of input sequences
            input_lengths (torch.LongTensor): Lengths of the sequences
        # Return:
            Tuple with (representations and attentions if self.return_attention else None).
        """
        logits = inputs.matmul(self.attention_vector)
        unnorm_ai = (logits - logits.max()).exp()

        # Compute a mask for the attention on the padded sequences
        # See e.g. https://discuss.pytorch.org/t/self-attention-on-words-and-masking/5671/5
        max_len = unnorm_ai.size(1)
        idxes = torch.arange(0, max_len, out=torch.LongTensor(max_len)).unsqueeze(0)
        if torch.cuda.is_available():
            idxes = idxes.cuda()
        mask = Variable((idxes < input_lengths.unsqueeze(1)).float())

        # apply mask and renormalize attention scores (weights)
        masked_weights = unnorm_ai * mask
        att_sums = masked_weights.sum(dim=1, keepdim=True)  # sums per sequence
        attentions = masked_weights.div(att_sums)

        # apply attention weights
        weighted = torch.mul(inputs, attentions.unsqueeze(-1).expand_as(inputs))

        # get the final fixed vector representations of the sentences
        representations = weighted.sum(dim=1)

        return (representations, attentions if self.return_attention else None)

from keras import backend as K
from keras import initializers
from keras.engine import InputSpec, Layer

class AttentionWeightedAverage(Layer):
    """
    Computes a weighted average of the different channels across timesteps.
    Uses 1 parameter pr. channel to compute the attention value for a single timestep.
    """
    def __init__(self, return_attention=False, **kwargs):
        self.init = initializers.get('uniform')
        self.supports_masking = True
        self.return_attention = return_attention
        super(AttentionWeightedAverage, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(ndim=3)]
        assert len(input_shape) == 3
        self.W = self.add_weight(shape=(input_shape[2], 1),
                                 name='{}_W'.format(self.name),
                                 initializer=self.init)
        self.trainable_weights = [self.W]
        super(AttentionWeightedAverage, self).build(input_shape)

    def call(self, x, mask=None):
        # computes a probability distribution over the timesteps
        # uses 'max trick' for numerical stability
        # reshape is done to avoid issue with Tensorflow
        # and 1-dimensional weights
        logits = K.dot(x, self.W)
        x_shape = K.shape(x)
        logits = K.reshape(logits, (x_shape[0], x_shape[1]))
        ai = K.exp(logits - K.max(logits, axis=-1, keepdims=True))

        # masked timesteps have zero weight
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            ai = ai * mask
        att_weights = ai / K.sum(ai, axis=1, keepdims=True)
        weighted_input = x * K.expand_dims(att_weights)
        result = K.sum(weighted_input, axis=1)
        if self.return_attention:
            return [result, att_weights]
        return result

    def get_output_shape_for(self, input_shape):
        return self.compute_output_shape(input_shape)

    def compute_output_shape(self, input_shape):
        output_len = input_shape[2]
        if self.return_attention:
            return [(input_shape[0], output_len), (input_shape[0], input_shape[1])]
        return (input_shape[0], output_len)

    def compute_mask(self, input, input_mask=None):
        if isinstance(input_mask, list):
            return [None] * len(input_mask)
        else:
            return None

As you can see, the overall algorithm is roughly the same, but most of the lines in the pyTorch implementation are comments, whereas the Keras version requires writing and calling several additional functions.

When writing and debugging custom modules and layers, pyTorch is the faster option; for quickly training and testing models built from standard layers, Keras is clearly the better fit.

How PackedSequence objects work

Keras has a nice masking feature to handle variable-length sequences. How do we do that in pyTorch? With PackedSequences! The pyTorch documentation on PackedSequence is rather terse, so it is worth describing the details here.

[Figure: a typical NLP batch of 5 sequences and 18 tokens]

Suppose we have a batch of variable-length sequences (as is usually the case in NLP applications). To compute such a batch in parallel on the GPU, we would like to:

  • process the sequences as much in parallel as possible, since the LSTM hidden state depends on the previous timestep of each sequence, and
  • stop the computation of each sequence at the right timestep (the end of each sequence).

This can be done with pyTorch's PackedSequence class. We first sort the sequences by decreasing length and put them in a tensor, then call the pack_padded_sequence function on the tensor and the list of sequence lengths:

import torch
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence

# input_seqs is a batch of input sequences as a numpy array of integers (word indices in vocabulary) padded with zeros
input_seqs = Variable(torch.from_numpy(input_seqs.astype('int64')).long())

# First: order the batch by decreasing sequence length
input_lengths = torch.LongTensor([torch.max(input_seqs[i, :].data.nonzero()) + 1 for i in range(input_seqs.size()[0])])
input_lengths, perm_idx = input_lengths.sort(0, descending=True)
input_seqs = input_seqs[perm_idx][:, :input_lengths.max()]

# Then pack the sequences
packed_input = pack_padded_sequence(input_seqs, input_lengths.cpu().numpy(), batch_first=True)

A PackedSequence object contains:

  • a data object: a torch.Variable of shape (total number of tokens, dimension of each token); in this simple example with five sequences of tokens (represented as integers): (18, 1)
  • a batch_sizes object: the list of the number of tokens at each timestep, here: [6, 5, 2, 4, 1]

Constructing this object with the pack_padded_sequence function is very simple:

[Figure: how to construct a PackedSequence object (batch_first=True)]

A nice property of PackedSequence objects is that many operations can be performed directly on the data variable of a PackedSequence without unpacking the sequences (unpacking is a slow operation). In particular, we can apply any operation that is insensitive to the order/context of the tokens. And of course we can use any pyTorch module that accepts a PackedSequence as input (as of pyTorch 0.2).

ÀýÈ磬ÔÚÎÒÃǵÄNLPÄ£ÐÍÖУ¬ÎÒÃÇ¿ÉÒÔÔÚ¶ÔPackedSequence¶ÔÏó²»½â°üµÄÇé¿öÏÂÁ¬½ÓÁ½¸öLSTMÄ£¿éµÄÊä³ö£¬²¢Ôڴ˶ÔÏóÉÏÓ¦ÓÃLSTM¡£ÎÒÃÇ»¹¿ÉÒÔÔÚ²»½â°üµÄÇé¿öÏÂÖ´ÐйØ×¢²ãµÄһЩ²Ù×÷¡£

Smart data loading in pyTorch: DataSets and Batches

In Keras, data loading and batching are usually hidden inside the fit_generator function. Again, Keras is nice when you want to test a model quickly, but it also means we cannot fully control important parts of the model.

In pyTorch, we will use three classes for this task:

  • a DataSet class to hold, pre-process and index the dataset
  • a BatchSampler class to control how samples are gathered into batches
  • a DataLoader class responsible for feeding these batches to the model

ÎÒÃǵÄDataSetÀà·Ç³£¼òµ¥£º

import torch
from torch.utils.data import Dataset

class DeepMojiDataset(Dataset):
    """ A simple Dataset class.
    # Arguments:
        X_in: Inputs of the given dataset.
        y_in: Outputs of the given dataset.
    # __getitem__ output:
        (torch.LongTensor, torch.LongTensor)
    """
    def __init__(self, X_in, y_in):
        # Check if we have Torch.LongTensor inputs (assume Numpy array otherwise)
        if not isinstance(X_in, torch.LongTensor):
            X_in = torch.from_numpy(X_in.astype('int64')).long()
        if not isinstance(y_in, torch.LongTensor):
            y_in = torch.from_numpy(y_in.astype('int64')).long()
        self.X_in = torch.split(X_in, 1, dim=0)
        self.y_in = torch.split(y_in, 1, dim=0)

    def __len__(self):
        return len(self.X_in)

    def __getitem__(self, idx):
        return self.X_in[idx].squeeze(), self.y_in[idx].squeeze()

Our BatchSampler is more interesting.

We have several small NLP datasets that we use to fine-tune the sentiment/emotion detection model. These datasets have various lengths and somewhat unbalanced classes, so we would like to design a batch sampler that can:

  • gather batches with a pre-defined number of samples, so that the training procedure does not depend on the batch lengths, and
  • sample from the unbalanced datasets in a balanced way.

In PyTorch, a BatchSampler is a class that can be iterated over to produce batches: each batch it yields is a list of the indices of the samples to select from the DataSet.

We can therefore define a BatchSampler object, initialized with the dataset's label vector, to build lists of batches that satisfy these requirements:

import numpy as np

class DeepMojiBatchSampler(object):
    """A Batch sampler that enables larger epochs on small datasets and
    has upsampling functionality.
    # Arguments:
        y_in: Labels of the dataset.
        batch_size: Batch size.
        epoch_size: Number of samples in an epoch.
        upsample: Whether upsampling should be done. This flag should only be
            set on binary class problems.
        seed: Random number generator seed.
    # __iter__ output:
        iterator of lists (batches) of indices in the dataset
    """
    def __init__(self, y_in, batch_size, epoch_size, upsample, seed):
        self.batch_size = batch_size
        self.epoch_size = epoch_size
        self.upsample = upsample

        np.random.seed(seed)

        if upsample:
            # Should only be used on binary class problems
            assert len(y_in.shape) == 1
            neg = np.where(y_in.numpy() == 0)[0]
            pos = np.where(y_in.numpy() == 1)[0]
            assert epoch_size % 2 == 0
            samples_pr_class = int(epoch_size / 2)
        else:
            ind = range(len(y_in))

        if not upsample:
            # Randomly sample observations in a balanced way
            self.sample_ind = np.random.choice(ind, epoch_size, replace=True)
        else:
            # Randomly sample observations in a balanced way
            sample_neg = np.random.choice(neg, samples_pr_class, replace=True)
            sample_pos = np.random.choice(pos, samples_pr_class, replace=True)
            concat_ind = np.concatenate((sample_neg, sample_pos), axis=0)

            # Shuffle to avoid labels being in specific order
            # (all negative then positive)
            p = np.random.permutation(len(concat_ind))
            self.sample_ind = concat_ind[p]

            label_dist = np.mean(y_in.numpy()[self.sample_ind])
            assert(label_dist > 0.45)
            assert(label_dist < 0.55)

    def __iter__(self):
        # Hand-off data using batch_size
        for i in range(int(self.epoch_size / self.batch_size)):
            start = i * self.batch_size
            end = min(start + self.batch_size, self.epoch_size)
            yield self.sample_ind[start:end]

    def __len__(self):
        # Take care of the last (maybe incomplete) batch
        return (self.epoch_size + self.batch_size - 1) // self.batch_size
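
To show how the three pieces fit together, here is an illustrative wiring of the DataSet and BatchSampler above with pyTorch's DataLoader; the data and the hyper-parameter values are placeholders, not the ones used for the real fine-tuning runs.

import numpy as np
import torch
from torch.utils.data import DataLoader

# Placeholder data: 1000 padded sequences of word indices with binary labels
X_train = np.zeros((1000, 80), dtype='int64')
y_train = np.random.randint(0, 2, size=1000)

dataset = DeepMojiDataset(X_train, y_train)
labels = torch.from_numpy(y_train.astype('int64')).long()
sampler = DeepMojiBatchSampler(labels, batch_size=32, epoch_size=5000,
                               upsample=False, seed=42)
loader = DataLoader(dataset, batch_sampler=sampler)

for X_batch, y_batch in loader:
    # X_batch: (32, 80) LongTensor of word indices, y_batch: (32,) LongTensor of labels
    pass  # the forward/backward pass would go here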

From Keras to pyTorch: don't forget the initialization

The last thing to watch out for when porting Keras/Tensorflow/Theano code to pyTorch is weight initialization.

Another powerful feature of Keras in terms of development speed is the default initialization of its layers.

pyTorch, in contrast, does not initialize the weights the same way; it is up to the developer to decide. To get consistent results when fine-tuning the weights, we replicate the default Keras weight initialization as follows:

import torch.nn as nn

def init_weights(self):
    """
    Here we reproduce Keras default initialization weights to initialize Embeddings/LSTM weights
    """
    ih = (param.data for name, param in self.named_parameters() if 'weight_ih' in name)
    hh = (param.data for name, param in self.named_parameters() if 'weight_hh' in name)
    b = (param.data for name, param in self.named_parameters() if 'bias' in name)
    nn.init.uniform(self.embed.weight.data, a=-0.5, b=0.5)
    for t in ih:
        nn.init.xavier_uniform(t)
    for t in hh:
        nn.init.orthogonal(t)
    for t in b:
        nn.init.constant(t, 0)
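
In the model itself, such an init_weights method would typically be called once at the end of the module's __init__, after the embedding, LSTM, attention and classifier sub-modules have been created, so that the Keras-style initialization replaces pyTorch's defaults before any training or fine-tuning starts.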

Conclusion

Comparing Keras and pyTorch on this one model, we can feel that the two frameworks have different philosophies and goals.

In my experience:

  • Keras is great for quickly testing various ways of combining standard neural network blocks on a given task;
  • pyTorch is great for quickly developing and testing custom neural network modules, thanks to its great freedom and easy-to-read numpy-style code.
   