Editor's recommendation:
This article describes how to build a deep neural network with an arbitrary number of layers. The network can be applied to supervised binary classification problems.
This article comes from Baidu and was edited and recommended by Alice of 火龙果软件.

Figure 1: Example of a neural network structure (notation: superscript [l] denotes a quantity of the l-th layer; superscript (i) denotes the i-th example; subscript i denotes the i-th entry of a vector)
Single-layer neural network

Figure 2: Example of a single-layer neural network
The neuron model first computes a linear function (z = Wx + b) and then applies an activation function. In general, the output of the neuron model is a = g(Wx + b), where g is the activation function (sigmoid, tanh, ReLU, etc.).
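As a minimal sketch of this neuron model (the shapes, example values, and the choice of sigmoid below are illustrative assumptions, not taken from the article):

import numpy as np

def single_neuron(x, W, b):
    # linear part z = Wx + b, followed by a sigmoid activation g(z)
    z = np.dot(W, x) + b
    return 1 / (1 + np.exp(-z))

# example: 3 input features feeding one neuron
x = np.array([[0.5], [0.1], [0.4]])   # column vector of inputs
W = np.random.randn(1, 3) * 0.01      # weight row vector
b = np.zeros((1, 1))                  # bias
a = single_neuron(x, W, b)            # output between 0 and 1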
Dataset
Suppose we have a large database of weather records, for example temperature, humidity, air pressure, and rainfall.
Problem statement:
A training set m_train, labeled (1) for rain and (0) for no rain.
A test set m_test, labeled with whether or not it rained.
Each weather sample contains x1 = temperature, x2 = humidity, x3 = air pressure.
A common preprocessing step in machine learning is to center and standardize the dataset: subtract the mean of the whole numpy array from each example, then divide each example by the standard deviation of the whole numpy array.
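For illustration, a minimal numpy sketch of this centering and standardization step might look as follows (the array X of weather features is a hypothetical example, with one column per sample):

import numpy as np

X = np.random.rand(3, 10)   # 3 features, 10 examples (placeholder data)

mu = np.mean(X)             # mean of the whole array
sigma = np.std(X)           # standard deviation of the whole array
X_norm = (X - mu) / sigma   # centered and standardized data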

General methodology (building the parts of the algorithm)
Using deep learning to build the model:
1. Define the model structure (for example, the number of input features)
2. Initialize the parameters and define the hyperparameters:
number of iterations
number of layers L in the neural network
sizes of the hidden layers
learning rate α
3. Loop over iterations:
forward propagation (compute the current predictions)
compute the cost function
backward propagation (compute the current gradients)
update the parameters (using the parameters and the gradients from backpropagation)
4. Use the trained parameters to predict labels
Initialization
Initializing a deeper L-layer neural network is more complicated because there are many more weight matrices and bias vectors. The tables below show the dimensions for the different layers.

Table 1: Weight matrix W, bias vector b, and activation Z for layer l

±í2 ʾÀý¼Ü¹¹ÖеÄÉñ¾ÍøÂçÈ¨ÖØ¾ØÕów¡¢Æ«ÖÃÏòÁ¿bºÍ¼¤»îº¯Êýz
±í2°ïÖúÎÒÃÇΪͼ1ÖеÄʾÀýÉñ¾ÍøÂç¼Ü¹¹µÄ¾ØÕó×¼±¸ÁËÕýÈ·µÄά¶È¡£
import numpy as np
import matplotlib.pyplot as plt

nn_architecture = [
    {"layer_size": 4, "activation": "none"},  # input layer
    {"layer_size": 5, "activation": "relu"},
    {"layer_size": 4, "activation": "relu"},
    {"layer_size": 3, "activation": "relu"},
    {"layer_size": 1, "activation": "sigmoid"}
]

def initialize_parameters(nn_architecture, seed=3):
    np.random.seed(seed)
    # python dictionary containing our parameters "W1", "b1", ..., "WL", "bL"
    parameters = {}
    number_of_layers = len(nn_architecture)
    for l in range(1, number_of_layers):
        parameters['W' + str(l)] = np.random.randn(
            nn_architecture[l]["layer_size"],
            nn_architecture[l - 1]["layer_size"]
        ) * 0.01
        parameters['b' + str(l)] = np.zeros((nn_architecture[l]["layer_size"], 1))
    return parameters
Snippet 1: Parameter initialization
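As a quick check against the dimensions in Table 2, the short usage example below (illustrative, not part of the original article) prints the shape of every parameter created for the example architecture:

parameters = initialize_parameters(nn_architecture)
for name, value in parameters.items():
    print(name, value.shape)
# expected shapes for the example architecture:
# W1 (5, 4)  b1 (5, 1)
# W2 (4, 5)  b2 (4, 1)
# W3 (3, 4)  b3 (3, 1)
# W4 (1, 3)  b4 (1, 1)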
Initializing the parameters with small random numbers is a simple approach, and it guarantees a good enough starting point for the algorithm.
Remember:
· Different initialization methods, for example Zero, Random, He, or Xavier, lead to different results.
· Random initialization ensures that different hidden units can learn different things (initializing all weights to zero would make every neuron in every layer learn the same thing).
· Do not initialize with values that are too large.
Activation functions
Activation functions are used to add nonlinearity to the neural network. The example below uses sigmoid and ReLU.
Sigmoid outputs a value between 0 and 1, which makes it a good choice for binary classification: an output below 0.5 can be classified as 0, and an output above 0.5 can be classified as 1.
def sigmoid(Z):
    S = 1 / (1 + np.exp(-Z))
    return S

def relu(Z):
    R = np.maximum(0, Z)
    return R

def sigmoid_backward(dA, Z):
    S = sigmoid(Z)
    dS = S * (1 - S)
    return dA * dS

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ
Snippet 2: The sigmoid and ReLU activation functions and their derivatives
Snippet 2 shows the vectorized implementation of the activation functions and their derivatives. This code will be used in the computations that follow.
Forward propagation
In forward propagation, the forward function of layer l needs to know which activation function the layer uses (sigmoid, tanh, ReLU, etc.). The output of the previous layer is the input of the current layer: z is computed first, and the chosen activation function is then applied to it.

Figure 3: Forward propagation in the neural network
The linear forward module (vectorized over all examples) computes the following equation:

Equation 1: Linear forward function
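The original equation image is not reproduced here; based on the neuron model above and the linear_forward function in Snippet 3 below, it corresponds to:

Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)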
def L_model_forward(X, parameters, nn_architecture):
    forward_cache = {}
    A = X
    # store the input as A0 so that backpropagation can read it as the
    # "previous activation" of the first layer
    forward_cache['A0'] = X
    number_of_layers = len(nn_architecture)
    for l in range(1, number_of_layers):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        activation = nn_architecture[l]["activation"]
        Z, A = linear_activation_forward(A_prev, W, b, activation)
        forward_cache['Z' + str(l)] = Z
        forward_cache['A' + str(l)] = A
    AL = A
    return AL, forward_cache

def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z = linear_forward(A_prev, W, b)
        A = sigmoid(Z)
    elif activation == "relu":
        Z = linear_forward(A_prev, W, b)
        A = relu(Z)
    return Z, A

def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    return Z
Snippet 3: Forward propagation model
A "cache" (a Python dictionary containing the A and Z values computed for each layer) is used to pass variables from forward propagation to the corresponding backward propagation step. It holds the values needed to compute derivatives during backpropagation.
Loss function
To monitor the learning process, the value of the cost function needs to be computed. The formula below is used to compute the cost.

Equation 2: Cross-entropy cost
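The original formula image is not reproduced here; from the compute_cost implementation in Snippet 4, it corresponds to the cross-entropy cost:

J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{[L](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[L](i)}\right) \right]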
def compute_cost(AL, Y):
    m = Y.shape[1]
    # Compute loss from AL and Y
    logprobs = np.multiply(np.log(AL), Y) + np.multiply(1 - Y, np.log(1 - AL))
    # cross-entropy cost
    cost = - np.sum(logprobs) / m
    cost = np.squeeze(cost)
    return cost
Snippet 4: Computation of the cost function
Backward propagation
Backpropagation is used to compute the gradient of the loss function with respect to the parameters. The algorithm applies the "chain rule", known from calculus, recursively.
Formulas used in the backpropagation computation:

Equation 3: Backpropagation formulas
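The original equation image is not reproduced here; the linear_backward function in Snippet 5 below implements the following relations:

dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]\,T}, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]\,T} dZ^{[l]}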
The chain rule is a formula for computing the derivative of a composite function; a composite function is a function applied to another function.

Equation 4: Example of the chain rule
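The specific example shown in the original image is not available; in its general single-variable form the chain rule reads:

\frac{d}{dx} f\!\left(g(x)\right) = f'\!\left(g(x)\right) \, g'(x)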
The "chain rule" is essential when computing the loss (Equation 5 gives an example).

Equation 5: Loss function (with substituted data) and its derivative with respect to the first weight
The first step of backpropagation in the neural network model is to compute the derivative of the loss function with respect to z of the last layer. Equation 6 consists of two parts: the derivative of the loss function from Equation 2 (with respect to the activation) and the derivative of the sigmoid activation with respect to Z of the last layer.

Equation 6: Derivative of the loss function with respect to z of layer 4
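The original equation is not reproduced here; for a sigmoid output layer combined with the cross-entropy cost of Equation 2, multiplying the two parts gives the standard result:

dZ^{[4]} = \left(-\frac{Y}{A^{[4]}} + \frac{1 - Y}{1 - A^{[4]}}\right) \cdot A^{[4]}\left(1 - A^{[4]}\right) = A^{[4]} - Y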
The result of Equation 6 can then be used to compute the derivatives of Equation 3.

Equation 7: Derivative of the loss function with respect to layer 3
The next step of the computation uses the derivative of the loss function with respect to the activation of the third layer (Equation 7).

Equation 8: Derivatives for the third layer
The result of Equation 7 and the derivative of the third layer's ReLU activation are used to compute the derivatives of Equation 8 (the derivative of the loss function with respect to z). The derivatives of Equation 3 are then computed.
Similar computations are carried out for Equations 9 and 10.

Equation 9: Derivatives for the second layer

Equation 10: Derivatives for the first layer
Overall idea
The derivative of the loss function with respect to z of layer l is used to compute the derivative of the loss function with respect to the activation of layer l-1 (the previous layer). That result is then combined with the derivative of the previous layer's activation function, and the process repeats down the network.
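In general form (consistent with Equations 6-10 and with Snippet 5 below), this recursion can be written as:

dZ^{[l]} = dA^{[l]} \circ g^{[l]\prime}\!\left(Z^{[l]}\right), \qquad dA^{[l-1]} = W^{[l]\,T} dZ^{[l]}

where \circ denotes the element-wise product.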

Figure 4: Backpropagation in the neural network
def L_model_backward(AL, Y, parameters, forward_cache, nn_architecture):
    grads = {}
    number_of_layers = len(nn_architecture)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL
    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dA_prev = dAL
    for l in reversed(range(1, number_of_layers)):
        dA_curr = dA_prev
        activation = nn_architecture[l]["activation"]
        W_curr = parameters['W' + str(l)]
        Z_curr = forward_cache['Z' + str(l)]
        A_prev = forward_cache['A' + str(l - 1)]
        dA_prev, dW_curr, db_curr = linear_activation_backward(dA_curr, Z_curr, A_prev, W_curr, activation)
        grads["dW" + str(l)] = dW_curr
        grads["db" + str(l)] = db_curr
    return grads

def linear_activation_backward(dA, Z, A_prev, W, activation):
    if activation == "relu":
        dZ = relu_backward(dA, Z)
        dA_prev, dW, db = linear_backward(dZ, A_prev, W)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
        dA_prev, dW, db = linear_backward(dZ, A_prev, W)
    return dA_prev, dW, db

def linear_backward(dZ, A_prev, W):
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db
Snippet 5: Backpropagation module
Update parameters
The goal of this function is to update the model's parameters by gradient descent on the computed gradients.
def update_parameters(parameters, grads, learning_rate):
    # parameters stores a W and a b for every layer, so the number of
    # layers with weights is half the number of entries
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return parameters
Snippet 6: Updating parameter values with gradient descent
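The update performed in Snippet 6 corresponds to the standard gradient descent rule, where α is the learning rate:

W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha \, db^{[l]}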
Full model
The complete implementation of the neural network model combines the methods provided in the snippets above.
def L_layer_model(X, Y, nn_architecture, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
    # keep track of cost
    costs = []
    # Parameters initialization
    parameters = initialize_parameters(nn_architecture)
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        AL, forward_cache = L_model_forward(X, parameters, nn_architecture)
        # Compute cost
        cost = compute_cost(AL, Y)
        # Backward propagation
        grads = L_model_backward(AL, Y, parameters, forward_cache, nn_architecture)
        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)
        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()
    return parameters
Snippet 7: The complete neural network model
Predictions are made simply by running the forward propagation model with the learned weights on a set of test data.
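A minimal prediction helper along these lines might look as follows (the predict function name and the 0.5 threshold are illustrative assumptions, consistent with the sigmoid discussion above, not code from the original article):

def predict(X, parameters, nn_architecture):
    # run forward propagation with the learned weights and threshold the sigmoid output
    AL, _ = L_model_forward(X, parameters, nn_architecture)
    return (AL > 0.5).astype(int)

# trained_parameters = L_layer_model(X_train, Y_train, nn_architecture, print_cost=True)
# Y_pred = predict(X_test, trained_parameters, nn_architecture)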
You can modify nn_architecture in Snippet 1 to build neural networks with different numbers of layers and different hidden layer sizes. In addition, prepare correct implementations of an activation function and its derivative (Snippet 2). The implemented functions can then be plugged into the linear_activation_forward method in Snippet 3 and the linear_activation_backward method in Snippet 5.
Further improvements
If the training dataset is not large enough, you may face an "overfitting" problem: the learned network does not generalize to new examples it has never seen. Regularization methods can help, such as L2 regularization (which involves modifying the cost function appropriately) or dropout (which randomly switches off some neurons in each iteration).
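As an illustration of the first option, a minimal sketch of adding an L2 penalty to the cost from Snippet 4 could look like this (the lambd parameter and the function name are assumptions for illustration, not part of the original article):

def compute_cost_with_l2(AL, Y, parameters, lambd=0.1):
    m = Y.shape[1]
    cross_entropy_cost = compute_cost(AL, Y)
    # sum of squared weights over all layers, scaled by lambda / (2m)
    L = len(parameters) // 2
    l2_penalty = sum(np.sum(np.square(parameters["W" + str(l)])) for l in range(1, L + 1))
    return cross_entropy_cost + (lambd / (2 * m)) * l2_penalty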
We used gradient descent to update the parameters and minimize the cost. You can learn about more advanced optimization methods that speed up learning and may even reach a better final value of the cost function (a brief momentum sketch follows the list below), for example:
· Mini-batch gradient descent
· Momentum
· Adam optimizer
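As an illustration of the second item, a minimal momentum variant of Snippet 6 could look like this (the velocity dictionary v, the beta parameter, and the function name are assumptions for the sketch, not part of the original article):

def update_parameters_with_momentum(parameters, grads, v, learning_rate=0.0075, beta=0.9):
    # v holds an exponentially weighted average of past gradients; it must be
    # initialized beforehand with zero arrays of the same shapes as the gradients
    L = len(parameters) // 2
    for l in range(1, L + 1):
        v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]
        v["db" + str(l)] = beta * v["db" + str(l)] + (1 - beta) * grads["db" + str(l)]
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * v["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * v["db" + str(l)]
    return parameters, v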
