Editor's note: This article, from daniellaah.github.io, introduces the hypothesis function, cost function, and gradient descent of univariate linear regression in machine learning, described in detail with many figures.
|
±¾ÎĵĵÚһƪÎҵĻúÆ÷ѧϰ±Ê¼Ç(Ò»)-¼à¶½Ñ§Ï°vs
Î޼ලѧϰ
Model Representation
Training Set
What is a training set (Training Set)? A training set is simply a collection of training examples (training example). As shown in the figure below, the two columns of data on the right are the training set for this example,
where \((x, y)\) denotes a training example and \((x^{(i)}, y^{(i)})\) denotes the \(i\)-th training example.
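To make the notation concrete, here is a minimal Python sketch; the x/y values are illustrative, not copied from the figure (which is not reproduced here):

```python
# Toy training set: each pair (x, y) is one training example,
# x = house size in square feet, y = price in $1000s (illustrative values).
training_set = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]

m = len(training_set)       # m: the number of training examples
x1, y1 = training_set[0]    # (x^{(1)}, y^{(1)}): the 1st training example
print(m, x1, y1)
```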

¼ÙÉ躯Êý
From the training set and a learning algorithm we obtain the hypothesis function (Hypothesis Function), written h. In the housing example, the hypothesis function acts as an approximate mapping from house size to house price; given this hypothesis, we can estimate the price of a house of any given size. As shown below:

So how should we represent the hypothesis function? In this example there is only one variable x (the size of the house), so we can write the hypothesis function h in the following form: <font size='4'>$$ {h_\theta(x) =\theta_0+\theta_1x} $$</font> For convenience, $h_\theta(x)$ may also be written $h(x)$. This is called linear regression with one variable (Linear Regression with One Variable). (Linear regression with one variable = univariate linear regression; "univariate" is just a fancier way of saying "one variable".) As shown in the figure below.
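As a quick Python sketch of this hypothesis (the function name is my own, not from the post):

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(100, 2.0, 0.5))  # 2.0 + 0.5 * 100 = 52.0
```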

Cost Function
ÔڸղŵļÙÉ躯ÊýÖÐÓÐÁ½¸öδ֪µÄ²ÎÊý$\theta_0$ºÍ$\theta_1$£¬µ±Ñ¡Ôñ²»Í¬µÄ$\theta_0$ºÍ$\theta_1$ʱ£¬ÎÒÃÇÄ£Ð͵ÄЧ¹û¿Ï¶¨ÊDz»Ò»ÑùµÄ¡£ÈçÏÂͼËùʾ£¬ÁоÙÁËÈýÖÖÇé¿öϵļÙÉ躯Êý¡£

So how should we choose these two parameters? The idea is to choose $\theta_0$ and $\theta_1$ so that, for the training examples $(x,y)$, $h_\theta(x)$ is as close to $y$ as possible; that is, so that the mean of the squared differences between each example's predicted value and its true value is minimized. As a formula:
<font size='4'>$${\mathop{minimize}\limits_{\theta_0,\theta_1}
\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) -y^{(i)}\right)^2}$$</font>
Writing the part of the formula to the right of minimize as $J(\theta_0,\theta_1)$:
<font size='4'>$$ {J(\theta_0,\theta_1)
=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2}$$</font>
This gives us our cost function (Cost Function) $J(\theta_0,\theta_1)$, and our goal is <font
size='4'>$$\mathop{minimize} \limits_{\theta_0,\theta_1}
J(\theta_0,\theta_1)$$</font>
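The cost function translates directly into code; a minimal sketch (function and variable names are my own):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that fits the data exactly has zero cost.
print(cost(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 0.0
```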

Cost Function II
To make it easier to explore the relationship between $h_\theta(x)$ and $J(\theta_0,\theta_1)$, we first set $\theta_0$ to 0. This gives a simplified hypothesis function and, correspondingly, a simplified cost function. As shown in the figure:

With this simplification, we set $\theta_1=1$, giving $h_\theta(x)=x$ as shown on the left below. The three red crosses in the figure are the training examples; from the definition of the cost function we compute $J(1)=0$, corresponding to the point $(1,0)$ in the right-hand figure below.

Repeating the same steps with $\theta_1=0.5$ gives the $h_\theta(x)$ shown on the left below. Computing the cost gives $J(0.5)=0.58$, corresponding to the point $(0.5,0.58)$ in the right-hand figure below.

Different values of $\theta_1$ give different hypothesis functions $h_\theta(x)$ and hence different values of $J(\theta_1)$. Connecting these points gives the curve of $J(\theta_1)$, as shown below:
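The two values above can be reproduced numerically, assuming the three red crosses are the points (1,1), (2,2), (3,3) (an assumption, but one consistent with $J(1)=0$ and $J(0.5)=0.58$):

```python
def J(theta1, xs, ys):
    """Simplified cost with theta0 = 0: J(theta1) = (1/2m) * sum (theta1*x - y)^2."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]    # assumed training examples
print(J(1.0, xs, ys))            # 0.0
print(round(J(0.5, xs, ys), 2))  # 0.58
```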

Cost Function III
In the previous section we set $\theta_0$ to 0 and obtained the curve of $J(\theta_1)$. If $\theta_0$ is not 0, for example $\theta_0=50$,
$\theta_1=0.06$, there are now two variables, and it is easy to see that $J(\theta_0,\theta_1)$ should be a surface.

This plot was drawn by the professor in Matlab. Since the 3D plot is not very convenient to study, we instead use 2D contour lines (what the professor labels contour plots/figures in the top right of the figure above), which are a bit easier to read. In the figure on the right below, points further inward correspond to smaller values of $J(\theta_0,\theta_1)$ (i.e., closer to the lowest point of the 3D plot). The figure on the left below shows the $h_\theta(x)$ corresponding to $\theta_0=800$,
$\theta_1=0.15$; from the values of $\theta_0$ and $\theta_1$ we can locate the value of $J(\theta_0,\theta_1)$ in the right-hand figure below.

Similarly:



ÎÒÃDz»¶Ï³¢ÊÔÖ±µ½ÕÒµ½Ò»¸ö×î¼ÑµÄ$h_\theta(x)$£¬Ê¹µÃ$J(\theta_0,\theta_1)$×îС¡£µ±È»ÎÒÃDz»¿ÉÄÜËæ»ú²Â²â»òÕßÊÖ¹¤³¢ÊÔ²»Í¬²ÎÊýµÄÖµ¡£ÎÒÃÇÄÜÏëµ½µÄÓ¦¸Ã¾ÍÊÇͨ¹ýÉè¼Æ³ÌÐò£¬ÕÒµ½×î¼ÑµÄ$h_\theta(x)$£¬Ò²¾ÍÊÇ×îºÏÊʵÄ$\theta_0$ºÍ$\theta_1$¡£
Gradient Descent I
Let us first get an intuitive feel for what gradient descent (Gradient Descent) is. To find the most suitable $\theta_0$ and $\theta_1$, we can start from some initial $\theta_0$ and $\theta_1$ and then keep changing their values so that $J(\theta_0,\theta_1)$ keeps decreasing, until we reach a minimum.

As shown below, starting from some point, each step descends along the gradient until a local minimum is reached.

Starting from different points (i.e., different $\theta_0$ and $\theta_1$), we may arrive at different minima (local minima), as shown below:

Now we roughly know what gradient descent is: it is like walking down a mountain. Different paths have different slopes; some are fast and some are slow, and always heading downhill may lead to different lowest points. So how exactly should we change the values of $\theta_0$ and $\theta_1$ at each step? As shown below, this is the gradient descent algorithm (Gradient Descent Algorithm), where $:=$ denotes assignment, $\alpha$ is called the learning rate, and $\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)$ is the gradient. One crucial point: the algorithm updates $\theta_0$ and $\theta_1$ simultaneously (simultaneous) at each step, as shown in the figure below.
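The simultaneity matters: both new values must be computed from the old pair before either parameter is overwritten. A sketch of one update step (generic gradient functions; the names are my own):

```python
def step(theta0, theta1, alpha, grad0, grad1):
    """One simultaneous gradient descent update: both gradients are
    evaluated at the OLD (theta0, theta1) before either is replaced."""
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1

# If grad1 depends on theta0, updating theta0 first would change the result:
g0 = lambda t0, t1: t0
g1 = lambda t0, t1: t0
print(step(1.0, 1.0, 0.5, g0, g1))  # (0.5, 0.5), not (0.5, 0.75)
```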

Gradient Descent II
Now set $\theta_0$ to 0, and suppose the initial $\theta_1$ lies to the right of the lowest point, where the gradient is positive. When the algorithm above updates $\theta_1$, its value decreases, i.e., it moves toward the lowest point.

Similarly, suppose the initial $\theta_1$ lies to the left of the lowest point, where the gradient is negative. When the algorithm updates $\theta_1$, its value increases, again moving toward the lowest point.

If the initial $\theta_1$ happens to be exactly at the lowest point, then updating $\theta_1$ leaves its value unchanged.

The learning rate $\alpha$ controls the size of each descent step. If $\alpha$ is too small, each update changes $\theta$ only slightly, and gradient descent will be very slow. Conversely, if $\alpha$ is too large, each update changes $\theta$ a lot; it may overshoot the lowest point, possibly never reaching it at all.

As we approach the lowest point, the slope (in absolute value) gradually decreases, so each step naturally becomes smaller. There is therefore no need to shrink $\alpha$ over time to reduce the step size.
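Both effects are easy to see on a toy one-dimensional cost, $J(\theta)=\theta^2$ with gradient $2\theta$ (my own example, not from the post):

```python
def descend(theta, alpha, steps):
    """Run gradient descent on J(theta) = theta**2 (gradient 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(descend(1.0, 0.01, 10))   # tiny alpha: barely moved from 1.0
print(descend(1.0, 0.10, 100))  # moderate alpha: essentially at the minimum 0
print(descend(1.0, 1.10, 10))   # too-large alpha: overshoots and diverges
```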

Gradient Descent III
What remains is to apply the gradient descent algorithm to the linear regression model, and the key step is computing the partial derivative terms, as shown below.

Substituting $h_\theta(x^{(i)})=\theta_0+\theta_1x^{(i)}$ into $J(\theta_0,\theta_1)$ and differentiating with respect to $\theta_0$ and $\theta_1$ respectively gives:
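The figure with the results is not reproduced here; written out (a standard derivation, term-by-term consistent with the definition of $J$ above), the two partial derivatives are:
<font size='4'>$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)$$</font>
<font size='4'>$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}$$</font>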

Óɴ˿ɵõ½ÎÒÃǵĵÚÒ»¸ö»úÆ÷ѧϰËã·¨£¬ÌݶÈϽµËã·¨:

When we discussed gradient descent earlier, we used this figure:

Different starting points can lead to different local optima. In fact, however, the cost function used for linear regression is always a convex function (Convex Function). Such a function has no local optima, only a single global optimum, so gradient descent will always arrive at the global optimum.

Let us now watch gradient descent in action:






After many iterations we arrive at the optimum, and we can now use the corresponding hypothesis function to predict house prices. For example, a house of 1,250 square feet should sell for roughly $250k, as shown below:
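Putting the pieces together, here is an end-to-end sketch of batch gradient descent for univariate linear regression. The data is synthetic (points placed exactly on a known line), not the course's housing set, so the learned parameters can be checked against the truth:

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Fit h(x) = theta0 + theta1*x by batch gradient descent.
    'Batch': every update uses ALL m training examples."""
    theta0 = theta1 = 0.0
    m = len(xs)
    for _ in range(iters):
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Synthetic data on the line y = 50 + 0.16*x (illustrative numbers only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [50 + 0.16 * x for x in xs]
t0, t1 = batch_gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))  # close to 50.0 and 0.16
```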

Finally, a few related concepts. The gradient descent we just used is also called batch gradient descent (Batch Gradient Descent). 'Batch' here means that every update of $\theta$ uses all of the training examples (training example). There are other variants of gradient descent as well, which will be introduced in later lessons.

In later lessons we will also learn another method that can find the optimum without the many iterations of gradient descent: the normal equation (Normal
Equation). When the dataset is very large, however, gradient descent is the more suitable choice.