In computer science, the tree is an important data structure; familiar examples include the binary search tree and the red-black tree. Organizing data as a tree lets us shrink a problem quickly and search efficiently.
In supervised learning, samples come with many diverse features, and researchers have long sought strategies that learn efficiently and classify well. So how can a tree structure be put to use in machine learning? Let's start with a game.
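Most of us have played or heard of this game: one player (the setter) writes down the name of a celebrity, and everyone else has to guess who it is. If those were the only rules, the name would be almost impossible to guess, because the space of possible answers is far too large. To make the game easier, the guessers may ask the setter questions, which the setter must answer truthfully with yes or no. The guessers choose each next question based on the previous answer, and they win if they pin down the answer within a given number of questions. With this question-and-answer rule, can we actually find the answer? Let's try: I have written down the name of a film star, and our question-and-answer record so far is: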
Is it a man? Y
Is the person Asian? Y
Is the person Chinese? N
Is the person Indian? Y
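With only four short questions, we have already narrowed the range of possible answers dramatically. So what should your fifth question be? You can probably lock in the answer already, since I have only seen a handful of Indian films. We can structure the information above as shown in the figure below: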

In this game, each targeted question narrows the range of possible answers. As long as the guesser shares the same background knowledge as the person answering, reaching the answer is much easier than we might expect.
Back to our original question: how can a tree structure be used in machine learning? The figure above shows that at each node, the answer to the question splits the possibilities into a left branch (Yes) and a right branch (No). For simplicity only one path is drawn, but the structure is clearly a tree, and this is the prototype of a decision tree.
1. Introduction to the Decision Tree Algorithm
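The samples we work with usually have many features, and, as the saying goes, we shouldn't judge something from only one angle. So how do we combine different features? The decision tree algorithm starts from a single feature, just as in the game above. Since we can't classify the samples directly, we first split them on one feature. The result may not be ideal, but the split shrinks the problem, and each resulting subset is easier to classify than the original sample set. We then repeat the process on each subset produced by the previous split. Ideally, after several levels of splitting, we end up with completely pure subsets, in which every sample belongs to the same class.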

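Take the figure above as an example. For the six points in the plane, neither the x coordinate nor the y coordinate alone can separate the two classes. Following the decision tree idea, we first split the six points into two groups by their y coordinate (the horizontal line). The two points above the line belong to the same class, but the four points below it are still mixed. That's fine: we split those four points again, this time by their x coordinate (the vertical line). With two levels of splits, the sample points are classified perfectly. With a and b denoting the positions of the horizontal and vertical lines, the resulting decision tree can be written as: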
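def classify(x, y, a, b):
    # First split: the horizontal line y = a
    if y > a:
        return "dot"
    else:
        # Second split, for points below the line: the vertical line x = b
        if x < b:
            return "cross"
        else:
            return "dot"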
This splitting process produces a tree-shaped decision model. Every non-leaf node of the tree is a feature split point, and every leaf node is a final class decision, as shown in the figure below.

To classify a new sample, we pass it down the tree from the root and follow the rule at each node. The leaf it finally lands in gives the sample's class.
2. Decision Tree Algorithm Workflow
The idea of the decision tree algorithm described above can be summarized in two steps:
1. Each time, choose one feature and use it to split the sample set
2. Recursively apply step 1 to each resulting subset
Does that look too simple? In fact, each step involves a number of decisions. The most important one in step 1 is which feature gives the best split, and judging whether a split is good requires an evaluation metric. So far we have called a good split "pure", but what does pure mean? Intuitively, a set is pure when its samples are concentrated in few classes, ideally all in the same class. The purity of a sample set can be measured with entropy.
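In information theory, entropy measures how disordered a system is. The higher the entropy, the lower the purity of our dataset, and when all samples in the dataset belong to the same class, the entropy is 0. Entropy is computed as: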
$$H(X) = -\sum_{i=1}^{n} P(x_i)\log_b P(x_i)$$
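Here P(x_i) is the probability of outcome x_i, and the base b is 2. For example, when tossing a coin, the probability of heads is 1/2 and the probability of tails is also 1/2, so the entropy of the toss is: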
$$H = -\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{2}\log_2\frac{1}{2}\right) = 1$$
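Because a coin toss is a completely random event in which heads and tails are equally likely, its entropy is high. Suppose instead we observe the direction in which the coin finally travels. The probability that it falls down is 1 and the probability that it flies up into the sky is 0. Plugging these into the formula (with the convention 0 · log 0 = 0) gives an entropy of 0. So the lower the entropy, the more predictable the outcome. When growing a decision tree, our goal is to split so that the resulting subsets have the lowest possible entropy, which makes them easier to classify in later iterations.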
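Here is a minimal Python sketch of Shannon entropy with b = 2, which checks the formula against the two coin examples. The function name `entropy` is our own choice. It takes a list of class labels and estimates each P(x_i) as a label's relative frequency.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels.

    Each P(x_i) is estimated as the relative frequency of a label.
    Only labels that actually occur are summed, matching 0 * log 0 = 0.
    """
    n = len(labels)
    h = 0.0
    for count in Counter(labels).values():
        p = count / n
        h -= p * math.log2(p)
    return h


# Fair coin: heads and tails equally likely -> 1 bit
print(entropy(["heads", "tails"]))            # 1.0
# The coin always falls down -> fully predictable
print(entropy(["down", "down", "down"]))      # 0.0
# 12 buyers and 8 non-buyers (the survey labels used later in this article)
print(round(entropy([1] * 12 + [0] * 8), 3))  # 0.971
```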
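Since this is a recursive process, we need rules for when the recursion stops. We stop splitting a subset in two situations: either the split has already achieved the desired result, or further splitting would gain so little that it's not worth continuing. More formally, the stopping conditions are: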
1. The entropy of the subset falls below a threshold
2. The subset is small enough
3. The gain from splitting further is below a threshold
The gain in condition 3 measures how much a split improves the purity of the data. The more the entropy drops after splitting, the larger the gain and the more worthwhile the split. The gain is computed as:
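$$\mathrm{Gain}(D) = H(D) - \frac{|D_1|}{|D|}H(D_1) - \frac{|D_2|}{|D|}H(D_2)$$

where D is the sample set before the split, D_1 and D_2 are the two subsets it is split into, and |·| is the number of samples in a set.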
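The formula measures how much lower the combined entropy of the two subsets is after the split than the entropy before it. When summing the subsets' entropies, each one must be multiplied by its weight, which is the subset's share of the parent set's size before the split. For example, suppose the entropy before the split is e, and the set is split into subsets A and B with sizes m and n and entropies e1 and e2. The gain is then e - m/(m + n) * e1 - n/(m + n) * e2.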
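The weighted formula translates directly into code. The sketch below is a minimal version with our own naming (`information_gain`, with `parent`, `left`, `right` as label lists). The worked example uses the survey labels introduced later in this article, split on whether the age code is at least 4. That split sends 5 buyers to one side and 7 buyers plus 8 non-buyers to the other.

```python
import math
from collections import Counter


def entropy(labels):
    """Base-2 Shannon entropy of a sequence of class labels."""
    n = len(labels)
    h = 0.0
    for count in Counter(labels).values():
        p = count / n
        h -= p * math.log2(p)
    return h


def information_gain(parent, left, right):
    """Gain = e - m/(m+n) * e1 - n/(m+n) * e2 for a two-way split of `parent`."""
    m, n = len(left), len(right)
    e, e1, e2 = entropy(parent), entropy(left), entropy(right)
    return e - m / (m + n) * e1 - n / (m + n) * e2


left = [1] * 5                 # age code >= 4: all 5 samples bought
right = [1] * 7 + [0] * 8      # the remaining 15 samples are mixed
parent = left + right
print(round(information_gain(parent, left, right), 3))  # 0.223
```

A split that separates the classes perfectly gains the full parent entropy. A split that leaves the class proportions unchanged gains nothing.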
3. Implementing the Decision Tree Algorithm
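With these concepts in place, we can start training a decision tree. Training consists of:
1. Choose a feature and split the sample set on it
2. Compute the gain; if it is large enough, make the split subsets child nodes of the tree, otherwise stop splitting
3. Repeat the two steps above recursively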
These steps follow the ID3 algorithm, which chooses features and splits by information gain. Other decision tree algorithms include C4.5 and CART.
The overall framework of the algorithm is as follows:
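class DecisionTree(object):
    def fit(self, X, y):
        # Grow the decision tree from the input samples
        self.root = self._build_tree(X, y)

    def _build_tree(self, X, y, current_depth=0):
        # 1. Choose the best split feature and create the left and right nodes
        # 2. Recursively grow a subtree for each of the left and right nodes
        ...

    def predict_value(self, x, tree=None):
        # Pass the input sample down the tree from the root, testing it at each node;
        # the leaf node it reaches holds the predicted value
        ...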
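In the code above, the key to implementing a decision tree is the recursive construction of subtrees. To implement it we need to get three things right: the node definition, the choice of the best split feature, and the recursive generation of subtrees.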
3.1 Node Definition
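A decision tree is used for classification. Each node examines a feature of the input sample and applies a rule to it, which decides the subtree the sample goes to. For this purpose, every node of the tree needs to hold the following key information:
1. Decision feature: which feature the current node tests
2. Decision rule: for a binary (two-way) decision, this rule is usually a Boolean expression
3. Left subtree: the samples for which the rule evaluates to TRUE
4. Right subtree: the samples for which the rule evaluates to FALSE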
The code defining a decision tree node is shown below:
class DecisionNode():
    def __init__(self, feature_i=None, threshold=None,
                 value=None, true_branch=None, false_branch=None):
        self.feature_i = feature_i          # index of the feature tested at this node
        self.threshold = threshold          # decision rule: feature >= threshold is True
        self.value = value                  # for a leaf node, stores the predicted class
        self.true_branch = true_branch      # left subtree (rule is True)
        self.false_branch = false_branch    # right subtree (rule is False)
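The sketch below shows how these fields fit together. It hand-builds the two-level tree from the six-point example with `DecisionNode` and classifies a few points. The figure gives no coordinates, so the thresholds `a = 2.0` and `b = 3.0` and the sample points are hypothetical. `predict` is our own iterative helper. Because the node rule is `feature >= threshold`, the first test is written as y >= a and the second as x >= b (leading to "dot"). This matches the earlier pseudocode except for points lying exactly on the horizontal line y = a.

```python
class DecisionNode():
    def __init__(self, feature_i=None, threshold=None,
                 value=None, true_branch=None, false_branch=None):
        self.feature_i = feature_i
        self.threshold = threshold
        self.value = value
        self.true_branch = true_branch
        self.false_branch = false_branch


def predict(node, x):
    """Walk down from `node` until a leaf (a node with a value) is reached."""
    while node.value is None:
        if x[node.feature_i] >= node.threshold:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.value


a, b = 2.0, 3.0  # hypothetical positions of the horizontal and vertical lines
X, Y = 0, 1      # feature indices of the x and y coordinates

root = DecisionNode(
    feature_i=Y, threshold=a,                  # y >= a ?
    true_branch=DecisionNode(value="dot"),
    false_branch=DecisionNode(
        feature_i=X, threshold=b,              # x >= b ?
        true_branch=DecisionNode(value="dot"),
        false_branch=DecisionNode(value="cross"),
    ),
)

for point in [(1.0, 2.5), (1.0, 1.0), (4.0, 1.0)]:
    print(point, predict(root, point))
```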
3.2 Feature Selection
Feature selection is the most critical step in building a decision tree. Its goal is to find the feature whose split yields the purest result. The procedure looks like this in code:
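# Iterate over every feature in the sample set, find the best split point for
# each feature, and keep the best split feature overall
largest_gain = 0
best_criteria = None
for feature_i in range(n_features):
    # Candidate thresholds: every distinct value this feature takes in the set
    unique_values = np.unique(X_y[:, feature_i])
    for threshold in unique_values:
        # Split using the current feature value as the threshold
        Xy1, Xy2 = divide_on_feature(X_y, feature_i, threshold)
        # The labels are stored in the last column
        y1, y2 = Xy1[:, -1], Xy2[:, -1]
        # Compute the gain of this split; store it under a new name so the
        # gain() function is not overwritten by its first result
        info_gain = gain(y, y1, y2)
        # Record the best split feature and the best split threshold
        if info_gain > largest_gain:
            largest_gain = info_gain
            best_criteria = {
                "feature_i": feature_i,
                "threshold": threshold
            }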
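The fragment above relies on helpers that the article does not show. Below is a self-contained sketch that fills them in under our own assumptions. `divide_on_feature` sends rows with `feature >= threshold` to the first subset, matching the node rule. Labels sit in the last column of `X_y`, and splits that leave one side empty are skipped. The name `best_split` is hypothetical. On the 20-sample survey matrix from later in the article, it picks feature 0 (age) with threshold 4, the same root as the tree printed further below.

```python
import math
from collections import Counter

import numpy as np


def entropy(labels):
    n = len(labels)
    h = 0.0
    for count in Counter(labels.tolist()).values():
        p = count / n
        h -= p * math.log2(p)
    return h


def divide_on_feature(X_y, feature_i, threshold):
    """Split rows into (feature >= threshold, feature < threshold)."""
    mask = X_y[:, feature_i] >= threshold
    return X_y[mask], X_y[~mask]


def best_split(X_y):
    y = X_y[:, -1]
    n_features = X_y.shape[1] - 1
    largest_gain, best_criteria = 0.0, None
    for feature_i in range(n_features):
        for threshold in np.unique(X_y[:, feature_i]):
            Xy1, Xy2 = divide_on_feature(X_y, feature_i, threshold)
            if len(Xy1) == 0 or len(Xy2) == 0:
                continue  # nothing was separated, so the gain is zero
            y1, y2 = Xy1[:, -1], Xy2[:, -1]
            w = len(y1) / len(y)
            info_gain = entropy(y) - w * entropy(y1) - (1 - w) * entropy(y2)
            if info_gain > largest_gain:
                largest_gain = info_gain
                best_criteria = {"feature_i": feature_i,
                                 "threshold": int(threshold)}
    return best_criteria, largest_gain


X_y = np.array([
    [3, 2, 1, 0, 1], [2, 0, 0, 0, 0], [3, 2, 0, 0, 1], [2, 1, 1, 0, 0],
    [0, 0, 0, 0, 1], [2, 1, 1, 1, 0], [3, 1, 0, 1, 0], [4, 1, 1, 0, 1],
    [3, 2, 0, 1, 0], [4, 2, 0, 1, 1], [3, 2, 1, 0, 1], [4, 2, 1, 0, 1],
    [1, 0, 1, 0, 0], [3, 2, 0, 0, 1], [3, 0, 0, 0, 1], [0, 0, 0, 1, 1],
    [2, 1, 1, 1, 0], [4, 0, 1, 1, 1], [4, 1, 0, 0, 1], [3, 0, 1, 1, 0],
])

criteria, info_gain = best_split(X_y)
print(criteria, round(info_gain, 3))  # {'feature_i': 0, 'threshold': 4} 0.223
```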
3.3 Node Splitting
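When splitting a node there are two cases. If the gain is large enough, the node splits into left and right subtrees. If the gain is small, splitting stops and the node becomes a leaf. The node split and the gain computation can be combined as an optimization: while searching for the best split point in the previous step, we can already compute and save the best split subsets and their gain. The body of the for loop from the previous step becomes: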
# Split using the current feature value as the threshold
Xy1, Xy2 = divide_on_feature(X_y, feature_i, threshold)
y1, y2 = Xy1[:, -1], Xy2[:, -1]
# Compute the gain of this split
info_gain = gain(y, y1, y2)
# Record the best split feature and threshold, plus the split subsets and gain
if info_gain > largest_gain:
    largest_gain = info_gain
    best_criteria = {
        "feature_i": feature_i,
        "threshold": threshold,
    }
    best_sets = {
        "left": Xy1,
        "right": Xy2,
        "gain": info_gain
    }
To prevent overfitting, we need suitable stopping conditions, such as a threshold on the gain: if the gain is small, there is no need to keep splitting. We can therefore decide how to split based on the gain:
if best_sets["gain"] > MIN_GAIN:
    # Grow a subtree from best_sets["left"] and attach it as the parent's left subtree
    # Grow a subtree from best_sets["right"] and attach it as the parent's right subtree
    ...
else:
    # Make the current node a leaf
    ...
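Next, let's work through the training procedure on a set of experimental data. The experimental data comes from here; the table holds the results of a consumer survey. Our goal is to train a decision tree that acts as a classifier: given data about a new user, it infers whether that user will buy the product: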
The table contains 20 survey samples. Each sample has four features: age group, education, income, and marital status. The last column is the class, which in this problem indicates whether the person bought the product.
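In supervised learning, every sample has a known correct answer. The algorithm makes predictions, and its parameters are adjusted according to how good those predictions are. In a decision tree, prediction means partitioning the sample set by its features, and the quality of a prediction is judged by the purity of the resulting partitions.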
For convenience, we simplified the sample data. The age feature was converted into discrete codes based on the sample values: under 18 maps to 0, 18 to 1, 18-35 to 2, 36-55 to 3, and over 55 to 4. The other features were encoded as numbers in the same way. Education maps to 0 (high school), 1 (bachelor's), and 2 (master's); income maps to 0 (low) and 1 (high); and marital status maps to 0 (single) and 1 (married). The processed samples are stored in a numpy matrix, as shown below:
import numpy as np

# Columns: age, education, income, marital status, bought (1) / did not buy (0)
X_y = np.array(
    [[3, 2, 1, 0, 1],
     [2, 0, 0, 0, 0],
     [3, 2, 0, 0, 1],
     [2, 1, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [2, 1, 1, 1, 0],
     [3, 1, 0, 1, 0],
     [4, 1, 1, 0, 1],
     [3, 2, 0, 1, 0],
     [4, 2, 0, 1, 1],
     [3, 2, 1, 0, 1],
     [4, 2, 1, 0, 1],
     [1, 0, 1, 0, 0],
     [3, 2, 0, 0, 1],
     [3, 0, 0, 0, 1],
     [0, 0, 0, 1, 1],
     [2, 1, 1, 1, 0],
     [4, 0, 1, 1, 1],
     [4, 1, 0, 0, 1],
     [3, 0, 1, 1, 0]]
)
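The encoding described above can be written as a small helper. This is a sketch under assumptions, since the raw survey table is not reproduced here. The function names, the raw category strings, and the treatment of exact ages are our own choices; in particular we take 19-35 as code 2, since 18 alone is code 1. The example call reproduces the ninth row of `X_y`.

```python
EDUCATION = {"high school": 0, "bachelor's": 1, "master's": 2}
INCOME = {"low": 0, "high": 1}
MARITAL = {"single": 0, "married": 1}


def encode_age(age):
    """Map an age in years to the age code (assumes 19-35 -> 2)."""
    if age < 18:
        return 0
    if age == 18:
        return 1
    if age <= 35:
        return 2
    if age <= 55:
        return 3
    return 4


def encode_row(age, education, income, marital, bought):
    """Turn one raw survey answer into a row of the X_y matrix."""
    return [encode_age(age), EDUCATION[education], INCOME[income],
            MARITAL[marital], int(bought)]


print(encode_row(40, "master's", "low", "married", False))  # [3, 2, 0, 1, 0]
```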
4. Predicting New Samples
Building a decision tree with the algorithm above and printing it gives:
-- Classification Tree --
0 : 4?
 T -> 1
 F -> 3 : 1?
   T -> 0 : 2?
     T -> 0
     F -> 1
   F -> 0 : 3?
     T -> 1
     F -> 0 : 1?
       T -> 0
       F -> 1
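The number before a colon is the index of the chosen split feature, and the value after it is the decision rule (feature value >= that value). Together they form a non-leaf node of the decision tree. Each non-leaf node splits further into a True branch and a False branch. For a leaf node, the value after the arrow is the final class: 0 means the customer does not buy and 1 means the customer buys. Because the data has been simplified, this output isn't very intuitive, so let's translate the features and rules back: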
Age : older than 55?
 Yes -> buys
 No -> Marital status : married?
   Yes -> Age : older than 18?
     Yes -> does not buy
     No -> buys
   No -> Age : 36 or older?
     Yes -> buys
     No -> Age : 18 or older?
       Yes -> does not buy
       No -> buys
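The sketch below puts the pieces together. It grows a tree recursively using the rules from Section 3 and then prints it: threshold tests of the form `feature >= threshold` on every distinct value, information gain, and a gain threshold. This is our own minimal version, not the article's full code. `MIN_GAIN`, `build_tree`, `print_tree`, `predict`, and the majority-vote leaf value are assumptions. On the survey matrix it produces the same tree as the printout above. Because that tree is fully grown, it classifies every training sample correctly.

```python
import math
from collections import Counter

import numpy as np

MIN_GAIN = 1e-7  # hypothetical gain threshold: stop splitting at or below it


class DecisionNode():
    def __init__(self, feature_i=None, threshold=None, value=None,
                 true_branch=None, false_branch=None):
        self.feature_i, self.threshold, self.value = feature_i, threshold, value
        self.true_branch, self.false_branch = true_branch, false_branch


def entropy(labels):
    n = len(labels)
    h = 0.0
    for count in Counter(labels.tolist()).values():
        h -= count / n * math.log2(count / n)
    return h


def build_tree(X_y):
    y = X_y[:, -1]
    largest_gain, best = 0.0, None
    for feature_i in range(X_y.shape[1] - 1):
        for threshold in np.unique(X_y[:, feature_i]):
            mask = X_y[:, feature_i] >= threshold
            y1, y2 = y[mask], y[~mask]
            if len(y1) == 0 or len(y2) == 0:
                continue
            w = len(y1) / len(y)
            gain = entropy(y) - w * entropy(y1) - (1 - w) * entropy(y2)
            if gain > largest_gain:
                largest_gain, best = gain, (feature_i, int(threshold), mask)
    if best is None or largest_gain <= MIN_GAIN:
        # Leaf: predict the most common label among the remaining samples
        return DecisionNode(value=Counter(y.tolist()).most_common(1)[0][0])
    feature_i, threshold, mask = best
    return DecisionNode(feature_i, threshold,
                        true_branch=build_tree(X_y[mask]),
                        false_branch=build_tree(X_y[~mask]))


def print_tree(node, indent=""):
    if node.value is not None:
        print(node.value)
        return
    print(f"{node.feature_i} : {node.threshold}?")
    print(f"{indent} T -> ", end="")
    print_tree(node.true_branch, indent + "  ")
    print(f"{indent} F -> ", end="")
    print_tree(node.false_branch, indent + "  ")


def predict(node, x):
    while node.value is None:
        go_true = x[node.feature_i] >= node.threshold
        node = node.true_branch if go_true else node.false_branch
    return node.value


X_y = np.array([
    [3, 2, 1, 0, 1], [2, 0, 0, 0, 0], [3, 2, 0, 0, 1], [2, 1, 1, 0, 0],
    [0, 0, 0, 0, 1], [2, 1, 1, 1, 0], [3, 1, 0, 1, 0], [4, 1, 1, 0, 1],
    [3, 2, 0, 1, 0], [4, 2, 0, 1, 1], [3, 2, 1, 0, 1], [4, 2, 1, 0, 1],
    [1, 0, 1, 0, 0], [3, 2, 0, 0, 1], [3, 0, 0, 0, 1], [0, 0, 0, 1, 1],
    [2, 1, 1, 1, 0], [4, 0, 1, 1, 1], [4, 1, 0, 0, 1], [3, 0, 1, 1, 0],
])

tree = build_tree(X_y)
print("-- Classification Tree --")
print_tree(tree)
print(all(predict(tree, row[:-1]) == row[-1] for row in X_y))  # True
```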
Once the decision tree is built, we can use it to classify new samples. Prediction is easy to understand: starting from the root, test the sample against each node's rule, follow the corresponding subtree, and repeat until reaching a leaf. The prediction code is as follows:
def predict_value(self, x, tree=None):
    # Start from the root when no subtree is given
    if tree is None:
        tree = self.root
    # If the current node is a leaf, return its value directly
    if tree.value is not None:
        return tree.value
    # Otherwise test x against the current node's rule:
    # go to the left subtree if the test is true, else to the right subtree
    feature_value = x[tree.feature_i]
    if feature_value >= tree.threshold:
        branch = tree.true_branch
    else:
        branch = tree.false_branch
    # Recurse into the chosen subtree
    return self.predict_value(x, branch)
5. Summary
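The decision tree is a simple and widely used classifier: once trained, it can classify unseen data efficiently. As our example shows, a decision tree model is readable and descriptive, which helps people analyze the data. Decision trees also classify efficiently. A tree built once can be reused many times, and each prediction needs no more comparisons than the depth of the tree.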
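Decision trees also have drawbacks. Continuous features are harder to handle, and for multi-class problems both the amount of computation and the accuracy are less than ideal. In addition, each leaf at the bottom of the tree is produced by a single rule in its parent node, so the classifier can be fooled fairly easily by manually altering sample features. In a spam filtering system, for example, a user may get past the filter by changing just one key feature. On the implementation side, the tree is grown recursively, so computation and memory consumption grow as the sample size increases.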
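Overfitting is another problem decision trees face. A fully grown decision tree (no pruning, no gain threshold) can predict the training samples with 100% accuracy, as long as no two identical samples carry different labels, because it fits them completely. However, its predictions on samples outside the training set may be poor; this is overfitting. To combat overfitting, besides using a gain threshold as a stopping condition as described above, we usually also need to prune the tree. Common pruning strategies include: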
Pessimistic Error Pruning
Minimum Error Pruning
Cost-Complexity Pruning
Error-Based Pruning: evaluate each node on a separate test set; if splitting lowers the error rate, keep splitting it into two subtrees, otherwise make the node a leaf directly.
Critical Value Pruning: this is the gain threshold used as a stopping condition, as mentioned above.
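The error-based idea in the list above checks on held-out data whether a split actually lowers the error. A common way to implement that check is reduced-error pruning, which works bottom-up on an already grown tree. Below is a minimal sketch under our own assumptions. Each internal node carries an extra, hypothetical `majority` field holding the training-set majority class at that node; the article's `DecisionNode` has no such field. A subtree is collapsed whenever that does not increase the validation error. The small tree and the validation samples are made up for illustration.

```python
class Node:
    def __init__(self, feature_i=None, threshold=None, value=None,
                 true_branch=None, false_branch=None, majority=None):
        self.feature_i = feature_i
        self.threshold = threshold
        self.value = value                # set only on leaves
        self.true_branch = true_branch
        self.false_branch = false_branch
        self.majority = majority          # hypothetical: training majority class here


def predict(node, x):
    while node.value is None:
        if x[node.feature_i] >= node.threshold:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.value


def prune(node, rows):
    """Reduced-error pruning on validation `rows` of (features, label) pairs."""
    if node.value is not None:
        return node
    # Route the validation rows the same way the node's rule would
    true_rows = [(x, c) for x, c in rows if x[node.feature_i] >= node.threshold]
    false_rows = [(x, c) for x, c in rows if x[node.feature_i] < node.threshold]
    # Prune the children first (bottom-up)
    node.true_branch = prune(node.true_branch, true_rows)
    node.false_branch = prune(node.false_branch, false_rows)
    subtree_errors = sum(predict(node, x) != c for x, c in rows)
    leaf_errors = sum(node.majority != c for _, c in rows)
    # Collapse to a leaf when that is no worse on the validation data
    if leaf_errors <= subtree_errors:
        return Node(value=node.majority)
    return node


tree = Node(
    feature_i=0, threshold=5, majority=1,
    true_branch=Node(value=0),
    false_branch=Node(
        feature_i=1, threshold=3, majority=1,
        true_branch=Node(value=0),
        false_branch=Node(value=1),
    ),
)
validation = [((7, 0), 0), ((8, 2), 0), ((2, 4), 1), ((1, 1), 1), ((3, 5), 1)]

pruned = prune(tree, validation)
print(pruned.value is None)       # True: the root split is kept
print(pruned.false_branch.value)  # 1: the inner split was collapsed
```

Ties are resolved in favor of the smaller tree, which is the usual choice for this kind of pruning.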