How to Do Automated Feature Engineering with Python
 
 2019-8-16
 
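Editor's recommendation:

This article comes from Big Data Digest (大数据文摘). The author walks through an example of automated feature engineering with the Python featuretools library; we hope it is useful for your learning.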

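Model training in machine learning is becoming more and more automated, but feature engineering is still a lengthy manual process that relies on domain expertise, intuition, and data manipulation. Yet preparing features is an important early step in machine learning, even though it does not produce directly usable results the way model training does. In this article, the author works through an example of automated feature engineering with the Python featuretools library.

Machine learning is increasingly moving from hand-designed models to automatically optimized pipelines built with tools such as H2O, TPOT, and auto-sklearn. These libraries, along with methods such as random search, aim to simplify the model selection and tuning parts of machine learning by finding the best model for a dataset with almost no manual intervention. Feature engineering, however, arguably the more valuable part of the machine learning pipeline, remains almost entirely manual.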

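Feature engineering, also called feature creation, is the process of constructing new features from existing data to train a machine learning model. This step can be more important than the actual model used, because a machine learning algorithm only learns from the data we give it, which makes creating features relevant to the task absolutely crucial.

Typically, feature engineering is a drawn-out manual process that relies on domain knowledge, intuition, and data manipulation. It can be extremely tedious, and the final features are limited both by human subjectivity and by time. Automated feature engineering aims to help the data scientist by automatically creating many candidate features from a dataset, from which the best can be selected and used for training.

In this article, we will walk through an example of automated feature engineering with the Python featuretools library, using an example dataset to show the basics.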

Full code:

https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb

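Feature Engineering Basics

Feature engineering means building additional features out of existing data, which is often spread across multiple related tables. It requires extracting the relevant information from the data and putting it into a single table, which can then be used to train a machine learning model.

Constructing features is very time-consuming, because each new feature usually takes several steps to build, especially when it uses information from more than one table. We can group the operations of feature creation into two categories: transformations and aggregations. Let's look at a few examples to see these concepts in action.

A transformation acts on a single table (in Python terms, a table is just a Pandas DataFrame) by creating new features out of one or more of the existing columns.

For example, consider a clients table with columns including client_id, joined, and income.

We could create new features by finding the month of the joined column or by taking the natural log of the income column. These are both transformations because they use information from only one table.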
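The two transformations just described can be sketched in plain Pandas. This is a minimal illustration, assuming a toy clients table with made-up values; the helper name `add_transform_features` and the output column names `join_month` and `log_income` are hypothetical, not featuretools output.

```python
import numpy as np
import pandas as pd


def add_transform_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two transformation features added.

    Both features use only columns of this one table, which is what
    makes them transformations rather than aggregations.
    """
    out = df.copy()
    # Transformation 1: month extracted from the joined date column.
    out["join_month"] = out["joined"].dt.month
    # Transformation 2: natural logarithm of income.
    out["log_income"] = np.log(out["income"])
    return out


# Hypothetical toy clients table (values invented for illustration).
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "joined": pd.to_datetime(["2002-04-16", "2007-11-14", "2013-03-11"]),
    "income": [50000.0, 100000.0, 75000.0],
})

features = add_transform_features(clients)
print(features[["client_id", "join_month", "log_income"]])
```

The original table is left untouched because the helper works on a copy, so the raw columns remain available for other features.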

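Aggregations, on the other hand, are performed across tables and use a one-to-many relationship to group observations and then calculate statistics. For example, if we have another table with information on the loans of clients, where each client may have multiple loans, we can calculate statistics such as the average, maximum, and minimum of the loans for each client.

This process involves grouping the loans table by client, calculating the aggregations, and then merging the resulting data into the client data. Here is how we would do that in Python using Pandas:

import pandas as pd

# Assumes `clients` and `loans` are DataFrames already loaded from the example dataset
# Group loans by client id and calculate the mean, max, and min loan amount
stats = loans.groupby('client_id')['loan_amount'].agg(['mean', 'max', 'min'])
stats.columns = ['mean_loan_amount', 'max_loan_amount', 'min_loan_amount']

# Merge with the clients dataframe (a left join keeps clients that have no loans)
stats = clients.merge(stats, left_on = 'client_id', right_index=True, how = 'left')

stats.head(10)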

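These operations are not difficult by themselves, but if we have hundreds of variables spread across dozens of tables, doing this by hand is not feasible. Ideally, we want a solution that can automatically perform transformations and aggregations across multiple tables and combine the resulting data into a single table. Although Pandas is a great resource, there is only so much data manipulation we want to do by hand!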
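To see why this becomes tedious, here is a sketch of a reusable helper that performs one aggregation step and is then applied repeatedly along the chain from payments to loans to clients. The helper `aggregate_child`, the toy tables, and the column naming scheme are hypothetical; this is hand-written Pandas, not how featuretools is implemented.

```python
import pandas as pd


def aggregate_child(parent, child, key, column, funcs):
    """Group `child` by `key`, aggregate `column` with each name in `funcs`,
    and left-merge the results onto `parent` (one row per parent is kept)."""
    stats = child.groupby(key)[column].agg(funcs)
    # Prefix each statistic with its function name, e.g. "mean_loan_amount".
    stats.columns = [f"{func}_{column}" for func in funcs]
    merged = parent.merge(stats, left_on=key, right_index=True, how="left")
    return merged.reset_index(drop=True)


# Hypothetical toy tables mirroring the clients / loans / payments structure.
clients = pd.DataFrame({"client_id": [1, 2, 3]})
loans = pd.DataFrame({"loan_id": [10, 11, 12],
                      "client_id": [1, 1, 2],
                      "loan_amount": [1000.0, 3000.0, 500.0]})
payments = pd.DataFrame({"loan_id": [10, 10, 11, 12],
                         "payment_amount": [100.0, 200.0, 300.0, 50.0]})

# payments -> loans: mean payment per loan.
loans_feat = aggregate_child(loans, payments, "loan_id", "payment_amount", ["mean"])
# loans -> clients: statistics of the raw loan amounts.
clients_feat = aggregate_child(clients, loans_feat, "client_id", "loan_amount",
                               ["mean", "max", "min"])
# loans -> clients again, stacked on top of the previous aggregation.
clients_feat = aggregate_child(clients_feat, loans_feat, "client_id",
                               "mean_payment_amount", ["max"])
print(clients_feat)
```

Every extra column, statistic, or table adds another call like these, which is exactly the bookkeeping that an automated tool is meant to take over. Client 3 has no loans, so its aggregated values are NaN.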

More on manual feature engineering:

https://jakevdp.github.io/PythonDataScienceHandbook/05.04-feature-engineering.html

Featuretools

ÐÒÔ˵ÄÊÇ£¬featuretoolsÕýÊÇÎÒÃÇÕýÔÚѰÕҵĽâ¾ö·½°¸¡£Õâ¸ö¿ªÔ´Python¿â½«×Ô¶¯´ÓÒ»×éÏà¹Ø±íÖд´½¨Ðí¶àÌØÕ÷¡£Featuretools»ùÓÚÒ»ÖÖ³ÆÎª¡°Éî¶ÈÌØÕ÷ºÏ³É¡±µÄ·½·¨£¬Õâ¸öÃû×ÖÌýÆðÀ´±Èʵ¼ÊµÄÓÃ;¸üÁîÈËÓ¡ÏóÉî¿Ì

Éî¶ÈÌØÕ÷ºÏ³ÉʵÏÖÁ˶àÖØ×ª»»ºÍ¾ÛºÏ²Ù×÷£¨ÔÚfeaturetoolsµÄ´Ê»ãÖгÆÎªÌØÕ÷»ùÔª£©£¬Í¨¹ý·Ö²¼ÔÚÐí¶à±íÖеÄÊý¾ÝÀ´´´½¨ÌØÕ÷¡£Ïñ»úÆ÷ѧϰÖеĴó¶àÊý¹ÛÄîÒ»Ñù£¬ËüÊǽ¨Á¢ÔÚ¼òµ¥¸ÅÄî»ù´¡Éϵĸ´ºÏÐÍ·½·¨¡£Í¨¹ýÒ»´Îѧϰһ¸ö¹¹Ôì¿éµÄʾÀý£¬ÎÒÃǾͻáÈÝÒ×Àí½âÕâÖÖÇ¿´óµÄ·½·¨¡£

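First, let's take a look at our example data. We already saw some of the dataset above; the complete collection of tables is as follows:

clients: basic information about clients at a credit union. Each client has only one row in this dataframe.

loans: loans made to the clients. Each loan has only its own row in this dataframe, but clients may have multiple loans.

payments: payments made on the loans. Each payment has only one row, but each loan will have multiple payments.

If we have a machine learning task, such as predicting whether a client will repay a future loan, we will want to combine all the information about clients into a single table. The tables are related (through the client_id and loan_id variables), and we could use a series of transformations and aggregations to do this process by hand. However, we will shortly see that we can instead use featuretools to automate the process.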

Entities and EntitySets

The first two concepts of featuretools are entities and entitysets. An entity is simply a table (or a DataFrame, if you think in Pandas terms).

An EntitySet is a collection of tables and the relationships between them. Think of an entityset as just another Python data structure, with its own methods and attributes.
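As a mental model only, an entityset can be pictured as a small container that holds named tables and the links between them. The class below, `MiniEntitySet`, is a hypothetical toy written for illustration; it is not the featuretools `EntitySet` and does none of its type inference or feature generation.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class MiniEntitySet:
    """Toy stand-in for an entityset: named tables plus parent-child links.

    Hypothetical illustration only; this is not the featuretools EntitySet class.
    """
    id: str
    # entity_id -> (DataFrame, name of its index column)
    entities: dict = field(default_factory=dict)
    # (parent_id, parent_column, child_id, child_column) tuples
    relationships: list = field(default_factory=list)

    def add_entity(self, entity_id, df, index):
        self.entities[entity_id] = (df, index)
        return self  # returning self allows `es = es.add_entity(...)` chaining

    def add_relationship(self, parent_id, parent_col, child_id, child_col):
        self.relationships.append((parent_id, parent_col, child_id, child_col))
        return self

    def children_of(self, parent_id):
        """Names of the entities that are children of `parent_id`."""
        return [child for parent, _, child, _ in self.relationships
                if parent == parent_id]


es = MiniEntitySet(id="clients")
es = es.add_entity("clients", pd.DataFrame({"client_id": [1, 2]}), index="client_id")
es = es.add_entity("loans",
                   pd.DataFrame({"loan_id": [10, 11, 12], "client_id": [1, 1, 2]}),
                   index="loan_id")
es = es.add_relationship("clients", "client_id", "loans", "client_id")
print(es.children_of("clients"))  # ['loans']
```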

We can create an empty entityset in featuretools using the following:

import featuretools as ft

# Create new entityset
es = ft.EntitySet(id = 'clients')

Now we add entities. Each entity must have an index, which is a column in which all the elements are unique. That is, each value in the index must appear in the table only once.
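The index requirement is easy to check directly in Pandas. The sketch below (the helper `check_index` and the toy data are hypothetical) lists any values that occur more than once; an empty list means the column can serve as an index, which is why loan_id can index the loans table but not the payments table.

```python
import pandas as pd


def check_index(df, column):
    """Return the values of `column` that appear more than once.

    An empty list means every value is unique, so the column can serve as an index.
    """
    counts = df[column].value_counts()
    return sorted(counts[counts > 1].index.tolist())


# Hypothetical toy data: one row per client, but several payments per loan.
clients = pd.DataFrame({"client_id": [1, 2, 3]})
payments = pd.DataFrame({"loan_id": [10, 10, 11]})

print(check_index(clients, "client_id"))  # []
print(check_index(payments, "loan_id"))   # [10]
```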

The index of the clients dataframe is client_id, because each client has only one row in this dataframe. We add an entity that already has an index to the entityset using the following syntax:

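# Create an entity from the client dataframe
# This dataframe already has an index and a time index
# (entity_from_dataframe is the featuretools 0.x API; featuretools 1.0 replaced it with EntitySet.add_dataframe)
es = es.entity_from_dataframe(entity_id = 'clients', dataframe = clients,
                              index = 'client_id', time_index = 'joined')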

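The loans dataframe also has a unique index, loan_id, and the syntax for adding it to the entityset is the same as for clients. The payments dataframe, however, has no unique index. When we add this entity to the entityset, we need to pass in the parameter make_index = True and specify the name of the index. Also, although featuretools automatically infers the data type of each column in an entity, we can override this by passing a dictionary of column types to the parameter variable_types.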

# Create an entity from the payments dataframe
# This does not yet have a unique index
es = es.entity_from_dataframe(entity_id = 'payments',
                              dataframe = payments,
                              variable_types = {'missed': ft.variable_types.Categorical},
                              make_index = True,
                              index = 'payment_id',
                              time_index = 'payment_date')

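For this dataframe, even though missed is an integer, it is not a numeric variable, since it can only take on 2 discrete values, so we tell featuretools to treat missed as a categorical variable. After adding the dataframes to the entityset, we can inspect any of them to check the inferred column types.

The column types have been correctly inferred with the modification we specified. Next, we need to specify how the tables in the entityset are related.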
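The two adjustments made to the payments table, creating a new index and treating missed as categorical, have rough Pandas analogues. This is a sketch under the assumption that a simple 0-based row counter is an acceptable stand-in for the index that make_index = True generates; it does not reproduce what featuretools stores internally, and the toy values are invented.

```python
import pandas as pd

# Hypothetical toy payments table: no column is unique per row.
payments = pd.DataFrame({
    "loan_id": [10, 10, 11],
    "payment_amount": [100.0, 200.0, 300.0],
    "missed": [0, 1, 0],
})

# Analogue of make_index=True: add a fresh integer column that is unique per row.
payments.insert(0, "payment_id", list(range(len(payments))))

# Analogue of the variable_types override: "missed" holds integers, but only
# two discrete values occur, so store it as categorical rather than numeric.
payments["missed"] = payments["missed"].astype("category")

print(payments.dtypes)
```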

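Table Relationships

The best way to think of a relationship between two tables is the analogy of parent to child. This is a one-to-many relationship: each parent can have multiple children. In the realm of tables, a parent table has one row for every parent, but the child table may have multiple rows corresponding to multiple children of the same parent.

For example, in our dataset, the clients dataframe is a parent of the loans dataframe, because each client has only one row in the clients table but may have multiple rows in loans.

Likewise, loans is the parent of payments, because each loan will have multiple payments. The parents are linked to their children by a shared variable. When we perform aggregations, we group the child table by the parent variable and calculate statistics across the children of each parent.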
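A parent-child link is valid when the parent's key is unique and every child row points at an existing parent. The hypothetical helper `is_one_to_many` below checks both conditions on toy data, then shows the grouping of the child table by the parent variable that an aggregation relies on.

```python
import pandas as pd


def is_one_to_many(parent, parent_key, child, child_key):
    """True if parent_key is unique in parent and every child_key value
    refers to an existing parent row (a valid parent-child link)."""
    key_is_unique = parent[parent_key].is_unique
    all_children_matched = child[child_key].isin(parent[parent_key]).all()
    return bool(key_is_unique and all_children_matched)


# Hypothetical toy tables: loans is the parent, payments the child.
loans = pd.DataFrame({"loan_id": [10, 11, 12], "client_id": [1, 1, 2]})
payments = pd.DataFrame({"loan_id": [10, 10, 11, 12],
                         "payment_amount": [100.0, 200.0, 300.0, 50.0]})

print(is_one_to_many(loans, "loan_id", payments, "loan_id"))  # True

# An aggregation groups the child by the parent variable: one row per parent.
print(payments.groupby("loan_id")["payment_amount"].sum())
```

Swapping the roles fails the check, because loan_id repeats in payments and so cannot identify a single parent row.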

To formalize a relationship in featuretools, we only need to specify the variable that links the two tables together.

The clients and loans tables are linked via the client_id variable, and loans and payments are linked via loan_id. The syntax for creating a relationship and adding it to the entityset is shown below:

# Relationship between clients and previous loans
r_client_previous = ft.Relationship(es['clients']['client_id'],
                                    es['loans']['client_id'])

# Add the relationship to the entity set
es = es.add_relationship(r_client_previous)

# Relationship between previous loans and previous payments
r_payments = ft.Relationship(es['loans']['loan_id'],
                             es['payments']['loan_id'])

# Add the relationship to the entity set
es = es.add_relationship(r_payments)

es

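The entityset now contains the three tables and the relationships that link them. After adding entities and formalizing relationships, our entityset is complete and we are ready to make features.

Feature Primitives

Before we can quite get to deep feature synthesis, we need to understand feature primitives. We already know what these are, but we have just been calling them by different names! These are simply the basic operations that we use to form new features:

Aggregations: operations completed across a parent-to-child (one-to-many) relationship that group by the parent and calculate statistics for the children. An example is grouping the loans table by client_id and finding the maximum loan amount for each client.

Transformations: operations done on a single table to one or more columns. An example is taking the difference between two columns in one table or taking the absolute value of a column.

New features are created in featuretools using these primitives either by themselves or by stacking multiple primitives. Featuretools ships with a number of built-in feature primitives, and we can also define custom primitives.

To make features with specified primitives, we use the ft.dfs function (standing for deep feature synthesis). We pass in the entityset; the target_entity, which is the table where we want to add the features; the selected trans_primitives (transformations); and the agg_primitives (aggregations):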

# Create new features using specified primitives
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients',
                                 agg_primitives = ['mean', 'max', 'percent_true', 'last'],
                                 trans_primitives = ['years', 'month', 'subtract', 'divide'])

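The result is a dataframe of new features for each client (because we made clients the target_entity). For example, one new column is the month each client joined, produced by a transformation primitive.

We also get a number of aggregation features, such as the average payment amount for each client.

Even though we specified only a few feature primitives, featuretools created many new features by combining and stacking these primitives.

Deep Feature Synthesis

We now have all the pieces in place to understand deep feature synthesis (dfs). In fact, we already performed dfs in the previous function call! A deep feature is simply a feature made by stacking multiple primitives, and dfs is the name of the process that makes these features. The depth of a deep feature is the number of primitives required to make the feature.

For example, the MEAN(payments.payment_amount) column is a deep feature with a depth of 1, because it was created using a single aggregation. A feature with a depth of 2 is LAST(loans.MEAN(payments.payment_amount)), which is made by stacking two aggregations: LAST (most recent) on top of MEAN. This represents the average payment size of the most recent loan for each client.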
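The depth-2 feature LAST(loans.MEAN(payments.payment_amount)) can be reproduced by hand to make the stacking concrete. This sketch assumes that "last" means the loan with the latest loan_start date (a hypothetical column in the toy data); featuretools orders rows by the entity's time index, which may be configured differently in the real dataset.

```python
import pandas as pd

# Hypothetical toy tables (values invented for illustration).
loans = pd.DataFrame({
    "loan_id": [10, 11, 12],
    "client_id": [1, 1, 2],
    "loan_start": pd.to_datetime(["2010-01-01", "2012-06-01", "2011-03-15"]),
})
payments = pd.DataFrame({
    "loan_id": [10, 10, 11, 11, 12],
    "payment_amount": [100.0, 200.0, 400.0, 600.0, 50.0],
})

# Depth 1: MEAN(payments.payment_amount), one value per loan.
mean_payment = payments.groupby("loan_id")["payment_amount"].mean().rename("mean_payment")
loans = loans.merge(mean_payment, left_on="loan_id", right_index=True, how="left")

# Depth 2: LAST over each client's loans, ordered by loan_start, stacked on the MEAN.
# groupby().last() takes the last non-null value in each group.
last_mean = (
    loans.sort_values("loan_start")
         .groupby("client_id")["mean_payment"]
         .last()
)
print(last_mean)
```

For client 1, the most recent loan (11) has payments of 400 and 600, so the stacked feature is 500.0.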

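We can stack features to any depth we want, but in practice I have never gone beyond a depth of 2. Beyond that point, the features become difficult to interpret, but I encourage anyone interested to try "going deeper".

We do not have to specify the feature primitives manually; instead, we can let featuretools choose features for us automatically. To do this, we use the same ft.dfs function call but do not pass in any primitives: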

# Perform deep feature synthesis without specifying primitives
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 max_depth = 2)

features.head()

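Featuretools has built many new features for us to use. While this process does create new features automatically, it does not replace the data scientist, because we still have to figure out what to do with all of them. For example, if our goal is to predict whether a client will repay a loan, we could look for the features most correlated with that outcome. Moreover, if we have domain knowledge, we can use it to choose specific feature primitives or to seed deep feature synthesis with candidate features.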
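One simple way to look for the features most related to an outcome is to rank them by absolute correlation with the target. The sketch below assumes a binary target named `repaid` and a small made-up feature table; the helper `rank_by_correlation` is hypothetical, and Pearson correlation only captures linear association, so treat it as a first filter rather than a verdict.

```python
import pandas as pd


def rank_by_correlation(features, target):
    """Sort feature columns by absolute Pearson correlation with target."""
    corr = features.corrwith(target).abs()
    return corr.sort_values(ascending=False)


# Hypothetical feature table and binary outcome (1 = repaid).
features = pd.DataFrame({
    "mean_loan_amount": [1000.0, 2000.0, 3000.0, 4000.0],
    "join_month": [1.0, 7.0, 3.0, 5.0],
})
repaid = pd.Series([0, 0, 1, 1])

ranked = rank_by_correlation(features, repaid)
print(ranked)
```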

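Next Steps

Automated feature engineering solves one problem but creates another: too many features. Although it is difficult to say which features will be important before fitting a model, it is likely that not all of them are relevant to the task we want to train our model on. Moreover, having too many features can lead to poor model performance, because the less useful features drown out those that are more important.

The problem of too many features is known as the curse of dimensionality. As the number of features increases (the dimension of the data grows), it becomes more and more difficult for a model to learn the mapping between features and targets. In fact, the amount of data needed for the model to perform well grows exponentially with the number of features.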
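The exponential claim can be illustrated with a common back-of-the-envelope argument: if each feature is divided into `b` bins, covering every combination of bins across `d` features takes `b ** d` cells, so keeping even one sample per cell requires exponentially more data as `d` grows. The function name `cells_needed` is hypothetical, and this is an intuition aid rather than a precise sample-size bound.

```python
def cells_needed(bins_per_feature: int, n_features: int) -> int:
    """Number of grid cells needed to cover every combination of bins."""
    return bins_per_feature ** n_features


# With 10 bins per feature, each added feature multiplies the count by 10.
for d in (1, 2, 5, 10):
    print(d, cells_needed(10, d))
```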

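The curse of dimensionality is combated with feature reduction (also known as feature selection): the process of removing irrelevant features. This can take many forms: principal component analysis (PCA), SelectKBest, using feature importances from a model, or auto-encoding with deep neural networks. Feature reduction, however, is a topic for another article. For now, we know that we can use featuretools to create numerous features from many tables with minimal effort!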
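As a lightweight example of feature reduction, the sketch below drops constant columns and then one column from each highly correlated pair. This is a simple filter chosen for illustration, not PCA, SelectKBest, or model-based importance; the helper `drop_redundant`, the 0.95 threshold, and the toy columns are assumptions.

```python
import pandas as pd


def drop_redundant(features, corr_threshold=0.95):
    """Drop constant columns, then one column from every highly correlated pair."""
    # Constant columns carry no information for a model.
    kept = features.loc[:, features.nunique() > 1]
    corr = kept.corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in to_drop and b not in to_drop and corr.loc[a, b] > corr_threshold:
                to_drop.add(b)  # keep the first column of the pair, drop the second
    return kept.drop(columns=sorted(to_drop))


features = pd.DataFrame({
    "mean_loan_amount": [1000.0, 2000.0, 3000.0, 4000.0],
    "sum_loan_amount": [2000.0, 4000.0, 6000.0, 8000.0],  # exactly 2x the column above
    "constant": [1.0, 1.0, 1.0, 1.0],
    "join_month": [1.0, 7.0, 3.0, 5.0],
})
reduced = drop_redundant(features)
print(reduced.columns.tolist())
```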

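Conclusion

Like many topics in machine learning, automated feature engineering with featuretools is a complicated concept built on simple ideas. Using the concepts of entitysets, entities, and relationships, featuretools can perform deep feature synthesis to create new features.

Deep feature synthesis in turn stacks feature primitives (aggregations, which act across a one-to-many relationship between tables, and transformations, which are functions applied to one or more columns in a single table) to build new features from multiple tables.

In a future article, I will show how to use this technique on a real-world problem: the Home Credit Default Risk competition currently being hosted on Kaggle. Stay tuned for that post, and in the meantime, read this introduction to get started in the competition! I hope you can now use automated feature engineering as an aid in your data science pipeline. A model is only as good as the data we give it, and automated feature engineering can help make the feature creation process more efficient.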

   