Editor's recommendation:
This article comes from Big Data Digest. The author walks through an example of automated feature engineering with Python's featuretools library; we hope it helps with your learning.
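Model training in machine learning is becoming more and more automated, but feature engineering is still a long, manual process that relies on domain knowledge, intuition, and data manipulation. Yet feature engineering is a crucial early step in machine learning, even though it does not produce directly usable results the way model training does.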
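Machine learning is increasingly moving away from hand-designed models toward automatically optimized pipelines built with tools such as H2O, TPOT, and auto-sklearn. These libraries, along with methods such as random search, aim to simplify the model selection and tuning parts of machine learning by finding the best model for a dataset with little or no manual intervention. Feature engineering, however, arguably the more valuable part of the machine learning pipeline, remains almost entirely manual.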
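Feature engineering, also called feature creation, is the process of constructing new features from existing data to train a machine learning model. This step can matter more than the choice of model, because a machine learning algorithm learns only from the data we give it. Creating features that are relevant to the task is therefore crucial.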
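Feature engineering is typically a long, manual process that relies on domain knowledge, intuition, and data manipulation. It can be extremely tedious, and the resulting features are limited by human subjectivity and by time. Automated feature engineering aims to help the data scientist by automatically creating many candidate features from a dataset, from which the best can be selected for training.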
In this article, we walk through an example of automated feature engineering with the featuretools library for Python, using an example dataset to show the basics.
Full code:
https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb
Feature Engineering Basics
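Feature engineering means building additional features out of existing data, which is often spread across multiple related tables. It requires extracting the relevant information from the data and getting it into a single table, which can then be used to train a machine learning model.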
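Constructing features is very time-consuming because each new feature usually takes several steps to build, especially when it uses information from more than one table. We can group the operations of feature creation into two categories: transformations and aggregations. Let's look at a few examples to see these concepts in action.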
A transformation acts on a single table (in Python terms, a table is just a pandas DataFrame) by creating new features out of one or more of the existing columns.
ÀýÈ磬Èç¹ûÎÒÃÇÓÐÈçÏ¿ͻ§±í¡£

ÎÒÃÇ¿ÉÒÔͨ¹ý²éÕÒjoinedÁеÄÔ·ݻòÊÇ»ñÈ¡incomeÁеÄ×ÔÈ»¶ÔÊýÀ´´´½¨ÌØÕ÷¡£ÕâЩ¶¼ÊÇת»»£¬ÒòΪËüÃǽöʹÓÃÀ´×ÔÒ»¸ö±íµÄÐÅÏ¢¡£
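The two transformations just described can be sketched without any libraries. This is a hypothetical toy version: the rows and the helper name `add_transform_features` are made up for illustration, and the feature names only imitate featuretools' naming style. In pandas the same operations would be `clients['joined'].dt.month` and `np.log(clients['income'])`.

```python
import math
from datetime import date


def add_transform_features(rows):
    """Return copies of the rows with two transformation features added.

    Each new value uses only columns of the same row (one table), which is
    what makes it a transformation rather than an aggregation.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left unchanged
        new_row["MONTH(joined)"] = row["joined"].month
        new_row["LOG(income)"] = math.log(row["income"])  # natural log
        out.append(new_row)
    return out


# Hypothetical toy clients table (values made up for illustration).
clients = [
    {"client_id": 1, "joined": date(2010, 3, 5), "income": 50000.0},
    {"client_id": 2, "joined": date(2012, 11, 20), "income": 120000.0},
]

transformed = add_transform_features(clients)
for row in transformed:
    print(row["client_id"], row["MONTH(joined)"], round(row["LOG(income)"], 3))
```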
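Aggregations, on the other hand, are performed across tables and use a one-to-many relationship to group observations and then calculate statistics. For example, if we have another table with information on client loans, where each client may have multiple loans, we can calculate statistics such as the average, maximum, and minimum loan for each client.
This process involves grouping the loans table by client, calculating the aggregations, and then merging the resulting data into the client data. Here's how we would do that in Python with pandas:
```python
import pandas as pd

# Assumes `clients` and `loans` are already-loaded pandas DataFrames
# that share a client_id column

# Group loans by client id and calculate the mean, max, and min of each client's loans
stats = loans.groupby('client_id')['loan_amount'].agg(['mean', 'max', 'min'])
stats.columns = ['mean_loan_amount', 'max_loan_amount', 'min_loan_amount']

# Merge with the clients dataframe (a left join keeps clients that have no loans)
stats = clients.merge(stats, left_on = 'client_id', right_index = True, how = 'left')

stats.head(10)
```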
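These operations are not difficult by themselves, but with hundreds of variables spread across dozens of tables, doing them by hand is not feasible. Ideally, we want a solution that automatically performs transformations and aggregations across multiple tables and combines the results into a single table. Although pandas is a great resource, there is only so much data manipulation we can do by hand.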
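The following is a minimal sketch of that idea. It assumes plain Python lists of row dicts in place of DataFrames. One hypothetical helper, `aggregate_child`, applies every function in a table of aggregations to every listed numeric column of a child table, so adding a column or a statistic requires no new code. The data and names are illustrative; this is not how featuretools is implemented.

```python
from collections import defaultdict
from statistics import mean

# Aggregations applied to every numeric column; names imitate featuretools-style
# feature names, but this is a hand-rolled illustration, not the library.
AGGREGATIONS = {"MEAN": mean, "MAX": max, "MIN": min, "SUM": sum}


def aggregate_child(child_rows, child_name, parent_key, numeric_columns):
    """Group a child table by `parent_key` and apply every aggregation to
    every numeric column. Returns {parent id: {feature name: value}}."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[parent_key]].append(row)

    features = {}
    for parent_id, rows in groups.items():
        feats = {}
        for col in numeric_columns:
            values = [r[col] for r in rows]
            for agg_name, agg in AGGREGATIONS.items():
                feats[f"{agg_name}({child_name}.{col})"] = agg(values)
        features[parent_id] = feats
    return features


# Made-up toy loans table.
loans = [
    {"loan_id": 10, "client_id": 1, "loan_amount": 1000.0, "rate": 2.0},
    {"loan_id": 11, "client_id": 1, "loan_amount": 3000.0, "rate": 4.0},
    {"loan_id": 12, "client_id": 2, "loan_amount": 500.0, "rate": 3.0},
]

client_features = aggregate_child(loans, "loans", "client_id", ["loan_amount", "rate"])
print(client_features[1]["MEAN(loans.loan_amount)"])  # 2000.0
```

Unlike the left merge in the pandas version, a client with no loans simply gets no entry here. A left join would instead fill that client's aggregation features with missing values.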
More on manual feature engineering:
https://jakevdp.github.io/PythonDataScienceHandbook/05.04-feature-engineering.html
Featuretools
ÐÒÔ˵ÄÊÇ£¬featuretoolsÕýÊÇÎÒÃÇÕýÔÚѰÕҵĽâ¾ö·½°¸¡£Õâ¸ö¿ªÔ´Python¿â½«×Ô¶¯´ÓÒ»×éÏà¹Ø±íÖд´½¨Ðí¶àÌØÕ÷¡£Featuretools»ùÓÚÒ»ÖÖ³ÆÎª¡°Éî¶ÈÌØÕ÷ºÏ³É¡±µÄ·½·¨£¬Õâ¸öÃû×ÖÌýÆðÀ´±Èʵ¼ÊµÄÓÃ;¸üÁîÈËÓ¡ÏóÉî¿Ì
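Deep feature synthesis stacks multiple transformation and aggregation operations (called feature primitives in the featuretools vocabulary) to create features from data spread across many tables. Like most ideas in machine learning, it is a complex method built on a foundation of simple concepts. By learning one building block at a time, we can form a good understanding of this powerful method.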
First, let's take a look at our example data. We already saw some of the dataset above; the complete collection of tables is as follows:
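clients: basic information about clients at a credit union. Each client has only one row in this dataframe.
loans: loans made to the clients. Each loan has only its own row in this dataframe, but a client may have multiple loans.
payments: payments made on the loans. Each payment has only one row, but each loan has multiple payments.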
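Suppose we have a machine learning task, such as predicting whether a client will repay a future loan. Then we want to combine all the information about each client into a single table. The tables are related (through the client_id and loan_id variables), and we could do this by hand with a series of transformations and aggregations. However, we will shortly see that featuretools can automate the process.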
Entities and EntitySets
The first two concepts of featuretools are entities and entitysets. An entity is simply a table (or a DataFrame, if you think in pandas terms).
An EntitySet is a collection of tables and the relationships between them. Think of an entityset as just another Python data structure, with its own methods and attributes.
We can create an empty entityset in featuretools using the following:
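```python
# Note: this article uses the pre-1.0 featuretools API (entity_from_dataframe,
# variable_types, target_entity); featuretools 1.0 renamed these to
# add_dataframe, logical_types and target_dataframe_name.
import featuretools as ft

# Create a new, empty entityset
es = ft.EntitySet(id = 'clients')
```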
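To make the "just another data structure" idea concrete, here is a hypothetical stand-in, `ToyEntitySet`, that only stores named tables and a list of relationships. It illustrates the concept and is not the featuretools `EntitySet` class. Its method names are our own.

```python
class ToyEntitySet:
    """Hypothetical illustration of an entity set: named tables plus the
    parent/child relationships between them (not the featuretools class)."""

    def __init__(self, es_id):
        self.id = es_id
        self.tables = {}         # table name -> list of row dicts
        self.relationships = []  # (parent, parent_col, child, child_col) tuples

    def add_table(self, name, rows):
        self.tables[name] = rows
        return self  # returning self allows the es = es.method(...) chaining style

    def add_relationship(self, parent, parent_col, child, child_col):
        self.relationships.append((parent, parent_col, child, child_col))
        return self

    def __repr__(self):
        lines = [f"Entityset: {self.id}"]
        lines += [f"  {name} [rows: {len(rows)}]" for name, rows in self.tables.items()]
        lines += [f"  {c}.{cc} -> {p}.{pc}" for p, pc, c, cc in self.relationships]
        return "\n".join(lines)


es = ToyEntitySet("clients")
es = es.add_table("clients", [{"client_id": 1}, {"client_id": 2}])
es = es.add_table("loans", [{"loan_id": 10, "client_id": 1}])
es = es.add_relationship("clients", "client_id", "loans", "client_id")
print(es)
```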
Now we add entities. Each entity must have an index, which is a column whose elements are all unique. That is, each value in the index can appear in the table only once.
The index in the clients dataframe is client_id, because each client has only one row in this dataframe. We add an entity with an existing index to an entityset using the following syntax:
```python
# Create an entity from the clients dataframe
# This dataframe already has an index (client_id) and a time index (joined)
es = es.entity_from_dataframe(entity_id = 'clients', dataframe = clients,
                              index = 'client_id', time_index = 'joined')
```
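The loans dataframe also has a unique index, loan_id, and the syntax for adding it to the entityset is the same as for clients. The payments dataframe, however, has no unique index. When we add this entity to the entityset, we need to pass in the parameter make_index = True and specify the name of the index. Also, featuretools automatically infers the data type of each column in an entity, but we can override this by passing a dictionary of column types to the parameter variable_types.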
```python
# Create an entity from the payments dataframe
# This dataframe does not yet have a unique index, so make_index = True creates one
es = es.entity_from_dataframe(entity_id = 'payments',
                              dataframe = payments,
                              variable_types = {'missed': ft.variable_types.Categorical},
                              make_index = True,
                              index = 'payment_id',
                              time_index = 'payment_date')
```
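The two indexing cases above are an existing unique column and `make_index = True`. Both can be sketched as one hypothetical helper, `set_index`. It is a plain-Python illustration of the rule, not featuretools' own validation logic, and the toy payment rows are made up.

```python
def set_index(rows, index, make_index=False):
    """Hypothetical helper illustrating the two indexing cases.

    make_index=False: check that column `index` already holds unique values.
    make_index=True: return copies of the rows with a new column `index`
    numbered 0..n-1, i.e. a surrogate key.
    """
    if make_index:
        return [{**row, index: i} for i, row in enumerate(rows)]
    seen = set()
    for row in rows:
        if row[index] in seen:
            raise ValueError(f"{index!r} is not unique: {row[index]!r} repeats")
        seen.add(row[index])
    return rows


# Toy payments: two payments on the same loan, so loan_id cannot be the index.
payments = [
    {"loan_id": 10, "payment_amount": 100.0},
    {"loan_id": 10, "payment_amount": 150.0},
]

try:
    set_index(payments, "loan_id")
except ValueError as err:
    print("rejected:", err)

indexed = set_index(payments, "payment_id", make_index=True)
print([row["payment_id"] for row in indexed])  # [0, 1]
```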
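For this dataframe, missed is stored as an integer, but it is not a numeric variable because it can take on only 2 discrete values. We therefore tell featuretools to treat missed as a categorical variable. After adding the dataframes to the entityset, we can inspect any of them (for example, es['payments']) to check the column types: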

With the modification we specified, the column types are inferred correctly. Next, we need to specify how the tables in the entityset are related.
Table Relationships
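The best way to think of a relationship between two tables is the analogy of parent to child. This is a one-to-many relationship: each parent can have multiple children. In terms of tables, each row in the parent table represents a different parent. Several rows in the child table, representing several children, can correspond to the same parent.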
ÀýÈ磬ÔÚÎÒÃǵÄÊý¾Ý¼¯ÖУ¬clients¿Í»§Êý¾Ý¿òÊÇloan ´û¿îÊý¾Ý¿òµÄ¸¸¼¶£¬ÒòΪÿ¸ö¿Í»§ÔÚ¿Í»§±íÖÐÖ»ÓÐÒ»ÐУ¬µ«´û¿î¿ÉÄÜÓжàÐС£
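Likewise, loans is the parent of payments, because each loan has multiple payments. A parent table is linked to its child table by a shared variable. When we perform aggregations, we group the child table by the parent variable and calculate statistics across the children of each parent.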
To formalize a relationship in featuretools, we only need to specify the variable that links the two tables together.
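What a one-to-many link on a shared variable implies can be checked directly. The sketch below is our own validation in plain Python; the helper `check_one_to_many` and the toy rows are hypothetical, and featuretools' internal checks may differ. Every parent key must be unique and every child must point at an existing parent. Grouping the children by that key then gives exactly the grouping an aggregation needs. For simplicity it assumes the linking column has the same name in both tables, as client_id and loan_id do here.

```python
def check_one_to_many(parent_rows, child_rows, key):
    """Validate a parent -> child relationship on the shared column `key`.

    Returns {parent key: [child rows]}; parents with no children map to [].
    """
    children = {}
    for row in parent_rows:
        if row[key] in children:
            raise ValueError(f"parent key {row[key]!r} appears more than once")
        children[row[key]] = []
    for row in child_rows:
        if row[key] not in children:
            raise ValueError(f"child row refers to unknown parent {row[key]!r}")
        children[row[key]].append(row)
    return children


clients = [{"client_id": 1}, {"client_id": 2}, {"client_id": 3}]
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]

groups = check_one_to_many(clients, loans, "client_id")
print({cid: len(rows) for cid, rows in groups.items()})  # {1: 2, 2: 1, 3: 0}
```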
The clients and loans tables are linked through the client_id variable, and loans and payments are linked through the loan_id variable. The syntax for creating the relationships and adding them to the entityset is:
```python
# Assumes a 'loans' entity was added with index = 'loan_id', like clients above

# Relationship between clients and previous loans (parent variable first, then child)
r_client_previous = ft.Relationship(es['clients']['client_id'],
                                    es['loans']['client_id'])

# Add the relationship to the entity set
es = es.add_relationship(r_client_previous)

# Relationship between previous loans and previous payments
r_payments = ft.Relationship(es['loans']['loan_id'],
                             es['payments']['loan_id'])

# Add the relationship to the entity set
es = es.add_relationship(r_payments)

es
```

The entityset now contains the three tables and the relationships that link them. With the entities added and the relationships formalized, our entityset is complete and we are ready to make features.
Feature Primitives
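Before we can get to deep feature synthesis, we need to understand feature primitives. We already know what these are; we have just been calling them by different names! They are simply the basic operations we use to form new features: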
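Aggregations: operations completed across a parent-to-child (one-to-many) relationship that group by the parent and calculate statistics for the children. An example is grouping the loans table by client_id and finding the maximum loan amount for each client.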
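Transformations: operations applied to one or more columns of a single table. An example is taking the difference between two columns in one table, or taking the absolute value of a column.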
New features are created in featuretools by using these primitives either on their own or by stacking multiple primitives. featuretools ships with a library of built-in primitives, and we can also define custom primitives.
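The two kinds of primitive differ in shape. An aggregation turns many child values into one value per parent, while a transformation produces one output per row of the same table. Below is a hypothetical plain-Python sketch; the registries, the helper names, and the `paid`/`due` columns are ours, not featuretools'. It includes one stacked feature built from two transforms and an aggregation.

```python
from statistics import mean

# Hypothetical primitive registries (illustration only).
AGG_PRIMITIVES = {"MEAN": mean, "MAX": max, "SUM": sum}
TRANS_PRIMITIVES = {"ABSOLUTE": abs, "SUBTRACT": lambda a, b: a - b}


def apply_trans(name, *columns):
    """Transformation: one output value per row, using columns of one table."""
    fn = TRANS_PRIMITIVES[name]
    return [fn(*values) for values in zip(*columns)]


def apply_agg(name, values):
    """Aggregation: many child values reduced to a single value."""
    return AGG_PRIMITIVES[name](values)


# Toy payment rows for one loan: amount paid vs. amount due.
paid = [500, 300, 650]
due = [500, 500, 500]

diffs = apply_trans("SUBTRACT", paid, due)                      # [0, -200, 150]
largest_gap = apply_agg("MAX", apply_trans("ABSOLUTE", diffs))  # stacked primitives
print(diffs, largest_gap)                                       # [0, -200, 150] 200
```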

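These primitives can be used by themselves or combined to create features. To make features with specified primitives, we use the ft.dfs function (short for deep feature synthesis). We pass in the entityset and the target_entity, which is the table where we want to add the features. We also pass the selected trans_primitives (transformations) and agg_primitives (aggregations):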
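```python
# Create new features using specified primitives
# (primitive names as in the featuretools version used here; later releases
# call subtract/divide 'subtract_numeric'/'divide_numeric')
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients',
                                 agg_primitives = ['mean', 'max', 'percent_true', 'last'],
                                 trans_primitives = ['years', 'month', 'subtract', 'divide'])
```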
The result is a dataframe of new features for each client (because we made clients the target_entity). For example, there is a feature for the month each client joined, produced by a transformation primitive.

We also have a number of features built from aggregation primitives, such as the average payment amount for each client.

Even though we specified only a few feature primitives, featuretools created many new features by combining and stacking them.
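A rough count shows why a few primitives go a long way. This is our own back-of-the-envelope enumeration with hypothetical column names, not featuretools' exact rules; the library may, for example, skip some combinations. It applies two pairwise transforms over three numeric columns, then four aggregations over the original and derived columns.

```python
from itertools import permutations

# Hypothetical numeric columns in the loans table and a few primitives.
numeric_cols = ["loan_amount", "rate", "duration"]
pair_ops = ["-", "/"]  # subtract and divide
agg_primitives = ["MEAN", "MAX", "MIN", "SUM"]

# Subtraction and division are order-sensitive, so use ordered pairs.
pair_features = [f"{a} {op} {b}" for a, b in permutations(numeric_cols, 2) for op in pair_ops]

# Each aggregation can then summarise every original and derived column per client.
all_cols = numeric_cols + pair_features
agg_features = [f"{p}(loans.{c})" for p in agg_primitives for c in all_cols]

print(len(pair_features), len(agg_features))  # 12 60
```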

Deep Feature Synthesis
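We now have all the pieces in place to understand deep feature synthesis (dfs). In fact, we already performed dfs in the previous function call! A deep feature is simply a feature made of stacked primitives, and dfs is the name of the process that makes these features. The depth of a deep feature is the number of primitives required to make it.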
ÀýÈ磬MEAN£¨payments.payment_amount£©ÁÐÊÇÉî¶ÈΪ1µÄÉî²ãÌØÕ÷£¬ÒòΪËüÊÇʹÓõ¥¸ö¾ÛºÏ´´½¨µÄ¡£Éî¶ÈΪ2µÄÌØÕ÷ÊÇLAST£¨´û¿î£¨MEAN£¨payments.payment_amount£©£©ÕâÊÇͨ¹ý¶ÑµþÁ½¸ö¾ÛºÏÀ´ÊµÏֵģº×îºóÒ»¸ö£¨×î½üµÄ£©ÔÚMEANÖ®ÉÏ¡£Õâ±íʾÿ¸ö¿Í»§×î½ü´û¿îµÄƽ¾ùÖ§¸¶¶î¡£

We can stack features to any depth we want, but in practice I have never gone beyond a depth of 2. Beyond that point the features are difficult to interpret, but I encourage anyone interested to try "going deeper".
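We do not have to specify the feature primitives manually; instead, we can let featuretools choose features for us automatically. To do this, we make the same ft.dfs function call but pass in no primitives, in which case featuretools falls back to its default set: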
```python
# Perform deep feature synthesis without specifying primitives
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients',
                                 max_depth = 2)
features.head()
```

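Featuretools has built many new features for us to use. This process does create new features automatically, but it does not replace the data scientist, because we still have to figure out what to do with all of them. For example, if our goal is to predict whether a client will repay a loan, we could look for the features most correlated with that outcome. Moreover, if we have domain knowledge, we can use it to choose specific feature primitives or to seed deep feature synthesis with candidate features.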
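One simple way to look for the most relevant features is to rank them by the absolute Pearson correlation with the target. Below is a sketch with hypothetical, made-up feature columns and a 0/1 repaid label. Correlation captures only linear relationships, so treat this as a first filter rather than a verdict.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(features, target):
    """Feature names sorted by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values: four clients, target 1 = repaid, 0 = not repaid.
repaid = [1, 0, 1, 0]
features = {
    "feature_a": [2.0, 1.0, 2.0, 1.0],
    "feature_b": [1.0, 2.0, 2.0, 1.0],
    "feature_c": [3.0, 1.0, 2.0, 2.0],
}
print(rank_features(features, repaid))  # ['feature_a', 'feature_c', 'feature_b']
```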
Next Steps
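Automated feature engineering solves one problem but creates another: too many features. It is difficult to say which features will be important before fitting a model, but most likely not all of them are relevant to the task we want to train our model on. Moreover, having too many features can lead to poor model performance, because the less useful features drown out those that are more important.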
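The problem of too many features is known as the curse of dimensionality. As the number of features increases (the dimension of the data grows), it becomes harder and harder for a model to learn the mapping between features and targets. In fact, the amount of data needed for the model to perform well grows exponentially with the number of features.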
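A common back-of-the-envelope illustration of that exponential growth (our own, not from the article) goes like this. Suppose every feature is cut into 10 bins and we want about 5 samples in each cell of the resulting grid. Then each added feature multiplies the number of cells, and so the number of samples needed, by 10.

```python
def samples_needed(n_features, bins_per_feature=10, per_cell=5):
    """Samples needed to put about `per_cell` points in every cell of a grid
    that cuts each feature into `bins_per_feature` bins (a rough illustration)."""
    return per_cell * bins_per_feature ** n_features


for d in (1, 2, 5, 10):
    print(d, samples_needed(d))
# 1 50
# 2 500
# 5 500000
# 10 50000000000
```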
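The curse of dimensionality is countered with feature reduction (also known as feature selection): the process of removing irrelevant features. This can take many forms: principal component analysis (PCA), SelectKBest, using feature importances from a model, or auto-encoding with deep neural networks. However, feature reduction is a topic for another article. For now, we know that we can use featuretools to create numerous features from many tables with minimal effort!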
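As a taste of feature reduction, here is a technique simpler than those listed, swapped in purely for illustration. It greedily drops a feature whenever that feature is almost perfectly correlated with one already kept. The 0.95 threshold and the toy columns are arbitrary assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.95):
    """Keep a feature only if its |correlation| with every already-kept feature
    is below `threshold`; dict order decides which of a correlated pair survives."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up columns: feature_b is exactly 2 * feature_a; feature_c is only weakly related.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0],
    "feature_c": [4.0, 1.0, 3.0, 2.0],
}
print(drop_collinear(features))  # ['feature_a', 'feature_c']
```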
Conclusion
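Like many topics in machine learning, automated feature engineering with featuretools is a complicated concept built on simple ideas. Using the concepts of entitysets, entities, and relationships, featuretools can perform deep feature synthesis to create new features.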
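Deep feature synthesis in turn stacks feature primitives to build new features from multiple tables. The primitives are aggregations, which act across a one-to-many relationship between tables, and transformations, which are functions applied to one or more columns in a single table.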
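In future articles, I'll show how to use this technique on a real-world problem: the Home Credit Default Risk competition currently hosted on Kaggle. Stay tuned for that post, and in the meantime, get started in the competition! I hope you can now use automated feature engineering as an aid in your data science pipeline. A model is only as good as the data we give it, and automated feature engineering can make the process of creating new features more efficient.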