Hands-On Machine Learning (Machine Learning with Scikit-learn and TensorFlow) (Part 3)
 
 2018-11-22
 
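Editor's recommendation:

This article comes from CSDN. It walks through hands-on work with a real training set (tabular CSV data): inspecting and preprocessing it, and the basic architecture of a Pipeline. We hope it helps with your studies.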

III. Hands-On Practice

7. Select and Train a Model

First, try training a linear regression model (LinearRegression):

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(train_housing_prepared, train_housing_labels)

With training done, evaluate the model by computing its root mean squared error (RMSE) on the training set:

import numpy as np
from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(train_housing_prepared)
lin_mse = mean_squared_error(train_housing_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)  # RMSE is the square root of the MSE
lin_rmse

The linear regression model's training-set RMSE is about 68626.

Next, try a more powerful model, a decision tree regressor (DecisionTreeRegressor):

from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor()
tree_reg.fit(train_housing_prepared, train_housing_labels)
housing_predictions = tree_reg.predict(train_housing_prepared)
tree_mse = mean_squared_error(train_housing_labels, housing_predictions)
tree_rmse = np.sqrt(tree_mse)
tree_rmse

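The decision tree regressor's training-set RMSE is 0, far lower than the linear regression model's.

Does this mean the decision tree regressor is much better than linear regression on this problem? Of course not. A low training error does not make a good model: the model may have learned the training data too closely, so it performs well only on the training set (that is, it overfits), and it will perform poorly as soon as it is tested on new data.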
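To see why a zero training error is suspicious, here is a small pure-Python illustration on synthetic data (an assumption; this is not the housing data). A 1-nearest-neighbour "model" simply memorizes the training points, so its training error is exactly 0, yet it still makes errors on points it has not seen. It stands in for an unpruned decision tree, which can likewise fit every training example.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the illustration is reproducible


def make_data(n):
    """Synthetic data (assumed for illustration): y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_1nn(xs, ys):
    """'Train' by memorizing; predict the label of the nearest training x."""
    memory = list(zip(xs, ys))

    def predict(x):
        return min(memory, key=lambda pair: abs(pair[0] - x))[1]

    return predict


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


x_train, y_train = make_data(50)
x_new, y_new = make_data(50)
model = fit_1nn(x_train, y_train)

train_rmse = rmse(y_train, [model(x) for x in x_train])
new_rmse = rmse(y_new, [model(x) for x in x_new])
print("training RMSE:", train_rmse)  # 0.0: every training point is memorized
print("new-data RMSE:", new_rmse)    # above 0: the noise cannot be memorized in advance
```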

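Therefore, during training you should hold out part of the training data as a validation set to check whether the model is suitable for the problem. The most commonly used approach is cross-validation.

Cross-validation

The basic idea of cross-validation is to split the training set into k parts (folds). Each round, the model is trained on k-1 folds and the remaining fold is used as the validation set. After training k times, once per fold, the k errors are averaged to judge how good the model is (changing the parameters counts as a different model). (See Baidu Baike for the detailed procedure.)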
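The k-fold procedure just described can be sketched in plain Python. This illustrates the idea only and is not Scikit-Learn's implementation: the folds are contiguous and unshuffled, and the "model" is a deliberately trivial mean predictor (a hypothetical stand-in) so the example stays self-contained.

```python
import math


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds; the first n % k folds get one extra sample."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def fit_mean(train_x, train_y):
    """Hypothetical baseline model: ignore x and always predict the mean training label."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def cross_val_rmse(xs, ys, fit, k):
    """Train on k-1 folds, score RMSE on the held-out fold, repeat once per fold."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - model(xs[i])) ** 2 for i in val_idx]
        scores.append(math.sqrt(sum(errors) / len(errors)))
    return scores


xs = list(range(10))
ys = [float(x) for x in xs]
scores = cross_val_rmse(xs, ys, fit_mean, k=5)
print(scores, "mean:", sum(scores) / len(scores))
```

Every sample lands in exactly one validation fold, so each data point is used for evaluation once and for training k-1 times.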

In Scikit-Learn, cross-validation is done with the cross_val_score function. Here is cross-validation applied to both the linear regression model and the decision tree regressor:

import numpy as np
from sklearn.model_selection import cross_val_score
tree_scores = cross_val_score(tree_reg, train_housing_prepared, train_housing_labels,
                              scoring="neg_mean_squared_error", cv=10)
lin_scores = cross_val_score(lin_reg, train_housing_prepared, train_housing_labels,
                             scoring="neg_mean_squared_error", cv=10)
# The scores are negated MSEs, so flip the sign before taking the square root
tree_rmse_scores = np.sqrt(-tree_scores)
lin_rmse_scores = np.sqrt(-lin_scores)

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

display_scores(tree_rmse_scores)
display_scores(lin_rmse_scores)

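The scoring parameter selects the evaluation metric; here it is the negative mean squared error ("neg_mean_squared_error"). The cv parameter is the number of folds, 10 in this case.

Note: the mean squared error values returned by cross-validation are negative. Scikit-Learn's scorers follow a "greater is better" convention, so error metrics are reported negated. That is why the sign has to be flipped before taking the square root.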
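The sketch below mimics that convention with made-up score values (not real results) to show both the sign flip and why maximizing the negated score is the same as minimizing the error. It uses only the standard library in place of NumPy.

```python
import math
import statistics

# Made-up negated-MSE scores for three folds, in the form cross_val_score returns them
neg_mse_scores = [-4.0, -9.0, -16.0]

# Flip the sign back to MSE, then take the square root to get RMSE per fold
rmse_scores = [math.sqrt(-s) for s in neg_mse_scores]
print(rmse_scores)  # [2.0, 3.0, 4.0]

# Equivalent of display_scores: NumPy's std() defaults to the population
# standard deviation (ddof=0), which statistics.pstdev also computes
print("Mean:", statistics.mean(rmse_scores))
print("Standard deviation:", statistics.pstdev(rmse_scores))

# Under "greater is better", the highest score is the one with the lowest error
best_fold = max(range(len(neg_mse_scores)), key=lambda i: neg_mse_scores[i])
print(best_fold)  # 0
```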

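The decision tree regressor's mean cross-validation RMSE is 71163, while the linear regression model's is 69051. So the decision tree is clearly overfitting and actually performs somewhat worse than linear regression.

Beyond these two simple models, you should also try other kinds of models (random forests, SVMs with different kernels, neural networks, and so on) and end up with a shortlist of 2 to 5 candidate models. (You can also save each one to a file so it can be loaded directly later.)

Saving models

Finally, here is how to save a model to local disk and load it back. You can use the pickle library or joblib (formerly bundled with scikit-learn as sklearn.externals.joblib, now a standalone package); the code is as follows: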

import joblib  # standalone package; sklearn.externals.joblib was removed in newer scikit-learn versions
joblib.dump(my_model, "my_model.pkl")  # save the model
# and later...
my_model_loaded = joblib.load("my_model.pkl")  # load the model
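Since the text also mentions pickle, here is a minimal standard-library sketch of the same save/load round trip. The dict standing in for a fitted model and the file name are hypothetical placeholders; a real Scikit-Learn estimator would be passed to pickle.dump in the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: a plain dict of learned parameters
my_model = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_model.pkl")
    with open(path, "wb") as f:  # save the model
        pickle.dump(my_model, f)
    # and later...
    with open(path, "rb") as f:  # load the model
        my_model_loaded = pickle.load(f)

print(my_model_loaded == my_model)  # True: an equal copy was restored from disk
```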

8. Fine-Tune Your Model

You now have a few candidate models and need to fine-tune their hyperparameters so they perform better. Below are a few tuning methods.

Grid Search

Scikit-learn provides the GridSearchCV class for grid-search tuning. With grid search, you specify a few candidate values for each hyperparameter you want to tune; Grid Search then tries every combination of those values, trains a model for each one, evaluates it with cross-validation, and reports the results. Below is a Grid Search example for a random forest regressor (RandomForestRegressor).

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
param_grid = [
    # 3 x 4 = 12 combinations
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    # then 1 x 2 x 3 = 6 combinations with bootstrap disabled
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(train_housing_prepared, train_housing_labels)

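The example first tries the parameters in the first dict, n_estimators and max_features, giving 3*4 = 12 combinations, and then those in the second dict, giving 2*3 = 6 combinations (what each parameter means will be covered later). That is 12 + 6 = 18 combinations in total. Each one is cross-validated with 5 folds, for 18*5 = 90 rounds of training (plus, by default, one final refit of the best combination on the full training set). That is a fair amount of computation, but once it finishes you get a good set of parameters.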
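The combination count above can be checked by enumerating the grid directly. This pure-Python sketch expands a list of parameter dicts the way the text describes (every combination within each dict, with the dicts added together); Scikit-Learn has its own ParameterGrid class for this, which is not used here. The fake_evaluate function is a hypothetical stand-in for "train and cross-validate with these parameters".

```python
import itertools

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]


def expand_grid(grid):
    """Yield one dict per combination: Cartesian product within each dict, dicts chained."""
    for sub_grid in grid:
        keys = sorted(sub_grid)
        for values in itertools.product(*(sub_grid[k] for k in keys)):
            yield dict(zip(keys, values))


def grid_search(grid, evaluate):
    """Return (best_params, best_score) over all combinations; a higher score is better."""
    return max(((p, evaluate(p)) for p in expand_grid(grid)), key=lambda pair: pair[1])


combos = list(expand_grid(param_grid))
cv = 5
print(len(combos))       # 18 = 3*4 + 1*2*3
print(len(combos) * cv)  # 90 training runs


def fake_evaluate(params):
    """Hypothetical score used only to exercise grid_search (not a real CV result)."""
    return -abs(params['n_estimators'] - 30) - abs(params['max_features'] - 6)


print(grid_search(param_grid, fake_evaluate))
```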

Print the best parameters:

grid_search.best_params_

In the best parameters, 30 lies at the edge of the values tried for n_estimators, so it is worth trying larger values, which may yield a better model. Trying values around 6 for max_features may also give a better model.
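The "best value sits at the edge of the grid" check can be automated. A small sketch, assuming the best parameters came from the first dict of the grid above; the best_params values are illustrative, matching the text's description rather than a reproduced output, and params_at_edge is a hypothetical helper.

```python
def params_at_edge(best_params, sub_grid):
    """Return names of numeric parameters whose best value is the smallest or largest tried."""
    at_edge = []
    for name, value in best_params.items():
        candidates = sub_grid.get(name, [])
        # Skip booleans (bool is a subclass of int) and non-numeric options
        numeric = [c for c in candidates if isinstance(c, (int, float)) and not isinstance(c, bool)]
        if len(numeric) > 1 and value in (min(numeric), max(numeric)):
            at_edge.append(name)
    return at_edge


sub_grid = {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}
best_params = {'n_estimators': 30, 'max_features': 6}  # illustrative values
print(params_at_edge(best_params, sub_grid))  # ['n_estimators']
```

Any parameter this flags is a candidate for widening its range in the next search round.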

Print the model with the best parameters:

grid_search.best_estimator_

You can also look at the cross-validation result for each combination:

cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)  # mean RMSE for this combination

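Randomized Search

When the grid search space is too large for the available computing power, you can instead give each hyperparameter a range and let randomized search pick parameter values at random within those ranges. The advantage of randomized search is that it can explore a much larger space, and the number of iterations n_iter lets you set how many parameter combinations are evaluated to match your machine's computing budget. Here is an example of randomized search: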

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
param_ran = {'n_estimators': range(30, 50), 'max_features': range(3, 8)}
forest_reg = RandomForestRegressor()
random_search = RandomizedSearchCV(forest_reg, param_ran, cv=5,
                                   scoring='neg_mean_squared_error', n_iter=10)
random_search.fit(train_housing_prepared, train_housing_labels)
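Here is a pure-Python sketch of the sampling step randomized search performs with the ranges used above: draw n_iter distinct combinations at random instead of trying all 20 x 5 = 100. Each sampled combination would then be cross-validated just as in grid search. This does not reproduce Scikit-Learn's own sampler; sample_params is a hypothetical helper and the seed is an arbitrary choice for reproducibility.

```python
import itertools
import random

param_ran = {'n_estimators': range(30, 50), 'max_features': range(3, 8)}


def sample_params(space, n_iter, seed=None):
    """Draw up to n_iter distinct combinations uniformly from a dict of finite ranges/lists."""
    keys = sorted(space)
    all_combos = list(itertools.product(*(space[k] for k in keys)))
    rng = random.Random(seed)
    picked = rng.sample(all_combos, k=min(n_iter, len(all_combos)))
    return [dict(zip(keys, values)) for values in picked]


candidates = sample_params(param_ran, n_iter=10, seed=0)
print(len(candidates))  # 10
for params in candidates:
    print(params)
```

Listing every combination is fine for a space of 100; for very large or continuous spaces you would instead draw each parameter independently, so the cost depends only on n_iter.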

Analyzing the feature importances of the best model

Suppose tuning has given you the best model. You can then check how much each feature contributes to the predictions and, based on that, drop some unnecessary features.

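# num_attribs and encoder are assumed to come from the data-preparation steps earlier in this series
feature_importances = grid_search.best_estimator_.feature_importances_
extra_attribs = ["rooms_per_hhold", "pop_per_hhold", "bedrooms_per_room"]
cat_one_hot_attribs = list(encoder.classes_)  # one-hot column names for ocean_proximity
attributes = num_attribs + extra_attribs + cat_one_hot_attribs
sorted(zip(feature_importances, attributes), reverse=True)  # most important first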

Only one of the one-hot ocean_proximity features turns out to be useful; the others contribute almost nothing, so you could consider removing them.
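Sorting importances and keeping only the features above a cut-off can be sketched in plain Python. The feature names and importance values below are hypothetical (not the article's output), the 0.05 threshold is an arbitrary choice for illustration, and select_features is our own helper.

```python
def select_features(importances, names, threshold):
    """Sort (importance, name) pairs descending and keep names whose importance >= threshold."""
    ranked = sorted(zip(importances, names), reverse=True)
    kept = [name for importance, name in ranked if importance >= threshold]
    return ranked, kept


names = ["median_income", "INLAND", "pop_per_hhold", "NEAR BAY", "ISLAND"]
importances = [0.40, 0.15, 0.10, 0.01, 0.001]  # hypothetical values
ranked, kept = select_features(importances, names, threshold=0.05)
print(kept)  # ['median_income', 'INLAND', 'pop_per_hhold']
```

After dropping features, the model should be retrained and cross-validated again to confirm the score did not get worse.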

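Evaluating on the test set

After all this work you finally have a final model; what remains is to check its generalization ability and accuracy on the test set. The steps on the test set are basically the same as on the training set, with one difference: there is no fit(), only transform(). The test set is not used to train anything, so the preprocessing must not be refitted on it, which is why fit_transform() becomes transform().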

final_model = grid_search.best_estimator_
X_test = strat_test_set.drop("median_house_value", axis=1)
y_test = strat_test_set["median_house_value"].copy()
X_test_prepared = full_pipeline.transform(X_test)
final_predictions = final_model.predict(X_test_prepared)
final_mse = mean_squared_error(y_test, final_predictions)
final_rmse = np.sqrt(final_mse)

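The result is close to the cross-validation result, which shows that the performance estimated through cross-validation carries over to new data.

Note: the values used on the test set for filling in missing values, standardization, and so on (medians, means, etc.) must all be computed on the training set, not on the test set, because all data must be scaled in exactly the same way.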
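The rule that test-set preprocessing reuses training-set statistics is exactly what separating fit() from transform() enforces. A minimal pure-Python sketch of a median imputer plus standardizer for one numeric column; the class name TrainStatsScaler is hypothetical and mirrors the fit/transform idea rather than any Scikit-Learn class.

```python
import statistics


class TrainStatsScaler:
    """Fill missing values (None) with the training median, then standardize with training mean/std."""

    def fit(self, column):
        present = [v for v in column if v is not None]
        self.median_ = statistics.median(present)
        filled = [v if v is not None else self.median_ for v in column]
        self.mean_ = statistics.mean(filled)
        self.std_ = statistics.pstdev(filled) or 1.0  # guard against a constant column
        return self

    def transform(self, column):
        filled = [v if v is not None else self.median_ for v in column]
        return [(v - self.mean_) / self.std_ for v in filled]

    def fit_transform(self, column):
        return self.fit(column).transform(column)


train_col = [1.0, 2.0, None, 4.0, 5.0]
test_col = [None, 3.0, 7.0]

scaler = TrainStatsScaler()
train_scaled = scaler.fit_transform(train_col)  # statistics are learned here only
test_scaled = scaler.transform(test_col)        # reuses the training statistics
print(scaler.median_, scaler.mean_)
print(test_scaled)
```

The missing test value is filled with the training median (3.0), not a median computed from the test column, and the 7.0 is scaled with the training mean and standard deviation.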

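Finally, you can also analyze what the model has learned, what it fails to capture, what assumptions it makes, what its limitations are, and what conclusions can be drawn (for example, that median income is the feature that most affects the result).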

   