Editor's pick:
This article is by fujiabin. It uses the open-source machine learning library scikit-learn to practice classification on the iris dataset.
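Foreword
I will use two of scikit-learn's built-in algorithms, Decision Tree and kNN (k-nearest neighbors), and then try implementing kNN myself. I am still at the beginner stage of machine learning, so this article does not explain the theory behind the algorithms in detail; if you want the details, please look them up yourself.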
Code walkthrough
Loading the dataset
scikit-learn ships with many classic datasets, which is very convenient for practice. Using them is easy:
# Import the datasets module
from sklearn import datasets
# Load the dataset we need
iris = datasets.load_iris()
The result returned by load_iris has the following attributes:
feature_names - the four features: sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm)
data - the rows of data, four columns in total, each column corresponding to the entry at the same position in feature_names
target - the class of each row (that is, each row's label), with values in [0, 1, 2]
target_names - the names corresponding to the target values: ['setosa' 'versicolor' 'virginica']
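A quick way to check these attributes for yourself is to print them. This is only a small sketch; the variable first_label_name is introduced here for illustration and does not appear in the original code.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Names of the four feature columns
print(iris.feature_names)
# One row per flower, one column per feature
print(iris.data.shape)  # (150, 4)

# The distinct integer labels and the names they stand for
print(sorted(set(iris.target.tolist())))  # [0, 1, 2]
print(list(iris.target_names))

# Look up the class name of the first row via its integer label
first_label_name = iris.target_names[iris.target[0]]
print(first_label_name)  # setosa
```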
Splitting the data
Supervised learning can be summed up in one simple formula:
y = f(X)
In the terminology of the previous article: given X (features), use a method f (classifier) to obtain y (label).
Following this idea, I separate the iris data into:
# X = features
X = iris.data
# y = label
y = iris.target
So how do we use the data? There are only 150 rows, so to validate the algorithm the data has to be divided into two parts: training data and test data. Luckily, scikit-learn also provides a convenient method for this, train_test_split. I split the data so that 60% (90 rows) is used for training and 40% (60 rows) for testing:
from sklearn.model_selection import train_test_split
# test_size=.4 holds out 40% (60 rows) for testing and leaves 60% (90 rows) for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)
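To make the idea of a random 60/40 split concrete, here is a minimal standard-library sketch. It is not how scikit-learn implements train_test_split; the function simple_split, its parameters, and the stand-in rows are all assumptions made for illustration.

```python
import random

def simple_split(rows, labels, test_fraction, seed=None):
    """Shuffle the row indices, then cut them into a test part and a training part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda seq, idx: [seq[i] for i in idx]
    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Stand-in data with the same number of rows as iris (not the real measurements)
rows = [[i, i + 0.5] for i in range(150)]
labels = [i % 3 for i in range(150)]

X_tr, X_te, y_tr, y_te = simple_split(rows, labels, test_fraction=0.4, seed=0)
print(len(X_tr), len(X_te))  # 90 60
```

Every row ends up in exactly one of the two parts, and a different seed produces a different split.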
Built-in algorithm: Decision Tree
The previous article already used a decision tree; the code is simply:
# Decision tree classifier
from sklearn import tree
# Create the decision tree
my_classification = tree.DecisionTreeClassifier()
# Train
my_classification.fit(X_train, y_train)
# Predict
predictions = my_classification.predict(X_test)
How accurate is the model produced by the decision tree algorithm? scikit-learn's accuracy_score function answers that:
from sklearn.metrics import accuracy_score
# Compute the prediction accuracy
print(accuracy_score(y_test, predictions))
Because train_test_split splits the data randomly (unless a fixed random_state is passed), the resulting accuracy is not a fixed value.
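The accuracy reported here is the fraction of test labels that were predicted correctly. A minimal hand-written sketch of that calculation, where the function accuracy and the sample labels are made up for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]  # one mistake, at position 3
print(accuracy(y_true, y_pred))  # 0.8
```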
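Built-in algorithm: kNN (k-nearest neighbors)
kNN classifies a sample by looking at its k nearest neighbors. It is one of the simplest classification algorithms, but the drawback is just as obvious: it has to loop over all the training samples and compute the distance from the test sample to each, so it runs relatively slowly.
When using kNN, k is best chosen as an odd number: an even k can leave a sample with no unique class, because the votes for different classes can be exactly equal. (With three classes, as in iris, an odd k makes ties less likely but cannot rule them out.)
After splitting the data as above, just replace the decision tree code with: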
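The tie problem is easy to see with a small vote count. The following sketch uses a hypothetical helper vote and made-up neighbor labels; it only illustrates the counting and is not part of scikit-learn.

```python
from collections import Counter

def vote(neighbor_labels):
    """Return (winning labels, vote count); more than one winner means a tie."""
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    winners = sorted(label for label, c in counts.items() if c == top)
    return winners, top

print(vote([0, 0, 1, 1]))  # k=4: ([0, 1], 2), a tie
print(vote([0, 0, 1]))     # k=3: ([0], 2), a clear winner
print(vote([0, 1, 2]))     # k=3, three classes: ([0, 1, 2], 1), still a tie
```

With only two classes an odd k always produces a single winner; with three classes, as the last line shows, a tie remains possible.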
# k-nearest neighbors classifier
from sklearn.neighbors import KNeighborsClassifier
# Create the kNN classifier
my_classification2 = KNeighborsClassifier(n_neighbors=5)
# Train
my_classification2.fit(X_train, y_train)
# Predict
predictions2 = my_classification2.predict(X_test)
# Compute the prediction accuracy
print(accuracy_score(y_test, predictions2))
Because train_test_split splits the data randomly, the resulting accuracy is not a fixed value. And because the algorithms differ, the accuracy can differ from the decision tree's even on the same data.
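Implementing kNN myself
The basic idea is to keep the code from the built-in kNN example above and reimplement KNeighborsClassifier, calling it MyKNN. Besides the initializer, it needs the two methods fit and predict, with the same signatures as the original, so the basic structure of the MyKNN class is: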
class MyKNN:
    def __init__(self, n_neighbors=5):
        pass
    def fit(self, X_train, y_train):
        pass
    def predict(self, X_test):
        pass
Implementing __init__
The initializer only needs to set up a few attributes for later use:
def __init__(self, n_neighbors=5):
    self.n_neighbors = n_neighbors
    self.X_train = None
    self.y_train = None
Implementing fit
I keep this method simple. The original fit method takes the two parameters X and y, so I keep that signature, which means no other code has to change:
def fit(self, X_train, y_train):
    self.X_train = X_train
    self.y_train = y_train
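Implementing predict
In this method I need to loop over the data and compute the distance between each test row and the training rows. The distance between two points can be computed with the Euclidean formula, so I first define a module-level function my_euclidean: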
from scipy.spatial import distance

def my_euclidean(a, b):
    return distance.euclidean(a, b)
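For reference, the Euclidean distance is the square root of the sum of squared coordinate differences. A standard-library sketch of the same calculation, where the function name euclidean is chosen here for illustration:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean([0, 0], [3, 4]))  # 5.0
```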
For the current test row I need to find the K training rows at the smallest distance, then see which class is most common among those K rows; the test row is assumed to belong to that class (the most probable one). So I first define a private method __closest:
def __closest(self, row):
    # Needs numpy imported at module level: import numpy as np
    all_labels = []
    for i in range(0, len(self.X_train)):
        dist = my_euclidean(row, self.X_train[i])
        # Keep the k nearest neighbors as a list of (distance, index) tuples
        all_labels = self.__append_neighbors(all_labels, (dist, i))
    # Map the k nearest neighbors to their labels
    nearest_ones = np.array([self.y_train[idx] for val, idx in all_labels])
    # np.unique returns the distinct labels, the index of each label's first occurrence, and each label's count
    # Example: elements = [1, 2], elements_index = [3, 0], elements_count = [1, 4] means:
    #   elements = [1, 2]: two kinds of labels, 1 and 2, occur
    #   elements_index = [3, 0]: 1 first appears at index 3, 2 first appears at index 0
    #   elements_count = [1, 4]: 1 occurs once, 2 occurs four times
    elements, elements_index, elements_count = np.unique(
        nearest_ones, return_counts=True, return_index=True)
    # Return the most frequent label
    return elements[list(elements_count).index(max(elements_count))]
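The example described in the comments can be reproduced directly. In this small sketch the input array is made up to match those numbers, and np.argmax is used as an equivalent way to pick the most frequent label.

```python
import numpy as np

# Labels of five hypothetical nearest neighbors
nearest_ones = np.array([2, 2, 2, 1, 2])

elements, elements_index, elements_count = np.unique(
    nearest_ones, return_index=True, return_counts=True)

print(elements)        # [1 2]
print(elements_index)  # [3 0]
print(elements_count)  # [1 4]

# The label with the highest count wins the vote
winner = elements[np.argmax(elements_count)]
print(winner)  # 2
```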
To improve performance, I defined an __append_neighbors method. It adds the current (distance, index) tuple to the list, sorts the list in ascending order, and keeps only the first k entries, which is easy to express in Python:
def __append_neighbors(self, arr, item):
    if len(arr) <= self.n_neighbors:
        arr.append(item)
    return sorted(arr, key=lambda tup: tup[0])[:self.n_neighbors]
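The article does not show predict itself. Below is a minimal sketch of how the pieces might fit together, assuming predict simply returns one label per test row. SketchKNN and _closest are hypothetical names, the toy data is made up, and heapq.nsmallest and Counter are swapped in for __append_neighbors and np.unique so that the example needs only the standard library.

```python
import heapq
import math
from collections import Counter

class SketchKNN:
    """Hypothetical stand-in for MyKNN; not the author's full code."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors
        self.X_train = None
        self.y_train = None

    def fit(self, X_train, y_train):
        # kNN has no training step beyond remembering the data
        self.X_train = X_train
        self.y_train = y_train

    def _closest(self, row):
        # (distance, index) pairs for every training row
        pairs = ((math.sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), i)
                 for i, x in enumerate(self.X_train))
        # Keep only the k pairs with the smallest distance
        nearest = heapq.nsmallest(self.n_neighbors, pairs)
        # Majority vote among the labels of those k neighbors
        votes = Counter(self.y_train[i] for _, i in nearest)
        return votes.most_common(1)[0][0]

    def predict(self, X_test):
        # Assumed behavior: one predicted label per test row
        return [self._closest(row) for row in X_test]

# Toy data: two well-separated clusters
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = [0, 0, 0, 1, 1, 1]

model = SketchKNN(n_neighbors=3)
model.fit(X_train, y_train)
print(model.predict([[1.1, 1.0], [5.1, 5.0]]))  # [0, 1]
```

Because fit and predict keep the same signatures as KNeighborsClassifier, a class of this shape can be dropped into the earlier snippet in place of the built-in classifier.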
Afterword
A few short lines of code were enough to implement my own kNN algorithm; on my machine its accuracy comes out above 95%.
The complete code is available on my GitHub.