Editor's recommendation:
This article comes from segmentfault. It explains in detail how to build a machine learning classifier in Python, along with related background.
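Introduction
Machine learning is a research field in computer science, artificial intelligence, and statistics. Its focus is on training algorithms to learn patterns and make predictions from data. Machine learning is especially valuable because it lets us use computers to automate decision-making processes.
In this tutorial, you will implement a simple machine learning algorithm in Python using Scikit-learn, a machine learning tool for Python. You will use a Naive Bayes (NB) classifier together with a database of breast cancer tumor information to predict whether a tumor is malignant or benign.
By the end of this tutorial, you will know how to build your own machine learning model in Python.
Prerequisites
To complete this tutorial, you will need:
A local Python 3 programming environment.
Jupyter Notebook installed in a virtualenv. Jupyter Notebooks are very useful when running machine learning experiments. You can run short blocks of code and see the results quickly, which makes it easy to test and debug your code.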
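Step 1 - Importing Scikit-learn
Let's begin by installing the Python module Scikit-learn, one of the best and most thoroughly documented machine learning libraries for Python.
To begin the coding project, activate your Python 3 programming environment: make sure you are in the directory where the environment is located, then run its activation command.
With the programming environment activated, check whether the Scikit-learn module is already installed: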
(my_env) $ python -c "import sklearn"
If sklearn is installed, this command completes with no error. If it is not installed, you will see the following error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'sklearn'
The error message indicates that sklearn is not installed, so download the library using pip:
(my_env) $ pip install scikit-learn[alldeps]
Once the installation completes, launch Jupyter Notebook:
(my_env) $ jupyter notebook
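In Jupyter, create a new Python Notebook called ML Tutorial. In the first cell of the Notebook, import the sklearn module with the statement import sklearn.
Your Notebook should look like the following figure:
[Figure: Notebook]
Now that sklearn is imported in the Notebook, we can begin working with the dataset for our machine learning model.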
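The installation check from this step can also be run from inside the Notebook. The following is a minimal sketch using only the standard library; the helper name is_installed is an illustrative assumption, not part of Scikit-learn.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("sklearn"):
    import sklearn
    # __version__ is the installed Scikit-learn version string
    print("scikit-learn version:", sklearn.__version__)
else:
    print("scikit-learn is not installed; install it with pip first")
```

Checking with find_spec avoids the ImportError traceback shown earlier and reports the result as a plain message instead.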
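Step 2 - Importing Scikit-learn's Dataset
The dataset we will use in this tutorial is the Breast Cancer Wisconsin Diagnostic Database. It contains various information about breast cancer tumors, together with classification labels of malignant or benign. The dataset has 569 instances, one per tumor, and includes information on 30 attributes, or features, such as the tumor's radius, texture, smoothness, and area.
Using this dataset, we will build a machine learning model that uses tumor information to predict whether a tumor is malignant or benign.
Scikit-learn ships with several datasets that can be loaded into Python, and the dataset we want is among them. Import and load the dataset: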
...
from sklearn.datasets import load_breast_cancer
# Load dataset
data = load_breast_cancer()
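The data variable holds a Python object that works like a dictionary. The important keys are the classification label names (target_names), the actual labels (target), the attribute/feature names (feature_names), and the attributes themselves (data).
Attributes are a critical part of any classifier. They capture important characteristics of the data. Since the label we are trying to predict is malignant versus benign tumor, potentially useful attributes include the tumor's size, radius, and texture.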
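To see the dictionary-like structure described above, the keys and array shapes can be printed directly. This is a small sketch that assumes Scikit-learn is installed; it reloads the dataset so that it runs on its own.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# The returned object supports dictionary-style access
print(sorted(data.keys()))

# 569 instances (rows) and 30 features (columns)
print(data['data'].shape)

# One label per instance
print(data['target'].shape)

# The two class names and the number of feature names
print(data['target_names'])
print(len(data['feature_names']))
```

The shapes should agree with the description of the dataset: 569 tumors with 30 features each.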
Create a new variable for each important set of information and assign the data:
...
# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
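We now have an array for each set of information. To get a better understanding of the dataset, let's look at the data by printing the class labels, the label of the first data instance, the first feature name, and the feature values of the first data instance: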
...
# Look at our data
print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])
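If you run the code, you will see the following results:
[Figure: output]
As the figure shows, the class names are malignant and benign, which are mapped to the binary values 0 and 1, where 0 represents malignant tumors and 1 represents benign tumors. Our first data instance is therefore a malignant tumor, and its mean radius is 1.79900000e+01 (that is, 17.99).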
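The mapping between numeric labels and class names can be checked directly, along with how many tumors fall into each class. The sketch below assumes NumPy and Scikit-learn are installed and reloads the dataset so it is self-contained.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']

# Count how many instances carry each numeric label (index 0, then index 1)
counts = np.bincount(labels)
for idx, name in enumerate(label_names):
    print(idx, name, counts[idx])

# Translate the numeric label of the first instance into its class name
first_label_name = label_names[labels[0]]
print(first_label_name)
```

The counts also show that the classes are not perfectly balanced, which is worth keeping in mind when interpreting accuracy later.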
Now that the data is loaded, we can use it to build our machine learning classifier.
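Step 3 - Organizing Data into Sets
To evaluate how well a classifier performs, you should always test the model on unseen data. Therefore, before building a model, split your data into two parts: a training set and a test set.
You use the training set to train and evaluate the model during the development stage. You then use the trained model to make predictions on the unseen test set. This approach gives you a sense of the model's performance and robustness.
Fortunately, sklearn has a function called train_test_split() that divides your data into these sets. Import the function and then use it to split the data: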
...
from sklearn.model_selection import train_test_split
# Split our data
train, test, train_labels, test_labels = train_test_split(features,
                                                          labels,
                                                          test_size=0.33,
                                                          random_state=42)
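The function randomly splits the data according to the test_size parameter. In this example, we now have a test set (test) that represents 33% of the original dataset. The remaining data (train) makes up the training set. We also have the corresponding labels for the train and test variables, namely train_labels and test_labels. Setting random_state to a fixed value makes the split reproducible.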
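The effect of test_size can be confirmed by checking the sizes of the resulting sets. This is a minimal sketch under the same settings as the tutorial (test_size=0.33, random_state=42); it reloads the data so it can run on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
features = data['data']
labels = data['target']

train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42)

# With 569 instances and test_size=0.33, the test set is rounded up
# to 188 instances and the remaining 381 go to the training set.
print(train.shape, test.shape)
print(len(train_labels), len(test_labels))
```

Every instance ends up in exactly one of the two sets, so the two sizes add up to 569.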
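We can now move on to training our first model.
Step 4 - Building and Evaluating the Model
There are many models for machine learning, and each has its own strengths and weaknesses. In this tutorial, we will focus on a simple algorithm that usually performs well in binary classification tasks, namely Naive Bayes (NB).
First, import the GaussianNB class from the sklearn.naive_bayes module. Then initialize the model by calling GaussianNB(), and train it by fitting it to the data with gnb.fit():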
...
from sklearn.naive_bayes import GaussianNB
# Initialize our classifier
gnb = GaussianNB()
# Train our classifier
model = gnb.fit(train, train_labels)
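To get a feel for what fitting does, the learned parameters can be inspected after training. Gaussian Naive Bayes estimates a prior probability for each class and a per-class mean (and variance) for each feature. The sketch below prints the priors and the per-class means; it assumes the same split settings as above and is only an illustration, not a required tutorial step.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)

# fit() returns the fitted estimator itself
gnb = GaussianNB().fit(train, train_labels)

# Class labels seen during training and their estimated prior probabilities
print(gnb.classes_)
print(gnb.class_prior_)

# theta_ holds one row of feature means per class: shape (2, 30)
print(gnb.theta_.shape)

# Mean of the 'mean radius' feature for each class (malignant, benign)
mean_radius_idx = list(data['feature_names']).index('mean radius')
print(gnb.theta_[:, mean_radius_idx])
```

Comparing the two per-class means for a feature gives a rough idea of how strongly that feature separates malignant from benign tumors.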
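After training the model, we can use it to make predictions on the test set with the predict() function. The predict() function returns an array containing a prediction for each data instance in the test set. We can then print the predictions to get a sense of what the model decided.
Call predict() on test and print the results: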
...
# Make predictions
preds = gnb.predict(test)
print(preds)
ÔËÐдúÂ룬Äú½«¿´µ½ÒÔϽá¹û£º

Ô¤²âÊä³ö½á¹û
ÕýÈçÄúÔÚJupyter NotebookÊä³öÖп´µ½µÄ£¬¸Ãpredict()º¯Êý·µ»ØÁËÒ»¸ö0sºÍ1s
Êý×飬ËüÃÇ´ú±íÁËÎÒÃǶÔÖ×ÁöÀàµÄÔ¤²âÖµ£¨¶ñÐÔÓëÁ¼ÐÔ£©¡£
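Besides the hard 0/1 predictions, the classifier can also report how confident it is in each class. The sketch below uses predict_proba(), which GaussianNB provides, to show class probabilities next to the first few predictions; it rebuilds the model with the same settings so it runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)

gnb = GaussianNB().fit(train, train_labels)

preds = gnb.predict(test)        # hard class labels (0 or 1)
probs = gnb.predict_proba(test)  # one probability per class, per instance

# Show the predicted class name and class probabilities for five instances
for pred, prob in zip(preds[:5], probs[:5]):
    print(data['target_names'][pred], prob.round(3))
```

Each row of probabilities sums to 1, and the predicted label is the class with the larger probability.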
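Now that we have predictions, let's evaluate how well the classifier performs.
Step 5 - Evaluating the Model's Accuracy
Using the array of true class labels, we can evaluate the accuracy of the model's predictions by comparing the two arrays (test_labels vs. preds). We will use the sklearn function accuracy_score() to determine the accuracy of our machine learning classifier.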
...
from sklearn.metrics import accuracy_score
# Evaluate accuracy
print(accuracy_score(test_labels, preds))
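You will see the following results:
[Figure: accuracy result]
As you can see in the output, the NB classifier is 94.15% accurate. This means that the classifier correctly predicts whether a tumor is malignant or benign for 94.15% of the tumors in the test set. These results suggest that our feature set of 30 attributes is a good indicator of tumor class.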
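Accuracy is simply the fraction of predictions that match the true labels. The following sketch computes it by hand in plain Python and also breaks the errors down by type. The two label lists are made-up illustrative values, not results from the tumor dataset, and the helper names accuracy and confusion_counts are assumptions for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, positive=0):
    """Count true/false positives/negatives, treating `positive` as the class of interest."""
    counts = {'tp': 0, 'fp': 0, 'fn': 0, 'tn': 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts['tp' if t == positive else 'fp'] += 1
        else:
            counts['fn' if t == positive else 'tn'] += 1
    return counts


# Hypothetical labels: 0 = malignant, 1 = benign
y_true = [0, 1, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

print(accuracy(y_true, y_pred))          # 6 of 8 correct -> 0.75
print(confusion_counts(y_true, y_pred))  # malignant (0) is the positive class
```

For a medical task like this one, a single accuracy figure hides the difference between a missed malignant tumor (a false negative) and a false alarm, so looking at the error types separately can be informative.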
You have successfully built your first machine learning classifier. Let's reorganize the code by placing all import statements at the top of the Notebook or script. The final version of the code should look like this:
ML Tutorial
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
# Look at our data
print(label_names)
print('Class label = ', labels[0])
print(feature_names)
print(features[0])
# Split our data
train, test, train_labels, test_labels = train_test_split(features,
                                                          labels,
                                                          test_size=0.33,
                                                          random_state=42)
# Initialize our classifier
gnb = GaussianNB()
# Train our classifier
model = gnb.fit(train, train_labels)
# Make predictions
preds = gnb.predict(test)
print(preds)
# Evaluate accuracy
print(accuracy_score(test_labels, preds))
Now you can keep working with the code to see whether you can make the classifier perform even better. You could experiment with different subsets of features, or even try completely different algorithms.
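As one possible starting point for such experiments, the sketch below trains the same GaussianNB classifier on a subset of the features and compares its accuracy with the full feature set. Selecting the features whose names start with "mean " is an arbitrary choice made for illustration, and the helper name evaluate is an assumption; other subsets may do better or worse.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
features = data['data']
labels = data['target']
feature_names = data['feature_names']


def evaluate(columns):
    """Train GaussianNB on the given feature columns and return test accuracy."""
    train, test, train_labels, test_labels = train_test_split(
        features[:, columns], labels, test_size=0.33, random_state=42)
    gnb = GaussianNB().fit(train, train_labels)
    return accuracy_score(test_labels, gnb.predict(test))


# Column indices of all features, and of the "mean ..." features only
all_cols = list(range(features.shape[1]))
mean_cols = [i for i, name in enumerate(feature_names)
             if name.startswith('mean ')]

acc_all = evaluate(all_cols)
acc_mean = evaluate(mean_cols)
print('all 30 features:', acc_all)
print(len(mean_cols), 'mean features:', acc_mean)
```

Because both runs use the same random_state, they are evaluated on the same test instances, which keeps the comparison fair.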
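Conclusion
In this tutorial, you learned how to build a machine learning classifier in Python. You can now use Scikit-learn in Python to load data, organize data, and train, predict with, and evaluate machine learning classifiers. The steps in this tutorial should make it easier to work with your own data in Python.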