ÈçºÎʹÓÃScikit-learnʵÏÖÓÃÓÚ»úÆ÷ѧϰµÄÎı¾Êý¾Ý×¼±¸

 
Source: InfoQ | Published: 2017-11-10
 

ÔÚʹÓÃÎı¾Êý¾ÝÀ´´î½¨Ô¤²âÄ£ÐÍǰ£¬¶¼ÐèÒªÌØÊâµÄ×¼±¸¹¤×÷¡£

Îı¾Ê×ÏÈҪͨ¹ý½âÎöÀ´ÌáÈ¡µ¥´Ê£¬ÕâÒ»¹ý³Ì³ÆÎª´ÊÌõ»¯¡£È»ºóµ¥´ÊÐèÒª±àÂëΪÕûÊý»ò¸¡µãÖµ£¬×÷Ϊ»úÆ÷ѧϰËã·¨µÄÊäÈ룬³ÆÎªÌØÕ÷ÌáÈ¡£¨»òÁ¿»¯£©¡£

scikit-learnÌṩÁ˼òµ¥µÄ¹¤¾ß°ïÖúÎÒÃǶÔÄãµÄÎı¾Êý¾Ý½øÐдÊÌõ»¯ºÍÌØÕ÷ÌáÈ¡¡£

ÔÚÕâÆªÎÄÕÂÖУ¬Äã»áѧµ½ÔÚPythonÖÐÈçºÎʹÓÃscikit-learnʵÏÖÓÃÓÚ»úÆ÷ѧϰµÄÎı¾Êý¾Ý×¼±¸¡£

ÔÚ¶ÁÍêÕâÆªÎÄÕºó£¬Äã»áÁ˽⵽£º

1.ÈçºÎʹÓÃCountVectorizer½«Îı¾µÄת»¯³Éµ¥´ÊƵÊýÏòÁ¿¡£

2.ÈçºÎʹÓÃTfidfVectorizerÌáÈ¡Îı¾µÄµ¥´ÊÈ¨ÖØÏòÁ¿¡£

3.ÈçºÎʹÓÃHashingVectorizer½«Îı¾Ó³Éäµ½ÌØÕ÷Ë÷Òý¡£

ÈÃÎÒÃÇ¿ªÊ¼°É¡£

The Bag-of-Words Model

We cannot feed text directly to machine learning algorithms. Instead, we need to convert the text to numbers.

When we want to classify documents, each document is an "input" and the document's class label is the "output" our algorithm predicts. Algorithms only accept numeric vectors as input, so documents need to be converted into fixed-length vectors of numbers.

Machine learning has a simple and effective model for text documents called the Bag-of-Words model, or BoW for short.

The model is simple in that it throws away all of the order information in the words and focuses on the frequency with which words occur in a document.

This can be done by assigning each word a unique number. Any document we see can then be encoded as a fixed-length vector whose length is the size of the vocabulary of known words. The value at each position in the vector is the count or frequency of the corresponding word in the encoded document.
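To make the idea concrete, here is a minimal sketch in plain Python (no scikit-learn); the tiny vocabulary is an assumption chosen only for illustration:

# a minimal bag-of-words encoder: each known word gets a unique index,
# and a document becomes a fixed-length vector of counts
vocab = {"the": 0, "quick": 1, "brown": 2, "fox": 3}  # assumed toy vocabulary

def encode(document, vocab):
    vector = [0] * len(vocab)
    for word in document.lower().split():
        if word in vocab:              # words outside the vocabulary are ignored
            vector[vocab[word]] += 1
    return vector

print(encode("The quick brown fox jumped over the lazy fox", vocab))  # [2, 1, 1, 2]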

This is the bag-of-words model: we only care about encoding schemes that represent which words appear in a document, or how often they appear, without any information about order.

There are many extensions to this simple method, both for better clarifying what a "word" means and for defining how to encode each word in the vector.

scikit-learn provides 3 different schemes that we can use, and we will briefly look at each.

CountVectorizer: Counting Words

The CountVectorizer provides a simple way to tokenize a collection of text documents and build a vocabulary of known words, and also to encode new documents using that vocabulary.

You can use it as follows:

1. Create an instance of the CountVectorizer class.

2. Call the fit() function to learn a vocabulary from one or more documents.

3. Call the transform() function on one or more documents to encode each one as a vector.

An encoded vector is returned with the length of the entire vocabulary and a count of how many times each word appeared in the document.

Because these vectors contain many zeros, we call them sparse. Python provides an efficient way of handling such sparse vectors in the scipy.sparse package.

The vectors returned from a call to transform() are sparse vectors. You can convert them to numpy arrays, which are more intuitive and easier to understand, by calling the toarray() function.

Below is an example of using the CountVectorizer to tokenize, build a vocabulary, and then encode a document.

from sklearn.feature_extraction.text import CountVectorizer
# list of text documents
text = ["The quick brown fox jumped over the lazy dog."]
# create the transform
vectorizer = CountVectorizer()
# tokenize and build the vocabulary
vectorizer.fit(text)
# summarize
print(vectorizer.vocabulary_)
# encode the document
vector = vectorizer.transform(text)
# summarize the encoded vector
print(vector.shape)
print(type(vector))
print(vector.toarray())

From the example above, we can see exactly what was tokenized by accessing the vocabulary:

print(vectorizer.vocabulary_)

We can see that all words were made lowercase by default and that punctuation was ignored. These and other aspects of tokenizing are configurable, and I would encourage you to review all of the options in the API documentation.
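As one illustration, here is a sketch of a customized transform; the parameter values are illustrative choices rather than recommendations, although all of the arguments belong to the CountVectorizer API:

from sklearn.feature_extraction.text import CountVectorizer
# a sketch of a customized transform; the values are illustrative choices
vectorizer = CountVectorizer(
    lowercase=False,       # keep the original case
    stop_words='english',  # drop very common English words such as 'the'
    ngram_range=(1, 2),    # count single words and two-word phrases
    max_features=1000      # keep only the 1000 most frequent terms
)

The output discussed next comes from the default configuration used in the example above.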

Running the example first prints the vocabulary, then the shape of the encoded document. We can see that there are 8 words in the vocabulary, so the encoded vectors have a length of 8.

We can also see that the encoded vector is a sparse matrix. Finally, we can see the encoded vector as an array, showing a count of 1 for every word except the word at index 7, which occurs twice.

{'dog': 1, 'fox': 2, 'over': 5, 'brown': 0, 'quick': 6, 'the': 7, 'lazy': 4, 'jumped': 3}
(1, 8)
<class 'scipy.sparse.csr.csr_matrix'>
[[1 1 1 1 1 1 1 2]]

Importantly, the same vectorizer can be used on documents that contain words not included in the vocabulary. These words are ignored, and no count appears for them in the resulting vector.

Below is an example of using the vectorizer above to encode a document that contains one word from the vocabulary and one word that is not.

# encode another document
text2 = ["the puppy"]
vector = vectorizer.transform(text2)
print(vector.toarray())

ÔËÐÐʾÀý£¬ÏÔʾ³ö±àÂëÏ¡ÊèÏòÁ¿µÄ¾ØÕóÐÎʽ£¬¿ÉÒÔ¿´³ö´Ê»ã±íÖеĵ¥´Ê³öÏÖÁË1´Î£¬¶øÃ»ÔÚ´Ê»ã±íÖеĵ¥´ÊÍêÈ«±»ºöÂÔÁË¡£

[[0 0 0 0 0 0 0 1]]

The encoded vectors can then be used directly with a machine learning algorithm.
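For instance, here is a minimal sketch of feeding the counts into a classifier; the documents and class labels are made-up placeholders, and LogisticRegression stands in for whatever model you actually use:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
# made-up documents and labels, for illustration only
docs = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]
labels = [1, 0, 1]  # hypothetical class labels, one per document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse count matrix, used directly
model = LogisticRegression()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["the lazy fox"])))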

TfidfVectorizer: Computing Word Weights

Counting word occurrences is a good starting point, but counts are a very basic feature.

One problem with simple counts is that some words, such as "the", appear many times, and their large counts are not very meaningful in the encoded vectors.

An alternative is to compute word weights, and by far the most popular method is TF-IDF. This acronym stands for Term Frequency-Inverse Document Frequency, and the score represents how important a word is to a document.

Term Frequency: how many times a given word appears within a document.

Inverse Document Frequency: the more frequently a word appears across documents, the lower its IDF value.

Putting the math aside, TF-IDF produces word weights that highlight the more interesting words, for example words that are frequent in one particular document but not frequent across all documents.

The TfidfVectorizer will tokenize documents, learn the vocabulary and the inverse document frequency weights, and encode new documents. Alternatively, if you have already learned a vocabulary with a CountVectorizer, you can use it with a TfidfTransformer to compute the inverse document frequencies and start encoding documents.
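Here is a minimal sketch of that alternative route, assuming counts produced as in the previous section:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]
# raw term counts from a CountVectorizer, as in the previous section
counts = CountVectorizer().fit_transform(text)
# reweight the counts by inverse document frequency
tfidf = TfidfTransformer().fit_transform(counts)
print(tfidf.toarray()[0])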

ͬÑùµÄ£¬´´½¨£¨create£©¡¢ÄâºÏ£¨fit£©ÒÔ¼°±ä»»£¨transform£©º¯ÊýµÄµ÷Óö¼ÓëCountVectorizerÏàͬ¡£

ÏÂÃæÊÇÒ»¸öʹÓÃTfidfVectorizerÀ´Ñ§Ï°´Ê»ã±íºÍ3ƪСÎĵµµÄÄæÎĵµÆµÂʵÄʾÀý£¬²¢¶ÔÆäÖÐһƪÎĵµ½øÐбàÂë¡£

from sklearn.feature_extraction.text import TfidfVectorizer
# list of text documents
text = ["The quick brown fox jumped over the lazy dog.",
"The dog.",
"The fox"]
# create the transform
vectorizer = TfidfVectorizer()
# tokenize and build the vocabulary
vectorizer.fit(text)
# summarize
print(vectorizer.vocabulary_)
print(vectorizer.idf_)
# encode the document
vector = vectorizer.transform([text[0]])
# summarize the encoded vector
print(vector.shape)
print(vector.toarray())

ÉÏÀýÖУ¬ÎÒÃÇ´ÓÎĵµÖÐѧµ½Á˺¬ÓÐ8¸öµ¥´ÊµÄ´Ê»ã±í£¬ÔÚÊä³öÏòÁ¿ÖУ¬Ã¿¸öµ¥´Ê¶¼·ÖÅäÁËÒ»¸öΨһµÄÕûÊýË÷Òý¡£

ÎÒÃǼÆËãÁË´Ê»ã±íÖÐÿ¸öµ¥´ÊµÄÄæÎĵµÆµÂÊ£¬¸ø¹Û²âµ½µÄ×î³£³öÏֵĵ¥´Ê¡°the¡±£¨Ë÷ÒýºÅΪ7£©·ÖÅäÁË×îµÍµÄ·ÖÊý1.0¡£

×îÖÕ£¬µÚÒ»¸öÎĵµ±»±àÂë³ÉÒ»¸ö8¸öÔªËØµÄÏ¡Êè¾ØÕó£¬ÎÒÃÇ¿ÉÒԲ鿴ÿ¸öµ¥´ÊµÄ×îÖÕÈ¨ÖØ·ÖÊý£¬¿ÉÒÔ¿´µ½¡°the¡±¡¢¡°fox¡±£¬ÒÔ¼°¡°dog¡±µÄÖµÓë´Ê»ã±íÖÐÆäËûµ¥´ÊµÄÖµ²»Í¬¡£

{'fox': 2, 'lazy': 4, 'dog': 1, 'quick': 6, 'the': 7, 'over': 5, 'brown': 0, 'jumped': 3}
[ 1.69314718 1.28768207 1.28768207 1.69314718 1.69314718 1.69314718
1.69314718 1. ]
(1, 8)
[[ 0.36388646 0.27674503 0.27674503 0.36388646 0.36388646 0.36388646
0.36388646 0.42983441]]

ÕâЩ·ÖÊý±»¹éÒ»»¯Îª0µ½1Ö®¼äµÄÖµ£¬±àÂëµÄÎĵµÏòÁ¿¿ÉÒÔÖ±½ÓÓÃÓÚ´ó¶àÊý»úÆ÷ѧϰËã·¨¡£

HashingVectorizer: Hashing Text into Vectors

Word counts and word weights are very useful, but both methods run into limits when the vocabulary becomes very large.

A very large vocabulary in turn requires huge vectors to encode documents, which places large demands on memory and slows down algorithms.

A clever workaround is to use a one-way hash to convert words to integers. The advantage is that no vocabulary is required, and you can choose a fixed-length vector of any size. The downside is that hashing is one-way, so there is no way to convert the encoding back into words (which may not matter for many supervised learning tasks).

The HashingVectorizer class implements this approach, so you can use it to consistently hash words, then tokenize and encode documents as needed.

Below is an example of encoding a single document with the HashingVectorizer.

An arbitrary fixed vector length of 20 was chosen. This corresponds to the range of the hash function, and small values (such as 20) may lead to hash collisions. Recalling computer science classes, there are heuristics for choosing the hash length based on the estimated vocabulary size and the acceptable probability of collision.
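As a rough back-of-envelope (assuming a uniform hash; this is not part of scikit-learn), the expected number of collisions for a vocabulary of size V hashed into N buckets is V minus the expected number of occupied buckets:

def expected_collisions(vocab_size, n_features):
    # expected number of occupied buckets under uniform hashing;
    # collisions = words - occupied buckets
    occupied = n_features * (1 - (1 - 1 / n_features) ** vocab_size)
    return vocab_size - occupied

print(expected_collisions(8, 20))          # the toy example here: ~1.3 collisions
print(expected_collisions(100000, 2**20))  # a more realistic setting: ~4600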

ҪעÒâÕâÖÖÁ¿»¯·½·¨²»ÒªÇóµ÷Óú¯ÊýÀ´¶ÔѵÁ·Êý¾ÝÎļþ½øÐÐÄâºÏ¡£Ïà·´£¬ÔÚʵÀý»¯Ö®ºó£¬Ëü¿ÉÒÔÖ±½ÓÓÃÓÚ±àÂëÎĵµ¡£

from sklearn.feature_extraction.text import HashingVectorizer
# list of text documents
text = ["The quick brown fox jumped over the lazy dog."]
# create the transform
vectorizer = HashingVectorizer(n_features=20)
# encode the document
vector = vectorizer.transform(text)
# summarize the encoded vector
print(vector.shape)
print(vector.toarray())

ÔËÐиÃʾÀý´úÂë¿ÉÒÔ°ÑÑùÀýÎĵµ±àÂë³ÉÒ»¸öº¬ÓÐ20¸öÔªËØµÄÏ¡Êè¾ØÕó¡£

±àÂëÎĵµµÄÖµ¶ÔÓ¦ÓÚÕýÔò»¯µÄµ¥´Ê¼ÆÊý£¬Ä¬ÈÏÖµÔÚ-1µ½1Ö®¼ä£¬µ«ÊÇ¿ÉÒÔÐÞ¸ÄĬÈÏÉèÖã¬È»ºóÉèÖóÉÕûÊý¼ÆÊýÖµ¡£

(1, 20)
[[ 0. 0. 0. 0. 0. 0.33333333
0. -0.33333333 0.33333333 0. 0. 0.33333333
0. 0. 0. -0.33333333 0. 0.
-0.66666667 0. ]]
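As mentioned above, here is a sketch of changing the defaults to get plain integer counts (assuming a recent scikit-learn, where the alternate_sign parameter is available):

from sklearn.feature_extraction.text import HashingVectorizer
# norm=None disables normalization; alternate_sign=False keeps counts non-negative
vectorizer = HashingVectorizer(n_features=20, norm=None, alternate_sign=False)
vector = vectorizer.transform(["The quick brown fox jumped over the lazy dog."])
print(vector.toarray())  # plain integer counts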

Éî¶ÈÔĶÁ

ÕâÒ»½ÚÎÒÃÇΪ´ó¼ÒÌṩÁËһЩ¹ØÓÚÕâÆªÎÄÕµÄÉî¶ÈÔĶÁ²ÄÁÏ¡£

×ÔÈ»ÓïÑÔ´¦Àí

1.ά»ù°Ù¿Æ¡°´Ê´ü¡±£¨Bag-of-words£©Ä£ÐͽéÉÜ¡£

2.ά»ù°Ù¿Æ¡°´ÊÌõ»¯¡±£¨Tokenization£©½éÉÜ¡£

3.ά»ù°Ù¿Æ¡°TF-IDF¡±¡£

scikit-learn

1. Section 4.2, Feature extraction, scikit-learn User Guide.

2. scikit-learn Feature Extraction API.

3. scikit-learn tutorial: Working With Text Data.

Class APIs

1. CountVectorizer scikit-learn API

2. TfidfVectorizer scikit-learn API

3. TfidfTransformer scikit-learn API

4. HashingVectorizer scikit-learn API

×ܽá

ÔÚÕâÆª½Ì³ÌÖУ¬Äã»áѧϰµ½ÈçºÎÓÃscikit-learnÀ´×¼±¸ÓÃÓÚ»úÆ÷ѧϰµÄÎı¾Êý¾Ý¡£

 

   