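Editor's pick:
This article focuses on Bayesian classification, covering the naive Bayes model, the binary independence model, the multinomial model, and the hybrid model.
This article comes from cnblogs and was edited and recommended by Anna of 火龙果软件.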
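0 Introduction
Half a month ago I studied text classification. The goal of the experiment was to build a training-set model from the sentiment texts shown in Figure 1; Figure 2 annotates the training set. Files whose class label starts with 0 belong to the joy class, 1 to the anger class, 2 to the disgust class, and 3 to the low-spirits class. There are four training files, one per class. The final goal of this project is to construct a classifier from the training set and validate it on test data. Along the way this involves understanding and implementing Bayes' formula, preprocessing the text (the 0_simplifyweibo training file in Figure 1 is already preprocessed), using a word-segmentation tool, building the different Bayes models, and comparing the experimental results. The core idea comes down to two stages: (1) model training and (2) classification and prediction. The complete workflow is:
--> Preprocess the training text and construct the classifier. (That is, solve for the parameter values that Bayes' formula needs for text classification. It is fine if this is unclear for now; it is explained in detail below.)
--> Construct the prediction function
--> Preprocess the test data
--> Classify with the classifier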

1 Four model structures
Naive Bayes model (NM)
Binary independence model (BIM)
Multinomial model (MM)
Hybrid model (HM)
Hybrid model with new smoothing factor (HM&NSF)
2 Naive Bayes classifier
Overview of the idea
-- Formula: P(Category | Document) = (P(Document | Category) * P(Category)) / P(Document)
-- Naive Bayes classifier: P(c|d) ~= P(c)*P(d|c)
-- Training stage: for every w_k and c_i, estimate the conditional probability P(w_k|c_i) and the prior probability P(c_i)
-- Classification stage: compute the posterior probability and return the class that maximizes it
-- C(d) = argmax {P(c_i)*P(d|c_i)}
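Which of the four classes above does a new document d belong to? We can apply Bayes' formula, with each term now standing for a concrete object:
> P(Category | Document): the probability that the test document belongs to a given class
> P(Category): the probability that a document d drawn at random from the document space belongs to class c (number of documents in the class / total number of documents)
> P(Document | Category): the probability of document d given class c, estimated from word counts (occurrences of a word in the class's documents / total number of words in the class)
> P(Document): the probability of drawing document d at random from the document space (it is the same for every class, so it can be ignored; what remains is to maximize P(c)*P(d|c))
> C(d) = argmax {P(c_i)*P(d|c_i)}: compute the approximate Bayes score of every class and take the largest; the document is assigned to the class with the largest score.
In summary: building a classifier from the training set is essentially a matter of estimating the model parameters. These parameters are then used in the prediction method, and picking the class with the largest probability according to the formula completes the classification.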
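The argmax rule above can be sketched on toy numbers. This is a minimal illustration, not the article's implementation: the class name ArgmaxSketch, the two class labels, and all probabilities are made-up assumptions, and words without an estimate are simply skipped to keep the sketch short. Scores are summed in log space so that long documents do not underflow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration of C(d) = argmax P(c) * P(d|c); every number here is made up. */
final class ArgmaxSketch {

    /** Returns the class with the largest log P(c) + sum over words of log P(w|c). */
    static String classify(Map<String, Double> prior,
                           Map<String, Map<String, Double>> wordGivenClass,
                           String[] words) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : prior.entrySet()) {
            double score = Math.log(entry.getValue());
            Map<String, Double> probs = wordGivenClass.get(entry.getKey());
            for (String word : words) {
                Double p = probs.get(word);
                if (p != null) {          // assumption: unknown words are ignored in this sketch
                    score += Math.log(p);
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Builds a two-class toy model and classifies a three-word document. */
    static String demo() {
        Map<String, Double> prior = new LinkedHashMap<>();
        prior.put("joy", 0.5);
        prior.put("anger", 0.5);
        Map<String, Double> joy = new LinkedHashMap<>();
        joy.put("happy", 0.6);
        joy.put("bad", 0.1);
        Map<String, Double> anger = new LinkedHashMap<>();
        anger.put("happy", 0.1);
        anger.put("bad", 0.6);
        Map<String, Map<String, Double>> cond = new LinkedHashMap<>();
        cond.put("joy", joy);
        cond.put("anger", anger);
        return classify(prior, cond, new String[] {"happy", "happy", "bad"});
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For the toy document the unnormalized scores are 0.5*0.6*0.6*0.1 for joy and 0.5*0.1*0.1*0.6 for anger, so the sketch picks joy.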
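Formula derivation and analysis
Naive Bayes formula (assumption: given that document d belongs to class c, the words w in d are conditionally independent of one another; in practice they are not independent, so this is an approximation):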
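The formula that followed here (an image in the original post) is missing from this copy. Given the definitions listed next, it is presumably Bayes' rule for a class c and a document d, the equation the text later calls (1); that numbering is an assumption:

```latex
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)} \tag{1}
```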

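Formula analysis:
> P(d): the probability of drawing document d at random from the document space (the same for every class, so it can be ignored)
> P(c): the probability that a randomly drawn document d belongs to class c (number of documents in the class / total number of documents)
> P(d|c): the probability of document d given class c (occurrences of a word in the class's documents / total number of words in the class)
> Class set: c={c1,c2,.....,cn}
> Document vector: d={w1,w2,.....,wn}
> P(c|d): the probability that test document d belongs to class c (an estimated conditional probability: it is estimated from the training set under the independence assumption)
> MaxP(c|d): the largest probability that test document d belongs to some class c
Conditional probability: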
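The equation is missing here as well. Under the independence assumption, with d = {w1, ..., wn}, it presumably factorizes the document probability over its words; calling it (2) is an assumption based on the sentence that follows:

```latex
P(d \mid c) = P(w_1, w_2, \ldots, w_n \mid c) = \prod_{k=1}^{n} P(w_k \mid c) \tag{2}
```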

Substituting (2) into (1) gives the following, where p(d) is the same for every class c:
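The resulting equation is also missing from this copy. Substituting the word-level factorization of P(d|c) into Bayes' rule would give the following (a reconstruction, not the author's original image):

```latex
P(c \mid d) = \frac{P(c) \prod_{k=1}^{n} P(w_k \mid c)}{P(d)}
\;\propto\; P(c) \prod_{k=1}^{n} P(w_k \mid c)
```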

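Note: it is enough to maximize the numerator of the expression above. Its left factor follows directly from (number of documents in the class / total number of documents), and its right factor from the word counts of document d and class c. That covers the approach and the idea; the algorithm below builds on it.
Algorithm and implementation
Algorithm 1: naive Bayes algorithm for text classification
Training stage: for every w_k and c_i, estimate the conditional probability p(w_k|c_i) and the prior probability p(c_i).
Classification stage: compute the posterior probability and return the class that maximizes it.

Implementation: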
package com.naivebayes.bnc;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import jeasy.analysis.MMAnalyzer;

/**
 * Formula: P(Category | Document) = (P(Document | Category) * P(Category)) / P(Document)
 * Naive Bayes classifier: P(c|d) ~= P(c)*P(d|c)
 * Idea:
 *   Training stage: for every w_k and c_i, estimate P(w_k|c_i) and P(c_i)
 *   Classification stage: compute the posterior and return the class that maximizes it
 *   C(d) = argmax {P(c_i)*P(d|c_i)}
 * Assumption: given the target value, the attributes are conditionally independent, so the
 *   probability of observing a1,a2...an together is the product of the individual probabilities:
 *   P(a1,a2...an | Vj) = Πi P(ai | Vj)
 * Weakness: with many attributes or strongly correlated attributes, NBC classifies less
 *   effectively than a decision tree model.
 * Strength: decision trees struggle with missing data, overfit, and ignore correlations
 *   between attributes; NBC (naive Bayes classification) is suitable in those cases.
 * Comparison: NBC performs slightly better when attribute correlation is low; other
 *   algorithms also do well then, which follows from information entropy theory.
 * @author 白宁超
 */
public class NaiveBayesToClass {
    // Number of documents in each class
    public static Map<String, Integer> classDocnum = new HashMap<String, Integer>();
    // Total number of documents in the data set
    public static int classAlldocnum = 0;
    // Occurrences of each word in each class, keyed by classNo + "_" + word
    public static Map<String, Integer> classWordfru = new HashMap<String, Integer>();
    // Total number of words in each class
    public static Map<String, Integer> classAllwordnum = new HashMap<String, Integer>();

    /**
     * Naive Bayes text classifier, training stage.
     * Idea: the probability that document d belongs to class c is the probability that a
     * randomly drawn document belongs to c, times the share of the document's words among
     * all words of the class.
     * P(c|d) ~= P(c)*P(d|c)
     * P(c) = classDocnum / classAlldocnum
     * Parameters computed:
     *   classDocnum: number of documents in a class
     *   classAlldocnum: total number of documents in the data set
     *   classWordfru: frequency of a word in the documents of a class
     *   classAllwordnum: total number of words in a class
     * @param fileDirPath directory that holds the training files
     */
    public static void BayesModel(String fileDirPath) {
        try {
            File dir = new File(fileDirPath);
            if (dir.exists() && dir.isDirectory()) {
                File[] files = dir.listFiles(); // all training files
                for (File file : files) {
                    String classNo = file.getName().split("_")[0]; // class label from the file name
                    // Read the file as UTF-8; try-with-resources closes the whole reader chain
                    try (BufferedReader bufReader = new BufferedReader(
                            new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
                        String line;
                        while ((line = bufReader.readLine()) != null) {
                            // Each line is one document
                            classDocnum.merge(classNo, 1, Integer::sum);
                            classAlldocnum++;
                            for (String word : line.trim().split(" ")) {
                                if (word.isEmpty()) {
                                    continue; // skip empty tokens from blank lines or repeated spaces
                                }
                                // Total words in the class
                                classAllwordnum.merge(classNo, 1, Integer::sum);
                                // Occurrences of this word in the class
                                classWordfru.merge(classNo + "_" + word, 1, Integer::sum);
                            }
                        }
                    }
                }
            } else {
                System.out.println("Directory not found: " + fileDirPath);
            }
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }

    /**
     * Classifies a test text (prediction stage).
     * @param testText the test text
     * @return the predicted class label
     */
    public static String PredictReslut(String testText) {
        String PredictResult = "";
        testText = SplitWords(testText, " "); // Chinese word segmentation of the test document
        String[] words = testText.split(" ");
        // Best score so far. Scores are kept as logarithms, which are negative, so the
        // starting value must be negative infinity; 0 would beat every real score.
        double argmax = Double.NEGATIVE_INFINITY;
        for (String classNo : classDocnum.keySet()) {
            double prior = classDocnum.get(classNo) / (double) classAlldocnum; // prior probability
            // Smoothing denominator: total words in the class plus one
            double classcount = classAllwordnum.getOrDefault(classNo, 0) + 1;
            // Turn the product into a sum of logarithms [ln(a*b) = ln a + ln b]. The original
            // converted back with Math.exp, which underflows to 0 for long documents.
            double logScore = Math.log(prior);
            for (String word : words) {
                if (word.isEmpty()) {
                    continue;
                }
                // Same key layout as in BayesModel (the original used word + "_" + classNo
                // here, so no training word was ever found)
                String wordNo = classNo + "_" + word;
                if (classWordfru.containsKey(wordNo)) {
                    logScore += Math.log(classWordfru.get(wordNo) / classcount);
                } else {
                    logScore += Math.log(1 / classcount); // unseen word
                }
            }
            System.out.println("classNo: " + classNo);
            System.out.println("best log score so far: " + argmax);
            System.out.println("log score: " + logScore);
            if (logScore > argmax) {
                argmax = logScore;       // keep the largest score
                PredictResult = classNo; // and the class it belongs to
            }
        }
        System.out.println("********");
        System.out.println("[Naive Bayes final classification result:] " + PredictResult);
        return PredictResult;
    }

    /**
     * Chinese word segmentation of a string.
     * @param text the text to segment
     * @param splitToken separator placed between words, e.g. ","
     * @return the segmented text
     */
    public static String SplitWords(String text, String splitToken) {
        String result = text; // fall back to the unsegmented text if segmentation fails
        MMAnalyzer analyzer = new MMAnalyzer(); // 极易 (je-analysis) Chinese word segmenter
        try {
            result = analyzer.segment(text, splitToken);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }
}
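Run result:

Analysis of strengths and weaknesses
Assumption: given the target value, the attributes are conditionally independent of one another. In other words, given the target value of an instance, the probability of observing the combination a1,a2...an is exactly the product of the probabilities of the individual attributes:
P(a1,a2...an | Vj) = Πi P(ai | Vj).
Weakness: when there are many attributes or the attributes are strongly correlated, the NBC model classifies less effectively than a decision tree model.
Strength: decision tree models have drawbacks of their own, such as difficulty handling missing data, overfitting, and ignoring correlations between attributes in the data set; NBC (naive Bayes classification) is suitable in these cases.
Comparison: when the attributes are only weakly correlated, the NBC model performs slightly better. Other algorithms also perform well when attribute correlation is low, which follows from information entropy theory.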
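3 Binary independence model
Overview of the idea
The binary independence model, also called the multivariate Bernoulli model, is one of the most commonly used naive Bayes models. It represents a document as a binary vector: w=1 means the word occurs in the document and w=0 means it does not. Only the estimation of the conditional probabilities changes; everything else is the same as in naive Bayes. A smoothing factor, introduced later, avoids a zero denominator.
Classification model: the conditional probability of word w_k given class c_i
Conditional probability in the binary independence model (assumption: given class c_i, whether word w_k occurs in document d is independent of whether any other word w_j occurs):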
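The equation that followed here is missing from this copy. In the usual multivariate Bernoulli formulation it reads P(d|c_i) = Π_k P_ki^w_k * (1-P_ki)^(1-w_k), with the product running over all |V| vocabulary words, so absent words contribute a factor (1-P_ki). The sketch below evaluates that product in log space. The class name BimSketch, its method names, and the toy counts are illustrative assumptions, not part of the article's code; the estimate (n_ki + 1) / (n_i + 2) is the smoothed form the article introduces further down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch of the binary independence (multivariate Bernoulli) likelihood. */
final class BimSketch {

    /** Smoothed estimate of P(w_k = 1 | c_i): (n_ki + 1) / (n_i + 2). */
    static double smoothed(int docsWithWord, int classDocs) {
        return (docsWithWord + 1.0) / (classDocs + 2.0);
    }

    /**
     * log P(d | c_i): every vocabulary word contributes, log(p) if it occurs
     * in the document and log(1 - p) if it does not.
     */
    static double logLikelihood(String[] vocabulary, int[] docsWithWord,
                                int classDocs, Set<String> docWords) {
        double sum = 0.0;
        for (int k = 0; k < vocabulary.length; k++) {
            double p = smoothed(docsWithWord[k], classDocs);
            sum += Math.log(docWords.contains(vocabulary[k]) ? p : 1.0 - p);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] vocabulary = {"happy", "bad"};   // toy vocabulary
        int[] docsWithWord = {1, 0};              // toy document frequencies in one class
        Set<String> docWords = new HashSet<>(Arrays.asList("happy"));
        System.out.println(logLikelihood(vocabulary, docsWithWord, 2, docWords));
    }
}
```

With two documents in the class, "happy" gets p = 2/4 and the absent word "bad" contributes 1 - 1/4, so the toy likelihood is 0.5 * 0.75.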

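Formula analysis:
> A document is represented by a binary vector, d={w1,w2,...,w|V|}, where w_k is in {0,1}
> |V|: size of the vocabulary
> w_k=1: word w_k occurs in the document
> P_ki: P(w_k=1|c_i)
Document d can be viewed as |V| independent Bernoulli trials. For a given c_i, the conditional probability of d can be estimated with (3), where n=|V|, and the class of d is decided by formula (4), by substituting formula (6) into (2) and (4).

> From (7) to (8): log Π(A*B) = ΣlogA + ΣlogB
> (8) shows that although the model accounts for both the presence and the absence of words, only the words with nonzero w_k actually affect the classification.
Parameter estimation:
The parameters used by the model are all learned from the training data in the training stage, usually as maximum likelihood estimates (that is, (1) with the denominator p(d) dropped). Let the training document set be D={d1,d2,...,d|D|}.
The probability of class c_i is estimated by:
p(c_i) = n_i / |D|    (9)
> n_i: number of training documents in class c_i
If class c_i has no documents, i.e. n_i=0, then p(c_i)=0 and the final likelihood becomes 0. How can this be avoided?
This is where the smoothing factor comes in:
p(w_k|c_i) = (n_ki + 1) / (n_i + 2)    (10)
> n_i: number of training documents in class c_i
> n_ki: number of training documents that contain w_k and belong to class c_i
Algorithm and implementation:
Implementation: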
package com.bernouli.bnc;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import jeasy.analysis.MMAnalyzer;

public class BernouliBayesClass {
    // Number of documents in each class
    public static Map<String, Integer> classcountMap = new HashMap<String, Integer>();
    // Total number of documents in the training set
    public static Integer datacount = 0;
    // Number of documents of a class that contain a word, keyed by word + "_" + classNo
    // (words are de-duplicated per document; term frequency is ignored)
    public static Map<String, Integer> likelihoodMap = new HashMap<String, Integer>();
    // Vocabulary over all classes
    public static Set<String> vocabularySet = new HashSet<String>();

    /**
     * Builds the naive Bayes text classification model.
     * @param fileDirPath directory that holds the training files
     */
    public static void BayesModel(String fileDirPath) {
        try {
            File dir = new File(fileDirPath);
            if (dir.exists() && dir.isDirectory()) {
                File[] files = dir.listFiles(); // all training files
                for (File file : files) {
                    String classNo = file.getName().split("_")[0]; // class label from the file name
                    // Read the file as UTF-8; try-with-resources closes the whole reader chain
                    try (BufferedReader bufReader = new BufferedReader(
                            new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
                        String line;
                        while ((line = bufReader.readLine()) != null) {
                            // Each line is one document
                            classcountMap.merge(classNo, 1, Integer::sum);
                            datacount++;
                            // De-duplicate the words of this document
                            Set<String> wordaSet = arrayToSet(line.trim().split(" "));
                            vocabularySet.addAll(wordaSet);
                            // Count each word once per document (document frequency). The
                            // original looped over the raw word array and counted repeats.
                            for (String word : wordaSet) {
                                likelihoodMap.merge(word + "_" + classNo, 1, Integer::sum);
                            }
                        }
                    }
                }
            } else {
                System.out.println("Directory not found: " + fileDirPath);
            }
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }

    /**
     * Converts an array into a Set (removes duplicates and empty strings).
     * @param words the words of one document
     * @return the distinct non-empty words
     */
    public static Set<String> arrayToSet(String[] words) {
        Set<String> wordaSet = new HashSet<String>();
        for (String word : words) {
            if (!word.isEmpty()) { // compare by content, not by reference
                wordaSet.add(word);
            }
        }
        return wordaSet;
    }

    /**
     * Chinese word segmentation of a string.
     * @param text the text to segment
     * @param splitToken separator placed between words, e.g. ","
     * @return the segmented text
     */
    public static String SplitWords(String text, String splitToken) {
        String result = text; // fall back to the unsegmented text if segmentation fails
        MMAnalyzer analyzer = new MMAnalyzer(); // 极易 (je-analysis) Chinese word segmenter
        try {
            result = analyzer.segment(text, splitToken);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    /**
     * Classifies a test text.
     * @param testText the test text
     * @return the predicted class label
     */
    public static String PredictReslut(String testText) {
        String PredictResult = "";
        testText = SplitWords(testText, " "); // Chinese word segmentation of the test document
        Set<String> docWords = arrayToSet(testText.split(" ")); // distinct words of the test document
        // Training vocabulary plus any test words that were never seen in training
        Set<String> allWords = new HashSet<String>(vocabularySet);
        allWords.addAll(docWords);
        // Scores are logarithms (negative), so start from negative infinity rather than 0
        double argmax = Double.NEGATIVE_INFINITY;
        for (String classNo : classcountMap.keySet()) {
            int classDocs = classcountMap.get(classNo);
            double prior = classDocs / (double) datacount; // prior probability
            // Sum of logarithms instead of a product [ln(a*b) = ln a + ln b]
            double logScore = Math.log(prior);
            for (String word : allWords) {
                int docsWithWord = likelihoodMap.getOrDefault(word + "_" + classNo, 0);
                double p = (docsWithWord + 1) / (classDocs + 2.0); // smoothed P(w_k=1|c_i)
                // Present words contribute p, absent words contribute 1 - p
                logScore += Math.log(docWords.contains(word) ? p : 1 - p);
            }
            System.out.println("best log score so far: " + argmax);
            System.out.println("log score of class [" + classNo + "]: " + logScore);
            if (logScore > argmax) {
                argmax = logScore;       // keep the largest score
                PredictResult = classNo; // and the class it belongs to
            }
        }
        System.out.println("********");
        System.out.println("[Bernoulli model final classification result:] " + PredictResult);
        return PredictResult;
    }

    public static void main(String[] args) {
        long beginTime = System.currentTimeMillis();
        String filedir = "./data_training";
        // Pre-segmented Chinese test text
        String testText = "南京 爆炸 事件 中 , 南京 电视台 "
                + "生活 频道 因为 最 早 做 了 现场 直播 而 受到 上级 批评 ; 爆出 “ 最 牛1 "
                + "官腔 ” 的1 江苏 电视台 城市 频道 也 因 直播 此 事 被 上级 批评 , 相关 "
                + "栏目 也 面临 停 播 。 不 知道 是 官员 的1 可悲 、 人民 的1 可悲 , 还 "
                + "是 记者 的1 可悲 ! 一 个 悲剧 的1 社会 ! 真 TM 龌龊";
        BayesModel(filedir);
        PredictReslut(testText);
        long endTime = System.currentTimeMillis();
        long between = endTime - beginTime;
        System.out.println("Total time: " + between + " ms");
    }
}
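Run result:

4 Multinomial model
Overview
The multinomial model is used more often than BIM. Unlike BIM, it takes the term frequency of each word in the document into account. The difference lies in how the conditional probability enters modeling and prediction, which is not the same as the estimation used above. The details follow.
Classification model
In this model a document is viewed as a sequence of words of length f (the same word may occur several times). The document length is assumed to be independent of the class, and the position of each word is assumed to be independent of the other words. Let f_k be the frequency of word w_k in the document.
The conditional probability p(d|c_i) of document d given class c_i is given by: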
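The formula itself is missing from this copy. In the standard multinomial formulation it is p(d|c_i) = p(f) * f! * Π_k p(w_k|c_i)^f_k / f_k!, which is presumably what the text calls (11). Because the document length is assumed independent of the class, the factors p(f), f! and f_k! are the same for every class, so only log p(c_i) + Σ_k f_k * log p(w_k|c_i) matters when classes are compared. The sketch below computes that class-dependent part; MultinomialSketch, its method names, and the toy probabilities are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the class-dependent part of the multinomial score. */
final class MultinomialSketch {

    /** Counts the term frequency f_k of every word in a tokenized document. */
    static Map<String, Integer> termFrequencies(String[] words) {
        Map<String, Integer> tf = new HashMap<>();
        for (String word : words) {
            tf.merge(word, 1, Integer::sum);
        }
        return tf;
    }

    /** log P(c_i) + sum over k of f_k * log P(w_k | c_i). */
    static double logScore(double prior, Map<String, Double> wordGivenClass, String[] words) {
        double score = Math.log(prior);
        for (Map.Entry<String, Integer> entry : termFrequencies(words).entrySet()) {
            Double p = wordGivenClass.get(entry.getKey());
            if (p == null) {
                // Assumption of this sketch: the caller supplies a (smoothed) estimate for every word
                throw new IllegalArgumentException("no estimate for word: " + entry.getKey());
            }
            score += entry.getValue() * Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> wordGivenClass = new HashMap<>();
        wordGivenClass.put("happy", 0.3);   // toy values
        wordGivenClass.put("bad", 0.1);
        System.out.println(logScore(0.5, wordGivenClass, new String[] {"happy", "happy", "bad"}));
    }
}
```

A word that occurs twice contributes its log probability twice, which is exactly where the multinomial model differs from the binary independence model.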

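Substituting (11) into (4) gives the classification rule of the multinomial model:

> When deciding the class of document d, the multinomial model likewise uses only the words with nonzero frequency. (It too determines the class from the words that occur in the document.)
Parameter estimation
The prior probability of class c_i is estimated with formula (9), as in the binary model.
The conditional probability of word w_k given class c_i is estimated as follows (the multinomial model counts repeated occurrences of the same word):
p(w_k|c_i) = (n_ki + 1) / (Σ_j n_ji + |V|)
> n_ki: total number of occurrences of w_k in class c_i
> |V|: size of the training-set vocabulary
Algorithm and implementation
Implementation: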
package com.multinomial.bnc;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import jeasy.analysis.MMAnalyzer;

public class MultinoBayesclass {
    // Vocabulary (distinct words only; repeats are not counted here)
    public static Set<String> vocabularySet = new HashSet<String>();
    // Total number of training documents (one document per line)
    public static Integer datacount = 0;
    // Number of documents in each class (needed for the prior probability)
    public static Map<String, Integer> classDocCountMap = new HashMap<String, Integer>();
    // Occurrences of a word in a class, keyed by word + "_" + classNo
    public static Map<String, Integer> likelihoodMap = new HashMap<String, Integer>();
    // Total number of words in each class
    public static Map<String, Integer> classVocabularyMap = new HashMap<String, Integer>();

    /**
     * Builds the naive Bayes text classification model.
     * @param fileDirPath directory that holds the training files
     */
    public static void BayesModel(String fileDirPath) {
        try {
            File dir = new File(fileDirPath);
            if (dir.exists() && dir.isDirectory()) {
                File[] files = dir.listFiles(); // all training files
                for (File file : files) {
                    String classNo = file.getName().split("_")[0]; // class label from the file name
                    // Read the file as UTF-8; try-with-resources closes the whole reader chain
                    try (BufferedReader bufReader = new BufferedReader(
                            new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
                        String line;
                        while ((line = bufReader.readLine()) != null) {
                            // Each line is one document
                            classDocCountMap.merge(classNo, 1, Integer::sum);
                            datacount++;
                            for (String word : line.trim().split(" ")) {
                                if (word.isEmpty()) {
                                    continue; // skip empty tokens
                                }
                                vocabularySet.add(word);
                                // Total words in the class
                                classVocabularyMap.merge(classNo, 1, Integer::sum);
                                // Occurrences of this word in the class
                                likelihoodMap.merge(word + "_" + classNo, 1, Integer::sum);
                            }
                        }
                    }
                }
            } else {
                System.out.println("Directory not found: " + fileDirPath);
            }
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }

    /**
     * Classifies a test text.
     * @param testText the test text
     * @return the predicted class label
     */
    public static String PredictReslut(String testText) {
        int vsSize = vocabularySet.size(); // vocabulary size |V|
        String PredictResult = "";
        testText = SplitWords(testText, " "); // Chinese word segmentation of the test document
        String[] words = testText.split(" ");
        // Scores are logarithms (negative), so start from negative infinity rather than 0
        double argmax = Double.NEGATIVE_INFINITY;
        for (String classNo : classDocCountMap.keySet()) {
            // Prior: documents in the class / all documents. The original divided the
            // class word count by the document count, which is not a probability.
            double prior = classDocCountMap.get(classNo) / (double) datacount;
            double denominator = classVocabularyMap.getOrDefault(classNo, 0) + vsSize;
            // Sum of logarithms instead of a product [ln(a*b) = ln a + ln b]
            double logScore = Math.log(prior);
            for (String word : words) {
                if (word.isEmpty()) {
                    continue;
                }
                // Every occurrence adds its term, so a word with frequency f_k counts f_k times
                int count = likelihoodMap.getOrDefault(word + "_" + classNo, 0);
                logScore += Math.log((count + 1) / denominator);
            }
            System.out.println("best log score so far: " + argmax);
            System.out.println("log score of class [" + classNo + "]: " + logScore);
            if (logScore > argmax) {
                argmax = logScore;       // keep the largest score
                PredictResult = classNo; // and the class it belongs to
            }
        }
        System.out.println("****************");
        System.out.println("[Multinomial model final classification result:] " + PredictResult);
        return PredictResult;
    }

    /**
     * Chinese word segmentation of a string.
     * @param text the text to segment
     * @param splitToken separator placed between words, e.g. ","
     * @return the segmented text
     */
    public static String SplitWords(String text, String splitToken) {
        String result = text; // fall back to the unsegmented text if segmentation fails
        MMAnalyzer analyzer = new MMAnalyzer(); // 极易 (je-analysis) Chinese word segmenter
        try {
            result = analyzer.segment(text, splitToken);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        String filedir = "./data_training";
        // Pre-segmented Chinese test text
        String testText = "南京 爆炸 事件 中 , 南京 电视台 "
                + "生活 频道 因为 最 早 做 了 现场 直播 而 受到 上级 批评 ; 爆出 “ 最 牛1 "
                + "官腔 ” 的1 江苏 电视台 城市 频道 也 因 直播 此 事 被 上级 批评 , 相关 "
                + "栏目 也 面临 停 播 。 不 知道 是 官员 的1 可悲 、 人民 的1 可悲 , 还 "
                + "是 记者 的1 可悲 ! 一 个 悲剧 的1 社会 ! 真 TM 龌龊";
        BayesModel(filedir);
        PredictReslut(testText);
    }
}
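Run result:

5 Hybrid model
Overview of the idea
The binary independence model is used when estimating the conditional probability of a word given a class, while the multinomial model is used in the classification stage when estimating the posterior probability of a class for the document to be classified.
Comparison
Weakness of the binary independence model: it only considers whether a word occurs or not and ignores frequency information (so it may blur the difference between important and unimportant words).
Weakness of the multinomial model: its assumption is too strict, namely that multiple occurrences of the same word in one document are independent (which is not the case).
Assumption of the binary independence model: the occurrences of different words in the same document are mutually independent.
Assumption of the multinomial model: multiple occurrences of the same word in the same document are mutually independent (clearly less reasonable than the binary assumption).
In practice the training documents are usually plentiful, so classification works well even without the frequency of words in the documents; relying too heavily on frequency information does not help and can even hurt. Frequency information can therefore be ignored in the training stage. In the classification stage, however, we are looking at one specific document, and the frequency of words in that document matters a great deal. If only presence or absence were considered, words that occur in several classes but with high frequency would be ignored, which could lead to large classification errors.
Conditional probability: estimated with the m-estimate
p(w_k|c_i) = (n_ki + m*p) / (n_i + m)
> n_ki: number of times w_k and c_i occur together
> n_i: number of occurrences of class c_i in the training set
> p: prior estimate of the probability that w_k occurs in c_i
> m: equivalent sample size
Note: the binary independence model takes p=1/2, m=2.
The multinomial model takes p=1/|V|, m=|V|, where |V| is the vocabulary size.
In practice, however, the local vocabulary of a single class is much smaller than that of the whole data set, so taking p=1/2 for the word conditional probability in BIM is unreasonable. When estimating p(w_k|c_i), the hybrid model therefore does not use formula (10); it follows the MM approach and takes p=1/|V|, m=|V|:
p(w_k|c_i) = (n_ki + 1) / (n_i + |V|)    (16)
> n_ki: number of training documents that contain word w_k and belong to c_i
> n_i: number of training documents in class c_i
> |V|: vocabulary size
Naive Bayes algorithm for the hybrid model
Algorithm 2: naive Bayes classifier for the hybrid model
Training stage: estimate the conditional probability p(w_k|c_i) with formula (16) and the prior probability p(c_i) with formula (9).
Classification stage: given a document d to classify, decide its class with formula (13).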
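Algorithm 2 can be sketched as follows, under the assumption that the classification rule has the multinomial form log p(c_i) + Σ_k f_k * log p(w_k|c_i) while the word probabilities come from document counts with p=1/|V|, m=|V|. The article gives no code for the hybrid model, so the class HybridSketch, its method names, and the toy counts are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the hybrid model: BIM-style training counts, multinomial-style scoring. */
final class HybridSketch {

    /** m-estimate with p = 1/|V| and m = |V|: (n_ki + 1) / (n_i + |V|), counts in documents. */
    static double wordGivenClass(int docsWithWord, int classDocs, int vocabSize) {
        return (docsWithWord + 1.0) / (classDocs + vocabSize);
    }

    /** log P(c_i) + sum over k of f_k * log P(w_k | c_i), with f_k taken from the test document. */
    static double logScore(double prior, Map<String, Integer> docFreqInClass, int classDocs,
                           int vocabSize, Map<String, Integer> termFreqInDoc) {
        double score = Math.log(prior);
        for (Map.Entry<String, Integer> entry : termFreqInDoc.entrySet()) {
            int nki = docFreqInClass.getOrDefault(entry.getKey(), 0);
            score += entry.getValue() * Math.log(wordGivenClass(nki, classDocs, vocabSize));
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqInClass = new HashMap<>();
        docFreqInClass.put("happy", 3);           // toy: 3 of the class's 6 documents contain "happy"
        Map<String, Integer> termFreqInDoc = new HashMap<>();
        termFreqInDoc.put("happy", 2);            // toy test document: "happy" twice, "bad" once
        termFreqInDoc.put("bad", 1);
        System.out.println(logScore(0.5, docFreqInClass, 6, 4, termFreqInDoc));
    }
}
```

Training ignores how often a word repeats inside a document, while classification weights each word by its frequency in the document being classified, which is the split between the two stages described above.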
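6 Summary
1. Collect and prepare the data set in advance (this involves web crawling, Chinese word segmentation, and feature selection)
2. Preprocessing: remove stop words and drop very low-frequency words (depending on the situation)
3. Experimental procedure:
Split the data set into two parts (3:7): 30% as the test set and 70% as the training set
To increase confidence, use 10-fold cross-validation (split the whole data set into 10 equal parts, merge 9 parts into the training set and use the remaining part as the test set; run 10 times and take the average as the classification result), then compare strengths and weaknesses
4. Evaluation criteria:
Macro-averaged and micro-averaged evaluation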
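The article does not spell out the two averages, so the following is only a hedged illustration. It assumes F1 as the per-class measure (the article does not say which measure was used): the macro average is the mean of the per-class F1 values, and the micro average pools the counts of all classes before computing F1. AveragingSketch and the toy counts are assumptions.

```java
/** Sketch of macro- and micro-averaged F1 from per-class counts. */
final class AveragingSketch {

    /** F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to count. */
    private static double f1(double tp, double fp, double fn) {
        double denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : 2 * tp / denominator;
    }

    /** Macro average: compute F1 per class, then take the unweighted mean. */
    static double macroF1(int[] tp, int[] fp, int[] fn) {
        double sum = 0.0;
        for (int i = 0; i < tp.length; i++) {
            sum += f1(tp[i], fp[i], fn[i]);
        }
        return sum / tp.length;
    }

    /** Micro average: add up the counts of all classes, then compute a single F1. */
    static double microF1(int[] tp, int[] fp, int[] fn) {
        double tpSum = 0, fpSum = 0, fnSum = 0;
        for (int i = 0; i < tp.length; i++) {
            tpSum += tp[i];
            fpSum += fp[i];
            fnSum += fn[i];
        }
        return f1(tpSum, fpSum, fnSum);
    }

    public static void main(String[] args) {
        int[] tp = {8, 1};   // toy counts for two classes
        int[] fp = {1, 2};
        int[] fn = {2, 1};
        System.out.println("macro F1: " + macroF1(tp, fp, fn));
        System.out.println("micro F1: " + microF1(tp, fp, fn));
    }
}
```

The macro average treats every class equally, so a small class that is classified badly pulls it down more than it pulls down the micro average.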
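New smoothing factor
Introduce a smoothing factor tied to the number of words: p remains 1/|V|, while the equivalent sample size m is set to α times the average number of words per class (α<<1), which gives:

In Algorithm 2, use formula (11) instead of (16) to estimate p(w_k|c_i).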
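The formula for the new smoothing factor is missing from this copy. Plugging the stated choices into the m-estimate (n_ki + m*p) / (n_i + m) would give the expression below; the symbol \bar{n} for the average number of words per class is introduced here for illustration, and the exact form in the original may differ:

```latex
m = \alpha\,\bar{n}, \qquad p = \frac{1}{|V|}, \qquad
P(w_k \mid c_i) = \frac{n_{ki} + \alpha\,\bar{n}/|V|}{n_i + \alpha\,\bar{n}}, \qquad \alpha \ll 1
```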
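7 Closing remarks
This article summarizes and organizes an earlier project and its materials. It took a full day to write, and I have a small complaint about the cnblogs (博客园) editor: writing formulas in it is far too inconvenient. Setting that aside, parts of this article still need work: comparing classification results across several data sets, the results of different smoothing factors, validating the classification results (for example with 10-fold cross-validation), and comparing strengths and weaknesses against decision trees and support vector machines. Quite a lot was left out while organizing the material, and some parts only extract the core ideas; corrections and improvements are welcome. Message me privately if you need the source code. Topics I plan to study next:
Differences between k-means, hierarchical clustering, and spectral clustering, and their implementation
Building domain ontologies with active learning
Research on machine learning algorithms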