Full-Text Search Engine Solr Series (Part 1)

Author: Liu Zhijun  Source: ImportNew  Published: 2016-1-4

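Full-Text Search Engine Solr Series: Getting Started

Solr is an open-source enterprise search platform built on the Lucene search library. It provides full-text indexing and search, and exposes REST-style HTTP/XML and JSON APIs. If you are new to Solr, let's get started together! This tutorial uses Solr 4.8 as the test environment and requires JDK 1.7 or later.

Preparation

This article assumes at least a beginner-to-intermediate level of Java, so it does not cover setting up the Java environment. Download and unpack Solr; the example directory contains start.jar. Start it with: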

java -jar start.jar

Indexing Data

Once the server is running, the admin UI shows no data yet. You can add (update) and delete documents in Solr by POSTing commands. The exampledocs directory contains some sample files. Run the command:

java -jar post.jar solr.xml monitor.xml

The command above added two documents to Solr. Open the two files to see what they contain. The content of solr.xml is:

<add>
<doc>
<field name="id">SOLR1000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>
<field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
<field name="features">Optimized for High Volume Web Traffic</field>
<field name="features">Standards Based Open Interfaces - XML and HTTP</field>
<field name="features">Comprehensive HTML Administration Interfaces</field>
<field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
<field name="features">Flexible and Adaptable with XML configuration and Schema</field>
<field name="features">Good unicode support: h&#xE9;llo (hello with an accent over the e)</field>
<field name="price">0</field>
<field name="popularity">10</field>
<field name="inStock">true</field>
<field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>

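This adds one document to the index; documents are the data source that searches run against. You can now search for the keyword "solr" from the admin UI as follows.

After you click the Execute Query button at the bottom of the page, the query results appear on the right. The result is the solr.xml document you just imported, shown in JSON format. Solr supports a rich query syntax. For example, to search for the keyword "Search" in the name field, use the syntax name:search. If you search for name:xxx, nothing is returned, because no document contains that content.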

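Importing Data

There are many ways to import data into Solr:

Use DIH (DataImportHandler) to import data from a database

Import CSV files, which also makes it easy to bring in Excel data

Import documents in JSON format

Import binary documents such as Word and PDF

Write custom import logic programmatically

Updating Data

What happens if the same document, solr.xml, is imported more than once? Solr uses the document's id field to uniquely identify a document. If an imported document's id already exists in Solr, the existing document is automatically replaced by the newly imported document with the same id. You can try this yourself and watch how three values in the admin UI change before and after the replacement: Num Docs, Max Doc, and Deleted Docs.

numDocs: the number of documents currently in the index. It can be larger than the number of XML files, because a single XML file may contain multiple <doc> tags.

maxDoc: may be larger than numDocs. For example, maxDoc increases after the same file is posted again.

deletedDocs: a re-posted file replaces the old document, and deletedDocs increases by 1. This is only a logical deletion; the old document has not actually been removed from the index.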
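
The relationship between the three counters can be pictured with a small Python model. This is only a sketch of the bookkeeping described above, under the assumption that a replaced document stays in the index as a logically deleted entry; the class and attribute names are made up for illustration and say nothing about how Solr stores data internally.

```python
class ToyIndex:
    """Toy model of overwrite-by-id bookkeeping; not Solr's real storage."""

    def __init__(self):
        self.slots = []   # every copy ever written: (doc_id, is_live)
        self.live = {}    # doc_id -> position of the live copy in slots

    def add(self, doc_id):
        old = self.live.get(doc_id)
        if old is not None:
            # Same id already indexed: the old copy is only marked as deleted.
            self.slots[old] = (doc_id, False)
        self.slots.append((doc_id, True))
        self.live[doc_id] = len(self.slots) - 1

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        return len(self.slots)

    @property
    def deleted_docs(self):
        return self.max_doc - self.num_docs


index = ToyIndex()
index.add("SOLR1000")
index.add("SOLR1000")   # post the same document a second time
stats = (index.num_docs, index.max_doc, index.deleted_docs)
print(stats)
```

In this model numDocs stays at 1 after the second post, while maxDoc and deletedDocs each grow by one.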

Deleting Data

You can delete a specific document by id, or delete all documents that match a query:

java -Ddata=args -jar post.jar "<delete><id>SOLR1000</id></delete>"
java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"

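At this point the solr.xml document has been deleted from the index, and searching for "solr" again returns no results. Solr also has transactions, similar to those in a database. When the delete command ran, the transaction was committed automatically, so the document was removed from the index immediately. You can also set commit to false and commit manually: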

java -Ddata=args  -Dcommit=false -jar post.jar "<delete><id>3007WFP</id></delete>"

After running the command above, the document is not actually deleted yet and still shows up in search results. To finish, commit with the command:

java -jar post.jar -

Once the transaction is committed, the document is gone for good. Now re-import the files you just deleted into Solr so we can continue.

Delete all data:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true

Delete specific data:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc</query></delete>&commit=true

Delete by multiple conditions:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc AND name:zhang</query></delete>&commit=true
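
The URLs above contain raw angle brackets and spaces, which a browser encodes for you but a script has to encode itself. As a rough sketch, the same delete-by-query URL could be built in Python like this; the helper name delete_by_query_url is hypothetical, and only the base URL and parameters are taken from the examples above.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOLR_UPDATE = "http://localhost:8983/solr/collection1/update"  # URL used in this article


def delete_by_query_url(query):
    """Build an update URL whose stream.body carries a delete-by-query command."""
    # escape() protects XML special characters (&, <, >) inside the query text.
    body = "<delete><query>{}</query></delete>".format(escape(query))
    # urlencode() percent-encodes the XML and turns spaces into '+'.
    return SOLR_UPDATE + "?" + urlencode({"stream.body": body, "commit": "true"})


url = delete_by_query_url("title:abc AND name:zhang")
print(url)
```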

Querying Data

Queries are issued as HTTP GET requests. The search keyword is given by the q parameter, and many optional parameters control what is returned. For example, fl specifies which fields to return: with fl=name, the response contains only the name field.

http://localhost:8983/solr/collection1/select?q=solr&fl=name&wt=json&indent=true
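
If you would rather build such a request from code than type it into the browser, a minimal Python sketch might look like the following. The function name select_url is made up for this example; the base URL and the q, fl, wt, and indent parameters are the ones shown above.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # URL used in this article


def select_url(q, fields=None):
    """Build a select URL for query q, optionally limiting the returned fields."""
    params = {"q": q, "wt": "json", "indent": "true"}
    if fields:
        params["fl"] = ",".join(fields)   # fl takes a comma-separated field list
    return SOLR_SELECT + "?" + urlencode(params)


url = select_url("name:search", fields=["id", "name"])
print(url)
# With a running Solr instance, the URL could then be fetched with
# urllib.request.urlopen(url) and the body parsed with json.load().
```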

Sorting

Solr supports sorting through the sort parameter, in ascending or descending order, or on multiple fields:

q=video&sort=price desc
q=video&sort=price asc
q=video&sort=inStock asc, price desc

By default, Solr sorts by score in descending order. The score is a number computed for each search result from its relevance to the query.

Highlighting

Web search often highlights the matched keywords to make results stand out. Solr supports this well; just specify these parameters:

hl=true #enable highlighting

hl.fl=name #fields to highlight

http://localhost:8983/solr/collection1/select?q=Search&wt=json&indent=true&hl=true&hl.fl=features

The response contains:

"highlighting":{
"SOLR1000":{
"features":["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]
}
}
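
On the client side, the highlighted snippets can be pulled out of the JSON response with a few lines of Python. This sketch parses only the fragment shown above; the helper name highlights is hypothetical.

```python
import json

# Response fragment copied from the "highlighting" section shown above.
response_text = """
{
  "highlighting": {
    "SOLR1000": {
      "features": ["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]
    }
  }
}
"""


def highlights(response, field):
    """Return {doc_id: [snippets]} for one highlighted field."""
    result = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        if field in fields:
            result[doc_id] = fields[field]
    return result


snippets = highlights(json.loads(response_text), "features")
print(snippets)
```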

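Text Analysis

Text fields are indexed by splitting the text into words and applying various transformations (such as lowercasing, plural removal, and stemming). The schema.xml file defines the fields in the index and which of these processing steps apply to each field.

By default, a search for "power-shot" does not match "powershot". If you edit schema.xml (in the solr/example/solr/collection1/conf directory) and change the features and text fields to the type "text_en_splitting", it will match.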

<field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
...
<field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>

After making the change, restart Solr and re-import the documents:

java -jar post.jar *.xml

Now the following searches match:

power-shot -> Powershot
features:recharging -> Rechargeable
1 gigabyte -> 1G
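
To get a feel for why a splitting field type lets power-shot match Powershot, here is a toy sketch in Python. It only illustrates the general idea (split on punctuation and case changes, lowercase everything, and also index the joined form). It is not Solr's actual analyzer chain, and the sample product name is an assumption made for the example.

```python
import re

# Word parts: a capitalized or lowercase word, a run of capitals, or a run of digits.
WORD_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def toy_analyze(text):
    """Split each chunk into lowercase parts and also emit the joined form."""
    terms = []
    for chunk in text.split():
        parts = [p.lower() for p in WORD_PARTS.findall(chunk)]
        terms.extend(parts)
        if len(parts) > 1:
            terms.append("".join(parts))   # "Power" + "Shot" -> "powershot"
    return terms


doc_terms = set(toy_analyze("Canon PowerShot SD500"))   # assumed sample text
query_terms = toy_analyze("power-shot")
matched = all(term in doc_terms for term in query_terms)
print(query_terms, matched)
```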

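Summary

As an introductory article, this one has not introduced many concepts. From installation to deployment and document updates, you should now have a first feel for Solr. The next article introduces the basic principles of full-text search.

Full-Text Search Engine Solr Series: Basic Principles of Full-Text Search

Scenario: as children we all used the Xinhua Dictionary. Suppose your mom tells you to open it to page 38 and find where "坑爹" is. How would you look? No doubt your eyes would scan page 38 from the first character to the last until you found the two characters "坑爹". This search method is called sequential scanning, and for a small amount of data it is good enough. But if your mom asks which page the character "坑" in "坑爹" is on, and you scan character by character from the first page, then you have really been had. This is where an index comes in. The index records which page "坑" is on: you look up "坑" in the index, read off the page number, and you have the answer. Looking up "坑" in the index is very fast because you know its radical, so you can locate the character quickly.

So how is the Xinhua Dictionary's directory (its index table) compiled? Apart from the directory, the dictionary itself is just a pile of unstructured data. But clever people, good at thinking and summarizing, noticed that every character corresponds to a page number; for example, "坑" is on page 38 and "爹" is on page 90. So they extracted this information and organized it into structured data, similar to a table in a database: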

word    page_no
---------------
坑 38
爹 90
... ...

This forms a complete directory (an index), which makes lookups very convenient. Full-text search works on a similar principle and can be summarized as two processes: 1. index creation (Indexing) and 2. searching the index (Search). So how exactly is an index created? What is stored in it? And how is the index consulted at search time? Read on with these questions in mind.

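Index

Solr/Lucene uses an inverted index. An inverted index is a mapping from keywords to documents; an index that stores this mapping is called an inverted index.

The left side stores a sequence of strings (the terms).

The right side stores, for each string, a linked list of the numbers of the documents that contain it, called the posting list.

The string list and the document-number lists together form a dictionary. If we search for "lucene", the index tells us directly that the documents containing "lucene" are 2, 3, 10, 35, and 92, without scanning the whole document collection one by one. To find documents that contain both "lucene" and "solr", take the intersection of the two corresponding posting lists, which gives 3, 10, 35, and 92.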
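
The intersection step can be sketched in a few lines of Python. The posting list for "lucene" is the one given above. The list for "solr" is not given in the text, so the one below is an assumption, chosen only to be consistent with the stated result.

```python
def intersect(a, b):
    """Intersect two sorted posting lists by walking both with two pointers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


index = {
    "lucene": [2, 3, 10, 35, 92],       # from the article
    "solr": [1, 3, 10, 35, 92, 100],    # assumed for illustration
}
both = intersect(index["lucene"], index["solr"])
print(both)
```

Because both lists are kept sorted by document number, the intersection takes a single pass over each list instead of comparing every pair.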

Index Creation

Suppose we have the following two original documents:

Document 1: Students should be allowed to go out with their friends, but not allowed to drink beer.

Document 2: My friend Jerry went to school to see his students but found them drunk which is not allowed.

The creation process roughly consists of the following steps:

Step 1: Pass the original documents to the tokenizer

The tokenizer does the following (a process called tokenizing), and what it produces are tokens:

Split the document into individual words

Remove punctuation

Remove stop words

A stop word is a word that carries no specific meaning in a language and is therefore rarely used as a search keyword; removing stop words reduces the size of the index. English stop words include "the", "a", and "this"; Chinese ones include "的" and "得".

Tokenizers for different languages each have their own set of stop words.
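
The three tokenizer steps can be sketched in Python as follows. The stop-word set here is an assumption: it was picked so that the two example documents yield the tokens the article lists for them, and it is not the stop list of any real tokenizer.

```python
import re

# Assumed stop list, chosen to reproduce the article's example output.
STOP_WORDS = {"should", "be", "to", "out", "with", "but", "not", "which", "is"}


def tokenize(text):
    """Split into words, drop punctuation, and remove stop words."""
    words = re.findall(r"[A-Za-z]+", text)   # keeps letters only, so punctuation is dropped
    return [w for w in words if w.lower() not in STOP_WORDS]


doc1 = "Students should be allowed to go out with their friends, but not allowed to drink beer."
doc2 = "My friend Jerry went to school to see his students but found them drunk which is not allowed."
tokens1 = tokenize(doc1)
tokens2 = tokenize(doc2)
print(tokens1)
print(tokens2)
```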

The output of the tokenizer is called tokens. In the example above, we get the following tokens:

"Students", "allowed", "go", "their", "friends", "allowed",

"drink", "beer", "My", "friend", "Jerry", "went", "school",

"see", "his", "students", "found", "them", "drunk", "allowed"

Step 2: Pass the tokens to the linguistic processor

The linguistic processor applies language-specific processing to the tokens.

For English, the linguistic processor typically does the following:

Convert to lowercase.

Reduce words to their root form, such as "cars" to "car". This operation is called stemming.

Transform words into their root form, such as "drove" to "drive". This operation is called lemmatization.
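
As a rough sketch, the processing above can be imitated in Python with lowercasing plus a lookup table. The ROOT_FORMS table is hypothetical and covers only the words in Document 1 and Document 2; a real linguistic processor would use general stemming rules or a full dictionary instead.

```python
# Hypothetical lookup table standing in for real stemming and lemmatization.
ROOT_FORMS = {
    "students": "student",
    "allowed": "allow",
    "friends": "friend",
    "went": "go",
    "found": "find",
    "drunk": "drink",
}


def to_terms(tokens):
    """Lowercase each token, then map it to its root form if one is known."""
    terms = []
    for token in tokens:
        lowered = token.lower()
        terms.append(ROOT_FORMS.get(lowered, lowered))
    return terms


# Tokens of Document 1 as produced by the tokenizer step.
tokens = ["Students", "allowed", "go", "their", "friends", "allowed", "drink", "beer"]
terms = to_terms(tokens)
print(terms)
```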

The output of the linguistic processor is called terms. In the example, the terms after linguistic processing are:

"student", "allow", "go", "their", "friend", "allow", "drink", "beer", "my", "friend",

"jerry", "go", "school", "see", "his", "student", "find", "them", "drink", "allow".

After linguistic processing, a search for drive also finds drove. Stemming and lemmatization compare as follows:

What they have in common:

Both stemming and lemmatization reduce a word to its root form.

How they differ in approach:

Stemming works by "reduction": "cars" to "car", "driving" to "drive".

Lemmatization works by "transformation": "drove" to "drive", "driving" to "drive".

How they differ in algorithm:

Stemming mainly applies fixed rules to do the reduction, such as removing "s", removing "ing" and adding "e", changing "ational" to "ate", and changing "tional" to "tion".

Lemmatization mainly relies on a dictionary stored in a predefined format. For example, the dictionary holds mappings from "driving" to "drive", "drove" to "drive", and "am, is, are" to "be"; the transformation simply follows the mappings in the dictionary.

Stemming and lemmatization are not mutually exclusive; they overlap, and for some words both produce the same result.
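
The difference between the two approaches shows up clearly in code. The sketch below implements only the four suffix rules and the dictionary entries named above, so it is a deliberately naive illustration and not a real stemmer or lemmatizer (for instance, it would wrongly turn "is" into "i").

```python
def stem(word):
    """Rule-based suffix stripping using only the rules named in the article."""
    if word.endswith("ational"):
        return word[:-len("ational")] + "ate"    # relational -> relate
    if word.endswith("tional"):
        return word[:-len("tional")] + "tion"    # conditional -> condition
    if word.endswith("ing"):
        return word[:-len("ing")] + "e"          # driving -> drive
    if word.endswith("s"):
        return word[:-1]                         # cars -> car
    return word


# Dictionary-based lemmatization: entries taken from the article's examples.
LEMMAS = {"driving": "drive", "drove": "drive", "am": "be", "is": "be", "are": "be"}


def lemmatize(word):
    """Look the word up in the dictionary; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for word in ["cars", "driving", "drove"]:
    print(word, stem(word), lemmatize(word))
```

Both functions agree on "driving", but only the dictionary handles the irregular form "drove", and only the rules handle "cars", which is not in the dictionary.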

Step 3: Pass the terms to the indexer

Use the terms to build a dictionary:

Term    Document ID
student 1
allow 1
go 1
their 1
friend 1
allow 1
drink 1
beer 1
my 2
friend 2
jerry 2
go 2
school 2
see 2
his 2
student 2
find 2
them 2
drink 2
allow 2

Sort the dictionary alphabetically:

Term    Document ID
allow 1
allow 1
allow 2
beer 1
drink 1
drink 2
find 2
friend 1
friend 2
go 1
go 2
his 2
jerry 2
my 2
school 2
see 2
student 1
student 2
their 1
them 2

Merge identical terms into posting lists:

Document Frequency: the number of documents in which the term appears

Frequency: the number of times the term appears in a given document

Take the term "allow": two documents contain it, so its posting list has two entries. The first entry is the first document containing "allow", document 1, in which "allow" appears 2 times. The second entry is the second document containing "allow", document 2, in which "allow" appears 1 time.
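
The sort-and-merge step can be sketched in Python using the terms of the two example documents. The data layout, a dict mapping each term to a list of (document id, frequency) pairs, is a choice made for this illustration, not Lucene's on-disk format.

```python
from collections import Counter, defaultdict

# Terms of the two example documents after linguistic processing.
docs = {
    1: ["student", "allow", "go", "their", "friend", "allow", "drink", "beer"],
    2: ["my", "friend", "jerry", "go", "school", "see", "his", "student",
        "find", "them", "drink", "allow"],
}


def build_index(docs):
    """Return {term: [(doc_id, frequency), ...]} with terms in alphabetical order."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term, freq in Counter(docs[doc_id]).items():
            index[term].append((doc_id, freq))
    return dict(sorted(index.items()))


index = build_index(docs)
for term, postings in index.items():
    # Document frequency is simply the length of the posting list.
    print(term, "df=%d" % len(postings), postings)
```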

The index is now complete. A search for "drive" also finds "driving", "drove", and "driven", because in the index all three are reduced to "drive" by linguistic processing. At search time, if you enter "driving", the query goes through the same tokenizer and linguistic processor and becomes a query for "drive", so the documents you want are found.

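Search Steps

Someone who searches for "microsoft job" hopes to find a job at Microsoft. A result like "Microsoft does a good job at software industry..." is far from what the user expects. How can a search be made effective, so that it returns the results the user most wants? Searching involves the following steps:

Step 1: Lexical analysis, syntax analysis, and linguistic processing of the query

Lexical analysis: distinguish ordinary words from keywords in the query. For example, in english and japan, "and" is a keyword, while "english" and "japan" are ordinary words.

Syntax analysis: build a tree from the query according to the grammar rules of the query syntax.

Linguistic processing: the same as at index time. For example: leaned -> lean, driven -> drive

Step 2: Search the index for the set of documents that match the syntax tree

Step 3: Rank the results by the relevance between the query and each document

We treat the query itself as a document and score the relevance between documents. The higher the score, the more relevant the document and the higher it ranks. Scores can also be influenced manually; Baidu search, for instance, does not necessarily rank purely by relevance.

How is the relevance between documents judged? A document consists of one or more terms, for example "solr" and "tutorial". Different terms can differ in importance; say solr matters more than tutorial. If one document contains tutorial 10 times but solr only once, while another contains solr 4 times and tutorial once, the latter is most likely the result we want. This leads to the concept of term weight.

The weight expresses how important a term is within a document. The more important the term, the higher its weight, and the more it influences the relevance calculation. The process of deriving document relevance from term weights is called the Vector Space Model.

Two main factors affect a term's importance in a document:

Term Frequency (tf): how often the term appears in this document. The larger tf is, the more important the term.

Document Frequency (df): how many documents contain the term. The larger df is, the less important the term.

Rarity makes things precious: what everyone has is naturally less valuable, and only what is unique to you is truly precious. This is the idea behind the term weight formula.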
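
The formula itself is not preserved in this copy of the article. One common way to combine the two factors, assumed here purely for illustration, is weight = tf * log(n / df), where n is the total number of documents; Lucene's own scoring uses a variation on this idea. The collection size and document frequencies below are invented numbers.

```python
import math


def term_weight(tf, df, n_docs):
    """One common tf-idf form (an assumption, see above): tf * log(n_docs / df)."""
    return tf * math.log(n_docs / df)


# Hypothetical collection of 1000 documents.
w_solr = term_weight(tf=4, df=10, n_docs=1000)        # rare term, appears 4 times
w_tutorial = term_weight(tf=10, df=500, n_docs=1000)  # common term, appears 10 times
print(round(w_solr, 2), round(w_tutorial, 2))
```

Even though "tutorial" occurs more often in the document, the rarer term "solr" ends up with the higher weight, and a term that occurs in every document gets a weight of zero.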

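Vector Space Model

Treat the weights of the terms in a document as a vector:

Document = {term1, term2, ..., term N}

Document Vector = {weight1, weight2, ..., weight N}

Treat the query as a simple document and represent it as a vector too:

Query = {term1, term2, ..., term N}

Query Vector = {weight1, weight2, ..., weight N}

Place the retrieved document vectors and the query vector in an N-dimensional space, where each term is one dimension.

The smaller the angle between two vectors, the more similar they are and the higher the relevance.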
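
The angle between two vectors is usually measured through its cosine: the dot product divided by the product of the vector lengths, so a cosine close to 1 means a small angle. Here is a minimal Python sketch; the three-dimensional weight vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical weights over three terms.
query = [1.0, 1.0, 0.0]
doc_a = [2.0, 2.0, 0.0]   # same direction as the query
doc_b = [0.0, 1.0, 3.0]   # mostly about a different term

ranked = sorted(
    [("doc_a", cosine(query, doc_a)), ("doc_b", cosine(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Note that doc_a ranks first even though its weights are larger than the query's: only the direction of the vector matters, not its length.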

   