Full-Text Search Engine Solr Series (Part 2)

Author: Liu Zhijun  Source: ImportNew  Published: 2016-1-6
 

Full-Text Search Engine Solr Series: Solr Core Concepts and Configuration Files

Document

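A Document is the basic unit that Solr indexes and searches. It is comparable to a row in a relational database table and contains one or more fields (Field), each with a name and a text value. A field can be stored in the index as well as indexed, in which case a search can return its value. A document should normally include an id field that uniquely identifies it. For example: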

<doc>
  <field name="id">company123</field>
  <field name="companycity">Atlanta</field>
  <field name="companystate">Georgia</field>
  <field name="companyname">Code Monkeys R Us, LLC</field>
  <field name="companydescription">we write lots of code</field>
  <field name="lastmodified">2013-06-01T15:26:37Z</field>
</doc>

Schema

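A Solr Schema is comparable to a table definition in a relational database. It lives in the conf directory as the text file schema.xml, and a Schema must be in place when documents are added to the index. The schema file has three main parts: fields (Field), field types (FieldType), and the unique key (uniqueKey).

Field type (FieldType): defines the type of a field (Field) in the XML documents added to the index, such as int, string, or date.

Field (Field): the name of a field as it is added to the index.

Unique key (uniqueKey): the field that uniquely identifies a document; it is used for updates and deletes.

For example: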

<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>

  <uniqueKey>id</uniqueKey>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
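
The role of the uniqueKey in updates and deletes can be sketched with Solr's XML update messages. This is an illustrative sketch rather than part of the original article: it assumes the id uniqueKey and the title field from the schema above, reuses the sample id company123, and the title value is made up. Each top-level element would be posted to the update handler as its own message.

```xml
<!-- Message 1: adding a document whose id already exists replaces the
     old document, because id is the uniqueKey. -->
<add>
  <doc>
    <field name="id">company123</field>
    <field name="title">Code Monkeys R Us, LLC (renamed)</field>
  </doc>
</add>

<!-- Message 2: delete the document by its uniqueKey value. -->
<delete>
  <id>company123</id>
</delete>

<!-- Message 3: commit so the changes become visible to searches. -->
<commit/>
```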

Field

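In Solr, a field (Field) is the basic building block of a Document and corresponds to a column in a database table. A field definition is metadata: it gives the field's name, its type, and how the field's values are processed. For example: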

<field name="name" type="text_general" indexed="true" stored="true"/>

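indexed: when indexed="true", Solr processes the field and adds it to the index. Only indexed fields can be searched.

stored: when stored="true", a copy of the field's original value is kept in the index and can be returned by search components. For performance reasons, long text is a poor fit for storing in the index.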
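
The two attributes can be combined independently. The sketch below shows the common combinations; the field names are hypothetical and chosen only for illustration, while the types string and text_general are the ones defined in the schema example above.

```xml
<!-- Searchable, and the original value is returned in results -->
<field name="headline" type="text_general" indexed="true" stored="true"/>

<!-- Searchable, but the (long) original text is not kept in the index -->
<field name="body" type="text_general" indexed="true" stored="false"/>

<!-- Returned in results, but cannot be searched -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```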

Field Type

Every field in Solr has a field type, such as float, long, double, date, or text. Solr ships with a rich set of field types, and you can also define custom types to suit your own data. For example:

<!-- IK analyzer -->
<fieldType name="text_cn_stopword" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
  </analyzer>
</fieldType>
<!-- IK analyzer -->

Solrconfig

If the Schema is Solr's model, then Solrconfig is Solr's configuration. It defines how Solr handles indexing, highlighting, search, and many other requests, and it also specifies the caching policy. The most commonly used elements include:

Index data directory

<!-- 
Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration.
-->
<dataDir>${solr.data.dir:./solr/data}</dataDir>

Cache parameters

<filterCache
    class="solr.FastLRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested. -->
<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document).
     Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

Request handlers

A request handler receives an HTTP request, processes the search, and returns the response. For example, the handler for query requests:

<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
  </lst>
</requestHandler>

Each request handler carries a set of configurable search parameters, such as wt, indent, and df.
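
As an illustration of how such defaults can be tuned, the sketch below defines a second handler. The handler name /companysearch and the chosen values are hypothetical; rows and fl are standard Solr query parameters, and companyname is the field from the sample document earlier in this article.

```xml
<requestHandler name="/companysearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Return JSON unless the request passes its own wt -->
    <str name="wt">json</str>
    <!-- Search this field when the query does not name one -->
    <str name="df">companyname</str>
    <!-- Return at most 20 documents per request -->
    <int name="rows">20</int>
    <!-- Return only these stored fields -->
    <str name="fl">id,companyname</str>
  </lst>
</requestHandler>
```

Values under defaults apply only when the request does not supply the parameter itself, so a client can still override them, for example by adding wt=xml to the query string.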

Full-Text Search Engine Solr Series: Integrating the Chinese Tokenizer mmseg4j

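Solr's default tokenizers handle Chinese poorly. For example, when the sentence “VIM比作是编辑器之神” ("VIM is likened to the god of editors") is indexed using the FieldType "text_general", every character is split off as a separate token. A whole article tokenized this way would give a very poor search experience.

Many Chinese tokenizers can be integrated with Solr, such as mmseg4j, IKAnalyzer, and ICTCLAS, each with its own strengths. This article explains how to integrate mmseg4j with Solr. At the time of writing, the latest mmseg4j version is 1.9.1. Download and unpack it, then take these three files: mmseg4j-analysis-1.9.1.jar, mmseg4j-core-1.9.1.jar, and mmseg4j-solr-1.9.1.jar. Copy them to the directory E:\solr-4.8.0\example\solr-webapp\webapp\WEB-INF\lib, then edit the configuration file schema.xml and add the following two snippets: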

fieldType:

<!-- mmseg4j -->
<fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100" >
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100" >
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="dic"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100" >
  <analyzer>
    <!--
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="n:/OpenSource/apache-solr-1.3.0/example/solr/my_dic"/>
    -->
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="dic"/>
  </analyzer>
</fieldType>
<!-- mmseg4j -->

The fields that use these fieldTypes:

<!-- mmseg4j -->
<field name="mmseg4j_complex_name" type="text_mmseg4j_complex" indexed="true" stored="true"/>
<field name="mmseg4j_maxword_name" type="text_mmseg4j_maxword" indexed="true" stored="true"/>
<field name="mmseg4j_simple_name" type="text_mmseg4j_simple" indexed="true" stored="true"/>
<!--mmseg4j -->
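
If an existing field should also be searchable with mmseg4j tokenization, one option is Solr's copyField directive, which copies the incoming value into another field at index time. This is a sketch under assumptions, not a step from the original article: it supposes the schema also contains the single-valued name field shown in the Field section.

```xml
<!-- Copy the raw value of name into the mmseg4j-analyzed field at index time -->
<copyField source="name" dest="mmseg4j_complex_name"/>
```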

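Configuration is now complete. Restart the service with java -jar start.jar, then open the Solr admin interface and click the Analysis page on the left to see how mmseg4j tokenizes the text.

Compared with the earlier result, the tokenization is much improved and is close to the natural word boundaries. During analysis you may run into this error: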

TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

This is a bug in mmseg4j under Solr 4.8, caused by mmseg4j-analysis-1.9.1.jar. Fixing it requires a source change: open the file mmseg4j-1.9.1\mmseg4j-analysis\src\main\java\com\chenlb\mmseg4j\analysis\MMSegTokenizer.java and add super.reset():

public void reset() throws IOException {
    // lucene 4.0
    // org.apache.lucene.analysis.Tokenizer.setReader(Reader)
    // setReader is called automatically, and input is set automatically.
    super.reset(); // add this line
    mmSeg.reset(input);
}

After the change, rebuild with Maven: mvn clean package -DskipTests. Replace the original jar with the newly built mmseg4j-1.9.1\mmseg4j-analysis\target\mmseg4j-analysis-1.9.2-SNAPSHOT.jar and restart the service.

In mmseg4j 1.9.1 the dictionaries are packaged inside the jar, so there is no need to specify the dictionary files (chars.dic, units.dic, words.dic) separately. You can still override them by placing your replacement files in WEB-INF\data\.

Now add two Chinese documents to the index to see how mmseg4j performs:

<add>
  <doc>
    <field name="id">0001</field>
    <field name="mmseg4j_complex_name">把Emacs比作是神的编辑器，VIM比作是编辑器之神， 2012年开始接触VIM，一直沿用至今，也曾今总结过VIM的相关知识， 文章都整理在以前的ITeye博客和GitHub，这款古而不老的编辑器至今仍受众多程序员追捧， 当然我也是忠实的VIM用户，这篇文章就是用VIM编辑完成。</field>
  </doc>
  <doc>
    <field name="id">0002</field>
    <field name="mmseg4j_complex_name">用Google搜索"Python IDE"， 第一条就是stackoverflow上一个非常热门的问题： "what IDE to use for Python"，上百种编辑器的功能对比图让人眼花缭乱。 其中有我接触过的几款编辑器（IDE）包括：Eclilpse(PyDev)、VIM、NotePad++、PyCharm。 如果你的日常开发语言是Python的话，再搜索"python vim"，大约有328万条结果， 可见用VIM做Python开发的程序员那是相当之多，我大概总结的几点原因，当然不一定正确</field>
  </doc>
</add>

Save this as a UTF-8 encoded file named mmseg4j-solr-demo-doc.xml and post it to Solr:

E:\solr-4.8.0\example\exampledocs>java -jar post.jar mmseg4j-solr-demo-doc.xml
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file mmseg4j-solr-demo-doc.xml
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:01.055

Then run a search to check the results (shown as a screenshot in the original article).

   