Big Data Processing with Spark: Machine Learning
 
Source: 炼数成金  Published: 2017-5-19

In this article, the author discusses machine learning concepts and how to use Spark MLlib for predictive analytics, and then uses an example to show what Spark MLlib can do in the machine learning domain.

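1. Introduction

The Spark machine learning API consists of two packages: spark.mllib and spark.ml.

spark.mllib contains the original Spark machine learning API, built on Resilient Distributed Datasets (RDDs). The machine learning techniques it provides include correlation, classification and regression, collaborative filtering, clustering, and dimensionality reduction.

spark.ml provides a machine learning API built on DataFrames, which are a core part of Spark SQL. This package supports developing and managing machine learning pipelines, and can be used for feature extraction, transformation, and selection, as well as for machine learning algorithms such as classification, regression, and clustering.

This article focuses on Spark MLlib and discusses its machine learning algorithms.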

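2. Machine Learning and Data Science

Machine learning learns from existing data in order to make predictions about future data; it builds models from input data sets to support data-driven decisions.

Data science extracts knowledge from massive data sets (both structured and unstructured), giving business teams data insights and influencing business decisions and roadmaps. The role of the data scientist has become more important than that of people who previously solved such problems with traditional numerical methods.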

ÒÔÏÂÊǼ¸Àà»úÆ÷ѧϰģÐÍ£º

¼à¶½Ñ§Ï°Ä£ÐÍ

·Ç¼à¶½Ñ§Ï°Ä£ÐÍ

°ë¼à¶½Ñ§Ï°Ä£ÐÍ

ÔöǿѧϰģÐÍ

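Below is a brief look at each kind of model, with a comparison:

Supervised learning models: a supervised model is trained on a labeled training data set and then makes predictions for unlabeled data;

Supervised learning has two sub-models: regression models and classification models.

Unsupervised learning models: an unsupervised model finds hidden patterns or relationships in raw data (with no labeled training data), so it works on unlabeled data sets;

Semi-supervised learning models: a semi-supervised model combines supervised and unsupervised learning for predictive analytics, using both labeled and unlabeled data. A typical scenario mixes a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning generally uses classification and regression methods;

Reinforcement learning models: a reinforcement learning model tries different actions to find those that maximize a target reward function.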

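Here is an example application for each kind of model:

Supervised learning: anomaly detection;

Unsupervised learning: social networks, language prediction;

Semi-supervised learning: image classification, speech recognition;

Reinforcement learning: artificial intelligence (AI) applications.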

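3. Steps in a Machine Learning Project

When developing a machine learning project, data preprocessing, cleansing, and analysis are very important, just as important as the actual learning models and algorithms used to solve the business problem.

A typical machine learning solution generally follows these steps:

Feature engineering

Model training

Model evaluation

Figure 1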
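To make the model evaluation step more concrete, here is a minimal plain-Java sketch of one possible metric. The article does not say which metric is used; root-mean-square error (RMSE) is assumed here because it is a common choice for rating predictions. The class name `Rmse` and the sample numbers are illustrative only.

```java
/** Root-mean-square error between actual and predicted values (assumed metric). */
final class Rmse {

    /** Returns sqrt(mean((actual[i] - predicted[i])^2)). */
    static double rmse(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sumOfSquares += error * error;
        }
        return Math.sqrt(sumOfSquares / actual.length);
    }

    public static void main(String[] args) {
        // Illustrative ratings only; not taken from any real data set.
        double[] actual = {1.0, 2.0, 3.0};
        double[] predicted = {2.0, 2.0, 5.0};
        System.out.println("RMSE = " + rmse(actual, predicted));
    }
}
```

A lower RMSE on held-out data suggests a better fit; comparing it across candidate models is one simple way to carry out the evaluation step.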

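If the raw data cannot be cleansed or preprocessed, the final results will be inaccurate or unusable, and important details may even be lost.

The quality of the training data matters a great deal for the final predictions: if the training data is not random enough, the resulting model will be imprecise; if there is too little data, the learned model will also be inaccurate.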
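One simple way to keep the training data random is to shuffle the data set before splitting it into training and test portions. The sketch below shows this in plain Java under that assumption; the class name `TrainTestSplit`, the 80/20 ratio, and the seed are illustrative choices, not something the article prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Shuffles a data set and splits it into training and test portions. */
final class TrainTestSplit {

    /** Holds the two portions produced by a split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the data with a seeded generator, then keeps the
     * first trainFraction of it for training and the rest for testing.
     */
    static <T> Split<T> split(List<T> data, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(data);        // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed)); // seeded for reproducibility
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        return new Split<>(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        Split<Integer> s = split(data, 0.8, 42L);
        System.out.println("train=" + s.train + " test=" + s.test);
    }
}
```

On a Spark RDD, the `randomSplit` method plays a similar role for distributed data.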

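Use cases:

Business use cases span many domains, including personalized recommendation engines (for example, food recommendation engines), predictive analytics (stock price prediction or flight delay prediction), advertising, anomaly detection, image and video recognition, and other kinds of artificial intelligence.

Next, we look at two popular machine learning applications: personalized recommendation engines and anomaly detection.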

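4. Machine Learning Applications

4.1 Recommendation Engines

Personalized recommendation engines use item attributes and user behavior to make predictions. Recommendation engines are generally implemented with one of two approaches: content-based filtering or collaborative filtering.

Collaborative filtering generally gives better results than other approaches, and Spark MLlib implements the ALS (alternating least squares) collaborative filtering algorithm. Collaborative filtering in Spark MLlib comes in two forms: explicit feedback and implicit feedback. Explicit feedback is based on preferences that users state directly for an item (for example, a movie); it is valuable, but in many cases the data is skewed. Implicit feedback is based on user behavior data, such as views, clicks, and likes. Implicit feedback is now widely used in industry for predictive analytics because this kind of data is easy to collect.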
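To give a feel for what alternating least squares does with explicit ratings, here is a toy plain-Java sketch. It is not Spark MLlib's implementation: it assumes a single latent factor per user and per item (rank 1), plain L2 regularization, and small in-memory arrays. The class name `TinyAls` and the sample ratings are hypothetical.

```java
import java.util.Arrays;

/** Toy rank-1 ALS: rating(user, item) is approximated by userFactor[user] * itemFactor[item]. */
final class TinyAls {
    final double[] userFactor;
    final double[] itemFactor;

    private TinyAls(double[] userFactor, double[] itemFactor) {
        this.userFactor = userFactor;
        this.itemFactor = itemFactor;
    }

    /** Observation k is the rating values[k] given by users[k] to items[k]. */
    static TinyAls train(int numUsers, int numItems, int[] users, int[] items,
                         double[] values, int iterations, double lambda) {
        if (lambda <= 0.0) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        double[] x = new double[numUsers];
        double[] y = new double[numItems];
        Arrays.fill(x, 1.0);
        Arrays.fill(y, 1.0);
        for (int it = 0; it < iterations; it++) {
            solve(x, y, users, items, values, lambda); // update users, items fixed
            solve(y, x, items, users, values, lambda); // update items, users fixed
        }
        return new TinyAls(x, y);
    }

    /** Closed-form regularized least-squares update of one side, other side fixed. */
    private static void solve(double[] target, double[] fixed, int[] targetIdx,
                              int[] fixedIdx, double[] values, double lambda) {
        double[] numerator = new double[target.length];
        double[] denominator = new double[target.length];
        Arrays.fill(denominator, lambda);
        for (int k = 0; k < values.length; k++) {
            double f = fixed[fixedIdx[k]];
            numerator[targetIdx[k]] += values[k] * f;
            denominator[targetIdx[k]] += f * f;
        }
        for (int t = 0; t < target.length; t++) {
            target[t] = numerator[t] / denominator[t];
        }
    }

    double predict(int user, int item) {
        return userFactor[user] * itemFactor[item];
    }

    /** Squared error on the observations plus the L2 penalty on both factor vectors. */
    double objective(int[] users, int[] items, double[] values, double lambda) {
        double total = 0.0;
        for (int k = 0; k < values.length; k++) {
            double error = values[k] - predict(users[k], items[k]);
            total += error * error;
        }
        for (double v : userFactor) total += lambda * v * v;
        for (double v : itemFactor) total += lambda * v * v;
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical ratings: user 1 has not rated item 1.
        int[] users = {0, 0, 1};
        int[] items = {0, 1, 0};
        double[] values = {1.0, 2.0, 2.0};
        TinyAls model = train(2, 2, users, items, values, 20, 0.01);
        System.out.println("predicted rating for user 1, item 1: " + model.predict(1, 1));
    }
}
```

Each half-step solves a small least-squares problem exactly while the other side is held fixed, which is what "alternating" refers to. A real implementation uses several latent factors per user and item and distributes the work across a cluster.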

ÁíÍâÓлùÓÚÄ£Ð͵ķ½·¨ÊµÏÖÍÆ¼öÒýÇæ£¬ÕâÀïÔÝÇÒÂÔ¹ý¡£

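4.2 Anomaly Detection

Anomaly detection is another widely used machine learning technique, because it can quickly and accurately address hard problems in the financial industry. Financial services need to decide within a few hundred milliseconds whether an online transaction is illegitimate.

Neural network techniques are used for anomaly detection at the point of sale. Companies such as PayPal use a range of machine learning algorithms (for example, linear regression, neural networks, and deep learning) for risk management.

The Spark MLlib library provides several algorithm implementations, such as linear SVM, logistic regression, decision trees, and naive Bayes, as well as ensemble models such as random forests and gradient-boosted trees.

Now let's begin our journey into machine learning with the Apache Spark framework.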

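5. Spark MLlib

Spark MLlib is a machine learning library that makes machine learning models scalable and easy to use. It includes classification, regression, clustering, collaborative filtering, and dimensionality reduction algorithms, along with the corresponding APIs. Beyond these algorithms, Spark MLlib also provides a variety of data processing functions and data analysis tools:

Frequent itemset mining and association analysis with the FP-growth algorithm;

Sequential pattern mining with the PrefixSpan algorithm;

Summary statistics and hypothesis testing;

Feature transformations;

Machine learning model evaluation and hyperparameter tuning.

Figure 2: The Spark ecosystem

The Spark MLlib API supports programming in Scala, Java, and Python.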

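6. Spark MLlib in Practice

This section uses Spark MLlib to implement a recommendation engine. A recommendation engine is best suited to predicting unknown items a user may be interested in, based on the user's known behavior toward items. The engine trains a predictive model on known data (that is, the training data), and the trained model is then used to make predictions.

The best-movie recommendation engine is implemented in the following steps:

Load the movie data;

Load the rating data you specify;

Load the community-provided rating data;

Join the rating data into a single RDD;

Train the model with the ALS algorithm;

Identify the movies that the specified user (userId = 1) has not rated;

Predict ratings for the movies the user has not rated;

Get the top N recommendations (here N = 5);

Display the recommendations in the terminal.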
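The last few steps above (find the movies user 1 has not rated, predict their ratings, keep the top 5, and print them) can be sketched in plain Java as follows. In the article's program the predictions come from the trained ALS model; here the `predictor` function is a hypothetical stand-in so the sketch runs without Spark, and the class name `TopNRecommender` and the movie ids are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntToDoubleFunction;

/** Picks the N unrated movies with the highest predicted rating for one user. */
final class TopNRecommender {

    /** A movie id paired with its predicted rating. */
    static final class Scored {
        final int movieId;
        final double predictedRating;

        Scored(int movieId, double predictedRating) {
            this.movieId = movieId;
            this.predictedRating = predictedRating;
        }

        @Override
        public String toString() {
            return "movie " + movieId + " -> " + predictedRating;
        }
    }

    static List<Scored> recommend(Set<Integer> allMovies, Set<Integer> ratedByUser,
                                  IntToDoubleFunction predictor, int n) {
        List<Scored> candidates = new ArrayList<>();
        for (int movieId : allMovies) {
            if (!ratedByUser.contains(movieId)) {          // keep unrated movies only
                candidates.add(new Scored(movieId, predictor.applyAsDouble(movieId)));
            }
        }
        // Highest predicted rating first; ties broken by ascending movie id.
        candidates.sort(Comparator
                .comparingDouble((Scored s) -> s.predictedRating).reversed()
                .thenComparingInt(s -> s.movieId));
        int limit = Math.max(0, Math.min(n, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }

    public static void main(String[] args) {
        Set<Integer> allMovies = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        Set<Integer> ratedByUser1 = new HashSet<>(Arrays.asList(1, 2, 3));
        // Hypothetical stand-in for the trained model's prediction.
        IntToDoubleFunction predictor = movieId -> 1.0 + (movieId * 7 % 5);
        for (Scored s : recommend(allMovies, ratedByUser1, predictor, 5)) {
            System.out.println(s);                         // display in the terminal
        }
    }
}
```

Sorting the full candidate list is fine for a small catalog; with a very large catalog, a bounded priority queue would avoid sorting every candidate.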

If you want to analyze the output further, you can store the predictions in a database such as Cassandra or MongoDB.

7. Technologies Used

The Spark MLlib program here is developed in Java and run in standalone mode. The MLlib Java classes used come from the package org.apache.spark.mllib.recommendation:

ALS

MatrixFactorizationModel

Rating

Figure 3: Architecture of the Spark machine learning example program

Running the program:

Package the finished program and set the environment variables for the JDK (JAVA_HOME), Maven (MAVEN_HOME), and Spark (SPARK_HOME).

On Windows:

set JAVA_HOME=[JDK_INSTALL_DIRECTORY]

set PATH=%PATH%;%JAVA_HOME%\bin

set MAVEN_HOME=[MAVEN_INSTALL_DIRECTORY]

set PATH=%PATH%;%MAVEN_HOME%\bin

set SPARK_HOME=[SPARK_INSTALL_DIRECTORY]

set PATH=%PATH%;%SPARK_HOME%\bin

cd c:\dev\projects\spark-mllib-sample-app

mvn clean install

mvn eclipse:clean eclipse:eclipse

On Linux or macOS:

export JAVA_HOME=[JDK_INSTALL_DIRECTORY]

export PATH=$PATH:$JAVA_HOME/bin

export MAVEN_HOME=[MAVEN_INSTALL_DIRECTORY]

export PATH=$PATH:$MAVEN_HOME/bin

export SPARK_HOME=[SPARK_INSTALL_DIRECTORY]

export PATH=$PATH:$SPARK_HOME/bin

cd /Users/USER_NAME/spark-mllib-sample-app

mvn clean install

mvn eclipse:clean eclipse:eclipse

Run the Spark program with the following commands:

On Windows:

%SPARK_HOME%\bin\spark-submit --class "org.apache.spark.examples.mllib.JavaRecommendationExample" --master local[*] target\spark-mllib-sample-1.0.jar

On Linux or macOS:

$SPARK_HOME/bin/spark-submit --class "org.apache.spark.examples.mllib.JavaRecommendationExample" --master local[*] target/spark-mllib-sample-1.0.jar

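Spark MLlib application monitoring

The Spark web console can be used to monitor the state of the running program. Only the program's directed acyclic graph (DAG) is shown here:

Figure 4: DAG visualization

8. Conclusion

Spark MLlib is one of Spark's machine learning libraries and is often used for predictive analytics on business data, for example in personalized recommendation engines and anomaly detection systems.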

   