Spark on Angel: A Core Accelerator for Spark Machine Learning
 
Source: 龙果学院 Published: 2017-9-27
 

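The core abstraction in Spark is the RDD, and one of the RDD's key properties is immutability, which sidesteps the many concurrency problems that arise in a distributed environment. For data analysis this abstraction works well: it takes care of the distributed aspects, reduces the complexity of the operators, and delivers high-performance distributed data processing.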

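In machine learning, however, the RDD's weakness shows up quickly. The core of machine learning is iteration and parameter updates. Because RDDs compute in memory and logically never touch disk, they handle iteration well, but their immutability is a poor fit for parameters that must be updated over and over. This fundamental mismatch is why Spark's MLlib has developed so slowly: it has seen no substantial innovation since 2015, and its performance is not good either.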

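For this reason, Angel gave priority to Spark when designing its ecosystem. Spark on Angel was already available when V1.0.0 was released: it uses Angel to add parameter server (PS) capability to Spark, bringing an element of mutability into an otherwise immutable model, like giving wings to a tiger.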
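To make the mismatch concrete, here is a minimal Python sketch (not Spark or Angel code) contrasting an immutable update, which has to build a new weight vector on every step, with an in-place update on a mutable store of the kind a parameter server provides. The names `immutable_update` and `ParameterStore` are hypothetical and exist only for this illustration.

```python
from typing import List, Sequence, Tuple


def immutable_update(weights: Tuple[float, ...],
                     grad: Sequence[float],
                     lr: float) -> Tuple[float, ...]:
    """Immutable style: the old weights stay untouched and a new tuple is built."""
    return tuple(w - lr * g for w, g in zip(weights, grad))


class ParameterStore:
    """Hypothetical stand-in for a parameter server that holds mutable weights."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values: List[float] = list(values)

    def push_update(self, grad: Sequence[float], lr: float) -> None:
        """Mutable style: apply the gradient step in place."""
        for i, g in enumerate(grad):
            self.values[i] -= lr * g


if __name__ == "__main__":
    w0 = (1.0, 2.0)
    w1 = immutable_update(w0, [0.5, 0.5], lr=1.0)
    print("immutable:", w0, "->", w1)   # w0 is still intact

    store = ParameterStore([1.0, 2.0])
    store.push_update([0.5, 0.5], lr=1.0)
    print("mutable:", store.values)     # the same list, updated in place
```

With the immutable style every iteration allocates a fresh copy of the model, which becomes expensive once the model has tens of millions of dimensions.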

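We use L-BFGS as an example to analyze the problems in Spark's implementation of machine learning algorithms, and to show how Spark on Angel removes the bottlenecks Spark runs into in machine learning tasks, making machine learning on Spark more powerful.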

1. L-BFGS Algorithm Overview
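For reference, the two-loop recursion at the heart of L-BFGS can be sketched in plain Python as follows. This is a textbook-style illustration, not the breeze or Angel implementation; the function name `two_loop_direction` and the list-based vectors are assumptions made for brevity. It takes the current gradient and the most recent pairs s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k (oldest first) and returns the search direction.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad: Vector,
                       s_hist: Sequence[Vector],
                       y_hist: Sequence[Vector]) -> Vector:
    """Return -H * grad, where H approximates the inverse Hessian.

    s_hist and y_hist hold the last m correction pairs, oldest first.
    """
    m = len(s_hist)
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = [0.0] * m

    # First loop: walk the history from newest to oldest.
    for i in reversed(range(m)):
        alphas[i] = rhos[i] * dot(s_hist[i], q)
        q = [qj - alphas[i] * yj for qj, yj in zip(q, y_hist[i])]

    # Scale by the usual initial Hessian guess gamma = s'y / y'y.
    gamma = 1.0
    if m > 0:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    r = [gamma * qj for qj in q]

    # Second loop: walk the history from oldest to newest.
    for i in range(m):
        beta = rhos[i] * dot(y_hist[i], r)
        r = [rj + (alphas[i] - beta) * sj for rj, sj in zip(r, s_hist[i])]

    return [-rj for rj in r]
```

Every step is a dot product or a scaled vector addition over full-dimension vectors, which is why the place where these operations run matters so much for high-dimensional models.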

2. L-BFGS Implementation in Spark

3. L-BFGS Implementation in Spark on Angel

3.1 Implementation Framework

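Spark on Angel uses Angel's PS-Service to introduce a PS role into Spark and reduce the algorithm's dependence on the driver. The two-loop recursion is computed on the PS, while the driver is only responsible for task scheduling, which greatly reduces the dependence on driver performance.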

Angel PS consists of a group of distributed nodes. Each vector and matrix is split into multiple partitions that are stored on different nodes, and operations between vectors and matrices are supported.
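The partitioning idea can be illustrated with a small single-process Python sketch. It is not the Angel PS API: the class `PartitionedVector` and its methods are hypothetical, and each inner list merely stands in for the slice that one PS node would hold. The sketch assumes both operands have the same length and partition count.

```python
from typing import List, Sequence


class PartitionedVector:
    """Toy vector whose elements are split into contiguous partitions."""

    def __init__(self, values: Sequence[float], num_partitions: int) -> None:
        if num_partitions < 1 or len(values) == 0:
            raise ValueError("need at least one partition and one element")
        size = -(-len(values) // num_partitions)  # ceiling division
        self.partitions: List[List[float]] = [
            list(values[i:i + size]) for i in range(0, len(values), size)
        ]

    def dot(self, other: "PartitionedVector") -> float:
        """Each 'node' computes a partial dot product; only scalars are combined."""
        partials = [
            sum(a * b for a, b in zip(p, q))
            for p, q in zip(self.partitions, other.partitions)
        ]
        return sum(partials)

    def axpy(self, a: float, x: "PartitionedVector") -> None:
        """In-place self += a * x, carried out partition by partition."""
        for p, q in zip(self.partitions, x.partitions):
            for i, v in enumerate(q):
                p[i] += a * v

    def to_list(self) -> List[float]:
        return [v for p in self.partitions for v in p]


if __name__ == "__main__":
    v = PartitionedVector([1.0, 2.0, 3.0, 4.0, 5.0], num_partitions=2)
    ones = PartitionedVector([1.0] * 5, num_partitions=2)
    print(v.partitions)
    print(v.dot(ones))
    v.axpy(2.0, ones)
    print(v.to_list())
```

Because each operation touches only the local partitions, no single node ever needs to hold or transfer the whole vector.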

3.2 Performance Analysis

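Throughout the algorithm, the driver only schedules tasks. The expensive two-loop recursion runs on the PS, and gradient aggregation and model synchronization take place between the executors and the PS, so every computation becomes distributed. During network transfer, a high-dimensional PSVector is cut into small data blocks before being sent to the target nodes; this many-to-many transfer between nodes greatly speeds up gradient aggregation and model synchronization. In this way Spark on Angel completely avoids both the single-point driver bottleneck in Spark and the problem of sending high-dimensional vectors over the network.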
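The block-wise, many-to-many aggregation described above can be sketched as follows. This is a single-process Python illustration under simplifying assumptions, not Angel's network code: the function `push_and_aggregate` is hypothetical, each inner list plays the role of one PS node, and every executor gradient is assumed to be a dense list of the same length.

```python
from typing import List, Sequence


def push_and_aggregate(executor_grads: Sequence[Sequence[float]],
                       num_servers: int) -> List[List[float]]:
    """Split each executor's gradient into blocks and sum block s on server s."""
    dim = len(executor_grads[0])
    block = -(-dim // num_servers)  # ceiling division: elements per server

    # Each server owns one contiguous slice of the full gradient.
    servers: List[List[float]] = [
        [0.0] * len(range(s * block, min((s + 1) * block, dim)))
        for s in range(num_servers)
    ]

    for grad in executor_grads:
        for s in range(num_servers):
            start = s * block
            # Only this block travels from the executor to server s.
            for offset, g in enumerate(grad[start:start + block]):
                servers[s][offset] += g
    return servers


if __name__ == "__main__":
    grads = [
        [1.0, 2.0, 3.0, 4.0],
        [10.0, 20.0, 30.0, 40.0],
        [100.0, 200.0, 300.0, 400.0],
    ]
    print(push_and_aggregate(grads, num_servers=2))
```

Each server receives only its own slice from every executor, so the incoming traffic per node shrinks roughly in proportion to the number of servers instead of converging on a single machine.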

4. Spark on Angel: "Light, Easy, Powerful, Fast"

Spark on Angel is a "plugin" that Angel designed to address Spark's shortcomings in machine learning model training. It makes no "invasive" changes to Spark and is an independent framework. Its characteristics can be summed up as "light", "easy", "powerful", and "fast".

4.1 Light: a "plugin-style" framework

As a plugin, Spark on Angel makes no invasive changes to Spark's RDD. It is a framework that depends on both Spark and Angel, yet its logic is independent of both. Spark users can therefore adopt it easily: only three changes to the Spark submit script are needed. For details, see the Spark on Angel Quick Start document in Angel's GitHub repository.

As the submit script below shows, a submitted Spark on Angel job is still essentially a Spark job, and it executes the same way a Spark job does.

# Load the Spark on Angel environment settings
source ${Angel_HOME}/bin/spark-on-angel-env.sh

# Submit as an ordinary Spark job; the spark.ps.* options configure the Angel PS
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=20 \
--conf spark.ps.cores=4 \
--conf spark.ps.memory=10g \
--jars $SONA_SPARK_JARS \
....

Spark on Angel can be this lightweight thanks to Angel's encapsulation of PS-Service, which lets Spark's driver and executors exchange data with Angel PS through PsAgent and PSClient.

4.2 Powerful: rich functionality, with breeze support

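breeze is a numerical computing library for machine learning written in Scala. Most of the numerical optimization algorithms in Spark MLlib are implemented by calling breeze. As shown below, both the Spark and the Spark on Angel implementations call breeze.optimize.LBFGS. The Spark implementation is --- BreezePSVector. ---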

BreezePSVector is a Vector stored on Angel PS. It implements the methods under breeze NumericOps, such as the commonly used dot, scale, axpy, and add. As a result, the high-dimensional vector operations in the two-loop recursion of LBFGS[BreezePSVector] are operations between BreezePSVectors, and all of them are carried out in a distributed fashion on Angel PS.

L-BFGS implementation in Spark

4.3 Easy: a simple programming interface

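Another reason Spark is so popular in big data is that its programming model is simple and easy to understand, and Spark on Angel inherits this property. A Spark on Angel job is essentially a Spark job, and its code follows the same logic as Spark code; when an operation on a PSVector is needed, you simply call the corresponding interface.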

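As the code below shows, the Spark and Spark on Angel implementations of LBFGS follow the same overall approach; the main differences are the aggregation of the gradient vector and the pull/push of the model. Converting a Spark algorithm into a Spark on Angel job therefore requires only a small number of code changes.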

L-BFGS requires the user to implement a DiffFunction. The calculate interface of DiffFunction takes its input parameter, iterates over the training data, and returns the loss and the gradient.

For the complete code, see SparseLogistic on GitHub.
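To show the shape of such a function, here is a plain Python sketch of a calculate-style routine for logistic regression. It is an illustration only, not the SparseLogistic code or the breeze API: the name `calculate`, the dense list inputs, the 0/1 labels, and the assumption that the input is the current weight vector are all choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calculate(w: Sequence[float],
              data: Sequence[Sample]) -> Tuple[float, List[float]]:
    """Scan the training data once and return (mean loss, mean gradient)."""
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        # Log loss written as log(1 + exp(margin)) - y * margin, in stable form.
        loss += max(margin, 0.0) + math.log1p(math.exp(-abs(margin))) - y * margin
        err = sigmoid(margin) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    n = len(data)
    return loss / n, [g / n for g in grad]


if __name__ == "__main__":
    samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    print(calculate([0.0, 0.0], samples))
```

In a distributed setting the loop over `data` is what runs on the executors, and the per-partition losses and gradients are then summed, which is exactly the aggregation step where Spark and Spark on Angel differ.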

DiffFunction implementation in Spark

 

4.4 Fast: strong performance

We implemented LR with three optimization methods, SGD, LBFGS, and OWLQN, and ran comparison experiments on Spark and Spark on Angel. The experiment code is in SparseLRWithX.scala on GitHub.

Dataset: a dataset from an internal Tencent business, with 230 million samples and 50 million dimensions

Experiment setup:

Note 1: The resource configuration for the three comparison experiments is as follows. We tried to make sure every job ran with ample resources, so the configured resources exceed what is actually needed.

Note 2: Running the Spark jobs requires increasing the spark.driver.maxResultSize parameter, whereas Spark on Angel does not need this parameter to be configured.

As the data above shows, Spark on Angel is more than 50% faster than Spark when training LR models, and the more complex the model, the larger the speedup.
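A rough back-of-the-envelope calculation suggests why Note 2 matters on plain Spark. The sketch below assumes the gradient is a dense vector of 8-byte doubles and compares its size with Spark's default `spark.driver.maxResultSize` of 1g; the dense representation and the helper names are assumptions for illustration, not figures from the experiment.

```python
DIMENSIONS = 50_000_000                  # model dimensions stated for the dataset
BYTES_PER_DOUBLE = 8                     # assumption: dense float64 storage
DEFAULT_MAX_RESULT_SIZE = 1 * 1024 ** 3  # Spark's default limit of 1g, in bytes


def dense_vector_bytes(dim: int) -> int:
    """Size in bytes of one dense gradient or model vector."""
    return dim * BYTES_PER_DOUBLE


def vectors_within_limit(dim: int, limit: int) -> int:
    """How many such vectors fit under the driver's result-size limit."""
    return limit // dense_vector_bytes(dim)


if __name__ == "__main__":
    size = dense_vector_bytes(DIMENSIONS)
    print(f"one dense vector: {size / 1e6:.0f} MB")
    print("vectors that fit in the default limit:",
          vectors_within_limit(DIMENSIONS, DEFAULT_MAX_RESULT_SIZE))
```

Under these assumptions only a couple of partial gradients fit within the default limit, whereas with Spark on Angel the gradients are pushed to the PS rather than collected on the driver.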

5. Conclusion

Spark on Angel overcomes the bottlenecks Spark faces in machine learning efficiently and at low cost. We will continue to optimize Spark on Angel and improve its performance, and we welcome everyone to join us on GitHub to help improve it.

 

 

   