Machine Learning in Practice: How to Combine Spark with Python
 
2524 views   27
 2019-9-17
 
Editor's recommendation:
This article comes from Network Big Data (网络大数据). It introduces some features of Apache Spark and shows how to set up Spark with Python, so that Spark and Python can be used together.

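Apache Spark is one of the most widely used frameworks for processing big data, and Python is one of the most widely used programming languages for data analysis, machine learning, and related fields. If you want stronger machine learning capabilities, why not use Spark and Python together?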
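Outside China, the average annual salary of an Apache Spark developer is USD 110,000, which shows how widely Spark is used in the industry. Because of its rich set of libraries, Python is also used by most data scientists and analytics experts. Integrating the two is not difficult. Spark is written in Scala, a language very similar to Java, which compiles program code into JVM bytecode for Spark's big data processing. To integrate Spark with Python, the Apache Spark community released PySpark.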

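Apache Spark is an open-source cluster computing framework for real-time processing, developed by the Apache Software Foundation. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.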

Below are some features of Apache Spark that give it an edge over other frameworks:

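Speed: up to 100 times faster than traditional large-scale data processing frameworks.

Powerful caching: a simple programming layer provides powerful caching and disk persistence.

Deployment: Spark can be deployed through Mesos, YARN, or Spark's own cluster manager.

Real-time: in-memory computation enables real-time processing with low latency.

Polyglot: this is one of the framework's most important features, since Spark can be programmed in Scala, Java, Python, and R.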
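In code, the deployment options listed above mostly appear as the master URL handed to SparkConf. The sketch below illustrates this under stated assumptions. The helper names `master_url` and `create_context` are hypothetical. The ports (7077 for Spark's standalone master, 5050 for Mesos) are the usual defaults, so check them against your own cluster. YARN finds the cluster through the Hadoop configuration, so its master string is simply `yarn`.

```python
from typing import Optional


def master_url(manager: str, host: str = "localhost", port: Optional[int] = None) -> str:
    """Return a Spark master URL for a cluster manager (hypothetical helper).

    The default ports are the usual ones; check them against your cluster.
    """
    if manager == "local":
        return "local[*]"  # in-process, one worker thread per logical core
    if manager == "yarn":
        return "yarn"  # cluster located via HADOOP_CONF_DIR / YARN_CONF_DIR
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"  # Spark's own cluster manager
    raise ValueError(f"unknown cluster manager: {manager!r}")


def create_context(app_name: str, manager: str = "local", **kwargs):
    """Create a SparkContext for the chosen manager (requires pyspark)."""
    from pyspark import SparkConf, SparkContext  # imported lazily

    conf = SparkConf().setAppName(app_name).setMaster(master_url(manager, **kwargs))
    return SparkContext(conf=conf)
```

For example, `create_context("kdd-demo", "standalone", host="master-node")` would target a standalone master on a hypothetical host named `master-node`. Switching managers then changes only the string, not the application code.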

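Spark was designed in Scala, which can make it roughly 10 times faster than Python. However, Scala shows this speed advantage only when the number of cores in use is small. Since most analysis and processing today requires a large number of cores, Scala's performance advantage is not that significant.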

For programmers, Python is relatively easy to learn thanks to its syntax and rich standard library. Moreover, it is a dynamically typed language, which means an RDD can hold objects of multiple types.
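The following sketch illustrates the dynamic-typing point. It puts values of several Python types into one collection and tags each value with its type name. `type_name` and `MIXED` are hypothetical names. If a SparkContext `sc` is available, `type_names_with_spark(sc)` runs the same map on an RDD and should return the same list that the local `map` produces.

```python
from typing import Any, List


def type_name(value: Any) -> str:
    """Return the Python type name of one element."""
    return type(value).__name__


# A mixed-type collection. This is fine for a PySpark RDD, because its
# elements are ordinary Python objects (pickled by default).
MIXED = [1, "two", 3.0, ("four", 4), {"five": 5}]


def type_names_with_spark(sc) -> List[str]:
    """Same computation distributed over a SparkContext `sc` (requires pyspark)."""
    return sc.parallelize(MIXED).map(type_name).collect()


# Local equivalent of the Spark pipeline above.
print(list(map(type_name, MIXED)))  # ['int', 'str', 'float', 'tuple', 'dict']
```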

Although Scala has Spark MLlib, it does not have enough libraries and tools for machine learning and NLP. In addition, Scala lacks data visualization tools.

Setting up Spark with Python (PySpark)

First, download and install Spark. After extracting the Spark archive, add its location to the path in your .bashrc file, then run source .bashrc so that the change takes effect.

To open the PySpark shell, run the command ./bin/pyspark from the Spark directory.
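The exact `.bashrc` lines depend on where you extracted Spark. This sketch assumes the common convention of exporting `SPARK_HOME` and adding `$SPARK_HOME/bin` to `PATH`. The hypothetical checker below confirms from Python that the `pyspark` launcher exists where `./bin/pyspark` expects it.

```python
import os
from typing import Optional


def find_pyspark_launcher(spark_home: Optional[str] = None) -> Optional[str]:
    """Return the path of bin/pyspark under SPARK_HOME, or None if missing."""
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        return None  # SPARK_HOME not set: was `source .bashrc` run?
    launcher = os.path.join(home, "bin", "pyspark")
    return launcher if os.path.isfile(launcher) else None


if __name__ == "__main__":
    print(find_pyspark_launcher() or "pyspark launcher not found")
```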

PySpark SparkContext and data flow

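PySpark connects Python to Spark through the Py4J library, which lets Python code work with RDDs (Resilient Distributed Datasets). The PySpark shell links the Python API to Spark Core and initializes the SparkContext. The SparkContext is the heart of any Spark application.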

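The SparkContext sets up internal services and establishes a connection to the Spark execution environment.

The SparkContext object in the driver program coordinates all distributed processes and allows resources to be allocated.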

Cluster managers provide executors, which are JVM processes that run the application logic.

The SparkContext object sends the application code to the executors.

The SparkContext then sends tasks to each executor to run.
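This sketch makes the driver/executor split concrete. The driver serializes the functions passed to RDD operations and runs them on the executors as tasks, one task per partition. Here `count_matching` (a hypothetical helper) is the per-partition work. `rdd.mapPartitions` calls it once for each partition, and the driver only combines the small per-partition results. The default of 4 partitions is arbitrary.

```python
from typing import Iterable, Iterator, List


def count_matching(lines: Iterable[str], needle: str = "normal.") -> Iterator[int]:
    """Per-partition task: yield a single count for the whole partition."""
    yield sum(1 for line in lines if needle in line)


def distributed_count(sc, lines: List[str], partitions: int = 4) -> int:
    """Driver side (requires pyspark): ship tasks to executors, sum results."""
    rdd = sc.parallelize(lines, numSlices=partitions)
    # Each executor runs count_matching on the partitions it holds.
    per_partition = rdd.mapPartitions(count_matching).collect()
    return sum(per_partition)


# Local simulation: two "partitions" processed independently, then combined.
parts = [["a,normal.", "b,smurf."], ["c,normal."]]
print(sum(c for part in parts for c in count_matching(part)))  # 2
```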

PySpark KDD use case

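Now let's look at a use case. The data comes from KDD Cup 1999 (KDD'99), the International Knowledge Discovery and Data Mining Tools Competition. China also has similar competitions that publish open datasets, such as Zhihu's. Here we use only part of the dataset, because the original dataset is too large.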
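The article does not say how the subset was produced. One option is the ready-made 10% file distributed with the KDD'99 data (`kddcup.data_10_percent.gz`). Another option, assumed in this sketch, is to cut your own sample by keeping only the first `n` lines of a larger gzip file. The hypothetical helper `head_gz` does this with the standard library alone.

```python
import gzip
import itertools


def head_gz(src: str, dst: str, n: int) -> int:
    """Copy the first n lines of gzip file src into a new gzip file dst.

    Returns the number of lines written (fewer than n if src is shorter).
    """
    written = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        for line in itertools.islice(fin, n):
            fout.write(line)
            written += 1
    return written
```

Taking the first `n` lines is not a random sample. If you need a random subset, Spark's `rdd.sample(False, 0.1)` draws roughly 10% of the elements after the data is loaded.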

Creating the RDD:

Now we can use this file to create our RDD.
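In the PySpark shell, `sc` already exists, so creating the RDD takes one call to `sc.textFile`. That call reads gzip-compressed text transparently and yields one string per line. The file name below assumes the 10% subset sits in the current directory. Spark reads nothing until an action runs, so a quick local peek with the standard `gzip` module is a convenient first check that the file looks right. `create_raw_data` and `peek_lines` are hypothetical helper names.

```python
import gzip
import itertools
from typing import List

DATA_FILE = "./kddcup.data_10_percent.gz"  # assumed location of the subset


def create_raw_data(sc, path: str = DATA_FILE):
    """Return an RDD with one element per line of the (gzipped) text file.

    Lazy: Spark does not read the file until an action such as count() runs.
    """
    return sc.textFile(path)


def peek_lines(path: str, n: int = 3) -> List[str]:
    """Read the first n lines locally, without Spark, as a sanity check."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]
```

Gzip files are not splittable, so an RDD read from a single `.gz` file starts with one partition. Calling `raw_data.repartition(8)` (8 is an arbitrary choice) spreads the later work across more tasks.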

Filtering

Suppose we want to count how many normal interactions the dataset contains. We can filter our raw_data RDD as follows.
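Here is a minimal filtering sketch. It assumes that `raw_data` is the RDD of raw text lines and that the label is the last comma-separated field, ending in a period as in KDD'99 (`normal.`, `smurf.`, and so on). Checking only the last field is slightly stricter than the simpler substring test `'normal.' in line`, and on this data format the two should agree. `is_normal` and `filter_normal` are hypothetical names. `filter()` is a transformation, so nothing is computed yet.

```python
NORMAL_TAG = "normal."


def is_normal(line: str) -> bool:
    """True if the record's last field (its label) is 'normal.'."""
    return line.rstrip().rsplit(",", 1)[-1] == NORMAL_TAG


def filter_normal(raw_data):
    """Transformation only: returns a new, lazily evaluated RDD."""
    return raw_data.filter(is_normal)
```

Usage in the shell: `normal_raw_data = filter_normal(raw_data)`.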

Counting:

Now we can count how many elements the new RDD contains.
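Counting is the first action in this walkthrough, so this is the step where Spark actually reads the file and applies the filter. The sketch times the action with a hypothetical `timed` helper. It assumes `normal_raw_data` is the filtered RDD. Timings will vary by machine and cluster.

```python
import time
from typing import Any, Callable, Tuple


def timed(action: Callable[[], Any]) -> Tuple[Any, float]:
    """Run an action and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def count_normal(normal_raw_data) -> int:
    """count() is an action: it triggers the filter and reads the file."""
    normal_count, seconds = timed(normal_raw_data.count)
    print(f"There are {normal_count} 'normal' interactions")
    print(f"Count completed in {seconds:.3f} seconds")
    return normal_count
```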


Mapping:

Here we want to read the data file as CSV. We can do this by applying a lambda function to each element of the RDD. As shown below, we use the map() transformation followed by the take() action.
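This sketch of the step assumes `raw_data` is the RDD of raw lines. `to_fields` and `show_head` are hypothetical names. A named function replaces the inline lambda only so that it can be tested on its own, and `raw_data.map(lambda x: x.split(","))` is equivalent. The `take(5)` action scans only as many partitions as it needs to return five elements.

```python
from pprint import pprint
from typing import List


def to_fields(line: str) -> List[str]:
    """Split one CSV record into its fields.

    A plain split is enough because KDD'99 records contain no quoted
    commas; use the csv module for general CSV input.
    """
    return line.rstrip("\n").split(",")


def show_head(raw_data, n: int = 5) -> None:
    csv_data = raw_data.map(to_fields)  # transformation: lazy
    head_rows = csv_data.take(n)        # action: fetches only n elements
    pprint(head_rows[0])
```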


Splitting:

Now we want each element of the RDD to become a key-value pair. The key is the tag (for example, normal.), and the value is the full list of fields that make up the row in the CSV file. We can do this as follows, using line.split() and map().
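In this sketch of the key-value step, `parse_interaction` turns one line into a `(tag, fields)` pair. The RDD produced by `key_by_tag` is a pair RDD that Spark's by-key operations can work on. Both names are illustrative. The sketch assumes, as in KDD'99, that the label is the last field of each record.

```python
from typing import List, Tuple


def parse_interaction(line: str) -> Tuple[str, List[str]]:
    """Turn one record into a (tag, fields) key-value pair.

    The tag is the label in the last field, e.g. 'normal.'. KDD'99 records
    have 41 features plus the label, so it is also fields[41].
    """
    elems = line.split(",")
    tag = elems[-1]
    return tag, elems


def key_by_tag(raw_data):
    """Pair RDD of (tag, fields); still lazy until an action runs."""
    return raw_data.map(parse_interaction)
```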


Collecting:

The collect() action brings every element of the RDD into memory on the driver, so use it with care on large RDDs.
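One defensive pattern is to count first and collect only when the result is small. This sketch assumes an extra pass over the data is acceptable. `collect_if_small` and its 100,000-row threshold are hypothetical. `take()`, `sample()`, and `toLocalIterator()` are real RDD methods that avoid loading everything into the driver at once. `toLocalIterator()` streams the data one partition at a time.

```python
def collect_if_small(rdd, max_rows: int = 100_000) -> list:
    """Collect an RDD into the driver only if it is small enough.

    count() costs an extra pass over the data (unless the RDD is cached),
    but it avoids exhausting the driver's memory on a large RDD.
    """
    n = rdd.count()
    if n > max_rows:
        raise ValueError(
            f"RDD has {n} rows (> {max_rows}); use take(), sample() or "
            "toLocalIterator() instead of collect()"
        )
    return rdd.collect()
```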


Naturally, this takes longer than any of the previous operations. Every Spark worker node holding a fragment of the RDD must be coordinated to retrieve its part, and then all the parts are gathered together on the driver.

As a final example that combines everything above, we want to collect all normal interactions as key-value pairs.
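The sketch below chains the earlier steps: map each line to a `(tag, fields)` pair, keep only the `normal.` pairs, then collect. It is self-contained, so it defines its own small `parse_interaction`. All function names are illustrative. The two sample records are shortened and made up purely to show the result's shape.

```python
from typing import List, Tuple


def parse_interaction(line: str) -> Tuple[str, List[str]]:
    """(tag, fields) pair; the tag is the label in the last field."""
    elems = line.split(",")
    return elems[-1], elems


def is_normal_pair(pair: Tuple[str, List[str]]) -> bool:
    return pair[0] == "normal."


def collect_normal_pairs(raw_data) -> List[Tuple[str, List[str]]]:
    """map and filter are lazy transformations; collect() runs the whole job."""
    key_csv_data = raw_data.map(parse_interaction)
    normal_key_interactions = key_csv_data.filter(is_normal_pair)
    return normal_key_interactions.collect()


# The same pipeline on a local list, to show what the result looks like.
lines = ["0,tcp,http,SF,normal.", "0,icmp,ecr_i,SF,smurf."]
local = [p for p in map(parse_interaction, lines) if is_normal_pair(p)]
print(local)  # [('normal.', ['0', 'tcp', 'http', 'SF', 'normal.'])]
```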


   