Replace, Not Supplement: Highlights of Spark Summit 2014
 
Authors: Wang Dong (王东) & Reynold Xin (辛湜) | Published 2014-07-21
 

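The Apache Spark open-source ecosystem grew substantially in the first half of 2014 and has quickly become the most active open-source project in big data. HDFS ranks second, with only about half as many commits and changed lines of code as Spark: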

  • More than 250 engineers from over 50 organizations have contributed code.
  • Compared with June of last year, the code base has nearly tripled in size.

With the release of version 1.0 on May 30, Spark now offers a stable API that developers can rely on for code compatibility. All major Hadoop distributors, including Hortonworks, IBM, Cloudera, MapR, and Pivotal, provide Spark packaging and technical support.

Training on the third day of the summit

As the Spark platform grew, Spark Summit 2014 opened in San Francisco on June 30 for a three-day run, making it the largest Spark conference to date.

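  • Nearly 30 companies sponsored the summit, including SAP, IBM, Intel, Amazon, and Cloudera.
  • More than 1,000 big data practitioners and developers registered.
  • More than 300 developers and data scientists attended the training on the third day.
  • Twelve executives from Databricks, SAP, Cloudera, MapR, DataStax, Jawbone, and other companies gave keynotes.
  • The summit offered 50 technical talks across three tracks: featured applications, development, and data science and research.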

In this article we review the highlights of the summit.

The present and future of the Spark open-source ecosystem

1. Matei Zaharia, Spark creator and Databricks CTO: Spark's role in big data

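Matei Zaharia designed and wrote the first version of Spark as a PhD student at the UC Berkeley AMPLab. Having completed his doctorate, he is now CTO of Databricks and will take up an assistant professorship at MIT. Matei was the first speaker of the summit. He began by reviewing Spark's latest progress in community size and technical capability. Since the first Spark Summit in December 2013, the number of open-source contributors has grown from 100 to more than 250, making Spark the most active open-source project in big data. Spark has gained several important components: the Spark SQL runtime, a larger machine learning library (MLlib), and rich integration with other data processing systems. As for Spark's future role, Matei expects it to become the unified platform for big data soon, so that different kinds of applications, such as stream processing, machine learning, and SQL, can all be built with Spark on top of different storage and runtime systems.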

2. Patrick Wendell, Databricks co-founder: the future of Spark

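Patrick Wendell is a member of the Apache Spark Project Management Committee. He was a PhD student at UC Berkeley and left in 2013 to help found Databricks, where he now manages the open-source effort; his technical focus is on how Spark interacts with networking and operating systems. In this talk Patrick reviewed Spark's rapid growth and stressed that Spark's future lies in powerful libraries whose development is led by experts in each domain. To reach that goal, he laid out the release process and cadence the project should follow to deliver full interoperability and stable releases while still supporting rapid development. The libraries should be carefully curated and tightly integrated with the Spark core API, and the Spark core will keep improving to drive future innovation. Patrick walked through the main existing Spark libraries and the direction of each: Spark SQL for structured data, Spark Streaming, MLlib for machine learning, SparkR, and GraphX.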

Databricks Cloud product launch

1. Databricks CEO Ion Stoica: Databricks' progress and product launch

Databricks CEO Ion Stoica

Ion Stoica is the CEO of Databricks. He is a professor of computer science at UC Berkeley and co-founded Databricks in 2013. Ion first described two steps Databricks has taken to promote Spark adoption in industry:

  • Databricks has partnered with the Spark distributors Cloudera, DataStax, MapR, and SAP to improve the user experience.
  • In February of this year, Databricks launched the Spark certification program to ensure that certified applications run on any certified Spark distribution.

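The centerpiece of Ion's keynote was the launch of Databricks Cloud. Ion listed the obstacles that currently stand between data and value, and presented Databricks Cloud as the way to make big data easy. Databricks Cloud lets users conveniently build an entire data processing pipeline, supports existing Spark applications, and adds many enhancements and extra features. It was designed to greatly reduce the complexity of big data processing, and it should draw more enterprise users into using big data to create new value.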

Databricks Cloud consists of three parts: Databricks Platform, Spark, and Databricks Workspace. Databricks Platform makes it very easy to create and manage Spark clusters. It currently runs on Amazon AWS and will soon extend to more cloud providers. Databricks Workspace consists of notebooks, dashboards, and a job launcher:

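  1. Notebooks provide a rich interface for discovering and exploring data, plotting results interactively, turning an entire workflow into an executable script, and collaborating interactively with other users.
  2. With dashboards, users can pick any previously created notebooks, assemble them into a dashboard with a WYSIWYG editor, and publish it to a wider audience. The data and queries on a dashboard can also be refreshed periodically.
  3. The job launcher lets users run arbitrary Apache Spark jobs, which simplifies building data products.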

2. Ali Ghodsi, Databricks co-founder: live demo of Databricks Cloud

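Ali Ghodsi co-founded Databricks in 2013 and is now its head of engineering. With Databricks Cloud, Ali wants simple tasks to be effortless and complex analyses to be possible. He showed that a Spark cluster can be set up on AWS with just a few mouse clicks. Using a sample data set about the FIFA World Cup, he demonstrated notebooks, the interactive user interface, plotting, parameterized queries, and dashboards. For large-scale analysis, he interactively processed a 3.4 TB Twitter data set with Spark SQL.

The highlight of Ali's demo was real-time concept search through machine learning. He first used MLlib to build a TF-IDF term model on 60 GB of Wikipedia data, wrote a Scala function on top of that model to measure similarity between terms, and registered the function with Spark SQL. Finally, he used Spark Streaming to produce a stream of tweets and used Spark SQL to filter out the tweets related to a user-supplied search term; searching for "football", for example, shows tweets about the World Cup.

The demo was very well received. The audience was impressed by the seamless integration of a complex data pipeline and analysis, and felt that Databricks Cloud would let them concentrate on the analysis itself instead of spending large amounts of time and effort building pipeline infrastructure, which would directly fuel the growth of their businesses.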
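The concept-search pipeline described above can be sketched on a single machine. The demo used MLlib, Scala, Spark SQL, and Spark Streaming; the pure-Python version below is only an illustration under stated assumptions: the function names (`build_tfidf`, `term_vectors`, `cosine`, `related_tweets`), the tiny corpus, and the 0.5 threshold are all hypothetical and do not come from the talk.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def term_vectors(doc_vectors):
    """Transpose document vectors so each term maps to {doc_index: weight}."""
    terms = {}
    for i, vec in enumerate(doc_vectors):
        for term, weight in vec.items():
            terms.setdefault(term, {})[i] = weight
    return terms


def cosine(u, v):
    """Cosine similarity of two sparse dict vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_tweets(query, tweets, terms, threshold=0.5):
    """Keep tweets containing at least one word similar enough to the query term."""
    q = terms.get(query, {})
    kept = []
    for tweet in tweets:
        words = tweet.lower().split()
        if any(cosine(q, terms.get(w, {})) >= threshold for w in words):
            kept.append(tweet)
    return kept


# Hypothetical toy corpus standing in for the Wikipedia data.
corpus = [
    "football world cup final goal",
    "world cup football match stadium",
    "stock market price earnings",
    "market price inflation rate",
]
terms = term_vectors(build_tfidf(corpus))
tweets = ["What a goal in the cup tonight", "Inflation worries hit the market"]
print(related_tweets("football", tweets, terms))
```

Two terms count as similar here when they tend to occur in the same documents, which is why a query for "football" can match a tweet that only mentions "cup" or "goal".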

Apache Spark and the big data industry

Senior executives from SAP, DataStax, Cloudera, MapR, and other companies attended the summit and gave keynotes on Spark and the big data industry.

At the summit, Databricks and SAP announced a partnership to package certified Spark on the SAP HANA platform. SAP senior vice president Aiaz Kazi described the synergy between SAP HANA and Apache Spark, whose combination offers better support for enterprise big data.

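Eric Baldeschwieler (also known as Eric14), former CEO/CTO of Hortonworks, restated his view that "Apache Spark is the most exciting thing happening in big data today". He sees making Spark shine in data science and real-world applications as a key goal for the Spark community. To that end he outlined several tasks, such as building an open certification suite, better supporting multiple coexisting Spark clusters, and providing portable storage.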

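Mike Olson, Cloudera's CSO and co-founder, gave a keynote on Spark as the next-generation standard that succeeds MapReduce for big data. Mike described Spark's importance in Cloudera's products: over the past year, Spark accounted for 21% of the open-source code updates across all Cloudera-supported projects. Spark is now fully integrated into CDH and has been adopted by Cloudera's major customers. For SQL on Hadoop, Cloudera will continue to support Impala for BI analytics, Hive on Spark for batch processing, and Spark SQL for applications that mix Spark and SQL.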

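MC Srivas, CTO and founder of MapR, said that the MapR platform includes the complete Spark stack. Spark's strengths are ease of development, in-memory performance, and a unified workflow; Hadoop's strengths are virtually unlimited scalability, a general-purpose enterprise platform, and a wide range of applications. Combining the strengths of Hadoop and Spark gives MapR customers better support. He presented several case studies from different fields, including ad optimization, genomics, network security, and health insurance.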

Martin Van Ryswyk, executive vice president at DataStax, spoke about integrating Spark with Cassandra. He announced the release of cassandra-driver-spark v1.0. The combination of DataStax's Cassandra and Spark is 2 to 30 times faster than an optimized Hadoop on Cassandra.

SQL support in Spark

1. Michael Armbrust, lead developer of Spark SQL: advanced data analysis with Spark SQL

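Spark SQL is the newest component in Spark 1.0, where it ships as an alpha. At the summit, Databricks announced that it has stopped developing Shark and will focus its development on Spark SQL. Spark SQL lets developers work directly with RDDs while also querying external data, for example data stored in Apache Hive. A key feature of Spark SQL is that it handles relational tables and RDDs in a unified way, so developers can easily run SQL queries against external data and carry out more complex analysis in the same program. Besides Spark SQL itself, Michael discussed the Catalyst optimization framework, which lets Spark SQL automatically rewrite query plans so that SQL executes more efficiently.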
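To give a feel for what "automatically rewriting a query plan" means, here is a minimal sketch of rule-based expression rewriting. It is a generic illustration, not Catalyst's implementation: the tuple-based tree encoding, the `fold` function, and the two rules (constant folding and removal of `+ 0` / `* 1`) are assumptions chosen for brevity.

```python
import operator

# Supported binary operators in this toy expression language.
OPS = {"+": operator.add, "*": operator.mul}


def fold(expr):
    """Simplify an expression tree bottom-up.

    Nodes are tuples: ("lit", value), ("col", name), or (op, left, right)
    with op being "+" or "*".
    """
    kind = expr[0]
    if kind in ("lit", "col"):
        return expr
    op = kind
    left, right = fold(expr[1]), fold(expr[2])
    # Rule 1: constant folding, e.g. 2 * 3 becomes 6 before any row is read.
    if left[0] == "lit" and right[0] == "lit":
        return ("lit", OPS[op](left[1], right[1]))
    # Rule 2: drop identity elements, e.g. x + 0 becomes x and x * 1 becomes x.
    if op == "+" and right == ("lit", 0):
        return left
    if op == "+" and left == ("lit", 0):
        return right
    if op == "*" and right == ("lit", 1):
        return left
    if op == "*" and left == ("lit", 1):
        return right
    return (op, left, right)


# x + (2 * 3) is rewritten so the multiplication happens once, at planning time.
plan = ("+", ("col", "x"), ("*", ("lit", 2), ("lit", 3)))
print(fold(plan))
```

A real optimizer applies many such rules repeatedly over whole query plans (filters, joins, projections), but the principle is the same: pattern-match on the tree and replace a subtree with a cheaper equivalent.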

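2. Grace Huang, engineering manager in Intel's Software and Services Group: StreamSQL

To help SQL users get started with stream processing quickly, StreamSQL supports manipulating streaming data through SQL. It is built on Spark Streaming and the Catalyst optimization framework. At present it supports simple queries across streams and interoperation between streams and structured data, as well as the typical usage patterns found in Catalyst (such as LINQ-style expressions and mixing SQL with DStreams). Future work on StreamSQL includes sliding-window support, Hive DDL, and unified input/output formats.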

R and Cascading as front ends for Spark

1. Zongheng Yang, UC Berkeley: SparkR

R is one of the languages most widely used by data scientists for analysis and plotting, but it runs on a single machine, and it becomes helpless once the data no longer fits in memory. SparkR is an R package that provides a simple way to use Spark from within the R environment. SparkR lets users create RDDs and transform them with R functions. Jobs can be submitted to a Spark cluster from the interactive R shell, and existing R packages can easily be used inside SparkR.

2. Supreet Oberoi, vice president at Concurrent: Cascading on Spark

Cascading is a popular application development framework for building data-centric applications. It uses the concepts of Taps and Pipes to raise the level of abstraction at which users write MapReduce programs. Cascading 3.0 includes a customizable query planner, so Cascading programs can run on back ends including local memory, Apache MapReduce, and Apache Tez. The upcoming version 3.1 will be able to run on Spark.

Apache Spark internals and optimization

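1. Xiangrui Meng, lead developer of MLlib: MLlib and sparse data

Large real-world data sets are often sparse. Spark MLlib supports storing and processing sparse matrices and vectors. MLlib users should check whether their problem can be represented with sparse data, because when the data is very sparse this often determines how efficiently a job runs. Developers, for their part, should choose computations and algorithms that exploit sparsity. Xiangrui detailed three optimizations for sparse data: computing the distance between two points in KMeans, computing the sum of gradients in linear models, and exploiting sparsity in SVD.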
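As an illustration of the first optimization, one common way to compute distances between sparse points is to expand the squared Euclidean distance as ||a||² + ||b||² - 2·(a·b). The norms can be computed once per point and reused, and the dot product only touches indices that are non-zero in both vectors. The sketch below assumes sparse vectors stored as `{index: value}` dicts; the function names are hypothetical and this is not MLlib's code.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    # Iterate over the shorter vector and look up matches in the longer one.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


def squared_norm(a):
    """Squared Euclidean norm; worth caching when a point is reused many times."""
    return sum(value * value for value in a.values())


def squared_distance(a, b, a_norm=None, b_norm=None):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2 * (a . b).

    Precomputed squared norms may be passed in to avoid recomputing them,
    for example for cluster centers that are compared against every point.
    """
    if a_norm is None:
        a_norm = squared_norm(a)
    if b_norm is None:
        b_norm = squared_norm(b)
    # Clamp at zero: rounding can make the result slightly negative.
    return max(a_norm + b_norm - 2.0 * sparse_dot(a, b), 0.0)


# Two points in a high-dimensional space with only two non-zeros each.
point = {0: 1.0, 5: 2.0}
center = {5: 1.0, 9: 3.0}
print(squared_distance(point, center))
```

The work done is proportional to the number of non-zero entries rather than to the full dimensionality, which is where the savings on very sparse data come from. The expansion can lose precision when the two points are nearly identical, so a production implementation would need a fallback for that case.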

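2. Aaron Davidson, Databricks: understanding Spark internals

Aaron's talk focused on how to get better Spark core performance in real applications. He explained the RDD execution model and the shuffle operation in detail. An RDD records its lineage, that is, the order and the computations that produced it, and this forms a logical plan. The logical plan is split into execution stages at shuffle boundaries, and all the stages together form a DAG. A stage combined with one data partition forms a task, and once the parent stages have finished, the scheduler submits the tasks of the next stage. At a shuffle boundary, mapper tasks write their data to disk by partition, while reducers fetch data from multiple mappers and combine it by key. The network traffic of a shuffle is expensive, and combining data by key can also use a lot of memory. Aaron gave a simple example: counting the number of distinct names, grouped by the first letter of the name. He showed two different implementations, ran them in Databricks Cloud, and compared their stages and running times.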
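The article does not reproduce the two implementations from the talk, so the following is a guess at two plausible plans, simulated in plain Python rather than run on Spark. The first plan shuffles every record and deduplicates on the reduce side; the second deduplicates before shuffling and then moves only partial counts. The function names, the CRC32-based partitioner, and the "records moved" metric are all assumptions made for illustration, and names are assumed to be non-empty strings.

```python
import zlib
from collections import defaultdict


def shuffle(map_outputs, num_reducers):
    """Route (key, value) records to reducers by key; also count records moved."""
    reducers = [[] for _ in range(num_reducers)]
    moved = 0
    for records in map_outputs:
        for key, value in records:
            target = zlib.crc32(key.encode("utf-8")) % num_reducers
            reducers[target].append((key, value))
            moved += 1
    return reducers, moved


def count_group_then_dedupe(partitions, num_reducers):
    """Plan A: shuffle every (letter, name) record, deduplicate on the reducers."""
    mapped = [[(name[0], name) for name in part] for part in partitions]
    reducers, moved = shuffle(mapped, num_reducers)
    result = {}
    for records in reducers:
        groups = defaultdict(set)  # all names for a letter are held in memory
        for letter, name in records:
            groups[letter].add(name)
        for letter, names in groups.items():
            result[letter] = len(names)
    return result, moved


def count_dedupe_then_reduce(partitions, num_reducers):
    """Plan B: deduplicate names first, then shuffle only per-letter partial counts."""
    # Stage 1: drop duplicates inside each partition, then shuffle by name
    # so that duplicates across partitions meet on the same reducer.
    mapped = [[(name, None) for name in set(part)] for part in partitions]
    reducers, moved_stage1 = shuffle(mapped, num_reducers)
    distinct_parts = [{name for name, _ in records} for records in reducers]
    # Stage 2: combine counts per letter locally, then shuffle the small totals.
    combined = []
    for part in distinct_parts:
        counts = defaultdict(int)
        for name in part:
            counts[name[0]] += 1
        combined.append(list(counts.items()))
    reducers, moved_stage2 = shuffle(combined, num_reducers)
    result = defaultdict(int)
    for records in reducers:
        for letter, partial in records:
            result[letter] += partial
    return dict(result), moved_stage1 + moved_stage2


# Hypothetical input: two partitions with many repeated names.
partitions = [["Ann"] * 50 + ["Bob"] * 50, ["Ann"] * 50 + ["Al"] * 10 + ["Bea"] * 40]
print(count_group_then_dedupe(partitions, 2))
print(count_dedupe_then_reduce(partitions, 2))
```

Both plans return the same counts, but plan B has an extra stage and still moves far fewer records when the input contains many duplicates. It also never needs to hold all the names for one letter in memory at once, which matches the two costs named above: shuffle traffic and memory used when combining by key.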

Research and applications built on Apache Spark

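1. David Patterson, professor at UC Berkeley: Spark and genomics

David Patterson is a pioneer of the RISC architecture and co-author of a classic textbook on computer architecture. In his talk, David introduced several open-source genomics software projects built on Spark. SNAP is a short-read gene sequence aligner; it is the most accurate and fastest aligner to date, 3 to 10 times faster than other aligners. ADAM is a genomics format designed for storage on clusters; using advanced systems techniques, it can greatly speed up the entire genomics processing pipeline. On an 82-node cluster, ADAM can run the two most expensive steps in genomics 110 times faster than other systems. When David presented a New York Times story dated June 4, 2014 about how SNAP helped save a child's life, the audience broke into warm applause.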

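2. Monica Rogati, vice president of data at Jawbone: building data products for the masses of the data age

The number of connected devices will grow to 50 billion by 2020. In the eyes of the general public in the data age, the world will be smart and will adapt to each person's unique circumstances. Monica sees Spark as a major building block for intelligent data products because it supports the data pipelines that industry needs, impeccable data cleaning, iteration, machine learning, and faster execution.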

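3. Chris Johnson, engineer at Spotify: large-scale music recommendation

Spotify uses a variety of machine learning models to power its music recommendation features, including its Discover page and Radio. Because these models are iterative, they are a very good fit for Spark's computation model, which avoids the overhead of Hadoop input/output. In this talk, Chris reviewed two collaborative filtering algorithms and described how he processes hundreds of billions of data points with ALS from Spark MLlib.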

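4. Kevin Mader, lecturer at ETH Zurich: real-time image processing and analysis with Spark

Synchrotron-based X-ray tomographic microscopy can produce 8 GB of image data per second. To process these images in real time, Kevin used a cluster of over a thousand machines and built a Spark-based system on it for filtering, segmentation, and shape analysis. To speed up post-processing, Kevin performs approximate analysis in real time, such as region filtering and sampling.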

Conclusion

Spark Summit 2014 was an important milestone in the growth of the Spark open-source ecosystem. Apache Spark has become the standard platform for unifying the following big data workloads:

  • Complex analytics (for example, machine learning)
  • Interactive queries, including SQL
  • Real-time stream processing

A growing number of industrial products are built on or integrated with Spark, such as Databricks Cloud and SAP HANA.

Looking ahead, the Apache Spark community will keep innovating in several areas to deliver more features, better performance, and deeper integration:

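  • The Spark core will provide a pluggable shuffle interface. The existing shuffler uses a HashMap to aggregate data with the same key and automatically spills to disk when memory pressure is high. With the pluggable interface, sort-based and pipelined shufflers will be added in future releases.
  • The Spark core will establish a unified storage API that supports solid-state drives (SSDs) as well as other software storage systems that share memory, such as Tachyon and the HDFS cache.
  • Tighter integration with YARN, such as dynamically adjusting resource allocation, to better support multi-tenancy.
  • Spark SQL replaces Shark as the new SQL engine. The Catalyst-based optimizer can optimize directly for the Spark core, and the upcoming dynamic code generation will greatly improve query efficiency.
  • Spark SQL will integrate a variety of data sources, including Parquet, JSON, NoSQL databases (Cassandra, HBase, MongoDB), and traditional databases (SAP, Vertica, and Oracle).
  • MLlib will include a statistics library for sampling, correlation, estimation, and testing. A set of new algorithms will follow soon, including non-negative matrix factorization, sparse SVD, and LDA.
  • Spark Streaming will add new data sources and better integration with Apache Flume.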
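The first item in the list above describes hash-based aggregation that spills to disk under memory pressure. The idea can be sketched in a few lines. This is a simplified single-process illustration and not Spark's shuffle code: the function name, the "maximum number of keys" trigger (a stand-in for real memory accounting), and the use of `pickle` with temporary files are all assumptions.

```python
import pickle
import tempfile
from collections import defaultdict


def aggregate_with_spill(records, max_keys_in_memory):
    """Sum values per key, spilling the in-memory map to disk when it grows too large.

    Returns the final {key: total} dict and the number of spills performed.
    """
    current = {}
    spill_files = []
    for key, value in records:
        current[key] = current.get(key, 0) + value
        # Stand-in for memory pressure: too many distinct keys held in memory.
        if len(current) > max_keys_in_memory:
            spill = tempfile.TemporaryFile()
            pickle.dump(current, spill)
            spill_files.append(spill)
            current = {}
    # Merge the in-memory remainder with every spilled partial aggregate.
    merged = defaultdict(int)
    for key, total in current.items():
        merged[key] += total
    for spill in spill_files:
        spill.seek(0)
        for key, total in pickle.load(spill).items():
            merged[key] += total
        spill.close()
    return dict(merged), len(spill_files)


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(aggregate_with_spill(records, max_keys_in_memory=2))
```

For simplicity the final merge loads every spill back into one dict, which would defeat the purpose at scale. A practical implementation would write each spill sorted by key and merge the files as streams, which is one reason a sort-based shuffler is a natural candidate for the pluggable interface.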

The summit has further cemented Spark's central position in big data. We look forward to even more exciting developments from Spark in the future.

About the authors

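Reynold Xin (辛湜) is one of the leading figures in the Apache Spark open-source community. He took part in Spark's development during his PhD studies at the UC Berkeley AMPLab and wrote two open-source frameworks on top of Spark, Shark and GraphX. He co-founded Databricks with colleagues from the AMPLab.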

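Wang Dong (王东) is a software engineer at Databricks, where he currently develops products based on Apache Spark. After receiving his PhD from Carnegie Mellon University in 2003, he has worked in San Francisco and the Bay Area ever since. Before joining Databricks he worked at Twitter on software development and big data analysis for search and recommendation systems.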

   