¿´´óƬ ÉîÈëÀí½âSparkµÄ¸ÅÄîºÍ±à³Ì·½Ê½
 
Author: Anonymous   Source: 51CTO   Published: 2017-1-22
 

In the article Introduction to Apache Spark with Examples and Use Cases, author RADEK OSTROWSKI takes readers on a deep dive into Spark through the Kaggle competition project "Predicting Survival on the Titanic".

The translated text follows.

I first heard of Spark in late 2013, when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic (a Kaggle competition that uses machine learning to predict which passengers aboard the Titanic were more likely to survive). The project was a great way to deepen my understanding of Spark's concepts and programming model, and I highly recommend it as a starting point for any developer who wants to master Spark.

Today, Spark is widely adopted by major internet companies such as Amazon, eBay, and Yahoo. Many organizations run Spark on clusters with thousands of nodes; according to the Spark FAQ, the largest known cluster has over 8,000 nodes. Spark is clearly a technology worth taking note of and learning about.

This article introduces Spark through practical use cases and code samples, drawn in part from the official Apache Spark website and in part from the book Learning Spark: Lightning-Fast Big Data Analysis.

What is Apache Spark? An Introduction

Spark is an Apache project advertised as "lightning-fast cluster computing". It has a thriving open-source community and is currently the most active Apache project.

Spark provides a faster and more general data processing platform. Compared with Hadoop, Spark lets you run programs up to 100x faster in memory, or 10x faster on disk. Last year, Spark overtook Hadoop in processing speed by completing the 100 TB Daytona GraySort contest 3x faster on one tenth the number of machines, and it also became the fastest open-source engine for sorting at the petabyte scale.

Spark also makes it possible to write code more quickly, as it provides over 80 high-level operators. The "Hello World!" of Big Data (a convention carried over from programming languages), the Word Count example, demonstrates this: the same logic takes roughly 50 lines of Java MapReduce code, while the Spark (Scala) implementation is remarkably simple:

sparkContext.textFile("hdfs://...")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .saveAsTextFile("hdfs://...")

Another important way to learn Apache Spark is through the interactive shell (REPL). The REPL lets you test the result of each line of code interactively, without first having to code and then execute an entire job. The path to working code is therefore much shorter, and ad-hoc data analysis becomes possible.
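
As a minimal sketch of such a session (assuming spark-shell has been launched and a local README.md file exists; both are illustrative):

scala> val lines = sc.textFile("README.md")        // sc is pre-created by spark-shell
scala> lines.filter(_.contains("Spark")).count()   // each action prints its result immediately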

Other key features of Spark include:

APIs are currently available in Scala, Java, and Python, with support for other languages (such as R) on the way;

Good integration with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.);

Can run on clusters managed by Hadoop YARN or Apache Mesos, and can also run standalone using its built-in resource manager.

On top of the Spark core sit several powerful, higher-level libraries that can be used seamlessly in the same application. Currently these are SparkSQL, Spark Streaming, MLlib (for machine learning), and GraphX, each of which is described in detail below along with Spark Core. Additional Spark libraries and extensions are under development as well.

Spark Core

Spark Core is the base engine for large-scale parallel and distributed data processing. It is responsible for:

memory management and fault recovery;

scheduling, distributing, and monitoring jobs on a cluster;

interacting with storage systems.

Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects that can be operated on in parallel. An RDD can contain any type of object, and is created by loading an external dataset or by distributing a collection from the driver program.
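
A minimal sketch of both creation paths (the path and values are placeholders; sc is an existing SparkContext):

// 1. Distribute an in-memory collection from the driver program.
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// 2. Load an external dataset, here a text file on HDFS.
val lines = sc.textFile("hdfs://...")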

RDDs support two types of operations:

Transformations are operations (such as map, filter, join, union, and so on) that are performed on an RDD and yield a new RDD containing the result.

Actions are operations (such as reduce, count, first, and so on) that return a value after running a computation on an RDD.

Transformations in Spark are "lazy", meaning that they do not compute their results right away. Instead, they just "remember" the operation to be performed and the dataset (e.g., a file) on which it is to be performed. A transformation is actually computed only when an action is called, and the result is then returned to the driver program. This design enables Spark to run more efficiently: for example, if a big file is transformed in various ways and passed to a first action, Spark will only process and return the result for the first line, rather than doing the work for the entire file.
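
A small sketch of this behavior (the path is a placeholder):

// Transformations only record the lineage; nothing is computed here.
val lines = sc.textFile("hdfs://...")
val lengths = lines.map(line => line.length)

// first() is an action: it triggers the computation, and Spark reads
// only as much of the file as it needs to produce one element.
val firstLength = lengths.first()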

By default, each transformed RDD is recomputed every time you run an action on it. However, you can also persist an RDD in memory using the persist or cache method, in which case Spark keeps the elements around on the cluster for much faster access the next time you query it.
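
For example (a sketch with a placeholder path):

val lengths = sc.textFile("hdfs://...").map(_.length)
lengths.cache()         // ask Spark to keep this RDD in cluster memory

lengths.count()         // first action: computes the RDD and populates the cache
lengths.reduce(_ + _)   // second action: reuses the cached elements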

SparkSQL

SparkSQL is a Spark component that supports querying data either via SQL or via the Hive Query Language. It originated as the Apache Hive port that ran on top of Spark (in place of MapReduce) and has since been integrated as a core component of the Spark stack. In addition to supporting various data sources, it makes it possible to weave SQL queries together with code transformations, which results in a very powerful tool. Below is an example of a Hive-compatible query:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Spark Streaming

Spark Streaming supports real-time processing of streaming data, such as production web server log files (e.g., via Apache Flume and HDFS/S3), social media feeds such as Twitter, and various messaging queues such as Kafka. Under the hood, Spark Streaming receives the input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results, likewise in batches.

The Spark Streaming API closely matches that of Spark Core, making it easy for programmers to work with both batch and streaming data.
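
As a minimal sketch (host, port, and batch interval are placeholders), the classic streaming word count shows how familiar the API feels:

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build a streaming context that groups incoming data into 10-second batches.
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)

// The same RDD-like operators as in batch code.
val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.print()

ssc.start()             // start receiving and processing data
ssc.awaitTermination()  // run until explicitly stopped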

MLlib

MLlib is a machine learning library that provides various algorithms designed to scale out on a cluster for classification, regression, clustering, collaborative filtering, and so on (check out Toptal's article on machine learning for more information). Some of these algorithms also work with streaming data, such as linear regression using ordinary least squares or k-means clustering (with more on the way). Apache Mahout (a machine learning library for Hadoop) has already turned away from MapReduce and joined forces with Spark MLlib.
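
As a flavor of the API, here is a minimal k-means sketch using MLlib's KMeans (the input file and parameters are illustrative; each line is assumed to hold space-separated numeric features):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Parse each line into a dense feature vector; cache it, since the algorithm is iterative.
val points = sc.textFile("data/points.txt")
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  .cache()

val model = KMeans.train(points, 2 /* clusters */, 20 /* iterations */)

// Within-set sum of squared errors: a rough measure of clustering quality.
println("WSSSE = " + model.computeCost(points))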

GraphX

GraphX is a library for manipulating graphs and performing graph-parallel operations. It provides a uniform tool for ETL (Extraction, Transformation, and Loading), exploratory analysis, and iterative graph computations. Apart from built-in operations for graph manipulation, it also provides a library of common graph algorithms such as PageRank.
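
A minimal PageRank sketch (the edge-list file is a placeholder; each line is assumed to hold a "sourceId destId" pair):

import org.apache.spark.graphx.GraphLoader

// Load a graph from an edge list and run PageRank until it converges
// within the given tolerance.
val graph = GraphLoader.edgeListFile(sc, "data/followers.txt")
val ranks = graph.pageRank(0.0001).vertices

ranks.take(5).foreach(println)   // print a few (vertexId, rank) pairs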

How to Use Apache Spark: An Event Detection Use Case

Now that we have answered the question "What is Apache Spark?", let's step back and think about what kinds of problems or challenges it could be put to use on most effectively.

I recently came across an article describing an experiment to detect earthquakes by analyzing a Twitter stream. Interestingly, it was shown that this technique was likely to inform you of an earthquake in Japan faster than the Japan Meteorological Agency. Even though the authors used different technology in their article, I think it is a great example of how Spark lets us write concise code without the glue code otherwise needed for compatibility and interoperability.

First, we would have to filter the tweet stream for messages that seem relevant, like "earthquake" or "shaking". We could easily use Spark Streaming for that purpose:

TwitterUtils.createStream(...)
  .filter(_.getText.contains("earthquake") || _.getText.contains("shaking"))

Then, we would have to run some semantic analysis on the tweets to determine whether they appear to be referencing a current earthquake occurrence. Tweets like "Earthquake!" or "Now it is shaking" would be considered positive matches, whereas tweets like "Attending an Earthquake Conference" or "The earthquake yesterday was scary" would not. The authors of the paper used a support vector machine (SVM) for this purpose. We can do the same here, or also try a streaming version. A resulting code example from MLlib would look like the following:

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// We would prepare some earthquake tweet data and load it in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "sample_earthquate_tweets.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Run training algorithm to build the model.
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Clear the default threshold.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Area under ROC = " + auROC)

Èç¹ûÎÒÃǶԸÃÄ£Ð͵ÄÔ¤²âÂʸе½ÂúÒ⣬ÎÒÃÇ¿ÉÒÔ½øÈëÏÂÒ»½×¶Î²¢ÔÚ·¢ÉúµØÕðʱ×÷³ö·´Ó¦¡£ÎªÁËÔ¤²âÒ»¸öµØÕðµÄ·¢Éú£¬ÎÒÃÇÐèÒªÔڹ涨µÄʱ¼ä´°¿ÚÄÚ(ÈçÎÄÕÂÖÐËùÃèÊöµÄ)¼ì²âÒ»¶¨ÊýÁ¿(¼´ÃܶÈ)µÄÕýÏò΢²©¡£ÐèҪעÒâµÄÊÇ£¬¶ÔÓÚÆôÓÃTwitterλÖ÷þÎñµÄtweetÏûÏ¢£¬ÎÒÃÇ»¹»áÌáÈ¡µØÕðµÄλÖá£ÓÐÁËÇ°ÃæÕâЩ֪ʶµÄÆÌµæ£¬ÎÒÃÇ¿ÉÒÔʹÓÃSparkSQL²éѯÏÖÓеÄHive±í(´æ´¢×ŶԽÓÊÕµØÕð֪ͨ¸ÐÐËȤµÄÓû§)À´¼ìË÷¶ÔÓ¦Óû§µÄµç×ÓÓʼþµØÖ·£¬²¢Ïò¸÷Óû§·¢Ë͸öÐÔ»¯µÄ¾¯¸æÓʼþ£¬ÈçÏÂËùʾ£º

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// sendEmail is a custom function
sqlContext.sql("FROM earthquake_warning_users SELECT firstName, lastName, city, email")
  .collect().foreach(sendEmail)

Other Apache Spark Use Cases

SparkµÄʹÓó¡¾°µ±È»²»½ö½ö¾ÖÏÞÓÚ¶ÔµØÕðµÄ¼ì²â¡£

Here is a quick (but certainly nowhere near exhaustive!) sampling of other scenarios well suited to Spark, all of which face the velocity, variety, and volume problems that pervade Big Data:

In the game industry, processing and discovering patterns in the firehose of real-time in-game events and being able to respond to them immediately is a capability that can generate real revenue, serving purposes such as player retention, targeted advertising, and auto-adjustment of complexity levels.

In the e-commerce industry, real-time transaction information can be passed to a streaming clustering algorithm such as k-means or a collaborative filtering algorithm such as ALS. The results can then be combined with other unstructured data sources, such as customer comments or product reviews, to continuously improve recommendations and adapt them to new trends (a brief ALS sketch follows this list).

In the finance or security industry, the Spark stack can be applied to fraud or intrusion detection systems, or to risk-based authentication. Top-notch results can be achieved by harvesting huge volumes of archived logs and combining them with external data sources such as leaked data and compromised account information (see, for example, https://haveibeenpwned.com/), along with data from the connection or request itself, such as IP geolocation or timing.
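
For the e-commerce case above, a hedged MLlib ALS sketch might look like this (the file name, schema, and parameters are assumptions for illustration; each input line is taken to be "userId,productId,rating"):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Parse raw ratings into MLlib's Rating case class.
val ratings = sc.textFile("data/ratings.csv").map { line =>
  val Array(user, product, rating) = line.split(',')
  Rating(user.toInt, product.toInt, rating.toDouble)
}

// Train a matrix factorization model (rank, iterations, and lambda are illustrative).
val model = ALS.train(ratings, 10, 10, 0.01)

// Recommend, say, 5 products for a given user id.
model.recommendProducts(42, 5).foreach(println)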

Conclusion

To sum up, Spark helps to simplify the challenging and computationally intensive task of processing high volumes of real-time or archived data, both structured and unstructured, seamlessly integrating complex capabilities such as machine learning and graph algorithms along the way. Spark brings Big Data processing to the masses. Check it out!

   