
In the article Introduction to Apache Spark with Examples and Use Cases, author RADEK OSTROWSKI takes readers deeper into Spark through the Kaggle competition project "predicting survival on the Titanic".
The translation follows.
I first heard of Spark in late 2013, when I became interested in Scala, the language Spark is written in. Some time later I did a fun data science project trying to predict survival on the Titanic (a Kaggle competition that uses machine learning to predict which passengers were more likely to survive). It turned out to be a great way to get a deeper understanding of Spark's concepts and programming model, and I highly recommend it to any developer looking to get started with Spark.
Today Spark is widely adopted by major internet companies such as Amazon, eBay, and Yahoo. Many organizations run Spark clusters with thousands of nodes; according to the Spark FAQ, the largest known cluster has over 8,000 nodes. Clearly, Spark is a technology well worth taking note of and learning.
This article introduces Spark through practical use cases and code examples, drawn partly from the official Apache Spark website and partly from the book Learning Spark - Lightning-Fast Big Data Analysis.
What is Apache Spark? An Introduction
Spark is an Apache project advertised as "lightning-fast cluster computing". It has a thriving open-source community and is currently the most active Apache project.
Spark provides a faster and more general data processing platform. Compared with Hadoop, programs run up to 100x faster in memory, and up to 10x faster even when running on disk. Last year Spark overtook Hadoop in processing speed by completing the 100 TB Daytona GraySort contest 3x faster while using only one tenth the machines, and it also became the fastest open-source engine for petabyte-scale sorting.
Spark also makes it possible to write code more quickly, with over 80 high-level operators at your disposal. The "Hello World!" of big data (a convention carried over from programming languages), the Word Count example, illustrates this: the same logic written as Java MapReduce code takes around 50 lines, whereas the Spark (Scala) implementation is remarkably simple:

sparkContext.textFile("hdfs://...")
            .flatMap(line => line.split(" "))
            .map(word => (word, 1))
            .reduceByKey(_ + _)
            .saveAsTextFile("hdfs://...")
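To make explicit what that pipeline computes, here is the same word-count logic as a plain-Python sketch. This is not Spark code — Python is used only so the logic runs without a cluster; each step mirrors one operator in the Scala snippet above.

```python
# Plain-Python sketch of the word-count pipeline: flatMap splits lines into
# words, map pairs each word with 1, reduceByKey sums the counts per word.
from collections import defaultdict

def word_count(lines):
    # flatMap(line => line.split(" "))
    words = (word for line in lines for word in line.split())
    # map(word => (word, 1)) followed by reduceByKey(_ + _)
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    return dict(counts)

print(word_count(["to be or", "not to be"]))  # -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In Spark the same steps run partition by partition across the cluster; the per-word summing of `reduceByKey` is what gets shuffled between nodes.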
Another important aid when learning Apache Spark is its interactive shell (REPL). The REPL displays the result of each line of code as it runs, letting you test every line immediately without first writing and executing an entire job. This shortens the path to working code and makes ad-hoc data analysis possible.
Other key features of Spark include:
Currently provides APIs in Scala, Java, and Python, with support for other languages (such as R) on the way;
Integrates well with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.);
Can run on clusters managed by Hadoop YARN or Apache Mesos, and can also run standalone with its built-in resource manager.
The Spark core is complemented by a set of powerful, higher-level libraries that can be used directly within the same application. These currently comprise four component libraries — SparkSQL, Spark Streaming, MLlib (for machine learning), and GraphX — each of which, along with Spark Core, is described in detail below. Additional Spark libraries and extensions are under development as well.

Spark Core
Spark Core is the base engine for large-scale parallel and distributed data processing. Its responsibilities include:
memory management and fault recovery;
scheduling, distributing, and monitoring jobs on a cluster;
interacting with storage systems.
Spark introduces the concept of the RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects that can be operated on in parallel. An RDD can contain objects of any type and is created by loading an external dataset or by distributing a collection from the driver program.
RDDs support two types of operations:
Transformations are operations applied to an RDD that produce a new RDD containing the result (e.g. map, filter, join, union);
Actions are operations applied to an RDD that trigger a computation on the cluster and return a value (e.g. reduce, count, first).
Transformations in Spark are "lazy", meaning they do not start computing and return results immediately. Instead, they merely "remember" the operation to be performed and the dataset (e.g. a file) it applies to. The transformations are only actually computed when an action is called, after which the result is returned to the driver program. This design lets Spark run more efficiently: for example, if a big file is transformed in various ways and passed to a first action, Spark will process and return only the first line of the result, rather than doing the work for the entire file.
By default, each transformed RDD is recomputed every time you run an action on it. However, you can also persist an RDD in memory using the persist or cache method to avoid this, in which case Spark keeps the elements around on the cluster for much faster access the next time you use it.
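As a rough analogy (plain Python, not Spark's implementation), generators behave the same way: a "transformation" builds up a deferred pipeline, an "action" forces only the work it actually needs, and materializing the result once plays the role of persist()/cache().

```python
# Rough analogy for lazy transformations: generators defer work until consumed.
processed = []

def doubled(xs):
    for x in xs:
        processed.append(x)       # record that an element was actually computed
        yield x * 2

pipeline = doubled(range(5))      # "transformation": nothing has run yet
assert processed == []

first = next(pipeline)            # "action" that needs only the first element
assert (first, processed) == (0, [0])  # only one element was computed

# Materializing the whole result once is the analogue of persist()/cache():
# later reads reuse the stored list instead of recomputing the pipeline.
cached = list(doubled(range(3)))
assert cached == [0, 2, 4]
```

The analogy is loose — Spark tracks a whole lineage graph of transformations, not a single generator — but the evaluation behavior it illustrates is the same.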
SparkSQL
SparkSQL is the Spark component that supports querying data via SQL or the Hive Query Language. It originated as the Apache Hive port running on top of Spark (in place of MapReduce) and has now been integrated as a key component of the Spark stack. In addition to supporting a variety of data sources, it lets you weave SQL queries together with code transformations, which makes for a very powerful tool. Below is an example of a Hive-compatible query:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
Spark Streaming
Spark Streaming supports real-time processing of streaming data, such as production web server log files (e.g. via Apache Flume and HDFS/S3), social media data (e.g. Twitter), and various message queues (e.g. Kafka). Under the hood, Spark Streaming receives the input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches, as depicted below.

The Spark Streaming API closely matches that of Spark Core, making it easy for developers to work with both batch and streaming data.
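The micro-batch model described above can be sketched as follows. The micro_batches helper is hypothetical, plain Python rather than the Spark Streaming API: it cuts an incoming sequence into fixed-size batches and hands each batch to processing logic, which is the essence of what the engine does with time-sliced input.

```python
# Sketch of the micro-batch idea: slice a stream into batches, process each.
def micro_batches(stream, batch_size):
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                  # emit the final partial batch, if any
        yield batch

# Toy "engine": count the events in each batch as the per-batch result.
events = ["click", "view", "click", "view", "click"]
results = [len(b) for b in micro_batches(events, batch_size=2)]
print(results)  # -> [2, 2, 1]
```

Real Spark Streaming slices by time interval rather than by element count, but the batching-then-processing structure is the same.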
MLlib
MLlib is a machine learning library providing a variety of algorithms designed to scale out on a cluster for classification, regression, clustering, collaborative filtering, and more (see Toptal's article on machine learning for details). Some of these algorithms also work with streaming data, such as linear regression using ordinary least squares or k-means clustering (with more on the way). Apache Mahout (a machine learning library for Hadoop) has already turned away from MapReduce and joined forces with Spark MLlib.
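To give a flavor of what an algorithm like k-means computes, here is a minimal one-dimensional sketch. The kmeans_1d helper is illustrative plain Python, not the MLlib API: it alternates between assigning points to the nearest center and recomputing each center as its cluster's mean.

```python
# Minimal 1-D k-means sketch: assign points to the nearest center, then
# move each center to the mean of its assigned points, and repeat.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Keep a center unchanged if no points were assigned to it.
        centers = [sum(ps) / len(ps) if ps else c for c, ps in clusters.items()]
    return sorted(centers)

points = [1.0, 2.0, 3.0, 9.0, 10.0, 11.0]
print(kmeans_1d(points, centers=[0.0, 10.0]))  # -> [2.0, 10.0]
```

MLlib's distributed k-means performs the same assign/update loop, but over partitioned data and multi-dimensional vectors, with smarter initialization (k-means||).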
GraphX

GraphX is a library for manipulating graphs and performing graph-parallel operations. It provides a uniform tool for ETL (Extraction-Transformation-Loading), exploratory analysis, and iterative graph computations. Apart from built-in graph operations, it also provides a library of common graph algorithms such as PageRank.
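As a feel for the kind of algorithm GraphX bundles, here is a compact PageRank sketch over a three-page toy graph. This is plain Python, not the GraphX API; the damping factor 0.85 is the conventional default, and the graph itself is an invented example.

```python
# Minimal PageRank sketch: repeatedly redistribute each node's rank along
# its outgoing edges, mixing in a uniform "teleport" term (1 - damping).
def pagerank(links, damping=0.85, iterations=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            for out in outs:
                new_rank[out] += damping * rank[n] / len(outs)
        rank = new_rank
    return rank

# Toy graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # the page collecting the most rank
```

On this graph C ends up ranked highest, since it receives rank from both A and B. GraphX runs the same iteration as a distributed message-passing computation over an edge-partitioned graph.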
How to Use Apache Spark: an Event Detection Use Case
Having answered the question "What is Apache Spark?", let's now consider what kinds of problems or challenges it can be applied to most effectively.
I recently came across an article about an experiment that detects earthquakes by analyzing a Twitter stream. Interestingly, the results showed that this technique was likely to inform you of an earthquake in Japan faster than the Japan Meteorological Agency. Even though the article used different technology, I think it is a great example of how Spark can be put to use with concise code, without the glue code otherwise needed for compatibility and interoperability.
First, we have to filter the tweet stream for messages that seem relevant, such as "earthquake" or "shaking". This is easy to do with Spark Streaming:

TwitterUtils.createStream(...)
            .filter(_.getText.contains("earthquake") || _.getText.contains("shaking"))
Then we need to run semantic analysis on the tweets to determine whether they refer to an earthquake occurring right now. Tweets like "Earthquake!" or "Now it is shaking" would be considered positive matches, whereas tweets like "Attending an Earthquake Conference" or "The earthquake yesterday was scary" would not. The authors of the paper used a support vector machine (SVM) for this purpose; we can do the same here, or try a streaming version of it. The resulting code example using MLlib looks like this:

// We would prepare some earthquake tweet data and load it in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "sample_earthquate_tweets.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Clear the default threshold.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Area under ROC = " + auROC)
Èç¹ûÎÒÃǶԸÃÄ£Ð͵ÄÔ¤²âÂʸе½ÂúÒ⣬ÎÒÃÇ¿ÉÒÔ½øÈëÏÂÒ»½×¶Î²¢ÔÚ·¢ÉúµØÕðʱ×÷³ö·´Ó¦¡£ÎªÁËÔ¤²âÒ»¸öµØÕðµÄ·¢Éú£¬ÎÒÃÇÐèÒªÔڹ涨µÄʱ¼ä´°¿ÚÄÚ(ÈçÎÄÕÂÖÐËùÃèÊöµÄ)¼ì²âÒ»¶¨ÊýÁ¿(¼´ÃܶÈ)µÄÕýÏò΢²©¡£ÐèҪעÒâµÄÊÇ£¬¶ÔÓÚÆôÓÃTwitterλÖ÷þÎñµÄtweetÏûÏ¢£¬ÎÒÃÇ»¹»áÌáÈ¡µØÕðµÄλÖá£ÓÐÁËÇ°ÃæÕâЩ֪ʶµÄÆÌµæ£¬ÎÒÃÇ¿ÉÒÔʹÓÃSparkSQL²éѯÏÖÓеÄHive±í(´æ´¢×ŶԽÓÊÕµØÕð֪ͨ¸ÐÐËȤµÄÓû§)À´¼ìË÷¶ÔÓ¦Óû§µÄµç×ÓÓʼþµØÖ·£¬²¢Ïò¸÷Óû§·¢Ë͸öÐÔ»¯µÄ¾¯¸æÓʼþ£¬ÈçÏÂËùʾ£º
// sc is an existing SparkContext. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) // sendEmail is a custom function sqlContext.sql("FROM earthquake_warning_users SELECT firstName, lastName, city, email") .collect().foreach(sendEmail) |
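The "density of positive tweets within a time window" rule could be sketched as follows. The detect helper is hypothetical plain Python, independent of any Spark API; timestamps are assumed to be in seconds, and the window length and threshold are invented parameters.

```python
# Sketch of window-based detection: raise the alarm when the number of
# positive tweets inside any sliding time window reaches the threshold.
def detect(timestamps, window_seconds, threshold):
    timestamps = sorted(timestamps)
    for i, t in enumerate(timestamps):
        # count positives whose timestamp falls in [t, t + window_seconds)
        in_window = [s for s in timestamps[i:] if s < t + window_seconds]
        if len(in_window) >= threshold:
            return True
    return False

# Five positive tweets within 60 seconds -> alarm; the same number of
# tweets spread over ten minutes -> no alarm.
print(detect([0, 10, 20, 30, 40], window_seconds=60, threshold=5))      # True
print(detect([0, 150, 300, 450, 600], window_seconds=60, threshold=5))  # False
```

In a streaming deployment the same rule would run over Spark Streaming's windowed operations rather than a sorted list, but the density test itself is unchanged.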
Other Apache Spark Use Cases
SparkµÄʹÓó¡¾°µ±È»²»½ö½ö¾ÖÏÞÓÚ¶ÔµØÕðµÄ¼ì²â¡£
ÕâÀïÌṩ¹ØÓÚÒ»¸ö·Ç³£ÊʺÏSpark¼¼Êõ´¦ÀíµÄ°¸ÀýËÙ²éÖ¸ÄÏ(µ«¿Ï¶¨Ã»ÓнӽüÇ)£¬ÕâЩ°¸ÀýÖеij¡¾°¶¼ÃæÁÙ×Å´óÊý¾ÝÆÕ±é´æÔÚµÄËÙ¶È(Velocity)¡¢¶àÑùÐÔ(Variety)ºÍÈÝÁ¿(Volume)ÎÊÌâ¡£
In the gaming industry, processing and discovering hidden patterns in the stream of real-time in-game events and responding to them immediately is a capability that can drive revenue, serving purposes such as player retention, targeted advertising, and automatic adjustment of difficulty level.
In the e-commerce industry, real-time transaction information can be fed to a streaming clustering algorithm such as k-means or a collaborative filtering algorithm such as ALS; the results can then be combined with other unstructured data sources (such as customer comments or product reviews) to continuously improve recommendations and adapt them to new trends.
In the finance or security industry, the Spark stack can be applied to fraud or intrusion detection systems, or to risk-based authentication. Spark can achieve excellent results by harvesting huge amounts of archived logs and combining them with external data sources such as data breaches and compromised account information (see e.g. https://haveibeenpwned.com/) as well as connection/request data such as IP geolocation or time.
Conclusion
To sum up, Spark reduces the difficulty of the challenging, computationally intensive task of processing high volumes of real-time or offline data, both structured and unstructured, while seamlessly integrating complex capabilities such as machine learning and graph algorithms. Spark brings big data processing to the masses. Go try it out!