Spark K-Means
 
×÷Õߣºlsshlsw À´Ô´£ºCSDN ·¢²¼ÓÚ£º2015-6-16

Introduction

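K-Means is the most widely used partition-based clustering algorithm. It is a hard clustering algorithm and a typical representative of prototype-based objective-function clustering. The algorithm first randomly selects k objects, each of which initially represents the mean, or center, of one cluster. Each remaining object is assigned to the cluster whose center is nearest to it, and the mean of every cluster is then recomputed. This process repeats until the clustering criterion function converges. The criterion function usually takes one of two forms: the first is the global error function, and the second is the change in the centers between two consecutive iterations.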
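The two criterion functions can be written out as follows. This is a sketch in common textbook notation; the symbols are assumptions introduced here, not taken from the article: C_j is the j-th cluster, \mu_j its center, t the iteration index and \varepsilon the convergence threshold.

```latex
% Global error function: sum of squared distances from each object to its cluster center
E = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2

% Change of the centers between two consecutive iterations, compared with a threshold
\Delta^{(t)} = \sum_{j=1}^{k} \lVert \mu_j^{(t)} - \mu_j^{(t-1)} \rVert^2 < \varepsilon
```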

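Clustering differs from classification. Classification is supervised learning: the classes must be defined before classifying, and every element is asserted to map to one of them. Clustering is learning by observation: the classes, and even the number of classes, need not be known beforehand, which makes it a form of unsupervised learning. Clustering is now widely used in statistics, biology, database technology, marketing and other fields, and there are many algorithms for it.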

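K-Means is an unsupervised method, and its most distinctive feature and advantage is that building the model requires no labeled training data. In day-to-day work there is often no way to obtain valid training data in advance, and K-Means is a good choice in such cases. However, K-Means needs the number of clusters (the K value) to be set beforehand, so it cannot be used at all for a scenario such as computing the social circles of all telecom users in a province. When K is known not to be too large but its exact value is unclear, the clustering can be run for a range of K values and the cost of each run compared. Because the cost generally keeps shrinking as K grows, the K at which the cost stops dropping sharply often describes the number of clusters well.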
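The scan over K values could look like the sketch below. It assumes Spark MLlib's RDD-based API is on the classpath; the object name KScan, the helper costsFor, the candidate range 2 to 6 and the local[*] master are illustrative choices, not part of the article.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object KScan {
  // Train one model per candidate k and return (k, WSSSE) pairs.
  def costsFor(points: RDD[Vector], ks: Seq[Int], maxIterations: Int): Seq[(Int, Double)] =
    ks.map { k =>
      val model = KMeans.train(points, k, maxIterations)
      (k, model.computeCost(points))
    }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this on a single machine
    val conf = new SparkConf().setAppName("KScan").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // args(0): path to a text file of whitespace-separated numeric rows
    val points = sc.textFile(args(0))
      .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
      .cache()

    // The range 2 to 6 is an arbitrary illustration; pick one that suits the data
    costsFor(points, 2 to 6, 20).foreach { case (k, cost) =>
      println(s"k=$k cost=$cost")
    }
    sc.stop()
  }
}
```

Reading the printed costs side by side shows where adding another cluster stops paying off.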

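Use cases

1. In business, helping market analysts discover distinct customer groups in the customer base and characterize each group by its purchasing patterns.

2. In biology, deriving classifications of plants and animals, classifying genes, and gaining insight into the structures inherent in populations.

3. On the Internet, classifying documents on the Web in order to discover information.

4. Classifying the players of a game (the case study below).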

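How it works

Given a data set D containing n objects and an initial number of clusters k, the algorithm proceeds as follows.

1. Randomly select k objects from D as the initial cluster centers.

2. Using the cluster centers, assign each of the n objects in the data set to the most "similar" cluster ("similarity" is judged by distance).

3. Using the new assignments, recompute the center of each cluster.

4. Compute the criterion function.

5. If the criterion function meets the threshold, stop; otherwise return to step 2 and continue.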
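The five steps above can be sketched in plain Scala without Spark. This is an illustration under stated assumptions, not how MLlib implements it: the object name KMeansSketch and its helpers are hypothetical, the criterion used is the squared shift of the centers between iterations, and an empty cluster simply keeps its previous center.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Step 2 helper: index of the center nearest to p
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Step 3 helper: coordinate-wise mean of a non-empty group of points
  def mean(points: Seq[Point]): Point =
    points.transpose.map(col => col.sum / col.size).toVector

  def run(data: Seq[Point], k: Int, epsilon: Double,
          maxIterations: Int, seed: Long): Seq[Point] = {
    // Step 1: pick k objects at random as the initial centers
    var centers: Seq[Point] = new scala.util.Random(seed).shuffle(data).take(k)
    var shift = Double.MaxValue
    var iteration = 0
    // Step 5: stop once the criterion is within the threshold
    while (iteration < maxIterations && shift > epsilon) {
      // Step 2: assign every object to its nearest center
      val groups = data.groupBy(p => closest(p, centers))
      // Step 3: recompute each center; an empty cluster keeps its old center
      val updated = centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
      // Step 4: criterion function = total squared movement of the centers
      shift = centers.zip(updated).map { case (a, b) => squaredDistance(a, b) }.sum
      centers = updated
      iteration += 1
    }
    centers
  }
}
```

A call such as `KMeansSketch.run(points, 3, 1e-6, 20, 42L)` returns the final centers. Because the initial centers are random, different seeds can settle on different local optima, which is why running the algorithm several times and keeping the best result is common.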

Input data

Data: player information (monthly)

Player (ID)   Play time (hours)   Top-up amount (yuan)
1             60                  55
2             90                  86
3             30                  22
4             15                  11
5             288                 300
6             223                 200
7             0                   0
8             14                  5
9             320                 280
10            65                  55
11            13                  0

For clustering, each record is reduced to one line of two space-separated values: play time (hours) and top-up amount (yuan).
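A minimal sketch of that reduction, assuming the rows of the table are held in memory as tuples. The object name PlayerData and the method toLines are hypothetical; the values are copied from the table above.

```scala
object PlayerData {
  // (player ID, play time in hours, top-up amount in yuan), copied from the table
  val players: Seq[(Int, Int, Int)] = Seq(
    (1, 60, 55), (2, 90, 86), (3, 30, 22), (4, 15, 11),
    (5, 288, 300), (6, 223, 200), (7, 0, 0), (8, 14, 5),
    (9, 320, 280), (10, 65, 55), (11, 13, 0)
  )

  // Drop the ID and emit "hours yuan", one record per line
  def toLines(rows: Seq[(Int, Int, Int)]): Seq[String] =
    rows.map { case (_, hours, yuan) => s"$hours $yuan" }

  def main(args: Array[String]): Unit =
    toLines(players).foreach(println)
}
```

The player ID is left out on purpose: it is an identifier, not a feature, and including it would distort the distances between players.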

The players are divided into 3 classes:

1. Premium users (long play time, high spending)

2. Regular players (medium online time, medium spending)

3. Inactive users (short online time, low spending)

Flow chart (figure omitted)

Test code

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object KMeansTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    // args(0): input file, one "play time, top-up amount" pair per line, space-separated
    val data = sc.textFile(args(0))
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.trim.toDouble))).cache()

    // Set the number of clusters to 3
    val numClusters = 3
    // Iterate 20 times
    val numIterations = 20
    // Run 10 times and keep the best solution
    // (the runs argument belongs to the Spark 1.x API; newer releases deprecate it)
    val runs = 10
    val clusters = KMeans.train(parsedData, numClusters, numIterations, runs)
    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    val a21 = clusters.predict(Vectors.dense(57.0, 30.0))
    val a22 = clusters.predict(Vectors.dense(0.0, 0.0))

    // Print the cluster centers
    println("Cluster centers:")
    for (center <- clusters.clusterCenters) {
      println(" " + center)
    }

    // Print which cluster each input record belongs to
    println(parsedData.map(v => v.toString + " belongs to cluster: " + clusters.predict(v)).collect().mkString("\n"))
    println("Predicted cluster for user 21 --> " + a21)
    println("Predicted cluster for user 22 --> " + a22)

    sc.stop()
  }
}

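Submit script (standalone mode):

./bin/spark-submit \
  --name kmeans \
  --class KMeansTest \
  --master spark://master:7077 \
  ~/Desktop/kmeans.jar \
  hdfs://master:9000/KMeansTest.data

--name: application name
--class: main class name (KMeansTest, the object defined in the test code)
--master: the cluster manager to use
~/Desktop/kmeans.jar: location of the code package
hdfs://master:9000/KMeansTest.data: the value of args(0)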

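Output

The result shows clearly:

Cluster 1 users are premium users

Cluster 2 users are regular users

Cluster 3 users are inactive users

The data for user 21 is (57,30)

The data for user 22 is (0,0)

Both are classified correctly.

The three cluster centers (figure omitted)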
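K-Means numbers its clusters arbitrarily, so the mapping from cluster index to "premium", "regular" or "inactive" has to be read off the centers. One way to automate that is sketched below. The object name ClusterLabels, the label strings and the ranking rule (sum of play time and top-up amount) are all assumptions made for illustration, not part of the article.

```scala
object ClusterLabels {
  // Ordered from the lowest-ranked center to the highest
  val labels: Seq[String] = Seq("inactive", "regular", "premium")

  // centers(i) is the center of cluster i as Array(hours, yuan)
  def labelByCenter(centers: Seq[Array[Double]]): Map[Int, String] = {
    // Rank cluster indices by combined play time and spending, lowest first
    val ranked = centers.indices.sortBy(i => centers(i).sum)
    ranked.zip(labels).toMap
  }

  def main(args: Array[String]): Unit = {
    // Made-up centers for illustration only, not the article's actual output
    val centers = Seq(Array(300.0, 290.0), Array(10.0, 5.0), Array(70.0, 60.0))
    labelByCenter(centers).toSeq.sortBy(_._1).foreach { case (i, name) =>
      println(s"cluster $i -> $name")
    }
  }
}
```

With a trained MLlib model, the centers could be passed in as `clusters.clusterCenters.map(_.toArray)`.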

   