Introduction
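K-Means is the most widely applied partition-based clustering algorithm. It is a hard clustering algorithm and a typical representative of prototype-based objective-function clustering. The algorithm first randomly selects k objects, each of which initially represents the mean, or center, of one cluster. Each remaining object is assigned to the cluster whose center is nearest to it, and the mean of every cluster is then recomputed. This process repeats until the clustering criterion function converges. The criterion function usually takes one of two forms: the first is a global error function, and the second is the change in the centers between two consecutive iterations.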
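The two criterion functions mentioned above are commonly written as follows. This is a sketch in standard notation rather than a formula taken from the original text: the symbols C_i (the set of objects in cluster i), \mu_i (its center), t (the iteration index), and \varepsilon (the stopping threshold) are assumptions introduced here for illustration.

```latex
% Global error function: within-cluster sum of squared errors
E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2

% Center-change criterion: stop once no center moves by more than \varepsilon
\max_{1 \le i \le k} \lVert \mu_i^{(t+1)} - \mu_i^{(t)} \rVert \le \varepsilon
```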
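Clustering differs from classification. Classification is supervised learning: the classes must be defined before classification, and every element is asserted to map to one of them. Clustering is observational learning: the classes need not be known beforehand, and even the number of classes may be left unspecified, which makes it a form of unsupervised learning. Clustering is now widely used in statistics, biology, database technology, marketing, and other fields, and there are correspondingly many algorithms.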
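K-Means is an unsupervised learning method, and its most distinctive feature and advantage is that building the model requires no labeled training data. In day-to-day work it is often impossible to obtain valid training data in advance, and K-Means is a good choice in those cases. However, K-Means requires the number of clusters (the K value) to be set beforehand, so it cannot be used at all for scenarios such as computing the social circles of all telecom users in a province. When K is known not to be too large but its exact value is unclear, the algorithm can be run iteratively over a range of K values and the resulting costs compared. Because the cost generally keeps decreasing as K grows, the K value beyond which the cost stops improving noticeably is the one that usually describes the number of clusters well.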
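One caveat when comparing costs across K values: the cost generally keeps shrinking as K grows, so a common heuristic is to pick the K after which adding another cluster no longer reduces the cost by much. The sketch below illustrates that selection step in plain Scala. The object name ChooseK, the function elbow, and the minGain parameter are hypothetical names introduced here; the (K, cost) pairs are assumed to come from training one model per K and evaluating each one with a cost function such as computeCost.

```scala
object ChooseK {
  /** Returns the smallest K for which adding one more cluster lowers the
    * cost by less than the fraction `minGain`. Falls back to the largest
    * K tried when every step still improves by at least `minGain`.
    */
  def elbow(costs: Seq[(Int, Double)], minGain: Double): Int = {
    require(costs.nonEmpty, "at least one (K, cost) pair is required")
    val sorted = costs.sortBy(_._1)
    // Pair each (K, cost) with the entry for the next larger K
    sorted.zip(sorted.tail)
      .collectFirst {
        case ((k, cost), (_, nextCost))
            if cost <= 0.0 || (cost - nextCost) / cost < minGain => k
      }
      .getOrElse(sorted.last._1)
  }
}
```

The threshold is a judgment call: a value such as 0.2 treats any improvement below 20% as not worth an extra cluster, and it should be tuned to the data at hand.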
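Application scenarios
1. In business, it helps market analysts discover distinct customer groups in the customer base and characterize each group by its purchasing patterns.
2. In biology, it is used to derive classifications of plants and animals and to classify genes, giving insight into the structures inherent in populations.
3. On the Internet, it is used to classify documents on the Web in order to discover information.
4. Classifying the players of a game (the case study below).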
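How it works
Given a data set D containing n objects and an initial number of clusters k, the algorithm proceeds as follows.
1. Randomly select k objects from D as the initial cluster centers.
2. Based on the cluster centers, assign each of the n objects in the data set to the most "similar" cluster ("similarity" is judged by distance).
3. Recompute the center of each cluster as the mean of the objects assigned to it.
4. Compute the criterion function.
5. If the criterion function satisfies the threshold, stop; otherwise return to step 2.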
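The five steps above can be sketched in plain Scala without Spark. This is a minimal illustration under stated assumptions, not the MLlib implementation: the names KMeansSketch, run, epsilon, and seed are hypothetical, the distance is squared Euclidean, the criterion function is the within-cluster sum of squared errors, and a cluster that ends up empty simply keeps its previous center.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center nearest to p
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Criterion function: sum of squared distances to the nearest center
  def cost(data: Seq[Point], centers: Seq[Point]): Double =
    data.map(p => squaredDistance(p, centers(closest(p, centers)))).sum

  def mean(points: Seq[Point]): Point =
    points.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      .map(_ / points.size)

  def run(data: Seq[Point], k: Int, maxIterations: Int,
          epsilon: Double, seed: Long): Seq[Point] = {
    require(k >= 1 && k <= data.size, "k must be between 1 and n")
    // Step 1: pick k objects at random as the initial centers
    var centers: Seq[Point] = new scala.util.Random(seed).shuffle(data).take(k)
    var previousCost = Double.PositiveInfinity
    var iteration = 0
    var converged = false
    while (!converged && iteration < maxIterations) {
      // Step 2: assign every object to its nearest center
      val groups = data.groupBy(p => closest(p, centers))
      // Step 3: recompute each center as the mean of its members
      centers = centers.indices.map { i =>
        groups.get(i).map(mean).getOrElse(centers(i))
      }
      // Step 4: compute the criterion function
      val currentCost = cost(data, centers)
      // Step 5: stop once the cost no longer drops by more than epsilon
      converged = previousCost - currentCost <= epsilon
      previousCost = currentCost
      iteration += 1
    }
    centers
  }
}
```

A call such as KMeansSketch.run(points, k = 3, maxIterations = 20, epsilon = 1e-6, seed = 42L) returns the final centers. Because the starting centers are random, different seeds can lead to different local optima, which is why implementations often run the algorithm several times and keep the best result.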

Input data
Data: player information (monthly)
Player (ID) | Play time (hours) | Recharge amount (yuan)
1 | 60 | 55
2 | 90 | 86
3 | 30 | 22
4 | 15 | 11
5 | 288 | 300
6 | 223 | 200
7 | 0 | 0
8 | 14 | 5
9 | 320 | 280
10 | 65 | 55
11 | 13 | 0
The data is abstracted as follows; each row gives the play time (hours) and the recharge amount (yuan).

The players are divided into 3 classes:
1. Premium users (long play time, high spending)
2. Ordinary players (medium online time, medium spending)
3. Inactive users (short online time, low spending)
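K-means numbers its clusters arbitrarily, so the cluster indices have to be matched to these three classes after training. One possible way, sketched below in plain Scala, is to rank the cluster centers by play time and label them from lowest to highest. The object name ClusterLabels, the function labelByRank, and the choice of play time as the ranking key are assumptions made for illustration and are not part of the original test code.

```scala
object ClusterLabels {
  // Labels ordered from lowest to highest play time
  val labels = Vector("inactive user", "ordinary player", "premium user")

  /** Maps each cluster index to a label by ranking the centers on their
    * first coordinate (play time in hours).
    */
  def labelByRank(centers: Seq[Vector[Double]]): Map[Int, String] = {
    require(centers.size == labels.size, "expected exactly three centers")
    centers.zipWithIndex
      .sortBy { case (center, _) => center.head } // ascending play time
      .map { case (_, index) => index }
      .zip(labels)
      .toMap
  }
}
```

For example, with hypothetical centers Seq(Vector(280.0, 260.0), Vector(15.0, 8.0), Vector(70.0, 65.0)), cluster 0 would be labeled premium user, cluster 1 inactive user, and cluster 2 ordinary player.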
Flow chart

Test code
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object KMeansTest {
  def main(args: Array[String]): Unit = {
    // The application name and master are supplied by spark-submit
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Each input line holds space-separated numeric values
    val data = sc.textFile(args(0))
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.trim.toDouble))).cache()
    // Set the number of clusters to 3
    val numClusters = 3
    // Iterate 20 times
    val numIterations = 20
    // Run 10 times and keep the best solution
    // (the runs argument is deprecated in later Spark releases; drop it if your version no longer accepts it)
    val runs = 10
    val clusters = KMeans.train(parsedData, numClusters, numIterations, runs)
    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)
    // Predict the cluster of two additional users
    val a21 = clusters.predict(Vectors.dense(57.0, 30.0))
    val a22 = clusters.predict(Vectors.dense(0.0, 0.0))
    // Print the cluster centers
    println("Cluster centers:")
    for (center <- clusters.clusterCenters) {
      println(" " + center)
    }
    // Print which cluster each test record belongs to
    println(parsedData.map(v => v.toString() + " belong to cluster :" + clusters.predict(v)).collect().mkString("\n"))
    println("Predicted cluster of user 21 --> " + a21)
    println("Predicted cluster of user 22 --> " + a22)
    sc.stop()
  }
}
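Submit script (standalone mode):
./bin/spark-submit \
--name kmeans \
--class KMeansTest \
--master spark://master:7077 \
~/Desktop/kmeans.jar \
hdfs://master:9000/KMeansTest.data
Here --name is the project name, --class is the main class name, --master is the cluster manager to use, ~/Desktop/kmeans.jar is the location of the code package, and the last argument is the value of args(0).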
Output
It can be seen clearly that:
Class 1 users are premium users
Class 2 users are ordinary users
Class 3 users are inactive users

The data of user 21 is (57, 30)
The data of user 22 is (0, 0)
The classification is correct

Centers of the three clusters