A Detailed Look at the Mahout Recommendation Algorithm API
 
火龙果软件    Published 2014-07-29
 

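Preface

Building a recommender system with Mahout is both easy and hard. It is easy because Mahout fully encapsulates the collaborative filtering algorithms, parallelizes them, and exposes a very simple API. It is hard because, without understanding the details of the algorithms, it is difficult to configure and tune them for a particular business scenario.

This article digs into the algorithm API to explain some of what happens underneath Mahout's recommendation algorithms.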

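1. Introduction to Mahout's recommendation algorithms

In terms of data-processing capacity, Mahout's recommendation algorithms fall into two categories:

Single-machine, in-memory implementations

Distributed implementations based on Hadoop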

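1). Single-machine, in-memory implementations

These algorithms run on a single machine and are implemented by the cf.taste project. The familiar UserCF and ItemCF both support single-machine, in-memory execution, and their parameters can be configured flexibly. For a basic example of the single-machine algorithms, see the article: "Building a Mahout Project with Maven"

The drawback of the in-memory algorithms is that they are bounded by the resources of one machine. Medium-sized data sets, on the order of 1 GB to 10 GB, can still be processed, but a data set of more than 100 GB is beyond what a single machine can handle.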

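2). Distributed implementations based on Hadoop

These implementations parallelize the single-machine, in-memory algorithms and spread the work across multiple computers. Mahout provides a Hadoop-based parallel implementation of ItemCF. For details, see the article:
"Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)"

The challenge with distributed algorithms is how to parallelize a single-machine algorithm. On a single machine we only need to think about the algorithm, data structures, memory, and CPU. A distributed algorithm has many additional concerns, such as merging data from multiple nodes, sorting data, network communication efficiency, recomputation after a node failure, and distributed data storage.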

2. Evaluation criteria: recall and precision

Mahout provides two metrics for evaluating a recommender, precision and recall. Both are classic measures from search engines.

A: retrieved and relevant (found and wanted)

B: not retrieved but relevant (not found, yet actually wanted)

C: retrieved but not relevant (found but useless)

D: not retrieved and not relevant (not found and not wanted)

Retrieving as many of the relevant items as possible is the pursuit of recall, A/(A+B); the larger the better.

Among the retrieved items, the more relevant ones and the fewer irrelevant ones the better. This is the pursuit of precision, A/(A+C); the larger the better.

On large data sets these two metrics constrain each other: retrieving more data lowers precision, while retrieving more precisely returns less data.
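The two ratios can be written down directly from the counts A, B, and C defined above. The following is a minimal sketch, not Mahout code: the class and method names are hypothetical, the counts in main are made up, and returning NaN for an empty denominator is a choice made here for illustration.

```java
// Illustrative helper; names are hypothetical and not part of the Mahout API.
final class PrecisionRecallSketch {

    // a = retrieved and relevant, c = retrieved but not relevant
    static double precision(int a, int c) {
        return (a + c) == 0 ? Double.NaN : (double) a / (a + c);
    }

    // a = retrieved and relevant, b = relevant but not retrieved
    static double recall(int a, int b) {
        return (a + b) == 0 ? Double.NaN : (double) a / (a + b);
    }

    public static void main(String[] args) {
        // Made-up counts: A = 2, B = 2, C = 6.
        System.out.println("precision = " + precision(2, 6)); // 2 / (2 + 6)
        System.out.println("recall    = " + recall(2, 2));    // 2 / (2 + 2)
    }
}
```

With these counts, recall is higher than precision: half of the relevant items were found, but most of what was returned is irrelevant, which is the trade-off described above.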

3. The Recommender API

1). System environment:

Win7 64bit
Java 1.6.0_45
Maven 3
Eclipse Juno Service Release 2
Mahout 0.8
Hadoop 1.1.2

2). The Recommender interface file:

org.apache.mahout.cf.taste.recommender.Recommender.java

Methods of the interface:

recommend(long userID, int howMany): returns recommendations, recommending howMany items to userID

recommend(long userID, int howMany, IDRescorer rescorer): returns recommendations, recommending howMany items to userID; the results can be re-ranked with rescorer.

estimatePreference(long userID, long itemID): estimates the user's rating of the item when no actual rating exists

setPreference(long userID, long itemID, float value): sets the rating that the user gives to the item

removePreference(long userID, long itemID): removes the user's rating of the item

getDataModel(): returns the underlying recommendation data

From the Recommender interface we can guess that the core algorithm is implemented in the estimatePreference() method of the subclasses.
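To see why estimatePreference() is a plausible home for the core algorithm, here is a simplified stand-in showing how a top-N recommend() can be built on top of it. This is a sketch under stated assumptions, not Mahout's implementation: the TopNSketch class, the nested-map storage, and the placeholder estimator (the item's mean rating) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified stand-in, NOT Mahout code.
final class TopNSketch {
    private final Map<Long, Map<Long, Float>> ratings; // userID -> (itemID -> rating)

    TopNSketch(Map<Long, Map<Long, Float>> ratings) {
        this.ratings = ratings;
    }

    // Placeholder estimator: the known rating if present, else the item's mean rating.
    float estimatePreference(long userID, long itemID) {
        Map<Long, Float> own = ratings.get(userID);
        if (own != null && own.containsKey(itemID)) {
            return own.get(itemID);
        }
        double sum = 0.0;
        int n = 0;
        for (Map<Long, Float> prefs : ratings.values()) {
            Float v = prefs.get(itemID);
            if (v != null) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? Float.NaN : (float) (sum / n);
    }

    // Rank every item the user has not rated by its estimate and keep the best howMany.
    List<Long> recommend(long userID, int howMany) {
        Map<Long, Float> own = ratings.getOrDefault(userID, Collections.<Long, Float>emptyMap());
        TreeSet<Long> candidates = new TreeSet<>();
        for (Map<Long, Float> prefs : ratings.values()) {
            candidates.addAll(prefs.keySet());
        }
        candidates.removeAll(own.keySet());
        List<Long> ranked = new ArrayList<>();
        for (long item : candidates) {
            if (!Float.isNaN(estimatePreference(userID, item))) {
                ranked.add(item);
            }
        }
        ranked.sort(Comparator.comparingDouble((Long item) -> -estimatePreference(userID, item)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Float>> r = new HashMap<>();
        r.computeIfAbsent(1L, k -> new HashMap<>()).put(101L, 5f);
        r.computeIfAbsent(2L, k -> new HashMap<>()).put(103L, 5f);
        r.computeIfAbsent(2L, k -> new HashMap<>()).put(104L, 2f);
        System.out.println(new TopNSketch(r).recommend(1L, 2));
    }
}
```

Swapping the placeholder estimator for a user-based or item-based estimate changes the recommendations without touching the ranking logic, which is the division of labour the interface suggests.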

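3). Classes that implement the Recommender interface through inheritance:

Recommendation algorithm implementation classes:

GenericUserBasedRecommender: user-based recommendation algorithm

GenericItemBasedRecommender: item-based recommendation algorithm

KnnItemBasedRecommender: item-based KNN recommendation algorithm

SlopeOneRecommender: Slope One recommendation algorithm

SVDRecommender: SVD recommendation algorithm

TreeClusteringRecommender: tree-clustering recommendation algorithm

The implementation of each algorithm is described below.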

4. Test program: RecommenderTest.java

Test data set: item.csv (each line is userID,itemID,preference)

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
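For readers who want to experiment with this data outside Mahout, the three columns can be loaded into a nested map. This is a hypothetical loader for illustration only; the article's own program goes through RecommendFactory.buildDataModel(), and the class and method names below are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader for illustration; not part of Mahout or of the article's code.
final class RatingCsvSketch {

    // Parses lines of the form "userID,itemID,preference"
    // into userID -> (itemID -> preference). Blank lines are skipped.
    static Map<Long, Map<Long, Float>> parse(Iterable<String> lines) {
        Map<Long, Map<Long, Float>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            byUser.computeIfAbsent(user, k -> new LinkedHashMap<>()).put(item, value);
        }
        return byUser;
    }

    public static void main(String[] args) {
        // First rows of item.csv as listed above.
        Map<Long, Map<Long, Float>> data =
                parse(Arrays.asList("1,101,5.0", "1,102,3.0", "1,103,2.5", "2,101,2.0"));
        System.out.println(data);
    }
}
```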

Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java

package org.conan.mymahout.recommendation.job;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.common.RandomUtils;

public class RecommenderTest {

    // Number of nearest neighbors used by UserCF
    final static int NEIGHBORHOOD_NUM = 2;
    // Number of items recommended to each user
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        // Fixed random seed so that evaluation results are repeatable
        RandomUtils.useTestSeed();
        String file = "datafile/item.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        slopeOne(dataModel);
    }

    // The method bodies are given in the sections that follow
    public static void userCF(DataModel dataModel) throws TasteException{}
    public static void itemCF(DataModel dataModel) throws TasteException{}
    public static void slopeOne(DataModel dataModel) throws TasteException{}

    ...

Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne(), and so on.

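5. User-based collaborative filtering: UserCF

User-based collaborative filtering measures the similarity between users from the ratings that different users give to items, and makes recommendations based on that similarity. Put simply: recommend to a user the items liked by other users with similar interests.

An example:

The basic idea of user-based CF is quite simple: find neighboring users based on users' preferences for items, then recommend what the neighbors like to the current user. Computationally, each user's preferences over all items are treated as a vector, and the similarity between users is computed from these vectors. Once the K nearest neighbors are found, their similarity weights and their preferences for items are used to predict the current user's preference for items he or she has not yet rated, producing a ranked list of items as the recommendation. Figure 2 gives an example: for user A, the historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.

The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender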

@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
}

protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    if (theNeighborhood.length == 0) {
        return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
        if (userID != theUserID) {
            // See GenericItemBasedRecommender.doEstimatePreference() too
            Float pref = dataModel.getPreferenceValue(userID, itemID);
            if (pref != null) {
                double theSimilarity = similarity.userSimilarity(theUserID, userID);
                if (!Double.isNaN(theSimilarity)) {
                    preference += theSimilarity * pref;
                    totalSimilarity += theSimilarity;
                    count++;
                }
            }
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
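Stripped of the Mahout types, the listing above is a similarity-weighted average of the neighbors' ratings, discarded when fewer than two neighbors contribute. The sketch below restates that arithmetic in plain Java. The class name, the array-based inputs, and the numbers in main are illustrative assumptions, and the optional capping step is left out.

```java
// Plain-Java restatement of the weighted average; names are illustrative, not Mahout API.
final class WeightedEstimateSketch {

    // similarities[i]: similarity between the target user and neighbor i.
    // prefs[i]: neighbor i's rating of the target item, or NaN if not rated.
    static float estimate(double[] similarities, double[] prefs) {
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (int i = 0; i < similarities.length; i++) {
            if (Double.isNaN(prefs[i]) || Double.isNaN(similarities[i])) {
                continue; // neighbor has no rating or no defined similarity
            }
            weighted += similarities[i] * prefs[i];
            totalSimilarity += similarities[i];
            count++;
        }
        // Same guard as in the listing: a single data point is not enough.
        return count <= 1 ? Float.NaN : (float) (weighted / totalSimilarity);
    }

    public static void main(String[] args) {
        // Made-up neighbors: similarity 0.5 with rating 4.0, similarity 1.0 with rating 2.0.
        // Estimate = (0.5 * 4.0 + 1.0 * 2.0) / (0.5 + 1.0).
        System.out.println(estimate(new double[] {0.5, 1.0}, new double[] {4.0, 2.0}));
    }
}
```

The more similar neighbor pulls the estimate toward its own rating, so the result lies closer to 2.0 than to 4.0.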

Test program:

public static void userCF(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
    RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)

For a reimplementation of UserCF in R, see the article: "Analyzing Mahout's User-Based Collaborative Filtering Algorithm (UserCF) with R"

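6. Item-based collaborative filtering: ItemCF

Item-based collaborative filtering measures the similarity between items from the ratings that users give to different items, and makes recommendations based on that similarity. Put simply: recommend to a user items similar to the ones he or she liked before.

An example:

The principle of item-based CF is similar to that of user-based CF, except that neighbors are computed between items rather than between users: similar items are found based on users' preferences for items, and then items similar to those in the user's history are recommended. Computationally, all users' preferences for one item are treated as a vector, and the similarity between items is computed from these vectors. Once an item's similar items are known, the user's historical preferences are used to predict preferences for items the user has not yet rated, producing a ranked list of items as the recommendation. Figure 3 gives an example: for item A, the historical preferences of all users show that everyone who likes item A also likes item C, so items A and C are considered similar; user C likes item A, so we can infer that user C may also like item C.

The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender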

@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    PreferenceArray preferencesFromUser = getDataModel().getPreferencesFromUser(userID);
    Float actualPref = getPreferenceForItem(preferencesFromUser, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, preferencesFromUser, itemID);
}

protected float doEstimatePreference(long userID, PreferenceArray preferencesFromUser, long itemID)
        throws TasteException {
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    double[] similarities = similarity.itemSimilarities(itemID, preferencesFromUser.getIDs());
    for (int i = 0; i < similarities.length; i++) {
        double theSimilarity = similarities[i];
        if (!Double.isNaN(theSimilarity)) {
            // Weights can be negative!
            preference += theSimilarity * preferencesFromUser.getValue(i);
            totalSimilarity += theSimilarity;
            count++;
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
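The weights in this listing come from similarity.itemSimilarities(...). As a rough illustration of where such a weight could come from, the sketch below treats each item as a vector of user ratings and maps the Euclidean distance over co-rating users to a similarity with 1 / (1 + d). This mapping is an assumption chosen for simplicity; Mahout's EuclideanDistanceSimilarity applies its own normalization, so its values will differ. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative item-item similarity; not the Mahout implementation.
final class ItemSimilaritySketch {

    // ratingsA / ratingsB: userID -> rating of item A / item B.
    // Returns 1 / (1 + Euclidean distance over users who rated both items),
    // or NaN if no user rated both.
    static double euclideanSimilarity(Map<Long, Double> ratingsA, Map<Long, Double> ratingsB) {
        double sumOfSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> entry : ratingsA.entrySet()) {
            Double other = ratingsB.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        // Ratings of items 101 and 102 taken from item.csv.
        Map<Long, Double> item101 = new HashMap<>();
        item101.put(1L, 5.0);
        item101.put(2L, 2.0);
        item101.put(3L, 2.5);
        item101.put(4L, 5.0);
        item101.put(5L, 4.0);
        Map<Long, Double> item102 = new HashMap<>();
        item102.put(1L, 3.0);
        item102.put(2L, 2.5);
        item102.put(5L, 3.0);
        // Users 1, 2 and 5 rated both items; squared differences sum to 5.25.
        System.out.println(euclideanSimilarity(item101, item102));
    }
}
```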

Test program:

public static void itemCF(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.8676552772521973
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(105,3.823529)(104,3.722222)(106,3.478261)
uid:2,(106,2.984848)(105,2.537037)(107,2.000000)
uid:3,(106,3.648649)(102,3.380000)(103,3.312500)
uid:4,(107,4.722222)(105,4.313953)(102,4.025000)
uid:5,(107,3.736842)

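7. The Slope One algorithm

This algorithm is already marked @Deprecated in mahout-0.8.

Slope One is a simple and efficient collaborative filtering algorithm that predicts ratings from average rating differences. Download the Slope One paper (PDF)

1). An example:

Users X, Y, and Z rate items A and B as shown in the table below (values as implied by the calculation that follows). What rating would Z give to B?

      A   B
X     5   4
Y     4   2
Z     3   ?

Slope One holds that the average difference can stand in for the unknown rating difference between two items. The average difference between item A and item B is ((5 - 4) + (4 - 2)) / 2 = 1.5, so Z's rating of B is 3 - 1.5 = 1.5.

Slope One treats the relationship between ratings as a simple linear relationship:

Y = mX + b

with the slope m fixed at 1, which is where the name comes from.

2). Weighted average calculation:

Users X, Y, and Z rate items A, B, and C as shown in the table below (values as implied by the calculation that follows). What rating would Z give to A?

      A   B   C
X     5   3   2
Y     3   4   -
Z     ?   2   5

1. Average difference between A and B: ((5-3)+(3-4))/2 = 0.5

2. Average difference between A and C: (5-2)/1 = 3

3. Z's rating of A derived from B: 2+0.5 = 2.5

4. Z's rating of A derived from C: 5+3 = 8

5. Weighted average of Z's rating of A: two users rated both A and B and one user rated both A and C, so the weights are 2 and 1: (2*2.5+1*8)/(2+1) = 13/3 = 4.33

In this simple way we can quickly compute a predicted rating and complete the recommendation.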
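The five steps above can be sketched as a small program. This is an illustrative sketch, not the Mahout implementation: the class, the row() helper, and the string keys are hypothetical, and the rating table in main is the one implied by the arithmetic in the steps (X: A=5, B=3, C=2; Y: A=3, B=4; Z: B=2, C=5).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative weighted Slope One predictor; not the Mahout implementation.
final class SlopeOneSketch {

    // ratings: user -> (item -> rating). Predicts user's rating of target as the
    // mean of (user's rating of j + average difference target - j) over the items j
    // the user has rated, weighted by the number of users who rated both target and j.
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        Map<String, Double> own = ratings.get(user);
        if (own == null) {
            return Double.NaN;
        }
        double weightedSum = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> known : own.entrySet()) {
            String j = known.getKey();
            double diffSum = 0.0;
            int n = 0; // number of users who rated both target and j
            for (Map<String, Double> other : ratings.values()) {
                if (other.containsKey(target) && other.containsKey(j)) {
                    diffSum += other.get(target) - other.get(j);
                    n++;
                }
            }
            if (n > 0) {
                weightedSum += n * (known.getValue() + diffSum / n);
                totalWeight += n;
            }
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    // Helper that builds one user's row from alternating item, rating arguments.
    static Map<String, Double> row(Object... itemThenRating) {
        Map<String, Double> m = new LinkedHashMap<>();
        for (int i = 0; i < itemThenRating.length; i += 2) {
            m.put((String) itemThenRating[i], ((Number) itemThenRating[i + 1]).doubleValue());
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> table = new LinkedHashMap<>();
        table.put("X", row("A", 5, "B", 3, "C", 2));
        table.put("Y", row("A", 3, "B", 4));
        table.put("Z", row("B", 2, "C", 5));
        // Evaluates (2 * 2.5 + 1 * 8) / (2 + 1), as in step 5.
        System.out.println(predict(table, "Z", "A"));
    }
}
```

Applied to the two-item table of the first example, the same method gives 1.5 for Z's rating of B, since with a single known item the weighting has no effect.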

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender

@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, itemID);
}

private float doEstimatePreference(long userID, long itemID) throws TasteException {
    double count = 0.0;
    double totalPreference = 0.0;
    PreferenceArray prefs = getDataModel().getPreferencesFromUser(userID);
    RunningAverage[] averages = diffStorage.getDiffs(userID, itemID, prefs);
    int size = prefs.length();
    for (int i = 0; i < size; i++) {
        RunningAverage averageDiff = averages[i];
        if (averageDiff != null) {
            double averageDiffValue = averageDiff.getAverage();
            if (weighted) {
                double weight = averageDiff.getCount();
                if (stdDevWeighted) {
                    double stdev = ((RunningAverageAndStdDev) averageDiff).getStandardDeviation();
                    if (!Double.isNaN(stdev)) {
                        weight /= 1.0 + stdev;
                    }
                    // If stdev is NaN, then it is because count is 1. Because we're weighting by count,
                    // the weight is already relatively low. We effectively assume stdev is 0.0 here and
                    // that is reasonable enough. Otherwise, dividing by NaN would yield a weight of NaN
                    // and disqualify this pref entirely
                    // (Thanks Daemmon)
                }
                totalPreference += weight * (prefs.getValue(i) + averageDiffValue);
                count += weight;
            } else {
                totalPreference += prefs.getValue(i) + averageDiffValue;
                count += 1.0;
            }
        }
    }
    if (count <= 0.0) {
        RunningAverage itemAverage = diffStorage.getAverageItemPref(itemID);
        return itemAverage == null ? Float.NaN : (float) itemAverage.getAverage();
    } else {
        return (float) (totalPreference / count);
    }
}

Test program:

public static void slopeOne(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.3333333333333333
Recommender IR Evaluator: [Precision:0.25,Recall:0.5]
uid:1,(105,5.750000)(104,5.250000)(106,4.500000)
uid:2,(105,2.286115)(106,1.500000)
uid:3,(106,2.000000)(102,1.666667)(103,1.625000)
uid:4,(105,4.976859)(102,3.509071)

8. KNN linear-interpolation item-based recommendation algorithm

This algorithm is already marked @Deprecated in mahout-0.8.

The algorithm is based on the paper by Robert M. Bell and Yehuda Koren at ICDM '07.

(TODO: unfinished)

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.knn.KnnItemBasedRecommender

@Override
protected float doEstimatePreference(long theUserID, PreferenceArray preferencesFromUser, long itemID)
        throws TasteException {

    DataModel dataModel = getDataModel();
    int size = preferencesFromUser.length();
    FastIDSet possibleItemIDs = new FastIDSet(size);
    for (int i = 0; i < size; i++) {
        possibleItemIDs.add(preferencesFromUser.getItemID(i));
    }
    possibleItemIDs.remove(itemID);

    List<RecommendedItem> mostSimilar = mostSimilarItems(itemID, possibleItemIDs.iterator(),
            neighborhoodSize, null);
    long[] theNeighborhood = new long[mostSimilar.size() + 1];
    theNeighborhood[0] = -1;

    List<Long> usersRatedNeighborhood = Lists.newArrayList();
    int nOffset = 0;
    for (RecommendedItem rec : mostSimilar) {
        theNeighborhood[nOffset++] = rec.getItemID();
    }

    if (!mostSimilar.isEmpty()) {
        theNeighborhood[mostSimilar.size()] = itemID;
        for (int i = 0; i < theNeighborhood.length; i++) {
            PreferenceArray usersNeighborhood = dataModel.getPreferencesForItem(theNeighborhood[i]);
            int size1 = usersRatedNeighborhood.isEmpty() ? usersNeighborhood.length() : usersRatedNeighborhood.size();
            for (int j = 0; j < size1; j++) {
                if (i == 0) {
                    usersRatedNeighborhood.add(usersNeighborhood.getUserID(j));
                } else {
                    if (j >= usersRatedNeighborhood.size()) {
                        break;
                    }
                    long index = usersRatedNeighborhood.get(j);
                    if (!usersNeighborhood.hasPrefWithUserID(index) || index == theUserID) {
                        usersRatedNeighborhood.remove(index);
                        j--;
                    }
                }
            }
        }
    }

    double[] weights = null;
    if (!mostSimilar.isEmpty()) {
        weights = getInterpolations(itemID, theNeighborhood, usersRatedNeighborhood);
    }

    int i = 0;
    double preference = 0.0;
    double totalSimilarity = 0.0;
    for (long jitem : theNeighborhood) {

        Float pref = dataModel.getPreferenceValue(theUserID, jitem);

        if (pref != null) {
            double weight = weights[i];
            preference += pref * weight;
            totalSimilarity += weight;
        }
        i++;

    }
    return totalSimilarity == 0.0 ? Float.NaN : (float) (preference / totalSimilarity);
}

Test program:

public static void itemKNN(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.5
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(107,5.000000)(104,3.501168)(106,3.498198)
uid:2,(105,2.878995)(106,2.878086)(107,2.000000)
uid:3,(103,3.667444)(102,3.667161)(106,3.667019)
uid:4,(107,4.750247)(102,4.122755)(105,4.122709)
uid:5,(107,3.833621)

9. SVD recommendation algorithm

(TODO: unfinished)

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender

@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    double[] userFeatures = factorization.getUserFeatures(userID);
    double[] itemFeatures = factorization.getItemFeatures(itemID);
    double estimate = 0;
    for (int feature = 0; feature < userFeatures.length; feature++) {
        estimate += userFeatures[feature] * itemFeatures[feature];
    }
    return (float) estimate;
}

Test program:

public static void svd(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder = RecommendFactory.svdRecommender(new ALSWRFactorizer(dataModel, 10, 0.05, 10));

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.09990564982096355
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(104,4.032909)(105,3.390885)(107,1.858541)
uid:2,(105,3.761718)(106,2.951908)(107,1.561116)
uid:3,(103,5.593422)(102,2.458930)(106,-0.091259)
uid:4,(105,4.068329)(102,3.534025)(107,0.206257)
uid:5,(107,0.105169)

10. Tree cluster-based recommendation algorithm

This algorithm is already marked @Deprecated in mahout-0.8.

(TODO: unfinished)

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender

@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    buildClusters();
    List<RecommendedItem> topRecsForUser = topRecsByUserID.get(userID);
    if (topRecsForUser != null) {
        for (RecommendedItem item : topRecsForUser) {
            if (itemID == item.getItemID()) {
                return item.getValue();
            }
        }
    }
    // Hmm, we have no idea. The item is not in the user's cluster
    return Float.NaN;
}

Test program:

public static void treeCluster(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
    ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_CLUSTER, userSimilarity);
    RecommenderBuilder recommenderBuilder = RecommendFactory.treeClusterRecommender(clusterSimilarity, 10);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:NaN
Recommender IR Evaluator: [Precision:NaN,Recall:0.0]

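11. Summary of Mahout's recommendation algorithms

Algorithms and applicable scenarios:

[Figure: algorithms and applicable scenarios]

Algorithm evaluation results:

Algorithm     AVERAGE_ABSOLUTE_DIFFERENCE   Precision   Recall
userCF        1.0                           0.5         0.5
itemCF        0.8676552772521973            0.5         1.0
slopeOne      1.3333333333333333            0.25        0.5
itemKNN       1.5                           0.5         1.0
SVD           0.09990564982096355           0.5         1.0
treeCluster   NaN                           NaN         0.0

Comparing the scores of the algorithms above: itemCF, itemKNN, and SVD achieve the best Precision and Recall, and itemCF and SVD have the lowest AVERAGE_ABSOLUTE_DIFFERENCE. From the algorithmic point of view, this tells us which algorithms are more accurate or retrieve more of the relevant data.

Some further considerations:

1. These three metrics alone do not prove that itemCF and SVD will always give better results

2. We did not tune the parameters of any of the algorithms

3. The size and distribution of the data affect the algorithms' scores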
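Since AVERAGE_ABSOLUTE_DIFFERENCE carries much of this comparison, it helps to see what the number measures: the mean of |estimated - actual| over held-out ratings, so lower is better. The sketch below is a plain-Java illustration with made-up values. The class name is hypothetical, and skipping NaN estimates is an assumption about how missing estimates are treated, not a statement about Mahout's evaluator.

```java
// Illustrative metric; names are hypothetical and not part of the Mahout API.
final class AbsoluteDifferenceSketch {

    // Mean of |estimated - actual| over the pairs that have an estimate.
    // NaN estimates are skipped (assumption); returns NaN if nothing could be estimated.
    static double averageAbsoluteDifference(float[] estimated, float[] actual) {
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < estimated.length; i++) {
            if (Float.isNaN(estimated[i])) {
                continue;
            }
            sum += Math.abs(estimated[i] - actual[i]);
            n++;
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        // Made-up held-out ratings and their estimates; the third has no estimate.
        float[] estimated = {4.0f, 2.5f, Float.NaN};
        float[] actual = {5.0f, 2.0f, 3.0f};
        // (|4.0 - 5.0| + |2.5 - 2.0|) / 2
        System.out.println(averageAbsoluteDifference(estimated, actual));
    }
}
```

On a data set as small as item.csv only a handful of ratings are held out, so a single large miss can move the score considerably, which is one reason the three caveats above matter.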

   