Preface
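Building a recommender system with Mahout is both easy and hard. It is easy because Mahout fully encapsulates the "collaborative filtering" algorithms, parallelizes them, and exposes a very simple API. It is hard because, without understanding the algorithm details, it is difficult to configure and tune the algorithms for a particular business scenario.
This article digs into the algorithm APIs to explain what happens underneath Mahout's recommendation algorithms.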
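1. Introduction to Mahout's recommendation algorithms
By data-processing capacity, Mahout's recommendation algorithms fall into two categories:
Single-machine, in-memory implementations
Distributed implementations based on Hadoop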
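1). Single-machine, in-memory implementations
These algorithms run on a single machine and are implemented by the cf.taste project. The familiar UserCF and ItemCF both support single-machine, in-memory execution, and their parameters can be configured flexibly. For a basic single-machine example, see the article: Building a Mahout Project with Maven
The limitation of in-memory algorithms is that they are bounded by the resources of one machine. Medium-sized data, around 1 GB or 10 GB, can be processed, but more than 100 GB of data is beyond what a single machine can handle.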
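2). Distributed implementations based on Hadoop
These parallelize the in-memory algorithms, spreading the work across multiple machines. Mahout provides a Hadoop-based parallel implementation of ItemCF. For the Hadoop-based distributed implementation, see the article:
Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)
The difficulty with distributed algorithms is how to parallelize a single-machine algorithm. A single-machine algorithm only has to consider the algorithm, data structures, memory and CPU, but a distributed algorithm must also handle many additional issues, such as merging data from multiple nodes, sorting data, network communication efficiency, recomputation when a node goes down, and distributed data storage.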
2. Evaluation criteria: recall and precision
Mahout provides two metrics for evaluating recommenders, precision and recall. Both are classic measures from search engines (information retrieval).

A: retrieved and relevant (found, and wanted)
B: not retrieved but relevant (not found, yet actually wanted)
C: retrieved but not relevant (found, but useless)
D: not retrieved and not relevant (not found, and not wanted)
Retrieving as many of the relevant items as possible is the goal of recall, A/(A+B); the larger, the better.
Among the retrieved items, the more relevant and the fewer irrelevant ones, the better; this is the goal of precision, A/(A+C); the larger, the better.
On large data sets the two metrics trade off against each other: retrieving more data lowers precision, and making retrieval more precise means retrieving less data.
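To make the two formulas concrete, here is a small stdlib-only Java sketch (class and method names are ours, not Mahout's) that computes precision and recall from a set of retrieved item IDs and a set of relevant item IDs. Mahout's own IR evaluator decides which items count as "relevant" for each user in its own way; this only illustrates the arithmetic of A/(A+C) and A/(A+B).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// A = retrieved and relevant, B = relevant but not retrieved, C = retrieved but not relevant.
class PrecisionRecall {

    // Convenience helper for building ID sets.
    static Set<Long> setOf(Long... ids) {
        return new HashSet<Long>(Arrays.asList(ids));
    }

    // Size of the intersection of retrieved and relevant, i.e. A.
    static int hits(Set<Long> retrieved, Set<Long> relevant) {
        Set<Long> common = new HashSet<Long>(retrieved);
        common.retainAll(relevant);
        return common.size();
    }

    // precision = A / (A + C) = A / |retrieved|; NaN if nothing was retrieved.
    static double precision(Set<Long> retrieved, Set<Long> relevant) {
        return retrieved.isEmpty() ? Double.NaN : (double) hits(retrieved, relevant) / retrieved.size();
    }

    // recall = A / (A + B) = A / |relevant|; NaN if nothing is relevant.
    static double recall(Set<Long> retrieved, Set<Long> relevant) {
        return relevant.isEmpty() ? Double.NaN : (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Set<Long> retrieved = setOf(101L, 102L, 103L, 104L); // what the recommender returned
        Set<Long> relevant = setOf(101L, 104L, 105L);        // what the user actually wanted
        // A = {101, 104}, C = {102, 103}, B = {105}
        System.out.println("precision = " + precision(retrieved, relevant)); // 2 / 4
        System.out.println("recall    = " + recall(retrieved, relevant));    // 2 / 3
    }
}
```

Retrieving more items (say, also 105, 106 and 107) would raise recall to 1.0 while lowering precision, which is the trade-off described above.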
3. The Recommender API
1). System environment:
Win7 64bit
Java 1.6.0_45
Maven 3
Eclipse Juno Service Release 2
Mahout 0.8
Hadoop 1.1.2
2). The Recommender interface file:
org.apache.mahout.cf.taste.recommender.Recommender.java

Methods in the interface:
recommend(long userID, int howMany): returns recommendations, the top howMany items for userID
recommend(long userID, int howMany, IDRescorer rescorer): returns the top howMany items for userID, using the rescorer to re-rank the results
estimatePreference(long userID, long itemID): estimates the user's preference for an item when no rating exists (the implementations below return the actual rating if there is one)
setPreference(long userID, long itemID, float value): sets a user's preference (rating) for an item
removePreference(long userID, long itemID): removes a user's preference for an item
getDataModel(): returns the DataModel holding the recommendation data
From the Recommender interface we can guess that the core algorithm is implemented in each subclass's estimatePreference() method.
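A simplified sketch of how recommend() can sit on top of estimatePreference(): score every item the user has not rated, drop NaN estimates, and keep the howMany best. The Estimator interface and the map standing in for the DataModel are hypothetical, made up for illustration. Mahout's recommenders also narrow the candidate items first (for example, to items rated by the user's neighbors) and use their own top-N utilities, so treat this as the idea rather than Mahout's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TopNSketch {

    // Hypothetical stand-in for Recommender.estimatePreference(userID, itemID).
    interface Estimator {
        float estimate(long userID, long itemID);
    }

    // Adds one rating to a userID -> (itemID -> rating) map (stand-in for a DataModel).
    static void put(Map<Long, Map<Long, Float>> prefs, long userID, long itemID, float value) {
        Map<Long, Float> userPrefs = prefs.get(userID);
        if (userPrefs == null) {
            userPrefs = new HashMap<Long, Float>();
            prefs.put(userID, userPrefs);
        }
        userPrefs.put(itemID, value);
    }

    // Up to howMany items the user has not rated, highest estimate first.
    static List<Long> recommend(long userID, int howMany,
                                Map<Long, Map<Long, Float>> prefs, Estimator estimator) {
        Set<Long> allItems = new TreeSet<Long>();
        for (Map<Long, Float> userPrefs : prefs.values()) {
            allItems.addAll(userPrefs.keySet());
        }
        Map<Long, Float> rated = prefs.get(userID);
        final Map<Long, Float> scores = new HashMap<Long, Float>();
        for (long itemID : allItems) {
            if (rated != null && rated.containsKey(itemID)) {
                continue; // never recommend something the user already rated
            }
            float estimate = estimator.estimate(userID, itemID);
            if (!Float.isNaN(estimate)) { // NaN means "no basis for an estimate"
                scores.put(itemID, estimate);
            }
        }
        List<Long> ranked = new ArrayList<Long>(scores.keySet());
        Collections.sort(ranked, new Comparator<Long>() {
            public int compare(Long a, Long b) {
                return Float.compare(scores.get(b), scores.get(a)); // descending by estimate
            }
        });
        return new ArrayList<Long>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        final Map<Long, Map<Long, Float>> prefs = new HashMap<Long, Map<Long, Float>>();
        put(prefs, 1, 101, 5.0f);
        put(prefs, 1, 102, 3.0f);
        put(prefs, 2, 101, 2.0f);
        put(prefs, 2, 103, 5.0f);
        put(prefs, 2, 104, 2.0f);
        // Toy estimator (not a Mahout algorithm): the item's mean rating over all users.
        Estimator itemMean = new Estimator() {
            public float estimate(long userID, long itemID) {
                float sum = 0f;
                int n = 0;
                for (Map<Long, Float> userPrefs : prefs.values()) {
                    Float v = userPrefs.get(itemID);
                    if (v != null) {
                        sum += v;
                        n++;
                    }
                }
                return n == 0 ? Float.NaN : sum / n;
            }
        };
        System.out.println(recommend(1, 3, prefs, itemMean)); // [103, 104]
    }
}
```

Swapping in a different Estimator changes the recommendations without touching recommend(), which mirrors why the interesting differences between Mahout's recommenders live in estimatePreference().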
3). Subclasses that implement the Recommender interface through inheritance:

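Recommendation algorithm implementation classes:
GenericUserBasedRecommender: user-based recommendation algorithm
GenericItemBasedRecommender: item-based recommendation algorithm
KnnItemBasedRecommender: item-based KNN recommendation algorithm
SlopeOneRecommender: Slope One recommendation algorithm
SVDRecommender: SVD recommendation algorithm
TreeClusteringRecommender: tree-clustering recommendation algorithm
The implementation of each algorithm is described below.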
4. Test program: RecommenderTest.java
Test data set: item.csv (one userID,itemID,preference record per line)
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
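Each line of item.csv is a userID,itemID,preference triple, the comma-separated format that Mahout's FileDataModel reads. To see the data the way the recommenders do, as a user to item to preference map, here is a minimal stdlib-only parser sketch; the class and method names are ours, not Mahout's.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

class ItemCsvSketch {

    // Parses "userID,itemID,preference" lines into userID -> (itemID -> preference).
    static Map<Long, Map<Long, Float>> parse(BufferedReader in) throws IOException {
        Map<Long, Map<Long, Float>> prefs = new TreeMap<Long, Map<Long, Float>>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            long userID = Long.parseLong(fields[0].trim());
            long itemID = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            Map<Long, Float> userPrefs = prefs.get(userID);
            if (userPrefs == null) {
                userPrefs = new TreeMap<Long, Float>();
                prefs.put(userID, userPrefs);
            }
            userPrefs.put(itemID, value);
        }
        return prefs;
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,101,5.0\n1,102,3.0\n1,103,2.5\n2,101,2.0\n2,102,2.5\n";
        Map<Long, Map<Long, Float>> prefs = parse(new BufferedReader(new StringReader(csv)));
        System.out.println(prefs); // {1={101=5.0, 102=3.0, 103=2.5}, 2={101=2.0, 102=2.5}}
    }
}
```

To read the real file instead, pass new BufferedReader(new FileReader("datafile/item.csv")) to parse().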
Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java
package org.conan.mymahout.recommendation.job;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.common.RandomUtils;

public class RecommenderTest {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        RandomUtils.useTestSeed();
        String file = "datafile/item.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        slopeOne(dataModel);
    }

    public static void userCF(DataModel dataModel) throws TasteException {}

    public static void itemCF(DataModel dataModel) throws TasteException {}

    public static void slopeOne(DataModel dataModel) throws TasteException {}

    ...
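Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne() and so on; main() calls one of them (slopeOne() in the listing above). RecommendFactory is a helper class in the same package, not part of Mahout, and its source is not shown here.
5. User-based collaborative filtering (UserCF)
User-based collaborative filtering measures the similarity between users from their ratings of items and makes recommendations based on that similarity. Simply put: recommend to a user the items liked by other users with similar tastes.
Example: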

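The basic idea of user-based CF is quite simple: find a user's neighbors from their preferences for items, then recommend what those neighbors like to the current user. Computationally, each user's preferences for all items are treated as a vector and used to compute the similarity between users. After finding the K nearest neighbors, the current user's preferences for items they have not yet rated are predicted from the neighbors' similarity weights and the neighbors' preferences for those items, yielding a ranked list of items as recommendations. Figure 2 gives an example: for user A, based on the users' historical preferences, only one neighbor is found, user C, and item D, which user C likes, is then recommended to user A.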
The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
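The neighbor search can be sketched on the item.csv data. This stdlib-only Java sketch assumes a simple similarity, 1 / (1 + Euclidean distance) over the items two users both rated; Mahout's EuclideanDistanceSimilarity is built on the same distance but may normalize it differently, so the similarity values will not necessarily match Mahout's. With n = 2 (NEIGHBORHOOD_NUM in the test program), user 1's nearest neighbors come out as users 4 and 5. They are also the only users who rated item 106, which appears in user 1's UserCF recommendations in the program output below, so the result is at least consistent with Mahout's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UserNeighborhoodSketch {

    // The item.csv data set above, as space-separated "user,item,pref" triples.
    static final String DATA = "1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 "
            + "3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0 "
            + "5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0";

    // userID -> (itemID -> pref): each user's preference vector.
    static Map<Long, Map<Long, Double>> load(String data) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<Long, Map<Long, Double>>();
        for (String triple : data.trim().split("\\s+")) {
            String[] f = triple.split(",");
            long user = Long.parseLong(f[0]);
            if (!prefs.containsKey(user)) {
                prefs.put(user, new TreeMap<Long, Double>());
            }
            prefs.get(user).put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return prefs;
    }

    // Assumed similarity: 1 / (1 + Euclidean distance over co-rated items); NaN if none shared.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // The n most similar other users, most similar first.
    static List<Long> nearestUsers(long userID, int n, Map<Long, Map<Long, Double>> prefs) {
        final Map<Long, Double> sims = new TreeMap<Long, Double>();
        for (long other : prefs.keySet()) {
            if (other != userID) {
                double s = similarity(prefs.get(userID), prefs.get(other));
                if (!Double.isNaN(s)) {
                    sims.put(other, s);
                }
            }
        }
        List<Long> users = new ArrayList<Long>(sims.keySet());
        Collections.sort(users, new Comparator<Long>() {
            public int compare(Long x, Long y) {
                return Double.compare(sims.get(y), sims.get(x)); // descending similarity
            }
        });
        return new ArrayList<Long>(users.subList(0, Math.min(n, users.size())));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = load(DATA);
        System.out.println(nearestUsers(1, 2, prefs)); // [4, 5]
    }
}
```

Note that user 3 shares only item 101 with user 1; a similarity built on a single co-rated item is fragile, which is one reason the choice of similarity measure matters when tuning UserCF.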
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
}

protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID)
        throws TasteException {
    if (theNeighborhood.length == 0) {
        return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
        if (userID != theUserID) {
            // See GenericItemBasedRecommender.doEstimatePreference() too
            Float pref = dataModel.getPreferenceValue(userID, itemID);
            if (pref != null) {
                double theSimilarity = similarity.userSimilarity(theUserID, userID);
                if (!Double.isNaN(theSimilarity)) {
                    preference += theSimilarity * pref;
                    totalSimilarity += theSimilarity;
                    count++;
                }
            }
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
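The arithmetic inside doEstimatePreference() can be isolated in a few lines. This stdlib-only sketch (names are ours) takes the neighbors' similarities and their ratings for one item and returns the similarity-weighted average, including the same count <= 1 rule that returns NaN; the optional capper step is left out.

```java
class UserBasedEstimateSketch {

    // Mirrors the arithmetic of GenericUserBasedRecommender.doEstimatePreference():
    // sum(similarity * pref) / sum(similarity) over neighbors who rated the item,
    // or NaN when fewer than two neighbors contribute.
    // similarities[i] is the similarity to neighbor i; prefs[i] is that neighbor's
    // rating for the item, or null if the neighbor did not rate it.
    static float estimate(double[] similarities, Float[] prefs) {
        double preference = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (int i = 0; i < similarities.length; i++) {
            if (prefs[i] != null && !Double.isNaN(similarities[i])) {
                preference += similarities[i] * prefs[i];
                totalSimilarity += similarities[i];
                count++;
            }
        }
        if (count <= 1) {
            return Float.NaN; // one data point is not treated as enough evidence
        }
        return (float) (preference / totalSimilarity);
    }

    public static void main(String[] args) {
        // Two neighbors with similarities 0.5 and 0.25 rated the item 4.0 and 2.0:
        // (0.5 * 4.0 + 0.25 * 2.0) / (0.5 + 0.25) = 2.5 / 0.75
        System.out.println(estimate(new double[] {0.5, 0.25}, new Float[] {4.0f, 2.0f}));
        // Only one neighbor rated the item, so there is no estimate (NaN).
        System.out.println(estimate(new double[] {0.5, 0.25}, new Float[] {4.0f, null}));
    }
}
```

The count <= 1 rule explains why, with NEIGHBORHOOD_NUM = 2, UserCF can only score items that both neighbors rated, and why some users get fewer than RECOMMENDER_NUM recommendations in the output below.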
Test program:
public static void userCF(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(
            RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(
            RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
    RecommenderBuilder recommenderBuilder =
            RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)
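For a reimplementation of UserCF in R, see the article: Analyzing Mahout's User-Based Collaborative Filtering (UserCF) with R
6. Item-based collaborative filtering (ItemCF)
Item-based collaborative filtering measures the similarity between items from users' ratings of them and makes recommendations based on that similarity. Simply put: recommend to a user items similar to the ones they liked before.
Example: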

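The principle of item-based CF is similar to user-based CF, except that neighbors are computed among the items themselves rather than among users: similar items are found from users' preferences for items, and then similar items are recommended to a user based on that user's historical preferences. Computationally, all users' preferences for a given item are treated as a vector and used to compute the similarity between items. Once an item's similar items are known, the current user's preferences for items they have not yet rated are predicted from the user's historical preferences, yielding a ranked list of items as recommendations. Figure 3 gives an example: for item A, according to all users' historical preferences, every user who likes item A also likes item C, so items A and C are considered similar; since user C likes item A, we can infer that user C may also like item C.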
The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
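The item-based variant can be sketched on item.csv as well. Assumptions: the same 1 / (1 + Euclidean distance) similarity as a stand-in for Mahout's EuclideanDistanceSimilarity (values may differ from Mahout's), computed between item vectors, that is, each item's ratings from all users. The prediction is the similarity-weighted average of the user's own ratings, with the count <= 1 rule that GenericItemBasedRecommender.doEstimatePreference() applies.

```java
import java.util.Map;
import java.util.TreeMap;

class ItemBasedSketch {

    // The item.csv data set above, as space-separated "user,item,pref" triples.
    static final String DATA = "1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 "
            + "3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0 "
            + "5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0";

    // itemID -> (userID -> pref): each item's vector of ratings from all users.
    static Map<Long, Map<Long, Double>> loadByItem(String data) {
        Map<Long, Map<Long, Double>> byItem = new TreeMap<Long, Map<Long, Double>>();
        for (String triple : data.trim().split("\\s+")) {
            String[] f = triple.split(",");
            long item = Long.parseLong(f[1]);
            if (!byItem.containsKey(item)) {
                byItem.put(item, new TreeMap<Long, Double>());
            }
            byItem.get(item).put(Long.parseLong(f[0]), Double.parseDouble(f[2]));
        }
        return byItem;
    }

    // Assumed similarity: 1 / (1 + Euclidean distance over users who rated both items).
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Predicts userID's rating of itemID from the user's own ratings of other items,
    // weighted by item-item similarity; NaN if fewer than two items contribute.
    static double estimate(long userID, long itemID, Map<Long, Map<Long, Double>> byItem) {
        Map<Long, Double> target = byItem.get(itemID);
        if (target == null) {
            return Double.NaN; // unknown item
        }
        Double actual = target.get(userID);
        if (actual != null) {
            return actual; // already rated: return the real rating
        }
        double preference = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (Map<Long, Double> other : byItem.values()) {
            Double userPref = other.get(userID);
            if (userPref == null) {
                continue; // only items this user has rated contribute
            }
            double s = similarity(target, other);
            if (!Double.isNaN(s)) {
                preference += s * userPref;
                totalSimilarity += s;
                count++;
            }
        }
        return count <= 1 ? Double.NaN : preference / totalSimilarity;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> byItem = loadByItem(DATA);
        for (long item = 104; item <= 107; item++) {
            System.out.println("user 1, item " + item + ": " + estimate(1, item, byItem));
        }
    }
}
```

With this assumed similarity, user 1's estimate for item 104 comes out at about 3.63, while item 107 gets NaN: of the items user 1 rated, only 101 shares a rater (user 3) with 107.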
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    PreferenceArray preferencesFromUser = getDataModel().getPreferencesFromUser(userID);
    Float actualPref = getPreferenceForItem(preferencesFromUser, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, preferencesFromUser, itemID);
}

protected float doEstimatePreference(long userID, PreferenceArray preferencesFromUser, long itemID)
        throws TasteException {
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    double[] similarities = similarity.itemSimilarities(itemID, preferencesFromUser.getIDs());
    for (int i = 0; i < similarities.length; i++) {
        double theSimilarity = similarities[i];
        if (!Double.isNaN(theSimilarity)) {
            // Weights can be negative!
            preference += theSimilarity * preferencesFromUser.getValue(i);
            totalSimilarity += theSimilarity;
            count++;
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
Test program:
public static void itemCF(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(
            RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.8676552772521973
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(105,3.823529)(104,3.722222)(106,3.478261)
uid:2,(106,2.984848)(105,2.537037)(107,2.000000)
uid:3,(106,3.648649)(102,3.380000)(103,3.312500)
uid:4,(107,4.722222)(105,4.313953)(102,4.025000)
uid:5,(107,3.736842)
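7. The Slope One algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
Slope One is a simple and efficient collaborative filtering algorithm that predicts ratings from average rating differences. Slope One paper (PDF)
1). Example:
Users X, Y and Z rate items A and B as in the table below. What is Z's rating for B?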

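Slope One assumes that the average rating difference between two items can stand in for the unknown difference for an individual user. The average difference of item A over item B is ((5 - 4) + (4 - 2)) / 2 = 1.5, so Z's rating for B is 3 - 1.5 = 1.5.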
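Slope One treats the relationship between ratings as a simple linear relationship:
Y = mX + b
with the slope fixed at m = 1 (hence the name "Slope One"), so a prediction is Y = X + b, where the offset b is the average rating difference between the two items.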
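Written out, with S_{j,i} the set of users who rated both items j and i, r_{u,i} user u's rating of item i, and R_j the items user u rated that share at least one rater with j (our notation, loosely following the Slope One paper), the unweighted predictor and the example above read as follows. The ratings used (X: A = 5, B = 4; Y: A = 4, B = 2; Z: A = 3) are inferred from the arithmetic, since the table itself is not reproduced here.

```latex
\begin{aligned}
\mathrm{dev}_{j,i} &= \frac{1}{|S_{j,i}|} \sum_{u \in S_{j,i}} \left( r_{u,j} - r_{u,i} \right),
\qquad
P(u)_j = \frac{1}{|R_j|} \sum_{i \in R_j} \left( r_{u,i} + \mathrm{dev}_{j,i} \right) \\
\mathrm{dev}_{B,A} &= \frac{(4 - 5) + (2 - 4)}{2} = -1.5,
\qquad
P(Z)_B = 3 + (-1.5) = 1.5 \quad (R_B = \{A\})
\end{aligned}
```

The deviation of B relative to A is simply the negative of the "average difference of A over B" (1.5) computed in the text.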
2). Weighted average:
Users X, Y and Z rate items A, B and C as in the table below. What is Z's rating for A?

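1. Compute the average difference between A and B: ((5-3)+(3-4))/2 = 0.5
2. Compute the average difference between A and C: (5-2)/1 = 3
3. Z's rating for A predicted via B: 2 + 0.5 = 2.5
4. Z's rating for A predicted via C: 5 + 3 = 8
5. Combine the two with a weighted average: 2 users rated both A and B, and 1 user rated both A and C, so the weights are 2 and 1 respectively: (2*2.5 + 1*8)/(2+1) = 13/3 = 4.33
With this simple approach we can quickly compute a rating and complete the recommendation.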
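The weighted calculation can be checked with a short stdlib-only Java sketch (names are ours). The ratings are inferred from the arithmetic in steps 1 to 4, not copied from the missing table: X rated A = 5, B = 3, C = 2; Y rated A = 3, B = 4; Z rated B = 2, C = 5. As in step 5, each deviation is weighted by the number of users who rated both items.

```java
import java.util.HashMap;
import java.util.Map;

class WeightedSlopeOne {

    // Adds one rating to a user -> (item -> rating) map.
    static void rate(Map<String, Map<String, Double>> ratings, String user, String item, double value) {
        if (!ratings.containsKey(user)) {
            ratings.put(user, new HashMap<String, Double>());
        }
        ratings.get(user).put(item, value);
    }

    // Weighted Slope One: for every item i the user rated, predict rating_i + dev(target, i),
    // where dev is the average of (r_target - r_i) over the c users who rated both items,
    // then average those predictions with weight c.
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        double numerator = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> own : ratings.get(user).entrySet()) {
            String item = own.getKey();
            if (item.equals(target)) {
                continue;
            }
            double diffSum = 0.0;
            int count = 0;
            for (Map<String, Double> other : ratings.values()) {
                Double rTarget = other.get(target);
                Double rItem = other.get(item);
                if (rTarget != null && rItem != null) {
                    diffSum += rTarget - rItem;
                    count++;
                }
            }
            if (count > 0) {
                double dev = diffSum / count;
                numerator += (own.getValue() + dev) * count;
                totalWeight += count;
            }
        }
        return totalWeight == 0.0 ? Double.NaN : numerator / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<String, Map<String, Double>>();
        rate(ratings, "X", "A", 5);
        rate(ratings, "X", "B", 3);
        rate(ratings, "X", "C", 2);
        rate(ratings, "Y", "A", 3);
        rate(ratings, "Y", "B", 4);
        rate(ratings, "Z", "B", 2);
        rate(ratings, "Z", "C", 5);
        // dev(A,B) = 0.5 from 2 users, dev(A,C) = 3 from 1 user:
        // (2 * (2 + 0.5) + 1 * (5 + 3)) / (2 + 1) = 13 / 3
        System.out.println("Z's predicted rating for A: " + predict(ratings, "Z", "A"));
    }
}
```

Mahout's SlopeOneRecommender, quoted next, uses the same count weighting when weighted is true, and can additionally divide each weight by 1 + the standard deviation of the differences (stdDevWeighted).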
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, itemID);
}

private float doEstimatePreference(long userID, long itemID) throws TasteException {
    double count = 0.0;
    double totalPreference = 0.0;
    PreferenceArray prefs = getDataModel().getPreferencesFromUser(userID);
    RunningAverage[] averages = diffStorage.getDiffs(userID, itemID, prefs);
    int size = prefs.length();
    for (int i = 0; i < size; i++) {
        RunningAverage averageDiff = averages[i];
        if (averageDiff != null) {
            double averageDiffValue = averageDiff.getAverage();
            if (weighted) {
                double weight = averageDiff.getCount();
                if (stdDevWeighted) {
                    double stdev = ((RunningAverageAndStdDev) averageDiff).getStandardDeviation();
                    if (!Double.isNaN(stdev)) {
                        weight /= 1.0 + stdev;
                    }
                    // If stdev is NaN, then it is because count is 1. Because we're weighting by count,
                    // the weight is already relatively low. We effectively assume stdev is 0.0 here and
                    // that is reasonable enough. Otherwise, dividing by NaN would yield a weight of NaN
                    // and disqualify this pref entirely
                    // (Thanks Daemmon)
                }
                totalPreference += weight * (prefs.getValue(i) + averageDiffValue);
                count += weight;
            } else {
                totalPreference += prefs.getValue(i) + averageDiffValue;
                count += 1.0;
            }
        }
    }
    if (count <= 0.0) {
        RunningAverage itemAverage = diffStorage.getAverageItemPref(itemID);
        return itemAverage == null ? Float.NaN : (float) itemAverage.getAverage();
    } else {
        return (float) (totalPreference / count);
    }
}
Test program:
public static void slopeOne(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.3333333333333333
Recommender IR Evaluator: [Precision:0.25,Recall:0.5]
uid:1,(105,5.750000)(104,5.250000)(106,4.500000)
uid:2,(105,2.286115)(106,1.500000)
uid:3,(106,2.000000)(102,1.666667)(103,1.625000)
uid:4,(105,4.976859)(102,3.509071)
8. KNN linear interpolation item-based recommendation algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
Source: this algorithm is based on the paper by Robert M. Bell and Yehuda Koren in ICDM '07.
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.knn.KnnItemBasedRecommender
@Override
protected float doEstimatePreference(long theUserID, PreferenceArray preferencesFromUser, long itemID)
        throws TasteException {
    DataModel dataModel = getDataModel();
    int size = preferencesFromUser.length();
    FastIDSet possibleItemIDs = new FastIDSet(size);
    for (int i = 0; i < size; i++) {
        possibleItemIDs.add(preferencesFromUser.getItemID(i));
    }
    possibleItemIDs.remove(itemID);
    List<RecommendedItem> mostSimilar =
            mostSimilarItems(itemID, possibleItemIDs.iterator(), neighborhoodSize, null);
    long[] theNeighborhood = new long[mostSimilar.size() + 1];
    theNeighborhood[0] = -1;
    List<Long> usersRatedNeighborhood = Lists.newArrayList();
    int nOffset = 0;
    for (RecommendedItem rec : mostSimilar) {
        theNeighborhood[nOffset++] = rec.getItemID();
    }
    if (!mostSimilar.isEmpty()) {
        theNeighborhood[mostSimilar.size()] = itemID;
        for (int i = 0; i < theNeighborhood.length; i++) {
            PreferenceArray usersNeighborhood = dataModel.getPreferencesForItem(theNeighborhood[i]);
            int size1 = usersRatedNeighborhood.isEmpty()
                    ? usersNeighborhood.length() : usersRatedNeighborhood.size();
            for (int j = 0; j < size1; j++) {
                if (i == 0) {
                    usersRatedNeighborhood.add(usersNeighborhood.getUserID(j));
                } else {
                    if (j >= usersRatedNeighborhood.size()) {
                        break;
                    }
                    long index = usersRatedNeighborhood.get(j);
                    if (!usersNeighborhood.hasPrefWithUserID(index) || index == theUserID) {
                        usersRatedNeighborhood.remove(index);
                        j--;
                    }
                }
            }
        }
    }
    double[] weights = null;
    if (!mostSimilar.isEmpty()) {
        weights = getInterpolations(itemID, theNeighborhood, usersRatedNeighborhood);
    }
    int i = 0;
    double preference = 0.0;
    double totalSimilarity = 0.0;
    for (long jitem : theNeighborhood) {
        Float pref = dataModel.getPreferenceValue(theUserID, jitem);
        if (pref != null) {
            double weight = weights[i];
            preference += pref * weight;
            totalSimilarity += weight;
        }
        i++;
    }
    return totalSimilarity == 0.0 ? Float.NaN : (float) (preference / totalSimilarity);
}
Test program:
public static void itemKNN(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(
            RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(
            itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.5
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(107,5.000000)(104,3.501168)(106,3.498198)
uid:2,(105,2.878995)(106,2.878086)(107,2.000000)
uid:3,(103,3.667444)(102,3.667161)(106,3.667019)
uid:4,(107,4.750247)(102,4.122755)(105,4.122709)
uid:5,(107,3.833621)
9. SVD recommendation algorithm
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    double[] userFeatures = factorization.getUserFeatures(userID);
    double[] itemFeatures = factorization.getItemFeatures(itemID);
    double estimate = 0;
    for (int feature = 0; feature < userFeatures.length; feature++) {
        estimate += userFeatures[feature] * itemFeatures[feature];
    }
    return (float) estimate;
}
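The estimate itself is only a dot product; the real work happens in the factorizer that produces the feature vectors. As an illustrative sketch, here is alternating least squares with weighted-lambda regularization (the idea behind ALSWRFactorizer) reduced to a single feature, so each update is a closed-form division instead of a k x k linear solve. Everything here (class name, one feature, a fixed start of q = 1, the iteration count) is our simplification, not Mahout's implementation; lambda = 0.05 mirrors the 0.05 passed to ALSWRFactorizer in the test program below, which we take to be its lambda. Because the estimate is an unconstrained dot product, nothing keeps it inside the rating scale, which is consistent with the negative score (106,-0.091259) in the SVD output below.

```java
import java.util.Arrays;

class RankOneAls {

    static final double N = Double.NaN; // marks a missing rating

    // item.csv as a 5 x 7 matrix: rows are users 1..5, columns are items 101..107.
    static final double[][] RATINGS = {
        {5.0, 3.0, 2.5, N, N, N, N},
        {2.0, 2.5, 5.0, 2.0, N, N, N},
        {2.5, N, N, 4.0, 4.5, N, 5.0},
        {5.0, N, 3.0, 4.5, N, 4.0, N},
        {4.0, 3.0, 2.0, 4.0, 3.5, 4.0, N}
    };

    // Minimizes sum over observed (r - p[u]*q[i])^2 + lambda * (sum_u n_u p[u]^2 + sum_i n_i q[i]^2)
    // by alternately solving every p[u] (q fixed) and every q[i] (p fixed) in closed form.
    static double[][] factorize(double[][] r, double lambda, int iterations) {
        int users = r.length;
        int items = r[0].length;
        double[] p = new double[users]; // one feature per user
        double[] q = new double[items]; // one feature per item
        Arrays.fill(q, 1.0);            // deterministic starting point
        for (int it = 0; it < iterations; it++) {
            for (int u = 0; u < users; u++) { // p[u] = sum(q*r) / (sum(q^2) + lambda * n_u)
                double num = 0.0;
                double den = 0.0;
                int n = 0;
                for (int i = 0; i < items; i++) {
                    if (!Double.isNaN(r[u][i])) {
                        num += q[i] * r[u][i];
                        den += q[i] * q[i];
                        n++;
                    }
                }
                p[u] = n == 0 ? 0.0 : num / (den + lambda * n);
            }
            for (int i = 0; i < items; i++) { // q[i] = sum(p*r) / (sum(p^2) + lambda * n_i)
                double num = 0.0;
                double den = 0.0;
                int n = 0;
                for (int u = 0; u < users; u++) {
                    if (!Double.isNaN(r[u][i])) {
                        num += p[u] * r[u][i];
                        den += p[u] * p[u];
                        n++;
                    }
                }
                q[i] = n == 0 ? 0.0 : num / (den + lambda * n);
            }
        }
        return new double[][] {p, q};
    }

    // The regularized objective above; each ALS half-step can only lower it.
    static double loss(double[][] r, double[] p, double[] q, double lambda) {
        double total = 0.0;
        for (int u = 0; u < r.length; u++) {
            for (int i = 0; i < r[u].length; i++) {
                if (!Double.isNaN(r[u][i])) {
                    double err = r[u][i] - p[u] * q[i];
                    total += err * err + lambda * (p[u] * p[u] + q[i] * q[i]);
                }
            }
        }
        return total;
    }

    // Same computation as SVDRecommender.estimatePreference(): a dot product of feature vectors.
    static double estimate(double[] userFeatures, double[] itemFeatures) {
        double estimate = 0.0;
        for (int f = 0; f < userFeatures.length; f++) {
            estimate += userFeatures[f] * itemFeatures[f];
        }
        return estimate;
    }

    public static void main(String[] args) {
        double lambda = 0.05;
        double[][] pq = factorize(RATINGS, lambda, 10);
        System.out.println("objective after 10 iterations: " + loss(RATINGS, pq[0], pq[1], lambda));
        // User 1 (row 0) has not rated item 104 (column 3).
        System.out.println("user 1, item 104: "
                + estimate(new double[] {pq[0][0]}, new double[] {pq[1][3]}));
    }
}
```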
Test program:
public static void svd(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder =
            RecommendFactory.svdRecommender(new ALSWRFactorizer(dataModel, 10, 0.05, 10));
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.09990564982096355
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(104,4.032909)(105,3.390885)(107,1.858541)
uid:2,(105,3.761718)(106,2.951908)(107,1.561116)
uid:3,(103,5.593422)(102,2.458930)(106,-0.091259)
uid:4,(105,4.068329)(102,3.534025)(107,0.206257)
uid:5,(107,0.105169)
10. Tree cluster-based recommendation algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    buildClusters();
    List<RecommendedItem> topRecsForUser = topRecsByUserID.get(userID);
    if (topRecsForUser != null) {
        for (RecommendedItem item : topRecsForUser) {
            if (itemID == item.getItemID()) {
                return item.getValue();
            }
        }
    }
    // Hmm, we have no idea. The item is not in the user's cluster
    return Float.NaN;
}
Test program:
public static void treeCluster(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(
            RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
    ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(
            RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_CLUSTER, userSimilarity);
    RecommenderBuilder recommenderBuilder =
            RecommendFactory.treeClusterRecommender(clusterSimilarity, 10);
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,
            recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list =
                recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:NaN
Recommender IR Evaluator: [Precision:NaN,Recall:0.0]
11. Summary of Mahout's recommendation algorithms
Algorithms and suitable scenarios:

Evaluation scores of the algorithms:

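Comparing the scores of the algorithms above: itemCF, itemKNN and SVD have the best Precision and Recall (0.5 and 1.0), and itemCF and SVD have the lowest AVERAGE_ABSOLUTE_DIFFERENCE (about 0.868 and 0.0999). On this data, then, the metrics show which algorithms estimate ratings more accurately and which retrieve more of the relevant items.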
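AVERAGE_ABSOLUTE_DIFFERENCE is a mean absolute error: the average of |estimated - actual| over held-out ratings, so lower is better (in Mahout this is AverageAbsoluteDifferenceRecommenderEvaluator; the 0.7 passed to RecommendFactory.evaluate is presumably the training fraction, but RecommendFactory's source is not shown). The sketch below, with our own names, skips pairs whose estimate is NaN, which is an assumption about how missing estimates are treated; when no pair can be estimated at all the result is NaN, consistent with the NaN score of the tree-clustering test.

```java
class MeanAbsoluteError {

    // Mean of |estimated - actual| over held-out ratings, skipping pairs with no estimate (NaN).
    // Returns NaN when no pair could be estimated at all.
    static double mae(double[] estimated, double[] actual) {
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < estimated.length; i++) {
            if (!Double.isNaN(estimated[i])) {
                sum += Math.abs(estimated[i] - actual[i]);
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 2.5, 5.0};
        // (|3.5 - 4.0| + |3.0 - 2.5|) / 2 = 0.5; the third pair has no estimate and is skipped.
        System.out.println(mae(new double[] {3.5, 3.0, Double.NaN}, actual));
        // Nothing could be estimated, so the score is NaN.
        System.out.println(mae(new double[] {Double.NaN, Double.NaN, Double.NaN}, actual));
    }
}
```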
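Some other factors:
1. These three metrics, computed on the 21-rating item.csv data set, do not by themselves prove that itemCF and SVD give better results.
2. We did not tune the parameters of any of the algorithms.
3. The volume and distribution of the data affect the algorithms' scores.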