[Mahout] A First Small Experiment: Evaluating a Recommender Model with GroupLens

Popularity: 34   Published: 2023-12-08 20:50:22

Note: this post is based on the book "Mahout in Action" (《Mahout实战》).

Following "Mahout in Action", this post builds and evaluates a recommender on the movielens-1m dataset provided by GroupLens.

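The examples bundled with Mahout already contain code that reads the .dat file. That class extends FileDataModel, so it can be used almost as-is. Because of the limited performance of my machine, however, I discard part of the data to reduce the amount of computation.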

 

The main change is a new removeRatio parameter: while the file is being read, each record is randomly discarded with a probability of roughly removeRatio.
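To make the dropping rule concrete, here is a minimal standalone sketch of the same idea using only the JDK. The class name RandomDropSketch, the method subsample, and the fixed seed are assumptions made for illustration; they are not part of Mahout or of the modified class in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not part of Mahout): keeps each record with
// probability of about (1 - removeRatio).
final class RandomDropSketch {

  static List<String> subsample(List<String> lines, double removeRatio, Random rand) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      // nextDouble() is uniform in [0, 1), so this condition holds
      // with probability of about 1 - removeRatio
      if (rand.nextDouble() > removeRatio) {
        kept.add(line);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
      lines.add("record-" + i);
    }
    // The fixed seed is an assumption of this sketch, used only to make runs repeatable
    List<String> kept = subsample(lines, 0.5, new Random(42L));
    System.out.println("kept " + kept.size() + " of " + lines.size());
  }
}
```

With removeRatio set to 0.5, roughly half of the records survive. The exact count depends on the random sequence, which is also why repeated runs of the experiment can produce slightly different scores.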

Below is my slightly modified GroupLensDataModel.java:

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.URL;
import java.util.Random;
import java.util.regex.Pattern;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import com.google.common.base.Charsets;
import com.google.common.io.Closeables;
import com.google.common.io.Files;
import com.google.common.io.InputSupplier;
import com.google.common.io.Resources;

public final class GroupLensDataModel extends FileDataModel {

  private static final String COLON_DELIMTER = "::";
  private static final Pattern COLON_DELIMITER_PATTERN = Pattern.compile(COLON_DELIMTER);

  /**
   * @param ratingsFile GroupLens ratings.dat file in its native format
   * @param removeRatio try to make target file size small by random drop data
   * @throws IOException if an error occurs while reading or writing files
   */
  public GroupLensDataModel(File ratingsFile, double removeRatio) throws IOException {
    super(convertGLFile(ratingsFile, removeRatio));
  }

  /**
   * @param originalFile
   * @param ratio will remove part of target records
   * @return
   * @throws IOException
   */
  private static File convertGLFile(File originalFile, double ratio) throws IOException {
    // Now translate the file; remove commas, then convert "::" delimiter to comma
    File resultFile = new File(new File(System.getProperty("java.io.tmpdir")), "ratings.txt");
    if (resultFile.exists()) {
      resultFile.delete();
    }
    Writer writer = null;
    try {
      writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8);
      Random rand = new Random();
      for (String line : new FileLineIterable(originalFile, false)) {
        if (rand.nextDouble() > ratio) {
          int lastDelimiterStart = line.lastIndexOf(COLON_DELIMTER);
          if (lastDelimiterStart < 0) {
            throw new IOException("Unexpected input format on line: " + line);
          }
          String subLine = line.substring(0, lastDelimiterStart);
          String convertedLine = COLON_DELIMITER_PATTERN.matcher(subLine).replaceAll(",");
          writer.write(convertedLine);
          writer.write('\n');
        }
      }
    } catch (IOException ioe) {
      resultFile.delete();
      throw ioe;
    } finally {
      Closeables.close(writer, false);
    }
    return resultFile;
  }

  public static File readResourceToTempFile(String resourceName) throws IOException {
    InputSupplier<? extends InputStream> inSupplier;
    try {
      URL resourceURL = Resources.getResource(GroupLensDataModel.class, resourceName);
      inSupplier = Resources.newInputStreamSupplier(resourceURL);
    } catch (IllegalArgumentException iae) {
      File resourceFile = new File("src/main/java" + resourceName);
      inSupplier = Files.newInputStreamSupplier(resourceFile);
    }
    File tempFile = File.createTempFile("taste", null);
    tempFile.deleteOnExit();
    Files.copy(inSupplier, tempFile);
    return tempFile;
  }

  @Override
  public String toString() {
    return "GroupLensDataModel";
  }

}
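The per-line conversion inside convertGLFile is easier to follow in isolation. The sketch below reproduces only that step with plain JDK calls; the class name RatingLineSketch and the method convert are hypothetical. The sample line follows the UserID::MovieID::Rating::Timestamp layout of ratings.dat.

```java
// Hypothetical standalone class mirroring the loop body of convertGLFile:
// drop the trailing timestamp field, then turn "::" into ",".
final class RatingLineSketch {

  private static final String DELIMITER = "::";

  static String convert(String line) {
    int lastDelimiterStart = line.lastIndexOf(DELIMITER);
    if (lastDelimiterStart < 0) {
      throw new IllegalArgumentException("Unexpected input format on line: " + line);
    }
    // Everything before the last "::" is userID::movieID::rating
    String withoutTimestamp = line.substring(0, lastDelimiterStart);
    return withoutTimestamp.replace(DELIMITER, ",");
  }

  public static void main(String[] args) {
    // UserID::MovieID::Rating::Timestamp
    System.out.println(convert("1::1193::5::978300760")); // prints 1,1193,5
  }
}
```

The resulting userID,itemID,rating line is the comma-separated form that FileDataModel reads, which is why the converted temporary file can be passed straight to the superclass constructor.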

 

And here is the main program:

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class TestGroupLens {

  public static void main(String[] args) {
    // load data set
    try {
      DataModel model = new GroupLensDataModel(new File("E:\\DataSet\\ml-1m\\ratings.dat"), 0.5);
      RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
      RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
          UserSimilarity sim = new PearsonCorrelationSimilarity(dataModel);
          UserNeighborhood nbh = new NearestNUserNeighborhood(30, sim, dataModel);
          // build the recommendation engine
          Recommender rec = new GenericUserBasedRecommender(dataModel, nbh, sim);
          return rec;
        }
      };
      double score = evaluator.evaluate(builder, null, model, 0.7, 0.3);
      System.out.println(score);
    } catch (IOException e) {
      e.printStackTrace();
    } catch (TasteException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }

}

 

The resulting score is around 0.85.

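This is slightly off from the 0.89 reported in the book. Some variation is to be expected, since half of the ratings are randomly discarded here and the evaluator also splits the data into training and test sets at random.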
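For context on what the score means: AverageAbsoluteDifferenceRecommenderEvaluator reports the average absolute difference between estimated and actual ratings, so lower is better. The sketch below shows that arithmetic on made-up numbers. The class name MeanAbsoluteErrorSketch and the sample ratings are assumptions for illustration only, and this is not Mahout's implementation.

```java
// Hypothetical illustration of the metric, using invented ratings.
final class MeanAbsoluteErrorSketch {

  static double meanAbsoluteError(double[] actual, double[] predicted) {
    if (actual.length != predicted.length || actual.length == 0) {
      throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
    double total = 0.0;
    for (int i = 0; i < actual.length; i++) {
      // accumulate |actual - predicted| for each held-out rating
      total += Math.abs(actual[i] - predicted[i]);
    }
    return total / actual.length;
  }

  public static void main(String[] args) {
    double[] actual = {4.0, 3.0, 5.0};
    double[] predicted = {3.5, 3.0, 4.0};
    // (0.5 + 0.0 + 1.0) / 3 = 0.5
    System.out.println(meanAbsoluteError(actual, predicted)); // prints 0.5
  }
}
```

Read this way, a score of about 0.85 means the estimated ratings were off by a little less than one star on average on MovieLens' 1 to 5 scale.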