
Grouping Champions League Teams into Tiers with Mahout's k-means Implementation

Published: 2024-01-11 18:09:58

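Many online introductions to Mahout's k-means implementation use the Synthetic Control Chart Time Series data set, synthetic_control.data, as the input. For beginners, however, synthetic_control.data can be hard to make sense of (it contains 600 data points, each with 60 attributes), and the results are not intuitive either, so it is difficult to form a quick, concrete impression of what k-means actually achieves. This article therefore uses a simple, easy-to-follow and lively example to show how Mahout's k-means implementation can be put to work.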

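I am a football fan and enjoy watching the UEFA Champions League, and the group stage of the 2013 season is currently in full swing. This year's competition is unusually tight: even after several rounds of group matches it is hard to sort all the teams into reasonably accurate tiers from personal experience alone, let alone predict which team looks most like a champion. So this article uses a machine learning algorithm, the k-means implementation in Mahout, to divide all the teams into tiers.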

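The 2013 Champions League has eight groups, A through H, with four teams in each. The input for this article is the latest team statistics table, which I downloaded from a football statistics site (http://www.footballdatabase.com).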


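In the statistics table, each team (one data point) has 18 attributes. In order, they are: points, goal difference, matches played, wins, draws, losses, goals scored, goals conceded, home wins, home draws, home losses, home goals scored, home goals conceded, away wins, away draws, away losses, away goals scored, and away goals conceded. The raw table still needs some processing before the k-means job can read it as input. I wrote the data to a local file, testdata/euro.data, one team per line with the 18 values separated by spaces: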

8 3 4 2 2 0 6 3 2 0 0 5 2 0 2 0 1 1

7 3 4 2 1 1 8 5 2 0 0 6 1 0 1 1 2 4

5 -2 4 1 2 1 3 5 0 2 0 1 1 1 0 1 2 4

1 -4 4 0 1 3 1 5 0 1 1 0 2 0 0 2 1 3

10 10 4 3 1 0 14 4 2 0 0 6 1 1 1 0 8 3

4 -4 4 1 1 2 6 10 1 0 1 4 7 0 1 1 2 3

4 -5 4 1 1 2 3 8 1 1 0 2 1 0 0 2 1 7

3 -1 4 0 3 1 6 7 0 2 0 4 4 0 1 1 2 3

10 11 4 3 1 0 13 2 1 1 0 4 1 2 0 0 9 1

6 1 3 2 0 1 5 4 1 0 1 2 4 1 0 0 3 0

4 -2 4 1 1 2 3 5 1 1 0 3 1 0 0 2 0 4

1 -10 4 0 1 3 1 11 0 0 2 0 8 0 1 1 1 3

12 11 4 4 0 0 12 1 2 0 0 8 0 2 0 0 4 1

9 5 4 3 0 1 11 6 1 0 1 6 5 2 0 0 5 1

3 -6 4 1 0 3 6 12 1 0 1 4 4 0 0 2 2 8

0 -10 4 0 0 4 2 12 0 0 2 0 4 0 0 2 2 8

9 9 4 3 0 1 11 2 1 0 1 4 2 2 0 0 7 0

6 -2 4 2 0 2 4 6 1 0 1 3 3 1 0 1 1 3

5 0 4 1 2 1 4 4 0 1 1 1 2 1 1 0 3 2

2 -7 4 0 2 2 2 9 0 1 1 1 5 0 1 1 1 4

9 3 4 3 0 1 6 3 1 0 1 3 2 2 0 0 3 1

9 1 4 3 0 1 7 6 2 0 0 5 3 1 0 1 2 3

6 2 4 2 0 2 6 4 1 0 1 3 1 1 0 1 3 3

0 -6 4 0 0 4 4 10 0 0 2 2 4 0 0 2 2 6

12 10 4 4 0 0 12 2 2 0 0 7 1 2 0 0 5 1

5 -1 4 1 2 1 3 4 0 2 0 1 1 1 0 1 2 3

4 -1 4 1 1 2 3 4 0 0 2 1 3 1 1 0 2 1

1 -8 4 0 1 3 0 8 0 0 2 0 4 0 1 1 0 4

10 7 4 3 1 0 9 2 2 0 0 7 1 1 1 0 2 1

5 0 4 1 2 1 5 5 1 1 0 3 1 0 1 1 2 4

4 -4 4 1 1 2 3 7 1 1 0 2 1 0 0 2 1 6

3 -3 4 1 0 3 2 5 1 0 1 2 2 0 0 2 0 3
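Before handing a file like this to a clustering job, it can help to sanity-check each row. The following is a minimal sketch rather than part of the original workflow: the field names are my own labels for the 18 attributes listed above, and the scoring rule (3 points per win, 1 per draw) is an assumption based on standard league scoring, not something the data file states.

```python
# Hypothetical field names for the 18 space-separated values in euro.data.
FIELDS = [
    "points", "goal_diff", "played", "wins", "draws", "losses",
    "goals_for", "goals_against",
    "home_wins", "home_draws", "home_losses",
    "home_goals_for", "home_goals_against",
    "away_wins", "away_draws", "away_losses",
    "away_goals_for", "away_goals_against",
]


def parse_row(line):
    """Split one line of euro.data into a dict keyed by FIELDS."""
    values = [int(token) for token in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def check_row(row):
    """Return the names of the consistency rules that the row violates."""
    problems = []
    # Assumed scoring rule: 3 points per win, 1 per draw.
    if row["points"] != 3 * row["wins"] + row["draws"]:
        problems.append("points")
    if row["goal_diff"] != row["goals_for"] - row["goals_against"]:
        problems.append("goal_diff")
    if row["played"] != row["wins"] + row["draws"] + row["losses"]:
        problems.append("played")
    # Every overall total should be the sum of its home and away parts.
    for total in ("wins", "draws", "losses", "goals_for", "goals_against"):
        if row[total] != row["home_" + total] + row["away_" + total]:
            problems.append(total)
    return problems


first_row = parse_row("8 3 4 2 2 0 6 3 2 0 0 5 2 0 2 0 1 1")
print(check_row(first_row))  # prints []
```

A row with a missing or fused value fails the length check in parse_row, which is an easy way to catch copy-and-paste damage before the job runs.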

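Next, upload the data file to HDFS with the command "hadoop dfs -put testdata/euro.data /user/root/testdata/". On my test cluster the Hadoop daemons run as root, so Mahout's default working directory is root's home directory, "/user/root", and /user/root/testdata is Mahout's default input directory.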

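Now we can run Mahout's k-means job on the input data with the command "mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job". To keep this example simple I did not specify any parameters and relied on the defaults of Mahout's k-means example job. After the command was issued, Mahout launched and ran 8 jobs in sequence. The console output is too long to show in full, so only the final part is reproduced here: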

13/11/17 07:05:59 INFO kmeans.Job: Dumping out clusters from clusters:output/clusters-*-final and clusteredPoints: output/clusteredPoints

VL-0{n=3 c=[0:9.000, 1:4.333, 2:4.000,3:2.667, 4:1.000, 5:0.333, 6:7.000, 7:2.667, 8:1.667, 10:0.333, 11:5.000,12:1.667, 13:1.000, 14:1.000, 16:2.000, 17:1.000] r=[0:0.816, 1:1.886, 3:0.471,4:0.816, 5:0.471, 6:1.414, 7:0.471, 8:0.471, 10:0.471, 11:1.633, 12:0.471,13:0.816, 14:0.816, 16:0.816]}

       Weight : [props - optional]: Point:

       1.0: [0:8.000, 1:3.000, 2:4.000, 3:2.000, 4:2.000, 6:6.000, 7:3.000,8:2.000, 11:5.000, 12:2.000, 14:2.000, 16:1.000, 17:1.000]

       1.0: [0:9.000, 1:3.000, 2:4.000, 3:3.000, 5:1.000, 6:6.000, 7:3.000,8:1.000, 10:1.000, 11:3.000, 12:2.000, 13:2.000, 16:3.000, 17:1.000]

       1.0: [0:10.000, 1:7.000, 2:4.000, 3:3.000, 4:1.000, 6:9.000, 7:2.000,8:2.000, 11:7.000, 12:1.000, 13:1.000, 14:1.000, 16:2.000, 17:1.000]

VL-1{n=2 c=[0:8.000, 1:2.000, 2:4.000,3:2.500, 4:0.500, 5:1.000, 6:7.500, 7:5.500, 8:2.000, 11:5.500, 12:2.000,13:0.500, 14:0.500, 15:1.000, 16:2.000, 17:3.500] r=[0:1.000, 1:1.000, 3:0.500,4:0.500, 6:0.500, 7:0.500, 11:0.500, 12:1.000, 13:0.500, 14:0.500, 17:0.500]}

        Weight : [props - optional]:  Point:

       1.0: [0:7.000, 1:3.000, 2:4.000, 3:2.000, 4:1.000, 5:1.000, 6:8.000,7:5.000, 8:2.000, 11:6.000, 12:1.000, 14:1.000, 15:1.000, 16:2.000, 17:4.000]

       1.0: [0:9.000, 1:1.000, 2:4.000, 3:3.000, 5:1.000, 6:7.000, 7:6.000,8:2.000, 11:5.000, 12:3.000, 13:1.000, 15:1.000, 16:2.000, 17:3.000]

VL-22{n=5 c=[5.200, 0.400, 3.800, 1.400,1.000, 1.400, 4.600, 4.200, 0.600, 0.400, 1.000, 2.000, 2.200, 0.800, 0.600,0.400, 2.600, 2.000] r=[0.748, 1.020, 0.400, 0.490, 0.894, 0.490, 1.020, 0.400,0.490, 0.490, 0.632, 0.894, 1.166, 0.400, 0.490, 0.490, 0.490, 1.414]}

       Weight : [props - optional]: Point:

       1.0: [0:6.000, 1:1.000, 2:3.000, 3:2.000, 5:1.000, 6:5.000, 7:4.000,8:1.000, 10:1.000, 11:2.000, 12:4.000, 13:1.000, 16:3.000]

       1.0: [0:5.000, 2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:4.000, 7:4.000,9:1.000, 10:1.000, 11:1.000, 12:2.000, 13:1.000, 14:1.000, 16:3.000, 17:2.000]

       1.0: [0:6.000, 1:2.000, 2:4.000, 3:2.000, 5:2.000, 6:6.000, 7:4.000,8:1.000, 10:1.000, 11:3.000, 12:1.000, 13:1.000, 15:1.000, 16:3.000, 17:3.000]

       1.0: [0:4.000, 1:-1.000, 2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000,7:4.000, 10:2.000, 11:1.000, 12:3.000, 13:1.000, 14:1.000, 16:2.000, 17:1.000]

       1.0: [0:5.000, 2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:5.000, 7:5.000,8:1.000, 9:1.000, 11:3.000, 12:1.000, 14:1.000, 15:1.000, 16:2.000, 17:4.000]

VL-20{n=6 c=[0:10.333, 1:9.333, 2:4.000,3:3.333, 4:0.333, 5:0.333, 6:12.167, 7:2.833, 8:1.500, 9:0.167, 10:0.333, 11:5.833,12:1.667, 13:1.833, 14:0.167, 16:6.333, 17:1.167] r=[0:1.247, 1:2.055, 3:0.471,4:0.471, 5:0.471, 6:1.067, 7:1.675, 8:0.500, 9:0.373, 10:0.471, 11:1.462,12:1.599, 13:0.373, 14:0.373, 16:1.795, 17:0.898]}

       Weight : [props - optional]: Point:

       1.0: [0:10.000, 1:10.000, 2:4.000, 3:3.000, 4:1.000, 6:14.000, 7:4.000,8:2.000, 11:6.000, 12:1.000, 13:1.000, 14:1.000, 16:8.000, 17:3.000]

       1.0: [0:10.000, 1:11.000, 2:4.000, 3:3.000, 4:1.000, 6:13.000, 7:2.000,8:1.000, 9:1.000, 11:4.000, 12:1.000, 13:2.000, 16:9.000, 17:1.000]

       1.0: [0:12.000, 1:11.000, 2:4.000, 3:4.000, 6:12.000, 7:1.000, 8:2.000,11:8.000, 13:2.000, 16:4.000, 17:1.000]

       1.0: [0:9.000, 1:5.000, 2:4.000, 3:3.000, 5:1.000, 6:11.000, 7:6.000,8:1.000, 10:1.000, 11:6.000, 12:5.000, 13:2.000, 16:5.000, 17:1.000]

       1.0: [0:9.000, 1:9.000, 2:4.000, 3:3.000, 5:1.000, 6:11.000, 7:2.000,8:1.000, 10:1.000, 11:4.000, 12:2.000, 13:2.000, 16:7.000]

       1.0: [0:12.000, 1:10.000, 2:4.000, 3:4.000, 6:12.000, 7:2.000, 8:2.000,11:7.000, 12:1.000, 13:2.000, 16:5.000, 17:1.000]

VL-26{n=9 c=[3.889, -2.667, 4.000, 0.889,1.222, 1.889, 3.111, 5.778, 0.556, 1.111, 0.333, 2.000, 1.778, 0.333, 0.111,1.556, 1.111, 4.000] r=[0:1.370, 1:1.333, 3:0.567, 4:0.916, 5:0.737, 6:1.286,7:1.227, 8:0.497, 9:0.737, 10:0.471, 11:1.155, 12:1.030, 13:0.471, 14:0.314,15:0.497, 16:0.737, 17:1.414]}

       Weight : [props - optional]: Point:

       1.0: [0:5.000, 1:-2.000, 2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:3.000,7:5.000, 9:2.000, 11:1.000, 12:1.000, 13:1.000, 15:1.000, 16:2.000, 17:4.000]

       1.0: [0:1.000, 1:-4.000, 2:4.000, 4:1.000, 5:3.000, 6:1.000, 7:5.000,9:1.000, 10:1.000, 12:2.000, 15:2.000, 16:1.000, 17:3.000]

       1.0: [0:4.000, 1:-5.000, 2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000,7:8.000, 8:1.000, 9:1.000, 11:2.000, 12:1.000, 15:2.000, 16:1.000, 17:7.000]

       1.0: [0:3.000, 1:-1.000, 2:4.000, 4:3.000, 5:1.000, 6:6.000, 7:7.000,9:2.000, 11:4.000, 12:4.000, 14:1.000, 15:1.000, 16:2.000, 17:3.000]

       1.0: [0:4.000, 1:-2.000, 2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000,7:5.000, 8:1.000, 9:1.000, 11:3.000, 12:1.000, 15:2.000, 17:4.000]

       1.0: [0:6.000, 1:-2.000, 2:4.000, 3:2.000, 5:2.000, 6:4.000, 7:6.000,8:1.000, 10:1.000, 11:3.000, 12:3.000, 13:1.000, 15:1.000, 16:1.000, 17:3.000]

       1.0: [0:5.000, 1:-1.000, 2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:3.000, 7:4.000, 9:2.000, 11:1.000, 12:1.000, 13:1.000, 15:1.000, 16:2.000, 17:3.000]

       1.0: [0:4.000, 1:-4.000, 2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000,7:7.000, 8:1.000, 9:1.000, 11:2.000, 12:1.000, 15:2.000, 16:1.000, 17:6.000]

       1.0: [0:3.000, 1:-3.000, 2:4.000, 3:1.000, 5:3.000, 6:2.000, 7:5.000,8:1.000, 10:1.000, 11:2.000, 12:2.000, 15:2.000, 17:3.000]

VL-6{n=7 c=[0:1.571, 1:-7.286, 2:4.000, 3:0.286,4:0.714, 5:3.000, 6:3.000, 7:10.286, 8:0.286, 9:0.143, 10:1.571, 11:1.571,12:5.143, 14:0.571, 15:1.429, 16:1.429, 17:5.143] r=[0:1.400, 1:2.050, 3:0.452,4:0.700, 5:0.756, 6:2.204, 7:1.385, 8:0.452, 9:0.350, 10:0.495, 11:1.678,12:1.552, 14:0.495, 15:0.495, 16:0.728, 17:2.030]}

       Weight : [props - optional]: Point:

       1.0: [0:4.000, 1:-4.000, 2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:6.000,7:10.000, 8:1.000, 10:1.000, 11:4.000, 12:7.000, 14:1.000, 15:1.000, 16:2.000,17:3.000]

       1.0: [0:1.000, 1:-10.000, 2:4.000, 4:1.000, 5:3.000, 6:1.000, 7:11.000,10:2.000, 12:8.000, 14:1.000, 15:1.000, 16:1.000, 17:3.000]

       1.0: [0:3.000, 1:-6.000, 2:4.000, 3:1.000, 5:3.000, 6:6.000, 7:12.000,8:1.000, 10:1.000, 11:4.000, 12:4.000, 15:2.000, 16:2.000, 17:8.000]

       1.0: [1:-10.000, 2:4.000, 5:4.000, 6:2.000, 7:12.000, 10:2.000,12:4.000, 15:2.000, 16:2.000, 17:8.000]

       1.0: [0:2.000, 1:-7.000, 2:4.000, 4:2.000, 5:2.000, 6:2.000, 7:9.000,9:1.000, 10:1.000, 11:1.000, 12:5.000, 14:1.000, 15:1.000, 16:1.000, 17:4.000]

       1.0: [1:-6.000, 2:4.000, 5:4.000, 6:4.000, 7:10.000, 10:2.000, 11:2.000,12:4.000, 15:2.000, 16:2.000, 17:6.000]

       1.0: [0:1.000, 1:-8.000, 2:4.000, 4:1.000, 5:3.000, 7:8.000, 10:2.000,12:4.000, 14:1.000, 15:1.000, 17:4.000]

13/11/17 07:05:59 INFO clustering.ClusterDumper: Wrote 6 clusters

13/11/17 07:05:59 INFO driver.MahoutDriver: Program took 620405 ms(Minutes: 10.340083333333334)
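To read the dump above: each VL-… block starts with n (the number of points in the cluster), c (the centroid) and r, followed by the member points. Vectors are printed sparsely as index:value pairs, and indices that are missing are zero. As a rough check of that reading, the sketch below recomputes c and r for the two points of VL-1. The helper names are my own, and treating r as the per-dimension population standard deviation is an assumption that happens to match the numbers in this dump, not a statement taken from Mahout's documentation.

```python
import math

DIM = 18  # each team has 18 attributes


def parse_sparse(text, dim=DIM):
    """Turn '[0:7.000, 2:4.000]' into a dense list; omitted indices are 0.0."""
    dense = [0.0] * dim
    for item in text.strip().strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        index, value = item.split(":")
        dense[int(index)] = float(value)
    return dense


def centroid(points):
    """Per-dimension mean of the points."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def radius(points):
    """Per-dimension population standard deviation of the points."""
    center = centroid(points)
    count = len(points)
    return [
        math.sqrt(sum((x - mean) ** 2 for x in column) / count)
        for column, mean in zip(zip(*points), center)
    ]


# The two member points of VL-1, copied from the dump above.
vl1_points = [
    parse_sparse("[0:7.000, 1:3.000, 2:4.000, 3:2.000, 4:1.000, 5:1.000, "
                 "6:8.000,7:5.000, 8:2.000, 11:6.000, 12:1.000, 14:1.000, "
                 "15:1.000, 16:2.000, 17:4.000]"),
    parse_sparse("[0:9.000, 1:1.000, 2:4.000, 3:3.000, 5:1.000, 6:7.000, "
                 "7:6.000,8:2.000, 11:5.000, 12:3.000, 13:1.000, 15:1.000, "
                 "16:2.000, 17:3.000]"),
]

c = centroid(vl1_points)
r = radius(vl1_points)
print(c[0], c[17])  # 8.0 3.5, matching c=[0:8.000, ..., 17:3.500]
print(r[0], r[3])   # 1.0 0.5, matching r=[0:1.000, ..., 3:0.500]
```

Index 2 (matches played) is 4 for both points, so its spread is zero, which would explain why index 2 does not appear in the r vector at all.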

Once the command has finished, we can inspect the output on HDFS with "hadoop dfs -lsr":

drwx------  - root supergroup          0 2013-11-17 07:05 /user/root/.staging

drwxr-xr-x  - root supergroup          0 2013-11-17 07:05 /user/root/output

-rw-r--r--  1 root supergroup        194 2013-11-17 07:05 /user/root/output/_policy

drwxr-xr-x  - root supergroup          0 2013-11-17 07:05 /user/root/output/clusteredPoints

-rw-r--r--  1 root supergroup          0 2013-11-17 07:05 /user/root/output/clusteredPoints/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 07:05 /user/root/output/clusteredPoints/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 07:05 /user/root/output/clusteredPoints/_logs/history

-rw-r--r--  1 root supergroup       9056 2013-11-17 07:05 /user/root/output/clusteredPoints/_logs/history/job_201311110448_0029_1384700717566_root_Cluster+Classification+Driver+running+over+input%3A+

-rw-r--r--  1 root supergroup      23142 2013-11-17 07:05 /user/root/output/clusteredPoints/_logs/history/job_201311110448_0029_conf.xml

-rw-r--r--  1 root supergroup       4856 2013-11-17 07:05 /user/root/output/clusteredPoints/part-m-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 06:57 /user/root/output/clusters-0

-rw-r--r--  1 root supergroup        194 2013-11-17 06:57 /user/root/output/clusters-0/_policy

-rw-r--r--  1 root supergroup        622 2013-11-17 06:57 /user/root/output/clusters-0/part-00000

-rw-r--r--  1 root supergroup        676 2013-11-17 06:57 /user/root/output/clusters-0/part-00001

-rw-r--r--  1 root supergroup        676 2013-11-17 06:57 /user/root/output/clusters-0/part-00002

-rw-r--r--  1 root supergroup        649 2013-11-17 06:57 /user/root/output/clusters-0/part-00003

-rw-r--r--  1 root supergroup        676 2013-11-17 06:57 /user/root/output/clusters-0/part-00004

-rw-r--r--  1 root supergroup        676 2013-11-17 06:57 /user/root/output/clusters-0/part-00005

drwxr-xr-x  - root supergroup          0 2013-11-17 06:59 /user/root/output/clusters-1

-rw-r--r--  1 root supergroup          0 2013-11-17 06:59 /user/root/output/clusters-1/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 06:57 /user/root/output/clusters-1/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 06:57 /user/root/output/clusters-1/_logs/history

-rw-r--r--  1 root supergroup      13599 2013-11-17 06:57 /user/root/output/clusters-1/_logs/history/job_201311110448_0023_1384700247814_root_Cluster+Iterator+running+iteration+1+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 06:57 /user/root/output/clusters-1/_logs/history/job_201311110448_0023_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 06:59 /user/root/output/clusters-1/_policy

-rw-r--r--  1 root supergroup       2746 2013-11-17 06:58 /user/root/output/clusters-1/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 07:00 /user/root/output/clusters-2

-rw-r--r--  1 root supergroup          0 2013-11-17 07:00 /user/root/output/clusters-2/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 06:59 /user/root/output/clusters-2/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 06:59 /user/root/output/clusters-2/_logs/history

-rw-r--r--  1 root supergroup      13595 2013-11-17 06:59 /user/root/output/clusters-2/_logs/history/job_201311110448_0024_1384700349954_root_Cluster+Iterator+running+iteration+2+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 06:59 /user/root/output/clusters-2/_logs/history/job_201311110448_0024_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 07:00 /user/root/output/clusters-2/_policy

-rw-r--r--  1 root supergroup       2818 2013-11-17 07:00 /user/root/output/clusters-2/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 07:01 /user/root/output/clusters-3

-rw-r--r--  1 root supergroup          0 2013-11-17 07:01 /user/root/output/clusters-3/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 07:00 /user/root/output/clusters-3/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 07:00 /user/root/output/clusters-3/_logs/history

-rw-r--r--  1 root supergroup      13596 2013-11-17 07:00 /user/root/output/clusters-3/_logs/history/job_201311110448_0025_1384700433064_root_Cluster+Iterator+running+iteration+3+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 07:00 /user/root/output/clusters-3/_logs/history/job_201311110448_0025_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 07:01 /user/root/output/clusters-3/_policy

-rw-r--r--  1 root supergroup       2827 2013-11-17 07:01 /user/root/output/clusters-3/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 07:02 /user/root/output/clusters-4

-rw-r--r--  1 root supergroup          0 2013-11-17 07:02 /user/root/output/clusters-4/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 07:01 /user/root/output/clusters-4/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 07:01 /user/root/output/clusters-4/_logs/history

-rw-r--r--  1 root supergroup      13595 2013-11-17 07:01 /user/root/output/clusters-4/_logs/history/job_201311110448_0026_1384700503944_root_Cluster+Iterator+running+iteration+4+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 07:01 /user/root/output/clusters-4/_logs/history/job_201311110448_0026_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 07:02 /user/root/output/clusters-4/_policy

-rw-r--r--  1 root supergroup       2827 2013-11-17 07:02 /user/root/output/clusters-4/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 07:04 /user/root/output/clusters-5

-rw-r--r--  1 root supergroup          0 2013-11-17 07:04 /user/root/output/clusters-5/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 07:02 /user/root/output/clusters-5/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 07:02 /user/root/output/clusters-5/_logs/history

-rw-r--r--  1 root supergroup      13595 2013-11-17 07:02 /user/root/output/clusters-5/_logs/history/job_201311110448_0027_1384700577994_root_Cluster+Iterator+running+iteration+5+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 07:02 /user/root/output/clusters-5/_logs/history/job_201311110448_0027_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 07:04 /user/root/output/clusters-5/_policy

-rw-r--r--  1 root supergroup       2827 2013-11-17 07:04 /user/root/output/clusters-5/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 07:05 /user/root/output/clusters-6-final

-rw-r--r--  1 root supergroup          0 2013-11-17 07:05 /user/root/output/clusters-6-final/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 07:04 /user/root/output/clusters-6-final/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 07:04 /user/root/output/clusters-6-final/_logs/history

-rw-r--r--  1 root supergroup      13596 2013-11-17 07:04 /user/root/output/clusters-6-final/_logs/history/job_201311110448_0028_1384700652169_root_Cluster+Iterator+running+iteration+6+over+priorPat

-rw-r--r--  1 root supergroup      23351 2013-11-17 07:04 /user/root/output/clusters-6-final/_logs/history/job_201311110448_0028_conf.xml

-rw-r--r--  1 root supergroup        194 2013-11-17 07:05 /user/root/output/clusters-6-final/_policy

-rw-r--r--  1 root supergroup       2827 2013-11-17 07:05 /user/root/output/clusters-6-final/part-r-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 06:57 /user/root/output/data

-rw-r--r--  1 root supergroup          0 2013-11-17 06:57 /user/root/output/data/_SUCCESS

drwxr-xr-x  - root supergroup          0 2013-11-17 06:56 /user/root/output/data/_logs

drwxr-xr-x  - root supergroup          0 2013-11-17 06:56 /user/root/output/data/_logs/history

-rw-r--r--  1 root supergroup       9036 2013-11-17 06:56 /user/root/output/data/_logs/history/job_201311110448_0022_1384700164248_root_Input+Driver+running+over+input%3A+testdata

-rw-r--r--  1 root supergroup      22746 2013-11-17 06:56 /user/root/output/data/_logs/history/job_201311110448_0022_conf.xml

-rw-r--r--  1 root supergroup       4538 2013-11-17 06:56 /user/root/output/data/part-m-00000

drwxr-xr-x  - root supergroup          0 2013-11-17 06:57 /user/root/output/random-seeds

-rw-r--r--  1 root supergroup       1319 2013-11-17 06:57 /user/root/output/random-seeds/part-randomSeed

drwxr-xr-x  - root supergroup          0 2013-11-16 23:21 /user/root/testdata

-rw-r--r--  1 root supergroup       1191 2013-11-16 23:21 /user/root/testdata/euro.data
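The clusters-0 through clusters-6-final directories in the listing correspond to the initial centers and six successive refinement passes, one MapReduce job per pass. For readers new to k-means, here is a minimal single-machine sketch of that refinement loop in plain Python. It is not Mahout's code: the function names are hypothetical, Euclidean distance is an assumption, and the initial centers are passed in by the caller, whereas the random-seeds directory suggests Mahout picked its starting centers at random.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """For every point, return the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def update(points, labels, centers):
    """Move every center to the mean of the points assigned to it."""
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(
                [sum(column) / len(members) for column in zip(*members)]
            )
        else:
            new_centers.append(list(old_center))  # empty cluster: keep center
    return new_centers


def kmeans(points, centers, max_iterations=10, tolerance=1e-9):
    """Alternate assign/update until the centers stop moving."""
    centers = [list(c) for c in centers]
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        shift = max(
            math.sqrt(squared_distance(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tolerance:
            break
    return centers, assign(points, centers), iteration


# Toy input: (points, goal difference) of four rows from euro.data.
toy_points = [[8, 3], [7, 3], [1, -4], [0, -10]]
toy_centers, toy_labels, toy_iterations = kmeans(toy_points, [[8, 3], [1, -4]])
print(toy_labels)      # [0, 0, 1, 1]
print(toy_iterations)  # 2
```

On the toy input the second pass leaves the centers unchanged, so the loop stops there. That is the same stopping idea as in the listing, where the part-r-00000 files stay the same size from clusters-3 onwards.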


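We can now look at the final result in /user/root/output/clusteredPoints/part-m-00000. First, though, we have to use the seqdumper tool to fetch the result from HDFS to the local disk and convert it from the SequenceFile format into readable text. The command is "mahout seqdumper -i /user/root/output/clusteredPoints/part-m-00000 -o output/euro_group.txt", and the resulting file looks like this: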

Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedVectorWritable

Key: 0: Value: 1.0: [0:8.000, 1:3.000,2:4.000, 3:2.000, 4:2.000, 6:6.000, 7:3.000, 8:2.000, 11:5.000, 12:2.000,14:2.000, 16:1.000, 17:1.000]

Key: 1: Value: 1.0: [0:7.000, 1:3.000,2:4.000, 3:2.000, 4:1.000, 5:1.000, 6:8.000, 7:5.000, 8:2.000, 11:6.000,12:1.000, 14:1.000, 15:1.000, 16:2.000, 17:4.000]

Key: 26: Value: 1.0: [0:5.000, 1:-2.000,2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:3.000, 7:5.000, 9:2.000, 11:1.000,12:1.000, 13:1.000, 15:1.000, 16:2.000, 17:4.000]

Key: 26: Value: 1.0: [0:1.000, 1:-4.000,2:4.000, 4:1.000, 5:3.000, 6:1.000, 7:5.000, 9:1.000, 10:1.000, 12:2.000,15:2.000, 16:1.000, 17:3.000]

Key: 20: Value: 1.0: [0:10.000, 1:10.000,2:4.000, 3:3.000, 4:1.000, 6:14.000, 7:4.000, 8:2.000, 11:6.000, 12:1.000,13:1.000, 14:1.000, 16:8.000, 17:3.000]

Key: 6: Value: 1.0: [0:4.000, 1:-4.000,2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:6.000, 7:10.000, 8:1.000, 10:1.000,11:4.000, 12:7.000, 14:1.000, 15:1.000, 16:2.000, 17:3.000]

Key: 26: Value: 1.0: [0:4.000, 1:-5.000,2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000, 7:8.000, 8:1.000, 9:1.000,11:2.000, 12:1.000, 15:2.000, 16:1.000, 17:7.000]

Key: 26: Value: 1.0: [0:3.000, 1:-1.000,2:4.000, 4:3.000, 5:1.000, 6:6.000, 7:7.000, 9:2.000, 11:4.000, 12:4.000,14:1.000, 15:1.000, 16:2.000, 17:3.000]

Key: 20: Value: 1.0: [0:10.000, 1:11.000,2:4.000, 3:3.000, 4:1.000, 6:13.000, 7:2.000, 8:1.000, 9:1.000, 11:4.000,12:1.000, 13:2.000, 16:9.000, 17:1.000]

Key: 22: Value: 1.0: [0:6.000, 1:1.000,2:3.000, 3:2.000, 5:1.000, 6:5.000, 7:4.000, 8:1.000, 10:1.000, 11:2.000,12:4.000, 13:1.000, 16:3.000]

Key: 26: Value: 1.0: [0:4.000, 1:-2.000,2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000, 7:5.000, 8:1.000, 9:1.000,11:3.000, 12:1.000, 15:2.000, 17:4.000]

Key: 6: Value: 1.0: [0:1.000, 1:-10.000,2:4.000, 4:1.000, 5:3.000, 6:1.000, 7:11.000, 10:2.000, 12:8.000, 14:1.000,15:1.000, 16:1.000, 17:3.000]

Key: 20: Value: 1.0: [0:12.000, 1:11.000,2:4.000, 3:4.000, 6:12.000, 7:1.000, 8:2.000, 11:8.000, 13:2.000, 16:4.000,17:1.000]

Key: 20: Value: 1.0: [0:9.000, 1:5.000,2:4.000, 3:3.000, 5:1.000, 6:11.000, 7:6.000, 8:1.000, 10:1.000, 11:6.000,12:5.000, 13:2.000, 16:5.000, 17:1.000]

Key: 6: Value: 1.0: [0:3.000, 1:-6.000,2:4.000, 3:1.000, 5:3.000, 6:6.000, 7:12.000, 8:1.000, 10:1.000, 11:4.000,12:4.000, 15:2.000, 16:2.000, 17:8.000]

Key: 6: Value: 1.0: [1:-10.000, 2:4.000,5:4.000, 6:2.000, 7:12.000, 10:2.000, 12:4.000, 15:2.000, 16:2.000, 17:8.000]

Key: 20: Value: 1.0: [0:9.000, 1:9.000,2:4.000, 3:3.000, 5:1.000, 6:11.000, 7:2.000, 8:1.000, 10:1.000, 11:4.000,12:2.000, 13:2.000, 16:7.000]

Key: 26: Value: 1.0: [0:6.000, 1:-2.000,2:4.000, 3:2.000, 5:2.000, 6:4.000, 7:6.000, 8:1.000, 10:1.000, 11:3.000,12:3.000, 13:1.000, 15:1.000, 16:1.000, 17:3.000]

Key: 22: Value: 1.0: [0:5.000, 2:4.000,3:1.000, 4:2.000, 5:1.000, 6:4.000, 7:4.000, 9:1.000, 10:1.000, 11:1.000,12:2.000, 13:1.000, 14:1.000, 16:3.000, 17:2.000]

Key: 6: Value: 1.0: [0:2.000, 1:-7.000,2:4.000, 4:2.000, 5:2.000, 6:2.000, 7:9.000, 9:1.000, 10:1.000, 11:1.000,12:5.000, 14:1.000, 15:1.000, 16:1.000, 17:4.000]

Key: 0: Value: 1.0: [0:9.000, 1:3.000,2:4.000, 3:3.000, 5:1.000, 6:6.000, 7:3.000, 8:1.000, 10:1.000, 11:3.000,12:2.000, 13:2.000, 16:3.000, 17:1.000]

Key: 1: Value: 1.0: [0:9.000, 1:1.000,2:4.000, 3:3.000, 5:1.000, 6:7.000, 7:6.000, 8:2.000, 11:5.000, 12:3.000,13:1.000, 15:1.000, 16:2.000, 17:3.000]

Key: 22: Value: 1.0: [0:6.000, 1:2.000,2:4.000, 3:2.000, 5:2.000, 6:6.000, 7:4.000, 8:1.000, 10:1.000, 11:3.000,12:1.000, 13:1.000, 15:1.000, 16:3.000, 17:3.000]

Key: 6: Value: 1.0: [1:-6.000, 2:4.000,5:4.000, 6:4.000, 7:10.000, 10:2.000, 11:2.000, 12:4.000, 15:2.000, 16:2.000,17:6.000]

Key: 20: Value: 1.0: [0:12.000, 1:10.000,2:4.000, 3:4.000, 6:12.000, 7:2.000, 8:2.000, 11:7.000, 12:1.000, 13:2.000,16:5.000, 17:1.000]

Key: 26: Value: 1.0: [0:5.000, 1:-1.000,2:4.000, 3:1.000, 4:2.000, 5:1.000, 6:3.000, 7:4.000, 9:2.000, 11:1.000,12:1.000, 13:1.000, 15:1.000, 16:2.000, 17:3.000]

Key: 22: Value: 1.0: [0:4.000, 1:-1.000,2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000, 7:4.000, 10:2.000, 11:1.000,12:3.000, 13:1.000, 14:1.000, 16:2.000, 17:1.000]

Key: 6: Value: 1.0: [0:1.000, 1:-8.000,2:4.000, 4:1.000, 5:3.000, 7:8.000, 10:2.000, 12:4.000, 14:1.000, 15:1.000,17:4.000]

Key: 0: Value: 1.0: [0:10.000, 1:7.000,2:4.000, 3:3.000, 4:1.000, 6:9.000, 7:2.000, 8:2.000, 11:7.000, 12:1.000,13:1.000, 14:1.000, 16:2.000, 17:1.000]

Key: 22: Value: 1.0: [0:5.000, 2:4.000,3:1.000, 4:2.000, 5:1.000, 6:5.000, 7:5.000, 8:1.000, 9:1.000, 11:3.000,12:1.000, 14:1.000, 15:1.000, 16:2.000, 17:4.000]

Key: 26: Value: 1.0: [0:4.000, 1:-4.000,2:4.000, 3:1.000, 4:1.000, 5:2.000, 6:3.000, 7:7.000, 8:1.000, 9:1.000,11:2.000, 12:1.000, 15:2.000, 16:1.000, 17:6.000]

Key: 26: Value: 1.0: [0:3.000, 1:-3.000,2:4.000, 3:1.000, 5:3.000, 6:2.000, 7:5.000, 8:1.000, 10:1.000, 11:2.000,12:2.000, 15:2.000, 17:3.000]

Count: 32
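Reading cluster membership off 32 lines by eye is error-prone, so a few lines of scripting help. The sketch below groups row numbers by the Key value of each dumped line. The function name is hypothetical, and it assumes the dump keeps the row order of euro.data, which appears to hold here because the first dumped vectors match the first input rows.

```python
import re
from collections import defaultdict

# Matches lines such as "Key: 26: Value: 1.0: [0:5.000, ...]".
LINE_RE = re.compile(r"^Key: (\d+): Value: ([\d.]+): \[(.*)\]$")


def group_by_cluster(lines):
    """Map each cluster key to the 1-based row numbers assigned to it."""
    groups = defaultdict(list)
    row = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # header line, blank line or the trailing "Count: 32"
        row += 1
        groups[int(match.group(1))].append(row)
    return dict(groups)


# The first three data lines of the dump above, plus the header and footer.
sample = [
    "Key class: class org.apache.hadoop.io.IntWritable Value Class: "
    "class org.apache.mahout.clustering.classify.WeightedVectorWritable",
    "Key: 0: Value: 1.0: [0:8.000, 1:3.000,2:4.000, 3:2.000, 4:2.000, "
    "6:6.000, 7:3.000, 8:2.000, 11:5.000, 12:2.000,14:2.000, 16:1.000, "
    "17:1.000]",
    "",
    "Key: 1: Value: 1.0: [0:7.000, 1:3.000,2:4.000, 3:2.000, 4:1.000, "
    "5:1.000, 6:8.000, 7:5.000, 8:2.000, 11:6.000,12:1.000, 14:1.000, "
    "15:1.000, 16:2.000, 17:4.000]",
    "Key: 26: Value: 1.0: [0:5.000, 1:-2.000,2:4.000, 3:1.000, 4:2.000, "
    "5:1.000, 6:3.000, 7:5.000, 9:2.000, 11:1.000,12:1.000, 13:1.000, "
    "15:1.000, 16:2.000, 17:4.000]",
    "Count: 32",
]

print(group_by_cluster(sample))  # {0: [1], 1: [2], 26: [3]}
```

Run over the whole file (for example with open("output/euro_group.txt") as the lines argument), the row numbers can then be matched back to team names in the order the teams were written to euro.data.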

It is clear that Mahout ended up dividing the teams into 6 categories, with the cluster IDs Key: 0, Key: 1, Key: 26, Key: 20, Key: 6 and Key: 22. The final result, summarized, is:

 Category 1:
Manchester United, Arsenal, Barcelona

 Category 2:
Real Madrid, Paris Saint Germain, Bayern München, Manchester City, Chelsea, Atlético Madrid

 Category 3:
Olympiakos, Basel, Borussia Dortmund, FC Porto, AC Milan

 Category 4:
Bayer Leverkusen, SSC Napoli

 Category 5:
Galatasaray, Anderlecht, CSKA Moskva, Viktoria Plzeň, Steaua Bucuresti, Olympique Marseille, Austria Wien

 Category 6:
Shakhtar Donetsk, Real Sociedad, Kobenhavn, Juventus, Benfica, Schalke 04, Zenit Petersburg, Ajax, Celtic

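Going by my ten-plus years of watching football, Mahout's grouping looks fairly sensible. Once this season's Champions League has run its course, we can look back and see whether teams in the same category ended up with similar results. Many accidental factors decide a football match, but the comparison should still give a rough sense of how reliable and useful Mahout's prediction was this time. As an aside: if "Professor" Wenger knew that a computer program had put his team in the same tier as Barcelona this year, he would probably be delighted :-)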



