
Popularity: 207   Published: 2016-05-05 15:44:59.0
[Data Mining] Classification with kNN

1. Algorithm Overview


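The idea behind kNN is simple: compute the distance from the data point to be classified to every sample in the training set and take the k nearest samples; count how many of those k samples fall into each class; then, by majority vote, assign the most frequent class to the test point. The distance metric can be Euclidean distance, Manhattan distance, or cosine distance.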
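The three steps above (measure distances, keep the k nearest, take a majority vote) can be sketched in plain Python. This is only an illustrative sketch, not the code from [1]: the function names `euclidean`, `manhattan`, `cosine_distance`, and `knn_predict`, as well as the toy data, are assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # City-block distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(query, train_points, train_labels, k, metric=euclidean):
    # Rank training indices by their distance to the query point
    order = sorted(range(len(train_points)),
                   key=lambda i: metric(query, train_points[i]))
    # Collect the labels of the k nearest samples
    nearest = [train_labels[i] for i in order[:k]]
    # Majority vote over those labels
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_labels = [0, 0, 0, 1, 1]
prediction = knn_predict((1.1, 1.0), train_points, train_labels, k=3)
```

Passing `metric=manhattan` or `metric=cosine_distance` swaps the distance measure without touching the voting logic.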


The Iris dataset is used for testing; the code follows [1].

import numpy as np
import scipy.spatial.distance as ssd


def read_data(fn):
    """Read the dataset and separate it into feature data and label data."""
    # read the dataset file, skipping the first line
    with open(fn) as f:
        raw_data = np.loadtxt(f, delimiter=',', dtype="float", skiprows=1)
    # initialize
    charac = []
    label = []
    # obtain the input characteristics and the label (last column)
    for row in raw_data:
        charac.append(row[:-1])
        label.append(int(row[-1]))
    return np.array(charac), np.array(label)


def knn(k, dtrain, dtest, dtr_label):
    """k-nearest neighbors algorithm"""
    pred_label = []
    # for each instance in the test dataset, calculate its
    # distance to every instance in the training dataset
    for di in dtest:
        distances = []
        for ij, dj in enumerate(dtrain):
            distances.append((ssd.euclidean(di, dj), ij))
        # sort the distances to get the k nearest neighbors
        k_nn = sorted(distances)[:k]
        # classify according to the most frequent label
        dlabel = []
        for dis, idtr in k_nn:
            dlabel.append(dtr_label[idtr])
        pred_label.append(np.argmax(np.bincount(dlabel)))
    return pred_label


def evaluate(result):
    """Evaluate the predicted labels."""
    eval_result = np.zeros(2, int)
    for x in result:
        # pred_label == dte_label
        if x == 0:
            eval_result[0] += 1
        # pred_label != dte_label
        else:
            eval_result[1] += 1
    return eval_result


dtrain, dtr_label = read_data('iris-train.csv')
dtest, dte_label = read_data('iris-test.csv')

K = [1, 3, 7, 11]

print("knn classification result for iris data set:\n")
print("k    | number of correct/wrong classified test records")

for k in K:
    pred_label = knn(k, dtrain, dtest, dtr_label)
    eval_result = evaluate(np.array(pred_label) - dte_label)
    # print the evaluated result to the screen
    print(k, "   | ", eval_result[0], "/", eval_result[1])
print()
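One detail of the vote in `knn` is worth spelling out: `np.argmax(np.bincount(dlabel))` returns the first index holding the maximum count, so when two classes tie among the k neighbors the smaller label wins. The sketch below mirrors that behavior in plain Python so it can be checked without NumPy; the helper name `bincount_vote` is hypothetical and not part of the original code.

```python
def bincount_vote(labels):
    """Majority vote over non-negative integer labels.

    Mimics np.argmax(np.bincount(labels)): ties are resolved in
    favor of the smallest label.
    """
    # Build a count table indexed by label, like np.bincount
    counts = [0] * (max(labels) + 1)
    for lab in labels:
        counts[lab] += 1
    # list.index returns the first position of the maximum, like np.argmax
    return counts.index(max(counts))


clear_majority = bincount_vote([0, 2, 2])   # label 2 has two of three votes
tie_case = bincount_vote([2, 1, 1, 2])      # labels 1 and 2 tie with two votes each
```

With more than two classes, as in Iris, an odd k does not rule out ties (for example one vote each for three classes when k = 3), so this tie-breaking rule can influence the predictions.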




2. References


[1] M. Saad Nurul Ishlah, Python: Simple K Nearest Neighbours Classifier.


