sklearn: TfidfVectorizer notes

Published: 2023-12-29 00:35:35

Code:

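from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ['I had had a dream', 'My dream will come true']

# Defaults: lowercase=True, smooth_idf=True, norm='l2'
vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights, then return the sparse TF-IDF matrix
matrix = vectorizer.fit_transform(corpus)
print("IDF values of the feature terms:\n", vectorizer.idf_)
print("TF-IDF matrix:\n", matrix.toarray())
print("Coordinates and TF-IDF values (row, column):\n", matrix)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print("Feature terms:\n", vectorizer.get_feature_names_out().tolist())
print("Term-to-index mapping:\n", vectorizer.vocabulary_)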

Output:

IDF values of the feature terms:
 [1.40546511 1.         1.40546511 1.40546511 1.40546511 1.40546511]
TF-IDF matrix:
 [[0.         0.33517574 0.94215562 0.         0.         0.        ]
 [0.47107781 0.33517574 0.         0.47107781 0.47107781 0.47107781]]
Coordinates and TF-IDF values (row, column):
   (0, 1)	0.33517574332792605
  (0, 2)	0.9421556246632359
  (1, 4)	0.47107781233161794
  (1, 0)	0.47107781233161794
  (1, 5)	0.47107781233161794
  (1, 3)	0.47107781233161794
  (1, 1)	0.33517574332792605
Feature terms:
 ['come', 'dream', 'had', 'my', 'true', 'will']
Term-to-index mapping:
 {'had': 2, 'dream': 1, 'my': 3, 'will': 5, 'come': 0, 'true': 4}
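
To see where these numbers come from, the calculation can be sketched by hand using only the standard library. This is not scikit-learn's actual implementation, just a minimal re-derivation that assumes the documented defaults: lowercasing, the token pattern (?u)\b\w\w+\b (which drops the one-letter tokens 'I' and 'a'), smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalization of each row. The helper names tokenize and tfidf_row are made up for this illustration.

```python
import math
import re
from collections import Counter

corpus = ['I had had a dream', 'My dream will come true']

# Assumed default token_pattern: only tokens with 2+ word characters survive,
# which is why 'I' and 'a' never enter the vocabulary.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Hypothetical helper: lowercase, then extract tokens."""
    return TOKEN_RE.findall(doc.lower())


# Raw term counts per document
docs = [Counter(tokenize(d)) for d in corpus]
# Vocabulary sorted alphabetically, so position == column index
vocab = sorted(set().union(*docs))
n = len(docs)

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1
idf = {}
for term in vocab:
    df = sum(1 for d in docs if term in d)
    idf[term] = math.log((1 + n) / (1 + df)) + 1


def tfidf_row(counts):
    """Hypothetical helper: tf * idf per term, then L2-normalize the row."""
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


rows = [tfidf_row(d) for d in docs]

print(vocab)
print([round(idf[t], 8) for t in vocab])
for row in rows:
    print([round(v, 8) for v in row])
```

Under these assumptions the printed values should agree with vectorizer.idf_ and matrix.toarray() above to the displayed precision. 'dream' appears in both documents, so its IDF is ln(3/3) + 1 = 1, while every other term appears in one document and gets ln(3/2) + 1. 'had' occurs twice in the first sentence, which is why it dominates that row after normalization.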