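Problem description
I'm trying to collect all the words that belong to a single label into one list. I split each review into a list of words and then try to add those words to a variable named pos_bag_of_words or neg_bag_of_words, depending on the label. This seems to work for one review, but when I loop over the whole corpus of reviews, it seems to overwrite the previous word list for one label, and the list for the other label ends up empty. What am I doing wrong?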
review1 = 'this dumbest films ever seen rips nearly ever'
review2 = 'whole mess there plot afterthought \
          acting goes there nothing good nothing honestly cant \
          understand this type nonsense gets produced actually'
review3 = 'released does somebody somewhere some stage think this \
          really load shite call crap like this that people'
review4 = 'downloading illegally trailer looks like completely \
          different film least have download haven wasted your \
          time money waste your time this painful'
labels = 'POSITIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE'
reviews = [review1, review2, review3, review4]
for review, label in zip(reviews, labels):
    pos_bag_of_words = []
    neg_bag_of_words = []
    if label == 'NEGATIVE': 
#         neg_bag_of_words.extend(list(review.split()))
        neg_bag_of_words = list(review.split()) + neg_bag_of_words
    if label == 'POSITIVE':
#         pos_bag_of_words.extend(list(review.split()))
        pos_bag_of_words = list(review.split()) + pos_bag_of_words
 
Returns:
#There are positive words in the entire corpus... but I get nothing
>>> pos_bag_of_words
['downloading',
 'illegally',
 'trailer',
 'looks',
 'like',
 'completely',
 'different',
 'film',
 'least',
 'have',
 'download',
 'haven',
 'wasted',
 'your',
 'time',
 'money',
 'waste',
 'your',
 'time',
 'this',
 'painful']
>>> neg_bag_of_words
[]
 
Answer 1
 
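You should move the initialization of neg_bag_of_words and pos_bag_of_words outside the for loop.
Otherwise, both lists are reset to empty lists on every iteration, so only the words from the last review survive.
The last review is labeled POSITIVE, which is why neg_bag_of_words ends up empty.
Do something like this: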
# Initialize once, before the loop, so the lists persist across iterations
pos_bag_of_words = []
neg_bag_of_words = []
for review, label in zip(reviews, labels):
    if label == 'NEGATIVE': 
        neg_bag_of_words = list(review.split()) + neg_bag_of_words
    if label == 'POSITIVE':
        pos_bag_of_words = list(review.split()) + pos_bag_of_words
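
As a possible variation on the fix above, here is a minimal sketch that keeps one list per label in a dictionary and grows it in place with extend. The `bags` name and the shortened review strings are assumptions made for illustration and are not part of the original question. Note that extend appends words in review order, whereas the `list(...) + neg_bag_of_words` form puts later reviews first; for a bag of words the order usually does not matter.

```python
from collections import defaultdict

# Shortened stand-ins for the reviews in the question (illustrative only)
reviews = [
    'this dumbest films ever seen rips nearly ever',
    'whole mess there plot afterthought',
    'released does somebody somewhere',
    'downloading illegally trailer looks like',
]
labels = ['POSITIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE']

# Created once, before the loop; each label gets its own list on first use
bags = defaultdict(list)
for review, label in zip(reviews, labels):
    # split() already returns a list, so no extra list() call is needed
    bags[label].extend(review.split())

pos_bag_of_words = bags['POSITIVE']
neg_bag_of_words = bags['NEGATIVE']
```

With this layout, adding a third label such as 'NEUTRAL' would need no extra variable or if branch, since the dictionary creates the list on demand.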