
Creating tuples of word pairs

Published: 2023-06-13 13:41:44.0

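I have a string (or a list of words). I want to create a tuple for every possible pair of words so that I can pass the tuples to Counter to build a dictionary and compute frequencies. The frequency is defined like this: if both words of a pair occur in the string (in any order, and whether or not other words stand between them), the frequency is 1. Even if word1 occurs 7 times and word2 occurs 3 times, the pair (word1, word2) still counts as 1.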

I'm using loops to create tuples for all the pairs, but I'm stuck:

from collections import Counter

tweetList = ('I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work', 'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready')

words = set(tweetList.split())
n = 10
for tweet in tweetList:

    for word1 in words:
        for word2 in words:
            pairW = [(word1, word2)]

            c1 = Counter(pairW for pairW in tweet)

c1.most_common(n)

However, the output is very strange:

[('k', 1)]

It seems to iterate over letters instead of words.

How can I fix this? Should I use split() to convert the string into a list of words?

Another question: how do I avoid creating duplicate tuples such as (word1, word2) and (word2, word1)? With enumerate?

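As output I expect a dictionary whose keys are all word pairs (but see the note on duplicates above) and whose values are the frequency of each pair across the list.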

Thanks!
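A minimal sketch of the counting rule described in the question, assuming whitespace tokenization and case-insensitive matching (both are assumptions; punctuation such as the period in "ready." stays attached to its word). The idea: take each tweet's distinct words, generate every unordered pair once with `itertools.combinations`, and let `Counter` add 1 per tweet. The helper name `pair_counts` is hypothetical.

```python
import itertools
from collections import Counter

tweets = [
    'I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work',
    'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready',
]

def pair_counts(texts):
    """Count, for every unordered word pair, how many texts contain both words."""
    counts = Counter()
    for text in texts:
        # A set keeps each word once per text, so repeated words add nothing extra
        unique_words = sorted(set(text.lower().split()))
        # combinations() yields each unordered pair once, already in sorted order
        counts.update(itertools.combinations(unique_words, 2))
    return counts

c = pair_counts(tweets)
print(c.most_common(10))
```

Because each tweet contributes every pair at most once, the value for a pair is the number of tweets that contain both words, which matches the "frequency = 1 per string" rule. The number of pairs grows quadratically with the number of distinct words per tweet.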

Is this what you're looking for?

import itertools, collections

tweets = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work',
          'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready']

words = set(word.lower() for tweet in tweets for word in tweet.split())
_pairs = list(itertools.permutations(words, 2))
# We need to clean up similar pairs: sort words in each pair and then convert
# them to tuple so we can convert whole list into set.
pairs = set(map(tuple, map(sorted, _pairs)))

c = collections.Counter()

for tweet in tweets:
    for pair in pairs:
        if pair[0] in tweet and pair[1] in tweet:
            c.update({pair: 1})

print(c.most_common(10))
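The permutations-then-sort step above can likely be collapsed into one call: `itertools.combinations` yields each unordered pair exactly once, and sorting the input first keeps the two words in every tuple in alphabetical order, the same normalization the answer gets from `sorted()`. A small comparison on a hypothetical three-word vocabulary:

```python
import itertools

words = {"car", "ready", "our"}

# The answer's approach: all ordered pairs, sort each one, deduplicate via a set
via_permutations = set(map(tuple, map(sorted, itertools.permutations(words, 2))))

# combinations() emits each unordered pair once; sorting the input first puts
# the two words of every tuple in alphabetical order
via_combinations = set(itertools.combinations(sorted(words), 2))

print(len(list(itertools.permutations(words, 2))))  # 6
print(sorted(via_combinations))  # [('car', 'our'), ('car', 'ready'), ('our', 'ready')]
print(via_permutations == via_combinations)          # True
```

For n distinct words this generates n(n-1)/2 pairs directly instead of n(n-1) ordered pairs that then need deduplicating, and it also answers the question about avoiding both (word1, word2) and (word2, word1).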

The result is: [(('a', 'went'), 2), (('a', 'the'), 2), (('but', 'i'), 2), (('i', 'the'), 2), (('but', 'the'), 2), (('a', 'i'), 2), (('a', 'we'), 2), (('but', 'we'), 2), (('no', 'went'), 2), (('but', 'went'), 2)]
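The counts of 2 above are worth double-checking. In the answer's loop, `pair[0] in tweet` is a substring test on the raw string, so in the first tweet `'the'` matches inside "other" and `'we'` matches inside "went", even though that tweet contains neither word. This is why pairs such as ('a', 'the') and ('a', 'we') are reported for both tweets. A quick check:

```python
tweet = 'I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work'

# `in` on a str tests for a substring: 'we' is found inside "went",
# 'the' inside "other", although neither is a word of this tweet
print('we' in tweet, 'the' in tweet)    # True True

# Membership in a set of lowercased tokens tests whole words instead
tokens = set(tweet.lower().split())
print('we' in tokens, 'the' in tokens)  # False False
```

Changing the answer's condition to test against such a token set (for example `pair[0] in tokens and pair[1] in tokens`, with `tokens` built once per tweet) restricts matches to whole words; punctuation like the period in "ready." would still stay attached unless it is stripped separately.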

tweet is a string, so Counter(pairW for pairW in tweet) counts the frequency of each letter in tweet, which is probably not what you want.
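To see the difference, compare a `Counter` built from iterating a string with one built from `split()`. The example also touches the question's `tweetList.split()` line: a tuple has no `split()` method, so that call would raise `AttributeError`, and each tweet has to be split on its own. The sample tuple `tweet_list` is hypothetical.

```python
from collections import Counter

tweet_list = ('We went to get our car but the car was not ready', 'I went to work')

# A tuple has no split() method, so tweetList.split() would raise AttributeError;
# split each tweet individually instead
print(hasattr(tweet_list, 'split'))  # False
tweet = tweet_list[0]

# Iterating over a str yields single characters, so this counts letters and spaces
letters = Counter(ch for ch in tweet)

# Splitting first makes the Counter count whole words
words = Counter(tweet.split())

print(words['car'])  # 2
```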
