当前位置: 代码迷 >> python >> 使用条件删除列表列表中的重复项 使用一组可见项保持插入顺序: 不使用defaultdict和set保持顺序:
  详细解决方案

使用条件删除列表列表中的重复项 使用一组可见项保持插入顺序: 不使用defaultdict和set保持顺序:

热度:88   发布时间:2023-07-14 08:45:37.0

我有一个由2个元素组成的集合,第一个元素仍然是单词,第二个元素是单词的来源文件,现在如果单词是单词,我需要将文件名附加到单词相同的EG input([['word1', 'F1.txt'], ['word1', 'F2.txt'], ['word2', 'F1.txt'], ['word2', 'F2.txt'], ['word3', 'F1.txt'], ['word3', 'F2.txt'], ['word4', 'F2.txt']])应该输出[['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']]能否给我一些有关此操作的提示?

您可以使用和 :

from collections import defaultdict


def remove_dups_pairs(lst):
    s = set(map(tuple, lst))
    d = defaultdict(list)
    for word, file in s:
        d[word].append(file)
    return [[key] + values for key, values in d.items()]


print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"], ["water", "nature.txt"]]))

输出量

[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]

由于@ShmulikA提到的集合不会保留排序,因此,如果需要保留排序,可以这样进行:

def remove_dups_pairs(lst):
    d = defaultdict(list)
    seen = set()
    for word, file in lst:
        if (word, file) not in seen:
            d[word].append(file)
            seen.add((word, file))

    return [[key] + values for key, values in d.items()]


print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"],
                         ["water", "nature.txt"]]))

输出量

[['water', 'elem.txt', 'nature.txt'], ['fire', 'elem.txt', 'things.txt']]

另外,如果您不想使用defaultdict,可以执行以下操作:

inner=[[]]
count = 0
def loockup(data,i, count):
    for j in range(i+1, len(data)):
        if data[i][0] == data[j][0] and data[j][1] not in inner[count]:
            inner[count].append(data[j][1])
    return inner

for i in range(len(data)):
    if data[i][0] in inner[count]:
        inner=loockup(data,i,count)
    else:
        if i!=0:
            count +=1
            inner.append([])
        inner[count].append(data[i][0])
        inner[count].append(data[i][1])
        loockup(data,i, count)
print (inner)

使用一组可见项保持插入顺序:

from collections import defaultdict

def remove_dups_pairs_ordered(lst):
    d = defaultdict(list)

    # stores word,file pairs we already seen
    seen = set()
    for item in lst:
        word, file = item
        key = (word, file)

        # skip adding word,file we already seen before
        if key in seen:
            continue
        seen.add(key)
        d[word].append(file)

    # convert the dict word -> [f1, f2..] into 
    # a list of lists [[word1, f1,f2, ...], [word2, f1, f2...], ...]
    return [[word] + files for word, files in d.items()]

print(remove_dups_pairs_ordered(lst))

输出:

[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]


不使用defaultdict和set保持顺序:

from collections import defaultdict

def remove_dups_pairs(lst):
    d = defaultdict(set)

    for item in lst:
        d[item[0]].add(item[1])
    return [[word] + list(files) for word, files in d.items()]

lst = [
    ["fire","elem.txt"], ["fire","things.txt"],
    ["water","elem.txt"], ["water","elem.txt"],
    ["water","nature.txt"]
]

print(remove_dups_pairs(lst))

输出:

   [['fire', 'things.txt', 'elem.txt'], ['water', 'nature.txt', 'elem.txt']]

可以使用解决此问题。 它是一个字典,允许按添加键的顺序进行迭代。

import collections

def remove_dups_pairs(data):
    word_files = collections.OrderedDict()
    for word, file_name in data:
        if word not in word_files.keys():
            word_files.update({word: [file_name]})
        elif file_name not in word_files[word]:
            word_files[word].append(file_name)
    return [[word] + files for word, files in word_files.items()]


print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"],
                         ["water", "elem.txt"], ["water", "elem.txt"],
                         ["water", "nature.txt"]]))
print(remove_dups_pairs([['word1', 'F1.txt'], ['word1', 'F2.txt'],
                         ['word2', 'F1.txt'], ['word2', 'F2.txt'],
                         ['word3', 'F1.txt'], ['word3', 'F2.txt'],
                         ['word4', 'F2.txt']]))

输出:

[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
[['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']]