Problem description
I have a list of 2-element lists: the first element is a word and the second is the file that word came from. Wherever the word is the same, I need to append the file names to that word. E.g. the input [['word1', 'F1.txt'], ['word1', 'F2.txt'], ['word2', 'F1.txt'], ['word2', 'F2.txt'], ['word3', 'F1.txt'], ['word3', 'F2.txt'], ['word4', 'F2.txt']]
should output [['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']]
Could you give me some hints on how to do this?
Answer 1
You can use a set and a defaultdict:
from collections import defaultdict

def remove_dups_pairs(lst):
    # Turn each pair into a tuple so the set can drop duplicate pairs
    s = set(map(tuple, lst))
    d = defaultdict(list)
    for word, file in s:
        d[word].append(file)
    return [[key] + values for key, values in d.items()]

print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"], ["water", "nature.txt"]]))
Output
[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
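Because a set has no defined order, the order of words and files produced by the snippet above can differ between runs. If a repeatable result matters more than the original input order, one option is to sort both levels. This is only a minimal sketch; the name group_files_sorted is hypothetical and not part of the answer.

```python
from collections import defaultdict

def group_files_sorted(pairs):
    # Hypothetical helper name, used only for this sketch.
    grouped = defaultdict(set)
    for word, file_name in pairs:
        grouped[word].add(file_name)  # the set drops repeated files
    # Sort the words and each word's files so every run gives the same result.
    return [[word] + sorted(files) for word, files in sorted(grouped.items())]

print(group_files_sorted([["fire", "elem.txt"], ["fire", "things.txt"],
                          ["water", "elem.txt"], ["water", "elem.txt"],
                          ["water", "nature.txt"]]))
# [['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
```

Note that sorting gives alphabetical order, which is not necessarily the order in which the pairs appeared in the input.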
As @ShmulikA mentioned, sets do not preserve order, so if you need to keep the order you can do it like this:
from collections import defaultdict

def remove_dups_pairs(lst):
    d = defaultdict(list)
    seen = set()  # (word, file) pairs already added
    for word, file in lst:
        if (word, file) not in seen:
            d[word].append(file)
            seen.add((word, file))
    return [[key] + values for key, values in d.items()]

print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"],
                         ["water", "nature.txt"]]))
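Output (as reported on a Python version where dict order is not guaranteed; on Python 3.7+ the 'fire' entry comes first)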
[['water', 'elem.txt', 'nature.txt'], ['fire', 'elem.txt', 'things.txt']]
Answer 2
Alternatively, if you don't want to use defaultdict, you can do the following:
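# Sample input; this approach assumes entries for the same word are adjacent
data = [["fire", "elem.txt"], ["fire", "things.txt"],
        ["water", "elem.txt"], ["water", "elem.txt"],
        ["water", "nature.txt"]]

inner = [[]]
count = 0

def loockup(data, i, count):
    # Append every later file that belongs to the same word as data[i]
    for j in range(i + 1, len(data)):
        if data[i][0] == data[j][0] and data[j][1] not in inner[count]:
            inner[count].append(data[j][1])
    return inner

for i in range(len(data)):
    if data[i][0] in inner[count]:
        inner = loockup(data, i, count)
    else:
        # A new word: start a new group (the first group already exists)
        if i != 0:
            count += 1
            inner.append([])
        inner[count].append(data[i][0])
        inner[count].append(data[i][1])
        loockup(data, i, count)

print(inner)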
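The loop above relies on a module-level list and on the entries for a word being adjacent in the input. As a hedged alternative that still avoids defaultdict, a plain dict with dict.setdefault handles interleaved words as well. This is a sketch rather than the answer's own code; group_files_plain is a hypothetical name, and the result order assumes Python 3.7+, where dicts keep insertion order.

```python
def group_files_plain(data):
    # Hypothetical helper name, used only for this sketch.
    grouped = {}
    for word, file_name in data:
        files = grouped.setdefault(word, [])  # create the list on first sight of the word
        if file_name not in files:            # skip repeated (word, file) pairs
            files.append(file_name)
    return [[word] + files for word, files in grouped.items()]

# The words are deliberately interleaved here.
data = [["fire", "elem.txt"], ["water", "elem.txt"], ["fire", "things.txt"],
        ["water", "elem.txt"], ["water", "nature.txt"]]
print(group_files_plain(data))
# [['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
```

The membership test on a list is linear in the number of files per word, which is fine for small inputs like the ones in this thread.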
Answer 3
Preserving insertion order by using a set of seen items:
from collections import defaultdict

def remove_dups_pairs_ordered(lst):
    d = defaultdict(list)
    # stores the (word, file) pairs we have already seen
    seen = set()
    for item in lst:
        word, file = item
        key = (word, file)
        # skip any (word, file) pair we have already seen
        if key in seen:
            continue
        seen.add(key)
        d[word].append(file)
    # convert the dict word -> [f1, f2, ...] into
    # a list of lists [[word1, f1, f2, ...], [word2, f1, f2, ...], ...]
    return [[word] + files for word, files in d.items()]

lst = [
    ["fire", "elem.txt"], ["fire", "things.txt"],
    ["water", "elem.txt"], ["water", "elem.txt"],
    ["water", "nature.txt"]
]
print(remove_dups_pairs_ordered(lst))
Output:
[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
Without preserving order, using defaultdict and set:
from collections import defaultdict

def remove_dups_pairs(lst):
    d = defaultdict(set)
    for item in lst:
        d[item[0]].add(item[1])
    return [[word] + list(files) for word, files in d.items()]

lst = [
    ["fire","elem.txt"], ["fire","things.txt"],
    ["water","elem.txt"], ["water","elem.txt"],
    ["water","nature.txt"]
]
print(remove_dups_pairs(lst))
Output (the order of the file names may vary, since sets are unordered):
[['fire', 'things.txt', 'elem.txt'], ['water', 'nature.txt', 'elem.txt']]
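A middle ground between the two snippets in this answer is to let dict keys play the role of an ordered set, so no separate seen set is needed and the file order still follows the input. The following is a minimal sketch under the assumption of Python 3.7+ (insertion-ordered dicts); group_files_ordered is a hypothetical name, not something from the answer.

```python
from collections import defaultdict

def group_files_ordered(pairs):
    # Hypothetical helper name, used only for this sketch.
    grouped = defaultdict(dict)
    for word, file_name in pairs:
        # Dict keys are unique and keep insertion order (Python 3.7+),
        # so each inner dict behaves like an ordered set of file names.
        grouped[word][file_name] = None
    return [[word, *files] for word, files in grouped.items()]

lst = [
    ["fire", "elem.txt"], ["fire", "things.txt"],
    ["water", "elem.txt"], ["water", "elem.txt"],
    ["water", "nature.txt"]
]
print(group_files_ordered(lst))
# [['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
```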
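Answer 4
This can be solved with collections.OrderedDict. It is a dictionary that lets you iterate over the keys in the order in which they were added.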
import collections

def remove_dups_pairs(data):
    word_files = collections.OrderedDict()
    for word, file_name in data:
        if word not in word_files:
            # first time we see this word: start its file list
            word_files[word] = [file_name]
        elif file_name not in word_files[word]:
            word_files[word].append(file_name)
    return [[word] + files for word, files in word_files.items()]

print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"],
                         ["water", "elem.txt"], ["water", "elem.txt"],
                         ["water", "nature.txt"]]))
print(remove_dups_pairs([['word1', 'F1.txt'], ['word1', 'F2.txt'],
                         ['word2', 'F1.txt'], ['word2', 'F2.txt'],
                         ['word3', 'F1.txt'], ['word3', 'F2.txt'],
                         ['word4', 'F2.txt']]))
Output:
[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]
[['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']]
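For completeness, when an alphabetical word order is acceptable, the same grouping can also be sketched with itertools.groupby, which groups adjacent items and therefore needs the pairs sorted by word first. This is an alternative technique, not what the answer above does, and group_files_groupby is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter

def group_files_groupby(pairs):
    # Hypothetical helper name, used only for this sketch.
    # sorted() is stable, so files keep their input order within each word.
    ordered = sorted(pairs, key=itemgetter(0))
    result = []
    for word, group in groupby(ordered, key=itemgetter(0)):
        files = []
        for _, file_name in group:
            if file_name not in files:  # drop repeated files for this word
                files.append(file_name)
        result.append([word] + files)
    return result

print(group_files_groupby([['word1', 'F1.txt'], ['word1', 'F2.txt'],
                           ['word2', 'F1.txt'], ['word2', 'F2.txt'],
                           ['word3', 'F1.txt'], ['word3', 'F2.txt'],
                           ['word4', 'F2.txt']]))
# [['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']]
```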