Question
I have scraped data into Python in tabular form:
Name Sport Score
John Golf 100
Jill Rugby 55
John Hockey 100
Bob Golf 45
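How can I format this table in Python so that the items can easily be sorted or grouped? For example, I might want to see the names of everyone who played Golf, or everyone who scored 100 in any sport, or all of the data for just John.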
Answer 1
A pandas DataFrame is the way to go:
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'],
                   'Sport': ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 55, 100, 45]})

# the names of people that played Golf
df[df['Sport'] == 'Golf']['Name'].unique()
>> ['John' 'Bob']

# all of the people that scored 100 on any sport.
df[df['Score'] == 100]['Name'].unique()
>> ['John']

# all of the data for just John.
df[df['Name'] == 'John']
>>
   Name   Sport  Score
0  John    Golf    100
2  John  Hockey    100
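The question also asks about sorting and grouping, which the snippet above does not show. A minimal sketch, assuming the same DataFrame as above; the variable names `by_score`, `players_by_sport` and `total_by_name` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jill', 'John', 'Bob'],
    'Sport': ['Golf', 'Rugby', 'Hockey', 'Golf'],
    'Score': [100, 55, 100, 45],
})

# Sort the rows by score, highest first
by_score = df.sort_values('Score', ascending=False)

# Group by sport and collect the names of the players of each sport
players_by_sport = df.groupby('Sport')['Name'].apply(list).to_dict()

# Group by person and add up their scores across all sports
total_by_name = df.groupby('Name')['Score'].sum().to_dict()

print(by_score)
print(players_by_sport)
print(total_by_name)
```

`sort_values` and `groupby` both return new objects, so `df` itself is left unchanged.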
Answer 2
map and filter, together with namedtuple and lambda, can be used for this task.
from collections import namedtuple
# Create a named tuple to store the rows
Row = namedtuple('Row', ('name', 'sport', 'score'))
data = '''Name Sport Score
John Golf 100
Jill Rugby 55
John Hockey 100
Bob Golf 45'''
# Read the data, skip the first line
lines = data.splitlines()[1:]
rows = []
for line in lines:
    name, sport, score = line.strip().split()
    rows.append(Row(name, sport, int(score)))

# In Python 3, filter() and map() return lazy iterators,
# so wrap them in list() to get reusable lists.

# People that played Golf
golf_filter = lambda row: row.sport == 'Golf'
golf_players = list(filter(golf_filter, rows))

# People that scored 100 on any sport
score_filter = lambda row: row.score == 100
scorers = list(filter(score_filter, rows))

# People named John
john_filter = lambda row: row.name == 'John'
john_data = list(filter(john_filter, rows))

# If you want a specific column then you can map the data
# Names of golf players
get_name = lambda row: row.name
golf_players_names = list(map(get_name, golf_players))
Results:
>>> golf_players
[Row(name='John', sport='Golf', score=100),
Row(name='Bob', sport='Golf', score=45)]
>>> john_data
[Row(name='John', sport='Golf', score=100),
Row(name='John', sport='Hockey', score=100)]
>>> scorers
[Row(name='John', sport='Golf', score=100),
Row(name='John', sport='Hockey', score=100)]
>>> golf_players_names
['John', 'Bob']
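The same rows can also be sorted and grouped with the standard library alone. This is only a sketch on top of the answer above: the `Row` tuple is the same, while `by_score`, `by_sport` and `golf_names` are names introduced here for illustration.

```python
from collections import defaultdict, namedtuple
from operator import attrgetter

Row = namedtuple('Row', ('name', 'sport', 'score'))
rows = [
    Row('John', 'Golf', 100),
    Row('Jill', 'Rugby', 55),
    Row('John', 'Hockey', 100),
    Row('Bob', 'Golf', 45),
]

# Sort by score, lowest first; sorted() is stable, so ties keep their input order
by_score = sorted(rows, key=attrgetter('score'))

# Group the rows by sport in a dict of lists
by_sport = defaultdict(list)
for row in rows:
    by_sport[row.sport].append(row)

# Names of everyone in the Golf group
golf_names = [row.name for row in by_sport['Golf']]
print(golf_names)
```

A `defaultdict` is used instead of `itertools.groupby` because `groupby` only merges adjacent items and would need the rows sorted by sport first.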
Answer 3
How about this?
yourDS={"name":["John","Jill","John","Bob"],
"sport":["Golf","Rugby","Hockey","Golf"],
"score":[100,55,100,45]
}
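Because lists are ordered, this keeps the fields of each entry related by index.
To keep duplicate elements in a list from affecting the result, first create a new set from the list.
For the queries you have in mind, you can do something like this.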
x = 100  # the score to look for (100 is just an example value)
for index, value in enumerate(yourDS["score"]):
    if value == x:
        print(yourDS["name"][index])
It is better to store the results in a list and then convert it to a set, to avoid duplicates in cases such as someone scoring x in two different games.
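A rough sketch of that set-based variant, assuming the same `yourDS` dictionary; `x` and `names_with_x` are placeholder names, and 100 is only an example score:

```python
yourDS = {
    "name": ["John", "Jill", "John", "Bob"],
    "sport": ["Golf", "Rugby", "Hockey", "Golf"],
    "score": [100, 55, 100, 45],
}

x = 100  # example score to search for

# zip() walks the parallel lists together, so no manual indexing is needed
names_with_x = set()
for name, score in zip(yourDS["name"], yourDS["score"]):
    if score == x:
        names_with_x.add(name)  # a set silently ignores repeated names

print(names_with_x)  # {'John'}
```

John scored 100 in two sports but appears only once, because the result is a set.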
Answer 4
You can create a list of lists. Each row is one list inside the outer list.
lst1=[['John','Golf',100],['Jill','Rugby',55],['John','Hockey',100],['Bob','Golf',45]]
lst100=[]
for lst in lst1:
    if lst[2] == 100:
        lst100.append(lst)
print(lst100)
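With the same list of lists, the other queries from the question can be written as list comprehensions, and sorting can use `sorted` with a key. This is a sketch, not part of the original answer; the index constants `NAME`, `SPORT` and `SCORE` are introduced here only for readability.

```python
from operator import itemgetter

lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55],
        ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]

# Column positions inside each row
NAME, SPORT, SCORE = 0, 1, 2

# Names of everyone who played Golf
golf_names = [row[NAME] for row in lst1 if row[SPORT] == 'Golf']

# All rows for John
john_rows = [row for row in lst1 if row[NAME] == 'John']

# Rows sorted by score, highest first
by_score_desc = sorted(lst1, key=itemgetter(SCORE), reverse=True)

print(golf_names)
print(john_rows)
print(by_score_desc)
```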
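Answer 5
If you want to retrieve information based on the data, I would use SQL. It is well suited to answering questions like:
...see the names of everyone who played Golf...
...everyone who scored 100 in any sport...
...all of the data for just John.
SQL is the most popular database language today, and as it happens Python has built-in support for it.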
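SQL is not a daunting thing to learn, but it is beyond the scope of this answer. To learn it, I recommend checking out […], […] or […] (all of them are interactive).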
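To give a feel for the SQL approach, here is a minimal sketch. It assumes the built-in support mentioned above is the standard `sqlite3` module (the answer does not name it here), and the table name `results` is made up for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (name TEXT, sport TEXT, score INTEGER)')
conn.executemany(
    'INSERT INTO results VALUES (?, ?, ?)',
    [('John', 'Golf', 100), ('Jill', 'Rugby', 55),
     ('John', 'Hockey', 100), ('Bob', 'Golf', 45)],
)

# Names of everyone who played Golf
golf_names = [r[0] for r in conn.execute(
    "SELECT name FROM results WHERE sport = 'Golf' ORDER BY name")]

# Everyone who scored 100 in any sport, without duplicates
scored_100 = [r[0] for r in conn.execute(
    'SELECT DISTINCT name FROM results WHERE score = 100')]

# All of the data for just John (the value is passed as a parameter)
john_rows = conn.execute(
    'SELECT name, sport, score FROM results WHERE name = ? ORDER BY sport',
    ('John',)).fetchall()

conn.close()
print(golf_names, scored_100, john_rows)
```

Passing values through `?` placeholders rather than string formatting avoids quoting problems and SQL injection.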
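Alternatively, if you only want to read the data in and write it back out without caring much about what it actually means, consider using the built-in […].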
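The name of the built-in is missing from the sentence above. Assuming it refers to the standard `csv` module, reading the space-separated table and writing it back out as comma-separated text might look roughly like this (the variable names are illustrative):

```python
import csv
import io

data = '''Name Sport Score
John Golf 100
Jill Rugby 55
John Hockey 100
Bob Golf 45'''

# Read: the first line supplies the field names; columns are split on single spaces
reader = csv.DictReader(io.StringIO(data), delimiter=' ')
rows = list(reader)  # every value is still a string, including the scores

# Write the same rows back out as comma-separated text
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['Name', 'Sport', 'Score'])
writer.writeheader()
writer.writerows(rows)
csv_text = out.getvalue()

print(csv_text)
```

For real files, `open(path, newline='')` would take the place of the `io.StringIO` objects.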