
pandas: grouping with groupby and aggregating with agg

Published: 2023-09-15 06:34:01
import pandas as pd

# Sample data: one row per person, with country, income and age columns
df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

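The constructed data looks like this. The columns appear in alphabetical order here; recent pandas versions keep the order in which the dictionary keys were written (Country, Income, Age), so the column order in the outputs below may differ on your machine.

    Age  Country  Income
0  5000    China   10000
1  4321    China   10000
2  1234    India    5000
3  4010    India    5002
4   250  America   40000
5   250    Japan   50000
6  4500    China    8000
7  4321    India    5000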

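Grouping

Grouping by a single column

df_gb = df.groupby('Country')
# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
for index, data in df_gb:
    print(index)
    print(data)

Output: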
America
   Age  Country  Income
4  250  America   40000
China
    Age Country  Income
0  5000   China   10000
1  4321   China   10000
6  4500   China    8000
India
    Age Country  Income
2  1234   India    5000
3  4010   India    5002
7  4321   India    5000
Japan
   Age Country  Income
5  250   Japan   50000
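Looping is not the only way to inspect groups. The following is a minimal sketch, assuming the same `df` as above, of fetching one group by its key and counting the rows per group. `get_group` and `size` are standard GroupBy methods; the variable names `china` and `sizes` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_gb = df.groupby('Country')

# Fetch the rows of one group by its key instead of looping over all groups.
china = df_gb.get_group('China')
print(china)

# Count the rows in each group; the result is a Series indexed by Country.
sizes = df_gb.size()
print(sizes)
```

Here `china` should hold the three rows with index 0, 1 and 6, matching the China block in the output above.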

Grouping by multiple columns

df_gb = df.groupby(['Country', 'Income'])
# With two grouping columns, each group key is a (Country, Income) tuple
for (index1, index2), data in df_gb:
    print((index1, index2))
    print(data)

Output:
('America', 40000)
   Age  Country  Income
4  250  America   40000
('China', 8000)
    Age Country  Income
6  4500   China    8000
('China', 10000)
    Age Country  Income
0  5000   China   10000
1  4321   China   10000
('India', 5000)
    Age Country  Income
2  1234   India    5000
7  4321   India    5000
('India', 5002)
    Age Country  Income
3  4010   India    5002
('Japan', 50000)
   Age Country  Income
5  250   Japan   50000
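Because each key is a tuple when grouping by several columns, looking up a single group also takes a tuple. A small sketch under the same assumptions as the sample data above; `subset` and `counts` are illustrative names, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_gb = df.groupby(['Country', 'Income'])

# Number of distinct (Country, Income) combinations found in the data.
print(df_gb.ngroups)

# Look up one group by passing the full key tuple.
subset = df_gb.get_group(('China', 10000))
print(subset)

# size() returns a Series whose index is a (Country, Income) MultiIndex.
counts = df_gb.size()
print(counts)
```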

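Aggregation

Aggregating the grouped data

By default, the aggregation functions are applied to every remaining column after grouping.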

df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])
print(df_agg)
Output:
           Age                    Income
           min         mean   max    min          mean    max
Country
America    250   250.000000   250  40000  40000.000000  40000
China     4321  4607.000000  5000   8000   9333.333333  10000
India     1234  3188.333333  4321   5000   5000.666667   5002
Japan      250   250.000000   250  50000  50000.000000  50000
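The result has two-level column labels: the original column name on top and the aggregation name underneath. A minimal sketch of how one might select a single aggregated column or flatten the labels, assuming the same sample data; the `'Age_mean'` naming scheme and the variable names `mean_age` and `flat` are just one possible convention, not something the original post prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])

# Columns are a two-level MultiIndex: (original column, aggregation name).
mean_age = df_agg[('Age', 'mean')]
print(mean_age)

# Optionally flatten the two levels into single names such as 'Age_mean'.
flat = df_agg.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat.columns.tolist())
```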

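Aggregating only some of the columns

Sometimes only certain columns need to be aggregated, each with its own set of operations. This can be expressed with a dictionary that maps each column name to a list of aggregation functions.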

num_agg = {'Age':['min', 'mean', 'max']}
print(df.groupby('Country').agg(num_agg))
Output:
          Age
          min         mean   max
Country
America   250   250.000000   250
China    4321  4607.000000  5000
India    1234  3188.333333  4321
Japan     250   250.000000   250
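When only one column is involved, an alternative is to select that column from the GroupBy object before aggregating, which gives flat column names instead of two levels. This is a sketch of that variant on the same sample data, not the approach used in the post; `age_stats` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

# Select the Age column first, then aggregate it; the columns of the result
# are simply the aggregation names.
age_stats = df.groupby('Country')['Age'].agg(['min', 'mean', 'max'])
print(age_stats)
```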
The dictionary can cover several columns, each with its own list of aggregations:

num_agg = {'Age': ['min', 'mean', 'max'], 'Income': ['min', 'max']}
print(df.groupby('Country').agg(num_agg))

Output:
          Age                    Income
          min         mean   max    min    max
Country
America   250   250.000000   250  40000  40000
China    4321  4607.000000  5000   8000  10000
India    1234  3188.333333  4321   5000   5002
Japan     250   250.000000   250  50000  50000
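If flat, self-chosen column names are preferred over the two-level labels, pandas also offers named aggregation (available from pandas 0.25 onward). A minimal sketch on the same sample data; the output names `age_min`, `age_mean` and `income_max` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

# Each keyword is the output column name; its value is a
# (source column, aggregation function) pair.
result = df.groupby('Country').agg(
    age_min=('Age', 'min'),
    age_mean=('Age', 'mean'),
    income_max=('Income', 'max'),
)
print(result)
```

Compared with the dictionary form, this keeps the result columns one level deep, which can make later column selection simpler.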