Pandas variance and standard deviation results differ from manual calculation

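I am trying to compute the mean, variance and standard deviation with pandas. However, my manual calculation differs from the pandas output. Is there something I am missing when using pandas? Excel screenshot attached for reference.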

import pandas as pd

dg_df = pd.DataFrame(
            data=[600,470,170,430,300],
            index=['a','b','c','d','e'])

print(dg_df.mean(axis=0)) # 394.0 matches with manual calculation
print(dg_df.var())        # 27130.0 not matching with manual calculation 21704
print(dg_df.std(axis=0))  # 164.71187 not matching with manual calculation 147.32

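Change the default parameter ddof=1 (Delta Degrees of Freedom) to 0 in var and also in std. The parameter axis=0 is the default, so it can be omitted: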

print(dg_df.mean())
0    394.0
dtype: float64

print(dg_df.var(ddof=0))  
0    21704.0
dtype: float64

print(dg_df.std(ddof=0))
0    147.322775
dtype: float64
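
To see where the two sets of numbers come from, here is a minimal sketch in plain Python that reproduces both results by hand. The variable names are illustrative only; the only difference between the two variances is the divisor, n for ddof=0 and n - 1 for ddof=1.

```python
import math

values = [600, 470, 170, 430, 300]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in values)

pop_var = ss / n             # ddof=0: divide by n
sample_var = ss / (n - 1)    # ddof=1: divide by n - 1 (the pandas default)

pop_std = math.sqrt(pop_var)
sample_std = math.sqrt(sample_var)

print(mean, pop_var, sample_var)
print(pop_std, sample_std)
```

The first pair matches the manual calculation (21704 and about 147.32), the second matches the pandas defaults (27130 and about 164.71).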

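There is more than one definition of standard deviation. Your manual calculation is the equivalent of Excel's STDEV.P, which is described as: "Calculates standard deviation based on the entire population...". If you need the sample standard deviation in Excel, use STDEV.S.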

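Pandas std uses 1 degree of freedom by default, which is also known as the sample standard deviation.

NumPy std uses 0 degrees of freedom by default, which is also known as the population standard deviation.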

See a reference on sample versus population statistics to understand the difference between the two.

You can also specify ddof=0 with the Pandas std / var methods:

dg_df.std(ddof=0)
dg_df.var(ddof=0)
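
As a quick check of the two defaults, the sketch below compares pandas and NumPy on the same data. It assumes both libraries are installed; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = [600, 470, 170, 430, 300]
s = pd.Series(values)

pandas_default = s.std()                 # pandas default: ddof=1 (sample)
numpy_default = np.std(values)           # NumPy default: ddof=0 (population)

pandas_population = s.std(ddof=0)        # pandas made to match NumPy's default
numpy_sample = np.std(values, ddof=1)    # NumPy made to match pandas' default

print(pandas_default, numpy_sample)
print(numpy_default, pandas_population)
```

Each printed pair should agree, so either library can produce either definition once ddof is set explicitly.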

You can also use dg_df.describe(), which returns the following DataFrame (its std row uses the default ddof=1). It may be more intuitive:

count      5.00000
mean     394.00000
std      164.71187
min      170.00000
25%      300.00000
50%      430.00000
75%      470.00000
max      600.00000

You can then pull out an individual value with, for example, dg_df.describe().loc['count'].
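
A short sketch of reading scalars out of the describe() result, using the DataFrame from the question. The column label 0 is what pandas assigns here because no column name was given; the population value has to be computed separately, since describe() only reports the ddof=1 version.

```python
import pandas as pd

dg_df = pd.DataFrame(
    data=[600, 470, 170, 430, 300],
    index=['a', 'b', 'c', 'd', 'e'])

summary = dg_df.describe()

# Row label first, then the column label (0, the default column name)
count = summary.loc['count', 0]
sample_std = summary.loc['std', 0]       # same as dg_df[0].std()

# describe() has no population std, so compute it directly
population_std = dg_df[0].std(ddof=0)

print(count, sample_std, population_std)
```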
