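What's new in 1.3.0 (July 2, 2021)#

These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What's new 1.2.0 for background on this change.

Enhancements#

Custom HTTP(s) headers when reading csv or json files#

When reading from a remote URL that is not handled by fsspec (e.g. HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH 36688). For example: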

In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )

Read and write XML documents#

We added I/O support to read and render shallow versions of XML documents with read_xml() and DataFrame.to_xml(). Using lxml as parser, both XPath 1.0 and XSLT 1.0 are available. (GH 27554)

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

For more, see Writing XML in the user guide on IO tools.
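When lxml is not installed, the same round trip can be sketched with the standard-library parser. The snippet below is a minimal illustration, not the canonical usage: the sample document is hypothetical (modelled on the one above), and it assumes that passing parser="etree" and wrapping the text in StringIO suits your pandas version.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample document, modelled on the example above.
xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# parser="etree" relies on the standard library instead of lxml;
# xpath selects the repeating <row> nodes that become DataFrame rows.
df = pd.read_xml(StringIO(xml), xpath=".//row", parser="etree")

# Write the frame back to XML without the index column.
out = df.to_xml(index=False, parser="etree")
```

The etree parser only supports a limited XPath subset and no XSLT, so lxml remains the better choice for anything beyond simple documents.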

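Styler enhancements#

We provided some focused development on Styler. See also the Styler documentation, which has been revised and improved (GH 39720, GH 39317, GH 40493).

The method Styler.set_table_styles() can now accept more natural CSS language for arguments, such as 'color:red;' instead of [('color', 'red')] (GH 39563)

The methods Styler.highlight_null(), Styler.highlight_min(), and Styler.highlight_max() now allow custom CSS highlighting instead of the default background coloring (GH 40242)

Styler.apply() now accepts functions that return an ndarray when axis=None, making it consistent with the axis=0 and axis=1 behavior (GH 39359)

When incorrectly formatted CSS is given via Styler.apply() or Styler.applymap(), an error is now raised upon rendering (GH 39660)

Styler.format() now accepts the keyword argument escape for optional HTML and LaTeX escaping (GH 40388, GH 41619)

Styler.background_gradient() has gained the argument gmap to supply a specific gradient map for shading (GH 22727)

Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH 40484)

Added the method Styler.highlight_between() (GH 39821)

Added the method Styler.highlight_quantile() (GH 40926)

Added the method Styler.text_gradient() (GH 41098)

Added the method Styler.set_tooltips() to allow hover tooltips; this can be used to enhance interactive displays (GH 21266, GH 40284)

Added the parameter precision to the method Styler.format() to control the display of floating point numbers (GH 40134)

Styler rendered HTML output now follows the w3 HTML Style Guide (GH 39626)

Many features of the Styler class are now either partially or fully usable on a DataFrame with a non-unique index or columns (GH 41143)

One has greater control of the display through separate sparsification of the index or columns using the new Styler options, which are also usable via option_context() (GH 41142)

Added the option styler.render.max_elements to avoid browser overload when styling large DataFrames (GH 40712)

Added the method Styler.to_latex() (GH 21673, GH 42320), which also allows some limited CSS conversion (GH 40731)

Added the method Styler.to_html() (GH 13379)

Added the method Styler.set_sticky() to make index and column headers permanently visible in scrolling HTML frames (GH 29072)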

DataFrame constructor honors `copy=False` with dict#

When passing a dictionary to DataFrame with copy=False, a copy will no longer be made (GH 32960).

In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]: 
   A  B
0  1  1
1  2  2
2  3  3

df["A"] remains a view on arr:

In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0

The default behavior when not passing copy remains unchanged, i.e. a copy will be made.

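PyArrow backed string data type#

We've enhanced the StringDtype, an extension type dedicated to string data. (GH 39908)

It is now possible to specify a storage keyword option to StringDtype. Use pandas options or specify the dtype using dtype='string[pyarrow]' to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

Warning

string[pyarrow] is currently considered experimental. The implementation and parts of the API may change without warning.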

In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]: 
0     abc
1    <NA>
2     def
dtype: string

You can use the alias "string[pyarrow]" as well.

In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [8]: s
Out[8]: 
0     abc
1    <NA>
2     def
dtype: string

You can also create a PyArrow backed string array using pandas options.

In [9]: with pd.option_context("string_storage", "pyarrow"):
   ...:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ...: 

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [11]: s.str.upper()
Out[11]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [12]: s.str.split('b', expand=True).dtypes
Out[12]: 
0    string[pyarrow]
1    string[pyarrow]
dtype: object

String accessor methods returning integers will return a value with Int64Dtype.

In [13]: s.str.count("a")
Out[13]: 
0       1
1    <NA>
2       0
dtype: Int64

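Centered datetime-like rolling windows#

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH 38780). For example: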

In [14]: df = pd.DataFrame(
   ....:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ....: )
   ....: 

In [15]: df
Out[15]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [16]: df.rolling("2D", center=True).mean()
Out[16]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0

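Other enhancements#

DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), and Series.expanding() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH 15095, GH 38995)
ExponentialMovingWindow now supports an online method that can perform mean calculations in an online fashion. See Window Overview (GH 41673)
Added MultiIndex.dtypes() (GH 37062)
Added end and end_day options for the origin argument in DataFrame.resample() (GH 37804)
Improved error message when usecols and names do not match for read_csv() and engine="c" (GH 29042)
Improved consistency of error messages when passing an invalid win_type argument in Window methods (GH 15969)
read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH 10285)
read_csv() now raises ParserWarning if the length of the header or given names does not match the length of the data when usecols is not specified (GH 21768)
Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH 35076)
to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH 33013)
Added support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH 20421)
read_excel() can now auto-detect .xlsb files and older .xls files (GH 35416, GH 41225)
ExcelWriter now accepts an if_sheet_exists parameter to control the behavior of append mode when writing to existing sheets (GH 40230)
Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH 38895, GH 41267)
DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH 39116)
DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH 39116)
DataFrame.applymap() can now accept kwargs to pass on to the user-provided func (GH 39987)
Passing a DataFrame indexer to iloc is now disallowed for Series.__getitem__() and DataFrame.__getitem__() (GH 39004)
Series.apply() can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH 39140)
DataFrame.plot.scatter() can now accept a categorical column for the argument c (GH 12380, GH 31357)
Series.loc() now raises a helpful error message when the Series has a MultiIndex and the indexer has too many dimensions (GH 35349)
read_stata() now supports reading data from compressed files (GH 26599)
Added support for parsing ISO 8601-like timestamps with negative signs to Timedelta (GH 37172)
Added support for unary operators in FloatingArray (GH 38749)
RangeIndex can now be constructed by passing a range object directly, e.g. pd.RangeIndex(range(3)) (GH 12067)
Series.round() and DataFrame.round() now work with nullable integer and floating dtypes (GH 38844)
read_csv() and read_json() expose the argument encoding_errors to control how encoding errors are handled (GH 39450)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() use Kleene logic with nullable data types (GH 37506)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() return a BooleanDtype for columns with nullable data types (GH 33449)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() raising with object data containing pd.NA even when skipna=True (GH 37501)
DataFrameGroupBy.rank() and SeriesGroupBy.rank() now support object-dtype data (GH 38278)
Constructing a DataFrame or Series with the data argument being a Python iterable that is **not** a NumPy ndarray consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH 40908)
Added the keyword sort to pivot_table() to allow non-sorting of the result (GH 39143)
Added the keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH 41325)
Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH 41526)
Improved error message in corr and cov methods on Rolling, Expanding, and ExponentialMovingWindow when other is not a DataFrame or Series (GH 41741)
Series.between() can now accept left or right as arguments to inclusive to include only the left or right boundary (GH 40245)
DataFrame.explode() now supports exploding multiple columns. Its column argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (GH 39240)
DataFrame.sample() now accepts the ignore_index argument to reset the index after sampling, similar to DataFrame.drop_duplicates() and DataFrame.sort_values() (GH 38581)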
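Several of the smaller enhancements listed above are easiest to see side by side. The following is an illustrative sketch only; the frames and values are hypothetical.

```python
import pandas as pd

# RangeIndex built directly from a range object.
idx = pd.RangeIndex(range(3))

# Series.between with a one-sided boundary: 2 <= x < 4.
ser = pd.Series([1, 2, 3, 4])
left_closed = ser.between(2, 4, inclusive="left")

# DataFrame.explode over several list-like columns at once; the lists in
# each row must have matching lengths across the exploded columns.
df = pd.DataFrame({"A": [[1, 2], [3]], "B": [["x", "y"], ["z"]], "C": [10, 20]})
exploded = df.explode(["A", "B"])

# DataFrame.sample with ignore_index=True relabels the result 0..n-1.
sampled = df.sample(n=2, random_state=0, ignore_index=True)
```

Note that explode() keeps the original row labels, so the exploded frame has a repeated index unless it is reset afterwards.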

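Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`Categorical.unique` now always maintains same dtype as original#

Previously, when calling Categorical.unique() with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original (GH 18291)

As an example of this, given: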

In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [19]: original = pd.Series(cat)

In [20]: unique = original.unique()

Previous behavior:

In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False

New behavior:

In [21]: unique
Out[21]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [22]: original.dtype == unique.dtype
Out[22]: True

Preserve dtypes in `DataFrame.combine_first()`#

DataFrame.combine_first() will now preserve dtypes (GH 7509)

In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [24]: df1
Out[24]: 
   A  B
0  1  1
1  2  2
2  3  3

In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [26]: df2
Out[26]: 
   B  C
2  4  1
3  5  2
4  6  3

In [27]: combined = df1.combine_first(df2)

Previous behavior:

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

New behavior:

In [28]: combined.dtypes
Out[28]: 
A    float64
B      int64
C    float64
dtype: object

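Groupby methods agg and transform no longer change return dtype for callables#

Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the result dtype when the argument func is callable, possibly leading to undesirable results (GH 21240). The cast would occur if the result is numeric and casting back to the input dtype does not change any values as measured by np.allclose. Now no such casting occurs.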

In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [30]: df
Out[30]: 
   key      a     b
0    1   True  True
1    1  False  True

Previous behavior:

In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2

New behavior:

In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]: 
     a  b
key      
1    1  2

`float` result for `DataFrameGroupBy.mean()`, `DataFrameGroupBy.median()`, and `DataFrameGroupBy.var()`, `SeriesGroupBy.mean()`, `SeriesGroupBy.median()`, and `SeriesGroupBy.var()`#

Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (GH 41137)

In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})

Previous behavior:

In [5]: df.groupby(df.index).mean()
Out[5]:
        a  b    c
0    True  1  1.0

New behavior:

In [33]: df.groupby(df.index).mean()
Out[33]: 
     a    b    c
0  1.0  1.0  1.0

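Try operating inplace when setting values with `loc` and `iloc`#

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.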

In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [35]: values = df.values

In [36]: new = np.array([5, 6, 7], dtype="int64")

In [37]: df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, df continues to share data with values.

New behavior:

In [38]: df.dtypes
Out[38]: 
A    float64
dtype: object

In [39]: np.shares_memory(df["A"], new)
Out[39]: False

In [40]: np.shares_memory(df["A"], values)
Out[40]: True

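Never operate inplace when setting `frame[keys] = values`#

When setting multiple columns using frame[keys] = values, new arrays will replace the pre-existing arrays for these keys, which will **not** be overwritten (GH 39510). As a result, the columns will retain the dtype(s) of values, never casting to the dtypes of the existing arrays.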

In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [42]: df[["A"]] = 5

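In the old behavior, 5 was cast to float64 and inserted into the existing array backing df.

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    float64

In the new behavior, we get a new array, and retain an integer-dtyped 5.

New behavior: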

In [43]: df.dtypes
Out[43]: 
A    int64
dtype: object

Consistent casting when setting into Boolean Series#

Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH 38709)

In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0

Previous behavior:

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

New behavior:

In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return grouped-by column in values#

The group-by column will now be dropped from the result of a groupby.rolling operation (GH 32262)

In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [45]: df
Out[45]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3

Previous behavior:

In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN

New behavior:

In [46]: df.groupby("A").rolling(2).sum()
Out[46]: 
       B
A       
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN

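Removed artificial truncation in rolling variance and standard deviation#

Rolling.std() and Rolling.var() will no longer artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to zero (GH 37051, GH 40448, GH 39872).

However, floating point artifacts may now exist in the results when rolling over larger values.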

In [47]: s = pd.Series([7, 5, 5, 5])

In [48]: s.rolling(3).var()
Out[48]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64

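DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result#

DataFrameGroupBy.rolling() and SeriesGroupBy.rolling() will no longer drop levels of a DataFrame with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH 38787, GH 38523).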

In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [51]: df
Out[51]: 
               a  b
label1 label2      
idx1   idx2    1  2

Previous behavior:

In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0

New behavior:

In [52]: df.groupby('label1').rolling(1).sum()
Out[52]: 
                        a    b
label1 label1 label2          
idx1   idx1   idx2    1.0  2.0

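Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed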
numpy	1.17.3	X	X
pytz	2017.3	X
python-dateutil	2.7.3	X
bottleneck	1.2.1
numexpr	2.7.0		X
pytest (dev)	6.0		X
mypy (dev)	0.812		X
setuptools	38.6.0		X

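For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed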
beautifulsoup4	4.6.0
fastparquet	0.4.0	X
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.3.0
matplotlib	2.2.3
numba	0.46.0
openpyxl	3.0.0	X
pyarrow	0.17.0	X
pymysql	0.8.1	X
pytables	3.5.1
s3fs	0.4.0
scipy	1.2.0
sqlalchemy	1.3.0	X
tabulate	0.8.7	X
xarray	0.12.0
xlrd	1.2.0
xlsxwriter	1.0.2
xlwt	1.3.0
pandas-gbq	0.12.0

See Dependencies and Optional dependencies for more.

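Other API changes#

Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects (GH 38516)
Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raise an AttributeError. Previously a NotImplementedError was raised (GH 38782)
Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future "SQL engines". Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported, such as turbodbc (GH 36893)
Removed redundant freq from PeriodIndex string representation (GH 41653)
ExtensionDtype.construct_array_type() is now a required method instead of an optional one for ExtensionDtype subclasses (GH 24860)
Calling hash on non-hashable pandas objects will now raise TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH 40013)
Styler.from_custom_template() now has two new arguments for template names, and removed the old name argument, due to template inheritance having been introduced for better parsing (GH 42053). Subclassing modifications to Styler attributes are also needed.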
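The hashing change can be checked directly. A minimal sketch, using a hypothetical Series:

```python
from collections.abc import Hashable

import pandas as pd

ser = pd.Series([1, 2, 3])

# hash() on a Series raises the built-in TypeError message.
try:
    hash(ser)
    message = None
except TypeError as err:
    message = str(err)

# The abstract base class check now agrees that a Series is not hashable.
is_hashable = isinstance(ser, Hashable)
```

Code that previously matched on the custom error text should match on the exception type instead.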

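Build#

Documentation in `.pptx` and `.pdf` formats is no longer included in wheels or source distributions. (GH 30741)

Deprecations#

Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations#

When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with numeric_only=None (the default), columns where the reduction raises a TypeError are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: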

In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04

Old behavior:

In [3]: df.prod()
Out[3]:
A    24
dtype: int64

Future behavior:

In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64

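Similarly, when applying a function to DataFrameGroupBy, columns on which the function raises TypeError are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: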

In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2])

Old behavior:

In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1   2
2  12

Future behavior:

In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
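One way to prepare for the future behavior is to select the valid columns by dtype rather than by name. The sketch below assumes that "valid" means numeric for the reduction at hand, which is an illustrative choice and not a rule stated in these notes.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Select the numeric columns explicitly, then reduce.
numeric = df.select_dtypes(include="number")
total = numeric.prod()

# Same idea for a groupby: pick the columns first, then aggregate.
grouped = df.groupby([1, 1, 2, 2])[["A"]].prod()
```

Selecting up front makes the dropped columns visible in the code instead of relying on pandas to discard them silently.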

Other Deprecations#

Deprecated allowing scalars to be passed to the Categorical constructor (GH 38433)
Deprecated constructing CategoricalIndex without passing list-like data (GH 38944)
Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH 14093, GH 21311, GH 22315, GH 26974)
Deprecated the astype() method of datetimelike (timedelta64[ns], datetime64[ns], DatetimeTZDtype, PeriodDtype) to convert to integer dtypes, use values.view(...) instead. This deprecation was later reverted in pandas 1.4.0.
Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH 32259)
Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH 38836)
Deprecated comparison of Timestamp objects with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH 36131)
Deprecated Rolling.win_type returning "freq" (GH 38963)
Deprecated Rolling.is_datetimelike (GH 38963)
Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH 39004)
Deprecated ExponentialMovingWindow.vol() (GH 39220)
Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH 38622)
Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH 39767)
Deprecated Styler.set_na_rep() and Styler.set_precision() in favor of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH 40134, GH 40425)
Deprecated Styler.where() in favor of using an alternative formulation with Styler.applymap() (GH 40821)
Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH 40211)
Deprecated arguments error_bad_lines and warn_bad_lines in read_csv() and read_table() in favor of argument on_bad_lines (GH 15122)
Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH 40363)
Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a different number of levels (GH 34862)
Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)
Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH 39983)
Deprecated the inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories(); it will be removed in a future version (GH 37643)
Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH 22818)
Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH 40606)
Deprecated the convert_float optional argument in read_excel() and ExcelFile.parse() (GH 41127)
Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH 39328)
Deprecated using usecols with out of bounds indices for read_csv() with engine="c" (GH 25623)
Deprecated special treatment of lists with first element a Categorical in the DataFrame constructor; pass as pd.DataFrame({col: categorical, ...}) instead (GH 38845)
Deprecated behavior of the DataFrame constructor when a dtype is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored (GH 24435)
Deprecated the Timestamp.freq attribute. For the properties that use it (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end), when you have a freq, use e.g. freq.is_month_start(ts) (GH 15146)
Deprecated construction of Series or DataFrame with DatetimeTZDtype data and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead (GH 41555, GH 33401)
Deprecated behavior of Series construction with large-integer values and small-integer dtype silently overflowing; use Series(data).astype(dtype) instead (GH 41734)
Deprecated behavior of DataFrame construction with floating data and integer dtype casting even when lossy; in a future version this will remain floating, matching Series behavior (GH 41770)
Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype dtypes in Series construction when data containing strings is passed and no dtype is passed (GH 33558)
In a future version, constructing Series or DataFrame with datetime64[ns] data and DatetimeTZDtype will treat the data as wall-times instead of as UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or pd.Series(data.view("int64"), dtype=dtype) (GH 33401)
Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH 41760)
Deprecated boolean arguments of inclusive in Series.between() to have {"left", "right", "neither", "both"} as standard argument values (GH 40628)
Deprecated passing arguments as positional for all of the following, with exceptions noted (GH 41485):
- concat() (other than objs)
- read_csv() (other than filepath_or_buffer)
- read_table() (other than filepath_or_buffer)
- DataFrame.clip() and Series.clip() (other than upper and lower)
- DataFrame.drop_duplicates() (except for subset), Series.drop_duplicates(), Index.drop_duplicates() and MultiIndex.drop_duplicates()
- DataFrame.drop() (other than labels) and Series.drop()
- DataFrame.dropna() and Series.dropna()
- DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), and Series.bfill()
- DataFrame.fillna() and Series.fillna() (apart from value)
- DataFrame.interpolate() and Series.interpolate() (other than method)
- DataFrame.mask() and Series.mask() (other than cond and other)
- DataFrame.reset_index() (other than level) and Series.reset_index()
- DataFrame.set_axis() and Series.set_axis() (other than labels)
- DataFrame.set_index() (other than keys)
- DataFrame.sort_index() and Series.sort_index()
- DataFrame.sort_values() (other than by) and Series.sort_values()
- DataFrame.where() and Series.where() (other than cond and other)
- Index.set_names() and MultiIndex.set_names() (except for names)
- MultiIndex.set_codes() (except for `codes`)
- MultiIndex.set_levels() (except for `levels`)
- Resampler.interpolate() (other than `method`)
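A few of the replacements suggested in the list above, gathered into one migration sketch. The data are hypothetical, and each line shows only the recommended form, not the deprecated one.

```python
import datetime
from io import StringIO

import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "a"], "v": [1, 2, 3]})

# Keyword arguments instead of positional ones.
dropped = df.drop(columns="v")
indexed = df.set_index("k", drop=False)
combined = pd.concat([df, df], ignore_index=True)

# groupby(level=...) instead of the `level` keyword on reductions.
ser = df.set_index("k")["v"]
per_key = ser.groupby(level=0).sum()

# on_bad_lines instead of error_bad_lines / warn_bad_lines:
# the row "3,4,5" has too many fields and is skipped.
csv = StringIO("a,b\n1,2\n3,4,5\n6,7\n")
parsed = pd.read_csv(csv, on_bad_lines="skip")

# Convert a datetime.date explicitly before comparing with a Timestamp.
ts = pd.Timestamp("2021-07-02")
is_on_or_before = ts <= pd.Timestamp(datetime.date(2021, 7, 3))
```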

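Performance improvements#

Performance improvement in IntervalIndex.isin() (GH 38353)
Performance improvement in Series.mean() for nullable data types (GH 34814)
Performance improvement in Series.isin() for nullable data types (GH 38340)
Performance improvement in DataFrame.fillna() with `method="pad"` or `method="backfill"` for nullable floating and nullable integer dtypes (GH 39953)
Performance improvement in DataFrame.corr() for `method=kendall` (GH 28329)
Performance improvement in DataFrame.corr() for `method=spearman` (GH 40956, GH 41885)
Performance improvement in Rolling.corr() and Rolling.cov() (GH 39388)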
Performance improvement in RollingGroupby.corr(), RollingGroupby.cov(), ExpandingGroupby.corr() and ExpandingGroupby.cov() (GH 39591)
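Performance improvement in unique() for object data type (GH 37615)
Performance improvement in json_normalize() for basic cases (including separators) (GH 40035, GH 15621)
Performance improvement in ExpandingGroupby aggregation methods (GH 39664)
Performance improvement in Styler where render times are more than 50% reduced and now match DataFrame.to_html() (GH 39972, GH 39952, GH 40425)
The method Styler.set_td_classes() is now as performant as Styler.apply() and Styler.applymap(), and even more so in some cases (GH 40453)
Performance improvement in ExponentialMovingWindow.mean() with `times` (GH 39784)
Performance improvement in DataFrameGroupBy.apply() and SeriesGroupBy.apply() when requiring the Python fallback implementation (GH 40176)
Performance improvement in the conversion of a PyArrow Boolean array to a pandas nullable Boolean array (GH 41051)
Performance improvement for concatenation of data with type `CategoricalDtype` (GH 40193)
Performance improvement in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() with nullable data types (GH 37493)
Performance improvement in Series.nunique() with nan values (GH 40865)
Performance improvement in DataFrame.transpose(), Series.unstack() with `DatetimeTZDtype` (GH 40149)
Performance improvement in Series.plot() and DataFrame.plot() with entry point lazy loading (GH 41492)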

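Bug fixes#

Categorical#

Bug in CategoricalIndex incorrectly failing to raise `TypeError` when scalar data is passed (GH 38614)
Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but its values were all labels in the category (GH 28690)
Bug where constructing a Categorical from an object-dtype array of `date` objects did not round-trip correctly with `astype` (GH 38552)
Bug in constructing a DataFrame from an `ndarray` and a CategoricalDtype (GH 38857)
Bug in setting categorical values into an object-dtype column in a DataFrame (GH 39136)
Bug in DataFrame.reindex() raising an `IndexError` when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)
Bug in Categorical.fillna() with a tuple-like category raising `NotImplementedError` instead of `ValueError` when filling with a non-category tuple (GH 41914)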

Datetimelike#

Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) `data`, with `dtype=datetime64[ns]` (resp. `timedelta64[ns]`) (GH 38032)
Bug in DataFrame.first() and Series.first() with an offset of one month returning an incorrect result when the first day is the last day of a month (GH 29623)
Bug in constructing a DataFrame or Series with mismatched `datetime64` data and `timedelta64` dtype, or vice-versa, failing to raise a `TypeError` (GH 38575, GH 38764, GH 38792)
Bug in constructing a Series or DataFrame with a `datetime` object out of bounds for `datetime64[ns]` dtype or a `timedelta` object out of bounds for `timedelta64[ns]` dtype (GH 38792, GH 38965)
Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38741)
Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with `n != 1` (GH 42104)
Bug in Series.where() incorrectly casting `datetime64` values to `int64` (GH 37682)
Bug in Categorical incorrectly typecasting `datetime` object to Timestamp (GH 38878)
Bug in comparisons between Timestamp object and `datetime64` objects just outside the implementation bounds for nanosecond `datetime64` (GH 39221)
Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH 39244)
Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH 38964)
Bug in date_range() incorrectly creating a DatetimeIndex containing `NaT` instead of raising `OutOfBoundsDatetime` in corner cases (GH 24124)
Bug in infer_freq() incorrectly failing to infer 'H' frequency of a DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH 39556)
Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's `freq` to `None` (GH 41425)

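Timedelta#

Bug in constructing Timedelta from `np.timedelta64` objects with non-nanosecond units that are out of bounds for `timedelta64[ns]` (GH 38965)
Bug in constructing a TimedeltaIndex incorrectly accepting `np.datetime64("NaT")` objects (GH 39462)
Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH 39710)
Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond `timedelta64` arrays that overflow when converting to `timedelta64[ns]` (GH 40008)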

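Timezones#

Bug in different `tzinfo` objects representing UTC not being treated as equivalent (GH 39216)
Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH 39276)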

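Numeric#

Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH 38351)
Bug in DataFrame.sort_values() raising an `IndexError` for empty `by` (GH 40258)
Bug in DataFrame.select_dtypes() with `include=np.number` dropping numeric `ExtensionDtype` columns (GH 35340)
Bug in DataFrame.mode() and Series.mode() not keeping a consistent integer Index for empty input (GH 33321)
Bug in DataFrame.rank() when the DataFrame contained `np.inf` (GH 32593)
Bug in DataFrame.rank() with `axis=0` and columns holding incomparable types raising an `IndexError` (GH 38932)
Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank(), and SeriesGroupBy.rank() treating the most negative `int64` value as missing (GH 32859)
Bug in DataFrame.select_dtypes() behaving differently between Windows and Linux with `include="int"` (GH 36596)
Bug in DataFrame.apply() and DataFrame.agg() when passed the argument `func="size"` operating on the entire DataFrame instead of rows or columns (GH 39934)
Bug in DataFrame.transform() raising a `SpecificationError` when passed a dictionary and columns were missing; will now raise a `KeyError` instead (GH 40004)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results with `pct=True` and equal values between consecutive groups (GH 40518)
Bug in Series.count() resulting in an `int32` result on 32-bit platforms when argument `level=None` (GH 40908)
Bug in Series and DataFrame reductions with methods `any` and `all` not returning Boolean results for object data (GH 12863, GH 35450, GH 27709)
Bug in Series.clip() failing if the Series contains NA values and has nullable int or float as a data type (GH 40851)
Bug in UInt64Index.where() and UInt64Index.putmask() with an `np.int64` dtype `other` incorrectly raising `TypeError` (GH 41974)
Bug in DataFrame.agg() not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results (GH 33634)
Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)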

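Conversion#

Series.to_dict() with `orient='records'` now returns Python native types (GH 25969)
Bug in Series.view() and Index.view() when converting between datetime-like (`datetime64[ns]`, `datetime64[ns, tz]`, `timedelta64`, `period`) dtypes (GH 39788)
Bug in creating a DataFrame from an empty `np.recarray` not retaining the original dtypes (GH 40121)
Bug in DataFrame failing to raise a `TypeError` when constructing from a `frozenset` (GH 40163)
Bug in Index construction silently ignoring a passed `dtype` when the data cannot be cast to that dtype (GH 21311)
Bug in StringArray.astype() falling back to NumPy and raising when converting to `dtype='categorical'` (GH 40450)
Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)
Bug in DataFrame construction with a dictionary containing an array-like with `ExtensionDtype` and `copy=True` failing to make a copy (GH 38939)
Bug in qcut() raising an error when taking `Float64Dtype` as input (GH 40730)
Bug in DataFrame and Series construction with `datetime64[ns]` data and `dtype=object` resulting in `datetime` objects instead of Timestamp objects (GH 41599)
Bug in DataFrame and Series construction with `timedelta64[ns]` data and `dtype=object` resulting in `np.timedelta64` objects instead of Timedelta objects (GH 41599)
Bug in DataFrame construction when given a two-dimensional object-dtype `np.ndarray` of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)
Bug in constructing a Series from a list and a PandasDtype (GH 39357)
Bug in creating a Series from a `range` object that does not fit in the bounds of `int64` dtype (GH 30173)
Bug in creating a Series from a `dict` with all-tuple keys and an Index that requires reindexing (GH 41707)
Bug in infer_dtype() not recognizing Series, Index, or array with a Period dtype (GH 23553)
Bug in infer_dtype() raising an error for general ExtensionArray objects. It will now return `"unknown-array"` instead of raising (GH 37367)
Bug in DataFrame.convert_dtypes() incorrectly raising a `ValueError` when called on an empty DataFrame (GH 40393)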

Strings#

Bug in the conversion from `pyarrow.ChunkedArray` to StringArray when the original had zero chunks (GH 41040)
Bug in Series.replace() and DataFrame.replace() ignoring replacements with `regex=True` for `StringDtype` data (GH 41333, GH 35977)
Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)
Bug in Series.str.replace() where the `case` argument was ignored when `regex=False` (GH 41602)

Interval#

Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38653, GH 38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates which are present in the other (GH 38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a `TypeError` when operating with another IntervalIndex with incompatible dtype (GH 39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising `IncompatibleFrequency` when operating with another PeriodIndex with incompatible dtype (GH 39306)
Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA values are present (GH 41831)

索引#

Index.union() 和 MultiIndex.union() 的 Bug，当 Index 不是单调的或 `sort` 设置为 `False` 时，会丢弃重复的 Index 值（GH 36289, GH 31326, GH 40862）
CategoricalIndex.get_indexer() 的 Bug，在非唯一时未能引发 `InvalidIndexError`（GH 38372）
IntervalIndex.get_indexer() 的 Bug，当 `target` 具有 `CategoricalDtype` 且索引和目标都包含 NA 值时（GH 41934）
Series.loc() 的 Bug，当输入使用布尔列表过滤且要设置的值是低维列表时，引发 `ValueError`（GH 20438）
在 DataFrame 中插入许多新列导致后续索引行为不正确的 Bug（GH 38380）
DataFrame.__setitem__() 的 Bug，当为重复列设置多个值时引发 `ValueError`（GH 15695）
DataFrame.loc(), Series.loc(), DataFrame.__getitem__() 和 Series.__getitem__() 的 Bug，对于字符串切片，当 DatetimeIndex 非单调时返回不正确的元素（GH 33146）
DataFrame.reindex() 和 Series.reindex() 的 Bug，带时区感知索引时，当 `method="ffill"` 和 `method="bfill"` 且指定 `tolerance` 时引发 `TypeError`（GH 38566）
DataFrame.reindex() 的 Bug，当 `datetime64[ns]` 或 `timedelta64[ns]` 的 `fill_value` 需要转换为 object dtype 时，错误地转换为整数（GH 39755）
DataFrame.__setitem__() 的 Bug，当使用指定列和非空 DataFrame 值设置空 DataFrame 时引发 `ValueError`（GH 38831）
DataFrame.loc.__setitem__() 的 Bug，当 DataFrame 存在重复列时，对唯一列进行操作会引发 `ValueError`（GH 38521）
DataFrame.iloc.__setitem__() 和 DataFrame.loc.__setitem__() 的 Bug，当使用字典值进行设置时，对于混合 dtypes 的处理（GH 38335）
Series.loc.__setitem__() 和 DataFrame.loc.__setitem__() 的 Bug，当提供布尔生成器时引发 `KeyError`（GH 39614）
Series.iloc() 和 DataFrame.iloc() 的 Bug，当提供生成器时引发 `KeyError`（GH 39614）
DataFrame.__setitem__() 的 Bug，当右侧是一个列数错误的 DataFrame 时，未引发 `ValueError`（GH 38604）
Series.__setitem__() 的 Bug，当使用标量索引器设置 Series 时引发 `ValueError`（GH 38303）
DataFrame.loc() 的 Bug，当作为输入的 DataFrame 只有一行时，会丢弃 MultiIndex 的级别（GH 10521）
DataFrame.__getitem__() 和 Series.__getitem__() 的 Bug，当使用具有毫秒的 Index 的现有字符串进行切片时，总是引发 `KeyError`（GH 33589）
在数值 Series 中设置 `timedelta64` 或 `datetime64` 值时，未能转换为 object dtype 的 Bug（GH 39086, GH 39619）
在 Series 或 DataFrame 中设置 Interval 值时，当 IntervalDtype 不匹配时，错误地将新值转换为现有 dtype 的 Bug（GH 39120）
在具有整数 dtype 的 Series 中设置 `datetime64` 值时，错误地将 `datetime64` 值转换为整数的 Bug（GH 39266）
在具有 Datetime64TZDtype 的 Series 中设置 `np.datetime64("NaT")` 时，错误地将无时区的值视为有时区的值的 Bug（GH 39769）
Index.get_loc() 的 Bug，当 `key=NaN` 且指定 `method` 但 `NaN` 不在 Index 中时，未引发 `KeyError`（GH 39382）
DatetimeIndex.insert() 的 Bug，将 `np.datetime64("NaT")` 插入到时区感知索引时，错误地将无时区的值视为有时区的值（GH 39769）
在 Index.insert() 中，当设置的新列无法保留在现有 `frame.columns` 中时，或在 Series.reset_index() 或 DataFrame.reset_index() 中，错误地引发错误，而不是转换为兼容的 dtype 的 Bug（GH 39068）
RangeIndex.append() 的 Bug，其中长度为 1 的单个对象被错误地连接（GH 39401）
RangeIndex.astype() 的 Bug，当转换为 CategoricalIndex 时，类别变成了 Int64Index 而非 RangeIndex（GH 41263）
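Bug in setting `numpy.timedelta64` values into an object-dtype Series using a Boolean indexer (GH 39488)
Bug in setting numeric values into a Boolean-dtype Series using `at` or `iat` failing to cast to object-dtype (GH 39582)
Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising `ValueError` when trying to index with a row slice and setting a list as values (GH 40440)
Bug in DataFrame.loc() not raising `KeyError` when the key was not found in a MultiIndex and the levels were not fully specified (GH 41170)
Bug in DataFrame.loc.__setitem__() when setting with expansion incorrectly raising when the index in the expanding axis contained duplicates (GH 40096)
Bug in DataFrame.loc.__getitem__() with a MultiIndex casting to float when at least one index column has float dtype and a scalar is retrieved (GH 41369)
Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH 20432)
Bug when indexing on a Series or DataFrame with a CategoricalIndex using `np.nan` incorrectly raising `KeyError` when `np.nan` keys are present (GH 41933)
Bug in Series.__delitem__() with `ExtensionDtype` incorrectly casting to `ndarray` (GH 40386)
Bug in DataFrame.at() with a CategoricalIndex returning incorrect results when passed integer keys (GH 41846)
Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH 40978)
Bug in DataFrame.__setitem__() raising `TypeError` when using a `str` subclass as the column name with a DatetimeIndex (GH 37366)
Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH 41670)
Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases, and wrapping around to positive integers in others (GH 41777)
Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit, or tolerance arguments (GH 41918)
Bug when slicing a Series or DataFrame with a TimedeltaIndex where passing an invalid string raised ValueError instead of TypeError (GH 41821)
Bug in the Index constructor sometimes silently ignoring a specified dtype (GH 38879)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)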
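
The equivalence stated in the last entry can be checked with a small sketch. The index values, the mask, and the replacement value -1 are made up for illustration; a pandas version of at least 1.3 is assumed.

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

# where() keeps the entries where the mask is True and replaces the others.
kept_by_where = index.where(mask, -1)

# putmask() replaces the entries where its mask is True, so the mask is inverted.
kept_by_putmask = index.putmask(~mask, -1)

print(kept_by_where.equals(kept_by_putmask))
```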

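Missing#

Bug in Grouper where the dropna argument was not propagated correctly; DataFrameGroupBy.transform() now handles missing values correctly for dropna=True (GH 35612)
Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)
Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH 40809)
Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH 40935)
Bug in DataFrame construction with float data containing NaN and an integer dtype casting instead of retaining the NaN values (GH 26919)
Bug in Series.isin() and MultiIndex.isin() not treating all NaN values as equivalent when they were inside tuples (GH 41836)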
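
As a brief illustration of the Decimal("NaN") entry above, the following sketch assumes pandas 1.3 or later; the sample values are invented for the example.

```python
from decimal import Decimal

import pandas as pd

# An object-dtype Series mixing a regular Decimal, a Decimal NaN, and None.
values = pd.Series([Decimal("1.5"), Decimal("NaN"), None], dtype=object)

# Element-wise detection on the Series.
missing_mask = values.isna()

# The top-level function applied to a single scalar.
scalar_is_missing = pd.isna(Decimal("NaN"))

print(missing_mask.tolist(), scalar_is_missing)
```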

MultiIndex#

Bug in DataFrame.drop() raising a TypeError when the MultiIndex is non-unique and level is not provided (GH 36293)
Bug in MultiIndex.intersection() duplicating NaN in the result (GH 38623)
Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN, even when the entries were ordered differently (GH 38439)
Bug in MultiIndex.intersection() always returning an empty result when intersecting with a CategoricalIndex (GH 38653)
Bug in MultiIndex.difference() incorrectly raising TypeError when the indexes contain non-sortable entries (GH 41915)
Bug in MultiIndex.reindex() raising a ValueError when used on an empty MultiIndex and indexing only a specific level (GH 41170)
Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH 41707)
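
A minimal sketch of the MultiIndex.equals() behavior described above, assuming pandas 1.3 or later. The tuples are illustrative only: both indexes hold a NaN in the first level, but at different positions.

```python
import numpy as np
import pandas as pd

first = pd.MultiIndex.from_tuples([(1.0, "a"), (np.nan, "b")])
second = pd.MultiIndex.from_tuples([(np.nan, "a"), (1.0, "b")])

# A copy has the same entries in the same order.
same_order = first.equals(first.copy())

# The NaN sits in a different row, so the indexes should not compare equal.
different_order = first.equals(second)

print(same_order, different_order)
```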

I/O#

Bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)
Bug in read_csv() not recognizing scientific notation if the decimal argument is set and engine="python" (GH 31920)
Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string; fixed for engine="python" (GH 34002)
Bug in read_csv() raising an IndexError with multiple header columns and index_col specified when the file has no data rows (GH 38292)
Bug in read_csv() not accepting usecols with a different length than names for engine="python" (GH 16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH 35873)
Bug in read_csv() raising a TypeError when names and parse_dates are specified for engine="c" (GH 33699)
Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH 38527)
Allow custom error values for the parse_dates argument of read_sql(), read_sql_query(), and read_sql_table() (GH 35185)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a KeyError when applied to subclasses of DataFrame or Series (GH 33748)
Bug in HDFStore.put() raising the wrong TypeError when saving a DataFrame with a non-string dtype (GH 34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH 35923)
Bug in read_csv() applying the thousands separator to date columns when the column should be parsed for dates and usecols is specified for engine="python" (GH 39365)
Bug in read_excel() forward-filling MultiIndex names when multiple header and index columns are specified (GH 34673)
Bug in read_excel() not respecting set_option() (GH 34252)
Bug in read_csv() not switching true_values and false_values for nullable Boolean dtype (GH 34655)
Bug in read_json() when orient="split" not maintaining a numeric string index (GH 28556)
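read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame (GH 34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH 39189)
Bug in read_sas() raising a ValueError when datetimes were null (GH 39725)
Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)
Bug in read_excel() loading trailing empty rows/columns for some file types (GH 41167)
Bug in read_excel() raising an AttributeError when the Excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)
Bug in read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH 40442)
Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)
Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)
Bug in read_orc() always raising an AttributeError (GH 40918)
Bug in read_csv() and read_table() silently ignoring prefix if both names and prefix are defined; a ValueError is now raised (GH 39123)
Bug in read_csv() and read_excel() not respecting the dtype for a duplicated column name when mangle_dupe_cols is set to True (GH 35211)
Bug in read_csv() silently ignoring sep if both delimiter and sep are defined; a ValueError is now raised (GH 39823)
Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been called previously (GH 41069)
Bug in the conversion from PyArrow to pandas (e.g. when reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)
Bug in read_excel() raising an error when pandas could not determine the file type even though the user specified the engine argument (GH 41225)
Bug in read_clipboard() where copying from an Excel file shifted values into the wrong column if there were null values in the first column (GH 41108)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a TypeError when trying to append a string column to an incompatible column (GH 41897)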
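
The read_csv() entry about sep and delimiter can be illustrated with a short sketch, assuming pandas 1.3 or later. The CSV text is made up, and the exact wording of the error message is not relied upon.

```python
import io

import pandas as pd

csv_text = "a;b\n1;2\n"

# Passing both sep and delimiter is rejected instead of one being silently ignored.
try:
    pd.read_csv(io.StringIO(csv_text), sep=";", delimiter=";")
    conflict_error = None
except ValueError as exc:
    conflict_error = exc

# Passing only one of the two parses the text as usual.
frame = pd.read_csv(io.StringIO(csv_text), sep=";")

print(type(conflict_error).__name__, frame.shape)
```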

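Period#

Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons: they return False for equality checks, return True for not-equal checks, and raise TypeError for inequality checks (GH 39274)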
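
The equality half of this change can be sketched as follows, assuming pandas 1.3 or later. The dates and frequencies are arbitrary examples, and the ordering comparisons (such as `<`) are deliberately not exercised here.

```python
import pandas as pd

monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

# Scalars with mismatched frequencies compare as unequal rather than raising.
equal = monthly == daily
not_equal = monthly != daily

# The same applies element-wise to PeriodIndex objects of equal length.
monthly_index = pd.period_range("2021-01", periods=2, freq="M")
daily_index = pd.period_range("2021-01-01", periods=2, freq="D")
elementwise_equal = monthly_index == daily_index

print(equal, not_equal, elementwise_equal.tolist())
```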

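Plotting#

Bug in plotting.scatter_matrix() raising when a 2d ax argument was passed (GH 16253)
Prevent warnings when Matplotlib's constrained_layout is enabled (GH 25261)
Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others did not (GH 39522)
Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y while others used legend=False (GH 40044)
Bug in DataFrame.plot.box() where the caps or min/max markers of the plot were not visible when the dark_background theme was selected (GH 40769)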

Groupby/resample/rolling#

Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH 38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH 38672)
Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH 39172)
Bug in GroupBy.indices() containing non-existent indices when null values were present in the groupby keys (GH 9304)
Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing a loss of precision by now using Kahan summation (GH 38778)
Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean(), and SeriesGroupBy.mean() causing a loss of precision by using Kahan summation (GH 38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)
Bug in Series.resample() raising when the index was a PeriodIndex consisting of NaT (GH 39227)
Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing an other that was longer than each group (GH 39591)
Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing an other that was longer than each group (GH 39591)
Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median(), and DataFrame.pivot_table() not propagating metadata (GH 28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when the window is an offset and the dates are in descending order (GH 40002)
Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew, or when using them through apply, aggregate, or resample (GH 26411)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH 39732)
Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH 39927)
Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising a SpecificationError when passed a dictionary and columns were missing; a KeyError is now always raised instead (GH 40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH 39928)
Bug in ExponentialMovingWindow where calling __getitem__ would incorrectly raise a ValueError when providing times (GH 40164)
Bug in ExponentialMovingWindow where calling __getitem__ would not retain the com, span, alpha, or halflife attributes (GH 40164)
ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False, because the calculation was incorrect (GH 40098)
Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)
Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used in the case of multiple groups (GH 40951)
Bug in ExponentialMovingWindowGroupby where the times vector and the values became out of sync for non-trivial groups (GH 40951)
Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)
Bug in aggregation functions for DataFrame not respecting the numeric_only argument when the level keyword was given (GH 40660)
Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index caused an incorrect Index shape (GH 40014)
Bug in RollingGroupby where the as_index=False argument in groupby was ignored (GH 39433)
Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() raising a ValueError when used with nullable type columns holding NA, even with skipna=True (GH 40585)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH 41010)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() computing the wrong result with nullable data types too large to round-trip when casting to float (GH 37493)
Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)
Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)
Bug in SeriesGroupBy.agg() failing to retain an ordered CategoricalDtype on order-preserving aggregations (GH 41147)
Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max(), and SeriesGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH 41111)
Bug in DataFrameGroupBy.rank() with the GroupBy object's axis=0 and the rank method's keyword axis=1 (GH 41320)
Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of a DataFrameGroupBy (GH 41427)
Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising an AttributeError (GH 41427)
Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH 41445)
Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)
Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH 41291)
Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)
Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user-passed function (GH 41647)
Bug in the DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift, and ohlc dropping .columns.names (GH 41497)
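
Several entries above mention Kahan summation. The following is a plain-Python sketch of that technique for illustration only; it is not the pandas implementation, and the helper names `naive_sum` and `kahan_sum` as well as the input values are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; the difference
        # from `adjusted` is the rounding error of this step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 10_000
reference = math.fsum(values)  # correctly rounded sum from the standard library

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(naive_error, kahan_error)
```

In this input every small addend is lost by the naive loop, while the compensated loop accumulates them in the correction term and stays close to the reference.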

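Reshaping#

Bug in merge() raising an error when performing an inner join with a partial index and right_index=True when there was no overlap between the indices (GH 33814)
Bug in DataFrame.unstack() where missing levels led to incorrect index names (GH 37510)
Bug in merge_asof() propagating the right Index instead of the left Index with left_index=True and a right_on specification (GH 33463)
Bug in DataFrame.join() on a DataFrame with a MultiIndex returning the wrong result when one or both of the indexes had only one level (GH 36909)
merge_asof() now raises a ValueError instead of a cryptic TypeError in the case of non-numerical merge columns (GH 29130)
Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had Categorical dtype with categories that were not sorted alphabetically (GH 38502)
Series.value_counts() and Series.mode() now return consistent keys in original order (GH 12679, GH 11227, and GH 39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)
Bug in DataFrame.apply() giving incorrect results when the argument func was a string, axis=1, and the axis argument was not supported; a ValueError is now raised instead (GH 39211)
Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH 39454)
Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)
Bug in DataFrame.append() with a DataFrame with a MultiIndex when appending a Series whose Index is not a MultiIndex (GH 41707)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)
Index can now be passed to the numpy.all() function (GH 40180)
Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)
Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)
Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)
Bug in to_datetime() raising a ValueError when a Series contains None and NaT and has more than 50 elements (GH 39882)
Bug in Series.unstack() and DataFrame.unstack() with object-dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH 41875)
Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame has duplicate columns used as value_vars (GH 41951)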
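
The Series.explode() entry can be illustrated with a small sketch, assuming pandas 1.3 or later; the labels and values are invented for the example.

```python
import pandas as pd

# A Series of scalars: there is nothing to explode, but ignore_index=True
# should still reset the labels to 0..n-1.
scalars = pd.Series([1, 2, 3], index=["a", "b", "c"])
exploded_scalars = scalars.explode(ignore_index=True)

# A Series of lists, for comparison.
lists = pd.Series([[1, 2], [3]], index=["a", "b"])
exploded_lists = lists.explode(ignore_index=True)

print(exploded_scalars.index.tolist(), exploded_lists.index.tolist())
```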

Sparse#

Bug in DataFrame.sparse.to_coo() raising a KeyError when the columns are a numeric Index without a 0 (GH 18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH 34456)
Bug in SparseArray.max() and SparseArray.min() always returning an empty result (GH 40921)
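
A brief sketch of the SparseArray.max() and SparseArray.min() entry, assuming a pandas version that includes the fix; the array contents are arbitrary.

```python
import pandas as pd

# Integer data is stored sparsely with a fill value of 0 by default.
sparse = pd.arrays.SparseArray([0, 3, 0, -1])

largest = sparse.max()
smallest = sparse.min()

print(largest, smallest)
```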

ExtensionArray#

Bug in DataFrame.where() when other is a Series with an ExtensionDtype (GH 38729)
Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax(), and Series.argmin() would fail when the underlying data is an ExtensionArray (GH 32749, GH 33719, GH 36566)
Fixed bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH 40329)
Bug in DataFrame.mask() where masking a DataFrame with an ExtensionDtype raised a ValueError (GH 40941)
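
As an illustration of the idxmax/idxmin/argmax/argmin entry, the sketch below uses the nullable "Int64" dtype as one example of data backed by an ExtensionArray; the labels and values are made up, and pandas 1.3 or later is assumed.

```python
import pandas as pd

scores = pd.Series([1, 3, 2], index=["a", "b", "c"], dtype="Int64")

# Label-based results.
label_of_max = scores.idxmax()
label_of_min = scores.idxmin()

# Position-based results.
position_of_max = scores.argmax()
position_of_min = scores.argmin()

print(label_of_max, label_of_min, position_of_max, position_of_min)
```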

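Styler#

Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH 33562)
Styler rendered HTML output has seen minor alterations to support w3 good code standards (GH 39626)
Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH 39716)
Bug in Styler.background_gradient() where the text color was not determined correctly (GH 39888)
Bug in Styler.set_table_styles() where multiple elements in CSS selectors of the table_styles argument were not added correctly (GH 34061)
Bug in Styler where copying from Jupyter dropped the top-left cell and misaligned the headers (GH 12147)
Bug in Styler.where where kwargs were not passed to the applicable callable (GH 40845)
Bug in Styler causing CSS to duplicate on multiple renders (GH 39395, GH 40334)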

Other#

inspect.getmembers(Series) no longer raises an AbstractMethodError (GH 38782)
Bug in Series.where() with numeric dtype and other=None not casting to nan (GH 39761)
Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal(), and assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH 39461)
Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH 41263)
Bug in DataFrame.equals(), Series.equals(), and Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH 39650)
Bug in show_versions() where the console JSON output was not proper JSON (GH 39701)
pandas can now compile on z/OS when using xlc (GH 35826)
Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding, and categorize when the input object type is a DataFrame (GH 41404)
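
The Series.where() entry can be sketched as follows, assuming pandas 1.3 or later; the values and the threshold are chosen only for illustration.

```python
import pandas as pd

numbers = pd.Series([1.0, 2.0, 3.0])

# Entries failing the condition are replaced by `other`; with other=None
# on a numeric Series the replacement is expected to be NaN.
masked = numbers.where(numbers > 1.5, None)

print(masked.tolist(), masked.dtype)
```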

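Contributors#

A total of 251 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.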

Abhishek R +
Ada Draginda
Adam J. Stewart
Adam Turner +
Aidan Feldman +
Ajitesh Singh +
Akshat Jain +
Albert Villanova del Moral
Alexandre Prince-Levasseur +
Andrew Hawyrluk +
Andrew Wieteska
AnglinaBhambra +
Ankush Dua +
Anna Daglis
Ashlan Parker +
Ashwani +
Avinash Pancham
Ayushman Kumar +
BeanNan
Benoît Vinot
Bharat Raghunathan
Bijay Regmi +
Bobin Mathew +
Bogdan Pilyavets +
Brian Hulette +
Brian Sun +
Brock +
Bryan Cutler
Caleb +
Calvin Ho +
Chathura Widanage +
Chinmay Rane +
Chris Lynch
Chris Withers
Christos Petropoulos
Corentin Girard +
DaPy15 +
Damodara Puddu +
Daniel Hrisca
Daniel Saxton
DanielFEvans
Dare Adewumi +
Dave Willmer
David Schlachter +
David-dmh +
Deepang Raval +
Doris Lee +
Dr. Jan-Philip Gehrcke +
DriesS +
Dylan Percy
Erfan Nariman
Eric Leung
EricLeer +
Eve
Fangchen Li
Felix Divo
Florian Jetter
Fred Reiss
GFJ138 +
Gaurav Sheni +
Geoffrey B. Eisenbarth +
Gesa Stupperich +
Griffin Ansel +
Gustavo C. Maciel +
Heidi +
Henry +
Hung-Yi Wu +
Ian Ozsvald +
Irv Lustig
Isaac Chung +
Isaac Virshup
JHM Darbyshire (MBP) +
JHM Darbyshire (iMac) +
Jack Liu +
James Lamb +
Jeet Parekh
Jeff Reback
Jiezheng2018 +
Jody Klymak
Johan Kåhrström +
John McGuigan
Joris Van den Bossche
Jose
JoseNavy
Josh Dimarsky
Josh Friedlander
Joshua Klein +
Julia Signell
Julian Schnitzler +
Kaiqi Dong
Kasim Panjri +
Katie Smith +
Kelly +
Kenil +
Keppler, Kyle +
Kevin Sheppard
Khor Chean Wei +
Kiley Hewitt +
Larry Wong +
Lightyears +
Lucas Holtz +
Lucas Rodés-Guirao
Lucky Sivagurunathan +
Luis Pinto
Maciej Kos +
Marc Garcia
Marco Edward Gorelli +
Marco Gorelli
MarcoGorelli +
Mark Graham
Martin Dengler +
Martin Grigorov +
Marty Rudolf +
Matt Roeschke
Matthew Roeschke
Matthew Zeitlin
Max Bolingbroke
Maxim Ivanov
Maxim Kupfer +
Mayur +
MeeseeksMachine
Micael Jarniac
Michael Hsieh +
Michel de Ruiter +
Mike Roberts +
Miroslav Šedivý
Mohammad Jafar Mashhadi
Morisa Manzella +
Mortada Mehyar
Muktan +
Naveen Agrawal +
Noah
Nofar Mishraki +
Oleh Kozynets
Olga Matoula +
Oli +
Omar Afifi
Omer Ozarslan +
Owen Lamont +
Ozan Öğreden +
Pandas Development Team
Paolo Lammens
Parfait Gasana +
Patrick Hoefler
Paul McCarthy +
Paulo S. Costa +
Pav A
Peter
Pradyumna Rahul +
Punitvara +
QP Hou +
Rahul Chauhan
Rahul Sathanapalli
Richard Shadrach
Robert Bradshaw
Robin to Roxel
Rohit Gupta
Sam Purkis +
Samuel GIFFARD +
Sean M. Law +
Shahar Naveh +
ShaharNaveh +
Shiv Gupta +
Shrey Dixit +
Shudong Yang +
Simon Boehm +
Simon Hawkins
Sioned Baker +
Stefan Mejlgaard +
Steven Pitman +
Steven Schaerer +
Stéphane Guillou +
TLouf +
Tegar D Pratama +
Terji Petersen
Theodoros Nikolaou +
Thomas Dickson
Thomas Li
Thomas Smith
Thomas Yu +
ThomasBlauthQC +
Tim Hoffmann
Tom Augspurger
Torsten Wörtwein
Tyler Reddy
UrielMaD
Uwe L. Korn
Venaturum +
VirosaLi
Vladimir Podolskiy
Vyom Pathak +
WANG Aiyong
Waltteri Koskinen +
Wenjun Si +
William Ayd
Yeshwanth N +
Yuanhao Geng
Zito Relova +
aflah02 +
arredond +
attack68
cdknox +
chinggg +
fathomer +
ftrihardjo +
github-actions[bot] +
gunjan-solanki +
guru kiran
hasan-yaman
i-aki-y +
jbrockmendel
jmholzer +
jordi-crespo +
jotasi +
jreback
juliansmidek +
kylekeppler
lrepiton +
lucasrodes
maroth96 +
mikeronayne +
mlondschien
moink +
morrme
mschmookler +
mzeitlin11
na2 +
nofarmishraki +
partev
patrick
ptype
realead
rhshadrach
rlukevie +
rosagold +
saucoide +
sdementen +
shawnbrown
sstiijn +
stphnlyd +
sukriti1 +
taytzehao
theOehrly +
theodorju +
thordisstella +
tonyyyyip +
tsinggggg +
tushushu +
vangorade +
vladu +
wertha +
