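What's new in 1.4.0 (January 22, 2022)#

These are the changes in pandas 1.4.0. See the release notes for a full changelog including other versions of pandas.

Enhancements#

Improved warning messages#

Previously, warning messages may have pointed to lines within the pandas library. Running the script setting_with_copy_warning.py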

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5

with pandas 1.3 resulted in:

.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

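This made it difficult to determine where the warning was being generated from. Now pandas inspects the call stack and reports the first line outside of the pandas library that gave rise to the warning. The output of the above script is now: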

setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

Index can hold arbitrary ExtensionArrays#

Until now, passing a custom ExtensionArray to pd.Index would cast the array to object dtype. Now Index can directly hold arbitrary ExtensionArrays (GH 43930).

Example:

In [1]: arr = pd.array([1, 2, pd.NA])

In [2]: idx = pd.Index(arr)

In the old behavior, idx would be object-dtype:

Previous behavior:

In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')

With the new behavior, we keep the original dtype:

New behavior:

In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')

One exception to this is SparseArray, which will continue to cast to numpy dtype until pandas 2.0. At that point it will retain its dtype like other ExtensionArrays.
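As a minimal sketch of the behavior described above (assuming pandas 1.4 or later is installed), the following builds an Index from a nullable-integer array and checks that the extension dtype survives:

```python
import pandas as pd

# A nullable-integer ExtensionArray; pd.NA marks the missing entry.
arr = pd.array([1, 2, pd.NA], dtype="Int64")

# From pandas 1.4 on, the Index wraps the array instead of casting to object.
idx = pd.Index(arr)
print(idx.dtype)

# Missing-value detection works on the extension-backed Index.
print(idx.isna())
```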

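Styler#

Styler has been further developed in 1.4.0. The following general enhancements have been made:

Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signature of the methods already used to style and format data values, and work with HTML, LaTeX and Excel format (GH 41893, GH 43101, GH 41993, GH 41995)

The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH 43758)

The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control of visibility of MultiIndexes and of Index names (GH 25475, GH 43404, GH 43346)

Styler.export() and Styler.use() have been updated to address all of the functionality added in v1.2.0 and v1.3.0 (GH 40675)

Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH 41395)

Validation of certain keyword arguments, e.g. caption (GH 43368)

Various bug fixes as recorded below

Additionally there are specific enhancements to the HTML specific rendering:

Styler.bar() introduces additional arguments to control alignment and display (GH 26070, GH 36419), and it also validates the input arguments width and height (GH 42511)

Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH 41946, GH 43149, GH 42972)

Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH 43619)

Custom CSS classes can now be directly specified without string replacement (GH 43686)

Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH 45058)

There are also some LaTeX specific enhancements:

Styler.to_latex() introduces keyword argument environment, which also allows a specific "longtable" entry through a separate jinja2 template (GH 41866)

Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH 43369)

cline support has been added for MultiIndex row sparsification through a keyword argument (GH 45138)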

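Multi-threaded CSV reading with a new CSV Engine based on pyarrow#

pandas.read_csv() now accepts engine="pyarrow" (requires at least pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore machines with pyarrow installed. See the I/O docs for more info. (GH 23697, GH 43706)

Rank function for rolling and expanding windows#

Added rank function to Rolling and Expanding. The new function supports the method, ascending, and pct flags of DataFrame.rank(). The method argument supports min, max, and average ranking methods. Example: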

In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])

In [5]: s.rolling(3).rank()
Out[5]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
dtype: float64

In [6]: s.rolling(3).rank(method="max")
Out[6]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
dtype: float64
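The two calls above cover the default (average) and max methods. As a small complementary sketch, assuming pandas 1.4 or later, the same series can be ranked with method="min" and over an expanding window:

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# method="min": tied values share the lowest rank within each 3-element window.
rolling_min = s.rolling(3).rank(method="min")

# Expanding window: rank of each value among all values seen so far
# (default method is "average", so ties get the mean of their ranks).
expanding_avg = s.expanding().rank()

# pct=True reports the rank as a fraction of the window size.
rolling_pct = s.rolling(3).rank(pct=True)

print(rolling_min.tolist())
print(expanding_avg.tolist())
```

In the last window (3, 5, 3) the two 3s tie, which is why the min, average and max methods give different ranks for the final element.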

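Groupby positional indexing#

It is now possible to specify positional ranges relative to the ends of each group.

Negative arguments for DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail(), and SeriesGroupBy.tail() now work correctly and result in ranges relative to the end and start of each group, respectively. Previously, negative arguments returned empty frames.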

In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
   ...:                    ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
   ...: 

In [8]: df.groupby("A").head(-1)
Out[8]: 
   A   B
0  g  g0
1  g  g1
2  g  g2
4  h  h0

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept a slice or list of integers and slices.

In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]: 
   A   B
1  g  g1
2  g  g2

In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]: 
   A   B
0  g  g0
3  g  g3
4  h  h0
5  h  h1

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept index notation.

In [11]: df.groupby("A").nth[1, -1]
Out[11]: 
   A   B
1  g  g1
3  g  g3
5  h  h1

In [12]: df.groupby("A").nth[1:-1]
Out[12]: 
   A   B
1  g  g1
2  g  g2

In [13]: df.groupby("A").nth[:1, -1:]
Out[13]: 
   A   B
0  g  g0
3  g  g3
4  h  h0
5  h  h1

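DataFrame.from_dict and DataFrame.to_dict have new 'tight' option#

A new 'tight' dictionary format that preserves MultiIndex entries and names is now available with the DataFrame.from_dict() and DataFrame.to_dict() methods and can be used with the standard json library to produce a tight representation of DataFrame objects (GH 4889).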

In [14]: df = pd.DataFrame.from_records(
   ....:     [[1, 3], [2, 4]],
   ....:     index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
   ....:                                     names=["n1", "n2"]),
   ....:     columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
   ....:                                       names=["z1", "z2"]),
   ....: )
   ....: 

In [15]: df
Out[15]: 
z1     x  y
z2     1  2
n1 n2      
a  b   1  3
   c   2  4

In [16]: df.to_dict(orient='tight')
Out[16]: 
{'index': [('a', 'b'), ('a', 'c')],
 'columns': [('x', 1), ('y', 2)],
 'data': [[1, 3], [2, 4]],
 'index_names': ['n1', 'n2'],
 'column_names': ['z1', 'z2']}
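A minimal round-trip sketch, assuming pandas 1.4 or later: the dictionary produced by to_dict(orient="tight") can be passed straight back to DataFrame.from_dict() to recover both the MultiIndex entries and their names.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Export to the 'tight' layout, then rebuild the frame from it.
tight = df.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")

print(restored.equals(df))
print(list(restored.index.names), list(restored.columns.names))
```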

Other enhancements#

concat() will preserve the attrs when it is the same for all objects and discard the attrs when they are different (GH 41828)
DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for columns being grouped on (GH 41373)
Add support for assigning values to the by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH 15079)
Series.sample(), DataFrame.sample(), DataFrameGroupBy.sample(), and SeriesGroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant, especially with replace=False (GH 38100)
Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH 42273)
DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() now support the argument skipna (GH 34047)
read_table() now supports the argument storage_options (GH 39167)
DataFrame.to_stata() and StataWriter() now accept the keyword-only argument value_labels to save labels for non-categorical columns (GH 38454)
Methods that rely on hashmap-based algorithms, such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize(), previously ignored the imaginary component of complex numbers; it is now taken into account (GH 17927)
Add Series.str.removeprefix() and Series.str.removesuffix(), introduced in Python 3.9, to remove prefixes/suffixes from string-type Series (GH 36944)
Attempting to write into a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now explicitly mentions the missing parent directory; the same is true for the Series counterparts (GH 24306)
Indexing with .loc and .iloc now supports Ellipsis (GH 37750)
IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH 41967)
Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH 40855)
DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH 43883)
The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (GH 43882)
Added ExponentialMovingWindow.sum() (GH 13297)
Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. Default is None (GH 43563, GH 32835, GH 25549)
DataFrame.dropna() now accepts a single label as subset along with array-like (GH 41021)
Added DataFrameGroupBy.value_counts() (GH 43564)
read_csv() now accepts a callable function in on_bad_lines when engine="python" for custom handling of bad lines (GH 5686)
ExcelWriter argument if_sheet_exists="overlay" option added (GH 40231)
read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH 14403)
DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.std(), SeriesGroupBy.std(), DataFrameGroupBy.var(), SeriesGroupBy.var(), DataFrameGroupBy.sum(), and SeriesGroupBy.sum() now support Numba execution with the engine keyword (GH 43731, GH 44862, GH 44939)
Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH 26131)
NaT.to_numpy() dtype argument is now respected, so np.timedelta64 can be returned (GH 44460)
New option display.max_dir_items customizes the number of columns added to Dataframe.__dir__() and suggested for tab completion (GH 37996)
Added "Juneteenth National Independence Day" to USFederalHolidayCalendar (GH 44574)
Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution with the engine keyword (GH 44461)
Series.info() has been added, for compatibility with DataFrame.info() (GH 5167)
Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH 44746)
UInt64Index.map() now retains dtype where possible (GH 44609)
read_json() can now parse unsigned long long integers (GH 26068)
DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH 42875)
is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH 35131)
ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH 20612, GH 44705)
Add support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH 43925)
DataFrame.to_sql() now returns an int of the number of written rows (GH 23998)
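A few of the enhancements listed above can be tried in isolation. The following is a small sketch, assuming pandas 1.4 or later and NumPy; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Series.str.removeprefix(): strips the prefix only where it is present.
labels = pd.Series(["col_a", "col_b", "other"])
stripped = labels.str.removeprefix("col_")

# DataFrame.dropna(): subset may now be a single label.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
kept = df.dropna(subset="a")

# sample(): random_state accepts a np.random.Generator.
rng = np.random.default_rng(seed=0)
sampled = df.sample(n=2, random_state=rng)

# Kleene logic on nullable arrays with skipna=False:
# "True or unknown" is True, "False or unknown" stays unknown (pd.NA).
any_with_true = pd.array([1, pd.NA], dtype="Int64").any(skipna=False)
any_unknown = pd.array([0, pd.NA], dtype="Int64").any(skipna=False)

print(stripped.tolist(), kept.index.tolist(), len(sampled))
print(any_with_true, any_unknown)
```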

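Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Inconsistent date string parsing#

The dayfirst option of to_datetime() isn't strict, and this can lead to surprising behavior: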

In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)

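Now, a warning will be raised if a date string cannot be parsed in accordance with the given dayfirst value when the value is a delimited date string (e.g. 31-12-2012).

Ignoring dtypes in concat with empty or all-NA columns#

Note

This behaviour change has been reverted in pandas 1.4.3.

When using concat() to concatenate two or more DataFrame objects, if one of the DataFrames was empty or had all-NA values, its dtype was sometimes ignored when finding the concatenated dtype. These are now consistently not ignored (GH 43507).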
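One way to sidestep the ambiguity entirely is to state the layout of the string. The sketch below is an illustration rather than part of the release notes: it parses the same day-first string with an explicit format and with dayfirst=True.

```python
import pandas as pd

# An explicit format leaves nothing to guess about day/month order.
explicit = pd.to_datetime(["31-12-2021"], format="%d-%m-%Y")

# Alternatively, declare that the day comes first.
day_first = pd.to_datetime(["31-12-2021"], dayfirst=True)

print(explicit[0], day_first[0])
```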

In [3]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
In [4]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
In [5]: res = pd.concat([df1, df2])

Previously, the float-dtype in df2 would be ignored so the result dtype would be datetime64[ns]. As a result, the np.nan would be cast to NaT.

Previous behavior:

In [6]: res
Out[6]:
         bar
0 2013-01-01
1        NaT

Now the float-dtype is respected. Since the common dtype for these DataFrames is object, the np.nan is retained.

New behavior:

In [6]: res
Out[6]:
                   bar
0  2013-01-01 00:00:00
1                  NaN

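Null-values are no longer coerced to NaN-value in value_counts and mode#

Series.value_counts() and Series.mode() no longer coerce None, NaT and other null-values to a NaN-value for np.object_-dtype. This behavior is now consistent with unique, isin and others (GH 42688).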

In [18]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])

In [19]: res = s.value_counts(dropna=False)

Previously, all null-values were replaced by a NaN-value.

Previous behavior:

In [3]: res
Out[3]:
NaN     5
True    1
dtype: int64

Now null-values are no longer mangled.

New behavior:

In [20]: res
Out[20]: 
None    3
NaT     2
True    1
Name: count, dtype: int64

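mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names#

read_csv() no longer renames unique column labels which conflict with the target names of duplicated columns. Already existing columns are skipped, i.e. the next available index is used for the target column name (GH 14704).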

In [21]: import io

In [22]: data = "a,a,a.1\n1,2,3"

In [23]: res = pd.read_csv(io.StringIO(data))

Previously, the second column was called a.1, while the third column was also renamed to a.1.1.

Previous behavior:

In [3]: res
Out[3]:
    a  a.1  a.1.1
0   1    2      3

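Now the renaming checks whether a.1 already exists when changing the name of the second column and skips this index. The second column is instead renamed to a.2.

New behavior: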

In [24]: res
Out[24]: 
   a  a.2  a.1
0  1    2    3

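unstack and pivot_table no longer raise ValueError for results that would exceed the int32 limit#

Previously DataFrame.pivot_table() and DataFrame.unstack() would raise a ValueError if the operation could produce a result with more than 2**31 - 1 elements. This operation now raises an errors.PerformanceWarning instead (GH 26314).

Previous behavior: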

In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow

New behavior:

In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.

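groupby.apply consistent transform detection#

DataFrameGroupBy.apply() and SeriesGroupBy.apply() are designed to be flexible, allowing users to perform aggregations, transformations, filters, and use them with user-defined functions that might not fall into any of these categories. As part of this, apply will attempt to detect when an operation is a transform, and in such a case, the result will have the same index as the input. In order to determine if the operation is a transform, pandas compares the input's index to the result's and determines if it has been mutated. Previously in pandas 1.3, different code paths used different definitions of "mutated": some would use Python's is whereas others would test only up to equality.

This inconsistency has been removed; pandas now tests up to equality.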

In [25]: def func(x):
   ....:     return x.copy()
   ....: 

In [26]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [27]: df
Out[27]: 
   a  b  c
0  1  3  5
1  2  4  6

Previous behavior:

In [3]: df.groupby(['a']).apply(func)
Out[3]:
     a  b  c
a
1 0  1  3  5
2 1  2  4  6

In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
     c
a b
1 3  5
2 4  6

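In the examples above, the first uses a code path where pandas uses is and determines that func is not a transform, whereas the second tests up to equality and determines that func is a transform. In the first case, the result's index is not the same as the input's.

New behavior: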

In [5]: df.groupby(['a']).apply(func)
Out[5]:
   a  b  c
0  1  3  5
1  2  4  6

In [6]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[6]:
     c
a b
1 3  5
2 4  6

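Now in both cases it is determined that func is a transform. In each case, the result has the same index as the input.

Backwards incompatible API changes#

Increased minimum version for Python#

pandas 1.4.0 supports Python 3.8 and higher.

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed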
numpy	1.18.5	X	X
pytz	2020.1	X	X
python-dateutil	2.8.1	X	X
bottleneck	1.3.1		X
numexpr	2.7.1		X
pytest (dev)	6.0
mypy (dev)	0.930		X

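For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed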
beautifulsoup4	4.8.2	X
fastparquet	0.4.0
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.5.0	X
matplotlib	3.3.2	X
numba	0.50.1	X
openpyxl	3.0.3	X
pandas-gbq	0.14.0	X
pyarrow	1.0.1	X
pymysql	0.10.1	X
pytables	3.6.1	X
s3fs	0.4.0
scipy	1.4.1	X
sqlalchemy	1.4.0	X
tabulate	0.8.7
xarray	0.15.1	X
xlrd	2.0.1	X
xlsxwriter	1.2.2	X
xlwt	1.3.0

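See Dependencies and Optional dependencies for more.

Other API changes#

Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH 42310)
Change in the position of the min_rows argument in DataFrame.to_string() due to a change in the docstring (GH 44304)
Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH 44178)
read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH 13054)
Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names, specifically:
- "New Year's Day" gains the possessive apostrophe
- "Presidents Day" becomes "Washington's Birthday"
- "Martin Luther King Jr. Day" is now "Birthday of Martin Luther King, Jr."
- "July 4th" is now "Independence Day"
- "Thanksgiving" is now "Thanksgiving Day"
- "Christmas" is now "Christmas Day"
- Added "Juneteenth National Independence Day"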

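Deprecations#

Deprecated Int64Index, UInt64Index & Float64Index#

Int64Index, UInt64Index and Float64Index have been deprecated in favor of the base Index class and will be removed in pandas 2.0 (GH 43028).

For constructing a numeric index, you can use the base Index class instead, specifying the data type (which will also work on older pandas releases):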

# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")

For checking the data type of an index object, you can replace isinstance checks with checking the dtype:

# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"
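Put together, the two replacements above might look like the sketch below. The use of pd.api.types.is_integer_dtype for a width-agnostic check is an addition for illustration, not something the release notes prescribe.

```python
import pandas as pd

# Construct a numeric index through the base class with an explicit dtype.
idx = pd.Index([1, 2, 3], dtype="int64")

# Exact dtype check, replacing isinstance(idx, pd.Int64Index).
is_int64 = idx.dtype == "int64"

# Broader check: true for any integer width (int8 ... uint64).
is_any_int = pd.api.types.is_integer_dtype(idx)

print(is_int64, is_any_int)
```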

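Currently, in order to maintain backward compatibility, calls to Index will continue to return Int64Index, UInt64Index and Float64Index when given numeric data, but in the future, an Index will be returned.

Current behavior: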

In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [1]: pd.Index([1, 2, 3], dtype="uint64")
Out [1]: UInt64Index([1, 2, 3], dtype='uint64')

Future behavior:

In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')

Deprecated DataFrame.append and Series.append#

DataFrame.append() and Series.append() have been deprecated and will be removed in a future version. Use pandas.concat() instead (GH 35407).

Deprecated syntax

In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0    1
1    2
0    3
1    4
dtype: int64

In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

Recommended syntax

In [28]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[28]: 
0    1
1    2
0    3
1    4
dtype: int64

In [29]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

In [30]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

In [31]: pd.concat([df1, df2])
Out[31]: 
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
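Both results above keep the original row labels (0, 1, 0, 1), just as append did. If a fresh 0..n-1 index is wanted instead, concat() takes ignore_index=True, as in this short sketch:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# ignore_index=True discards the original labels and renumbers the rows.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```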

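Other Deprecations#

Deprecated Index.is_type_compatible() (GH 42113)
Deprecated the method argument in Index.get_loc(); use index.get_indexer([label], method=...) instead (GH 42269)
Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with leading Float64Index level not containing the key (GH 33469)
Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH 24559)
Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH 42351)
Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH 30017)
Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH 42857)
Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH 42738)
Deprecated Index.reindex() with a non-unique Index (GH 42568)
Deprecated Styler.render() in favor of Styler.to_html() (GH 42140)
Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH 43758)
Deprecated passing a string column label into times in DataFrame.ewm() (GH 43265)
Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH 40245)
Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH 43242)
Deprecated the index argument to SparseArray construction (GH 23089)
Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH 40245)
Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH 43665)
Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH 43740)
Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH 43740)
Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column's existing timezone (GH 37605)
Deprecated casting behavior when passing an item with mismatched timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series's timezone (GH 37605, GH 44940)
Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH 43396)
Deprecated passing a non-boolean argument to sort in concat() (GH 41518)
Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH 41485)
Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH 45133)
Deprecated passing skipna=None for DataFrame.mad() and Series.mad(); pass skipna=True instead (GH 44580)
Deprecated the behavior of to_datetime() with the string "now" with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH 18705)
Deprecated DateOffset.apply(); use offset + other instead (GH 44522)
Deprecated the names parameter in Index.copy() (GH 44916)
A deprecation warning is now shown for DataFrame.to_latex() indicating that the arguments signature may change and emulate more closely the arguments to Styler.to_latex() in future versions (GH 44411)
Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH 39817)
Deprecated Categorical.replace(); use Series.replace() instead (GH 44929)
Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH 42825)
Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH 44051)
Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH 44597)
Deprecated DatetimeIndex.union_many(); use DatetimeIndex.union() instead (GH 44091)
Deprecated Groupby.pad() in favor of Groupby.ffill() (GH 33396)
Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH 33396)
Deprecated Resample.pad() in favor of Resample.ffill() (GH 33396)
Deprecated Resample.backfill() in favor of Resample.bfill() (GH 33396)
Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH 45036)
Deprecated the behavior of Timestamp.utcfromtimestamp(); in the future it will return a timezone-aware UTC Timestamp (GH 22451)
Deprecated NaT.freq() (GH 45071)
Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype ignoring the dtype argument; in a future version this will raise (GH 40110)
Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH 44212)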
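Several of the deprecations above name a direct replacement. The following sketch, assuming pandas 1.4 or later and using made-up sample data, shows four of them side by side:

```python
import io

import pandas as pd

# read_csv(..., squeeze=True) -> squeeze the resulting frame afterwards.
one_col = pd.read_csv(io.StringIO("x\n1\n2\n3")).squeeze("columns")

# date_range(..., closed="left") -> inclusive="left" (end point excluded).
days = pd.date_range("2022-01-01", "2022-01-03", inclusive="left")

# Index.get_loc(label, method="nearest") -> get_indexer([label], method=...).
idx = pd.Index([10, 20, 30])
nearest_pos = idx.get_indexer([19], method="nearest")[0]

# Groupby.pad() -> Groupby.ffill(): forward-fill within each group.
frame = pd.DataFrame({"k": ["a", "a", "b", "b"], "v": [1.0, None, 2.0, None]})
filled = frame.groupby("k")["v"].ffill()

print(type(one_col).__name__, len(days), nearest_pos, filled.tolist())
```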

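Performance improvements#

Performance improvement in DataFrameGroupBy.sample() and SeriesGroupBy.sample(), especially when the weights argument is provided (GH 34483)
Performance improvement when converting non-string arrays to string arrays (GH 34483)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() for user-defined functions (GH 41598)
Performance improvement in constructing DataFrame objects (GH 42631, GH 43142, GH 43147, GH 43307, GH 43144, GH 44826)
Performance improvement in DataFrameGroupBy.shift() and SeriesGroupBy.shift() when the fill_value argument is provided (GH 26615)
Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH 40956)
Performance improvement in some DataFrameGroupBy.apply() and SeriesGroupBy.apply() operations (GH 42992, GH 43578)
Performance improvement in read_stata() (GH 43059, GH 43227)
Performance improvement in read_sas() (GH 43333)
Performance improvement in to_datetime() with uint dtypes (GH 42606)
Performance improvement in to_datetime() with infer_datetime_format set to True (GH 43901)
Performance improvement in Series.sparse.to_coo() (GH 42880)
Performance improvement in indexing with a UInt64Index (GH 43862)
Performance improvement in indexing with a Float64Index (GH 43705)
Performance improvement in indexing with a non-unique Index (GH 43792)
Performance improvement in indexing with a listlike indexer on a MultiIndex (GH 43370)
Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH 43370)
Performance improvement in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 43469, GH 43725)
Performance improvement in DataFrameGroupBy.count() and SeriesGroupBy.count() (GH 43730, GH 43694)
Performance improvement in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() (GH 43675, GH 42841)
Performance improvement in DataFrameGroupBy.std() and SeriesGroupBy.std() (GH 43115, GH 43576)
Performance improvement in DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() (GH 43309)
SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH 43526)
Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH 43777)
Performance improvement in SparseArray.take() with allow_fill=False (GH 43654)
Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH 43612, GH 44176, GH 45170)
Improved performance of pandas.read_csv() with memory_map=True when file encoding is UTF-8 (GH 43787)
Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH 43666)
Performance improvement in RangeIndex.insert() (GH 43988)
Performance improvement in Index.insert() (GH 43953)
Performance improvement in DatetimeIndex.tolist() (GH 43823)
Performance improvement in DatetimeIndex.union() (GH 42353)
Performance improvement in Series.nsmallest() (GH 43696)
Performance improvement in DataFrame.insert() (GH 42998)
Performance improvement in DataFrame.dropna() (GH 43683)
Performance improvement in DataFrame.fillna() (GH 43316)
Performance improvement in DataFrame.values (GH 43160)
Performance improvement in DataFrame.select_dtypes() (GH 42611)
Performance improvement in DataFrame reductions (GH 43185, GH 43243, GH 43311, GH 43609)
Performance improvement in Series.unstack() and DataFrame.unstack() (GH 43335, GH 43352, GH 42704, GH 43025)
Performance improvement in Series.to_frame() (GH 43558)
Performance improvement in Series.mad() (GH 43010)
Performance improvement in merge() (GH 43332)
Performance improvement in to_csv() when the index column is a datetime and is formatted (GH 39413)
Performance improvement in to_csv() when a MultiIndex contains a lot of unused levels (GH 37484)
Performance improvement in read_csv() when index_col was set with a numeric column (GH 44158)
Performance improvement in concat() (GH 43354)
Performance improvement in SparseArray.__getitem__() (GH 23122)
Performance improvement in constructing a DataFrame from array-like objects such as a Pytorch tensor (GH 44616)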

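Bug fixes#

Categorical#

Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH 41919)
Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH 44930)
Bug in Series.where() with IntervalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH 41919)

Datetimelike#

Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH 39272)
Bug in to_datetime() with format and pandas.NA raising ValueError (GH 42957)
to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected; now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH 12585)
Bug in date_range() and bdate_range() not returning the right bound when start = end and the set is closed on one side (GH 43394)
Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH 43904)
Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH 43917)
Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH 40111)
Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH 44474)
np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH 43923)
Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH 44532)
Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into an Index with dtype='object' with a negative loc adding None and replacing the existing value (GH 44509)
Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH 45087)
Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive values and with PeriodDtype incorrectly raising (GH 41927)
Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH 42921)
Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH 43968, GH 36589)
Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH 45083)
Bug in DataFrame construction from dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH 44091)
Bug in Timestamp hashing during some DST transitions causing a segmentation fault (GH 33931 and GH 40817)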

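Timedelta#

Bug in division of an all-NaT TimeDeltaIndex, Series or DataFrame column by an object-dtype array of numbers failing to infer the result as timedelta64-dtype (GH 39750)
Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH 44466)
Timedelta now properly takes into account any nanoseconds contribution of any keyword argument (GH 43764, GH 45227)

Timezones#

Bug in to_datetime() with infer_datetime_format=True failing to parse zero UTC offset (Z) correctly (GH 41047)
Bug in Series.dt.tz_convert() resetting the index in a Series with CategoricalIndex (GH 43080)
Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH 31793)

Numeric#

Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH 44674)
Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH 41931)
Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" was used (GH 41931)
Bug in the numexpr engine still being used when the option compute.use_numexpr is set to False (GH 32556)
Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH 43201)
Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH 43962)
Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH 44459)
Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH 44685)
Bug in multiplying a Series with FloatingDtype by a timedelta-like scalar incorrectly raising (GH 44772)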

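Conversion#

Bug in UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH 42201)
Bug in Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH 43017, GH 43018)
Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH 43986)
Bug in IntegerDtype not allowing coercion from string dtype (GH 25472)
Bug in to_datetime() with arg:xr.DataArray and unit="ns" specified raising TypeError (GH 44053)
Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH 43201)
Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH 44414)
Bug in DataFrame.convert_dtypes() result losing columns.names (GH 41435)
Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH 44891)
Bug in Series.astype() not allowing converting from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH 45038)

Strings#

Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH 44276)

Interval#

Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH 44181)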

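Indexing#

Bug in Series.rename() with MultiIndex when level is provided (GH 43659)
Bug in DataFrame.truncate() and Series.truncate() when the object's Index has a length greater than one but only one unique value (GH 42365)
Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH 27591)
Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH 42055)
Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, where the return type depended on whether the index was monotonic (GH 24892)
Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH 42476)
Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failing to raise or incorrectly raising ValueError (GH 41634)
Bug in updating values of a pandas.Series using a boolean index created by using pandas.DataFrame.pop() (GH 42530)
Bug in Index.get_indexer_non_unique() when the index contains multiple np.nan (GH 35392)
Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH 42826)
Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH 42881)
Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH 22435)
Bug in DataFrame.nlargest() and Series.nlargest() where the sorted result did not count indexes containing np.nan (GH 28984)
Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH 43711)
Bug in DataFrame.__setitem__() incorrectly writing into an existing column's array rather than setting a new array when the new dtype and the old dtype match (GH 43406)
Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH 44316)
Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetimes/timedeltas to integers (GH 43868)
Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH 43591)
Bug in Index.get_indexer_non_unique() when the index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH 43869)
Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar's sides are floats and the values' sides are integers (GH 44201)
Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH 44236)
Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH 44261)
Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH 44265)
Bug in Series.reset_index() not ignoring the name argument when drop and inplace are set to True (GH 44575)
Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH 44345)
Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH 44322).
Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values, e.g. df.iloc[:] = df.values, incorrectly raising (GH 44514)
Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH 44703)
Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH 44551)
Bug in DataFrame.loc.__setitem__() changing dtype when the indexer was completely False (GH 37550)
Bug in IntervalIndex.get_indexer_non_unique() returning a boolean mask instead of an array of integers for a non-unique and non-monotonic index (GH 44084)
Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH 44482)
Fixed regression where a single-column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH 42376)
Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH 15470, GH 14865)
Bug in Series.__setitem__() when setting floats or integers into an integer-dtype Series failing to upcast when necessary to retain precision (GH 45121)
Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH 45032)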

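Missing#

Bug in DataFrame.fillna() with limit and no method ignoring axis='columns' or axis = 1 (GH 40989, GH 17399)
Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH 43476)
Bug in constructing a DataFrame with a dictionary containing np.datetime64 as a value and dtype='timedelta64[ns]', or vice-versa, incorrectly casting instead of raising (GH 44428)
Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH 44749)
Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and the downcast argument is specified. This now raises NotImplementedError instead; do not pass the downcast argument (GH 44873)
Bug in DataFrame.dropna() changing the Index even if no entries were dropped (GH 41965)
Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH 44241)

MultiIndex#

Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH 42465)
Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH 42043)
Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on a nested tuple (GH 42440)
Bug in MultiIndex.union() setting the wrong sortorder, causing errors in subsequent indexing operations with slices (GH 44752)
Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH 43212)
Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH 45174)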

I/O#

在 read_excel() 中，尝试读取 .xlsx 文件中的图表工作表的错误 (GH 41448)
在 json_normalize() 中，当 record_path 长度大于 1 时，errors=ignore 可能未能忽略 meta 缺失值的错误 (GH 41876)
在 read_csv() 中，当使用多标题输入和引用列名为元组的参数时的错误 (GH 42446)
在 read_fwf() 中，colspecs 和 names 长度不一致时未引发 ValueError 的错误 (GH 40830)
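Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH 42768, GH 33043)
Bug where column headers were dropped when constructing a DataFrame from a sqlalchemy Row object (GH 40682)
Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH 43188)
Bug in read_csv() incorrectly raising an IndexError when reading multi-header input with unequal lengths (GH 43102)
Bug in read_csv() raising ParserError when reading a file in chunks and some chunks have fewer columns than the header for engine="c" (GH 21211)
Bug in read_csv() changing the exception class from OSError to TypeError when a file path name or file-like object is expected (GH 43366)
Bug in read_csv() and read_fwf() ignoring all skiprows except the first when nrows is specified for engine='python' (GH 44021, GH 10261)
Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH 13378)
Bug in read_json() not handling non-numpy dtypes correctly (especially category) (GH 21892, GH 33205)
Bug in json_normalize() where a multi-character sep parameter was incorrectly prefixed to every key (GH 43831)
Bug in json_normalize() where reading data with missing multi-level metadata did not respect errors="ignore" (GH 44312)
Bug in read_csv() using the second row to guess the implicit index when header is set to None for engine="python" (GH 22144)
Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH 22144)
Bug in read_csv() with float_precision="round_trip" not skipping initial/trailing whitespace (GH 43713)
Bug when Python is built without the lzma module: a warning was raised at pandas import time, even if the lzma capability was not used (GH 43495)
Bug in read_csv() not applying dtype to index_col (GH 9435)
Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH 42748)
Bug in read_csv() raising ValueError when names was longer than header but equal to the number of data rows for engine="python" (GH 38453)
Bug in ExcelWriter where engine_kwargs were not passed through to all engines (GH 43442)
Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH 8991)
Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep, which conflicts with lineterminator (GH 43528)
Bug in to_csv() converting datetimes in categorical Series to integers (GH 40754)
Bug in read_csv() converting columns to numeric after date parsing failed (GH 11019)
Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH 26203)
Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer the index column dtype from a nullable integer type (GH 44079)
Bug in to_csv() always coercing datetime columns with different formats to the same format (GH 21734)
DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with ".zip". Instead, they infer the inner file name more intelligently (GH 39465)
Bug in read_csv() where reading a mixed column of booleans and missing values to a float type turned the missing values into 1.0 rather than NaN (GH 42808, GH 34120)
Bug in to_xml() raising an error for pd.NA with extension array dtype (GH 43903)
Bug in read_csv() where passing a parser in date_parser together with parse_dates=False still invoked the parsing (GH 44366)
Bug in read_csv() not setting the names of MultiIndex columns correctly when index_col is not the first column (GH 38549)
Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH 44766)
Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH 44748)
Bug in read_json() raising ValueError when attempting to parse JSON strings containing "://" (GH 36271)
Bug in read_csv() where engine="c" combined with encoding_errors=None caused a segfault (GH 45180)
Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH 45384)
Fixed a memory leak in DataFrame.to_json() (GH 43877)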
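
Two of the I/O fixes above are easy to check interactively. The following is a minimal sketch, assuming pandas 1.4 or later; the record contents, column names and CSV text are made up for illustration and do not come from the linked issues.

```python
import io

import pandas as pd

# A multi-character separator should only appear between nested keys,
# not as a prefix on every key (GH 43831).
record = {"user": {"name": "Ada", "address": {"city": "London"}}}
flat = pd.json_normalize(record, sep="__")
print(sorted(flat.columns))

# A dtype given for the index column should be honoured (GH 9435):
# the leading zeros survive because "code" is read as a string.
csv_text = "code,value\n001,10\n002,20\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="code", dtype={"code": str})
print(list(df.index))
```

On an affected older version the index labels would likely have been parsed as integers, which drops the leading zeros.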

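Period#

Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH 44182)
Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH 44105)
Bug in the Period constructor incorrectly allowing np.timedelta64("NaT") (GH 44507)
Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH 44100)
Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH 45135)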
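
As a rough illustration of the first Period fix, the sketch below adds a np.timedelta64 to a Period in both operand orders. It assumes pandas 1.4 or later; the date and step size are arbitrary.

```python
import numpy as np
import pandas as pd

period = pd.Period("2022-01-22", freq="D")
step = np.timedelta64(2, "D")

forward = period + step    # Period on the left
reflected = step + period  # np.timedelta64 on the left (the case in GH 44182)
print(forward, reflected)
```

Both expressions should produce the same Period, two days after the starting one.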

Plotting#

When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions such as DataFrame.hist() (GH 43480)

Groupby/resample/rolling#

Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH 42021)
Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user-passed function (GH 42287)
Bug in DataFrameGroupBy.max(), SeriesGroupBy.max(), DataFrameGroupBy.min() and SeriesGroupBy.min() with nullable integer dtypes losing precision (GH 41743)
Bug in DataFrame.groupby.rolling.var() calculating the rolling variance only on the first group (GH 42442)
Bug in DataFrameGroupBy.shift() and SeriesGroupBy.shift() that would return the grouping columns if fill_value was not None (GH 41556)
Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() producing an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH 15272, GH 16345, GH 29129)
Bug in pandas.DataFrame.ewm() where non-float64 dtypes were silently failing (GH 42452)
Bug in pandas.DataFrame.rolling() operating along rows (axis=1) incorrectly omitting columns containing float16 and float32 (GH 41779)
Bug in Resampler.aggregate() not allowing the use of Named Aggregation (GH 32803)
Bug in Series.rolling() when the Series dtype was Int64 (GH 43016)
Bug in DataFrame.rolling.corr() when the DataFrame columns were a MultiIndex (GH 21157)
Bug in DataFrame.groupby.rolling() where specifying on and then calling __getitem__ would subsequently return incorrect results (GH 43355)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH 43500, GH 43515)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() failing with complex dtype (GH 43701)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and the index is decreasing (GH 43927)
Bug in Series.rolling() and DataFrame.rolling() for centered datetime-like windows with uneven nanoseconds (GH 43997)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() raising KeyError when a column was selected at least twice (GH 44924)
Bug in DataFrameGroupBy.nth() and SeriesGroupBy.nth() failing on axis=1 (GH 43926)
Bug in Series.rolling() and DataFrame.rolling() not respecting the right bound of centered datetime-like windows when the index contains duplicates (GH 3944)
Bug in Series.rolling() and DataFrame.rolling() where a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays would segfault instead of raising a ValueError (GH 44470)
Bug in Groupby.nunique() not respecting observed=True for categorical grouping columns (GH 45128)
Bug in DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail() and SeriesGroupBy.tail() not dropping groups with NaN when dropna=True (GH 45089)
Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH 44821)
Bug in Groupby.rolling() failing to correctly raise ValueError when non-monotonic data was passed (GH 43909)
Bug where grouping by a Series that has a categorical data type and a length unequal to the axis of grouping raised ValueError (GH 44179)
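
The precision fix for nullable integers (GH 41743) can be checked with a value that float64 cannot represent exactly. This is a small sketch, assuming pandas 1.4 or later; the frame and its column names are invented for the example.

```python
import pandas as pd

# 2**53 + 1 is the smallest positive integer that float64 cannot hold exactly,
# so any round-trip through float would silently change it.
big = 2**53 + 1

df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "value": pd.array([big, 1, 5], dtype="Int64"),
    }
)

result = df.groupby("key")["value"].max()
print(result)
print(result["a"] == big)
```

If the maximum were computed via float64, group "a" would come back as 2**53 instead of 2**53 + 1.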

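Reshaping#

Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH 42463)
Bug in concat() creating a MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in its Index and multiple keys (GH 42651)
Bug in pandas.cut() on a Series with duplicate indices and a non-exact pandas.CategoricalIndex() (GH 42185, GH 42425)
Bug in DataFrame.append() failing to retain dtypes when the appended columns do not match (GH 43392)
Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH 42800)
Bug in crosstab() when the inputs are categorical Series, some categories are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH 43505)
Bug in concat() failing when the objs argument all had the same index and the keys argument contained duplicates (GH 43595)
Bug in concat() which ignored the sort parameter (GH 43375)
Bug in merge() with a MultiIndex as the column index for the on argument returning an error when assigning a column internally (GH 43734)
Bug in crosstab() failing when the inputs are lists or tuples (GH 44076)
Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH 44109)
Fixed metadata propagation in the DataFrame.apply() method, consequently fixing the same issue for DataFrame.transform(), DataFrame.nunique() and DataFrame.mode() (GH 28283)
Bug in concat() casting the levels of a MultiIndex to float if all levels only consist of missing values (GH 44900)
Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH 43561)
Bug in merge() raising KeyError when joining over differently named indexes with the on keyword (GH 45094)
Bug in Series.unstack() with object dtype doing unwanted type inference on the resulting columns (GH 44595)
Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH 44096)
Bug in DataFrame.replace() and Series.replace() where the result dtype differed based on the regex parameter (GH 44864)
Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH 23955)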
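
The concat() change for mixed bool and boolean inputs (GH 42800) is straightforward to observe. A minimal sketch, assuming pandas 1.4 or later, with made-up values:

```python
import pandas as pd

plain = pd.Series([True, False])                      # NumPy bool dtype
nullable = pd.Series([True, pd.NA], dtype="boolean")  # nullable boolean dtype

combined = pd.concat([plain, nullable], ignore_index=True)
print(combined.dtype)
print(combined.isna().sum())
```

The combined Series keeps the nullable boolean dtype, so the missing value stays pd.NA rather than forcing an object column.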

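Sparse#

Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH 29564)
Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH 43527)
Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH 24817)
Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or an unclear ValueError depending on the input (GH 43863)
Bug in SparseArray arithmetic methods floordiv and mod not matching the non-sparse Series behavior when dividing by zero (GH 38172)
Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH 44955)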

ExtensionArray#

Bug in array() failing to preserve PandasArray (GH 43887)
NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__. In particular this is fixed for TimedeltaArray (GH 43899, GH 23316)
NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce and np.prod.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH 43923, GH 44793)
NumPy ufuncs with an out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH 45122)
Avoid raising a PerformanceWarning about a fragmented DataFrame when using many columns with an extension dtype (GH 44098)
Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH 44514)
Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (such as a string). This caused DataFrame.replace() to sometimes raise a TypeError when a nullable boolean column was included (GH 44499)
Bug in array() incorrectly raising when passed an ndarray with float16 dtype (GH 44715)
Bug in calling np.sqrt on a BooleanArray returning a malformed FloatingArray (GH 44715)
Bug in Series.where() where an other that is an NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) was incorrectly cast to a compatible NA value (GH 44697)
Bug in Series.replace() where explicitly passing value=None was treated as if no value had been passed, and None was not in the result (GH 36984, GH 19998)
Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH 44498)
Bug in Series.replace() with FloatDtype, string[python] or string[pyarrow] dtype not being preserved when possible (GH 33484, GH 40732, GH 31644, GH 41215, GH 25438)
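
The NumPy ufunc entries above can be tried on small inputs. The sketch below assumes pandas 1.4 or later and NumPy installed; the sample values are arbitrary.

```python
import numpy as np
import pandas as pd

# Reductions on a nullable-integer Series (GH 43923, GH 44793).
s = pd.Series([3, 1, 2], dtype="Int64")
total = np.add.reduce(s)
largest = np.maximum.reduce(s)
smallest = np.minimum.reduce(s)
print(total, largest, smallest)

# np.abs on timedelta data keeps a timedelta dtype (GH 43899, GH 23316).
td = pd.Series(pd.to_timedelta([-1, 2], unit="D"))
absolute = np.abs(td)
print(absolute.dtype)
```

The first part reduces without raising NotImplementedError, and the second returns timedeltas rather than plain integers.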

Styler#

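Bug in Styler where the uuid at initialization maintained a floating underscore (GH 43037)
Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some arguments (GH 43034)
Bug in Styler.copy() where the uuid was not previously copied (GH 40675)
Bug in Styler.apply() where functions that returned Series objects were not correctly handled in terms of aligning their index labels (GH 13657, GH 42014)
Bug when rendering an empty DataFrame with a named Index (GH 43305)
Bug when rendering a single-level MultiIndex (GH 43383)
Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH 43464)
Bug setting a table style when using multiple selectors in Styler (GH 44011)
Bugs where row trimming and column trimming failed to reflect hidden rows (GH 43703, GH 44247)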

Other#

Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH 44417)
Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when the beginning (end) of the target month is already a business day (GH 41356)
Bug in RangeIndex.union() with another RangeIndex with a matching (even) step and starts differing by strictly less than step / 2 (GH 44019)
Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH 44085)
Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH 44270, GH 37899)
Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH 44382)
Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH 44564)
Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH 44978)
Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH 44572)
Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH 43344)
Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH 44818)
Bug in DataFrame.eval() where the resolvers argument was overriding the default resolvers (GH 34966)
Series.__repr__() and DataFrame.__repr__() no longer replace all null values in indexes with "NaN" but use their real string representations. "NaN" is now only used for float("nan") (GH 45263)
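
As one example from this list, DataFrame.diff() should accept a NumPy integer for periods (GH 44572). This is a minimal sketch, assuming pandas 1.4 or later; the frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 6]})

# periods often arrives as a NumPy scalar, e.g. from an array computation.
periods = np.int64(1)

result = df.diff(periods)
print(result)
print(result.equals(df.diff(1)))
```

The result should match the one obtained with a plain Python int.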

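Contributors#

A total of 275 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.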

Abhishek R
Albert Villanova del Moral
Alessandro Bisiani +
Alex Lim
Alex-Gregory-1 +
Alexander Gorodetsky
Alexander Regueiro +
Alexey Györi
Alexis Mignon
Aleš Erjavec
Ali McMaster
Alibi +
Andrei Batomunkuev +
Andrew Eckart +
Andrew Hawyrluk
Andrew Wood
Anton Lodder +
Armin Berres +
Arushi Sharma +
Benedikt Heidrich +
Beni Bienz +
Benoît Vinot
Bert Palm +
Boris Rumyantsev +
Brian Hulette
Brock
Bruno Costa +
Bryan Racic +
Caleb Epstein
Calvin Ho
ChristofKaufmann +
Christopher Yeh +
Chuliang Xiao +
ClaudiaSilver +
DSM
Daniel Coll +
Daniel Schmidt +
Dare Adewumi
David +
David Sanders +
David Wales +
Derzan Chiang +
DeviousLab +
Dhruv B Shetty +
Digres45 +
Dominik Kutra +
Drew Levitt +
DriesS
EdAbati
Elle
Elliot Rampono
Endre Mark Borza
Erfan Nariman
Evgeny Naumov +
Ewout ter Hoeven +
Fangchen Li
Felix Divo
Felix Dulys +
Francesco Andreuzzi +
Francois Dion +
Frans Larsson +
Fred Reiss
GYvan
Gabriel Di Pardi Arruda +
Gesa Stupperich
Giacomo Caria +
Greg Siano +
Griffin Ansel
Hiroaki Ogasawara +
Horace +
Horace Lai +
Irv Lustig
Isaac Virshup
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JHM Darbyshire +
Jack Liu
Jacob Skwirsk +
Jaime Di Cristina +
James Holcombe +
Janosh Riebesell +
Jarrod Millman
Jason Bian +
Jeff Reback
Jernej Makovsek +
Jim Bradley +
Joel Gibson +
Joeperdefloep +
Johannes Mueller +
John S Bogaardt +
John Zangwill +
Jon Haitz Legarreta Gorroño +
Jon Wiggins +
Jonas Haag +
Joris Van den Bossche
Josh Friedlander
José Duarte +
Julian Fleischer +
Julien de la Bruère-T
Justin McOmie
Kadatatlu Kishore +
Kaiqi Dong
Kashif Khan +
Kavya9986 +
Kendall +
Kevin Sheppard
Kiley Hewitt
Koen Roelofs +
Krishna Chivukula
KrishnaSai2020
Leonardo Freua +
Leonardus Chen
Liang-Chi Hsieh +
Loic Diridollou +
Lorenzo Maffioli +
Luke Manley +
LunarLanding +
Marc Garcia
Marcel Bittar +
Marcel Gerber +
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Marvin +
Mateusz Piotrowski +
Mathias Hauser +
Matt Richards +
Matthew Davis +
Matthew Roeschke
Matthew Zeitlin
Matthias Bussonnier
Matti Picus
Mauro Silberberg +
Maxim Ivanov
Maximilian Carr +
MeeseeksMachine
Michael Sarrazin +
Michael Wang +
Michał Górny +
Mike Phung +
Mike Taves +
Mohamad Hussein Rkein +
NJOKU OKECHUKWU VALENTINE +
Neal McBurnett +
Nick Anderson +
Nikita Sobolev +
Olivier Cavadenti +
PApostol +
Pandas Development Team
Patrick Hoefler
Peter
Peter Tillmann +
Prabha Arivalagan +
Pradyumna Rahul
Prerana Chakraborty
Prithvijit +
Rahul Gaikwad +
Ray Bell
Ricardo Martins +
Richard Shadrach
Robbert-jan ‘t Hoen +
Robert Voyer +
Robin Raymond +
Rohan Sharma +
Rohan Sirohia +
Roman Yurchak
Ruan Pretorius +
Sam James +
Scott Talbert
Shashwat Sharma +
Sheogorath27 +
Shiv Gupta
Shoham Debnath
Simon Hawkins
Soumya +
Stan West +
Stefanie Molin +
Stefano Alberto Russo +
Stephan Heßelmann
Stephen
Suyash Gupta +
Sven
Swanand01 +
Sylvain Marié +
TLouf
Tania Allard +
Terji Petersen
TheDerivator +
Thomas Dickson
Thomas Kastl +
Thomas Kluyver
Thomas Li
Thomas Smith
Tim Swast
Tim Tran +
Tobias McNulty +
Tobias Pitters
Tomoki Nakagawa +
Tony Hirst +
Torsten Wörtwein
V.I. Wood +
Vaibhav K +
Valentin Oliver Loftsson +
Varun Shrivastava +
Vivek Thazhathattil +
Vyom Pathak
Wenjun Si
William Andrea +
William Bradley +
Wojciech Sadowski +
Yao-Ching Huang +
Yash Gupta +
Yiannis Hadjicharalambous +
Yoshiki Vázquez Baeza
Yuanhao Geng
Yury Mikhaylov
Yvan Gatete +
Yves Delley +
Zach Rait
Zbyszek Królikowski +
Zero +
Zheyuan
Zhiyi Wu +
aiudirog
ali sayyah +
aneesh98 +
aptalca
arw2019 +
attack68
brendandrury +
bubblingoak +
calvinsomething +
claws +
deponovo +
dicristina
el-g-1 +
evensure +
fotino21 +
fshi01 +
gfkang +
github-actions[bot]
i-aki-y
jbrockmendel
jreback
juliandwain +
jxb4892 +
kendall smith +
lmcindewar +
lrepiton
maximilianaccardo +
michal-gh
neelmraman
partev
phofl +
pratyushsharan +
quantumalaviya +
rafael +
realead
rocabrera +
rosagold
saehuihwang +
salomondush +
shubham11941140 +
srinivasan +
stphnlyd
suoniq
trevorkask +
tushushu
tyuyoshi +
usersblock +
vernetya +
vrserpa +
willie3838 +
zeitlinv +
zhangxiaoxing +
