What's new in 2.2.0 (January 19, 2024)

These are the changes in pandas 2.2.0. See Release notes for a full changelog including other versions of pandas.

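pandas 3.0 upcoming changes

pandas 3.0 will bring two bigger changes to the default behavior of pandas.

Copy-on-Write

The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There won't be an option to keep the current behavior enabled. The new behavioral semantics are explained in the user guide about Copy-on-Write.

The new behavior can be enabled since pandas 2.0 with the following option: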

pd.options.mode.copy_on_write = True

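This change brings different changes in behavior in how pandas operates with respect to copies and views. Some of these changes allow a clear deprecation, like the changes in chained assignment. Other changes are more subtle, and thus the warnings are hidden behind an option that can be enabled in pandas 2.2: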

pd.options.mode.copy_on_write = "warn"

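This mode will warn in many different scenarios that aren't actually relevant to most queries. We recommend exploring this mode, but it is not necessary to get rid of all of these warnings. The migration guide explains the upgrade process in more detail.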
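The snippet below is a minimal sketch of the Copy-on-Write semantics described above, assuming pandas 2.2, where the mode is still opt-in. It shows that writing to an object derived from a DataFrame does not propagate back to the parent. The variable names are illustrative only.

```python
import pandas as pd

# Opt into Copy-on-Write (the default behavior from pandas 3.0 onward).
pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Selecting a column gives an object that may share data with df...
subset = df["a"]

# ...but writing to it triggers a copy, so df itself is left untouched.
subset.iloc[0] = 100

print(df["a"].tolist())   # [1, 2, 3]
print(subset.tolist())    # [100, 2, 3]
```

Without Copy-on-Write, the same code can modify df through subset, which is exactly the view/copy ambiguity the new mode removes.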

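Dedicated string data type (backed by Arrow) by default

Historically, pandas represented string columns with NumPy object data type. This representation has numerous problems, including slow performance and a large memory footprint. This will change in pandas 3.0. pandas will start inferring string columns as a new string data type, backed by Arrow, which represents strings contiguously in memory. This brings a huge performance and memory improvement.

Old behavior

In [1]: pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: object

New behavior

In [1]: pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: string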

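The string data type that is used in these scenarios will mostly behave as NumPy object would, including missing value semantics and general operations on these columns.

This change includes a few additional changes across the API:

Currently, specifying dtype="string" creates a dtype that is backed by Python strings which are stored in a NumPy array. This will change in pandas 3.0: this dtype will create an Arrow-backed string column.
The column names and the Index will also be backed by Arrow strings.
PyArrow will become a required dependency with pandas 3.0 to accommodate this change.

This future dtype inference logic can be enabled with: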

pd.options.future.infer_string = True

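Enhancements

ADBC Driver support in to_sql and read_sql

read_sql() and to_sql() now work with Apache Arrow ADBC drivers. Compared to traditional drivers used via SQLAlchemy, ADBC drivers should provide significant performance improvements, better type support and cleaner nullability handling.

import adbc_driver_postgresql.dbapi as pg_dbapi
import pandas as pd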

df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    columns=['a', 'b', 'c']
)
uri = "postgresql://postgres:postgres@localhost/postgres"
with pg_dbapi.connect(uri) as conn:
    df.to_sql("pandas_table", conn, index=False)

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn)

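The Arrow type system offers a wider array of types that can more closely match what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) listing of types available in different databases and pandas backends: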

NumPy/pandas	Arrow	PostgreSQL	SQLite
`int16/Int16`	`int16`	`SMALLINT`	`INTEGER`
`int32/Int32`	`int32`	`INTEGER`	`INTEGER`
`int64/Int64`	`int64`	`BIGINT`	`INTEGER`
`float32`	`float32`	`REAL`	`REAL`
`float64`	`float64`	`DOUBLE PRECISION`	`REAL`
`object`	`string`	`TEXT`	`TEXT`
`bool`	`bool_`	`BOOLEAN`
`datetime64[ns]`	`timestamp(us)`	`TIMESTAMP`
`datetime64[ns,tz]`	`timestamp(us,tz)`	`TIMESTAMPTZ`
	`date32`	`DATE`
	`month_day_nano_interval`	`INTERVAL`
	`binary`	`BINARY`	`BLOB`
	`decimal128`	`DECIMAL` [1]
	`list`	`ARRAY` [1]
	`struct`	`COMPOSITE TYPE` [1]

Footnotes

To preserve database types as closely as possible throughout the lifecycle of your DataFrame, use the dtype_backend="pyarrow" argument of read_sql():

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")

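This will prevent your data from being converted to the traditional pandas/NumPy type system, which often converts SQL types in ways that make them impossible to round-trip.

For a full list of ADBC drivers and their development status, see the ADBC Driver Implementation Status documentation.

Create a pandas Series based on one or more conditions

The Series.case_when() function has been added to create a Series object based on one or more conditions (GH 39154).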

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

In [3]: default=pd.Series('default', index=df.index)

In [4]: default.case_when(
   ...:      caselist=[
   ...:          (df.a == 1, 'first'),                              # condition, replacement
   ...:          (df.a.gt(1) & df.b.eq(5), 'second'),  # condition, replacement
   ...:      ],
   ...: )
   ...: 
Out[4]: 
0      first
1     second
2    default
dtype: object

`to_numpy` for NumPy nullable and Arrow types converts to suitable NumPy dtype

to_numpy for NumPy nullable and Arrow types will now convert to a suitable NumPy dtype instead of object dtype for nullable and PyArrow-backed extension dtypes.

Old behavior

In [1]: ser = pd.Series([1, 2, 3], dtype="Int64")
In [2]: ser.to_numpy()
Out[2]: array([1, 2, 3], dtype=object)

New behavior

In [5]: ser = pd.Series([1, 2, 3], dtype="Int64")

In [6]: ser.to_numpy()
Out[6]: array([1, 2, 3])

In [7]: ser = pd.Series([1, 2, 3], dtype="timestamp[ns][pyarrow]")

In [8]: ser.to_numpy()
Out[8]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

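The default NumPy dtype (without any arguments) is determined as follows:

float dtypes are cast to NumPy floats
integer dtypes without missing values are cast to NumPy integer dtypes
integer dtypes with missing values are cast to NumPy float dtypes, and NaN is used as the missing value indicator
boolean dtypes without missing values are cast to NumPy bool dtype
boolean dtypes with missing values keep object dtype
datetime and timedelta types are cast to NumPy datetime64 and timedelta64 types respectively, and NaT is used as the missing value indicator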

Series.struct accessor for PyArrow structured data

The Series.struct accessor provides attributes and methods for processing data with struct[pyarrow] dtype Series. For example, Series.struct.explode() converts PyArrow structured data to a pandas DataFrame (GH 54938).

In [9]: import pyarrow as pa

In [10]: series = pd.Series(
   ....:     [
   ....:         {"project": "pandas", "version": "2.2.0"},
   ....:         {"project": "numpy", "version": "1.25.2"},
   ....:         {"project": "pyarrow", "version": "13.0.0"},
   ....:     ],
   ....:     dtype=pd.ArrowDtype(
   ....:         pa.struct([
   ....:             ("project", pa.string()),
   ....:             ("version", pa.string()),
   ....:         ])
   ....:     ),
   ....: )
   ....: 

In [11]: series.struct.explode()
Out[11]: 
   project version
0   pandas   2.2.0
1    numpy  1.25.2
2  pyarrow  13.0.0

Use Series.struct.field() to index into a (possibly nested) struct field.

In [12]: series.struct.field("project")
Out[12]: 
0     pandas
1      numpy
2    pyarrow
Name: project, dtype: string[pyarrow]

Series.list accessor for PyArrow list data

The Series.list accessor provides attributes and methods for processing data with list[pyarrow] dtype Series. For example, Series.list.__getitem__() allows indexing pyarrow lists in a Series (GH 55323).

In [13]: import pyarrow as pa

In [14]: series = pd.Series(
   ....:     [
   ....:         [1, 2, 3],
   ....:         [4, 5],
   ....:         [6],
   ....:     ],
   ....:     dtype=pd.ArrowDtype(
   ....:         pa.list_(pa.int64())
   ....:     ),
   ....: )
   ....: 

In [15]: series.list[0]
Out[15]: 
0    1
1    4
2    6
dtype: int64[pyarrow]

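Calamine engine for `read_excel()`

The calamine engine was added to read_excel(). It uses python-calamine, which provides Python bindings for the Rust library calamine. This engine supports Excel files (.xlsx, .xlsm, .xls, .xlsb) and OpenDocument spreadsheets (.ods) (GH 50395).

There are two advantages of this engine:

Calamine is often faster than other engines; some benchmarks show it to be up to 5x faster than "openpyxl", 20x faster than "odf", 4x faster than "pyxlsb", and 1.5x faster than "xlrd". However, "openpyxl" and "pyxlsb" are faster when reading a few rows from large files, because they support lazy iteration over rows.
Calamine supports the recognition of datetimes in .xlsb files, unlike "pyxlsb", which is the only other engine in pandas that can read .xlsb files.

import pandas as pd
pd.read_excel("path_to_file.xlsb", engine="calamine")

For more, see Calamine (Excel and ODS files) in the user guide on IO tools.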

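Other enhancements

to_sql() with the method parameter set to multi now works with Oracle on the backend
Series.attrs / DataFrame.attrs now use a deep copy for propagating attrs (GH 54134)
get_dummies() now returns the extension dtypes boolean or bool[pyarrow] that are compatible with the input dtype (GH 56273)
read_csv() now supports the on_bad_lines parameter with engine="pyarrow" (GH 54480)
read_sas() now returns datetime64 dtypes with resolutions better matching those stored natively in SAS, and avoids returning object dtype in cases that cannot be stored with datetime64[ns] dtype (GH 56127)
read_spss() now returns a DataFrame that stores the metadata in DataFrame.attrs (GH 54264)
tseries.api.guess_datetime_format() is now part of the public API (GH 54727)
DataFrame.apply() now allows the usage of numba (via engine="numba") to JIT compile the passed function, allowing for potential speedups (GH 54666)
ExtensionArray._explode() interface method added to allow extension type implementations of the explode method (GH 54833)
ExtensionArray.duplicated() added to allow extension type implementations of the duplicated method (GH 55255)
Series.ffill(), Series.bfill(), DataFrame.ffill(), and DataFrame.bfill() have gained the argument limit_area; third-party ExtensionArray authors need to add this argument to the method _pad_or_backfill (GH 56492)
Allow passing read_only, data_only and keep_links arguments to openpyxl using the engine_kwargs of read_excel() (GH 55027)
Implemented Series.interpolate() and DataFrame.interpolate() for ArrowDtype with pyarrow.duration type (GH 56267)
Implemented masked algorithms for Series.value_counts() (GH 54984)
Implemented Series.dt() methods and attributes for ArrowDtype with pyarrow.duration type (GH 52284)
Implemented Series.str.extract() for ArrowDtype (GH 56268)
Improved the error message that appears in DatetimeIndex.to_period() with frequencies which are not supported as period frequencies, such as "BMS" (GH 56243)
Improved the error message when constructing Period with invalid offsets such as "QS" (GH 55785)
The dtypes string[pyarrow] and string[pyarrow_numpy] now both utilize the large_string type from PyArrow to avoid overflow for long columns (GH 56259)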

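Notable bug fixes

These are bug fixes that might have notable behavior changes.

`merge()` and `DataFrame.join()` now consistently follow documented sort behavior

In previous versions of pandas, merge() and DataFrame.join() did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (GH 54611, GH 56426, GH 56443).

As documented, sort=True sorts the join keys lexicographically in the resulting DataFrame. With sort=False, the order of the join keys depends on the join type (how keyword):

how="left": preserve the order of the left keys
how="right": preserve the order of the right keys
how="inner": preserve the order of the left keys
how="outer": sort keys lexicographically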

One example with changed behavior is an inner join with non-unique left join keys and sort=False:

In [16]: left = pd.DataFrame({"a": [1, 2, 1]})

In [17]: right = pd.DataFrame({"a": [1, 2]})

In [18]: result = pd.merge(left, right, how="inner", on="a", sort=False)

Old behavior

In [5]: result
Out[5]:
   a
0  1
1  1
2  2

New behavior

In [19]: result
Out[19]: 
   a
0  1
1  2
2  1

`merge()` and `DataFrame.join()` no longer reorder levels when levels differ

In previous versions of pandas, merge() and DataFrame.join() would reorder index levels when joining on two indexes with different levels (GH 34133).

In [20]: left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))

In [21]: right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))

In [22]: left
Out[22]: 
     left
A B      
x 1     1
  2     1

In [23]: right
Out[23]: 
     right
B C       
1 1      2
2 2      2

In [24]: result = left.join(right)

Old behavior

In [5]: result
Out[5]:
       left  right
B A C
1 x 1     1      2
2 x 2     1      2

New behavior

In [25]: result
Out[25]: 
       left  right
A B C             
x 1 1     1      2
  2 2     1      2

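Increased minimum versions for dependencies

For optional dependencies, the general recommendation is to use the latest version. Optional dependencies below the lowest tested version may still work but are not considered supported. The following table lists the optional dependencies that have had their minimum tested version increased.

Package	New minimum version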
`beautifulsoup4`	4.11.2
`blosc`	1.21.3
`bottleneck`	1.3.6
`fastparquet`	2022.12.0
`fsspec`	2022.11.0
`gcsfs`	2022.11.0
`lxml`	4.9.2
`matplotlib`	3.6.3
`numba`	0.56.4
`numexpr`	2.8.4
`qtpy`	2.3.0
`openpyxl`	3.1.0
`psycopg2`	2.9.6
`pyreadstat`	1.2.0
`pytables`	3.8.0
`pyxlsb`	1.0.10
`s3fs`	2022.11.0
`scipy`	1.10.0
`sqlalchemy`	2.0.0
`tabulate`	0.9.0
`xarray`	2022.12.0
`xlsxwriter`	3.0.5
`zstandard`	0.19.0
`pyqt5`	5.15.8
`tzdata`	2022.7

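See Dependencies and Optional dependencies for more.

Other API changes

The hash values of nullable extension dtypes changed to improve the performance of the hashing operation (GH 56507)
check_exact now only takes effect for floating-point dtypes in testing.assert_frame_equal() and testing.assert_series_equal(). In particular, integer dtypes are always checked exactly (GH 55882)

Deprecations

Chained assignment

In preparation for larger upcoming changes to the copy/view behavior in pandas 3.0 (Copy-on-Write (CoW), PDEP-7), we started deprecating chained assignment.

Chained assignment occurs when you try to update a pandas DataFrame or Series through two subsequent indexing operations. Depending on the type and order of those operations, this currently does or does not work.

A typical example is as follows:

import pandas as pd
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})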

# first selecting rows with a mask, then assigning values to a column
# -> this has never worked and raises a SettingWithCopyWarning
df[df["bar"] > 5]["foo"] = 100

# first selecting the column, and then assigning to a subset of that column
# -> this currently works
df["foo"][df["bar"] > 5] = 100

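This second example of chained assignment currently works to update the original df. This will no longer work in pandas 3.0, and therefore we started deprecating it: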

>>> df["foo"][df["bar"] > 5] = 100
FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.ac.cn/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

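You can fix this warning, and ensure your code is ready for pandas 3.0, by removing the usage of chained assignment. Typically, this can be done by doing the assignment in a single step, for example using .loc. For the example above, we can do: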

df.loc[df["bar"] > 5, "foo"] = 100

The same deprecation applies to inplace methods that are called in a chained manner, such as:

>>> df["foo"].fillna(0, inplace=True)
FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

When the goal is to update the column in the DataFrame df, the alternative here is to call the method on df itself, such as df.fillna({"foo": 0}, inplace=True).
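Putting both recommended replacements together, here is a minimal sketch (assuming pandas 2.2 or later) that updates df in single steps without chained assignment; the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1.0, None, 3.0], "bar": [4, 5, 6]})

# Single-step assignment with .loc instead of df["foo"][df["bar"] > 5] = 100
df.loc[df["bar"] > 5, "foo"] = 100

# Call the inplace method on df itself instead of df["foo"].fillna(0, inplace=True)
df.fillna({"foo": 0}, inplace=True)

print(df["foo"].tolist())  # [1.0, 0.0, 100.0]
```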

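See the migration guide for more details.

Deprecate aliases `M`, `Q`, `Y`, etc. in favor of `ME`, `QE`, `YE`, etc. for offsets

Deprecated the following frequency aliases (GH 9586):

Offset	Deprecated alias	New alias
`MonthEnd`	`M`	`ME`
`BusinessMonthEnd`	`BM`	`BME`
`SemiMonthEnd`	`SM`	`SME`
`CustomBusinessMonthEnd`	`CBM`	`CBME`
`QuarterEnd`	`Q`	`QE`
`BQuarterEnd`	`BQ`	`BQE`
`YearEnd`	`Y`	`YE`
`BYearEnd`	`BY`	`BYE`

For example:

Previous behavior: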

In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
Out[8]:
DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
              dtype='datetime64[ns]', freq='Q-NOV')

Future behavior:

In [26]: pd.date_range('2020-01-01', periods=3, freq='QE-NOV')
Out[26]: DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'], dtype='datetime64[ns]', freq='QE-NOV')

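Deprecated automatic downcasting

Deprecated the automatic downcasting of object dtype results in a number of methods. These would silently change the dtype in a hard-to-predict manner, since the behavior was value dependent. Additionally, pandas is moving away from silent dtype changes (GH 54710, GH 54261).

These methods are:

To replicate the current behavior in the future, explicitly call DataFrame.infer_objects():

result = result.infer_objects(copy=False)

Or explicitly cast floats that are whole numbers to ints using astype.

Set the following option to opt into the future behavior: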

In [9]: pd.set_option("future.no_silent_downcasting", True)
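A minimal sketch of the two explicit alternatives mentioned above, applied to a hypothetical object-dtype result (assuming pandas 2.2 or later). The copy keyword is left out here, since it only affects whether data may be shared.

```python
import pandas as pd

# A hypothetical object-dtype result, such as a method may return once
# silent downcasting is gone.
result = pd.Series([1.0, 2.0, 3.0], dtype=object)

# Explicitly re-infer a more specific dtype...
inferred = result.infer_objects()

# ...or explicitly cast the whole-number floats to integers with astype.
as_int = result.astype("int64")

print(inferred.dtype, as_int.dtype)  # float64 int64
```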

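Other Deprecations

Changed Timedelta.resolution_string() to return h, min, s, ms, us, and ns instead of H, T, S, L, U, and N, for compatibility with the respective deprecations in frequency aliases (GH 52536)
Deprecated offsets.Day.delta, offsets.Hour.delta, offsets.Minute.delta, offsets.Second.delta, offsets.Milli.delta, offsets.Micro.delta, offsets.Nano.delta, use pd.Timedelta(obj) instead (GH 55498)
Deprecated pandas.api.types.is_interval() and pandas.api.types.is_period(), use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
Deprecated read_gbq() and DataFrame.to_gbq(). Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (GH 55525)
Deprecated DataFrameGroupBy.fillna() and SeriesGroupBy.fillna(); use DataFrameGroupBy.ffill(), DataFrameGroupBy.bfill() for forward and backward filling or DataFrame.fillna() to fill with a single value (or the Series equivalents) (GH 55718)
Deprecated DateOffset.is_anchored(), use obj.n == 1 for non-Tick subclasses (for Tick this was always False) (GH 55388)
Deprecated DatetimeArray.__init__() and TimedeltaArray.__init__(), use array() instead (GH 55623)
Deprecated Index.format(), use index.astype(str) or index.map(formatter) instead (GH 55413)
Deprecated Series.ravel(), the underlying array is already 1D, so ravel is not necessary (GH 52511)
Deprecated Series.resample() and DataFrame.resample() with a PeriodIndex (and the 'convention' keyword), convert to DatetimeIndex (with .to_timestamp()) before resampling instead (GH 53481)
Deprecated Series.view(), use Series.astype() instead to change the dtype (GH 20251)
Deprecated offsets.Tick.is_anchored(), use False instead (GH 55388)
Deprecated the core.internals members Block, ExtensionBlock, and DatetimeTZBlock, use public APIs instead (GH 55139)
Deprecated the year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor, use PeriodIndex.from_fields() instead (GH 55960)
Deprecated accepting a type as an argument in Index.view(), call without any arguments instead (GH 55709)
Deprecated allowing a non-integer periods argument in date_range(), timedelta_range(), period_range(), and interval_range() (GH 56036)
Deprecated allowing non-keyword arguments in DataFrame.to_clipboard() (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_csv() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_dict() (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_excel() except excel_writer (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_gbq() except destination_table (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_hdf() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_html() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_json() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_latex() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_markdown() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_parquet() except path (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_pickle() except path (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_string() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_xml() except path_or_buffer (GH 54229)
Deprecated allowing passing BlockManager objects to DataFrame or SingleBlockManager objects to Series (GH 52419)
Deprecated the behavior of Index.insert() with an object-dtype index silently performing type inference on the result, explicitly call result.infer_objects(copy=False) for the old behavior (GH 51363)
Deprecated casting non-datetimelike values (mainly strings) in Series.isin() and Index.isin() with datetime64, timedelta64, and PeriodDtype dtypes (GH 53111)
Deprecated dtype inference in the Index, Series and DataFrame constructors when given a pandas input, call .infer_objects on the input to keep the current behavior (GH 56012)
Deprecated dtype inference when setting an Index into a DataFrame, cast explicitly instead (GH 56102)
Deprecated including the groups in computations when using DataFrameGroupBy.apply() and DataFrameGroupBy.resample(); pass include_groups=False to exclude the groups (GH 7155)
Deprecated indexing an Index with a boolean indexer of length zero (GH 55820)
Deprecated not passing a tuple to DataFrameGroupBy.get_group or SeriesGroupBy.get_group when grouping by a length-1 list-like (GH 25971)
Deprecated the string AS denoting frequency in YearBegin and the strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
Deprecated the string A denoting frequency in YearEnd and the strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
Deprecated the string BAS denoting frequency in BYearBegin and the strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
Deprecated the string BA denoting frequency in BYearEnd and the strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
Deprecated the strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 52536)
Deprecated the strings H, S, U, and N denoting units in to_timedelta() (GH 52536)
Deprecated the strings H, T, S, L, U, and N denoting units in Timedelta (GH 52536)
Deprecated the strings T, S, L, U, and N denoting frequencies in Minute, Second, Milli, Micro, Nano (GH 52536)
Deprecated support for combining parsed datetime columns in read_csv() along with the keep_date_col keyword (GH 55569)
Deprecated DataFrameGroupBy.grouper and SeriesGroupBy.grouper; these attributes will be removed in a future version of pandas (GH 56521)
Deprecated the Grouping attributes group_index, result_index, and group_arraylike; these will be removed in a future version of pandas (GH 56148)
Deprecated the delim_whitespace keyword in read_csv() and read_table(), use sep="\\s+" instead (GH 55569)
Deprecated the errors="ignore" option in to_datetime(), to_timedelta(), and to_numeric(); explicitly catch exceptions instead (GH 54467)
Deprecated the fastpath keyword in the Series constructor (GH 20110)
Deprecated the kind keyword in Series.resample() and DataFrame.resample(), explicitly cast the object's index instead (GH 55895)
Deprecated the ordinal keyword in PeriodIndex, use PeriodIndex.from_ordinals() instead (GH 55960)
Deprecated the unit keyword in TimedeltaIndex construction, use to_timedelta() instead (GH 55499)
Deprecated the verbose keyword in read_csv() and read_table() (GH 55569)
Deprecated the behavior of DataFrame.replace() and Series.replace() with CategoricalDtype; in a future version replace will change the values while preserving the categories. To change the categories, use ser.cat.rename_categories instead (GH 55147)
Deprecated the behavior of Series.value_counts() and Index.value_counts() with object dtype; in a future version these will not perform dtype inference on the resulting Index, do result.index = result.index.infer_objects() to retain the old behavior (GH 56161)
Deprecated the default of observed=False in DataFrame.pivot_table(); it will be True in a future version (GH 56236)
Deprecated the extension test classes BaseNoReduceTests, BaseBooleanReduceTests, and BaseNumericReduceTests, use BaseReduceTests instead (GH 54663)
Deprecated the option mode.data_manager and the ArrayManager; only the BlockManager will be available in future versions (GH 55043)
Deprecated the previous implementation of DataFrame.stack; specify future_stack=True to adopt the future version (GH 53515)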
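Two of the replacements recommended above, sketched with made-up sample data (assuming pandas 2.2 or later): sep=r"\s+" in place of delim_whitespace=True, and explicit exception handling in place of errors="ignore". The helper to_numeric_or_original is hypothetical, not a pandas API.

```python
import io

import pandas as pd

data = "a  b   c\n1  2   3\n4  5   6\n"

# Instead of pd.read_csv(..., delim_whitespace=True): split on any whitespace run.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")


def to_numeric_or_original(values):
    """Hypothetical helper replacing pd.to_numeric(values, errors="ignore")."""
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        # Parsing failed, so hand back the input unchanged.
        return values


numeric = to_numeric_or_original(pd.Series(["1", "2"]))    # converted to integers
unchanged = to_numeric_or_original(pd.Series(["1", "x"]))  # returned as-is
```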

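Performance improvements

Performance improvement in testing.assert_frame_equal() and testing.assert_series_equal() (GH 55949, GH 55971)
Performance improvement in concat() with axis=1 and objects with unaligned indexes (GH 55084)
Performance improvement in get_dummies() (GH 56089)
Performance improvement in merge() and merge_ordered() when joining on sorted ascending keys (GH 56115)
Performance improvement in merge_asof() when by is not None (GH 55580, GH 55678)
Performance improvement in read_stata() for files with many variables (GH 55515)
Performance improvement in DataFrame.groupby() when aggregating pyarrow timestamp and duration dtypes (GH 55031)
Performance improvement in DataFrame.join() when joining on unordered categorical indexes (GH 56345)
Performance improvement in DataFrame.loc() and Series.loc() when indexing with a MultiIndex (GH 56062)
Performance improvement in DataFrame.sort_index() and Series.sort_index() when indexed by a MultiIndex (GH 54835)
Performance improvement in DataFrame.to_dict() when converting a DataFrame to a dictionary (GH 50990)
Performance improvement in Index.difference() (GH 55108)
Performance improvement in Index.sort_values() when the index is already sorted (GH 56128)
Performance improvement in MultiIndex.get_indexer() when method is not None (GH 55839)
Performance improvement in Series.duplicated() for pyarrow dtypes (GH 55255)
Performance improvement in Series.str.get_dummies() when dtype is "string[pyarrow]" or "string[pyarrow_numpy]" (GH 56110)
Performance improvement in Series.str() methods (GH 55736)
Performance improvement in Series.value_counts() and Series.mode() for masked dtypes (GH 54984, GH 55340)
Performance improvement in DataFrameGroupBy.nunique() and SeriesGroupBy.nunique() (GH 55972)
Performance improvement in SeriesGroupBy.idxmax(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), DataFrameGroupBy.idxmin() (GH 54234)
Performance improvement when hashing a nullable extension array (GH 56507)
Performance improvement when indexing into a non-unique index (GH 55816)
Performance improvement when indexing with more than 4 keys (GH 54550)
Performance improvement when localizing time to UTC (GH 55241)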

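Bug fixes

Categorical

Categorical.isin() raising InvalidIndexError for categorical containing overlapping Interval values (GH 34974)
Bug in CategoricalDtype.__eq__() returning False for unordered categorical data with mixed types (GH 55468)
Bug when casting pa.dictionary to CategoricalDtype using a pa.DictionaryArray as categories (GH 56672)

Datetimelike

Bug in DatetimeIndex construction when passing both a tz and either dayfirst or yearfirst ignoring dayfirst/yearfirst (GH 55813)
Bug in DatetimeIndex when passing an object-dtype ndarray of float objects and a tz incorrectly localizing the result (GH 55780)
Bug in Series.isin() with DatetimeTZDtype dtype and comparison values that are all NaT incorrectly returning all-False even if the series contains NaT entries (GH 56427)
Bug in concat() raising AttributeError when concatenating an all-NA DataFrame with a DatetimeTZDtype dtype DataFrame (GH 52093)
Bug in testing.assert_extension_array_equal() that could use the wrong unit when comparing resolutions (GH 55730)
Bug in to_datetime() and DatetimeIndex when passing a list of mixed string and numeric types incorrectly raising (GH 55780)
Bug in to_datetime() and DatetimeIndex when passing mixed-type objects with a mix of timezones or a mix of timezone-awareness failing to raise ValueError (GH 55693)
Bug in Tick.delta() with very large ticks raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in DatetimeIndex.shift() with non-nanosecond resolution incorrectly returning a result with nanosecond resolution (GH 56117)
Bug in DatetimeIndex.union() returning object dtype for tz-aware indexes with the same timezone but different units (GH 55238)
Bug in Index.is_monotonic_increasing() and Index.is_monotonic_decreasing() always caching Index.is_unique() as True when the first value in the index is NaT (GH 55755)
Bug in Index.view() to a datetime64 dtype with a non-supported resolution incorrectly raising (GH 55710)
Bug in Series.dt.round() with non-nanosecond resolution and NaT entries incorrectly raising OverflowError (GH 56158)
Bug in Series.fillna() with non-nanosecond resolution dtypes and higher-resolution vector values returning incorrect (internally-corrupted) results (GH 56410)
Bug in Timestamp.unit() being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (GH 56208)
Bug in .astype converting from a higher-resolution datetime64 dtype to a lower-resolution datetime64 dtype (e.g. datetime64[us]->datetime64[ms]) silently overflowing with values near the lower implementation bound (GH 55979)
Bug in adding or subtracting a Week offset to a datetime64 Series, Index, or DataFrame column with non-nanosecond resolution returning incorrect results (GH 55583)
Bug in addition or subtraction of a BusinessDay offset with an offset attribute to a non-nanosecond Index, Series, or DataFrame column giving incorrect results (GH 55608)
Bug in addition or subtraction of DateOffset objects with microsecond components to datetime64 Index, Series, or DataFrame columns with non-nanosecond resolution (GH 55595)
Bug in addition or subtraction of very large Tick objects with Timestamp or Timedelta objects raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond DatetimeTZDtype and inputs that would be out of bounds with nanosecond resolution incorrectly raising OutOfBoundsDatetime (GH 54620)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 (or DatetimeTZDtype) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype's unit (which would happen with non-mixed numeric inputs) (GH 56004)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 dtype and inputs that would be out of bounds for a datetime64[ns] incorrectly raising OutOfBoundsDatetime (GH 55756)
Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (GH 56051)
Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (GH 55737)
Bug in the results of to_datetime() with a floating-dtype argument with unit not matching the pointwise results of Timestamp (GH 56037)
Fixed regression where concat() would raise an error when concatenating datetime64 columns with differing resolutions (GH 53641)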

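Timedelta

Bug in Timedelta construction raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in rendering (__repr__) of TimedeltaIndex and Series with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (GH 55405)

Timezones

Bug in AbstractHolidayCalendar where timezone data was not propagated when computing holiday observances (GH 54580)
Bug in Timestamp construction with an ambiguous value and a pytz timezone failing to raise pytz.AmbiguousTimeError (GH 55657)
Bug in Timestamp.tz_localize() with nonexistent="shift_forward" around UTC+0 during DST (GH 51501)

Numeric

Bug in read_csv() with engine="pyarrow" causing rounding errors for large integers (GH 52505)
Bug in Series.__floordiv__() and Series.__truediv__() for ArrowDtype with integral dtypes raising for large divisors (GH 56706)
Bug in Series.__floordiv__() for ArrowDtype with integral dtypes raising for large values (GH 56645)
Bug in Series.pow() not filling missing values correctly (GH 55512)
Bug in Series.replace() and DataFrame.replace() matching float 0.0 with False and vice versa (GH 55398)
Bug in Series.round() raising for nullable boolean dtype (GH 55936)

Conversion

Bug in DataFrame.astype() when called with str on an unpickled array: the array might change in-place (GH 54654)
Bug in DataFrame.astype() where errors="ignore" had no effect for extension types (GH 54654)
Bug in Series.convert_dtypes() not converting an all-NA column to null[pyarrow] (GH 55346)
Bug in DataFrame.loc() not raising the "incompatible dtype warning" (see PDEP6) when assigning a Series with a different dtype using a full column setter (e.g. df.loc[:, 'a'] = incompatible_value) (GH 39584)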

Strings

Bug in pandas.api.types.is_string_dtype() while checking whether an object array with no elements is of the string dtype (GH 54661)
Bug in DataFrame.apply() failing when engine="numba" and columns or index have StringDtype (GH 56189)
Bug in DataFrame.reindex() not matching an Index with string[pyarrow_numpy] dtype (GH 56106)
Bug in Index.str.cat() always casting the result to object dtype (GH 56157)
Bug in Series.__mul__() for ArrowDtype with pyarrow.string dtype and string[pyarrow] for the pyarrow backend (GH 51970)
Bug in Series.str.find() when start < 0 for ArrowDtype with pyarrow.string (GH 56411)
Bug in Series.str.fullmatch() when dtype=pandas.ArrowDtype(pyarrow.string())) allowing partial matches when the regex ends in a literal //$ (GH 56652)
Bug in Series.str.replace() when n < 0 for ArrowDtype with pyarrow.string (GH 56404)
Bug in Series.str.startswith() and Series.str.endswith() with arguments of type tuple[str, ...] for ArrowDtype with pyarrow.string dtype (GH 56579)
Bug in Series.str.startswith() and Series.str.endswith() with arguments of type tuple[str, ...] for string[pyarrow] (GH 54942)
Bug in comparison operations for dtype="string[pyarrow_numpy]" raising if dtypes can't be compared (GH 56008)

Interval

Bug in Interval __repr__ not displaying UTC offsets for Timestamp bounds. Additionally, the hour, minute and second components will now be shown (GH 55015)
Bug in IntervalIndex.factorize() and Series.factorize() with IntervalDtype with datetime64 or timedelta64 intervals not preserving non-nanosecond units (GH 56099)
Bug in IntervalIndex.from_arrays() when passed datetime64 or timedelta64 arrays with mismatched resolutions constructing an invalid IntervalArray object (GH 55714)
Bug in IntervalIndex.from_tuples() raising if the subtype is a nullable extension dtype (GH 56765)
Bug in IntervalIndex.get_indexer() with datetime or timedelta intervals incorrectly matching on integer targets (GH 47772)
Bug in IntervalIndex.get_indexer() with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (GH 47772)
Bug in setting values on a Series with an IntervalIndex using a slice incorrectly raising (GH 54722)

Indexing

Bug in DataFrame.loc() mutating a boolean indexer when the DataFrame has a MultiIndex (GH 56635)
Bug in DataFrame.loc() when setting a Series with extension dtype into a NumPy dtype (GH 55604)
Bug in Index.difference() not returning a unique set of values when other is empty or other is considered non-comparable (GH 55113)
Bug in setting Categorical values into a DataFrame with numpy dtypes raising RecursionError (GH 52927)
Fixed bug when creating a new column with missing values when setting a single string value (GH 56204)

Missing

Bug in DataFrame.update() not updating in-place for tz-aware datetime64 dtypes (GH 56227)

MultiIndex

Bug in MultiIndex.get_indexer() not raising ValueError when method is provided and the index is non-monotonic (GH 53452)

I/O

Bug in read_csv() where engine="python" did not respect the chunksize argument when skiprows was specified (GH 56323)
Bug in read_csv() where engine="python" was causing a TypeError when a callable skiprows and a chunk size were specified (GH 55677)
Bug in read_csv() where on_bad_lines="warn" would write to stderr instead of raising a Python warning; this now yields an errors.ParserWarning (GH 54296)
Bug in read_csv() with engine="pyarrow" where quotechar was ignored (GH 52266)
Bug in read_csv() with engine="pyarrow" where usecols wasn't working with a CSV with no headers (GH 54459)
Bug in read_excel() with engine="xlrd" (xls files) erroring when the file contains NaN or Inf (GH 54564)
Bug in read_json() not handling dtype conversion properly if infer_string is set (GH 56195)
Bug in DataFrame.to_excel() with OdsWriter (ods files) writing Boolean/string values (GH 54994)
Bug in DataFrame.to_hdf() and read_hdf() with datetime64 dtypes with non-nanosecond resolution failing to round-trip correctly (GH 55622)
Bug in DataFrame.to_stata() raising for extension dtypes (GH 54671)
Bug in read_excel() with engine="odf" (ods files) when a string cell contains an annotation (GH 55200)
Bug in read_excel() with an ODS file without a cached formatted cell for float values (GH 55219)
Bug where DataFrame.to_json() would raise an OverflowError instead of a TypeError with unsupported NumPy types (GH 55403)

Period

Bug in PeriodIndex construction when more than one of data, ordinal and **fields are passed failing to raise ValueError (GH 55961)
Bug in Period addition silently wrapping around instead of raising OverflowError (GH 55503)
Bug in casting from PeriodDtype with astype to datetime64 or DatetimeTZDtype with a non-nanosecond unit incorrectly returning a nanosecond unit (GH 55958)

Plotting

Bug in DataFrame.plot.box() with vert=False and a Matplotlib Axes created with sharey=True (GH 54941)
Bug in DataFrame.plot.scatter() discarding string columns (GH 56142)
Bug in Series.plot() failing to raise when reusing an ax object and passing a how keyword (GH 55953)

Groupby/resample/rolling

Bug in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), and SeriesGroupBy.idxmax() not retaining the Categorical dtype when the index was a CategoricalIndex that contained NA values (GH 54234)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() when observed=False and f="idxmin" or f="idxmax" incorrectly raising on unobserved categories (GH 54234)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() that could result in incorrect sorting if the columns of the DataFrame or the name of the Series are integers (GH 55951)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() not respecting sort=False in DataFrame.groupby() and Series.groupby() (GH 55951)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() sorting by proportions rather than frequencies when sort=True and normalize=True (GH 55951)
Bug in DataFrame.asfreq() and Series.asfreq() with a DatetimeIndex with non-nanosecond resolution incorrectly converting to nanosecond resolution (GH 55958)
Bug in DataFrame.ewm() when passed times with a non-nanosecond datetime64 or DatetimeTZDtype dtype (GH 56262)
Bug in DataFrame.groupby() and Series.groupby() where grouping by a combination of Decimal and NA values would fail when sort=True (GH 54847)
Bug in DataFrame.groupby() for DataFrame subclasses when selecting a subset of columns to apply the function to (GH 56761)
Bug in DataFrame.resample() not respecting the closed and label arguments for BusinessDay (GH 55282)
Bug in DataFrame.resample() when resampling on an ArrowDtype of pyarrow.timestamp or pyarrow.duration type (GH 55989)
Bug in DataFrame.resample() where bin edges were not correct for BusinessDay (GH 55281)
Bug in DataFrame.resample() where bin edges were not correct for MonthBegin (GH 55271)
Bug in DataFrame.rolling() and Series.rolling() where duplicate datetimelike indexes were treated as consecutive rather than equal with closed='left' and closed='neither' (GH 20712)
Bug in DataFrame.rolling() and Series.rolling() where either the index or the on column was an ArrowDtype with pyarrow.timestamp type (GH 55849)

Reshaping

Bug in concat() ignoring the sort parameter when passed DatetimeIndex indexes (GH 54769)
Bug in concat() renaming Series when ignore_index=False (GH 15047)
Bug in merge_asof() raising TypeError when the by dtype is not object, int64, or uint64 (GH 22794)
Bug in merge_asof() raising an incorrect error for string dtype (GH 56444)
Bug in merge_asof() when using a Timedelta tolerance on an ArrowDtype column (GH 56486)
Bug in merge() not raising when merging datetime columns with timedelta columns (GH 56455)
Bug in merge() not raising when merging string columns with numeric columns (GH 56441)
Bug in merge() not sorting for the new string dtype (GH 56442)
Bug in merge() returning columns in incorrect order when left and/or right is empty (GH 51929)
Bug in DataFrame.melt() where an exception was raised if var_name was not a string (GH 55948)
Bug in DataFrame.melt() where it would not preserve the datetime (GH 55254)
Bug in DataFrame.pivot_table() where the row margin was incorrect when the columns have numeric names (GH 26568)
Bug in DataFrame.pivot() with numeric columns and extension dtype for the data (GH 56528)
Bug in DataFrame.stack() with future_stack=True not preserving NA values in the index (GH 56573)

Sparse

Bug in arrays.SparseArray.take() when using a different fill value than the array's fill value (GH 55181)

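Other

DataFrame.__dataframe__() did not support pyarrow large strings (GH 56702)
Bug in DataFrame.describe() where, when formatting percentiles, a 99.999% percentile in the result was rounded to 100% (GH 55765)
Bug in api.interchange.from_dataframe() where it raised NotImplementedError when handling empty string columns (GH 56703)
Bug in cut() and qcut() with datetime64 dtype values with non-nanosecond units incorrectly returning nanosecond-unit bins (GH 56101)
Bug in cut() incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (GH 54964)
Bug in infer_freq() and DatetimeIndex.inferred_freq() with weekly frequencies and non-nanosecond resolutions (GH 55609)
Bug in DataFrame.apply() where passing raw=True ignored args passed to the applied function (GH 55009)
Bug in DataFrame.from_dict() which would always sort the rows of the created DataFrame (GH 55683)
Bug in DataFrame.sort_index() raising a ValueError when passing axis="columns" and ignore_index=True (GH 56478)
Bug in rendering inf values inside a DataFrame with the use_inf_as_na option enabled (GH 55483)
Bug in rendering a Series with a MultiIndex when one of the index levels' names is 0 not having that name displayed (GH 55415)
Bug in the error message when assigning an empty DataFrame to a column (GH 55956)
Bug when time-like strings were being cast to ArrowDtype with pyarrow.time64 type (GH 56463)
Fixed a spurious deprecation warning from numba >= 0.58.0 when passing a numpy ufunc in core.window.Rolling.apply with engine="numba" (GH 55247)

Contributors

A total of 162 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.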

AG
Aaron Rahman +
Abdullah Ihsan Secer +
Abhijit Deo +
Adrian D’Alessandro
Ahmad Mustafa Anis +
Amanda Bizzinotto
Amith KK +
Aniket Patil +
Antonio Fonseca +
Artur Barseghyan
Ben Greiner
Bill Blum +
Boyd Kane
Damian Kula
Dan King +
Daniel Weindl +
Daniele Nicolodi
David Poznik
David Toneian +
Dea María Léon
Deepak George +
Dmitriy +
Dominique Garmier +
Donald Thevalingam +
Doug Davis +
Dukastlik +
Elahe Sharifi +
Eric Han +
Fangchen Li
Francisco Alfaro +
Gadea Autric +
Guillaume Lemaitre
Hadi Abdi Khojasteh
Hedeer El Showk +
Huanghz2001 +
Isaac Virshup
Issam +
Itay Azolay +
Itayazolay +
Jaca +
Jack McIvor +
JackCollins91 +
James Spencer +
Jay
Jessica Greene
Jirka Borovec +
JohannaTrost +
John C +
Joris Van den Bossche
José Lucas Mayer +
José Lucas Silva Mayer +
João Andrade +
Kai Mühlbauer
Katharina Tielking, MD +
Kazuto Haruguchi +
Kevin
Lawrence Mitchell
Linus +
Linus Sommer +
Louis-Émile Robitaille +
Luke Manley
Lumberbot (aka Jack)
Maggie Liu +
MainHanzo +
Marc Garcia
Marco Edward Gorelli
MarcoGorelli
Martin Šícho +
Mateusz Sokół
Matheus Felipe +
Matthew Roeschke
Matthias Bussonnier
Maxwell Bileschi +
Michael Tiemann
Michał Górny
Molly Bowers +
Moritz Schubert +
NNLNR +
Natalia Mokeeva
Nils Müller-Wendt +
Omar Elbaz
Pandas Development Team
Paras Gupta +
Parthi
Patrick Hoefler
Paul Pellissier +
Paul Uhlenbruck +
Philip Meier
Philippe THOMY +
Quang Nguyễn
Raghav
Rajat Subhra Mukherjee
Ralf Gommers
Randolf Scholz +
Richard Shadrach
Rob +
Rohan Jain +
Ryan Gibson +
Sai-Suraj-27 +
Samuel Oranyeli +
Sara Bonati +
Sebastian Berg
Sergey Zakharov +
Shyamala Venkatakrishnan +
StEmGeo +
Stefanie Molin
Stijn de Gooijer +
Thiago Gariani +
Thomas A Caswell
Thomas Baumann +
Thomas Guillet +
Thomas Lazarus +
Thomas Li
Tim Hoffmann
Tim Swast
Tom Augspurger
Toro +
Torsten Wörtwein
Ville Aikas +
Vinita Parasrampuria +
Vyas Ramasubramani +
William Andrea
William Ayd
Willian Wang +
Xiao Yuan
Yao Xiao
Yves Delley
Zemux1613 +
Ziad Kermadi +
aaron-robeson-8451 +
aram-cinnamon +
caneff +
ccccjone +
chris-caballero +
cobalt
color455nm +
denisrei +
dependabot[bot]
jbrockmendel
jfadia +
johanna.trost +
kgmuzungu +
mecopur +
mhb143 +
morotti +
mvirts +
omar-elbaz
paulreece
pre-commit-ci[bot]
raj-thapa
rebecca-palmer
rmhowe425
rohanjain101
shiersansi +
smij720
srkds +
taytzehao
torext
vboxuser +
xzmeng +
yashb +
