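What's new in 2.0.0 (April 3, 2023)#

These are the changes in pandas 2.0.0. See the release notes for a full changelog including other versions of pandas.

Enhancements#

Installing optional dependencies with pip extras#

When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.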

pip install "pandas[performance, aws]>=2.0.0"

The available extras, listed in the installation guide, are [all, performance, computation, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test] (GH 39164).

`Index` can now hold NumPy numeric dtypes#

It is now possible to use any NumPy numeric dtype in an Index (GH 42717).

Previously it was only possible to use int64, uint64 and float64 dtypes:

In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Int64Index([1, 2, 3], dtype="int64")
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")

Int64Index, UInt64Index and Float64Index were deprecated in pandas version 1.4 and have now been removed. Index should be used directly instead; it can now take all NumPy numeric dtypes, i.e. int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64:

In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Index([1, 2, 3], dtype='int8')

In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: Index([1, 2, 3], dtype='uint16')

In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Index([1.0, 2.0, 3.0], dtype='float32')

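The ability of Index to hold the NumPy numeric dtypes has meant some changes in pandas functionality. In particular, operations that previously were forced to create 64-bit indexes can now create indexes with lower bit sizes, e.g. 32-bit indexes.

Below is a possibly non-exhaustive list of changes:

Instantiating using a NumPy numeric array now follows the dtype of the NumPy array. Previously, all indexes created from NumPy numeric arrays were forced to 64-bit. Now, for example, Index(np.array([1, 2, 3])) will be int32 on 32-bit systems, where it previously would have been int64 even on 32-bit systems. Instantiating Index using a list of numbers will still return 64-bit dtypes, e.g. Index([1, 2, 3]) will have an int64 dtype, which is the same as previously.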
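A minimal sketch of the rule just described, assuming pandas 2.0 or later. The variable names are illustrative only. It contrasts an explicitly typed NumPy array with a plain Python list.

```python
import numpy as np
import pandas as pd

# An explicitly typed NumPy array keeps its dtype when wrapped in an Index.
small = pd.Index(np.array([1, 2, 3], dtype=np.int16))
print(small.dtype)

# A plain list of Python integers still produces a 64-bit dtype.
from_list = pd.Index([1, 2, 3])
print(from_list.dtype)
```

Code that relied on every numeric Index being 64-bit may want to pass dtype explicitly or call astype("int64") on the result.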

The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex also:

In [4]: idx = pd.date_range(start='1/1/2018', periods=3, freq='ME')

In [5]: idx.array.year
Out[5]: array([2018, 2018, 2018], dtype=int32)

In [6]: idx.year
Out[6]: Index([2018, 2018, 2018], dtype='int32')

The level dtypes on Indexes created by Series.sparse.from_coo() are now of dtype int32, the same as the rows/cols of a scipy sparse matrix. Previously they were of dtype int64.

In [7]: from scipy import sparse

In [8]: A = sparse.coo_matrix(
   ...:     ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
   ...: )
   ...: 

In [9]: ser = pd.Series.sparse.from_coo(A)

In [10]: ser.index.dtypes
Out[10]: 
level_0    int32
level_1    int32
dtype: object

Index cannot be instantiated using a float16 dtype. Previously, instantiating an Index using dtype float16 resulted in a Float64Index with a float64 dtype. It now raises a NotImplementedError:

In [11]: pd.Index([1, 2, 3], dtype=np.float16)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[11], line 1
----> 1 pd.Index([1, 2, 3], dtype=np.float16)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:577, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols)
    573 arr = ensure_wrapped_if_datetimelike(arr)
    575 klass = cls._dtype_to_subclass(arr.dtype)
--> 577 arr = klass._ensure_array(arr, arr.dtype, copy=False)
    578 result = klass._simple_new(arr, name, refs=refs)
    579 if dtype is None and is_pandas_object and data_dtype == np.object_:

File ~/work/pandas/pandas/pandas/core/indexes/base.py:602, in Index._ensure_array(cls, data, dtype, copy)
    599     raise ValueError("Index data must be 1-dimensional")
    600 elif dtype == np.float16:
    601     # float16 not supported (no indexing engine)
--> 602     raise NotImplementedError("float16 indexes are not supported")
    604 if copy:
    605     # asarray_tuplesafe does not always copy underlying data,
    606     #  so need to make sure that this happens
    607     data = data.copy()

NotImplementedError: float16 indexes are not supported

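Argument `dtype_backend`, to return PyArrow-backed or NumPy-backed nullable dtypes#

A number of reader and conversion functions, including read_csv() as used in the session that follows, gained a new keyword dtype_backend (GH 36712).

When this option is set to "numpy_nullable", the function returns a DataFrame that is backed by nullable dtypes.

When this keyword is set to "pyarrow", these functions return PyArrow-backed nullable ArrowDtype DataFrames (GH 48957, GH 49997):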

In [12]: import io

In [13]: data = io.StringIO("""a,b,c,d,e,f,g,h,i
   ....:     1,2.5,True,a,,,,,
   ....:     3,4.5,False,b,6,7.5,True,a,
   ....: """)
   ....: 

In [14]: df = pd.read_csv(data, dtype_backend="pyarrow")

In [15]: df.dtypes
Out[15]: 
a     int64[pyarrow]
b    double[pyarrow]
c      bool[pyarrow]
d    string[pyarrow]
e     int64[pyarrow]
f    double[pyarrow]
g      bool[pyarrow]
h    string[pyarrow]
i      null[pyarrow]
dtype: object

In [16]: data.seek(0)
Out[16]: 0

In [17]: df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow", engine="pyarrow")

In [18]: df_pyarrow.dtypes
Out[18]: 
a     int64[pyarrow]
b    double[pyarrow]
c      bool[pyarrow]
d    string[pyarrow]
e     int64[pyarrow]
f    double[pyarrow]
g      bool[pyarrow]
h    string[pyarrow]
i      null[pyarrow]
dtype: object
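The session above covers the "pyarrow" backend. For comparison, here is a small sketch of the "numpy_nullable" backend, which does not need PyArrow. The CSV text and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column CSV with one missing integer value.
csv_text = "a,b\n1,2.5\n,4.5\n"

df_nullable = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

# Column "a" stays an integer column (nullable Int64) despite the missing
# value, instead of being upcast to float64 with NaN.
print(df_nullable.dtypes)
print(df_nullable)
```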

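Copy-on-Write improvements#

A new lazy copy mechanism that defers the copy until the object in question is modified was added to the methods listed in Copy-on-Write optimizations. These methods return views when Copy-on-Write is enabled, which provides a significant performance improvement compared to the regular execution (GH 49473).
Accessing a single column of a DataFrame as a Series (e.g. df["col"]) now always returns a new object each time when Copy-on-Write is enabled (rather than returning the same cached Series object repeatedly). This ensures that those Series objects correctly follow the Copy-on-Write rules (GH 49450)
The Series constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing a Series from an existing Series with the default of copy=False (GH 50471)
The DataFrame constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing a DataFrame from an existing DataFrame with the default of copy=False (GH 51239)
The DataFrame constructor, when constructing a DataFrame from a dictionary of Series objects and specifying copy=False, will now use a lazy copy of those Series objects for the columns of the DataFrame (GH 50777)
The DataFrame constructor, when constructing a DataFrame from a Series or Index and specifying copy=False, will now respect Copy-on-Write.
The DataFrame and Series constructors, when constructing from a NumPy array, will now copy the array by default to avoid mutating the DataFrame / Series when the array is mutated. Specify copy=False to get the old behavior. When setting copy=False, pandas does not guarantee correct Copy-on-Write behavior if the NumPy array is modified after creation of the DataFrame / Series.
DataFrame.from_records() will now respect Copy-on-Write when called with a DataFrame.
Trying to set values using chained assignment (for example, df["a"][1:3] = 0) will now always raise a warning when Copy-on-Write is enabled. In this mode, chained assignment can never work because we are always setting into a temporary object that is the result of an indexing operation (getitem), which under Copy-on-Write always behaves as a copy. Assigning through a chain can therefore never update the original Series or DataFrame, so an informative warning is raised to avoid silently doing nothing (GH 49467)
DataFrame.replace() will now respect the Copy-on-Write mechanism when inplace=True.
DataFrame.transpose() will now respect the Copy-on-Write mechanism.
Arithmetic operations that can be inplace, e.g. ser *= 2, will now respect the Copy-on-Write mechanism.
DataFrame.__getitem__() will now respect the Copy-on-Write mechanism when the DataFrame has MultiIndex columns.
Series.__getitem__() will now respect the Copy-on-Write mechanism when the Series has a MultiIndex.
Series.view() will now respect the Copy-on-Write mechanism.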

Copy-on-Write can be enabled through one of

pd.set_option("mode.copy_on_write", True)

pd.options.mode.copy_on_write = True

Alternatively, Copy-on-Write can be enabled locally through

with pd.option_context("mode.copy_on_write", True):
    ...
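To make the rules above concrete, here is a small sketch that enables Copy-on-Write locally and modifies a column taken from a DataFrame. The frame and variable names are illustrative assumptions.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Under Copy-on-Write the selected column behaves like a copy; the
    # actual copy is deferred until the first modification.
    col = df["a"]
    col.iloc[0] = 100  # modifies only `col`, never `df`

    parent_value = int(df.loc[0, "a"])
    child_value = int(col.iloc[0])

print(parent_value, child_value)
```

To update the parent under Copy-on-Write, assign on the parent itself in a single step, for example df.loc[0, "a"] = 100.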

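Other enhancements#

Added support for str accessor methods when using ArrowDtype with a pyarrow.string type (GH 50325)
Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type (GH 50954)
read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the SAS file. (GH 48048)
DataFrameGroupBy.quantile(), SeriesGroupBy.quantile() and DataFrameGroupBy.std() now preserve nullable dtypes instead of casting to NumPy dtypes (GH 37493)
DataFrameGroupBy.std() and SeriesGroupBy.std() now support datetime64, timedelta64, and DatetimeTZDtype dtypes (GH 48481)
Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. If axis is set, it overrides the default choice of which axis to label (GH 47819)
testing.assert_frame_equal() now shows the first element where the DataFrames differ, analogously to pytest's output (GH 47910)
Added index parameter to DataFrame.to_dict() (GH 46398)
Added support for extension array dtypes in merge() (GH 44240)
Added metadata propagation for binary operators on DataFrame (GH 28283)
Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate (GH 28385)
CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors (GH 27656)
Fixed the test optional_extra by adding the missing test package pytest-asyncio (GH 48361)
Improved the exception message raised by DataFrame.astype() to include the column name when type conversion is not possible. (GH 47571)
date_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH 49106)
timedelta_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH 49824)
DataFrame.to_json() now supports a mode keyword with supported inputs 'w' and 'a'. It defaults to 'w'; 'a' can be used when lines=True and orient='records' to append record-oriented JSON lines to an existing JSON file. (GH 35849)
Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples() (GH 48911)
Improved the exception message when using testing.assert_frame_equal() on a DataFrame to include the column that is compared (GH 50323)
Improved the error message for merge_asof() when join columns are duplicated (GH 50102)
Added support for extension array dtypes to get_dummies() (GH 32430)
Added Index.infer_objects(), analogous to Series.infer_objects() (GH 50034)
Added copy parameter to Series.infer_objects() and DataFrame.infer_objects(); passing False avoids making copies for Series or columns that are already non-object or where no better dtype can be inferred (GH 50096)
DataFrame.plot.hist() now recognizes xlabel and ylabel arguments (GH 49793)
Series.drop_duplicates() has gained an ignore_index keyword to reset the index (GH 48304)
Series.dropna() and DataFrame.dropna() have gained an ignore_index keyword to reset the index (GH 31725)
Improved the error message in to_datetime() for non-ISO8601 formats, informing users about the position of the first error (GH 50361)
Improved the error message when trying to align DataFrame objects (for example, in DataFrame.compare()) to clarify that "identically labelled" refers to both index and columns (GH 50083)
Added support for Index.min() and Index.max() for PyArrow string dtypes (GH 51397)
Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH 50616)
Added Series.dt.unit and Series.dt.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH 51223)
Added new argument dtype to read_sql() to be consistent with read_sql_query() (GH 50797)
read_csv(), read_table(), read_fwf() and read_excel() now accept date_format (GH 50601)
to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically formatted) (GH 50411)
to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually (GH 50972)
Added new argument engine to read_json() to support parsing JSON with PyArrow by specifying engine="pyarrow" (GH 48893)
Added support for SQLAlchemy 2.0 (GH 40686)
Added support for the decimal parameter when engine="pyarrow" in read_csv() (GH 51302)
Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases (GH 25151)
Added new escape mode "latex-math" to avoid escaping "$" in the formatter (GH 50040)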
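A few of the keywords listed above can be tried without optional dependencies. The sketch below uses illustrative data and variable names, and touches only the unit, ignore_index, index and sort keywords mentioned in the list.

```python
import pandas as pd

# unit keyword of date_range(): request second resolution.
idx = pd.date_range("2016-01-01", periods=3, unit="s")

# ignore_index keyword of dropna(): the result is relabelled 0..n-1.
cleaned = pd.Series([1.0, None, 3.0]).dropna(ignore_index=True)

# index parameter of to_dict(): leave the index out of the "split" output.
frame = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
as_split = frame.to_dict(orient="split", index=False)

# sort=True on Index set operations: always return a sorted result.
merged = pd.Index([3, 1]).union(pd.Index([2]), sort=True)

print(idx.dtype, cleaned.index.tolist(), as_split, merged.tolist())
```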

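Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`DataFrameGroupBy.cumsum()` and `DataFrameGroupBy.cumprod()` overflow instead of lossy casting to float#

In previous versions we cast to float when applying cumsum and cumprod, which led to incorrect results even when the result could be held by the int64 dtype. Additionally, the aggregation now overflows when the limit of int64 is reached, consistent with NumPy and the regular DataFrame.cumprod() and DataFrame.cumsum() methods (GH 37493).

Old behavior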

In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16

We return incorrect results with the 6th value.

New behavior

In [19]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})

In [20]: df.groupby("key")["value"].cumprod()
Out[20]: 
0                   625
1                390625
2             244140625
3          152587890625
4        95367431640625
5     59604644775390625
6    359414837200037393
Name: value, dtype: int64

We overflow with the 7th value, but the 6th value is still correct.
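The two claims can be checked against exact Python integer arithmetic. This is a sketch with illustrative variable names; it assumes the usual 64-bit wraparound on overflow.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
result = df.groupby("key")["value"].cumprod()

# Python integers have arbitrary precision, so these are exact references.
exact_sixth = 625 ** 6
wrapped_seventh = (625 ** 7) % 2 ** 64  # value left after 64-bit wraparound

sixth = int(result[5])
seventh = int(result[6])
print(sixth == exact_sixth, seventh == wrapped_seventh)
```

If exact results beyond the int64 range matter, one option is to convert the column to Python integers (object dtype) before aggregating, at a cost in speed.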

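`DataFrameGroupBy.nth()` and `SeriesGroupBy.nth()` now behave as filtrations#

In previous versions of pandas, DataFrameGroupBy.nth() and SeriesGroupBy.nth() acted as if they were aggregations. However, for most inputs n, they may return either zero or multiple rows per group. This means that they are filtrations, similar to e.g. DataFrameGroupBy.head(). pandas now treats them as filtrations (GH 13666).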

In [21]: df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})

In [22]: gb = df.groupby("a")

Old behavior

In [5]: gb.nth(n=1)
Out[5]:
   A    B
1  1  2.0
4  2  5.0

New behavior

In [23]: gb.nth(n=1)
Out[23]: 
   a    b
1  1  2.0
4  2  5.0

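In particular, the index of the result is derived from the input by selecting the appropriate rows. Also, when n is larger than the group size, no rows are returned instead of NaN.

Old behavior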

In [5]: gb.nth(n=3, dropna="any")
Out[5]:
    B
A
1 NaN
2 NaN

New behavior

In [24]: gb.nth(n=3, dropna="any")
Out[24]: 
Empty DataFrame
Columns: [a, b]
Index: []
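A short sketch of what "filtration" means in practice, reusing the frame from the session above. The variable names are illustrative. The result keeps the original row labels and simply contains fewer rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})
gb = df.groupby("a")

second_rows = gb.nth(1)  # second row of each group, original labels kept
beyond = gb.nth(3)       # no group has a fourth row, so nothing is selected

print(second_rows.index.tolist())
print(len(beyond))
```

Code that expects one row per group keyed by the group label may want an aggregation such as first() or last() instead.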

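Backwards incompatible API changes#

Construction with datetime64 or timedelta64 dtype with unsupported resolution#

In past versions, when constructing a Series or DataFrame and passing a "datetime64" or "timedelta64" dtype with an unsupported resolution (i.e. anything other than "ns"), pandas would silently replace the given dtype with its nanosecond analogue:

Previous behavior: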

In [5]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[5]:
0   2016-01-01
dtype: datetime64[ns]

In [6]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
Out[6]:
0   2016-01-01
dtype: datetime64[ns]

In pandas 2.0 we support the resolutions "s", "ms", "us", and "ns". When passing a supported dtype (e.g. "datetime64[s]"), the result now has exactly the requested dtype:

New behavior:

In [25]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[25]: 
0   2016-01-01
dtype: datetime64[s]

With an unsupported dtype, pandas now raises instead of silently swapping in a supported dtype:

New behavior:

In [26]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 pd.Series(["2016-01-01"], dtype="datetime64[D]")

File ~/work/pandas/pandas/pandas/core/series.py:584, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    582         data = data.copy()
    583 else:
--> 584     data = sanitize_array(data, index, dtype, copy)
    586     manager = _get_option("mode.data_manager", silent=True)
    587     if manager == "block":

File ~/work/pandas/pandas/pandas/core/construction.py:648, in sanitize_array(data, index, dtype, copy, allow_2d)
    645     subarr = np.array([], dtype=np.float64)
    647 elif dtype is not None:
--> 648     subarr = _try_cast(data, dtype, copy)
    650 else:
    651     subarr = maybe_convert_platform(data)

File ~/work/pandas/pandas/pandas/core/construction.py:808, in _try_cast(arr, dtype, copy)
    803     return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
    804         shape
    805     )
    807 elif dtype.kind in "mM":
--> 808     return maybe_cast_to_datetime(arr, dtype)
    810 # GH#15832: Check if we are requesting a numeric dtype and
    811 # that we can convert the data to the requested dtype.
    812 elif dtype.kind in "iu":
    813     # this will raise if we have e.g. floats

File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1228, in maybe_cast_to_datetime(value, dtype)
   1224     raise TypeError("value must be listlike")
   1226 # TODO: _from_sequence would raise ValueError in cases where
   1227 #  _ensure_nanosecond_dtype raises TypeError
-> 1228 _ensure_nanosecond_dtype(dtype)
   1230 if lib.is_np_dtype(dtype, "m"):
   1231     res = TimedeltaArray._from_sequence(value, dtype=dtype)

File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1285, in _ensure_nanosecond_dtype(dtype)
   1282     raise ValueError(msg)
   1283 # TODO: ValueError or TypeError? existing test
   1284 #  test_constructor_generic_timestamp_bad_frequency expects TypeError
-> 1285 raise TypeError(
   1286     f"dtype={dtype} is not supported. Supported resolutions are 's', "
   1287     "'ms', 'us', and 'ns'"
   1288 )

TypeError: dtype=datetime64[D] is not supported. Supported resolutions are 's', 'ms', 'us', and 'ns'
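Code that has to accept user-supplied resolutions can guard the constructor call. The sketch below is one way to do that, with illustrative variable names. It relies only on the TypeError shown in the traceback above.

```python
import pandas as pd

# A supported resolution is kept exactly as requested.
ser = pd.Series(["2016-01-01"], dtype="datetime64[s]")
print(ser.dtype)

# An unsupported resolution raises instead of silently becoming "ns".
try:
    pd.Series(["2016-01-01"], dtype="datetime64[D]")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```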

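Value counts sets the resulting name to `count`#

In past versions, when running Series.value_counts(), the result would inherit the original object's name, and the result index would be nameless. This would cause confusion when resetting the index, and the column names would not correspond with the column values. Now, the result name will be 'count' (or 'proportion' if normalize=True was passed), and the index will be named after the original object (GH 49497).

Previous behavior: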

In [8]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[8]:
quetzal    2
elk        1
Name: animal, dtype: int64

New behavior:

In [27]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[27]: 
animal
quetzal    2
elk        1
Name: count, dtype: int64

The same applies to other value_counts methods (for example, DataFrame.value_counts()).
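The practical benefit shows up when the index is reset. This sketch reuses the 'animal' data from the session above; the variable names are illustrative.

```python
import pandas as pd

animals = pd.Series(["quetzal", "quetzal", "elk"], name="animal")

counts = animals.value_counts()
shares = animals.value_counts(normalize=True)

# The values column is now "count" and the index carries the original
# name, so reset_index() produces self-describing column labels.
table = counts.reset_index()
print(table.columns.tolist())
print(shares.name)
```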

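Disallow astype conversion to non-supported datetime64/timedelta64 dtypes#

In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return with datetime64[ns] dtype instead of the requested dtype. In pandas 2.0, support is added for "datetime64[s]", "datetime64[ms]", and "datetime64[us]" dtypes, so converting to those dtypes gives exactly the requested dtype:

Previous behavior: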

In [28]: idx = pd.date_range("2016-01-01", periods=3)

In [29]: ser = pd.Series(idx)

Previous behavior:

In [4]: ser.astype("datetime64[s]")
Out[4]:
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[ns]

With the new behavior, we get exactly the requested dtype:

New behavior:

In [30]: ser.astype("datetime64[s]")
Out[30]: 
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[s]

For non-supported resolutions, e.g. "datetime64[D]", we raise instead of silently ignoring the requested dtype:

New behavior:

In [31]: ser.astype("datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 ser.astype("datetime64[D]")

File ~/work/pandas/pandas/pandas/core/generic.py:6662, in NDFrame.astype(self, dtype, copy, errors)
   6656     results = [
   6657         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6658     ]
   6660 else:
   6661     # else, only a single dtype is given
-> 6662     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6663     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6664     return res.__finalize__(self, method="astype")

File ~/work/pandas/pandas/pandas/core/internals/managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
    427 elif using_copy_on_write():
    428     copy = False
--> 430 return self.apply(
    431     "astype",
    432     dtype=dtype,
    433     copy=copy,
    434     errors=errors,
    435     using_cow=using_copy_on_write(),
    436 )

File ~/work/pandas/pandas/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File ~/work/pandas/pandas/pandas/core/internals/blocks.py:784, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    781         raise ValueError("Can not squeeze with more than one column.")
    782     values = values[0, :]  # type: ignore[call-overload]
--> 784 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    786 new_values = maybe_coerce_values(new_values)
    788 refs = None

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:179, in astype_array(values, dtype, copy)
    175     return values
    177 if not isinstance(values, np.ndarray):
    178     # i.e. ExtensionArray
--> 179     values = values.astype(dtype, copy=copy)
    181 else:
    182     values = _astype_nansafe(values, dtype, copy=copy)

File ~/work/pandas/pandas/pandas/core/arrays/datetimes.py:741, in DatetimeArray.astype(self, dtype, copy)
    739 elif isinstance(dtype, PeriodDtype):
    740     return self.to_period(freq=dtype.freq)
--> 741 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)

File ~/work/pandas/pandas/pandas/core/arrays/datetimelike.py:517, in DatetimeLikeArrayMixin.astype(self, dtype, copy)
    513 elif (dtype.kind in "mM" and self.dtype != dtype) or dtype.kind == "f":
    514     # disallow conversion between datetime/timedelta,
    515     # and conversions for any datetimelike to float
    516     msg = f"Cannot cast {type(self).__name__} to dtype {dtype}"
--> 517     raise TypeError(msg)
    518 else:
    519     return np.asarray(self, dtype=dtype)

TypeError: Cannot cast DatetimeArray to dtype datetime64[D]

For conversions from timedelta64[ns] dtypes, the old behavior converted to a floating point format.

Previous behavior:

In [32]: idx = pd.timedelta_range("1 Day", periods=3)

In [33]: ser = pd.Series(idx)

Previous behavior:

In [7]: ser.astype("timedelta64[s]")
Out[7]:
0     86400.0
1    172800.0
2    259200.0
dtype: float64

In [8]: ser.astype("timedelta64[D]")
Out[8]:
0    1.0
1    2.0
2    3.0
dtype: float64

The new behavior, as for datetime64, either gives exactly the requested dtype or raises:

New behavior:

In [34]: ser.astype("timedelta64[s]")
Out[34]: 
0   1 days
1   2 days
2   3 days
dtype: timedelta64[s]

In [35]: ser.astype("timedelta64[D]")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[35], line 1
----> 1 ser.astype("timedelta64[D]")

File ~/work/pandas/pandas/pandas/core/generic.py:6662, in NDFrame.astype(self, dtype, copy, errors)
   6656     results = [
   6657         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6658     ]
   6660 else:
   6661     # else, only a single dtype is given
-> 6662     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6663     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6664     return res.__finalize__(self, method="astype")

File ~/work/pandas/pandas/pandas/core/internals/managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
    427 elif using_copy_on_write():
    428     copy = False
--> 430 return self.apply(
    431     "astype",
    432     dtype=dtype,
    433     copy=copy,
    434     errors=errors,
    435     using_cow=using_copy_on_write(),
    436 )

File ~/work/pandas/pandas/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File ~/work/pandas/pandas/pandas/core/internals/blocks.py:784, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    781         raise ValueError("Can not squeeze with more than one column.")
    782     values = values[0, :]  # type: ignore[call-overload]
--> 784 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    786 new_values = maybe_coerce_values(new_values)
    788 refs = None

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:179, in astype_array(values, dtype, copy)
    175     return values
    177 if not isinstance(values, np.ndarray):
    178     # i.e. ExtensionArray
--> 179     values = values.astype(dtype, copy=copy)
    181 else:
    182     values = _astype_nansafe(values, dtype, copy=copy)

File ~/work/pandas/pandas/pandas/core/arrays/timedeltas.py:358, in TimedeltaArray.astype(self, dtype, copy)
    354         return type(self)._simple_new(
    355             res_values, dtype=res_values.dtype, freq=self.freq
    356         )
    357     else:
--> 358         raise ValueError(
    359             f"Cannot convert from {self.dtype} to {dtype}. "
    360             "Supported resolutions are 's', 'ms', 'us', 'ns'"
    361         )
    363 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)

ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
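A brief sketch of the new astype results. It also shows one possible way to recover the old float-valued output for timedeltas by dividing by a Timedelta. That division is a suggestion of this example, not something these notes prescribe, and the variable names are illustrative.

```python
import pandas as pd

dt_ser = pd.Series(pd.date_range("2016-01-01", periods=3))
td_ser = pd.Series(pd.timedelta_range("1 Day", periods=3))

# Supported resolutions are honoured exactly.
dt_seconds = dt_ser.astype("datetime64[s]")
td_seconds = td_ser.astype("timedelta64[s]")

# Assumed workaround for the former float output: divide by one day.
days_as_float = td_ser / pd.Timedelta(days=1)

print(dt_seconds.dtype, td_seconds.dtype, days_as_float.tolist())
```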

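UTC and fixed-offset timezones default to standard-library tzinfo objects#

In previous versions, the default tzinfo object used to represent UTC was pytz.UTC. In pandas 2.0, we default to datetime.timezone.utc instead. Similarly, for timezones that represent fixed UTC offsets, we use datetime.timezone objects instead of pytz.FixedOffset objects. See (GH 34916)

Previous behavior: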

In [2]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [3]: type(ts.tzinfo)
Out[3]: pytz.UTC

In [4]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [5]: type(ts2.tzinfo)
Out[5]: pytz._FixedOffset

New behavior:

In [36]: ts = pd.Timestamp("2016-01-01", tz="UTC")

In [37]: type(ts.tzinfo)
Out[37]: datetime.timezone

In [38]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")

In [39]: type(ts2.tzinfo)
Out[39]: datetime.timezone

For timezones that are neither UTC nor fixed offsets, e.g. "US/Pacific", we continue to default to pytz objects.
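Code that used to compare against pytz objects can compare against the standard library instead. This is a minimal sketch with illustrative variable names.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2016-01-01", tz="UTC")
ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")

# UTC is represented by the standard-library singleton.
is_stdlib_utc = ts.tzinfo == datetime.timezone.utc

# A fixed offset is a datetime.timezone carrying that offset.
offset = ts2.tzinfo.utcoffset(None)

print(is_stdlib_utc, offset)
```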

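Empty DataFrames/Series will now default to have a `RangeIndex`#

Previously, constructing an empty Series or DataFrame (where data is None or an empty list-like argument) without specifying the axes (index=None, columns=None) would return the axes as an empty Index with object dtype.

Now, the axes return an empty RangeIndex (GH 49572).

Previous behavior: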

In [8]: pd.Series().index
Out[8]:
Index([], dtype='object')

In [9]: pd.DataFrame().axes
Out[9]:
[Index([], dtype='object'), Index([], dtype='object')]

New behavior:

In [40]: pd.Series().index
Out[40]: RangeIndex(start=0, stop=0, step=1)

In [41]: pd.DataFrame().axes
Out[41]: [RangeIndex(start=0, stop=0, step=1), RangeIndex(start=0, stop=0, step=1)]
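Tests that previously asserted an object-dtype index on empty results can check for RangeIndex instead. A small sketch follows; the explicit dtype on the empty Series and the variable names are choices of this example.

```python
import pandas as pd

empty_ser = pd.Series(dtype="float64")
empty_df = pd.DataFrame()

index_is_range = isinstance(empty_ser.index, pd.RangeIndex)
axes_are_range = all(isinstance(ax, pd.RangeIndex) for ax in empty_df.axes)

print(index_is_range, axes_are_range)
```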

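DataFrame to LaTeX has a new render engine#

The existing DataFrame.to_latex() has been restructured to use the extended implementation previously available under Styler.to_latex(). The argument signature is similar, although col_space has been removed since it is ignored by LaTeX engines. This render engine also requires jinja2 as a dependency, which needs to be installed, since rendering is based on jinja2 templates.

The pandas LaTeX options below are no longer used and have been removed. The generic max rows and columns arguments remain, but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:

display.latex.escape: replaced with styler.format.escape,
display.latex.longtable: replaced with styler.latex.environment,
display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,
display.latex.repr: replaced with styler.render.repr,
display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.

Note that due to this change some defaults have also changed:

multirow now defaults to True.
multirow_align defaults to "r" instead of "l".
multicol_align defaults to "r" instead of "l".
escape now defaults to False.

Note that the behaviour of _repr_latex_ has also changed. Previously, setting display.latex.repr would generate LaTeX only when using nbconvert for a Jupyter Notebook, and not when the user is running the notebook. Now the styler.render.repr option allows control of the specific output within Jupyter Notebooks for operations (not just on nbconvert). See GH 39911.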

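Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
mypy (dev)	1.0		X
pytest (dev)	7.0.0		X
pytest-xdist (dev)	2.2.0		X
hypothesis (dev)	6.34.2		X
python-dateutil	2.8.2	X	X
tzdata	2022.1	X	X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed
pyarrow	7.0.0	X
matplotlib	3.6.1	X
fastparquet	0.6.3	X
xarray	0.21.0	X

See Dependencies and Optional dependencies for more.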

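Datetimes are now parsed with a consistent format#

In the past, to_datetime() guessed the format for each element independently. This was appropriate for some cases where elements had mixed date formats. However, it would regularly cause problems when users expected a consistent format but the function switched formats between elements. As of version 2.0.0, parsing uses a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).

Previous behavior: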

In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0   2000-01-13
1   2000-12-01
dtype: datetime64[ns]

New behavior:

In [42]: ser = pd.Series(['13-01-2000', '12-01-2000'])

In [43]: pd.to_datetime(ser)
Out[43]: 
0   2000-01-13
1   2000-01-12
dtype: datetime64[ns]

Note that this affects read_csv() as well.

If you still need to parse dates with inconsistent formats, you can use format='mixed' (possibly alongside dayfirst):

ser = pd.Series(['13-01-2000', '12 January 2000'])
pd.to_datetime(ser, format='mixed', dayfirst=True)

or, if your formats are all ISO8601 (but possibly not identically formatted):

ser = pd.Series(['2020-01-01', '2020-01-01 03:00'])
pd.to_datetime(ser, format='ISO8601')
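The snippets above do not show their results. The sketch below runs them and adds a third variant that passes an explicit strftime-style format, which avoids inference altogether. Variable names are illustrative.

```python
import pandas as pd

# Explicit format: no guessing, every element must be day-month-year.
ser = pd.Series(["13-01-2000", "12-01-2000"])
explicit = pd.to_datetime(ser, format="%d-%m-%Y")

# Mixed formats: each element is inferred individually, day first.
mixed_input = pd.Series(["13-01-2000", "12 January 2000"])
mixed = pd.to_datetime(mixed_input, format="mixed", dayfirst=True)

# ISO8601 strings that are not identically formatted.
iso_input = pd.Series(["2020-01-01", "2020-01-01 03:00"])
iso = pd.to_datetime(iso_input, format="ISO8601")

print(explicit.tolist(), mixed.tolist(), iso.tolist())
```

An explicit format is usually the safest choice when the layout of the input is known, since a malformed element then raises instead of being reinterpreted.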

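Other API changes#

The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only (GH 45307, GH 32526)
Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (GH 48538, GH 48255)
read_csv(): specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser.
The default value of dtype in get_dummies() is changed to bool from uint8 (GH 45848)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (GH 48928)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (GH 48963)
DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), and DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than "int64"; use obj.astype('int64', copy=False).astype(dtype) instead (GH 49715)
Index.astype() now allows casting from float64 dtype to datetime-like dtypes, matching Series behavior (GH 49660)
Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (GH 49014)
Passing dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for Series or DataFrame will be cast to the lowest supported resolution "timedelta64[s]" (GH 49014)
Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 49008)
Passing datetime64 values with resolution other than nanosecond to to_datetime() will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 50369)
Passing integer values and a non-nanosecond datetime64 dtype (e.g. "datetime64[s]") to DataFrame, Series, or Index will treat the values as multiples of the dtype's unit, matching the behavior of e.g. Series(np.array(values, dtype="M8[s]")) (GH 51092)
Passing a string in ISO-8601 format to Timestamp will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 49737)
The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan, consistent with DataFrame.where() and Series.where(). Entries will be filled with the corresponding NULL value (np.nan for NumPy dtypes, pd.NA for extension dtypes). (GH 49111)
Changed behavior of Series.quantile() and DataFrame.quantile() with SparseDtype to retain sparse dtype (GH 49583)
When creating a Series with an object-dtype Index of datetime objects, pandas no longer silently converts the index to a DatetimeIndex (GH 39307, GH 23598)
pandas.testing.assert_index_equal() with parameter exact="equiv" now considers two indexes equal when both are either a RangeIndex or Index with an int64 dtype. Previously it meant either a RangeIndex or an Int64Index (GH 51098)
Series.unique() with dtype "timedelta64[ns]" or "datetime64[ns]" now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray (GH 49176)
to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior (GH 49037, GH 50453)
pandas.api.types.is_string_dtype() now only returns True for array-likes with dtype=object when the elements are inferred to be strings (GH 15585)
Passing a sequence containing datetime objects and date objects to the Series constructor will return with object dtype instead of datetime64[ns] dtype, consistent with Index behavior (GH 49341)
Passing strings that cannot be parsed as datetimes to Series or DataFrame with dtype="datetime64[ns]" will raise instead of silently ignoring the keyword and returning object dtype (GH 24435)
Passing a sequence containing a type that cannot be converted to Timedelta to to_timedelta(), to the Series or DataFrame constructor with dtype="timedelta64[ns]", or to TimedeltaIndex now raises TypeError instead of ValueError (GH 49525)
Changed behavior of the Index constructor with a sequence containing at least one NaT and everything else either None or NaN to infer datetime64[ns] dtype instead of object, matching Series behavior (GH 49340)
read_stata() with parameter index_col set to None (the default) will now set the index on the returned DataFrame to a RangeIndex instead of an Int64Index (GH 49745)
Changed behavior of Index, Series, and DataFrame arithmetic methods when working with object dtypes: the results no longer do type inference on the result of the array operations; use result.infer_objects(copy=False) to do type inference on the result (GH 49999, GH 49714)
Changed behavior of the Index constructor with an object-dtype numpy.ndarray containing all-bool values or all-complex values: this will now retain object dtype, consistent with Series behavior (GH 49594)
Changed behavior of Series.astype() from object dtype containing bytes objects to string dtypes: this now does val.decode() on bytes objects instead of str(val), matching Index.astype() behavior (GH 45326)
Added "None" to the default na_values in read_csv() (GH 50286)
Changed behavior of Series and DataFrame constructors when given an integer dtype and floating-point data that is not round numbers: this now raises ValueError instead of silently retaining the float dtype; use Series(data) or DataFrame(data) to get the old behavior, and Series(data).astype(dtype) or DataFrame(data).astype(dtype) to get the specified dtype (GH 49599)
Changed behavior of DataFrame.shift() with axis=1, an integer fill_value, and homogeneous datetime-like dtype: this now fills new columns with integer dtypes instead of casting to datetime-like (GH 49842)
Files are now closed when encountering an exception in read_json() (GH 49921)
Changed behavior of read_csv(), read_json() and read_fwf(): the index will now always be a RangeIndex when no index is specified. Previously the index would be an Index with dtype object if the new DataFrame/Series had length 0 (GH 49572)
DataFrame.values, DataFrame.to_numpy(), DataFrame.xs(), DataFrame.reindex(), DataFrame.fillna(), and DataFrame.replace() no longer silently consolidate the underlying arrays; use df = df.copy() to ensure consolidation (GH 49356)
Creating a new DataFrame using a full slice on both axes with loc or iloc (thus, df.loc[:, :] or df.iloc[:, :]) now returns a new DataFrame (shallow copy) instead of the original DataFrame, consistent with other methods to get a full slice (for example df.loc[:] or df[:]) (GH 49469)
The Series and DataFrame constructors will now return a shallow copy (i.e. share data, but not attributes) when passed a Series and DataFrame, respectively, with the default of copy=False (and if no other keyword triggers a copy). Previously, the new Series or DataFrame would share the index attribute (e.g. df.index = ... would also update the index of the parent or child) (GH 49523)
Disallow computing cumprod for Timedelta objects; previously this returned incorrect values (GH 50246)
DataFrame objects read from an HDFStore file without an index now have a RangeIndex instead of an int64 index (GH 51076)
Instantiating an Index with a numeric NumPy dtype with data containing NA and/or NaT now raises a ValueError. Previously a TypeError was raised (GH 51050)
Loading a JSON file with duplicate columns using read_json(orient='split') renames columns to avoid duplicates, as read_csv() and the other readers do (GH 50370)
The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64 (GH 50926)
to_datetime() with unit of either "Y" or "M" will now raise if a sequence contains a non-round float value, matching the Timestamp behavior (GH 50301)
The methods Series.round(), DataFrame.__invert__(), Series.__invert__(), DataFrame.swapaxes(), DataFrame.first(), DataFrame.last(), Series.first(), Series.last() and DataFrame.align() will now always return new objects (GH 51032)
DataFrame and DataFrameGroupBy aggregations (e.g. "sum") with object-dtype columns no longer infer non-object dtypes for their results; explicitly call result.infer_objects(copy=False) on the result to obtain the old behavior (GH 51205, GH 49603)
Division by zero with ArrowDtype dtypes returns -inf, nan, or inf depending on the numerator, instead of raising (GH 51541)
Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes (GH 51152)
value_counts() now returns data with ArrowDtype with pyarrow.int64 type instead of "Int64" type (GH 51462)
factorize() and unique() preserve the original dtype when passed NumPy timedelta64 or datetime64 with non-nanosecond resolution (GH 48670)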
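Note

A current PDEP proposes the deprecation and removal of the keywords inplace and copy for all but a small subset of methods from the pandas API. The current discussion takes place here. The keywords will no longer be necessary in the context of Copy-on-Write. If this proposal is accepted, both keywords would be deprecated in the next release of pandas and removed in pandas 3.0.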

弃用#

废弃了将带有系统本地时区的日期时间字符串解析为 tzlocal 的行为，请改用 tz 关键字或显式调用 tz_localize (GH 50791)
废弃了 to_datetime() 和 read_csv() 中的参数 infer_datetime_format，因为其严格版本现在是默认值 (GH 48621)
废弃了 to_datetime() 在解析字符串时带有 unit 的行为，在未来的版本中，这些字符串将被解析为日期时间（与不带 unit 的行为一致），而不是转换为浮点数。若要保留旧行为，请在调用 to_datetime() 之前将字符串转换为数字类型 (GH 50735)
废弃了 pandas.io.sql.execute() (GH 50185)
Index.is_boolean() 已被废弃。请改用 pandas.api.types.is_bool_dtype() (GH 50042)
Index.is_integer() 已被废弃。请改用 pandas.api.types.is_integer_dtype() (GH 50042)
Index.is_floating() 已被废弃。请改用 pandas.api.types.is_float_dtype() (GH 50042)
Index.holds_integer() 已被废弃。请改用 pandas.api.types.infer_dtype() (GH 50243)
Index.is_numeric() 已被废弃。请改用 pandas.api.types.is_any_real_numeric_dtype() (GH 50042, GH 51152)
Index.is_categorical() 已被废弃。请改用 pandas.api.types.is_categorical_dtype() (GH 50042)
Index.is_object() 已被废弃。请改用 pandas.api.types.is_object_dtype() (GH 50042)
Index.is_interval() 已被废弃。请改用 pandas.api.types.is_interval_dtype() (GH 50042)
废弃了 read_csv()、read_table()、read_fwf() 和 read_excel() 中的参数 date_parser，推荐使用 date_format (GH 50601)
废弃了 datetime64 和 DatetimeTZDtype 数据类型的 all 和 any 聚合，请改用例如 (obj != pd.Timestamp(0, tz=obj.tz)).all() (GH 34479)
废弃了 Resampler 中未使用的参数 *args 和 **kwargs (GH 50977)
废弃了在单个元素 Series 上调用 float 或 int 以分别返回 float 或 int 的行为。请改为在调用 float 或 int 之前提取该元素 (GH 51101)
废弃了 Grouper.groups()，请改用 Groupby.groups() (GH 51182)
废弃了 Grouper.grouper()，请改用 Groupby.grouper() (GH 51182)
废弃了 Grouper.obj()，请改用 Groupby.obj() (GH 51206)
废弃了 Grouper.indexer()，请改用 Resampler.indexer() (GH 51206)
废弃了 Grouper.ax()，请改用 Resampler.ax() (GH 51206)
废弃了 read_parquet() 中的关键字 use_nullable_dtypes，请改用 dtype_backend (GH 51853)
废弃了 Series.pad()，推荐使用 Series.ffill() (GH 33396)
废弃了 Series.backfill()，推荐使用 Series.bfill() (GH 33396)
废弃了 DataFrame.pad()，推荐使用 DataFrame.ffill() (GH 33396)
废弃了 DataFrame.backfill()，推荐使用 DataFrame.bfill() (GH 33396)
废弃了 StataReader.close()。请改为将 StataReader 用作上下文管理器 (GH 49228)
废弃了在迭代 DataFrameGroupBy 或 SeriesGroupBy 时生成标量值的行为（当分组所用的 level 参数是长度为 1 的列表时）；未来将改为返回长度为 1 的元组 (GH 51583)
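
下面的草图演示上述几项弃用的一种可能的迁移写法：用 pandas.api.types 中的函数代替 Index.is_* 方法，用 ffill() 代替 pad()，先取出元素再调用 int，以及用显式比较代替 datetime 数据上的 all 聚合。示例数据和变量名都是假设的，仅供参考。

```python
import pandas as pd
from pandas.api.types import is_any_real_numeric_dtype, is_integer_dtype

# Index.is_integer() / Index.is_numeric() 的替代写法
idx = pd.Index([1, 2, 3])
idx_is_integer = is_integer_dtype(idx)
idx_is_numeric = is_any_real_numeric_dtype(idx)

# Series.pad() 的替代写法：ffill()
ser = pd.Series([1.0, None, 3.0])
filled = ser.ffill()

# 不直接对单元素 Series 调用 int，而是先取出该元素
single = pd.Series([42])
value = int(single.iloc[0])

# datetime 数据上 all 聚合的替代写法：与带时区的“零”时间戳显式比较
dti = pd.DatetimeIndex(["2023-04-03"], tz="UTC")
all_nonzero = (dti != pd.Timestamp(0, tz=dti.tz)).all()
```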

移除先前版本的弃用/更改#

移除了 Int64Index、UInt64Index 和 Float64Index。更多信息请参阅此处 (GH 42717)
移除了已废弃的 Timestamp.freq、Timestamp.freqstr 以及 Timestamp 构造函数和 Timestamp.fromordinal() 中的参数 freq (GH 14146)
移除了已废弃的 CategoricalBlock、Block.is_categorical()，要求 datetime64 和 timedelta64 值在传递给 Block.make_block_same_class() 之前必须封装在 DatetimeArray 或 TimedeltaArray 中，要求 DatetimeTZBlock.values 在传递给 BlockManager 构造函数时具有正确的 ndim，并移除了 SingleBlockManager 构造函数中的 “fastpath” 关键字 (GH 40226, GH 40571)
移除了已废弃的全局选项 use_inf_as_null，推荐使用 use_inf_as_na (GH 17126)
移除了已废弃的模块 pandas.core.index (GH 30193)
移除了已废弃的别名 pandas.core.tools.datetimes.to_time，请改为直接从 pandas.core.tools.times 导入该函数 (GH 34145)
移除了已废弃的别名 pandas.io.json.json_normalize，请改为直接从 pandas.json_normalize 导入该函数 (GH 27615)
移除了已废弃的 Categorical.to_dense()，请改用 np.asarray(cat) (GH 32639)
移除了已废弃的 Categorical.take_nd() (GH 27745)
移除了已废弃的 Categorical.mode()，请改用 Series(cat).mode() (GH 45033)
移除了已废弃的 Categorical.is_dtype_equal() 和 CategoricalIndex.is_dtype_equal() (GH 37545)
移除了已废弃的 CategoricalIndex.take_nd() (GH 30702)
移除了已废弃的 Index.is_type_compatible() (GH 42113)
移除了已废弃的 Index.is_mixed()，请改为直接检查 index.inferred_type (GH 32922)
移除了已废弃的 pandas.api.types.is_categorical()；请改用 pandas.api.types.is_categorical_dtype() (GH 33385)
移除了已废弃的 Index.asi8() (GH 37877)
强制执行废弃行为：在将 datetime64[ns] 数据和时区感知数据类型传递给 Series 时，将值解释为墙钟时间而不是 UTC 时间，与 DatetimeIndex 行为一致 (GH 41662)
强制执行废弃行为：在非对齐（按索引或列）的多个 DataFrame 上应用 NumPy 通用函数时，现在会首先对输入进行对齐 (GH 39239)
移除了已废弃的 DataFrame._AXIS_NUMBERS()、DataFrame._AXIS_NAMES()、Series._AXIS_NUMBERS()、Series._AXIS_NAMES() (GH 33637)
移除了已废弃的 Index.to_native_types()，请改用 obj.astype(str) (GH 36418)
移除了已废弃的 Series.iteritems()、DataFrame.iteritems()，请改用 obj.items (GH 45321)
移除了已废弃的 DataFrame.lookup() (GH 35224)
移除了已废弃的 Series.append()、DataFrame.append()，请改用 concat() (GH 35407)
移除了已废弃的 Series.iteritems()、DataFrame.iteritems() 和 HDFStore.iteritems()，请改用 obj.items (GH 45321)
移除了已废弃的 DatetimeIndex.union_many() (GH 45018)
移除了已废弃的 DatetimeArray、DatetimeIndex 和 dt 访问器的 weekofyear 和 week 属性，推荐使用 isocalendar().week (GH 33595)
移除了已废弃的 RangeIndex._start()、RangeIndex._stop()、RangeIndex._step()，请改用 start、stop、step (GH 30482)
移除了已废弃的 DatetimeIndex.to_perioddelta()，请改用 dtindex - dtindex.to_period(freq).to_timestamp() (GH 34853)
移除了已废弃的 Styler.hide_index() 和 Styler.hide_columns() (GH 49397)
移除了已废弃的 Styler.set_na_rep() 和 Styler.set_precision() (GH 49397)
移除了已废弃的 Styler.where() (GH 49397)
移除了已废弃的 Styler.render() (GH 49397)
移除了 DataFrame.to_latex() 中已废弃的参数 col_space (GH 47970)
移除了 Styler.highlight_null() 中已废弃的参数 null_color (GH 49397)
移除了 testing.assert_frame_equal()、testing.assert_extension_array_equal()、testing.assert_series_equal()、testing.assert_index_equal() 中已废弃的参数 check_less_precise (GH 30562)
移除了 DataFrame.info() 中已废弃的参数 null_counts。请改用 show_counts (GH 37999)
移除了已废弃的 Index.is_monotonic() 和 Series.is_monotonic()；请改用 obj.is_monotonic_increasing (GH 45422)
移除了已废弃的 Index.is_all_dates() (GH 36697)
强制执行废弃行为：禁止将时区感知 Timestamp 和 dtype="datetime64[ns]" 传递给 Series 或 DataFrame 构造函数 (GH 41555)
强制执行废弃行为：禁止将时区感知值序列和 dtype="datetime64[ns]" 传递给 Series 或 DataFrame 构造函数 (GH 41555)
强制执行废弃行为：禁止在 DataFrame 构造函数中使用 numpy.ma.mrecords.MaskedRecords；请改为传入 {name: data[name] for name in data.dtype.names} (GH 40363)
强制执行废弃行为：禁止在 Series.astype() 和 DataFrame.astype() 中使用无单位的 “datetime64” 数据类型进行转换 (GH 47844)
强制执行废弃行为：禁止使用 .astype 将 datetime64[ns] 的 Series、DataFrame 或 DatetimeIndex 转换为时区感知数据类型，请改用 obj.tz_localize 或 ser.dt.tz_localize (GH 39258)
强制执行废弃行为：禁止使用 .astype 将时区感知 Series、DataFrame 或 DatetimeIndex 转换为时区非感知 datetime64[ns] 数据类型，请改用 obj.tz_localize(None) 或 obj.tz_convert("UTC").tz_localize(None) (GH 39258)
强制执行废弃行为：禁止向 concat() 中的 sort 参数传递非布尔值 (GH 44629)
移除了日期解析函数 parse_date_time()、parse_date_fields()、parse_all_fields() 和 generic_parser() (GH 24518)
移除了 core.arrays.SparseArray 构造函数中的参数 index (GH 43523)
移除了 DataFrame.groupby() 和 Series.groupby() 中的参数 squeeze (GH 32380)
移除了已废弃的 DateOffset 属性 apply、apply_index、__call__、onOffset 和 isAnchored (GH 34171)
移除了 DatetimeIndex.to_series() 中的 keep_tz 参数 (GH 29731)
移除了 Index.copy() 中的参数 names 和 dtype，以及 MultiIndex.copy() 中的 levels 和 codes (GH 35853, GH 36685)
移除了 MultiIndex.set_levels() 和 MultiIndex.set_codes() 中的参数 inplace (GH 35626)
从 DataFrame.to_excel() 和 Series.to_excel() 中移除了参数 verbose 和 encoding (GH 47912)
从 DataFrame.to_csv() 和 Series.to_csv() 中移除了参数 line_terminator，请改用 lineterminator (GH 45302)
从 DataFrame.set_axis() 和 Series.set_axis() 中移除了参数 inplace，请改用 obj = obj.set_axis(..., copy=False) (GH 48130)
禁止向 MultiIndex.set_levels() 和 MultiIndex.set_codes() 传递位置参数 (GH 41485)
禁止将带有单位“Y”、“y”或“M”的组件字符串解析为 Timedelta，因为这些单位不表示明确的持续时间 (GH 36838)
移除了 MultiIndex.is_lexsorted() 和 MultiIndex.lexsort_depth() (GH 38701)
从 PeriodIndex.astype() 中移除了参数 how，请改用 PeriodIndex.to_timestamp() (GH 37982)
从 DataFrame.mask()、DataFrame.where()、Series.mask() 和 Series.where() 中移除了参数 try_cast (GH 38836)
从 Period.to_timestamp() 中移除了参数 tz，请改用 obj.to_timestamp(...).tz_localize(tz) (GH 34522)
从 DataFrame.plot() 和 Series.plot() 中移除了参数 sort_columns (GH 47563)
从 DataFrame.take() 和 Series.take() 中移除了参数 is_copy (GH 30615)
从 Index.get_slice_bound()、Index.slice_indexer() 和 Index.slice_locs() 中移除了参数 kind (GH 41378)
从 read_csv() 中移除了参数 prefix、squeeze、error_bad_lines 和 warn_bad_lines (GH 40413, GH 43427)
从 read_excel() 中移除了参数 squeeze (GH 43427)
从 DataFrame.describe() 和 Series.describe() 中移除了参数 datetime_is_numeric，因为日期时间数据将始终被概括为数值数据 (GH 34798)
禁止向 Series.xs() 和 DataFrame.xs() 传递列表 key，请改用元组 (GH 41789)
禁止在 Index 构造函数中使用子类特有的关键字（例如 “freq”、“tz”、“names”、“closed”） (GH 38597)
从 Categorical.remove_unused_categories() 中移除了参数 inplace (GH 37918)
禁止向 Timestamp 传递非整数浮点数，当 unit 为“M”或“Y”时 (GH 47266)
从 read_excel() 中移除了关键字 convert_float 和 mangle_dupe_cols (GH 41176)
从 read_csv() 和 read_table() 中移除了关键字 mangle_dupe_cols (GH 48137)
从 DataFrame.where()、Series.where()、DataFrame.mask() 和 Series.mask() 中移除了关键字 errors (GH 47728)
禁止向 read_excel() 传递非关键字参数，`io` 和 `sheet_name` 除外 (GH 34418)
禁止向 DataFrame.drop() 和 Series.drop() 传递非关键字参数，`labels` 除外 (GH 41486)
禁止向 DataFrame.fillna() 和 Series.fillna() 传递非关键字参数，`value` 除外 (GH 41485)
禁止向 StringMethods.split() 和 StringMethods.rsplit() 传递非关键字参数，`pat` 除外 (GH 47448)
禁止向 DataFrame.set_index() 传递非关键字参数，`keys` 除外 (GH 41495)
禁止向 Resampler.interpolate() 传递非关键字参数，`method` 除外 (GH 41699)
禁止向 DataFrame.reset_index() 和 Series.reset_index() 传递非关键字参数，`level` 除外 (GH 41496)
禁止向 DataFrame.dropna() 和 Series.dropna() 传递非关键字参数 (GH 41504)
禁止向 ExtensionArray.argsort() 传递非关键字参数 (GH 46134)
禁止向 Categorical.sort_values() 传递非关键字参数 (GH 47618)
禁止向 Index.drop_duplicates() 和 Series.drop_duplicates() 传递非关键字参数 (GH 41485)
禁止向 DataFrame.drop_duplicates() 传递非关键字参数，`subset` 除外 (GH 41485)
禁止向 DataFrame.sort_index() 和 Series.sort_index() 传递非关键字参数 (GH 41506)
禁止向 DataFrame.interpolate() 和 Series.interpolate() 传递非关键字参数，`method` 除外 (GH 41510)
禁止向 DataFrame.any() 和 Series.any() 传递非关键字参数 (GH 44896)
禁止向 Index.set_names() 传递非关键字参数，`names` 除外 (GH 41551)
禁止向 Index.join() 传递非关键字参数，`other` 除外 (GH 46518)
禁止向 concat() 传递非关键字参数，`objs` 除外 (GH 41485)
禁止向 pivot() 传递非关键字参数，`data` 除外 (GH 48301)
禁止向 DataFrame.pivot() 传递非关键字参数 (GH 48301)
禁止向 read_html() 传递非关键字参数，`io` 除外 (GH 27573)
禁止向 read_json() 传递非关键字参数，`path_or_buf` 除外 (GH 27573)
禁止向 read_sas() 传递非关键字参数，`filepath_or_buffer` 除外 (GH 47154)
禁止向 read_stata() 传递非关键字参数，`filepath_or_buffer` 除外 (GH 48128)
禁止向 read_csv() 传递非关键字参数，`filepath_or_buffer` 除外 (GH 41485)
禁止向 read_table() 传递非关键字参数，`filepath_or_buffer` 除外 (GH 41485)
禁止向 read_fwf() 传递非关键字参数，`filepath_or_buffer` 除外 (GH 44710)
禁止向 read_xml() 传递非关键字参数，`path_or_buffer` 除外 (GH 45133)
禁止向 Series.mask() 和 DataFrame.mask() 传递非关键字参数，`cond` 和 `other` 除外 (GH 41580)
禁止向 DataFrame.to_stata() 传递非关键字参数，`path` 除外 (GH 48128)
禁止向 DataFrame.where() 和 Series.where() 传递非关键字参数，`cond` 和 `other` 除外 (GH 41523)
禁止向 Series.set_axis() 和 DataFrame.set_axis() 传递非关键字参数，`labels` 除外 (GH 41491)
禁止向 Series.rename_axis() 和 DataFrame.rename_axis() 传递非关键字参数，`mapper` 除外 (GH 47587)
禁止向 Series.clip() 和 DataFrame.clip() 传递非关键字参数，`lower` 和 `upper` 除外 (GH 41511)
禁止向 Series.bfill()、Series.ffill()、DataFrame.bfill() 和 DataFrame.ffill() 传递非关键字参数 (GH 41508)
禁止向 DataFrame.replace()、Series.replace() 传递非关键字参数，`to_replace` 和 `value` 除外 (GH 47587)
禁止向 DataFrame.sort_values() 传递非关键字参数，`by` 除外 (GH 41505)
禁止向 Series.sort_values() 传递非关键字参数 (GH 41505)
禁止向 DataFrame.reindex() 传递非关键字参数，`labels` 除外 (GH 17966)
禁止对非唯一 Index 对象使用 Index.reindex() (GH 42568)
禁止使用标量 data 构造 Categorical (GH 38433)
禁止在未传递 data 的情况下构造 CategoricalIndex (GH 38944)
移除了 Rolling.validate()、Expanding.validate() 和 ExponentialMovingWindow.validate() (GH 43665)
移除了返回 "freq" 的 Rolling.win_type (GH 38963)
移除了 Rolling.is_datetimelike (GH 38963)
从 DataFrame 和 Series 聚合中移除了 level 关键字；请改用 groupby (GH 39983)
移除了已弃用的 Timedelta.delta()、Timedelta.is_populated() 和 Timedelta.freq (GH 46430, GH 46476)
移除了已弃用的 NaT.freq (GH 45071)
移除了已弃用的 Categorical.replace()，请改用 Series.replace() (GH 44929)
从 Categorical.min() 和 Categorical.max() 中移除了 numeric_only 关键字，改用 skipna (GH 48821)
更改了 DataFrame.median() 和 DataFrame.mean() 在 numeric_only=None 时的行为，不再排除日期时间类列。此备注在 `numeric_only=None` 弃用生效后将不再相关 (GH 29941)
移除了 is_extension_type()，改用 is_extension_array_dtype() (GH 29457)
移除了 ExponentialMovingWindow.vol (GH 39220)
移除了 Index.get_value() 和 Index.set_value() (GH 33907, GH 28621)
移除了 Series.slice_shift() 和 DataFrame.slice_shift() (GH 37601)
移除了 DataFrameGroupBy.pad() 和 DataFrameGroupBy.backfill() (GH 45076)
从 read_json() 中移除了 numpy 参数 (GH 30636)
禁止在 DataFrame.to_dict() 中为 orient 传递缩写 (GH 32516)
禁止在非单调的 DatetimeIndex 上使用不在索引中的键进行部分切片。现在这将引发 KeyError (GH 18531)
移除了 get_offset，改用 to_offset() (GH 30340)
从 infer_freq() 中移除了 warn 关键字 (GH 45947)
从 DataFrame.between_time() 中移除了 include_start 和 include_end 参数，改用 inclusive (GH 43248)
从 date_range() 和 bdate_range() 中移除了 closed 参数，改用 inclusive 参数 (GH 40245)
从 DataFrame.expanding() 中移除了 center 关键字 (GH 20647)
从 eval() 中移除了 truediv 关键字 (GH 29812)
从 Index.get_loc() 中移除了 method 和 tolerance 参数。请改用 index.get_indexer([label], method=..., tolerance=...) (GH 42269)
移除了 pandas.datetime 子模块 (GH 30489)
移除了 pandas.np 子模块 (GH 30296)
移除了 pandas.util.testing，改用 pandas.testing (GH 30745)
移除了 Series.str.__iter__() (GH 28277)
移除了 pandas.SparseArray，改用 arrays.SparseArray (GH 30642)
移除了 pandas.SparseSeries 和 pandas.SparseDataFrame，包括 pickle 支持。 (GH 30642)
强制禁止向数据类型为 `datetime64`、`timedelta64` 或 `period` 的 DataFrame.shift() 和 Series.shift() 传递整数 fill_value (GH 32591)
强制禁止将字符串列标签传递给 DataFrame.ewm() 中的 times (GH 43265)
强制禁止向 Series.between() 中的 inclusive 传递 True 和 False，请分别改用 "both" 和 "neither" (GH 40628)
强制禁止 `read_csv` 在 `engine="c"` 时使用超出边界的索引作为 usecols (GH 25623)
强制禁止在 ExcelWriter 中使用 **kwargs；请改用关键字参数 engine_kwargs (GH 40430)
强制禁止将列标签元组传递给 DataFrameGroupBy.__getitem__() (GH 30546)
强制禁止在使用 MultiIndex 某一级别上的标签序列进行索引时出现缺失标签。现在这将引发 KeyError (GH 42351)
强制禁止使用位置切片通过 .loc 设置值。请改用带标签的 .loc 或带位置的 .iloc (GH 31840)
强制禁止使用 `float` 键进行位置索引，即使该键是整数，请手动将其转换为整数 (GH 34193)
强制禁止使用 DataFrame 索引器与 .iloc 结合使用，请改用 .loc 进行自动对齐 (GH 39022)
强制禁止在 __getitem__ 和 __setitem__ 方法中使用 `set` 或 `dict` 索引器 (GH 42825)
强制禁止对 Index 进行索引或对 Series 进行位置索引时生成多维对象，例如 obj[:, None]，请改在索引前转换为 numpy (GH 35141)
强制禁止在 merge() 的 suffixes 中使用 `dict` 或 `set` 对象 (GH 34810)
强制禁止 merge() 通过 suffixes 关键字和已存在的列生成重复列 (GH 22818)
强制禁止在不同级别数量上使用 merge() 或 join() (GH 34862)
强制禁止 DataFrame.melt() 中的 value_name 参数与 DataFrame 列中的元素匹配 (GH 35003)
强制禁止在 DataFrame.to_markdown() 和 Series.to_markdown() 中将 showindex 传递给 **kwargs，请改用 index (GH 33091)
移除了直接设置 Categorical._codes 的功能 (GH 41429)
移除了直接设置 Categorical.categories 的功能 (GH 47834)
从 Categorical.add_categories()、Categorical.remove_categories()、Categorical.set_categories()、Categorical.rename_categories()、Categorical.reorder_categories()、Categorical.set_ordered()、Categorical.as_ordered()、Categorical.as_unordered() 中移除了参数 inplace (GH 37981, GH 41118, GH 41133, GH 47834)
强制 Rolling.count() 在 min_periods=None 时默认窗口大小 (GH 31302)
在 DataFrame.to_parquet()、DataFrame.to_stata() 和 DataFrame.to_feather() 中将 fname 重命名为 path (GH 30338)
强制禁止使用包含单个切片项的列表（例如 ser[[slice(0, 2)]]）对 Series 进行索引。请将列表转换为元组，或者直接传递切片 (GH 31333)
更改了使用字符串索引器对具有 DatetimeIndex 索引的 DataFrame 进行索引的行为，以前这作为行的切片操作，现在它像任何其他列键一样操作；对于旧行为，请使用 frame.loc[key] (GH 36179)
强制 display.max_colwidth 选项不接受负整数 (GH 31569)
移除了 display.column_space 选项，改用 df.to_string(col_space=...) (GH 47280)
从 pandas 类中移除了已弃用的方法 mad (GH 11787)
从 pandas 类中移除了已弃用的方法 tshift (GH 11631)
更改了将空数据传递给 Series 的行为；默认数据类型将是 object 而不是 float64 (GH 29405)
更改了 DatetimeIndex.union()、DatetimeIndex.intersection() 和 DatetimeIndex.symmetric_difference() 在时区不匹配时的行为，现在会转换为 UTC 而不是转换为 `object` 数据类型 (GH 39328)
更改了 to_datetime() 在参数为“now”且 utc=False 时的行为，以匹配 Timestamp("now") (GH 18705)
更改了使用时区感知的 DatetimeIndex 与时区不感知的 datetime 对象（反之亦然）进行索引的行为；现在这些操作行为与任何其他不可比较类型相同，会引发 KeyError (GH 36148)
更改了 Index.reindex()、Series.reindex() 和 DataFrame.reindex() 在 `datetime64` 数据类型和 `datetime.date` 对象作为 fill_value 时的行为；现在它们不再被视为等同于 `datetime.datetime` 对象，因此 reindex 会转换为 `object` 数据类型 (GH 39767)
更改了 SparseArray.astype() 的行为，当给定一个不是显式 SparseDtype 的数据类型时，现在会转换为精确请求的数据类型，而不是静默地使用 SparseDtype (GH 34457)
更改了 Index.ravel() 的行为，现在返回原始 Index 的视图而不是 np.ndarray (GH 36900)
更改了 Series.to_frame() 和 Index.to_frame() 在显式 name=None 时的行为，现在列名将使用 None 而不是索引的名称或默认的 0 (GH 45523)
更改了 concat() 在一个数组为 `bool` 类型，另一个为整数类型时的行为，现在返回 `object` 类型而不是整数类型；请在连接前将 `bool` 对象显式转换为整数以获得旧行为 (GH 45101)
更改了 DataFrame 构造函数在给定浮点 data 和整数 dtype 时的行为，当数据无法无损转换时，将保留浮点数据类型，与 Series 行为一致 (GH 41170)
更改了 Index 构造函数在给定包含数字条目的 `object` 类型 np.ndarray 时的行为；现在将保留 `object` 数据类型，而不是推断为数字数据类型，与 Series 行为一致 (GH 42870)
更改了 Index.__and__()、Index.__or__() 和 Index.__xor__() 的行为，现在作为逻辑操作（与 Series 行为一致）而不是集合操作的别名 (GH 37374)
更改了 DataFrame 构造函数在传入列表且第一个元素为 Categorical 时的行为，现在将元素视为行，转换为 `object` 数据类型，与其他类型行为一致 (GH 38845)
更改了 DataFrame 构造函数在传入数据无法转换为的 `dtype`（非 `int`）时的行为；现在将引发错误而不是静默忽略 `dtype` (GH 41733)
更改了 Series 构造函数的行为，它将不再从字符串条目推断 `datetime64` 或 `timedelta64` 数据类型 (GH 41731)
更改了 Timestamp 构造函数在传入 np.datetime64 对象和 tz 时的行为，现在将输入解释为墙上时间而不是 UTC 时间 (GH 42288)
更改了 Timestamp.utcfromtimestamp() 的行为，现在返回一个时区感知的对象，满足 Timestamp.utcfromtimestamp(val).timestamp() == val (GH 45083)
更改了 Index 构造函数在传入 SparseArray 或 SparseDtype 时的行为，现在将保留该数据类型而不是转换为 numpy.ndarray (GH 43930)
更改了 DatetimeTZDtype 对象的类 `setitem` 操作（__setitem__、fillna、where、mask、replace、insert，以及 shift 的 fill_value）的行为，当使用具有不匹配时区的值时，该值将转换为对象的时区，而不是将两者都转换为 `object` 数据类型 (GH 44243)
更改了 Index、Series、DataFrame 构造函数在浮点数据类型和 DatetimeTZDtype 时的行为，现在数据被解释为 UTC 时间而不是墙上时间，与整数数据类型的处理方式一致 (GH 45573)
更改了 Series 和 DataFrame 构造函数在整数数据类型和包含 NaN 的浮点数据时的行为，现在将引发 IntCastingNaNError (GH 40110)
当 Series 和 DataFrame 构造函数使用整数 dtype 且值过大无法无损转换为此 dtype 时，其行为已变更，现在会引发 ValueError (GH 41734)
当 Series 和 DataFrame 构造函数使用整数 dtype 且值具有 datetime64 或 timedelta64 `dtype`时，其行为已变更，现在会引发 TypeError，请改用 values.view("int64") (GH 41770)
从 pandas.DataFrame.resample()、 pandas.Series.resample() 和 pandas.Grouper 中移除了已弃用的 base 和 loffset 参数。请改用 offset 或 origin (GH 31809)
当 Series.fillna() 和 DataFrame.fillna() 具有 timedelta64[ns] `dtype` 且 fill_value 不兼容时，其行为已变更；现在会转换为 object `dtype` 而不是引发错误，与其他 `dtype` 的行为保持一致 (GH 45746)
Series.str.replace() 中 regex 的默认参数已从 True 更改为 False。此外，当 regex=True 时，单个字符的 pat 现在将被视为正则表达式而非字符串字面量。(GH 36695, GH 24804)
当 DataFrame.any() 和 DataFrame.all() 使用 bool_only=True 时，其行为已变更；包含所有布尔值的 object `dtype` 列将不再被包含在内，请先手动转换为 bool `dtype` (GH 46188)
当 DataFrame.max()、DataFrame.min()、DataFrame.mean()、DataFrame.median()、DataFrame.skew()、DataFrame.kurt() 使用 axis=None 时，其行为已变更，现在会返回一个在两个轴上应用聚合操作的标量 (GH 45072)
Timestamp 与 datetime.date 对象进行比较的行为已变更；现在它们比较结果为不相等，并在不相等比较时引发错误，与 datetime.datetime 行为一致 (GH 36131)
NaT 与 datetime.date 对象进行比较的行为已变更；现在它们会在不相等比较时引发错误 (GH 39196)
强制执行弃用：当 Series.transform 和 DataFrame.transform 与列表或字典一起使用时，静默删除引发 TypeError 的列的行为已强制执行 (GH 43740)
DataFrame.apply() 使用列表类型参数时的行为已变更，任何部分失败都将引发错误 (GH 43740)
DataFrame.to_latex() 的行为已变更，现在通过 Styler.to_latex() 使用 Styler 实现 (GH 47970)
当键为整数且索引为 Float64Index 且键不在索引中时，Series.__setitem__() 的行为已变更；之前我们将键视为位置索引（行为类似于 series.iloc[key] = val），现在我们将其视为标签索引（行为类似于 series.loc[key] = val），与 Series.__getitem__() 的行为一致 (GH 33469)
从 factorize()、 Index.factorize() 和 ExtensionArray.factorize() 中移除了 na_sentinel 参数 (GH 47157)
当 Series.diff() 和 DataFrame.diff() 使用其数组未实现 diff 的 ExtensionDtype `dtype` 时，其行为已变更，现在会引发 TypeError 而非转换为 numpy (GH 31025)
强制执行弃用：在 DataFrame 上调用 method="outer" 的 numpy “ufunc”；现在会引发 NotImplementedError (GH 36955)
强制执行弃用：禁止将 numeric_only=True 传递给非数值 `dtype` 的 Series 聚合操作（rank、 any、 all 等）(GH 47500)
DataFrameGroupBy.apply() 和 SeriesGroupBy.apply() 的行为已变更，即使检测到转换器，也会遵守 group_keys 参数 (GH 34998)
当 DataFrame 的列与 Series 的索引不匹配时，它们之间的比较会引发 ValueError 而不是自动对齐，请在比较前执行 left, right = left.align(right, axis=1, copy=False) (GH 36795)
强制执行弃用：`DataFrame` 聚合操作中 numeric_only=None（默认值）时静默删除引发错误的列的行为；numeric_only 现在默认为 False (GH 41480)
所有带有 numeric_only 参数的 DataFrame 方法中，numeric_only 的默认值已更改为 False (GH 46096, GH 46906)
Series.rank() 中 numeric_only 的默认值已更改为 False (GH 47561)
强制执行弃用：在 groupby 和 resample 操作中，当 numeric_only=False 时静默删除无关列的行为 (GH 41475)
强制执行弃用：在 Rolling、 Expanding 和 ExponentialMovingWindow 操作中静默删除无关列的行为。这现在将引发 errors.DataError (GH 42834)
使用 df.loc[:, foo] = bar 或 df.iloc[:, foo] = bar 设置值的行为已变更，现在总是尝试原地设置值，然后才回退到类型转换 (GH 45333)
各种 DataFrameGroupBy 方法中 numeric_only 的默认值已更改；现在所有方法都默认为 numeric_only=False (GH 46072)
Resampler 方法中 numeric_only 的默认值已更改为 False (GH 47177)
使用 DataFrameGroupBy.transform() 方法，当可调用对象返回 DataFrame 时，将会与输入的索引对齐 (GH 47244)
当向 DataFrame.groupby() 提供一个长度为1的列列表时，遍历生成的 DataFrameGroupBy 对象时返回的键现在将是长度为1的元组 (GH 47761)
移除了已弃用的方法 ExcelWriter.write_cells()、 ExcelWriter.save()、 ExcelWriter.cur_sheet()、 ExcelWriter.handles()、 ExcelWriter.path() (GH 45795)
ExcelWriter 属性 book 不再可设置；它仍然可以被访问和修改 (GH 48943)
移除了 Rolling、 Expanding 和 ExponentialMovingWindow 操作中未使用的 *args 和 **kwargs (GH 47851)
从 DataFrame.to_csv() 中移除了已弃用的参数 line_terminator (GH 45302)
从 lreshape() 中移除了已弃用的参数 label (GH 30219)
DataFrame.eval() 和 DataFrame.query() 中 expr 之后的参数仅限关键字参数 (GH 47587)
移除了 Index._get_attributes_dict() (GH 50648)
移除了 Series.__array_wrap__() (GH 50648)
DataFrame.value_counts() 的行为已变更，现在对于任何列表类型（无论是否只包含一个元素）返回带有 MultiIndex 的 Series，但对于单个标签返回 Index (GH 50829)
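
上面的许多条目都给出了替代写法。下面的草图把其中几项放在一起演示：用 concat() 代替 append()，用 get_indexer() 代替带 method 参数的 get_loc()，str.replace() 默认按字面量替换，date_range() 使用 inclusive 参数，以及在聚合时显式传入 numeric_only=True。所有数据和变量名都是为演示而假设的。

```python
import pandas as pd

# DataFrame.append() 已移除：改用 concat()
left = pd.DataFrame({"x": [1, 2]})
right = pd.DataFrame({"x": [3]})
combined = pd.concat([left, right], ignore_index=True)

# Index.get_loc(method=...) 已移除：改用 get_indexer()
index = pd.Index([10, 20, 30])
pos = index.get_indexer([19], method="nearest", tolerance=5)

# Series.str.replace() 的 regex 默认值现在为 False，"." 按字面量处理
literal = pd.Series(["a.b"]).str.replace(".", "-")
as_regex = pd.Series(["a.b"]).str.replace(".", "-", regex=True)

# date_range() 的 closed 参数已移除：改用 inclusive
days = pd.date_range("2023-01-01", "2023-01-03", inclusive="left")

# numeric_only 现在默认为 False：需要时显式排除非数值列
mixed = pd.DataFrame({"num": [1.0, 3.0], "txt": ["a", "b"]})
means = mixed.mean(numeric_only=True)
```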



性能改进#

DataFrameGroupBy.median()、 SeriesGroupBy.median() 和 DataFrameGroupBy.cumprod() 对可空 `dtype` 的性能改进 (GH 37493)
DataFrameGroupBy.all()、 DataFrameGroupBy.any()、 SeriesGroupBy.all() 和 SeriesGroupBy.any() 对 object `dtype` 的性能改进 (GH 50623)
MultiIndex.argsort() 和 MultiIndex.sort_values() 的性能改进 (GH 48406)
MultiIndex.size() 的性能改进 (GH 48723)
MultiIndex.union() 在无缺失值且无重复项时的性能改进 (GH 48505, GH 48752)
MultiIndex.difference() 的性能改进 (GH 48606)
MultiIndex 集合操作中 sort=None 时的性能改进 (GH 49010)
DataFrameGroupBy.mean()、 SeriesGroupBy.mean()、 DataFrameGroupBy.var() 和 SeriesGroupBy.var() 对扩展数组 `dtype` 的性能改进 (GH 37493)
MultiIndex.isin() 在 level=None 时的性能改进 (GH 48622, GH 49577)
MultiIndex.putmask() 的性能改进 (GH 49830)
Index.union() 和 MultiIndex.union() 在索引包含重复项时的性能改进 (GH 48900)
Series.rank() 对 pyarrow 支持的 `dtype` 的性能改进 (GH 50264)
Series.searchsorted() 对 pyarrow 支持的 `dtype` 的性能改进 (GH 50447)
Series.fillna() 对扩展数组 `dtype` 的性能改进 (GH 49722, GH 50078)
Index.join()、 Index.intersection() 和 Index.union() 在 Index 单调时，对掩码和 arrow `dtype` 的性能改进 (GH 50310, GH 51365)
Series.value_counts() 对可空 `dtype` 的性能改进 (GH 48338)
Series 构造函数传递带有可空 `dtype` 的整数 numpy 数组时的性能改进 (GH 48338)
DatetimeIndex 构造函数传递列表时的性能改进 (GH 48609)
merge() 和 DataFrame.join() 在连接排序后的 MultiIndex 时的性能改进 (GH 48504)
to_datetime() 在解析带有时区偏移的字符串时的性能改进 (GH 50107)
DataFrame.loc() 和 Series.loc() 在 MultiIndex 进行基于元组的索引时的性能改进 (GH 48384)
Series.replace() 对分类 `dtype` 的性能改进 (GH 49404)
MultiIndex.unique() 的性能改进 (GH 48335)
对可空和 arrow `dtype` 的索引操作的性能改进 (GH 49420, GH 51316)
concat() 对扩展数组支持的索引的性能改进 (GH 49128, GH 49178)
api.types.infer_dtype() 的性能改进 (GH 51054)
当使用 BZ2 或 LZMA 时，降低了 DataFrame.to_pickle()/Series.to_pickle() 的内存使用 (GH 49068)
StringArray 构造函数传递 np.str_ 类型的 numpy 数组时的性能改进 (GH 49109)
arrays.IntervalArray.from_tuples() 的性能改进 (GH 50620)
arrays.ArrowExtensionArray.factorize() 的性能改进 (GH 49177)
arrays.ArrowExtensionArray.__setitem__() 的性能改进 (GH 50248, GH 50632)
当数组包含 NA 时，ArrowExtensionArray 比较方法的性能改进 (GH 50524)
arrays.ArrowExtensionArray.to_numpy() 的性能改进 (GH 49973, GH 51227)
解析字符串到 BooleanDtype 的性能改进 (GH 50613)
DataFrame.join() 在连接 MultiIndex 的子集时的性能改进 (GH 48611)
MultiIndex.intersection() 的性能改进 (GH 48604)
DataFrame.__setitem__() 的性能改进 (GH 46267)
可空 `dtype` 的 var 和 std 的性能改进 (GH 48379)。
遍历 pyarrow 和可空 `dtype` 时的性能改进 (GH 49825, GH 49851)
read_sas() 的性能改进 (GH 47403, GH 47405, GH 47656, GH 48502)
RangeIndex.sort_values() 的内存改进 (GH 48801)
当 copy=True 时，Series.to_numpy() 通过避免两次复制来提高性能 (GH 24345)
Series.rename() 配合 MultiIndex 时的性能改进 (GH 21055)
当 by 为分类类型且 sort=False 时，DataFrameGroupBy 和 SeriesGroupBy 的性能改进 (GH 48976)
当 by 为分类类型且 observed=False 时，DataFrameGroupBy 和 SeriesGroupBy 的性能改进 (GH 49596)
read_stata() 在参数 index_col 设置为 None（默认值）时的性能改进。现在索引将是 RangeIndex 而不是 Int64Index (GH 49745)
merge() 在不合并索引时的性能改进 - 新索引现在将是 RangeIndex 而不是 Int64Index (GH 49478)
DataFrame.to_dict() 和 Series.to_dict() 在使用任何非对象 `dtype` 时的性能改进 (GH 46470)
read_html() 在存在多个表格时的性能改进 (GH 49929)
Period 构造函数在从字符串或整数构造时的性能改进 (GH 38312)
to_datetime() 在使用 '%Y%m%d' 格式时的性能改进 (GH 17410)
to_datetime() 在给定或可推断格式时的性能改进 (GH 50465)
Series.median() 对可空 `dtype` 的性能改进 (GH 50838)
read_csv() 在将 to_datetime() lambda 函数传递给 date_parser 且输入具有混合时区偏移时的性能改进 (GH 35296)
isna() 和 isnull() 的性能改进 (GH 50658)
SeriesGroupBy.value_counts() 对分类 `dtype` 的性能改进 (GH 46202)
修复了 read_hdf() 中的引用泄漏 (GH 37441)
修复了 DataFrame.to_json() 和 Series.to_json() 在序列化日期时间和时间差时的内存泄漏 (GH 40443)
降低了许多 DataFrameGroupBy 方法的内存使用 (GH 51090)
DataFrame.round() 对整数 decimal 参数的性能改进 (GH 17254)
DataFrame.replace() 和 Series.replace() 在使用大型字典作为 to_replace 时的性能改进 (GH 6697)
StataReader 在读取可寻址文件时的内存改进 (GH 48922)
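
上面列表中有一项改动可以直接观察到：merge() 在不按索引合并时，结果的索引现在是 RangeIndex。下面是一个简单的示意，其中的表和列名都是假设的。

```python
import pandas as pd

# 两张按 "key" 列合并的示例表（假设的数据）
left = pd.DataFrame({"key": ["a", "b", "c"], "l": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "r": [4, 5, 6]})

merged = left.merge(right, on="key")

# 未按索引合并时，新索引是 RangeIndex
print(type(merged.index).__name__)
```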



错误修复#

Categorical#

Categorical.set_categories() 丢失 `dtype` 信息的错误 (GH 48812)
Series.replace() 在 `dtype` 为分类类型且 to_replace 值与新值重叠时的错误 (GH 49404)
Series.replace() 在 `dtype` 为分类类型时丢失底层分类的可空 `dtype` 的错误 (GH 49404)
DataFrame.groupby() 和 Series.groupby() 在用作分组器时会重新排序分类的错误 (GH 48749)
Categorical 构造函数在从 Categorical 对象构造且 dtype="category" 时丢失有序性的错误 (GH 49309)
SeriesGroupBy.min()、 SeriesGroupBy.max()、 DataFrameGroupBy.min() 和 DataFrameGroupBy.max() 在使用无序 CategoricalDtype 且没有分组时未能引发 TypeError 的错误 (GH 51034)



日期时间类型#

pandas.infer_freq() 在 RangeIndex 上推断时引发 TypeError 的错误 (GH 47084)
to_datetime() 在字符串参数对应大整数时错误地引发 OverflowError 的错误 (GH 50533)
to_datetime() 在 errors='coerce' 和 infer_datetime_format=True 时对无效偏移量引发错误的错误 (GH 48633)
DatetimeIndex 构造函数在 tz=None 与时区感知 `dtype` 或数据结合明确指定时未能引发错误的错误 (GH 48659)
从 DatetimeIndex 减去 datetime 标量时未能保留原始 freq 属性的错误 (GH 48818)
pandas.tseries.holiday.Holiday 中，半开日期区间导致 USFederalHolidayCalendar.holidays() 返回类型不一致的错误 (GH 49075)
在夏令时转换附近，使用 `dateutil` 或 `zoneinfo` 时区渲染带有时区感知 `dtype` 的 DatetimeIndex、 Series 和 DataFrame 的错误 (GH 49684)
to_datetime() 在解析 Timestamp、 datetime.datetime、 datetime.date 或 np.datetime64 对象且传递非 ISO8601 format 时引发 ValueError 的错误 (GH 49298, GH 50036)
to_datetime() 在解析空字符串并传递非 ISO8601 格式时引发 ValueError 的错误。现在，空字符串将被解析为 NaT，以与 ISO8601 格式的处理方式兼容 (GH 50251)
Timestamp 在解析非 ISO8601 分隔的日期字符串时显示 UserWarning，但该警告对用户而言无法采取行动的错误 (GH 50232)
to_datetime() 在解析包含 ISO 周指令和 ISO 工作日指令格式的日期时显示误导性的 ValueError 的错误 (GH 50308)
Timestamp.round() 当 freq 参数持续时间为零（例如“0ns”）时，返回错误结果而非引发错误的错误 (GH 49737)
to_datetime() 在传递无效格式且 errors 为 'ignore' 或 'coerce' 时未引发 ValueError 的错误 (GH 50266)
DateOffset 在使用毫秒和另一个超日参数构造时引发 TypeError 的错误 (GH 49897)
to_datetime() 在解析带有 %Y%m%d 格式的十进制日期字符串时未引发 ValueError 的错误 (GH 50051)
to_datetime() 在解析带有 ISO8601 格式的混合偏移日期字符串时未将 None 转换为 NaT 的错误 (GH 50071)
to_datetime() 在解析超出范围的日期字符串且 errors='ignore' 和 format='%Y%m%d' 时未返回输入的错误 (GH 14487)
to_datetime() 在使用时区感知字符串、ISO8601 格式和 utc=False 解析时，将时区非感知 datetime.datetime 转换为时区感知的错误 (GH 50254)
to_datetime() 在解析 ISO8601 格式日期且某些值未填充零时引发 ValueError 的错误 (GH 21422)
在 to_datetime() 中存在一个 Bug，当使用 format='%Y%m%d' 和 errors='ignore' 时会给出不正确的结果（GH 26493）
在 to_datetime() 中存在一个 Bug，如果 format 不是 ISO8601 格式，它将无法解析日期字符串 'today' 和 'now'（GH 50359）
在 Timestamp.utctimetuple() 中存在一个 Bug，导致其抛出 TypeError（GH 32174）
在 to_datetime() 中存在一个 Bug，当使用 errors='ignore' 解析混合偏移的 Timestamp 时会抛出 ValueError（GH 50585）
在 to_datetime() 中存在一个 Bug，它错误地处理了在溢出边界 1 个 unit 范围内的浮点输入（GH 50183）
在 to_datetime() 中存在一个 Bug，当单位为“Y”或“M”时会给出不正确的结果，与逐点的 Timestamp 结果不匹配（GH 50870）
在 Series.interpolate() 和 DataFrame.interpolate() 中存在一个 Bug，当使用 datetime 或 timedelta 数据类型时，会错误地抛出 ValueError（GH 11312）
在 to_datetime() 中存在一个 Bug，当输入超出边界时，带有 errors='ignore' 的输入未被返回（GH 50587）
在 DataFrame.from_records() 中存在一个 Bug，当给定具有时区感知 datetime64 列的 DataFrame 输入时，错误地丢弃了时区感知（GH 51162）
在 to_datetime() 中存在一个 Bug，当使用 errors='coerce' 解析日期字符串时，会抛出 decimal.InvalidOperation（GH 51084）
在 to_datetime() 中存在一个 Bug，当同时指定 unit 和 origin 时会返回不正确的结果（GH 42624）
在 Series.astype() 和 DataFrame.astype() 中存在一个 Bug，当将包含时区感知日期时间或字符串的对象类型对象转换为 datetime64[ns] 时，会错误地将其本地化为 UTC，而不是抛出 TypeError（GH 50140）
在 DataFrameGroupBy.quantile() 和 SeriesGroupBy.quantile() 中存在一个 Bug，当使用 datetime 或 timedelta 数据类型时，对于包含 NaT 的组会给出不正确的结果（GH 51373）
在 DataFrameGroupBy.quantile() 和 SeriesGroupBy.quantile() 中存在一个 Bug，当使用 PeriodDtype 或 DatetimeTZDtype 时会错误地抛出异常（GH 51373）
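
作为上述修复之一的示意 (GH 50251)：传入非 ISO8601 格式时，空字符串现在会被解析为 NaT，而不是引发 ValueError。示例中的日期字符串和格式是假设的。

```python
import pandas as pd

# 非 ISO8601 格式（日/月/年），其中第二个值是空字符串
parsed = pd.to_datetime(["02/01/2023", ""], format="%d/%m/%Y")

# 空字符串对应的位置是 NaT
print(parsed)
```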



时间差#

在 to_timedelta() 中存在一个 Bug，当输入具有可空数据类型 Float64 时会抛出错误（GH 48796）
Timedelta 构造函数中的错误：当给定 np.timedelta64("nat") 时，错误地抛出异常而不是返回 NaT (GH 48898)
Timedelta 构造函数中的错误：当同时传入 Timedelta 对象和关键字（如 days, seconds）时未能抛出异常 (GH 48898)
Timedelta 与非常大的 datetime.timedelta 对象进行比较时，错误地引发 OutOfBoundsTimedelta 的错误 (GH 49021)
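
下面的草图对应上面 Timedelta 构造函数的两项修复 (GH 48898)：传入 np.timedelta64("nat") 时返回 NaT；同时传入 Timedelta 对象和关键字时引发异常。具体的异常类型在这里没有假定，示例同时捕获 TypeError 和 ValueError。

```python
import numpy as np
import pandas as pd

# 传入 NaT 形式的 np.timedelta64：得到 NaT，而不是引发异常
nat_result = pd.Timedelta(np.timedelta64("NaT"))

# 同时传入 Timedelta 对象和关键字参数：现在会引发异常
try:
    pd.Timedelta(pd.Timedelta(days=1), days=1)
    raised = False
except (TypeError, ValueError):
    raised = True
```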



时区#

Series.astype() 和 DataFrame.astype() 中的错误：当 object-dtype 包含多个时区感知（timezone-aware）的 datetime 对象且时区不一致时，将其转换为 DatetimeTZDtype 会错误地引发异常 (GH 32581)
to_datetime() 中的错误：当 format 使用 %Z 指定时，未能解析带有时区名称的日期字符串 (GH 49748)
改进了当 Timestamp.tz_localize() 中 ambiguous 参数传入无效值时的错误消息 (GH 49565)
字符串解析中的错误：错误地允许使用无效时区构造 Timestamp，这会在尝试打印时引发异常 (GH 50668)
修正了 objects_to_datetime64ns() 中的 TypeError 消息，以提示 DatetimeIndex 具有混合时区 (GH 50974)



数值#

DataFrame.add() 中的错误：当输入包含混合 DataFrame 类型和 Series 类型时，无法应用 ufunc (GH 39853)
对 Series 执行算术运算时的错误：在结合掩码数据类型和 numpy 数据类型时未传播掩码 (GH 45810, GH 42630)
DataFrame.sem() 和 Series.sem() 中的错误：当使用 ArrowDtype 支持的数据时，始终会引发错误的 TypeError (GH 49759)
Series.__add__() 中的错误：对于列表和掩码 Series，会转换为 object 类型 (GH 22962)
mode() 中的错误：当存在 NA 值时，不尊重 dropna=False 参数 (GH 50982)
DataFrame.query() 中的错误：当 engine="numexpr" 且列名为 min 或 max 时，会引发 TypeError (GH 50937)
DataFrame.min() 和 DataFrame.max() 中的错误：当带有 tz-aware 数据且包含 pd.NaT 和 axis=1 时，会返回不正确的结果 (GH 51242)



转换#

使用 int64 数据类型从字符串列表构造 Series 时，错误地引发异常而不是进行类型转换的错误 (GH 44923)
构造具有掩码数据类型和布尔值（包含 NA）的 Series 时引发异常的错误 (GH 42137)
DataFrame.eval() 中的错误：当函数调用中存在负值时，错误地引发 AttributeError (GH 46471)
Series.convert_dtypes() 中的错误：当 Series 包含 NA 且数据类型为 object 时，未能将数据类型转换为可空数据类型 (GH 48791)
错误：任何 ExtensionDtype 子类，只要 kind="M"，都会被解释为时区类型 (GH 34986)
arrays.ArrowExtensionArray 中的错误：当传入字符串序列或二进制序列时，会引发 NotImplementedError (GH 49172)
Series.astype() 中的错误：从非 pyarrow 字符串数据类型转换为 pyarrow 数值类型时引发 pyarrow.ArrowInvalid (GH 50430)
DataFrame.astype() 中的错误：当转换为 string 且 copy=False 时，会就地修改输入数组 (GH 51073)
Series.to_numpy() 中的错误：在应用 na_value 之前就将数据转换为 NumPy 数组 (GH 48951)
DataFrame.astype() 中的错误：当转换为 pyarrow 数据类型时未复制数据 (GH 50984)
to_datetime() 中的错误：当 format 是 ISO8601 格式时，未能尊重 exact 参数 (GH 12649)
TimedeltaArray.astype() 中的错误：转换为 pyarrow duration 类型时引发 TypeError (GH 49795)
DataFrame.eval() 和 DataFrame.query() 中的错误：对扩展数组数据类型引发异常 (GH 29618, GH 50261, GH 31913)
Series() 中的错误：当从 Index 创建且 dtype 等于 Index 的 dtype 时，未复制数据 (GH 52008)



字符串#

pandas.api.types.is_string_dtype() 中的错误：当数据类型为 StringDtype 或带有 pyarrow.string() 的 ArrowDtype 时，不会返回 True (GH 15585)
将字符串数据类型转换为“datetime64[ns]”或“timedelta64[ns]”时，错误地引发 TypeError 的错误 (GH 36153)
在字符串数据类型列中用数组设置值时，当数组包含缺失值时，作为副作用会修改数组的错误 (GH 51299)



区间#

IntervalIndex.is_overlapping() 中的错误：当区间具有重复的左边界时，输出不正确 (GH 49581)
Series.infer_objects() 中的错误：未能为由 Interval 对象组成的 object series 推断 IntervalDtype (GH 50090)
Series.shift() 中的错误：当使用 IntervalDtype 和无效的 null fill_value 时，未能引发 TypeError (GH 51258)
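
上面关于 Series.infer_objects() 的修复 (GH 50090) 可以这样示意：由 Interval 对象组成的 object 类型 Series 现在会被推断为 IntervalDtype。示例中的区间取值是假设的。

```python
import pandas as pd

# 显式以 object 数据类型保存 Interval 对象
ser = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object)

# infer_objects() 会把它推断为区间数据类型
inferred = ser.infer_objects()
print(ser.dtype, "->", inferred.dtype)
```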



索引#

DataFrame.__setitem__() 中的错误：当索引器是 boolean 数据类型的 DataFrame 时引发异常 (GH 47125)
DataFrame.reindex() 中的错误：当索引列和索引为 uint 数据类型时，填充了错误的值 (GH 48184)
DataFrame.loc() 中的错误：当设置具有不同数据类型的 DataFrame 时，会将值强制转换为单一数据类型 (GH 50467)
DataFrame.sort_values() 中的错误：当 by 为空列表且 inplace=True 时，未返回 None (GH 50643)
DataFrame.loc() 中的错误：当使用列表索引器设置值时，强制转换数据类型 (GH 49159)
Series.loc() 中的错误：对超出范围的切片结束索引器引发错误 (GH 50161)
DataFrame.loc() 中的错误：当使用全 False 的 bool 索引器和空对象时，引发 ValueError (GH 51450)
DataFrame.loc() 中的错误：当使用 bool 索引器和 MultiIndex 时，引发 ValueError (GH 47687)
DataFrame.loc() 中的错误：当使用非标量索引器设置 pyarrow 后端列的值时，引发 IndexError (GH 50085)
DataFrame.__getitem__()、Series.__getitem__()、DataFrame.__setitem__() 和 Series.__setitem__() 中的错误：使用整数对具有扩展浮点数据类型（Float64 & Float64）或复数数据类型的索引进行索引时的错误 (GH 51053)
DataFrame.loc() 中的错误：当使用空索引器设置不兼容值时修改对象 (GH 45981)
DataFrame.__setitem__() 中的错误：当右侧是具有 MultiIndex 列的 DataFrame 时，引发 ValueError (GH 49121)
DataFrame.reindex() 中的错误：当 DataFrame 具有单个扩展数组列且对 columns 和 index 进行重新索引时，将数据类型强制转换为 object (GH 48190)
DataFrame.iloc() 中的错误：当索引器是具有数值扩展数组数据类型的 Series 时，引发 IndexError (GH 49521)
describe() 中的错误：在结果索引中格式化百分位数时，显示了比所需更多的小数位 (GH 46362)
DataFrame.compare() 中的错误：当比较可空数据类型中的 NA 与值时，不识别差异 (GH 48939)
Series.rename() 中的错误：当使用 MultiIndex 时丢失扩展数组数据类型 (GH 21055)
DataFrame.isetitem() 中的错误：将 DataFrame 中的扩展数组数据类型强制转换为 object 类型 (GH 49922)
Series.__getitem__() 中的错误：当从空的 pyarrow 后端对象中选择时返回损坏的对象 (GH 51734)
BusinessHour 中的错误：当索引中不包含开放时间时，会导致 DatetimeIndex 创建失败 (GH 49835)



缺失值#

Index.equals() 中的错误：当 Index 由包含 NA 的元组组成时，引发 TypeError (GH 48446)
Series.map() 中的错误：当数据包含 NaN 且使用了 defaultdict 映射时，导致结果不正确 (GH 48813)
NA 中的错误：当与 bytes 对象执行二进制操作时，引发 TypeError 而不是返回 NA (GH 49108)
DataFrame.update() 中的错误：当 overwrite=False 且 self 具有包含 NaT 值的列但该列在 other 中不存在时，引发 TypeError (GH 16713)
Series.replace() 中的错误：当替换包含 NA 的 object-dtype Series 中的值时，引发 RecursionError (GH 47480)
Series.replace() 中的错误：当替换包含 NA 的数值 Series 中的值时，引发 RecursionError (GH 50758)



MultiIndex#

Bug in MultiIndex.get_indexer() not matching NaN values (GH 29252, GH 37222, GH 38623, GH 42883, GH 43222, GH 46173, GH 48905)
Bug in MultiIndex.argsort() raising TypeError when the index contains NA (GH 48495)
Bug in MultiIndex.difference() losing extension array dtype (GH 48606)
Bug in MultiIndex.set_levels raising IndexError when setting an empty level (GH 48636)
Bug in MultiIndex.unique() losing extension array dtype (GH 48335)
Bug in MultiIndex.intersection() losing extension array (GH 48604)
Bug in MultiIndex.union() losing extension array (GH 48498, GH 48505, GH 48900)
Bug in MultiIndex.union() not sorting when sort=None and the index contains missing values (GH 49010)
Bug in MultiIndex.append() not checking names for equality (GH 48288)
Bug in MultiIndex.symmetric_difference() losing extension array (GH 48607)
Bug in MultiIndex.join() losing dtypes when the MultiIndex has duplicates (GH 49830)
Bug in MultiIndex.putmask() losing extension array (GH 49830)
Bug in MultiIndex.value_counts() returning a Series indexed by a flat index of tuples instead of a MultiIndex (GH 49558)
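
To make the MultiIndex.value_counts() entry above (GH 49558) concrete, here is a small sketch, assuming pandas 2.0 or later. The level names "letter" and "number" are hypothetical.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 1), ("b", 2)], names=["letter", "number"]
)

# The result is a Series whose index is itself a MultiIndex,
# so individual combinations can be looked up with a tuple key.
counts = mi.value_counts()
print(type(counts.index).__name__)
print(counts.loc[("a", 1)])
```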



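I/O#

Bug in read_sas() causing fragmentation of the DataFrame and raising errors.PerformanceWarning (GH 48595)
Improved the error message in read_excel() by including the offending sheet name when an exception is raised while reading a file (GH 48706)
Bug when pickling a subset of PyArrow-backed data that would serialize the entire data instead of the subset (GH 42600)
Bug in read_sql_query() ignoring the dtype argument when chunksize is specified and the result is empty (GH 50245)
Bug in read_csv() raising errors.ParserError with engine="c" for a single-line csv with fewer columns than names (GH 47566)
Bug in read_json() raising with orient="table" and NA values (GH 40255)
Bug in displaying string dtypes not showing the storage option (GH 50099)
Bug in DataFrame.to_string() with header=False printing the index name on the same line as the first row of the data (GH 49230)
Bug in DataFrame.to_string() ignoring the float formatter for extension arrays (GH 39336)
Fixed a memory leak which stemmed from the initialization of the internal JSON module (GH 49222)
Fixed an issue where json_normalize() would incorrectly remove leading characters from column names that matched the sep argument (GH 49861)
Bug in read_csv() unnecessarily overflowing for extension array dtypes when the data contains NA (GH 32134)
Bug in DataFrame.to_dict() not converting NA to None (GH 50795)
Bug in DataFrame.to_json() where it would segfault when failing to encode a string (GH 50307)
Bug in DataFrame.to_html() where setting na_rep had no effect when the DataFrame contains non-scalar data (GH 47103)
Bug in read_xml() where file-like objects failed when iterparse is used (GH 50641)
Bug in read_csv() with engine="pyarrow" where the encoding parameter was not handled correctly (GH 51302)
Bug in read_xml() ignoring repeated elements when iterparse is used (GH 51183)
Bug in ExcelWriter leaving file handles open if an exception occurred during instantiation (GH 51443)
Bug in DataFrame.to_parquet() where non-string index or columns raised a ValueError when engine="pyarrow" (GH 52036)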
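
The DataFrame.to_dict() entry above (GH 50795) might look like this in practice. This is a sketch assuming pandas 2.0 or later; the column name "a" is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"a": pd.array([1, pd.NA], dtype="Int64")})

# NA in a nullable column is converted to Python's None, which keeps
# the output friendly to plain-Python consumers such as json.dumps.
records = df.to_dict(orient="records")
print(records)
```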



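Period#

Bug in Period.strftime() and PeriodIndex.strftime() raising UnicodeDecodeError when a locale-specific directive was passed (GH 46319)
Bug in adding a Period object to an array of DateOffset objects incorrectly raising TypeError (GH 50162)
Bug in Period where passing a string with finer resolution than nanosecond would result in a KeyError instead of dropping the extra precision (GH 50417)
Bug in parsing strings representing week periods, e.g. "2017-01-23/2017-01-29", as minute frequency instead of week frequency (GH 50803)
Bug in DataFrameGroupBy.sum(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.prod(), DataFrameGroupBy.cumprod() with PeriodDtype failing to raise TypeError (GH 51040)
Bug in parsing an empty string with Period incorrectly raising ValueError instead of returning NaT (GH 51349)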
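
Two of the parsing entries above (GH 50803 and GH 51349) can be checked with a short sketch, assuming pandas 2.0 or later. The variable names are illustrative only.

```python
import pandas as pd

# A "start/end" string spanning one week is parsed with a weekly
# frequency, so it round-trips through str().
week = pd.Period("2017-01-23/2017-01-29")
print(week.freqstr)
print(str(week))

# An empty string yields NaT rather than raising ValueError.
empty = pd.Period("")
print(empty is pd.NaT)
```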



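Plotting#

Bug in DataFrame.plot.hist() not dropping elements of weights corresponding to NaN values in data (GH 48884)
ax.set_xlim was sometimes raising a UserWarning which users couldn't address because set_xlim does not accept parsing arguments; the converter now uses Timestamp() instead (GH 49148)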



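Groupby/resample/rolling#

Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (GH 48834)
Bug in DataFrameGroupBy.sample() raising ValueError when the object is empty (GH 48459)
Bug in Series.groupby() raising ValueError when an entry of the index is equal to the name of the index (GH 48567)
Bug in DataFrameGroupBy.resample() producing inconsistent results when passed an empty DataFrame (GH 47705)
Bug in DataFrameGroupBy and SeriesGroupBy not including unobserved categories in the result when grouping by categorical indexes (GH 49354)
Bug in DataFrameGroupBy and SeriesGroupBy changing the result order depending on the input index when grouping by categoricals (GH 49223)
Bug in DataFrameGroupBy and SeriesGroupBy sorting result values when grouping on categorical data, even when used with sort=False (GH 42482)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply with as_index=False not attempting the computation without the grouping keys when using them failed with a TypeError (GH 49256)
Bug in DataFrameGroupBy.describe() describing the group keys (GH 49256)
Bug in SeriesGroupBy.describe() with as_index=False returning a result with the incorrect shape (GH 49256)
Bug in DataFrameGroupBy and SeriesGroupBy with dropna=False dropping NA values when the grouper was categorical (GH 36327)
Bug in SeriesGroupBy.nunique() incorrectly raising when the grouper was an empty categorical and observed=True (GH 21334)
Bug in SeriesGroupBy.nth() raising when the grouper contained NA values after subsetting from a DataFrameGroupBy (GH 26454)
Bug in DataFrame.groupby() not including a Grouper specified by key in the result when as_index=False (GH 50413)
Bug in DataFrameGroupBy.value_counts() raising when used with a TimeGrouper (GH 50486)
Bug in Resampler.size() returning a wide DataFrame instead of a Series with MultiIndex (GH 46826)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() incorrectly raising when the grouper had axis=1 for the "idxmin" and "idxmax" arguments (GH 45986)
Bug in DataFrameGroupBy raising when used with an empty DataFrame, a categorical grouper, and dropna=False (GH 50634)
Bug in SeriesGroupBy.value_counts() not respecting sort=False (GH 50482)
Bug in DataFrameGroupBy.resample() raising KeyError when getting the result from a key list while resampling on a time index (GH 50840)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() incorrectly raising when the grouper had axis=1 for the "ngroup" argument (GH 45986)
Bug in DataFrameGroupBy.describe() producing incorrect results when the data had duplicate columns (GH 50806)
Bug in DataFrameGroupBy.agg() with engine="numba" failing to respect as_index=False (GH 51228)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), and Resampler.agg() ignoring arguments when passed a list of functions (GH 50863)
Bug in DataFrameGroupBy.ohlc() ignoring as_index=False (GH 51413)
Bug in DataFrameGroupBy.agg() not including groupings in the result after subsetting columns (e.g. .groupby(...)[["a", "b"]]) (GH 51186)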
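
As a sketch of the sort=False entry for categorical groupers above (GH 42482), assuming pandas 2.0 or later, the column names "key" and "value" being hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Categories are declared as a, b but the data starts with "b".
        "key": pd.Categorical(["b", "a", "b"], categories=["a", "b"]),
        "value": [1, 2, 3],
    }
)

# observed=True is passed explicitly so the example does not depend on
# the version-specific default of that keyword.
totals = df.groupby("key", sort=False, observed=True)["value"].sum()
print(totals)
```

With sort=False the groups should come back in order of first appearance ("b" before "a") rather than in category order.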



Reshaping#

Bug in DataFrame.pivot_table() raising TypeError for nullable dtypes with margins=True (GH 48681)
Bug in DataFrame.unstack() and Series.unstack() unstacking the wrong level of a MultiIndex when the MultiIndex has mixed names (GH 48763)
Bug in DataFrame.melt() losing extension array dtype (GH 41570)
Bug in DataFrame.pivot() not respecting None as a column name (GH 48293)
Bug in DataFrame.join() incorrectly raising AttributeError when left_on or right_on is or includes a CategoricalIndex (GH 48464)
Bug in DataFrame.pivot_table() raising ValueError with margins=True when the result is an empty DataFrame (GH 49240)
Clarified the error message in merge() when passing an invalid validate option (GH 49417)
Bug in DataFrame.explode() raising ValueError on multiple columns with NaN values or empty lists (GH 46084)
Bug in DataFrame.transpose() with an IntervalDtype column with timedelta64[ns] endpoints (GH 44917)
Bug in DataFrame.agg() and Series.agg() ignoring arguments when passed a list of functions (GH 50863)
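
The DataFrame.melt() entry above (GH 41570) can be illustrated with a minimal sketch, assuming pandas 2.0 or later. The column names "id" and "x" are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2], "x": pd.array([10, pd.NA], dtype="Int64")}
)

# After melting, the "value" column keeps the nullable Int64 dtype
# instead of falling back to object.
long = df.melt(id_vars="id")
print(long.dtypes)
```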



Sparse#

Bug in Series.astype() raising when converting a SparseDtype with datetime64[ns] subtype to int64 dtype, inconsistent with the non-sparse behavior (GH 49631, GH 50087)
Bug in Series.astype() incorrectly raising when converting from datetime64[ns] to Sparse[datetime64[ns]] (GH 50082)
Bug in Series.sparse.to_coo() raising SystemError when the MultiIndex contains an ExtensionArray (GH 50996)



ExtensionArray#

Bug in Series.mean() overflowing unnecessarily with nullable integers (GH 48378)
Bug in Series.tolist() for nullable dtypes returning NumPy scalars instead of Python scalars (GH 49890)
Bug in Series.round() for pyarrow-backed dtypes raising AttributeError (GH 50437)
Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, where the resulting dtype turned into object (GH 48510)
Bug in array.PandasArray.to_numpy() raising with NA values when na_value is specified (GH 40638)
Bug in api.types.is_numeric_dtype() not returning True for a custom ExtensionDtype whose _is_numeric returned True (GH 50563)
Bug in api.types.is_integer_dtype(), api.types.is_unsigned_integer_dtype(), api.types.is_signed_integer_dtype(), api.types.is_float_dtype() not returning True for a custom ExtensionDtype whose kind returned the corresponding NumPy type (GH 50667)
Bug in the Series constructor unnecessarily overflowing for nullable unsigned integer dtypes (GH 38798, GH 25880)
Bug in setting a non-string value into a StringArray raising ValueError instead of TypeError (GH 49632)
Bug in DataFrame.reindex() not honoring the default copy=True keyword for columns with ExtensionDtype (as a result, selecting multiple columns with getitem ([]) also did not correctly produce a copy) (GH 51197)
Bug in ArrowExtensionArray logical operations & and | raising KeyError (GH 51688)
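
For the Series.tolist() entry above (GH 49890), a short sketch assuming pandas 2.0 or later:

```python
import pandas as pd

values = pd.Series([1, 2, pd.NA], dtype="Int64").tolist()

# The non-missing elements are plain Python ints rather than NumPy
# scalars; the missing element stays as pd.NA.
print([type(v).__name__ for v in values])
```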



Styler#

Fixed background_gradient() for nullable dtype Series with NA values (GH 50712)



Metadata#

Fixed metadata propagation in DataFrame.corr() and DataFrame.cov() (GH 28283)
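
One way to observe the metadata propagation mentioned above is through DataFrame.attrs. The following is a sketch assuming pandas 2.0 or later; the "source" key and its value are hypothetical metadata invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 7.0]})
df.attrs["source"] = "sensor-42"  # hypothetical metadata

# The correlation matrix carries over the attrs of the input frame.
corr = df.corr()
print(corr.attrs)
```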



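Other#

Bug in incorrectly accepting dtype strings containing "[pyarrow]" more than once (GH 51548)
Bug in Series.searchsorted() behaving inconsistently when accepting a DataFrame as the parameter value (GH 49620)
Bug in array() failing to raise on DataFrame inputs (GH 51167)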




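Contributors#
A total of 260 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.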

5j9 +
ABCPAN-rank +
Aarni Koskela +
Aashish KC +
Abubeker Mohammed +
Adam Mróz +
Adam Ormondroyd +
Aditya Anulekh +
Ahmed Ibrahim
Akshay Babbar +
Aleksa Radojicic +
Alex +
Alex Buzenet +
Alex Kirko
Allison Kwan +
Amay Patel +
Ambuj Pawar +
Amotz +
Andreas Schwab +
Andrew Chen +
Anton Shevtsov
Antonio Ossa Guerra +
Antonio Ossa-Guerra +
Anushka Bishnoi +
Arda Kosar
Armin Berres
Asadullah Naeem +
Asish Mahapatra
Bailey Lissington +
BarkotBeyene
Ben Beasley
Bhavesh Rajendra Patil +
Bibek Jha +
Bill +
Bishwas +
CarlosGDCJ +
Carlotta Fabian +
Chris Roth +
Chuck Cadman +
Corralien +
DG +
Dan Hendry +
Daniel Isaac
David Kleindienst +
David Poznik +
David Rudel +
DavidKleindienst +
Dea María Léon +
Deepak Sirohiwal +
Dennis Chukwunta
Douglas Lohmann +
Dries Schaumont
Dustin K +
Edoardo Abati +
Eduardo Chaves +
Ege Özgüroğlu +
Ekaterina Borovikova +
Eli Schwartz +
Elvis Lim +
Emily Taylor +
Emma Carballal Haire +
Erik Welch +
Fangchen Li
Florian Hofstetter +
Flynn Owen +
Fredrik Erlandsson +
Gaurav Sheni
Georeth Chow +
George Munyoro +
Guilherme Beltramini
Gulnur Baimukhambetova +
H L +
Hans
Hatim Zahid +
HighYoda +
Hiki +
Himanshu Wagh +
Hugo van Kemenade +
Idil Ismiguzel +
Irv Lustig
Isaac Chung
Isaac Virshup
JHM Darbyshire
JHM Darbyshire (iMac)
JMBurley
Jaime Di Cristina
Jan Koch
JanVHII +
Janosh Riebesell
JasmandeepKaur +
Jeremy Tuloup
Jessica M +
Jonas Haag
Joris Van den Bossche
João Meirelles +
Julia Aoun +
Justus Magin +
Kang Su Min +
Kevin Sheppard
Khor Chean Wei
Kian Eliasi
Kostya Farber +
KotlinIsland +
Lakmal Pinnaduwage +
Lakshya A Agrawal +
Lawrence Mitchell +
Levi Ob +
Loic Diridollou
Lorenzo Vainigli +
Luca Pizzini +
Lucas Damo +
Luke Manley
Madhuri Patil +
Marc Garcia
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Maren Westermann +
Maria Stazherova +
Marie K +
Marielle +
Mark Harfouche +
Marko Pacak +
Martin +
Matheus Cerqueira +
Matheus Pedroni +
Matteo Raso +
Matthew Roeschke
MeeseeksMachine +
Mehdi Mohammadi +
Michael Harris +
Michael Mior +
Natalia Mokeeva +
Neal Muppidi +
Nick Crews
Nishu Choudhary +
Noa Tamir
Noritada Kobayashi
Omkar Yadav +
P. Talley +
Pablo +
Pandas Development Team
Parfait Gasana
Patrick Hoefler
Pedro Nacht +
Philip +
Pietro Battiston
Pooja Subramaniam +
Pranav Saibhushan Ravuri +
Pranav. P. A +
Ralf Gommers +
RaphSku +
Richard Shadrach
Robsdedude +
Roger
Roger Thomas
RogerThomas +
SFuller4 +
Salahuddin +
Sam Rao
Sean Patrick Malloy +
Sebastian Roll +
Shantanu
Shashwat +
Shashwat Agrawal +
Shiko Wamwea +
Shoham Debnath
Shubhankar Lohani +
Siddhartha Gandhi +
Simon Hawkins
Soumik Dutta +
Sowrov Talukder +
Stefanie Molin
Stefanie Senger +
Stepfen Shawn +
Steven Rotondo
Stijn Van Hoey
Sudhansu +
Sven
Sylvain MARIE
Sylvain Marié
Tabea Kossen +
Taylor Packard
Terji Petersen
Thierry Moisan
Thomas H +
Thomas Li
Torsten Wörtwein
Tsvika S +
Tsvika Shapira +
Vamsi Verma +
Vinicius Akira +
William Andrea
William Ayd
William Blum +
Wilson Xing +
Xiao Yuan +
Xnot +
Yasin Tatar +
Yuanhao Geng
Yvan Cywan +
Zachary Moon +
Zhengbo Wang +
abonte +
adrienpacifico +
alm
amotzop +
andyjessen +
anonmouse1 +
bang128 +
bishwas jha +
calhockemeyer +
carla-alves-24 +
carlotta +
casadipietra +
catmar22 +
cfabian +
codamuse +
dataxerik
davidleon123 +
dependabot[bot] +
fdrocha +
github-actions[bot]
himanshu_wagh +
iofall +
jakirkham +
jbrockmendel
jnclt +
joelchen +
joelsonoda +
joshuabello2550
joycewamwea +
kathleenhang +
krasch +
ltoniazzi +
luke396 +
milosz-martynow +
minat-hub +
mliu08 +
monosans +
nealxm
nikitaved +
paradox-lab +
partev
raisadz +
ram vikram singh +
rebecca-palmer
sarvaSanjay +
seljaks +
silviaovo +
smij720 +
soumilbaldota +
stellalin7 +
strawberry beach sandals +
tmoschou +
uzzell +
yqyqyq-W +
yun +
Ádám Lippai
김동현 (Daniel Donghyun Kim) +


              
              
              
              
              
                
                  
