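What's new in 1.5.0 (September 19, 2022)#

These are the changes in pandas 1.5.0. See the release notes for a full changelog including other versions of pandas.

Enhancements#

`pandas-stubs`#

The pandas-stubs library is now supported by the pandas development team, providing type stubs for the pandas API. Please visit pandas-dev/pandas-stubs for more information.

We thank VirtusLab and Microsoft for their initial, significant contributions to pandas-stubs.

Native PyArrow-backed ExtensionArray#

With PyArrow installed, users can now create pandas objects that are backed by a pyarrow.ChunkedArray and pyarrow.DataType.

The dtype argument can accept a string of a pyarrow data type with pyarrow in brackets, e.g. "int64[pyarrow]", or, for pyarrow data types that take parameters, an ArrowDtype initialized with a pyarrow.DataType.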

In [1]: import pyarrow as pa

In [2]: ser_float = pd.Series([1.0, 2.0, None], dtype="float32[pyarrow]")

In [3]: ser_float
Out[3]: 
0     1.0
1     2.0
2    <NA>
dtype: float[pyarrow]

In [4]: list_of_int_type = pd.ArrowDtype(pa.list_(pa.int64()))

In [5]: ser_list = pd.Series([[1, 2], [3, None]], dtype=list_of_int_type)

In [6]: ser_list
Out[6]: 
0      [1. 2.]
1    [ 3. nan]
dtype: list<item: int64>[pyarrow]

In [7]: ser_list.take([1, 0])
Out[7]: 
1    [ 3. nan]
0      [1. 2.]
dtype: list<item: int64>[pyarrow]

In [8]: ser_float * 5
Out[8]: 
0     5.0
1    10.0
2    <NA>
dtype: float[pyarrow]

In [9]: ser_float.mean()
Out[9]: 1.5

In [10]: ser_float.dropna()
Out[10]: 
0    1.0
1    2.0
dtype: float[pyarrow]

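Most operations are supported and have been implemented using pyarrow compute functions. We recommend installing the latest version of PyArrow to access the most recently implemented compute functions.

Warning

This feature is experimental, and the API can change in a future release without warning.

DataFrame interchange protocol implementation#

pandas now implements the DataFrame interchange API spec. See the full details on the API at https://data-apis.org/dataframe-protocol/latest/index.html

The protocol consists of two parts:

A new method DataFrame.__dataframe__() which produces the interchange object. It effectively "exports" the pandas DataFrame as an interchange object, so any other library that implements the protocol can "import" that DataFrame without knowing anything about the producer except that it makes an interchange object.
A new function pandas.api.interchange.from_dataframe() which can take an arbitrary interchange object from any conformant library and construct a pandas DataFrame out of it.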
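
The two parts of the protocol can be sketched as a round trip. This is a minimal illustration, not a definitive recipe: a pandas DataFrame stands in for the producing library (an assumption made only to keep the example self-contained), and the column names are made up.

```python
import pandas as pd

# A pandas DataFrame plays the "producer" here; any object exposing
# __dataframe__() could take its place.
producer_df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]})

# Export: build the interchange object and inspect it through the protocol.
interchange_object = producer_df.__dataframe__()
column_names = list(interchange_object.column_names())
print(column_names)

# Import: rebuild a pandas DataFrame from a column subset of the interchange object.
consumer_df = pd.api.interchange.from_dataframe(
    interchange_object.select_columns_by_name(["A"])
)
print(consumer_df)
```

The consumer only touches protocol methods (column_names, select_columns_by_name), which is the point of the design: it needs no knowledge of the producer's internals.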

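Styler#

The most notable development is the new method Styler.concat(), which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts (GH 43875, GH 46186).

Additionally, there is an alternative output method Styler.to_string(), which allows using the Styler's formatting methods to create, for example, CSVs (GH 44502).

A new feature Styler.relabel_index() is also made available to provide full customisation of the display of index or column headers (GH 47864).

Minor feature improvements are:

Adding the ability to render border and border-{side} CSS properties in Excel (GH 42276).

Making keyword arguments consistent: Styler.highlight_null() now accepts color and deprecates null_color, although this remains backwards compatible (GH 45907).

Control of index with `group_keys` in `DataFrame.resample()`#

The argument group_keys has been added to the method DataFrame.resample(). As with DataFrame.groupby(), this argument controls whether each group is added to the index in the resample when Resampler.apply() is used.

Warning

Not specifying the group_keys argument will retain the previous behavior and emit a warning if the result will change by specifying group_keys=False. In a future version of pandas, not specifying group_keys will default to the same behavior as group_keys=False.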

In [11]: df = pd.DataFrame(
   ....:     {'a': range(6)},
   ....:     index=pd.date_range("2021-01-01", periods=6, freq="8H")
   ....: )
   ....:

In [12]: df.resample("D", group_keys=True).apply(lambda x: x)
Out[12]:
                                a
2021-01-01 2021-01-01 00:00:00  0
           2021-01-01 08:00:00  1
           2021-01-01 16:00:00  2
2021-01-02 2021-01-02 00:00:00  3
           2021-01-02 08:00:00  4
           2021-01-02 16:00:00  5

In [13]: df.resample("D", group_keys=False).apply(lambda x: x)
Out[13]:
                     a
2021-01-01 00:00:00  0
2021-01-01 08:00:00  1
2021-01-01 16:00:00  2
2021-01-02 00:00:00  3
2021-01-02 08:00:00  4
2021-01-02 16:00:00  5

Previously, the resulting index would depend upon the values returned by apply, as seen in the following example.

In [1]: # pandas 1.3
In [2]: df.resample("D").apply(lambda x: x)
Out[2]:
                     a
2021-01-01 00:00:00  0
2021-01-01 08:00:00  1
2021-01-01 16:00:00  2
2021-01-02 00:00:00  3
2021-01-02 08:00:00  4
2021-01-02 16:00:00  5

In [3]: df.resample("D").apply(lambda x: x.reset_index())
Out[3]:
                           index  a
2021-01-01 0 2021-01-01 00:00:00  0
           1 2021-01-01 08:00:00  1
           2 2021-01-01 16:00:00  2
2021-01-02 0 2021-01-02 00:00:00  3
           1 2021-01-02 08:00:00  4
           2 2021-01-02 16:00:00  5

from_dummies#

Added new function from_dummies() to convert a dummy-coded DataFrame into a categorical DataFrame.

In [11]: import pandas as pd

In [12]: df = pd.DataFrame({"col1_a": [1, 0, 1], "col1_b": [0, 1, 0],
   ....:                    "col2_a": [0, 1, 0], "col2_b": [1, 0, 0],
   ....:                    "col2_c": [0, 0, 1]})
   ....: 

In [13]: pd.from_dummies(df, sep="_")
Out[13]: 
  col1 col2
0    a    b
1    b    a
2    a    c
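
As a rough sketch of how from_dummies() relates to get_dummies(), the following round trip encodes a small frame and decodes it again. The frame contents are invented for illustration, and the example assumes the default prefix separator "_" is used on both sides.

```python
import pandas as pd

original = pd.DataFrame({"col1": ["a", "b", "a"], "col2": ["b", "a", "c"]})

# Encode: one indicator column per (column, category) pair, e.g. "col1_a".
dummies = pd.get_dummies(original, prefix_sep="_")

# Decode: the separator tells from_dummies() where the column name ends
# and the category label begins.
restored = pd.from_dummies(dummies, sep="_")
print(restored)
```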

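Writing to ORC files#

The new method DataFrame.to_orc() allows writing to ORC files (GH 43864).

This functionality depends on the pyarrow library. For more details, see the IO docs on ORC.

Warning

It is highly recommended to install pyarrow using conda, because pyarrow can run into some issues.
to_orc() requires pyarrow>=7.0.0.
to_orc() is not supported on Windows yet; you can find valid environments in the optional dependencies section of the installation docs.
For supported dtypes, please refer to the supported ORC features in Arrow.
Currently, timezones in datetime columns are not preserved when a DataFrame is converted into ORC files.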

df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
df.to_orc("./out.orc")

Reading directly from TAR archives#

I/O methods like read_csv() or DataFrame.to_json() now allow reading and writing directly on TAR archives (GH 44787).

df = pd.read_csv("./movement.tar.gz")
# ...
df.to_csv("./out.tar.gz")

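This supports .tar, .tar.gz, .tar.bz2 and .tar.xz archives. The compression method used is inferred from the filename. If the compression method cannot be inferred, use the compression argument: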

df = pd.read_csv(some_file_obj, compression={"method": "tar", "mode": "r:gz"}) # noqa F821

(mode being one of tarfile.open's modes: https://docs.pythonlang.cn/3/library/tarfile.html#tarfile.open)
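
The two ways of selecting the compression can be put side by side in a small, self-contained sketch. The file name and frame contents are made up, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # The ".tar.gz" suffix lets pandas infer the archive format from the filename.
    archive_path = os.path.join(tmp_dir, "example.tar.gz")
    df.to_csv(archive_path, index=False)

    # Reading back relies on the same filename-based inference.
    inferred = pd.read_csv(archive_path)

    # An open file object is passed without a path here, so the
    # compression is spelled out explicitly.
    with open(archive_path, "rb") as file_obj:
        explicit = pd.read_csv(
            file_obj, compression={"method": "tar", "mode": "r:gz"}
        )

print(inferred)
print(explicit)
```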

read_xml now supports `dtype`, `converters`, and `parse_dates`#

Similar to other IO methods, pandas.read_xml() now supports assigning specific dtypes to columns, applying converter methods, and parsing dates (GH 43567).

In [14]: from io import StringIO

In [15]: xml_dates = """<?xml version='1.0' encoding='utf-8'?>
   ....: <data>
   ....:   <row>
   ....:     <shape>square</shape>
   ....:     <degrees>00360</degrees>
   ....:     <sides>4.0</sides>
   ....:     <date>2020-01-01</date>
   ....:    </row>
   ....:   <row>
   ....:     <shape>circle</shape>
   ....:     <degrees>00360</degrees>
   ....:     <sides/>
   ....:     <date>2021-01-01</date>
   ....:   </row>
   ....:   <row>
   ....:     <shape>triangle</shape>
   ....:     <degrees>00180</degrees>
   ....:     <sides>3.0</sides>
   ....:     <date>2022-01-01</date>
   ....:   </row>
   ....: </data>"""
   ....: 

In [16]: df = pd.read_xml(
   ....:     StringIO(xml_dates),
   ....:     dtype={'sides': 'Int64'},
   ....:     converters={'degrees': str},
   ....:     parse_dates=['date']
   ....: )
   ....: 

In [17]: df
Out[17]: 
      shape degrees  sides       date
0    square   00360      4 2020-01-01
1    circle   00360   <NA> 2021-01-01
2  triangle   00180      3 2022-01-01

In [18]: df.dtypes
Out[18]: 
shape              object
degrees            object
sides               Int64
date       datetime64[ns]
dtype: object

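read_xml now supports large XML using `iterparse`#

For very large XML files that can range from hundreds of megabytes to gigabytes, pandas.read_xml() now supports parsing such files using lxml's iterparse and etree's iterparse. These are memory-efficient methods that iterate through the XML tree and extract specific elements and attributes without holding the entire tree in memory (GH 45442).

In [1]: df = pd.read_xml(
...      "/path/to/downloaded/enwikisource-latest-pages-articles.xml",
...      iterparse = {"page": ["title", "ns", "id"]}
...  )

In [2]: df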
Out[2]:
                                                     title   ns        id
0                                       Gettysburg Address    0     21450
1                                                Main Page    0     42950
2                            Declaration by United Nations    0      8435
3             Constitution of the United States of America    0      8435
4                     Declaration of Independence (Israel)    0     17858
...                                                    ...  ...       ...
3578760               Page:Black cat 1897 07 v2 n10.pdf/17  104    219649
3578761               Page:Black cat 1897 07 v2 n10.pdf/43  104    219649
3578762               Page:Black cat 1897 07 v2 n10.pdf/44  104    219649
3578763      The History of Tom Jones, a Foundling/Book IX    0  12084291
3578764  Page:Shakespeare of Stratford (1926) Yale.djvu/91  104     21450

[3578765 rows x 3 columns]

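Copy on Write#

A new feature copy_on_write was added (GH 46958). Copy on write ensures that any DataFrame or Series derived from another in any way always behaves as a copy. Copy on write disallows updating any object other than the one the method was applied to.

Copy on write can be enabled through one of: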

pd.set_option("mode.copy_on_write", True)
pd.options.mode.copy_on_write = True

Alternatively, copy on write can be enabled locally through:

with pd.option_context("mode.copy_on_write", True):
    ...

Without copy on write, the parent DataFrame is updated when updating a child DataFrame that was derived from it.

In [19]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})

In [20]: view = df["foo"]

In [21]: view.iloc[0] = 10

In [22]: df
Out[22]: 
   foo  bar
0   10    1
1    2    1
2    3    1

With copy on write enabled, df won't be updated anymore:

In [23]: with pd.option_context("mode.copy_on_write", True):
   ....:     df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
   ....:     view = df["foo"]
   ....:     view.iloc[0] = 10
   ....:     df
   ....: 

A more detailed explanation can be found here.
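
The copy-on-write behavior can also be checked in a plain script rather than an interactive session. The sketch below assumes the "mode.copy_on_write" option is available under that name in the installed pandas; the variable names are illustrative.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
    view = df["foo"]
    # Under copy on write, this assignment only affects the derived Series.
    view.iloc[0] = 10
    parent_values = df["foo"].tolist()
    child_values = view.tolist()

print(parent_values)  # the parent keeps its original values
print(child_values)   # only the derived Series changed
```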

Other enhancements#

Series.map() now raises when arg is a dict but na_action is neither None nor 'ignore' (GH 46588)
MultiIndex.to_frame() now supports the argument allow_duplicates and raises on duplicate labels if it is missing or False (GH 45245)
StringArray now accepts array-likes containing NaN-likes (None, np.nan) for the values parameter in its constructor, in addition to strings and pandas.NA (GH 40839)
Improved the rendering of categories in CategoricalIndex (GH 45218)
DataFrame.plot() now allows the subplots parameter to be a list of iterables specifying column groups, so that columns may be grouped together in the same subplot (GH 29688)
to_numeric() now preserves float64 arrays when downcasting would generate values not representable in float32 (GH 43693)
Series.reset_index() and DataFrame.reset_index() now support the argument allow_duplicates (GH 44410)
DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max(), and SeriesGroupBy.max() now support Numba execution with the engine keyword (GH 45428)
read_csv() now supports defaultdict as a dtype parameter (GH 41574)
DataFrame.rolling() and Series.rolling() now support a step parameter with fixed-length windows (GH 15354)
Implemented a bool-dtype Index; passing a bool-dtype array-like to pd.Index will now retain bool dtype instead of casting to object (GH 45061)
Implemented a complex-dtype Index; passing a complex-dtype array-like to pd.Index will now retain complex dtype instead of casting to object (GH 45845)
Series and DataFrame with IntegerDtype now support bitwise operations (GH 34463)
Added milliseconds field support for DateOffset (GH 43371)
DataFrame.where() tries to maintain the dtype of the DataFrame if the fill value can be cast without loss of precision (GH 45582)
DataFrame.reset_index() now accepts a names argument which renames the index names (GH 6878)
concat() now raises when levels is given but keys is None (GH 46653)
concat() now raises when levels contains duplicate values (GH 46653)
Added numeric_only argument to DataFrame.corr(), DataFrame.corrwith(), DataFrame.cov(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), DataFrameGroupBy.var(), SeriesGroupBy.var(), DataFrameGroupBy.std(), SeriesGroupBy.std(), DataFrameGroupBy.sem(), SeriesGroupBy.sem(), and DataFrameGroupBy.quantile() (GH 46560)
An errors.PerformanceWarning is now raised when using string[pyarrow] dtype with methods that don't dispatch to pyarrow.compute methods (GH 42613, GH 46725)
Added validate argument to DataFrame.join() (GH 46622)
Added numeric_only argument to Resampler.sum(), Resampler.prod(), Resampler.min(), Resampler.max(), Resampler.first(), and Resampler.last() (GH 46442)
The times argument in ExponentialMovingWindow now accepts np.timedelta64 (GH 47003)
DataError, SpecificationError, SettingWithCopyError, SettingWithCopyWarning, NumExprClobberingError, UndefinedVariableError, IndexingError, PyperclipException, PyperclipWindowsException, CSSWarning, PossibleDataLossError, ClosedFileError, IncompatibilityWarning, AttributeConflictWarning, DatabaseError, PossiblePrecisionLoss, ValueLabelTypeMismatch, InvalidColumnName, and CategoricalConversionWarning are now exposed in pandas.errors (GH 27656)
Added check_like argument to testing.assert_series_equal() (GH 47247)
Added support for DataFrameGroupBy.ohlc() and SeriesGroupBy.ohlc() for extension array dtypes (GH 37493)
Allow reading compressed SAS files with read_sas() (e.g., .sas7bdat.gz files)
pandas.read_html() now supports extracting links from table cells (GH 13141)
DatetimeIndex.astype() now supports casting timezone-naive indexes to datetime64[s], datetime64[ms], and datetime64[us], and timezone-aware indexes to the corresponding datetime64[unit, tzname] dtypes (GH 47579)
Series reducers (e.g. min, max, sum, mean) will now successfully operate when the dtype is numeric and numeric_only=True is provided; previously this would raise a NotImplementedError (GH 47500)
RangeIndex.union() can now return a RangeIndex instead of an Int64Index if the resulting values are equally spaced (GH 47557, GH 43885)
DataFrame.compare() now accepts an argument result_names to allow the user to specify the result names of the left and right DataFrame being compared. The default is 'self' and 'other' (GH 44354)
DataFrame.quantile() gained a method argument that can accept table to evaluate multi-column quantiles (GH 43881)
Interval now supports checking whether one interval is contained by another interval (GH 46613)
Added copy keyword to Series.set_axis() and DataFrame.set_axis() to allow the user to set an axis on a new object without necessarily copying the underlying data (GH 47932)
The method ExtensionArray.factorize() accepts use_na_sentinel=False for determining how null values are to be treated (GH 46601)
The Dockerfile now installs a dedicated pandas-dev virtual environment for pandas development instead of using the base environment (GH 48427)
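
A few of the smaller enhancements listed above can be tried out together. This is an illustrative sketch only; the data and variable names are invented, and just four of the listed items are shown.

```python
import pandas as pd

# rolling(step=...): evaluate a fixed-length window only at every second row.
ser = pd.Series(range(6))
stepped = ser.rolling(window=2, step=2).sum()

# reset_index(names=...): name the column created from the index.
df = pd.DataFrame({"value": [10, 20]}, index=["x", "y"])
flat = df.reset_index(names="key")

# compare(result_names=...): label the two sides of the comparison.
other = df.copy()
other.loc["y", "value"] = 25
diff = df.compare(other, result_names=("left", "right"))

# Interval containment: one Interval can be tested against another.
inner_is_contained = pd.Interval(1, 2) in pd.Interval(0, 3)

print(stepped)
print(flat)
print(diff)
print(inner_is_contained)
```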

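Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Using `dropna=True` with `groupby` transforms#

A transform is an operation whose result has the same size as its input. When the result is a DataFrame or Series, it is also required that the index of the result matches that of the input. In pandas 1.4, using DataFrameGroupBy.transform() or SeriesGroupBy.transform() with null values in the groups and dropna=True gave incorrect results. As the examples below demonstrate, the incorrect results either contained incorrect values, or the result did not have the same index as the input.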

In [24]: df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})

Old behavior:

In [3]: # Value in the last row should be np.nan
        df.groupby('a', dropna=True).transform('sum')
Out[3]:
   b
0  5
1  5
2  5

In [3]: # Should have one additional row with the value np.nan
        df.groupby('a', dropna=True).transform(lambda x: x.sum())
Out[3]:
   b
0  5
1  5

In [3]: # The value in the last row is np.nan interpreted as an integer
        df.groupby('a', dropna=True).transform('ffill')
Out[3]:
                     b
0                    2
1                    3
2 -9223372036854775808

In [3]: # Should have one additional row with the value np.nan
        df.groupby('a', dropna=True).transform(lambda x: x)
Out[3]:
   b
0  2
1  3

New behavior:

In [25]: df.groupby('a', dropna=True).transform('sum')
Out[25]: 
     b
0  5.0
1  5.0
2  NaN

In [26]: df.groupby('a', dropna=True).transform(lambda x: x.sum())
Out[26]: 
     b
0  5.0
1  5.0
2  NaN

In [27]: df.groupby('a', dropna=True).transform('ffill')
Out[27]: 
     b
0  2.0
1  3.0
2  NaN

In [28]: df.groupby('a', dropna=True).transform(lambda x: x)
Out[28]: 
     b
0  2.0
1  3.0
2  NaN

Serializing tz-naive Timestamps to JSON with `iso_dates=True`#

DataFrame.to_json(), Series.to_json(), and Index.to_json() would incorrectly localize DatetimeArrays/DatetimeIndexes with tz-naive Timestamps to UTC (GH 38760).

Note that this patch does not fix the localization of tz-aware Timestamps to UTC upon serialization. (Related issue GH 12997)

Old behavior

In [32]: index = pd.date_range(
   ....:     start='2020-12-28 00:00:00',
   ....:     end='2020-12-28 02:00:00',
   ....:     freq='1H',
   ....: )
   ....:

In [33]: a = pd.Series(
   ....:     data=range(3),
   ....:     index=index,
   ....: )
   ....:

In [4]: from io import StringIO

In [5]: a.to_json(date_format='iso')
Out[5]: '{"2020-12-28T00:00:00.000Z":0,"2020-12-28T01:00:00.000Z":1,"2020-12-28T02:00:00.000Z":2}'

In [6]: pd.read_json(StringIO(a.to_json(date_format='iso')), typ="series").index == a.index
Out[6]: array([False, False, False])

New behavior

In [34]: from io import StringIO

In [35]: a.to_json(date_format='iso')
Out[35]: '{"2020-12-28T00:00:00.000":0,"2020-12-28T01:00:00.000":1,"2020-12-28T02:00:00.000":2}'

# Roundtripping now works
In [36]: pd.read_json(StringIO(a.to_json(date_format='iso')), typ="series").index == a.index
Out[36]: array([ True,  True,  True])

DataFrameGroupBy.value_counts with non-grouping categorical columns and `observed=True`#

Calling DataFrameGroupBy.value_counts() with observed=True would incorrectly drop non-observed categories of non-grouping columns (GH 46357).

In [6]: df = pd.DataFrame(["a", "b", "c"], dtype="category").iloc[0:2]
In [7]: df
Out[7]:
   0
0  a
1  b

Old behavior

In [8]: df.groupby(level=0, observed=True).value_counts()
Out[8]:
0  a    1
1  b    1
dtype: int64

New behavior

In [9]: df.groupby(level=0, observed=True).value_counts()
Out[9]:
0  a    1
1  a    0
   b    1
0  b    0
   c    0
1  c    0
dtype: int64

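Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed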
numpy	1.20.3	X	X
mypy (dev)	0.971		X
beautifulsoup4	4.9.3		X
blosc	1.21.0		X
bottleneck	1.3.2		X
fsspec	2021.07.0		X
hypothesis	6.13.0		X
gcsfs	2021.07.0		X
jinja2	3.0.0		X
lxml	4.6.3		X
numba	0.53.1		X
numexpr	2.7.3		X
openpyxl	3.0.7		X
pandas-gbq	0.15.0		X
psycopg2	2.8.6		X
pymysql	1.0.2		X
pyreadstat	1.1.2		X
pyxlsb	1.0.8		X
s3fs	2021.08.0		X
scipy	1.7.1		X
sqlalchemy	1.4.16		X
tabulate	0.8.9		X
xarray	0.19.0		X
xlsxwriter	1.4.3		X

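For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed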
beautifulsoup4	4.9.3	X
blosc	1.21.0	X
bottleneck	1.3.2	X
brotlipy	0.7.0
fastparquet	0.4.0
fsspec	2021.08.0	X
html5lib	1.1
hypothesis	6.13.0	X
gcsfs	2021.08.0	X
jinja2	3.0.0	X
lxml	4.6.3	X
matplotlib	3.3.2
numba	0.53.1	X
numexpr	2.7.3	X
odfpy	1.4.1
openpyxl	3.0.7	X
pandas-gbq	0.15.0	X
psycopg2	2.8.6	X
pyarrow	1.0.1
pymysql	1.0.2	X
pyreadstat	1.1.2	X
pytables	3.6.1
python-snappy	0.6.0
pyxlsb	1.0.8	X
s3fs	2021.08.0	X
scipy	1.7.1	X
sqlalchemy	1.4.16	X
tabulate	0.8.9	X
tzdata	2022a
xarray	0.19.0	X
xlrd	2.0.1
xlsxwriter	1.4.3	X
xlwt	1.3.0
zstandard	0.15.2

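See Dependencies and Optional dependencies for more.

Other API changes#

BigQuery I/O methods read_gbq() and DataFrame.to_gbq() default to auth_local_webserver = True. Google has deprecated the auth_local_webserver = False "out of band" (copy-paste) flow. The auth_local_webserver = False option is planned to stop working in October 2022. (GH 46312)
read_json() now raises FileNotFoundError (previously ValueError) when the input is a string ending in .json, .json.gz, .json.bz2, etc. but no such file exists. (GH 29102)
Operations with Timestamp or Timedelta that would previously raise OverflowError instead raise OutOfBoundsDatetime or OutOfBoundsTimedelta where appropriate (GH 47268)
When read_sas() previously returned None, it now returns an empty DataFrame (GH 47410)
The DataFrame constructor raises if the index or columns arguments are sets (GH 47215)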
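
Two of the API changes above are easy to observe directly. The sketch below is illustrative: the file name is a made-up path that is assumed not to exist in the working directory.

```python
import pandas as pd

# A string that looks like a JSON path but points at no file.
try:
    pd.read_json("definitely_missing_file.json")
    read_json_error = None
except FileNotFoundError as exc:
    read_json_error = exc

# A set passed as the index is rejected by the constructor.
try:
    pd.DataFrame({"a": [1, 2]}, index={"x", "y"})
    set_index_error = None
except ValueError as exc:
    set_index_error = exc

print(type(read_json_error).__name__)
print(type(set_index_error).__name__)
```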

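Deprecations#

Warning

In the next major version release, 2.0, several larger API changes are being considered without a formal deprecation, such as making the standard library zoneinfo the default timezone implementation instead of pytz, having the Index support all data types instead of having multiple subclasses (CategoricalIndex, Int64Index, etc.), and more. The changes under consideration are logged in this GitHub issue, and any feedback or concerns are welcome.

Label-based integer slicing on a Series with an Int64Index or RangeIndex#

In a future version, integer slicing on a Series with an Int64Index or RangeIndex will be treated as label-based, not positional. This will make the behavior consistent with other Series.__getitem__() and Series.__setitem__() behaviors (GH 45162).

For example: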

In [29]: ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

In the old behavior, ser[2:4] treats the slice as positional:

Old behavior:

In [3]: ser[2:4]
Out[3]:
5    3
7    4
dtype: int64

In a future version, this will be treated as label-based:

Future behavior:

In [4]: ser.loc[2:4]
Out[4]:
2    1
3    2
dtype: int64

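To retain the old behavior, use series.iloc[i:j]. To get the future behavior, use series.loc[i:j].

Slicing on a DataFrame will not be affected.

`ExcelWriter` attributes#

All attributes of ExcelWriter were previously documented as not public. However, some third-party Excel engines documented accessing ExcelWriter.book or ExcelWriter.sheets, and users were utilizing these and possibly other attributes. Previously these attributes were not safe to use; e.g. modifications to ExcelWriter.book would not update ExcelWriter.sheets and vice versa. In order to support this, pandas has made some attributes public and improved their implementations so that they may now be safely used. (GH 45572)

The following attributes are now public and considered safe to access.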

book

check_extension

close

date_format

datetime_format

engine

if_sheet_exists

sheets

supported_extensions

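The following attributes have been deprecated. They now raise a FutureWarning when accessed and will be removed in a future version. Users should be aware that their usage is considered unsafe, and can lead to unexpected results.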

cur_sheet

handles

path

save

write_cells

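See the documentation of ExcelWriter for further details.

Using `group_keys` with transformers in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()`#

In previous versions of pandas, if it was inferred that the function passed to DataFrameGroupBy.apply() or SeriesGroupBy.apply() was a transformer (i.e. the resulting index was equal to the input index), the group_keys argument of DataFrame.groupby() and Series.groupby() was ignored and the group keys would never be added to the index of the result. In the future, the group keys will be added to the index when the user specifies group_keys=True.

As group_keys=True is the default value of DataFrame.groupby() and Series.groupby(), not specifying group_keys with a transformer will raise a FutureWarning. This can be silenced and the previous behavior retained by specifying group_keys=False.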
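
Passing group_keys explicitly makes the intent unambiguous in either direction. The following sketch uses an identity function as the transformer; the column names and data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# The lambda returns each group unchanged, so it acts as a transformer.
with_keys = df.groupby("key", group_keys=True)["val"].apply(lambda s: s)
without_keys = df.groupby("key", group_keys=False)["val"].apply(lambda s: s)

# With group_keys=True the group label becomes an extra index level;
# with group_keys=False the original index is kept as-is.
print(with_keys.index.nlevels)
print(without_keys.index.equals(df.index))
```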

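Inplace operation when setting values with `loc` and `iloc`#

Most of the time, setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary. There are some cases where this rule is not followed, for example when setting an entire column from an array with a different dtype: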

In [30]: df = pd.DataFrame({'price': [11.1, 12.2]}, index=['book1', 'book2'])

In [31]: original_prices = df['price']

In [32]: new_prices = np.array([98, 99])

Old behavior:

In [3]: df.iloc[:, 0] = new_prices
In [4]: df.iloc[:, 0]
Out[4]:
book1    98
book2    99
Name: price, dtype: int64
In [5]: original_prices
Out[5]:
book1    11.1
book2    12.2
Name: price, dtype: float64

This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.

Future behavior:

In [3]: df.iloc[:, 0] = new_prices
In [4]: df.iloc[:, 0]
Out[4]:
book1    98.0
book2    99.0
Name: price, dtype: float64
In [5]: original_prices
Out[5]:
book1    98.0
book2    99.0
Name: price, dtype: float64

To get the old behavior, use DataFrame.__setitem__() directly:

In [3]: df[df.columns[0]] = new_prices
In [4]: df.iloc[:, 0]
Out[4]:
book1    98
book2    99
Name: price, dtype: int64
In [5]: original_prices
Out[5]:
book1    11.1
book2    12.2
Name: price, dtype: float64

To get the old behavior when df.columns is not unique and you want to change a single column by index, you can use DataFrame.isetitem(), which has been added in pandas 1.5:

In [3]: df_with_duplicated_cols = pd.concat([df, df], axis='columns')
In [3]: df_with_duplicated_cols.isetitem(0, new_prices)
In [4]: df_with_duplicated_cols.iloc[:, 0]
Out[4]:
book1    98
book2    99
Name: price, dtype: int64
In [5]: original_prices
Out[5]:
book1    11.1
book2    12.2
Name: price, dtype: float64

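`numeric_only` default value#

Across the DataFrame, DataFrameGroupBy, and Resampler operations such as min, sum, and idxmax, the default value of the numeric_only argument, if it exists at all, was inconsistent. Furthermore, operations with the default value None can lead to surprising results. (GH 46560)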

In [1]: df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

In [2]: # Reading the next line without knowing the contents of df, one would
        # expect the result to contain the products for both columns a and b.
        df[["a", "b"]].prod()
Out[2]:
a    2
dtype: int64

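To avoid this behavior, specifying the value numeric_only=None has been deprecated, and will be removed in a future version of pandas. In the future, all operations with a numeric_only argument will default to False. Users should either call the operation only with columns that can be operated on, or specify numeric_only=True to operate only on Boolean, integer, and float columns.

To support the transition to the new behavior, a number of methods have gained the numeric_only argument; these are the DataFrame, groupby, and Resampler methods listed earlier under GH 46560 and GH 46442.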
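
Both recommended ways of avoiding the ambiguous default can be written out explicitly. A minimal sketch, reusing the small frame from the example above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Option 1: restrict the operation to columns known to support it.
prod_selected = df[["a"]].prod()

# Option 2: state the intent explicitly instead of relying on the default.
prod_numeric = df.prod(numeric_only=True)

print(prod_selected)
print(prod_numeric)
```

Either form makes it visible at the call site that column b is deliberately left out.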

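Other Deprecations#

Deprecated the keyword line_terminator in DataFrame.to_csv() and Series.to_csv(), use lineterminator instead; this is for consistency with read_csv() and the standard library 'csv' module (GH 9568)
Deprecated behavior of SparseArray.astype(), Series.astype(), and DataFrame.astype() with SparseDtype when passing a non-sparse dtype. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a SparseDtype (GH 34457)
Deprecated behavior of DatetimeIndex.intersection() and DatetimeIndex.symmetric_difference() (union behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (GH 39328, GH 45357)
Deprecated DataFrame.iteritems(), Series.iteritems(), HDFStore.iteritems() in favor of DataFrame.items(), Series.items(), HDFStore.items() (GH 45321)
Deprecated Series.is_monotonic() and Index.is_monotonic() in favor of Series.is_monotonic_increasing() and Index.is_monotonic_increasing() (GH 45422, GH 21335)
Deprecated behavior of DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype() when converting to an integer dtype other than int64. In a future version, these will convert to exactly the specified dtype (instead of always int64) and will raise if the conversion overflows (GH 45034)
Deprecated the __array_wrap__ method of DataFrame and Series, rely on standard numpy ufuncs instead (GH 45451)
Deprecated treating float-dtype data as wall-times when passed with a timezone to Series or DatetimeIndex (GH 45573)
Deprecated the behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (GH 45746)
Deprecated the warn parameter in infer_freq() (GH 45947)
Deprecated allowing non-keyword arguments in ExtensionArray.argsort() (GH 46134)
Deprecated treating all-bool object-dtype columns as bool-like in DataFrame.any() and DataFrame.all() with bool_only=True, explicitly cast to bool instead (GH 46188)
Deprecated the behavior of DataFrame.quantile(): the numeric_only argument will default to False, so the result will include datetime/timedelta columns (GH 7308)
Deprecated Timedelta.freq and Timedelta.is_populated (GH 46430)
Deprecated Timedelta.delta (GH 46476)
Deprecated passing arguments as positional in DataFrame.any() and Series.any() (GH 44802)
Deprecated passing positional arguments to DataFrame.pivot() and pivot() except data (GH 30228)
Deprecated the methods DataFrame.mad(), Series.mad(), and the corresponding groupby methods (GH 11787)
Deprecated positional arguments to Index.join() except for other, use keyword-only arguments instead of positional arguments (GH 46518)
Deprecated positional arguments to StringMethods.rsplit() and StringMethods.split() except for pat, use keyword-only arguments instead of positional arguments (GH 47423)
Deprecated indexing on a timezone-naive DatetimeIndex using a string representing a timezone-aware datetime (GH 46903, GH 36148)
Deprecated allowing unit="M" or unit="Y" in the Timestamp constructor with a non-round float value (GH 47267)
Deprecated the display.column_space global configuration option (GH 7576)
Deprecated the argument na_sentinel in factorize(), Index.factorize(), and ExtensionArray.factorize(); pass use_na_sentinel=True instead to use the sentinel -1 for NaN values, and use_na_sentinel=False instead of na_sentinel=None to encode NaN values (GH 46910)
Deprecated DataFrameGroupBy.transform() not aligning the result when the UDF returned a DataFrame (GH 45648)
Clarified the warning from to_datetime() when delimited dates can't be parsed in accordance with the specified dayfirst argument (GH 46210)
to_datetime() now emits a warning when delimited dates can't be parsed in accordance with the specified dayfirst argument, even for dates where the leading zero is omitted (e.g. 31/1/2001) (GH 47880)
Deprecated Series and Resampler reducers (e.g. min, max, sum, mean) raising a NotImplementedError when the dtype is non-numeric and numeric_only=True is provided; this will raise a TypeError in a future version (GH 47500)
Deprecated Series.rank() returning an empty result when the dtype is non-numeric and numeric_only=True is provided; this will raise a TypeError in a future version (GH 47500)
Deprecated the argument errors for Series.mask(), Series.where(), DataFrame.mask(), and DataFrame.where(), as errors had no effect on these methods (GH 47728)
Deprecated the arguments *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops. (GH 47836)
Deprecated the inplace keyword in Categorical.set_ordered(), Categorical.as_ordered(), and Categorical.as_unordered() (GH 37643)
Deprecated setting a categorical's categories with cat.categories = ['a', 'b', 'c'], use Categorical.rename_categories() instead (GH 37643)
Deprecated the unused arguments encoding and verbose in Series.to_excel() and DataFrame.to_excel() (GH 47912)
Deprecated the inplace keyword in DataFrame.set_axis() and Series.set_axis(), use obj = obj.set_axis(..., copy=False) instead (GH 48130)
Deprecated producing a single element when iterating over a DataFrameGroupBy or a SeriesGroupBy that has been grouped by a list of length 1; a tuple of length one will be returned instead (GH 42795)
Fixed up the warning message for the deprecation of MultiIndex.lexsort_depth() as a public method, as the message previously referred to MultiIndex.is_lexsorted() instead (GH 38701)
Deprecated the sort_columns argument in DataFrame.plot() and Series.plot() (GH 47563)
Deprecated positional arguments for all but the first argument of DataFrame.to_stata() and read_stata(), use keyword arguments instead (GH 48128)
Deprecated the mangle_dupe_cols argument in read_csv(), read_fwf(), read_table() and read_excel(). The argument was never implemented, and a new argument where the renaming pattern can be specified will be added instead (GH 47718)
Deprecated allowing dtype='datetime64' or dtype=np.datetime64 in Series.astype(), use "datetime64[ns]" instead (GH 47844)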
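
For several of the deprecations above the replacement is a one-line change. The sketch below shows a few of them; it is not exhaustive, and the data is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# lineterminator replaces the deprecated line_terminator keyword.
csv_text = df.to_csv(index=False, lineterminator="\n")

# items() replaces iteritems().
column_names = [name for name, _column in df.items()]

# is_monotonic_increasing replaces is_monotonic.
is_sorted = df["a"].is_monotonic_increasing

# use_na_sentinel=False replaces na_sentinel=None: the missing value
# gets its own code instead of the -1 sentinel.
values = np.array(["x", None, "x"], dtype=object)
codes, uniques = pd.factorize(values, use_na_sentinel=False)

print(csv_text)
print(column_names, is_sorted, codes)
```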

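Performance improvements#

Performance improvement in DataFrame.corrwith() for column-wise (axis=0) Pearson and Spearman correlation when other is a Series (GH 46174)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() for some user-defined DataFrame -> Series functions (GH 45387)
Performance improvement in DataFrame.duplicated() when subset consists of only one column (GH 45236)
Performance improvement in DataFrameGroupBy.diff() and SeriesGroupBy.diff() (GH 16706)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() when broadcasting values for user-defined functions (GH 45708)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() for user-defined functions when only a single group exists (GH 44977)
Performance improvement in DataFrameGroupBy.apply() and SeriesGroupBy.apply() when grouping on a non-unique unsorted index (GH 46527)
Performance improvement in DataFrame.loc() and Series.loc() for tuple-based indexing of a MultiIndex (GH 45681, GH 46040, GH 46330)
Performance improvement in DataFrameGroupBy.var() and SeriesGroupBy.var() with ddof other than one (GH 48152)
Performance improvement in DataFrame.to_records() when the index is a MultiIndex (GH 47263)
Performance improvement in MultiIndex.values when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (GH 46288)
Performance improvement in merge() when left and/or right are empty (GH 45838)
Performance improvement in DataFrame.join() when left and/or right are empty (GH 46015)
Performance improvement in DataFrame.reindex() and Series.reindex() when target is a MultiIndex (GH 46235)
Performance improvement when setting values in a pyarrow-backed string array (GH 46400)
Performance improvement in factorize() (GH 46109)
Performance improvement in DataFrame and Series constructors for extension dtype scalars (GH 45854)
Performance improvement in read_excel() when the nrows argument is provided (GH 32727)
Performance improvement in Styler.to_excel() when applying repeated CSS formats (GH 47371)
Performance improvement in MultiIndex.is_monotonic_increasing() (GH 47458)
Performance improvement in BusinessHour str and repr (GH 44764)
Performance improvement in datetime arrays string formatting when one of the default strftime formats "%Y-%m-%d %H:%M:%S" or "%Y-%m-%d %H:%M:%S.%f" is used. (GH 44764)
Performance improvement in Series.to_sql() and DataFrame.to_sql() (SQLiteTable) when processing time arrays. (GH 44764)
Performance improvement in read_sas() (GH 47404)
Performance improvement in argmax and argmin for arrays.SparseArray (GH 34197)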

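Bug fixes#

Categorical#

Bug in Categorical.view() not accepting integer dtypes (GH 25464)
Bug in CategoricalIndex.union() when the index's categories are integer-dtype and the index contains NaN values incorrectly raising instead of casting to float64 (GH 45362)
Bug in concat() when concatenating two (or more) unordered CategoricalIndex variables whose categories are permutations yielding incorrect index values (GH 24845)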

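Datetimelike#

Bug in DataFrame.quantile() with datetime-like dtypes and no rows incorrectly returning float64 dtype instead of retaining datetime-like dtype (GH 41544)
Bug in to_datetime() with sequences of np.str_ objects incorrectly raising (GH 32264)
Bug in Timestamp construction when passing datetime components as positional arguments and tzinfo as a keyword argument incorrectly raising (GH 31929)
Bug in Index.astype() when casting from object dtype to timedelta64[ns] dtype incorrectly casting np.datetime64("NaT") values to np.timedelta64("NaT") instead of raising (GH 45722)
Bug in SeriesGroupBy.value_counts() index when passing a categorical column (GH 44324)
Bug in DatetimeIndex.tz_localize() localizing to UTC failing to make a copy of the underlying data (GH 46460)
Bug in DatetimeIndex.resolution() incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (GH 46903)
Bug in Timestamp with an integer or float value and unit="Y" or unit="M" giving slightly wrong results (GH 47266)
Bug in DatetimeArray construction when passed another DatetimeArray and freq=None incorrectly inferring the freq from the given array (GH 47296)
Bug in to_datetime() where OutOfBoundsDatetime would be thrown even if errors=coerce if there were more than 50 rows (GH 45319)
Bug when adding a DateOffset to a Series would not add the nanoseconds field (GH 47856)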

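Timedelta#

Bug in astype_nansafe() where astype("timedelta64[ns]") fails when np.nan is included (GH 45798)
Bug in constructing a Timedelta with a np.timedelta64 object and a unit sometimes silently overflowing and returning incorrect results instead of raising OutOfBoundsTimedelta (GH 46827)
Bug in constructing a Timedelta from a large integer or float with unit="W" silently overflowing and returning incorrect results instead of raising OutOfBoundsTimedelta (GH 47268)

Time Zones#

Bug in Timestamp constructor raising when passed a ZoneInfo tzinfo object (GH 46425)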

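Numeric#

Bug in operations with array-likes with dtype="boolean" and NA incorrectly altering the array in-place (GH 45421)
Bug in arithmetic operations with nullable types without NA values not matching the same operation with non-nullable types (GH 48223)
Bug in floordiv when dividing by IntegerDtype 0 would return 0 instead of inf (GH 48223)
Bug in division, pow and mod operations on array-likes with dtype="boolean" not behaving like their np.bool_ counterparts (GH 46063)
Bug in multiplying a Series with IntegerDtype or FloatingDtype by an array-like with timedelta64[ns] dtype incorrectly raising (GH 45622)
Bug in mean() where the optional dependency bottleneck causes precision loss linear in the length of the array. bottleneck has been disabled for mean(), improving the loss to log-linear, but this may result in a performance decrease. (GH 42878)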

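Conversion#

Bug in DataFrame.astype() not preserving subclasses (GH 40810)
Bug in constructing a Series from a float-containing list or a floating-dtype ndarray-like (e.g. dask.Array) and an integer dtype raising instead of casting like we would with an np.ndarray (GH 40110)
Bug in Float64Index.astype() to unsigned integer dtype incorrectly casting to np.int64 dtype (GH 45309)
Bug in Series.astype() and DataFrame.astype() from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (GH 45151)
Bug in array() with FloatingDtype and values containing float-castable strings incorrectly raising (GH 45424)
Bug when comparing string and datetime64ns objects causing an OverflowError exception (GH 45506)
Bug in the metaclass of generic abstract dtypes causing DataFrame.apply() and Series.apply() to raise for the built-in function type (GH 46684)
Bug in DataFrame.to_records() returning inconsistent numpy types if the index was a MultiIndex (GH 47263)
Bug in DataFrame.to_dict() for orient="list" or orient="index" not returning native types (GH 46751)
Bug in DataFrame.apply() returning a DataFrame instead of a Series when applied to an empty DataFrame and axis=1 (GH 39111)
Bug when inferring the dtype from an iterable that is not a NumPy ndarray and consists of all NumPy unsigned integer scalars did not result in an unsigned integer dtype (GH 47294)
Bug in DataFrame.eval() when pandas objects (e.g. 'Timestamp') were column names (GH 44603)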

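Strings#

Bug in str.startswith() and str.endswith() when using another Series as the parameter _pat_. Now raises TypeError (GH 3485)
Bug in Series.str.zfill() when strings contain leading signs, padding '0' before the sign character rather than after it as str.zfill from the standard library does (GH 20868)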
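
The zfill fix can be checked against the standard library directly. A small sketch with invented sample values:

```python
import pandas as pd

values = ["-1", "+1", "1"]

# pandas pads after a leading sign, matching Python's built-in str.zfill.
padded = pd.Series(values).str.zfill(3).tolist()
expected = [value.zfill(3) for value in values]

print(padded)
print(expected)
```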

Interval#

Bug in IntervalArray.__setitem__() when setting np.nan into an integer-backed array raising ValueError instead of TypeError (GH 45484)
Bug in IntervalDtype when using datetime64[ns, tz] as a dtype string (GH 46999)

索引#

DataFrame.iloc() 中的错误：当对仅包含一个 ExtensionDtype 列的 DataFrame 索引单行时，返回的是副本而非底层数据的视图 (GH 45241)
DataFrame.__getitem__() 中的错误：即使选择了唯一列，当 DataFrame 包含重复列时仍返回副本 (GH 45316, GH 41062)
Series.align() 中的错误：当两个 MultiIndex 的交集相同时，未能创建具有级别并集的 MultiIndex (GH 45224)
将 NA 值（None 或 np.nan）设置到基于整数的 IntervalDtype 的 Series 中时出现错误，错误地将其转换为 object dtype 而不是基于浮点数的 IntervalDtype (GH 45568)
索引时将值设置到 ExtensionDtype 列中出现错误，当使用 df.iloc[:, i] = values 且 values 具有与 df.iloc[:, i] 相同的 dtype 时，错误地插入了新数组而不是原地设置 (GH 33457)
Series.__setitem__() 中的错误：当使用非整数 Index 且使用整数键设置无法原地设置的值时，引发 ValueError 而不是转换为通用 dtype (GH 45070)
DataFrame.loc() 中的错误：当将值作为列表设置到 DataFrame 中时，未将 None 转换为 NA (GH 47987)
Series.__setitem__() 中的错误：当将不兼容的值设置到 PeriodDtype 或 IntervalDtype 的 Series 中时，使用布尔掩码索引时会引发异常，而使用其他等效索引器时则会强制转换；现在这些操作都一致地强制转换，包括 Series.mask() 和 Series.where() (GH 45768)
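Bug in DataFrame.where() with multiple columns of datetime-like dtypes failing to downcast results consistently with other dtypes (GH 45837)
Bug in isin() incorrectly upcasting to float64 with an unsigned integer dtype and a list-like argument without a dtype (GH 46485)
Bug in Series.loc.__setitem__() and Series.loc.__getitem__() not raising when using multiple keys without a MultiIndex (GH 13831)
Bug in Index.reindex() raising AssertionError when level was specified but no MultiIndex was given; level is now ignored (GH 35132)
Bug when setting a value too large for a Series dtype failing to coerce to a common type (GH 26049, GH 32878)
Bug in loc.__setitem__() treating range keys as positional instead of label-based (GH 45479)
Bug in DataFrame.__setitem__() casting extension array dtypes to object when setting with a scalar key and a DataFrame as the value (GH 46896)
Bug in Series.__setitem__() when setting a scalar to a nullable pandas dtype not raising TypeError if the scalar could not be cast (losslessly) to the nullable type (GH 45404)
Bug in Series.__setitem__() when setting boolean dtype values containing NA incorrectly raising instead of casting to boolean dtype (GH 45462)
Bug in Series.loc() raising with a boolean indexer containing NA when the Index did not match (GH 46551)
Bug in Series.__setitem__() where setting NA into a numeric-dtype Series would incorrectly upcast to object-dtype rather than treating the value as np.nan (GH 44199)
Bug in DataFrame.loc() when setting values to a column and the right-hand side is a dictionary (GH 47216)
Bug in Series.__setitem__() with datetime64[ns] dtype, an all-False boolean mask, and an incompatible value incorrectly casting to object instead of retaining datetime64[ns] dtype (GH 45967)
Bug in Index.__getitem__() raising ValueError when the indexer is from a boolean dtype with NA (GH 45806)
Bug in Series.__setitem__() losing precision when enlarging a Series with a scalar (GH 32346)
Bug in Series.mask() with inplace=True, or when setting values with a boolean mask with small integer dtypes, incorrectly raising (GH 45750)
Bug in DataFrame.mask() with inplace=True and ExtensionDtype columns incorrectly raising (GH 45577)
Bug in getting a column from a DataFrame with an object-dtype row index containing datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (GH 42950)
Bug in DataFrame.__getattribute__() raising AttributeError if columns have "string" dtype (GH 46185)
Bug in DataFrame.compare() returning all-NaN columns when comparing an extension array dtype and a numpy dtype (GH 44014)
Bug in DataFrame.where() setting wrong values with a "boolean" mask for a numpy dtype (GH 44014)
Bug in indexing on a DatetimeIndex with a np.str_ key incorrectly raising (GH 45580)
Bug in CategoricalIndex.get_indexer() when the index contains NaN values, resulting in elements that are in the target but not present in the index being mapped to the index of the NaN element instead of -1 (GH 45361)
Bug in setting large integer values into a Series with float32 or float16 dtype incorrectly altering these values instead of coercing to float64 dtype (GH 45844)
Bug in Series.asof() and DataFrame.asof() incorrectly casting bool-dtype results to float64 dtype (GH 16063)
Bug in NDFrame.xs(), DataFrame.iterrows(), DataFrame.loc() and DataFrame.iloc() not always propagating metadata (GH 28283)
Bug in DataFrame.sum() where min_count changes the dtype if the input contains NaNs (GH 46947)
Bug in IntervalTree that led to infinite recursion (GH 46658)
Bug in PeriodIndex raising AttributeError when indexing on NA, rather than putting NaT in its place (GH 46673)
Bug in DataFrame.at() that would allow the modification of multiple columns (GH 48296)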
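
As a small illustration of one indexing fix above (GH 44199), the following sketch sets NA into a float64 Series. It assumes pandas 1.5 or later; the variable names are illustrative only and do not come from the release notes.

```python
import numpy as np
import pandas as pd

# A plain float64 Series (illustrative data).
ser = pd.Series([1.0, 2.0, 3.0])

# Assign pd.NA into a numeric Series. With the fix, the value is
# treated as np.nan and the Series keeps its numeric dtype.
ser[0] = pd.NA

na_dtype = ser.dtype
first_is_nan = bool(np.isnan(ser.iloc[0]))
print(na_dtype, first_is_nan)
```

Before the fix, the same assignment could upcast the Series to object dtype.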

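Missing#

Bug in Series.fillna() and DataFrame.fillna() with the downcast keyword not being respected in some cases where no NA values are present (GH 45423)
Bug in Series.fillna() and DataFrame.fillna() with IntervalDtype and an incompatible value raising instead of casting to a common (usually object) dtype (GH 45796)
Bug in Series.map() not respecting the na_action argument if the mapper is a dict or Series (GH 47527)
Bug in DataFrame.interpolate() with object-dtype columns not returning a copy when inplace=False (GH 45791)
Bug in DataFrame.dropna() allowing the incompatible arguments how and thresh to be set together (GH 46575)
Bug in DataFrame.fillna() ignoring axis when the DataFrame is a single block (GH 47713)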
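
The DataFrame.dropna() entry above (GH 46575) can be sketched as follows. This is a minimal example assuming pandas 1.5 or later, where passing both how and thresh is expected to raise TypeError; the data is made up for illustration.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan], "b": [nan, nan]})

# how and thresh express two different row-dropping rules, so
# passing both is rejected rather than silently picking one.
try:
    df.dropna(how="any", thresh=1)
    raised = False
except TypeError:
    raised = True

# Using thresh alone keeps rows with at least one non-NA value.
kept = df.dropna(thresh=1)
print(raised, len(kept))
```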

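MultiIndex#

Bug in DataFrame.loc() returning an empty result when slicing a MultiIndex with a negative step size and non-null start/stop values (GH 46156)
Bug in DataFrame.loc() raising when slicing a MultiIndex with a negative step size other than -1 (GH 46156)
Bug in DataFrame.loc() raising when slicing a MultiIndex with a negative step size and slicing a non-integer labeled index level (GH 46156)
Bug in Series.to_numpy() where a multi-indexed Series could not be converted to a numpy array when an na_value was supplied (GH 45774)
Bug in MultiIndex.equals not being commutative when only one side has an extension array dtype (GH 46026)
Bug in MultiIndex.from_tuples() failing to construct an Index of empty tuples (GH 45608)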
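
For the Series.to_numpy() entry above (GH 45774), a minimal sketch of the fixed behavior might look like this. It assumes pandas 1.5 or later; the index labels and values are illustrative.

```python
import pandas as pd

# A Series indexed by a two-level MultiIndex, with one missing value.
index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
ser = pd.Series([1.0, None, 3.0], index=index)

# Supplying na_value replaces missing entries in the returned array;
# the original Series is left unchanged.
arr = ser.to_numpy(na_value=0.0)
print(arr)
```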

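I/O#

Bug in DataFrame.to_stata() where no error was raised if the DataFrame contains -np.inf (GH 45350)
Bug in read_excel() resulting in an infinite loop with certain skiprows callables (GH 45585)
Bug in DataFrame.info() where the newline at the end of the output was omitted when called on an empty DataFrame (GH 45494)
Bug in read_csv() not recognizing line breaks for on_bad_lines="warn" with engine="c" (GH 41710)
Bug in DataFrame.to_csv() not respecting float_format for Float64 dtype (GH 45991)
Bug in read_csv() not respecting a specified converter for index columns in all cases (GH 40589)
Bug in read_csv() interpreting the second row as Index names even when index_col=False (GH 46569)
Bug in read_parquet() when engine="pyarrow" which caused a partial write to disk when a column of an unsupported datatype was passed (GH 44914)
Bug in DataFrame.to_excel() and ExcelWriter raising when writing an empty DataFrame to a .ods file (GH 45793)
Bug in read_csv() ignoring a non-existent header row for engine="python" (GH 47400)
Bug in read_excel() raising an uncontrolled IndexError when header references non-existent rows (GH 43143)
Bug in read_html() where text around <br> elements was joined without a space between them (GH 29528)
Bug in read_csv() when the data is longer than the header, leading to issues with callables in usecols expecting strings (GH 46997)
Bug in Parquet roundtrip for Interval dtype with datetime64[ns] subtype (GH 45881)
Bug in read_excel() when reading a .ods file with newlines between XML elements (GH 45598)
Bug in read_parquet() when engine="fastparquet" where the file was not closed on error (GH 46555)
DataFrame.to_html() now excludes the border attribute from <table> elements when the border keyword is set to False.
Bug in read_sas() with certain types of compressed SAS7BDAT files (GH 35545)
Bug in read_excel() not forward filling a MultiIndex when no names were given (GH 47487)
Bug in read_sas() returning None rather than an empty DataFrame for SAS7BDAT files with zero rows (GH 18198)
Bug in DataFrame.to_string() using the wrong missing value with extension arrays in a MultiIndex (GH 47986)
Bug in StataWriter where value labels were always written with the default encoding (GH 46750)
Bug in StataWriterUTF8 where some valid characters were removed from variable names (GH 47276)
Bug in DataFrame.to_excel() when writing an empty DataFrame with a MultiIndex (GH 19543)
Bug in read_sas() with RLE-compressed SAS7BDAT files that contain 0x40 control bytes (GH 31243)
Bug in read_sas() that scrambled column names (GH 31243)
Bug in read_sas() with RLE-compressed SAS7BDAT files that contain 0x00 control bytes (GH 47099)
Bug in read_parquet() with use_nullable_dtypes=True where float64 dtype was returned instead of the nullable Float64 dtype (GH 45694)
Bug in DataFrame.to_json() where PeriodDtype would not survive the serialization roundtrip when read back with read_json() (GH 44720)
Bug in read_xml() raising XMLSyntaxError when reading XML files with Chinese character tags (GH 47902)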
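
The DataFrame.to_csv() entry above (GH 45991) can be illustrated with a short sketch. It assumes pandas 1.5 or later and uses a made-up column name; the exact line terminator in the output depends on the platform, so the example only inspects the whitespace-separated tokens.

```python
import pandas as pd

# A column backed by the nullable Float64 extension dtype.
df = pd.DataFrame({"x": pd.array([1.23456, 2.5], dtype="Float64")})

# With the fix, float_format is applied to Float64 columns the same
# way it is applied to plain float64 columns.
csv_text = df.to_csv(index=False, float_format="%.2f")
tokens = csv_text.split()
print(tokens)
```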

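Period#

Bug in subtraction of a Period from a PeriodArray returning the wrong result (GH 45999)
Bug in Period.strftime() and PeriodIndex.strftime() where the directives %l and %u gave wrong results (GH 46252)
Bug in inferring an incorrect freq when passing a string to Period with microseconds that are a multiple of 1000 (GH 46811)
Bug in constructing a Period from a Timestamp or np.datetime64 object with non-zero nanoseconds and freq="ns" incorrectly truncating the nanoseconds (GH 46811)
Bug in adding np.timedelta64("NaT", "ns") to a Period with a timedelta-like freq incorrectly raising IncompatibleFrequency instead of returning NaT (GH 47196)
Bug in adding an array of integers to an array with PeriodDtype giving incorrect results when dtype.freq.n > 1 (GH 47209)
Bug in subtracting a Period from an array with PeriodDtype returning incorrect results instead of raising OverflowError when the operation overflows (GH 47538)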
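
To make the Period subtraction entries above more concrete, here is a minimal sketch of what subtracting a Period from a PeriodIndex is expected to produce. It does not reproduce the specific failing inputs from GH 45999, which are not described in these notes; it only shows the intended semantics on simple monthly data.

```python
import pandas as pd

# Three consecutive monthly periods starting in January 2022.
periods = pd.period_range("2022-01", periods=3, freq="M")

# Subtracting a Period yields one offset per element, each holding
# the number of months between the two periods.
diffs = periods - pd.Period("2022-01", freq="M")
month_counts = [offset.n for offset in diffs]
print(month_counts)
```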

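Plotting#

Bug in DataFrame.plot.barh() that prevented labeling the x-axis, with xlabel updating the y-axis label instead (GH 45144)
Bug in DataFrame.plot.box() that prevented labeling the x-axis (GH 45463)
Bug in DataFrame.boxplot() that prevented passing in xlabel and ylabel (GH 45463)
Bug in DataFrame.boxplot() that prevented specifying vert=False (GH 36918)
Bug in DataFrame.plot.scatter() that prevented specifying norm (GH 45809)
Fixed showing "None" as ylabel in Series.plot() when ylabel is not set (GH 46129)
Bug in DataFrame.plot() that led to xticks and vertical grids being improperly placed when plotting a quarterly series (GH 47602)
Bug in DataFrame.plot() that prevented setting the y-axis label, limits and ticks for a secondary y-axis (GH 47753)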

Groupby/resample/rolling#

Bug in DataFrame.resample() ignoring closed="right" on a TimedeltaIndex (GH 45414)
Bug in DataFrameGroupBy.transform() failing when func="size" and the input DataFrame has multiple columns (GH 27469)
Bug in DataFrameGroupBy.size() and DataFrameGroupBy.transform() with func="size" producing incorrect results when axis=1 (GH 45715)
Bug in ExponentialMovingWindow.mean() with axis=1 and engine='numba' when the DataFrame has more columns than rows (GH 46086)
Bug when using engine="numba" where modifying engine_kwargs would return the same jitted function (GH 46086)
Bug in DataFrameGroupBy.transform() failing when axis=1 and func is "first" or "last" (GH 45986)
Bug in DataFrameGroupBy.cumsum() with skipna=False giving incorrect results (GH 46216)
Bug in DataFrameGroupBy.sum(), SeriesGroupBy.sum(), DataFrameGroupBy.prod(), SeriesGroupBy.prod(), DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() with integer dtypes losing precision (GH 37493)
Bug in DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() with timedelta64[ns] dtype failing to recognize NaT as a null value (GH 46216)
Bug in DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() with integer dtypes causing overflows when the sum was larger than the maximum of the dtype (GH 37493)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() with nullable dtypes incorrectly altering the original data in place (GH 46220)
Bug in DataFrame.groupby() raising an error when None is in the first level of a MultiIndex (GH 47348)
Bug in DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() with int64 dtype and the leading value being the smallest possible int64 (GH 46382)
Bug in DataFrameGroupBy.cumprod() and SeriesGroupBy.cumprod() where NaN influenced the calculation in different columns with skipna=False (GH 48064)
Bug in DataFrameGroupBy.max() and SeriesGroupBy.max() with empty groups and uint64 dtype incorrectly raising RuntimeError (GH 46408)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() failing when func was a string and args or kwargs were supplied (GH 46479)
Bug in SeriesGroupBy.apply() incorrectly naming its result when there was a unique group (GH 46369)
Bug in Rolling.sum() and Rolling.mean() giving incorrect results with a window of identical values (GH 42064, GH 46431)
Bug in Rolling.var() and Rolling.std() giving non-zero results with a window of identical values (GH 42064)
Bug in Rolling.skew() and Rolling.kurt() giving NaN with a window of identical values (GH 30993)
Bug in Rolling.var() segfaulting when calculating weighted variance with a window size larger than the data size (GH 46760)
Bug in Grouper.__repr__() where dropna was not included; it now is (GH 46754)
Bug in DataFrame.rolling() raising ValueError when center=True, axis=1 and win_type is specified (GH 46135)
Bug in DataFrameGroupBy.describe() and SeriesGroupBy.describe() producing inconsistent results for empty datasets (GH 41575)
Bug in DataFrame.resample() reduction methods where, when used with on, they would attempt to aggregate the provided column (GH 47079)
Bug in DataFrame.groupby() and Series.groupby() not respecting dropna=False when the input DataFrame/Series had NaN values in a MultiIndex (GH 46783)
Bug in DataFrameGroupBy.resample() raising KeyError when getting the result from a key list that omits the resample key (GH 47362)
Bug in DataFrame.groupby() losing index columns when the DataFrame is empty for transforms such as fillna (GH 47787)
Bug in DataFrame.groupby() and Series.groupby() with dropna=False and sort=False putting any null groups at the end instead of in the order in which they are encountered (GH 46584)
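
The last groupby entry above (GH 46584) can be sketched as follows, assuming pandas 1.5 or later. The column names and the "<null>" placeholder are illustrative choices for this example.

```python
import pandas as pd

# The null key appears second, between "b" and "a".
df = pd.DataFrame({"key": ["b", None, "a", None], "val": [1, 2, 3, 4]})

# dropna=False keeps the null group; sort=False keeps groups in the
# order in which their keys are first encountered.
sums = df.groupby("key", dropna=False, sort=False)["val"].sum()

# Replace the null key with a readable placeholder for display.
group_order = ["<null>" if pd.isna(k) else k for k in sums.index]
print(group_order, sums.tolist())
```

With the fix, the null group stays in second position rather than being moved to the end.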

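Reshaping#

Bug in concat() between a Series with integer dtype and another with CategoricalDtype with integer categories and containing NaN values casting to object dtype instead of float64 (GH 45359)
Bug in get_dummies() that selected object and categorical dtypes but not string (GH 44965)
Bug in DataFrame.align() when aligning a MultiIndex to a Series with another MultiIndex (GH 46001)
Bug in concatenation with IntegerDtype or FloatingDtype arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (GH 46379)
Bug in concat() losing the dtype of columns when join="outer" and sort=True (GH 47329)
Bug in concat() not sorting the column names when None is included (GH 47331)
Bug in concat() with identical keys leading to an error when indexing the MultiIndex (GH 46519)
Bug in pivot_table() raising TypeError when dropna=True and the aggregation column has an extension array dtype (GH 47477)
Bug in merge() raising an error for how="cross" when using FIPS mode in the ssl library (GH 48024)
Bug in DataFrame.join() with a list when using suffixes to join DataFrames with duplicate column names (GH 46396)
Bug in DataFrame.pivot_table() with sort=False still producing a sorted index (GH 17041)
Bug in concat() when axis=1 and sort=False where the resulting Index was an Int64Index instead of a RangeIndex (GH 46675)
Bug in wide_to_long() raising when stubnames is missing in columns and i contains a string dtype column (GH 46044)
Bug in DataFrame.join() with a categorical index resulting in unexpected reordering (GH 47812)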
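
For the get_dummies() entry above (GH 44965), the following is a minimal sketch assuming pandas 1.5 or later. The column names are invented for the example.

```python
import pandas as pd

# One column with the "string" extension dtype, one numeric column.
df = pd.DataFrame(
    {
        "color": pd.array(["red", "blue"], dtype="string"),
        "n": [1, 2],
    }
)

# With the fix, "string" columns are encoded by default, just like
# object and categorical columns; numeric columns pass through.
dummies = pd.get_dummies(df)
columns = list(dummies.columns)
print(columns)
```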

Sparse#

Bug in Series.where() and DataFrame.where() with SparseDtype failing to retain the array's fill_value (GH 45691)
Bug in SparseArray.unique() failing to keep the original order of elements (GH 47809)
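
The SparseArray.unique() entry above (GH 47809) can be illustrated with a short sketch, assuming pandas 1.5 or later. The input values are arbitrary; for integer data the default fill value is 0.

```python
import numpy as np
import pandas as pd

# 0 is the fill value here, so it is not stored explicitly.
sparse = pd.arrays.SparseArray([2, 0, 1, 0, 2])

# With the fix, unique values come back in order of first
# appearance, with the fill value in its original position.
unique_values = np.asarray(sparse.unique()).tolist()
print(unique_values)
```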

ExtensionArray#

Bug in IntegerArray.searchsorted() and FloatingArray.searchsorted() returning inconsistent results when acting on np.nan (GH 45255)

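Styler#

Bug when attempting to apply styling functions to an empty DataFrame subset (GH 45313)
Bug in CSSToExcelConverter leading to TypeError when a border color is provided without a border style for the xlsxwriter engine (GH 42276)
Bug in Styler.set_sticky() leading to white text on a white background in dark mode (GH 46984)
Bug in Styler.to_latex() causing UnboundLocalError when clines="all;data" and the DataFrame has no rows (GH 47203)
Bug in Styler.to_excel() when using vertical-align: middle; with the xlsxwriter engine (GH 30107)
Bug when applying styles to a DataFrame with boolean column labels (GH 47838)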

Metadata#

Fixed metadata propagation in DataFrame.melt() (GH 28283)
Fixed metadata propagation in DataFrame.explode() (GH 28283)

Other#

Bug in assert_index_equal() with names=True and check_order=False not checking names (GH 47328)
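
A minimal sketch of the assert_index_equal() behavior described above, assuming pandas 1.5 or later. Note that the example uses the check_names keyword of pandas.testing.assert_index_equal, on the assumption that this is the option the entry refers to as names=True.

```python
import pandas as pd
import pandas.testing as tm

# Same values in a different order, but with different names.
left = pd.Index([1, 2], name="a")
right = pd.Index([2, 1], name="b")

# check_order=False ignores element order; names are still compared,
# so the mismatch in names is reported.
try:
    tm.assert_index_equal(left, right, check_names=True, check_order=False)
    names_checked = False
except AssertionError:
    names_checked = True

print(names_checked)
```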

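Contributors#

A total of 271 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.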

Aadharsh Acharya +
Aadharsh-Acharya +
Aadhi Manivannan +
Adam Bowden
Aditya Agarwal +
Ahmed Ibrahim +
Alastair Porter +
Alex Povel +
Alex-Blade
Alexandra Sciocchetti +
AlonMenczer +
Andras Deak +
Andrew Hawyrluk
Andy Grigg +
Aneta Kahleová +
Anthony Givans +
Anton Shevtsov +
B. J. Potter +
BarkotBeyene +
Ben Beasley +
Ben Wozniak +
Bernhard Wagner +
Boris Rumyantsev
Brian Gollop +
CCXXXI +
Chandrasekaran Anirudh Bhardwaj +
Charles Blackmon-Luca +
Chris Moradi +
ChrisAlbertsen +
Compro Prasad +
DaPy15
Damian Barabonkov +
Daniel I +
Daniel Isaac +
Daniel Schmidt
Danil Iashchenko +
Dare Adewumi
Dennis Chukwunta +
Dennis J. Gray +
Derek Sharp +
Dhruv Samdani +
Dimitra Karadima +
Dmitry Savostyanov +
Dmytro Litvinov +
Do Young Kim +
Dries Schaumont +
Edward Huang +
Eirik +
Ekaterina +
Eli Dourado +
Ezra Brauner +
Fabian Gabel +
FactorizeD +
Fangchen Li
Francesco Romandini +
Greg Gandenberger +
Guo Ci +
Hiroaki Ogasawara
Hood Chatham +
Ian Alexander Joiner +
Irv Lustig
Ivan Ng +
JHM Darbyshire
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JMBurley
Jack Goldsmith +
James Freeman +
James Lamb
James Moro +
Janosh Riebesell
Jarrod Millman
Jason Jia +
Jeff Reback
Jeremy Tuloup +
Johannes Mueller
John Bencina +
John Mantios +
John Zangwill
Jon Bramley +
Jonas Haag
Jordan Hicks
Joris Van den Bossche
Jose Ortiz +
JosephParampathu +
José Duarte
Julian Steger +
Kai Priester +
Kapil E. Iyer +
Karthik Velayutham +
Kashif Khan
Kazuki Igeta +
Kevin Jan Anker +
Kevin Sheppard
Khor Chean Wei
Kian Eliasi
Kian S +
Kim, KwonHyun +
Kinza-Raza +
Konjeti Maruthi +
Leonardus Chen
Linxiao Francis Cong +
Loïc Estève
LucasG0 +
Lucy Jiménez +
Luis Pinto
Luke Manley
Marc Garcia
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Margarete Dippel +
Mariam-ke +
Martin Fleischmann
Marvin John Walter +
Marvin Walter +
Mateusz
Matilda M +
Matthew Roeschke
Matthias Bussonnier
MeeseeksMachine
Mehgarg +
Melissa Weber Mendonça +
Michael Milton +
Michael Wang
Mike McCarty +
Miloni Atal +
Mitlasóczki Bence +
Moritz Schreiber +
Morten Canth Hels +
Nick Crews +
NickFillot +
Nicolas Hug +
Nima Sarang
Noa Tamir +
Pandas Development Team
Parfait Gasana
Parthi +
Partho +
Patrick Hoefler
Peter
Peter Hawkins +
Philipp A
Philipp Schaefer +
Pierrot +
Pratik Patel +
Prithvijit
Purna Chandra Mansingh +
Radoslaw Lemiec +
RaphSku +
Reinert Huseby Karlsen +
Richard Shadrach
Richard Shadrach +
Robbie Palmer
Robert de Vries
Roger +
Roger Murray +
Ruizhe Deng +
SELEE +
Sachin Yadav +
Saiwing Yeung +
Sam Rao +
Sandro Casagrande +
Sebastiaan Vermeulen +
Shaghayegh +
Shantanu +
Shashank Shet +
Shawn Zhong +
Shuangchi He +
Simon Hawkins
Simon Knott +
Solomon Song +
Somtochi Umeh +
Stefan Krawczyk +
Stefanie Molin
Steffen Rehberg
Steven Bamford +
Steven Rotondo +
Steven Schaerer
Sylvain MARIE +
Sylvain Marié
Tarun Raghunandan Kaushik +
Taylor Packard +
Terji Petersen
Thierry Moisan
Thomas Grainger
Thomas Hunter +
Thomas Li
Tim McFarland +
Tim Swast
Tim Yang +
Tobias Pitters
Tom Aarsen +
Tom Augspurger
Torsten Wörtwein
TraverseTowner +
Tyler Reddy
Valentin Iovene
Varun Sharma +
Vasily Litvinov
Venaturum
Vinicius Akira Imaizumi +
Vladimir Fokow +
Wenjun Si
Will Lachance +
William Andrea
Wolfgang F. Riedl +
Xingrong Chen
Yago González
Yikun Jiang +
Yuanhao Geng
Yuval +
Zero
Zhengfei Wang +
abmyii
alexondor +
alm
andjhall +
anilbey +
arnaudlegout +
asv-bot +
ateki +
auderson +
bherwerth +
bicarlsen +
carbonleakage +
charles +
charlogazzo +
code-review-doctor +
dataxerik +
deponovo
dimitra-karadima +
dospix +
ehallam +
ehsan shirvanian +
ember91 +
eshirvana
fractionalhare +
gaotian98 +
gesoos
github-actions[bot]
gunghub +
hasan-yaman
iansheng +
iasoon +
jbrockmendel
joshuabello2550 +
jyuv +
kouya takahashi +
mariana-LJ +
matt +
mattB1989 +
nealxm +
partev
poloso +
realead
roib20 +
rtpsw
ryangilmour +
shourya5 +
srotondo +
stanleycai95 +
staticdev +
tehunter +
theidexisted +
tobias.pitters +
uncjackg +
vernetya
wany-oh +
wfr +
z3c0 +
