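What's new in 1.0.0 (January 29, 2020)

These are the changes in pandas 1.0.0. See Release notes for a full changelog including other versions of pandas.

Note

The pandas 1.0 release removed a lot of functionality that was deprecated in previous releases (see below for an overview). It is recommended to first upgrade to pandas 0.25 and to ensure your code is working without warnings, before upgrading to pandas 1.0.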

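New deprecation policy

Starting with pandas 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly,

Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, …)
Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, …)
API-breaking changes will be made only in major releases (except for experimental features)

See Version policy for more.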

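Enhancements

Using Numba in `rolling.apply` and `expanding.apply`

We've added an engine keyword to Rolling.apply() and Expanding.apply() that allows the user to execute the routine using Numba instead of Cython. Using the Numba engine can yield significant performance gains if the apply function can operate on NumPy arrays and the data set is large (1 million rows or greater). For more details, see the rolling apply documentation (GH 28987, GH 30936).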
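The snippet below is a minimal sketch of the engine keyword. It is not taken from the pandas documentation. It assumes a float Series and a hypothetical reduction spread() that uses only NumPy array methods, so Numba can compile it. raw=True is passed because the Numba engine hands each window to the function as an ndarray. If Numba is not installed, pandas raises ImportError and the sketch falls back to the Cython engine.

```python
import numpy as np
import pandas as pd


def spread(window: np.ndarray) -> float:
    # Uses only NumPy array methods, so Numba can compile it.
    return window.max() - window.min()


def rolling_spread(s: pd.Series, window: int) -> pd.Series:
    """Rolling max-minus-min, preferring the Numba engine when it is available."""
    roller = s.rolling(window)
    try:
        # raw=True is required by the Numba engine: each window arrives as an ndarray.
        return roller.apply(spread, raw=True, engine="numba")
    except ImportError:
        # Numba missing (or an unsupported version): fall back to Cython.
        return roller.apply(spread, raw=True, engine="cython")


s = pd.Series([1.0, 4.0, 2.0, 8.0, 3.0])
print(rolling_spread(s, 3).tolist())  # [nan, nan, 3.0, 6.0, 6.0]
```

On a Series this small, the Numba engine will not be faster, and its first call also pays the compilation cost. The gains described above apply to large inputs.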

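Defining custom windows for rolling operations

We've added a pandas.api.indexers.BaseIndexer() class that allows users to define how window bounds are created during rolling operations. Users can define their own get_window_bounds method on a pandas.api.indexers.BaseIndexer() subclass. This method generates the start and end indices used for each window during the rolling aggregation. For more details and example usage, see the custom window rolling documentation.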
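Below is a minimal sketch of a BaseIndexer subclass built on the idea above. The class name FlaggedExpandingIndexer and its use_expanding flag list are hypothetical. Rows whose flag is True get an expanding window starting at row 0, and other rows get the last window_size rows. The sketch assumes pandas 1.5 or later, where get_window_bounds also receives step. pandas checks that the method signature matches the base class, so on older versions the step parameter must be dropped.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class FlaggedExpandingIndexer(BaseIndexer):
    """Hypothetical indexer: expanding window on flagged rows, otherwise the
    last `window_size` rows. Extra keyword arguments (here `use_expanding`)
    are stored as attributes by BaseIndexer.__init__."""

    # The signature must match BaseIndexer.get_window_bounds exactly
    # (assumption: pandas >= 1.5, which added `step`).
    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        start = np.empty(num_values, dtype=np.int64)
        end = np.empty(num_values, dtype=np.int64)
        for i in range(num_values):
            if self.use_expanding[i]:
                start[i] = 0
            else:
                start[i] = max(0, i + 1 - self.window_size)
            end[i] = i + 1  # end is exclusive
        return start, end


indexer = FlaggedExpandingIndexer(window_size=1,
                                  use_expanding=[True, False, True, False, True])
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
print(s.rolling(indexer, min_periods=1).sum().tolist())  # [0.0, 1.0, 3.0, 3.0, 10.0]
```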

Converting to Markdown

We've added to_markdown() for creating a Markdown table (GH 11052).

In [1]: df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])

In [2]: print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

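Experimental new features

Experimental `NA` scalar to denote missing values

A new pd.NA value (singleton) is introduced to represent scalar missing values. Up to now, pandas used several values to represent missing data: np.nan for float data, np.nan or None for object-dtype data, and pd.NaT for datetime-like data. The goal of pd.NA is to provide a "missing" indicator that can be used consistently across data types. pd.NA is currently used by the nullable integer and boolean data types and the new string data type (GH 28095).

Warning

Experimental: the behaviour of pd.NA can still change without warning.

For example, creating a Series using the nullable integer dtype: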

In [3]: s = pd.Series([1, 2, None], dtype="Int64")

In [4]: s
Out[4]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64

In [5]: s[2]
Out[5]: <NA>

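Compared to np.nan, pd.NA behaves differently in certain operations. In addition to arithmetic operations, pd.NA also propagates as "missing" or "unknown" in comparison operations: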

In [6]: np.nan > 1
Out[6]: False

In [7]: pd.NA > 1
Out[7]: <NA>

For logical operations, pd.NA follows the rules of three-valued logic (or Kleene logic). For example:

In [8]: pd.NA | True
Out[8]: True
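To make the Kleene rules concrete, the sketch below evaluates a few expressions with pd.NA. The dictionary of cases exists only for illustration. NA stands for an unknown boolean, so a result is known only when the other operand decides it on its own.

```python
import pandas as pd

# NA means "unknown": the result is known only if the other operand settles it.
cases = {
    "NA | True": pd.NA | True,    # True whatever the unknown value is
    "NA | False": pd.NA | False,  # depends on the unknown value -> NA
    "NA & False": pd.NA & False,  # False whatever the unknown value is
    "NA & True": pd.NA & True,    # depends on the unknown value -> NA
    "NA ^ True": pd.NA ^ True,    # xor always depends on both operands -> NA
}
for expr, result in cases.items():
    print(f"{expr:>10} -> {result}")
```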

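For more, see the NA section in the user guide on missing data.

Dedicated string data type

We've added StringDtype, an extension type dedicated to string data. Previously, strings were typically stored in object-dtype NumPy arrays. (GH 29975)

Warning

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

The 'string' extension type solves several issues with object-dtype NumPy arrays:

You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding non-text columns that are still object-dtype.
When reading code, the contents of an object dtype array are less clear than string.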

In [9]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())
Out[9]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can use the alias "string" as well.

In [10]: s = pd.Series(['abc', None, 'def'], dtype="string")

In [11]: s
Out[11]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [12]: s.str.upper()
Out[12]: 
0     ABC
1    <NA>
2     DEF
Length: 3, dtype: string

In [13]: s.str.split('b', expand=True).dtypes
Out[13]: 
0    string[python]
1    string[python]
Length: 2, dtype: object

String accessor methods returning integers will return a value with Int64Dtype:

In [14]: s.str.count("a")
Out[14]: 
0       1
1    <NA>
2       0
Length: 3, dtype: Int64

We recommend explicitly using the string data type when working with strings. See Text data types for more.
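The short sketch below assumes a pandas version with StringDtype. It shows how the accessor results described above keep nullable dtypes. String results stay string, integer results become Int64, and boolean results become the nullable boolean dtype. In every case the missing entry stays pd.NA.

```python
import pandas as pd

s = pd.Series(["abc", None, "def"], dtype="string")

upper = s.str.upper()            # string result -> string dtype
counts = s.str.count("a")        # integer result -> nullable Int64
starts = s.str.startswith("a")   # boolean result -> nullable boolean

for name, result in [("upper", upper), ("count", counts), ("startswith", starts)]:
    # The missing entry stays pd.NA instead of becoming NaN or None.
    print(name, result.dtype, result.isna().tolist())
```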

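Boolean data type with missing values support

We've added BooleanDtype / BooleanArray, an extension type dedicated to boolean data that can hold missing values. The default bool data type is backed by a bool-dtype NumPy array, so such a column can only hold True or False and cannot hold missing values. The new BooleanArray can also store missing values by tracking them in a separate mask. (GH 29555, GH 30095, GH 31131)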

In [15]: pd.Series([True, False, None], dtype=pd.BooleanDtype())
Out[15]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

You can use the alias "boolean" as well.

In [16]: s = pd.Series([True, False, None], dtype="boolean")

In [17]: s
Out[17]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean
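Below is a minimal sketch of how BooleanArray handles missing values in logical operations and reductions. The variable names are illustrative. Logical operators follow the same Kleene logic as pd.NA. Reductions skip missing values unless skipna=False is passed. fillna() turns the array into a fully known mask.

```python
import pandas as pd

flags = pd.array([True, False, None], dtype="boolean")
always = pd.array([True, True, True], dtype="boolean")

both = flags & always    # [True, False, <NA>]: unknown & True stays unknown
either = flags | always  # [True, True, True]: anything | True is True

# Reductions skip missing values by default; skipna=False applies Kleene logic.
print(pd.array([True, None], dtype="boolean").all())              # True
print(pd.array([True, None], dtype="boolean").all(skipna=False))  # <NA>

# fillna gives a fully known mask when a missing value should count as False.
mask = flags.fillna(False)
```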

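Method `convert_dtypes` to ease use of supported extension dtypes

To encourage use of the extension dtypes that support pd.NA, such as StringDtype, BooleanDtype, Int64Dtype and Int32Dtype, the methods DataFrame.convert_dtypes() and Series.convert_dtypes() have been introduced. (GH 29752) (GH 30929)

Example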

In [18]: df = pd.DataFrame({'x': ['abc', None, 'def'],
   ....:                    'y': [1, 2, np.nan],
   ....:                    'z': [True, False, True]})
   ....: 

In [19]: df
Out[19]: 
      x    y      z
0   abc  1.0   True
1  None  2.0  False
2   def  NaN   True

[3 rows x 3 columns]

In [20]: df.dtypes
Out[20]: 
x     object
y    float64
z       bool
Length: 3, dtype: object

In [21]: converted = df.convert_dtypes()

In [22]: converted
Out[22]: 
      x     y      z
0   abc     1   True
1  <NA>     2  False
2   def  <NA>   True

[3 rows x 3 columns]

In [23]: converted.dtypes
Out[23]: 
x    string[python]
y             Int64
z           boolean
Length: 3, dtype: object

This is especially useful after reading in data using readers such as read_csv() and read_excel(). See here for a description.
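The example below sketches that workflow. It reads a small made-up CSV from an in-memory buffer, where io.StringIO stands in for a real file, and then converts the result. The missing values force name to object and score to float64 on read. convert_dtypes() then picks string, Int64 and boolean.

```python
import io

import pandas as pd

# Made-up CSV: "name" and "score" each have one missing value.
csv_text = """name,score,passed
ann,1,True
,2,False
bob,,True
"""
raw = pd.read_csv(io.StringIO(csv_text))
converted = raw.convert_dtypes()

print(raw.dtypes.to_dict())        # score is float64 because of the gap
print(converted.dtypes.to_dict())  # nullable string / Int64 / boolean
```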

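Other enhancements

DataFrame.to_string() added the max_colwidth parameter to control when wide columns are truncated (GH 9784)
Added the na_value argument to Series.to_numpy(), Index.to_numpy() and DataFrame.to_numpy() to control the value used for missing data (GH 30322)
MultiIndex.from_product() infers level names from inputs if not explicitly provided (GH 27292)
DataFrame.to_latex() now accepts caption and label arguments (GH 25436)
DataFrames with nullable integer, the new string dtype and period data type columns can now be converted to pyarrow (>=0.15.0). This means writing to the Parquet file format is supported when using the pyarrow engine (GH 28368). Full roundtrip to parquet (writing and reading back in with to_parquet() / read_parquet()) is supported starting with pyarrow >= 0.16 (GH 20612).
to_parquet() now appropriately handles the schema argument for user defined schemas in the pyarrow engine. (GH 30270)
DataFrame.to_json() now accepts an indent integer argument to enable pretty printing of JSON output (GH 12004)
read_stata() can read Stata 119 dta files. (GH 28250)
Implemented Window.var() and Window.std() functions (GH 26597)
Added encoding argument to DataFrame.to_string() for non-ASCII text (GH 28766)
Added encoding argument to DataFrame.to_html() for non-ASCII text (GH 28663)
Styler.background_gradient() now accepts vmin and vmax arguments (GH 12145)
Styler.format() added the na_rep parameter to help format the missing values (GH 21527, GH 28358)
read_excel() now can read binary Excel (.xlsb) files by passing engine='pyxlsb'. For more details and example usage, see the Binary Excel files documentation. Closes GH 8540.
The partition_cols argument in DataFrame.to_parquet() now accepts a string (GH 27117)
pandas.read_json() now parses NaN, Infinity and -Infinity (GH 12213)
The DataFrame constructor now preserves the ExtensionArray dtype when given an ExtensionArray (GH 11363)
DataFrame.sort_values() and Series.sort_values() have gained the ignore_index keyword to be able to reset the index after sorting (GH 30114)
DataFrame.sort_index() and Series.sort_index() have gained the ignore_index keyword to reset the index (GH 30114)
DataFrame.drop_duplicates() has gained the ignore_index keyword to reset the index (GH 30114)
Added a new writer for exporting Stata dta files in versions 118 and 119, StataWriterUTF8. These file formats support exporting strings containing Unicode characters. Format 119 supports data sets with more than 32,767 variables (GH 23573, GH 30959)
Series.map() now accepts collections.abc.Mapping subclasses as a mapper (GH 29733)
Added an experimental attrs attribute for storing global metadata about a dataset (GH 29062)
Timestamp.fromisocalendar() is now compatible with Python 3.8 and above (GH 28115)
DataFrame.to_pickle() and read_pickle() now accept URLs (GH 30163)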
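The sketch below combines a few of the new keywords above, using made-up data. It shows ignore_index for sort_values() and drop_duplicates(), na_value for to_numpy(), and indent for to_json().

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2, 1]}, index=["w", "x", "y", "z"])

# ignore_index=True relabels the result 0..n-1 instead of keeping "w".."z".
by_value = df.sort_values("a", ignore_index=True)
deduped = df.drop_duplicates(ignore_index=True)

# na_value controls what missing entries become in the NumPy output.
ints = pd.Series([1, 2, None], dtype="Int64")
as_float = ints.to_numpy(dtype="float64", na_value=np.nan)

# indent pretty-prints the JSON text.
pretty = pd.DataFrame({"a": [1]}).to_json(indent=2)
print(pretty)
```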

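Backwards incompatible API changes

Avoid using names from `MultiIndex.levels`

As part of a larger refactor to MultiIndex, the level names are now stored separately from the levels (GH 27242). We recommend using MultiIndex.names to access the names, and Index.set_names() to update the names.

For backwards compatibility, you can still access the names via the levels.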

In [24]: mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])

In [25]: mi.levels[0].name
Out[25]: 'x'

However, it is no longer possible to update the names of the MultiIndex via the level.

In [26]: mi.levels[0].name = "new name"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[26], line 1
----> 1 mi.levels[0].name = "new name"

File ~/work/pandas/pandas/pandas/core/indexes/base.py:1697, in Index.name(self, value)
   1693 @name.setter
   1694 def name(self, value: Hashable) -> None:
   1695     if self._no_setting_name:
   1696         # Used in MultiIndex.levels to avoid silently ignoring name updates.
-> 1697         raise RuntimeError(
   1698             "Cannot set name on a level of a MultiIndex. Use "
   1699             "'MultiIndex.set_names' instead."
   1700         )
   1701     maybe_extract_name(value, None, type(self))
   1702     self._name = value

RuntimeError: Cannot set name on a level of a MultiIndex. Use 'MultiIndex.set_names' instead.

In [27]: mi.names
Out[27]: FrozenList(['x', 'y'])

To update, use MultiIndex.set_names, which returns a new MultiIndex.

In [28]: mi2 = mi.set_names("new name", level=0)

In [29]: mi2.names
Out[29]: FrozenList(['new name', 'y'])
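Below is a sketch of the recommended set_names() route, with illustrative level names. The level argument accepts either a position or an existing level name. Each call returns a new MultiIndex rather than modifying mi.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=["x", "y"])

# level may be a position or an existing level name; `mi` itself is unchanged.
by_position = mi.set_names("row", level=0)
by_name = mi.set_names("col", level="y")
all_at_once = mi.set_names(["row", "col"])

print(list(mi.names), list(by_position.names), list(by_name.names))
```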

New repr for `IntervalArray`

pandas.arrays.IntervalArray adopts a new __repr__ in accordance with other array classes (GH 25022)

pandas 0.25.x

In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[1]:
IntervalArray([(0, 1], (2, 3]],
              closed='right',
              dtype='interval[int64]')

pandas 1.0.0

In [30]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[30]: 
<IntervalArray>
[(0, 1], (2, 3]]
Length: 2, dtype: interval[int64, right]

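`DataFrame.rename` now only accepts one positional argument

DataFrame.rename() would previously accept positional arguments that could lead to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which maps labels to their new names along the default axis, may be passed by position (GH 29136).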

pandas 0.25.x

In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
Out[2]:
FutureWarning: ...Use named arguments to resolve ambiguity...
   2
1  1

pandas 1.0.0

In [3]: df.rename({0: 1}, {0: 2})
Traceback (most recent call last):
...
TypeError: rename() takes from 1 to 2 positional arguments but 3 were given

Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.

pandas 0.25.x

In [4]: df.rename({0: 1}, index={0: 2})
Out[4]:
   0
1  1

In [5]: df.rename(mapper={0: 1}, index={0: 2})
Out[5]:
   0
2  1

pandas 1.0.0

In [6]: df.rename({0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

In [7]: df.rename(mapper={0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

You can still change the axis along which the first positional argument is applied by supplying the axis keyword argument.

In [31]: df.rename({0: 1})
Out[31]: 
   0
1  1

[1 rows x 1 columns]

In [32]: df.rename({0: 1}, axis=1)
Out[32]: 
   1
0  1

[1 rows x 1 columns]

If you would like to update both the index and column labels, be sure to use the respective keywords.

In [33]: df.rename(index={0: 1}, columns={0: 2})
Out[33]: 
   2
1  1

[1 rows x 1 columns]
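The sketch below shows how to migrate calls that relied on two positional mappings, using the same one-cell frame as above. The index= and columns= keywords replace the second positional argument. A single mapper (a dict, or a callable such as str) combined with axis= remains valid.

```python
import pandas as pd

df = pd.DataFrame([[1]])

# pandas 0.25 allowed df.rename({0: 1}, {0: 2}); spell both mappings out instead.
renamed = df.rename(index={0: 1}, columns={0: 2})

# A single mapper (dict or callable) plus axis= is also unambiguous.
cols_as_str = df.rename(str, axis="columns")

try:
    df.rename({0: 1}, index={0: 2})  # mapper together with index= is rejected
except TypeError as exc:
    print("TypeError:", exc)
```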

Extended verbose info output for `DataFrame`

DataFrame.info() now shows line numbers for the columns summary (GH 17304)

pandas 0.25.x

In [1]: df = pd.DataFrame({"int_col": [1, 2, 3],
...                    "text_col": ["a", "b", "c"],
...                    "float_col": [0.0, 0.1, 0.2]})
In [2]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
int_col      3 non-null int64
text_col     3 non-null object
float_col    3 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes

pandas 1.0.0

In [34]: df = pd.DataFrame({"int_col": [1, 2, 3],
   ....:                    "text_col": ["a", "b", "c"],
   ....:                    "float_col": [0.0, 0.1, 0.2]})
   ....: 

In [35]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    3 non-null      int64  
 1   text_col   3 non-null      object 
 2   float_col  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

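`pandas.array()` inference changes

pandas.array() now infers pandas' new extension types in several cases (GH 29791):

String data (including missing values) now returns an arrays.StringArray.
Integer data (including missing values) now returns an arrays.IntegerArray.
Boolean data (including missing values) now returns the new arrays.BooleanArray.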

pandas 0.25.x

In [1]: pd.array(["a", None])
Out[1]:
<PandasArray>
['a', None]
Length: 2, dtype: object

In [2]: pd.array([1, None])
Out[2]:
<PandasArray>
[1, None]
Length: 2, dtype: object

pandas 1.0.0

In [36]: pd.array(["a", None])
Out[36]: 
<StringArray>
['a', <NA>]
Length: 2, dtype: string

In [37]: pd.array([1, None])
Out[37]: 
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64

As a reminder, you can specify the dtype to disable all inference.
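The sketch below compares inferred and explicitly typed arrays. Without a dtype, pd.array() picks the nullable extension types listed above. Passing dtype=object opts out and keeps an object-backed array, which was named PandasArray in pandas 1.0. The exact array class printed for strings can differ between pandas versions and string storage backends.

```python
import pandas as pd

# Inferred: pandas' nullable extension arrays.
inferred = {
    "ints": pd.array([1, None]),
    "bools": pd.array([True, None]),
    "strs": pd.array(["a", None]),
}
for name, arr in inferred.items():
    print(name, type(arr).__name__, arr.dtype)

# An explicit dtype disables inference; dtype=object keeps an object-backed array.
as_object = pd.array([1, None], dtype=object)
print(as_object.dtype)
```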

`arrays.IntegerArray` now uses `pandas.NA`

arrays.IntegerArray now uses pandas.NA rather than numpy.nan as its missing value marker (GH 29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a[2]
Out[3]:
nan

pandas 1.0.0

In [38]: a = pd.array([1, 2, None], dtype="Int64")

In [39]: a
Out[39]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [40]: a[2]
Out[40]: <NA>

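This has a few API-breaking consequences.

Converting to a NumPy ndarray

When converting to a NumPy array, missing values will be pd.NA, which cannot be converted to a float. So calling np.asarray(integer_array, dtype="float") will now raise.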

pandas 0.25.x

In [1]: np.asarray(a, dtype="float")
Out[1]:
array([ 1.,  2., nan])

pandas 1.0.0

In [41]: np.asarray(a, dtype="float")
Out[41]: array([ 1.,  2., nan])

Use arrays.IntegerArray.to_numpy() with an explicit na_value instead.

In [42]: a.to_numpy(dtype="float", na_value=np.nan)
Out[42]: array([ 1.,  2., nan])

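Reductions can return pd.NA

When performing a reduction such as a sum with skipna=False, the result will now be pd.NA instead of np.nan in the presence of missing values (GH 30958).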

pandas 0.25.x

In [1]: pd.Series(a).sum(skipna=False)
Out[1]:
nan

pandas 1.0.0

In [43]: pd.Series(a).sum(skipna=False)
Out[43]: <NA>

value_counts returns a nullable integer dtype

Series.value_counts() with a nullable integer dtype now returns a nullable integer dtype for the values.

pandas 0.25.x

In [1]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[1]:
dtype('int64')

pandas 1.0.0

In [44]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[44]: Int64Dtype()

See NA semantics for more on the differences between pandas.NA and numpy.nan.
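The sketch below uses illustrative data to show the explicit conversions that replace the old implicit NaN behaviour. It covers to_numpy() with a chosen dtype and na_value, and sum() with and without skipna.

```python
import numpy as np
import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")

as_float = a.to_numpy(dtype="float64", na_value=np.nan)  # NA -> NaN
as_int = a.to_numpy(dtype="int64", na_value=0)           # NA -> 0, stays integer
as_object = a.to_numpy(dtype=object)                     # keeps pd.NA as-is

total_skip = pd.Series(a).sum()                # missing value skipped
total_strict = pd.Series(a).sum(skipna=False)  # missing value propagates as <NA>
```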

`arrays.IntegerArray` comparisons return `arrays.BooleanArray`

Comparison operations on an arrays.IntegerArray now return an arrays.BooleanArray rather than a NumPy array (GH 29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a > 1
Out[3]:
array([False,  True, False])

pandas 1.0.0

In [45]: a = pd.array([1, 2, None], dtype="Int64")

In [46]: a > 1
Out[46]: 
<BooleanArray>
[False, True, <NA>]
Length: 3, dtype: boolean

Note that missing values now propagate, rather than always comparing unequal like numpy.nan. See NA semantics for more.
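A comparison can now return pd.NA, so a filter needs an explicit decision about how unknown values count. The minimal sketch below treats them as False by calling fillna() before indexing.

```python
import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")
gt1 = a > 1  # BooleanArray: [False, True, <NA>]

# Decide explicitly that "unknown" counts as False before filtering.
mask = gt1.fillna(False)
selected = a[mask]
print(selected)
```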

By default `Categorical.min()` now returns the minimum instead of np.nan

When a Categorical contains np.nan, Categorical.min() no longer returns np.nan by default (skipna=True) (GH 25303)

pandas 0.25.x

In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[1]: nan

pandas 1.0.0

In [47]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[47]: 1
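The sketch below shows the skipna keyword, which replaces numeric_only (see Deprecations). When a missing value is present, skipna=False restores the pre-1.0 result.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical([1, 2, np.nan], ordered=True)

print(cat.min())              # the missing value is skipped by default
print(cat.min(skipna=False))  # nan, as before pandas 1.0
print(cat.max())
```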

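Default dtype of empty `pandas.Series`

Initialising an empty pandas.Series without specifying a dtype now emits a DeprecationWarning (GH 17261). The default dtype will change from float64 to object in future releases, so that it is consistent with the behaviour of DataFrame and Index.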

pandas 1.0.0

In [1]: pd.Series()
Out[1]:
DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
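Passing the dtype explicitly silences the warning. It also keeps the result independent of whichever default a given pandas version uses. A minimal sketch:

```python
import pandas as pd

# Explicit dtypes: no warning, and the same result on every pandas version.
empty_float = pd.Series(dtype="float64")
empty_object = pd.Series(dtype=object)
print(empty_float.dtype, empty_object.dtype)
```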

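Result dtype inference changes for resample operations

The rules for the result dtype in DataFrame.resample() aggregations have changed for extension types (GH 31359). Previously, pandas would attempt to convert the result back to the original dtype, falling back to the usual inference rules if that was not possible. Now, pandas will only return a result of the original dtype if the scalar values in the result are instances of the extension dtype's scalar type.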

In [48]: df = pd.DataFrame({"A": ['a', 'b']}, dtype='category',
   ....:                   index=pd.date_range('2000', periods=2))
   ....: 

In [49]: df
Out[49]: 
            A
2000-01-01  a
2000-01-02  b

[2 rows x 1 columns]

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[1]:
CategoricalDtype(categories=['a', 'b'], ordered=False)

pandas 1.0.0

In [50]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[50]: CategoricalDtype(categories=['a', 'b'], ordered=False, categories_dtype=object)

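This fixes an inconsistency between resample and groupby. It also fixes a potential bug where the **values** of the result might change depending on how the results are cast back to the original dtype.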

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'c')
Out[1]:

     A
0  NaN

pandas 1.0.0

In [51]: df.resample("2D").agg(lambda x: 'c')
Out[51]: 
            A
2000-01-01  c

[1 rows x 1 columns]

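Increased minimum version for Python

pandas 1.0.0 supports Python 3.6.1 and higher (GH 29212).

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated (GH 29766, GH 29723). If installed, we now require:

Package	Minimum Version	Required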
numpy	1.13.3	X
pytz	2015.4	X
python-dateutil	2.6.1	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2

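For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed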
beautifulsoup4	4.6.0
fastparquet	0.3.2	X
gcsfs	0.2.2
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0	X
openpyxl	2.5.7	X
pyarrow	0.13.0	X
pymysql	0.7.1
pytables	3.4.2
s3fs	0.3.0	X
scipy	0.19.0
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0

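See Dependencies and Optional dependencies for more.

Build changes

pandas has added a pyproject.toml file and will no longer include cythonized files in the source distribution uploaded to PyPI (GH 28341, GH 20775). If you're installing a built distribution (wheel) or installing via conda, this shouldn't have any effect on you. If you're building pandas from source, you should no longer need to install Cython into your build environment before calling pip install pandas.

Other API changes

DataFrameGroupBy.transform() and SeriesGroupBy.transform() now raise on invalid operation names (GH 27489)
pandas.api.types.infer_dtype() will now return "integer-na" for a mix of integers and np.nan (GH 27283)
MultiIndex.from_arrays() will no longer infer names from arrays if names=None is explicitly provided (GH 27292)
To improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using dir (e.g. dir(df)). To see which attributes are excluded, see an object's _deprecations attribute, for example pd.DataFrame._deprecations (GH 28805).
The returned dtype of unique() now matches the input dtype. (GH 27874)
Changed the default configuration value for options.plotting.matplotlib.register_converters from True to "auto" (GH 18720). Now, pandas custom formatters will only be applied to plots created by pandas, through plot(). Previously, pandas' formatters would be applied to all plots created after a plot(). See units registration for more.
Series.dropna() has dropped its **kwargs argument in favor of a single how parameter. Supplying anything other than how to **kwargs previously raised a TypeError (GH 29388)
When testing pandas, the new minimum required version of pytest is 5.0.1 (GH 29664)
Series.str.__iter__() was deprecated and will be removed in future releases (GH 28277).
Added <NA> to the list of default NA values for read_csv() (GH 30821)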
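The sketch below uses a made-up CSV to show two of the changes above. The literal <NA> is now parsed as missing by default. An unknown function name passed to a groupby transform raises ValueError; the name not_a_real_method is deliberately invalid.

```python
import io

import pandas as pd

# "<NA>" (how pd.NA prints) is now one of read_csv's default missing-value markers.
df = pd.read_csv(io.StringIO("key,val\na,1\nb,<NA>\na,3\n"))
print(df["val"].isna().tolist())  # [False, True, False]

# An unknown function name passed to a groupby transform raises.
try:
    df.groupby("key")["val"].transform("not_a_real_method")
except ValueError as exc:
    print("ValueError:", exc)
```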

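Documentation improvements

Added new section on Scaling to large datasets (GH 28315).
Added subsection on Query MultiIndex for HDF5 datasets (GH 28791).

Deprecations

Series.item() and Index.item() have been _undeprecated_ (GH 29250)
Index.set_value has been deprecated. For a given index idx, array arr, value in idx of idx_val and a new value of val, idx.set_value(arr, idx_val, val) is equivalent to arr[idx.get_loc(idx_val)] = val, which should be used instead (GH 28621).
is_extension_type() is deprecated, is_extension_array_dtype() should be used instead (GH 29457)
The eval() keyword argument "truediv" is deprecated and will be removed in a future version (GH 29812)
DateOffset.isAnchored() and DateOffset.onOffset() are deprecated and will be removed in a future version, use DateOffset.is_anchored() and DateOffset.is_on_offset() instead (GH 30340)
pandas.tseries.frequencies.get_offset is deprecated and will be removed in a future version, use pandas.tseries.frequencies.to_offset instead (GH 4205)
Categorical.take_nd() and CategoricalIndex.take_nd() are deprecated, use Categorical.take() and CategoricalIndex.take() instead (GH 27745)
The parameter numeric_only of Categorical.min() and Categorical.max() is deprecated and replaced with skipna (GH 25303)
The parameter label in lreshape() has been deprecated and will be removed in a future version (GH 29742)
pandas.core.index has been deprecated and will be removed in a future version, the public classes are available in the top-level namespace (GH 19711)
pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH 27586).
The numpy argument of pandas.read_json() is deprecated (GH 28512).
The "fname" argument of DataFrame.to_stata(), DataFrame.to_feather(), and DataFrame.to_parquet() is deprecated, use "path" instead (GH 23574)
The deprecated internal attributes _start, _stop and _step of RangeIndex now raise a FutureWarning instead of a DeprecationWarning (GH 26581)
The pandas.util.testing module has been deprecated. Use the public API in pandas.testing documented at Assertion functions (GH 16232).
pandas.SparseArray has been deprecated. Use pandas.arrays.SparseArray (arrays.SparseArray) instead. (GH 30642)
The parameter is_copy of Series.take() and DataFrame.take() has been deprecated and will be removed in a future version. (GH 27357)
Support for multi-dimensional indexing (e.g. index[:, None]) on an Index is deprecated and will be removed in a future version, convert to a numpy array before indexing instead (GH 30588)
The pandas.np submodule is now deprecated. Import numpy directly instead (GH 30296)
The pandas.datetime class is now deprecated. Import from datetime instead (GH 30610)
In the future, diff will raise a TypeError rather than implicitly losing the dtype of extension types. Convert to the correct dtype before calling diff instead (GH 31025)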
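The sketch below uses illustrative data to show the replacements this list names for three of the deprecated APIs. It uses get_loc() in place of Index.set_value, the top-level pandas.json_normalize(), and the snake_case is_on_offset().

```python
import numpy as np
import pandas as pd

# Index.set_value replacement: write through the position returned by get_loc.
idx = pd.Index(["a", "b", "c"])
arr = np.zeros(len(idx))
arr[idx.get_loc("b")] = 5.0  # instead of idx.set_value(arr, "b", 5.0)

# json_normalize is available at the top level.
flat = pd.json_normalize([{"id": 1, "info": {"city": "Oslo"}}])
print(flat.columns.tolist())  # ['id', 'info.city']

# snake_case offset methods replace isAnchored/onOffset.
on_month_end = pd.offsets.MonthEnd().is_on_offset(pd.Timestamp("2020-01-31"))
```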

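Selecting columns from a grouped DataFrame

When selecting columns from a DataFrameGroupBy object, passing individual keys (or a tuple of keys) inside single brackets is deprecated; a list of items should be used instead. (GH 23566) For example: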

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": np.random.randn(8),
    "C": np.random.randn(8),
})
g = df.groupby('A')

# single key, returns SeriesGroupBy
g['B']

# tuple of single key, returns SeriesGroupBy
g[('B',)]

# tuple of multiple keys, returns DataFrameGroupBy, emits a FutureWarning
g[('B', 'C')]

# multiple keys passed directly, returns DataFrameGroupBy, emits a FutureWarning
# (implicitly converts the passed strings into a single tuple)
g['B', 'C']

# proper way, returns DataFrameGroupBy
g[['B', 'C']]

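Removal of prior version deprecations/changes

Removed SparseSeries and SparseDataFrame

SparseSeries, SparseDataFrame and the DataFrame.to_sparse method have been removed (GH 28425). We recommend using a Series or DataFrame with sparse values instead.

Matplotlib unit registration

Previously, pandas would register converters with matplotlib as a side effect of importing pandas (GH 18720). This changed the output of plots made via matplotlib after pandas was imported, even if you were using matplotlib directly rather than plot().

To use pandas formatters with a matplotlib plot, specify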

In [1]: import pandas as pd
In [2]: pd.options.plotting.matplotlib.register_converters = True

Note that plots created by DataFrame.plot() and Series.plot() do register the converters automatically. The only behavior change is when plotting a date-like object via matplotlib.pyplot.plot or matplotlib.Axes.plot. See Custom formatters for timeseries plots for more.

Other removals

Removed the previously deprecated keyword "index" from read_stata(), StataReader, and StataReader.read(), use "index_col" instead (GH 17328)
Removed StataReader.data method, use StataReader.read() instead (GH 9493)
Removed pandas.plotting._matplotlib.tsplot, use Series.plot() instead (GH 19980)
pandas.tseries.converter.register has been moved to pandas.plotting.register_matplotlib_converters() (GH 18307)
Series.plot() no longer accepts positional arguments, pass keyword arguments instead (GH 30003)
DataFrame.hist() and Series.hist() no longer allow figsize="default", specify the figure size by passing a tuple instead (GH 30003)
Floordiv of an integer-dtyped array by Timedelta now raises TypeError (GH 21036)
TimedeltaIndex and DatetimeIndex no longer accept non-nanosecond dtype strings like "timedelta64" or "datetime64", use "timedelta64[ns]" and "datetime64[ns]" instead (GH 24806)
Changed the default "skipna" argument in pandas.api.types.infer_dtype() from False to True (GH 24050)
Removed Series.ix and DataFrame.ix (GH 26438)
Removed Index.summary (GH 18217)
Removed the previously deprecated keyword "fastpath" from the Index constructor (GH 23110)
Removed Series.get_value, Series.set_value, DataFrame.get_value, DataFrame.set_value (GH 17739)
Removed Series.compound and DataFrame.compound (GH 26405)
Changed the default "inplace" argument in DataFrame.set_axis() and Series.set_axis() from None to False (GH 27600)
Removed Series.cat.categorical, Series.cat.index, Series.cat.name (GH 24751)
Removed the previously deprecated keyword "box" from to_datetime() and to_timedelta(); in addition these now always return DatetimeIndex, TimedeltaIndex, Index, Series, or DataFrame (GH 24486)
to_timedelta(), Timedelta, and TimedeltaIndex no longer allow "M", "y", or "Y" for the "unit" argument (GH 23264)
Removed the previously deprecated keyword "time_rule" from (non-public) offsets.generate_range, which has been moved to core.arrays._ranges.generate_range() (GH 24157)
DataFrame.loc() or Series.loc() with listlike indexers and missing labels will no longer reindex (GH 17295)
DataFrame.to_excel() and Series.to_excel() with non-existent columns will no longer reindex (GH 17295)
Removed the previously deprecated keyword "join_axes" from concat(); use reindex_like on the result instead (GH 22318)
Removed the previously deprecated keyword "by" from DataFrame.sort_index(), use DataFrame.sort_values() instead (GH 10726)
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 18529)
Passing datetime64 data to TimedeltaIndex or timedelta64 data to DatetimeIndex now raises TypeError (GH 23539, GH 23937)
Passing int64 values to DatetimeIndex together with a timezone now interprets the values as nanosecond timestamps in UTC, not wall times in the given timezone (GH 24559)
A tuple passed to DataFrame.groupby() is now exclusively treated as a single key (GH 18314)
Removed Index.contains, use key in index instead (GH 30103)
Addition and subtraction of int or integer-arrays is no longer allowed in Timestamp, DatetimeIndex, TimedeltaIndex, use obj + n * obj.freq instead of obj + n (GH 22535)
Removed Series.ptp (GH 21614)
Removed Series.from_array (GH 18258)
Removed DataFrame.from_items (GH 18458)
Removed DataFrame.as_matrix, Series.as_matrix (GH 18458)
Removed Series.asobject (GH 18477)
Removed DataFrame.as_blocks, Series.as_blocks, DataFrame.blocks, Series.blocks (GH 17656)
pandas.Series.str.cat() now defaults to aligning others, using join='left' (GH 27611)
pandas.Series.str.cat() no longer accepts list-likes within list-likes (GH 27611)
Series.where() with Categorical dtype (or DataFrame.where() with a Categorical column) no longer allows setting new categories (GH 24114)
Removed the previously deprecated keywords "start", "end", and "periods" from the DatetimeIndex, TimedeltaIndex, and PeriodIndex constructors; use date_range(), timedelta_range(), and period_range() instead (GH 23919)
Removed the previously deprecated keyword "verify_integrity" from the DatetimeIndex and TimedeltaIndex constructors (GH 23919)
Removed the previously deprecated keyword "fastpath" from pandas.core.internals.blocks.make_block (GH 19265)
Removed the previously deprecated keyword "dtype" from Block.make_block_same_class() (GH 19434)
Removed ExtensionArray._formatting_values. Use ExtensionArray._formatter instead. (GH 23601)
Removed MultiIndex.to_hierarchical (GH 21613)
Removed MultiIndex.labels, use MultiIndex.codes instead (GH 23752)
Removed the previously deprecated keyword "labels" from the MultiIndex constructor, use "codes" instead (GH 23752)
Removed MultiIndex.set_labels, use MultiIndex.set_codes() instead (GH 23752)
Removed the previously deprecated keyword "labels" from MultiIndex.set_codes(), MultiIndex.copy(), MultiIndex.drop(), use "codes" instead (GH 23752)
Removed support for legacy HDF5 formats (GH 29787)
Passing a dtype alias (e.g. "datetime64[ns, UTC]") to DatetimeTZDtype is no longer allowed, use DatetimeTZDtype.construct_from_string() instead (GH 23990)
Removed the previously deprecated keyword "skip_footer" from read_excel(); use "skipfooter" instead (GH 18836)
read_excel() no longer allows an integer value for the parameter usecols, instead pass a list of integers from 0 to usecols inclusive (GH 23635)
Removed the previously deprecated keyword "convert_datetime64" from DataFrame.to_records() (GH 18902)
Removed IntervalIndex.from_intervals in favor of the IntervalIndex constructor (GH 19263)
Changed the default "keep_tz" argument in DatetimeIndex.to_series() from None to True (GH 23739)
Removed api.types.is_period and api.types.is_datetimetz (GH 23917)
Removed the ability to read pickles containing Categorical instances created with pre-0.16 versions of pandas (GH 27538)
Removed pandas.tseries.plotting.tsplot (GH 18627)
Removed the previously deprecated keywords "reduce" and "broadcast" from DataFrame.apply() (GH 18577)
Removed the previously deprecated assert_raises_regex function in pandas._testing (GH 29174)
Removed the previously deprecated FrozenNDArray class in pandas.core.indexes.frozen (GH 29335)
Removed the previously deprecated keyword "nthreads" from read_feather(), use "use_threads" instead (GH 23053)
Removed Index.is_lexsorted_for_tuple (GH 29305)
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 29608)
Removed Series.valid; use Series.dropna() instead (GH 18800)
Removed DataFrame.is_copy, Series.is_copy (GH 18812)
Removed DataFrame.get_ftype_counts, Series.get_ftype_counts (GH 18243)
Removed DataFrame.ftypes, Series.ftypes, Series.ftype (GH 26744)
Removed Index.get_duplicates, use idx[idx.duplicated()].unique() instead (GH 20239)
Removed Series.clip_upper, Series.clip_lower, DataFrame.clip_upper, DataFrame.clip_lower (GH 24203)
Removed the ability to alter DatetimeIndex.freq, TimedeltaIndex.freq, or PeriodIndex.freq (GH 20772)
Removed DatetimeIndex.offset (GH 20730)
Removed DatetimeIndex.asobject, TimedeltaIndex.asobject, PeriodIndex.asobject, use astype(object) instead (GH 29801)
Removed the previously deprecated keyword "order" from factorize() (GH 19751)
Removed the previously deprecated keyword "encoding" from read_stata() and DataFrame.to_stata() (GH 21400)
Changed the default "sort" argument in concat() from None to False (GH 20613)
Removed the previously deprecated keyword "raise_conflict" from DataFrame.update(), use "errors" instead (GH 23585)
Removed the previously deprecated keyword "n" from DatetimeIndex.shift(), TimedeltaIndex.shift(), PeriodIndex.shift(), use "periods" instead (GH 22458)
Removed the previously deprecated keywords "how", "fill_method", and "limit" from DataFrame.resample() (GH 30139)
Passing an integer to Series.fillna() or DataFrame.fillna() with timedelta64[ns] dtype now raises TypeError (GH 24694)
Passing multiple axes to DataFrame.dropna() is no longer supported (GH 20995)
Removed Series.nonzero, use to_numpy().nonzero() instead (GH 24048)
Passing floating dtype codes to Categorical.from_codes() is no longer supported, pass codes.astype(np.int64) instead (GH 21775)
Removed the previously deprecated keyword "pat" from Series.str.partition() and Series.str.rpartition(), use "sep" instead (GH 23767)
Removed Series.put (GH 27106)
Removed Series.real, Series.imag (GH 27106)
Removed Series.to_dense, DataFrame.to_dense (GH 26684)
Removed Index.dtype_str, use str(index.dtype) instead (GH 27106)
Categorical.ravel() returns a Categorical instead of an ndarray (GH 27199)
The "outer" method on NumPy ufuncs, e.g. np.subtract.outer operating on Series objects, is no longer supported and will raise NotImplementedError (GH 27198)
Removed Series.get_dtype_counts and DataFrame.get_dtype_counts (GH 27145)
Changed the default "allow_fill" argument in Categorical.take() from True to False (GH 20841)
Changed the default value for the "raw" argument in Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() from None to False (GH 20584)
Removed the deprecated behavior of Series.argmin() and Series.argmax(), use Series.idxmin() and Series.idxmax() for the old behavior (GH 16955)
Passing a tz-aware datetime.datetime or Timestamp into the Timestamp constructor together with the tz argument now raises a ValueError (GH 23621)
Removed Series.base, Index.base, Categorical.base, Series.flags, Index.flags, PeriodArray.flags, Series.strides, Index.strides, Series.itemsize, Index.itemsize, Series.data, Index.data (GH 20721)
Changed Timedelta.resolution() to match the behavior of the standard library datetime.timedelta.resolution, for the old behavior, use Timedelta.resolution_string() (GH 26839)
Removed Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name (GH 18164)
Removed the previously deprecated keyword "errors" in Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() (GH 22644)
Changed the default "ordered" argument in CategoricalDtype from None to False (GH 26336)
Series.set_axis() and DataFrame.set_axis() now require "labels" as the first argument and "axis" as an optional named parameter (GH 30089)
Removed to_msgpack, read_msgpack, DataFrame.to_msgpack, Series.to_msgpack (GH 27103)
Removed Series.compress (GH 21930)
Removed the previously deprecated keyword "fill_value" from Categorical.fillna(), use "value" instead (GH 19269)
Removed the previously deprecated keyword "data" from andrews_curves(), use "frame" instead (GH 6956)
Removed the previously deprecated keyword "data" from parallel_coordinates(), use "frame" instead (GH 6956)
Removed the previously deprecated keyword "colors" from parallel_coordinates(), use "color" instead (GH 6956)
Removed the previously deprecated keywords "verbose" and "private_key" from read_gbq() (GH 30200)
Calling np.array and np.asarray on tz-aware Series and DatetimeIndex will now return an object array of tz-aware Timestamp (GH 24596)
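The sketch below uses made-up data to show replacements for several of the removed APIs above. Most substitutes come from this list. The .loc / .iloc replacement for the removed .ix is not stated here; it is the standard split between label-based and position-based indexing.

```python
import pandas as pd

s = pd.Series([1, 5, 10])
df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
idx = pd.Index(["a", "b", "a", "c", "b"])
dti = pd.date_range("2020-01-01", periods=3, freq="D")

clipped = s.clip(upper=6)                      # was s.clip_upper(6)
by_label = df.loc["y", "a"]                    # was df.ix["y", "a"] (label-based)
by_position = df.iloc[1, 0]                    # was df.ix[1, 0] (position-based)
dups = idx[idx.duplicated()].unique()          # was idx.get_duplicates()
has_c = "c" in idx                             # was idx.contains("c")
nonzero_pos = pd.Series([0, 3, 0, 4]).to_numpy().nonzero()[0]  # was Series.nonzero()
dtype_name = str(pd.Index([1, 2]).dtype)       # was Index.dtype_str
shifted = dti + 2 * dti.freq                   # was dti + 2
```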

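Performance improvements

Performance improvement in DataFrame arithmetic and comparison operations with scalars (GH 24990, GH 29853)
Performance improvement in indexing with a non-unique IntervalIndex (GH 27489)
Performance improvement in MultiIndex.is_monotonic (GH 27495)
Performance improvement in cut() when bins is an IntervalIndex (GH 27668)
Performance improvement when initializing a DataFrame using a range (GH 30171)
Performance improvement in DataFrame.corr() when method is "spearman" (GH 28139)
Performance improvement in DataFrame.replace() when provided a list of values to replace (GH 28099)
Performance improvement in DataFrame.select_dtypes() by using vectorization instead of iterating in a loop (GH 28317)
Performance improvement in Categorical.searchsorted() and CategoricalIndex.searchsorted() (GH 28795)
Performance improvement when comparing a Categorical with a scalar and the scalar is not found in the categories (GH 29750)
Performance improvement when checking if values in a Categorical are equal to, greater than or equal to, or greater than a given scalar. The improvement is not present when checking if the Categorical is less than, or less than or equal to, the scalar (GH 29820)
Performance improvement in Index.equals() and MultiIndex.equals() (GH 29134)
Performance improvement in infer_dtype() when skipna is True (GH 28814)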

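Bug fixes

Categorical

Added test to assert that fillna() raises the correct ValueError message when the value isn't a value from categories (GH 13628)
Bug in Categorical.astype() where NaN values were handled incorrectly when casting to int (GH 28406)
DataFrame.reindex() with a CategoricalIndex would fail when the targets contained duplicates, and wouldn't fail if the source contained duplicates (GH 28107)
Bug in Categorical.astype() not allowing for casting to extension dtypes (GH 28668)
Bug where merge() was unable to join on categorical and extension dtype columns (GH 28668)
Categorical.searchsorted() and CategoricalIndex.searchsorted() now work on unordered categoricals also (GH 21667)
Added test to assert that roundtripping to parquet with DataFrame.to_parquet() or read_parquet() will preserve Categorical dtypes for string types (GH 27955)
Changed the error message in Categorical.remove_categories() to always show the invalid removals as a set (GH 28669)
Using date accessors on a categorical dtyped Series of datetimes was not returning an object of the same type as if one used the str.() / dt.() on a Series of that type. E.g. when accessing Series.dt.tz_localize() on a Categorical with duplicate entries, the accessor was skipping duplicates (GH 27952)
Bug in DataFrame.replace() and Series.replace() that would give incorrect results on categorical data (GH 26988)
Bug where calling Categorical.min() or Categorical.max() on an empty Categorical would raise a numpy exception (GH 30227)
The following methods now also correctly output values for unobserved categories when called through groupby(..., observed=False) (GH 17605):
    core.groupby.SeriesGroupBy.count()
    core.groupby.SeriesGroupBy.size()
    core.groupby.SeriesGroupBy.nunique()
    core.groupby.SeriesGroupBy.nth()

Datetimelike

Bug in Series.__setitem__() incorrectly casting np.timedelta64("NaT") to np.datetime64("NaT") when inserting into a Series with datetime64 dtype (GH 27311)
Bug in Series.dt() property lookups when the underlying data is read-only (GH 27529)
Bug in HDFStore.__getitem__ incorrectly reading the tz attribute created in Python 2 (GH 26443)
Bug in to_datetime() where passing arrays of malformed str with errors="coerce" could incorrectly lead to raising ValueError (GH 28299)
Bug in core.groupby.SeriesGroupBy.nunique() where NaT values were interfering with the count of unique values (GH 27951)
Bug in Timestamp subtraction when subtracting a Timestamp from a np.datetime64 object incorrectly raising TypeError (GH 28286)
Addition and subtraction of integer or integer-dtype arrays with Timestamp will now raise NullFrequencyError instead of ValueError (GH 28268)
Bug in Series and DataFrame with integer dtype failing to raise TypeError when adding or subtracting a np.datetime64 object (GH 28080)
Bug in Series.astype(), Index.astype(), and DataFrame.astype() failing to handle NaT when casting to an integer dtype (GH 28492)
Bug in Week with weekday incorrectly raising AttributeError instead of TypeError when adding or subtracting an invalid type (GH 28530)
Bug in DataFrame arithmetic operations when operating with a Series with dtype 'timedelta64[ns]' (GH 28049)
Bug in core.groupby.generic.SeriesGroupBy.apply() raising ValueError when a column in the original DataFrame is a datetime and the column labels are not standard integers (GH 28247)
Bug in pandas._config.localization.get_locales() where the locales -a encodes the locales list as windows-1252 (GH 23638, GH 24760, GH 27368)
Bug in Series.var() failing to raise TypeError when called with timedelta64[ns] dtype (GH 28289)
Bug in DatetimeIndex.strftime() and Series.dt.strftime() where NaT was converted to the string 'NaT' instead of np.nan (GH 29578)
Bug in masking datetime-like arrays with a boolean mask of an incorrect length not raising an IndexError (GH 30308)
Bug in Timestamp.resolution being a property instead of a class attribute (GH 29910)
Bug in pandas.to_datetime() when called with None raising TypeError instead of returning NaT (GH 30011)
Bug in pandas.to_datetime() failing for deques when using cache=True (the default) (GH 29403)
Bug in Series.item() with datetime64 or timedelta64 dtype, DatetimeIndex.item(), and TimedeltaIndex.item() returning an integer instead of a Timestamp or Timedelta (GH 30175)
Bug in DatetimeIndex addition when adding a non-optimized DateOffset incorrectly dropping timezone information (GH 30336)
Bug in DataFrame.drop() where attempting to drop non-existent values from a DatetimeIndex would yield a confusing error message (GH 30399)
Bug in DataFrame.append() that would remove the timezone-awareness of new data (GH 30238)
Bug in Series.cummin() and Series.cummax() with timezone-aware dtype incorrectly dropping its timezone (GH 15553)
Bug in DatetimeArray, TimedeltaArray, and PeriodArray where inplace addition and subtraction did not actually operate inplace (GH 24115)
Bug in pandas.to_datetime() when called with a Series storing an IntegerArray raising TypeError instead of returning a Series (GH 30050)
Bug in date_range() with custom business hours as freq and a given number of periods (GH 30593)
Bug in PeriodIndex comparisons incorrectly casting integers to Period objects, inconsistent with the Period comparison behavior (GH 30722)
Bug in DatetimeIndex.insert() raising a ValueError instead of a TypeError when trying to insert a timezone-aware Timestamp into a timezone-naive DatetimeIndex, or vice versa (GH 30806)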

Timedelta#

从 np.datetime64 对象中减去 TimedeltaIndex 或 TimedeltaArray 时出现错误（GH 29558）

时区#

数值#

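Bug in DataFrame.quantile() with a zero-column DataFrame incorrectly raising (GH 23925)
DataFrame flex inequality comparison methods (DataFrame.lt(), DataFrame.le(), DataFrame.gt(), DataFrame.ge()) with object dtype and complex entries failing to raise TypeError like their Series counterparts (GH 28079)
Bug in DataFrame logical operations (&, |, ^) not matching Series behavior by filling NA values (GH 28741)
Bug in DataFrame.interpolate() where specifying the axis by name referenced a variable before it was assigned (GH 29142)
Bug in Series.var() computing an incorrect value for a nullable integer dtype Series because the ddof argument was not passed through (GH 29128)
Improved error message when sampling with frac > 1 and replace=False (GH 27451)
Bug in numeric indexes that made it possible to instantiate an Int64Index, UInt64Index, or Float64Index with an invalid dtype (e.g. datetime-like) (GH 29539)
Bug in UInt64Index losing precision when constructed from a list with values in the np.uint64 range (GH 29526)
Bug in NumericIndex construction that caused indexing to fail when integers in the np.uint64 range were used (GH 28023)
Bug in NumericIndex construction that caused UInt64Index to be cast to Float64Index when integers in the np.uint64 range were used to index a DataFrame (GH 28279)
Bug in Series.interpolate() returning incorrect results when using method `index` with an unsorted index (GH 21037)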
Bug in DataFrame.round() where a DataFrame with a CategoricalIndex of IntervalIndex columns would incorrectly raise a TypeError (GH 30063)
Bug in Series.pct_change() and DataFrame.pct_change() when there are duplicated indices (GH 30463)
Bug in DataFrame cumulative operations (e.g. cumsum, cummax) incorrectly casting to object dtype (GH 19296)
Bug in diff losing the dtype for extension types (GH 30889)
Bug in DataFrame.diff raising an IndexError when one of the columns was a nullable integer dtype (GH 30967)
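A short sketch of the nullable-integer fixes above (GH 30967, GH 30889, GH 29128), assuming pandas 1.0 or later; the column names and values are illustrative. diff() keeps the Int64 dtype (with <NA> in the first row), and Series.var() now respects ddof.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.array([1, 3, 6], dtype="Int64"),  # nullable integer column
    "b": [1.0, 2.0, 4.0],
})

# DataFrame.diff with a nullable-integer column no longer raises (GH 30967)
diffed = df.diff()
print(diffed.dtypes)

# Series.var on nullable integers honours ddof (GH 29128)
s = pd.Series([1, 2, 3, 4], dtype="Int64")
print(s.var(ddof=0), s.var(ddof=1))  # population vs. sample variance
```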

Conversion#

Strings#

Calling Series.str.isalnum() (and other “ismethods”) on an empty Series would return an object dtype instead of bool (GH 29624)
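To illustrate the fix above, here is a minimal sketch, assuming pandas 1.0 or later: an "is" method on an empty object-dtype Series now reports a boolean dtype, matching the non-empty case.

```python
import pandas as pd

empty = pd.Series([], dtype=object)
empty_result = empty.str.isalnum()
print(empty_result.dtype)  # bool

# Non-empty input, for comparison
non_empty = pd.Series(["abc1", "a b"]).str.isalnum()
print(non_empty.tolist())  # [True, False]
```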

Interval#

Bug in IntervalIndex.get_indexer() where a Categorical or CategoricalIndex target would incorrectly raise a TypeError (GH 30063)
Bug in pandas.core.dtypes.cast.infer_dtype_from_scalar where passing pandas_dtype=True did not infer IntervalDtype (GH 30337)
Bug in Series constructor where constructing a Series from a list of Interval objects resulted in object dtype instead of IntervalDtype (GH 23563)
Bug in IntervalDtype where the kind attribute was incorrectly set as None instead of "O" (GH 30568)
Bug in IntervalIndex, IntervalArray, and Series with interval data where equality comparisons were incorrect (GH 24112)
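The Series constructor and comparison fixes (GH 23563, GH 24112) can be sketched as below, assuming pandas 1.0 or later. The exact printed dtype string differs between pandas versions, so the check uses IntervalDtype instead.

```python
import pandas as pd

# A list of Interval objects now infers an interval dtype, not object (GH 23563)
ser = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)])
print(ser.dtype)

# Equality against an Interval scalar is element-wise (GH 24112)
matches = ser == pd.Interval(0, 1)
print(matches.tolist())  # [True, False]
```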

Indexing#

Bug in assignment using a reverse slicer (GH 26939)
Bug in DataFrame.explode() duplicating the frame when the index contains duplicates (GH 28010)
Bug in reindexing a PeriodIndex() with another type of index that contained a Period (GH 28323, GH 28337)
Fixed assignment of a column via .loc with a NumPy non-ns datetime type (GH 27395)
Bug in Float64Index.astype() where np.inf was not handled properly when casting to an integer dtype (GH 28475)
Index.union() could fail when the left contained duplicates (GH 28257)
Bug where indexing with .loc did not work when the index was a CategoricalIndex with non-string categories (GH 17569, GH 30225)
Index.get_indexer_non_unique() could fail with TypeError in some cases, such as when searching for ints in a string index (GH 28257)
Bug in Float64Index.get_loc() incorrectly raising TypeError instead of KeyError (GH 29189)
Bug in DataFrame.loc() producing an incorrect dtype when setting a Categorical value in a 1-row DataFrame (GH 25495)
Bug in MultiIndex.get_loc() not finding missing values when the input includes missing values (GH 19132)
Bug in Series.__setitem__() incorrectly assigning values with boolean indexer when the length of new data matches the number of True values and new data is not a Series or an np.array (GH 30567)
Bug in indexing with a PeriodIndex incorrectly accepting integers representing years; use e.g. ser.loc["2007"] instead of ser.loc[2007] (GH 30763)
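The PeriodIndex change (GH 30763) can be sketched as follows, assuming pandas 1.0 or later; the monthly index is an arbitrary example. A year given as a string still selects by partial string, while a bare integer is no longer treated as a year.

```python
import pandas as pd

idx = pd.period_range("2007-01", periods=3, freq="M")
ser = pd.Series([1, 2, 3], index=idx)

# Partial-string indexing with the year as a string selects all of 2007
print(ser.loc["2007"].sum())  # 6

# An integer is no longer interpreted as a year (GH 30763)
try:
    ser.loc[2007]
    integer_rejected = False
except KeyError:
    integer_rejected = True
print(integer_rejected)  # True
```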

Missing#

MultiIndex#

The MultiIndex constructor now verifies that the given sortorder is compatible with the actual lexsort_depth if the verify_integrity parameter is True (the default) (GH 28735)
Series.drop() and MultiIndex.drop() with a MultiIndex now raise an exception if the labels are not in the given level (GH 8594)
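A minimal sketch of the level-aware drop behavior (GH 8594), assuming pandas 1.0 or later. The level names "letter" and "num" and the label "z" are hypothetical values chosen for illustration.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "num"])
ser = pd.Series([10, 20, 30], index=mi)

# Dropping a label that exists in the given level removes every matching row
dropped = ser.drop("a", level="letter")

# A label missing from the level raises KeyError instead of passing silently (GH 8594)
try:
    ser.drop("z", level="letter")
    missing_raised = False
except KeyError:
    missing_raised = True

print(dropped.tolist(), missing_raised)  # [30] True
```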

IO#

read_csv() now accepts binary mode file buffers when using the Python csv engine (GH 23779)
Bug in DataFrame.to_json() where using a Tuple as a column or index value and using orient="columns" or orient="index" would produce invalid JSON (GH 20500)
Improved infinity parsing. read_csv() now interprets Infinity, +Infinity, -Infinity as floating point values (GH 10065)
Bug in DataFrame.to_csv() where values were truncated when the length of na_rep was shorter than the text input data. (GH 25099)
Bug in DataFrame.to_string() where values were truncated using display options instead of outputting the full content (GH 9784)
Bug in DataFrame.to_json() where a datetime column label would not be written out in ISO format with orient="table" (GH 28130)
Bug in DataFrame.to_parquet() where writing to GCS would fail with engine='fastparquet' if the file did not already exist (GH 28326)
Bug in read_hdf() closing stores that it didn’t open when exceptions are raised (GH 28699)
Bug in read_json() where using orient="index" would not maintain the order (GH 28557)
Bug in DataFrame.to_html() where the length of the formatters argument was not verified (GH 28469)
Bug in read_excel() with engine='ods' when the sheet_name argument references a non-existent sheet (GH 27676)
Bug in pandas.io.formats.style.Styler() formatting for floating values not displaying decimals correctly (GH 13257)
Bug in DataFrame.to_html() when using formatters=<list> and max_cols together (GH 25955)
Bug in Styler.background_gradient() not working with the Int64 dtype (GH 28869)
Bug in DataFrame.to_clipboard() which did not work reliably in ipython (GH 22707)
Bug in read_json() where default encoding was not set to utf-8 (GH 29565)
Bug in PythonParser where str and bytes were being mixed when dealing with the decimal field (GH 29650)
read_gbq() now accepts progress_bar_type to display progress bar while the data downloads. (GH 29857)
Bug in pandas.io.json.json_normalize() where a missing value in the location specified by record_path would raise a TypeError (GH 30148)
read_excel() now accepts binary data (GH 15914)
Bug in read_csv() in which encoding handling was limited to just the string utf-16 for the C engine (GH 24130)
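Two of the read_csv() changes above (GH 10065, GH 23779) can be sketched with in-memory buffers, assuming pandas 1.0 or later; the CSV contents are made up for illustration.

```python
import io

import pandas as pd

# read_csv recognises "Infinity", "+Infinity" and "-Infinity" as floats (GH 10065)
text = "value\nInfinity\n+Infinity\n-Infinity\n1.5\n"
df = pd.read_csv(io.StringIO(text))
print(df["value"].tolist())  # [inf, inf, -inf, 1.5]

# The Python engine accepts a binary-mode buffer (GH 23779)
binary_df = pd.read_csv(io.BytesIO(b"a,b\n1,2\n"), engine="python")
print(binary_df)
```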

Plotting#

Bug in Series.plot() not being able to plot boolean values (GH 23719)
Bug in DataFrame.plot() not being able to plot when there are no rows (GH 27758)
Bug in DataFrame.plot() producing incorrect legend markers when plotting multiple series on the same axis (GH 18222)
Bug in DataFrame.plot() when kind='box' and data contains datetime or timedelta data. These types are now automatically dropped (GH 22799)
Bug in DataFrame.plot.line() and DataFrame.plot.area() producing the wrong xlim on the x-axis (GH 27686, GH 25160, GH 24784)
Bug where DataFrame.boxplot() would not accept a color parameter like DataFrame.plot.box() (GH 26214)
Bug in the xticks argument being ignored for DataFrame.plot.bar() (GH 14119)
set_option() now validates that the plot backend provided to 'plotting.backend' implements the backend when the option is set, rather than when a plot is created (GH 28163)
DataFrame.plot() now allows a backend keyword argument, so the backend can be changed within one session (GH 28619).
Bug in color validation incorrectly raising for non-color styles (GH 29122).
DataFrame.plot.scatter() can now plot object and datetime type data (GH 18755, GH 30391)
Bug in DataFrame.hist() where xrot=0 did not work with by and subplots (GH 30288).
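The earlier validation of 'plotting.backend' (GH 28163) can be checked without drawing anything, assuming pandas 1.0 or later. "not_a_real_backend" is a hypothetical module name that is assumed not to be installed; the option keeps its previous value when validation fails.

```python
import pandas as pd

# set_option now validates the backend when the option is set (GH 28163).
# "not_a_real_backend" is a hypothetical, non-existent module name.
try:
    pd.set_option("plotting.backend", "not_a_real_backend")
    rejected = False
except ValueError:
    rejected = True

print(rejected)                           # True
print(pd.get_option("plotting.backend"))  # matplotlib
```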

GroupBy/resample/rolling#

Bug in core.groupby.DataFrameGroupBy.apply() only showing output from a single group when function returns an Index (GH 28652)
Bug in DataFrame.groupby() with multiple groups where an IndexError would be raised if any group contained all NA values (GH 20519)
Bug in Resampler.size() and Resampler.count() returning wrong dtype when used with an empty Series or DataFrame (GH 28427)
Bug in DataFrame.rolling() not allowing for rolling over datetimes when axis=1 (GH 28192)
Bug in DataFrame.rolling() not allowing rolling over multi-index levels (GH 15584).
Bug in DataFrame.rolling() not allowing rolling on monotonic decreasing time indexes (GH 19248).
Bug in DataFrame.groupby() not offering selection by column name when axis=1 (GH 27614)
Bug in core.groupby.DataFrameGroupBy.agg() not being able to use a lambda function with named aggregation (GH 27519)
Bug in DataFrame.groupby() losing column name information when grouping by a categorical column (GH 28787)
Removed the error raised for duplicated input functions in named aggregation in DataFrame.groupby() and Series.groupby(). Previously, an error was raised if the same function was applied to the same column; now this is allowed as long as the newly assigned names are different (GH 28426)
core.groupby.SeriesGroupBy.value_counts() now handles the case where the Grouper makes empty groups (GH 28479)
Bug in core.window.rolling.Rolling.quantile() ignoring interpolation keyword argument when used within a groupby (GH 28779)
Bug in DataFrame.groupby() where any, all, nunique and transform functions would incorrectly handle duplicate column labels (GH 21668)
Bug in core.groupby.DataFrameGroupBy.agg() with timezone-aware datetime64 column incorrectly casting results to the original dtype (GH 29641)
Bug in DataFrame.groupby() when using axis=1 and having a single level columns index (GH 30208)
Bug in DataFrame.groupby() when using nunique on axis=1 (GH 30253)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with multiple list-like q value and integer column names (GH 30289)
Bug in DataFrameGroupBy.pct_change() and SeriesGroupBy.pct_change() causing a TypeError when fill_method is None (GH 30463)
Bug in Rolling.count() and Expanding.count() where the min_periods argument was ignored (GH 26996)
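A sketch of the named-aggregation changes (GH 28426, GH 27519), assuming pandas 1.0 or later. The output names val_min, val_min_again and val_range are hypothetical labels; the point is that the same (column, function) pair may appear twice under different names, and a lambda is accepted.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 5]})

# Same ("val", "min") pair under two output names, plus a lambda
out = df.groupby("key").agg(
    val_min=("val", "min"),
    val_min_again=("val", "min"),
    val_range=("val", lambda s: s.max() - s.min()),
)
print(out)
```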

Reshaping#

Bug in DataFrame.apply() that caused incorrect output with empty DataFrame (GH 28202, GH 21959)
Bug in DataFrame.stack() not handling non-unique indexes correctly when creating MultiIndex (GH 28301)
Bug in pivot_table() not returning correct type float when margins=True and aggfunc='mean' (GH 24893)
Bug in merge_asof() not being able to use datetime.timedelta for the tolerance kwarg (GH 28098)
Bug in merge() not appending suffixes correctly with a MultiIndex (GH 28518)
qcut() and cut() now handle boolean input (GH 20303)
All int dtypes can now be used in merge_asof() with a tolerance value. Previously, every non-int64 type raised an erroneous MergeError (GH 28870).
Better error message in get_dummies() when columns isn’t a list-like value (GH 28383)
Bug in Index.join() that caused infinite recursion error for mismatched MultiIndex name orders. (GH 25760, GH 28956)
Bug in Series.pct_change() where supplying an anchored frequency would throw a ValueError (GH 28664)
Bug where DataFrame.equals() returned True incorrectly in some cases when two DataFrames had the same columns in different orders (GH 28839)
Bug in DataFrame.replace() where the dtype of a non-numeric replacer was not respected (GH 26632)
Bug in melt() where supplying mixed strings and numeric values for id_vars or value_vars would incorrectly raise a ValueError (GH 29718)
Dtypes are now preserved when transposing a DataFrame where each column is the same extension dtype (GH 30091)
Bug in merge_asof() merging on a tz-aware left_index and right_on a tz-aware column (GH 29864)
Improved error message and docstring in cut() and qcut() when labels=True (GH 13318)
Bug in DataFrame.unstack() with a list of levels, where the fill value was not passed through (GH 30740)
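Two of the reshaping fixes (GH 20303, GH 28098) can be sketched as follows, assuming pandas 1.0 or later. The bin edges, the "no"/"yes" labels and the timestamps are illustrative values: with a 2-second tolerance, the first left row matches the right row 1 second earlier, and the second left row finds no match.

```python
import datetime

import pandas as pd

# cut() accepts boolean input (GH 20303)
flags = pd.cut([True, False, True], bins=[-0.5, 0.5, 1.5], labels=["no", "yes"])
print(list(flags))  # ['yes', 'no', 'yes']

# merge_asof() accepts a datetime.timedelta tolerance (GH 28098)
left = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 10:00:03", "2020-01-01 10:00:10"]),
    "x": [1, 2],
})
right = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 10:00:01", "2020-01-01 10:00:02"]),
    "y": [10, 20],
})
merged = pd.merge_asof(left, right, on="time", tolerance=datetime.timedelta(seconds=2))
print(merged)
```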

Sparse#

Bug in SparseDataFrame arithmetic operations incorrectly casting inputs to float (GH 28107)
Bug in DataFrame.sparse returning a Series when there was a column named sparse rather than the accessor (GH 30758)
Fixed operator.xor() with a boolean-dtype SparseArray. Now returns a sparse result, rather than object dtype (GH 31025)
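The SparseArray xor fix (GH 31025) is sketched below, assuming pandas 1.0 or later; the two boolean arrays are arbitrary. The result stays sparse rather than falling back to object dtype.

```python
import operator

import numpy as np
import pandas as pd

a = pd.arrays.SparseArray([True, False, True])
b = pd.arrays.SparseArray([True, True, False])

# xor of two boolean SparseArrays returns a SparseArray (GH 31025)
result = operator.xor(a, b)
print(type(result).__name__)     # SparseArray
print(np.asarray(result).tolist())  # [False, True, True]
```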

ExtensionArray#

Bug in arrays.PandasArray when setting a scalar string (GH 28118, GH 28150).
Bug where nullable integers could not be compared to strings (GH 28930)
Bug where DataFrame constructor raised ValueError with list-like data and dtype specified (GH 30280)

Other#

Trying to set the display.precision, display.max_rows or display.max_columns using set_option() to anything but a None or a positive int will raise a ValueError (GH 23348)
Using DataFrame.replace() with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (GH 27660)
DataFrame.to_csv() and Series.to_csv() now accept a dict as the compression argument, with the key 'method' giving the compression method and the remaining keys passed as additional compression options when the compression method is 'zip' (GH 26023)
Bug in Series.diff() where a boolean series would incorrectly raise a TypeError (GH 17294)
Series.append() will no longer raise a TypeError when passed a tuple of Series (GH 28410)
Fixed a corrupted error message when calling pandas.libs._json.encode() on a 0d array (GH 18878)
Backtick quoting in DataFrame.query() and DataFrame.eval() can now also be used for invalid identifiers, such as names that start with a digit, are Python keywords, or use single-character operators (GH 27017)
Bug in pd.core.util.hashing.hash_pandas_object where arrays containing tuples were incorrectly treated as non-hashable (GH 28969)
Bug in DataFrame.append() that raised IndexError when appending with empty list (GH 28769)
Fixed AbstractHolidayCalendar to return correct results for years after 2030 (it now goes up to 2200) (GH 27790)
Fixed IntegerArray returning inf rather than NaN for operations dividing by 0 (GH 27398)
Fixed pow operations for IntegerArray when the other value is 0 or 1 (GH 29997)
Bug in Series.count() raising if use_inf_as_na is enabled (GH 29478)
Bug in Index where a non-hashable name could be set without raising TypeError (GH 29069)
Bug in DataFrame constructor when passing a 2D ndarray and an extension dtype (GH 12513)
Bug in DataFrame.to_csv() where, when supplied a Series with dtype="string" and an na_rep, the na_rep was truncated to 2 characters (GH 29975)
Bug where DataFrame.itertuples() would incorrectly determine whether or not namedtuples could be used for dataframes of 255 columns (GH 28282)
Handle nested NumPy object arrays in testing.assert_series_equal() for ExtensionArray implementations (GH 30841)
Bug in Index constructor incorrectly allowing 2-dimensional input arrays (GH 13601, GH 27125)
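The backtick quoting change (GH 27017) can be sketched as below, assuming pandas 1.0 or later. The column names "1st" (starts with a digit) and "class" (a Python keyword) are hypothetical examples of otherwise-invalid identifiers.

```python
import pandas as pd

df = pd.DataFrame({"1st": [1, 2, 3], "class": [10, 20, 30]})

# Backticks let query() and eval() refer to names that are not valid identifiers
subset = df.query("`1st` > 1 and `class` < 30")
total = df.eval("`1st` + `class`")

print(subset)
print(total.tolist())  # [11, 22, 33]
```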

Contributors#

A total of 308 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Aaditya Panikath +
Abdullah İhsan Seçer
Abhijeet Krishnan +
Adam J. Stewart
Adam Klaum +
Addison Lynch
Aivengoe +
Alastair James +
Albert Villanova del Moral
Alex Kirko +
Alfredo Granja +
Allen Downey
Alp Arıbal +
Andreas Buhr +
Andrew Munch +
Andy
Angela Ambroz +
Aniruddha Bhattacharjee +
Ankit Dhankhar +
Antonio Andraues Jr +
Arda Kosar +
Asish Mahapatra +
Austin Hackett +
Avi Kelman +
AyowoleT +
Bas Nijholt +
Ben Thayer
Bharat Raghunathan
Bhavani Ravi
Bhuvana KA +
Big Head
Blake Hawkins +
Bobae Kim +
Brett Naul
Brian Wignall
Bruno P. Kinoshita +
Bryant Moscon +
Cesar H +
Chris Stadler
Chris Zimmerman +
Christopher Whelan
Clemens Brunner
Clemens Tolboom +
Connor Charles +
Daniel Hähnke +
Daniel Saxton
Darin Plutchok +
Dave Hughes
David Stansby
DavidRosen +
Dean +
Deepan Das +
Deepyaman Datta
DorAmram +
Dorothy Kabarozi +
Drew Heenan +
Eliza Mae Saret +
Elle +
Endre Mark Borza +
Eric Brassell +
Eric Wong +
Eunseop Jeong +
Eyden Villanueva +
Felix Divo
ForTimeBeing +
Francesco Truzzi +
Gabriel Corona +
Gabriel Monteiro +
Galuh Sahid +
Georgi Baychev +
Gina
GiuPassarelli +
Grigorios Giannakopoulos +
Guilherme Leite +
Guilherme Salomé +
Gyeongjae Choi +
Harshavardhan Bachina +
Harutaka Kawamura +
Hassan Kibirige
Hielke Walinga
Hubert
Hugh Kelley +
Ian Eaves +
Ignacio Santolin +
Igor Filippov +
Irv Lustig
Isaac Virshup +
Ivan Bessarabov +
JMBurley +
Jack Bicknell +
Jacob Buckheit +
Jan Koch
Jan Pipek +
Jan Škoda +
Jan-Philip Gehrcke
Jasper J.F. van den Bosch +
Javad +
Jeff Reback
Jeremy Schendel
Jeroen Kant +
Jesse Pardue +
Jethro Cao +
Jiang Yue
Jiaxiang +
Jihyung Moon +
Jimmy Callin
Jinyang Zhou +
Joao Victor Martinelli +
Joaq Almirante +
John G Evans +
John Ward +
Jonathan Larkin +
Joris Van den Bossche
Josh Dimarsky +
Joshua Smith +
Josiah Baker +
Julia Signell +
Jung Dong Ho +
Justin Cole +
Justin Zheng
Kaiqi Dong
Karthigeyan +
Katherine Younglove +
Katrin Leinweber
Kee Chong Tan +
Keith Kraus +
Kevin Nguyen +
Kevin Sheppard
Kisekka David +
Koushik +
Kyle Boone +
Kyle McCahill +
Laura Collard, PhD +
LiuSeeker +
Louis Huynh +
Lucas Scarlato Astur +
Luiz Gustavo +
Luke +
Luke Shepard +
MKhalusova +
Mabel Villalba
Maciej J +
Mak Sze Chun
Manu NALEPA +
Marc
Marc Garcia
Marco Gorelli +
Marco Neumann +
Martin Winkel +
Martina G. Vilas +
Mateusz +
Matthew Roeschke
Matthew Tan +
Max Bolingbroke
Max Chen +
MeeseeksMachine
Miguel +
MinGyo Jung +
Mohamed Amine ZGHAL +
Mohit Anand +
MomIsBestFriend +
Naomi Bonnin +
Nathan Abel +
Nico Cernek +
Nigel Markey +
Noritada Kobayashi +
Oktay Sabak +
Oliver Hofkens +
Oluokun Adedayo +
Osman +
Oğuzhan Öğreden +
Pandas Development Team +
Patrik Hlobil +
Paul Lee +
Paul Siegel +
Petr Baev +
Pietro Battiston
Prakhar Pandey +
Puneeth K +
Raghav +
Rajat +
Rajhans Jadhao +
Rajiv Bharadwaj +
Rik-de-Kort +
Roei.r
Rohit Sanjay +
Ronan Lamy +
Roshni +
Roymprog +
Rushabh Vasani +
Ryan Grout +
Ryan Nazareth
Samesh Lakhotia +
Samuel Sinayoko
Samyak Jain +
Sarah Donehower +
Sarah Masud +
Saul Shanabrook +
Scott Cole +
SdgJlbl +
Seb +
Sergei Ivko +
Shadi Akiki
Shorokhov Sergey
Siddhesh Poyarekar +
Sidharthan Nair +
Simon Gibbons
Simon Hawkins
Simon-Martin Schröder +
Sofiane Mahiou +
Sourav kumar +
Souvik Mandal +
Soyoun Kim +
Sparkle Russell-Puleri +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి)
Stuart Berg +
Sumanau Sareen
Szymon Bednarek +
Tambe Tabitha Achere +
Tan Tran
Tang Heyi +
Tanmay Daripa +
Tanya Jain
Terji Petersen
Thomas Li +
Tirth Jain +
Tola A +
Tom Augspurger
Tommy Lynch +
Tomoyuki Suzuki +
Tony Lorenzo
Unprocessable +
Uwe L. Korn
Vaibhav Vishal
Victoria Zdanovskaya +
Vijayant +
Vishwak Srinivasan +
WANG Aiyong
Wenhuan
Wes McKinney
Will Ayd
Will Holmgren
William Ayd
William Blan +
Wouter Overmeire
Wuraola Oyewusi +
YaOzI +
Yash Shukla +
Yu Wang +
Yusei Tahara +
alexander135 +
alimcmaster1
avelineg +
bganglia +
bolkedebruin
bravech +
chinhwee +
cruzzoe +
dalgarno +
daniellebrown +
danielplawrence
est271 +
francisco souza +
ganevgv +
garanews +
gfyoung
h-vetinari
hasnain2808 +
ianzur +
jalbritt +
jbrockmendel
jeschwar +
jlamborn324 +
joy-rosie +
kernc
killerontherun1
krey +
lexy-lixinyu +
lucyleeow +
lukasbk +
maheshbapatu +
mck619 +
nathalier
naveenkaushik2504 +
nlepleux +
nrebena
ohad83 +
pilkibun
pqzx +
proost +
pv8493013j +
qudade +
rhstanton +
rmunjal29 +
sangarshanan +
sardonick +
saskakarsi +
shaido987 +
ssikdar1
steveayers124 +
tadashigaki +
timcera +
tlaytongoogle +
tobycheese
tonywu1999 +
tsvikas +
yogendrasoni +
zys5945 +
