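What's new in 1.1.0 (July 28, 2020)

These are the changes in pandas 1.1.0. See the release notes for a full changelog including other versions of pandas.

Enhancements

`KeyError` raised by `loc` now specifies the missing labels

Previously, if labels were missing for a .loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (at most 10 items, display width 80 characters). See GH 34272.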

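All dtypes can now be converted to `StringDtype`

Previously, declaring or converting to StringDtype was in general only possible if the data was already only str or nan-like (GH 31204). StringDtype now works in all situations where astype(str) or dtype=str work.

For example, the following now works: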

In [1]: ser = pd.Series([1, "abc", np.nan], dtype="string")

In [2]: ser
Out[2]: 
0       1
1     abc
2    <NA>
Length: 3, dtype: string

In [3]: ser[0]
Out[3]: '1'

In [4]: pd.Series([1, 2, np.nan], dtype="Int64").astype("string")
Out[4]: 
0       1
1       2
2    <NA>
Length: 3, dtype: string
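
As a further illustration of the conversion described above, the following minimal sketch converts nullable integers to the string dtype and checks that the missing value survives as `<NA>`. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Nullable integers, including one missing value
ints = pd.Series([1, 2, np.nan], dtype="Int64")

# Convert to the dedicated string dtype; the missing value stays missing
strs = ints.astype("string")

# Each non-missing element is now a Python str
print(strs.tolist())
print(strs.isna().tolist())
```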

Non-monotonic PeriodIndex partial string slicing

PeriodIndex now supports partial string slicing for non-monotonic indexes, mirroring DatetimeIndex behavior (GH 31096).

For example:

In [5]: dti = pd.date_range("2014-01-01", periods=30, freq="30D")

In [6]: pi = dti.to_period("D")

In [7]: ser_monotonic = pd.Series(np.arange(30), index=pi)

In [8]: shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))

In [9]: ser = ser_monotonic.iloc[shuffler]

In [10]: ser
Out[10]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
              ..
2015-09-23    21
2015-11-22    23
2016-01-21    25
2016-03-21    27
2016-05-20    29
Freq: D, Length: 30, dtype: int64

In [11]: ser["2014"]
Out[11]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
2014-10-28    10
2014-12-27    12
2014-01-31     1
2014-04-01     3
2014-05-31     5
2014-07-30     7
2014-09-28     9
2014-11-27    11
Freq: D, Length: 13, dtype: int64

In [12]: ser.loc["May 2015"]
Out[12]: 
2015-05-26    17
Freq: D, Length: 1, dtype: int64

Comparing two `DataFrame` or two `Series` and summarizing the differences

We've added DataFrame.compare() and Series.compare() for comparing two DataFrame or two Series (GH 30429).

In [13]: df = pd.DataFrame(
   ....:     {
   ....:         "col1": ["a", "a", "b", "b", "a"],
   ....:         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
   ....:         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
   ....:     },
   ....:     columns=["col1", "col2", "col3"],
   ....: )
   ....: 

In [14]: df
Out[14]: 
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

[5 rows x 3 columns]

In [15]: df2 = df.copy()

In [16]: df2.loc[0, 'col1'] = 'c'

In [17]: df2.loc[2, 'col3'] = 4.0

In [18]: df2
Out[18]: 
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

[5 rows x 3 columns]

In [19]: df.compare(df2)
Out[19]: 
  col1       col3      
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

[2 rows x 4 columns]

See the User Guide for more details.
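
The shape of the comparison result can be adjusted with the `keep_shape` and `align_axis` keywords of `DataFrame.compare()`. The sketch below uses a small made-up frame (`left`, `right` are illustrative names) to show the three layouts side by side.

```python
import pandas as pd

left = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [1.0, 2.0, 3.0]})
right = left.copy()
right.loc[1, "col2"] = 20.0  # introduce a single difference

# Default: only rows and columns that differ, "self"/"other" side by side
diff = left.compare(right)

# Keep every row and column; equal cells are shown as NaN
full = left.compare(right, keep_shape=True)

# Stack "self" and "other" vertically instead of horizontally
stacked = left.compare(right, align_axis=0)

print(diff)
print(full)
print(stacked)
```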

Allow NA in groupby key

We've added a dropna keyword to DataFrame.groupby() and Series.groupby() in order to allow NA values in group keys. Users can set dropna to False if they want to include NA values in groupby keys. The default is dropna=True to keep backwards compatibility (GH 3729).

In [20]: df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]

In [21]: df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

In [22]: df_dropna
Out[22]: 
   a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

[4 rows x 3 columns]

# Default ``dropna`` is set to True, which will exclude NaNs in keys
In [23]: df_dropna.groupby(by=["b"], dropna=True).sum()
Out[23]: 
     a  c
b        
1.0  2  3
2.0  2  5

[2 rows x 2 columns]

# In order to allow NaN in keys, set ``dropna`` to False
In [24]: df_dropna.groupby(by=["b"], dropna=False).sum()
Out[24]: 
     a  c
b        
1.0  2  3
2.0  2  5
NaN  1  4

[3 rows x 2 columns]

The default setting of the dropna argument is True, which means NA values are not included in group keys.
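
The same keyword applies when selecting a single column after grouping. A minimal sketch, using a hypothetical `sales` frame, that contrasts the two settings:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows have no region recorded
sales = pd.DataFrame({
    "region": ["north", np.nan, "south", "north", np.nan],
    "amount": [10, 5, 7, 3, 1],
})

# Default (dropna=True): rows with a missing key are left out
without_na = sales.groupby("region")["amount"].sum()

# dropna=False: the missing key forms its own group
with_na = sales.groupby("region", dropna=False)["amount"].sum()

print(without_na)
print(with_na)
```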

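Sorting with keys

We've added a key argument to the DataFrame and Series sorting methods, including DataFrame.sort_values(), DataFrame.sort_index(), Series.sort_values(), and Series.sort_index(). The key can be any callable function, which is applied column-by-column to each column used for sorting before sorting is performed (GH 27237). See sort_values with keys and sort_index with keys for more information.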

In [25]: s = pd.Series(['C', 'a', 'B'])

In [26]: s
Out[26]: 
0    C
1    a
2    B
Length: 3, dtype: object

In [27]: s.sort_values()
Out[27]: 
2    B
0    C
1    a
Length: 3, dtype: object

Note how this is sorted with capital letters first. If we apply the Series.str.lower() method, we get

In [28]: s.sort_values(key=lambda x: x.str.lower())
Out[28]: 
1    a
2    B
0    C
Length: 3, dtype: object

When applied to a DataFrame, the key is applied per-column to all columns, or to a subset if by is specified, e.g.

In [29]: df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
   ....:                    'b': [1, 2, 3, 4, 5, 6]})
   ....: 

In [30]: df
Out[30]: 
   a  b
0  C  1
1  C  2
2  a  3
3  a  4
4  B  5
5  B  6

[6 rows x 2 columns]

In [31]: df.sort_values(by=['a'], key=lambda col: col.str.lower())
Out[31]: 
   a  b
2  a  3
3  a  4
4  B  5
5  B  6
0  C  1
1  C  2

[6 rows x 2 columns]

For more details, see the examples and documentation in DataFrame.sort_values(), Series.sort_values(), and sort_index().
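
The examples above cover sort_values(); the key argument works the same way for sort_index(), where the callable receives the Index. A minimal sketch with made-up labels:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["b", "C", "a"])

# Default ordering places uppercase labels before lowercase ones
default_order = s.sort_index()

# The key callable receives the Index and returns the values to sort by
folded_order = s.sort_index(key=lambda idx: idx.str.lower())

print(default_order.index.tolist())
print(folded_order.index.tolist())
```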

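Fold argument support in Timestamp constructor

Timestamp now supports the keyword-only fold argument according to PEP 495, similar to the parent datetime.datetime class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (GH 25057, GH 31338). Support is limited to dateutil timezones, as pytz doesn't support fold.

For example: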

In [32]: ts = pd.Timestamp("2019-10-27 01:30:00+00:00")

In [33]: ts.fold
Out[33]: 0

In [34]: ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
   ....:                   tz="dateutil/Europe/London", fold=1)
   ....: 

In [35]: ts
Out[35]: Timestamp('2019-10-27 01:30:00+0000', tz='dateutil//usr/share/zoneinfo/Europe/London')

For more on working with fold, see the Fold subsection in the user guide.
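
To see what fold changes in practice, the sketch below builds both occurrences of the ambiguous wall time 01:30 on the day the clocks go back in London and measures the gap between them. It assumes the dateutil time zone data for Europe/London is available; the variable names are illustrative.

```python
import pandas as pd

# 2019-10-27 01:30 occurs twice in London: once in BST, once in GMT
wall_time = dict(year=2019, month=10, day=27, hour=1, minute=30,
                 tz="dateutil/Europe/London")

first = pd.Timestamp(fold=0, **wall_time)   # first occurrence
second = pd.Timestamp(fold=1, **wall_time)  # second occurrence

# Same wall-clock reading, different instants
gap = second - first
print(first, second, gap)
```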

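Parsing timezone-aware format with different timezones in `to_datetime`

to_datetime() now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones and then converting them to UTC by setting utc=True. This returns a DatetimeIndex with timezone at UTC, as opposed to an Index with object dtype if utc=True is not set (GH 32792).

For example: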

In [36]: tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
   ....:            "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
   ....: 

In [37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
Out[37]: 
DatetimeIndex(['2010-01-01 11:00:00+00:00', '2010-01-01 13:00:00+00:00',
               '2010-01-01 09:00:00+00:00', '2010-01-01 08:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[37]:
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
       2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
      dtype='object')
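
Once the values are normalized to UTC, the result is an ordinary timezone-aware DatetimeIndex, so it can be converted to any single zone afterwards. A minimal sketch (the target zone Europe/Berlin is just an example, and assumes time zone data is installed):

```python
import pandas as pd

stamps = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100"]

# Mixed offsets are parsed and normalized to UTC in one step
as_utc = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S %z", utc=True)

# The instants are unchanged by converting to another zone
in_berlin = as_utc.tz_convert("Europe/Berlin")

print(as_utc)
print(in_berlin)
```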

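`Grouper` and `resample` now support the arguments `origin` and `offset`

Grouper and DataFrame.resample() now support the arguments origin and offset. They let the user control the timestamp on which to adjust the grouping. (GH 31809)

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like 30D) or that divide a day evenly (like 90s or 1min). But it can create inconsistencies with some frequencies that do not meet this criterion. To change this behavior you can now specify a fixed timestamp with the argument origin.

Two arguments are now deprecated (more information in the documentation of DataFrame.resample()):

base should be replaced by offset.
loffset should be replaced by directly adding an offset to the index of the DataFrame after it has been resampled.

Small example of the use of origin: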

In [38]: start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'

In [39]: middle = '2000-10-02 00:00:00'

In [40]: rng = pd.date_range(start, end, freq='7min')

In [41]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [42]: ts
Out[42]: 
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, Length: 9, dtype: int64

Resample with the default behavior 'start_day' (origin is 2000-10-01 00:00:00):

In [43]: ts.resample('17min').sum()
Out[43]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, Length: 5, dtype: int64

In [44]: ts.resample('17min', origin='start_day').sum()
Out[44]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, Length: 5, dtype: int64

Resample using a fixed origin:

In [45]: ts.resample('17min', origin='epoch').sum()
Out[45]: 
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, Length: 5, dtype: int64

In [46]: ts.resample('17min', origin='2000-01-01').sum()
Out[46]: 
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, Length: 4, dtype: int64

If needed, you can adjust the bins with the argument offset (a Timedelta), which is added to the default origin.

For a full example, see: Use origin or offset to adjust the start of the bins.
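
As a sketch of the offset argument, the series from the origin example can be resampled so that the bins line up with its first observation at 23:30. The offset value '23h30min' is chosen for this particular series.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# Shift the default origin (start of day) by 23h30min, so the first bin
# starts exactly at the first observation
shifted = ts.resample("17min", offset="23h30min").sum()
print(shifted)
```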

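`fsspec` now used for filesystem handling

For reading from and writing to filesystems other than local, and for reading from HTTP(S), the optional dependency fsspec will be used to dispatch operations (GH 33452). This gives unchanged functionality for S3 and GCS storage, which were already supported, and also adds support for several other storage implementations such as Azure Data Lake and Blob, SSH, FTP, Dropbox and GitHub. For docs and capabilities, see the fsspec docs.

The existing capability to interface with S3 and GCS will be unaffected by this change, as fsspec will still bring in the same packages as before.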

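Other enhancements

Compatibility with matplotlib 3.3.0 (GH 34850)
IntegerArray.astype() now supports datetime64 dtype (GH 32538)
IntegerArray now implements the sum operation (GH 33172)
Added pandas.errors.InvalidIndexError (GH 34570).
Added DataFrame.value_counts() (GH 5377)
Added a pandas.api.indexers.FixedForwardWindowIndexer() class to support forward-looking windows during rolling operations.
Added a pandas.api.indexers.VariableOffsetWindowIndexer() class to support rolling operations with non-fixed offsets (GH 34994).
describe() now includes a datetime_is_numeric keyword to control how datetime columns are summarized (GH 30164, GH 34798).
Styler may now render CSS more efficiently where multiple cells have the same styling (GH 30876).
highlight_null() now accepts a subset argument (GH 31345).
When writing directly to a sqlite connection, DataFrame.to_sql() now supports the multi method (GH 29921).
pandas.errors.OptionError is now exposed in pandas.errors (GH 27553).
Added api.extensions.ExtensionArray.argmax() and api.extensions.ExtensionArray.argmin() (GH 24382).
timedelta_range() will now infer a frequency when passed start, end, and periods (GH 32377).
Positional slicing on an IntervalIndex now supports slices with step > 1 (GH 31658).
Series.str now has a fullmatch method that matches a regular expression against the entire string in each row of the Series, similar to re.fullmatch (GH 32806).
DataFrame.sample() will now also allow array-like and BitGenerator objects to be passed to random_state as seeds (GH 32503).
Index.union() will now raise RuntimeWarning for MultiIndex objects if the objects inside are unsortable. Pass sort=False to suppress this warning (GH 33015).
Added Series.dt.isocalendar() and DatetimeIndex.isocalendar(), which return a DataFrame with year, week, and day calculated according to the ISO 8601 calendar (GH 33206, GH 34392).
The DataFrame.to_feather() method now supports additional keyword arguments (e.g. to set the compression) that were added in pyarrow 0.17 (GH 33422).
cut() now accepts the parameter ordered, with default ordered=True. If ordered=False and no labels are provided, an error will be raised (GH 33141).
DataFrame.to_csv(), DataFrame.to_pickle(), and DataFrame.to_json() now support passing a dict of compression arguments when using the gzip and bz2 protocols. This can be used to set a custom compression level, e.g., df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}) (GH 33196).
melt() has gained an ignore_index argument (default True) that, if set to False, prevents the method from dropping the index (GH 17440).
Series.update() now accepts objects that can be coerced to a Series, such as dict and list, mirroring the behavior of DataFrame.update() (GH 33215).
DataFrameGroupBy.transform() and DataFrameGroupBy.aggregate() have gained engine and engine_kwargs arguments that support executing functions with Numba (GH 32854, GH 33388).
Resampler.interpolate() now supports the SciPy interpolation method scipy.interpolate.CubicSpline as method cubicspline (GH 33670).
DataFrameGroupBy and SeriesGroupBy now implement the sample method for doing random sampling within groups (GH 31775).
DataFrame.to_numpy() now supports the na_value keyword to control the NA sentinel in the output array (GH 33820).
Added api.extensions.ExtensionArray.equals to the extension array interface, similar to Series.equals() (GH 27081).
The minimum supported dta version has increased to 105 in read_stata() and StataReader (GH 26667).
to_stata() supports compression using the compression keyword argument. Compression can either be inferred or explicitly set using a string or a dictionary containing both the method and any additional arguments that are passed to the compression library. Compression was also added to the low-level Stata-file writers StataWriter, StataWriter117, and StataWriterUTF8 (GH 26599).
HDFStore.put() now accepts a track_times parameter. This parameter is passed to the create_table method of PyTables (GH 32682).
Series.plot() and DataFrame.plot() now accept xlabel and ylabel parameters to present labels on the x and y axes (GH 9093).
Made Rolling and Expanding iterable (GH 11704).
Made option_context a contextlib.ContextDecorator, which allows it to be used as a decorator over an entire function (GH 34253).
DataFrame.to_csv() and Series.to_csv() now accept an errors argument (GH 22610).
DataFrameGroupBy.transform() now allows func to be pad, backfill and cumcount (GH 31269).
read_json() now accepts an nrows parameter (GH 33916).
DataFrame.hist(), Series.hist(), core.groupby.DataFrameGroupBy.hist(), and core.groupby.SeriesGroupBy.hist() have gained the legend argument. Set it to True to show a legend in the histogram (GH 6279).
concat() and append() now preserve extension dtypes; for example, combining a nullable integer column with a numpy integer column will no longer result in object dtype but preserve the integer dtype (GH 33607, GH 34339, GH 34095).
read_gbq() now allows disabling the progress bar (GH 33360).
read_gbq() now supports the max_results keyword argument from pandas-gbq (GH 34639).
DataFrame.cov() and Series.cov() now support a new parameter ddof to support delta degrees of freedom, as in the corresponding numpy methods (GH 34611).
The col_space parameter of DataFrame.to_html() and DataFrame.to_string() now accepts a list or dict to change the width of only some specific columns (GH 28917).
DataFrame.to_excel() can now also write OpenOffice spreadsheet (.ods) files (GH 27222).
explode() now accepts ignore_index to reset the index, similar to pd.concat() or DataFrame.sort_values() (GH 34932).
DataFrame.to_markdown() and Series.to_markdown() now accept an index argument as an alias for tabulate's showindex (GH 32667).
read_csv() now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable Boolean dtype (GH 34859).
ExponentialMovingWindow now supports a times argument that allows mean to be calculated with observations spaced by the timestamps in times (GH 34839).
DataFrame.agg() and Series.agg() now accept named aggregation for renaming the output columns/indexes (GH 26513).
compute.use_numba now exists as a configuration option that utilizes the numba engine when available (GH 33966, GH 35374).
Series.plot() now supports asymmetric error bars. Previously, if Series.plot() received a "2xN" array with error values for yerr and/or xerr, the left/lower values (first row) were mirrored, while the right/upper values (second row) were ignored. Now, the first row represents the left/lower error values and the second row the right/upper error values (GH 9536).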
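
A few of the additions listed above can be tried without any optional dependencies. The following is a minimal sketch, with made-up data, of Series.str.fullmatch, DataFrame.value_counts() and the ignore_index argument of melt():

```python
import pandas as pd

codes = pd.Series(["abc", "abcd", "xabc"])
whole = codes.str.fullmatch("abc")   # the entire string must match
prefix = codes.str.match("abc")      # only the start must match

# Count unique rows of a DataFrame
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
counts = df.value_counts()

# Keep the original index when reshaping from wide to long
wide = pd.DataFrame({"k": [1, 2], "v": [3, 4]}, index=["r1", "r2"])
long = wide.melt(id_vars="k", ignore_index=False)

print(whole.tolist(), prefix.tolist())
print(counts)
print(long)
```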

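Notable bug fixes

These are bug fixes that might have notable behavior changes.

`MultiIndex.get_indexer` interprets `method` argument correctly

This restores the behavior of MultiIndex.get_indexer() with method='backfill' or method='pad' to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples, and padding or backfilling is done with respect to the ordering of these lists of tuples (GH 29896).

As an example of this, given: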

In [47]: df = pd.DataFrame({
   ....:     'a': [0, 0, 0, 0],
   ....:     'b': [0, 2, 3, 4],
   ....:     'c': ['A', 'B', 'C', 'D'],
   ....: }).set_index(['a', 'b'])
   ....: 

In [48]: mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
      c
0 -1  A
   0  A
   1  D
   3  A
   4  A
   5  C

pandas <0.23, >= 1.1.0

In [49]: df.reindex(mi_2, method='backfill')
Out[49]: 
        c
0 -1    A
   0    A
   1    B
   3    C
   4    D
   5  NaN

[6 rows x 1 columns]

And the differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
        c
0 -1  NaN
   0  NaN
   1    D
   3  NaN
   4    A
   5    C

pandas < 0.23, >= 1.1.0

In [50]: df.reindex(mi_2, method='pad')
Out[50]: 
        c
0 -1  NaN
   0    A
   1    A
   3    C
   4    D
   5    D

[6 rows x 1 columns]

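Failed label-based lookups always raise `KeyError`

Label lookups series[key], series.loc[key] and frame.loc[key] used to raise either KeyError or TypeError depending on the type of key and the type of Index. These now consistently raise KeyError (GH 31867).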

In [51]: ser1 = pd.Series(range(3), index=[0, 1, 2])

In [52]: ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

Similarly, DataFrame.at() and Series.at() will raise a TypeError instead of a ValueError if an incompatible key is passed, and a KeyError if a missing key is passed, matching the behavior of .loc[] (GH 31722).
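
One practical consequence of the consistent KeyError is that a single except clause is enough to guard a label lookup. A minimal sketch, where `lookup` is a hypothetical helper and not part of pandas:

```python
import pandas as pd

def lookup(ser, key, default=None):
    """Return ser.loc[key], or default when the label is missing."""
    try:
        return ser.loc[key]
    except KeyError:
        # Covers missing labels of any type, including labels whose type
        # does not match the index
        return default

ser1 = pd.Series(range(3), index=[0, 1, 2])
ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

results = [
    lookup(ser1, 1),            # present label
    lookup(ser1, 1.5, -1),      # float key on an integer index
    lookup(ser1, "foo", -1),    # string key on an integer index
    lookup(ser2, 1, -1),        # integer key on a DatetimeIndex
]
print(results)
```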

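Failed integer lookups on `MultiIndex` raise `KeyError`

Indexing with integers on a MultiIndex whose first level has integer dtype incorrectly failed to raise KeyError when one or more of those integer keys was not present in the first level of the index (GH 33539).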

In [53]: idx = pd.Index(range(4))

In [54]: dti = pd.date_range("2000-01-03", periods=3)

In [55]: mi = pd.MultiIndex.from_product([idx, dti])

In [56]: ser = pd.Series(range(len(mi)), index=mi)

Previous behavior:

In [5]: ser[[5]]
Out[5]: Series([], dtype: int64)

New behavior:

In [5]: ser[[5]]
...
KeyError: '[5] not in index'

`DataFrame.merge()` preserves right frame's row order

DataFrame.merge() now preserves the right frame's row order when executing a right merge (GH 27453).

In [57]: left_df = pd.DataFrame({'animal': ['dog', 'pig'],
   ....:                        'max_speed': [40, 11]})
   ....: 

In [58]: right_df = pd.DataFrame({'animal': ['quetzal', 'pig'],
   ....:                         'max_speed': [80, 11]})
   ....: 

In [59]: left_df
Out[59]: 
  animal  max_speed
0    dog         40
1    pig         11

[2 rows x 2 columns]

In [60]: right_df
Out[60]: 
    animal  max_speed
0  quetzal         80
1      pig         11

[2 rows x 2 columns]

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
    animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

In [61]: left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
Out[61]: 
    animal  max_speed
0  quetzal         80
1      pig         11

[2 rows x 2 columns]

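Assignment to multiple columns of a DataFrame when some columns do not exist

Previously, assigning to multiple columns of a DataFrame when some of the columns did not exist would assign the values to the last column. Now, new columns are constructed with the right values. (GH 13658)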

In [62]: df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

In [63]: df
Out[63]: 
   a  b
0  0  3
1  1  4
2  2  5

[3 rows x 2 columns]

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
   a  b
0  1  1
1  1  1
2  1  1

New behavior:

In [64]: df[['a', 'c']] = 1

In [65]: df
Out[65]: 
   a  b  c
0  1  3  1
1  1  4  1
2  1  5  1

[3 rows x 3 columns]

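Consistency across `groupby` reductions

Using DataFrame.groupby() with as_index=True and the aggregation nunique would previously include the grouping column(s) in the columns of the result. Now the grouping column(s) only appear in the index, consistent with other reductions. (GH 32579)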

In [66]: df = pd.DataFrame({"a": ["x", "x", "y", "y"], "b": [1, 1, 2, 3]})

In [67]: df
Out[67]: 
   a  b
0  x  1
1  x  1
2  y  2
3  y  3

[4 rows x 2 columns]

Previous behavior:

In [3]: df.groupby("a", as_index=True).nunique()
Out[4]:
   a  b
a
x  1  1
y  1  2

New behavior:

In [68]: df.groupby("a", as_index=True).nunique()
Out[68]: 
   b
a   
x  1
y  2

[2 rows x 1 columns]

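Using DataFrame.groupby() with as_index=False and the function idxmax, idxmin, mad, nunique, sem, skew, or std would previously modify the grouping column. Now the grouping column remains unchanged, consistent with other reductions. (GH 21090, GH 10355)

Previous behavior: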

In [3]: df.groupby("a", as_index=False).nunique()
Out[4]:
   a  b
0  1  1
1  1  2

New behavior:

In [69]: df.groupby("a", as_index=False).nunique()
Out[69]: 
   a  b
0  x  1
1  y  2

[2 rows x 2 columns]

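The method DataFrameGroupBy.size() would previously ignore as_index=False. Now the grouping columns are returned as columns, making the result a DataFrame instead of a Series. (GH 32599)

Previous behavior: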

In [3]: df.groupby("a", as_index=False).size()
Out[4]:
a
x    2
y    2
dtype: int64

New behavior:

In [70]: df.groupby("a", as_index=False).size()
Out[70]: 
   a  size
0  x     2
1  y     2

[2 rows x 2 columns]

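`DataFrameGroupby.agg()` lost results with `as_index=False` when relabeling columns

Previously, DataFrameGroupby.agg() lost the result columns when the as_index option was set to False and the result columns were relabeled. In this case the result values were replaced with the previous index (GH 32240).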

In [71]: df = pd.DataFrame({"key": ["x", "y", "z", "x", "y", "z"],
   ....:                    "val": [1.0, 0.8, 2.0, 3.0, 3.6, 0.75]})
   ....: 

In [72]: df
Out[72]: 
  key   val
0   x  1.00
1   y  0.80
2   z  2.00
3   x  3.00
4   y  3.60
5   z  0.75

[6 rows x 2 columns]

Previous behavior:

In [2]: grouped = df.groupby("key", as_index=False)
In [3]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))
In [4]: result
Out[4]:
     min_val
 0   x
 1   y
 2   z

New behavior:

In [73]: grouped = df.groupby("key", as_index=False)

In [74]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))

In [75]: result
Out[75]: 
  key  min_val
0   x     1.00
1   y     0.80
2   z     0.75

[3 rows x 2 columns]

`apply` and `applymap` on `DataFrame` evaluate the first row/column only once

In [76]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 6]})

In [77]: def func(row):
   ....:     print(row)
   ....:     return row
   ....: 

Previous behavior:

In [4]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[4]:
   a  b
0  1  3
1  2  6

New behavior:

In [78]: df.apply(func, axis=1)
a    1
b    3
Name: 0, Length: 2, dtype: int64
a    2
b    6
Name: 1, Length: 2, dtype: int64
Out[78]: 
   a  b
0  1  3
1  2  6

[2 rows x 2 columns]

Backwards incompatible API changes

Added `check_freq` argument to `testing.assert_frame_equal` and `testing.assert_series_equal`

The check_freq argument was added to testing.assert_frame_equal() and testing.assert_series_equal() in pandas 1.1.0 and defaults to True. testing.assert_frame_equal() and testing.assert_series_equal() now raise AssertionError if the indexes do not have the same frequency. Before pandas 1.1.0, the index frequency was not checked.
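
For tests that compare objects whose indexes hold the same timestamps but carry different freq attributes, the check can be switched off explicitly. A minimal sketch; the way the frequency-less index is built here (from a plain list of timestamps) is just one convenient option:

```python
import pandas as pd
import pandas.testing as tm

with_freq = pd.date_range("2020-01-01", periods=3, freq="D")
# Same timestamps, but built from a list, so no freq is attached
no_freq = pd.DatetimeIndex(list(with_freq))

left = pd.Series([1, 2, 3], index=with_freq)
right = pd.Series([1, 2, 3], index=no_freq)

# With the default check_freq=True the differing freq is reported
try:
    tm.assert_series_equal(left, right)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Opting out compares values and index labels only
tm.assert_series_equal(left, right, check_freq=False)
print(strict_passed)
```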

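Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated (GH 33718, GH 29766, GH 29723, pytables >= 3.4.3). If installed, we now require:

Package	Minimum Version	Required	Changed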
numpy	1.15.4	X	X
pytz	2015.4	X
python-dateutil	2.7.3	X	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2

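For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed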
beautifulsoup4	4.6.0
fastparquet	0.3.2
fsspec	0.7.4
gcsfs	0.6.0	X
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0
openpyxl	2.5.7
pyarrow	0.13.0
pymysql	0.7.1
pytables	3.4.3	X
s3fs	0.4.0	X
scipy	1.2.0	X
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0
pandas-gbq	1.2.0	X

See Dependencies and Optional dependencies for more.

Development changes

The minimum version of Cython is now the most recent bug-fix version (0.29.16) (GH 33334).

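Deprecations

Lookups on a Series with a single-item list containing a slice (e.g. ser[[slice(0, 4)]]) are deprecated and will raise in a future version. Either convert the list to a tuple, or pass the slice directly instead (GH 31333).
DataFrame.mean() and DataFrame.median() with numeric_only=None will include datetime64 and datetime64tz columns in a future version (GH 29941).
Setting values with .loc using a positional slice is deprecated and will raise in a future version. Use .loc with labels or .iloc with positions instead (GH 31840).
DataFrame.to_dict() has deprecated accepting short names for orient and will raise in a future version (GH 32515).
Categorical.to_dense() is deprecated and will be removed in a future version, use np.asarray(cat) instead (GH 32639).
The fastpath keyword in the SingleBlockManager constructor is deprecated and will be removed in a future version (GH 33092).
Providing suffixes as a set in pandas.merge() is deprecated. Provide a tuple instead (GH 33740, GH 34741).
Indexing a Series with a multi-dimensional indexer like [:, None] to return an ndarray now raises a FutureWarning. Convert to a NumPy array before indexing instead (GH 27837).
Index.is_mixed() is deprecated and will be removed in a future version, check index.inferred_type directly instead (GH 32922).
Passing any arguments but the first one to read_html() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
Passing any arguments but path_or_buf (the first one) to read_json() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
pandas.api.types.is_categorical() is deprecated and will be removed in a future version; use pandas.api.types.is_categorical_dtype() instead (GH 33385).
Index.get_value() is deprecated and will be removed in a future version (GH 19728).
Series.dt.week and Series.dt.weekofyear are deprecated and will be removed in a future version, use Series.dt.isocalendar().week instead (GH 33595).
DatetimeIndex.week and DatetimeIndex.weekofyear are deprecated and will be removed in a future version, use DatetimeIndex.isocalendar().week instead (GH 33595).
DatetimeArray.week and DatetimeArray.weekofyear are deprecated and will be removed in a future version, use DatetimeArray.isocalendar().week instead (GH 33595).
DateOffset.__call__() is deprecated and will be removed in a future version, use offset + other instead (GH 34171).
apply_index() is deprecated and will be removed in a future version. Use offset + other instead (GH 34580).
DataFrame.tshift() and Series.tshift() are deprecated and will be removed in a future version, use DataFrame.shift() and Series.shift() instead (GH 11631).
Indexing an Index object with a float key is deprecated, and will raise an IndexError in the future. You can manually convert to an integer key instead (GH 34191).
The squeeze keyword in groupby() is deprecated and will be removed in a future version (GH 32380).
The tz keyword in Period.to_timestamp() is deprecated and will be removed in a future version; use per.to_timestamp(...).tz_localize(tz) instead (GH 34522).
DatetimeIndex.to_perioddelta() is deprecated and will be removed in a future version. Use index - index.to_period(freq).to_timestamp() instead (GH 34853).
DataFrame.melt() accepting a value_name that already exists is deprecated, and will be removed in a future version (GH 34731).
The center keyword in the DataFrame.expanding() function is deprecated and will be removed in a future version (GH 20647).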
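
Several of the deprecations above name a direct replacement. The sketch below shows three of those replacements on made-up values; the deprecated spellings appear only in the comments.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-07-28"]))

# Instead of dates.dt.week / dates.dt.weekofyear
weeks = dates.dt.isocalendar().week

# Instead of period.to_timestamp(tz="UTC")
period = pd.Period("2020-07", freq="M")
stamp = period.to_timestamp().tz_localize("UTC")

# Instead of calling the offset, i.e. pd.offsets.Day(2)(timestamp)
shifted = pd.Timestamp("2020-07-28") + pd.offsets.Day(2)

print(weeks.tolist(), stamp, shifted)
```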

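Performance improvements

Performance improvement in Timedelta constructor (GH 30543)
Performance improvement in Timestamp constructor (GH 30543)
Performance improvement in flex arithmetic ops between DataFrame and Series with axis=0 (GH 31296)
Performance improvement in arithmetic ops between DataFrame and Series with axis=1 (GH 33600)
The internal index method _shallow_copy() now copies cached attributes over to the new index, avoiding creating these again on the new index. This can speed up many operations that depend on creating copies of existing indexes (GH 28584, GH 32640, GH 32669)
Significant performance improvement when creating a DataFrame with sparse values from scipy.sparse matrices using the DataFrame.sparse.from_spmatrix() constructor (GH 32821, GH 32825, GH 32826, GH 32856, GH 32858).
Performance improvement for groupby methods Groupby.first() and Groupby.last() (GH 34178)
Performance improvement in factorize() for nullable (integer and Boolean) dtypes (GH 33064).
Performance improvement when constructing Categorical objects (GH 33921)
Fixed performance regression in pandas.qcut() and pandas.cut() (GH 33921)
Performance improvement in reductions (sum, prod, min, max) for nullable (integer and Boolean) dtypes (GH 30982, GH 33261, GH 33442).
Performance improvement in arithmetic operations between two DataFrame objects (GH 32779)
Performance improvement in RollingGroupby (GH 34052)
Performance improvement in arithmetic operations (sub, add, mul, div) for MultiIndex (GH 34297)
Performance improvement in DataFrame[bool_indexer] when bool_indexer is a list (GH 33924)
Significant performance improvement of io.formats.style.Styler.render() with styles added in various ways, such as io.formats.style.Styler.apply(), io.formats.style.Styler.applymap() or io.formats.style.Styler.bar() (GH 19917)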

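Bug fixes

Categorical

Passing an invalid fill_value to Categorical.take() raises a ValueError instead of TypeError (GH 33660)
Combining a Categorical that has integer categories and contains missing values with a float dtype column in operations such as concat() or append() will now result in a float column instead of an object dtype column (GH 33607)
Bug where merge() was unable to join on non-unique categorical indices (GH 28189)
Bug when passing categorical data to the Index constructor along with dtype=object incorrectly returning a CategoricalIndex instead of an object-dtype Index (GH 32167)
Bug where the Categorical comparison operator __ne__ would incorrectly evaluate to False when either element was missing (GH 32276)
Categorical.fillna() now accepts a Categorical other argument (GH 32420)
Repr of Categorical was not distinguishing between int and str (GH 33676)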

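Datetimelike

Passing an integer dtype other than int64 to np.array(period_index, dtype=...) will now raise TypeError instead of incorrectly using int64 (GH 32255)
Series.to_timestamp() now raises a TypeError if the axis is not a PeriodIndex. Previously an AttributeError was raised (GH 33327)
Series.to_period() now raises a TypeError if the axis is not a DatetimeIndex. Previously an AttributeError was raised (GH 33327)
Period no longer accepts tuples for the freq argument (GH 34658)
Bug in Timestamp where constructing a Timestamp from an ambiguous epoch time and calling the constructor again changed the Timestamp.value property (GH 24329)
DatetimeArray.searchsorted(), TimedeltaArray.searchsorted(), PeriodArray.searchsorted() not recognizing non-pandas scalars and incorrectly raising ValueError instead of TypeError (GH 30950)
Bug in Timestamp where constructing a Timestamp with a dateutil timezone less than 128 nanoseconds before the daylight saving time switchover from winter to summer would result in a nonexistent time (GH 31043)
Bug in Period.to_timestamp(), Period.start_time() with microsecond frequency returning a timestamp one nanosecond earlier than the correct time (GH 31475)
Timestamp raised a confusing error message when year, month or day was missing (GH 31200)
Bug in DatetimeIndex constructor incorrectly accepting bool-dtype inputs (GH 32668)
Bug in DatetimeIndex.searchsorted() not accepting a list or Series as its argument (GH 32762)
Bug where PeriodIndex() raised when passed a Series of strings (GH 26109)
Bug in Timestamp arithmetic when adding or subtracting an np.ndarray with timedelta64 dtype (GH 33296)
Bug in DatetimeIndex.to_period() not inferring the frequency when called with no arguments (GH 33358)
Bug in DatetimeIndex.tz_localize() incorrectly retaining freq in some cases where the original freq is no longer valid (GH 30511)
Bug in DatetimeIndex.intersection() losing freq and timezone in some cases (GH 33604)
Bug in DatetimeIndex.get_indexer() where incorrect output would be returned for mixed datetime-like targets (GH 33741)
Bug in DatetimeIndex addition and subtraction with some types of DateOffset objects incorrectly retaining an invalid freq attribute (GH 33779)
Bug in DatetimeIndex where setting the freq attribute on an index could silently change the freq attribute on another index viewing the same data (GH 33552)
DataFrame.min() and DataFrame.max() were not returning consistent results with Series.min() and Series.max() when called on objects initialized with empty pd.to_datetime()
Bug in DatetimeIndex.intersection() and TimedeltaIndex.intersection() with results not having the correct name attribute (GH 33904)
Bug in DatetimeArray.__setitem__(), TimedeltaArray.__setitem__(), PeriodArray.__setitem__() incorrectly allowing values with int64 dtype to be silently cast (GH 33717)
Bug in subtracting TimedeltaIndex from Period incorrectly raising TypeError in some cases where it should succeed and IncompatibleFrequency in some cases where it should raise TypeError (GH 33883)
Bug in constructing a Series or Index from a read-only NumPy array with non-ns resolution, which converted to object dtype instead of coercing to datetime64[ns] dtype when within the timestamp bounds (GH 34843).
The freq keyword in Period, date_range(), period_range(), pd.tseries.frequencies.to_offset() no longer allows tuples, pass as string instead (GH 34703)
Bug in DataFrame.append() where appending a Series containing a scalar tz-aware Timestamp to an empty DataFrame resulted in an object column instead of datetime64[ns, tz] dtype (GH 35038)
OutOfBoundsDatetime issues an improved error message when a timestamp is outside the implementation bounds (GH 32967)
Bug in AbstractHolidayCalendar.holidays() when no rules were defined (GH 31415)
Bug in Tick comparisons raising TypeError when comparing against timedelta-like objects (GH 34088)
Bug in Tick multiplication raising TypeError when multiplying by a float (GH 34486)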

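Timedelta

Bug in constructing a Timedelta with a high precision integer that would round the Timedelta components (GH 31354)
Bug in dividing np.nan or None by Timedelta incorrectly returning NaT (GH 31869)
Timedelta now understands µs as an identifier for microsecond (GH 32899)
Timedelta string representation now includes nanoseconds, when nanoseconds are non-zero (GH 9309)
Bug in comparing a Timedelta object against an np.ndarray with timedelta64 dtype incorrectly viewing all entries as unequal (GH 33441)
Bug in timedelta_range() that produced an extra point in an edge case (GH 30353, GH 33498)
Bug in DataFrame.resample() that produced an extra point in an edge case (GH 30353, GH 13022, GH 33498)
Bug in DataFrame.resample() that ignored the loffset argument when dealing with timedelta (GH 7687, GH 33498)
Bug in Timedelta and pandas.to_timedelta() that ignored the unit argument for string input (GH 12136)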

Timezones

Bug in to_datetime() with infer_datetime_format=True where timezone names (e.g. UTC) would not be parsed correctly (GH 33133)

Numeric

Bug in DataFrame.floordiv() with axis=0 not treating division-by-zero like Series.floordiv() (GH 31271)
Bug in to_numeric() with string argument "uint64" and errors="coerce" silently failing (GH 32394)
Bug in to_numeric() with downcast="unsigned" failing for empty data (GH 32493)
Bug in DataFrame.mean() with numeric_only=False and either datetime64 dtype or PeriodDtype column incorrectly raising TypeError (GH 32426)
Bug in DataFrame.count() with level="foo" and index level "foo" containing NaNs causing a segmentation fault (GH 21824)
Bug in DataFrame.diff() with axis=1 returning incorrect results with mixed dtypes (GH 32995)
Bug in DataFrame.corr() and DataFrame.cov() raising when handling nullable integer columns with pandas.NA (GH 33803)
Bug in arithmetic operations between DataFrame objects with non-overlapping columns with duplicate labels causing an infinite loop (GH 35194)
Bug in DataFrame and Series addition and subtraction between object-dtype objects and datetime64 dtype objects (GH 33824)
Bug in Index.difference() giving incorrect results when comparing a Float64Index and an object Index (GH 35217)
Bug in DataFrame reductions (e.g. df.min(), df.max()) with ExtensionArray dtypes (GH 34520, GH 32651)
Series.interpolate() and DataFrame.interpolate() now raise a ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill', or if limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill' (GH 34746)

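Conversion

Bug in Series construction from a NumPy array with big-endian datetime64 dtype (GH 29684)
Bug in Timedelta construction with a large nanoseconds keyword value (GH 32402)
Bug in DataFrame construction where sets would be duplicated rather than raising (GH 32582)
The DataFrame constructor no longer accepts a list of DataFrame objects. Because of changes to NumPy, DataFrame objects are now consistently treated as 2D objects, so a list of DataFrame objects is considered 3D, and is no longer acceptable for the DataFrame constructor (GH 32289).
Bug in DataFrame when initializing a frame with lists and assigning columns with a nested list for MultiIndex (GH 32173)
Improved error message for invalid construction of a list when creating a new index (GH 35190)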

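Strings

Bug in the astype() method when converting "string" dtype data to nullable integer dtype (GH 32450).
Fixed issue where taking min or max of a StringArray or Series with StringDtype type would raise (GH 31746)
Bug in Series.str.cat() returning NaN output when other had Index type (GH 33425)
pandas.api.types.is_string_dtype() no longer incorrectly identifies categorical series as string.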

Interval

Bug in IntervalArray incorrectly allowing the underlying data to be changed when setting values (GH 32782)

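Indexing

DataFrame.xs() now raises a TypeError if a level keyword is supplied and the axis is not a MultiIndex. Previously an AttributeError was raised (GH 33610)
Bug in slicing on a DatetimeIndex with a partial timestamp dropping high-resolution indices near the end of a year, quarter, or month (GH 31064)
Bug in PeriodIndex.get_loc() treating higher-resolution strings differently from PeriodIndex.get_value() (GH 31172)
Bug in Series.at() and DataFrame.at() not matching .loc behavior when looking up an integer in a Float64Index (GH 31329)
Bug in PeriodIndex.is_monotonic() incorrectly returning True when containing leading NaT entries (GH 31437)
Bug in DatetimeIndex.get_loc() raising KeyError with the converted-integer key instead of the user-passed key (GH 31425)
Bug in Series.xs() incorrectly returning Timestamp instead of datetime64 in some object-dtype cases (GH 31630)
Bug in DataFrame.iat() incorrectly returning Timestamp instead of datetime in some object-dtype cases (GH 32809)
Bug in DataFrame.at() when either columns or index is non-unique (GH 33041)
Bug in Series.loc() and DataFrame.loc() when indexing with an integer key on an object-dtype Index that is not all-integers (GH 31905)
Bug in DataFrame.iloc.__setitem__() on a DataFrame with duplicate columns incorrectly setting values for all matching columns (GH 15686, GH 22036)
Bug in DataFrame.loc() and Series.loc() with a DatetimeIndex, TimedeltaIndex, or PeriodIndex incorrectly allowing lookups of non-matching datetime-like dtypes (GH 32650)
Bug in Series.__getitem__() indexing with non-standard scalars, e.g. np.dtype (GH 32684)
Bug in Index constructor where an unhelpful error message was raised for NumPy scalars (GH 33017)
Bug in DataFrame.lookup() incorrectly raising an AttributeError when frame.index or frame.columns is not unique; this will now raise a ValueError with a helpful error message (GH 33041)
Bug in Interval where a Timedelta could not be added to or subtracted from a Timestamp interval (GH 32023)
Bug in DataFrame.copy() not invalidating _item_cache after copy, which caused post-copy value updates to not be reflected (GH 31784)
Fixed regression in DataFrame.loc() and Series.loc() throwing an error when a datetime64[ns, tz] value is provided (GH 32395)
Bug in Series.__getitem__() with an integer key and a MultiIndex with leading integer level failing to raise KeyError if the key is not present in the first level (GH 33355)
Bug in DataFrame.iloc() when slicing a single column DataFrame with ExtensionDtype (e.g. df.iloc[:, :1]) returning an invalid result (GH 32957)
Bug in DatetimeIndex.insert() and TimedeltaIndex.insert() causing index freq to be lost when setting an element into an empty Series (GH 33573)
Bug in Series.__setitem__() with an IntervalIndex and a list-like key of integers (GH 33473)
Bug in Series.__getitem__() allowing missing labels with np.ndarray, Index, Series indexers but not list; these now all raise KeyError (GH 33646)
Bug in DataFrame.truncate() and Series.truncate() where the index was assumed to be monotone increasing (GH 33756)
Indexing with a list of strings representing datetimes failed on DatetimeIndex or PeriodIndex (GH 11278)
Bug in Series.at() when used with a MultiIndex raising an exception on valid inputs (GH 26989)
Bug in DataFrame.loc() with a dictionary of values changing columns with dtype int to float (GH 34573)
Bug in Series.loc() when used with a MultiIndex raising an IndexingError when accessing a None value (GH 34318)
Bug in DataFrame.reset_index() and Series.reset_index() not preserving data types on an empty DataFrame or Series with a MultiIndex (GH 19602)
Bug in Series and DataFrame indexing with a time key on a DatetimeIndex with NaT entries (GH 35114)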

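Missing

Calling fillna() on an empty Series now correctly returns a shallow copied object. The behaviour is now consistent with Index, DataFrame and a non-empty Series (GH 32543).
Bug in Series.replace() when argument to_replace is of type dict/list and is used on a Series containing <NA> was raising a TypeError. The method now handles this by ignoring <NA> values when doing the comparison for the replacement (GH 32621)
Bug in any() and all() incorrectly returning <NA> for all False or all True values using the nullable Boolean dtype and with skipna=False (GH 33253)
Clarified documentation on interpolate with method=akima. The der parameter must be scalar or None (GH 33426)
DataFrame.interpolate() now uses the correct axis convention. Previously, interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with the methods pad, ffill, bfill and backfill is identical to using these methods with DataFrame.fillna() (GH 12918, GH 29146)
Bug in DataFrame.interpolate() when called on a DataFrame with column names of string type was throwing a ValueError. The method is now independent of the type of the column names (GH 33956)
Passing NA into a format string using format specs will now work. For example, "{:.1f}".format(pd.NA) would previously raise a ValueError, but will now return the string "<NA>" (GH 34740)
Bug in Series.map() not raising on invalid na_action (GH 32815)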
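
The format-string change for pd.NA (GH 34740) means one format spec can be applied to a mix of numbers and missing values. A minimal sketch, where `fmt` is an illustrative helper:

```python
import pandas as pd

def fmt(value):
    """Format a value with one decimal place; pd.NA falls back to its repr."""
    return "{:.1f}".format(value)

rendered = [fmt(v) for v in [1.5, pd.NA, 3.0]]
print(rendered)
```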

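MultiIndex

DataFrame.swaplevel() now raises a TypeError if the axis is not a MultiIndex. Previously an AttributeError was raised (GH 31126)
Bug in DataFrame.loc() when used with a MultiIndex. The returned values were not in the same order as the given inputs (GH 22797)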

In [79]: df = pd.DataFrame(np.arange(4),
   ....:                   index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
   ....: 

# Rows are now ordered as the requested keys
In [80]: df.loc[(['b', 'a'], [2, 1]), :]
Out[80]: 
     0
b 2  3
  1  2
a 2  1
  1  0

[4 rows x 1 columns]

Bug in MultiIndex.intersection() where order was not guaranteed to be preserved when sort=False (GH 31325)
Bug in DataFrame.truncate() dropping MultiIndex names (GH 34564)

In [81]: left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])

In [82]: right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])

# Common elements are now guaranteed to be ordered by the left side
In [83]: left.intersection(right, sort=False)
Out[83]: 
MultiIndex([('b', 2),
            ('a', 1)],
           )

Bug when joining two MultiIndex objects with different columns without specifying level. The return_indexers parameter was ignored (GH 34074)
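
The TypeError described in the first entry of this section can be sketched as follows. The call uses `DataFrame.swaplevel`, which is assumed here to be the method the entry refers to; the frame contents and level names are illustrative only.

```python
import pandas as pd

flat = pd.DataFrame({"x": [1, 2]})  # plain RangeIndex, not a MultiIndex

# Swapping levels on a non-hierarchical axis is reported as a TypeError.
try:
    flat.swaplevel()
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)

# With a two-level MultiIndex the default call swaps the two levels.
nested = pd.DataFrame(
    {"x": [1, 2]},
    index=pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["key", "num"]),
)
swapped = nested.swaplevel()
print(list(swapped.index.names))
```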

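IO#

Passing a set as the names argument to pandas.read_csv(), pandas.read_table(), or pandas.read_fwf() will raise ValueError: Names should be an ordered collection. (GH 34946)
Bug in print-out when display.precision is zero (GH 20359)
Bug in read_json() where integer overflow occurred when the json contained big number strings (GH 30320)
read_csv() will now raise a ValueError when the arguments header and prefix are both not None (GH 27394)
Bug in DataFrame.to_json() raising NotFoundError when path_or_buf was an S3 URI (GH 28375)
Bug in DataFrame.to_parquet() overwriting pyarrow's default for coerce_timestamps; following pyarrow's default allows writing nanosecond timestamps with version="2.0" (GH 31652).
Bug in read_csv() raising TypeError when sep=None was used in combination with the comment keyword (GH 31396)
Bug in HDFStore that caused the dtype of a datetime64 column to be set to int64 when reading, in Python 3, a fixed-format DataFrame written in Python 2 (GH 31750)
read_sas() now handles dates and datetimes larger than Timestamp.max, returning them as datetime.datetime objects (GH 20927)
Bug in DataFrame.to_json() where Timedelta objects would not be serialized correctly with date_format="iso" (GH 28256)
read_csv() will raise a ValueError when the column names passed in parse_dates are missing in the DataFrame (GH 31251)
Bug in read_excel() where a UTF-8 string with a high surrogate would cause a segmentation violation (GH 23809)
Bug in read_csv() causing a file descriptor leak on an empty file (GH 31488)
Bug in read_csv() causing a segfault when there were blank lines between the header and data rows (GH 28071)
Bug in read_csv() raising a misleading exception on a permissions issue (GH 23784)
Bug in read_csv() raising an IndexError when header=None and there were two extra data columns
Bug in read_sas() raising an AttributeError when reading files from Google Cloud Storage (GH 33069)
Bug in DataFrame.to_sql() where an AttributeError was raised when saving an out-of-bounds date (GH 26761)
Bug in read_excel() not correctly handling multiple embedded spaces in OpenDocument text cells (GH 32207)
Bug in read_json() raising TypeError when reading a list of Booleans into a Series (GH 31464)
Bug in pandas.io.json.json_normalize() where the location specified by record_path does not point to an array (GH 26284)
pandas.read_hdf() now has a more explicit error message when loading an unsupported HDF file (GH 9539)
Bug in read_feather() raising an ArrowIOError when reading an s3 or http file path (GH 29055)
Bug in to_excel() not handling the column name render and raising a KeyError (GH 34331)
Bug in execute() raising a ProgrammingError for some DB-API drivers when the SQL statement contained the % character and no parameters were present (GH 34211)
Bug in StataReader() which resulted in categorical variables with different dtypes when reading data using an iterator (GH 31544)
HDFStore.keys() now has an optional include parameter that allows the retrieval of all native HDF5 table names (GH 29916)
TypeError exceptions raised by read_csv() and read_table() were showing as parser_f when an unexpected keyword argument was passed (GH 25648)
Bug in read_excel() for ODS files removing 0.0 values (GH 27222)
Bug in ujson.encode() raising an OverflowError with numbers larger than sys.maxsize (GH 34395)
Bug in HDFStore.append_to_multiple() raising a ValueError when the min_itemsize parameter is set (GH 11238)
create_table() now raises an error when the column argument was not specified in data_columns on input (GH 28156)
read_json() can now read a line-delimited json file from a file URL while lines and chunksize are set.
DataFrame.to_sql() now raises a more explicit ValueError when handling DataFrames with -np.inf entries with MySQL (GH 34431)
Bug where capitalised file extensions were not decompressed by read_* functions (GH 35164)
Bug in read_excel() raising a TypeError when header=None and index_col is given as a list (GH 31783)
Bug in read_excel() where datetime values are used in the header in a MultiIndex (GH 34748)
read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH 34464)
Bug in DataFrame.to_records() incorrectly losing timezone information in timezone-aware datetime64 columns (GH 32535)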
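
To illustrate the first entry of this section (GH 34946), the sketch below passes first a set and then a list as `names` to `pandas.read_csv()`. The in-memory CSV text and the column names are made up for the example.

```python
import io

import pandas as pd

csv_text = "1,2\n3,4\n"

# A set has no defined order, so it is rejected as column names.
try:
    pd.read_csv(io.StringIO(csv_text), names={"a", "b"})
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# An ordered collection such as a list is accepted.
frame = pd.read_csv(io.StringIO(csv_text), names=["a", "b"])
print(frame.columns.tolist())
```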

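Plotting#

DataFrame.plot() for line/bar plots now accepts colors specified by a dictionary (GH 8193).
Bug in DataFrame.plot.hist() where weights did not work for multiple columns (GH 33173)
Bug in DataFrame.boxplot() and DataFrame.plot.boxplot() losing the color attributes of medianprops, whiskerprops, capprops and boxprops (GH 30346)
Bug in DataFrame.hist() where the order of the column argument was ignored (GH 29235)
Bug in DataFrame.plot.scatter() where, when adding multiple plots with different cmap, colorbars always used the first cmap (GH 33389)
Bug in DataFrame.plot.scatter() adding a colorbar to the plot even if the argument c was assigned to a column containing color names (GH 34316)
Bug in pandas.plotting.bootstrap_plot() causing cluttered axes and overlapping labels (GH 34905)
Bug in DataFrame.plot.scatter() causing an error when plotting variable marker sizes (GH 32904)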

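GroupBy/resample/rolling#

Using a pandas.api.indexers.BaseIndexer with count, min, max, median, skew, cov, corr will now return correct results for any monotonic pandas.api.indexers.BaseIndexer descendant (GH 32865)
DataFrameGroupBy.mean() and SeriesGroupBy.mean() (and similarly median(), std() and var()) now raise a TypeError if a non-accepted keyword argument is passed. Previously an UnsupportedFunctionCall was raised (AssertionError if min_count was passed into median()) (GH 31485)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() raising ValueError when the by axis is not sorted, has duplicates, and the applied func does not mutate the passed-in objects (GH 30667)
Bug in DataFrameGroupBy.transform() producing an incorrect result with transformation functions (GH 30918)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() returning a wrong result when grouping by multiple keys of which some were categorical and others not (GH 32494)
Bug in DataFrameGroupBy.count() and SeriesGroupBy.count() causing a segmentation fault when grouped-by columns contain NaNs (GH 32841)
Bug in DataFrame.groupby() and Series.groupby() producing an inconsistent type when aggregating a Boolean Series (GH 32894)
Bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() where a large negative number would be returned when the number of non-null values was below min_count for nullable integer dtypes (GH 32861)
Bug in SeriesGroupBy.quantile() raising on nullable integers (GH 33136)
Bug in DataFrame.resample() where an AmbiguousTimeError would be raised when the resulting timezone-aware DatetimeIndex had a DST transition at midnight (GH 25758)
Bug in DataFrame.groupby() where a ValueError would be raised when grouping by a categorical column with read-only categories and sort=False (GH 33410)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.resample(), and SeriesGroupBy.resample() where subclasses were not preserved (GH 28330)
Bug in SeriesGroupBy.agg() where any column name was previously accepted in the named aggregation of SeriesGroupBy. Now only str and callables are allowed; anything else raises a TypeError (GH 34422)
Bug in DataFrame.groupby() losing the name of the Index when one of the agg keys referenced an empty list (GH 32580)
Bug in Rolling.apply() where center=True was ignored when engine='numba' was specified (GH 34784)
Bug in DataFrame.ewm.cov() raising an AssertionError for MultiIndex inputs (GH 34440)
Bug in core.groupby.DataFrameGroupBy.quantile() raising TypeError for non-numeric types rather than dropping the columns (GH 27892)
Bug in core.groupby.DataFrameGroupBy.transform() where, with func='nunique' and columns of type datetime64, the result would also be of type datetime64 instead of int64 (GH 35109)
Bug in DataFrame.groupby() raising an AttributeError when selecting a column and aggregating with as_index=False (GH 35246).
Bug in DataFrameGroupBy.first() and DataFrameGroupBy.last() raising an unnecessary ValueError when grouping on multiple Categoricals (GH 34951)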
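
Two of the entries above (GH 32861 and GH 31485) can be sketched together on a small frame. The column names, the sample values and the keyword `not_a_real_option` are hypothetical and exist only to trigger the behaviours described.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "val": pd.array([1, 2, None], dtype="Int64"),
    }
)
grouped = df.groupby("key")["val"]

# Group "b" has no non-null values, so min_count=1 cannot be met and the
# result for that group is missing rather than a large negative number.
totals = grouped.sum(min_count=1)
print(totals)

# An unsupported keyword argument is reported as a TypeError.
try:
    grouped.mean(not_a_real_option=True)
    mean_raised = False
except TypeError:
    mean_raised = True
print(mean_raised)
```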

Reshaping#

Bug affecting all numeric and Boolean reduction methods not returning the subclassed data type (GH 25596)
Bug in DataFrame.pivot_table() when only MultiIndexed columns is set (GH 17038)
Bug in DataFrame.unstack() and Series.unstack(), which can now take tuple names in MultiIndexed data (GH 19966)
Bug in DataFrame.pivot_table() when margin is True and only column is defined (GH 31016)
Fixed incorrect error message in DataFrame.pivot() when columns is set to None (GH 30924)
Bug in crosstab() where, when the inputs are two Series with tuple names, the output kept a dummy MultiIndex as columns (GH 18321)
DataFrame.pivot() can now take lists for the index and columns arguments (GH 21425)
Bug in concat() where the resulting indices were not copied when copy=True (GH 29879)
Bug in SeriesGroupBy.aggregate() resulting in aggregations being overwritten when they shared the same name (GH 30880)
Bug where Index.astype() would lose the name attribute when converting from Float64Index to Int64Index, or when casting to an ExtensionArray dtype (GH 32013)
Series.append() will now raise a TypeError when passed a DataFrame or a sequence containing a DataFrame (GH 31413)
DataFrame.replace() and Series.replace() will raise a TypeError if to_replace is not an expected type. Previously, replace would fail silently (GH 18634)
Bug in an inplace operation on a Series that added a column back to the DataFrame from which it was originally dropped (using inplace=True) (GH 30484)
Bug in DataFrame.apply() where the callback was called with a Series parameter even though raw=True was requested (GH 32423)
Bug in DataFrame.pivot_table() losing timezone information when creating a MultiIndex level from a column with timezone-aware dtype (GH 32558)
Bug in concat() where passing a non-dict mapping as objs would raise a TypeError (GH 32863)
DataFrame.agg() now provides a more descriptive SpecificationError message when attempting to aggregate a non-existent column (GH 32755)
Bug in DataFrame.unstack() when MultiIndex columns and MultiIndex rows were used (GH 32624, GH 24729 and GH 28306)
Appending a dictionary to a DataFrame without passing ignore_index=True will raise TypeError: Can only append a dict if ignore_index=True instead of TypeError: Can only append a Series if ignore_index=True or if the Series has a name (GH 30871)
Bug in DataFrame.corrwith(), DataFrame.memory_usage(), DataFrame.dot(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrame.duplicated(), DataFrame.isin(), DataFrame.count(), Series.explode(), Series.asof() and DataFrame.asof() not returning subclassed types (GH 31331)
Bug in concat() not allowing concatenation of a DataFrame and a Series with duplicate keys (GH 33654)
Bug in cut() raising an error when the argument labels contains duplicates (GH 33141)
Ensure only named functions can be used in eval() (GH 32460)
Bug in DataFrame.aggregate() and Series.aggregate() causing a recursive loop in some cases (GH 34224)
Fixed bug in melt() where melting MultiIndex columns with col_level > 0 would raise a KeyError on id_vars (GH 34129)
Bug in Series.where() with an empty Series and an empty cond having non-bool dtype (GH 34592)
Fixed regression where DataFrame.apply() would raise ValueError for elements with S dtype (GH 34529)
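
The list support in DataFrame.pivot() noted above (GH 21425) might be used as in the following sketch. The frame and its column names are invented for illustration.

```python
import pandas as pd

long = pd.DataFrame(
    {
        "year": [2019, 2019, 2020, 2020],
        "region": ["N", "N", "S", "S"],
        "kind": ["a", "b", "a", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Two columns form the row index; one column supplies the column labels.
wide = long.pivot(index=["year", "region"], columns=["kind"], values="value")
print(wide)
```

Passing a list for `index` yields a MultiIndex on the rows, which avoids a separate set_index() step before pivoting.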

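Sparse#

Creating a SparseArray from a timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (GH 32501)
Bug in arrays.SparseArray.from_spmatrix() wrongly reading a scipy sparse matrix (GH 31991)
Bug in Series.sum() with SparseArray raising a TypeError (GH 25777)
Bug where a DataFrame containing an all-sparse SparseArray filled with NaN failed when indexed by a list-like (GH 27781, GH 29563)
The repr of SparseDtype now includes the repr of its fill_value attribute. Previously it used fill_value's string representation (GH 34352)
Bug where an empty DataFrame could not be cast to SparseDtype (GH 33113)
Bug in arrays.SparseArray() returning the incorrect type when indexing a sparse DataFrame with an iterable (GH 34526, GH 34540)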
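
A minimal sketch of the Series.sum() case (GH 25777), assuming a small integer SparseArray with the default fill value; the values are illustrative.

```python
import pandas as pd

# Integer data uses 0 as the default fill value, so only 1 and 2 are stored.
sparse_ser = pd.Series(pd.arrays.SparseArray([1, 0, 0, 2]))

total = sparse_ser.sum()
print(sparse_ser.dtype, total)
```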

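ExtensionArray#

Fixed bug where Series.value_counts() would raise on empty input of Int64 dtype (GH 33317)
Fixed bug in concat() where concatenating DataFrame objects with non-overlapping columns resulted in object-dtype columns rather than preserving the extension dtype (GH 27692, GH 33027)
Fixed bug where StringArray.isna() would return False for NA values when pandas.options.mode.use_inf_as_na was set to True (GH 33655)
Fixed bug where constructing a Series with an EA dtype and an index but no data or scalar data failed (GH 26469)
Fixed bug that caused Series.__repr__() to crash for extension types whose elements are multidimensional arrays (GH 33770).
Fixed bug where Series.update() would raise a ValueError for ExtensionArray dtypes with missing values (GH 33980)
Fixed bug where StringArray.memory_usage() was not implemented (GH 33963)
Fixed bug where DataFrameGroupBy() would ignore the min_count argument for aggregations on nullable Boolean dtypes (GH 34051)
Fixed bug where the DataFrame constructor would fail with dtype='string' (GH 27953, GH 33623)
Bug where a DataFrame column set to a scalar extension type was considered an object type rather than the extension type (GH 34832)
Fixed bug in IntegerArray.astype() to correctly copy the mask as well (GH 34931).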
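
Two of the fixes above (GH 27953 / GH 33623 and GH 33317) can be exercised with a short sketch. The column name and values are assumptions made for the example.

```python
import pandas as pd

# Constructing a DataFrame directly with the nullable string dtype.
df = pd.DataFrame({"name": ["x", None, "z"]}, dtype="string")
print(df.dtypes)
print(df["name"].isna().tolist())

# value_counts() on an empty nullable-integer Series returns an empty result.
empty_counts = pd.Series([], dtype="Int64").value_counts()
print(len(empty_counts))
```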

Other#

Set operations on an object-dtype Index now always return object-dtype results (GH 31401)
Fixed pandas.testing.assert_series_equal() to correctly raise if the left argument is a different subclass with check_series_type=True (GH 32670).
Getting a missing attribute in a DataFrame.query() or DataFrame.eval() string raises the correct AttributeError (GH 32408)
Fixed bug in pandas.testing.assert_series_equal() where dtypes were checked for Interval and ExtensionArray operands when check_dtype was False (GH 32747)
Bug in DataFrame.__dir__() causing a segfault when using Unicode surrogates in a column name (GH 25509)
Bug in DataFrame.equals() and Series.equals() in allowing subclasses to be equal (GH 34402).
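
The object-dtype guarantee for set operations (GH 31401) can be sketched as follows; the index values are illustrative.

```python
import pandas as pd

# Both inputs hold integers but are explicitly stored with object dtype.
left = pd.Index([1, 2, 3], dtype=object)
right = pd.Index([2, 3, 4], dtype=object)

union = left.union(right)
common = left.intersection(right)
print(union.dtype, common.dtype)
```

Neither result is inferred back to an integer dtype, so the dtype of the inputs is kept.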

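Contributors#

A total of 368 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.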

3vts +
A Brooks +
Abbie Popa +
Achmad Syarif Hidayatullah +
Adam W Bagaskarta +
Adrian Mastronardi +
Aidan Montare +
Akbar Septriyan +
Akos Furton +
Alejandro Hall +
Alex Hall +
Alex Itkes +
Alex Kirko
Ali McMaster +
Alvaro Aleman +
Amy Graham +
Andrew Schonfeld +
Andrew Shumanskiy +
Andrew Wieteska +
Angela Ambroz
Anjali Singh +
Anna Daglis
Anthony Milbourne +
Antony Lee +
Ari Sosnovsky +
Arkadeep Adhikari +
Arunim Samudra +
Ashkan +
Ashwin Prakash Nalwade +
Ashwin Srinath +
Atsushi Nukariya +
Ayappan +
Ayla Khan +
Bart +
Bart Broere +
Benjamin Beier Liu +
Benjamin Fischer +
Bharat Raghunathan
Bradley Dice +
Brendan Sullivan +
Brian Strand +
Carsten van Weelden +
Chamoun Saoma +
ChrisRobo +
Christian Chwala
Christopher Whelan
Christos Petropoulos +
Chuanzhu Xu
CloseChoice +
Clément Robert +
CuylenE +
DanBasson +
Daniel Saxton
Danilo Horta +
DavaIlhamHaeruzaman +
Dave Hirschfeld
Dave Hughes
David Rouquet +
David S +
Deepyaman Datta
Dennis Bakhuis +
Derek McCammond +
Devjeet Roy +
Diane Trout
Dina +
Dom +
Drew Seibert +
EdAbati
Emiliano Jordan +
Erfan Nariman +
Eric Groszman +
Erik Hasse +
Erkam Uyanik +
Evan D +
Evan Kanter +
Fangchen Li +
Farhan Reynaldo +
Farhan Reynaldo Hutabarat +
Florian Jetter +
Fred Reiss +
GYHHAHA +
Gabriel Moreira +
Gabriel Tutui +
Galuh Sahid
Gaurav Chauhan +
George Hartzell +
Gim Seng +
Giovanni Lanzani +
Gordon Chen +
Graham Wetzler +
Guillaume Lemaitre
Guillem Sánchez +
HH-MWB +
Harshavardhan Bachina
How Si Wei
Ian Eaves
Iqrar Agalosi Nureyza +
Irv Lustig
Iva Laginja +
JDkuba
Jack Greisman +
Jacob Austin +
Jacob Deppen +
Jacob Peacock +
Jake Tae +
Jake Vanderplas +
James Cobon-Kerr
Jan Červenka +
Jan Škoda
Jane Chen +
Jean-Francois Zinque +
Jeanderson Barros Candido +
Jeff Reback
Jered Dominguez-Trujillo +
Jeremy Schendel
Jesse Farnham
Jiaxiang
Jihwan Song +
Joaquim L. Viegas +
Joel Nothman
John Bodley +
John Paton +
Jon Thielen +
Joris Van den Bossche
Jose Manuel Martí +
Joseph Gulian +
Josh Dimarsky
Joy Bhalla +
João Veiga +
Julian de Ruiter +
Justin Essert +
Justin Zheng
KD-dev-lab +
Kaiqi Dong
Karthik Mathur +
Kaushal Rohit +
Kee Chong Tan
Ken Mankoff +
Kendall Masse
Kenny Huynh +
Ketan +
Kevin Anderson +
Kevin Bowey +
Kevin Sheppard
Kilian Lieret +
Koki Nishihara +
Krishna Chivukula +
KrishnaSai2020 +
Lesley +
Lewis Cowles +
Linda Chen +
Linxiao Wu +
Lucca Delchiaro Costabile +
MBrouns +
Mabel Villalba
Mabroor Ahmed +
Madhuri Palanivelu +
Mak Sze Chun
Malcolm +
Marc Garcia
Marco Gorelli
Marian Denes +
Martin Bjeldbak Madsen +
Martin Durant +
Martin Fleischmann +
Martin Jones +
Martin Winkel
Martina Oefelein +
Marvzinc +
María Marino +
Matheus Cardoso +
Mathis Felardos +
Matt Roeschke
Matteo Felici +
Matteo Santamaria +
Matthew Roeschke
Matthias Bussonnier
Max Chen
Max Halford +
Mayank Bisht +
Megan Thong +
Michael Marino +
Miguel Marques +
Mike Kutzma
Mohammad Hasnain Mohsin Rajan +
Mohammad Jafar Mashhadi +
MomIsBestFriend
Monica +
Natalie Jann
Nate Armstrong +
Nathanael +
Nick Newman +
Nico Schlömer +
Niklas Weber +
ObliviousParadigm +
Olga Lyashevska +
OlivierLuG +
Pandas Development Team
Parallels +
Patrick +
Patrick Cando +
Paul Lilley +
Paul Sanders +
Pearcekieser +
Pedro Larroy +
Pedro Reys
Peter Bull +
Peter Steinbach +
Phan Duc Nhat Minh +
Phil Kirlin +
Pierre-Yves Bourguignon +
Piotr Kasprzyk +
Piotr Niełacny +
Prakhar Pandey
Prashant Anand +
Puneetha Pai +
Quang Nguyễn +
Rafael Jaimes III +
Rafif +
RaisaDZ +
Rakshit Naidu +
Ram Rachum +
Red +
Ricardo Alanis +
Richard Shadrach +
Rik-de-Kort
Robert de Vries
Robin to Roxel +
Roger Erens +
Rohith295 +
Roman Yurchak
Ror +
Rushabh Vasani
Ryan
Ryan Nazareth
SAI SRAVAN MEDICHERLA +
SHUBH CHATTERJEE +
Sam Cohan
Samira-g-js +
Sandu Ursu +
Sang Agung +
SanthoshBala18 +
Sasidhar Kasturi +
SatheeshKumar Mohan +
Saul Shanabrook
Scott Gigante +
Sebastian Berg +
Sebastián Vanrell
Sergei Chipiga +
Sergey +
ShilpaSugan +
Simon Gibbons
Simon Hawkins
Simon Legner +
Soham Tiwari +
Song Wenhao +
Souvik Mandal
Spencer Clark
Steffen Rehberg +
Steffen Schmitz +
Stijn Van Hoey
Stéphan Taljaard
SultanOrazbayev +
Sumanau Sareen
SurajH1 +
Suvayu Ali +
Terji Petersen
Thomas J Fan +
Thomas Li
Thomas Smith +
Tim Swast
Tobias Pitters +
Tom +
Tom Augspurger
Uwe L. Korn
Valentin Iovene +
Vandana Iyer +
Venkatesh Datta +
Vijay Sai Mutyala +
Vikas Pandey
Vipul Rai +
Vishwam Pandya +
Vladimir Berkutov +
Will Ayd
Will Holmgren
William +
William Ayd
Yago González +
Yosuke KOBAYASHI +
Zachary Lawrence +
Zaky Bilfagih +
Zeb Nicholls +
alimcmaster1
alm +
andhikayusup +
andresmcneill +
avinashpancham +
benabel +
bernie gray +
biddwan09 +
brock +
chris-b1
cleconte987 +
dan1261 +
david-cortes +
davidwales +
dequadras +
dhuettenmoser +
dilex42 +
elmonsomiat +
epizzigoni +
fjetter
gabrielvf1 +
gdex1 +
gfyoung
guru kiran +
h-vishal
iamshwin
jamin-aws-ospo +
jbrockmendel
jfcorbett +
jnecus +
kernc
kota matsuoka +
kylekeppler +
leandermaben +
link2xt +
manoj_koneni +
marydmit +
masterpiga +
maxime.song +
mglasder +
moaraccounts +
mproszewska
neilkg
nrebena
ossdev07 +
paihu
pan Jacek +
partev +
patrick +
pedrooa +
pizzathief +
proost
pvanhauw +
rbenes
rebecca-palmer
rhshadrach +
rjfs +
s-scherrer +
sage +
sagungrp +
salem3358 +
saloni30 +
smartswdeveloper +
smartvinnetou +
themien +
timhunderwood +
tolhassianipar +
tonywu1999
tsvikas
tv3141
venkateshdatta1993 +
vivikelapoutre +
willbowditch +
willpeppo +
za +
zaki-indra +
