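What's new in 0.24.0 (January 25, 2019)

Warning

The 0.24.x series of releases will be the last to support Python 2. Future feature releases will support Python 3 only. See Dropping Python 2.7 for more details.

This is a major release from 0.23.4 and includes a number of API changes, new features, enhancements, and performance improvements, along with a large number of bug fixes.

Highlights include:

Optional integer NA support
New APIs for accessing the array backing a Series or Index
A new top-level method for creating arrays
Storing Interval and Period data in a Series or DataFrame
Support for joining on two MultiIndexes

Check the API changes and deprecations before updating.

These are the changes in pandas 0.24.0. See the release notes for a full changelog including other versions of pandas.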

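Enhancements

Optional integer NA support

pandas has gained the ability to hold integer dtypes with missing values. This long-requested feature is enabled through the use of extension types.

Note

IntegerArray is currently experimental. Its API or implementation may change without warning.

We can construct a Series with the specified dtype. The dtype string Int64 is a pandas ExtensionDtype. Specifying a list or array that uses the traditional missing value marker np.nan will infer to an integer dtype. The display of the Series will also use NaN to indicate missing values in string outputs. (GH 20700, GH 20747, GH 22441, GH 21789, GH 22346)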

In [1]: s = pd.Series([1, 2, np.nan], dtype='Int64')

In [2]: s
Out[2]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64

Operations on these dtypes propagate NaN, as other pandas operations do.

# arithmetic
In [3]: s + 1
Out[3]: 
0       2
1       3
2    <NA>
Length: 3, dtype: Int64

# comparison
In [4]: s == 1
Out[4]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

# indexing
In [5]: s.iloc[1:3]
Out[5]: 
1       2
2    <NA>
Length: 2, dtype: Int64

# operate with other dtypes
In [6]: s + s.iloc[1:3].astype('Int8')
Out[6]: 
0    <NA>
1       4
2    <NA>
Length: 3, dtype: Int64

# coerce when needed
In [7]: s + 0.01
Out[7]: 
0    1.01
1    2.01
2    <NA>
Length: 3, dtype: Float64

These dtypes can operate as part of a DataFrame.

In [8]: df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')})

In [9]: df
Out[9]: 
      A  B  C
0     1  1  a
1     2  1  a
2  <NA>  3  b

[3 rows x 3 columns]

In [10]: df.dtypes
Out[10]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

These dtypes can be merged, reshaped, and cast.

In [11]: pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
Out[11]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

In [12]: df['A'].astype(float)
Out[12]: 
0    1.0
1    2.0
2    NaN
Name: A, Length: 3, dtype: float64

Reduction and groupby operations such as sum also work.

In [13]: df.sum()
Out[13]: 
A      3
B      5
C    aab
Length: 3, dtype: object

In [14]: df.groupby('B').A.sum()
Out[14]: 
B
1    3
3    0
Name: A, Length: 2, dtype: Int64

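Warning

The integer NA support currently uses the capitalized dtype version, e.g. Int8, rather than the traditional int8. This may change at a future date.

See Nullable integer data type for more.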
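The behaviors described in this section can be checked with a few lines. This is a minimal sketch rather than part of the release notes; the variable names are illustrative, and it only assumes that the "Int64" dtype string is available (pandas 0.24 or later).

```python
import numpy as np
import pandas as pd

# Build a nullable-integer Series; np.nan is stored as a missing value
# instead of forcing the whole column to float.
s = pd.Series([1, 2, np.nan], dtype="Int64")

missing_mask = s.isna()          # boolean mask of the missing entries
incremented = s + 1              # arithmetic keeps the Int64 dtype
as_float = s.astype("float64")   # casting to float turns the missing value into NaN
total = s.sum()                  # reductions skip missing values by default
```

Casting to float64 is the usual escape hatch when downstream code expects a plain NumPy dtype.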

Accessing the values in a Series or Index

Series.array and Index.array have been added for extracting the array backing a Series or Index. (GH 19954, GH 23623)

In [15]: idx = pd.period_range('2000', periods=4)

In [16]: idx.array
Out[16]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

In [17]: pd.Series(idx).array
Out[17]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

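Historically, this would have been done with series.values, but with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas' custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time.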

In [18]: idx.values
Out[18]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [19]: id(idx.values)
Out[19]: 140274320467376

In [20]: id(idx.values)
Out[20]: 140274716206992

If you need an actual NumPy array, use Series.to_numpy() or Index.to_numpy().

In [21]: idx.to_numpy()
Out[21]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [22]: pd.Series(idx).to_numpy()
Out[22]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

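For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. PandasArray isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.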

In [23]: ser = pd.Series([1, 2, 3])

In [24]: ser.array
Out[24]: 
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64

In [25]: ser.to_numpy()
Out[25]: array([1, 2, 3])

We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead.

See Dtypes and Attributes and Underlying Data for more.
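To make the distinction concrete, the following sketch contrasts the two accessors on period data. It is illustrative only; the variable names are assumptions, not part of the release notes.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.period_range("2000", periods=4))

# .array hands back the extension array that actually backs the Series.
backing = ser.array

# .to_numpy() always produces a NumPy ndarray; for period data this is
# an object-dtype array of Period scalars.
as_numpy = ser.to_numpy()
```

Use .array when you want to stay within pandas' extension array interface, and .to_numpy() when the consumer genuinely needs an ndarray.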

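`pandas.array`: a new top-level method for creating arrays

A new top-level method array() has been added for creating 1-dimensional arrays (GH 22860). This can be used to create any extension array, including extension arrays registered by third-party libraries. See the dtypes docs for more on extension arrays.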

In [26]: pd.array([1, 2, np.nan], dtype='Int64')
Out[26]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [27]: pd.array(['a', 'b', 'c'], dtype='category')
Out[27]: 
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

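Passing data for which there isn't a dedicated extension type (e.g. float, integer, etc.) will return a new arrays.PandasArray, which is just a thin (no-copy) wrapper around a numpy.ndarray that satisfies the pandas extension array interface.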

In [28]: pd.array([1, 2, 3])
Out[28]: 
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

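On its own, a PandasArray isn't a very useful object. But if you need to write low-level code that works generically for any ExtensionArray, PandasArray satisfies that need.

Notice that by default, if no dtype is specified, the dtype of the returned array is inferred from the data. In particular, note that the first example of [1, 2, np.nan] would have returned a floating-point array, since NaN is a float.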

In [29]: pd.array([1, 2, np.nan])
Out[29]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

Storing Interval and Period data in Series and DataFrame

Interval and Period data may now be stored in a Series or DataFrame, in addition to an IntervalIndex and PeriodIndex as before. (GH 19453, GH 22862)

In [30]: ser = pd.Series(pd.interval_range(0, 5))

In [31]: ser
Out[31]: 
0    (0, 1]
1    (1, 2]
2    (2, 3]
3    (3, 4]
4    (4, 5]
Length: 5, dtype: interval

In [32]: ser.dtype
Out[32]: interval[int64, right]

For periods:

In [33]: pser = pd.Series(pd.period_range("2000", freq="D", periods=5))

In [34]: pser
Out[34]: 
0    2000-01-01
1    2000-01-02
2    2000-01-03
3    2000-01-04
4    2000-01-05
Length: 5, dtype: period[D]

In [35]: pser.dtype
Out[35]: period[D]

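Previously, these would be cast to a NumPy array with object dtype. In general, this should result in better performance when storing an array of intervals or periods in a Series or a column of a DataFrame.

Use Series.array to extract the underlying array of intervals or periods from the Series: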

In [36]: ser.array
Out[36]: 
<IntervalArray>
[(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]]
Length: 5, dtype: interval[int64, right]

In [37]: pser.array
Out[37]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05']
Length: 5, dtype: period[D]

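These return an instance of arrays.IntervalArray or arrays.PeriodArray, the new extension arrays that back interval and period data.

Warning

For backwards compatibility, Series.values continues to return a NumPy array of objects for Interval and Period data. We recommend using Series.array when you need the array of data stored in the Series, and Series.to_numpy() when you know you need a NumPy array.

See Dtypes and Attributes and Underlying Data for more.

Joining with two multi-indexes

DataFrame.merge() and DataFrame.join() can now be used to join multi-indexed DataFrame instances on the overlapping index levels (GH 6360)

See the Merge, join, and concatenate documentation section.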

In [38]: index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
   ....:                                        ('K1', 'X2')],
   ....:                                        names=['key', 'X'])
   ....: 

In [39]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
   ....:                      'B': ['B0', 'B1', 'B2']}, index=index_left)
   ....: 

In [40]: index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
   ....:                                         ('K2', 'Y2'), ('K2', 'Y3')],
   ....:                                         names=['key', 'Y'])
   ....: 

In [41]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
   ....:                       'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)
   ....: 

In [42]: left.join(right)
Out[42]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

For earlier versions, this can be done using the following.

In [43]: pd.merge(left.reset_index(), right.reset_index(),
   ....:          on=['key'], how='inner').set_index(['key', 'X', 'Y'])
   ....: 
Out[43]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

Function `read_html` enhancements

read_html() previously ignored colspan and rowspan attributes. Now it understands them, treating them as sequences of cells with the same value. (GH 17054)

In [44]: from io import StringIO

In [45]: result = pd.read_html(StringIO("""
   ....:   <table>
   ....:     <thead>
   ....:       <tr>
   ....:         <th>A</th><th>B</th><th>C</th>
   ....:       </tr>
   ....:     </thead>
   ....:     <tbody>
   ....:       <tr>
   ....:         <td colspan="2">1</td><td>2</td>
   ....:       </tr>
   ....:     </tbody>
   ....:   </table>"""))
   ....: 

Previous behavior:

In [13]: result
Out [13]:
[   A  B   C
 0  1  2 NaN]

New behavior:

In [46]: result
Out[46]: 
[   A  B  C
 0  1  1  2
 
 [1 rows x 3 columns]]

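New `Styler.pipe()` method

The Styler class has gained a pipe() method. This provides a convenient way to apply users' predefined styling functions, and can help reduce "boilerplate" when using DataFrame styling functionality repeatedly within a notebook. (GH 23229)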

In [47]: df = pd.DataFrame({'N': [1250, 1500, 1750], 'X': [0.25, 0.35, 0.50]})

In [48]: def format_and_align(styler):
   ....:     return (styler.format({'N': '{:,}', 'X': '{:.1%}'})
   ....:                   .set_properties(**{'text-align': 'right'}))
   ....: 

In [49]: df.style.pipe(format_and_align).set_caption('Summary of results.')
Out[49]: <pandas.io.formats.style.Styler at 0x7f9425e40c40>

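Similar methods already exist for other classes in pandas, including DataFrame.pipe(), GroupBy.pipe(), and Resampler.pipe().

Renaming names in a MultiIndex

DataFrame.rename_axis() now supports index and columns arguments and Series.rename_axis() supports the index argument (GH 19978).

This change allows a dictionary to be passed so that some of the names of a MultiIndex can be changed.

Example: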

In [50]: mi = pd.MultiIndex.from_product([list('AB'), list('CD'), list('EF')],
   ....:                                 names=['AB', 'CD', 'EF'])
   ....: 

In [51]: df = pd.DataFrame(list(range(len(mi))), index=mi, columns=['N'])

In [52]: df
Out[52]: 
          N
AB CD EF   
A  C  E   0
      F   1
   D  E   2
      F   3
B  C  E   4
      F   5
   D  E   6
      F   7

[8 rows x 1 columns]

In [53]: df.rename_axis(index={'CD': 'New'})
Out[53]: 
           N
AB New EF   
A  C   E   0
       F   1
   D   E   2
       F   3
B  C   E   4
       F   5
   D   E   6
       F   7

[8 rows x 1 columns]

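See the Advanced documentation on renaming for more details.

Other enhancements

merge() now directly allows merging between objects of type DataFrame and named Series, without the need to convert the Series object into a DataFrame beforehand (GH 21220)
ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH 3441)
FrozenList has gained the .union() and .difference() methods. This functionality greatly simplifies groupby operations that rely on explicitly excluding certain columns. See Splitting an object into groups for more information (GH 15475, GH 15506).
DataFrame.to_parquet() now accepts index as an argument, allowing the user to override the engine's default behavior to include or omit the DataFrame's indexes from the resulting Parquet file. (GH 20768)
read_feather() now accepts columns as an argument, allowing the user to specify which columns should be read. (GH 24025)
DataFrame.corr() and Series.corr() now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (GH 22684)
DataFrame.to_string() now accepts decimal as an argument, allowing the user to specify which decimal separator should be used in the output. (GH 23614)
DataFrame.to_html() now accepts render_links as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. See the section on writing HTML in the IO docs for example usage. (GH 2679)
pandas.read_csv() now supports pandas extension types as an argument to dtype, allowing the user to use pandas extension types when reading CSVs. (GH 23228)
The shift() method now accepts fill_value as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (GH 15486)
to_datetime() now supports the %Z and %z directives when passed into format (GH 13486)
Series.mode() and DataFrame.mode() now support the dropna parameter, which can be used to specify whether NaN/NaT values should be considered (GH 17534)
DataFrame.to_csv() and Series.to_csv() now support the compression keyword when a file handle is passed. (GH 21227)
Index.droplevel() is now implemented also for flat indexes, for compatibility with MultiIndex (GH 21115)
Series.droplevel() and DataFrame.droplevel() are now implemented (GH 20342)
Added support for reading from and writing to Google Cloud Storage via the gcsfs library (GH 19454, GH 23094)
The signatures and documentation of DataFrame.to_gbq() and read_gbq() have been updated to reflect changes from the pandas-gbq library version 0.8.0. A credentials argument has been added, which enables the use of any kind of google-auth credentials. (GH 21627, GH 22557, GH 23662)
The new method HDFStore.walk() will recursively walk the group hierarchy of an HDF5 file (GH 10932)
read_html() copies cell data across colspan and rowspan, and it treats all-th table rows as headers if the header keyword argument is not given and there is no thead (GH 17054)
Series.nlargest(), Series.nsmallest(), DataFrame.nlargest(), and DataFrame.nsmallest() now accept the value "all" for the keep argument. This keeps all ties for the nth largest/smallest value (GH 16818)
IntervalIndex has gained the set_closed() method to change the existing closed value (GH 21670)
DataFrame.to_csv(), Series.to_csv(), DataFrame.to_json(), and Series.to_json() now support compression='infer' to infer compression based on the filename extension (GH 15008). The default compression for the to_csv, to_json, and to_pickle methods has been updated to 'infer' (GH 22004).
DataFrame.to_sql() now supports writing TIMESTAMP WITH TIME ZONE types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone-unaware local timestamps. See Datetime data types for implications (GH 9086).
to_timedelta() now supports ISO-formatted timedelta strings (GH 21877)
Series and DataFrame now support Iterable objects in the constructor (GH 2193)
DatetimeIndex has gained the DatetimeIndex.timetz attribute. This returns the local time with timezone information. (GH 21358)
round(), ceil(), and floor() for DatetimeIndex and Timestamp now support an ambiguous argument for handling datetimes that are rounded to ambiguous times (GH 18946) and a nonexistent argument for handling datetimes that are rounded to nonexistent times. See Nonexistent times when localizing (GH 22647)
The result of resample() is now iterable, similar to groupby() (GH 15314).
Series.resample() and DataFrame.resample() have gained Resampler.quantile() (GH 15023).
DataFrame.resample() and Series.resample() with a PeriodIndex will now respect the base argument in the same fashion as with a DatetimeIndex. (GH 23882)
pandas.api.types.is_list_like() has gained a keyword allow_sets, which is True by default; if False, all instances of set will no longer be considered "list-like" (GH 23061)
Index.to_frame() now supports overriding column name(s) (GH 22580).
Categorical.from_codes() can now take a dtype parameter as an alternative to passing categories and ordered (GH 24398).
The new attribute __git_version__ will return the git commit SHA of the current build (GH 21295).
Compatibility with Matplotlib 3.0 (GH 22790).
Added Interval.overlaps(), arrays.IntervalArray.overlaps(), and IntervalIndex.overlaps() for determining overlaps between interval-like objects (GH 21998)
read_fwf() now accepts the keyword infer_nrows (GH 15138).
to_parquet() now supports writing a DataFrame as a directory of parquet files partitioned by a subset of the columns when engine = 'pyarrow' (GH 23283)
Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() have gained the nonexistent argument for alternative handling of nonexistent times. See Nonexistent times when localizing (GH 8917, GH 24466)
Index.difference(), Index.intersection(), Index.union(), and Index.symmetric_difference() now have an optional sort parameter to control whether the results should be sorted if possible (GH 17839, GH 24471)
read_excel() now accepts usecols as a list of column names or a callable (GH 18273)
MultiIndex.to_flat_index() has been added to flatten multiple levels into a single-level Index object.
DataFrame.to_stata() and pandas.io.stata.StataWriter117 can write mixed string columns to the Stata strl format (GH 23633)
DataFrame.between_time() and DataFrame.at_time() have gained the axis parameter (GH 8839)
DataFrame.to_records() now accepts index_dtypes and column_dtypes parameters to allow different data types in stored column and index records (GH 18146)
IntervalIndex has gained the is_overlapping attribute to indicate whether the IntervalIndex contains any overlapping intervals (GH 23309)
pandas.DataFrame.to_sql() has gained the method argument to control the SQL insertion clause. See the insertion method section in the documentation. (GH 8953)
DataFrame.corrwith() now supports Spearman's rank correlation, Kendall's tau, and callable correlation methods. (GH 21925)
DataFrame.to_json(), DataFrame.to_csv(), DataFrame.to_pickle(), and other export methods now support a tilde (~) in the path argument. (GH 23473)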
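A handful of the smaller enhancements listed above can be exercised together. The following is a short sketch with made-up data; the variable names are illustrative, and only keywords named in the list (fill_value, keep="all", allow_sets) are used.

```python
import pandas as pd
from pandas.api.types import is_list_like

ser = pd.Series([3, 3, 2, 1])

# shift() with fill_value: the vacated first slot gets 0 instead of NaN.
shifted = ser.shift(1, fill_value=0)

# nlargest() with keep="all": both rows tied for the largest value are kept.
top = ser.nlargest(1, keep="all")

# MultiIndex.to_flat_index(): each entry becomes a plain tuple.
flat = pd.MultiIndex.from_product([["a"], [1, 2]]).to_flat_index()

# Interval.overlaps(): intervals must share at least one point.
overlapping = pd.Interval(0, 2).overlaps(pd.Interval(1, 3))
touching = pd.Interval(0, 1).overlaps(pd.Interval(1, 2))  # (0, 1] vs (1, 2]

# is_list_like() with allow_sets=False: sets are no longer "list-like".
set_is_list_like = is_list_like({1, 2}, allow_sets=False)
```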

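Backwards incompatible API changes

pandas 0.24.0 includes a number of API breaking changes.

Increased minimum versions for dependencies

We have updated our minimum supported versions of dependencies (GH 21242, GH 18742, GH 23774, GH 24767). If installed, we now require:

Package	Minimum Version	Required
numpy	1.12.0	X
bottleneck	1.2.0
fastparquet	0.2.1
matplotlib	2.0.0
numexpr	2.6.1
pandas-gbq	0.8.0
pyarrow	0.9.0
pytables	3.4.2
scipy	0.18.1
xlrd	1.0.0
pytest (dev)	3.6

Additionally, we no longer depend on feather-format for feather-based storage and have replaced it with references to pyarrow (GH 21639 and GH 23053).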

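`os.linesep` is used for `line_terminator` of `DataFrame.to_csv`

DataFrame.to_csv() now uses os.linesep rather than '\n' as the default line terminator (GH 20353). This change only affects Windows, where '\r\n' was previously used as the line terminator even when '\n' was passed in line_terminator.

Previous behavior on Windows: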

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: # When passing file PATH to to_csv,
   ...: # line_terminator does not work, and csv is saved with '\r\n'.
   ...: # Also, this converts all '\n's in the data to '\r\n'.
   ...: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

In [4]: # When passing file OBJECT with newline option to
   ...: # to_csv, line_terminator works.
   ...: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False, line_terminator='\n')

In [5]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[5]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

New behavior on Windows:

Passing line_terminator explicitly sets the line terminator to that character.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

On Windows, the value of os.linesep is '\r\n', so if line_terminator is not set, '\r\n' is used as the line terminator.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False)

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

For file objects, specifying newline is not sufficient to set the line terminator. You must pass in line_terminator explicitly, even in this case.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False)

In [3]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

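Proper handling of `np.nan` in a string data-typed column with the Python engine

There was a bug in read_excel() and read_csv() with the Python engine, where missing values turned into 'nan' with dtype=str and na_filter=True. Now, these missing values are converted to the string missing indicator, np.nan. (GH 20377)

Previous behavior: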

In [5]: data = 'a,b,c\n1,,3\n4,5,6'
In [6]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)
In [7]: df.loc[0, 'b']
Out[7]:
'nan'

New behavior:

In [54]: data = 'a,b,c\n1,,3\n4,5,6'

In [55]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)

In [56]: df.loc[0, 'b']
Out[56]: nan

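Notice how we now output np.nan itself instead of a stringified form of it.

Parsing datetime strings with timezone offsets

Previously, parsing datetime strings with UTC offsets with to_datetime() or DatetimeIndex would automatically convert the datetime to UTC without timezone localization. This is inconsistent with parsing the same datetime string with Timestamp, which would preserve the UTC offset in the tz attribute. Now, to_datetime() preserves the UTC offset in the tz attribute when all the datetime strings have the same UTC offset (GH 17697, GH 11736, GH 22457)

Previous behavior: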

In [2]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[2]: Timestamp('2015-11-18 10:00:00')

In [3]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[3]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

# Different UTC offsets would automatically convert the datetimes to UTC (without a UTC timezone)
In [4]: pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"])
Out[4]: DatetimeIndex(['2015-11-18 10:00:00', '2015-11-18 10:00:00'], dtype='datetime64[ns]', freq=None)

New behavior:

In [57]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[57]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

In [58]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[58]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

Parsing datetime strings with the same UTC offset will preserve the UTC offset in the tz.

In [59]: pd.to_datetime(["2015-11-18 15:30:00+05:30"] * 2)
Out[59]: DatetimeIndex(['2015-11-18 15:30:00+05:30', '2015-11-18 15:30:00+05:30'], dtype='datetime64[ns, UTC+05:30]', freq=None)

Parsing datetime strings with different UTC offsets will now create an Index of datetime.datetime objects with different UTC offsets.

In [59]: idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
                               "2015-11-18 16:30:00+06:30"])

In[60]: idx
Out[60]: Index([2015-11-18 15:30:00+05:30, 2015-11-18 16:30:00+06:30], dtype='object')

In[61]: idx[0]
Out[61]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

In[62]: idx[1]
Out[62]: Timestamp('2015-11-18 16:30:00+0630', tz='UTC+06:30')

Passing utc=True will mimic the previous behavior but will correctly indicate that the dates have been converted to UTC.

In [60]: pd.to_datetime(["2015-11-18 15:30:00+05:30",
   ....:                 "2015-11-18 16:30:00+06:30"], utc=True)
   ....: 
Out[60]: DatetimeIndex(['2015-11-18 10:00:00+00:00', '2015-11-18 10:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
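When a single string carries an offset, the preserved offset can be inspected or normalized after parsing. The snippet below is a small sketch with illustrative variable names; it relies only on the standard Timestamp methods utcoffset() and tz_convert().

```python
from datetime import timedelta

import pandas as pd

# The UTC offset from the string is kept on the resulting Timestamp.
ts = pd.to_datetime("2015-11-18 15:30:00+05:30")
offset = ts.utcoffset()

# Converting explicitly gives the same instant expressed in UTC.
in_utc = ts.tz_convert("UTC")
```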

Parsing mixed timezones with `read_csv()`

read_csv() no longer silently converts mixed-timezone columns to UTC (GH 24987).

Previous behavior:

>>> import io
>>> content = """\
... a
... 2000-01-01T00:00:00+05:00
... 2000-01-01T00:00:00+06:00"""
>>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
>>> df.a
0   1999-12-31 19:00:00
1   1999-12-31 18:00:00
Name: a, dtype: datetime64[ns]

New behavior:

In[64]: import io

In[65]: content = """\
   ...: a
   ...: 2000-01-01T00:00:00+05:00
   ...: 2000-01-01T00:00:00+06:00"""

In[66]: df = pd.read_csv(io.StringIO(content), parse_dates=['a'])

In[67]: df.a
Out[67]:
0   2000-01-01 00:00:00+05:00
1   2000-01-01 00:00:00+06:00
Name: a, Length: 2, dtype: object

As can be seen, the dtype is object; each value in the column is a string. To convert the strings to an array of datetimes, use the date_parser argument.

In [3]: df = pd.read_csv(
   ...:     io.StringIO(content),
   ...:     parse_dates=['a'],
   ...:     date_parser=lambda col: pd.to_datetime(col, utc=True),
   ...: )

In [4]: df.a
Out[4]:
0   1999-12-31 19:00:00+00:00
1   1999-12-31 18:00:00+00:00
Name: a, dtype: datetime64[ns, UTC]

See Parsing datetime strings with timezone offsets for more.
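An alternative that does not depend on the date_parser argument is to read the column as plain strings and convert it afterwards. This is a sketch of that approach, not the method the release notes prescribe; the variable names are illustrative.

```python
import io

import pandas as pd

content = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"

# Read the column without date parsing, so the values stay as strings.
raw = pd.read_csv(io.StringIO(content))

# Convert in a second step; utc=True normalizes the mixed offsets to UTC.
parsed = pd.to_datetime(raw["a"], utc=True)
```

Separating the read from the conversion also makes it easy to inspect the raw strings when a file contains unexpected offsets.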

Time values in `dt.end_time` and `to_timestamp(how='end')`

The time values in Period and PeriodIndex objects are now set to '23:59:59.999999999' when calling Series.dt.end_time, Period.end_time, PeriodIndex.end_time, Period.to_timestamp() with how='end', or PeriodIndex.to_timestamp() with how='end' (GH 17157)

Previous behavior:

In [2]: p = pd.Period('2017-01-01', 'D')
In [3]: pi = pd.PeriodIndex([p])

In [4]: pd.Series(pi).dt.end_time[0]
Out[4]: Timestamp(2017-01-01 00:00:00)

In [5]: p.end_time
Out[5]: Timestamp(2017-01-01 23:59:59.999999999)

New behavior:

Calling Series.dt.end_time will now result in a time of '23:59:59.999999999', as is the case with Period.end_time, for example.

In [61]: p = pd.Period('2017-01-01', 'D')

In [62]: pi = pd.PeriodIndex([p])

In [63]: pd.Series(pi).dt.end_time[0]
Out[63]: Timestamp('2017-01-01 23:59:59.999999999')

In [64]: p.end_time
Out[64]: Timestamp('2017-01-01 23:59:59.999999999')

Series.unique for timezone-aware data

The return type of Series.unique() for datetime with timezone values has changed from a numpy.ndarray of Timestamp objects to an arrays.DatetimeArray (GH 24024).

In [65]: ser = pd.Series([pd.Timestamp('2000', tz='UTC'),
   ....:                  pd.Timestamp('2000', tz='UTC')])
   ....: 

Previous behavior:

In [3]: ser.unique()
Out[3]: array([Timestamp('2000-01-01 00:00:00+0000', tz='UTC')], dtype=object)

New behavior:

In [66]: ser.unique()
Out[66]: 
<DatetimeArray>
['2000-01-01 00:00:00+00:00']
Length: 1, dtype: datetime64[ns, UTC]

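Sparse data structure refactor

SparseArray, the array backing SparseSeries and the columns in a SparseDataFrame, is now an extension array (GH 21978, GH 19056, GH 22835). To conform to this interface and for consistency with the rest of pandas, some API breaking changes were made:

SparseArray is no longer a subclass of numpy.ndarray. To convert a SparseArray to a NumPy array, use numpy.asarray().
SparseArray.dtype and SparseSeries.dtype are now instances of SparseDtype, rather than np.dtype. Access the underlying dtype with SparseDtype.subtype.
numpy.asarray(sparse_array) now returns a dense array with all the values, not just the non-fill-value values (GH 14167)
SparseArray.take now matches the API of pandas.api.extensions.ExtensionArray.take() (GH 19506):
- The default value for allow_fill has changed from False to True.
- The out and mode parameters are no longer accepted (previously, this raised if they were specified).
- Passing a scalar for indices is no longer allowed.
The result of concat() with a mix of sparse and dense Series is a Series with sparse values, rather than a SparseSeries.
SparseDataFrame.combine and DataFrame.combine_first no longer support combining a sparse column with a dense column while preserving the sparse subtype. The result will be an object-dtype SparseArray.
Setting SparseArray.fill_value to a fill value with a different dtype is now allowed.
DataFrame[column] is now a Series with sparse values, rather than a SparseSeries, when slicing a single column with sparse values (GH 23559).
The result of Series.where() is now a Series with sparse values, like with other extension arrays (GH 24077)

Some new warnings are issued for operations that require or are likely to materialize a large dense array:

An errors.PerformanceWarning is issued when using fillna with a method, as a dense array is constructed to create the filled array. Filling with a value is the efficient way to fill a sparse array.
An errors.PerformanceWarning is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used.

In addition to these API breaking changes, many performance improvements and bug fixes have been made.

Finally, a Series.sparse accessor was added to provide sparse-specific methods like Series.sparse.from_coo().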

In [67]: s = pd.Series([0, 0, 1, 1, 1], dtype='Sparse[int]')

In [68]: s.sparse.density
Out[68]: 0.6
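Several of the points in the list above (densifying with numpy.asarray(), SparseDtype.subtype, and the Series.sparse accessor) can be seen on the same Series. This is an illustrative sketch with assumed variable names, not an excerpt from the release notes.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1], dtype="Sparse[int]")

# numpy.asarray() on the backing SparseArray materializes every value,
# including the fill values.
dense = np.asarray(s.array)

# The dtype is a SparseDtype; the NumPy dtype of the stored values is
# available as .subtype.
subtype = s.dtype.subtype

# Sparse-specific attributes live on the .sparse accessor.
fill_value = s.sparse.fill_value
stored = s.sparse.npoints  # number of values that are not the fill value
```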

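`get_dummies()` always returns a DataFrame

Previously, when `sparse=True` was passed to get_dummies(), the return value could be either a DataFrame or a SparseDataFrame, depending on whether all or just a subset of the columns were dummy-encoded. Now, a DataFrame is always returned (GH 24284).

Previous behavior:

The first get_dummies() returns a `DataFrame` because the column `A` is not dummy-encoded. When just `["B", "C"]` are passed to `get_dummies`, then all the columns are dummy-encoded, and a `SparseDataFrame` was returned.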

In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})

In [3]: type(pd.get_dummies(df, sparse=True))
Out[3]: pandas.core.frame.DataFrame

In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[4]: pandas.core.sparse.frame.SparseDataFrame

New behavior:

Now, the return type is consistently a DataFrame.

In [69]: type(pd.get_dummies(df, sparse=True))
Out[69]: pandas.core.frame.DataFrame

In [70]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[70]: pandas.core.frame.DataFrame

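Note

There is no difference in memory usage between a `SparseDataFrame` and a DataFrame with sparse values. The memory usage will be the same as in the previous version of pandas.

Raise ValueError in `DataFrame.to_dict(orient='index')`

DataFrame.to_dict() now raises ValueError when used with `orient='index'` and a non-unique index, instead of silently losing data (GH 22801)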

In [71]: df = pd.DataFrame({'a': [1, 2], 'b': [0.5, 0.75]}, index=['A', 'A'])

In [72]: df
Out[72]: 
   a     b
A  1  0.50
A  2  0.75

[2 rows x 2 columns]

In [73]: df.to_dict(orient='index')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[73], line 1
----> 1 df.to_dict(orient='index')

File ~/work/pandas/pandas/pandas/util/_decorators.py:333, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    327 if len(args) > num_allow_args:
    328     warnings.warn(
    329         msg.format(arguments=_format_argument_list(allow_args)),
    330         FutureWarning,
    331         stacklevel=find_stack_level(),
    332     )
--> 333 return func(*args, **kwargs)

File ~/work/pandas/pandas/pandas/core/frame.py:2183, in DataFrame.to_dict(self, orient, into, index)
   2080 """
   2081 Convert the DataFrame to a dictionary.
   2082 
   (...)
   2179  defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
   2180 """
   2181 from pandas.core.methods.to_dict import to_dict
-> 2183 return to_dict(self, orient, into=into, index=index)

File ~/work/pandas/pandas/pandas/core/methods/to_dict.py:242, in to_dict(df, orient, into, index)
    240 elif orient == "index":
    241     if not df.index.is_unique:
--> 242         raise ValueError("DataFrame index must be unique for orient='index'.")
    243     columns = df.columns.tolist()
    244     if are_all_object_dtype_cols:

ValueError: DataFrame index must be unique for orient='index'.
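Code that may receive a non-unique index can check for it before calling to_dict. The helper below, to_dict_by_index, is a hypothetical name introduced here for illustration; it falls back to a record list (with the index as an ordinary column) so that no rows are dropped.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 0.75]}, index=["A", "A"])


def to_dict_by_index(frame):
    """Hypothetical helper: use orient='index' only when it is lossless."""
    if frame.index.is_unique:
        return frame.to_dict(orient="index")
    # Duplicate labels cannot be dictionary keys, so keep every row as a
    # record and expose the index as a regular column named "index".
    return frame.reset_index().to_dict(orient="records")


result = to_dict_by_index(df)
unique_result = to_dict_by_index(df.iloc[:1])
```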

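Tick DateOffset normalize restrictions

Creating a Tick object (Day, Hour, Minute, Second, Milli, Micro, Nano) with `normalize=True` is no longer supported. This prevents unexpected behavior where addition could fail to be monotone or associative. (GH 21427)

Previous behavior: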

In [2]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [3]: ts
Out[3]: Timestamp('2018-06-11 18:01:14')

In [4]: tic = pd.offsets.Hour(n=2, normalize=True)
   ...:

In [5]: tic
Out[5]: <2 * Hours>

In [6]: ts + tic
Out[6]: Timestamp('2018-06-11 00:00:00')

In [7]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[7]: False

New behavior:

In [74]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [75]: tic = pd.offsets.Hour(n=2)

In [76]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[76]: True

Period subtraction

Subtraction of a Period from another Period will give a DateOffset instead of an integer (GH 21314)

Previous behavior:

In [2]: june = pd.Period('June 2018')

In [3]: april = pd.Period('April 2018')

In [4]: june - april
Out [4]: 2

New behavior:

In [77]: june = pd.Period('June 2018')

In [78]: april = pd.Period('April 2018')

In [79]: june - april
Out[79]: <2 * MonthEnds>

Similarly, subtraction of a Period from a PeriodIndex will now return an Index of DateOffset objects instead of an Int64Index.

Previous behavior:

In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')

New behavior:

In [80]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [81]: pi - pi[0]
Out[81]: Index([<0 * MonthEnds>, <MonthEnd>, <2 * MonthEnds>], dtype='object')
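Code that relied on the old integer result can recover the count from the offset's n attribute. The following is a minimal sketch under that assumption; the variable names are illustrative.

```python
import pandas as pd

june = pd.Period("2018-06", freq="M")
april = pd.Period("2018-04", freq="M")

# The difference is a DateOffset; .n holds the number of periods.
months_apart = (june - april).n

# The same applies element-wise to a PeriodIndex minus a Period.
pi = pd.period_range("2018-06", freq="M", periods=3)
steps = [offset.n for offset in (pi - pi[0])]
```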

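Addition/subtraction of `NaN` from DataFrame

Adding or subtracting NaN from a DataFrame column with timedelta64[ns] dtype will now raise a TypeError instead of returning all-NaT. This is for compatibility with TimedeltaIndex and Series behavior (GH 22163)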

In [82]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [83]: df
Out[83]: 
       0
0 1 days

[1 rows x 1 columns]

Previous behavior:

In [4]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [5]: df - np.nan
Out[5]:
    0
0 NaT

New behavior:

In [2]: df - np.nan
...
TypeError: unsupported operand type(s) for -: 'TimedeltaIndex' and 'float'

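DataFrame comparison operations broadcasting changes

Previously, the broadcasting behavior of DataFrame comparison operations (`==`, `!=`, etc.) was inconsistent with the behavior of arithmetic operations (`+`, `-`, etc.). The behavior of the comparison operations has been changed to match the arithmetic operations in these cases. (GH 22880)

The affected cases are:

Operating against a 2-dimensional np.ndarray with either 1 row or 1 column will now broadcast the same way a np.ndarray would (GH 23000).
A list or tuple with length matching the number of rows in the DataFrame will now raise ValueError instead of operating column-by-column (GH 22880).
A list or tuple with length matching the number of columns in the DataFrame will now operate row-by-row instead of raising ValueError (GH 22880).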

In [84]: arr = np.arange(6).reshape(3, 2)

In [85]: df = pd.DataFrame(arr)

In [86]: df
Out[86]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df == arr[[0], :]
    ...: # comparison previously broadcast where arithmetic would raise
Out[5]:
       0      1
0   True   True
1  False  False
2  False  False
In [6]: df + arr[[0], :]
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

In [7]: df == (1, 2)
    ...: # length matches number of columns;
    ...: # comparison previously raised where arithmetic would broadcast
...
ValueError: Invalid broadcasting comparison [(1, 2)] with block values
In [8]: df + (1, 2)
Out[8]:
   0  1
0  1  3
1  3  5
2  5  7

In [9]: df == (1, 2, 3)
    ...:  # length matches number of rows
    ...:  # comparison previously broadcast where arithmetic would raise
Out[9]:
       0      1
0  False   True
1   True  False
2  False  False
In [10]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

New behavior:

# Comparison operations and arithmetic operations both broadcast.
In [87]: df == arr[[0], :]
Out[87]: 
       0      1
0   True   True
1  False  False
2  False  False

[3 rows x 2 columns]

In [88]: df + arr[[0], :]
Out[88]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both broadcast.
In [89]: df == (1, 2)
Out[89]: 
       0      1
0  False  False
1  False  False
2  False  False

[3 rows x 2 columns]

In [90]: df + (1, 2)
Out[90]: 
   0  1
0  1  3
1  3  5
2  5  7

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both raise ValueError.
In [6]: df == (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

In [7]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

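DataFrame arithmetic operations broadcasting changes

DataFrame arithmetic operations when operating with 2-dimensional np.ndarray objects now broadcast in the same way as np.ndarray broadcast. (GH 23000)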

In [91]: arr = np.arange(6).reshape(3, 2)

In [92]: df = pd.DataFrame(arr)

In [93]: df
Out[93]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df + arr[[0], :]   # 1 row, 2 columns
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)
In [6]: df + arr[:, [1]]   # 1 column, 3 rows
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)

New behavior:

In [94]: df + arr[[0], :]   # 1 row, 2 columns
Out[94]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

In [95]: df + arr[:, [1]]   # 1 column, 3 rows
Out[95]: 
   0   1
0  1   2
1  5   6
2  9  10

[3 rows x 2 columns]
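One way to confirm that the new behavior matches NumPy is to compare the DataFrame results against the equivalent ndarray expressions. This is an illustrative check with assumed variable names, not part of the release notes.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)

row = arr[[0], :]   # shape (1, 2): broadcast down the rows
col = arr[:, [1]]   # shape (3, 1): broadcast across the columns

# Arithmetic and comparison should both agree with plain NumPy broadcasting.
row_sum_matches = (df + row).to_numpy().tolist() == (arr + row).tolist()
col_sum_matches = (df + col).to_numpy().tolist() == (arr + col).tolist()
row_eq_matches = (df == row).to_numpy().tolist() == (arr == row).tolist()
```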

Series and Index data-dtype incompatibilities

Series and Index constructors now raise when the data is incompatible with a passed `dtype=` (GH 15832)

Previous behavior:

In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
0    18446744073709551615
dtype: uint64

New behavior:

In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
...
OverflowError: Trying to coerce negative values to unsigned integers

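Concatenation changes

Calling pandas.concat() on a Categorical of ints with NA values now causes them to be processed as objects when concatenating with anything other than another Categorical of ints (GH 19214)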

In [96]: s = pd.Series([0, 1, np.nan])

In [97]: c = pd.Series([0, 1, np.nan], dtype="category")

Previous behavior:

In [3]: pd.concat([s, c])
Out[3]:
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
dtype: float64

New behavior:

In [98]: pd.concat([s, c])
Out[98]: 
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
Length: 6, dtype: float64

日期时间型 API 更改#

对于具有非 `None` `freq` 属性的 DatetimeIndex 和 TimedeltaIndex，整数类型数组或 Index 的加减法将返回相同类的对象 (GH 19959)
DateOffset 对象现在是不可变的。尝试修改这些对象将引发 AttributeError (GH 21341)
PeriodIndex 减去另一个 PeriodIndex 现在将返回一个对象类型的 DateOffset 对象的 Index，而不是引发 TypeError (GH 20049)
当输入分别为日期时间或时间差 dtype 且 `retbins=True` 时，cut() 和 qcut() 现在返回 DatetimeIndex 或 TimedeltaIndex 分箱 (GH 19891)
DatetimeIndex.to_period() 和 Timestamp.to_period() 在时区信息将丢失时会发出警告 (GH 21333)
PeriodIndex.tz_convert() 和 PeriodIndex.tz_localize() 已被移除 (GH 21781)

其他 API 更改#

新构造的以整数作为 dtype 的空 DataFrame 现在只有在指定 `index` 时才会被转换为 float64 (GH 22858)
如果 `others` 是一个 set，Series.str.cat() 现在将引发错误 (GH 23009)
将标量值传递给 DatetimeIndex 或 TimedeltaIndex 现在将引发 TypeError 而不是 ValueError (GH 23539)
由于截断现在由 DataFrameFormatter 处理，HTMLFormatter 中的 `max_rows` 和 `max_cols` 参数已被移除 (GH 23818)
如果包含缺失值的列被声明为 `bool` 数据类型，read_csv() 现在将引发 ValueError (GH 20591)
从 MultiIndex.to_frame() 生成的 DataFrame 的列顺序现在保证与 MultiIndex.names 的顺序匹配。(GH 22420)
错误地将 DatetimeIndex 而不是元组序列传递给 MultiIndex.from_tuples() 现在会引发 TypeError 而不是 ValueError (GH 24024)
pd.offsets.generate_range() 的参数 `time_rule` 已被移除；请改用 `offset` (GH 24157)
在 0.23.x 版本中，pandas 在合并数值列（例如 `int` 类型列）和 `object` 类型列时会引发 ValueError (GH 9780)。我们已重新启用合并 `object` 和其他数据类型的能力；但是，当合并数值列和仅由字符串组成的 `object` 类型列时，pandas 仍然会引发错误 (GH 21681)
现在，访问 MultiIndex 中具有重复名称的级别（例如在 get_level_values() 中）会引发 ValueError 而不是 KeyError (GH 21678)。
如果子数据类型无效，`IntervalDtype` 的无效构造现在将始终引发 TypeError 而不是 ValueError (GH 21185)
尝试使用非唯一 MultiIndex 重新索引 DataFrame 现在会引发 ValueError 而不是 Exception (GH 21770)
Index 减法将尝试按元素操作，而不是引发 TypeError (GH 19369)
当使用 to_excel() 时，pandas.io.formats.style.Styler 支持 `number-format` 属性 (GH 22015)
当提供无效方法时，DataFrame.corr() 和 Series.corr() 现在会引发 ValueError 并附带一条有用的错误消息，而不是 KeyError (GH 22298)
`shift()` 现在将始终返回一个副本，而不是以前当偏移量为 0 时返回自身的行为 (GH 22397)
DataFrame.set_index() 现在提供了更好的（且较少出现的）KeyError，对不正确的类型会引发 ValueError，并且在使用 `drop=True` 时不会因重复的列名而失败。(GH 22484)
现在，对包含多个相同类型 ExtensionArray 的 DataFrame 进行单行切片时，会保留其数据类型，而不是强制转换为对象类型 (GH 22784)
`DateOffset` 的属性 `_cacheable` 和方法 `_should_cache` 已被移除 (GH 23118)
当提供标量值进行搜索时，Series.searchsorted() 现在返回一个标量而不是一个数组 (GH 23801)。
当提供标量值进行搜索时，Categorical.searchsorted() 现在返回一个标量而不是一个数组 (GH 23466)。
如果搜索的键在其类别中未找到，Categorical.searchsorted() 现在将引发 KeyError 而不是 ValueError (GH 23466)。
Index.hasnans() 和 Series.hasnans() 现在始终返回 Python 布尔值。以前，根据情况可能会返回 Python 或 NumPy 布尔值 (GH 23294)。
DataFrame.to_html() 和 DataFrame.to_string() 的参数顺序已重新排列，以便相互保持一致。(GH 23614)
如果目标索引非唯一且不等于当前索引，CategoricalIndex.reindex() 现在会引发 ValueError。它以前只在目标索引不是分类数据类型时才引发错误 (GH 23963)。
Series.to_list() 和 Index.to_list() 现在分别是 `Series.tolist` 和 `Index.tolist` 的别名 (GH 8826)
`SparseSeries.unstack` 的结果现在是一个具有稀疏值的 DataFrame，而不是 SparseDataFrame (GH 24372)。
DatetimeIndex 和 TimedeltaIndex 不再忽略 dtype 精度。传递非纳秒分辨率的 dtype 将引发 ValueError (GH 24753)
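
A few of the changes in this list are easy to check interactively. The following is a small sketch, assuming a recent pandas installation; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# A scalar search value now yields a scalar position, not a length-1 array.
position = ser.searchsorted(2)

# shift(0) returns a new object rather than the Series itself.
shifted = ser.shift(0)

# to_list() is an alias of tolist().
as_list = ser.to_list()

# hasnans is always a plain Python bool.
has_missing = ser.hasnans
```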

扩展类型更改#

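Equality and hashability

pandas now requires that extension dtypes be hashable (i.e. the respective ExtensionDtype objects; hashability is not a requirement for the values of the corresponding ExtensionArray). The base class implements a default `__eq__` and `__hash__`. If you have a parametrized dtype, you should update the `ExtensionDtype._metadata` tuple to match the signature of your `__init__` method. See pandas.api.extensions.ExtensionDtype for more (GH 22476).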
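
The sketch below shows how `_metadata` feeds the default `__eq__` and `__hash__`. `UnitDtype` and its `unit` parameter are hypothetical names invented for this illustration, and the class is deliberately incomplete (it has no associated array type), so it is not a usable extension dtype on its own.

```python
from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    """Hypothetical parametrized dtype used only to illustrate _metadata."""

    # Must list the attributes set in __init__ so that the inherited
    # __eq__ and __hash__ take the parameter into account.
    _metadata = ("unit",)
    type = float

    def __init__(self, unit="m"):
        self.unit = unit

    @property
    def name(self):
        return f"unit[{self.unit}]"


metres = UnitDtype("m")
also_metres = UnitDtype("m")
seconds = UnitDtype("s")
```

Two instances with the same parameter compare equal and hash identically, while instances with different parameters do not compare equal.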

新增和更改的方法

已添加 dropna() (GH 21185)
已添加 repeat() (GH 24349)
ExtensionArray 构造函数 _from_sequence 现在接受关键字参数 `copy=False` (GH 21185)
作为基本 ExtensionArray 接口的一部分，已添加 pandas.api.extensions.ExtensionArray.shift() (GH 22387)。
已添加 searchsorted() (GH 24350)
通过选择性基类方法重写支持 `sum`、`mean` 等归约操作 (GH 22762)
ExtensionArray.isna() 允许返回一个 ExtensionArray (GH 22325)。

数据类型更改

ExtensionDtype 获得了从字符串数据类型实例化的能力，例如 `decimal` 将实例化一个注册的 DecimalDtype；此外，ExtensionDtype 获得了方法 `construct_array_type` (GH 21185)
添加了 `ExtensionDtype._is_numeric` 以控制扩展数据类型是否被视为数值型 (GH 22290)。
Added pandas.api.extensions.register_extension_dtype() to register an extension type with pandas (GH 22664)
已更新 PeriodDtype、DatetimeTZDtype 和 IntervalDtype 的 `.type` 属性，使其成为相应数据类型的实例（分别为 `Period`、`Timestamp` 和 `Interval`）(GH 22938)
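
The dtype changes above can be observed with the built-in extension dtypes. This is a small sketch that assumes a recent pandas version; it uses the registered `"Int64"` string rather than the `decimal` example from the text, because `DecimalDtype` lives in the pandas test suite and is not part of the public API.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# A registered extension dtype can be instantiated from its string name.
int_dtype = pandas_dtype("Int64")

# construct_array_type() returns the array class paired with the dtype.
array_type = int_dtype.construct_array_type()

# .type is the scalar class for each of these dtypes.
scalar_types = {
    "period": pd.PeriodDtype("D").type,
    "datetimetz": pd.DatetimeTZDtype(tz="UTC").type,
    "interval": pd.IntervalDtype("int64").type,
}
```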

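Operator support

A Series based on an ExtensionArray now supports arithmetic and comparison operators (GH 19577). There are two approaches for providing operator support for an ExtensionArray:

Define each of the operators on your ExtensionArray subclass.
Use an operator implementation from pandas that depends on operators that are already defined on the underlying elements (scalars) of the `ExtensionArray`.

See the ExtensionArray Operator Support documentation section for details on both ways of adding operator support.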

其他更改

现在为 pandas.api.extensions.ExtensionArray 提供了一个默认的 repr (GH 23601)。
`ExtensionArray._formatting_values()` 已弃用。请改用 ExtensionArray._formatter。(GH 23601)
具有布尔数据类型的 ExtensionArray 现在可以正确地用作布尔索引器。pandas.api.types.is_bool_dtype() 现在正确地将它们视为布尔值 (GH 22326)

错误修复

在使用 ExtensionArray 和整数索引的 Series 中 Series.get() 的错误 (GH 21257)
shift() 现在调度到 ExtensionArray.shift() (GH 22386)
Series.combine() 在 Series 内部与 ExtensionArray 协同工作正常 (GH 20825)
带有标量参数的 Series.combine() 现在适用于任何函数类型 (GH 21248)
Series.astype() 和 DataFrame.astype() 现在调度到 ExtensionArray.astype() (GH 21185)。
现在，对包含多个相同类型 ExtensionArray 的 DataFrame 进行单行切片时，会保留其数据类型，而不是强制转换为对象类型 (GH 22784)
连接具有不同扩展数据类型的多个 Series 时未转换为对象数据类型的错误 (GH 22994)
由 ExtensionArray 支持的 Series 现在可与 util.hash_pandas_object() 配合使用 (GH 23066)
对于每列具有相同扩展数据类型的 DataFrame，DataFrame.stack() 不再转换为对象数据类型。输出 Series 将具有与列相同的数据类型 (GH 23077)。
Series.unstack() 和 DataFrame.unstack() 不再将扩展数组转换为对象数据类型的 ndarray。输出 DataFrame 中的每个列现在将具有与输入相同的数据类型 (GH 23077)。
在对 Dataframe.groupby() 进行分组并在 ExtensionArray 上进行聚合时，未能返回实际 ExtensionArray 数据类型的错误 (GH 23227)。
在合并扩展数组支持的列时 pandas.merge() 中的错误 (GH 23020)。

弃用#

MultiIndex.labels 已弃用，并已由 MultiIndex.codes 替代。功能保持不变。新名称更好地反映了这些代码的性质，并使 MultiIndex API 更类似于 CategoricalIndex 的 API (GH 13443)。因此，MultiIndex 中 `labels` 名称的其他用法也已弃用并替换为 `codes`
- 您应该使用名为 `codes` 而不是 `labels` 的参数来初始化 MultiIndex 实例。
- MultiIndex.set_labels 已弃用，取而代之的是 MultiIndex.set_codes()。
- 对于方法 MultiIndex.copy()，`labels` 参数已弃用，并已由 `codes` 参数替代。
DataFrame.to_stata()、read_stata()、StataReader 和 StataWriter 已弃用 `encoding` 参数。Stata dta 文件的编码由文件类型决定，无法更改 (GH 21244)
MultiIndex.to_hierarchical() 已弃用，并将在未来版本中移除 (GH 21613)
Series.ptp() 已弃用。请改用 `numpy.ptp` (GH 21614)
Series.compress() 已弃用。请改用 `Series[condition]` (GH 18262)
Series.to_csv() 的签名已统一为 DataFrame.to_csv() 的签名：第一个参数的名称现在是 `path_or_buf`，后续参数的顺序已更改，`header` 参数现在默认为 `True`。(GH 19715)
Categorical.from_codes() 已弃用为 `codes` 参数提供浮点值。(GH 21767)
pandas.read_table() 已弃用。请改为使用 read_csv()，如有必要，请传递 `sep='\t'`。此弃用已在 0.25.0 版本中移除。(GH 21948)
Series.str.cat() 已弃用在列表类内部使用任意列表类。列表类容器仍然可以包含许多 Series、Index 或一维 np.ndarray，或者只包含标量值。(GH 21950)
FrozenNDArray.searchsorted() 已弃用 `v` 参数，改用 `value` (GH 14645)
DatetimeIndex.shift() 和 PeriodIndex.shift() 现在接受 `periods` 参数而不是 `n`，以与 Index.shift() 和 Series.shift() 保持一致。使用 `n` 会引发弃用警告 (GH 22458, GH 22912)
不同 Index 构造函数的 `fastpath` 关键字已弃用 (GH 23110)。
Timestamp.tz_localize()、DatetimeIndex.tz_localize() 和 Series.tz_localize() 已弃用 `errors` 参数，改用 `nonexistent` 参数 (GH 8917)
`FrozenNDArray` 类已弃用。一旦此类别被移除，`FrozenNDArray` 在反序列化时将被反序列化为 np.ndarray (GH 9031)
方法 DataFrame.update() 和 Panel.update() 已弃用 `raise_conflict=False|True` 关键字，改用 `errors='ignore'|'raise'` (GH 23585)
方法 Series.str.partition() 和 Series.str.rpartition() 已弃用 `pat` 关键字，改用 `sep` (GH 22676)
已弃用 pandas.read_feather() 的 `nthreads` 关键字，改用 `use_threads` 以反映 `pyarrow>=0.11.0` 中的更改。(GH 23053)
pandas.read_excel() 已弃用接受整数作为 `usecols`。请改为传入一个从 0 到 `usecols`（包括）的整数列表 (GH 23527)
从具有 `datetime64` 数据类型的数据构造 TimedeltaIndex 已弃用，将在未来版本中引发 TypeError (GH 23539)
从具有 `timedelta64` 数据类型的数据构造 DatetimeIndex 已弃用，将在未来版本中引发 TypeError (GH 23675)
DatetimeIndex.to_series() 的 `keep_tz` 关键字的 `keep_tz=False` 选项（默认值）已弃用 (GH 17832)。
现在，使用 Timestamp 和 `tz` 参数将带时区的 `datetime.datetime` 或 Timestamp 转换为不同时区已弃用。请改用 Timestamp.tz_convert() (GH 23579)
pandas.api.types.is_period() 已弃用，取而代之的是 `pandas.api.types.is_period_dtype` (GH 23917)
pandas.api.types.is_datetimetz() is deprecated in favor of `pandas.api.types.is_datetime64tz_dtype` (GH 23917)
通过传递范围参数 `start`、`end` 和 `periods` 来创建 TimedeltaIndex、DatetimeIndex 或 PeriodIndex 已弃用，建议改用 timedelta_range()、date_range() 或 period_range() (GH 23919)
将 `'datetime64[ns, UTC]'` 等字符串别名作为 `unit` 参数传递给 DatetimeTZDtype 已弃用。请改用 `DatetimeTZDtype.construct_from_string` (GH 23990)。
infer_dtype() 的 `skipna` 参数在 pandas 的未来版本中将默认为 `True` (GH 17066, GH 24050)
在带有分类数据的 Series.where() 中，提供一个不在类别中的 `other` 值已弃用。请先将分类数据转换为不同的数据类型或将 `other` 添加到类别中 (GH 24077)。
Series.clip_lower()、Series.clip_upper()、DataFrame.clip_lower() 和 DataFrame.clip_upper() 已弃用，并将在未来版本中移除。请改用 `Series.clip(lower=threshold)`、`Series.clip(upper=threshold)` 和等效的 `DataFrame` 方法 (GH 24203)
Series.nonzero() 已弃用，并将在未来版本中移除 (GH 18262)
将整数传递给带有 `timedelta64[ns]` 数据类型的 Series.fillna() 和 DataFrame.fillna() 已弃用，在未来版本中将引发 TypeError。请改用 `obj.fillna(pd.Timedelta(...))` (GH 24694)
`Series.cat.categorical`、`Series.cat.name` 和 `Series.cat.index` 已弃用。请直接使用 `Series.cat` 或 `Series` 上的属性。(GH 24751)。
现在，将没有精度的 dtype（例如 `np.dtype('datetime64')` 或 `timedelta64`）传递给 Index、DatetimeIndex 和 TimedeltaIndex 已弃用。请改用纳秒精度的 dtype (GH 24753)。
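
Several of the deprecations above come with a direct replacement. The following sketch collects a few of them, assuming a recent pandas version; the data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# MultiIndex: pass `codes` (not `labels`) and read them back via .codes.
mi = pd.MultiIndex(levels=[["a", "b"], [1, 2]], codes=[[0, 0, 1], [0, 1, 0]])
first_level_codes = list(mi.codes[0])

ser = pd.Series([1, 5, 9])

value_range = np.ptp(ser.to_numpy())  # instead of Series.ptp()
selected = ser[ser > 3]               # instead of Series.compress()
floored = ser.clip(lower=3)           # instead of Series.clip_lower(3)

# Range-style construction goes through the *_range helpers.
tdi = pd.timedelta_range(start="1D", periods=2)
```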

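Integer addition/subtraction with datetimes and timedeltas is deprecated#

In the past, users could, in some cases, add or subtract integers or integer-dtype arrays from Timestamp, DatetimeIndex and TimedeltaIndex.

This usage is now deprecated. Instead, add or subtract integer multiples of the object's `freq` attribute (GH 21939, GH 23878).

Previous behavior: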

In [5]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())
In [6]: ts + 2
Out[6]: Timestamp('1994-05-06 14:15:16', freq='H')

In [7]: tdi = pd.timedelta_range('1D', periods=2)
In [8]: tdi - np.array([2, 1])
Out[8]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [9]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')
In [10]: dti + pd.Index([1, 2])
Out[10]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)

New behavior:

In [108]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())

In [109]: ts + 2 * ts.freq
Out[109]: Timestamp('1994-05-06 14:15:16', freq='H')

In [110]: tdi = pd.timedelta_range('1D', periods=2)

In [111]: tdi - np.array([2 * tdi.freq, 1 * tdi.freq])
Out[111]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [112]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')

In [113]: dti + pd.Index([1 * dti.freq, 2 * dti.freq])
Out[113]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)

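Passing integer data and a timezone to DatetimeIndex#

The behavior of DatetimeIndex when passed integer data and a timezone is changing in a future version of pandas. Previously, these were interpreted as wall times in the desired timezone. In the future, these will be interpreted as wall times in UTC, which are then converted to the desired timezone (GH 24559).

The default behavior remains the same, but issues a warning.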

In [3]: pd.DatetimeIndex([946684800000000000], tz="US/Central")
/bin/ipython:1: FutureWarning:
    Passing integer-dtype data and a timezone to DatetimeIndex. Integer values
    will be interpreted differently in a future version of pandas. Previously,
    these were viewed as datetime64[ns] values representing the wall time
    *in the specified timezone*. In the future, these will be viewed as
    datetime64[ns] values representing the wall time *in UTC*. This is similar
    to a nanosecond-precision UNIX epoch. To accept the future behavior, use

        pd.to_datetime(integer_data, utc=True).tz_convert(tz)

    To keep the previous behavior, use

        pd.to_datetime(integer_data).tz_localize(tz)

 #!/bin/python3
 Out[3]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

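As the warning message explains, opt in to the future behavior by specifying that the integer values are UTC, and then converting to the final timezone.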

In [99]: pd.to_datetime([946684800000000000], utc=True).tz_convert('US/Central')
Out[99]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

The old behavior can be retained by localizing directly to the final timezone.

In [100]: pd.to_datetime([946684800000000000]).tz_localize('US/Central')
Out[100]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

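Converting timezone-aware Series and Index to NumPy arrays#

The conversion from a Series or Index with timezone-aware datetime data will change to preserve timezones by default (GH 23569).

NumPy doesn't have a dedicated dtype for timezone-aware datetimes. In the past, converting a Series or DatetimeIndex with timezone-aware datetime data to a NumPy array would:

convert the tz-aware data to UTC
drop the timezone info
return a numpy.ndarray with datetime64[ns] dtype

Future versions of pandas will preserve the timezone information by returning an object-dtype NumPy array where each value is a Timestamp with the correct timezone attached.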

In [101]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [102]: ser
Out[102]: 
0   2000-01-01 00:00:00+01:00
1   2000-01-02 00:00:00+01:00
Length: 2, dtype: datetime64[ns, CET]

The default behavior remains the same, but issues a warning.

In [8]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
      ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
      with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.

        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/bin/python3
Out[8]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

The previous or future behavior can be obtained, without any warnings, by specifying the dtype.

Previous behavior

In [103]: np.asarray(ser, dtype='datetime64[ns]')
Out[103]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

Future behavior

# New behavior
In [104]: np.asarray(ser, dtype=object)
Out[104]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

Or by using Series.to_numpy().

In [105]: ser.to_numpy()
Out[105]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

In [106]: ser.to_numpy(dtype="datetime64[ns]")
Out[106]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

All of the above also applies to a DatetimeIndex with tz-aware values.

移除先前版本的弃用/更改#

LongPanel 和 WidePanel 类已被移除（GH 10892）。
Series.repeat() 已将 reps 参数重命名为 repeats（GH 14645）。
一些私有函数已从（非公共）模块 pandas.core.common 中移除（GH 22001）。
先前已弃用的模块 pandas.core.datetools 已移除（GH 14105, GH 14094）。
传入 DataFrame.groupby() 且同时引用列和索引级别的字符串将引发 ValueError（GH 14432）。
Index.repeat() 和 MultiIndex.repeat() 已将 n 参数重命名为 repeats（GH 14645）。
Series 构造函数和 .astype 方法现在将在 dtype 参数传入没有单位的时间戳 dtype（例如 np.datetime64）时引发 ValueError（GH 15987）。
已从 str.match() 中完全移除先前已弃用的 as_indexer 关键字（GH 22356, GH 6581）。
模块 pandas.types、pandas.computation 和 pandas.util.decorators 已被移除（GH 16157, GH 16250）。
移除了 pandas.io.formats.style.Styler 的 pandas.formats.style shim（GH 16059）。
pandas.pnow、pandas.match、pandas.groupby、pd.get_store、pd.Expr 和 pd.Term 已被移除（GH 15538, GH 15940）。
Categorical.searchsorted() 和 Series.searchsorted() 已将 v 参数重命名为 value（GH 14645）。
pandas.parser、pandas.lib 和 pandas.tslib 已被移除（GH 15537）。
Index.searchsorted() 已将 key 参数重命名为 value（GH 14645）。
DataFrame.consolidate 和 Series.consolidate 已被移除（GH 15501）。
先前已弃用的模块 pandas.json 已移除（GH 19944）。
模块 pandas.tools 已被移除（GH 15358, GH 16005）。
SparseArray.get_values() 和 SparseArray.to_dense() 已删除 fill 参数（GH 14686）。
DataFrame.sortlevel 和 Series.sortlevel 已被移除（GH 15099）。
SparseSeries.to_dense() 已删除 sparse_only 参数（GH 14686）。
DataFrame.astype() 和 Series.astype() 已将 raise_on_error 参数重命名为 errors（GH 14967）。
is_sequence、is_any_int_dtype 和 is_floating_dtype 已从 pandas.api.types 中移除（GH 16163, GH 16189）。
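
For the renamed keyword arguments in the list above, the current spellings look like this. It is a brief sketch with illustrative data, assuming a recent pandas version.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

repeated = ser.repeat(repeats=2)                   # formerly `reps`
position = ser.searchsorted(value=2)               # formerly `v`
as_float = ser.astype("float64", errors="raise")   # formerly `raise_on_error`
```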

性能改进#

使用单调递增的 CategoricalIndex 对 Series 和 DataFrames 进行切片现在非常快，速度可与使用 Int64Index 进行切片相媲美。无论按标签（使用 .loc）还是按位置（.iloc）索引，速度都有所提升（GH 20395）。对单调递增的 CategoricalIndex 本身进行切片（即 ci[1000:2000]）也显示出类似的速度提升（GH 21659）。
改进了 CategoricalIndex.equals() 在与另一个 CategoricalIndex 比较时的性能（GH 24023）。
Improved performance of Series.describe() in the case of numeric dtypes (GH 21274).
改进了 GroupBy.rank() 在处理并列排名时的性能（GH 21237）。
改进了 DataFrame.set_index() 在列包含 Period 对象时的性能（GH 21582, GH 21606）。
改进了 Series.at() 和 Index.get_value() 对于扩展数组值（例如 Categorical）的性能（GH 24204）。
改进了 Categorical 和 CategoricalIndex 中成员资格检查的性能（即 x in cat 风格的检查速度快得多）。CategoricalIndex.contains() 也因此快得多（GH 21369, GH 21508）。
改进了 HDFStore.groups() （以及 HDFStore.keys() 等相关函数）的性能。（即 x in store 检查速度快得多）（GH 21372）。
改进了 pandas.get_dummies() 在 sparse=True 时的性能（GH 21997）。
改进了 IndexEngine.get_indexer_non_unique() 对于已排序、非唯一索引的性能（GH 9466）。
改进了 PeriodIndex.unique() 的性能（GH 23083）。
改进了 concat() 对于 Series 对象的性能（GH 23404）。
改进了 DatetimeIndex.normalize() 和 Timestamp.normalize() 对于时区无关或 UTC 日期时间的性能（GH 23634）。
改进了 DatetimeIndex.tz_localize() 以及 DatetimeIndex 多个属性在 dateutil UTC 时区时的性能（GH 23772）。
修复了 Windows Python 3.7 上 read_csv() 的性能回归问题（GH 23516）。
改进了 Categorical 构造函数对于 Series 对象的性能（GH 23814）。
改进了 where() 对于分类数据的性能（GH 24077）。
改进了遍历 Series 的性能。DataFrame.itertuples() 现在创建迭代器时不再在内部分配所有元素的列表（GH 20783）。
改进了 Period 构造函数的性能，同时也有利于 PeriodArray 和 PeriodIndex 的创建（GH 24084, GH 24118）。
改进了时区感知 DatetimeArray 二元操作的性能（GH 24491）。

Bug 修复#

分类数据（Categorical）#

Categorical.from_codes() 中的一个 bug，其中 codes 中的 NaN 值被静默转换为 0（GH 21767）。将来这将引发 ValueError。同时也改变了 .from_codes([1.1, 2.0]) 的行为。
Categorical.sort_values() 中的一个 bug，其中 NaN 值总是位于前面，无论 na_position 值如何（GH 22556）。
使用布尔值 Categorical 进行索引时的一个 bug。现在，布尔值 Categorical 被视为布尔掩码（GH 22665）。
在 dtype 强制转换更改后，使用空值和布尔类别构造 CategoricalIndex 会引发 ValueError（GH 22702）。
Categorical.take() 中的一个 bug，其中用户提供的 fill_value 未编码 fill_value，这可能导致 ValueError、不正确的结果或分段错误（GH 23296）。
在 Series.unstack() 中，指定类别中不存在的 fill_value 现在会引发 TypeError，而不是忽略 fill_value（GH 23284）。
在重采样 DataFrame.resample() 并对分类数据进行聚合时，分类 dtype 会丢失的 bug（GH 23227）。
.str 访问器的许多方法中的一个 bug，它们在调用 CategoricalIndex.str 构造函数时总是失败（GH 23555, GH 23556）。
Series.where() 在分类数据上丢失分类 dtype 的 bug（GH 24077）。
Categorical.apply() 中的一个 bug，其中 NaN 值可能会被不可预测地处理。它们现在保持不变（GH 24241）。
Categorical 比较方法中的一个 bug，在与 DataFrame 操作时错误地引发 ValueError（GH 24630）。
Categorical.set_categories() 中的一个 bug，其中当 rename=True 时设置更少的新类别会导致分段错误（GH 24675）。
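
The `na_position` fix for Categorical.sort_values() can be illustrated with a three-element example. This is a minimal sketch with made-up data.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", np.nan, "a"])

# Missing values follow the requested position instead of always
# being placed first.
nan_last = cat.sort_values(na_position="last")
nan_first = cat.sort_values(na_position="first")
```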

日期时间类（Datetimelike）#

修复了两个具有不同 normalize 属性的 DateOffset 对象可能评估为相等的 bug（GH 21404）。
修复了 Timestamp.resolution() 错误地返回 1 微秒 timedelta 而不是 1 纳秒 Timedelta 的 bug（GH 21336, GH 21365）。
to_datetime() 中的一个 bug，在指定 box=True 时未能始终返回 Index（GH 21864）。
DatetimeIndex 比较中的一个 bug，其中字符串比较错误地引发 TypeError（GH 22074）。
DatetimeIndex 与 timedelta64[ns] dtype 数组比较时的一个 bug；在某些情况下错误地引发 TypeError，在另一些情况下则错误地未能引发（GH 22074）。
DatetimeIndex 与对象 dtype 数组比较时的一个 bug（GH 22074）。
具有 datetime64[ns] dtype 的 DataFrame 与 Timedelta 类似对象进行加减运算时的 bug（GH 22005, GH 22163）。
具有 datetime64[ns] dtype 的 DataFrame 与 DateOffset 对象进行加减运算时返回 object dtype 而不是 datetime64[ns] dtype 的 bug（GH 21610, GH 22163）。
具有 datetime64[ns] dtype 的 DataFrame 错误地与 NaT 进行比较的 bug（GH 22242, GH 22163）。
具有 datetime64[ns] dtype 的 DataFrame 减去 Timestamp 类似对象时错误地返回 datetime64[ns] dtype 而不是 timedelta64[ns] dtype 的 bug（GH 8554, GH 22163）。
具有 datetime64[ns] dtype 的 DataFrame 减去非纳秒单位的 np.datetime64 对象时未能转换为纳秒的 bug（GH 18874, GH 22163）。
DataFrame 与 Timestamp 类似对象进行比较时，对于类型不匹配的不等式检查未能引发 TypeError 的 bug（GH 8932, GH 22163）。
包含 datetime64[ns] 的混合 dtype 的 DataFrame 在进行相等比较时错误地引发 TypeError 的 bug（GH 13128, GH 22163）。
DataFrame.values 对于包含时区感知日期时间值的单列 DataFrame 错误地返回 DatetimeIndex 的 bug。现在返回一个由 Timestamp 对象组成的 2-D numpy.ndarray（GH 24024）。
DataFrame.eq() 与 NaT 比较时错误地返回 True 或 NaN 的 bug（GH 15697, GH 22163）。
DatetimeIndex 减法中错误地未能引发 OverflowError 的 bug（GH 22492, GH 22508）。
DatetimeIndex 错误地允许使用 Timedelta 对象进行索引的 bug（GH 20464）。
DatetimeIndex 中，如果原始频率为 None，则频率被设置的 bug（GH 22150）。
DatetimeIndex 的舍入方法（round()、ceil()、floor()）和 Timestamp 的舍入方法（round()、ceil()、floor()）可能导致精度损失的 bug（GH 22591）。
to_datetime() 传入 Index 参数时，结果会丢失 name 的 bug（GH 21697）。
PeriodIndex 中，添加或减去 timedelta 或 Tick 对象会产生不正确结果的 bug（GH 22988）。
在 Series 的 repr 中，period-dtype 数据之前缺少空格的 bug（GH 23601）。
Bug in date_range() when decrementing a start date to a past end date by a negative frequency (GH 23270).
Series.min() 在对包含 NaT 的序列调用时返回 NaN 而不是 NaT 的 bug（GH 23282）。
Series.combine_first() 未正确对齐分类数据，导致 self 中的缺失值未被 other 中的有效值填充的 bug（GH 24147）。
DataFrame.combine() 在处理日期时间类值时引发 TypeError 的 bug（GH 23079）。
date_range() 中，当频率为 Day 或更高，且日期足够远未来时，可能回绕到过去而不是引发 OutOfBoundsDatetime 的 bug（GH 14187）。
period_range() 在 start 和 end 作为 Period 对象提供时，会忽略其频率的 bug（GH 20535）。
PeriodIndex 中，当属性 freq.n 大于 1 时，添加 DateOffset 对象会返回不正确结果的 bug（GH 23215）。
Series 中的一个 bug，当设置日期时间类值时，会将字符串索引解释为字符列表（GH 23451）。
DataFrame 中的一个 bug，从具有时区的 Timestamp 对象数组创建新列时，创建的是 object-dtype 列，而不是带时区的 datetime 列（GH 23932）。
Timestamp 构造函数中的一个 bug，它会删除输入 Timestamp 的频率（GH 22311）。
DatetimeIndex 中的一个 bug，其中调用 np.array(dtindex, dtype=object) 会错误地返回一个 long 对象数组（GH 23524）。
Index 中的一个 bug，其中传入时区感知 DatetimeIndex 和 dtype=object 会错误地引发 ValueError（GH 23524）。
Index 中的一个 bug，其中在时区无关 DatetimeIndex 上调用 np.array(dtindex, dtype=object) 会返回一个 datetime 对象数组，而不是 Timestamp 对象数组，可能导致时间戳的纳秒部分丢失（GH 23524）。
Categorical.__setitem__ 不允许在两个无序但具有相同类别（但顺序不同）的 Categorical 之间进行设置的 bug（GH 24142）。
date_range() 中的一个 bug，其中使用毫秒或更高分辨率的日期可能会返回不正确的值或索引中错误数量的值（GH 24110）。
DatetimeIndex 中的一个 bug，其中从 Categorical 或 CategoricalIndex 构造 DatetimeIndex 会错误地丢失时区信息（GH 18664）。
DatetimeIndex 和 TimedeltaIndex 中的一个 bug，其中使用 Ellipsis 进行索引会错误地丢失索引的 freq 属性（GH 21282）。
澄清了当向 DatetimeIndex 传入的数据首个条目为 NaT 时，且 freq 参数不正确时产生的错误消息（GH 11587）。
to_datetime() 中的一个 bug，当传入 DataFrame 或单位映射的 dict 时，box 和 utc 参数会被忽略（GH 23760）。
Series.dt 中的一个 bug，其中缓存在就地操作后不会正确更新（GH 24408）。
PeriodIndex 中的一个 bug，其中与长度为 1 的类数组对象进行比较时未能引发 ValueError（GH 23078）。
修复了 DatetimeIndex.astype()、PeriodIndex.astype() 和 TimedeltaIndex.astype() 忽略无符号整数 dtype 符号的 bug（GH 24405）。
修复了 Series.max() 在 datetime64[ns]-dtype 情况下，当存在空值且传入 skipna=False 时未能返回 NaT 的 bug（GH 24265）。
to_datetime() 中的一个 bug，其中包含时区感知和时区无关 datetime 对象的数组未能引发 ValueError（GH 24569）。
to_datetime() 中的一个 bug，其中无效的日期时间格式即使 errors='coerce' 也不会将输入强制转换为 NaT（GH 24763）。
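
Two of the fixes above concern how missing datetimes surface. The following sketch, using illustrative inputs, shows the intended results: invalid input coerced to NaT, and a reduction over all-NaT data returning NaT rather than NaN.

```python
import pandas as pd

# Unparseable entries become NaT when errors="coerce" is requested.
parsed = pd.to_datetime(["2019-01-25", "not a date"], errors="coerce")

# min() over a Series that contains only NaT returns NaT.
earliest = pd.Series([pd.NaT, pd.NaT]).min()
```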

时间差（Timedelta）#

具有 timedelta64[ns] dtype 的 DataFrame 除以 Timedelta 类似标量时，错误地返回 timedelta64[ns] dtype 而不是 float64 dtype 的 bug（GH 20088, GH 22163）。
将具有 object dtype 的 Index 添加到具有 timedelta64[ns] dtype 的 Series 时错误地引发异常的 bug（GH 22390）。
将具有数值 dtype 的 Series 乘以 timedelta 对象时的 bug（GH 22390）。
具有数值 dtype 的 Series 在添加或减去具有 timedelta64 dtype 的数组或 Series 时的 bug（GH 22390）。
具有数值 dtype 的 Index 在乘以或除以 dtype 为 timedelta64 的数组时的 bug（GH 22390）。
TimedeltaIndex 错误地允许使用 Timestamp 对象进行索引的 bug（GH 20464）。
修复了从对象 dtype 数组中减去 Timedelta 时引发 TypeError 的 bug（GH 21980）。
修复了将一个全 timedelta64[ns] dtypes 的 DataFrame 添加到一个全整数 dtypes 的 DataFrame 时，返回不正确结果而不是引发 TypeError 的 bug（GH 22696）。
TimedeltaIndex 中的一个 bug，其中添加时区感知日期时间标量会错误地返回时区无关的 DatetimeIndex（GH 23215）。
TimedeltaIndex 中的一个 bug，其中添加 np.timedelta64('NaT') 会错误地返回一个全 NaT 的 DatetimeIndex，而不是一个全 NaT 的 TimedeltaIndex（GH 23215）。
Timedelta 和 to_timedelta() 在支持的单位字符串中存在不一致性的 bug（GH 21762）。
TimedeltaIndex 除法中的一个 bug，其中除以另一个 TimedeltaIndex 会引发 TypeError 而不是返回 Float64Index（GH 23829, GH 22631）。
TimedeltaIndex 比较操作中的一个 bug，其中与非 Timedelta 类似对象进行比较会引发 TypeError，而不是对于 __eq__ 返回全 False，对于 __ne__ 返回全 True（GH 24056）。
Timedelta 与 Tick 对象比较时错误地引发 TypeError 的 bug（GH 24710）。
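
The TimedeltaIndex division and comparison fixes in this list can be sketched as follows. The values are illustrative, and the name of the resulting float index class depends on the pandas version.

```python
import pandas as pd

tdi = pd.to_timedelta(["1 days", "2 days"])
other = pd.to_timedelta(["12 hours", "1 days"])

# Dividing one TimedeltaIndex by another gives float ratios.
ratio = tdi / other

# Equality against a non-timedelta-like value is all False / all True
# instead of raising TypeError.
is_equal = tdi == 5
is_different = tdi != 5
```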

时区（Timezones）#

Index.shift() 中的一个 bug，其中在夏令时转换时会引发 AssertionError（GH 8616）。
Timestamp 构造函数中的一个 bug，其中传入无效时区偏移指定符（Z）不会引发 ValueError（GH 8910）。
Timestamp.replace() 中的一个 bug，其中在夏令时边界替换会保留不正确的偏移量（GH 7825）。
Series.replace() 在 datetime64[ns, tz] 数据中替换 NaT 时的 bug（GH 11792）。
Timestamp 中的一个 bug，其中传入带有不同时区偏移的不同字符串日期格式会产生不同的时区偏移（GH 12064）。
比较时区无关的 Timestamp 与时区感知的 DatetimeIndex 时，后者会被强制转换为时区无关的 bug（GH 12601）。
Series.truncate() 在使用时区感知 DatetimeIndex 时，可能导致核心转储的 bug（GH 9243）。
Series 构造函数中的一个 bug，它会将时区感知和时区无关的 Timestamp 强制转换为时区感知（GH 13051）。
具有 datetime64[ns, tz] dtype 的 Index 未正确本地化整数数据的 bug（GH 20964）。
DatetimeIndex 中，使用整数和时区构造时未正确本地化的 bug（GH 12619）。
修复了 DataFrame.describe() 和 Series.describe() 在时区感知日期时间上未显示 first 和 last 结果的 bug（GH 21328）。
DatetimeIndex 比较中，当比较时区感知 DatetimeIndex 与 np.datetime64 时，未能引发 TypeError 的 bug（GH 22074）。
DataFrame 使用时区感知标量进行赋值时的 bug（GH 19843）。
DataFrame.asof() 中的一个 bug，当尝试比较时区无关和时区感知时间戳时引发 TypeError（GH 21194）。
Bug when constructing a DatetimeIndex from Timestamp objects that were created with the replace method across a DST transition (GH 18785).
使用 DataFrame.loc() 设置新值时，当 DatetimeIndex 存在夏令时转换时出现的 bug（GH 18308, GH 20724）。
Index.unique() 未正确重新本地化时区感知日期的 bug（GH 21737）。
在夏令时转换时索引 Series 的 bug（GH 21846）。
DataFrame.resample() 和 Series.resample() 中的一个 bug，如果时区感知时间序列在夏令时转换处结束，会引发 AmbiguousTimeError 或 NonExistentTimeError（GH 19375, GH 10117）。
DataFrame.drop() 和 Series.drop() 中的一个 bug，当指定时区感知 Timestamp 键从具有夏令时转换的 DatetimeIndex 中删除时（GH 21761）。
DatetimeIndex 构造函数中的一个 bug，其中 NaT 和 dateutil.tz.tzlocal 会引发 OutOfBoundsDatetime 错误（GH 23807）。
DatetimeIndex.tz_localize() 和 Timestamp.tz_localize() 在夏令时转换附近使用 dateutil.tz.tzlocal 时，会返回不正确本地化的日期时间的 bug（GH 23807）。
Timestamp 构造函数中的一个 bug，其中传入带有 datetime.datetime 参数的 dateutil.tz.tzutc 时区会被转换为 pytz.UTC 时区（GH 23807）。
to_datetime() 中的一个 bug，当指定 unit 和 errors='ignore' 时，utc=True 未被尊重（GH 23758）。
to_datetime() 中的一个 bug，当传入 Timestamp 时，utc=True 未被尊重（GH 24415）。
在 DataFrame.any() 中存在错误，当 axis=1 且数据为日期时间类型时，会返回错误值 (GH 23070)
在 DatetimeIndex.to_period() 中存在错误，其中时区感知的索引在创建 PeriodIndex 之前首先被转换为 UTC 时间 (GH 22905)
在 DataFrame.tz_localize()、DataFrame.tz_convert()、Series.tz_localize() 和 Series.tz_convert() 中存在错误，其中 copy=False 会就地修改原始参数 (GH 6326)
在 DataFrame.max() 和 DataFrame.min() 中存在错误，当 axis=1 且所有列包含相同 timezone 时，会返回带有 NaN 的 Series (GH 10390)

偏移量#

在 FY5253 中存在错误，其中日期偏移量在算术运算中可能错误地引发 AssertionError (GH 14774)
在 DateOffset 中存在错误，其中关键字参数 week 和 milliseconds 被接受但被忽略。现在传递这些参数将引发 ValueError (GH 19398)
在向 DataFrame 或 PeriodIndex 添加 DateOffset 时存在错误，错误地引发了 TypeError (GH 23215)
在将 DateOffset 对象与非 DateOffset 对象（尤其是字符串）进行比较时存在错误，会引发 ValueError，而不是在相等性检查时返回 False 并在不等性检查时返回 True (GH 23524)
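
A short sketch of the comparison fix from the last item: comparing an offset with a string that is not a valid frequency now simply reports inequality. The string used here is an arbitrary example.

```python
import pandas as pd

day = pd.offsets.Day()

# No exception is raised; the comparison just evaluates to a boolean.
equal_to_text = day == "not an offset"
differs_from_text = day != "not an offset"
```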

数值#

在 Series 的 __rmatmul__ 中存在错误，不支持矩阵向量乘法 (GH 21530)
在 factorize() 中存在错误，在只读数组下会失败 (GH 12813)
修复了 unique() 处理带符号零不一致的错误：对于某些输入，0.0 和 -0.0 被视为相等，对于另一些输入则被视为不同。现在对于所有输入，它们都被视为相等 (GH 21866)
在 DataFrame.agg()、DataFrame.transform() 和 DataFrame.apply() 中存在错误，其中，当提供函数列表和 axis=1 时（例如 df.apply(['sum', 'mean'], axis=1)），会错误地引发 TypeError。现在这三种方法都可以正确执行此类计算。 (GH 16679)。
在 Series 与日期时间类型标量和数组进行比较时存在错误 (GH 22074)
在 DataFrame 的布尔型 dtype 和整数相乘时存在错误，返回 object dtype 而非整数 dtype (GH 22047, GH 22163)
在 DataFrame.apply() 中存在错误，其中，当提供字符串参数和额外的位置或关键字参数时（例如 df.apply('sum', min_count=1)），会错误地引发 TypeError (GH 22376)
在 DataFrame.astype() 中存在错误，转换为扩展 dtype 时可能引发 AttributeError (GH 22578)
Bug in DataFrame with timedelta64[ns] `dtype` where arithmetic operations with an ndarray of integer `dtype` incorrectly treated the `ndarray` as timedelta64[ns] `dtype` (GH 23114)
Bug in Series.rpow() with object dtype, where `1 ** NA` returned NaN instead of 1 (GH 22922).
Series.agg() 现在可以处理 NumPy 中感知 NaN 的方法，例如 numpy.nansum() (GH 19629)
在 Series.rank() 和 DataFrame.rank() 中存在错误，当 pct=True 且行数超过 2²⁴ 时，导致百分比大于 1.0 (GH 18271)
现在，使用非唯一 CategoricalIndex() 调用 DataFrame.round() 等方法会返回预期数据。以前，数据会被错误地复制 (GH 21809)。
将 log10、floor 和 ceil 添加到 DataFrame.eval() 支持的函数列表中 (GH 24139, GH 24353)
Series 和 Index 之间的逻辑运算 &, |, ^ 将不再引发 ValueError (GH 22092)
在 is_scalar() 函数中检查 PEP 3141 数字会返回 True (GH 22903)
像 Series.sum() 这样的归约方法，现在从 NumPy ufunc 调用时，接受 keepdims=False 的默认值，而不是引发 TypeError。尚未完全实现对 keepdims 的支持 (GH 24356)。
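
Two of the numeric fixes above lend themselves to a quick check: row-wise application of a list of functions, and consistent handling of signed zeros in unique(). This is a sketch with illustrative data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# A list of functions combined with axis=1 gives one column per function.
row_stats = df.apply(["sum", "mean"], axis=1)

# 0.0 and -0.0 are treated as the same value.
signed_zeros = pd.unique(np.array([0.0, -0.0]))
```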

转换#

在 DataFrame.combine_first() 中存在错误，其中列类型意外地转换为浮点型 (GH 20699)
在 DataFrame.clip() 中存在错误，其中列类型未保留并转换为浮点型 (GH 24162)
在 DataFrame.clip() 中存在错误，当数据帧的列顺序不匹配时，观察到的数值结果不正确 (GH 20911)
在 DataFrame.astype() 中存在错误，其中当存在重复列名时，转换为扩展 dtype 会导致 RecursionError (GH 24704)

字符串#

在 Index.str.partition() 中存在错误，它不是 NaN 安全的 (GH 23558)。
在 Index.str.split() 中存在错误，它不是 NaN 安全的 (GH 23677)。
在 Series.str.contains() 中存在错误，对于 Categorical dtype 的 Series，不遵守 na 参数的规定 (GH 22158)
在 Index.str.cat() 中存在错误，当结果只包含 NaN 时 (GH 24044)

区间#

在 IntervalIndex 构造函数中存在错误，其中 closed 参数并非总是覆盖推断的 closed 值 (GH 19370)
在 IntervalIndex 的 repr 中存在错误，其中在区间列表后缺少了一个逗号 (GH 20611)
在 Interval 中存在错误，其中标量算术运算没有保留 closed 值 (GH 22313)
在 IntervalIndex 中存在错误，其中使用日期时间类型值进行索引时引发 KeyError (GH 20636)
在 IntervalTree 中存在错误，其中包含 NaN 的数据触发了警告，并导致使用 IntervalIndex 进行不正确的索引查询 (GH 23352)
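
As a small illustration of the scalar-arithmetic fix for Interval, shifting an interval keeps its `closed` setting. The endpoints are arbitrary.

```python
import pandas as pd

interval = pd.Interval(0, 1, closed="both")

# Both endpoints move by 1 and `closed` is carried over.
shifted = interval + 1
```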

索引#

在 DataFrame.ne() 中存在错误，如果列中包含列名“dtype”则会失败 (GH 22383)
现在，当 .loc 请求单个缺失标签时，KeyError 的回溯信息更短、更清晰 (GH 21557)
PeriodIndex 现在在查找格式错误的字符串时会发出 KeyError，这与 DatetimeIndex 的行为一致 (GH 22803)
当对 MultiIndex（其中第一级为整数类型）中的 .ix 请求缺失整数标签时，现在会引发 KeyError，这与扁平 Int64Index 的情况一致，而不再回退到位置索引 (GH 21593)
在 Index.reindex() 中存在错误，当重新索引一个时区非感知和时区感知 DatetimeIndex 时 (GH 8306)
在 Series.reindex() 中存在错误，当使用 datetime64[ns, tz] dtype 重新索引空序列时 (GH 20869)
在 DataFrame 中存在错误，当使用 .loc 和时区感知 DatetimeIndex 设置值时 (GH 11365)
DataFrame.__getitem__ 现在接受字典和字典键作为标签的列表式对象，与 Series.__getitem__ 的行为一致 (GH 21294)
修复了当列非唯一时 DataFrame[np.nan] 的问题 (GH 21428)
当使用纳秒分辨率日期和时区索引 DatetimeIndex 时存在错误 (GH 11679)
Bug where indexing with a NumPy array containing negative values would mutate the indexer (GH 21867)
Bug where mixed indexes wouldn't allow integers for .at (GH 19860)
Float64Index.get_loc 现在在传递布尔键时引发 KeyError。 (GH 19087)
在 DataFrame.loc() 中存在错误，当使用 IntervalIndex 进行索引时 (GH 19977)
Index 不再破坏 None、NaN 和 NaT，即它们被视为三个不同的键。但是，对于数值型 `Index`，这三者仍然被强制转换为 NaN (GH 22332)
Bug in `scalar in Index` if the scalar is a float while the Index is of integer `dtype` (GH 22085)
在 MultiIndex.set_levels() 中存在错误，当 `levels` 值不可索引时 (GH 23273)
Bug where setting a timedelta column by Index caused it to be cast to double, and therefore lose precision (GH 23511)
在 Index.union() 和 Index.intersection() 中存在错误，其中在某些情况下，结果的 Index 名称计算不正确 (GH 9943, GH 9862)
在 Index 中存在错误，使用布尔 Index 进行切片时可能引发 TypeError (GH 22533)
在 PeriodArray.__setitem__ 中存在错误，当接受切片和列表式值时 (GH 23978)
在 DatetimeIndex、TimedeltaIndex 中存在错误，其中使用 Ellipsis 索引会导致它们丢失 freq 属性 (GH 21282)
在 iat 中存在错误，其中使用它赋值不兼容的值会创建一个新列 (GH 23236)

缺失值#

在 DataFrame.fillna() 中存在错误，其中当某一列包含 datetime64[ns, tz] dtype 时会引发 ValueError (GH 15522)
在 Series.hasnans() 中存在错误，如果初次调用后引入空元素，则可能被错误缓存并返回不正确结果 (GH 19700)
Series.isin() 现在将所有 `NaN` 浮点数视为相等，也适用于 np.object_-dtype。此行为与 float64 的行为一致 (GH 22119)
unique() 不再破坏 np.object_-dtype 的 `NaN` 浮点数和 NaT-object，即 NaT 不再被强制转换为 `NaN` 值并被视为不同的实体。 (GH 22295)
DataFrame 和 Series 现在可以正确处理带硬掩码的 NumPy 掩码数组。以前，从带硬掩码的掩码数组构造 DataFrame 或 Series 会创建一个包含底层值而不是预期 `NaN` 的 pandas 对象。 (GH 24574)
在 DataFrame 构造函数中存在错误，其中在处理 NumPy 掩码记录数组时，dtype 参数未被遵守。 (GH 24874)
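
The isin() and masked-array items above can be sketched like this, assuming a recent pandas and NumPy; the data is illustrative.

```python
import numpy as np
import numpy.ma as ma
import pandas as pd

# NaN is matched in object-dtype data, consistent with float64.
obj_ser = pd.Series([1.0, np.nan], dtype=object)
matches = obj_ser.isin([np.nan])

# A masked array with a hard mask yields NaN at the masked position
# instead of exposing the underlying value.
masked = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False], hard_mask=True)
from_masked = pd.Series(masked)
```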

多层索引#

在 io.formats.style.Styler.applymap() 中存在错误，其中 subset= 与 MultiIndex 切片会简化为 Series (GH 19861)
移除了对 0.8.0 版本之前 MultiIndex pickle 文件的兼容性；维护了与 0.13 版本及更高版本的 MultiIndex pickle 文件的兼容性 (GH 21654)
MultiIndex.get_loc_level()（以及因此，Series 或 DataFrame 上带有 MultiIndex 索引的 .loc）现在在请求一个存在于 levels 但未使用的标签时，会引发 KeyError，而不是返回一个空 slice (GH 22221)
MultiIndex 新增了 MultiIndex.from_frame() 方法，它允许从 DataFrame 构造 MultiIndex 对象 (GH 22420)
修复了 Python 3 中创建 MultiIndex 时，如果某些级别具有混合类型（例如某些标签是元组）则会引发 TypeError 的错误 (GH 15457)

输入/输出#

在 read_csv() 中存在错误，其中使用布尔类别的 CategoricalDtype 指定的列未能从字符串值正确强制转换为布尔值 (GH 20498)
在 read_csv() 中存在错误，其中在 Python 2.x 中，Unicode 列名未能被正确识别 (GH 13253)
在 DataFrame.to_sql() 中存在错误，当写入时区感知数据（datetime64[ns, tz] dtype）时会引发 TypeError (GH 9086)
Bug in DataFrame.to_sql() where a timezone-naive DatetimeIndex would be written as TIMESTAMP WITH TIMEZONE type in supported databases, e.g. PostgreSQL (GH 23510)
在 read_excel() 中存在错误，当使用空数据集指定 parse_cols 时 (GH 9208)
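read_html() no longer ignores all-whitespace <tr> within <thead> when considering the skiprows and header arguments. Previously, users had to decrease their header and skiprows values on such tables to work around the issue. (GH 21641)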
read_excel() 将正确显示先前已弃用的 sheetname 的弃用警告 (GH 17994)
read_csv() 和 read_table() 将在遇到编码错误的字符串时抛出 UnicodeError 而不会导致核心转储 (GH 22748)
read_csv() 将正确解析时区感知日期时间 (GH 22256)
在 read_csv() 中存在错误，其中当数据分块读取时，C 引擎的内存管理被过早优化 (GH 23509)
在 read_csv() 中存在错误，其中在提取多索引时，未命名的列被错误地识别 (GH 23687)
read_sas() 将正确解析宽度小于 8 字节的 sas7bdat 文件中的数字。 (GH 21616)
read_sas() 将正确解析包含多列的 sas7bdat 文件 (GH 22628)
read_sas() 将正确解析数据页类型设置了位 7 的 sas7bdat 文件（因此页类型为 128 + 256 = 384） (GH 16615)
在 read_sas() 中存在错误，其中在文件格式无效时，引发了不正确的错误。 (GH 24548)
在 detect_client_encoding() 中存在错误，其中在 mod_wsgi 进程中导入时，由于对 stdout 的访问受限，潜在的 IOError 未被处理。 (GH 21552)
在 DataFrame.to_html() 中存在错误，当 index=False 时，在截断的 DataFrame 上缺少截断指示符（…） (GH 15019, GH 22783)
在 DataFrame.to_html() 中存在错误，当 index=False 且列和行索引都是 MultiIndex 时 (GH 22579)
在 DataFrame.to_html() 中存在错误，当 index_names=False 时显示索引名称 (GH 22747)
在 DataFrame.to_html() 中存在错误，当 header=False 时不显示行索引名称 (GH 23788)
在 DataFrame.to_html() 中存在错误，当 sparsify=False 时导致引发 TypeError (GH 22887)
在 DataFrame.to_string() 中存在错误，当 index=False 且第一列值的宽度大于第一列标题的宽度时，导致列对齐被破坏 (GH 16839, GH 13032)
在 DataFrame.to_string() 中存在错误，导致 DataFrame 的表示未能占据整个窗口 (GH 22984)
在 DataFrame.to_csv() 中存在错误，其中单层 MultiIndex 错误地写入了元组。现在只写入索引的值 (GH 19589)。
HDFStore 将在 format 关键字参数传递给构造函数时引发 ValueError (GH 13291)
在 HDFStore.append() 中存在错误，当附加一个包含空字符串列且 min_itemsize < 8 的 DataFrame 时 (GH 12242)
在 read_csv() 中存在错误，其中在解析 NaN 值时，由于完成或出错时清理不足，C 引擎发生内存泄漏 (GH 21353)
在 read_csv() 中存在错误，其中当 skipfooter 与 nrows、iterator 或 chunksize 一起传递时，会引发不正确的错误消息 (GH 23711)
在 read_csv() 中存在错误，其中在未提供 MultiIndex 索引名称的情况下，它们被错误地处理 (GH 23484)
在 read_csv() 中存在错误，其中当方言的值与默认参数冲突时，会引发不必要的警告 (GH 23761)
Bug in read_html() in which the error message did not display the valid flavors when an invalid one was provided (GH 23549)
在 read_excel() 中存在错误，其中提取了多余的标题名称，即使未指定任何标题 (GH 11733)
在 read_excel() 中存在错误，其中在 Python 2.x 中，列名有时未能正确转换为字符串 (GH 23874)
在 read_excel() 中存在错误，其中 index_col=None 未被遵守，仍然解析了索引列 (GH 18792, GH 20480)
在 read_excel() 中存在错误，其中当 usecols 作为字符串传入时，未对其进行有效列名验证 (GH 20480)
在 DataFrame.to_dict() 中存在错误，当结果字典在数值数据情况下包含非 Python 标量时 (GH 23753)
DataFrame.to_string()、DataFrame.to_html()、DataFrame.to_latex() 在将字符串作为 float_format 参数传递时，将正确格式化输出 (GH 21625, GH 22270)
在 read_csv() 中存在错误，当尝试将 ‘inf’ 用作整数索引列的 na_value 时，导致引发 OverflowError (GH 17128)
在 read_csv() 中存在错误，导致 Python 3.6+ 在 Windows 上的 C 引擎无法正确读取带有重音或特殊字符的 CSV 文件名 (GH 15086)
在 read_fwf() 中存在错误，其中文件压缩类型未能被正确推断 (GH 22199)
在 pandas.io.json.json_normalize() 中存在错误，当 record_path 的两个连续元素都是字典时，导致引发 TypeError (GH 22706)
在 DataFrame.to_stata()、pandas.io.stata.StataWriter 和 pandas.io.stata.StataWriter117 中存在错误，其中异常会导致写入部分且无效的 dta 文件 (GH 23573)
在 DataFrame.to_stata() 和 pandas.io.stata.StataWriter117 中存在错误，当使用包含非 ASCII 字符的 strLs 时，会生成无效文件 (GH 23573)
在 HDFStore 中存在错误，当 Python 3 从 Python 2 写入的固定格式读取 DataFrame 时，导致引发 ValueError (GH 24510)
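Bug in DataFrame.to_string() and more generally in the floating repr formatter. Zeros were not trimmed if inf was present in a column, although they were trimmed when NA values were present. Zeros are now trimmed in the presence of inf just as in the presence of NA (GH 24861).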
当截断列数且最后一列较宽时，repr 中存在错误 (GH 24849)。

绘图#

在 DataFrame.plot.scatter() 和 DataFrame.plot.hexbin() 中存在错误，在 IPython 内联后端中，当颜色条开启时，导致 X 轴标签和刻度标签消失 (GH 10611, GH 10678, 和 GH 20455)
在使用 matplotlib.axes.Axes.scatter() 绘制包含日期时间的 Series 时存在错误 (GH 22039)
在 DataFrame.plot.bar() 中存在错误，导致条形图使用多种颜色而不是单一颜色 (GH 20585)
在验证颜色参数时存在错误，导致额外的颜色被附加到给定的颜色数组中。这发生在多个使用 matplotlib 的绘图函数中。 (GH 20726)

分组/重采样/滚动#

在 Rolling.min() 和 Rolling.max() 中存在错误，当 closed='left'、索引为日期时间类型且序列中仅有一个条目时，导致段错误 (GH 24718)
在 GroupBy.first() 和 GroupBy.last() 中存在错误，当 as_index=False 时，导致时区信息丢失 (GH 15884)
Bug in DataFrame.resample() when downsampling across a DST boundary (GH 8531)
Bug in date anchoring for DataFrame.resample() with offset Day when n > 1 (GH 24127)
Bug where ValueError was wrongly raised when calling SeriesGroupBy.count() if the grouping variable only contained NaNs and the NumPy version was < 1.13 (GH 21956).
在 Rolling.min() 中存在多个错误，当 closed='left' 且索引为日期时间类型时，导致结果不正确以及段错误。 (GH 21704)
在 Resampler.apply() 中存在错误，当向应用的函数传递位置参数时 (GH 14615)。
在 Series.resample() 中存在错误，当向 loffset 关键字参数传递 numpy.timedelta64 时 (GH 7687)。
在 Resampler.asfreq() 中存在错误，当 TimedeltaIndex 的频率是新频率的子周期时 (GH 13022)。
在 SeriesGroupBy.mean() 中存在错误，当值为整数但无法适应 int64 而发生溢出时 (GH 22487)
RollingGroupby.agg() 和 ExpandingGroupby.agg() 现在支持将多个聚合函数作为参数 (GH 15072)
在 DataFrame.resample() 和 Series.resample() 中存在错误，当通过每周偏移量 ('W') 跨 DST 转换进行重采样时 (GH 9119, GH 21459)
在 DataFrame.expanding() 中存在错误，其中在聚合期间未遵守 axis 参数 (GH 23372)
在 GroupBy.transform() 中存在错误，当输入函数可以接受 DataFrame 但将其重命名时，导致缺失值 (GH 23455)。
GroupBy.nth() 中的一个错误，导致列顺序并非总能保留 (GH 20760)
GroupBy.rank() 在使用 method='dense' 和 pct=True 模式时的一个错误，当组中只有一个成员时会引发 ZeroDivisionError (GH 23666)。
调用 GroupBy.rank() 时，如果组为空且 pct=True，会引发 ZeroDivisionError (GH 22519)
DataFrame.resample() 在对 TimeDeltaIndex 中的 NaT 进行重采样时的一个错误 (GH 13223)。
DataFrame.groupby() 中的一个错误，在选择列时未遵循 observed 参数，而是始终使用 observed=False (GH 23970)
SeriesGroupBy.pct_change() 或 DataFrameGroupBy.pct_change() 中的一个错误，以前在计算百分比变化时会跨组工作，而现在它能正确地按组工作 (GH 21200, GH 21235)。
一个阻止创建哈希表的错误，当行数非常大 (2^32) 时 (GH 22805)
groupby 中的一个错误，当对分类数据进行分组时，如果 observed=True 且分类列中存在 nan，会引发 ValueError 并导致不正确的分组 (GH 24740, GH 21151)。
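
A few of the corrected behaviours in this list can be sketched as follows. This is an illustration only, assuming a pandas version that includes the fixes; the frames and the column names `key`, `val`, and `cat` are hypothetical.

```python
import pandas as pd

# pct_change() is computed within each group, so the first row of every
# group has no previous value and is NaN.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1.0, 2.0, 10.0, 5.0],
})
pct = df.groupby("key")["val"].pct_change()

# observed=True is respected after selecting a column: only categories
# that actually occur in the data appear in the result.
cat = pd.Categorical(["x", "x", "y"], categories=["x", "y", "z"])
df2 = pd.DataFrame({"cat": cat, "val": [1, 2, 3]})
observed = df2.groupby("cat", observed=True)["val"].sum()
all_cats = df2.groupby("cat", observed=False)["val"].sum()

# Rolling.min() with closed='left' on a datetime-like index: each window
# includes its left edge and excludes the current row.
s = pd.Series(
    [4.0, 3.0, 2.0, 1.0],
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)
rolled = s.rolling("2D", closed="left").min()

print(pct.tolist())
print(list(observed.index), list(all_cats.index))
print(rolled.tolist())
```

The second row of group "b" shows a change of -0.5 relative to the first row of "b", not relative to the last row of "a", which is the per-group behaviour described above.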

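Reshaping#

Bug in pandas.concat() when joining resampled DataFrames with a timezone-aware index (GH 13783)
Bug in pandas.concat() when joining only Series: the names argument of concat is no longer ignored (GH 23490)
Bug in Series.combine_first() with datetime64[ns, tz] dtype, which returned a tz-naive result (GH 21469)
Bug in Series.where() and DataFrame.where() with datetime64[ns, tz] dtype (GH 21546)
Bug in DataFrame.where() with an empty DataFrame and an empty cond that has a non-bool dtype (GH 21947)
Bug in Series.mask() and DataFrame.mask() with list conditionals (GH 21891)
Bug in DataFrame.replace() that raised a RecursionError when converting an out-of-bounds datetime64[ns, tz] (GH 20380)
GroupBy.rank() now raises a ValueError when an invalid value is passed for the na_option argument (GH 22124)
Bug in get_dummies() with Unicode attributes in Python 2 (GH 22084)
Bug in DataFrame.replace() that raised a RecursionError when replacing empty lists (GH 22083)
Bug in Series.replace() and DataFrame.replace() when a dict is used as the to_replace value and one key in the dict is another key's value: the results were inconsistent between integer keys and string keys (GH 20656)
Bug in DataFrame.drop_duplicates() that incorrectly raised an error for an empty DataFrame (GH 20516)
Bug in pandas.wide_to_long() when a string is passed to the stubnames argument and a column name is a substring of that stubname (GH 22468)
Bug in merge() when merging datetime64[ns, tz] data that contained a DST transition (GH 18885)
Bug in merge_asof() when merging on float values within a defined tolerance (GH 22981)
Bug in pandas.concat() when concatenating a multi-column DataFrame containing tz-aware data with a DataFrame that has a different number of columns (GH 22796)
Bug in merge_asof() where a confusing error message was raised when attempting to merge with missing values (GH 23189)
Bug in DataFrame.nsmallest() and DataFrame.nlargest() for DataFrames that have a MultiIndex for columns (GH 23033).
Bug in pandas.melt() when passing column names that are not present in the DataFrame (GH 23575)
Bug in DataFrame.append() that raised a TypeError when appending a Series with a dateutil timezone (GH 23682)
Bug in the Series constructor when passing no data and dtype=str (GH 22477)
Bug in cut() with bins given as an overlapping IntervalIndex, where multiple bins were returned per item instead of raising a ValueError (GH 23980)
Bug in pandas.concat() that lost the timezone when joining a datetimetz Series with a category Series (GH 23816)
Bug in DataFrame.join() that lost names when joining on a partial MultiIndex (GH 20452).
DataFrame.nlargest() and DataFrame.nsmallest() now return the correct n values when keep != 'all', even when there are ties in the first column (GH 22752)
Constructing a DataFrame with an index argument that was not already an instance of Index was broken (GH 22227).
Bug in DataFrame that prevented list subclasses from being used for construction (GH 21226)
Bug in DataFrame.unstack() and DataFrame.pivot_table() that returned a misleading error message when the resulting DataFrame had more elements than int32 can handle. The error message now points to the actual problem (GH 20601)
Bug in DataFrame.unstack() that raised a ValueError when unstacking timezone-aware values (GH 18338)
Bug in DataFrame.stack() where timezone-aware values were converted to timezone-naive values (GH 19420)
Bug in merge_asof() that raised a TypeError when by_col contained timezone-aware values (GH 21184)
Bug that showed an incorrect shape when an error was raised during DataFrame construction. (GH 20742)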
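
Two of the reshaping entries above, the names argument of concat() and overlapping bins in cut(), can be sketched as follows. The snippet assumes a pandas version that includes the fixes; the key and level names are illustrative.

```python
import pandas as pd

# names is honoured when concatenating only Series.
s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])
combined = pd.concat(
    [s1, s2], keys=["first", "second"], names=["source", "row"]
)
level_names = list(combined.index.names)

# Overlapping interval bins are rejected with a ValueError instead of
# silently returning several bins per item.
overlapping = pd.IntervalIndex.from_tuples([(0, 2), (1, 3)])
try:
    pd.cut([1.5], bins=overlapping)
    raised = False
except ValueError:
    raised = True

print(level_names, raised)
```

If the bins need to overlap on purpose, the membership test has to be done separately, for example with IntervalIndex.contains() per value.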

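Sparse#

Updating a boolean, datetime, or timedelta column to be sparse now works (GH 22367)
Bug in Series.to_sparse() where a Series already holding sparse data was not constructed properly (GH 22389)
Providing a sparse_index to the SparseArray constructor no longer defaults the NA value to np.nan for all dtypes. The correct NA value for data.dtype is now used.
Bug in SparseArray.nbytes that under-reported memory usage by not including the size of the sparse index.
Improved performance of Series.shift() for a non-NA fill_value, as values are no longer converted to a dense array.
Bug in DataFrame.groupby that did not include fill_value in the groups for a non-NA fill_value when grouping by a sparse column (GH 5078)
Bug in the unary inversion operator (~) on a SparseSeries with boolean values. Its performance has also been improved (GH 22835)
Bug in SparseArray.unique() that did not return the unique values (GH 19595)
Bug in SparseArray.nonzero() and SparseDataFrame.dropna() that returned shifted or incorrect results (GH 21172)
Bug in DataFrame.apply() where dtypes would lose sparseness (GH 23744)
Bug in concat() when concatenating a list of Series with all-sparse values, which changed the fill_value and converted the result to a dense Series (GH 24371)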
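
The sparse entries on unique(), nbytes, and concat() can be checked with a small sketch. It is written against the pandas.arrays.SparseArray spelling, which is an assumption about the installed version (older releases also exposed the class as pd.SparseArray); the data is made up.

```python
import pandas as pd

# A sparse array whose fill_value (0) is also one of the unique values.
arr = pd.arrays.SparseArray([0, 0, 1, 2, 0], fill_value=0)

# unique() returns the distinct values, including the fill value.
unique_vals = sorted(int(v) for v in arr.unique())

# nbytes counts the stored values plus the sparse index.
values_only = arr.sp_values.nbytes
total = arr.nbytes

# Concatenating all-sparse Series keeps the sparse dtype and fill_value.
s = pd.Series(pd.arrays.SparseArray([0, 1, 0], fill_value=0))
out = pd.concat([s, s], ignore_index=True)

print(unique_vals, values_only, total, out.dtype)
```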

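Style#

background_gradient() now takes a text_color_threshold parameter to automatically lighten the text color based on the luminance of the background color. This improves readability on dark backgrounds without the need to limit the background colormap range. (GH 21258)
background_gradient() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None (GH 15204)
bar() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None, and a clipping range can be set with vmin and vmax (GH 21548 and GH 21526). NaN values are also handled properly.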

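Build changes#

Building pandas for development now requires cython >= 0.28.2 (GH 21688)
Testing pandas now requires hypothesis>=3.58. The Hypothesis documentation covers general usage, and the contributing guide has a pandas-specific introduction. (GH 22280)
Building pandas on macOS now targets a minimum of macOS 10.9 when the build is run on macOS 10.9 or above (GH 23424)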

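Other#

Bug where C variables were declared with external linkage, causing import errors if certain other C libraries were imported before pandas. (GH 24113)

Contributors#

A total of 337 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.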

AJ Dyka +
AJ Pryor, Ph.D +
Aaron Critchley
Adam Hooper
Adam J. Stewart
Adam Kim
Adam Klimont +
Addison Lynch +
Alan Hogue +
Alex Radu +
Alex Rychyk
Alex Strick van Linschoten +
Alex Volkov +
Alexander Buchkovsky
Alexander Hess +
Alexander Ponomaroff +
Allison Browne +
Aly Sivji
Andrew
Andrew Gross +
Andrew Spott +
Andy +
Aniket uttam +
Anjali2019 +
Anjana S +
Antti Kaihola +
Anudeep Tubati +
Arjun Sharma +
Armin Varshokar
Artem Bogachev
ArtinSarraf +
Barry Fitzgerald +
Bart Aelterman +
Ben James +
Ben Nelson +
Benjamin Grove +
Benjamin Rowell +
Benoit Paquet +
Boris Lau +
Brett Naul
Brian Choi +
C.A.M. Gerlach +
Carl Johan +
Chalmer Lowe
Chang She
Charles David +
Cheuk Ting Ho
Chris
Chris Roberts +
Christopher Whelan
Chu Qing Hao +
Da Cheezy Mobsta +
Damini Satya
Daniel Himmelstein
Daniel Saxton +
Darcy Meyer +
DataOmbudsman
David Arcos
David Krych
Dean Langsam +
Diego Argueta +
Diego Torres +
Dobatymo +
Doug Latornell +
Dr. Irv
Dylan Dmitri Gray +
Eric Boxer +
Eric Chea
Erik +
Erik Nilsson +
Fabian Haase +
Fabian Retkowski
Fabien Aulaire +
Fakabbir Amin +
Fei Phoon +
Fernando Margueirat +
Florian Müller +
Fábio Rosado +
Gabe Fernando
Gabriel Reid +
Giftlin Rajaiah
Gioia Ballin +
Gjelt
Gosuke Shibahara +
Graham Inggs
Guillaume Gay
Guillaume Lemaitre +
Hannah Ferchland
Haochen Wu
Hubert +
HubertKl +
HyunTruth +
Iain Barr
Ignacio Vergara Kausel +
Irv Lustig +
IsvenC +
Jacopo Rota
Jakob Jarmar +
James Bourbeau +
James Myatt +
James Winegar +
Jan Rudolph
Jared Groves +
Jason Kiley +
Javad Noorbakhsh +
Jay Offerdahl +
Jeff Reback
Jeongmin Yu +
Jeremy Schendel
Jerod Estapa +
Jesper Dramsch +
Jim Jeon +
Joe Jevnik
Joel Nothman
Joel Ostblom +
Jordi Contestí
Jorge López Fueyo +
Joris Van den Bossche
Jose Quinones +
Jose Rivera-Rubio +
Josh
Jun +
Justin Zheng +
Kaiqi Dong +
Kalyan Gokhale
Kang Yoosam +
Karl Dunkle Werner +
Karmanya Aggarwal +
Kevin Markham +
Kevin Sheppard
Kimi Li +
Koustav Samaddar +
Krishna +
Kristian Holsheimer +
Ksenia Gueletina +
Kyle Prestel +
LJ +
LeakedMemory +
Li Jin +
Licht Takeuchi
Luca Donini +
Luciano Viola +
Mak Sze Chun +
Marc Garcia
Marius Potgieter +
Mark Sikora +
Markus Meier +
Marlene Silva Marchena +
Martin Babka +
MatanCohe +
Mateusz Woś +
Mathew Topper +
Matt Boggess +
Matt Cooper +
Matt Williams +
Matthew Gilbert
Matthew Roeschke
Max Kanter
Michael Odintsov
Michael Silverstein +
Michael-J-Ward +
Mickaël Schoentgen +
Miguel Sánchez de León Peque +
Ming Li
Mitar
Mitch Negus
Monson Shao +
Moonsoo Kim +
Mortada Mehyar
Myles Braithwaite
Nehil Jain +
Nicholas Musolino +
Nicolas Dickreuter +
Nikhil Kumar Mengani +
Nikoleta Glynatsi +
Ondrej Kokes
Pablo Ambrosio +
Pamela Wu +
Parfait G +
Patrick Park +
Paul
Paul Ganssle
Paul Reidy
Paul van Mulbregt +
Phillip Cloud
Pietro Battiston
Piyush Aggarwal +
Prabakaran Kumaresshan +
Pulkit Maloo
Pyry Kovanen
Rajib Mitra +
Redonnet Louis +
Rhys Parry +
Rick +
Robin
Roei.r +
RomainSa +
Roman Imankulov +
Roman Yurchak +
Ruijing Li +
Ryan +
Ryan Nazareth +
Rüdiger Busche +
SEUNG HOON, SHIN +
Sandrine Pataut +
Sangwoong Yoon
Santosh Kumar +
Saurav Chakravorty +
Scott McAllister +
Sean Chan +
Shadi Akiki +
Shengpu Tang +
Shirish Kadam +
Simon Hawkins +
Simon Riddell +
Simone Basso
Sinhrks
Soyoun(Rose) Kim +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి) +
Stefaan Lippens +
Stefano Cianciulli
Stefano Miccoli +
Stephen Childs
Stephen Pascoe
Steve Baker +
Steve Cook +
Steve Dower +
Stéphan Taljaard +
Sumin Byeon +
Sören +
Tamas Nagy +
Tanya Jain +
Tarbo Fukazawa
Thein Oo +
Thiago Cordeiro da Fonseca +
Thierry Moisan
Thiviyan Thanapalasingam +
Thomas Lentali +
Tim D. Smith +
Tim Swast
Tom Augspurger
Tomasz Kluczkowski +
Tony Tao +
Triple0 +
Troels Nielsen +
Tuhin Mahmud +
Tyler Reddy +
Uddeshya Singh
Uwe L. Korn +
Vadym Barda +
Varad Gunjal +
Victor Maryama +
Victor Villas
Vincent La
Vitória Helena +
Vu Le
Vyom Jain +
Weiwen Gu +
Wenhuan
Wes Turner
Wil Tan +
William Ayd
Yeojin Kim +
Yitzhak Andrade +
Yuecheng Wu +
Yuliya Dovzhenko +
Yury Bayda +
Zac Hatfield-Dodds +
aberres +
aeltanawy +
ailchau +
alimcmaster1
alphaCTzo7G +
amphy +
araraonline +
azure-pipelines[bot] +
benarthur91 +
bk521234 +
cgangwar11 +
chris-b1
cxl923cc +
dahlbaek +
dannyhyunkim +
darke-spirits +
david-liu-brattle-1
davidmvalente +
deflatSOCO
doosik_bae +
dylanchase +
eduardo naufel schettino +
euri10 +
evangelineliu +
fengyqf +
fjdiod
fl4p +
fleimgruber +
gfyoung
h-vetinari
harisbal +
henriqueribeiro +
himanshu awasthi
hongshaoyang +
igorfassen +
jalazbe +
jbrockmendel
jh-wu +
justinchan23 +
louispotok
marcosrullan +
miker985
nicolab100 +
nprad
nsuresh +
ottiP
pajachiet +
raguiar2 +
ratijas +
realead +
robbuckley +
saurav2608 +
sideeye +
ssikdar1
svenharris +
syutbai +
testvinder +
thatneat
tmnhat2001
tomascassidy +
tomneep
topper-123
vkk800 +
winlu +
ym-pett +
yrhooke +
ywpark1 +
zertrin
zhezherun +
