Version 0.18.0 (March 13, 2016)

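This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.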

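Warning

pandas >= 0.18.0 no longer supports compatibility with Python 2.6 and 3.3 (GH 7718, GH 11273)

Warning

numexpr version 2.4.4 will now show a warning and will not be used as a computation back-end for pandas because of some buggy behavior. This does not affect other versions (>= 2.1 and >= 2.4.6). (GH 12489)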

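Highlights include:

Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
API breaking change to the .resample method to make it more .groupby like, see here.
Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
The .to_xarray() function has been added for compatibility with the xarray package, see here.
The read_sas function has been enhanced to read sas7bdat files, see here.
Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
pd.test() top-level nose test runner is available (GH 4327).

Check the API changes and deprecations before updating.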

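New features

Window functions are now methods

Window functions have been refactored to be methods on Series/DataFrame objects, rather than top-level functions, which are now deprecated. This allows these window-type functions to have a similar API to that of .groupby. See the full documentation here (GH 11603, GH 12373)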

In [1]: np.random.seed(1234)

In [2]: df = pd.DataFrame({'A': range(10), 'B': np.random.randn(10)})

In [3]: df
Out[3]: 
   A         B
0  0  0.471435
1  1 -1.190976
2  2  1.432707
3  3 -0.312652
4  4 -0.720589
5  5  0.887163
6  6  0.859588
7  7 -0.636524
8  8  0.015696
9  9 -2.242685

[10 rows x 2 columns]

Previous behavior:

In [8]: pd.rolling_mean(df, window=3)
        FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
                       DataFrame.rolling(window=3,center=False).mean()
Out[8]:
    A         B
0 NaN       NaN
1 NaN       NaN
2   1  0.237722
3   2 -0.023640
4   3  0.133155
5   4 -0.048693
6   5  0.342054
7   6  0.370076
8   7  0.079587
9   8 -0.954504

New behavior:

In [4]: r = df.rolling(window=3)

These show a descriptive repr

In [5]: r
Out[5]: Rolling [window=3,center=False,axis=0,method=single]

with tab-completion of available methods and properties.

In [9]: r.<TAB>  # noqa E225, E999
r.A           r.agg         r.apply       r.count       r.exclusions  r.max         r.median      r.name        r.skew        r.sum
r.B           r.aggregate   r.corr        r.cov         r.kurt        r.mean        r.min         r.quantile    r.std         r.var

The methods operate on the Rolling object itself

In [6]: r.mean()
Out[6]: 
     A         B
0  NaN       NaN
1  NaN       NaN
2  1.0  0.237722
3  2.0 -0.023640
4  3.0  0.133155
5  4.0 -0.048693
6  5.0  0.342054
7  6.0  0.370076
8  7.0  0.079587
9  8.0 -0.954504

[10 rows x 2 columns]

They provide getitem accessors

In [7]: r['A'].mean()
Out[7]: 
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
9    8.0
Name: A, Length: 10, dtype: float64

And multiple aggregations

In [8]: r.agg({'A': ['mean', 'std'],
   ...:        'B': ['mean', 'std']})
   ...: 
Out[8]: 
     A              B          
  mean  std      mean       std
0  NaN  NaN       NaN       NaN
1  NaN  NaN       NaN       NaN
2  1.0  1.0  0.237722  1.327364
3  2.0  1.0 -0.023640  1.335505
4  3.0  1.0  0.133155  1.143778
5  4.0  1.0 -0.048693  0.835747
6  5.0  1.0  0.342054  0.920379
7  6.0  1.0  0.370076  0.871850
8  7.0  1.0  0.079587  0.750099
9  8.0  1.0 -0.954504  1.162285

[10 rows x 4 columns]

Changes to rename

Series.rename and NDFrame.rename_axis can now take a scalar or list-like argument for altering the Series or axis name, in addition to their old behavior of altering labels. (GH 9494, GH 11965)

In [9]: s = pd.Series(np.random.randn(5))

In [10]: s.rename('newname')
Out[10]: 
0    1.150036
1    0.991946
2    0.953324
3   -2.021255
4   -0.334077
Name: newname, Length: 5, dtype: float64

In [11]: df = pd.DataFrame(np.random.randn(5, 2))

In [12]: (df.rename_axis("indexname")
   ....:    .rename_axis("columns_name", axis="columns"))
   ....: 
Out[12]: 
columns_name         0         1
indexname                       
0             0.002118  0.405453
1             0.289092  1.321158
2            -1.546906 -0.202646
3            -0.655969  0.193421
4             0.553439  1.318152

[5 rows x 2 columns]

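The new functionality works well in method chains. Previously these methods only accepted functions or dicts mapping a label to a new label. This continues to work as before for function or dict-like values.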
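A minimal sketch of the two modes side by side, assuming a small integer-indexed Series; the variable names are illustrative only. A scalar sets the name, while a dict or function keeps relabeling the index.

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# A scalar changes the Series name and leaves the labels alone.
named = s.rename("newname")

# A dict or a function still relabels the index, as before.
relabeled_by_dict = s.rename({0: "a", 1: "b", 2: "c"})
relabeled_by_func = s.rename(lambda label: label * 10)

# Chaining: set the Series name, then name the index axis.
chained = s.rename("newname").rename_axis("indexname")
```

Because each call returns a new object, the scalar form slots into a chain without an intermediate assignment to .name.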

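Range Index

A RangeIndex has been added to the Int64Index sub-classes to support a memory-saving alternative for common use cases. It has a similar implementation to the Python range object (xrange in Python 2), in that it only stores the start, stop, and step values for the index. It interacts transparently with the user API, converting to Int64Index if needed.

This is now the default constructed index for NDFrame objects, rather than an Int64Index as before. (GH 939, GH 12070, GH 12071, GH 12109, GH 12888)

Previous behavior: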

In [3]: s = pd.Series(range(1000))

In [4]: s.index
Out[4]:
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)

In [6]: s.index.nbytes
Out[6]: 8000

New behavior:

In [13]: s = pd.Series(range(1000))

In [14]: s.index
Out[14]: RangeIndex(start=0, stop=1000, step=1)

In [15]: s.index.nbytes
Out[15]: 128
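The "converting if needed" part can be sketched as follows. This is an illustration under the assumption of a default-constructed index; the exact class of the materialized index depends on the pandas version (Int64Index in 0.18.0), so the sketch only checks that it is no longer a RangeIndex.

```python
import pandas as pd

s = pd.Series(range(1000))
idx = s.index

# The default index stores only start, stop and step.
is_range = isinstance(idx, pd.RangeIndex)
bounds = (idx.start, idx.stop, idx.step)

# Slicing with a step can still be described by start/stop/step.
sliced = idx[::2]

# Picking unevenly spaced positions cannot, so the integers are materialized.
picked = idx[[0, 5, 7]]
```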

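Changes to str.extract

The .str.extract method takes a regular expression with capture groups, finds the first match in each subject string, and returns the contents of the capture groups (GH 11386).

In v0.18.0, the expand argument was added to extract.

expand=False: it returns a Series, Index, or DataFrame, depending on the subject and regular expression pattern (same behavior as pre-0.18.0).
expand=True: it always returns a DataFrame, which is more consistent and less confusing from the perspective of a user.

Currently the default is expand=None, which gives a FutureWarning and uses expand=False. To avoid this warning, please explicitly specify expand.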

In [1]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=None)
FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame)
but in a future version of pandas this will be changed to expand=True (return DataFrame)

Out[1]:
0      1
1      2
2    NaN
dtype: object

Extracting a regular expression with one group returns a Series if expand=False.

In [16]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=False)
Out[16]: 
0      1
1      2
2    NaN
Length: 3, dtype: object

It returns a DataFrame with one column if expand=True.

In [17]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=True)
Out[17]: 
     0
0    1
1    2
2  NaN

[3 rows x 1 columns]

Calling on an Index with a regex with exactly one capture group returns an Index if expand=False.

In [18]: s = pd.Series(["a1", "b2", "c3"], ["A11", "B22", "C33"])

In [19]: s.index
Out[19]: Index(['A11', 'B22', 'C33'], dtype='object')

In [20]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=False)
Out[20]: Index(['A', 'B', 'C'], dtype='object', name='letter')

It returns a DataFrame with one column if expand=True.

In [21]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=True)
Out[21]: 
  letter
0      A
1      B
2      C

[3 rows x 1 columns]

Calling on an Index with a regex with more than one capture group raises ValueError if expand=False.

>>> s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
ValueError: only one regex group is supported with Index

It returns a DataFrame if expand=True.

In [22]: s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=True)
Out[22]: 
  letter   1
0      A  11
1      B  22
2      C  33

[3 rows x 2 columns]

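In summary, extract(expand=True) always returns a DataFrame with a row for every subject string, and a column for every capture group.

Addition of str.extractall

The .str.extractall method was added (GH 11386). By contrast, extract returns only the first match.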

In [23]: s = pd.Series(["a1a2", "b1", "c1"], ["A", "B", "C"])

In [24]: s
Out[24]: 
A    a1a2
B      b1
C      c1
Length: 3, dtype: object

In [25]: s.str.extract(r"(?P<letter>[ab])(?P<digit>\d)", expand=False)
Out[25]: 
  letter digit
A      a     1
B      b     1
C    NaN   NaN

[3 rows x 2 columns]

The extractall method returns all matches.

In [26]: s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
Out[26]: 
        letter digit
  match             
A 0          a     1
  1          a     2
B 0          b     1

[3 rows x 2 columns]
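As a rough illustration of how the two results relate, the sketch below selects match number 0 from the extractall result and compares it with what extract reports. The variable names are illustrative; xs is used here only as one convenient way to select on the match level.

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
pattern = r"(?P<letter>[ab])(?P<digit>\d)"

first = s.str.extract(pattern, expand=True)   # one row per subject string
every = s.str.extractall(pattern)             # one row per match

# Match number 0 of each subject is the row that extract reports
# for the subjects that matched at all.
first_of_every = every.xs(0, level="match")
```

Subjects without any match (here "C") have a NaN row in the extract result and no row at all in the extractall result.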

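Changes to str.cat

The method .str.cat() concatenates the members of a Series. Before, if NaN values were present in the Series, calling .str.cat() on it would return NaN, unlike the rest of the Series.str.* API. This behavior has been amended to ignore NaN values by default. (GH 11435).

A new, friendlier ValueError is added to protect against the mistake of supplying sep as a positional argument rather than as a keyword argument. (GH 11334).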

In [27]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ')
Out[27]: 'a b c'

In [28]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ', na_rep='?')
Out[28]: 'a b ? c'

In [2]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(' ')
ValueError: Did you mean to supply a ``sep`` keyword?

Datetimelike rounding

DatetimeIndex, Timestamp, TimedeltaIndex, Timedelta have gained the .round(), .floor() and .ceil() methods for datetimelike rounding, flooring and ceiling. (GH 4314, GH 11963)

Naive datetimes

In [29]: dr = pd.date_range('20130101 09:12:56.1234', periods=3)

In [30]: dr
Out[30]: 
DatetimeIndex(['2013-01-01 09:12:56.123400', '2013-01-02 09:12:56.123400',
               '2013-01-03 09:12:56.123400'],
              dtype='datetime64[ns]', freq='D')

In [31]: dr.round('s')
Out[31]: 
DatetimeIndex(['2013-01-01 09:12:56', '2013-01-02 09:12:56',
               '2013-01-03 09:12:56'],
              dtype='datetime64[ns]', freq=None)

# Timestamp scalar
In [32]: dr[0]
Out[32]: Timestamp('2013-01-01 09:12:56.123400')

In [33]: dr[0].round('10s')
Out[33]: Timestamp('2013-01-01 09:13:00')

Tz-aware datetimes are rounded, floored and ceiled in local time

In [34]: dr = dr.tz_localize('US/Eastern')

In [35]: dr
Out[35]: 
DatetimeIndex(['2013-01-01 09:12:56.123400-05:00',
               '2013-01-02 09:12:56.123400-05:00',
               '2013-01-03 09:12:56.123400-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

In [36]: dr.round('s')
Out[36]: 
DatetimeIndex(['2013-01-01 09:12:56-05:00', '2013-01-02 09:12:56-05:00',
               '2013-01-03 09:12:56-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

Timedelta

In [37]: t = pd.timedelta_range('1 days 2 hr 13 min 45 us', periods=3, freq='d')

In [38]: t
Out[38]: 
TimedeltaIndex(['1 days 02:13:00.000045', '2 days 02:13:00.000045',
                '3 days 02:13:00.000045'],
               dtype='timedelta64[ns]', freq='D')

In [39]: t.round('10min')
Out[39]: TimedeltaIndex(['1 days 02:10:00', '2 days 02:10:00', '3 days 02:10:00'], dtype='timedelta64[ns]', freq=None)

# Timedelta scalar
In [40]: t[0]
Out[40]: Timedelta('1 days 02:13:00.000045')

In [41]: t[0].round('2h')
Out[41]: Timedelta('1 days 02:00:00')

In addition, .round(), .floor() and .ceil() are available through the .dt accessor of Series.

In [42]: s = pd.Series(dr)

In [43]: s
Out[43]: 
0   2013-01-01 09:12:56.123400-05:00
1   2013-01-02 09:12:56.123400-05:00
2   2013-01-03 09:12:56.123400-05:00
Length: 3, dtype: datetime64[ns, US/Eastern]

In [44]: s.dt.round('D')
Out[44]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Length: 3, dtype: datetime64[ns, US/Eastern]
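The examples in this section only exercise .round(). A small sketch of how .floor() and .ceil() differ from it, assuming a 10-minute resolution chosen purely for illustration:

```python
import pandas as pd

ts = pd.Timestamp("2013-01-01 09:12:56.1234")

rounded = ts.round("10min")   # nearest multiple of 10 minutes
floored = ts.floor("10min")   # previous multiple of 10 minutes
ceiled = ts.ceil("10min")     # next multiple of 10 minutes

td = pd.Timedelta("1 days 02:13:00")
td_floor = td.floor("10min")
td_ceil = td.ceil("10min")

# The same methods are reachable through the .dt accessor of a Series.
s = pd.Series(pd.date_range("2013-01-01 09:12:56", periods=3, freq="D"))
s_floor = s.dt.floor("10min")
```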

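Formatting of integers in FloatIndex

Integers in FloatIndex, e.g. 1., are now formatted with a decimal point and a 0 digit, e.g. 1.0 (GH 11713). This change not only affects the display to the console, but also the output of IO methods like .to_csv or .to_html.

Previous behavior: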

In [2]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [3]: s
Out[3]:
0    1
1    2
2    3
dtype: int64

In [4]: s.index
Out[4]: Float64Index([0.0, 1.0, 2.0], dtype='float64')

In [5]: print(s.to_csv(path=None))
0,1
1,2
2,3

New behavior:

In [45]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [46]: s
Out[46]: 
0.0    1
1.0    2
2.0    3
Length: 3, dtype: int64

In [47]: s.index
Out[47]: Index([0.0, 1.0, 2.0], dtype='float64')

In [48]: print(s.to_csv(path_or_buf=None, header=False))
0.0,1
1.0,2
2.0,3

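Changes to dtype assignment behaviors

When a DataFrame's slice is updated with a new slice of the same dtype, the dtype of the DataFrame will now remain the same. (GH 10503)

Previous behavior: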

In [5]: df = pd.DataFrame({'a': [0, 1, 1],
                           'b': pd.Series([100, 200, 300], dtype='uint32')})

In [7]: df.dtypes
Out[7]:
a     int64
b    uint32
dtype: object

In [8]: ix = df['a'] == 1

In [9]: df.loc[ix, 'b'] = df.loc[ix, 'b']

In [11]: df.dtypes
Out[11]:
a    int64
b    int64
dtype: object

New behavior:

In [49]: df = pd.DataFrame({'a': [0, 1, 1],
   ....:                    'b': pd.Series([100, 200, 300], dtype='uint32')})
   ....: 

In [50]: df.dtypes
Out[50]: 
a     int64
b    uint32
Length: 2, dtype: object

In [51]: ix = df['a'] == 1

In [52]: df.loc[ix, 'b'] = df.loc[ix, 'b']

In [53]: df.dtypes
Out[53]: 
a     int64
b    uint32
Length: 2, dtype: object

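When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be down-cast to integer without losing precision, the dtype of the slice will be set to float instead of integer.

Previous behavior: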

In [4]: df = pd.DataFrame(np.array(range(1,10)).reshape(3,3),
                          columns=list('abc'),
                          index=[[4,4,8], [8,10,12]])

In [5]: df
Out[5]:
      a  b  c
4 8   1  2  3
  10  4  5  6
8 12  7  8  9

In [7]: df.ix[4, 'c'] = np.array([0., 1.])

In [8]: df
Out[8]:
      a  b  c
4 8   1  2  0
  10  4  5  1
8 12  7  8  9

New behavior:

In [54]: df = pd.DataFrame(np.array(range(1,10)).reshape(3,3),
   ....:                   columns=list('abc'),
   ....:                   index=[[4,4,8], [8,10,12]])
   ....: 

In [55]: df
Out[55]: 
      a  b  c
4 8   1  2  3
  10  4  5  6
8 12  7  8  9

[3 rows x 3 columns]

In [56]: df.loc[4, 'c'] = np.array([0., 1.])

In [57]: df
Out[57]: 
      a  b  c
4 8   1  2  0
  10  4  5  1
8 12  7  8  9

[3 rows x 3 columns]

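Method to_xarray

In a future version of pandas, we will be deprecating Panel and other > 2 ndim objects. In order to provide for continuity, all NDFrame objects have gained the .to_xarray() method in order to convert to xarray objects, which have a pandas-like interface for > 2 ndim. (GH 11972)

See the xarray full documentation for details.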

In [1]: p = Panel(np.arange(2*3*4).reshape(2,3,4))

In [2]: p.to_xarray()
Out[2]:
<xarray.DataArray (items: 2, major_axis: 3, minor_axis: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Coordinates:
  * items       (items) int64 0 1
  * major_axis  (major_axis) int64 0 1 2
  * minor_axis  (minor_axis) int64 0 1 2 3

Latex representation

DataFrame has gained a ._repr_latex_() method in order to allow for conversion to latex in an ipython/jupyter notebook using nbconvert. (GH 11778)

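Note that this must be activated by setting the option display.latex.repr to True, for example with pd.set_option('display.latex.repr', True) (GH 12182)

For example, if you have a jupyter notebook you plan to convert to latex using nbconvert, place the statement pd.set_option('display.latex.repr', True) in the first cell so that the contained DataFrame output is also stored as latex.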

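The options display.latex.escape and display.latex.longtable have also been added to the configuration and are used automatically by the to_latex method. See the available options docs for more info.

pd.read_sas() changes

read_sas has gained the ability to read SAS7BDAT files, including compressed files. The files can be read in their entirety, or incrementally. For full details see here. (GH 4052)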

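Other enhancements

Handle truncated floats in SAS xport files (GH 11713)
Added option to hide index in Series.to_string (GH 11729)
read_excel now supports s3 urls of the format s3://bucketname/filename (GH 11447)
Add support for AWS_S3_HOST env variable when reading from s3 (GH 12198)
A simple version of Panel.round() is now implemented (GH 11763)
For Python 3.x, round(DataFrame), round(Series), round(Panel) will work (GH 11763)
sys.getsizeof(obj) returns the memory usage of a pandas object, including the values it contains (GH 11597)
Series gained an is_unique attribute (GH 11946)
DataFrame.quantile and Series.quantile now accept the interpolation keyword (GH 10174).
Added DataFrame.style.format for more flexible formatting of cell values (GH 11692)
DataFrame.select_dtypes now allows the np.float16 type code (GH 11990)
pivot_table() now accepts most iterables for the values parameter (GH 12017)
Added Google BigQuery service account authentication support, which enables authentication on remote servers. (GH 11881, GH 12572). For further details see here
HDFStore is now iterable: for k in store is equivalent to for k in store.keys() (GH 12221).
Add missing methods/fields to .dt for Period (GH 8848)
The entire code base has been PEP-ified (GH 12096)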
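Two of the smaller additions above, is_unique and the interpolation keyword of quantile, can be tried in a few lines. This is a sketch with made-up data; only the option names "lower" and "higher" plus the default linear behavior are shown.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# is_unique is True when no value repeats.
unique_flag = s.is_unique
repeated_flag = pd.Series([1, 1, 2]).is_unique

# The median of four values falls between 2 and 3;
# interpolation decides how that gap is resolved.
q_linear = s.quantile(0.5)
q_lower = s.quantile(0.5, interpolation="lower")
q_higher = s.quantile(0.5, interpolation="higher")
```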

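Backwards incompatible API changes

The leading whitespace has been removed from the output of the .to_string(index=False) method (GH 11833)
The out parameter has been removed from the Series.round() method. (GH 11763)
DataFrame.round() leaves non-numeric columns unchanged in its return, rather than raising. (GH 11885)
DataFrame.head(0) and DataFrame.tail(0) return empty frames, rather than self. (GH 11937)
Series.head(0) and Series.tail(0) return empty series, rather than self. (GH 11937)
to_msgpack and read_msgpack encoding now defaults to 'utf-8'. (GH 12170)
The order of keyword arguments to text file parsing functions (.read_csv(), .read_table(), .read_fwf()) changed to group related arguments. (GH 11555)
NaTType.isoformat now returns the string 'NaT' to allow the result to be passed to the constructor of Timestamp. (GH 12300)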
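A short sketch of three of the changes listed above, using a hypothetical two-column frame with one numeric and one string column:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.234, 5.678], "label": ["a", "b"]})

# head(0) returns an empty frame that keeps the columns, not df itself.
empty = df.head(0)

# round() rounds the numeric column and passes the string column through.
rounded = df.round(1)

# NaT.isoformat() returns a string that the Timestamp constructor accepts.
nat_text = pd.NaT.isoformat()
roundtrip = pd.Timestamp(nat_text)
```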

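NaT and Timedelta operations

NaT and Timedelta have expanded arithmetic operations, which are extended to Series arithmetic where applicable. Operations defined for datetime64[ns] or timedelta64[ns] are now also defined for NaT (GH 11564).

NaT now supports arithmetic operations with integers and floats.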

In [58]: pd.NaT * 1
Out[58]: NaT

In [59]: pd.NaT * 1.5
Out[59]: NaT

In [60]: pd.NaT / 2
Out[60]: NaT

In [61]: pd.NaT * np.nan
Out[61]: NaT

NaT defines more arithmetic operations with datetime64[ns] and timedelta64[ns].

In [62]: pd.NaT / pd.NaT
Out[62]: nan

In [63]: pd.Timedelta('1s') / pd.NaT
Out[63]: nan

NaT may represent either a datetime64[ns] null or a timedelta64[ns] null. Given the ambiguity, it is treated as a timedelta64[ns], which allows more operations to succeed.

In [64]: pd.NaT + pd.NaT
Out[64]: NaT

# same as
In [65]: pd.Timedelta('1s') + pd.Timedelta('1s')
Out[65]: Timedelta('0 days 00:00:02')

as opposed to

In [3]: pd.Timestamp('19900315') + pd.Timestamp('19900315')
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

However, when wrapped in a Series whose dtype is datetime64[ns] or timedelta64[ns], the dtype information is respected.

In [1]: pd.Series([pd.NaT], dtype='<M8[ns]') + pd.Series([pd.NaT], dtype='<M8[ns]')
TypeError: can only operate on a datetimes for subtraction,
           but the operator [__add__] was passed

In [66]: pd.Series([pd.NaT], dtype='<m8[ns]') + pd.Series([pd.NaT], dtype='<m8[ns]')
Out[66]: 
0   NaT
Length: 1, dtype: timedelta64[ns]

Timedelta division by floats now works.

In [67]: pd.Timedelta('1s') / 2.0
Out[67]: Timedelta('0 days 00:00:00.500000')

Subtracting a Series of Timedelta values from a Timestamp now works (GH 11925)

In [68]: ser = pd.Series(pd.timedelta_range('1 day', periods=3))

In [69]: ser
Out[69]: 
0   1 days
1   2 days
2   3 days
Length: 3, dtype: timedelta64[ns]

In [70]: pd.Timestamp('2012-01-01') - ser
Out[70]: 
0   2011-12-31
1   2011-12-30
2   2011-12-29
Length: 3, dtype: datetime64[ns]

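NaT.isoformat() now returns 'NaT'. This change allows pd.Timestamp to rehydrate any timestamp-like object from its isoformat (GH 12300).

Changes to msgpack

Forward-incompatible changes in the msgpack writing format were made over 0.17.0 and 0.18.0; older versions of pandas cannot read files packed by newer versions (GH 12129, GH 10527)

Bugs in to_msgpack and read_msgpack introduced in 0.17.0 and fixed in 0.18.0 caused files packed in Python 2 to be unreadable by Python 3 (GH 12142). The following table describes the backward and forward compatibility of msgpacks.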

Warning

Packed with	Can be unpacked with
pre-0.17 / Python 2	any
pre-0.17 / Python 3	any
0.17 / Python 2	==0.17 / Python 2; >=0.18 / any Python
0.17 / Python 3	>=0.18 / any Python
0.18	>= 0.18

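0.18.0 is backward-compatible for reading files packed by older versions, except for files packed with 0.17 in Python 2, in which case they can only be unpacked in Python 2.

Signature change for .rank

Series.rank and DataFrame.rank now have the same signature (GH 11759)

Previous signature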

In [3]: pd.Series([0,1]).rank(method='average', na_option='keep',
                              ascending=True, pct=False)
Out[3]:
0    1
1    2
dtype: float64

In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None,
                                 method='average', na_option='keep',
                                 ascending=True, pct=False)
Out[4]:
   0
0  1
1  2

New signature

In [71]: pd.Series([0,1]).rank(axis=0, method='average', numeric_only=False,
   ....:                       na_option='keep', ascending=True, pct=False)
   ....: 
Out[71]: 
0    1.0
1    2.0
Length: 2, dtype: float64

In [72]: pd.DataFrame([0,1]).rank(axis=0, method='average', numeric_only=False,
   ....:                          na_option='keep', ascending=True, pct=False)
   ....: 
Out[72]: 
     0
0  1.0
1  2.0

[2 rows x 1 columns]

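Bug in QuarterBegin with n=0

In previous versions, the behavior of the QuarterBegin offset was inconsistent depending on the date when the n parameter was 0. (GH 11406)

The general semantics of anchored offsets for n=0 is to not move the date when it is an anchor point (e.g., a quarter start date), and otherwise roll forward to the next anchor point.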

In [73]: d = pd.Timestamp('2014-02-01')

In [74]: d
Out[74]: Timestamp('2014-02-01 00:00:00')

In [75]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[75]: Timestamp('2014-02-01 00:00:00')

In [76]: d + pd.offsets.QuarterBegin(n=0, startingMonth=1)
Out[76]: Timestamp('2014-04-01 00:00:00')

For the QuarterBegin offset in previous versions, the date would be rolled backwards if the date was in the same month as the quarter start date.

In [3]: d = pd.Timestamp('2014-02-15')

In [4]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[4]: Timestamp('2014-02-01 00:00:00')

This behavior has been corrected in version 0.18.0, which is consistent with other anchored offsets like MonthBegin and YearBegin.

In [77]: d = pd.Timestamp('2014-02-15')

In [78]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[78]: Timestamp('2014-05-01 00:00:00')
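The n=0 semantics described in this section can be checked against the other anchored offsets it names. A small sketch, with dates chosen to sit on and off an anchor point:

```python
import pandas as pd

on_anchor = pd.Timestamp("2014-02-01")
off_anchor = pd.Timestamp("2014-02-15")

# On an anchor point, n=0 leaves the date where it is.
month_same = on_anchor + pd.offsets.MonthBegin(n=0)

# Off an anchor point, n=0 rolls forward to the next anchor.
month_next = off_anchor + pd.offsets.MonthBegin(n=0)
year_next = off_anchor + pd.offsets.YearBegin(n=0)
quarter_next = off_anchor + pd.offsets.QuarterBegin(n=0, startingMonth=2)
```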

Resample API

Like the change in the window functions API above, .resample(...) is changing to have a more groupby-like API. (GH 11732, GH 12702, GH 12202, GH 12332, GH 12334, GH 12348, GH 12448).

In [79]: np.random.seed(1234)

In [80]: df = pd.DataFrame(np.random.rand(10,4),
   ....:                   columns=list('ABCD'),
   ....:                   index=pd.date_range('2010-01-01 09:00:00',
   ....:                                       periods=10, freq='s'))
   ....: 

In [81]: df
Out[81]: 
                            A         B         C         D
2010-01-01 09:00:00  0.191519  0.622109  0.437728  0.785359
2010-01-01 09:00:01  0.779976  0.272593  0.276464  0.801872
2010-01-01 09:00:02  0.958139  0.875933  0.357817  0.500995
2010-01-01 09:00:03  0.683463  0.712702  0.370251  0.561196
2010-01-01 09:00:04  0.503083  0.013768  0.772827  0.882641
2010-01-01 09:00:05  0.364886  0.615396  0.075381  0.368824
2010-01-01 09:00:06  0.933140  0.651378  0.397203  0.788730
2010-01-01 09:00:07  0.316836  0.568099  0.869127  0.436173
2010-01-01 09:00:08  0.802148  0.143767  0.704261  0.704581
2010-01-01 09:00:09  0.218792  0.924868  0.442141  0.909316

[10 rows x 4 columns]

Previous API:

You would write a resampling operation that immediately evaluates. If a how parameter was not provided, it would default to how='mean'.

In [6]: df.resample('2s')
Out[6]:
                         A         B         C         D
2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

You could also specify how directly

In [7]: df.resample('2s', how='sum')
Out[7]:
                         A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

New API:

Now, you can write .resample(..) as a 2-stage operation like .groupby(...), which yields a Resampler.

In [82]: r = df.resample('2s')

In [83]: r
Out[83]: <pandas.core.resample.DatetimeIndexResampler object at 0x7f9440acefe0>

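Downsampling

You can then use this object to perform operations. These are downsampling operations (going from a higher frequency to a lower one).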

In [84]: r.mean()
Out[84]: 
                            A         B         C         D
2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

[5 rows x 4 columns]

In [85]: r.sum()
Out[85]: 
                            A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

[5 rows x 4 columns]

Furthermore, resample now supports getitem operations to perform the resample on specific columns.

In [86]: r[['A','C']].mean()
Out[86]: 
                            A         C
2010-01-01 09:00:00  0.485748  0.357096
2010-01-01 09:00:02  0.820801  0.364034
2010-01-01 09:00:04  0.433985  0.424104
2010-01-01 09:00:06  0.624988  0.633165
2010-01-01 09:00:08  0.510470  0.573201

[5 rows x 2 columns]

and .aggregate type operations.

In [87]: r.agg({'A' : 'mean', 'B' : 'sum'})
Out[87]: 
                            A         B
2010-01-01 09:00:00  0.485748  0.894701
2010-01-01 09:00:02  0.820801  1.588635
2010-01-01 09:00:04  0.433985  0.629165
2010-01-01 09:00:06  0.624988  1.219477
2010-01-01 09:00:08  0.510470  1.068634

[5 rows x 2 columns]

These accessors can, of course, be combined

In [88]: r[['A','B']].agg(['mean','sum'])
Out[88]: 
                            A                   B          
                         mean       sum      mean       sum
2010-01-01 09:00:00  0.485748  0.971495  0.447351  0.894701
2010-01-01 09:00:02  0.820801  1.641602  0.794317  1.588635
2010-01-01 09:00:04  0.433985  0.867969  0.314582  0.629165
2010-01-01 09:00:06  0.624988  1.249976  0.609738  1.219477
2010-01-01 09:00:08  0.510470  1.020940  0.534317  1.068634

[5 rows x 4 columns]

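Upsampling

Upsampling operations take you from a lower frequency to a higher frequency. These are now performed with the Resampler objects with backfill(), ffill(), fillna() and asfreq() methods.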

In [89]: s = pd.Series(np.arange(5, dtype='int64'),
              index=pd.date_range('2010-01-01', periods=5, freq='Q'))

In [90]: s
Out[90]:
2010-03-31    0
2010-06-30    1
2010-09-30    2
2010-12-31    3
2011-03-31    4
Freq: Q-DEC, Length: 5, dtype: int64

Previously

In [6]: s.resample('M', fill_method='ffill')
Out[6]:
2010-03-31    0
2010-04-30    0
2010-05-31    0
2010-06-30    1
2010-07-31    1
2010-08-31    1
2010-09-30    2
2010-10-31    2
2010-11-30    2
2010-12-31    3
2011-01-31    3
2011-02-28    3
2011-03-31    4
Freq: M, dtype: int64

New API

In [91]: s.resample('M').ffill()
Out[91]:
2010-03-31    0
2010-04-30    0
2010-05-31    0
2010-06-30    1
2010-07-31    1
2010-08-31    1
2010-09-30    2
2010-10-31    2
2010-11-30    2
2010-12-31    3
2011-01-31    3
2011-02-28    3
2011-03-31    4
Freq: M, Length: 13, dtype: int64

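Note

In the new API, you can either downsample OR upsample. The prior implementation would allow you to pass an aggregator function (like mean) even though you were upsampling, which caused some confusion.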
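To make the difference between the upsampling methods concrete, here is a sketch on a three-point series upsampled from 2-minute to 1-minute spacing. The data and names are illustrative; bfill() is used as the backward-fill spelling because it is the one available in recent pandas releases.

```python
import pandas as pd

s = pd.Series([0, 1, 2],
              index=pd.date_range("2010-01-01", periods=3, freq="2min"))

r = s.resample("min")   # deferred Resampler at the higher frequency

as_is = r.asfreq()      # new slots are left as NaN
forward = r.ffill()     # new slots take the previous observation
backward = r.bfill()    # new slots take the next observation
```

None of these calls aggregates anything; each only decides how the newly created slots are filled.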

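Previous API will work but with deprecations

Warning

This new API for resample includes some internal changes for the prior-to-0.18.0 API, so that it keeps working, with a deprecation warning, in most cases, because the resample operation returns a deferred object. We can intercept operations and just do what the (pre 0.18.0) API did (with a warning). Here is a typical use case: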

In [4]: r = df.resample('2s')

In [6]: r*10
pandas/tseries/resample.py:80: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

Out[6]:
                      A         B         C         D
2010-01-01 09:00:00  4.857476  4.473507  3.570960  7.936154
2010-01-01 09:00:02  8.208011  7.943173  3.640340  5.310957
2010-01-01 09:00:04  4.339846  3.145823  4.241039  6.257326
2010-01-01 09:00:06  6.249881  6.097384  6.331650  6.124518
2010-01-01 09:00:08  5.104699  5.343172  5.732009  8.069486

However, getting and assignment operations directly on a Resampler will raise a ValueError:

In [7]: r.iloc[0] = 5
ValueError: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

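There is a situation where the new API cannot perform all the operations when using the original code. This code is intended to resample every 2s, take the mean AND then take the min of those results.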

In [4]: df.resample('2s').min()
Out[4]:
A    0.433985
B    0.314582
C    0.357096
D    0.531096
dtype: float64

The new API will:

In [89]: df.resample('2s').min()
Out[89]: 
                            A         B         C         D
2010-01-01 09:00:00  0.191519  0.272593  0.276464  0.785359
2010-01-01 09:00:02  0.683463  0.712702  0.357817  0.500995
2010-01-01 09:00:04  0.364886  0.013768  0.075381  0.368824
2010-01-01 09:00:06  0.316836  0.568099  0.397203  0.436173
2010-01-01 09:00:08  0.218792  0.143767  0.442141  0.704581

[5 rows x 4 columns]

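The good news is that the return dimensions will differ between the new API and the old API, so this should loudly raise an exception.

To replicate the original operation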

In [90]: df.resample('2s').mean().min()
Out[90]: 
A    0.433985
B    0.314582
C    0.357096
D    0.531096
Length: 4, dtype: float64

Changes to eval

In prior versions, new column assignments in an eval expression resulted in an inplace change to the DataFrame. (GH 9297, GH 8664, GH 10486)

In [91]: df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)})

In [92]: df
Out[92]: 
      a  b
0   0.0  0
1   2.5  1
2   5.0  2
3   7.5  3
4  10.0  4

[5 rows x 2 columns]

In [12]: df.eval('c = a + b')
FutureWarning: eval expressions containing an assignment currentlydefault to operating inplace.
This will change in a future version of pandas, use inplace=True to avoid this warning.

In [13]: df
Out[13]:
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

In version 0.18.0, a new inplace keyword was added to choose whether the assignment should be done inplace or return a copy.

In [93]: df
Out[93]: 
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

[5 rows x 3 columns]

In [94]: df.eval('d = c - b', inplace=False)
Out[94]: 
      a  b     c     d
0   0.0  0   0.0   0.0
1   2.5  1   3.5   2.5
2   5.0  2   7.0   5.0
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[5 rows x 4 columns]

In [95]: df
Out[95]: 
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

[5 rows x 3 columns]

In [96]: df.eval('d = c - b', inplace=True)

In [97]: df
Out[97]: 
      a  b     c     d
0   0.0  0   0.0   0.0
1   2.5  1   3.5   2.5
2   5.0  2   7.0   5.0
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[5 rows x 4 columns]

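Warning

For backwards compatibility, inplace defaults to True if not specified. This will change in a future version of pandas. If your code depends on an inplace assignment, you should update it to explicitly set inplace=True

The inplace keyword parameter was also added to the query method.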

In [98]: df.query('a > 5')
Out[98]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

In [99]: df.query('a > 5', inplace=True)

In [100]: df
Out[100]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

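Warning

Note that the default value for inplace in a query is False, which is consistent with prior versions.

eval has also been updated to allow multi-line expressions for multiple assignments. These expressions will be evaluated one at a time, in order. Only assignments are valid for multi-line expressions.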

In [101]: df
Out[101]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

In [102]: df.eval("""
   .....: e = d + a
   .....: f = e - 22
   .....: g = f / 2.0""", inplace=True)
   .....: 

In [103]: df
Out[103]: 
      a  b     c     d     e    f    g
3   7.5  3  10.5   7.5  15.0 -7.0 -3.5
4  10.0  4  14.0  10.0  20.0 -2.0 -1.0

[2 rows x 7 columns]

Other API changes

DataFrame.between_time and Series.between_time now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ValueError. (GH 11818)

In [107]: s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))

In [108]: s.between_time("7:00am", "9:00am")
Out[108]:
2015-01-01 07:00:00    7
2015-01-01 08:00:00    8
2015-01-01 09:00:00    9
Freq: H, Length: 3, dtype: int64

This will now raise.

In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

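.memory_usage() now includes values in the index, as does memory_usage in .info() (GH 11597)
DataFrame.to_latex() now supports non-ascii encodings (e.g. utf-8) in Python 2 with the parameter encoding (GH 7061)
pandas.merge() and DataFrame.merge() will show a specific error message when trying to merge with an object that is not of type DataFrame or a subclass (GH 12081)
DataFrame.unstack and Series.unstack now take a fill_value keyword to allow direct replacement of missing values when an unstack results in missing values in the resulting DataFrame. As an added benefit, specifying fill_value will preserve the data type of the original stacked data. (GH 9746)
As part of the new API for window functions and resampling, aggregation functions have been clarified, raising more informative error messages on invalid aggregations. (GH 9052). A full set of examples is presented in groupby.
Statistical functions for NDFrame objects (like sum(), mean(), min()) will now raise if non-numpy-compatible arguments are passed in for **kwargs (GH 12301)
.to_latex and .to_html gain a decimal parameter like .to_csv; the default is '.' (GH 12031)
More helpful error message when constructing a DataFrame with empty data but with indices (GH 8020)
.describe() will now properly handle bool dtype as a categorical (GH 6625)
More helpful error message with an invalid .transform with user-defined input (GH 10165)
Exponentially weighted functions now allow specifying alpha directly (GH 10789) and raise ValueError if parameters violate 0 < alpha <= 1 (GH 12492)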
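Two items from the list above lend themselves to a quick check: the fill_value keyword of unstack and the direct alpha argument of the exponentially weighted functions. The sketch below uses a made-up three-element Series with one missing combination.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
s = pd.Series([1, 2, 3], index=index)

# Without fill_value the missing ("b", "y") cell becomes NaN.
wide_nan = s.unstack()

# With fill_value the cell is filled directly and the integer dtype is kept.
wide_filled = s.unstack(fill_value=0)

# alpha can be passed directly to the exponentially weighted window.
smoothed = pd.Series([1.0, 2.0, 3.0]).ewm(alpha=0.5).mean()

# An alpha outside 0 < alpha <= 1 is rejected with a ValueError.
try:
    pd.Series([1.0, 2.0, 3.0]).ewm(alpha=1.5).mean()
    alpha_rejected = False
except ValueError:
    alpha_rejected = True
```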

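Deprecations

The functions pd.rolling_*, pd.expanding_*, and pd.ewm* are deprecated and replaced by the corresponding method call. Note that the new suggested syntax includes all of the arguments (even if default) (GH 11603)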

In [1]: s = pd.Series(range(3))

In [2]: pd.rolling_mean(s,window=2,min_periods=1)
        FutureWarning: pd.rolling_mean is deprecated for Series and
             will be removed in a future version, replace with
             Series.rolling(min_periods=1,window=2,center=False).mean()
Out[2]:
        0    0.0
        1    0.5
        2    1.5
        dtype: float64

In [3]: pd.rolling_cov(s, s, window=2)
        FutureWarning: pd.rolling_cov is deprecated for Series and
             will be removed in a future version, replace with
             Series.rolling(window=2).cov(other=<Series>)
Out[3]:
        0    NaN
        1    0.5
        2    0.5
        dtype: float64
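For reference, the method-call forms that replace a few of the deprecated top-level functions can be written as below. This is a sketch: the deprecated calls appear only in comments, and the expanding and ewm lines are analogous examples chosen for illustration rather than taken from the warnings shown above.

```python
import pandas as pd

s = pd.Series(range(3))

# pd.rolling_mean(s, window=2, min_periods=1)
rolling_mean = s.rolling(window=2, min_periods=1).mean()

# pd.rolling_cov(s, s, window=2)
rolling_cov = s.rolling(window=2).cov(other=s)

# pd.expanding_sum(s)
expanding_sum = s.expanding().sum()

# pd.ewma(s, span=2)
ewm_mean = s.ewm(span=2).mean()
```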

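The freq and how arguments to the .rolling, .expanding, and .ewm (new) functions are deprecated, and will be removed in a future version. You can simply resample the input prior to creating a window function. (GH 11603).

For example, instead of s.rolling(window=5,freq='D').max() to get the max value on a rolling 5 day window, one could use s.resample('D').mean().rolling(window=5).max(), which first resamples the data to daily data, then provides a rolling 5 day window.

pd.tseries.frequencies.get_offset_name function is deprecated. Use the offset's .freqstr property as an alternative (GH 11192)
pandas.stats.fama_macbeth routines are deprecated and will be removed in a future version (GH 6077)
pandas.stats.ols, pandas.stats.plm and pandas.stats.var routines are deprecated and will be removed in a future version (GH 6077)
Show a FutureWarning rather than a DeprecationWarning on using long-time deprecated syntax in HDFStore.select, where the where clause is not string-like (GH 12027)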

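The pandas.options.display.mpl_style configuration has been deprecated and will be removed in a future version of pandas. This functionality is better handled by matplotlib's style sheets (GH 11783).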
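The resample-then-roll replacement for the deprecated freq argument described in this section can be sketched as follows, assuming twice-daily observations invented for the example:

```python
import numpy as np
import pandas as pd

# Twenty observations, two per day, over ten days (720 minutes apart).
index = pd.date_range("2016-01-01", periods=20, freq="720min")
s = pd.Series(np.arange(20, dtype="float64"), index=index)

# Step 1: bring the data to one value per day.
daily = s.resample("D").mean()

# Step 2: take the rolling maximum over five daily values.
rolling_max = daily.rolling(window=5).max()
```

Splitting the operation this way makes the daily aggregation explicit instead of leaving it implied by the window function.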

Removal of deprecated float indexers

In [104]: s = pd.Series([1, 2, 3], index=[4, 5, 6])

In [105]: s
Out[105]: 
4    1
5    2
6    3
Length: 3, dtype: int64

In [106]: s2 = pd.Series([1, 2, 3], index=list('abc'))

In [107]: s2
Out[107]: 
a    1
b    2
c    3
Length: 3, dtype: int64

Previous behavior:

# this is label indexing
In [2]: s[5.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[2]: 2

# this is positional indexing
In [3]: s.iloc[1.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[3]: 2

# this is label indexing
In [4]: s.loc[5.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[4]: 2

# .ix would coerce 1.0 to the positional 1, and index
In [5]: s2.ix[1.0] = 10
FutureWarning: scalar indexers for index type Index should be integers and not floating point

In [6]: s2
Out[6]:
a     1
b    10
c     3
dtype: int64

新行为

在 GH 4892 中，对非 Float64Index 使用浮点数索引已弃用（在 0.14.0 版本中）。在 0.18.0 版本中，此弃用警告已移除，现在将引发 TypeError。(GH 12165, GH 12333)

In [3]: s.iloc[2.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [2.0] of <type 'float'>

对于 iloc，通过浮点标量获取和设置将始终引发错误。

In [108]: s[5.0]
Out[108]: 2

In [109]: s.loc[5.0]
Out[109]: 2

其他索引器将强制转换为类似整数的类型，用于获取和设置。FutureWarning 已针对 .loc、.ix 和 [] 移除。

In [110]: s_copy = s.copy()

In [111]: s_copy[5.0] = 10

In [112]: s_copy
Out[112]: 
4     1
5    10
6     3
Length: 3, dtype: int64

In [113]: s_copy = s.copy()

In [114]: s_copy.loc[5.0] = 10

In [115]: s_copy
Out[115]: 
4     1
5    10
6     3
Length: 3, dtype: int64

The same coercion applies when setting, as in the two assignments above.

In [3]: s2.ix[1.0] = 10
In [4]: s2
Out[4]:
a       1
b       2
c       3
1.0    10
dtype: int64

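Positional setting with .ix and a float indexer will add this value to the index, rather than setting the value by position as it did previously.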

In [116]: s.loc[5.0:6]
Out[116]: 
5    2
6    3
Length: 2, dtype: int64

Slicing will also coerce integer-like floats to integers for a non-Float64Index.

In [117]: s.loc[5.1:6]
Out[117]: 
6    3
Length: 1, dtype: int64

Note that for floats that are not coercible to ints, the label-based bounds will be excluded.

In [118]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [119]: s[1.0]
Out[119]: 2

In [120]: s[1.0:2.5]
Out[120]: 
1.0    2
2.0    3
Length: 2, dtype: int64

Float indexing on a Float64Index is unchanged.
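
The rules in this section can be checked with a short script. This is a minimal sketch rather than part of the original notes: the helper `try_iloc` is a hypothetical name introduced here, and only the behaviors described above (iloc rejecting float scalars, float labels working on a float index) are exercised.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[4, 5, 6])


def try_iloc(series, position):
    """Return the value at `position`, or the name of the exception raised."""
    try:
        return series.iloc[position]
    except TypeError as exc:
        return type(exc).__name__


int_result = try_iloc(s, 1)      # integer position is accepted
float_result = try_iloc(s, 1.0)  # float position is rejected with TypeError

# A float-labelled index still accepts float labels and float slice bounds.
sf = pd.Series([1, 2, 3], index=[0.0, 1.0, 2.0])
float_label_value = sf.loc[1.0]
float_label_slice = sf.loc[1.0:2.5]
```

Catching the exception in a helper keeps the script runnable from top to bottom while still showing which call fails.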

Removal of prior version deprecations/changes#
Removal of rolling_corr_pairwise in favor of .rolling().corr(pairwise=True) (GH 4950)
Removal of expanding_corr_pairwise in favor of .expanding().corr(pairwise=True) (GH 4950)
Removal of the DataMatrix module. This was not imported into the pandas namespace in any event (GH 12111)
Removal of the cols keyword in favor of subset in DataFrame.duplicated() and DataFrame.drop_duplicates() (GH 6680)
Removal of the read_frame and frame_query (both aliases for pd.read_sql) and write_frame (alias of to_sql) functions in the pd.io.sql namespace, deprecated since 0.14.0 (GH 6292).
Removal of the `order` keyword from .factorize() (GH 6930)
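
A short sketch of two of the replacements named in the list above: the `subset` keyword and the method-style pairwise correlation. The data and variable names are made up for illustration, and the container returned by the pairwise call differs between pandas versions, so it is not inspected here.

```python
import pandas as pd

# subset= replaces the removed cols= keyword.
records = pd.DataFrame({"key": [1, 1, 2], "val": [10, 20, 30]})
deduped = records.drop_duplicates(subset=["key"])
dup_mask = records.duplicated(subset=["key"])

# Method-style replacement for the removed rolling_corr_pairwise.
prices = pd.DataFrame({"A": [1.0, 2.0, 4.0, 7.0, 11.0],
                       "B": [2.0, 1.0, 5.0, 3.0, 8.0]})
pairwise_corr = prices.rolling(window=3).corr(pairwise=True)
```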

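Performance improvements#
Improved performance of andrews_curves (GH 11534)
Improved performance of operations on large DatetimeIndex, PeriodIndex and TimedeltaIndex objects, including NaT (GH 10277)
Improved performance of pandas.concat (GH 11958)
Improved performance of StataReader (GH 11591)
Improved performance in construction of Categoricals with Series of datetimes containing NaT (GH 12077)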

Improved performance of ISO 8601 date parsing for dates without separators (GH 11899), with leading zeros (GH 11871) and with whitespace preceding the time zone (GH 9714)

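Bug fixes#
Bug in GroupBy.size when the DataFrame is empty. (GH 11699)
Bug in Period.end_time when a multiple of the time period is requested (GH 11738)
Regression in .clip with tz-aware datetimes (GH 11838)
Bug in date_range when the boundaries fell on the frequency (GH 11804, GH 12409)
Bug in consistency of passing nested dicts to .groupby(...).agg(...) (GH 9052)
Accept Unicode in the Timedelta constructor (GH 11995)
Bug in value label reading for StataReader when reading incrementally (GH 12014)
Bug in vectorized DateOffset when the n parameter is 0 (GH 11370)
Compatibility with numpy 1.11 regarding NaT comparison changes (GH 12049)
Bug in read_csv when reading from a StringIO in threads (GH 11790)
Bug in not treating NaT as a missing value in datetimelikes when factorizing and with Categoricals (GH 12077)
Bug in getitem when the values of a Series were tz-aware (GH 12089)
Bug in Series.str.get_dummies when one of the variables was 'name' (GH 12180)
Bug in pd.concat while concatenating tz-aware NaT Series. (GH 11693, GH 11755, GH 12217)
Bug in pd.read_stata with version <= 108 files (GH 12232)
Bug in Series.resample using a frequency of Nano when the index is a DatetimeIndex and contains non-zero nanosecond parts (GH 12037)
Bug in resampling with .nunique and a sparse index (GH 12352)
Removed some compiler warnings (GH 12471)
Worked around compatibility issues with boto in python 3.5 (GH 11915)
Bug in subtraction of NaT from a Timestamp or DatetimeIndex with time zones (GH 11718)
Bug in subtraction of a single tz-aware Timestamp from a Series (GH 12290)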
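Bug in `Timedelta.round` with negative values (GH 11690)
Bug in `.loc` against `CategoricalIndex` that could result in a normal `Index` (GH 11586)
Bug in `DataFrame.info` when duplicated column names exist (GH 11761)
Bug in `.copy` of tz-aware datetime objects (GH 11794)
Bug in `Series.apply` and `Series.map` where `timedelta64` was not boxed (GH 11349)
Bug in `DataFrame.set_index()` with a tz-aware `Series` (GH 12358)
Bug in subclasses of `DataFrame` where `AttributeError` did not propagate (GH 11808)
Bug in groupby on tz-aware data where selection did not return `Timestamp` (GH 11616)
Bug in the `pd.read_clipboard` and `pd.to_clipboard` functions not supporting Unicode; the bundled `pyperclip` was upgraded to v1.5.15 (GH 9263)
Bug in `DataFrame.query` containing an assignment (GH 8664)
Bug in `from_msgpack` where `__contains__()` failed for the columns of an unpacked `DataFrame` if the `DataFrame` had object columns. (GH 11880)
Bug in `.resample` on categorical data with a `TimedeltaIndex` (GH 12169)
Bug where time zone information was lost when broadcasting a scalar datetime to a `DataFrame` (GH 11682)
Bug in `Index` creation from `Timestamp` objects with mixed time zones, which coerced to UTC (GH 11488)
Bug in `to_numeric` where it did not raise if the input had more than one dimension (GH 11776)
Bug in parsing time zone offset strings with non-zero minutes (GH 11708)
Bug in `df.plot` using incorrect colors for bar plots under matplotlib 1.5+ (GH 11614)
Bug in the `groupby` `plot` method when using keyword arguments (GH 11805).
Bug in `DataFrame.duplicated` and `drop_duplicates` causing spurious matches when setting `keep=False` (GH 11864)
Bug in `.loc` where the result with a duplicated key could have an `Index` with an incorrect dtype (GH 11497)
Bug in `pd.rolling_median` where memory allocation failed even with sufficient memory (GH 11696)
Bug in `DataFrame.style` with spurious zeros (GH 12134)
Bug in `DataFrame.style` with integer columns not starting at 0 (GH 12125)
Bug in `.style.bar` that could cause incorrect rendering in specific browsers (GH 11678)
Bug in rich comparison of a `Timedelta` with a `numpy.array` of `Timedelta` that caused infinite recursion (GH 11835)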
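Bug in `DataFrame.round` dropping the column index name (GH 11986)
Bug in `df.replace` while replacing values in a mixed-dtype `DataFrame` (GH 11698)
Bug in `Index` that prevented copying the name of a passed `Index` when a new name was not provided (GH 11193)
Bug in `read_excel` failing to read any non-empty sheets when empty sheets exist and `sheetname=None` (GH 11711)
Bug in `read_excel` failing to raise a `NotImplemented` error when the keywords `parse_dates` and `date_parser` are provided (GH 11544)
Bug in `read_sql` with `pymysql` connections failing to return chunked data (GH 11522)
Bug in `.to_csv` ignoring the formatting parameters `decimal`, `na_rep` and `float_format` for float indexes (GH 11553)
Bug in `Int64Index` and `Float64Index` preventing the use of the modulo operator (GH 9244)
Bug in `MultiIndex.drop` for a `MultiIndex` that is not lexsorted (GH 12078)
Bug in `DataFrame` when masking an empty `DataFrame` (GH 11859)
Bug in `.plot` potentially modifying the `colors` input when the number of columns did not match the number of Series provided (GH 12039).
Bug in `Series.plot` failing when the index has a `CustomBusinessDay` frequency (GH 7222).
Bug in `.to_sql` for `datetime.time` values with the sqlite fallback (GH 8341)
Bug in `read_excel` failing to read data with a single column when `squeeze=True` (GH 12157)
Bug in `read_excel` failing to read a single empty column (GH 12292, GH 9002)
Bug in `.groupby` where a `KeyError` was not raised for a wrong column if there was only one row in the DataFrame (GH 11741)
Bug in `.read_csv` where specifying a dtype on empty data produced an error (GH 12048)
Bug in `.read_csv` where strings like `'2E'` were treated as valid floats (GH 12237)
Bug in building pandas with debugging symbols (GH 12123)
Removed the `millisecond` property of `DatetimeIndex`. This would always raise a `ValueError` (GH 12019).
Bug in the `Series` constructor with read-only data (GH 11502)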
Removed `pandas.util.testing.choice()`. `np.random.choice()` should be used instead. (GH 12386)
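Bug in the `.loc` setitem indexer preventing the use of a tz-aware `DatetimeIndex` (GH 12050)
Bug in `.style` where indexes and MultiIndexes were not displayed (GH 11655)
Bug in `to_msgpack` and `from_msgpack` that did not correctly serialize or deserialize `NaT` (GH 12307).
Bug in `.skew` and `.kurt` due to roundoff error for highly similar values (GH 11974)
Bug in the `Timestamp` constructor where microsecond resolution was lost if HHMMSS were not separated with `:` (GH 10041)
Bug in `buffer_rd_bytes` where src->buffer could be freed more than once if reading failed, causing a segfault (GH 12098)
Bug in `crosstab` where arguments with non-overlapping indexes would return a `KeyError` (GH 10291)
Bug in `DataFrame.apply` where reduction was not prevented in cases where `dtype` was not a numpy dtype (GH 12244)
Bug when initializing a categorical Series with a scalar value. (GH 12336)
Bug when specifying a UTC `DatetimeIndex` by setting `utc=True` in `.to_datetime` (GH 11934)
Bug when increasing the size of CSV reader buffers in `read_csv` (GH 12494)
Bug when setting columns of a `DataFrame` with duplicate column names (GH 12344)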
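
Two of the fixes listed above (GH 11776 and GH 12237) are easy to observe directly. The following is a minimal sketch with made-up inputs and hypothetical variable names, not a reproduction of the original bug reports.

```python
import io

import numpy as np
import pandas as pd

# GH 11776: to_numeric rejects input with more than one dimension.
try:
    pd.to_numeric(np.array([[1, 2], [3, 4]]))
    to_numeric_outcome = "no error"
except TypeError:
    to_numeric_outcome = "TypeError"

# GH 12237: a token such as '2E' is kept as text, not parsed as a float.
frame = pd.read_csv(io.StringIO("x\n2E\n"))
first_value = frame["x"].iloc[0]
```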

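Contributors#

A total of 101 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.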

ARF +
Alex Alekseyev +
Andrew McPherson +
Andrew Rosenfeld
Andy Hayden
Anthonios Partheniou
Anton I. Sipos
Ben +
Ben North +
Bran Yang +
Chris
Chris Carroux +
Christopher C. Aycock +
Christopher Scanlin +
Cody +
Da Wang +
Daniel Grady +
Dorozhko Anton +
Dr-Irv +
Erik M. Bray +
Evan Wright
Francis T. O’Donovan +
Frank Cleary +
Gianluca Rossi
Graham Jeffries +
Guillaume Horel
Henry Hammond +
Isaac Schwabacher +
Jean-Mathieu Deschenes
Jeff Reback
Joe Jevnik +
John Freeman +
John Fremlin +
Jonas Hoersch +
Joris Van den Bossche
Joris Vankerschaver
Justin Lecher
Justin Lin +
Ka Wo Chen
Keming Zhang +
Kerby Shedden
Kyle +
Marco Farrugia +
MasonGallo +
MattRijk +
Matthew Lurie +
Maximilian Roos
Mayank Asthana +
Mortada Mehyar
Moussa Taifi +
Navreet Gill +
Nicolas Bonnotte
Paul Reiners +
Philip Gura +
Pietro Battiston
RahulHP +
Randy Carnevale
Rinoc Johnson
Rishipuri +
Sangmin Park +
Scott E Lasley
Sereger13 +
Shannon Wang +
Skipper Seabold
Thierry Moisan
Thomas A Caswell
Toby Dylan Hocking +
Tom Augspurger
Travis +
Trent Hauck
Tux1
Varun
Wes McKinney
Will Thompson +
Yoav Ram
Yoong Kang Lim +
Yoshiki Vázquez Baeza
Young Joong Kim +
Younggun Kim
Yuval Langer +
alex argunov +
behzad nouri
boombard +
brian-pantano +
chromy +
daniel +
dgram0 +
gfyoung +
hack-c +
hcontrast +
jfoo +
kaustuv deolal +
llllllllll
ranarag +
rockg
scls19fr
seales +
sinhrks
srib +
surveymedia.ca +
tworec +