版本 0.16.0 (2015年3月22日)#

这是从 0.15.2 以来的一个主要版本，包含少量 API 更改、一些新功能、增强和性能改进以及大量错误修复。我们建议所有用户升级到此版本。

亮点包括

DataFrame.assign 方法，参见此处
Series.to_coo/from_coo 方法，用于与 scipy.sparse 交互，参见此处
Timedelta 的向后不兼容更改，以使 .seconds 属性与 datetime.timedelta 一致，参见此处
.loc 切片 API 的更改，以符合 .ix 的行为，参见此处
Categorical 构造函数中排序默认值的更改，参见此处
.str 访问器的增强，以简化字符串操作，参见此处
pandas.tools.rplot、pandas.sandbox.qtpandas 和 pandas.rpy 模块已弃用。我们建议用户使用外部包，如 seaborn、pandas-qt 和 rpy2 来获得类似或等效的功能，参见此处

更新前请查看API 更改和弃用。

新功能#

DataFrame 分配#

受 dplyr 的 mutate 动词启发，DataFrame 有了一个新的 assign() 方法。assign 的函数签名很简单，就是 **kwargs。键是新字段的列名，值是要插入的值（例如，Series 或 NumPy 数组），或者是要在 DataFrame 上调用的一个参数的函数。新值被插入，并且返回整个 DataFrame（包含所有原始列和新列）。

In [1]: iris = pd.read_csv('data/iris.data')

In [2]: iris.head()
Out[2]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

[5 rows x 5 columns]

In [3]: iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength']).head()
Out[3]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

上面是一个插入预计算值的示例。我们也可以传入一个函数进行求值。

In [4]: iris.assign(sepal_ratio=lambda x: (x['SepalWidth']
   ...:                                    / x['SepalLength'])).head()
   ...: 
Out[4]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

assign 的强大之处在于它在操作链中的使用。例如，我们可以将 DataFrame 限制为萼片长度大于 5 的那些行，计算比率，然后绘制

In [5]: iris = pd.read_csv('data/iris.data')

In [6]: (iris.query('SepalLength > 5')
   ...:      .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
   ...:              PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
   ...:      .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
   ...: 
Out[6]: <Axes: xlabel='SepalRatio', ylabel='PetalRatio'>

更多信息请参见文档。(GH 9229)

与 scipy.sparse 的交互#

添加了 SparseSeries.to_coo() 和 SparseSeries.from_coo() 方法 (GH 8048)，用于与 scipy.sparse.coo_matrix 实例进行转换（参见此处）。例如，给定一个带有 MultiIndex 的 SparseSeries，我们可以通过将行和列标签指定为索引级别来将其转换为 scipy.sparse.coo_matrix

s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),
                                     (1, 2, 'a', 1),
                                     (1, 1, 'b', 0),
                                     (1, 1, 'b', 1),
                                     (2, 1, 'b', 0),
                                     (2, 1, 'b', 1)],
                                    names=['A', 'B', 'C', 'D'])

s

# SparseSeries
ss = s.to_sparse()
ss

A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
                             column_levels=['C', 'D'],
                             sort_labels=False)

A
A.todense()
rows
columns

from_coo 方法是一个方便的方法，用于从 scipy.sparse.coo_matrix 创建 SparseSeries

from scipy import sparse
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
                      shape=(3, 4))
A
A.todense()

ss = pd.SparseSeries.from_coo(A)
ss

字符串方法增强#

以下新方法可以通过 .str 访问器访问，以将函数应用于每个值。这旨在使其与字符串上的标准方法更一致。(GH 9282, GH 9352, GH 9386, GH 9387, GH 9439)

方法

isalnum()

isalpha()

isdigit()

isdigit()

isspace()

islower()

isupper()

istitle()

isnumeric()

isdecimal()

find()

rfind()

ljust()

rjust()

zfill()
```
In [7]: s = pd.Series(['abcd', '3456', 'EFGH'])

In [8]: s.str.isalpha()
Out[8]: 
0     True
1    False
2     True
Length: 3, dtype: bool

In [9]: s.str.find('ab')
Out[9]: 
0    0
1   -1
2   -1
Length: 3, dtype: int64
```

Series.str.pad() 和 Series.str.center() 现在接受 fillchar 选项来指定填充字符 (GH 9352)

In [10]: s = pd.Series(['12', '300', '25'])

In [11]: s.str.pad(5, fillchar='_')
Out[11]: 
0    ___12
1    __300
2    ___25
Length: 3, dtype: object

添加了 Series.str.slice_replace()，此前会引发 NotImplementedError (GH 8888)

In [12]: s = pd.Series(['ABCD', 'EFGH', 'IJK'])

In [13]: s.str.slice_replace(1, 3, 'X')
Out[13]: 
0    AXD
1    EXH
2     IX
Length: 3, dtype: object

# replaced with empty char
In [14]: s.str.slice_replace(0, 1)
Out[14]: 
0    BCD
1    FGH
2     JK
Length: 3, dtype: object

其他增强#

Reindex 现在支持带有单调递增或递减索引的框架或系列，使用 method='nearest' (GH 9258)

In [15]: df = pd.DataFrame({'x': range(5)})

In [16]: df.reindex([0.2, 1.8, 3.5], method='nearest')
Out[16]: 
     x
0.2  0
1.8  2
3.5  4

[3 rows x 1 columns]

此方法也通过更底层的 Index.get_indexer 和 Index.get_loc 方法暴露。

read_excel() 函数的 sheetname 参数现在接受列表和 None，分别用于获取多个或所有工作表。如果指定了多个工作表，则返回一个字典。(GH 9450)
```
# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
```
允许使用迭代器增量读取 Stata 文件；支持 Stata 文件中的长字符串。参见文档此处 (GH 9493)。
以 ~ 开头的路径现在将扩展为以用户主目录开头 (GH 9066)
在 get_data_yahoo 中添加了时间间隔选择 (GH 9071)
添加了 Timestamp.to_datetime64() 作为 Timedelta.to_timedelta64() 的补充 (GH 9255)
tseries.frequencies.to_offset() 现在接受 Timedelta 作为输入 (GH 9064)
Series 的自相关方法中添加了滞后参数，默认为滞后-1 自相关 (GH 9192)
Timedelta 现在将在构造函数中接受 nanoseconds 关键字 (GH 9273)
SQL 代码现在可以安全地转义表名和列名 (GH 8986)
为 Series.str.<tab>、Series.dt.<tab> 和 Series.cat.<tab> 添加了自动完成功能 (GH 9322)
Index.get_indexer 现在支持 method='pad' 和 method='backfill'，即使对于任何目标数组（不仅仅是单调目标）也如此。这些方法也适用于单调递减和单调递增的索引 (GH 9258)。
Index.asof 现在适用于所有索引类型 (GH 9258)。
io.read_excel() 中已添加 verbose 参数，默认为 False。设置为 True 可在解析时打印工作表名称。(GH 9450)
向 Timestamp、DatetimeIndex、Period、PeriodIndex 和 Series.dt 添加了 days_in_month (兼容别名 daysinmonth) 属性 (GH 9572)
在 to_csv 中添加了 decimal 选项，用于提供非'.'小数分隔符的格式 (GH 781)
为 Timestamp 添加了 normalize 选项，以规范化到午夜 (GH 8794)
添加了使用 HDF5 文件和 rhdf5 库将 DataFrame 导入 R 的示例。更多信息请参见文档 (GH 9636)。

向后不兼容的 API 更改#

timedelta 中的更改#

在 v0.15.0 中引入了一个新的标量类型 Timedelta，它是 datetime.timedelta 的子类。在此处提及了一个关于 .seconds 访问器的 API 更改。其目的是提供一组用户友好的访问器，给出该单位的“自然”值，例如，如果你有一个 Timedelta('1 day, 10:11:12')，那么 .seconds 将返回 12。然而，这与 datetime.timedelta 的定义相悖，后者将 .seconds 定义为 10 * 3600 + 11 * 60 + 12 == 36672。

因此，在 v0.16.0 中，我们正在恢复 API 以匹配 datetime.timedelta 的定义。此外，组件值仍然可以通过 .components 访问器获得。这些更改影响 .seconds 和 .microseconds 访问器，并移除 .hours、.minutes、.milliseconds 访问器。这些更改也影响 TimedeltaIndex 和 Series 的 .dt 访问器。(GH 9185, GH 9139)

以前的行为

In [2]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [3]: t.days
Out[3]: 1

In [4]: t.seconds
Out[4]: 12

In [5]: t.microseconds
Out[5]: 123

新的行为

In [17]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [18]: t.days
Out[18]: 1

In [19]: t.seconds
Out[19]: 36672

In [20]: t.microseconds
Out[20]: 100123

使用 .components 允许完全组件访问

In [21]: t.components
Out[21]: Components(days=1, hours=10, minutes=11, seconds=12, milliseconds=100, microseconds=123, nanoseconds=0)

In [22]: t.components.seconds
Out[22]: 12

索引更改#

使用 .loc 的一小部分边缘情况的行为已更改 (GH 8613)。此外，我们改进了引发的错误消息的内容

现在允许使用 .loc 进行切片时，开始和/或停止边界在索引中未找到；以前这会引发 KeyError。这使得在这种情况下行为与 .ix 相同。此更改仅适用于切片，不适用于单个标签索引。

In [23]: df = pd.DataFrame(np.random.randn(5, 4),
   ....:                   columns=list('ABCD'),
   ....:                   index=pd.date_range('20130101', periods=5))
   ....: 

In [24]: df
Out[24]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[5 rows x 4 columns]

In [25]: s = pd.Series(range(5), [-2, -1, 1, 2, 3])

In [26]: s
Out[26]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64

以前的行为

In [4]: df.loc['2013-01-02':'2013-01-10']
KeyError: 'stop bound [2013-01-10] is not in the [index]'

In [6]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

新的行为

In [27]: df.loc['2013-01-02':'2013-01-10']
Out[27]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[4 rows x 4 columns]

In [28]: s.loc[-10:3]
Out[28]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64

允许使用 .ix 对整数索引进行浮点值切片。以前这只对 .loc 启用

以前的行为

In [8]: s.ix[-1.0:2]
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)

新的行为

In [2]: s.ix[-1.0:2]
Out[2]:
-1    1
 1    2
 2    3
dtype: int64

在使用 .loc 时，如果索引类型无效，则提供有用的异常。例如，尝试在 DatetimeIndex、PeriodIndex 或 TimedeltaIndex 类型的索引上使用 .loc，并使用整数（或浮点数）。

以前的行为
```
In [4]: df.loc[2:3]
KeyError: 'start bound [2] is not the [index]'
```
新的行为
```
In [4]: df.loc[2:3]
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
```

Categorical 更改#

在以前的版本中，未指定排序（即未传入 ordered 关键字）的 Categoricals 默认被视为 ordered Categoricals。从现在起，Categorical 构造函数中的 ordered 关键字将默认为 False。排序现在必须明确指定。

此外，以前你*可以*通过简单地设置属性来改变 Categorical 的 ordered 属性，例如 cat.ordered=True；现在这已弃用，你应该使用 cat.as_ordered() 或 cat.as_unordered()。这些方法默认会返回一个**新**对象，而不是修改现有对象。(GH 9347, GH 9190)

以前的行为

In [3]: s = pd.Series([0, 1, 2], dtype='category')

In [4]: s
Out[4]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0 < 1 < 2]

In [5]: s.cat.ordered
Out[5]: True

In [6]: s.cat.ordered = False

In [7]: s
Out[7]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0, 1, 2]

新的行为

In [29]: s = pd.Series([0, 1, 2], dtype='category')

In [30]: s
Out[30]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0, 1, 2]

In [31]: s.cat.ordered
Out[31]: False

In [32]: s = s.cat.as_ordered()

In [33]: s
Out[33]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [34]: s.cat.ordered
Out[34]: True

# you can set in the constructor of the Categorical
In [35]: s = pd.Series(pd.Categorical([0, 1, 2], ordered=True))

In [36]: s
Out[36]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [37]: s.cat.ordered
Out[37]: True

为了方便创建分类数据系列，我们增加了在调用 .astype() 时传递关键字参数的功能。这些参数会直接传递给构造函数。

In [54]: s = pd.Series(["a", "b", "c", "a"]).astype('category', ordered=True)

In [55]: s
Out[55]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]

In [56]: s = (pd.Series(["a", "b", "c", "a"])
   ....:        .astype('category', categories=list('abcdef'), ordered=False))

In [57]: s
Out[57]:
0    a
1    b
2    c
3    a
dtype: category
Categories (6, object): [a, b, c, d, e, f]

其他 API 更改#

Index.duplicated 现在返回 np.array(dtype=bool)，而不是包含 bool 值的 Index(dtype=object)。(GH 8875)
DataFrame.to_json 现在为混合 dtype 框架返回每列的精确类型序列化 (GH 9037)

以前数据在序列化前会被强制转换为共同的 dtype，例如导致整数序列化为浮点数
```
In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1.0,"1":2.0}}'
```
现在每列都使用其正确的 dtype 进行序列化
```
In [2]:  pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1,"1":2}}'
```
DatetimeIndex、PeriodIndex 和 TimedeltaIndex.summary 现在输出相同的格式。(GH 9116)
TimedeltaIndex.freqstr 现在输出与 DatetimeIndex 相同的字符串格式。(GH 9116)
条形图和水平条形图不再沿信息轴添加虚线。以前的样式可以通过 matplotlib 的 axhline 或 axvline 方法实现 (GH 9088)。
Series 访问器 .dt、.cat 和 .str 现在在系列不包含适当类型的数据时引发 AttributeError 而不是 TypeError (GH 9617)。这更紧密地遵循 Python 的内置异常层次结构，并确保 hasattr(s, 'cat') 等测试在 Python 2 和 3 上都一致。

Series 现在支持整数类型的位运算 (GH 9016)。以前，即使输入 dtype 是整数，输出 dtype 也会被强制转换为 bool。

以前的行为

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    True
b    True
c    True
d    True
dtype: bool

新行为。如果输入 dtype 是整数，则输出 dtype 也是整数，并且输出值是位运算的结果。

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    4
b    5
c    6
d    7
dtype: int64

在涉及 Series 或 DataFrame 的除法运算中，0/0 和 0//0 现在给出 np.nan 而不是 np.inf。(GH 9144, GH 8445)

以前的行为

In [2]: p = pd.Series([0, 1])

In [3]: p / 0
Out[3]:
0    inf
1    inf
dtype: float64

In [4]: p // 0
Out[4]:
0    inf
1    inf
dtype: float64

新的行为

In [38]: p = pd.Series([0, 1])

In [39]: p / 0
Out[39]: 
0    NaN
1    inf
Length: 2, dtype: float64

In [40]: p // 0
Out[40]: 
0    NaN
1    inf
Length: 2, dtype: float64

Series.values_counts 和 Series.describe 对于分类数据现在将 NaN 条目放在末尾。(GH 9443)
Series.describe 对于分类数据现在将未使用的类别的计数和频率设为 0，而不是 NaN (GH 9443)
由于一个错误修复，使用 DatetimeIndex.asof 查找部分字符串标签现在包含与字符串匹配的值，即使它们在部分字符串标签的开始之后 (GH 9258)。

旧行为
```
In [4]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[4]: Timestamp('2000-01-31 00:00:00')
```
已修复的行为
```
In [41]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[41]: Timestamp('2000-02-28 00:00:00')
```
要重现旧行为，只需在标签中添加更多精度（例如，使用 2000-02-01 而不是 2000-02）。

弃用#

rplot 栅格绘图接口已弃用，并将在未来版本中移除。我们建议使用像 seaborn 这样的外部包来获取类似但更精细的功能 (GH 3445)。文档中包含了一些如何将现有代码从 rplot 转换为 seaborn 的示例此处。
pandas.sandbox.qtpandas 接口已弃用，并将在未来版本中移除。我们建议用户使用外部包 pandas-qt。(GH 9615)
pandas.rpy 接口已弃用，并将在未来版本中移除。类似的功能可以通过 rpy2 项目访问 (GH 9602)
将 DatetimeIndex/PeriodIndex 添加到另一个 DatetimeIndex/PeriodIndex 作为集合操作已被弃用。这将在未来版本中更改为 TypeError。.union() 应该用于并集集合操作。(GH 9094)
将 DatetimeIndex/PeriodIndex 从另一个 DatetimeIndex/PeriodIndex 中减去作为集合操作已被弃用。这将在未来版本中更改为实际的数值减法，产生 TimeDeltaIndex。.difference() 应该用于差集集合操作。(GH 9094)

移除先前版本弃用/更改#

DataFrame.pivot_table 和 crosstab 的 rows 和 cols 关键字参数已移除，取而代之的是 index 和 columns (GH 6581)
DataFrame.to_excel 和 DataFrame.to_csv 的 cols 关键字参数已移除，取而代之的是 columns (GH 6581)
移除了 convert_dummies，取而代之的是 get_dummies (GH 6581)
移除了 value_range，取而代之的是 describe (GH 6581)

性能改进#

修复了 .loc 索引使用数组或类列表时的性能回归问题 (GH 9126)。
DataFrame.to_json 对于混合 dtype 框架的性能提高了 30 倍。(GH 9037)
通过使用标签而非值，提高了 MultiIndex.duplicated 的性能 (GH 9125)
通过调用 unique 而不是 value_counts，提高了 nunique 的速度 (GH 9129, GH 7771)
通过适当利用同构/异构 dtypes，DataFrame.count 和 DataFrame.dropna 的性能提高了 10 倍 (GH 9136)
在使用 MultiIndex 和 level 关键字参数时，DataFrame.count 的性能提高了 20 倍 (GH 9163)
当键空间超出 int64 边界时，merge 的性能和内存使用改进 (GH 9151)
多键 groupby 的性能改进 (GH 9429)
MultiIndex.sortlevel 的性能改进 (GH 9445)
DataFrame.duplicated 的性能和内存使用改进 (GH 9398)
周期 Period 的 Cython 化 (GH 9440)
to_hdf 的内存使用量减少 (GH 9648)

错误修复#

更改了 .to_html 以移除表格体中的前导/尾随空格 (GH 4987)
修复了在 Python 3 中使用 read_csv 读取 s3 时的兼容性问题 (GH 9452)
修复了 DatetimeIndex 中的兼容性问题，该问题影响 numpy.int_ 默认为 numpy.int32 的架构 (GH 8943)
面板索引与对象类型使用时的错误 (GH 9140)
返回的 Series.dt.components 索引被重置为默认索引的错误 (GH 9247)
在 Categorical.__getitem__/__setitem__ 中使用类列表输入时，由于索引器强制转换导致结果不正确的错误 (GH 9469)
DatetimeIndex 的部分设置中的错误 (GH 9478)
groupby 对于整数和 datetime64 列在应用聚合器时导致值在数字足够大时被更改的错误 (GH 9311, GH 6620)
修复了 to_sql 中将 Timestamp 对象列（带有时区信息的 datetime 列）映射到适当的 sqlalchemy 类型时的错误 (GH 9085)。
修复了 to_sql dtype 参数不接受实例化 SQLAlchemy 类型的错误 (GH 9083)。
.loc 使用 np.datetime64 部分设置时的错误 (GH 9516)
对于类似日期时间的 Series 和 .xs 切片推断出不正确的 dtype (GH 9477)
Categorical.unique() (以及如果 s 的 dtype 为 category 则为 s.unique()) 中的项现在按其最初找到的顺序出现，而不是按排序顺序出现 (GH 9331)。这现在与 pandas 中其他 dtype 的行为一致。
修复了大端平台在 StataReader 中产生不正确结果的错误 (GH 8688)。
MultiIndex.has_duplicates 中存在一个错误，当有许多级别时会导致索引器溢出 (GH 9075, GH 5873)
pivot 和 unstack 中存在的错误，其中 nan 值会破坏索引对齐 (GH 4862, GH 7401, GH 7403, GH 7405, GH 7466, GH 9497)
MultiIndex 上使用 sort=True 或空值进行左连接时的错误 (GH 9210)。
MultiIndex 中插入新键失败的错误 (GH 9250)。
当键空间超出 int64 边界时，groupby 中的错误 (GH 9096)。
使用 TimedeltaIndex 或 DatetimeIndex 和空值进行 unstack 时的错误 (GH 9491)。
rank 中的错误，其中带容差比较浮点数会导致不一致的行为 (GH 8365)。
修复了从 URL 加载数据时，read_stata 和 StataReader 中的字符编码错误 (GH 9231)。
在将 offsets.Nano 添加到其他偏移量时引发 TypeError 的错误 (GH 9284)
DatetimeIndex 迭代中的错误，与 (GH 8890) 相关，已在 (GH 9100) 中修复
resample 在夏令时转换期间的错误。这需要修复偏移类，以便它们在夏令时转换时行为正确。(GH 5172, GH 8744, GH 8653, GH 9173, GH 9468)。
二进制运算符方法（例如 .mul()）与整数级别对齐时的错误 (GH 9463)。
箱线图、散点图和蜂窝图可能显示不必要的警告的错误 (GH 8877)
使用 layout 关键字参数的子图可能显示不必要警告的错误 (GH 9464)
在使用包装函数（例如 fillna）时，使用需要传递参数（例如轴）的 grouper 函数时出现的错误 (GH 9221)
DataFrame 现在在构造函数中正确支持同时使用 copy 和 dtype 参数 (GH 9099)
当使用 C 引擎读取带有回车符换行符的文件并使用 skiprows 时，read_csv 中的错误。(GH 9079)
isnull 现在检测 PeriodIndex 中的 NaT (GH 9129)
使用多列 groupby 时，groupby .nth() 中的错误 (GH 8979)
DataFrame.where 和 Series.where 将数值错误地强制转换为字符串的错误 (GH 9280)
DataFrame.where 和 Series.where 在传入字符串类列表时引发 ValueError 的错误。(GH 9280)
在非字符串值上访问 Series.str 方法现在会引发 TypeError，而不是产生不正确的结果 (GH 9184)
DatetimeIndex.__contains__ 在索引包含重复项且非单调递增时的错误 (GH 9512)
修复了当所有值相等时 Series.kurt() 的除零错误 (GH 9197)
修复了 xlsxwriter 引擎中的问题，该问题导致如果没有应用其他格式，它会向单元格添加默认的“常规”格式。这阻止了其他行或列格式的应用。(GH 9167)
修复了在 read_csv 中同时指定 index_col=False 和 usecols 时的兼容性问题。(GH 9082)
wide_to_long 会修改输入存根名称列表的错误 (GH 9204)
to_sql 未使用双精度存储 float64 值的错误。(GH 9009)
SparseSeries 和 SparsePanel 现在接受零参数构造函数（与其非稀疏对应项相同）(GH 9272)。
合并 Categorical 和 object dtypes 时的回归问题 (GH 9426)
read_csv 在某些格式错误的输入文件上发生缓冲区溢出的错误 (GH 9205)
groupby MultiIndex 缺失对的错误 (GH 9049, GH 9344)
修复了 Series.groupby 中按 MultiIndex 级别分组时会忽略排序参数的错误 (GH 9444)
修复了 DataFrame.Groupby 在分类列情况下 sort=False 被忽略的错误。(GH 8868)
修复了 Python 3 上从 Amazon S3 读取 CSV 文件时引发 TypeError 的错误 (GH 9452)
Google BigQuery 阅读器中存在的错误，其中查询结果中可能存在 'jobComplete' 键但为 False (GH 8728)
在 Series.values_counts 中，当 Series 类型为分类，且 dropna=True 时，排除 NaN 的错误 (GH 9443)
修复了 DataFrame.std/var/sem 缺少 numeric_only 选项的问题 (GH 9201)
支持使用标量数据构造 Panel 或 Panel4D (GH 8285)
Series 文本表示与 max_rows/max_columns 脱钩 (GH 7508)。

Series 数字格式在截断时不一致 (GH 8532)。

以前的行为

In [2]: pd.options.display.max_rows = 10
In [3]: s = pd.Series([1,1,1,1,1,1,1,1,1,1,0.9999,1,1]*10)
In [4]: s
Out[4]:
0    1
1    1
2    1
...
127    0.9999
128    1.0000
129    1.0000
Length: 130, dtype: float64

新的行为

    1.0000
    1.0000
    1.0000
    1.0000
    1.0000
...
  1.0000
  1.0000
  0.9999
  1.0000
  1.0000
dtype: float64

在某些情况下，在框架中设置新项目时会生成一个错误的 SettingWithCopy 警告 (GH 8730)

以下代码以前会报告 SettingWithCopy 警告。

In [42]: df1 = pd.DataFrame({'x': pd.Series(['a', 'b', 'c']),
   ....:                     'y': pd.Series(['d', 'e', 'f'])})
   ....: 

In [43]: df2 = df1[['x']]

In [44]: df2['y'] = ['g', 'h', 'i']

贡献者#

共有 60 人为此版本贡献了补丁。名字旁边带有“+”的人是首次贡献补丁。

Aaron Toth +
Alan Du +
Alessandro Amici +
Artemy Kolchinsky
Ashwini Chaudhary +
Ben Schiller
Bill Letson
Brandon Bradley +
Chau Hoang +
Chris Reynolds
Chris Whelan +
Christer van der Meeren +
David Cottrell +
David Stephens
Ehsan Azarnasab +
Garrett-R +
Guillaume Gay
Jake Torcasso +
Jason Sexauer
Jeff Reback
John McNamara
Joris Van den Bossche
Joschka zur Jacobsmühlen +
Juarez Bochi +
Junya Hayashi +
K.-Michael Aye
Kerby Shedden +
Kevin Sheppard
Kieran O’Mahony
Kodi Arfer +
Matti Airas +
Min RK +
Mortada Mehyar
Robert +
Scott E Lasley
Scott Lasley +
Sergio Pascual +
Skipper Seabold
Stephan Hoyer
Thomas Grainger
Tom Augspurger
TomAugspurger
Vladimir Filimonov +
Vyomkesh Tripathi +
Will Holmgren
Yulong Yang +
behzad nouri
bertrandhaut +
bjonen
cel4 +
clham
hsperr +
ischwabacher
jnmclarty
josham +
jreback
omtinez +
roch +
sinhrks
unutbu

		方法
`isalnum()`	`isalpha()`	`isdigit()`	`isdigit()`	`isspace()`
`islower()`	`isupper()`	`istitle()`	`isnumeric()`	`isdecimal()`
`find()`	`rfind()`	`ljust()`	`rjust()`	`zfill()`