与 SQL 的比较#

由于许多潜在的 pandas 用户对 SQL 比较熟悉,因此本页旨在提供一些示例,说明如何使用 pandas 执行各种 SQL 操作。

如果您是 pandas 的新手,您可能需要先阅读10 分钟学会 pandas以熟悉该库。

按照惯例,我们如下导入 pandas 和 NumPy

In [1]: import pandas as pd

In [2]: import numpy as np

大多数示例将利用 pandas 测试中找到的 tips 数据集。我们将数据读入一个名为 tips 的 DataFrame,并假设我们有一个同名同结构的数据库表。

In [3]: url = (
   ...:     "https://raw.githubusercontent.com/pandas-dev"
   ...:     "/pandas/main/pandas/tests/io/data/csv/tips.csv"
   ...: )
   ...: 

In [4]: tips = pd.read_csv(url)

In [5]: tips
Out[5]: 
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]

副本与就地操作#

大多数 pandas 操作会返回 Series/DataFrame 的副本。要使更改“生效”,您需要将其分配给一个新变量

sorted_df = df.sort_values("col1")

或覆盖原始变量

df = df.sort_values("col1")

注意

您会发现某些方法提供了 inplace=Truecopy=False 关键字参数

df.replace(5, inplace=True)

关于弃用和删除大多数方法的 inplacecopy(例如 dropna),除了极少数方法(包括 replace)之外,正在积极讨论中。在 Copy-on-Write 的上下文中,这两个关键字将不再是必需的。提案可以在这里找到。

SELECT#

在 SQL 中,选择是通过您想要选择的列的逗号分隔列表(或选择所有列的 *)来完成的。

SELECT total_bill, tip, smoker, time
FROM tips;

使用 pandas,可以通过将列名列表传递给 DataFrame 来选择列。

In [6]: tips[["total_bill", "tip", "smoker", "time"]]
Out[6]: 
     total_bill   tip smoker    time
0         16.99  1.01     No  Dinner
1         10.34  1.66     No  Dinner
2         21.01  3.50     No  Dinner
3         23.68  3.31     No  Dinner
4         24.59  3.61     No  Dinner
..          ...   ...    ...     ...
239       29.03  5.92     No  Dinner
240       27.18  2.00    Yes  Dinner
241       22.67  2.00    Yes  Dinner
242       17.82  1.75     No  Dinner
243       18.78  3.00     No  Dinner

[244 rows x 4 columns]

不带列名列表调用 DataFrame 将显示所有列(类似于 SQL 的 *)。

在 SQL 中,您可以添加一个计算列。

SELECT *, tip/total_bill as tip_rate
FROM tips;

使用 pandas,您可以使用 DataFrame 的 DataFrame.assign() 方法追加一个新列。

In [7]: tips.assign(tip_rate=tips["tip"] / tips["total_bill"])
Out[7]: 
     total_bill   tip     sex smoker   day    time  size  tip_rate
0         16.99  1.01  Female     No   Sun  Dinner     2  0.059447
1         10.34  1.66    Male     No   Sun  Dinner     3  0.160542
2         21.01  3.50    Male     No   Sun  Dinner     3  0.166587
3         23.68  3.31    Male     No   Sun  Dinner     2  0.139780
4         24.59  3.61  Female     No   Sun  Dinner     4  0.146808
..          ...   ...     ...    ...   ...     ...   ...       ...
239       29.03  5.92    Male     No   Sat  Dinner     3  0.203927
240       27.18  2.00  Female    Yes   Sat  Dinner     2  0.073584
241       22.67  2.00    Male    Yes   Sat  Dinner     2  0.088222
242       17.82  1.75    Male     No   Sat  Dinner     2  0.098204
243       18.78  3.00  Female     No  Thur  Dinner     2  0.159744

[244 rows x 8 columns]

WHERE#

SQL 中的过滤是通过 WHERE 子句完成的。

SELECT *
FROM tips
WHERE time = 'Dinner';

DataFrame 可以通过多种方式进行过滤;其中最直观的是使用布尔索引

In [8]: tips[tips["total_bill"] > 10]
Out[8]: 
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[227 rows x 7 columns]

上面的语句只是将一个包含 True/False 对象的 Series 传递给 DataFrame,返回所有 True 的行。

In [9]: is_dinner = tips["time"] == "Dinner"

In [10]: is_dinner
Out[10]: 
0      True
1      True
2      True
3      True
4      True
       ... 
239    True
240    True
241    True
242    True
243    True
Name: time, Length: 244, dtype: bool

In [11]: is_dinner.value_counts()
Out[11]: 
time
True     176
False     68
Name: count, dtype: int64

In [12]: tips[is_dinner]
Out[12]: 
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[176 rows x 7 columns]

与 SQL 的 ORAND 类似,可以使用 |OR)和 &AND)将多个条件传递给 DataFrame。

晚餐时小费超过 $5 的餐点

SELECT *
FROM tips
WHERE time = 'Dinner' AND tip > 5.00;
In [13]: tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]
Out[13]: 
     total_bill    tip     sex smoker  day    time  size
23        39.42   7.58    Male     No  Sat  Dinner     4
44        30.40   5.60    Male     No  Sun  Dinner     4
47        32.40   6.00    Male     No  Sun  Dinner     4
52        34.81   5.20  Female     No  Sun  Dinner     4
59        48.27   6.73    Male     No  Sat  Dinner     4
116       29.93   5.07    Male     No  Sun  Dinner     4
155       29.85   5.14  Female     No  Sun  Dinner     5
170       50.81  10.00    Male    Yes  Sat  Dinner     3
172        7.25   5.15    Male    Yes  Sun  Dinner     2
181       23.33   5.65    Male    Yes  Sun  Dinner     2
183       23.17   6.50    Male    Yes  Sun  Dinner     4
211       25.89   5.16    Male    Yes  Sat  Dinner     4
212       48.33   9.00    Male     No  Sat  Dinner     4
214       28.17   6.50  Female    Yes  Sat  Dinner     3
239       29.03   5.92    Male     No  Sat  Dinner     3

至少 5 位食客的聚会留下的小费,或者账单总额超过 $45

SELECT *
FROM tips
WHERE size >= 5 OR total_bill > 45;
In [14]: tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]
Out[14]: 
     total_bill    tip     sex smoker   day    time  size
59        48.27   6.73    Male     No   Sat  Dinner     4
125       29.80   4.20  Female     No  Thur   Lunch     6
141       34.30   6.70    Male     No  Thur   Lunch     6
142       41.19   5.00    Male     No  Thur   Lunch     5
143       27.05   5.00  Female     No  Thur   Lunch     6
155       29.85   5.14  Female     No   Sun  Dinner     5
156       48.17   5.00    Male     No   Sun  Dinner     6
170       50.81  10.00    Male    Yes   Sat  Dinner     3
182       45.35   3.50    Male    Yes   Sun  Dinner     3
185       20.69   5.00    Male     No   Sun  Dinner     5
187       30.46   2.00    Male    Yes   Sun  Dinner     5
212       48.33   9.00    Male     No   Sat  Dinner     4
216       28.15   3.00    Male    Yes   Sat  Dinner     5

NULL 检查使用 notna()isna() 方法完成。

In [15]: frame = pd.DataFrame(
   ....:     {"col1": ["A", "B", np.nan, "C", "D"], "col2": ["F", np.nan, "G", "H", "I"]}
   ....: )
   ....: 

In [16]: frame
Out[16]: 
  col1 col2
0    A    F
1    B  NaN
2  NaN    G
3    C    H
4    D    I

假设我们有一个与我们上面的 DataFrame 结构相同的表。我们可以使用以下查询仅查看 col2 IS NULL 的记录。

SELECT *
FROM frame
WHERE col2 IS NULL;
In [17]: frame[frame["col2"].isna()]
Out[17]: 
  col1 col2
1    B  NaN

使用 notna() 可以获取 col1 IS NOT NULL 的项。

SELECT *
FROM frame
WHERE col1 IS NOT NULL;
In [18]: frame[frame["col1"].notna()]
Out[18]: 
  col1 col2
0    A    F
1    B  NaN
3    C    H
4    D    I

GROUP BY#

在 pandas 中,SQL 的 GROUP BY 操作是使用同名的 groupby() 方法执行的。groupby() 通常指的是一个过程,在这个过程中,我们希望将数据集拆分成组,应用一些函数(通常是聚合),然后将这些组合并在一起。

一个常见的 SQL 操作是获取数据集中每个组的记录数。例如,获取按性别留下的小费数量的查询。

SELECT sex, count(*)
FROM tips
GROUP BY sex;
/*
Female     87
Male      157
*/

pandas 的对应操作是:

In [19]: tips.groupby("sex").size()
Out[19]: 
sex
Female     87
Male      157
dtype: int64

请注意,在 pandas 代码中,我们使用了 DataFrameGroupBy.size() 而不是 DataFrameGroupBy.count()。这是因为 DataFrameGroupBy.count() 将函数应用于每个列,返回每个列中的 NOT NULL 记录数。

In [20]: tips.groupby("sex").count()
Out[20]: 
        total_bill  tip  smoker  day  time  size
sex                                             
Female          87   87      87   87    87    87
Male           157  157     157  157   157   157

或者,我们可以将 DataFrameGroupBy.count() 方法应用于单个列。

In [21]: tips.groupby("sex")["total_bill"].count()
Out[21]: 
sex
Female     87
Male      157
Name: total_bill, dtype: int64

也可以一次应用多个函数。例如,假设我们想看看小费金额随星期几的变化情况 - DataFrameGroupBy.agg() 允许您将一个字典传递给分组的 DataFrame,指示要对特定列应用哪些函数。

SELECT day, AVG(tip), COUNT(*)
FROM tips
GROUP BY day;
/*
Fri   2.734737   19
Sat   2.993103   87
Sun   3.255132   76
Thu  2.771452   62
*/
In [22]: tips.groupby("day").agg({"tip": "mean", "day": "size"})
Out[22]: 
           tip  day
day                
Fri   2.734737   19
Sat   2.993103   87
Sun   3.255132   76
Thur  2.771452   62

通过将列列表传递给 groupby() 方法可以按多个列进行分组。

SELECT smoker, day, COUNT(*), AVG(tip)
FROM tips
GROUP BY smoker, day;
/*
smoker day
No     Fri      4  2.812500
       Sat     45  3.102889
       Sun     57  3.167895
       Thu    45  2.673778
Yes    Fri     15  2.714000
       Sat     42  2.875476
       Sun     19  3.516842
       Thu    17  3.030000
*/
In [23]: tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
Out[23]: 
             tip          
            size      mean
smoker day                
No     Fri     4  2.812500
       Sat    45  3.102889
       Sun    57  3.167895
       Thur   45  2.673778
Yes    Fri    15  2.714000
       Sat    42  2.875476
       Sun    19  3.516842
       Thur   17  3.030000

JOIN#

JOINs 可以使用 join()merge() 进行。join() 默认将 DataFrame 连接在其索引上。每个方法都有参数允许您指定要执行的连接类型(LEFTRIGHTINNERFULL)或要连接的列(列名或索引)。

警告

如果两个键列都包含键值为 null 值的行,则这些行将相互匹配。这与常规 SQL JOIN 的行为不同,并且可能导致意外的结果。

In [24]: df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})

In [25]: df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})

假设我们有两个与我们的 DataFrame 同名同结构的数据库表。

现在我们来介绍各种 JOIN 类型。

INNER JOIN#

SELECT *
FROM df1
INNER JOIN df2
  ON df1.key = df2.key;
# merge performs an INNER JOIN by default
In [26]: pd.merge(df1, df2, on="key")
Out[26]: 
  key   value_x   value_y
0   B -0.282863  1.212112
1   D -1.135632 -0.173215
2   D -1.135632  0.119209

merge() 还为想要将一个 DataFrame 的列与另一个 DataFrame 的索引连接的情况提供了参数。

In [27]: indexed_df2 = df2.set_index("key")

In [28]: pd.merge(df1, indexed_df2, left_on="key", right_index=True)
Out[28]: 
  key   value_x   value_y
1   B -0.282863  1.212112
3   D -1.135632 -0.173215
3   D -1.135632  0.119209

merge() 还支持通过传递列名列表来连接多个列。

SELECT *
FROM df1_multi
INNER JOIN df2_multi
  ON df1_multi.key1 = df2_multi.key1
    AND df1_multi.key2 = df2_multi.key2;
In [29]: df1_multi = pd.DataFrame({
   ....:     "key1": ["A", "B", "C", "D"],
   ....:     "key2": [1, 2, 3, 4],
   ....:     "value": np.random.randn(4)
   ....: })
   ....: 

In [30]: df2_multi = pd.DataFrame({
   ....:     "key1": ["B", "D", "D", "E"],
   ....:     "key2": [2, 4, 4, 5],
   ....:     "value": np.random.randn(4)
   ....: })
   ....: 

In [31]: pd.merge(df1_multi, df2_multi, on=["key1", "key2"])
Out[31]: 
  key1  key2   value_x   value_y
0    B     2 -2.104569  0.721555
1    D     4  1.071804 -0.706771
2    D     4  1.071804 -1.039575

如果 DataFrame 之间的列名不同,则 `on` 可以被 `left_on` 和 `right_on` 替换。

In [32]: df2_multi = pd.DataFrame({
   ....:     "key_1": ["B", "D", "D", "E"],
   ....:     "key_2": [2, 4, 4, 5],
   ....:     "value": np.random.randn(4)
   ....: })
   ....: 

In [33]: pd.merge(df1_multi, df2_multi, left_on=["key1", "key2"], right_on=["key_1", "key_2"])
Out[33]: 
  key1  key2   value_x key_1  key_2   value_y
0    B     2 -2.104569     B      2 -0.424972
1    D     4  1.071804     D      4  0.567020
2    D     4  1.071804     D      4  0.276232

LEFT OUTER JOIN#

显示 df1 的所有记录。

SELECT *
FROM df1
LEFT OUTER JOIN df2
  ON df1.key = df2.key;
In [34]: pd.merge(df1, df2, on="key", how="left")
Out[34]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209

RIGHT JOIN#

显示 df2 的所有记录。

SELECT *
FROM df1
RIGHT OUTER JOIN df2
  ON df1.key = df2.key;
In [35]: pd.merge(df1, df2, on="key", how="right")
Out[35]: 
  key   value_x   value_y
0   B -0.282863  1.212112
1   D -1.135632 -0.173215
2   D -1.135632  0.119209
3   E       NaN -1.044236

FULL JOIN#

pandas 还支持 FULL JOIN,它显示数据集的双方,无论连接的列是否找到匹配项。

在撰写本文时,FULL JOIN 在所有 RDBMS(MySQL)中并不受支持。

显示两个表的所有记录。

SELECT *
FROM df1
FULL OUTER JOIN df2
  ON df1.key = df2.key;
In [36]: pd.merge(df1, df2, on="key", how="outer")
Out[36]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209
5   E       NaN -1.044236

UNION#

UNION ALL 可以使用 concat() 来执行。

In [37]: df1 = pd.DataFrame(
   ....:     {"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)}
   ....: )
   ....: 

In [38]: df2 = pd.DataFrame(
   ....:     {"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]}
   ....: )
   ....: 
SELECT city, rank
FROM df1
UNION ALL
SELECT city, rank
FROM df2;
/*
         city  rank
      Chicago     1
San Francisco     2
New York City     3
      Chicago     1
       Boston     4
  Los Angeles     5
*/
In [39]: pd.concat([df1, df2])
Out[39]: 
            city  rank
0        Chicago     1
1  San Francisco     2
2  New York City     3
0        Chicago     1
1         Boston     4
2    Los Angeles     5

SQL 的 UNIONUNION ALL 类似,但是 UNION 会删除重复的行。

SELECT city, rank
FROM df1
UNION
SELECT city, rank
FROM df2;
-- notice that there is only one Chicago record this time
/*
         city  rank
      Chicago     1
San Francisco     2
New York City     3
       Boston     4
  Los Angeles     5
*/

在 pandas 中,您可以使用 concat() 结合 drop_duplicates()

In [40]: pd.concat([df1, df2]).drop_duplicates()
Out[40]: 
            city  rank
0        Chicago     1
1  San Francisco     2
2  New York City     3
1         Boston     4
2    Los Angeles     5

LIMIT#

SELECT * FROM tips
LIMIT 10;
In [41]: tips.head(10)
Out[41]: 
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
5       25.29  4.71    Male     No  Sun  Dinner     4
6        8.77  2.00    Male     No  Sun  Dinner     2
7       26.88  3.12    Male     No  Sun  Dinner     4
8       15.04  1.96    Male     No  Sun  Dinner     2
9       14.78  3.23    Male     No  Sun  Dinner     2

pandas 对应一些 SQL 分析和聚合函数#

带偏移量的 Top n 行#

-- MySQL
SELECT * FROM tips
ORDER BY tip DESC
LIMIT 10 OFFSET 5;
In [42]: tips.nlargest(10 + 5, columns="tip").tail(10)
Out[42]: 
     total_bill   tip     sex smoker   day    time  size
183       23.17  6.50    Male    Yes   Sun  Dinner     4
214       28.17  6.50  Female    Yes   Sat  Dinner     3
47        32.40  6.00    Male     No   Sun  Dinner     4
239       29.03  5.92    Male     No   Sat  Dinner     3
88        24.71  5.85    Male     No  Thur   Lunch     2
181       23.33  5.65    Male    Yes   Sun  Dinner     2
44        30.40  5.60    Male     No   Sun  Dinner     4
52        34.81  5.20  Female     No   Sun  Dinner     4
85        34.83  5.17  Female     No  Thur   Lunch     4
211       25.89  5.16    Male    Yes   Sat  Dinner     4

每个组的 Top n 行#

-- Oracle's ROW_NUMBER() analytic function
SELECT * FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn
  FROM tips t
)
WHERE rn < 3
ORDER BY day, rn;
In [43]: (
   ....:     tips.assign(
   ....:         rn=tips.sort_values(["total_bill"], ascending=False)
   ....:         .groupby(["day"])
   ....:         .cumcount()
   ....:         + 1
   ....:     )
   ....:     .query("rn < 3")
   ....:     .sort_values(["day", "rn"])
   ....: )
   ....: 
Out[43]: 
     total_bill    tip     sex smoker   day    time  size  rn
95        40.17   4.73    Male    Yes   Fri  Dinner     4   1
90        28.97   3.00    Male    Yes   Fri  Dinner     2   2
170       50.81  10.00    Male    Yes   Sat  Dinner     3   1
212       48.33   9.00    Male     No   Sat  Dinner     4   2
156       48.17   5.00    Male     No   Sun  Dinner     6   1
182       45.35   3.50    Male    Yes   Sun  Dinner     3   2
197       43.11   5.00  Female    Yes  Thur   Lunch     4   1
142       41.19   5.00    Male     No  Thur   Lunch     5   2

使用 rank(method='first') 函数的相同方式

In [44]: (
   ....:     tips.assign(
   ....:         rnk=tips.groupby(["day"])["total_bill"].rank(
   ....:             method="first", ascending=False
   ....:         )
   ....:     )
   ....:     .query("rnk < 3")
   ....:     .sort_values(["day", "rnk"])
   ....: )
   ....: 
Out[44]: 
     total_bill    tip     sex smoker   day    time  size  rnk
95        40.17   4.73    Male    Yes   Fri  Dinner     4  1.0
90        28.97   3.00    Male    Yes   Fri  Dinner     2  2.0
170       50.81  10.00    Male    Yes   Sat  Dinner     3  1.0
212       48.33   9.00    Male     No   Sat  Dinner     4  2.0
156       48.17   5.00    Male     No   Sun  Dinner     6  1.0
182       45.35   3.50    Male    Yes   Sun  Dinner     3  2.0
197       43.11   5.00  Female    Yes  Thur   Lunch     4  1.0
142       41.19   5.00    Male     No  Thur   Lunch     5  2.0
-- Oracle's RANK() analytic function
SELECT * FROM (
  SELECT
    t.*,
    RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk
  FROM tips t
  WHERE tip < 2
)
WHERE rnk < 3
ORDER BY sex, rnk;

让我们找到(小费 < 2)的(按性别分组的排名 < 3)的小费。请注意,当使用 rank(method='min') 函数时,rnk_min 对于相同 tip 保持不变(类似于 Oracle 的 RANK() 函数)。

In [45]: (
   ....:     tips[tips["tip"] < 2]
   ....:     .assign(rnk_min=tips.groupby(["sex"])["tip"].rank(method="min"))
   ....:     .query("rnk_min < 3")
   ....:     .sort_values(["sex", "rnk_min"])
   ....: )
   ....: 
Out[45]: 
     total_bill   tip     sex smoker  day    time  size  rnk_min
67         3.07  1.00  Female    Yes  Sat  Dinner     1      1.0
92         5.75  1.00  Female    Yes  Fri  Dinner     2      1.0
111        7.25  1.00  Female     No  Sat  Dinner     1      1.0
236       12.60  1.00    Male    Yes  Sat  Dinner     2      1.0
237       32.83  1.17    Male    Yes  Sat  Dinner     2      2.0

UPDATE#

UPDATE tips
SET tip = tip*2
WHERE tip < 2;
In [46]: tips.loc[tips["tip"] < 2, "tip"] *= 2

DELETE#

DELETE FROM tips
WHERE tip > 9;

在 pandas 中,我们选择应保留的行,而不是删除应删除的行。

In [47]: tips = tips.loc[tips["tip"] <= 9]