
Creating a New Column from the Values of Multiple Columns in Python Pandas

2026-02-02 08:44:49 Author: 小满大王i


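In Pandas, a new column can be generated from the values of several existing columns. Common approaches include:

apply() + a custom function (the most flexible)
np.where() or np.select() (conditional logic)
Direct arithmetic (for example, df['A'] + df['B'])
assign() + lambda (method chaining)
eval() (efficient computation, but use with care)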

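1. Using apply() with a custom function (recommended)

Suited to complex logic: the function receives one row at a time and can compute the new value from any of its columns. Because it calls a Python function once per row, it is slower than vectorized operations on large DataFrames.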

import pandas as pd

df = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78]
})

# Define a function that computes the average score from several columns
def calculate_average(row):
    return (row['math'] + row['english'] + row['science']) / 3

# Apply the function row by row (axis=1)
df['average'] = df.apply(calculate_average, axis=1)

print(df)

Output:

   math  english  science    average
0    90       85       88  87.666667
1    80       75       92  82.333333
2    70       65       78  71.000000

Shorter form: the same calculation with a lambda

df['average'] = df.apply(lambda row: (row['math'] + row['english'] + row['science']) / 3, axis=1)
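The average above is simple enough that apply() is not strictly needed; where it tends to pay off is row logic with branching. The following is a minimal sketch of that case. The function name weakest_subject, the needs_work column, and the threshold of 80 are assumptions made for illustration and do not come from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78],
})

SUBJECTS = ['math', 'english', 'science']

def weakest_subject(row):
    """Hypothetical rule: 'none' if every score is >= 80, else the lowest-scoring subject."""
    scores = row[SUBJECTS]
    if (scores >= 80).all():
        return 'none'
    # idxmin() returns the label (here, the column name) of the smallest value
    return scores.idxmin()

# axis=1 passes each row to the function as a Series
df['needs_work'] = df.apply(weakest_subject, axis=1)
print(df)
```

Logic like this is awkward to express as a single arithmetic expression, which is the situation in which a row-wise function is usually the clearer choice.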

2. Using np.where() or np.select() (conditional logic)

Suited to generating a new column from conditions on several columns.

(1) np.where() (two outcomes)

import numpy as np

# 'Excellent' if both math and english are above 80, otherwise 'Normal'
df['grade'] = np.where((df['math'] > 80) & (df['english'] > 80), 'Excellent', 'Normal')

print(df)

Output:

   math  english  science    average      grade
0    90       85       88  87.666667  Excellent
1    80       75       92  82.333333     Normal
2    70       65       78  71.000000     Normal
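The parentheses around each comparison in the np.where() condition are required: & binds more tightly than >, so an unparenthesized condition is parsed as a chained comparison and raises an error instead of combining the two tests. The sketch below shows the other two element-wise operators, | (or) and ~ (not), under the same rule. The flag column and its labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78],
})

# OR: at least one of the two scores is above 80 (each comparison parenthesized)
either_high = (df['math'] > 80) | (df['english'] > 80)

# NOT: ~ inverts the boolean mask element by element
df['flag'] = np.where(~either_high, 'both <= 80', 'at least one > 80')
print(df)
```

Storing the mask in a named variable first, as done here, is one way to keep longer conditions readable.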

(2) np.select() (multiple conditions)

conditions = [
    (df['math'] >= 90) & (df['english'] >= 90),
    (df['math'] >= 80) & (df['english'] >= 80),
    (df['math'] >= 70) & (df['english'] >= 70)
]
choices = ['A', 'B', 'C']

df['grade'] = np.select(conditions, choices, default='D')

print(df)

Output:

   math  english  science    average grade
0    90       85       88  87.666667     B
1    80       75       92  82.333333     C
2    70       65       78  71.000000     D
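np.select() evaluates the conditions in order and, for each row, uses the choice belonging to the first condition that is true. That is why the strictest condition is listed first above. A small sketch of what happens when the order is reversed, using only the math column to keep it short (the column names grade_loose_first and grade_strict_first are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'math': [90, 80, 70]})

# Loosest condition first: every row already matches 'math >= 70',
# so the later, stricter conditions are never reached.
loose_first = [df['math'] >= 70, df['math'] >= 80, df['math'] >= 90]
df['grade_loose_first'] = np.select(loose_first, ['C', 'B', 'A'], default='D')

# Strictest condition first: each row falls through to the first tier it satisfies.
strict_first = [df['math'] >= 90, df['math'] >= 80, df['math'] >= 70]
df['grade_strict_first'] = np.select(strict_first, ['A', 'B', 'C'], default='D')

print(df)
```

When the conditions overlap like this, ordering them from most to least specific is the usual way to get tiered results.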

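3. Direct arithmetic (simple calculations)

Suited to straightforward arithmetic across columns, such as totals or weighted averages.

# Total score (math + english + science)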
df['total'] = df['math'] + df['english'] + df['science']

# Weighted average (weights: math 0.5, english 0.3, science 0.2)
df['weighted_avg'] = df['math'] * 0.5 + df['english'] * 0.3 + df['science'] * 0.2

print(df)

Output (the average and grade columns from the earlier sections are omitted):

   math  english  science  total  weighted_avg
0    90       85       88    263          88.1
1    80       75       92    247          80.9
2    70       65       78    213          70.1
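When there are many columns, spelling out every term becomes tedious. One possible alternative, sketched here, keeps the weights in a Series and lets Pandas align them with the columns by label. The weights variable is an assumption for illustration; the weight values are the ones used above.

```python
import pandas as pd

df = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78],
})

# Weights keyed by column name (same values as in the example above)
weights = pd.Series({'math': 0.5, 'english': 0.3, 'science': 0.2})

# Multiply each column by its weight (aligned on column labels), then sum across each row
df['weighted_avg'] = df[weights.index].mul(weights, axis=1).sum(axis=1)

# Row-wise total over the same score columns
df['total'] = df[weights.index].sum(axis=1)

print(df)
```

Selecting the columns through weights.index keeps the calculation limited to the score columns even after new columns have been added to the DataFrame.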

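4. Using assign() with lambda (method chaining)

Suited to cases where the original DataFrame should not be modified in place: assign() returns a new DataFrame with the added columns.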

df = df.assign(
    total=lambda x: x['math'] + x['english'] + x['science'],
    weighted_avg=lambda x: x['math'] * 0.5 + x['english'] * 0.3 + x['science'] * 0.2
)

print(df)

Output (the average and grade columns from the earlier sections are omitted):

   math  english  science  total  weighted_avg
0    90       85       88    263          88.1
1    80       75       92    247          80.9
2    70       65       78    213          70.1
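Two properties of assign() are easy to miss: the source DataFrame is left untouched, and a later keyword argument can refer to a column created by an earlier one in the same call. A minimal sketch of both, where the names scores and result are assumptions for illustration:

```python
import pandas as pd

scores = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78],
})

result = scores.assign(
    total=lambda x: x['math'] + x['english'] + x['science'],
    # 'total' was created by the previous keyword argument in this same call
    average=lambda x: x['total'] / 3,
)

print(result)
# The original DataFrame still has only the three score columns
print(scores.columns.tolist())
```

Binding the result to a new name, rather than back to the same variable, keeps the original data available for later steps.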

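5. Using eval() (efficient computation, but use with care)

Suited to quick calculations written as a string expression, though it can hurt readability.

# Total score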
df['total'] = df.eval('math + english + science')

# Weighted average
df['weighted_avg'] = df.eval('math * 0.5 + english * 0.3 + science * 0.2')

print(df)

Output (the average and grade columns from the earlier sections are omitted):

   math  english  science  total  weighted_avg
0    90       85       88    263          88.1
1    80       75       92    247          80.9
2    70       65       78    213          70.1
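DataFrame.eval() also accepts an assignment inside the expression string and can read local Python variables through an @ prefix. The sketch below shows both; the bonus variable and the math_bonus column are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'math': [90, 80, 70],
    'english': [85, 75, 65],
    'science': [88, 92, 78],
})

# An assignment expression returns a new DataFrame with the extra column;
# df itself is not changed because inplace defaults to False.
with_total = df.eval('total = math + english + science')

# '@bonus' refers to the local Python variable named bonus (hypothetical value)
bonus = 5
df['math_bonus'] = df.eval('math + @bonus')

print(with_total)
print(df)
```

Because the expression is a plain string, typos in column names only surface at run time, which is one reason to use this form sparingly.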

Summary

Method	Use case	Example
apply() + custom function	Complex logic	df.apply(lambda row: row['A'] + row['B'], axis=1)
np.where() / np.select()	Conditional logic	np.where((df['A'] > 0) & (df['B'] < 0), 'match', 'no match')
Direct arithmetic	Simple calculations	df['total'] = df['A'] + df['B'] + df['C']
assign() + lambda	Method chaining	df.assign(new_col=lambda x: x['A'] * 2)
eval()	Efficient computation	df.eval('A + B * C')

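Best practices:

Simple calculations → direct + - * / or assign()
Complex logic → apply() with a lambda or a custom function
Conditional logic → np.where() (two outcomes) or np.select() (multiple conditions)
Avoid eval() unless performance is critical, since it tends to hurt readability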
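The preference for direct arithmetic on simple calculations can be checked with a rough comparison. The sketch below builds a larger random DataFrame (the size of 20,000 rows and the seed are arbitrary assumptions), computes the same total with apply() and with vectorized addition, confirms that the results agree, and prints the elapsed time of each. Timings vary by machine, so no figures are quoted here.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.integers(0, 101, size=(20_000, 3)),
    columns=['math', 'english', 'science'],
)

# Row-wise: one Python function call per row
start = time.perf_counter()
by_apply = big.apply(lambda row: row['math'] + row['english'] + row['science'], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: whole columns are added at once
start = time.perf_counter()
by_vector = big['math'] + big['english'] + big['science']
vector_seconds = time.perf_counter() - start

# Both approaches should produce identical totals
same_result = bool((by_apply == by_vector).all())
print(same_result)
print(f'apply: {apply_seconds:.4f}s, vectorized: {vector_seconds:.4f}s')
```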

Example: putting it together

# Total score
df['total'] = df['math'] + df['english'] + df['science']

# Weighted average
df['weighted_avg'] = df.eval('math * 0.5 + english * 0.3 + science * 0.2')

# Flag rows as excellent (both math and english above 85)
df['is_excellent'] = np.where((df['math'] > 85) & (df['english'] > 85), 'Yes', 'No')

print(df)

Output (the average and grade columns from the earlier sections are omitted):

   math  english  science  total  weighted_avg is_excellent
0    90       85       88    263          88.1           No
1    80       75       92    247          80.9           No
2    70       65       78    213          70.1           No

With these techniques, you can flexibly generate new columns from the values of multiple columns.
