How do I concisely replace column values given multiple conditions?

political scientist

I'm trying to use numpy.select to replace string values within a column: if a string contains a keyword, the whole string should be replaced with another keyword (there are about 25 such combinations).

df["new_col"] = np.select(
    condlist=[
        df["col"].str.contains("cat1", na=False, case=False),
        df["col"].str.contains("cat2", na=False, case=False),
        df["col"].str.contains("cat3", na=False, case=False),
        df["col"].str.contains("cat4", na=False, case=False),
        # ...
        df["col"].str.contains("cat25", na=False, case=False),
    ],
    choicelist=[
        "NEW_cat1",
        "NEW_cat2",
        "NEW_cat3",
        "NEW_cat4",
        # ...
        "NEW_cat25",
    ],
    default="DEFAULT_cat",
)


Is there a more concise way, or should I just repeat str.contains(...) within condlist 25 times? Is numpy.select the proper way to do this at all?

I assume a dict could be used here, but I don't see exactly how.
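For illustration, here is a minimal sketch of how a dict might drive the same numpy.select call. The sample data and the name mapping are hypothetical (the real column contents are not shown), and only three of the roughly 25 pairs are spelled out. The conditions are built with a list comprehension over the dict keys, so str.contains is written only once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"col": ["has CAT1 inside", "xx cat2 yy", None, "nothing here"]})

# Keyword -> replacement (hypothetical subset of the full list).
# Dict order matters: np.select picks the first condition that is True.
mapping = {"cat1": "NEW_cat1", "cat2": "NEW_cat2", "cat3": "NEW_cat3"}

# One boolean Series per keyword, same arguments as in the question.
condlist = [df["col"].str.contains(key, na=False, case=False) for key in mapping]

df["new_col"] = np.select(condlist, list(mapping.values()), default="DEFAULT_cat")
print(df)
```

Because the first matching condition wins, a keyword that is a substring of another (for example "cat2" inside "cat25") would probably need to come after the longer one in the dict.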

df["col"].map(d), where d is a dict of old and new values like {"cat1": "NEW_cat1"}, wouldn't work (?), since I can't hardcode the exact values that need to be replaced (which is why I'm using str.contains).
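One possible way around the exact-match limitation of map, sketched under assumptions: first pull the keyword out of each string with str.extract, then map the extracted keyword through the dict. The sample data and mapping are hypothetical, the dict keys are assumed to be lowercase literal strings (not regex patterns), and this is a different technique from numpy.select, not a drop-in equivalent.

```python
import re
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"col": ["has CAT1 inside", "xx cat2 yy", None, "nothing here"]})

# Keyword -> replacement; keys are assumed to be lowercase literals.
mapping = {"cat1": "NEW_cat1", "cat2": "NEW_cat2", "cat3": "NEW_cat3"}

# Try longer keywords first so that e.g. "cat25" is not cut short by "cat2".
keys = sorted(mapping, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in keys) + ")"

# Extract the first keyword found in each string (NaN when none is found).
found = df["col"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Normalise case so the extracted keyword matches a dict key, then map it.
df["new_col"] = found.str.lower().map(mapping).fillna("DEFAULT_cat")
print(df)
```

Note the difference in priority: here the keyword that appears earliest in the string is used, whereas numpy.select prefers whichever condition comes first in condlist.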
