How to check a column for specific words/tags in Python and show their presence in new, related columns

I am new to Python. I have a column in an MS Excel file that uses four tags: LOC, ORG, PER and MISC. The given data looks like this:

1 LOC/Thai Buddhist temple;
2 PER/louis;
3 ORG/WikiLeaks;LOC/Southern Ocean;
4 ORG/queen;
5 PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;
6 
7 PER/Thomas Watson;
...................
...................
.............#continue up to 2,000 rows

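I want a result in which, for each row, a "1" is placed in the corresponding new column if a tag is present and a "0" if it is absent. The Excel file should end up with four new columns, LOC, ORG, PER and MISC, as columns 2, 3, 4 and 5, with the given data as the first column. The file contains nearly 2815 rows, and each row holds tags drawn from LOC / ORG / PER / MISC.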

My goal is to count the totals from the new columns:

total LOC, total ORG, total PER and total MISC.

The result would look like this:

             given data              LOC  ORG  PER  MISC
1 LOC/Thai Buddhist temple;           1    0    0    0   #here only LOC is present
2 PER/louis;                          0    0    1    0   #here only PER is present
3 ORG/WikiLeaks;LOC/Southern Ocean;   1    1    0    0   #here LOC and ORG are present
4 PER/Eli Wallach;MISC/The Good;      0    0    1    1   #here PER and MISC are present
5    .................................................
6                                     0    0    0    0   #here no tag is present
7 .....................................................
.......................................................
..................................continue up to 2815 rows....

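I am a beginner in Python, so I did my best to search for code that solves this, but I could not find any program related to my problem, which is why I am posting here. Any help would be appreciated.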

I assume you have already read the data from Excel and created a DataFrame in Python with pandas (to read an Excel file, use df1 = pd.read_excel("File/path/name.xls"); pass header=None if the file has no header row).

Here is the layout of your DataFrame df1:

Colnum | Tagstring
1      |LOC/Thai Buddhist temple;
2      |PER/louis;
3      |ORG/WikiLeaks;LOC/Southern Ocean;
4      |ORG/queen;
5      |PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;
6      |PER/Thomas Watson;

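Now, there are several ways to search for text in a string.

I will demonstrate the find function:

Syntax: str.find(sub, start, end). It returns the lowest index at which sub occurs, or -1 if sub is not found; start and end are optional.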
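To make the return values concrete, here is a minimal sketch of str.find on a single row taken from the sample data. The variable names (row, loc_pos, per_pos, has_loc, has_per) are only illustrative.

```python
# str.find returns the index of the first match, or -1 when there is no match.
row = "ORG/WikiLeaks;LOC/Southern Ocean;"

loc_pos = row.find("LOC")   # position of the first "LOC" in the row
per_pos = row.find("PER")   # "PER" does not occur in this row

# Turn the positions into 1/0 presence flags.
has_loc = int(loc_pos >= 0)
has_per = int(per_pos >= 0)

print(loc_pos, per_pos, has_loc, has_per)
```

Any non-negative result means the tag occurs somewhere in the string, which is why the test is ">= 0" rather than "> 0".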

str1 = "LOC"
str2 = "PER"
str3 = "ORG"
str4 = "MISC"

# Series.str.find returns -1 where the tag is absent, so ">= 0" means "present".
# fillna("") treats empty Excel cells as empty strings.
tags = df1["Tagstring"].fillna("")
df1["LOC"] = (tags.str.find(str1) >= 0).astype('int')
df1["PER"] = (tags.str.find(str2) >= 0).astype('int')
df1["ORG"] = (tags.str.find(str3) >= 0).astype('int')
df1["MISC"] = (tags.str.find(str4) >= 0).astype('int')
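The four per-tag assignments can also be written as a loop. The sketch below is self-contained: instead of reading the Excel file, it builds df1 inline from the sample rows in the question (with None standing in for the empty row 6), so the inline frame is an assumption made for illustration only.

```python
import pandas as pd

# Sample rows from the question; None represents the empty cell in row 6.
df1 = pd.DataFrame(
    {
        "Tagstring": [
            "LOC/Thai Buddhist temple;",
            "PER/louis;",
            "ORG/WikiLeaks;LOC/Southern Ocean;",
            "ORG/queen;",
            "PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;",
            None,
            "PER/Thomas Watson;",
        ]
    },
    index=range(1, 8),
)

# Replace missing cells with "" so every row is a string before searching.
text = df1["Tagstring"].fillna("")

for tag in ["LOC", "ORG", "PER", "MISC"]:
    # find() gives -1 for "not found", so ">= 0" is the presence test.
    df1[tag] = (text.str.find(tag) >= 0).astype(int)

print(df1[["LOC", "ORG", "PER", "MISC"]])
```

The empty row ends up with 0 in all four columns, which matches the result the question asks for.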

If you have already read the data into df, you can do the following:

pd.concat([df,pd.DataFrame({i:df.Tagstring.str.contains(i).astype(int) for i in 'LOC  ORG  PER MISC'.split()})],axis=1)
Out[716]: 
                                                 Tagstring  LOC  ORG  PER    MISC 
Colnum                                                                      
1                                LOC/Thai Buddhist temple;    1    0    0       0
2                                               PER/louis;    0    0    1       0
3                        ORG/WikiLeaks;LOC/Southern Ocean;    1    1    0       0
4                                               ORG/queen;    0    1    0       0
5        PER/Sanchez;PER/Eli Wallach;MISC/The Good, The...    0    0    1       1
6                                       PER/Thomas Watson;    0    0    1       0
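The question also asks for the total of each tag, which neither answer shows. A possible sketch follows, under two assumptions: the data is rebuilt inline rather than read from Excel, and a tag only counts when it appears at the start of a cell or right after a ";" and is followed by "/". That stricter pattern is a design choice, not part of the answers above; it avoids counting a tag name that merely appears inside an entity. The last row ("ORG/PERL Foundation;") is a made-up example added to show that case.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Tagstring": [
            "LOC/Thai Buddhist temple;",
            "PER/louis;",
            "ORG/WikiLeaks;LOC/Southern Ocean;",
            "ORG/queen;",
            "PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;",
            None,                      # empty row
            "PER/Thomas Watson;",
            "ORG/PERL Foundation;",    # hypothetical row: "PER" inside an entity name
        ]
    }
)

tags = ["LOC", "ORG", "PER", "MISC"]
text = df["Tagstring"].fillna("")
occurrences = {}

for tag in tags:
    # Tag at the start of the cell or right after ';', followed by '/'.
    pattern = rf"(?:^|;){tag}/"
    df[tag] = text.str.contains(pattern, regex=True).astype(int)
    # Total number of times the tag appears, counting repeats within a row.
    occurrences[tag] = int(text.str.count(pattern).sum())

# Number of rows in which each tag appears at least once.
rows_with_tag = df[tags].sum()

print(rows_with_tag)
print(occurrences)
```

rows_with_tag counts rows, while occurrences counts every tagged entity, so the two differ for rows such as "PER/Sanchez;PER/Eli Wallach;...". If the frame should go back to Excel afterwards, DataFrame.to_excel can write it out.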
