2018 Robust Distant Supervision Relation Extraction via Deep Reinforcement
2016 Neural Relation Extraction with Selective Attention over Instances
Code: C and TensorFlow
I. Running and testing the TensorFlow version
0. About OpenNRE
NREPapers
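Relation extraction framework: embeddings (word embeddings and position embeddings), an encoder (PCNN, CNN, RNN, or Bi-RNN), a selector (attention, maximum, or average), and a classifier (softmax multi-class classification).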
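The selector is the least self-explanatory stage of this pipeline, so here is a minimal plain-Python sketch of the three aggregation strategies applied to a bag of sentence encodings. The function names, the toy vectors, and the query vector are assumptions for illustration, and a plain dot product stands in for the scoring function; this is not OpenNRE's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_average(bag):
    """Average the sentence encodings in the bag, dimension by dimension."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]

def select_max(bag, query):
    """Keep only the sentence encoding that scores highest against the query."""
    return max(bag, key=lambda x: dot(x, query))

def select_attention(bag, query):
    """Weight each sentence encoding by softmax(dot(encoding, query))."""
    weights = softmax([dot(x, query) for x in bag])
    dim = len(bag[0])
    return [sum(w * x[i] for w, x in zip(weights, bag)) for i in range(dim)]

# Toy bag: three 2-dimensional sentence encodings (hypothetical values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # hypothetical relation query vector

avg_repr = select_average(bag)
max_repr = select_max(bag, query)
att_repr = select_attention(bag, query)
```

Whichever strategy is used, the result is a single bag representation that the softmax classifier then maps to a relation label.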
OpenNRE supports training and testing for both sentence-level and bag-level relation extraction.
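To make the bag-level setting concrete: a bag collects every sentence that mentions the same entity pair, and the model predicts one relation per bag rather than per sentence. The sketch below groups sentence-level records into bags. The helper name `group_into_bags`, the sample records, and the choice of (head id, tail id) as the grouping key are assumptions for illustration.

```python
from collections import defaultdict

def group_into_bags(records):
    """Group sentence-level records into bags keyed by (head id, tail id)."""
    bags = defaultdict(list)
    for rec in records:
        key = (rec['head']['id'], rec['tail']['id'])
        bags[key].append(rec['sentence'])
    return dict(bags)

# Hypothetical records; ids, words, and relation names are made up.
records = [
    {'sentence': 'Alice was born in Paris .',
     'head': {'word': 'Alice', 'id': 'e1'},
     'tail': {'word': 'Paris', 'id': 'e2'},
     'relation': 'place_of_birth'},
    {'sentence': 'Alice grew up in Paris .',
     'head': {'word': 'Alice', 'id': 'e1'},
     'tail': {'word': 'Paris', 'id': 'e2'},
     'relation': 'place_of_birth'},
    {'sentence': 'Bob works for Acme .',
     'head': {'word': 'Bob', 'id': 'e3'},
     'tail': {'word': 'Acme', 'id': 'e4'},
     'relation': 'employer'},
]

bags = group_into_bags(records)
```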
1. Download and unzip the code
unzip OpenNRE-master.zip
2. Preparing the experimental dataset
Public raw dataset: NYT10 Dataset
Dataset and data conversion tool: NYT10+Toolkit
Dataset format and storage location:
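Going by the conversion code in this section, each converted training or test record is a JSON object with the keys `sentence`, `head`, `tail`, and `relation`, where `head` and `tail` each hold a `word` and an `id`. A minimal sketch of a shape check follows; the helper name `is_valid_record` and the sample values are hypothetical.

```python
def is_valid_record(rec):
    """Check that a record has the keys written by the conversion code."""
    if not isinstance(rec, dict):
        return False
    if not {'sentence', 'head', 'tail', 'relation'} <= rec.keys():
        return False
    # Each entity is a small dict holding its surface form and its id.
    return all(
        isinstance(rec[k], dict) and {'word', 'id'} <= rec[k].keys()
        for k in ('head', 'tail')
    )

# Hypothetical record; the id, word, and relation values are placeholders.
sample = {
    'sentence': 'Alice was born in Paris .',
    'head': {'word': 'Alice', 'id': 'e1'},
    'tail': {'word': 'Paris', 'id': 'e2'},
    'relation': 'place_of_birth',
}

sample_ok = is_valid_record(sample)
broken_ok = is_valid_record({'sentence': 'no entities here'})
```

Running a check like this over the converted files before training can catch malformed input lines early.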
3. Available English word vectors
Stanford GloVe
Google word2vec
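Whichever pre-trained vectors are chosen, the model needs a word-to-index lookup with a fallback for out-of-vocabulary tokens. The sketch below assumes entries shaped like `{'word': ..., 'vec': [...]}`. The helper names, the `UNK` token name, and the zero vector used for unknown words are assumptions for illustration, not OpenNRE's own handling.

```python
def build_vocab(entries, unk_token='UNK'):
    """Return (word2id, matrix), adding a zero row for unknown words if needed."""
    dim = len(entries[0]['vec'])
    word2id = {}
    matrix = []
    for entry in entries:
        if entry['word'] in word2id:
            continue  # keep the first vector seen for a duplicated word
        word2id[entry['word']] = len(matrix)
        # Accept vector components stored either as numbers or as strings.
        matrix.append([float(x) for x in entry['vec']])
    if unk_token not in word2id:
        word2id[unk_token] = len(matrix)
        matrix.append([0.0] * dim)
    return word2id, matrix

def encode(tokens, word2id, unk_token='UNK'):
    """Map tokens to row indices, falling back to the unknown-word row."""
    unk = word2id[unk_token]
    return [word2id.get(tok, unk) for tok in tokens]

# Hypothetical 2-dimensional entries.
entries = [
    {'word': 'the', 'vec': ['0.1', '0.2']},
    {'word': 'city', 'vec': [0.3, 0.4]},
]

word2id, matrix = build_vocab(entries)
ids = encode(['the', 'old', 'city'], word2id)
```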
Code for processing the word vectors, training set, test set, and relation set:
import json
def get_data(data_dir):
    """Convert <data_dir>/<data_dir>.txt (tab-separated) into <data_dir>.json."""
    records = []
    src = data_dir + '/' + data_dir + '.txt'
    with open(src, 'r', encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of stopping at the first one
            # Columns: head id, tail id, head word, tail word, relation, sentence
            h_id, t_id, h, t, r, sent = line.split('\t')[:6]
            records.append({'sentence': sent,
                            'head': {'word': h, 'id': h_id},
                            'tail': {'word': t, 'id': t_id},
                            'relation': r})
    with open(data_dir + '.json', 'w', encoding='utf8') as out:
        json.dump(records, out)
def get_rel_data(data_dir):
    """Convert <data_dir>/relation2id.txt ("relation id" per line) into rel2id.json."""
    rel2id = {}
    with open(data_dir + '/relation2id.txt', 'r', encoding='utf8') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # Store the id as an integer rather than a string.
            rel, r_id = parts[0], int(parts[1])
            if rel not in rel2id:
                rel2id[rel] = r_id
    with open('rel2id.json', 'w', encoding='utf8') as out:
        out.write(json.dumps(rel2id) + '\n')
def get_vec():
    """Write the first 10,000 non-title-case words of the .vec file to word_vec.json."""
    src = 'wiki-news-300d-1M.vec'
    words = []
    with open(src, 'r', encoding='utf8') as f:
        f.readline()  # discard the first line of the .vec file
        for line in f:
            line = line.strip()
            if not line:
                continue
            temp = line.split(' ')
            word = temp[0]
            if word.istitle():
                continue  # skip title-case words
            # Keep only the first 50 dimensions, converted from strings to floats.
            vec = [float(x) for x in temp[1:51]]
            words.append({'word': word, 'vec': vec})
            if len(words) >= 10000:
                break
    with open('word_vec.json', 'w', encoding='utf8') as out:
        json.dump(words, out)
# get_data(data_dir='test')
# get_data(data_dir='train')
# get_rel_data(data_dir='test')
# get_vec()
4. Running the program
python train_demo.py nyt pcnn att