Relation Extraction Experiment Notes: thunlp NRE

Mobile Development 2024-04-12 01:38:36

2018 Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning
2016 Neural Relation Extraction with Selective Attention over Instances
Code: C and TensorFlow

I. Running and testing the TensorFlow version

0. About OpenNRE

NREPapers
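Relation extraction framework: embeddings (word embeddings and position embeddings), an encoder (PCNN, CNN, RNN, or Bi-RNN), a selector (attention, maximum, or average), and a classifier (softmax multi-class classification).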
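Two pieces of this pipeline are easy to illustrate without any deep learning library: the relative-position features that feed the position embeddings, and the piecewise max pooling that gives PCNN its name. The sketch below is a minimal illustration, not OpenNRE's code; the function names, the clipping distance, and the exact segment boundaries are assumptions chosen for the example.

```python
def relative_positions(num_tokens, entity_index, max_distance=100):
    """Clipped offset of every token from one entity token."""
    return [max(-max_distance, min(max_distance, i - entity_index))
            for i in range(num_tokens)]


def piecewise_max_pool(features, head_index, tail_index):
    """Max-pool one feature-map row over three segments split at the entities.

    `features` holds one float per token. Where exactly the segment
    boundaries fall is an assumption of this sketch.
    """
    first, second = sorted((head_index, tail_index))
    segments = [features[:first + 1],
                features[first + 1:second + 1],
                features[second + 1:]]
    # An empty segment (entity at the sentence end) pools to 0.0 here.
    return [max(seg) if seg else 0.0 for seg in segments]


# Made-up example sentence; "Jobs" is the head entity, "Apple" the tail.
tokens = "Steve Jobs founded Apple in California".split()
head_index, tail_index = 1, 3

print(relative_positions(len(tokens), head_index))   # [-1, 0, 1, 2, 3, 4]
print(relative_positions(len(tokens), tail_index))   # [-3, -2, -1, 0, 1, 2]
print(piecewise_max_pool([0.1, 0.7, 0.3, 0.9, 0.2, 0.5],
                         head_index, tail_index))    # [0.7, 0.9, 0.5]
```

Each token gets two position indices, one per entity, and each index is looked up in its own embedding table before being concatenated with the word embedding.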
OpenNRE supports training and testing for both sentence-level and bag-level relation extraction.
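To make the bag-level setting concrete: sentences that mention the same entity pair are grouped into one bag, and a selector combines their encoded vectors into a single bag vector. The following is a rough sketch in plain Python under simplifying assumptions; the function names are hypothetical, and the attention score is a plain dot product with a relation query vector rather than whatever scoring OpenNRE actually uses.

```python
import math
from collections import defaultdict


def group_into_bags(instances):
    """Group sentence-level instances into bags keyed by (head id, tail id)."""
    bags = defaultdict(list)
    for inst in instances:
        bags[(inst['head']['id'], inst['tail']['id'])].append(inst)
    return dict(bags)


def average_select(sentence_vectors):
    """Average selector: the bag vector is the mean of its sentence vectors."""
    n = len(sentence_vectors)
    return [sum(col) / n for col in zip(*sentence_vectors)]


def attention_select(sentence_vectors, relation_query):
    """Attention selector: softmax-weighted sum of the sentence vectors.

    Scores are dot products with `relation_query` (a simplification).
    """
    scores = [sum(x * q for x, q in zip(vec, relation_query))
              for vec in sentence_vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    bag_vector = [sum(w * vec[d] for w, vec in zip(weights, sentence_vectors))
                  for d in range(len(relation_query))]
    return bag_vector, weights


# Made-up instances and vectors, for illustration only.
instances = [
    {'head': {'word': 'Jobs', 'id': 'e1'}, 'tail': {'word': 'Apple', 'id': 'e2'},
     'sentence': 'Jobs founded Apple .'},
    {'head': {'word': 'Jobs', 'id': 'e1'}, 'tail': {'word': 'Apple', 'id': 'e2'},
     'sentence': 'Jobs returned to Apple .'},
]
bags = group_into_bags(instances)
sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for encoder outputs
bag_vector, weights = attention_select(sentence_vectors, relation_query=[2.0, 0.0])
print(len(bags), average_select(sentence_vectors), weights)
```

The attention weights sum to 1, so a sentence that scores poorly against the relation query contributes little to the bag vector, which is the point of selective attention under noisy distant supervision.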

1. Download and unzip the code

unzip OpenNRE-master.zip

2. Preparing the experiment dataset

Public raw dataset: NYT10 Dataset
Dataset and data conversion toolkit: NYT10+Toolkit
Dataset format and storage location:

3. Available English word vectors

Stanford GloVe
Google word2vec
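Text-format vector files differ in one detail that matters when converting them: GloVe files start directly with a word line, while word2vec-style text files (and fastText .vec files) begin with a "vocabulary-size dimension" header. The sketch below shows one possible way to read either layout into a list of word/vector entries; the function name, the header heuristic, and the sample data are assumptions made for illustration.

```python
import io
import json


def read_text_vectors(lines, dim=50, limit=10000):
    """Parse 'word v1 v2 ...' lines into [{'word': ..., 'vec': [...]}, ...].

    A first line made of exactly two integers is treated as a header and
    skipped; files without a header are read from the first line.
    """
    entries = []
    for number, line in enumerate(lines):
        parts = line.rstrip().split(' ')
        if number == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < dim + 1:
            continue  # malformed or too-short line
        entries.append({'word': parts[0],
                        'vec': [float(x) for x in parts[1:dim + 1]]})
        if len(entries) >= limit:
            break
    return entries


# Tiny made-up sample with a header line and 3-dimensional vectors.
sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n")
vectors = read_text_vectors(sample, dim=3)
print(json.dumps(vectors))
```

Passing an open file object instead of the `io.StringIO` sample works the same way, since the function only iterates over lines.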
Code for processing the word vectors, training set, test set, and relation set:

import json

def get_data(data_dir):
    """Convert <data_dir>/<data_dir>.txt (tab-separated) into <data_dir>.json."""
    records = []
    src = data_dir + '/' + data_dir + '.txt'
    with open(src, 'r', encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of stopping at the first one
            temp = line.split('\t')
            # Columns: head id, tail id, head word, tail word, relation, sentence
            h_id, t_id, h, t, r, sent = temp[:6]
            records.append({'sentence': sent,
                            'head': {'word': h, 'id': h_id},
                            'tail': {'word': t, 'id': t_id},
                            'relation': r})

    # Write all records as one JSON array into the current directory.
    with open(data_dir + '.json', 'w', encoding='utf8') as out:
        json.dump(records, out)




def get_rel_data(data_dir):
    """Convert <data_dir>/relation2id.txt ('relation id' per line) into rel2id.json."""
    data = {}
    with open(data_dir + '/relation2id.txt', 'r', encoding='utf8') as f:
        for line in f:
            temp = line.split()
            if len(temp) < 2:
                continue  # skip blank or malformed lines
            rel, r_id = temp[0], temp[1]
            if rel not in data:
                # Assumes the ids are integers; store them as numbers, not strings.
                data[rel] = int(r_id)

    with open('rel2id.json', 'w', encoding='utf8') as out:
        out.write(json.dumps(data) + '\n')


def get_vec():
    """Build word_vec.json from the first 50 dimensions of up to 10000 word vectors."""
    src = 'wiki-news-300d-1M.vec'
    vectors = []
    with open(src, 'r', encoding='utf8') as f:
        f.readline()  # discard the first line (the .vec header)
        for line in f:
            temp = line.rstrip().split(' ')
            word = temp[0]
            if not word or word.istitle():
                continue  # skip empty lines and title-cased words
            # Keep only the first 50 dimensions, converted from strings to floats.
            vec = [float(x) for x in temp[1:51]]
            vectors.append({'word': word, 'vec': vec})
            if len(vectors) >= 10000:
                break

    with open('word_vec.json', 'w', encoding='utf8') as out:
        json.dump(vectors, out)

# get_data(data_dir='test')
# get_data(data_dir='train')
# get_rel_data(data_dir='test')
# get_vec()
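Before training, it can help to check that every relation label in the converted data also appears in the relation map. The sketch below rebuilds the record layout used by get_data for a single line and reports labels missing from the map; the helper names and the sample line are made up for illustration and are not part of the original script.

```python
import json


def line_to_record(line):
    """Convert one tab-separated line into the record layout used by get_data."""
    h_id, t_id, h, t, r, sent = line.rstrip('\n').split('\t')[:6]
    return {'sentence': sent,
            'head': {'word': h, 'id': h_id},
            'tail': {'word': t, 'id': t_id},
            'relation': r}


def unknown_relations(records, rel2id):
    """Return relation labels used in `records` but missing from `rel2id`."""
    return sorted({rec['relation'] for rec in records} - set(rel2id))


# Made-up sample data, for illustration only.
sample_line = ("m.01\tm.02\tJobs\tApple\t/business/person/company"
               "\tJobs founded Apple .\n")
records = [line_to_record(sample_line)]
rel2id = {'NA': 0}

print(json.dumps(records[0], indent=2))
print(unknown_relations(records, rel2id))  # ['/business/person/company']
```

An empty list from `unknown_relations` means every label in the data has an id; anything else points to a mismatch between the data files and the relation map.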

4. Running the program

python train_demo.py nyt pcnn att
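The three positional arguments appear to name the dataset (nyt), the encoder (pcnn), and the selector (att), matching the framework components listed earlier. As a purely hypothetical sketch of how a script could read such arguments with fallbacks (this is not the code of train_demo.py, and the defaults are assumptions):

```python
import sys


def parse_args(argv, defaults=('nyt', 'pcnn', 'att')):
    """Fill dataset, encoder and selector from up to three positional arguments."""
    values = list(defaults)
    for i, arg in enumerate(argv[:3]):
        values[i] = arg
    return dict(zip(('dataset', 'encoder', 'selector'), values))


# Equivalent to the arguments in: python train_demo.py nyt pcnn att
config = parse_args(['nyt', 'pcnn', 'att'])
print(config)

# In a real script the list would come from sys.argv[1:].
print(parse_args(sys.argv[1:1]))  # empty slice: falls back to the defaults
```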
