I. Installing Neo4j
-
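1. Install the JDK
OpenJDK works. Neo4j 4.0 and later require openjdk-11; Neo4j 3.5 requires openjdk-8.
If the default package sources do not provide OpenJDK, add the PPA below.
On older Ubuntu releases (such as 16.04), installing openjdk-11 can be troublesome, so install openjdk-8 instead (which means staying on Neo4j 3.5).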
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
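The version rule above can be written down as a small lookup. This is only a sketch: the function names required_jdk and apt_package are made up for illustration, and versions other than 3.5 and 4.0+ are rejected because the notes above do not cover them.

```python
def required_jdk(neo4j_version: str) -> int:
    """Return the OpenJDK major version for a Neo4j version string such as '4.0.3'."""
    parts = neo4j_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major >= 4:
        return 11
    if (major, minor) == (3, 5):
        return 8
    # Only 3.5 and 4.0+ are mentioned in these notes, so refuse to guess.
    raise ValueError(f"no JDK requirement recorded for Neo4j {neo4j_version}")


def apt_package(neo4j_version: str) -> str:
    """Name of the apt package to install for the given Neo4j version."""
    return f"openjdk-{required_jdk(neo4j_version)}-jdk"


print(apt_package("3.5.12"))
print(apt_package("4.0.3"))
```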
2. Install Neo4j
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt-get install neo4j
sudo apt-get install cypher-shell
3. Start or stop the service
neo4j status
neo4j start
neo4j stop
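Run cypher-shell to open the Neo4j interactive shell. The default username and password are both "neo4j".
In the shell, change the password with CALL dbms.changePassword('password');
On Neo4j 4.x, where this procedure may no longer be available, the equivalent is ALTER CURRENT USER SET PASSWORD FROM 'old' TO 'new';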
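4. Enable remote browser access
By default Neo4j accepts connections from localhost only. To allow remote access, edit /etc/neo4j/neo4j.conf
and uncomment the following line (remove the leading #):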
#dbms.connectors.default_listen_address=0.0.0.0
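If the edit needs to be scripted, the uncommenting step can be done with a regular expression. This is a minimal sketch that works on the file contents as a string; the helper name uncomment_setting and the sample text are assumptions for illustration, and the file itself is not touched.

```python
import re


def uncomment_setting(conf_text: str, key: str) -> str:
    """Strip the leading '#' from the line that sets `key`; other lines stay as they are."""
    pattern = re.compile(r"^#\s*(" + re.escape(key) + r"\s*=.*)$", re.MULTILINE)
    return pattern.sub(r"\1", conf_text)


KEY = "dbms.connectors.default_listen_address"

# Made-up two-line fragment standing in for the real neo4j.conf.
sample = "# network settings\n#dbms.connectors.default_listen_address=0.0.0.0\n"

print(uncomment_setting(sample, KEY))
```

Writing the result back to /etc/neo4j/neo4j.conf needs root privileges, and the service has to be restarted before the new setting takes effect.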
II. Using py2neo
Nodes and relationships
In [1]: from py2neo import Graph, Node, Relationship
In [2]: a = Node("Person", name="Alice")
In [3]: b = Node("Person", name="Bob")
In [4]: ab = Relationship(a, "KNOWS", b)
In [5]: print(type(a))
<class 'py2neo.data.Node'>
In [6]: print(a)
(:Person {name: 'Alice'})
In [7]: print(type(ab))
<class 'py2neo.data.KNOWS'>
In [8]: print(ab)
(Alice)-[:KNOWS {}]->(Bob)
This creates two Nodes and a Relationship between them. Node and Relationship both inherit from the PropertyDict class, so properties can be assigned and read much like dictionary entries.
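To make the "dictionary-like" behaviour concrete without needing py2neo installed, here is a simplified stand-in. It is not py2neo's class: PropertyBag is a hypothetical name, and it assumes the behaviour described in py2neo's documentation for PropertyDict, namely that None and a missing value are treated alike.

```python
class PropertyBag(dict):
    """Hypothetical stand-in for a dict-like property store (not py2neo's own class)."""

    def __getitem__(self, key):
        # A missing property reads as None instead of raising KeyError.
        return self.get(key)

    def __setitem__(self, key, value):
        # Assigning None removes the property; any other value is stored.
        if value is None:
            self.pop(key, None)
        else:
            super().__setitem__(key, value)


alice = PropertyBag(name="Alice")
alice["age"] = 33        # add a property
print(dict(alice))
alice["age"] = None      # remove it again
print(dict(alice))
print(alice["email"])    # never set, so this reads as None
```

With the real library the same style applies to a Node, for example a["age"] = 33 followed by dict(a).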
Subgraph
A Subgraph is a collection of Nodes and Relationships. The simplest way to build one is with set operators, as follows:
# create a subgraph
In [10]: s = a | b | ab
In [11]: print(type(s))
<class 'py2neo.data.Subgraph'>
In [12]: print(s)
Subgraph({Node('Person', name='Alice'), Node('Person', name='Bob')}, {KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob'))})
# the nodes and relationships attributes give all Nodes and Relationships
In [20]: type(s.nodes)
Out[20]: py2neo.collections.SetView
In [18]: list(s.nodes)
Out[18]: [Node('Person', name='Alice'), Node('Person', name='Bob')]
In [19]: list(s.relationships)
Out[19]: [KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob'))]
# intersection of two subgraphs
In [21]: s2 = a | b
In [22]: s&s2
Out[22]: Subgraph({Node('Person', name='Alice'), Node('Person', name='Bob')}, {})
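The | and & results above follow ordinary set semantics applied separately to the node set and the relationship set. The toy model below reproduces that with plain frozensets; ToySubgraph is a made-up class for illustration and says nothing about how py2neo implements Subgraph internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySubgraph:
    """A pair of sets: node names and (start, type, end) relationship triples."""
    nodes: frozenset
    relationships: frozenset = frozenset()

    def __or__(self, other):
        # Union: everything that appears in either operand.
        return ToySubgraph(self.nodes | other.nodes,
                           self.relationships | other.relationships)

    def __and__(self, other):
        # Intersection: only what appears in both operands.
        return ToySubgraph(self.nodes & other.nodes,
                           self.relationships & other.relationships)


a = ToySubgraph(frozenset({"Alice"}))
b = ToySubgraph(frozenset({"Bob"}))
ab = ToySubgraph(frozenset({"Alice", "Bob"}),
                 frozenset({("Alice", "KNOWS", "Bob")}))

s = a | b | ab     # two nodes and one relationship
s2 = a | b         # two nodes, no relationship
common = s & s2    # the relationship drops out, both nodes remain
print(sorted(common.nodes), sorted(common.relationships))
```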
Walkable
A Walkable is a Subgraph with added traversal information. One can be built with the + operator, for example:
In [34]: a = Node("Person", name="Alice")
In [35]: b = Node("Person", name="Bob")
In [36]: c = Node("Person", name="Jack")
In [37]: d = Node("Dog", name="Pupy")
In [38]: ab = Relationship(a, "KNOWS", b)
In [39]: bc = Relationship(b, "LIKES", c)
In [40]: cd = Relationship(c, "HAS", d)
# create a walkable object
In [41]: w = ab+bc+cd
In [42]: print(type(w))
<class 'py2neo.data.Path'>
In [43]: print(w)
(Alice)-[:KNOWS {}]->(Bob)-[:LIKES {}]->(Jack)-[:HAS {}]->(Pupy)
In [44]: from py2neo import walk
# use walk() to traverse from the start node to the end node
In [45]: for item in walk(w):
    ...:     print(item)
(:Person {name: 'Alice'})
(Alice)-[:KNOWS {}]->(Bob)
(:Person {name: 'Bob'})
(Bob)-[:LIKES {}]->(Jack)
(:Person {name: 'Jack'})
(Jack)-[:HAS {}]->(Pupy)
(:Dog {name: 'Pupy'})
# the start_node, end_node, nodes and relationships attributes give the start Node, the end Node, all Nodes and all Relationships
In [47]: w.start_node
Out[47]: Node('Person', name='Alice')
In [48]: w.end_node
Out[48]: Node('Dog', name='Pupy')
In [49]: w.nodes
Out[49]:
(Node('Person', name='Alice'),
Node('Person', name='Bob'),
Node('Person', name='Jack'),
Node('Dog', name='Pupy'))
In [50]: w.relationships
Out[50]:
(KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob')),
LIKES(Node('Person', name='Bob'), Node('Person', name='Jack')),
HAS(Node('Person', name='Jack'), Node('Dog', name='Pupy')))
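The alternating node, relationship, node order that walk() printed above can be sketched with a plain generator. toy_walk is a hypothetical helper working on (start, type, end) triples; unlike the library, it only follows relationships in their forward direction.

```python
def toy_walk(relationships):
    """Yield nodes and relationships alternately along a chain of (start, type, end) triples."""
    previous_end = None
    for start, rel_type, end in relationships:
        if previous_end is None:
            # The first relationship contributes the start node of the whole path.
            yield start
        elif start != previous_end:
            raise ValueError(f"{start!r} does not continue the path from {previous_end!r}")
        yield (start, rel_type, end)
        yield end
        previous_end = end


chain = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "LIKES", "Jack"),
    ("Jack", "HAS", "Pupy"),
]

items = list(toy_walk(chain))
for item in items:
    print(item)
```

Three relationships give seven items: four nodes with a relationship between each consecutive pair.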
Graph
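- Initialization
Graph is the most important API for interacting with Neo4j data and provides many methods for operating on the database. A Graph is initialized with a connection URI or with keyword settings; the available settings include bolt, secure, host, http_port, https_port, bolt_port, user and password. For details see http://py2neo.org/v3/database.html#py2neo.database.Graph. The example below passes host and auth rather than a URI: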
g = Graph(host='localhost', auth=('neo4j', 'passwd'))
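To see how a connection URI relates to the individual settings, the standard library can split one apart. This sketch does not call py2neo at all: settings_from_uri is a hypothetical helper, the URI is an illustrative value, and 7687 is the default Bolt port.

```python
from urllib.parse import urlsplit


def settings_from_uri(uri: str) -> dict:
    """Break a connection URI into the pieces that could otherwise be given one by one."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
    }


settings = settings_from_uri("bolt://neo4j:passwd@localhost:7687")
print(settings)
```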
- Creating data
You can create a whole subgraph at once, or create individual nodes and relationships.
In [34]: a = Node("Person", name="Alice")
In [35]: b = Node("Person", name="Bob")
In [36]: c = Node("Person", name="Jack")
In [37]: d = Node("Dog", name="Pupy")
In [38]: ab = Relationship(a, "KNOWS", b)
In [39]: bc = Relationship(b, "LIKES", c)
In [40]: cd = Relationship(c, "HAS", d)
In [41]: ss = a|b|c|d|ab|bc|cd
In [42]: g.create(ss)
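This writes the four nodes and three relationships to the database.
Then add one more relationship. Note that the type below is spelled 'KONWS' (a typo for 'KNOWS'), so Neo4j stores it as a separate relationship type, and it appears under that name in the query output further down.
r = Relationship(a, 'KONWS', c)
g.create(r)
The graph now has a fourth relationship, from Alice to Jack.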
- Finding nodes
Use NodeMatcher to find nodes.
In [40]: from py2neo import NodeMatcher, RelationshipMatcher
In [41]: nm = NodeMatcher(g)
In [43]: res = nm.match('Person')
In [44]: list(res)
Out[44]:
[Node('Person', name='Bob'),
Node('Person', name='Alice'),
Node('Person', name='Jack')]
# return the first match
In [58]: res = nm.match('Person').first()
In [59]: res
Out[59]: Node('Person', name='Bob')
In [49]: res = nm.match('Dog', name='Pupy')
In [50]: list(res)
Out[50]: [Node('Dog', name='Pupy')]
# query with a regular expression
In [56]: res = nm.match('Person').where('_.name=~"A.*"')
In [57]: list(res)
Out[57]: [Node('Person', name='Alice')]
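One detail that is easy to trip over: the condition string is Cypher, and Cypher's =~ operator matches the pattern against the whole property value, not a substring. In Python terms it behaves like re.fullmatch rather than re.search. The sketch below shows the difference on the three names from this example, using only the standard library.

```python
import re

names = ["Alice", "Bob", "Jack"]

# Whole-value matching, the way =~ "A.*" behaves.
starts_with_a = [n for n in names if re.fullmatch("A.*", n)]

# A bare "a" matches nothing as a whole value...
exactly_a = [n for n in names if re.fullmatch("a", n)]

# ...so a "contains" query needs the pattern padded with .* on both sides.
contains_a = [n for n in names if re.fullmatch(".*a.*", n)]

# re.search is what substring matching would look like instead.
search_a = [n for n in names if re.search("a", n)]

print(starts_with_a, exactly_a, contains_a, search_a)
```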
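first(): returns the first matching node, or None if nothing matches
limit(amount): returns at most amount nodes
skip(amount): skips the first amount nodes and returns the rest
order_by(fields): sorts the results by the given fields
where(conditions, **properties): filters the results by the given conditions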
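How order_by, skip and limit combine is easiest to see on a plain list: sort first, drop the first few rows, then cap the count, which is also the order Cypher applies ORDER BY, SKIP and LIMIT. toy_match below is a hypothetical helper over dictionaries, not a py2neo function.

```python
def toy_match(nodes, order_by=None, skip=0, limit=None):
    """Apply order_by, then skip, then limit to a list of property dictionaries."""
    rows = sorted(nodes, key=lambda n: n[order_by]) if order_by else list(nodes)
    rows = rows[skip:]                               # skip: drop the first `skip` rows
    return rows if limit is None else rows[:limit]   # limit: keep at most `limit` rows


people = [{"name": "Bob"}, {"name": "Alice"}, {"name": "Jack"}]

first_by_name = toy_match(people, order_by="name", limit=1)
second_by_name = toy_match(people, order_by="name", skip=1, limit=1)
all_but_first = toy_match(people, order_by="name", skip=1)

print(first_by_name, second_by_name, all_but_first)
```

Using skip and limit together like this is the usual way to page through a large result set.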
- Finding relationships
Relationships can be found with g.match or with RelationshipMatcher; the latter is more powerful.
In [40]: from py2neo import NodeMatcher, RelationshipMatcher
In [42]: rm = RelationshipMatcher(g)
In [96]: list(g.match())
Out[96]:
[LIKES(Node('Person', name='Bob'), Node('Person', name='Jack')),
KONWS(Node('Person', name='Alice'), Node('Person', name='Jack')),
KNOWS(Node('Person', name='Alice'), Node('Person', name='Bob')),
HAS(Node('Person', name='Jack'), Node('Dog', name='Pupy'))]
In [63]: res = g.match(r_type='LIKES')
In [64]: list(res)
Out[64]: [LIKES(Node('Person', name='Bob'), Node('Person', name='Jack'))]
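# find relationships of a given type that start at a given node, e.g. the complications (并发症) of leukemia (白血病); the label 疾病 means disease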
In [293]: a = nm.match('疾病', name='白血病').first()
In [294]: a
Out[294]: Node('疾病', name='白血病')
In [295]: list(g.match(r_type='并发症', nodes=[a]))
Out[295]:
[并发症(Node('疾病', name='白血病'), Node('疾病', name='白血病性中枢神经感染')),
并发症(Node('疾病', name='白血病'), Node('疾病', name='白血病脑出血')),
并发症(Node('疾病', name='白血病'), Node('疾病', name='肠功能衰竭')),
并发症(Node('疾病', name='白血病'), Node('疾病', name='卡氏肺囊虫感染'))]
In [66]: res2 = rm.match(r_type='LIKES')
In [67]: list(res2)
Out[67]: [LIKES(Node('Person', name='Bob'), Node('Person', name='Jack'))]
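- Bulk insertion
When bulk inserting, take care not to insert many copies of the same node. Even when the label and property values are identical, each call to Node builds a distinct object (note the different id values below), so each one would be created as a separate node:
In [258]: a1 = Node('Person', name='小明')
In [259]: a2 = Node('Person', name='小明')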
In [260]: a1==a2
Out[260]: False
In [261]: id(a1)
Out[261]: 139971127871536
In [262]: id(a2)
Out[262]: 139971551445936
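So during bulk insertion, especially of tabular data, avoid constructing nodes with the same label and values more than once: before building a node with Node, use NodeMatcher to check whether one with the same label and values already exists. A concrete bulk-insertion example follows:
from py2neo import Graph, Node, Relationship, NodeMatcher

# data is assumed to be a list of records, each holding its triples under 'spo_list'
g = Graph(host='localhost', auth=('neo4j', 'password'))
nm = NodeMatcher(g)
for i in data:
    spos = i['spo_list']
    for spo in spos:
        # relies on every spo dict listing its values in this order
        p, sub, obj, sub_type, obj_type = spo.values()
        sub_existed = nm.match(sub_type, name=sub).first()  # look for an existing node with the same label and name
        obj_existed = nm.match(obj_type, name=obj).first()
        if sub_existed and obj_existed:
            # assumes two nodes share at most one relationship, so nothing is inserted when both already exist
            continue
        elif sub_existed:
            obj_node = Node(obj_type, name=obj)  # only sub exists, so build a new obj node
            rel = Relationship(sub_existed, p, obj_node)
        elif obj_existed:
            sub_node = Node(sub_type, name=sub)  # only obj exists, so build a new sub node
            rel = Relationship(sub_node, p, obj_existed)
        else:
            sub_node = Node(sub_type, name=sub)  # neither exists, so build both
            obj_node = Node(obj_type, name=obj)
            rel = Relationship(sub_node, p, obj_node)
        g.create(rel)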
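A different way to get the same deduplication, sketched here without a database, is to keep an in-memory index keyed by (label, name) instead of querying once per triple. This is a swapped-in technique, not what the loop above does; the function name deduplicate and the sample triples are assumptions for illustration. It also keeps a relationship between two nodes that already exist but were not yet linked.

```python
def deduplicate(triples):
    """Assign one id per distinct (label, name) pair and drop repeated relationships."""
    node_ids = {}        # (label, name) -> node id
    relationships = []   # (start id, type, end id), in first-seen order
    seen = set()
    for predicate, sub, obj, sub_type, obj_type in triples:
        # setdefault returns the existing id, or stores the next free one.
        sub_id = node_ids.setdefault((sub_type, sub), len(node_ids))
        obj_id = node_ids.setdefault((obj_type, obj), len(node_ids))
        edge = (sub_id, predicate, obj_id)
        if edge not in seen:
            seen.add(edge)
            relationships.append(edge)
    return node_ids, relationships


triples = [
    ("KNOWS", "Alice", "Bob", "Person", "Person"),
    ("LIKES", "Bob", "Jack", "Person", "Person"),
    ("KNOWS", "Alice", "Bob", "Person", "Person"),   # exact repeat, dropped
    ("KNOWS", "Alice", "Jack", "Person", "Person"),  # both nodes known, new link kept
]

node_ids, relationships = deduplicate(triples)
print(node_ids)
print(relationships)
```

With py2neo, each key of the index would map to a single Node object that is reused whenever a Relationship is built, so the whole batch could be combined into one subgraph and created in a single call.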