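To generate a word cloud, we first need to count word frequencies in the input text. Here we use a passage from The Little Prince:
- Word frequency count (some punctuation is filtered out so that only plain text remains)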
# Count word frequencies with a dictionary
# The input is a passage from The Little Prince
str1 = """
The grown-ups' response, this time, was to advise me to lay aside my drawings of boa constrictors, whether from the inside or the outside, and devote myself instead to geography, history, arithmetic and grammar. That is why, at the age of six, I gave up what might have been a magnificent career as a painter. I had been disheartened by the failure of my Drawing Number One and my Drawing Number Two. Grown-ups never understand anything by themselves, and it is tiresome for children to be always and forever explaining things to them.
So then I chose another profession, and learned to pilot airplanes. I have flown a little over all parts of the world; and it is true that geography has been very useful to me. At a glance I can distinguish China from Arizona. If one gets lost in the night, such knowledge is valuable.
In the course of this life I have had a great many encounters with a great many people who have been concerned with matters of consequence. I have lived a great deal among grown-ups. I have seen them intimately, close at hand. And that hasn't much improved my opinion of them.
Whenever I met one of them who seemed to me at all clear-sighted, I tried the experiment of showing him my Drawing Number One, which I have always kept. I would try to find out, so, if this was a person of true understanding. But, whoever it was, he, or she, would always say:
"That is a hat."
Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level. I would talk to him about bridge, and golf, and politics, and neckties. And the grown-up would be greatly pleased to have met such a sensible man.
"""
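# Replace newlines with spaces so that words on adjacent lines do not run together,
# then strip the punctuation marks that appear in the passage.
str1 = str1.replace("\n", " ")
for ch in '!,.\'";:':
    str1 = str1.replace(ch, "")
# Split on whitespace to get the list of words
words = str1.split()
# Map each word to the number of times it occurs
dic1 = {}
for w in words:
    if w not in dic1:
        dic1[w] = 1
    else:
        dic1[w] += 1
print(dic1)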
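The same counting step can also be sketched with the standard library's `collections.Counter` and a regular expression. This is an alternative, not the approach used above: it lowercases the text so that "The" and "the" are counted together, and it keeps hyphens and apostrophes inside words. The `count_words` helper and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of the lowercase words found in text."""
    # Match runs of letters, optionally joined by hyphens or apostrophes,
    # so "grown-ups" and "hasn't" each stay a single token.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(tokens)


sample = "I would talk to him about bridge, and golf, and politics, and neckties."
counts = count_words(sample)
print(counts.most_common(3))
```

Because a `Counter` is a dictionary subclass, `counts.items()` can be used anywhere a plain frequency dictionary is expected.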
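Next, we import the library we need: pyecharts.
Finally, we save the result to an HTML file and open it in a browser to see the visualization.
- Generate the word cloud file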
# Word cloud
from pyecharts import options as opts
from pyecharts.charts import WordCloud

# Convert the frequency dictionary into a list of (word, count) tuples
worlds3 = [(k, int(v)) for k, v in dic1.items()]
print(worlds3)

def wordcloud():
    return (
        WordCloud()
        .add(series_name="", data_pair=worlds3, word_size_range=[20, 100], shape="cardioid")
        .set_global_opts(title_opts=opts.TitleOpts(title="Word frequencies in a passage from The Little Prince"))
    )

# Write the chart to an HTML file that can be opened in a browser
wordcloud().render("10_25_world_cloud.html")
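A chart like this may once have seemed sophisticated, but the code is fairly simple and the result looks very good. The size of a word reflects how many times it appears in the text, more frequent words tend to sit closer to the center, and different words are drawn in different colors. Hovering over a word also shows how many times it appears. The overall effect is shown in the figure above.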
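One possible refinement, sketched here under stated assumptions, is to drop very common function words and keep only the most frequent remaining words before building the chart. The `top_pairs` helper, the `STOP_WORDS` set, and the counts in `example_freq` are all hypothetical and chosen for illustration; they are not part of the code above or of pyecharts.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; extend it as needed.
STOP_WORDS = {"i", "a", "the", "of", "to", "and", "me", "my", "have", "is", "it", "that"}


def top_pairs(freq, n=50, stop_words=STOP_WORDS):
    """Return up to n (word, count) tuples, most frequent first."""
    # Drop stop words (compared case-insensitively), then rank by count.
    kept = Counter({w: c for w, c in freq.items() if w.lower() not in stop_words})
    return kept.most_common(n)


# Made-up counts standing in for a real frequency dictionary.
example_freq = {"I": 12, "the": 9, "great": 4, "Drawing": 3, "geography": 2}
print(top_pairs(example_freq, n=2))  # [('great', 4), ('Drawing', 3)]
```

The returned list has the same (word, count) shape as the list passed to `data_pair`, so it could be used in its place.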