Two Ways to Count Word Frequencies in Python

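Word frequency counting means counting how many times each word occurs in a text. It is one of the most basic text-processing tasks. Python offers several ways to do it, using data structures such as dictionaries, lists, and the Counter class.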

I. Using a dictionary

Using a dictionary is one of the most basic ways to count word frequencies. The steps are:

1. Convert the text to lowercase and split it into a list of words.

text = "This is a sample text with several words. Here are some more words. And here are some more."
words = text.lower().split()  # split() breaks the text on whitespace only
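To treat "words." and "words" as the same word, one option is to extract words with a regular expression instead of calling split(). The sketch below uses the standard re.findall; the pattern is an assumption (a word is a run of lowercase ASCII letters or apostrophes) and should be adjusted to the text you are processing.

```python
import re

text = "This is a sample text with several words. Here are some more words. And here are some more."

# Assumption: a word is a run of lowercase ASCII letters or apostrophes;
# every other character (spaces, periods, ...) acts as a separator.
words = re.findall(r"[a-z']+", text.lower())

print(words[:8])
# ['this', 'is', 'a', 'sample', 'text', 'with', 'several', 'words']
```

With this word list, the counting steps that follow report "words" 2 and "more" 2, with no punctuation attached.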

2. Create an empty dictionary that maps each word to its count.

word_counts = {}

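3. Loop over the word list. If a word is already in the dictionary, add 1 to its count; otherwise add it to the dictionary with a count of 1.

for word in words:
    if word in word_counts:
        word_counts[word] += 1  # seen before: increment its count
    else:
        word_counts[word] = 1   # first occurrence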
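The if/else in step 3 can be written more compactly. dict.get(key, default) returns the default when the key is missing, and collections.defaultdict(int) creates missing keys with a value of 0. A short sketch of both standard-library options, using the same whitespace-split words:

```python
from collections import defaultdict

text = "This is a sample text with several words. Here are some more words. And here are some more."
words = text.lower().split()

# dict.get(word, 0) returns 0 for a word not seen yet, so no if/else is needed
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# defaultdict(int) calls int() (which returns 0) for any missing key
dd_counts = defaultdict(int)
for word in words:
    dd_counts[word] += 1

print(word_counts == dict(dd_counts))  # True
```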

4. Print how many times each word occurs.

for word, count in word_counts.items():
    print(word, count)
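Step 4 prints words in the order they first appear, because dictionaries preserve insertion order (Python 3.7+). To list the most frequent words first, one option is to sort the items by count. In this sketch, ties are broken alphabetically, which is an arbitrary choice:

```python
text = "This is a sample text with several words. Here are some more words. And here are some more."
words = text.lower().split()

word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# Sort by count (descending), then alphabetically to break ties
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
for word, count in ranked[:4]:
    print(word, count)
# are 2
# here 2
# some 2
# words. 2
```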

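Because split() only splits on whitespace, punctuation stays attached to the neighboring word, so "words." and "more." are counted as tokens of their own. The output is:

this 1
is 1
a 1
sample 1
text 1
with 1
several 1
words. 2
here 2
are 2
some 2
more 1
and 1
more. 1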

II. Using the Counter class

Besides a plain dictionary, Python's collections module provides the Counter class, which counts how many times each element of an iterable occurs. Counting words with Counter looks like this:

from collections import Counter

text = "This is a sample text with several words. Here are some more words. And here are some more."
words = text.lower().split()
word_counts = Counter(words)  # one pass over the list builds all counts
for word, count in word_counts.items():
    print(word, count)

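The output is the same as for the dictionary version: Counter is a subclass of dict and, on Python 3.7+, keeps words in the order they first appear.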
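Counter also provides most_common(n), which returns the n most frequent (element, count) pairs from most to least common; elements with equal counts are listed in the order they were first encountered. A short sketch on the same text:

```python
from collections import Counter

text = "This is a sample text with several words. Here are some more words. And here are some more."
word_counts = Counter(text.lower().split())

# most_common(n) returns the n highest counts; ties keep first-seen order
top3 = word_counts.most_common(3)
print(top3)  # [('words.', 2), ('here', 2), ('are', 2)]
```

Calling most_common() without an argument returns all pairs, sorted by count.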

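word_counts.items() returns a dict_items view. Because Counter is a subclass of dict, word_counts can already be used wherever a dictionary is expected; call dict(word_counts) only if later processing needs a plain dict.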
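The following sketch checks these types and shows one practical difference between a Counter and a plain dict: looking up a missing word in a Counter returns 0 instead of raising KeyError.

```python
from collections import Counter

text = "This is a sample text with several words. Here are some more words. And here are some more."
word_counts = Counter(text.lower().split())

print(isinstance(word_counts, dict))       # True: Counter subclasses dict
print(type(word_counts.items()).__name__)  # dict_items

plain = dict(word_counts)  # a plain dict with the same counts
print(plain["here"])       # 2

# A missing key yields 0 and is not inserted into the Counter
print(word_counts["python"])  # 0
```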

