Bootstrap

Using sentiment seed words to find the 50 most positive and 50 most negative words in a dataset [Social Computing course project]

Notes

If you need the accompanying files, please visit my GitHub repository. A star while you are there would be much appreciated.

Steps

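Read the comment file and write the comments into two new files according to their class, one for positive comments and one for negative comments. The first 4000 comments go into the positive file and the last 8000 into the negative file.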

import csv


def separate_csv(file):
    """Write the positive and negative comments to two separate files."""
    # Opening the outputs once in "w" mode keeps reruns from duplicating rows.
    with open(file, "r", encoding="utf-8", newline="") as f, \
            open("comments_1.csv", "w", encoding="utf-8", newline="") as pos, \
            open("comments_0.csv", "w", encoding="utf-8", newline="") as neg:
        reader = csv.reader(f)
        pos_writer = csv.writer(pos)
        neg_writer = csv.writer(neg)
        next(reader, None)  # skip the header row
        for index, row in enumerate(reader):
            # The first 4000 data rows are positive; every later row is negative.
            if index < 4000:
                pos_writer.writerow(row)
            else:
                neg_writer.writerow(row)
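
A quick sanity check after the split can catch an off-by-one at the 4000-row boundary. The following is a minimal sketch, not part of the original project: the helper names count_rows and check_split are hypothetical, and the default paths assume the output file names used by separate_csv above.

```python
import csv


def count_rows(path):
    """Return the number of CSV records in the file at `path`."""
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with open(path, "r", encoding="utf-8", newline="") as f:
        return sum(1 for _ in csv.reader(f))


def check_split(paths=("comments_1.csv", "comments_0.csv")):
    """Map each split output file to its row count (hypothetical helper)."""
    return {path: count_rows(path) for path in paths}
```

If the split behaves as described, check_split() should report 4000 rows for comments_1.csv and all remaining rows for comments_0.csv.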

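Next, we use the jieba word-segmentation tool to extract the top 50 words from the positive comments and from the negative comments, and write each list to its own file. The code ranks words with jieba's TextRank keyword extractor (textrank), so "high-frequency" here means the 50 keywords with the highest TextRank weight rather than the highest raw counts.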

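import csv

import pandas as pd
from jieba.analyse import textrank


def jieba_get_high_frequency_words():
    """Extract the high-frequency words of the positive and negative comments with jieba."""
    col_name = [
        'ID',
        'comment'
    ]
    # Positive comments (comments_1.csv) are processed here.
    csvpd = pd.read_csv("comments_1.csv", names=col_name)['comment']
    # Drop empty comments and join with newlines so neighbouring comments do not fuse.
    data = '\n'.join(csvpd.dropna().astype(str))
    with open("high_frequency_word_1.csv", "w", encoding="utf-8", newline="") as f:
        csvwriter = csv.writer(f)
        # textrank returns (keyword, weight) pairs; enumerate numbers them from 1.
        for i, (keyword, weight) in enumerate(textrank(data, topK=50, withWeight=True), start=1):
            csvwriter.writerow([i, keyword])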
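
For comparison, a ranking by raw frequency can be sketched with collections.Counter. This is an illustrative alternative rather than the method used above: the function name top_frequent_words and its parameters are assumptions, and the tokenizer is passed in so that the block itself depends only on the standard library.

```python
from collections import Counter


def top_frequent_words(comments, tokenize, k=50, min_len=2):
    """Return the k most frequent tokens across comments as (word, count) pairs.

    tokenize is any callable that maps a string to a list of tokens,
    for example jieba.lcut for Chinese text.
    """
    counts = Counter()
    for comment in comments:
        for token in tokenize(comment):
            token = token.strip()
            # Skip whitespace and very short tokens (mostly particles and punctuation).
            if len(token) >= min_len:
                counts[token] += 1
    return counts.most_common(k)
```

With jieba installed, a call such as top_frequent_words(csvpd.dropna().astype(str), jieba.lcut) would count tokens over the comment column. No stop-word list is applied in this sketch, so common function words may still appear near the top.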
 