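Title: Group Behavior Analysis Based on Micro-blog Big Data
Author: 李建伟 (Li Jianwei)
Degree: Master's
Defense date: 2017-05-24
Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of conferral: Shenyang
Advisor: 郑泽宇 (Zheng Zeyu)
Keywords: web crawler; user integration degree; group behavior; machine learning; visualization
Alternative title (English): Group Behavior Analysis Based on Micro-blog Big Data
Major: Computer Application Technology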
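Chinese abstract: With the rapid development of the mobile Internet, people can access the Internet through apps on smartphones and other portable devices to obtain and publish information. As a social medium, micro-blogging quickly built a large user base because it is easy to use, open, and interactive. Sina Weibo, a mainstream social platform in China, has attracted wide participation, and each trending event that breaks on the platform has further raised its popularity. The massive data generated as Weibo grows offers great potential for both research and business: Weibo has become an important source of research data across many disciplines, and the user-profile data and text content it provides can support studies in many fields.

Users are the main participants in the Weibo social network, and the question of how to evaluate and quantify them objectively has prompted a large body of research. Measuring how active users are in the network, evaluating and ranking their influence, and analyzing the credibility of users and of their posts are all valuable research directions. This thesis takes Sina Weibo as the platform for group behavior analysis and studies the behavior of its user population with approaches and methods common in big data analysis. It organizes existing user evaluation indexes, proposes a new index, user integration degree, that characterizes a user's tendency to drop out of the social network, and improves how machine learning is applied to user index evaluation. The main work is as follows.

Acquiring user information and post content from Sina Weibo. Data are obtained by combining the datasets publicly released by Sina Weibo with a web crawler as an auxiliary tool. The crawler is designed with the Scrapy framework and its performance is improved with the scrapy-redis component; the thesis discusses the issues that crawler design must take into account and implements a user-data-first collection method.

Summarizing evaluation methods for Weibo users, proposing the Weibo user integration degree, and modeling it with machine learning. Because user evaluation indexes are only loosely defined, the thesis re-examines three of them: user activity, user influence, and user credibility. It summarizes weight-based, PageRank-based, and machine-learning-based evaluation methods, and builds user integration degree models with machine learning on different feature sets. In the time dimension, a long short-term memory (LSTM) network is used to forecast multiple time series; in the spatial dimension, methods for visualizing spatial data are explored.

The innovations of this thesis are: a user-first data-fetching method implemented in a Scrapy-based crawler; the first side-by-side review of Weibo user evaluation indexes, together with the user integration degree index, which characterizes the risk that a user leaves the Weibo social network, and a user integration degree model built with machine learning from a risk-prediction perspective; a multi-time-series prediction model of user group behavior built with an LSTM network in the time dimension; and, looking ahead, an exploration of spatial data visualization methods.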
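The abstract describes the user-data-first collection method only in outline. As a hedged illustration of the scheduling idea (not the thesis's implementation), the sketch below keeps the crawl frontier in a priority heap so that every pending user-profile request is fetched before any pending post request. The class name, the two priority levels, and the example.com URLs are hypothetical. In an actual Scrapy spider a comparable effect would more likely come from passing a larger `priority` value to `scrapy.Request` for user pages, since Scrapy's scheduler dispatches higher-priority requests first.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is fetched earlier.
USER_PRIORITY = 0     # user profile pages
CONTENT_PRIORITY = 1  # post / timeline pages


class UserFirstFrontier:
    """Crawl frontier that yields every pending user-profile URL before
    any pending content URL, keeping FIFO order within each level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves FIFO order
        self._seen = set()                 # drop URLs already scheduled

    def push(self, url, kind):
        """Schedule url; kind "user" is high priority, anything else is content.
        Returns False if the URL was already scheduled."""
        if url in self._seen:
            return False
        self._seen.add(url)
        level = USER_PRIORITY if kind == "user" else CONTENT_PRIORITY
        heapq.heappush(self._heap, (level, next(self._counter), url))
        return True

    def pop(self):
        """Return the next URL to fetch (raises IndexError when empty)."""
        _level, _seq, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = UserFirstFrontier()
    frontier.push("https://example.com/status/1", "content")
    frontier.push("https://example.com/u/100", "user")
    frontier.push("https://example.com/status/2", "content")
    frontier.push("https://example.com/u/200", "user")
    while len(frontier):
        print(frontier.pop())
```

With user pages ranked first, the crawler fills in profile data before spending bandwidth on post timelines; the insertion counter used as a tie-breaker keeps first-in, first-out order within each level.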
English abstract: With the rapid development of the mobile Internet, people can obtain and publish information through apps on smartphones and other portable devices. Micro-blogging, as a kind of social media, quickly accumulated a large number of users because it is easy to use and interactive. Sina Weibo, a mainstream social networking platform in China, attracts a large number of users and has become highly popular through a series of trending events. The massive data generated as micro-blogs develop has brought great potential for scientific research and business: micro-blogs have become an important source of research data in many fields, providing both user information and text data. Micro-blog users are the main actors in the micro-blog social network, and how to evaluate and quantify users objectively has produced a large body of research: measuring user activity in the social network, evaluating and ranking user influence, and analyzing the credibility of users and of their posts are all valuable directions. This thesis takes Sina Weibo as the research object and applies big data analysis to its user groups. It organizes the user evaluation indexes, proposes a new index that describes the probability that users drop out of the social network, and improves how machine learning methods are applied to user evaluation indexes. The main content is as follows.

How to obtain user information and content data from Sina Weibo. Data are obtained from the datasets published by Sina Weibo, with a crawler as an auxiliary tool. The crawler is designed with the Scrapy framework, and the scrapy-redis component is used to improve its performance; the thesis discusses the issues that must be considered in crawler design and implements a user-data-first crawling method.

Summarizing evaluation methods for micro-blog users, this thesis proposes the integration degree of micro-blog users and uses machine learning to build a user integration degree model. Because current definitions of user evaluation indexes are vague, we reorganize user activity, user influence, and user credibility; summarize weight-based, PageRank-based, and machine-learning-based evaluation methods; and build models on different feature sets with the machine-learning-based method. In the time dimension, a long short-term memory (LSTM) network is used to forecast multiple time series; in the spatial dimension, spatial data visualization methods are explored.

The innovations of this thesis are: a user-data-first crawling method implemented in a Scrapy-based crawler; a systematic organization of micro-blog user evaluation indexes, together with the user integration degree, which describes the risk that a user leaves the micro-blog social network, and a model of it built with machine learning from a risk-prediction perspective; an LSTM-based model for predicting multiple time series of user group behavior; and, looking ahead, an exploration of spatial data visualization methods.
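Of the evaluation approaches the thesis summarizes, the PageRank-based one is the most standardized. A minimal sketch follows, assuming the follow graph is given as a dict in which an edge u -> v means u follows v (so influence flows from follower to followee) and using the conventional damping factor 0.85. This is textbook PageRank, not necessarily the exact variant used in the thesis; the function name and toy graph are hypothetical. Users who follow nobody spread their rank uniformly, so the scores always sum to 1.

```python
def pagerank(follows, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank over a follow graph.

    follows: dict mapping each user to an iterable of users they follow.
    An edge u -> v (u follows v) passes part of u's rank to v.
    Returns a dict user -> score; scores sum to 1.
    """
    # Collect every user, including those who appear only as followees.
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    # De-duplicate out-links so a repeated follow is not counted twice.
    out = {u: sorted(set(follows.get(u, ()))) for u in nodes}

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by users who follow nobody is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: a and b follow c, c follows a.
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(user, round(score, 4))
```

On the toy graph, c ranks first and a second, while b, who has no followers, keeps only the baseline (1 - 0.85) / 3 = 0.05. A weight-based score would instead combine counts such as followers and reposts linearly; PageRank additionally rewards being followed by users who are themselves influential.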
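Language: Chinese
Ownership rank: 1
Content type: Degree thesis
Source URL: http://ir.sia.cn/handle/173321/20525
Collection: Shenyang Institute of Automation / Digital Factory Laboratory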
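Recommended citation (GB/T 7714):
李建伟 (Li Jianwei). Group Behavior Analysis Based on Micro-blog Big Data [D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2017.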