Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication and Audience Research
- Arcila Calderón, Carlos 1
- Ortega Mohedano, Félix 1
- Álvarez, Mateo 2
- Vicente Mariño, Miguel 3
- 1 Universidad de Salamanca
- 2 Universidad Rey Juan Carlos
- 3 Universidad de Valladolid
ISSN: 1139-5737
Year of publication: 2019
Issue title: La investigación empírica en comunicación
Issue number: 42
Pages: 113-136
Type: Article
Other publications in: Empiria: Revista de metodología de ciencias sociales
Abstract
Large-scale, real-time analysis of tweets using supervised sentiment analysis represents a unique opportunity for communication and audience research. Bringing together machine learning and real-time analytics in a distributed environment can help researchers obtain valuable data from Twitter and classify messages immediately according to their context, without time or storage restrictions, improving cross-sectional, longitudinal and experimental designs with new data sources. Although communication and audience researchers have already begun to use computational methods in their routines, most are unaware of how distributed computing technologies can be used to address big data challenges. This article describes the implementation of parallelized machine learning methods in Apache Spark to predict the sentiment of tweets in real time, and explains how this process can be scaled using distributed computing, both commercial and academic, when personal computers are insufficient to store and analyze the data. The limitations of these methods and their implications for media, communication and audience studies are discussed.