Real-Time Analytics with Hadoop: Integrating Streaming Engines for Performance Gains
Abstract
The rising demand for real-time analytics in domains such as the Internet of Things (IoT) and telecommunications calls for hybrid big data architectures that combine batch and stream processing. This study investigates the integration of Hadoop with two real-time streaming engines, Apache Storm and Apache Flink, to address the difficulty of delivering low-latency analytics within traditional, batch-oriented big data frameworks. We analyze the performance trade-offs, latency mitigation techniques, and fault tolerance mechanisms involved in such hybrid deployments. Through benchmarking and architectural evaluation, we identify key design considerations, including pipeline optimization and resource management strategies that support concurrent batch and real-time workloads. Empirical insights from IoT and telecom use cases illustrate the effectiveness of pairing Hadoop’s scalable storage with the high-throughput, low-latency processing of modern stream engines. The findings support the practicality and performance benefits of a unified analytics ecosystem for real-time, data-driven decision-making.

This work is licensed under a Creative Commons Attribution 4.0 International License.