Real-Time Analytics with Hadoop: Integrating Streaming Engines for Performance Gains

Harsha Vardhan Reddy Goli

Abstract

The rising demand for real-time data analytics in domains such as the Internet of Things (IoT) and telecommunications calls for hybrid big data architectures that combine batch and stream processing. This study investigates the integration of Hadoop with real-time streaming engines, specifically Apache Storm and Apache Flink, to address the challenge of delivering low-latency analytics within traditional big data frameworks. We analyze the performance trade-offs, latency mitigation techniques, and fault tolerance mechanisms involved in such hybrid deployments. Through benchmarking and architectural evaluation, the research identifies key design considerations, including pipeline optimization and resource management strategies that allow batch and real-time workloads to run concurrently. Empirical insights from IoT and telecom use cases illustrate the effectiveness of pairing Hadoop's scalable storage with the high-throughput, low-latency processing of modern stream engines. The findings affirm the practicality and performance benefits of a unified analytics ecosystem for real-time, data-driven decision-making.
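One common way to combine batch and stream processing is the Lambda architecture. A batch layer periodically recomputes a view over all stored history, which is the role Hadoop plays. A speed layer keeps a small incremental view up to date as events arrive, which is the role of Storm or Flink. Queries then merge the two views. The abstract does not say that the study uses this exact pattern, so the Python sketch below is illustrative only. The event shape, the per-sensor count metric, and the names `batch_view`, `SpeedLayer`, and `query` are hypothetical. Plain in-memory collections stand in for HDFS and for a stream engine.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Tuple

# Hypothetical event shape (sensor_id, timestamp_seconds); not taken from the article.
Event = Tuple[str, int]


def batch_view(history: Iterable[Event]) -> Counter:
    """Batch layer: recompute per-sensor event counts over all stored history.

    Stands in for a periodic Hadoop/MapReduce job over data in HDFS.
    """
    return Counter(sensor for sensor, _ in history)


@dataclass
class SpeedLayer:
    """Speed layer: incrementally count events newer than the batch cutoff.

    Stands in for a Storm or Flink job that updates state on every event.
    """
    cutoff: int  # timestamp up to which the batch view is assumed complete
    counts: Counter = field(default_factory=Counter)

    def on_event(self, event: Event) -> None:
        sensor, ts = event
        # Events at or before the cutoff are already in the batch view,
        # so skipping them avoids double counting (e.g. replayed events).
        if ts > self.cutoff:
            self.counts[sensor] += 1


def query(batch: Counter, speed: SpeedLayer, sensor: str) -> int:
    """Serving layer: merge the precomputed batch view with recent increments."""
    return batch[sensor] + speed.counts[sensor]


if __name__ == "__main__":
    history = [("s1", 1), ("s2", 2), ("s1", 3)]
    batch = batch_view(history)      # batch job covered data up to ts=3
    speed = SpeedLayer(cutoff=3)
    for e in [("s1", 3), ("s1", 4), ("s2", 5)]:  # ("s1", 3) is a replay
        speed.on_event(e)
    print(query(batch, speed, "s1"))  # prints 3
    print(query(batch, speed, "s2"))  # prints 2
```

The `cutoff` timestamp is what keeps the merged answer consistent. Any event at or before it is treated as already covered by the batch view, so the speed layer skips it. A production deployment would also have to discard speed-layer state once a newer batch run has absorbed it. This sketch leaves that step out.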

Article Details

How to Cite
Reddy Goli, H. V. (2020). Real-Time Analytics with Hadoop: Integrating Streaming Engines for Performance Gains. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 11(2), 1347–1358. https://doi.org/10.61841/turcomat.v11i2.15250
Section
Research Articles
