Real-Time Disease Prediction Using Big Data and Human-Computer Interaction
DOI: https://doi.org/10.55544/sjmars.3.5.7
Keywords: Healthcare, Stream processing, Human-computer interaction, Big data, Apache Spark, Internet of Things
Abstract
Big data streaming involves managing the vast volumes of data generated continuously by sensor-equipped wearable medical devices, healthcare cloud platforms, and mobile applications. Traditional methods for processing this data are often time- and resource-intensive, so efficient and scalable real-time big data stream processing is needed. This study introduces a novel architecture for a big data-driven, real-time health status prediction and analytics system. In this architecture, we replace Hadoop MapReduce with Apache Spark to run a parallel, distributed, and scalable decision tree algorithm capable of real-time computation. The model is applied to distributed streaming data from various sources to predict health statuses associated with multiple diseases. To evaluate performance, we compare Spark's decision tree (Spark DT) with traditional machine learning tools such as Weka, analyzing key metrics including execution time and throughput. Experimental results demonstrate that the proposed system can effectively manage and predict large volumes of real-time, IoT-enabled medical data related to various disorders, showcasing its potential for real-time healthcare applications.
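The abstract names the pipeline's pieces (a decision tree trained once, applied to a continuous sensor stream, and judged by execution time and throughput) without showing how they fit together. As a hedged, single-machine sketch, the Python below trains a small CART-style tree (greedy splits chosen by lowest weighted Gini impurity) and scores a stream in micro-batches, reporting throughput as records processed per second. It is not Spark MLlib's distributed decision tree and not the authors' implementation; the feature names, class labels, toy data, and the helpers `build_tree`, `predict`, `micro_batches`, and `score_stream` are all hypothetical illustrations.

```python
import time
from collections import Counter
from typing import Iterable, Iterator, Union

# Hypothetical toy vitals (heart_rate bpm, systolic_bp mmHg, glucose mg/dL); NOT data from the study.
Row = tuple[float, ...]
Tree = Union[str, tuple]  # leaf = class label; internal node = (feature, threshold, left, right)

TRAIN_X: list[Row] = [
    (72, 118, 90), (80, 122, 95), (75, 115, 100),
    (110, 150, 98), (115, 160, 105),
    (78, 120, 210), (82, 125, 230),
]
TRAIN_Y = ["normal", "normal", "normal",
           "tachycardia", "tachycardia",
           "diabetes", "diabetes"]
STREAM: list[Row] = [(74, 119, 92), (79, 121, 220), (112, 155, 101),
                     (76, 117, 96), (81, 124, 240)]


def gini(labels: list[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows: list[Row], labels: list[str], depth: int = 0, max_depth: int = 3) -> Tree:
    """Greedy CART-style tree: choose the (feature, threshold) split with the lowest weighted Gini."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: pure node or depth limit reached
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))


def predict(tree: Tree, row: Row) -> str:
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def micro_batches(stream: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Group a (possibly unbounded) sensor stream into fixed-size micro-batches."""
    batch: list[Row] = []
    for row in stream:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def score_stream(tree: Tree, stream: Iterable[Row], batch_size: int = 4):
    """Score every record; return predictions, elapsed seconds, and throughput (records/s)."""
    predictions: list[str] = []
    start = time.perf_counter()
    for batch in micro_batches(stream, batch_size):
        predictions.extend(predict(tree, r) for r in batch)
    elapsed = time.perf_counter() - start
    throughput = len(predictions) / elapsed if elapsed > 0 else float("inf")
    return predictions, elapsed, throughput


if __name__ == "__main__":
    model = build_tree(TRAIN_X, TRAIN_Y)
    preds, secs, tput = score_stream(model, iter(STREAM), batch_size=2)
    print(preds)
    print(f"{len(preds)} records in {secs:.6f} s, {tput:.0f} records/s")
```

In the architecture the abstract describes, both training and per-batch scoring would be distributed across a Spark cluster rather than run in one Python loop; the micro-batch grouping here only mimics how a streaming engine hands records to a model in small chunks, which is also where execution time and throughput would be measured.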
License
Copyright (c) 2024 Stallion Journal for Multidisciplinary Associated Research Studies
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.