Adaptive ETL Orchestration Using Reinforcement Learning in Multi-Cloud Data Pipelines

Sarvesh Kumar Gupta

doi:10.55544/sjmars.1.4.8

Authors

Sarvesh Kumar Gupta, Consulting Member of Technical Staff, Oracle, Saint Peters, Missouri 63376, USA. ORCID: https://orcid.org/0009-0008-7460-4874

DOI:

https://doi.org/10.55544/sjmars.1.4.8

Keywords:

Reinforcement Learning, ETL Orchestration, Multi-Cloud Computing, Data Pipelines, Workflow Scheduling, Adaptive Resource Allocation, Cloud Data Engineering, Intelligent Automation

Abstract

As more companies adopt big data technologies, they need more advanced ways to support Extract, Transform, and Load (ETL) operations across a wide range of cloud architectures. Traditional ETL orchestration schedules tasks according to static rules and allocates resources according to predefined procedures. This approach often fails to cope with the varying workloads, limited compute availability, changing network performance, and differing billing structures typical of multi-cloud environments. To address these problems, this paper evaluates reinforcement learning for orchestrating ETL data pipelines. The proposed architecture features a learning-based component that continuously adjusts scheduling, compute resources, and data storage according to the current state of the ETL process. Using feedback from execution results, the reinforcement learning approach is intended to determine the best policy with respect to processing latency, throughput, failure rates, and other important factors. The paper analyzes the main components, principles of operation, and potential advantages of adaptive ETL orchestration based on reinforcement learning.
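The feedback loop described in the abstract can be illustrated with a small sketch. The abstract does not name a specific algorithm, so tabular Q-learning is used here purely as a stand-in. Every name in the block (`ACTIONS`, `reward`, `QLearningOrchestrator`, `simulate`, `train`), the reward weights, and the toy two-cloud environment are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical action set: which cloud runs the next ETL task.
ACTIONS = ["cloud_a", "cloud_b"]


def reward(latency_s, throughput_rows_s, failed, cost_usd):
    """Illustrative reward: favor throughput; penalize latency, cost, failures.

    The weights are arbitrary assumptions, not values from the paper.
    """
    penalty = 5.0 if failed else 0.0
    return 0.001 * throughput_rows_s - 0.1 * latency_s - cost_usd - penalty


class QLearningOrchestrator:
    """Tabular Q-learning agent with epsilon-greedy action selection."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, r, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def simulate(action, rng):
    """Toy environment (assumed): cloud_a is fast but pricier, cloud_b is slow and flaky."""
    if action == "cloud_a":
        return dict(latency_s=2.0, throughput_rows_s=5000, failed=False, cost_usd=0.5)
    return dict(latency_s=6.0, throughput_rows_s=2000,
                failed=rng.random() < 0.2, cost_usd=0.2)


def train(episodes=2000, seed=0):
    """Run the agent against the toy environment and return it."""
    agent = QLearningOrchestrator(seed=seed)
    env_rng = random.Random(seed + 1)
    state = "normal_load"  # single illustrative state; a real state would encode load, queue depth, etc.
    for _ in range(episodes):
        action = agent.choose(state)
        outcome = simulate(action, env_rng)
        agent.update(state, action, reward(**outcome), state)
    return agent
```

Calling `train()` returns an agent whose Q-table can be inspected via `agent.q`. In a realistic setting the state would carry pipeline metrics such as queue depth or current network conditions, and the actions would cover scheduling, resource, and storage decisions rather than a single placement choice.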
