Jenkins Pipeline Project for Auto Recycling AWS EMR Spark Cluster
5000 people affected
2-month project
In this project I created a Jenkins Pipeline that automatically recycles our Production AWS EMR Spark cluster once a week.
The project affected all Controlup customers: the Spark cluster is the main component of the Controlup data pipeline, and its data powers our customer-facing Controlup Insights application.
The project involved integrating and interfacing with many different technologies and APIs:
All code managed in, and dynamically pulled from, Git source control on VSTS.
Jenkins jobs and a DSL Pipeline written in Groovy.
Spark Java Big Data application running on Amazon EMR.
Microsoft SQL Server hosted on Amazon RDS.
Pulling binaries and configurations from S3.
Using the AWS CLI and the Python boto3 library.
Bash, PowerShell and Python scripts, and Linux command-line tools.
To automate the process properly, each automation step was wrapped as a separate Jenkins job, and all jobs are orchestrated by a Groovy-based Jenkins Pipeline, which keeps the code readable, extensible and reusable.
The project was designed to run automatically in any environment (Dev, QA or Production) and currently runs successfully in Production.
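To illustrate what a single automation step might look like, here is a minimal Python sketch of the cluster swap using boto3. It is not the production code: the function names, the terminate-then-launch order, and the contents of `launch_config` are assumptions made for illustration. Only the EMR client calls themselves (`list_clusters`, `terminate_job_flows`, `run_job_flow`, `get_waiter`) are real boto3 APIs.

```python
# Hypothetical sketch of one pipeline step: swap an EMR cluster using boto3.
# The helper names and the terminate-then-launch order are assumptions;
# `emr` is expected to be a boto3 EMR client, e.g. boto3.client("emr").

# EMR cluster states that count as "still alive" for this sketch.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_active_cluster_ids(emr, cluster_name):
    """Return the IDs of all active clusters whose name equals cluster_name."""
    ids, marker = [], None
    while True:
        kwargs = {"ClusterStates": ACTIVE_STATES}
        if marker:
            kwargs["Marker"] = marker  # continue from the previous page
        page = emr.list_clusters(**kwargs)
        ids += [c["Id"] for c in page["Clusters"] if c["Name"] == cluster_name]
        marker = page.get("Marker")
        if not marker:
            return ids


def recycle_cluster(emr, cluster_name, launch_config):
    """Terminate the old cluster(s), launch a replacement, and return its ID."""
    old_ids = find_active_cluster_ids(emr, cluster_name)
    if old_ids:
        emr.terminate_job_flows(JobFlowIds=old_ids)
        # Block until each old cluster has fully shut down.
        for cluster_id in old_ids:
            emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)

    # launch_config carries the remaining run_job_flow arguments.
    response = emr.run_job_flow(Name=cluster_name, **launch_config)
    new_id = response["JobFlowId"]
    # Block until the new cluster is up before reporting success.
    emr.get_waiter("cluster_running").wait(ClusterId=new_id)
    return new_id
```

A Jenkins job could call this as `recycle_cluster(boto3.client("emr"), cluster_name, launch_config)`, where `launch_config` holds settings such as `ReleaseLabel` and `Instances`. Passing the client in as a parameter keeps the step easy to exercise against a stub before pointing it at a real environment.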
Set priorities with management to complete the project sooner.