Jenkins Pipeline Project for Auto Recycling AWS EMR Spark Cluster

Yosef Tavin - PeerSpot reviewer
5000 people affected
2 month project

Project Description

In this project I created a Jenkins Pipeline that automatically recycles our production AWS EMR Spark cluster once a week.


The project affected all ControlUp customers: the Spark cluster is the main component of the ControlUp data pipeline, and that data powers ControlUp Insights, our customer-facing application.


The project involved integrating and interfacing with many different technologies and APIs.

Among them: 

  • All code managed in, and dynamically pulled from, Git source control on VSTS.
  • Jenkins jobs and a DSL Pipeline in Groovy.
  • Spark Java Big Data application running on Amazon EMR.
  • Microsoft SQL Server hosted on Amazon RDS.
  • Pulling binaries and configurations from S3.
  • Using the AWS CLI and the Python boto3 library.
  • Bash, PowerShell and Python scripts, and Linux command line tools.
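
To illustrate how the boto3 part of such a recycle could look, here is a minimal sketch. It is not the project's actual code. It assumes the replacement cluster is launched before the old one is terminated, that clusters are matched by exact name, and that `launch_config` (a hypothetical parameter) holds whatever `run_job_flow` arguments the environment needs. The `emr` argument is expected to behave like the client returned by `boto3.client("emr")`.

```python
from typing import Any, Dict, List


def find_active_clusters(emr: Any, name: str) -> List[str]:
    """Return IDs of RUNNING or WAITING clusters whose name matches exactly."""
    ids: List[str] = []
    kwargs: Dict[str, Any] = {"ClusterStates": ["RUNNING", "WAITING"]}
    while True:
        page = emr.list_clusters(**kwargs)
        ids += [c["Id"] for c in page.get("Clusters", []) if c["Name"] == name]
        marker = page.get("Marker")  # present only when more pages remain
        if not marker:
            return ids
        kwargs["Marker"] = marker


def recycle_cluster(emr: Any, name: str, launch_config: Dict[str, Any]) -> str:
    """Launch a fresh cluster, wait for it, then terminate the old ones.

    Assumption: launching first and terminating afterwards is acceptable
    for the workload; the original project may order these differently.
    """
    old_ids = find_active_clusters(emr, name)
    new_id = emr.run_job_flow(Name=name, **launch_config)["JobFlowId"]
    # Block until the new cluster reports that it is up.
    emr.get_waiter("cluster_running").wait(ClusterId=new_id)
    if old_ids:
        emr.terminate_job_flows(JobFlowIds=old_ids)
    return new_id
```

Passing the client in as an argument keeps the functions free of credentials and region settings, so the same code can be pointed at Dev, QA or Production, or exercised against a stub in tests.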


To automate the process properly, each automation step was wrapped as a separate Jenkins job, and all jobs are managed by a Groovy-based Jenkins Pipeline, which keeps the code readable, extensible and reusable.
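
The structure described above, independent steps run in order by one orchestrator, can be modelled in a few lines. The real pipeline is written in Groovy and triggers Jenkins jobs; the sketch below is only a Python analogue of that shape, and the step names in the usage note are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a name plus a zero-argument callable, much like a named job.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the completed steps. On failure, raises a
    RuntimeError naming the failed step and chaining the original error.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"step '{name}' failed after {completed}"
            ) from exc
        completed.append(name)
    return completed
```

A call such as `run_pipeline([("launch", launch), ("verify", verify), ("terminate-old", terminate_old)])` would run the three hypothetical callables in sequence. Because each step is self-contained, a step can be reused in another pipeline or rerun on its own, which is the benefit the one-job-per-step design is aiming for.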

The project was designed to run automatically in any environment (Dev, QA or Production) and currently runs successfully in Production.

Lessons Learned

Set priorities with management to complete this sooner

Highlights

Received recognition / award
Support from colleagues
Well Designed

Difficulties

Management had to be convinced
Steep learning curve
Large no. of people impacted
Integrating many Technologies

Products Used

Technical Skills Used

  • Bash
  • PowerShell
  • Python
  • Groovy
  • Consul
  • Vault
  • Git
  • AWS CLI
  • AWS Boto3 Python library
  • Slack

Technical Certifications

  • AWS Solutions Architect

Rishon LeZiyyon (IL)