Developed a Solution to check 20 lower-environment URLs and send an email Status every 4 hours, unattended and continuously

250 people affected
1 person managed
1 month project

Project Description

Most companies monitor their Production sites for outages. However, many also run several lower environments (Dev, QA, Training, Stage, Cert, UAT, Test, etc.) that are important for ongoing testing but are not monitored. I developed a solution using UFT that ran unattended every 4 hours and checked each of the 20 URLs. If a URL was down, the script emailed the corresponding Point of Contact a "Failed" Status that included the URL and the HTTP Status Code. A Status Code of 200 was considered good; any other Status Code was treated as "Failed", for example "404 - Not Found", "500 - Internal Server Error", "503 - Service Unavailable", or "504 - Gateway Timeout".
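The pass/fail rule described above can be sketched in a few lines. This is an illustration in Python, not the original UFT script; the function names (`fetch_status`, `classify_status`, `check_environments`) and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def classify_status(code):
    """Only a Status Code of 200 counts as good; everything else is Failed."""
    return "Passed" if code == 200 else "Failed"


def check_environments(environments, fetch=fetch_status):
    """Check each environment URL and return one result row per environment.

    environments maps an environment name to its URL (hypothetical shape).
    fetch is injectable so the logic can be exercised without a network.
    """
    results = []
    for name, url in environments.items():
        code = fetch(url)
        results.append(
            {"name": name, "url": url, "code": code, "status": classify_status(code)}
        )
    return results
```

Note that `urllib` follows redirects by default, so a URL that redirects to a healthy page would report 200 in this sketch; whether the original script behaved the same way is not stated.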

Additionally, the header of each email was color-coded: Red if any URL had failed, Green if all applications were OK. This was intentional, so that a recipient opening the email on a hand-held device could tell at a glance whether action was needed. Green meant no action was required; Red meant some action might be needed.
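One way to produce such a color-coded header is an HTML email whose banner color depends on the results. The sketch below uses Python's standard `email` package rather than UFT; the hex colors, the subject line, the result-row shape, and the function names are all assumptions made for illustration.

```python
from email.message import EmailMessage
from html import escape

GREEN = "#2e7d32"  # assumed shade; the source only says "Green"
RED = "#c62828"    # assumed shade; the source only says "Red"


def header_color(results):
    """Red if any URL returned something other than 200, otherwise Green."""
    return RED if any(r["code"] != 200 for r in results) else GREEN


def build_status_email(results, sender, recipient):
    """Build an HTML status email with a color-coded header banner."""
    color = header_color(results)
    overall = "Failed" if color == RED else "Passed"
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(r["name"]), escape(r["url"]), escape(str(r["code"]))
        )
        for r in results
    )
    html = (
        '<h2 style="background:{};color:#ffffff;padding:8px">'
        "URL Status: {}</h2>"
        "<table><tr><th>Environment</th><th>URL</th><th>Status Code</th></tr>"
        "{}</table>"
    ).format(color, overall, rows)

    msg = EmailMessage()
    msg["Subject"] = "URL check: " + overall
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(html, subtype="html")
    return msg
```

Sending the message (for example with `smtplib`) is left out, since the mail server details are not given in the description.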

Furthermore, every time the UFT script ran its scheduled check, it exported the UFT data table, containing all 20 URLs and their Status Codes, to a specified folder as an Excel file. The file name included the Date and Time of the run in the format MM-DD-YYYY HH-MM-SS, for example "UFT_CoolCoder_Application_URL_Check_Email 11-15-2018 8-45-12.xls". From the name alone, anyone could tell that the script ran and sent out an email on November 15, 2018 at 8:45 AM (8:45:12 AM, to be exact). Each Excel file contained the columns "Name_of_Environment", "Application_URL", "STATUS_OF_URL", "EMAIL_ADDRESS", and "Name_Point_of_Contact".
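The naming convention can be reproduced as follows. This Python sketch only builds and parses the file name; it assumes the hour is written without a leading zero, as in the "8-45-12" example, and the helper names are hypothetical.

```python
from datetime import datetime

PREFIX = "UFT_CoolCoder_Application_URL_Check_Email"
COLUMNS = [
    "Name_of_Environment",
    "Application_URL",
    "STATUS_OF_URL",
    "EMAIL_ADDRESS",
    "Name_Point_of_Contact",
]


def run_filename(ts, extension=".xls"):
    """Return the export file name for a run that happened at ts."""
    # Month, day, minute, and second are zero-padded; the hour is not,
    # matching the example file name in the description.
    stamp = "{:%m-%d-%Y} {}-{:%M-%S}".format(ts, ts.hour, ts)
    return "{} {}{}".format(PREFIX, stamp, extension)


def parse_run_time(filename):
    """Recover the run's date and time from an export file name."""
    stem = filename.rsplit(".", 1)[0]
    stamp = stem[len(PREFIX) + 1:]
    date_part, time_part = stamp.split(" ")
    month, day, year = (int(p) for p in date_part.split("-"))
    hour, minute, second = (int(p) for p in time_part.split("-"))
    return datetime(year, month, day, hour, minute, second)
```

Parsing the timestamp back out of the name is what makes the later analysis possible without opening each file to find out when it was written.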

The benefit of this UFT Automated Solution was that when any lower environment had an outage, the Point of Contact (or Contacts) was alerted by email within 4 hours, and an unexpected outage could be resolved with minimal downtime.

Using the data to reduce future outages

Finally, every 3 months I compiled and analyzed the data from the exported Excel files. I presented graphs showing the number of outages per URL (e.g. Dev01, UAT02) for each day of the week, Monday through Sunday. I also developed a graph showing the frequency of each Status Code among outages per day of the week. This helped upper management investigate why certain Status Codes, such as "500 - Internal Server Error", occurred more often on a particular day of the week. Over several weeks of data, a definite trend emerged in which days of the week had the most and the fewest outages.
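The two tallies behind those graphs can be sketched like this. It is a Python illustration, not the tooling actually used for the analysis; it assumes each run has already been reduced to a tuple of run time, environment name, and Status Code, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def count_outages(records):
    """Tally outages per weekday, by environment and by Status Code.

    records is an iterable of (run_time, environment, status_code) tuples.
    Any status code other than 200 counts as an outage.
    """
    by_environment = Counter()
    by_status_code = Counter()
    for run_time, environment, code in records:
        if code == 200:
            continue
        day = WEEKDAYS[run_time.weekday()]  # weekday(): Monday is 0
        by_environment[(day, environment)] += 1
        by_status_code[(day, code)] += 1
    return by_environment, by_status_code
```

Each counter can then be fed to any charting tool to produce one bar per day of the week.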

Lessons Learned

The script worked well and was found to be very useful. If I had to do it again, I would check when Servers were scheduled to be taken down for maintenance, because "503 - Service Unavailable" is expected when a Server is intentionally brought down for maintenance.
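A maintenance check of that kind might look like the sketch below. This was not part of the original solution; it is a Python illustration that assumes maintenance windows are available as start and end times per environment, and the function name and data shape are hypothetical.

```python
from datetime import datetime


def is_expected_outage(environment, code, when, windows):
    """Return True if a 503 falls inside a scheduled maintenance window.

    windows maps an environment name to a list of (start, end) datetimes.
    Only "503 - Service Unavailable" is treated as expected; other codes
    are still reported even during maintenance.
    """
    if code != 503:
        return False
    return any(start <= when < end for start, end in windows.get(environment, []))
```

A run could then mark such results as expected rather than sending a "Failed" alert for them.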

Highlights

received recognition / award
support from colleagues
reduced outages
prevented future outages

Difficulties

taking concept into production