Cloud Site Reliability Engineer
3 weeks ago
The Site Reliability Engineer is a member of the Cloud Team who provides support for software development, operations, and maintenance while working with complex infrastructure to improve performance, visibility, stability, availability, and reliability through automated solutions. This role will provide Tier 3 support, either directly or by engaging other stakeholders, for applications and platforms residing in the Cloud. The ideal candidate has hands-on experience with, and an understanding of, the software development lifecycle from inception to implementation. The successful candidate should have knowledge and understanding of software maintenance and will be responsible for ensuring the reliability and speed of the software.
This position is eligible for the TalentQuest employee referral program. If an employee referred you for this job, please apply using the system-generated link that was sent to you.
Responsibilities
Set up and maintain Azure-native monitoring tools like Azure Monitor, Log Analytics, and Application Insights to oversee system performance, resource health, and workload behavior across AKS environments.
Build tailored dashboards that provide clear visualizations of key metrics and configure proactive alerting mechanisms to detect anomalies early and trigger appropriate responses.
Utilize Azure Sentinel to enhance security incident detection and response for AKS environments, maintaining compliance and minimizing risks.
Implement end-to-end observability practices by combining metrics, logs, and traces for comprehensive insights into containerized applications and their underlying infrastructure.
Design and maintain automation scripts using Python, PowerShell, or Bash to streamline repetitive tasks, such as automated scaling, backup processes, and system health checks.
Develop runbooks and automated workflows that trigger predefined remediation steps for commonly encountered issues, minimizing manual intervention and response time.
Create scripts that enable automatic system adjustments and recovery actions when performance thresholds are crossed or errors are detected.
Utilize tools such as Terraform or ARM templates to automate and manage the provisioning of cloud resources, ensuring consistency and repeatability.
Rapidly Diagnose Issues: Lead the identification and troubleshooting of issues impacting system performance, leveraging data from monitoring tools and logs for swift resolution.
Root Cause Analysis (RCA): Conduct thorough post-incident analyses to document root causes, identify areas for improvement, and implement preventive measures to reduce recurrence.
Runbook Maintenance: Keep incident response runbooks up to date with the latest information and best practices to ensure readiness and consistency during unexpected events.
Analyze Metrics and Performance Data: Continuously monitor key performance indicators (KPIs) across cloud resources and workloads to spot trends, potential bottlenecks, and opportunities for enhancement.
Propose and implement strategies to improve the cost-efficiency and performance of cloud services, such as right-sizing resources or enhancing load-balancing configurations.
Work closely with architecture and development teams to provide input on designing robust, scalable, and resilient cloud solutions.
Implement best practices for optimizing container performance within AKS clusters, ensuring optimal CPU and memory usage without compromising application availability.
Provide feedback and support to development teams to ensure applications are designed with reliability and scalability in mind.
Advocate for and help implement best practices in reliability, incident management, and proactive monitoring across teams.
Collaborate with security teams to identify and mitigate vulnerabilities in cloud infrastructure, integrating security monitoring and automated compliance checks.
Create comprehensive documentation covering monitoring configurations, incident response protocols, and remediation procedures to ensure team alignment and knowledge retention.
Contribute to the creation of internal training resources to help team members familiarize themselves with new tools, techniques, and processes.
Regularly share insights, lessons learned, and new approaches to improve the team's response capabilities and the overall reliability of cloud services.
Regularly analyze usage data and performance metrics to identify opportunities for cost optimization, such as right-sizing virtual machines, optimizing storage solutions, and scheduling non-critical resources to shut down during off-peak hours.
Use Azure Cost Management + Billing to monitor expenses and track actual versus predicted costs.
Work with architecture teams to design solutions that maintain performance while minimizing costs, including the use of reserved instances, spot instances, and optimizing data transfer methods.
Develop automation scripts that dynamically manage resource allocation based on load, reducing unnecessary expenditure.
Proficiency in Service Level Objectives, Service Level Indicators, and error budgeting to balance system reliability with development velocity.
Expertise in chaos engineering practices to test and improve system resiliency under controlled conditions.
Deep knowledge of monitoring and observability tools, such as Prometheus, Grafana, and Azure Monitor.
Strong troubleshooting abilities for distributed systems with proficiency in identifying root causes.
Experience implementing incident management frameworks, ensuring smooth communication, documentation, and follow-up for service interruptions.
Qualifications
Bachelor's Degree in Information Technology or the equivalent combination of training, education, and experience.
Solid hands-on experience in a Site Reliability Engineer, DevOps Engineer, or similar role with a strong focus on Azure cloud services.
Technical Skills
Proficiency in scripting languages such as Python, PowerShell, or Bash.
Extensive experience with Azure monitoring tools like Azure Monitor, Log Analytics, Application Insights, and Azure Sentinel.
Familiarity with AKS and best practices for monitoring containerized applications.
Problem-Solving: Proven track record of effective troubleshooting and resolution of cloud infrastructure issues.
Automation Expertise: Hands-on experience creating automated solutions using IaC tools like Terraform or ARM templates.
Collaboration and Communication: Strong interpersonal skills to work effectively within cross-functional teams.
Desired Qualifications
Certifications: Azure certifications such as Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert.
Advanced Knowledge: Experience with Kusto Query Language (KQL) for in-depth data analysis and complex queries.
Security Acumen: Familiarity with integrating security best practices into monitoring and incident response.
Experience with Dynatrace is a plus.
Knowledge and understanding of, and experience with, DevOps and Agile methodologies.
Experience with Microsoft Azure technologies.
Experience with Tanzu Application/Container Services (TAS/TKS), previously Pivotal Cloud Foundry, or equivalent container-based platforms and products such as OpenShift, Azure Kubernetes Service, or Google Container Services.
Experience using ServiceNow ITOM and ITSM to create catalogs or to automate processes by integrating with other systems.
Knowledge and understanding of how software is built and managed.
Hours: Monday - Friday, 8:00AM - 4:30PM
Location: 820 Follin Lane, Vienna, VA; Heritage Oaks Drive, Pensacola, FL; Security Drive, Winchester, VA; Willow Creek Road, San Diego, CA; Bendix Road, Suite 250, Virginia Beach, VA; Saint Johns Industrial Parkway South, Jacksonville, FL; Airport Freeway, Suite 925, North Richland Hills, TX 76180; 4 Concourse Parkway, Sandy Springs, GA 30328
About Us
Navy Federal provides much more than a job. We provide a meaningful career experience, including a culture that is energized, engaged and committed; and fierce appreciation for our teams, who are rewarded with highly competitive pay and generous benefits and perks.
• Best Companies for Latinos to Work for 2024
• Computerworld Best Places to Work in IT
• Forbes 2024 America's Best Large Employers
• Forbes 2023 The Best Employers for New Grads
• Fortune Best Workplaces for Millennials 2023
• Fortune Best Workplaces for Women 2023
• Fortune 100 Best Companies to Work For 2024
• Military Times 2023 Best for Vets Employers
• Newsweek Most Loved Workplaces
• Ripplematch Campus Forward Award - Excellence in Early Career Hiring
• Yello and WayUp Top 100 Internship Programs
From Fortune. ©2024 Fortune Media IP Limited. All rights reserved. Used under license. Fortune and Fortune Media IP Limited are not affiliated with, and do not endorse products or services of, Navy Federal Credit Union.
Equal Employment Opportunity: Navy Federal values, celebrates, and enacts diversity in the workplace. Navy Federal takes affirmative action to employ and advance in employment qualified individuals with disabilities, disabled veterans, Armed Forces service medal veterans, recently separated veterans, and other protected veterans. EOE/AA/M/F/Veteran/Disability
Hybrid Workplace: Navy Federal Credit Union is a hybrid workplace, and details will be discussed during your interview process.
Disclaimers: Navy Federal reserves the right to fill this role at a higher/lower grade level based on business need.
-
Cloud Solutions Engineer III
3 weeks ago
Pensacola, Florida, United States · Navy Federal Credit Union · Full time
About the Role
At Navy Federal Credit Union, we seek a skilled Cloud Solutions Engineer III to join our team. The ideal candidate will have advanced knowledge of engineering principles, with experience in designing, implementing, and maintaining system and product solutions. The successful candidate will provide technical direction and engineering support for...
-
ETS Engineer III
3 weeks ago
Pensacola, United States · Navy Federal Credit Union · Full time
Overview
To research, evaluate, design, implement, and maintain system and product solutions, applying knowledge of engineering principles. To provide technical direction and engineering support for projects and infrastructure. Develop and maintain expert functional knowledge of evolving IT engineering industry technologies/competition, concepts and trends....
-
ETS Principal Engineer
3 days ago
Pensacola, United States · Navy Federal Credit Union · Full time
Overview
To research, evaluate, design, implement, and maintain systems and product solutions, applying expert knowledge of engineering principles. Assist with coordinating activities between multiple disciplines within IT and vendors on both technical and non-technical issues pertaining to computer system hardware and software, network infrastructure,...
-
Pensacola, Florida, United States · International Staff Consulting · Full time
We are seeking a seasoned reliability engineer to lead our pressure equipment integrity program. As a key member of our maintenance team, you will be responsible for developing and implementing a comprehensive inspection program that ensures compliance with industry standards and regulations.
Key Responsibilities:
• Develop and maintain a plant-wide fixed...
-
Site Superintendent Position
2 days ago
Pensacola, Florida, United States · RQ Construction · Full time
Responsibilities: The successful candidate will be responsible for overseeing the execution of construction projects, ensuring timely completion, and maintaining high-quality standards. Key duties include:
• Supervising daily construction activities, directing employees, and coordinating with subcontractors.
• Ensuring compliance with project requirements, budget,...
-
Electrical Engineer
2 weeks ago
Pensacola, United States · Ascend Performance Materials · Full time
POSITION OVERVIEW
Ascend Performance Materials is the premium provider of high-quality chemicals, fibers and plastics. With world scale integrated manufacturing facilities, we are able to develop new products from our core technologies and provide flexibility to respond to the expanding needs of customers. Ascend has global sales and distribution facilities...
-
Senior Network Systems Engineer
2 weeks ago
Pensacola, Florida, United States · COMTECH TELECOMMUNICATIONS · Full time
Job Title
Senior Network Systems Engineer - $110,000/year
Company Overview
Comtech Telecommunications Corp. is a leading global technology company providing innovative solutions for terrestrial and wireless networks, next-generation emergency services, satellite communications, and cloud capabilities.
Job Description
We're seeking experienced professionals to...
-
ETS Engineer III
3 days ago
Pensacola, United States · Navy Federal Credit Union · Full time
Overview
Navy Federal's Microsoft Infrastructure Engineering Organization fosters a collaborative environment and is building a best-in-class team that manages, enhances, and protects Navy Federal information and its Microsoft Windows server environment. As a Microsoft Infrastructure Engineer, you are expected to be a subject matter expert with hands-on...
-
Cyber Security Architect Lead
3 weeks ago
Pensacola, Florida, United States · Navy Federal Credit Union · Full time
We are seeking a highly skilled Cyber Security Architect Lead to join our team at Navy Federal Credit Union. The ideal candidate will have a strong background in cybersecurity engineering, with a minimum of 10 years of experience in on-premise and cloud architectures, proxy management, cloud governance, and security controls. This role requires expertise in...
-
Cyber Security Data Integration Specialist
3 weeks ago
Pensacola, Florida, United States · Argo Cyber Systems · Full time
Job Overview
Argo Cyber Systems is supporting a U.S. Government customer on a mission critical development and sustainment program to design, build, deliver, and operate a network operations environment. This position requires a thorough understanding of network architecture fundamentals, protocols, routing, firewalls, cloud, and DevOps. We are seeking a...
-
Nylon 6,6 Value Chain Leader
7 days ago
Pensacola, Florida, United States · Ascend Performance Materials · Full time
About the Position
We are seeking a skilled Electrical Engineer to join our team. The successful candidate will have a Bachelor of Science degree in Electrical Engineering and 0-3 years of experience in the chemical, plastics, or refining industry. The Electrical Engineer will be responsible for ensuring the availability and reliability of the assets to meet the...
-
Mechanical Engineer
2 weeks ago
Pensacola, United States · Velocity Restorations LLC · Full time
About Us
Velocity is the country's leading and largest builder of classic American vehicles. We're redefining the essence of classic ownership, vehicles not just restored, but entirely reborn. Our vehicles marry the soul and aesthetic of classic cars with the performance, technology, and reliability of contemporary automobiles. We have...
-
Site Management Professional
3 weeks ago
Pensacola, Florida, United States · RQ Construction · Full time
RQ Construction is a leading player in the Southern California commercial and governmental Design-Build economy, with expertise in fast-track projects for public and private clients. We are seeking an experienced Construction Superintendent to join our Field Operations team in Pensacola, FL. The ideal candidate will have at least five years of experience in a...
-
Electrical Systems Design and Control Engineer
10 hours ago
Pensacola, Florida, United States · Automation Control Service · Full time
Job Description
We are seeking an experienced Electrical Systems Design and Control Engineer to join our team at Automation Control Service. As a key member of our engineering team, you will be responsible for designing, programming, testing, and commissioning electrical systems and industrial control systems. The successful candidate will have a strong...
-
Clinical Engineer
1 month ago
Pensacola, United States · TRIMEDX · Full time
If you are wondering what makes TRIMEDX different, it's that all of our associates share in a common purpose of serving clients, patients, communities, and each other with equal measures of care and performance. Everyone is focused on serving the customer and we do that by collaborating and supporting each other. Associates look forward to coming to work each...
-
Distribution Engineer
3 weeks ago
Pensacola, United States · Enercon · Full time
Distribution Engineer - Power Delivery - Early Career
Locations: US-GA-Kennesaw | US-FL-Tampa | US-FL-Pensacola | US-CA-San Luis Obispo
Job ID: 2024-2796
# of Openings: 5
Discipline: Distribution
Overview
What does our Power Delivery (PD) Group do? ENERCON's Power Delivery (PD) group provides comprehensive engineering and technical services to support power...
-
Engineer
4 weeks ago
Pensacola, United States · Sinclair Broadcast Group · Full time
WEAR-TV 3 has an excellent opportunity for a full-time Broadcast Engineer. This role serves as an essential go-between for the engineering team, leadership, and the location’s personnel. We need a dynamic individual to help grow the impact of our engineering team! This position involves maintaining computers and servers related to...
-
Electrical Power Delivery Engineer
3 weeks ago
Pensacola, Florida, United States · Enercon · Full time
About Enercon
Enercon Services, Inc. is a leading provider of engineering and technical services for the energy sector. We are seeking an experienced Electrical Power Delivery Engineer to join our team in US locations: US-GA-Kennesaw | US-FL-Tampa | US-FL-Pensacola | US-CA-San Luis Obispo. The successful candidate will have a strong background in electrical...
-
Geotechnical Project Engineer Position
3 weeks ago
Pensacola, Florida, United States · NOVA Engineering and Environmental, LLC · Full time
About NOVA Engineering and Environmental, LLC
NOVA Engineering and Environmental, LLC is a leading engineering firm specializing in geotechnical services. Our team of experts provides innovative solutions for complex projects.
Salary Range: We offer a competitive salary range of $90,000 - $120,000 per annum, depending on experience.
Job Description:
The...