Principal Lead SRE

4 weeks ago


Phoenix, United States Insight Global Full time

Role: Principal Lead SRE

Location: Phoenix, AZ (85027)

Hybrid: Onsite, 3 days/week

Contract Duration: 6-month contract-to-hire

Day to Day:

A large retail enterprise in Phoenix, AZ is looking for an SRE to help lead observability initiatives and assist in the development and implementation of build and release pipelines, with accountability for managing deployment schedules, issues, and risks. You will be responsible for providing expertise and designing solutions for observability applications, as well as for system integration with internal systems and external vendors. This can include providing technical leadership in design, development, and testing. Alongside the design work, you will track infrastructure delivery, define the structure of the systems and interfaces, and work closely with other members of the agile development team to ensure commitments are met for each sprint. You will act as a mentor and provide coding and technical direction to less experienced staff or developers.

You will need to have experience in the following categories:

  • Experience with gathering and organizing large volumes of data to use for instrumentation into an Enterprise Observability solution.
  • Experience with recommending baseline monitoring thresholds, performance monitoring KPIs, and SLAs.
  • Experience with installing agents and forwarders, integrating APIs, and setting up performance monitoring alerts, dashboards, and data trend analysis.
  • Good knowledge and understanding of Azure foundation components (e.g., App GW, APIM, Virtual Network, NSG, Load Balancer, Azure VM) is required.

Desired Qualifications:

  • 4-year degree (Computer Science, Information Systems, or related functional field) and/or equivalent combination of education or work experience. **A DEGREE IS ABSOLUTELY MANDATORY**
  • 5+ years Tech lead experience required.
  • 8+ years of hands-on software engineering experience with Java, along with Python, Go, C, or C++. Java is a must.
  • Understanding of public cloud (Azure / GCP is highly preferred).
  • 10+ years of experience in integration engineering related to Observability/Monitoring frameworks and with two or more APM tools (AppDynamics, Datadog, Splunk, Dynatrace, Kibana, Elastic, etc.).
  • Database Experience: Azure SQL, PostgreSQL, MySQL, MongoDB, TSDB, or similar databases.
  • 5+ years of experience as a Site Reliability Engineer is required.

Plusses:

  • Experience working with open-source platforms and OpenTelemetry libraries (e.g., Grafana) is preferred.
  • Heavy GCP experience.

What are the top 3-5 responsibilities expected of this worker?

1. Lead the Observability Ingestion team.

2. Provide technical solutions day to day.

3. Take responsibility for the technical delivery of the team.

4. Resolve any technical blockers.

5. Work with the Architects to provide solution options, run proofs of concept (POCs), and learn any new technologies required for the team.

What are your top 3-5 skills in an ideal candidate?

1. SRE skills

2. Observability development skills

3. Technical lead experience

4. Performance Monitoring

5. Problem Solving

6. Grafana, Prometheus, Cortex, Loki, Tempo, Mimir

7. GCP experience: we need it for this position, as we have focused only on AKS/Azure experience with previous candidates.



  • Phoenix, United States Mastech Digital Full time

    Organizational Structure And Impact: Impact/Function this role has within the bank/LOB: SRC site reliability center supports all infrastructure within the bank, responsible for 24/7 operations of enterprise support, all technology and day to day operations - keeps the bank running, process pillars and observability, using tools and needs product owner to...


  • Phoenix, United States Expert In Recruitment Solutions Full time

    Job Title: Site Reliability Engineer (SRE). Location: This is a hybrid onsite position; worker is required to work onsite 2-3 days per week in Phoenix, AZ. Hybrid Onsite: Worker is required to work onsite 3 days per week in Phoenix, AZ as they will be working cross functionally with 3 different teams. MAIN RESPONSIBILITIES: Experience in leading...


  • Phoenix, United States TEKsystems Full time

    Description: Monitor infrastructure, servers, middleware, databases, and batch jobs. • Aggressively respond to service requests from business partners facing support teams, Operations, Risk/control partners, etc. • Troubleshoot environment, data control and operational issues. • Create and Maintain documentation to ensure knowledge...


  • Phoenix, Arizona, United States TEKsystems Full time

    *Description:* Monitor infrastructure, servers, middleware, databases, and batch jobs. Aggressively respond to service requests from business partners facing support teams, Operations, Risk/control partners, etc. Troubleshoot environment, data control and operational issues. Create and Maintain documentation to ensure knowledge accessibility. Automate...


  • Phoenix, United States Cloud BC Labs Full time

    Job Description. Position: Site Reliability Engineer (SRE). Location: Hybrid, Phoenix, AZ. Duration: 6 Months. Interview Type: Video. Visa Restrictions: None. Required Skills: Experience leading onshore/offshore teams; hands-on building/troubleshooting experience; transitioned from Prometheus to Mimir, Grafana is still a must-have; Site Reliability/Observability dev...


  • Phoenix, United States Motion Recruitment Full time

    Senior Site Reliability Engineer (SRE). Location: Phoenix, AZ, 85050 (Hybrid, 3 days onsite). Term: 6+ Months Contract (with a possible extension). Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to lead our release processes, manage infrastructure incidents, and optimize our CI/CD pipelines. The ideal candidate will have a deep...


  • Phoenix, United States TEKsystems Full time

    *Description:* Monitor infrastructure, servers, middleware, databases, and batch jobs. Aggressively respond to service requests from business partners facing support teams, Operations, Risk/control partners, etc. Troubleshoot environment, data control and operational issues. Create and Maintain documentation to ensure knowledge accessibility. Automate...


  • SRE Lead Engineer

    3 days ago


    Phoenix, United States TEKsystems Full time

    *Description:* Monitor infrastructure, servers, middleware, databases, and batch jobs. Aggressively respond to service requests from business partners facing support teams, Operations, Risk/control partners, etc. Troubleshoot environment, data control and operational issues. Create and Maintain documentation to ensure knowledge accessibility. Automate...


  • Phoenix, United States Netpace Full time

    PLEASE READ CHAT NOTES FOR MAX SUBMISSION RATE - Max submission rate is Job Title: Tech Lead/ Principal Lead Engineer Observability Hybrid Onsite: Worker is required to work onsite 3 days per week in Phoenix, AZ as they will be working cross functionally with 3 different teams. MAIN RESPONSIBILITIES: Experience in leading Observability...