Site Reliability Engineer III

3 weeks ago


Santa Ana, United States Ledgent Technology Full time

We are seeking a Lead Site Reliability Engineer for a project with our client. The client is in the life insurance industry and the position is contract-to-hire. You can work remotely or on a hybrid schedule.

LEAD SRE

As a Lead SRE, you will provide technical leadership, direction and accountability for platform engineering, system design and end-to-end implementation to meet and exceed the product or platform non-functional requirements, including quality, security, reliability, availability and performance. The main responsibilities include, but are not limited to, optimizing design and engineering for new systems and enhancements, including processes and day-to-day activities, to reliably support product rollout and operation in production. The role includes both oversight of production operations for our portfolio of systems and the development and engineering of solutions to optimize system reliability and automation.

How you'll help move us forward:

  • Lead the design, build and implementation of orchestration and tooling solutions to ensure that repetitive administration tasks are performed efficiently and free of defects
  • Establish best practices for structuring, automating, building, deploying and monitoring complex distributed software products and environments.
  • Ensure the reliability and traceability of software releases and deployments of software and infrastructure changes.
  • Create and maintain platform architecture and design specifications to aid development, testing and maintenance of software environments
  • Design and implement monitoring and recovery tools to provide for site high availability (HA) and disaster recovery (DR)
  • Design and develop highly available infrastructure and platform components to meet the needs of our growing and evolving product lines
  • Design and implement security engineering best practices in all our deployed platforms and environments
  • Triage alerts, diagnose and resolve critical issues, and manage the implementation of changes
  • Manage the coordination, documentation, and tracking of critical incidents and corresponding root cause analysis, ensuring rapid and complete issue resolution and closing the loop with customers and other key stakeholders.
  • Collaborate with Delivery Engineers and DevExp Engineers to enhance and implement the continuous integration/continuous deployment orchestration system to reduce friction for software delivery to production
  • Lead, grow and mentor other SRE team members.
  • Evangelize the DevSecOps culture and SRE mindset, and mentor others about reliability and best practices.
  • Identify and work with other engineering disciplines to implement opportunities for:
    • Automation
    • Signal to noise reduction
    • Prevention of recurring issues, and other actions to reduce time to mitigate service-impacting events and increase the productivity of cloud operations and development resources
  • Maintain a strong understanding of IaaS, PaaS, and SaaS offerings as they apply to building and maintaining a state-of-the-art, cloud-based environment for large-scale data processing
  • Design and implement processes, technology and automation for performance testing.
  • Ensure that implementations and solutions are fully documented, and that solutions are deployed with fully operationalized processes to support the solution lifecycle

The experience you bring:

  • 10-15 years of experience in infrastructure, systems engineering and software engineering
  • Advanced knowledge of software engineering in test, and of test automation frameworks and tools for applications and/or anything-as-code (infrastructure, configuration, development tools such as documentation or diagrams as code)
  • Advanced knowledge in at least 3 of the following key areas: Cloud native and IaaS Architecture (performance testing, monitoring, operations), Design (compliance, security), Cloud Engineering (planning, provisioning), Container orchestration solutions.
  • Strong understanding of business technology drivers and their impact on architecture design, performance and monitoring
  • Advanced knowledge of observability engineering, with hands-on experience implementing and integrating at least 2-3 monitoring and observability platforms such as AppDynamics, Dynatrace, Splunk, Grafana Cloud or cloud-based observability services in AWS or Azure
  • A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Hands-on experience in designing, analyzing, scaling, and troubleshooting medium to large scale distributed systems.
  • Practiced in and well-versed with SRE methodologies, and passionate about solving operational problems through automation and software engineering.
  • Ability to communicate effectively vertically and horizontally within the organization about technical strategy in clear, concise, understandable terms appropriate to the audience's technical understanding and expertise
  • Demonstrated ability to conceptualize, launch and deliver multiple engineering projects on time and within budget
  • Demonstrated ability to understand and troubleshoot complex problems under pressure

What makes you stand out:

  • Subject matter expert in designing and supporting one of the 3 major public cloud providers; AWS is a plus, but experience with any other public cloud provider will be considered
  • Demonstrated expertise in microservices lifecycle management (integration, testing, deployment)
  • Strong experience in multiple technologies in the following set of logging and monitoring tools: ELK stack, Prometheus, Stackdriver, New Relic, Datadog, Dynatrace, Splunk, AWS logging and monitoring
  • Expert knowledge of software release tooling (e.g. Jenkins or Jenkins X, Spinnaker, Harness, Azure DevOps or other cloud-specific environments)
  • Expert-level knowledge of containerization technologies, including experience in optimizing Docker images and managing the Docker image lifecycle

TECHNICAL SKILLS

Must Have

  • Advanced experience with algorithms, data structures, complexity analysis and software design
  • Demonstrated expertise in microservices lifecycle management (integration, testing, deployment)
  • Expert knowledge of software release tooling (e.g. Jenkins or Jenkins X, Spinnaker, Harness, Azure DevOps or other cloud-specific environments)
  • Expert-level knowledge of containerization technologies, including experience in optimizing Docker images and managing the Docker image lifecycle
  • Expert-level knowledge of Kubernetes preferred, but experience with other orchestration solutions will be considered
  • Expert-level Linux/Unix/Windows OS experience
  • Strong experience in multiple technologies in the following set of logging and monitoring tools: ELK stack, Prometheus, Stackdriver, New Relic, Datadog, Dynatrace, Splunk, AWS logging and monitoring
  • Subject matter expert in designing and supporting one of the 3 major public cloud providers; AWS is a plus, but experience with any other public cloud provider will be considered


All qualified applicants will receive consideration for employment without regard to race, color, national origin, age, ancestry, religion, sex, sexual orientation, gender identity, gender expression, marital status, disability, medical condition, genetic information, pregnancy, or military or veteran status. We consider all qualified applicants, including those with criminal histories, in a manner consistent with state and local laws, including the California Fair Chance Act, City of Los Angeles' Fair Chance Initiative for Hiring Ordinance, and Los Angeles County Fair Chance Ordinance. For unincorporated Los Angeles county, to the extent our customers require a background check for certain positions, the Company faces a significant risk to its business operations and business reputation unless a review of criminal history is conducted for those specific job positions.




