Sr. Site Reliability Engineer

4 weeks ago


Durham, United States NetApp Full time
Sr. Site Reliability Engineer at NetApp summary: The Sr. Site Reliability Engineer at NetApp operates at the intersection of development and operations, ensuring the reliability and scalability of cloud services through innovative design and automation. Key responsibilities include managing production environments, building systems for infrastructure management, and implementing observability stacks while mentoring junior engineers. Strong expertise in scripting, containerization, and cloud platforms is essential for success in this dynamic role.

About NetApp

NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload, or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the place for you. You can help bring new ideas to life, approaching each challenge with fresh eyes. We embrace diversity and openness because it's in our DNA. Of course, you won't be doing it alone. At NetApp, we're all about asking for help when we need it, collaborating with others, and partnering across the organization - and beyond.

"At NetApp, we fully embrace and advance a diverse, inclusive global workforce with a culture of belonging that leverages the backgrounds and perspectives of all employees, customers, partners, and communities to foster a higher performing organization." - George Kurian, CEO

Job Summary

As a Sr. Site Reliability Engineer, you operate seamlessly between development and operations. You will engage in and improve the lifecycle of cloud services - from design to deployment, operation, and refinement. You will maintain services by measuring and monitoring availability, latency, and overall system health. You will play a key role in scaling systems sustainably through automation and evolving them by pushing for changes to improve reliability and velocity. You will administer cloud-based environments that support our SaaS (Software as a Service) / IaaS (Infrastructure as a Service) offerings implemented on a microservices, container-based architecture (Kubernetes). To be successful in this role, you must be a motivated self-starter and self-learner, possess strong problem-solving skills, and embrace challenges.

Key Responsibilities

  • Manage production environments by monitoring availability and taking a holistic view of platform and product health.
  • Build software and systems to manage platform infrastructure and applications.
  • Identify stability and reliability issues in product code and develop strategies to resolve them.
  • Mentor SRE (Site Reliability Engineering) engineers and coach an automation-first mindset.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Identify and balance infrastructure feature acceleration against a well-deserved pause and fix.
  • Debug and troubleshoot service bottlenecks throughout the whole software stack.
  • Measure and monitor availability, latency, and overall system health.
  • Develop and improve instrumentation for monitoring and logging the health and availability of services.
  • Conduct CI/CD operations to deploy an assortment of software deliverables across a global production environment.
  • Provide architectural guidance to optimize the observability stack across NetApp’s cloud services.
  • Be hands-on in the implementation of our observability stack. You have driven the deployment of these tools at scale and have experience working with a rapidly growing infrastructure.
  • Build dashboards to provide insights and visibility into critical business metrics for a variety of audiences across engineering and SRE teams.

Job Requirements

  • At least 8 years of experience is required.
  • Experience in writing, troubleshooting, and bug fixing product code.
  • Scripting and infrastructure automation using, for example, Ansible, Python, Go, Perl, or Ruby.
  • Deep working knowledge of containers, Kubernetes, and serverless computing implementation.
  • Understanding of the SDLC and DevOps development methodologies.
  • Experience with one of the three hyperscalers (AWS, Azure, GCP).
  • Experience in defining, applying, and managing SLAs, SLOs, and SLIs for the product.
  • Good interpersonal communication and customer service skills are needed to work successfully with stakeholders in high-stress and/or ambiguous situations.
  • This role includes on-call work and occasional travel.

Education

A Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.

Compensation: The target salary range for this position is 152,150 - 216,590 USD. The salary offered will be determined by the candidate's location, qualifications, experience, and education and may be outside of this range. Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package. This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.

Equal Opportunity Employer: NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state, and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, protected veteran status, and any other protected classification.

Did you know... Statistics show women apply to jobs only when they're 100% qualified. But no one is 100% qualified. We encourage you to shift the trend and apply anyway. We look forward to hearing from you.

Why NetApp? We are all about helping customers turn challenges into business opportunity. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better - but also to innovate. We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.

We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time per year to volunteer with their favorite organizations. We provide comprehensive medical, dental, wellness, and vision plans for you and your family. We offer educational assistance, legal services, and access to discounts. Finally, we provide financial savings programs to help you plan for your future.

If you want to help us build knowledge and solve big problems, let's talk.

Keywords: Site Reliability Engineering, Cloud Services, Kubernetes, DevOps, Infrastructure Automation, Monitoring and Observability, Microservices, SaaS, IaaS, Continuous Integration and Deployment



  • Durham, NC, United States NetApp, Inc. Full time

    Job Summary As a Sr. Site Reliability Engineer, you operate seamlessly between development and operations. You will engage in and improve the lifecycle of cloud services - from design to deployment, operation, and refinement. You will maintain services by measuring and monitoring availability, latency, and overall system health. You will play a key role in...

  • Sr. Devops

    4 weeks ago


    Durham, United States CapB InfoteK Full time

    CapB is a global leader in IT Solutions and Managed Services. Our R&D is focused on providing cutting-edge products and solutions across Digital Transformations from Cloud, AI/ML, IOT, Blockchain to MDM/PIM, Supply chain, ERP, CRM, HRMS and Integration solutions. For our growing needs we need consultants who can work with us on a salaried or contract basis. We...


  • Durham, NC, United States Cisco Full time

    **Application window is expected to close on October 30th, 2024** Who We Are Cisco Spaces is an industry leading indoor location as a service solution to gain insights into the behavior of end user devices and network-connected objects in any place with wireless connectivity, allowing customers to make informed business decisions, optimize operations, and...


  • Durham, United States DataVisor Full time

    DataVisor is a next generation security company that utilizes industry leading unsupervised machine learning to detect fraudulent activity for financial transactions, mobile user acquisition, social networks, commerce and money laundering. Our solution is used by some of the largest internet properties in the world, including Pinterest, FedEx, AirAsia,...


  • Durham, United States Zachary Piper Solutions Full time

    Piper Companies is seeking a Sr. Azure Cloud Engineer to join an innovative, comprehensive, and global company located in Durham, NC through a Remote work schedule. The Sr. Azure Cloud Engineer is responsible for designing, implementing, and managing cloud-based solutions on Microsoft Azure. Responsibilities of the Sr. Azure Cloud Engineer...


  • Durham, United States Cisco Full time

    Application Deadline 1/20/25 Who We Are At Cisco, we are a global leader in networking and IT, driving innovation and redefining how people connect, communicate, and collaborate. Our mission is to shape the future of the internet by creating unprecedented value and opportunity for our customers, employees, investors, and ecosystem partners. We are committed...


  • Durham, NC, United States DataVisor Full time

    DataVisor is a next generation security company that utilizes industry leading unsupervised machine learning to detect fraudulent activity for financial transactions, mobile user acquisition, social networks, commerce and money laundering. Our solution is used by some of the largest internet properties in the world, including Pinterest, FedEx, AirAsia,...


  • Durham, United States Red Hat Full time

    Red Hat is seeking a Site Reliability Engineer (SRE) to develop, scale, and operate our OpenShift managed cloud services. OpenShift is Red Hat’s enterprise Kubernetes distribution. As an SRE you will contribute to running OpenShift at scale by enabling customer self-service, making our monitoring system more sustainable, and eliminating work through...


  • Durham, NC, United States Nvidia Full time

    Join our team in Santa Clara, CA, USA as a Senior Site Reliability Engineer. At NVIDIA, you'll be part of the team shaping the future of computing and guaranteeing the smooth operation of our brand-new technologies. Our mission is to leverage AI's power to build outstanding and pioneering solutions that have a significant impact on the world. What you'll be...


  • Durham, United States Dell Full time

    Sr Principal Engineering Technologist at Dell summary: As a Sr Principal Engineering Technologist, you will lead the management and development of innovative Storage System Software Solutions, collaborating with product teams to drive Dell's success. Your role involves prototyping, testing, and integrating advanced technologies while mentoring junior team...


  • Durham, United States Dell Full time

    Sr Principal Engineering Technologist: From applied research to advanced engineering, the Engineering Technologist team has the expertise to shape ground-breaking products, material and processes. Before applying for this role, please read the following information about this opportunity found below. It’s a fascinating field of work. We’re involved in...


  • Durham, United States Dell Full time

    Sr Principal Engineering Technologist: From applied research to advanced engineering, the CTO Storage team has the expertise to shape ground-breaking Storage products, technologies, and innovations. If you are interested in applying for this job, please make sure you meet the following requirements as listed below. It’s a fascinating field of work. We’re...


  • Durham, United States NetApp Full time

    About NetApp: NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the...


  • Durham, United States NetApp Full time

    About NetApp: NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the...


  • Durham, United States Zoetis Full time

    Role Description POSITION SUMMARY The Sr. Principal System Engineer (Sr. Pr. SE) will work as a key member of the research and development team in the design, development, testing and integration of electro-mechanical biodevice products. This position requires a highly creative and seasoned leader with a proven track record in creating new product concepts,...


  • Durham, NC, United States Dell Technologies Full time

    Sr Principal Engineering Technologist: From applied research to advanced engineering, the CTO Storage team has the expertise to shape ground-breaking Storage products, technologies, and innovations. It’s a fascinating field of work. We’re involved in assessing the competition, developing Storage technology and product strategies and generating IP. We lead...


  • Durham, NC, United States NetApp, Inc. Full time

    Job Summary The Platform Reliability Engineering Architect will lead a dynamic team responsible for ensuring our critical systems' reliability, performance, and efficiency. This role involves a strategic blend of engineering and operations and requires a strong background in software development, systems engineering, and leadership. This is a pivotal...


  • Durham, United States NetApp Full time

    Sr. Mgr, Platform Engineering at NetApp summary: As a Senior Manager in Platform Engineering at NetApp, the role focuses on fostering a data-driven culture to enhance the reliability and performance of cloud services. Responsibilities include leading teams in automated DevOps practices, enhancing software deployment processes, and driving continuous...

  • Sr. Product Owner

    4 days ago


    Durham, United States Piper Companies Full time

    Piper Companies is seeking a Sr. Product Owner to support an award winning tech company. This position will be in Durham, NC 3 days a week. The Sr. Product Owner will support an administration with various projects and mentor other members as needed. Responsibilities of the Sr. Product Owner include: Help the organization with policy updates,...

  • Sr. Product Owner

    19 hours ago


    Durham, United States Piper Companies Full time

    Piper Companies is seeking a Sr. Product Owner to support an award winning tech company. This position will be in Durham, NC 3 days a week. The Sr. Product Owner will support an administration with various projects and mentor other members as needed. Responsibilities of the Sr. Product Owner include: Help the organization with policy updates, new products,...