Sr. Manager of SRE Operations

4 weeks ago


San Jose, California, United States ZEDEDA Full time

ZEDEDA is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control and security for the distributed edge, with the freedom to deploy and manage any app on any hardware at scale and to connect to any cloud or on-premises system. With ZEDEDA, customers can seamlessly manage and deploy any compute node to instantly unlock the value of IoT data, make real-time decisions, maximize operational efficiency and drive new business outcomes. We are looking for an experienced Senior Site Reliability Engineer (SRE) who is seeking new challenges and wants to make their mark by contributing to the design and upkeep of an exciting start-up.

Reporting to the VP of Engineering, the Sr. Manager of SRE Operations is responsible for ensuring the availability of our SaaS platform and exceeding the uptime and performance requirements of our Fortune 500 customers. Together with the SRE Operations team, you will implement processes and procedures that ensure quality and predictability in disaster recovery, performance monitoring, alerting and reporting. ZEDEDA is ISO27001 and SOC2 certified, which means that incidents need to be handled according to those standards. As the lead of the team, you will play a key role in ensuring the team performs beyond expectations and will assist in growing the team. On-call responsibility is part of the role, as is implementing a strategy that supports 24 x 7 x 365 availability of the SRE Operations team. Additionally, you will be the initial escalation point for incidents and are responsible for ensuring they get resolved, involving other teams if needed.

You will work with the SRE Technical Lead and team, as well as other groups in engineering, to suggest and implement improvements for operating the platform. Regular reporting on the performance of the platform to upper management is expected. This is a hands-on role and you will perform your duties as part of the SRE Operations Team. You will interface with the Customer Experience Organization and, when required, meet with our customers. You are an energetic self-starter, fully committed to our customers' success by putting yourself in our customers' shoes and constantly striving to make sure they can use our product at all times by:

- Creating ecstatic customers

- Ensuring frictionless deployments

- Escalation management

- On-call duties

- Radiating energy and enthusiasm

- Being a (technical) leader to the team

Qualifications

  • MS Computer Science, Information Technology or similar experience
  • 10+ years of experience in SRE, with 5+ years of experience in an SRE Operating Lead role
  • Leadership qualities and aspirations
  • Project and escalation management skills
  • Proven technical writing skills
  • Excellent verbal and written communication skills (English)

Requirements

  • An infrastructure with global presence in USA, EMEA, China and GovCloud
  • A large, complex, infrastructure with 20+ SaaS instances, 500+ VMs, 100+ databases, 10+ logging services
  • Meeting SLOs and creating robust and insightful metrics for large infrastructures and multiple SaaS instances
  • Capacity planning of a complex solution with 50k+ connected devices
  • Continuously driving cost down to maintain a competitive advantage
  • Managing a successful 24x7x365 on-call team and being the point of escalation
  • Implementing a structured incident management approach, from the start of an incident through resolution to root cause analysis
  • Industry standards compliance, ISO-27001, SOC-2
  • Strong leadership skills with the ability to coach and hire A-players, and to foster a culture of continuous improvement and automation.
  • Putting security at the center of everything you do.
  • Hands-on knowledge of:
    • AWS, Azure or GCP
    • Terraform, Ansible
    • Python, Shell script
    • (Managed) Kubernetes, ArgoCD
    • GitOps, Jenkins, GitHub Actions
    • Datadog, Grafana Stack and OpenTelemetry
    • PostgreSQL, Redis, HashiCorp Vault, InfluxDB and OpenSearch
    • Lacework, Blameless, Vanta

Pay & Benefits

ZEDEDA's main compensation philosophy is to provide you with the opportunity to progress as you grow and develop with the company. The base pay range for this role, dependent on your skills, qualifications, experience and location, is between $175,000 and $200,000. Commission, equity and benefits components round out your total compensation.


