Sr. Manager of SRE Operations

3 weeks ago


San Jose, United States Zededa Full time

ZEDEDA is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control and security for the distributed edge, with the freedom to deploy and manage any app on any hardware at scale and to connect to any cloud or on-premises system. With ZEDEDA, customers can seamlessly manage and deploy any compute node to instantly unlock the value of IoT data, make real-time decisions, maximize operational efficiency and drive new business outcomes. We are looking for an experienced Senior Site Reliability Engineer (SRE) who is seeking new challenges and wants to make their mark by contributing to the design and upkeep of an exciting start-up.

Reporting to the VP of Engineering, the Sr. Manager of SRE Operations is responsible for ensuring the availability of our SaaS platform and exceeding the uptime and performance requirements of our Fortune 500 customers. Together with the SRE Operations team, you will implement processes and procedures that ensure the quality and predictability of disaster recovery, performance monitoring and alerting, as well as reporting. ZEDEDA is ISO27001 and SOC2 certified, which means that incidents need to be handled according to those standards. As the lead of the team, you will play a key role in ensuring the team performs beyond expectations and you will assist in growing the team. On-call responsibility is part of the role, as is implementing a strategy that supports 24 x 7 x 365 availability of the SRE Operations team. Additionally, you will be the initial escalation point for incidents and are responsible for ensuring they get resolved, including other teams if needed.

You will work with the SRE Technical Lead and team, as well as other groups in engineering, to suggest and implement improvements for operating the platform. Regular reporting on the performance of the platform to upper management is expected. This is a hands-on role and you will perform your duties as part of the SRE Operations Team. You will interface with the Customer Experience Organization and, when required, meet with our customers. You are an energetic self-starter, fully committed to our customers' success, who puts yourself in our customers' shoes and constantly strives to make sure they can use our product at all times by:

* Creating ecstatic customers
* Ensuring frictionless deployments
* Managing escalations
* Performing on-call duties
* Radiating energy and enthusiasm
* Being a (technical) leader to the team

Qualifications

* MS in Computer Science, Information Technology, or similar experience
* 10+ years of experience in SRE, with 5+ years of experience in an SRE Operating Lead role
* Leadership qualities and aspirations
* Project and escalation management skills
* Proven technical writing skills
* Excellent verbal and written communication skills (English)

Requirements

* An infrastructure with global presence in USA, EMEA, China and GovCloud
* A large, complex infrastructure with 20+ SaaS instances, 500+ VMs, 100+ databases, 10+ logging services
* Meeting SLOs and creating robust and insightful metrics for large infrastructures and multiple SaaS instances
* Capacity planning of a complex solution with 50k+ connected devices
* Continuously driving cost down to maintain a competitive advantage
* Managing a successful 24x7x365 on-call team and being the point of escalation
* Implementing a structured incident management approach, from the start of an incident through resolution to root cause analysis
* Industry standards compliance, ISO-27001, SOC-2
* Strong leadership skills with the ability to coach and hire A-players, and to foster a culture of continuous improvement and automation.
* Putting security at the center of everything you do.
* Hands-on knowledge of:
  * AWS, Azure or GCP
  * Terraform, Ansible
  * Python, Shell script
  * (managed) Kubernetes, ArgoCD
  * GitOps, Jenkins, GitHub Actions
  * Datadog, Grafana Stack and OpenTelemetry
  * PostgreSQL, Redis, HashiCorp Vault, InfluxDB and OpenSearch
  * Lacework, Blameless, Vanta

Pay & Benefits

ZEDEDA's main compensation philosophy is to provide you with the opportunity to progress as you grow and develop with the company. The base pay range for this role, dependent on your skills, qualifications, experience and location, is between $175,000 and $200,000. Total compensation will also include commission, equity and benefits components.


