Kubernetes Platform Engineer



San Francisco, California, United States | Cloudflare | Full time

Overview

At Cloudflare, we run many essential internal services on Kubernetes, including our control plane, APIs, data analytics, and the internal tools we use to manage our global network. We built our Kubernetes infrastructure from the ground up, and it runs on bare-metal Linux in multiple regions around the world. At our scale, the platform manages tens of thousands of containers and handles terabits per second of network traffic. Our team takes pride in knowing that this platform is integral to the functioning of the global Internet.

Key Responsibilities

  • Enhance automation, configuration management, and tooling for Kubernetes, Ceph, and Prometheus.
  • Architect scalable and resilient systems to accommodate company growth.
  • Optimize resource management, including CPU, bandwidth, and storage.
  • Harden the platform against security vulnerabilities and resource contention.
  • Refine our GitOps practices and systems.
  • Collaborate with application teams to identify challenges and guide them in architecting their systems on Kubernetes.
  • Contribute to the open-source community through projects such as Prometheus, KubeVirt, Contour, Envoy, Consul, cdk8s, Vault, Ceph, Cloudprober, etcd, Calico, and Terraform.
  • Assist in responding to and preventing incidents affecting core platforms.

Qualifications

  • Proven experience managing production Kubernetes or similar orchestration platforms.
  • Recent experience with configuration management tools like SaltStack or Ansible.
  • Understanding of container runtimes within Linux, including isolation, storage, and networking.
  • Proficiency in coding with Bash, TypeScript, and Go.
  • Strong knowledge of IP networking, including routing and iptables.
  • Exceptional debugging skills in a Linux environment.
  • Experience with source control, including branching, merging, and rebasing.
  • Adept at breaking down complex problems into manageable components, discussing options, weighing trade-offs, and driving solutions.

Preferred Qualifications

  • Experience operating Kubernetes on-premises at scale in roles such as SRE, systems design, or architecture.
  • Expertise in providing guidance and building platforms across multiple zones and regions to support distributed, highly available applications.
  • Operational experience with etcd, Prometheus, Ceph, Rook, SaltStack, and Vault, and with Calico or other common CNIs such as Cilium.

Compensation

Compensation may vary based on work location.

  • Estimated annual salary for Colorado-based hires: $137,000 - $167,000.
  • Estimated annual salary for hires based in New York City, Washington, or California (excluding the Bay Area): $154,000 - $188,000.
  • Estimated annual salary for Bay Area-based hires: $162,000 - $198,000.

Equity and Benefits

This position is eligible for participation in Cloudflare's equity plan. Cloudflare provides a comprehensive benefits package designed to support you and your family, including:

  • Health & Welfare Benefits: Medical/Rx Insurance, Dental Insurance, Vision Insurance, Flexible Spending Accounts, Commuter Spending Accounts, Fertility & Family Forming Benefits, Mental Health Support, and Global Travel Medical Insurance.
  • Financial Benefits: Short and Long Term Disability Insurance, Life & Accident Insurance, 401(k) Retirement Savings Plan, and Employee Stock Participation Plan.
  • Time Off: Flexible paid time off encompassing vacation and sick leave, along with various leave programs including parental, medical, and bereavement leave.
