Kafka Site Reliability Engineer DevOps

3 weeks ago


Santa Clara, California, United States Palo Alto Networks Full time

We are revolutionizing the cybersecurity landscape with our cloud-delivered security services, and our cloud infrastructure is rapidly expanding globally.

We're seeking experienced SREs and software engineers interested in production engineering to help us scale the world's largest enterprise security cloud infrastructure.

Palo Alto Networks has transformed the enterprise firewall market, growing from a start-up to a multi-billion-dollar company.

Our Application Framework, a key component of our cloud-delivered security services, processes security events from hundreds of thousands of firewalls worldwide, providing a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation.

Our cloud infrastructure hosts complex distributed systems and virtualization software platforms, enabling big data processing for security services, sandboxing, malware detection, URL categorization, malicious site/domain identification, and security research and response.

RESPONSIBILITIES:


You will be responsible for maintaining and scaling production Kafka clusters with high ingestion rates, Zookeeper clusters, and other big data pipeline systems.

You will improve scalability, service reliability, capacity, and performance.

You will write automation code for managing, monitoring, measuring, expanding, and healing clusters.


You will participate in the occasional on-call rotation supporting the infrastructure.


You will roll up your sleeves to troubleshoot incidents, formulate theories, and test your hypotheses to find the root cause.


QUALIFICATIONS:


Hands-on experience with managing production Kafka clusters.

Strong development/automation skills, with a focus on Python.

In-depth understanding of Kafka cluster management, Zookeeper, partitioning, topic replication, and mirroring.

Excellent grasp of monitoring and metrics collection, performance tuning, and troubleshooting complex distributed systems.

Tools-first mindset, with a focus on building tools for efficiency and ease of use.

Organized and focused on building, improving, resolving, and delivering, with excellent communication skills and a strong team-player mentality.


