Current jobs related to Principal Kafka Site Reliability Engineer DevOps - Santa Clara - Palo Alto Networks


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Principal Kafka Site Reliability Engineer DevOps. We are revolutionizing the cybersecurity landscape with our cloud-delivered security services, and our cloud infrastructure is rapidly expanding with a global presence. We're seeking exceptional SREs and software engineers interested in production engineering to help us scale the largest enterprise...


  • Santa Clara, California, United States Palo Alto Networks Full time

    We are revolutionizing the cybersecurity landscape with our cloud-delivered security services, and our cloud infrastructure is rapidly expanding globally. We're seeking experienced SREs and software engineers interested in production engineering to help us scale the world's largest enterprise security cloud infrastructure. Palo Alto Networks has transformed...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. You will work closely with developers, researchers, data scientists, and security experts to ensure...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Palo Alto Networks. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. About the Role: This is a unique opportunity to work with a...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise in new...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. You will work closely with developers, researchers, data scientists, and security experts to ensure...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our infrastructure platform, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams by developing expertise...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. You will work closely with our development team to ensure that applications are production-ready,...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure to support our mission-critical platforms. Key Responsibilities: Design and implement scalable and...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to join our Global Customer Operations team at Palo Alto Networks. As a key member of our SRE team, you will be responsible for designing, building, maintaining, and scaling production services and server farms within our FedRAMP SASE product...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps,...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. Key Responsibilities: Develop expertise in new technologies and contribute to the success of SRE and...


  • Santa Clara, California, United States Roche Holdings Inc. Full time

    About the Role: Roche is seeking a Principal DevOps Engineer to lead the QCS Algorithms deployments. The ideal candidate will have experience in designing and developing build, release, and deploy toolchains for DevOps, as well as setting up and managing parity across development, staging, and production environments in cloud infrastructure. Key...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise in new...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Palo Alto Networks. As a key member of our Global Customer Operation Team, you will be responsible for designing, building, maintaining, and scaling production services and server farms within our FedRAMP SASE product portfolio. Key Responsibilities: Design and...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Palo Alto Networks. As a key member of our Global Customer Operation Team, you will be responsible for designing, building, maintaining, and scaling production services and server farms within our FedRAMP SASE product portfolio. Key Responsibilities: Design and...


  • Santa Clara, California, United States Centrify Corporation Full time

    Cloud Site Reliability Engineer. At Centrify Corporation, we're seeking a skilled Cloud Site Reliability Engineer to join our Cloud DevOps team. As a key member of our operations team, you'll play a critical role in ensuring the uptime and delivery of our cloud-based services. Key Responsibilities: Manage our cloud application using DevOps and Agile practices to...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Sr Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. As a Sr Staff Site Reliability Engineer, you will contribute to the success of our SRE...


  • Santa Clara, California, United States Syntricate Technologies Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain cloud infrastructure on AWS, including EC2,...


  • Santa Clara, California, United States Syntricate Technologies Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain cloud infrastructure on AWS, including EC2, SSM,...

Principal Kafka Site Reliability Engineer DevOps

2 months ago


Santa Clara, United States Palo Alto Networks Full time

We are reshaping the cybersecurity market through our cloud-delivered security services, and our cloud infrastructure is growing quickly and massively, with a global footprint. We’re looking for great SREs, as well as software engineers interested in production engineering, to help us scale the largest enterprise security cloud infrastructure in the world.

DESCRIPTION:

Palo Alto Networks reinvented the enterprise firewall, growing from a start-up to a multi-billion-dollar company. Our Application Framework, the latest offering in our cloud-delivered security services, ingests security events from hundreds of thousands of firewalls deployed across the globe to provide a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation. Our cloud infrastructure is home to a series of massive and complicated distributed systems and virtualization software platforms that enable big data processing around security services, sandboxing and malware detection, URL categorization and malicious site/domain identification, and security research/response.

RESPONSIBILITIES:

  • You will be responsible for maintaining and scaling production Kafka clusters with very high ingestion rates, Zookeeper clusters, and other big data pipeline systems such as HDFS.
  • You will improve scalability, service reliability, capacity, and performance.
  • You will write automation code for managing, monitoring, measuring, expanding, and healing clusters. You are not an operator; you’re an experienced software engineer focused on operations.
  • You will do Kafka tuning, capacity planning, and deep-dive troubleshooting.
  • You will participate in the occasional on-call rotation supporting the infrastructure.
  • You will roll up your sleeves to troubleshoot incidents, formulate theories and test your hypotheses, and narrow down possibilities to find the root cause.

QUALIFICATIONS:

  • Hands-on experience with managing production Kafka clusters.
  • Strong development/automation skills. Must be very comfortable with reading and writing Python. Commits to Kafka source code would be a big plus.
  • In-depth understanding of the internals of Kafka cluster management, Zookeeper, partitioning, topic replication, and mirroring.
  • Very good grasp of monitoring and metrics collection, performance tuning, and troubleshooting complicated situations with distributed systems.
  • Tools-first mindset. You build tools for yourself and others to increase efficiency and to make hard or repetitive tasks easy and quick.
  • Organized, focused on building, improving, resolving, and delivering.
  • Good communicator within and across teams, great at teamwork, with a strong sense of ownership.