Kafka Platform Site Reliability Engineer

4 weeks ago


Austin, Texas, United States Diverse Lynx Full time
Job Description for Kafka SRE:

As a Site Reliability Engineer for the Kafka Platform, you will carry out SRE duties to ensure the smooth operation of the Kafka Streaming Platform. You will need a thorough understanding of the Kafka architecture, including producers, consumers, topics, and partitions. You will monitor the platform and follow runbooks/SOPs to manage platform and application problems. You will also need to become familiar with cluster maintenance processes and implement changes according to the documented installation and validation plans. Strong troubleshooting and debugging skills will be essential for pinpointing and fixing issues and for advising on how to prevent them in the future. You will conduct thorough root cause analysis of major production incidents, document the findings for future reference, and put proactive measures in place to improve system reliability. Finally, you will automate routine tasks using scripts or automation tools to reduce manual work, lower the risk of human error, and improve system reliability.

Key Skills and Requirements:
At least 2-3 years of experience as a Site Reliability Engineer on a Kafka platform for a junior-level role, and 5+ years for a mid-level or senior role.
Deep knowledge of core Kafka components such as producers, consumers, topics, and partitions.
Experience troubleshooting both Kafka platform services and application problems, and identifying their root causes.
Experience writing Ansible playbooks and automating manual tasks using Ansible, shell scripting, and Python.
Familiarity with Unix/Linux system internals, networking, and distributed systems.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.

  • Austin, Texas, United States Diverse Lynx Full time

    Job Title: Kafka Admin. At Diverse Lynx LLC, we are seeking a highly skilled Kafka Admin to join our team. As a key member of our Site Reliability Engineering (SRE) team, you will be responsible for ensuring the smooth operation of our Kafka Streaming Platform. Key Responsibilities: Carry out SRE duties for the Kafka Streaming Platform, ensuring its reliability...


  • Austin, Texas, United States Cognizant North America Full time

    About the Role: Cognizant's Cloud, Infrastructure, and Security Services Practice (CIS) is focused on driving digital transformation through holistic modernization across layers. We help customers transform infrastructure and workplaces to meet the evolving needs of the digital era. Our approach delivers key results for customers by achieving cloud-driven...


  • Austin, Texas, United States Futran Tech Solutions Pvt. Ltd. Full time

    Job Title: Site Reliability Engineer/Infrastructure Specialist. Location: Remote. Job Type: Full-time. About the Role: We are seeking a highly skilled Site Reliability Engineer/Infrastructure Specialist to join our team at Futran Tech Solutions Pvt. Ltd. The ideal candidate will have experience supporting internet-facing production services and distributed systems,...


  • Austin, Texas, United States Apple Full time

    Job Summary: At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our ad-tech systems. Key Responsibilities: Design and implement infrastructure and application monitoring and observability capabilities to improve...


  • Austin, Texas, United States Apple Full time

    Job Title: Site Reliability Engineer. Job Summary: At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our ad-tech systems. Key Responsibilities: Implement and improve our infrastructure and...


  • Austin, Texas, United States Liquibase Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Liquibase. As a key member of our DevOps team, you will be responsible for designing, implementing, and maintaining highly resilient and secure infrastructure for our SaaS platform using AWS services. Key Responsibilities: Design and implement secure and scalable...


  • Austin, Texas, United States Unreal Gigs Full time

    Job Summary: At Unreal Gigs, we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for building and maintaining highly reliable systems, automating infrastructure...


  • Austin, Texas, United States Tesla Full time

    Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer to join our Energy team at Tesla. As a key member of our team, you will be responsible for designing, building, and operating the infrastructure that powers our Energy IoT applications. Key Responsibilities: Investigate and resolve complex technical issues related to the availability,...


  • Austin, Texas, United States Apple Full time

    Job Title: Site Reliability Engineering Manager. About the Role: Apple is seeking a highly skilled Site Reliability Engineering Manager to lead our cloud services team. As a Site Reliability Engineering Manager, you will be responsible for establishing SRE practices for our private cloud service to accelerate our ability to reliably and consistently deliver...


  • Austin, Texas, United States Apple Full time

    Job Summary: Apple is seeking a Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, performance, and maintenance of high-volume, highly available, mission-critical enterprise platforms and applications related to Apple Manufacturing & Product lifecycle. Key Responsibilities: Develop...


  • Austin, Texas, United States Unreal Gigs Full time

    Job Summary: At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for designing, implementing, and maintaining reliable systems, automating...


  • Austin, Texas, United States Terminal Industries Full time

    About Us: Terminal Industries is a leading provider of software solutions for the logistics industry. Our platform digitizes, indexes, and automates the yard, leveraging best-in-class machine learning to optimize truck, trailer, chassis, container, and personnel usage. Our Platform: Our platform provides warehouse operators with the intelligence needed to...


  • Austin, Texas, United States H-E-B Full time

    Job Title: Staff Site Reliability Engineer. At H-E-B, we're seeking a highly skilled Staff Site Reliability Engineer to join our team. As a key member of our digital infrastructure team, you'll be responsible for designing and implementing fault-tolerant architectures, ensuring the reliability and scalability of our systems. Responsibilities: Design and lead the...


  • Austin, Texas, United States ProCore CPA Full time

    Job Description: We're seeking a highly skilled Staff Site Reliability Engineer to join our Project Execution Group at Procore. As a key member of our team, you'll be responsible for leading, collaborating, and developing solutions to maintain the health of our core platform. The ideal candidate will have a passion for solving complex problems unique to running...


  • Austin, Texas, United States Teacher Retirement System of Texas Full time

    Job Title: Azure Cloud Engineer/Platform Reliability Engineer. About the Role: We are seeking a highly skilled Azure Cloud Engineer/Platform Reliability Engineer to join our team at the Teacher Retirement System of Texas. As a key member of our Core Platforms Department, you will be responsible for ensuring the reliability, scalability, and performance of our...


  • Austin, Texas, United States Teacher Retirement System of Texas Full time

    Job Title: Azure Cloud Engineer/Platform Reliability Engineer. We are seeking a highly skilled Azure Cloud Engineer/Platform Reliability Engineer to join our team at the Teacher Retirement System of Texas. As a key member of our Core Platforms Department, you will be responsible for ensuring the reliability, scalability, and performance of our Information...


  • Austin, Texas, United States Apple Full time

    Software Engineer, Ad Platforms. Austin, Texas, United States. At Apple, we work every day to build products that enrich people's lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today,...


  • Austin, Texas, United States ORACLE AMERICA Full time

    Job Summary: Oracle America is seeking a skilled Site Reliability Developer 3 to join our team in Austin, TX. As a Site Reliability Developer, you will be responsible for solving complex problems related to infrastructure and cloud services, and building automation to prevent problem recurrence. Key Responsibilities: Solve complex problems related to...


  • Austin, Texas, United States Procore Technologies Full time

    Job Description: We're seeking a highly skilled Staff Site Reliability Engineer to join our Project Execution Group at Procore Technologies. In this role, you'll lead and collaborate with a team of reliability engineers to maintain the health of our core platform. The ideal candidate will have expertise in container orchestration (Kubernetes), cloud automation...


  • Austin, Texas, United States Terminal Industries Full time

    About Us: Terminal Industries builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers, and personnel. These are the fundamental operating assets of commerce - and represent...