Kafka Platform Site Reliability Engineer
4 weeks ago
As a Site Reliability Engineer for the Kafka Platform, you will carry out SRE duties to ensure the smooth operation of the Kafka Streaming Platform. The role requires a thorough understanding of the Kafka architecture, including producers, consumers, topics, and partitions. You will monitor the platform and follow runbooks/SOPs to manage platform and application problems. You will also need to familiarize yourself with cluster maintenance processes and implement changes according to the documented installation and validation plans. Your troubleshooting and debugging skills will be essential for pinpointing and rectifying issues and for advising on how to prevent them in the future. You will conduct thorough root cause analysis of major production incidents, document the findings for future reference, and put proactive measures in place to enhance system reliability. Finally, you will automate routine tasks using scripts or automation tools to reduce manual work, lower the chance of human error, and improve system reliability.
Key Skills and Requirements:
At least 2-3 years of experience for a junior-level role, or 5+ years for a mid-level or senior role, working as a Site Reliability Engineer for a Kafka platform.
In-depth knowledge of core Kafka components such as producers, consumers, topics, and partitions.
Troubleshooting Kafka platform service and application problems and identifying their root causes.
Writing Ansible playbooks and automating manual tasks using Ansible, shell scripting, and Python.
Familiarity with Unix/Linux system internals, networking, and distributed systems.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.
-
Kafka Platform Engineer
4 weeks ago
Austin, Texas, United States Diverse Lynx Full time Job Title: Kafka Admin At Diverse Lynx LLC, we are seeking a highly skilled Kafka Admin to join our team. As a key member of our Site Reliability Engineering (SRE) team, you will be responsible for ensuring the smooth operation of our Kafka Streaming Platform. Key Responsibilities: Carry out SRE duties for the Kafka Streaming Platform, ensuring its reliability...
-
Kafka Site Reliability Engineer Onsite
4 weeks ago
Austin, Texas, United States Cognizant North America Full time About the Role: Cognizant's Cloud, Infrastructure, and Security Services Practice (CIS) is focused on driving digital transformation through holistic modernization across layers. We help customers transform infrastructure and workplaces to meet the evolving needs of the digital era. Our approach delivers key results for customers by achieving cloud-driven...
-
Austin, Texas, United States Futran Tech Solutions Pvt. Ltd. Full time Job Title: Site Reliability Engineer/Infrastructure Specialist Location: Remote Job Type: Full-time About the Role: We are seeking a highly skilled Site Reliability Engineer/Infrastructure Specialist to join our team at Futran Tech Solutions Pvt. Ltd. The ideal candidate will have experience supporting internet-facing production services and distributed systems,...
-
Site Reliability Engineer
3 weeks ago
Austin, Texas, United States Apple Full time Job Summary At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our ad-tech systems. Key Responsibilities Design and implement infrastructure and application monitoring and observability capabilities to improve...
-
Site Reliability Engineer
3 weeks ago
Austin, Texas, United States Apple Full time Job Title: Site Reliability Engineer Job Summary: At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our ad-tech systems. Key Responsibilities: Implement and improve our infrastructure and...
-
Site Reliability Engineer
4 weeks ago
Austin, Texas, United States Liquibase Full time Job Description We are seeking a highly skilled Site Reliability Engineer to join our team at Liquibase. As a key member of our DevOps team, you will be responsible for designing, implementing, and maintaining highly resilient and secure infrastructure for our SaaS platform using AWS services. Key Responsibilities: Design and implement secure and scalable...
-
Site Reliability Engineer
3 weeks ago
Austin, Texas, United States Unreal Gigs Full time Job Summary: At Unreal Gigs, we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for building and maintaining highly reliable systems, automating infrastructure...
-
Senior Site Reliability Engineer, Energy Systems
4 weeks ago
Austin, Texas, United States Tesla Full time Job Summary We are seeking a highly skilled Senior Site Reliability Engineer to join our Energy team at Tesla. As a key member of our team, you will be responsible for designing, building, and operating the infrastructure that powers our Energy IoT applications. Key Responsibilities Investigate and resolve complex technical issues related to the availability,...
-
Site Reliability Engineering Manager
3 weeks ago
Austin, Texas, United States Apple Full time Job Title: Site Reliability Engineering Manager About the Role: Apple is seeking a highly skilled Site Reliability Engineering Manager to lead our cloud services team. As a Site Reliability Engineering Manager, you will be responsible for establishing SRE practices for our private cloud service to accelerate our ability to reliably and consistently deliver...
-
Site Reliability Engineer
4 weeks ago
Austin, Texas, United States Apple Full time Job Summary Apple is seeking a Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, performance, and maintenance of high-volume, highly available, mission-critical enterprise platforms and applications related to Apple Manufacturing & Product lifecycle. Key Responsibilities - Develop...
-
Site Reliability Engineer
4 weeks ago
Austin, Texas, United States Unreal Gigs Full time Job Summary: At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for designing, implementing, and maintaining reliable systems, automating...
-
Senior Site Reliability Engineer
3 weeks ago
Austin, Texas, United States Terminal Industries Full time About Us Terminal Industries is a leading provider of software solutions for the logistics industry. Our platform digitizes, indexes, and automates the yard, leveraging best-in-class machine learning to optimize truck, trailer, chassis, container, and personnel usage. Our Platform Our platform provides warehouse operators with the intelligence needed to...
-
Staff Site Reliability Engineer
4 weeks ago
Austin, Texas, United States H-E-B Full time Job Title: Staff Site Reliability Engineer At H-E-B, we're seeking a highly skilled Staff Site Reliability Engineer to join our team. As a key member of our digital infrastructure team, you'll be responsible for designing and implementing fault-tolerant architectures, ensuring the reliability and scalability of our systems. Responsibilities: Design and lead the...
-
Staff Site Reliability Engineer
4 weeks ago
Austin, Texas, United States ProCore CPA Full time Job Description We're seeking a highly skilled Staff Site Reliability Engineer to join our Project Execution Group at Procore. As a key member of our team, you'll be responsible for leading, collaborating, and developing solutions to maintain the health of our core platform. The ideal candidate will have a passion for solving complex problems unique to running...
-
Austin, Texas, United States Teacher Retirement System of Texas Full time Job Title: Azure Cloud Engineer/Platform Reliability Engineer About the Role: We are seeking a highly skilled Azure Cloud Engineer/Platform Reliability Engineer to join our team at the Teacher Retirement System of Texas. As a key member of our Core Platforms Department, you will be responsible for ensuring the reliability, scalability, and performance of our...
-
Austin, Texas, United States Teacher Retirement System of Texas Full time Job Title: Azure Cloud Engineer/Platform Reliability Engineer We are seeking a highly skilled Azure Cloud Engineer/Platform Reliability Engineer to join our team at the Teacher Retirement System of Texas. As a key member of our Core Platforms Department, you will be responsible for ensuring the reliability, scalability, and performance of our Information...
-
Software Engineer, Ad Platforms
4 weeks ago
Austin, Texas, United States Apple Full time Software Engineer, Ad Platforms Austin, Texas, United States At Apple, we work every day to build products that enrich people's lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today,...
-
Site Reliability Engineer
3 weeks ago
Austin, Texas, United States ORACLE AMERICA Full time Job Summary: Oracle America is seeking a skilled Site Reliability Developer 3 to join our team in Austin, TX. As a Site Reliability Developer, you will be responsible for solving complex problems related to infrastructure and cloud services, and building automation to prevent problem recurrence. Key Responsibilities: Solve complex problems related to...
-
Staff Site Reliability Engineer
4 weeks ago
Austin, Texas, United States Procore Technologies Full time Job Description We're seeking a highly skilled Staff Site Reliability Engineer to join our Project Execution Group at Procore Technologies. In this role, you'll lead and collaborate with a team of reliability engineers to maintain the health of our core platform. The ideal candidate will have expertise in container orchestration (Kubernetes), cloud automation...
-
Site Reliability Engineer
3 weeks ago
Austin, Texas, United States Terminal Industries Full time About Us Terminal Industries builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers, and personnel. These are the fundamental operating assets of commerce - and represent...