Principal Database Reliability Engineer
3 weeks ago
Join Udemy. Help define the future of learning.

Udemy is an AI‑powered skills acceleration platform built to help people and teams grow. It's personalized, practical, and focused on real‑world impact. Our mission is simple: to transform lives through learning. Your work helps people around the world build skills they can use, whether they're picking up something new or leveling up to stay ahead. Over 80 million learners and 17,000 businesses already learn with Udemy. If you're excited by change, energized by learning, and ready to have a real impact, you'll feel right at home.

As part of Udemy's Platform team, the Datastore Infrastructure (DSI) team oversees all aspects of Databases (MySQL, Aurora, DynamoDB), Message Queues (RabbitMQ), Streaming (Kafka), and Caching (Redis, Memcache) in our infrastructure. This includes ensuring uptime, security and compliance, observability, performance, improving developers’ productivity, and developing future growth strategies. The team is split between EU and US regions.

You will play a vital role in overseeing day‑to‑day activities and engineering strategies of DSI, ensuring that millions of students worldwide achieve greater learning and career outcomes on Udemy. We value teamwork, a good sense of humor, strong ownership, technological curiosity, and a desire to learn.

To be successful in this role, you will collaborate closely with engineering, product, and a diverse set of stakeholders around the world. You are not just interested in maintaining systems but also writing the software that maintains them. You strongly believe in a no‑blame culture and advocate for humane on‑call practices. You constantly seek opportunities for improvement and thrive in an environment where you can drive positive change.

What you'll be doing
Lead improvement projects for our datastores and platform teams to align with the company's long‑term objectives.
Maintain infrastructure uptime, monitor performance, and ensure infrastructure continues scaling as we grow.
Develop immutable infrastructure patterns and automate infrastructure provisioning via code (Terraform, Python, Ansible, etc.).
Ensure adherence to PCI, ISO 27001, and SOC 2 security requirements, modifying CI/CD processes when necessary, and upholding policies and standards.
Advocate for and implement positive changes in tools and processes through healthy discussions.
Participate in on‑call rotation, demonstrating a systematic approach to incident management.
Participate in day‑to‑day activities, support requests, and project‑related tasks for the team.
Contribute to documentation, maintain ticketing queues, provide project support, troubleshoot, and offer after‑hours assistance as required.
Provide coaching and mentorship to new hires, fostering their technical growth and integration into the team, while maintaining close communication with team members throughout their tenure.

What you'll have
8–10 years of professional experience working in a Cloud Engineering, SRE, or DBRE team with infrastructure responsibilities managing large production workloads.
Proficiency with managing MySQL at scale (horizontal scaling, sharding, InnoDB optimizations, query optimization, HA/DR, monitoring, backup strategy, security, automations).
Strong understanding of running production workloads in Kubernetes.
Proficiency with tools like Terraform, Ansible, Git and how to work with infrastructure as code and automated provisioning.
Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring high availability and fault tolerance. Experience with MSK is also good.
Experience with message queues (MQ/SQS) and caching (Redis, Memcache) or similar products.
Experience in Python.
Knowledge of configuration‑management tools, monitoring systems (Datadog or similar) for database infrastructure, and scaling strategies for handling increased data volumes.
Strong troubleshooting skills to diagnose complex database issues.
Hands‑on experience with AWS cloud infrastructure and a grasp of security best practices.
Adaptability and comfort working in a fast‑paced, hands‑on environment.

Nice to have
Experience with additional programming languages (Golang, Kotlin, Java).
Experience implementing CDC pipelines for reliable data replication and synchronization.
Experience with Vitess Operator running MySQL on Kubernetes.
Experience writing Kubernetes Helm charts.
Experience with tools like ArgoCD/Argo Workflows, or similar alternatives.
Knowledge of security standards, vulnerability patching, TLS/SSL, and related topics.
Any additional experience or familiarity with related technologies would be advantageous.

We understand that not everyone will match each of the above qualifications. However, we also realize that everyone has unique experiences that can add value to our company. Even if you think your background might not perfectly align, we'd love to hear from you.

Posting Date: November 05, 2025
Application window: November 05, 2025 – December 05, 2025

At Udemy, we strive to be transparent around compensation. Actual compensation for this role is based on several factors, including but not limited to job‑related skills, qualifications, experience, and specific work location due to differences in the cost of labor. In addition to a base salary, this role is also eligible for equity.
Hiring Compensation Range: $184,000 – $230,000 USD

Why work here?
You’ll grow here.
Learning is part of the job. You’ll get full access to Udemy courses, a monthly UDay to invest in yourself, and a budget to spend on whatever helps you improve. Many people are diving into AI lately, but what you focus on is up to you.

AI is real here.
We use it in the way we learn and the way we work. You’ll have the space and tools to experiment, apply, and get better at using AI in practical ways.

You’ll own your work.
We trust people to lead, make decisions, and follow through. You don't need to wait for permission or layers of approval to have an impact.

You’ll build with others.
We collaborate openly and shape ideas together. Everyone has a voice, and good thinking is welcomed from any direction.

You’ll see your impact.
What you build helps people grow their skills, change their careers, or find a path forward. You've got the experience, why not use it to help others gain theirs?

Bring your curiosity. We’ll bring the platform and the support. Let’s LEARN together.

Our Benefits Start with U
Our benefits start with you and were built to provide you and your family with the protection and care you need, making it easy to access the right coverage when you need it most. Benefits vary by region; we encourage applicants to review the relevant regional benefit pages to gain an understanding of what we offer. For details on role‑specific benefits, please refer to the information provided during the hiring process.

Benefits outlined are provided as a general overview and may vary depending on the location, role, and employment classification. All benefits are subject to change at the discretion of the organization and in accordance with applicable laws and policies. Information regarding data privacy is available within the Udemy Careers Privacy Notice.

At Udemy, we value diversity and inclusion and consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, veteran status, medical condition, or disability.
-
Principal Engineer, Managed Database Services
3 weeks ago
Austin, United States DigitalOcean Full time
Principal Engineer, Managed Database Services Join to apply for the Principal Engineer, Managed Database Services role at DigitalOcean. Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally...
-
Database Reliability Engineer
3 weeks ago
Austin, United States Cloudflare Full time
Database Reliability Engineer. Available Locations: Austin, TX; Washington, DC. About the Department: The Database Platform Team, a vital part of Cloudflare's Infrastructure Engineering organization, is dedicated to building and operating databases at scale. Our mission is to empower internal engineering teams, enabling them to deliver products quickly and...
-
Database Reliability Engineer
3 weeks ago
Austin, United States Electronic Arts (EA) Full time
Overview: Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen. EA's Production Infrastructure &...
-
Principal Database Reliability Engineer
2 weeks ago
Austin, United States BEDI Partnerships Full time
Join Udemy. Help define the future of learning. Udemy is an AI-powered skills acceleration platform built to help people and teams grow. It’s personalized, practical, and focused on real-world impact. Our mission is simple: to transform lives through learning. Your work helps people around the world build skills they can use, whether they’re picking up...
-
Principal Software Engineer
2 weeks ago
Austin, United States Cloudera Full time
Business Area: Engineering Seniority Level: Director Job Description: At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source...
-
Principal Software Engineer
2 weeks ago
Austin, TX, United States Cloudera Full time
Business Area: Engineering Seniority Level: Director Job Description: At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source...
-
Principal Software Engineer
6 days ago
Austin, TX, United States Cloudera Full time
Business Area: Engineering Seniority Level: Director Job Description: At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source...
-
Principal Mechanical Reliability Engineer
3 weeks ago
Austin, United States Dell GmbH Full time
Principal Mechanical Reliability Engineer. Mechanical Engineering leads and delivers the development of innovative and compliant mechanical design solutions, as well as cross-functional interfaces for desktop, portable and server computer systems and peripherals. Our team conducts the analysis, feasibility studies and testing of mechanical products,...
-
Principal Electrical Reliability Engineer
1 week ago
Austin, TX, United States Dell Technologies Full time
Principal Electrical Reliability Engineer. Our Electrical Engineering team puts the spark into the full hardware development lifecycle, from concept to production. It takes experts in system architecture definition, design, analysis, prototyping, sourcing & the debugging and validation of layouts or routes to deliver state-of-the-art products for a changing...
-
Principal Electrical Reliability Engineer
2 weeks ago
Austin, TX, United States Dell Technologies Full time
Principal Electrical Reliability Engineer. Our Electrical Engineering team puts the spark into the full hardware development lifecycle, from concept to production. It takes experts in system architecture definition, design, analysis, prototyping, sourcing & the debugging and validation of layouts or routes to deliver state-of-the-art products for a changing...