Embedded Site Reliability Engineer

2 weeks ago


Mountain View, California, United States Samsung Electronics America North America Full time
Job Summary

We are seeking a highly skilled Embedded Site Reliability Engineer to lead the technical strategy and vision for our underlying infrastructure, alerting and monitoring, infrastructure provisioning, networking, and development tooling, in collaboration with other engineering teams and leadership.

Key Responsibilities:
  • Design, implement, and manage scalable and resilient infrastructure solutions for our advertising technology platform.
  • Collaborate with development teams to integrate DevOps best practices into the software development lifecycle.
  • Implement and maintain CI/CD pipelines to automate software delivery and deployment processes.
  • Monitor, troubleshoot, and optimize system performance to ensure high availability and reliability.
  • Evaluate and estimate capacity and growth projections for the future.
  • Work closely with security teams to implement and enforce best practices for infrastructure security.
  • Participate in on-call rotations to provide 24/7 support for critical systems.
  • Continuously evaluate and implement new technologies to enhance the efficiency of our infrastructure.
  • Serve as the infrastructure and operations subject matter expert for the development team.
  • Plan for future capacity and growth, including disaster recovery and business continuity planning (BCP).
Qualifications:
  • Typically requires at least 8 years of related experience and a Bachelor's degree; or 6 years and a Master's degree; or 3 years and a PhD.
  • Strong understanding of cloud technologies (e.g., AWS, Azure, GCP) and expertise in managing cloud-native applications on Kubernetes.
  • Proven experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Expertise in automation and scripting (e.g., Terraform, Ansible, Python) and infrastructure as code (IaC) tools.
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Excellent communication and leadership skills.
  • Knowledge of ad tech is preferred.


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Transforming Advertising Technology with Samsung Ads: Samsung Ads is revolutionizing the advertising landscape with cutting-edge technology and innovative services. As a key player in this evolution, we're seeking a talented Embedded Site Reliability Engineer to join our Global Ads Product & Engineering team. Key Responsibilities: Design and implement scalable...


  • Mountain View, California, United States Samsung Electronics Full time

    Position Overview: Samsung has evolved from a hardware leader into a vibrant ecosystem of innovative services across devices. At the forefront of this transformation is Samsung Ads, a flourishing division poised for significant growth. Our Global Ads Product & Engineering team, with a robust presence across multiple countries, is integral to this advancement....


  • Mountain View, California, United States Optomi Full time

    Optomi's Site Reliability Engineer Opportunity: We are seeking a skilled Site Reliability Engineer to join our team at Optomi, in partnership with a large consulting firm. This role requires a versatile and highly motivated individual who can provide frontline technical and operational support to our Site Reliability teams. Key Responsibilities: Collaborate with...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Principal Site Reliability Engineer, you'll play a crucial role in ensuring the...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our AML team, where you will play a critical role in designing, building, and maintaining highly available, scalable, and fault-tolerant systems. Responsibilities: Design and develop large-scale systems that meet the needs of our users. Monitor and analyze system performance,...


  • Mountain View, California, United States Muon Space Full time

    About the Role: Muon Space is seeking a skilled Site Reliability Engineer to join our Platform Software team. Our team provides cloud infrastructure for Muon's Satellite Operations systems and Data Platform, as well as development and test systems for engineers across the company. Key Responsibilities: Develop and maintain infrastructure-as-code components for...


  • Mountain View, California, United States Moveworks Full time

    About Moveworks: Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge. Job Description: We are seeking a highly skilled Staff Site...


  • Mountain View, California, United States Groq Full time

    Reliability Engineer at Groq: We are seeking a highly skilled Reliability Engineer to join our team at Groq. As a Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our tools and services. Key Responsibilities: Design and implement scalable and reliable architectures for our platform...


  • Mountain View, California, United States Bayone Full time

    Job Description: At Bayone, we are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for ensuring the high availability and scalability of our online production environment. Minimum Qualifications: Bachelor's degree in Computer Science or a related technical field, or...


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Job Title: Platform Site Reliability Engineer. Samsung Ads is seeking a highly skilled Platform Site Reliability Engineer to join our Global Ads Product & Engineering team. As a key member of our team, you will play a crucial role in ensuring the reliability, scalability, and performance of our advertising technology platform. Key Responsibilities: Design,...


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Job Title: Platform Site Reliability Engineer. Samsung Ads is a thriving business poised for even greater success, and we're looking for a passionate leader to join our Global Ads Product & Engineering team. About the Role: We're the innovators behind the products, tech, and tools driving ad-based monetization. As a Site Reliability Engineer specializing in...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Trust team at TikTok. As a Site Reliability Engineer, you will be responsible for building and maintaining the systems that protect our users' data and ensure the reliability of our platform. Responsibilities: Manage day-to-day operations of data services, including SLA...


  • Mountain View, California, United States TikTok Full time

    About TikTok U.S. Data Security: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S. that focuses on providing oversight and protection of the TikTok platform and U.S. user data. Responsibilities: Develop and maintain automation...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a key member of our engineering organization, you'll play a critical role in ensuring the reliability and performance of our cloud-based services. Responsibilities: Design and implement scalable solutions to improve service reliability and...


  • Mountain View, California, United States Groq Full time

    Reliability Engineer at Groq: We're looking for a skilled Reliability Engineer to join our team at Groq. As a Reliability Engineer, you will be responsible for ensuring the reliability and performance of our cloud-based infrastructure and applications. Key Responsibilities: Design and implement high-availability systems and infrastructure to minimize downtime...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure. Key Responsibilities: Design, build, and maintain large-scale data systems that support core products and...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud infrastructure. Collaborate with...