Staff Site Reliability Engineer
2 months ago
About the Company:
At Coupang we are building the future of ecommerce. Born out of an obsession to make shopping, eating,
and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from
the ground up. We exist to wow our customers. We know we’re doing the right thing when we hear our
customers say, “How did we ever live without Coupang?” We are one of the fastest-growing e-commerce
companies and have established an unparalleled reputation as a dominant and reliable force in South
Korean commerce.
We are proud to have the best of both worlds — a startup culture with the resources of a large global
public company. This fuels us to continue our growth and launch new services at the speed we have
maintained since our inception. We are all entrepreneurial, surrounded by opportunities to drive new initiatives
and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make
a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company
grow every day.
Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve
problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-
on, high-tech, and hyper-connected world.
About the Role:
Site Reliability Engineer (SRE) at Coupang is a mission-critical role that combines software and
systems engineering to build, run, and scale our complex, large-scale ecommerce systems. As part of the
Site Reliability Engineering team, you will be responsible for ensuring all our customer-facing services are
healthy, monitored, automated, and designed to scale. As an SRE organization, we take pride in treating
operations as an engineering problem, with an automation-first approach. You will use your background to
build best-in-class infrastructure automation for areas such as observability, incident management,
disaster recovery, load testing, capacity engineering, and many more. In this role you will work very
closely with our product development teams, from the early stages of design all the way to helping resolve
production incidents, maintaining the SLI/SLA bar for production services, and influencing those teams with SRE
principles and best practices. If you take pride in complete ownership, have a passion for solving complex
technical challenges in large-scale distributed systems, and have the demeanor to work and communicate
effectively across team boundaries, this is the role for you.
Key Responsibilities:
Serve as a primary point of responsibility for the reliability, health, and performance of all Coupang
customer-facing services.
Gain deep knowledge of Coupang application workflow and dependencies.
Spearhead and conceptualize revolutionary designs in critical service architecture.
Conduct comprehensive architecture reviews and lead re-architecting initiatives to set
industry-leading benchmarks in performance, reliability, and availability.
Lead and drive large scale technical initiatives across multiple engineering teams.
Drive collaboration effectively across organizational boundaries and build strong
stakeholder relationships to achieve broad organizational objectives.
Identify and implement scalable solutions for complex technical problems. Be the change driver.
Be self-motivated and able to navigate the ambiguity of large initiatives and find solutions to
accomplish the goal.
Be the SRE champion and lead, working with the rest of the technical leaders across Coupang to define
and drive the engineering roadmap.
Contribute towards hiring and building a world class team. Mentor and coach junior engineers on
the team.
Communicate effectively with people at all levels of the organization.
Essential Qualifications:
10+ years of industry experience building and operating large scale distributed systems.
Deep UNIX/Linux systems knowledge and administration background.
Strong programming skills in one or more of: Python, Java, Golang, C++.
Strong problem-solving and analytical skills spanning systems, network (TCP/IP) and code, with a
focus on data-driven decision-making.
Proficient with cloud-based infrastructure, including AWS, Azure, or Google Cloud Platform.
Strong understanding of DevOps and SRE practices, including continuous integration, continuous
delivery, and infrastructure as code (IaC).
Proficient with containerization and orchestration technologies, such as Docker and Kubernetes.
Knowledge of the observability ecosystem, including metrics, logging, tracing, and tools such as
Prometheus, Grafana, Elastic Stack, Datadog, or New Relic.
Excellent communication and collaboration skills, with the ability to work with teams across
distinct functions and technical domains.
Preferred Qualifications:
Master’s degree in Computer Science, Engineering, or a related technical field.
Prior experience working with large scale web-based Java architectures and JVM configuration.
Professional certifications in cloud platforms, monitoring tools, or related technologies.
Previous experience working on a large-scale ecommerce platform.