Current jobs related to Senior Site Reliability Engineer - Palo Alto, California - SHEIN Technology LLC

Staff Site Reliability Engineer

5 hours ago

Palo Alto, California, United States General Motors Full time

Job Description: At General Motors, we're revolutionizing the automotive industry with software-defined vehicles. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, scalability, and security of our production systems. Responsibilities: Lead the Site Reliability engineering effort to improve anomaly detection, platform...
Site Reliability Engineer

1 day ago

Palo Alto, California, United States Rubrik Full time

About The Role: Rubrik is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, as well as establishing best practices for internal teams to write performant SQL queries. Key Responsibilities: Ensure high availability and...
Site Reliability Engineer

2 weeks ago

Palo Alto, California, United States General Motors Full time

About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at General Motors. As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and reliability of our software-defined vehicle solutions. Key Responsibilities: Lead the Site Reliability engineering effort to improve anomaly...
Staff Site Reliability Engineer

1 week ago

Palo Alto, California, United States EarnIn Full time

About EarnIn: EarnIn is a pioneering financial technology company that specializes in building products that deliver real-time financial flexibility for individuals with unique financial needs. Our mission is to provide access to earned wages without mandatory fees, interest rates, or credit checks. We have a strong leadership team and world-class funding...
Production Engineer

3 weeks ago

Palo Alto, California, United States Snarkify Full time

About the Role: We are seeking a highly skilled and motivated Production Engineer / Site Reliability Engineer / DevOps Specialist to join our team at Snarkify. As a key member of our infrastructure team, you will play a critical role in ensuring the stability, scalability, and performance of our groundbreaking Zero-Knowledge Proof (ZKP) prover network. Key...
Senior Supplier Reliability Engineer

3 weeks ago

Palo Alto, California, United States Rivian Full time

About Rivian: Rivian is dedicated to preserving the spirit of adventure for generations to come. This commitment extends to the emissions-free Electric Adventure Vehicles we manufacture and the innovative, bold individuals we aim to attract. As a forward-thinking organization, we continuously push the boundaries of what is achievable, never settling for the...
Senior Mechanical Reliability Engineer, Megapack

4 days ago

Palo Alto, California, United States Tesla Full time

About the Role: We are seeking a highly skilled Sr. Mechanical Reliability Engineer to join our team at Tesla, focusing on the Megapack industrial energy storage system. As a key member of our reliability team, you will play a critical role in designing and ensuring the reliability of our products, from concept to field operation. Responsibilities: Facilitate...
Senior Software Engineer

1 week ago

Palo Alto, California, United States Luma AI Full time

About the Role: Luma AI is a leading AI research and development company, and we're seeking a highly skilled Senior Software Engineer - Reliability Expert to join our team. As a key member of our infrastructure team, you will be responsible for defining, measuring, and improving the reliability of our GPU clusters. Key Responsibilities: Collaborate with research...
Power Electronics Reliability Engineer

3 days ago

Palo Alto, California, United States Tesla Full time

Job Title: Power Electronics Reliability Engineer. At Tesla, we're looking for a skilled Power Electronics Reliability Engineer to join our team. As a key member of our Energy team, you'll play a crucial role in designing reliability into our Industrial, Residential, and Charging products. Key Responsibilities: Set and communicate reliability requirements and...
Reliability Engineer

2 weeks ago

Palo Alto, California, United States General Motors Full time

About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at General Motors. As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and reliability of our software-defined vehicle solutions. Key Responsibilities: Lead the Site Reliability engineering effort to improve anomaly...
Lead Reliability Engineer

3 weeks ago

Palo Alto, California, United States Rubrik Full time

Overview: As a Lead Reliability Engineer at Rubrik, you will play a crucial role in ensuring the seamless operation of our infrastructure services while preparing for future scalability. Key Responsibilities: In this position, you will be tasked with: Maintaining optimal availability and durability of our database systems. Establishing and promoting best practices...
Reliability Engineer

3 weeks ago

Palo Alto, California, United States Tesla, Inc. Full time

About the Role: We are seeking a highly skilled Reliability Engineer to join our team at Tesla, Inc. as a Sr. Mechanical Reliability Engineer, Megapack. This is a key role in designing reliability into our industrial energy storage systems, ensuring our products meet the highest standards of reliability. Key Responsibilities: Facilitate Design FMEA sessions to...
Product Reliability Engineer

3 days ago

Palo Alto, California, United States Palantir Technologies Full time

About the Role: We're seeking a skilled Product Reliability Engineer to join our team at Palantir Technologies. As a key member of our engineering team, you'll play a critical role in ensuring the stability and reliability of our products. Key Responsibilities: Develop a deep understanding of Palantir's products and processes. Collaborate with cross-functional...
Reliability Engineer

1 week ago

Palo Alto, California, United States Luma AI Full time

About the Role: Luma AI is a leading AI research and development company, and we're seeking a highly skilled Senior Software Engineer - Reliability Expert to join our team. As a key member of our infrastructure team, you will be responsible for defining, measuring, and improving the reliability of our GPU clusters. Key Responsibilities: Collaborate with research...
Product Reliability Engineer

23 hours ago

Palo Alto, California, United States Palantir Technologies Full time

About the Role: We are seeking a skilled Product Reliability Engineer to join our team at Palantir Technologies. As a Product Reliability Engineer, you will play a critical role in ensuring the stability and reliability of our products. Key Responsibilities: Develop a deep understanding of Palantir's products and processes. Collaborate with customer-facing,...
Reliability Engineer

1 hour ago

Palo Alto, California, United States Tesla Full time

Reliability Engineer - Charging Systems: As a Reliability Engineer at Tesla, you will play a critical role in designing and developing reliable charging systems for our electric vehicles. This position requires a strong understanding of reliability engineering principles, as well as experience with accelerated testing methods and statistical analysis. Key...
Reliability Engineer

5 days ago

Palo Alto, California, United States Tesla Full time

Reliability Engineer - Charging Systems: As a Reliability Engineer at Tesla, you will play a critical role in designing and developing reliable charging systems for our electric vehicles. This position requires a strong understanding of reliability engineering principles, as well as experience with accelerated testing methods and statistical analysis. Key...
Reliability Test Engineer

2 weeks ago

Palo Alto, California, United States Testing Solutions GmbH Full time

Job Summary: We are seeking a skilled Reliability Test Engineer to join our team at Testing Solutions GmbH. As a key member of our Optimus Test Team, you will play a critical role in supporting the component, sub-system, and system level testing of our Optimus Bot. Key Responsibilities: Assemble test fixtures and conduct material, performance, and reliability...
Reliability Engineer for Power Distribution

4 days ago

Palo Alto, California, United States Tesla Full time

Job Summary: We are seeking a highly skilled Reliability Engineer to join our team at Tesla. As a key member of our Power Distribution team, you will play a critical role in designing and implementing reliability solutions for our high voltage systems. Key Responsibilities: Develop and implement accelerated test plans for design validation, burn-in testing, and...

Senior Site Reliability Engineer

4 months ago

Palo Alto, California, United States SHEIN Technology LLC Full time

About the job
Job Title: Senior Site Reliability Engineer I
Reports to: Senior Manager of Site Reliability Engineering
Job Location: Palo Alto, CA, USA
Job Status: Exempt, FT
About SHEIN
SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.
Position Summary
We are looking for a Senior Site Reliability Engineer - Big Data (Official Title: Senior Site Reliability Engineer I) for our Palo Alto, CA-based office hub. Site Reliability Engineers work with the Technical Operations team at SHEIN and are hybrid software/systems engineers, whose overarching goal is to ensure that Production Services are "Always On." They strive to build the most reliable and performant systems on the planet.
SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize and alert on operational data, so we know exactly what happens across the ecosystem and can see problems before they occur and address them as quickly as possible.
They are also responsible for improving Operational Efficiency, Utilization and System Resiliency of the Platform. They own Critical Open-Source Software that our platform relies on and are core participants in every significant engineering effort underway in the platform.
They are also tasked with driving forward the operability of the platform to drive down the number of incidents while reducing MTTR. To accomplish this, the team combines software development, networking and systems engineering expertise, and a strong desire to be challenged by problems of scale and complexity to make our service better for our customers.
Job Responsibilities

Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production system
Supervise capacity & utilization and work closely with cross-functional teams to orchestrate scale-up/down of the services
Own & operate critical open-source services like Elasticsearch, Kafka, RabbitMQ, Redis
Build tools and design processes that help improve observability and system resiliency of the platform
Triage Site Availability Incidents and proactively work towards reducing MTTR for customer impacting incidents
Partner with Service owners to implement Service Level Metrics & Service Level Objectives that act as service level health indicators
Establish design patterns for monitoring, benchmarking and deploying new features for the backend services
Develop and maintain technical documentation, network diagrams, runbooks, and procedures
Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices
Respond to production incidents and use your experience in software development, systems engineering, and networking to proactively prevent recurring issues
Provide relief and sustainable resolution to issues within our infrastructure
Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
Join a culture of intolerance of manual activity, which results in a highly automated environment delivering scalable solutions.
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.

Job Requirements

Bachelor's degree in Computer Science, Information Systems, or equivalent technical discipline is preferred
Experience operating and maintaining Big Data components, including Hadoop, YARN, HBase, Hive, Spark, etc., is highly preferred
Experience with OSS technologies, like Elasticsearch, Kafka, and Redis, is highly preferred
Solid understanding of Linux systems is preferred
Minimum of 3 years of working experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments, is preferred
Systematic problem-solving approach, combined with a sense of ownership and drive
Full-stack debugging and performance optimization ability, including knowledge of Cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, SQL and NoSQL databases
Track record of monitoring and analyzing system performance, isolating issues or bottlenecks that could impact reliability, performance and scalability
Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
Good experience in any of the scripting/programming languages: Python, Go, etc.
Familiar with container technology, such as: Docker, Kubernetes, Mesos, etc.
Understanding of and experience with SRE concepts and practices, including being an advocate for the elimination of toil and driving simple solutions
Good verbal and written communication skills, and the ability to work effectively with geographically remote teams

Pay
$107,600.00 min - $180,200.00 max annually, Bonus & RSU offered.
Benefits and Perks
Healthcare (medical, dental, vision, prescription drugs)
Health Savings Account with Employer Funding
Flexible Spending Accounts (Healthcare and Dependent care)
Company-Paid Basic Life/AD&D insurance
Company-Paid Short-Term and Long-Term Disability
Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
Employee Assistance Program
Business Travel Accident Insurance
401(k) Savings Plan with discretionary company match and access to a financial advisor
Vacation, paid holidays, floating holiday and sick days
Employee discounts
Free weekly catered lunch
Dog-friendly office (available at select locations)
Free gym access (available at select locations)
Free swag giveaways
Annual Holiday Party
Invitations to pop-ups and other company events
Complimentary daily office snacks and beverages
SHEIN Technology LLC is an equal opportunity employer committed to a diverse workplace environment.
