Current jobs related to Site Reliability Engineer - Palo Alto - criteo

Site Reliability Engineering Manager

3 weeks ago

Palo Alto, California, United States Plume Full time

About the Job: The Technical Manager will lead a team of Site Reliability Engineers, providing technical guidance and oversight. Key responsibilities include: Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Attend and conduct customer meetings for project and roadmap specification. Manage growth and performance of...
Site Reliability Engineer

1 week ago

Palo Alto, United States JPMorgan Chase Full time

Description: Duties: Design, build and operate large-scale production systems. Debug complex problems across the whole stack. Develop tools for application engineering teams based on operations requirements for microservices. Improve alerting and monitoring for the existing services. Assist with onboarding and mentoring new engineers. Collaborate with the...
Technical Site Reliability Engineering Leader

4 weeks ago

Palo Alto, California, United States Plume Full time

About the Company: Plume is a leader in the smart home and small business market, delivering services to over 50 million locations globally. Our software-defined network platform allows CSPs to decouple their service offerings from hardware and rapidly curate and deliver new services over a multi-vendor, open-platform architecture. We're looking for a seasoned...
Manager, Site Reliability Engineering

1 month ago

Palo Alto, United States Navan Group Full time

At Navan, “It’s all about the user. All of them.” We’re passionate about providing a seamless one-stop experience for business travelers, no matter how they travel, where they stay, or where they’re going. We are committed to building the most reliable, scalable, and efficient infrastructure to ensure our services are always available when...
Site Reliability Engineer

4 weeks ago

Palo Alto, California, United States Tesla Full time

Role Description: This is a challenging opportunity to work with cutting-edge technology and contribute to the development of automation tools. As a Site Reliability Engineer, you will drive root cause analysis of system failures, manage containerization technology, and maintain site performance using various tools. Expected Compensation: The estimated annual...
Site Reliability Infrastructure Engineer

3 weeks ago

Palo Alto, California, United States Assured Full time

About Assured: At Assured, we modernize insurance by providing software solutions to large insurers. We empower them to win in a technology-driven world with self-service claim filing software and backend fraud detection. Job Overview: We are looking for a Site Reliability Engineer to join our team. The ideal candidate will have experience working in a...
Manager, Site Reliability Engineering

1 week ago

Palo Alto, United States Plume Design, Inc. Full time

We’re looking for a seasoned Technical Manager, experienced with Customer Facing environments, to Captain our Site Reliability Engineering Team. This team is focused on deployments, fixes, and sustainability. The right candidate needs to have strong technical knowledge in key areas while focusing on customer satisfaction. What You’ll Do: Supervise a...
Manager, Site Reliability Engineering

4 days ago

Palo Alto, United States Plume Full time

Job Description: Life at Plume: At Plume, we believe that technology isn't about moving faster, it's about making life's moments better. Which is why we've built the world's first, and only, open and hardware-independent service delivery platform for smart homes, small businesses, enterprises, and beyond. Our SaaS platform uses...
Sr. Site Reliability Engineer, Dojo

6 days ago

Palo Alto, United States Tesla, Inc. Full time

We are seeking an experienced Site Reliability Engineer (SRE) to join our team responsible for ensuring the reliability and performance of our Dojo cluster infrastructure. The successful candidate will be responsible for providing exceptional customer response and support, managing third-party systems, and collaborating with various teams to ensure seamless...
Reliability Engineer for Distributed Systems

3 weeks ago

Palo Alto, California, United States Tesla Full time

Company Overview: Tesla is a leading electric vehicle manufacturer accelerating the world's transition to sustainable energy. Our mission-critical systems enable our engineers to design and develop innovative solutions. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Design Technology Operations team. This position will be...
Principal Site Reliability Engineer with Luma AI

3 days ago

Palo Alto, United States jobs.lever.co - ATS Full time

The SRE role at Luma AI sits with the Infrastructure and Research teams and is responsible for our GPU clusters. Luma runs on thousands of H100 GPUs across multiple providers and clusters for Training, Data Processing and Inference. We need a highly skilled SRE to ensure those clusters are healthy and to build the monitoring and management tools we need to make...
Reliability Engineering Team Lead

6 days ago

Palo Alto, California, United States Navan Group Full time

At Navan, our vision is centered around providing a seamless user experience. We are passionate about delivering a one-stop-shop for business travelers, catering to their diverse needs and preferences. We are committed to building robust, scalable, and efficient infrastructure that ensures our services are always available when needed most. As we continue to...
Reliability Engineering Expert

13 hours ago

Palo Alto, California, United States Wing Inflatables, Inc. Full time

Role Overview: Wing is seeking a highly experienced Design Reliability Engineer to join our Design for Excellence team in Palo Alto, California. As a key contributor to ensuring the reliability and robustness of our hardware designs, you will leverage your deep understanding of testing methodologies and reliability engineering principles to drive significant...
Site Reliability Engineer

4 weeks ago

Palo Alto, United States Ario Full time

Our Mission at Ario: You generate enormous amounts of personal data when you use the internet. This data is extremely powerful and could make your life easier, better, more magical. So why aren't you using it? At Ario, we've developed a product that effortlessly enables you to consolidate your digital world - from your Twitter likes to your Kindle highlights -...
Reliability Engineering Professional

3 weeks ago

Palo Alto, California, United States Tesla Full time

About the Role: Tesla is looking for a highly motivated Reliability Engineering Professional to join our team. As a key member of our engineering group, you will play a crucial role in ensuring the reliability of our innovative products. This position offers an exciting opportunity to contribute to the development of cutting-edge technology and shape the...
Reliability Engineering Expert

2 days ago

Palo Alto, California, United States Testing Solutions GmbH Full time

Unlock the Future of Multimodal AI: Luma AI is revolutionizing the field of artificial intelligence by pushing beyond language models and developing more aware, capable, and useful systems. As a Senior Software Engineer in our Reliability team, you will play a critical role in defining, measuring, and improving the reliability of our GPU clusters. Our SRE team...
Hardware Reliability Engineer

3 days ago

Palo Alto, United States Wing Inflatables, Inc. Full time

About Wing: Wing offers drone delivery as a safe, fast, and sustainable solution for last mile logistics. Consumer appetites for on-demand services are increasing, but current delivery methods are inefficient, costly, and contribute to road accidents and air pollution. Wing’s fleet of highly automated delivery drones can transport small packages directly...
Vehicle Technology Reliability Expert

3 weeks ago

Palo Alto, California, United States Tesla Full time

About the Job: We are looking for an experienced Site Reliability Engineer to join our team. Your responsibilities will include building release processes, managing Kubernetes infrastructure, and maintaining site performance. You will also participate in on-call rotations and facilitate production and security incidents. Required Skills: To succeed in this role,...
Sr. Mechanical Reliability Engineer, Megapack

5 days ago

Palo Alto, United States Tesla Full time

As a Sr. Mechanical Reliability Engineer focusing on Tesla Megapack, you will play a key role in designing reliability into Tesla's industrial energy storage systems ensuring the products meet the highest standards of reliability. This role follows the reliability lifecycle of the product from concept to design, validation testing/analysis, manufacturing,...

Site Reliability Engineer

1 week ago

Palo Alto, United States criteo Full time

At Criteo we face some of the most challenging but interesting problems in the IT industry. We work at a scale of speed, performance and complexity that few others in the industry can compete with. Our data is not big; it’s absolutely HUGE. We have about 40 petabytes in our Hadoop storage (more than 30 TB extra per day), we take less than 10ms to respond to an ad request and we deliver billions of ads per day.

To help us solve these challenges, Criteo is looking for the best of the best in terms of engineering talent within our cool and geeky environment.

SREs develop, maintain and operate software that automates the traditional roles of the system administrator.

Challenges of this role

Build systems that make the best decision in a very short time, half a million times per second. Across three continents and 15 datacenters, 24/7.
Store and process tens of TB of data, in one hour, using over a thousand nodes on our Hadoop cluster. And constantly get better at it while keeping the lights on.
Get stuff done. A problem partially solved today is better than a perfect solution next year. Have an idea during the night? Code it in the morning, push it at noon, test it in the afternoon and deploy it the next morning.
High stakes, high rewards: a 1% increase in performance may yield millions for the company. But if a single bug goes through, the Internet goes down (we’re only half joking: our work reaches 95% of the internet population).
Develop open source projects. Because we are working at the forefront of technology, we are dealing with problems that few have faced. We’re big users of open source, and we’d like to give back to the community.
Work with engineering leadership to develop long-term roadmaps and architectures to scale our infrastructure and improve our SLA.

Strong candidate qualifications

Reliability – You apply rigorous thinking when designing systems or software. You focus on resilience, monitoring and high availability.
Scalability – You like working with problems involving huge numbers of servers with requirements for high throughput and millisecond response times.
Flexibility – It’s not really important what technology you’ve used before (Ruby, Python, Java, Scala…) – we’re looking for people who can adapt very quickly and with an open mind. Our engineers will choose and use the best tool for the job.
Curiosity – You love tinkering with Linux, networks and distributed systems, work on personal projects, and are curious.
Passionate – You are a problem solver, a fixer, and a creative technologist. We believe engineering is a talent and a passion, not just a skill.
Team Oriented – You need to be a great team worker and a great communicator.

Bonus

Experience at Internet scale, using the Hadoop stack, Kafka, Mesos, Cassandra etc.
Experience with tools such as Chef, Puppet or Ansible
Good knowledge of advertising technology

We use: CentOS, Windows, Chef, Graphite, Cassandra, Couchbase, ElasticSearch, Memcached, Hadoop, Spark, Kafka, Jenkins, Consul, Mesos, IIS, HAProxy, Maven, MSBuild, Gerrit, Kerberos, Active Directory, OpenAM, Nagios, Centreon, Riemann, IPMI…

Criteo R&D Culture

Empowerment – We believe in hiring the best engineers in the industry and then letting them get on with what they do best – designing, coding and releasing state of the art software.
Mobility – In our Voyager program our engineers get to pick which team they want to work on for 2-4 weeks, boosting collaboration, networking and maybe even leading to switching teams.
Agility – We work in a fast-paced environment where we build and release frequently to deliver value soon and adapt to changes quickly.
Variety – We have many ways to get your code to production including our Hackathon, 10% projects, Voyager and more.
Multicultural – We have engineers from all over the world for you to interact and exchange ideas with.

Our culture keeps evolving, and you will be expected to contribute actively with new ideas to complement and enhance the existing programs that include frictionless internal mobility, 10% time, mentoring, technical talks, hackathons, conferences, etc.

Are you up to the challenge?

About Criteo [CRTO]

Criteo (CRTO), the leader in commerce marketing, is building the highest performing and open commerce marketing ecosystem to drive profits and sales for retailers and brands. 2,700 Criteo team members partner with 16,000 customers and thousands of publishers across the globe to deliver performance at scale by connecting shoppers to the things they need and love. Designed for commerce, Criteo Commerce Marketing Ecosystem sees over 50 billion in annual commerce sales data. For more information, please visit www.criteo.com.

The 600+ engineers @Criteo are building the next generation digital advertising technologies that allow us to manage billions of ad impressions every day. We're working in a very fast-paced release cycle and are adding new capabilities weekly and even daily.

A few figures:

15 datacenters (9 with computing capacity + 6 dedicated to network connectivity) across US, EU, APAC
More than 24K servers, running a mix of Linux and Windows
One of the largest Hadoop clusters in Europe with close to 108PB of storage and 32,000 cores
150B HTTP requests and close to 4B unique banners displayed per day
Close to 3M HTTP requests per second handled during peak times
130Gbps of bandwidth, half of it through peering exchanges

We recognize that engineering culture is key for building a world-class engineering organization. Our core values are getting stuff done, collaboration and respect, code quality, striving for excellence, and having fun at what we do.

Do you want to know more about life in R&D?

YouTube: R&D Criteo @ US / R&D Criteo @ Global

Our blog: http://www.criteolabs.com

Twitter: @CriteoEng
Also on Glassdoor

Criteo is an Equal Opportunity Employer

To all recruitment agencies: Criteo does not accept agency resumes. Please do not forward resumes to our jobs alias, Criteo employees or any other company location. Criteo is not responsible for any fees related to unsolicited resumes.
