We have other current jobs related to this field that you can find below
-
Senior Site Reliability Engineer
4 weeks ago
Los Angeles, California, United States First Resonance Full time
Job Title: Senior Site Reliability Engineer. First Resonance is a forward-thinking company at the forefront of hardware development for cutting-edge products like electric airplanes, autonomous vehicles, and robotics. As a Senior Site Reliability Engineer at First Resonance, you will be instrumental in enhancing the efficiency, scalability, and reliability of...
-
Senior Site Reliability Engineer
2 weeks ago
Los Angeles, United States Dice Full time
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Motion Recruitment Partners, LLC, is seeking the following. Apply via Dice today! Job Description: A Fortune 500 consulting company is looking for SREs with Subject Matter Expertise with Dynatrace. You'll design, install, and configure Dynatrace onto...
-
Site Reliability Engineer
3 weeks ago
Los Angeles, United States Motion Recruitment Full time
Our Client, A Global Entertainment and Technology Company, is looking for a Site Reliability Engineer to join their team in either San Diego, Los Angeles, or San Francisco! REMOTE POSITION: however, candidates will need to be local to one of the three worksites to go in for occasional meetings and team events. ***This is a 6 month Contract Position With a...
-
Site Reliability Engineer
3 months ago
Los Angeles, United States Adastra replica Full time
Job Description: Our client is looking for an experienced Site Reliability Engineer to design, operate, maintain, and scale mission-critical infrastructure and products. Products include (but are not limited to) automated Hardware-In-The-Loop (HITL) data analysis systems, vehicle configuration sign-off tools, continuous integration systems for...
-
Site Reliability Engineer
3 days ago
Los Angeles, United States TikTok Full time
This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep. The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more. Site Reliability...
-
Site Reliability Engineer
3 months ago
Los Angeles, United States eTek IT Services, Inc. Full time
Job Description. Overview: The Site Reliability Engineer will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure and applications, ultimately contributing to the seamless operations of our systems. This role is vital in maintaining a high level of uptime and system efficiency, enhancing the overall...
-
Lead Site Reliability Engineer
4 days ago
Los Angeles, California, United States Motion Recruitment Full time
Job Overview: A prominent consulting firm is seeking experienced Site Reliability Engineers (SREs) with specialized knowledge in Dynatrace. In this role, you will be responsible for the design, installation, and configuration of Dynatrace on Kubernetes clusters for a variety of enterprise clients. This position is remote, with occasional travel to one of the...
-
Principal Site Reliability Engineer
4 days ago
Los Angeles, California, United States City National Bank Full time
PRINCIPAL SITE RELIABILITY ENGINEER. WHAT IS THE OPPORTUNITY? As a Principal Site Reliability Engineer, you will leverage your expertise in software development, systems engineering, and operational management to design and maintain robust, scalable systems. Your primary focus will be to guarantee the reliability, scalability, and optimal uptime of City...
-
Principal Site Reliability Engineer
4 days ago
Los Angeles, California, United States City National Bank Full time
POSITION: SITE RELIABILITY PRINCIPAL ENGINEER. OVERVIEW: As a Site Reliability Engineer (SRE), you will leverage your expertise in software development, systems engineering, and operational practices to construct and maintain large-scale, resilient systems. Your primary responsibility will be to guarantee the reliability, scalability, and optimal uptime of City...
-
Senior Reliability Engineer
3 days ago
Los Angeles, California, United States Westlake Chemical Corporation Full time
Senior Reliability Engineer - Reliability Lead. This role is responsible for overseeing and coordinating the activities of engineers within the designated area, providing essential guidance and direction. Key Responsibilities: Responsibilities may include, but are not limited to, the following: - Supervise and coordinate the activities of engineers in the assigned...
-
Reliability Engineer
3 months ago
Los Angeles, United States Kindeva Drug Delivery Company Full time
The Reliability Engineer will lead the site's Asset Reliability agenda, effectively promoting analytical problem-solving techniques and structured reliability improvement processes. We have an immediate opening for a Reliability Engineer at Kindeva’s Northridge, CA manufacturing facility. The Reliability Engineer will lead the site's Asset Reliability...
-
Site Reliability Engineer
3 weeks ago
Los Angeles, United States Journal Technologies Full time
Job Description. Salary: $85,000.00 to $105,000.00 USD. Who We Are: At Journal Technologies, we believe our technology can be a force for good in the world, ensuring the proper and efficient functioning of some of the most foundational aspects of society - the courts and justice system. We create and implement enterprise software that supports...
-
Los Angeles, California, United States Riot Games Full time
Software Reliability Engineering at Riot is challenged with diving into our most ambiguous technology spaces between games, central services and infrastructure to solve our reliability and visibility challenges as Riot continues to scale into a multi-game ecosystem. In order to succeed as a Staff Engineer on this team you will need to be able to partner with...
-
GCP Site Reliability Engineer
2 weeks ago
Los Angeles, United States Luytens Technology Solutions Pvt. Ltd. Full time
Job Description. Ex Google Candidate required. Overview: We are seeking a talented GCP Site Reliability Engineer with prior experience at Google to join our team. The role is of great importance as it involves ensuring the reliability, scalability, and performance of our infrastructure on Google Cloud Platform (GCP). The GCP Site Reliability...
-
Site Reliability Engineer with 2K
2 weeks ago
Los Angeles, United States eTek IT Services, Inc. Full time
Job Description. Position: Site Reliability Engineer. Location: Remote. Duration: 1 year. Required Qualification: 6+ years of demonstrated influence across one or more teams for large scale projects that drive impact and improvement across the organization & 6+ years of developing tools for automation of processes or augmenting off the...
-
Infrastructure Reliability Engineer
4 days ago
Los Angeles, California, United States eTek IT Services, Inc. Full time
Site Reliability Engineer. eTek IT Services, Inc. is seeking a skilled Site Reliability Engineer to enhance our operational capabilities. This position plays a crucial role in ensuring the dependability, scalability, and efficiency of our systems and applications, thereby improving overall user satisfaction. Core Responsibilities: Architect and deploy monitoring...
-
Senior Site Reliability Engineer, CORE
5 months ago
Los Gatos, California, United States Netflix Full time
"At Netflix, we strive to bring joy to people across the world through amazing stories. As we grow internationally, we are continually enhancing our cloud-based infrastructure to improve our performance, scalability, and reliability. The SRE team's goal is to ensure customer joy by successfully managing risk and minimizing impact across Netflix. We do this...
-
Reliability Assurance Engineer
4 days ago
Los Angeles, California, United States Kindeva Drug Delivery Company Full time
Position Overview: The Reliability Engineer is responsible for spearheading the Asset Reliability initiatives at our manufacturing facility, utilizing analytical problem-solving methodologies and structured processes for reliability enhancement. Key Responsibilities: Maximize equipment uptime across all essential machinery. Oversee and enhance the Root Cause...
-
Cloud Infrastructure Reliability Engineer
4 days ago
Los Angeles, California, United States Motion Recruitment Full time
Our client, Motion Recruitment, is seeking a Site Reliability Engineer to enhance their team. REMOTE POSITION: Candidates must be local to designated worksites for occasional meetings and team events. ***This is a 6-month Contract Position With Potential for Conversion or Extension*** As a Site Reliability Engineer, you will be part of the CICD and Cloud Site...
Senior Site Reliability Engineer
2 months ago
Join the Sustainable Talent team, supporting NVIDIA as a Senior Site Reliability Engineer in the Infrastructure, Planning, and Process organization.
This is a W-2 full-time contract based in Santa Clara, CA, with Hybrid work options. We offer competitive pay of $75 - $90/hr based on factors like experience, education, location, etc. and provide full benefits, PTO, and an amazing company culture.
As an SRE, you will be troubleshooting and managing our client's on-premise infrastructure to support various software engineering teams company-wide. Keen attention to detail, problem-solving abilities, and a solid knowledge base are essential.
What you’ll be doing:
Working on systems deployed in NVIDIA's internal cloud, making them available and reliable for our end users.
Monitoring system performance and troubleshooting issues related to CPU, memory, disk, and network utilization.
Providing high-quality user support.
Monitoring KPIs and making sure that the team’s SLAs are met.
Managing and maintaining production Kubernetes clusters.
Driving automation of monitoring to gain more insight into application and system health.
Crafting and implementing critical metrics using various analytics methods and dashboards.
Reusing AI techniques to extract useful signals about machines and jobs from the data generated.
What we need to see:
Experience working with on-premise infrastructure.
Experience managing and troubleshooting Linux systems.
Experience managing systems installed in data centers. Proficient with BMC (Redfish), KVM, and IPMI tools.
Background in databases such as SQL (MySQL) and time-series databases such as Prometheus.
Strong knowledge of networking principles and protocols, including TCP/IP, DNS, DHCP, and VLANs.
Experience with data analytics/visualization tools like Kibana, Grafana, Splunk etc.
Strong Ansible or Jenkins skills.
Proficient with Kubernetes, Docker, and virtualization.
Proficient using source code management and binary repository systems like GitLab, GitHub, Artifactory, Perforce, etc.
Advanced knowledge of standard methodologies related to security.
5+ years of proven SRE experience.
Experience with Python or Bash scripting.
Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent experience.
Ways to stand out from the crowd:
Working knowledge of OpenStack.
Previous experience with SRE teams managing on-prem infrastructure.
Experience managing NVIDIA hardware like GPUs and Tegras.
Thrives in a multi-tasking environment with constantly evolving priorities.
Prior experience with a large-scale operations team.
Experience with Windows server infrastructure.
Outstanding interpersonal skills and communication with all levels of management.
Experience with using and improving data centers.
Ability to break down complex problems into simple subproblems and then reuse available solutions to implement most of them.
Ability to design simple systems that can work efficiently without needing much support.
Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.