Site Reliability Engineering Specialist
7 days ago
Cosm is a global technology company that specializes in creating immersive experiences. We help our partners design and build spaces and content that blur the lines between the physical and virtual worlds across various markets. Our team brings together innovation, creativity, and expertise to power the immersive experiences of the future as Cosm.
About the Role
We are seeking a Site Reliability Engineer to play a pivotal role in designing, implementing, automating, and maintaining our organization's technology infrastructure. As an SRE, you will be responsible for designing robust, scalable, and resilient platforms that facilitate real-time monitoring, analysis, and decision-making processes critical to our business operations.
The ideal candidate is a solutions-oriented person who can learn new technologies quickly and become competent with all layers of the development platform. They should be willing to roll up their sleeves and familiarize themselves with various technologies while choosing the best tool for the job.
Responsibilities
- Monitoring and Alerting: Design and automate robust monitoring and alerting mechanisms to ensure the health, performance, and availability of our operations center platform, products, and associated infrastructure components.
- Application Monitoring: Collaborate with software engineering and product teams to understand how to monitor their applications and microservices.
- Infrastructure Deployment: Work with infrastructure teams to deploy and configure necessary hardware and software components to support our operations center platform, including servers, networks, databases, and monitoring tools.
- Documentation and Training: Create comprehensive documentation, diagrams, and guides to facilitate system understanding, troubleshooting, and knowledge transfer. Provide training and support to operations center staff on platform usage and best practices.
- Collaboration and Stakeholder Management: Collaborate closely with cross-functional teams, including product, operations, IT, security, and business units, to understand requirements, gather feedback, and align observability platform architecture with organizational goals and priorities.
- Incident Management: Participate in an on-call rotation to troubleshoot and resolve incidents, working closely with the support team to ensure prompt resolution.
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or relevant work experience.
- 6+ years of proven experience as a platform engineer, site reliability engineer, systems engineer, or a similar role, with a focus on designing, implementing, and monitoring complex systems.
- Expert-level knowledge of the Grafana LGTM stack.
- Familiarity with scripting languages for automation and configuration management, such as PowerShell, Bash, JavaScript, or TypeScript.
- Strong understanding of hybrid computing concepts and hands-on experience with AWS.
- Experience with virtualization/containerization technologies like Hyper-V, VMware, Amazon EC2, Docker, and Kubernetes.
- Experience using Pulumi, Terraform, and other IaC frameworks.
- In-depth knowledge of enterprise Linux and Windows Server operating systems (2016/2019/2022), including installation, configuration, and troubleshooting.
- Familiarity with configuration management frameworks like Ansible or Puppet is a plus.
- Expertise in data retrieval technologies, including constructing efficient TraceQL, PromQL, GraphQL, and LogQL queries.
- Solid understanding of networking principles and protocols.
- Excellent problem-solving and troubleshooting skills, with keen attention to detail.
- Strong communication and interpersonal skills, with the ability to collaborate effectively with clients and team members.
This position offers a competitive annualized base salary range of $105,000 to $140,000, depending on the candidate's geographic region, job-related knowledge, skills, and relevant experience.
Cosm is an equal opportunity employer committed to creating an inclusive environment for all employees. We celebrate diversity and welcome applicants from diverse backgrounds.