Site Reliability Engineer
1 day ago
Site Reliability Engineer (SRE)
Location: San Francisco Bay Area
Role Overview:
We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation and optimizing cloud infrastructure. This role offers the opportunity to work with cutting-edge AI/ML technologies, leveraging them to solve complex challenges in cloud infrastructure management and performance optimization.
Key Responsibilities:
- System Reliability & Performance: Design, implement, and maintain scalable systems, ensuring high availability, performance, and disaster recovery across production environments.
- Automation & Tool Development: Develop automation tools to streamline operations, improve system reliability, and reduce manual interventions.
- Cloud Infrastructure Management: Create and manage cloud instances (e.g., dev, staging, production) using AWS, GCP, or Azure, optimizing infrastructure performance and cost.
- Integration of AI/ML Models: Collaborate with engineering teams to integrate machine learning models into production environments, ensuring that these models scale efficiently and perform optimally.
- Incident Management: Respond to and resolve incidents, minimizing downtime and ensuring quick recovery. Lead post-incident reviews and implement preventive measures.
- Continuous Improvement: Identify areas of improvement and drive initiatives to enhance system reliability, performance, and security.
- Security & Compliance: Ensure that infrastructure and applications adhere to security best practices and compliance standards.
Qualifications:
- Educational Background: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Experience: Proven experience as a Site Reliability Engineer or in a similar role within a SaaS environment, managing and optimizing cloud infrastructure (preferably AWS, GCP, or Azure), and familiarity with integrating AI and machine learning technologies.
- Technical Skills:
  - Proficiency in programming and scripting languages such as Python, Go, or Bash.
  - Experience with containerization and orchestration tools like Docker and Kubernetes.
  - Solid understanding of networking, security, and performance optimization practices.
  - Knowledge of CI/CD pipelines and DevOps practices to ensure smooth development and deployment cycles.
- Problem-Solving: Strong analytical and problem-solving skills with attention to detail.
- Collaboration & Communication: Excellent interpersonal skills, with the ability to work collaboratively in cross-functional teams and communicate technical concepts clearly.
Benefits:
- Competitive Salary: Attractive compensation package, including equity options.
- Health & Wellness: Comprehensive health, dental, and vision insurance, along with other benefits.
- Work Environment: A collaborative and innovative work environment within a growing company.
- Growth Opportunities: Opportunities for career growth, professional development, and a chance to shape the future of the company’s technology and infrastructure.
-
Senior Site Reliability Engineer
1 week ago
San Jose, United States NInfo Systems, Inc. Full time
Company Description: NInfo Systems Inc. is a certified minority-owned national IT recruiting and solutions provider with two decades of experience. It works with Fortune 500 corporations, mid-sized companies, boutique consulting companies, startups, SME-level organizations, federal/state agencies, and tier-one vendors. Role: Senior Reliability Engineer, Hybrid...
-
Site Reliability Engineer
2 weeks ago
San Jose, United States Altimetrik Full time
We are looking to hire a Site Reliability Engineer. Educational Background: Holds a bachelor’s or master’s degree in computer science, information technology, or a related technical field. Alternatively, significant work experience in DevOps or cloud infrastructure management could offset a formal degree requirement. Cloud Infrastructure Expertise: Has at...
-
Site Reliability Engineer/
1 week ago
San Jose, United States PDSSOFT INC. Full time
Site Reliability Engineer (SRE) / AWS DevOps Engineer
Location: San Jose, CA
Duration: Long Term
Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in GitHub Actions, AWS DevOps, Helm Charts, and YAML configuration. The ideal candidate will be responsible for ensuring the reliability, scalability, and efficiency of our...
-
Site Reliability Engineer
2 weeks ago
San Francisco, United States WEX Full time
The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...
-
Site Reliability Engineer
1 month ago
San Francisco, California, United States Outdefine Full time
About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...
-
Site Reliability Engineer
1 month ago
San Francisco, California, United States Roman Health Pharmacy LLC Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities. Conduct in-depth...
-
Site Reliability Engineer
1 month ago
San Francisco, California, United States Swish Analytics Full time
Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...
-
Site Reliability Engineer
2 weeks ago
San Francisco, United States Ellation, Inc. Full time
Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...
-
Site Reliability Engineer
1 week ago
San Francisco, United States Ellation, Inc. Full time
Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...
-
Senior Site Reliability Manager
4 weeks ago
San Jose, United States Triune Infomatics Inc Full time
Role: Senior Site Reliability Manager
Full-Time - Hybrid
Local to San Jose, CA
The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...
-
Senior Site Reliability Manager
2 weeks ago
San Jose, United States Triune Infomatics Inc Full time
Role: Senior Site Reliability Manager
Full-Time - Hybrid
Local to San Jose, CA
The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...
-
San Jose, United States EVONA Full time
Site Reliability Engineer (SRE)
Location: San Francisco Bay Area
Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...
-
Site Reliability Engineer
1 month ago
San Francisco, California, United States WEX Full time
Job Summary: The WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Actively participate in training and...
-
Senior Site Reliability Manager
2 months ago
San Jose, United States Triune Infomatics Inc Full time
Role: Senior Site Reliability Manager
Full-Time - Hybrid
Local to San Jose, CA
The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...
-
Site Reliability Engineer
4 weeks ago
San Francisco, United States New York Technology Partners Full time
Must-haves, in order of preference:
- Java/J2EE experience: 6 to 10 years
- Application Production Support (SRE - Site Reliability Engineering): 3+ years, preferably in the e-commerce domain
- Hands-on experience in any of the UI frameworks (AngularJS, VueJS, etc.): 1+ years
-
Site Reliability Engineer
3 weeks ago
San Francisco, United States Focal Systems Full time
Location: San Francisco - hybrid (1-2 days per week)
Salary: $165-175k + stock
Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar...
-
Site Reliability Engineer
1 month ago
San Francisco, California, United States Arbitrum Inc Full time
Reliability Engineer
At Arbitrum Inc, we're on a mission to bring blockchain to a billion people. Our developer platform is designed to make building on the blockchain easy, and we're looking for a skilled Reliability Engineer to join our Infrastructure team. As a Reliability Engineer, you'll collaborate with our engineering team to design, deploy, and...