Site Reliability Engineer
3 weeks ago
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop, and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

### The Position

The role requires the candidate to be available for on-call duty, responding promptly to urgent issues and emergencies outside regular working hours so that critical situations are addressed in a timely and effective manner.

**Who We Are**

At Roche, we are passionate about transforming patients’ lives, and we are bold in both decision and action. We believe that good business means a better world. That is why we come to work every single day. We commit ourselves to scientific rigor, unassailable ethics, and access to medical innovations for all. We do this today to build a better tomorrow. Roche is strongly committed to a diverse and inclusive workplace. We strive to build teams that represent a range of backgrounds, perspectives, and skills. Embracing diversity enables us to create a great place to work and to innovate for patients.

Roche is building a global site reliability engineering (SRE) team that will support commercial and internal solutions. This team will have the mindset of building and creating engineering solutions to solve a broad spectrum of problems.

**Step into the Future of IT Infrastructure with Roche**

As a seasoned Site Reliability Engineer (SRE) at Roche, you'll leverage your deep software engineering expertise to propel our IT infrastructure to new heights of robustness, scalability, and reliability.
This isn't just a role; it's an invitation to shape the backbone of critical infrastructures and drive our technological innovations forward.

**Your Mission**

Design and maintain cutting-edge tools, scripts, and frameworks that automate repetitive tasks, streamline software deployment, and manage expansive systems efficiently.

Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization, and enhance deployment processes for superior uptime and user satisfaction.

**Your Impact**

Lead the charge in incident management and response. Detect system anomalies, troubleshoot swiftly, and conduct thorough root cause analyses to prevent recurring issues.

Champion continuous improvement by refining monitoring and alerting mechanisms, conducting insightful post-incident reviews, and embedding best practices in software lifecycle management. Your strategic foresight and meticulous planning will ensure our systems are not only reliable but also highly performant.

By joining our elite team, you will play a pivotal role in delivering seamless experiences to our end-users, exceeding business and customer demands, and solidifying Roche's reputation as a leader in IT innovation.

**Your Core Responsibilities**

* **Reliability Mastery:** Proactively monitor and maintain system reliability using advanced tools like DataDog, VictorOps, ELK, Grafana, and Prometheus. Become a key player in ensuring system stability and performance.
* **Uptime Guardian:** Ensure optimal uptime and performance by swiftly identifying issues and responding to alerts with precision.
* **Technical Troubleshooter:** Apply a working understanding of architecture and design to dive deep into complex technical issues, then troubleshoot, investigate, and resolve them.
Collaborate seamlessly with engineering teams to enable timely and effective resolutions.
* **Service Excellence:** Consistently meet or exceed defined SLAs, SLIs, and SLOs.
* **Automation Innovator:** Develop and deploy automation scripts (using Python or other scripting languages) to streamline operations, enhance system efficiencies, and reduce manual tasks.
* **Cloud Steward:** Manage and maintain robust infrastructure across AWS and Azure environments, implementing best practices to ensure peak performance and reliability of cloud-based applications. Drive cost optimization through best practice implementation and continuous vigilance.
* **Cross-functional Collaborator:** Work closely with engineering, DevOps, security, and operations teams to drive continuous improvement and foster a culture of reliability and inclusion.
* **Incident Responder:** Handle requests and incidents through JIRA and ServiceNow, documenting troubleshooting procedures, solutions, and lessons learned to fuel ongoing improvements.
* **Flexible Scheduling:** Work on-call outside of normal working hours and on weekends as scheduled to ensure continuous support.
* **Team Builder:** Actively contribute to the growth and development of the SRE team's capabilities, nurturing a stronger, more inclusive, and resilient team.

**Who You Are**

* **Educational Background:** Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience. An MBA or PhD is a plus, but not required.
* **Certifications:** Relevant industry certifications (AWS/Azure) to showcase your expertise.
* **Experience:** Approximately 5 years of experience in site reliability engineering, IT operations, DevOps, or related fields, or equivalent skills and experience.
* **Cloud Expertise:** Solid experience with AWS and/or Azure, including setting up, monitoring, and maintaining cloud resources (including
knowledge of Kubernetes, EKS, AKS, GKE, etc.). Also a basic understanding of Infrastructure as Code tools such as Terraform.
* **Tool Proficiency:** Proficiency with monitoring and logging tools such as DataDog, Splunk On-Call, the ELK stack, Grafana, and Prometheus. Knowledge of Loki, Mimir, and Tempo is a plus.
* **Hands-On Skills:** Hands-on experience with JIRA and ServiceNow for tracking incidents, requests, and documentation.
* **Scripting Knowledge:** Proficiency in Python or similar scripting languages for automation purposes.
* **Incident Response:** Understanding of core SRE principles, along with an in-depth understanding of incident prioritization, escalation processes, and service level management (SLA/SLO/SLI).
* **Troubleshooting:** Proficient troubleshooting capabilities, especially in cloud and distributed system environments.
* **Communication and Teamwork:** Excellent communication, teamwork, and documentation skills, with a proactive and self-motivated approach to improving system reliability and operational efficiencies.
* **Diversity and Inclusion:** We value and encourage candidates from diverse backgrounds and experiences, believing that diverse perspectives drive innovation and success.
* **Language Requirements:** Excellent spoken and written English.

By joining our team, you will be part of a dynamic environment where your contributions will directly impact the resilience and reliability of our services. You will have opportunities for professional growth and the ability to collaborate with industry leaders. Let’s drive the future of IT stability together, ensuring an exceptional experience for our customers.

### Who we are

A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come.
Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions