Site Reliability Engineering Manager

2 weeks ago


Sacramento, CA, United States TWO95 International Full time

Position – Site Reliability Engineering Manager

Location – Sacramento, CA

Type – Full-time

Salary – $Market

ESSENTIAL JOB FUNCTIONS AND BASIC DUTIES
  • The SREM will ensure that reliability measures are incorporated into strategic IT plans and that expectations are clearly defined. The SREM will also be responsible for working with business and IT stakeholders to balance real-world risks with business drivers such as speed, agility, flexibility and performance. The SREM's job is composed of a broad range of activities in support of IT program initiatives, including:
    • Strategic support
    • Reliability liaison
    • Architecture/engineering support
    • Operational support
  • Work with the Senior Director, Service Delivery to develop a reliability program and projects that address identified risks and platform reliability, automation, and scale requirements.
  • Manage the process of gathering, analyzing and assessing the current and future reliability landscape, as well as providing the Service Delivery Senior Director with a realistic overview of risks in the enterprise environment.
  • Work with the Service Delivery Senior Director to develop budget projections based on short- and long-term goals and objectives.
  • Monitor and report on reliability standards, as well as the enforcement of policies within the IT department.
  • Propose changes to existing policies and procedures to ensure operating efficiency and regulatory compliance.
  • Manage a staff of reliability engineering professionals, hire and train new staff, conduct performance reviews, and provide leadership and coaching, including technical and personal development programs for team members.

Requirements

Reliability Liaison

  • Assist resource owners and IT staff in understanding and responding to reliability concerns. Provide reliability communication, awareness and training for audiences, which may range from senior leaders to field staff. Work as a liaison with vendors and the legal and purchasing departments to establish mutually acceptable contracts and service-level agreements. Manage production issues and incidents and participate in problem and change management forums.
  • Work with various stakeholders to identify information asset owners to classify data and systems as part of a reliability framework implementation. Serve as an active and consistent participant in the systems reliability governance process.
  • Work with the Service Delivery Senior Director and other IT and business stakeholders to define metrics and reporting strategies that effectively communicate successes and progress of the reliability program. Provide support and guidance for legal and regulatory compliance efforts, including audit support.

Architecture/Engineering Support

  • Consult with other IT and reliability staff to ensure that reliability is factored into the evaluation, selection, installation and configuration of hardware, applications and software. Recommend and coordinate the implementation of technical controls to support and enforce defined reliability practices and policies.
  • Research, evaluate, design, test, recommend or plan the implementation of new or updated reliability hardware or software, and analyze its impact on the existing environment; provide technical and managerial expertise for the administration of reliability tools. Work with the enterprise architecture team to ensure that there is a convergence of business, technical and reliability requirements; liaise with IT management to align existing technical installed base and skills with future architectural requirements.
  • Develop a strong working relationship with the reliability engineering team reporting to this position to develop and implement controls and configurations aligned with reliability policies and legal, regulatory and audit requirements.

Operational Support

  • Coordinate, measure and report on the technical aspects of reliability engineering management. Manage outsourced vendors that provide reliability functions for compliance with contracted service-level agreements. Manage and coordinate operational components of incident management, including detection, response and reporting. Maintain a knowledge base comprising a technical reference library, reliability trends and practices, and laws and regulations.
  • Manage the day-to-day activities of reliability management, identify risk tolerances, recommend treatment plans and communicate information about residual risk. Manage reliability projects and provide expert guidance on reliability matters for other IT projects. Ensure audit trails, system logs and other monitoring data sources are reviewed periodically and are in compliance with policies and audit requirements.
  • Design, coordinate and oversee reliability testing procedures to verify the reliability of systems, networks and applications, and manage the remediation of identified risks.
  • Perform other duties as directed.

EDUCATION AND EXPERIENCE:

Bachelor’s or Master’s degree in Reliability Engineering, Computer Science, Information Systems, or a related discipline, plus a minimum of seven years of IT experience, five years of which must be in a reliability engineering role and at least two years in a supervisory capacity, or an equivalent combination of education and experience.

Benefits

Note: If interested, please send your updated resume and include your salary requirement, your contact details, and a suitable time when we can reach you. If you know anyone in your network who would be a perfect match for this job, we would appreciate it if you could forward this posting to them with a copy to us.

We look forward to hearing from you at your earliest convenience.


