Site Reliability Engineering Manager
1 week ago
Position – Site Reliability Engineering Manager
Location – Sacramento, CA
Type – Full-time
Salary – $Market
ESSENTIAL JOB FUNCTIONS AND BASIC DUTIES- The Site Reliability Engineering Manager (SREM) will ensure that reliability measures are incorporated into strategic IT plans and that expectations are clearly defined. The SREM will also be responsible for working with business and IT stakeholders to balance real-world risks with business drivers such as speed, agility, flexibility and performance. The SREM's job is composed of a broad range of activities in support of IT program initiatives, including:
- Strategic support
- Reliability liaison
- Architecture/engineering support
- Operational support
- Work with the Senior Director, Service Delivery to develop a reliability program and projects that address identified risks and platform reliability, automation, and scale requirements.
- Manage the process of gathering, analyzing and assessing the current and future reliability landscape, as well as providing the Service Delivery Senior Director with a realistic overview of risks in the enterprise environment.
- Work with the Service Delivery Senior Director to develop budget projections based on short- and long-term goals and objectives.
- Monitor and report on reliability standards, as well as the enforcement of policies within the IT department.
- Propose changes to existing policies and procedures to ensure operating efficiency and regulatory compliance.
- Manage a staff of reliability engineering professionals, hire and train new staff, conduct performance reviews, and provide leadership and coaching, including technical and personal development programs for team members.
Reliability Liaison- Assist resource owners and IT staff in understanding and responding to reliability concerns. Provide reliability communication, awareness and training for audiences ranging from senior leaders to field staff. Work as a liaison with vendors and the legal and purchasing departments to establish mutually acceptable contracts and service-level agreements. Manage production issues and incidents and participate in problem and change management forums.
- Work with various stakeholders to identify information asset owners to classify data and systems as part of a reliability framework implementation. Serve as an active and consistent participant in the systems reliability governance process.
- Work with the Service Delivery Senior Director and other IT and business stakeholders to define metrics and reporting strategies that effectively communicate the successes and progress of the reliability program. Provide support and guidance for legal and regulatory compliance efforts, including audit support.
- Consult with other IT and reliability staff to ensure that reliability is factored into the evaluation, selection, installation and configuration of hardware, applications and software. Recommend and coordinate the implementation of technical controls to support and enforce defined reliability practices and policies.
- Research, evaluate, design, test, recommend or plan the implementation of new or updated reliability hardware or software, and analyze its impact on the existing environment; provide technical and managerial expertise for the administration of reliability tools. Work with the enterprise architecture team to ensure that there is a convergence of business, technical and reliability requirements; liaise with IT management to align existing technical installed base and skills with future architectural requirements.
- Develop a strong working relationship with the reliability engineering team reporting to this position to develop and implement controls and configurations aligned with reliability policies and legal, regulatory and audit requirements.
- Coordinate, measure and report on the technical aspects of reliability engineering management. Manage outsourced vendors that provide reliability functions to ensure compliance with contracted service-level agreements. Manage and coordinate operational components of incident management, including detection, response and reporting. Maintain a knowledge base comprising a technical reference library, reliability trends and practices, and laws and regulations.
- Manage the day-to-day activities of reliability management, identify risk tolerances, recommend treatment plans and communicate information about residual risk. Manage reliability projects and provide expert guidance on reliability matters for other IT projects. Ensure audit trails, system logs and other monitoring data sources are reviewed periodically and are in compliance with policies and audit requirements.
- Design, coordinate and oversee reliability testing procedures to verify the reliability of systems, networks and applications, and manage the remediation of identified risks.
- Perform other duties as directed.
EDUCATION AND EXPERIENCE:
Bachelor’s or Master’s degree in Reliability Engineering, Computer Science, Information Systems, or a related discipline, plus a minimum of seven years of IT experience, five years of which must be in a reliability engineering role and at least two years in a supervisory capacity, or an equivalent combination of education and experience.
Benefits
Note: If interested, please send your updated resume along with your salary requirement, your contact details, and a suitable time when we can reach you. If you know anyone in your network who would be a good match for this job, we would appreciate it if you could forward this posting to them with a copy to us.
We look forward to hearing from you at your earliest convenience.