Senior Site Reliability Engineer

4 weeks ago

Agoura Hills, United States Lakeview Loan Servicing Full time

Overview

Lakeview IT is passionate about delivering high-quality products and services to our customers. Our technology operations team is committed to ensuring reliable, scalable, and high-performing services for our clients. We are looking for a talented and motivated Site Reliability Engineer to join our dynamic team and help us continue to build and maintain a world-class infrastructure.

The Sr. Site Reliability Engineer at Lakeview is responsible for ensuring the availability, performance, and scalability of the company's critical systems. They will lead the design and implementation of infrastructure solutions, focusing on automation, monitoring, and high reliability. This role involves optimizing system performance, managing incident responses, and conducting post-mortems to drive continuous improvement. The engineer will also work closely with engineering, development, and operations teams to create and enforce best practices, establish service-level objectives, and ensure seamless deployment processes. Additionally, mentoring junior team members and driving key architectural decisions are essential aspects of the role to build a culture of reliability and operational excellence.

Salary range for the role is between $130,000 and $150,000 with an annual bonus. The position can be 100% remote, but if located in the Agoura Hills, CA area, the expectation will be that the role is hybrid.

Responsibilities

- Proactively identify and resolve incidents before they impact operations.
- Monitor all systems and infrastructure for the highest level of availability.
- Perform routine maintenance tasks, including monitoring, patching, and backups.
- Respond to incidents and outages in a timely and effective manner.
- Collaborate with other teams to diagnose and resolve complex issues.
- Document incident details and implement corrective actions to prevent recurrence.
- Document processes, configurations, and troubleshooting procedures.
- Diagnose and resolve application performance problems or system outages.
- Play the role of Incident Manager during outages.
- Resolve complex hardware and software issues, and work with vendors when necessary.
- Optimize system performance and resource utilization on-prem and in the cloud.
- Develop and maintain automation scripts to streamline repetitive tasks.
- Utilize scripting languages (e.g., PowerShell, Python) to automate system administration.
- Implement configuration management tools to ensure consistency and repeatability.
- Create and maintain comprehensive documentation of IT processes and procedures.
- Lead the design, development, and implementation of reliable, scalable infrastructure systems.
- Mentor junior SREs, guiding them on best practices and technical issues.
- Architect and execute disaster recovery and high-availability plans.
- Drive incident management processes, ensuring swift and effective resolution of critical issues.
- Optimize system performance through proactive monitoring, tuning, and capacity planning.
- Lead root cause analysis and post-mortem discussions to identify long-term fixes.
- Develop and maintain complex automation scripts to enhance system reliability.
- Influence reliability improvements within the engineering organization, promoting a culture of observability and resilience.
- Champion the adoption of new tools and technologies that enhance system stability and deployment efficiency.
- Communicate effectively with stakeholders and executive leadership regarding system status, incidents, and upcoming reliability initiatives.

Qualifications

- Strong understanding of IT infrastructure components, including servers, networks, and storage.
- Knowledge of scripting languages (e.g., PowerShell, Python).
- Knowledge of networking concepts and protocols (e.g., TCP/IP, DNS, DHCP).
- Experience with IT service management frameworks.
- Experience with cloud platforms such as AWS and Azure.
- Experience with virtualization technologies such as Azure VDI and AWS Workspaces.
- Experience with monitoring and alerting tools (e.g., New Relic, Datadog).
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills.
- Extensive expertise in the Windows operating system.

Physical Demands and Work Environment

The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions. While performing the duties of this job, the employee is regularly required to sit and use hands to handle, touch, or feel objects, tools, or controls. The employee is frequently required to talk and hear. The noise level in the work environment is usually moderate. The employee is occasionally required to stand, walk, and reach with hands and arms. The employee is rarely required to stoop, kneel, crouch, or crawl. The employee must regularly lift and/or move up to 50 pounds. Specific vision abilities required by this job include close vision, color vision, and the ability to adjust focus.

EEOC

Lakeview is an Equal Employment Opportunity employer. All aspects of consideration for employment and employment with the Company are governed on the basis of merit, competence, and qualifications without regard to race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, or any other category protected by federal, state, or local law.

Manager, Site Reliability Engineering

7 days ago

Beverly Hills, California, United States ALO Full time $120,000 - $250,000 per year

WHY JOIN ALO? Mindful movement. It's at the core of why we do what we do at ALO. It's our calling. Because mindful movement in the studio leads to better living. It changes who yogis are off the mat, making their lives and their communities better. That's the real meaning of studio-to-street: taking the consciousness from practice on the mat and putting it...
Site Reliability Engineer, Consultant

7 days ago

El Dorado Hills, United States Blue Shield of California Full time

Your Role We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to ensure that our platforms are resilient, efficient, and continuously improving. You will be part of a...
Site Reliability Engineer, Consultant

4 days ago

Woodland Hills, CA, United States Blue Shield of CA Full time

Your Role We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to ensure that our platforms are resilient, efficient, and continuously improving. You will be part of a...
Sr. Reliability Engineer

3 weeks ago

Rochester Hills, United States Gates Corporation Full time

Are you inspired by challenging the status quo? Do you thrive in collaborative environments that drive results?...
Principal Site Reliability Engineer, ML Platform

1 week ago

Short Hills, New Jersey, United States Zscaler Full time $164,500 - $235,000 per year

About Zscaler Serving thousands of enterprise customers around the world including 45% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world's largest security cloud, Zscaler accelerates digital...
