Senior Cloud Site Reliability Engineer
4 days ago
Position Title: Senior Cloud Site Reliability Engineer (Azure)
Department: Information Technology
Location: Remote
Reports To: Platform DevOps Team Lead
Installation Made Easy ("IME") provides software and process management that enable retailers and contractors to offer installed home improvements to homeowners in a convenient, consistent, and affordable manner. IME senior management has over 100 years of retail management and home improvement industry experience.
We are seeking a Senior Cloud Site Reliability Engineer (SRE) with deep expertise in Microsoft Azure application platforms and hands-on experience with Ansible Automation Platform. If you enjoy digging into cloud infrastructure, automating repetitive tasks, and keeping mission-critical systems running smoothly, this role is for you.
The ideal candidate will be able to improve our monitoring and alerting systems (LogicMonitor, Sumo Logic, PagerDuty), respond to the alerts they raise, and lead remediation efforts across Azure environments. You will help make our cloud platform more stable, secure, automated, and cost-effective, while documenting everything clearly and owning projects end to end. The candidate must be able to work independently in a remote environment.
Essential Functions:
Cloud Infrastructure Remediation & Reliability
- Lead and execute remediation projects across Azure environments focused on stability, performance, cost optimization, and security.
- Perform deep-dive troubleshooting and implement solutions that prevent recurring incidents.
- Maintain and improve Infrastructure as Code (IaC) following best practices.
Automation & Configuration Management
- Develop and manage automation workflows using Ansible Automation Platform.
- Collaborate with DevOps teams to integrate automation into CI/CD pipelines.
- Identify infrastructure tasks that should be automated and make them disappear.
Monitoring & Observability
- Enhance and tune monitoring and alerting through LogicMonitor, Sumo Logic, and PagerDuty.
- Respond to critical alerts, investigate root causes, and reduce alert fatigue by optimizing thresholds and logic.
- Work with teams to define sensible SLIs/SLOs for cloud services.
Pipeline & DevOps Engineering
- Build, maintain, and optimize CI/CD pipelines (Azure DevOps or GitLab CI/CD).
- Support infrastructure provisioning, deployment automation, and secrets management.
- Improve deployment reliability and consistency through automation.
Project Execution & Technical Leadership
- Lead infrastructure and reliability projects end-to-end while working closely with platform and development teams.
- Share knowledge with and mentor other engineers on SRE and cloud best practices.
- Champion operational excellence, reliability patterns, and scalability.
Documentation & Standards
- Produce clear documentation, diagrams, runbooks, and SOPs that real humans can read.
- Ensure infrastructure standards, patterns, and playbooks are consistently followed.
- Maintain internal knowledge bases so they reflect the latest changes and lessons learned.
Minimum Qualifications:
- 5+ years of hands-on Microsoft Azure experience across compute, networking, identity, security, and application services.
- Proficiency with Ansible Automation Platform for configuration management and orchestration.
- Experience using monitoring and incident response tools such as LogicMonitor, Sumo Logic, and PagerDuty.
- Strong scripting ability with PowerShell, Bash, or Python.
- Hands-on experience with Azure DevOps or equivalent CI/CD platform.
- Demonstrated ability to lead technical projects and deliver results.
- Strong written communication skills for producing documentation and runbooks.
- Ability to thrive in a remote-first environment with minimal oversight.
- Strong collaborator who can work across engineering, DevOps, and security teams.
- Passionate about automation, observability, and reliability engineering.
- Comfortable with incident response, debugging under pressure, and performing root cause analysis.
Preferred Qualifications:
- Azure certifications (AZ-104, AZ-400, AZ-305).
- Experience working with containerized environments (AKS/Kubernetes).
- Familiarity with security and compliance frameworks such as CIS, NIST, and ISO.
- Exposure to PCI and SOC compliance environments.
- Knowledge of GitOps practices and advanced observability tools.
- Experience supporting software development teams with architecture and deployment patterns.
Physical Requirements:
- Prolonged periods of sitting at a desk and working on a computer.
Benefits to working with IME:
- 100% remote work environment
- Employer-provided equipment
- Medical, dental, and vision insurance
- Health savings plan with employer contributions to the health savings account
- Medical and dental flexible spending accounts
- Company-paid basic life, short-term disability, and long-term disability insurance
- 401(k) plan with employer match
  - Company matches 100% of the first 4% of salary deferrals
  - All contributions, including employer contributions, are 100% vested immediately
- Employee discount program for electronics, groceries, travel, entertainment, and more
- Employee assistance program
- Pay on demand
- Critical illness, hospital indemnity, group accident, and legal insurance
- Paid time off
- And more
We are an Equal Opportunity and Drug‐Free Workplace.
The Job Description is not an exhaustive statement of all duties, responsibilities, or qualifications of the job, nor is it intended to limit opportunities for necessary modifications. The Job Description does not constitute an employment contract of any kind.