Senior Site Reliability Engineer

6 days ago

Remote, Oregon, United States Granicus Full time

The Company

Serving the People Who Serve the People

Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the GovTech industry by bringing governments and their constituents together. We are on a mission to support our customers in meeting the needs of their communities and to implement our technology in ways that are equitable and inclusive. Granicus has consistently appeared on the GovTech 100 list over the past 5 years and has been recognized on BuiltIn as one of the best companies to work for.

Over the last 25 years, we have served 5,500 federal, state, and local government agencies that use our digital solutions to make the world a better place, and more than 300 million citizen subscribers power an unmatched Subscriber Network. With comprehensive cloud-based solutions for communications, government website design, meeting and agenda management software, records management, and digital services, Granicus empowers stronger relationships between government and residents across the U.S., U.K., Australia, New Zealand, and Canada. By simplifying interactions with residents while disseminating critical information, Granicus brings governments closer to the people they serve, driving meaningful change for communities around the globe.


Job Summary

Granicus is seeking an experienced and highly skilled Senior Site Reliability Engineer (SRE) to join our SRE team. As a Senior SRE, you will play a pivotal role in ensuring the reliability, scalability, and performance of our services. You will lead efforts in building and maintaining a robust infrastructure, automating processes, and guiding the team to implement best practices in site reliability.

What Your Impact Will Look Like

On-call Production Support: Provide production support in shifts according to the team's on-call roster.
Tickets: When not on call, work on tickets raised by customers and by internal engineering and implementation teams. For example, a client may request a data correction on the database server that cannot be made through the web interface.
Backlog: Work on items in the SRE team's backlog.
Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure. Respond to alerts and incidents promptly to ensure high availability.
Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence.
System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance.
Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes.
Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team.
Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth.
Security: Implement and adhere to security best practices to protect our systems and data.

You Will Love This Job If You Have

5+ years in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems. Experience supporting AI/ML infrastructure, including model deployment, inference optimization, and integration with services like AWS Bedrock, is highly desirable.
Expertise in Linux/Unix systems and cloud platforms (AWS, Azure, or Google Cloud).
Strong proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++).
Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning.
Experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging, monitoring, and observability.
Experience with configuration management tools (Ansible, Chef, Puppet).
Exposure to AI/ML toolchains, including AWS Bedrock, SageMaker, and LLMOps frameworks.
Certifications: Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar are a plus.

Pay Range

USD $80, USD $100,000.00 /Yr.

About Us

Don't have all the skills/experience mentioned above? At Granicus, we are trying to build diverse, inclusive teams. We do not have degree requirements for most of our roles. If you don't meet every requirement above but are excited to learn more, we encourage you to apply. We might just be able to find another role that could be a perfect fit.

Security and Privacy Requirements

Responsible for Granicus information security by appropriately preserving the Confidentiality, Integrity, and Availability (CIA) of Granicus information assets in accordance with the company's information security program.
Responsible for ensuring the privacy of our employees' and customers' data, and for completing all required privacy training in a timely manner, in accordance with company policies.

The Team

We are a remote-first company with a globally distributed workforce across the United States, Canada, United Kingdom, India, Armenia, Australia, and New Zealand.

The Culture

At Granicus, we are building a transparent, inclusive, and safe space for everyone who wants to be a part of our journey.
A few culture highlights include:
Employee Resource Groups – Encouraging diverse voices.
Coffee with Mark sessions – Our employees get to interact with our CEO on important and sometimes difficult issues, ranging from mental health to work-life balance and current affairs.
Microsoft Teams communities – Focused on wellness, art, furbabies, family, parenting, and more.
Special guests – We bring in guests from time to time to discuss issues that impact our employee population.

The Impact

We are proud to serve dynamic organizations around the globe that use our digital solutions to make the world a better place, quite literally. We have many powerful success stories that illustrate how our solutions are impacting the world.

The Benefits

At Granicus, we offer a comprehensive and flexible benefits package designed to support your well-being, growth, and work-life balance—starting from day one.
Here's what you can expect as a U.S.-based team member:

Flexibility & Balance

Flexible Time Off – Take the time you need to rest, recharge, and live your life.
Company-Wide Wellbeing Days – Paid days off to unplug and focus on your mental health.
Work From Home Reimbursement – Support a productive home office environment.

Health & Wellness

Multiple Health Plan Options – Including a 100% employer-paid plan.
Employer HSA Contributions – When enrolled in a High-Deductible Health Plan.
Fitness Reimbursement Program – Stay active, your way.
On-Demand Mental Health Support – Access to Headspace and other wellness tools.

Family & Future

Paid Parental Leave – For both birthing and non-birthing parents.
Traditional & Roth 401(k) – With a generous company match.
Life & AD&D Insurance – 100% employer-paid coverage for peace of mind.

Growth & Recognition

Online Learning Platforms – Fuel your professional development.
Competitive Salary & Bonuses – Your contributions are valued and rewarded.

Equal Opportunity Employer

Granicus is committed to providing equal employment opportunities. All qualified applicants and employees will be considered for employment and advancement without regard to race, color, religion, creed, national origin, ancestry, sex, gender, gender identity, gender expression, physical or mental disability, age, genetic information, sexual or affectional orientation, marital status, status with regard to public assistance, familial status, military or veteran status or any other status protected by applicable law.

Senior Site Reliability Engineer

7 days ago

Remote, Oregon, United States Fortress Information Security Full time

Senior Site Reliability Engineer. Location: Remote. Compensation: $160, ,000 per year, depending on experience and qualifications. Employment Type: Full-Time. What you can expect as the Senior Site Reliability Engineer at Fortress… The Senior Site Reliability Engineer is responsible for ensuring the reliability, performance, and scalability of critical systems and...
Senior Site Reliability Engineer

6 days ago

Remote, Oregon, United States Maxihost Full time

About 's global computing platform was launched in 2019, enabling businesses to programmatically deploy single-tenant Bare Metal instances in different parts of the world. We are a team of passionate individuals about hardware, software, and network infrastructure looking to build the fastest, easiest-to-use, developer-centric single-tenant Cloud...
Senior Cloud Site Reliability Engineer

4 days ago

Remote, Oregon, United States Installation Made Easy, Inc Full time

Position Title: Senior Cloud Site Reliability Engineer (Azure). Department: Information Technology. Location: Remote. Reports To: Platform DevOps Team Lead. Installation Made Easy ("IME") provides software and process management that enable retailers and contractors to offer installed home improvements to homeowners in a convenient, consistent, and affordable...
Staff Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States AlphaSense Full time

About AlphaSense: The world's most sophisticated companies rely on AlphaSense to remove uncertainty from decision-making. With market intelligence and search built on proven AI, AlphaSense delivers insights that matter from content you can trust. Our universe of public and private content includes equity research, company filings, event transcripts, expert...
Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States ADT Full time $200,000 - $250,000 per year

ADT is transitioning to an in-office model. New team members will work from home but should plan to return to an in-office model at a later date. We will keep you well informed and supported throughout the transition. Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for...
Site Reliability Engineer

5 days ago

Remote, Oregon, United States Cutover Full time

An inclusive work environment is an empowering one. At Cutover, we lead with empathy and enable others to succeed through curiosity, kindness, and self-expression. Location: Remote, United States. This role requires on-call shifts, roughly 1 in 4 weeks and 1 in 4 weekends. 2nd Shift: 2:00 PM - 11:00 PM PST (10:00 PM - 7:00 AM UTC). Cutover provides enterprise...
Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States JWay Group Full time

Sr. Site Reliability Engineer, Stack Management. As a Site Reliability Engineer, you will be responsible for architecting, maintaining, and managing our client's infrastructure, which includes solving some of the most challenging cloud access and data security problems for enterprise customers. Job Responsibilities: Maintain and support existing IT infrastructure...
Site Reliability Engineer

6 days ago

Remote, Oregon, United States 2Prod Technologies Corp. Full time

About 2Prod: 2Prod Technologies Corp. supports the federal government in delivering secure, scalable cloud solutions that advance critical national missions. Position Summary: 2Prod Technologies Corp. is seeking a Site Reliability Engineer (SRE) with strong GitLab expertise to support and enhance enterprise platforms. This role will focus primarily on GitLab...
Lead Site Reliability Engineer

4 days ago

Remote, Oregon, United States Canary Technologies Corp Full time

About Us: Canary Technologies is changing the game for hotels with modern software powered by Canary's hospitality-specific AI platform. Canary is utilized by 20,000+ hoteliers in 100+ countries to equip hoteliers with the technology they need to work smarter and wow their guests. Major hotel brands such as Wyndham, Marriott, IHG, Four Seasons, Rosewood, and...
Global Head of Site Reliability Engineering

4 days ago

Remote, Oregon, United States Socure Full time

Why Socure? At Socure, we're on a mission to verify 100% of good identities in real time and eliminate identity fraud from the internet. Using predictive analytics and advanced machine learning trained on billions of signals to power RiskOS, Socure has created the most accurate identity verification and fraud prevention platform in the world. Trusted by...
