Senior Site Reliability Engineer

2 weeks ago


Remote, Oregon, United States Spekit Full time

Headquartered in Denver, CO, we're a small but mighty team on a mission to completely reinvent the future of learning at work.

Introducing Spekit: the new way to learn in today's remote, digital workplace. Say goodbye to distracted Zoom training sessions and lengthy LMS courses your teams will forget. Instead, Spekit takes all of your training & enablement – for applications, processes, sales playbooks, SOPs, and more – and embeds that training directly in your employees' tools & workflows, right when and where they need it. Whether that's a walkthrough to guide you through creating a quote in Salesforce or a competitor battle card to handle an objection in email, Spekit is your learning companion every step of the way.

The world's most innovative companies, including Uber Freight, Outreach, Snowflake, Southwest Airlines, and Udemy, leverage Spekit to accelerate onboarding, drive tool adoption, increase productivity, remove the friction of change management, and fuel the growth of their employees – from anywhere.

With over $60M in VC funding from top venture firms, including Craft Ventures, Bonfire Ventures, Renegade Partners, and the Foundry Group, Spekit is the rocket ship you'll want to be on.

About the role:

Spekit's Infrastructure Team is looking for a highly motivated Senior Site Reliability Engineer (SRE). This role is critical to ensuring the reliability, scalability, and performance of our systems and services. It requires a deep understanding of both software engineering and systems administration, with a focus on automation, monitoring, and incident response. The Senior SRE will collaborate closely with cross-functional teams to design, build, and maintain resilient and efficient systems that meet the needs of our users and business objectives.

Must-have qualifications:

  • Excellent understanding of Cloud environments such as AWS
  • Solid understanding of the Kubernetes ecosystem and best practices of K8s
  • 8+ years in an Infrastructure Ops, Site Reliability Engineering, or DevOps-focused role
  • Ability to troubleshoot and solve complex problems in highly stressful situations such as incidents
  • Data Warehouse, Database management, and performance tuning (PostgreSQL)
  • Experience with the LGTM stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics)
  • Must be excellent in coding, refactoring, and writing tests
  • Excellent programming skills in Python, Go or similar
  • Working knowledge of security tools such as Snyk, Cloudflare
  • Application and Linux System Troubleshooting
  • System, Platform, and Web Application Development (Django/Python, Go, Nginx, Gunicorn, etc.)
  • Proficient in working in a cloud-based Linux CLI environment
  • Knowledge and understanding of network protocols (HTTP/S, SSH, SSL, DNS)
  • Must be willing to be part of an on-call rotation to support production systems during PST hours and weekends
  • Excellent communication skills, both written and verbal

What's in it for you?

100% paid employee Medical, Dental, Vision, and Basic & Optional Life Insurance. Benefits begin on your first day

Insurance coverage for the whole family, including flexible spending accounts

Meaningful equity -- every employee is granted stock options when they walk in the door

Flexible Paid Time Off (PTO) policy with a mandatory minimum of 2 weeks of annual vacation time

Hybrid work environment: Casual and open Denver, CO office with the ability to balance your time working from home

10 paid holidays, sick leave, mental health days, and a 1-week end-of-year company shutdown

Paid parental leave

L&D stipend that can be used for learning opportunities at your discretion

The chance to help build from the ground up. The hires we're making now are foundational to our growth as a company

Things we value, culture-wise:

Grit & Growth. We run towards challenges. If something seems unsolvable, it unleashes our persistence, our creativity, and our ability to move through uncertainty to create a solution.
Simple yet Spektacular. We're in the early stages of building something really great and that requires a lot of hands on deck and a focus on execution. In this journey, we uncover joy in simplicity, obsess over the experience, pivot quickly and always reach for excellence.
Tenacity. The endless pursuit of customer love. We believe in collaboration, transparency, integrity, trust, listening, doing what is right, and always going above and beyond for our team and customers.
Belonging. We strive to build a company culture inclusive of all voices, differences of opinions, and the permission to be our authentic selves. We accept and celebrate what makes us unique and connects us to one another.
Enjoy the Journey. Love what you do and who you do it with. We embrace joy and kindness and we bring our authentic selves to work each day. We seek to share our optimism and compassion with everyone around us.

About the Team

At Spekit, we strive to be the change we seek. And the change we seek is a wealth of diversity in technology and the workplace. As a company with two female founders, we know that diverse and inclusive cultures drive innovative results. We've committed as an organization to elevate underrepresented minorities in technology through awareness, partnerships, and even hosting our own scholarships to do our part in changing the status quo. If this sounds like the right place for you, we'd love to chat.


