Senior Site Reliability Engineer

2 weeks ago


Remote, Oregon, United States Lumin Digital Full time

Our Site Reliability Engineers (SREs) are good developers with an operations mindset. They enjoy reducing or completely eliminating manual tasks, are excellent problem solvers, and know automation is the key to operating a large-scale system.

SREs make sure that our application is highly available and that Service Level Objectives (SLOs) are met. They work closely with our Software Engineers (SWEs), applying their interest in operations and their development skills to ensure new features follow SRE best practices and are supportable.

ESSENTIAL FUNCTIONS:

-CI/CD.

-Monitor and resolve issues in all environments. Ensure SLO and uptime are met.

-Ensure SRE concerns are addressed from the time a feature is designed through its deployment to production.

-Work on the SRE scrum team.

-Engage in capacity planning and demand forecasting, anticipating performance bottlenecks and scaling the environment as needed.

-Change management.

-Uptime and SLO reporting.

KNOWLEDGE, SKILLS & ABILITIES:

-Cultural fit.

-Humility.

-Strong sense of ownership, customer service, and integrity.

-Willing to walk in the mud.

-Commitment to continually improving yourself.

-Operational expertise with a desire to eliminate manual tasks:

  -DevOps approach - Automation and resilient systems are key.

  -Monitoring and Alerting - Monitor the right things. Alert appropriately:

    -Self heal.

    -Involve people when needed.

    -Log tickets when no immediate action is required.

-Remain calm in trying circumstances.

-Exceptional full stack and environment troubleshooting skills.

-Expert-level knowledge of at least one configuration management system (Chef, Ansible, Puppet, etc.).

-Understanding of standard networking protocols and components such as: HTTP, DNS, TCP/IP, ICMP, the OSI Model, Subnetting and Load Balancing.

-Security mindset.

-Data cannot and will not be compromised.

-Driven to ensure that being on-call is boring.

-Exceptional written and verbal communication skills.

-Experience working on an agile scrum team.

-Expertise in cloud hosting.

-AWS preferred, but Google Cloud and Azure are also of interest.

-Experience with a microservice architecture running in containers (Docker or other containerization technology).

-Experience with Terraform and Kubernetes.

-Understanding of CI/CD and the ability to architect the workflow.

-Willing to participate in a 24x7 on-call rotation.

DESIRED SKILLS:

-2+ years of experience as a software engineer. C#, Angular, JavaScript preferred.

-AWS Certification preferred but not essential: SysOps and/or Solutions Architect ideal.

-Experience with Amazon RDS, EKS, CloudWatch, etc.

-Experience with Docker tooling and ecosystem.

EDUCATION:

Bachelor's degree or higher in Computer Science, or equivalent experience.

LIFE AT LUMIN DIGITAL

Lumin Digital is a fintech company specializing in digital banking solutions. Through a fundamentally different approach to technology, service, and people, we're creating the next generation of financial solutions each and every day. Lumin helps banks and credit unions build and deploy next-gen digital experiences that help to continually serve, engage, and grow their membership base. While other platforms are partially adapted or retrofitted for the cloud, Lumin is 100% cloud-native. It was built specifically for the cloud environment, allowing us to realize more fully the advantages it offers. It's a difference that financial institutions and their users will see and feel almost immediately.

Our people have a passion for new possibilities. We intentionally foster curiosity through our culture. We engage people who can't help but ask "what if," "why not," and "what's next." We encourage them to bring forward ideas that challenge, raise, and reset expectations. And we empower them to continually explore, experiment, and apply what they learn. We champion curiosity because curiosity is how we grow: as a company, as a partner, and as individuals. For more information, visit .
