Site Reliability
3 weeks ago
Site Reliability / GitOps Engineer at Canonical

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. With over 1200 colleagues in 75+ countries, Canonical is founder-led, profitable, and rapidly growing.

Job Summary
The Information Systems (IS) team supports and maintains all of Canonical’s IT production services, which are used by more than 60 million Ubuntu users. As a Site Reliability / GitOps Engineer, you will drive operations automation to the next level across private and public clouds using open-source infrastructure-as-code tools, CI/CD pipelines, and Canonical’s leading automation products. You will also provide feedback on our products and collaborate with development teams to improve their scalability and resilience.

Location
This role is available remotely in any time zone.

Responsibilities
Apply your experience of IaC to develop infrastructure-as-code practices within IS by constantly increasing automation and improving IaC processes.
Automate software operations for reusability and consistency across private and public clouds while considering the complexities of distributed systems.
Develop new features and improve the resilience and scalability of Canonical’s cloud and container portfolio.
Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure.
Develop skills in troubleshooting, capacity planning, and performance investigation.
Set up, maintain, and use observability tools such as Prometheus, Grafana, and Elasticsearch.
Design and maintain monitoring and alerting for various systems and services.
Collaborate with development teams to design service architecture, documentation, playbooks, policies, and operational procedures.
Provide assistance to, and work with, globally distributed engineering, operations, and support peers.
Receive uninterrupted development time to focus on larger projects and automation of manual tasks.
Share experience, know-how, and best practices with team members through design sessions, mentorship, and collaborative work.
Carry final responsibility for time-critical escalations.

Qualifications
Deep experience and knowledge of defining operations in code, using version control, peer review, and CI/CD to roll out changes to applications and infrastructure.
Strong modern engineering background (peer review, unit testing, SCM, CI/CD, Agile).
Python software development experience on large projects.
Practical knowledge of Linux networking, routing, and firewalls.
Affinity with various Linux storage technologies, from Ceph to databases.
Hands-on experience administering enterprise Linux servers.
Extensive knowledge of cloud computing concepts and technologies.
Bachelor’s degree or higher, preferably in computer science or a related engineering field.
Clear and effective communication in English over email, chat, video, or voice calls, and in person.
Motivated and able to troubleshoot from kernel to web, and willing to ask for help when appropriate.
Willingness to be flexible and learn new things quickly.
Drawn to the demands of fast-changing environments.
Happy to work within distributed teams.
Passionate about, and familiar with, open source, especially Ubuntu or Debian.

About Canonical
Canonical is a pioneering tech firm leading the global move to open source. As the publisher of Ubuntu, one of the most important open-source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. Canonical recruits on a global basis and expects excellence; success requires being the best at what we do. Most colleagues have worked from home since our inception in 2004, and working here is a step into the future that will challenge you to think differently, work smarter, learn new skills, and raise your game.

Equal Opportunity Employer
Canonical is an equal opportunity employer. We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background creates a better work environment and better products. Whatever your identity, we will give your application fair consideration.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development
-
Site Reliability Engineer
3 weeks ago
City of Albany, United States Canonical Full time
Canonical is a leading provider of open source software and operating systems for the global enterprise and technology markets. Ubuntu, the company’s flagship platform, is widely used in breakthrough initiatives such as public cloud, data science, AI, engineering innovation, and IoT. With...
-
Senior Site Reliability Engineer
3 weeks ago
City of Albany, United States Canonical Full time
About Canonical: Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers...
-
Site Reliability Engineer — Open Source Infra
3 weeks ago
City of Albany, United States Canonical Full time
A leading open-source software provider is seeking a Site Reliability Engineer to enhance enterprise infrastructure through DevOps practices. You will deploy and manage OpenStack and Kubernetes for global clients, ensuring high service quality through incident resolution and monitoring. Ideal candidates will have a degree in Software Engineering, experience...
-
Site Reliability
6 hours ago
New York City Metropolitan Area, United States The Phoenix Group Full time
Site Reliability / Platform Engineering Roles. Location: New York City (Hybrid or Flexible). We work with a range of NYC-based organizations across fintech, asset management, SaaS, and data-driven companies that operate large-scale, production-critical systems. This posting reflects the types of site reliability and platform engineering roles we consistently see...
-
Site Reliability Engineer
3 weeks ago
Town of Florida, United States Optomi Full time
Base pay range: $145,000.00/yr - $160,000.00/yr. This range is provided by Optomi. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Optomi, in partnership with a leading global media organization...
-
Site Reliability Engineer
1 week ago
Foster City, United States Replit, Inc. Full time
Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation. About the role: Join our Site Reliability Engineering team and help ensure the...
-
Site Reliability Engineer
3 days ago
Foster City, United States Repl.it Full time
Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural language in just one click. Build and deploy full-stack applications directly from your browser, with no setup required. Never written a line of code in your life? No problem. Replit makes software creation...
-
Site Reliability Engineer
3 weeks ago
Town of Texas, United States Longbridge Securities Full time
Longbridge is a fast-growing online brokerage platform on a mission to make investing smarter, simpler, and more accessible for everyone. Overview: We are looking for a hands-on Site Reliability Engineer (SRE) to design, scale, and safeguard the reliability of our next-generation financial platforms. This is a high-impact role where you’ll partner closely...
-
Site Reliability Engineer
2 weeks ago
Foster City, United States Zoox Full time
Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. In this role, you will own the full lifecycle of our services, from designing fault-tolerant, maintainable systems to deploying, operating, and continuously improving...
-
Site Reliability Engineer
3 weeks ago
Town of Texas, United States SS&C Technologies Full time
Overview: Site Reliability Engineer (SRE) at SS&C Technologies. Remote opportunities available in multiple states. SS&C Technologies is a global investment and financial services software provider with a long-standing presence and a broad client base. About the Role: The Site Reliability Engineer (SRE) is responsible for leading technology teams to deliver...