Site Reliability Engineer

3 weeks ago


San Francisco, United States GRNET Full time

About GRNET

GRNET - National Infrastructures for Research and Technology is an entity of the Greek Government, operating under the Ministry of Digital Governance. It provides advanced network and cloud computing services to academic and research institutions, educational entities at all levels, as well as to public, broader public, and private sector agencies.

GRNET has a wide service portfolio covering a number of sectors. A brief list is provided below:

  • Unified Portal for all Government-related Digital Services (Ενιαία Ψηφιακή Πύλη).
  • RE-Cloud: Cloud Services for Research and Education (~okeanos, ViMa).
  • Networking: GRNET acts as the ISP for Greece's research and academic institutions.

GRNET Site Reliability Engineering

GRNET maintains its infrastructure across multiple data centers distributed throughout Greece, primarily utilizing Free and Open Source software such as Kubernetes, ArgoCD, GitLab CI, Debian GNU/Linux, OpenStack, Ceph, Ansible and more. GRNET adopts the Site Reliability Engineering approach.

Our SRE department is divided into three groups: Services, Platform, and Cloud. As an SRE, you will be assigned to one of these groups, depending on the current needs, your preferences, and expertise. Here is a summary of the responsibilities of each group:

  • Cloud: Design and implement GRNET's new Cloud infrastructure utilizing OpenStack and Kubernetes.
  • Platform: Design and manage our Internal Developer Platform, based on Kubernetes.
  • Services: Manage services, pipelines and tooling for use on top of our Internal Developer Platform.

Your Role

Whether you have a Systems or Software Engineering background, we are seeking Senior SREs with a strong DevOps mindset who are willing to do the following:

  • Design and implement fault-tolerant, scalable and distributed services.
  • Bring your technical opinion and vision to the table: it matters.
  • Handle problems that require under-the-hood investigation, whether it is called legacy infrastructure, technical debt or unfamiliar technical territory.
  • Lead projects within the team.
  • Collaborate with multiple people and teams based on a policy of openness.

Requirements

Required Qualifications

  • At least three (3) years of professional working experience as an SRE or Software Engineer with emphasis on infrastructure.
  • Bachelor's degree in Computer Science or a related field or, alternatively, comparable professional experience.
  • Experience in designing distributed services at scale, whether on-premises or in the cloud.
  • Experience in running containerized workloads on Kubernetes.
  • Knowledge of DevOps practices that bridge gaps, promote communication and speed up processes.
  • Knowledge of Linux internals: words like cgroups, tcpdump, inode, procfs should sound familiar to you.
  • Knowledge of at least one (1) programming language.
  • Working-level proficiency in Greek. The role involves communication with Greek-speaking stakeholders and teams.

Bonus Qualifications

  • Experience with Data Center and Linux networking concepts and internals.
  • Experience with on-premises Cloud infrastructure (OpenStack, OpenNebula, etc).
  • Related personal projects or contributions to open-source projects.

Benefits

  • An exciting, dynamic and fast-paced working environment that encourages team spirit, cooperation and continuous learning of state-of-the-art technology.
  • A competitive remuneration package and benefits, international collaborations and an environment that fosters innovation.
  • Training and participation in technical conferences.

GRNET is dedicated to promoting diversity and inclusion in the workplace and is an equal opportunity employer. We welcome applications from individuals of varied backgrounds. Our policy ensures that no applicant is discriminated against based on race, age, color, gender identity and expression, disability, national origin, medical conditions, religion, parental status, or any legally protected characteristics.

All applications will be treated with strict confidentiality.



  • San Francisco, CA, United States Apollo Solutions Full time

    Site Reliability Engineer Apollo Solutions have partnered with a groundbreaking artificial intelligence business who are making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Francisco, United States Patreon Full time

    Patreon is the best place for creators to build exclusive content and community for their fans. We enable creators (podcasters, writers, musicians, illustrators, etc) to connect with their fans directly and make money from their creative work. Creators can sell one-off items from their own shops or offer recurring monthly memberships with exclusive access to...


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer Apollo Solutions have partnered with a groundbreaking Fintech start-up backed by top tier venture capital. They are looking to significantly disrupt how we view, store and invest our personal finance and have already made significant waves in the industry. The Principal Site Reliability Engineer will be working closely...


  • San Francisco, United States Pelago Full time

    Role Overview: At Pelago, we run a serverless architecture on AWS, with infrastructure managed using Terraform. Our system has been built to deliver our virtual clinic for Substance Use Management, and we are looking for a talented Site Reliability Engineer to join the engineering team supporting Pelago. As a HIPAA-compliant, HITRUST-certified organization it...


  • San Francisco, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • San Francisco, United States eTeam Inc. Full time

    Role: Site Reliability Engineer Location: 100% remote Duration: 6+ Months Primary Skill: Minimum 8 years' experience in Terraform, Ansible, Networking, Jenkins, Python, GCP in Technology companies. Security (vulnerability management).


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer SRE Apollo Solutions have proudly partnered with a Series E SaaS organization based in San Francisco. They have recently employed a highly respected CEO who has spent his career successfully scaling multiple start-ups with large exit events including a $1 billion+ IPO. We are looking for a Principal SRE based in San...


  • San Francisco, United States Talkdesk Full time

    At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including...


  • San Francisco, United States Resource Informatics Group Full time

    Job Title: Site Reliability Engineer Work Location: San Francisco, CA (Hybrid after showing successful engagement) Duration: 18+ months Most important skills: 10 years of Oracle database administration experience on large production environments. Database hands-on skills, especially around database and system troubleshooting and administration. GoldenGate...


  • San Francisco, United States DAOmatch Full time

    Aptos is a people-first blockchain on a mission to help billions of people achieve universal and fair access to decentralized assets in a safe and scalable way. Founded by some of the original creators and maintainers that researched, designed, and built the Diem blockchain to serve this purpose, we have dedicated several years toward this mission. We believe...


  • San Francisco, United States Resource Informatics Group Full time

    Job Title: Site Reliability Engineer Work Location: San Francisco, CA (Hybrid after showing successful engagement) Duration: 18+ months Most important skills: 10 years of Oracle database administration experience on large production environments. Database hands-on skills, especially around database and system troubleshooting and administration. GoldenGate setup,...