Senior Software Engineer, Reliability Engineering Specialist

1 week ago


Sunnyvale, California, United States Onehouse Full time
About Onehouse

Onehouse is a mission-driven organization dedicated to liberating data from data platform silos. We deliver the industry's most interoperable data lakehouse through a cloud-native managed service built on Apache Hudi. Onehouse enables businesses to ingest data at scale with minute-level freshness, centrally store it, and make it available to any downstream query engine and use case (from traditional analytics to real-time AI/ML).

We are a team of self-driven, inspired, and seasoned technologists who have created large-scale data systems and globally distributed platforms that sit at the heart of some of the largest enterprises, including Uber, Snowflake, AWS, LinkedIn, Confluent, and many more. Building on $33M in total funding and a fresh Series A backed by Greylock/Addition, we are quickly expanding and looking for rising talent to grow with us and become future leaders of the team.

The Community You Will Join

When you join Onehouse, you're joining a team of passionate professionals tackling the deeply technical challenges of building a two-sided engineering product. Our engineering team serves as the bridge between the worlds of open source and enterprise: contributing directly to and growing Apache Hudi (already used at scale by global enterprises like Uber, Amazon, and ByteDance) while defining a new industry category - the transactional data lake. The Reliability Engineering team is the glue that binds all of this together. You will be responsible for developing and maintaining the tools and systems that enable our engineering teams to operate our services reliably and at scale. You will partner closely with our engineering teams to ensure our services can scale with our growing business.

The Impact You Will Drive:
  • At Onehouse, you will own our entire live production infrastructure and operational posture to run massive data systems at scale.
  • Ensure our services remain resilient by identifying opportunities for improvement and driving their implementation.
  • Identify opportunities to improve our overall operational efficiency and growth by owning the modern tools in our cloud-only operation and our practices for proactive automation, monitoring, and response.
  • Act as a mentor, guiding cross-functional teams during crisis situations and ensuring timely resolution to minimize the impact on our customers and business.
A Typical Day:
  • Build and own our reliability engineering practice from the ground up, owning our entire production infrastructure and operational posture.
  • Establish a culture of reliability across engineering by providing a comprehensive incident management platform used for instrumentation, operability, and incident handling.
  • Design, implement, and maintain new services, tools, and monitoring to support service reliability and alerting.
  • Serve as an active member of our SRE team, responding to and managing high-severity incidents or any situations concerning the wellbeing and continuous operation of our mission-critical systems.
  • Collaborate with stakeholders across engineering teams to ensure continuous adoption of best practices and rollout scenarios for the space, and to ensure that services are designed with reliability in mind.
  • Continuously analyze and evaluate the tradeoffs of the existing designs and make recommendations based on new technologies and industry best practices.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health through an intimate understanding of how the critical parts of our site work.
  • Contribute to better incident management posture and retrospectives, driving improvements in our overall reliability and incident response time as well as on-call runbooks and post-mortem reports.
  • Drive our compliance posture, ensuring that all our products and processes comply with relevant regulations and standards, especially during compliance audits.
What You Bring to the Table:
  • Bachelor's degree in Computer Science or related field.
  • 7+ years of experience in software engineering or SRE roles, with a focus on large-scale distributed systems.
  • Strong coding skills in at least one programming language, such as Java, Python, or Go.
  • Strong conviction in software development best practices, including version control, automated testing, and continuous integration and delivery.
  • Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
  • Experience with managing Kubernetes clusters and applications at scale.
  • Experience deploying applications on one or more cloud platforms such as AWS, Google Cloud Platform, or Microsoft Azure.
  • Experience defining and owning reliability-focused systems and processes (e.g., Incident Management, Post-mortem).
  • Experience with software development-related compliance processes (e.g., SOC 2, FedRAMP).
  • Experience with the following tech stack:
      • Infrastructure-as-code (e.g., Terraform, CloudFormation)
      • Automation frameworks (e.g., Jenkins, CircleCI)
      • Monitoring stacks (e.g., Prometheus and ELK)
      • Cloud security management (e.g., IAM, SSO)
      • Data processing technologies like Spark
How We'll Take Care of You

-Competitive Compensation; the estimated base salary range for this role is $150,000 - $220,000

-Equity Compensation; our success is your success with eligible participation in our company equity plan

-Health & Well-being; we'll invest in your physical and mental well-being with up to 90% health coverage (50% for spouses/dependents) including comprehensive medical, dental & vision benefits

-Financial Future; we'll invest in your financial well-being by making this role eligible to contribute to our company 401(k) or Roth 401(k) retirement plan

-Location; we are a remote-friendly company (internationally distributed across N. America + India), though some roles will be subject to in-person requirements in alignment with the needs of the business

-Generous Time Off; unlimited PTO (mandatory 1 week/year minimum), uncapped sick days, and 11 paid company holidays

-Company Camaraderie; annual company offsites and quarterly team onsites at our Sunnyvale HQ

-Food & Meal Allowance; weekly lunch stipend, in-office snacks/drinks

-Equipment; we'll provide you with the equipment you need to be successful and a one-time $500 stipend for your initial desk setup

-Child Bonding; 8 weeks off for parents (birthing, non-birthing, adoptive, foster, child placement, new guardianship) - fully paid so you can focus your energy on your newest addition

House Values

One Team

Optimize for the company, your team, self - in that order. We may fight long and hard in the trenches; take care of your co-workers with empathy. We give more than we take to build the one house that everyone dreams of being part of.

Tough & Persevering

We are building our company in a very large, fast-growing but highly competitive space. Life will get tough sometimes. We take hardships in stride, stay positive, focus all energy on the path forward, and develop a champion's mindset to overcome the odds. Always day one.

Keep Making It Better Always

Rome was not built in a day. If we can get 1% better each day for one year, we'll end up thirty-seven times better. This means being organized, communicating promptly, taking even small tasks seriously, tracking all small ideas, and paying it forward.

Think Big, Act Fast

We have tremendous scope for innovation, but we will still be judged by impact over time. Big, bold ideas still need to be strategized against priorities, broken down, and set in rapid motion: measure, refine, repeat. Great execution is what separates promising companies from proven unicorns.

Be Customer Obsessed

Everyone has the responsibility to drive towards the best experience for the customer, be it an OSS user or a paid customer. If something is broken, own it, say something, do something; never ignore it. Be the change that you want to see in the company.

Pay Range Transparency

Onehouse is committed to fair and equitable compensation practices. Our job titles may span more than one career level. The pay range for this role is listed above and represents the base salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages depend on several factors that are unique to each candidate, including but not limited to: job-related skills, depth of transferable experience, relevant certifications and training, business needs, market demands, and specific work location. Based on the factors above, Onehouse utilizes the full width of the range; the base pay range is subject to change and may be modified in the future. The total compensation package for this position will also include eligibility for equity options and the benefits listed above.


