Sr. Site Reliability Engineer

21 hours ago


Hawthorne CA United States SPACE EXPLORATION TECHNOLOGIES CORP Full time

SR. SITE RELIABILITY ENGINEER - TOP SECRET CLEARANCE

As a Senior Site Reliability Engineer, you will architect, develop, and test key aspects of the infrastructure for an in-house solution for analysis, simulation, prototyping, and operation of software in support of all SpaceX flight systems. You will have full ownership of the automation and technical infrastructure to support scalable high-performance web applications that manage large volumes of data in addition to a suite of simulation and test products. In this high-impact role, you will work across engineering groups to build a high-throughput distributed system that will be used to develop, demonstrate, and operate cutting-edge software and hardware. We are looking for smart, motivated software engineers who enjoy taking on complex challenges, work well in dynamic environments, and care about software best practices.

Our application software is critical to future mission success and we have no shortage of interesting challenges that require innovative, cutting-edge solutions.

RESPONSIBILITIES:

  • Architect application and database clusters leveraging microservices in support of SpaceX flight systems for both on-premises and cloud deployments
  • Develop automation to deploy and manage applications both on-premises and in the cloud utilizing infrastructure as code where necessary
  • Collaborate with software engineers to create highly scalable, operable and maintainable products
  • Collaborate with IT and software engineers to develop a test automation suite leveraging DevOps infrastructure
  • Develop policies and automation in collaboration with IT and software engineers to ensure compliance with DevSecOps best practices
  • Collaborate with IT and software engineers to develop policies and automation to ensure security compliance requirements are fulfilled
  • Engage in and improve the whole lifecycle of services -- from inception and design, through deployment, operation and refinement

BASIC QUALIFICATIONS:

  • Bachelor’s degree in computer science, information systems/IT, or an engineering discipline; OR 5+ years of professional experience in software, DevOps, or site reliability engineering in lieu of a degree
  • 3+ years of experience with Linux operating systems
  • Experience building and managing production systems leveraging containerization technologies (e.g., Docker, Kubernetes)
  • Experience with designing and managing solutions in cloud environments such as AWS, Azure or GCP
  • Experience in Bash, Python, and/or other scripting languages
  • Active Secret, Top Secret, Top Secret SCI, OR ability and willingness to obtain a Top Secret clearance

PREFERRED SKILLS AND EXPERIENCE:

  • 5+ years of systems administration, site reliability engineering, or DevOps experience
  • 3+ years of experience working with Kubernetes, Docker, or similar technologies
  • Strong understanding of message queue technologies such as RabbitMQ or Kafka
  • Strong understanding of virtualization and hypervisor technologies
  • Understanding of databases and performance tuning
  • Experience with identity management and authentication protocols
  • Focus on performance bottlenecks and performance improvement techniques
  • Excellent communication skills with the ability to communicate with customers, peers, management, etc. in both formal and informal situations
  • Ability to quickly learn new tools and frameworks

ADDITIONAL REQUIREMENTS:

  • Willing to work extended hours and weekends when needed