Site Reliability Engineer

23 hours ago


Miami, FL, United States | Royal Caribbean Group | Full time

Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world.

We are proud to be the vacation-industry leader with global brands — including Royal Caribbean International, Celebrity Cruises, and Silversea Cruises — the most innovative fleet and private destinations, and the best people. Together, we are dedicated to turning the vacation of a lifetime into a lifetime of vacations for our guests.

Royal Caribbean Group’s Digital team has an exciting career opportunity for a full-time Site Reliability Engineer reporting to the Senior Manager, Site Reliability Engineer - Digital Operations.

This position will work on-site in Miami, Florida.

Position Summary:
As a Site Reliability Engineer (SRE) at Royal Caribbean, you will play a critical role in ensuring the reliability, performance, and seamless operation of our digital ecosystem. This includes our guest-facing mobile apps, websites, and the backend systems that power them. You will work collaboratively with development, operations, and product teams to build and maintain a highly resilient and scalable digital experience for our guests.

Essential Duties and Responsibilities:

  1. Incident Response and Resolution: Respond to and resolve production incidents, prioritizing guest-facing issues to minimize disruption. Conduct root cause analysis with guidance from senior team members and implement preventive measures to avoid recurrence.
  2. Monitoring and Observability: Build, maintain, and enhance monitoring tools and dashboards (using Prometheus, Grafana, or similar) to provide visibility into system health, performance, and guest impact. Proactively detect and address potential issues.
  3. Automation and Tooling: Develop and implement automation scripts and tools to streamline operations, reduce manual intervention, and improve system reliability. Utilize configuration management tools and infrastructure as code principles.
  4. Collaboration: Work closely with product teams to incorporate reliability principles into new feature development. Collaborate with operations teams to ensure smooth deployments and transitions.
  5. Documentation and Knowledge Sharing: Create and maintain clear documentation on system architecture, troubleshooting guides, and incident postmortems. Share knowledge and best practices with the team.
  6. On-Call Support: Participate in on-call rotation as defined by team needs, primarily focusing on acknowledging and escalating incidents, with guidance from senior team members.
  7. Working Hours: Non-standard working hours are expected, including morning, night, and weekend rotations.

Qualifications, Knowledge, and Skills:

  1. 3+ years of experience in IT operations, software development, or a related field.
  2. Bachelor’s degree in computer science or a related field preferred.
  3. Technical Expertise: Strong knowledge of mobile (iOS, Android) and web technologies, backend systems, cloud infrastructure (AWS, Azure, etc.), and database technologies.
  4. Programming: Proficiency in one or more programming languages or automation tools (e.g., Python, Java, Go, Jenkins) for scripting and automation.
  5. Working knowledge of Kubernetes is a strong plus.
  6. Monitoring and Observability: Experience with tools like Prometheus, Grafana, Splunk, or similar.
  7. Incident Management: Experience with incident management tools like PagerDuty, ServiceNow, or similar.
  8. Security: Understanding of security best practices, vulnerability identification, and incident response.
  9. Communication: Excellent written and verbal communication skills for collaborating with diverse teams and stakeholders.
  10. Customer Service: Understands and is aligned to the purpose of providing a great client experience (client-focused attitude).
  11. Detail Oriented: The ability to understand and appreciate the fine, granular details.
  12. SQL Database: Ability to work with large volumes of customer data. Ability to use Oracle SQL (or similar) to query databases and edit SQL queries.

Preferred Qualifications:

  1. Experience in the hospitality or travel industry.
  2. Familiarity with Royal Caribbean's digital ecosystem.
  3. Experience with high-traffic, guest-facing systems.
  4. Previous experience in working with ticket-based incident systems.
  5. ITIL v3 or v4 Foundations Certification.

We know there's a lot to consider. As you go through the application process, our recruiters will be glad to provide guidance and any relevant details to answer additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon.

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.

