Current jobs related to Site Reliability Engineer - Austin, Texas - Unreal Gigs


  • Austin, Texas, United States Apple Full time

    Job Title: Site Reliability Engineer. Job Summary: At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our ad-tech systems. Key Responsibilities: Implement and improve our infrastructure and...


  • Austin, Texas, United States Unreal Gigs Full time

    Job Summary: At Unreal Gigs, we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for building and maintaining highly reliable systems, automating infrastructure...


  • Austin, Texas, United States ORACLE AMERICA Full time

    Job Summary: Oracle America is seeking a skilled Site Reliability Developer 3 to join our team in Austin, TX. As a Site Reliability Developer, you will be responsible for solving complex problems related to infrastructure and cloud services, and building automation to prevent problem recurrence. Key Responsibilities: Solve complex problems related to...


  • Austin, Texas, United States Apple Full time

    Job Title: Site Reliability Engineering Manager. About the Role: Apple is seeking a highly skilled Site Reliability Engineering Manager to lead our cloud services team. As a Site Reliability Engineering Manager, you will be responsible for establishing SRE practices for our private cloud service to accelerate our ability to reliably and consistently deliver...


  • Austin, Texas, United States Oxford Knight Full time

    Database Site Reliability Engineer. Oxford Knight is seeking an experienced Database Site Reliability Engineer to join our Trading Systems Infrastructure team. As a key member of our team, you will be responsible for designing, building, and maintaining our diverse production database infrastructure, focusing on bare metal performance, scalability, and...


  • Austin, Texas, United States Futran Tech Solutions Pvt. Ltd. Full time

    Job Title: Site Reliability Engineer/Infrastructure Specialist. Location: Remote. Job Type: Full-time. About the Role: We are seeking a highly skilled Site Reliability Engineer/Infrastructure Specialist to join our team at Futran Tech Solutions Pvt. Ltd. The ideal candidate will have experience supporting internet-facing production services and distributed systems,...


  • Austin, Texas, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Apple. As a Site Reliability Engineer, you will play a vital role in designing, building, and maintaining our core infrastructure. This infrastructure enables thousands of Apple Developers to submit their Apps to the App Store that delight millions of Apple...


  • Austin, Texas, United States Apple Full time

    Job Summary: Apple is seeking a Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, performance, and maintenance of high-volume, highly available, mission-critical enterprise platforms and applications related to Apple Manufacturing & Product lifecycle. Key Responsibilities: Develop...


  • Austin, Texas, United States Terminal Industries Full time

    About Us: Terminal Industries is a leading provider of software solutions for the logistics industry. Our platform digitizes, indexes, and automates the yard, leveraging best-in-class machine learning to optimize truck, trailer, chassis, container, and personnel usage. Our Platform: Our platform provides warehouse operators with the intelligence needed to...


  • Austin, Texas, United States Apple Full time

    Job Summary: At Apple, we are seeking a highly skilled Site Reliability Engineer to join our Ad Platforms team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our ad-tech systems. Key Responsibilities: Design and implement infrastructure and application monitoring and observability capabilities to improve...


  • Austin, Texas, United States AutoRABIT Holding Inc. Full time

    About the Role: AutoRABIT Holding Inc. is seeking a highly skilled Senior Site Reliability/DevOps Engineer to join our team. As a key member of our cloud services team, you will be responsible for developing, scaling, and operating our cloud infrastructure. Key Responsibilities: Design, implement, and maintain scalable, resilient, and secure infrastructure using...


  • Austin, Texas, United States Terminal Industries Full time

    About Us: Terminal Industries builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers, and personnel. These are the fundamental operating assets of commerce - and represent...


  • Austin, Texas, United States Diverse Lynx Full time

    Job Description for Kafka SRE: As a Site Reliability Engineer for Kafka Platform, you will be responsible for carrying out SRE duties to ensure the smooth operation of the Kafka Streaming Platform. Your key responsibilities will include having a thorough understanding of the Kafka architecture, including producers, consumers, topics, and partitions. You will...


  • Austin, Texas, United States Apple Full time

    At Apple, we're looking for a talented Site Reliability Engineer to join our Apple Services Engineering team. As an SRE, you'll play a vital role in designing, building, and maintaining our core infrastructure, which enables thousands of Apple Developers to submit their Apps to the App Store that delight millions of Apple customers. We're seeking someone with...


  • Austin, Texas, United States Procore Technologies Full time

    Job Description: We're seeking a highly skilled Staff Site Reliability Engineer to join Procore's Project Execution Group. In this role, you'll lead, collaborate, and develop solutions to maintain the health of the core platform. The goal is to ensure the chosen design and architecture is highly available, performant, and reliable as this team is directly...


  • Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Description: At the Electric Reliability Council of Texas, we strive to create a dynamic work environment that fosters innovation and collaboration. Our team is dedicated to building a reliable and efficient power grid, and we're seeking a skilled Reliability and Compliance Engineer to join our efforts. As a key member of our team, you will work closely...

  • Planning Engineer

    1 month ago


    Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Summary: At the Electric Reliability Council of Texas (ERCOT), we are seeking a highly skilled Planning Engineer to join our Regional Planning team. As a key member of our team, you will be responsible for ensuring the reliable operation of the electric power grid in compliance with NERC Standards, ERCOT Protocols, and Market Guides. Key...


  • Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Description: At the Electric Reliability Council of Texas (ERCOT), we are committed to fostering a diverse and inclusive work environment that encourages collaboration and innovation. Our team of talented professionals is dedicated to building a sustainable future for the Texas power grid and wholesale market. Key Responsibilities: Monitor and analyze system...


  • Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Overview: At the Electric Reliability Council of Texas (ERCOT), we are seeking a highly skilled Power System Support Engineer to join our team. As a key member of our operations team, you will be responsible for providing engineering analysis and technical support to ensure the reliable operation of the electric power grid. Key Responsibilities: Perform...

  • Reliability Engineer

    4 weeks ago


    Austin, Texas, United States Solar Edge, LLC Full time

    About the Role: SolarEdge is a global leader in high-performance smart energy technology, with a diverse product offering that includes intelligent solar inverters, battery storage, backup systems, EV charging, and complete home energy management ecosystems. We are seeking a skilled Reliability Engineer to join our new SolarEdge Manufacturing Team in Austin,...

Site Reliability Engineer

4 weeks ago


Austin, Texas, United States Unreal Gigs Full time
Job Summary:

At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the high availability, scalability, and performance of our complex distributed systems. You'll be responsible for designing, implementing, and maintaining reliable systems, automating infrastructure management, and collaborating with cross-functional teams to drive system reliability and performance.

Key Responsibilities:

  • System Monitoring and Incident Management: Set up and manage monitoring, logging, and alerting systems using tools like Prometheus, Grafana, or ELK Stack. You'll proactively identify and resolve issues before they impact users and be responsible for managing incidents when they arise.
  • Automation and Infrastructure as Code (IaC): Automate everything from infrastructure provisioning to deployments and scaling, using tools like Terraform, Ansible, or Puppet to manage infrastructure as code. You'll ensure that systems are built to scale and adapt automatically to load.
  • High Availability and Performance Optimization: Ensure services and applications are always available and optimized for performance. You'll design and implement strategies to improve uptime, reduce latency, and scale services efficiently, using techniques such as load balancing, failover systems, and clustering.
  • Disaster Recovery and Backup Solutions: Design, implement, and test disaster recovery strategies and backup solutions. You'll ensure that systems and data are recoverable in the event of an outage or failure, minimizing downtime and impact on users.
  • Collaboration with Development and DevOps Teams: Work closely with developers and DevOps engineers to ensure that new features are reliable and scalable. You'll collaborate to implement reliability engineering practices such as service level indicators (SLIs) and service level objectives (SLOs) and enforce best practices for system reliability.
  • On-Call Responsibilities and Incident Response: Participate in on-call rotations to respond to incidents, troubleshoot problems, and bring systems back to normal operation. You'll ensure smooth communication during outages and post-mortems to improve future reliability.
  • Capacity Planning and Scalability: Perform capacity planning to ensure systems can handle traffic increases and growth. You'll predict future demand and ensure that infrastructure scales smoothly to accommodate it.
Requirements:

  • System Reliability and Automation Expertise: Experience with building and maintaining highly reliable systems and automating infrastructure management using tools like Terraform, Ansible, or Puppet. You're skilled at optimizing systems for uptime and performance.
  • Monitoring and Incident Management: Proficiency in setting up and managing monitoring, logging, and alerting systems like Prometheus, Grafana, or ELK Stack. You have experience with incident management and problem resolution.
  • Cloud Infrastructure Management: Hands-on experience managing cloud infrastructure on platforms such as AWS, GCP, or Azure. You're skilled at deploying and maintaining scalable systems in the cloud.
  • Performance Optimization: Expertise in optimizing systems for low latency, high throughput, and minimal downtime. You understand load balancing, caching strategies, and database performance optimization.
  • Security and Compliance: Understanding of security best practices, encryption, and compliance frameworks such as SOC2 or GDPR. You ensure that systems are secure while maintaining reliability.
Education and Experience:

  • Bachelor's degree in Computer Science, Systems Engineering, or a related field. Equivalent experience in site reliability engineering, systems administration, or DevOps is also valued.
  • Certifications such as AWS Certified Solutions Architect, Kubernetes Administrator, or SRE Practitioner are a plus.
  • 3+ years of experience in site reliability engineering or a similar role, with a focus on system automation, performance optimization, and cloud infrastructure management.
  • Proven experience managing large-scale, distributed systems with a focus on maintaining uptime, monitoring, and incident resolution.
  • Hands-on experience with containerization (Docker) and orchestration (Kubernetes) in a production environment.
Benefits:

  • Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
  • Paid Time Off: Competitive vacation, sick leave, and 20 paid holidays per year.
  • Work-Life Balance: Flexible work schedules and telecommuting options.
  • Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
  • Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
  • Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
  • Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
  • Tuition Reimbursement: Financial assistance for continuing education and professional development.
  • Community Engagement: Opportunities to participate in community service and volunteer activities.
  • Recognition Programs: Employee recognition programs to celebrate achievements and milestones.