Senior Site Reliability Engineer

2 months ago


Austin, Texas, United States Visa Full time

Company Overview


Visa stands as a global frontrunner in digital payment solutions, orchestrating over 215 billion transactions annually across a vast network of consumers, merchants, financial institutions, and governmental bodies in more than 200 nations.

Our vision is to unite the globe through the most advanced, convenient, reliable, and secure payment infrastructure, empowering individuals, enterprises, and economies to flourish.

Joining Visa means becoming part of a community that values purpose and inclusivity – where your development is prioritized, your identity is celebrated, and your contributions are significant.

We are committed to fostering economies that are inclusive of all, enhancing the lives of individuals everywhere.

Your role will directly influence billions globally, facilitating financial access and shaping the future of monetary transactions.


Join Visa: A Network Committed to Inclusivity.

Position Overview

The Product Reliability Engineering (PRE) division is an integral part of Visa's technological framework.

This team is tasked with the stewardship and support of Visa's data resources, delivering enhanced products and services that drive innovation for our partners and clients, both within Visa and on a global scale.

The Big Data Platform Team within Product Reliability Engineering focuses on supporting an open-source Big Data ecosystem and associated services at Visa.

As a Senior Site Reliability Engineer, you will oversee monitoring, diagnosing, automating, and continuously enhancing software products and tools to bolster the availability and resilience of open-source platforms at Visa.


Key Responsibilities:
  • Administer and engineer solutions on open-source technologies such as Hadoop, Spark, Airflow, and machine learning platforms operating on open-source Kubernetes clusters.
  • Exhibit strong troubleshooting and debugging capabilities.
  • Foster cross-team collaboration, building and nurturing relationships with customer teams, user communities, architects, and engineering teams to ensure production scalability and stability.
  • Conduct effective root-cause analysis of significant production incidents and develop comprehensive documentation for learning.
  • Plan and execute capacity expansions and upgrades promptly to prevent scaling issues and bugs.
  • Automate repetitive tasks to minimize manual effort and reduce human error.
  • Optimize alerting systems and establish observability to proactively identify issues and performance challenges.
  • Utilize DevOps tools and methodologies (incident, problem, and change management) in daily operations.
  • Implement automation and self-healing processes as required.
  • Lead investigations into root causes of Kubernetes application service failures and support escalation processes.
  • Ensure Kubernetes platform services meet performance and SLA standards effectively.
  • Enhance security and hardening of the Kubernetes cluster with monitoring and auditing dashboards.

This role operates in a hybrid work environment. Employees in hybrid roles are expected to work from the office 2-3 designated days a week, as determined by leadership, with a general expectation of being in the office 50% or more of the time based on business requirements.

Qualifications

Basic Qualifications:
5+ years of relevant experience with a Bachelor's degree, or at least 2 years of experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 0 years of experience with a PhD, or 8+ years of relevant experience.

Preferred Qualifications:
  • 6 or more years of experience with a Bachelor's degree, or 4 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD.
  • At least 3 years of hands-on experience with on-premises container infrastructure; OpenShift or open-source Kubernetes preferred.
  • Familiarity with infrastructure operations and production support of container technologies and orchestration platforms is advantageous.
  • Knowledge of Docker/Kubernetes deployment, configuration, scaling, and management of containerized applications is beneficial.
  • Experience in managing and optimizing the performance of Hadoop platforms.
  • Extensive knowledge of the Hadoop ecosystem, including HDFS, YARN, Hive, and Spark.
  • Proficiency in Shell and Python programming for automating repetitive DevOps tasks.
  • Understanding of security tools such as Kerberos and Ranger.
  • Strong knowledge of and experience with Unix/Linux systems administration and related technologies.
  • Experience with configuration management tools like Chef and Ansible is a plus.
  • Familiarity with monitoring and logging tools such as Prometheus and Grafana is advantageous.
  • Excellent verbal and written communication skills, along with strong analytical and problem-solving abilities.
  • Self-motivated with the ability to work independently.

Additional Information

Work Hours:
Varies based on departmental needs.

Travel Requirements:
This position may require travel 5-10% of the time.

Mental/Physical Requirements:
This position will be performed in an office environment.


The role necessitates the ability to sit and stand at a desk, communicate in person and via telephone, and frequently operate standard office equipment, such as telephones and computers.

Visa is an Equal Employment Opportunity Employer.


Qualified candidates will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status.


Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.