Data Engineer

2 weeks ago


Whippany, New Jersey, United States | Highbrow LLC | Full time | $120,000 - $180,000 per year

Job Title: PySpark Developer

Company Overview

We are an organization committed to handling large-scale data securely and efficiently. We are seeking a talented PySpark Developer to join our data engineering team, focusing on processing high-volume datasets while ensuring compliance with data privacy standards through PII identification and tokenization.

Job Summary

As a PySpark Developer, you will be responsible for building and optimizing data pipelines that ingest massive datasets from Hadoop systems. Your primary focus will be on scanning dataset fields to detect Personally Identifiable Information (PII), integrating tokenization services for data anonymization, and ensuring high-performance query execution. This role requires expertise in big data technologies, Python, and Apache Spark, with a strong emphasis on scalability, efficiency, and data security.

Key Responsibilities

  • Design, develop, and maintain PySpark-based ETL pipelines that read and process multiple high-volume datasets from the Hadoop Distributed File System (HDFS).
  • Analyze and traverse multiple fields within datasets to identify attributes containing PII data, using pattern matching, rules-based logic, or machine learning-assisted detection where applicable.
  • Integrate and call external tokenization services to tokenize sensitive PII data for secure storage and processing, as well as de-tokenize data when required for authorized access.
  • Optimize PySpark queries and data processing workflows to handle huge volumes of data efficiently, minimizing latency and resource consumption.
  • Collaborate with data architects, security teams, and stakeholders to ensure compliance with data privacy regulations (e.g., GDPR, CCPA).
  • Monitor and troubleshoot data pipeline performance, implementing best practices for partitioning, caching, and join optimizations in PySpark.
  • Document code, processes, and data flows to support team knowledge sharing and maintainability.
  • Participate in code reviews, testing, and deployment of data solutions in a CI/CD environment.
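
To make the PII-detection and tokenization responsibilities concrete, here is a minimal plain-Python sketch of rules-based field scanning followed by tokenization of the flagged fields. Everything in it is an assumption made for illustration: the pattern set, the 0.8 match threshold, the sample records, and the helper names `detect_pii_fields`, `tokenize`, and `tokenize_rows` are hypothetical, and `tokenize` is a local keyed-hash stand-in rather than a call to any real tokenization service.

```python
import hashlib
import hmac
import re

# Illustrative patterns only (assumption); production rules would be broader and validated.
PII_PATTERNS = {
    "email": re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
}


def detect_pii_fields(rows, threshold=0.8):
    """Return {field: pii_type} for fields whose non-empty values mostly match one pattern."""
    matches = {}  # (field, pii_type) -> number of matching values
    totals = {}   # field -> number of non-empty values seen
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                continue
            text = str(value).strip()
            totals[field] = totals.get(field, 0) + 1
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.match(text):
                    key = (field, pii_type)
                    matches[key] = matches.get(key, 0) + 1
    flagged = {}
    for (field, pii_type), count in matches.items():
        if count / totals[field] >= threshold:
            flagged[field] = pii_type
    return flagged


def tokenize(value, secret=b"demo-secret"):
    """Hypothetical local stand-in for an external tokenization service call."""
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def tokenize_rows(rows, flagged):
    """Replace the values of flagged fields with tokens; leave other fields untouched."""
    return [
        {f: tokenize(v) if f in flagged and v not in (None, "") else v
         for f, v in row.items()}
        for row in rows
    ]


# Made-up sample records standing in for rows sampled from a dataset.
sample = [
    {"id": 1, "contact": "ana@example.com", "tax_id": "123-45-6789", "city": "Whippany"},
    {"id": 2, "contact": "bo@example.org", "tax_id": "987-65-4321", "city": "Newark"},
]
flagged = detect_pii_fields(sample)
masked = tokenize_rows(sample, flagged)
```

A keyed hash like this is one-way, so it cannot support de-tokenization; a real service would typically keep a secured token-to-value mapping for authorized lookups. In a PySpark pipeline, logic of this kind would likely run on a sample of rows for detection and in per-partition batches for tokenization (for example with `mapPartitions`), to avoid one remote call per row.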

Required Qualifications and Skills

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
  • 5+ years of hands-on experience with Apache Spark and PySpark for big data processing.
  • Advanced proficiency in Python for data processing, scripting, and integration with Spark applications.
  • Proven expertise in working with Hadoop ecosystems, including HDFS, YARN, and related tools.
  • Strong understanding of data privacy concepts, including PII identification techniques (e.g., regex patterns, entity recognition).
  • Experience integrating APIs or services for tokenization/de-tokenization (e.g., RESTful services or custom microservices); familiarity with cloud-based sensitive-data discovery tools such as Amazon Macie is a plus.
  • Deep knowledge of handling large-scale data volumes, including data partitioning, shuffling, and broadcast joins in Spark.
  • Acute awareness of query optimization strategies, such as cost-based optimization, predicate pushdown, and tuning Spark configurations (e.g., executor memory, parallelism).
  • Proficiency in SQL for data querying.
  • Experience with version control systems (e.g., Git) and agile methodologies.

Preferred Qualifications

  • Certifications in big data technologies (e.g., Databricks Certified Developer for Apache Spark, Cloudera Certified Data Engineer).
  • Familiarity with cloud platforms like AWS, Azure, or GCP for big data processing.
  • Knowledge of additional data security tools or frameworks (e.g., Apache Ranger, Kerberos for authentication).
  • Experience with machine learning libraries in PySpark (e.g., MLlib) for advanced PII detection.
  • Background in data governance or compliance roles.

What We Offer

  • Competitive salary and benefits package.
  • Opportunities for professional growth in a dynamic, innovative environment.
  • Flexible work arrangements, including remote options.
  • Access to cutting-edge tools and technologies for big data and AI.

