Data Engineer
2 weeks ago
Job Title: PySpark Developer
Company Overview
[Insert Company Name] is a leading organization in [industry/sector, e.g., data analytics, finance, or technology services], committed to handling large-scale data securely and efficiently. We are seeking a talented PySpark Developer to join our data engineering team, focusing on processing high-volume datasets while ensuring compliance with data privacy standards through PII identification and tokenization.
Job Summary
As a PySpark Developer, you will be responsible for building and optimizing data pipelines that ingest massive datasets from Hadoop systems. Your primary focus will be on scanning dataset fields to detect Personally Identifiable Information (PII), integrating tokenization services for data anonymization, and ensuring high-performance query execution. This role requires expertise in big data technologies, Python, and Apache Spark, with a strong emphasis on scalability, efficiency, and data security.
Key Responsibilities
- Design, develop, and maintain PySpark-based ETL pipelines that read and process multiple high-volume datasets from the Hadoop Distributed File System (HDFS).
- Analyze and traverse multiple fields within datasets to identify attributes containing PII data, using pattern matching, rules-based logic, or machine learning-assisted detection where applicable.
- Integrate and call external tokenization services to tokenize sensitive PII data for secure storage and processing, as well as de-tokenize data when required for authorized access.
- Optimize PySpark queries and data processing workflows to handle huge volumes of data efficiently, minimizing latency and resource consumption.
- Collaborate with data architects, security teams, and stakeholders to ensure compliance with data privacy regulations (e.g., GDPR, CCPA).
- Monitor and troubleshoot data pipeline performance, implementing best practices for partitioning, caching, and join optimizations in PySpark.
- Document code, processes, and data flows to support team knowledge sharing and maintainability.
- Participate in code reviews, testing, and deployment of data solutions in a CI/CD environment.
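For candidates wondering what the PII-scanning responsibility above can look like in practice, here is a minimal sketch of rules-based detection with regular expressions. It is plain Python so it runs without a cluster; the pattern set, the `detect_pii_fields` helper, the 0.8 threshold, and the sample rows are all assumptions made for illustration, not a description of the team's actual pipeline. In a PySpark job, similar patterns would typically be applied to a sampled DataFrame, for example with `Column.rlike`.

```python
import re

# Hypothetical patterns for illustration; production rules would be broader
# and validated against real data.
PII_PATTERNS = {
    "email": re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "us_phone": re.compile(r"^(\+1[ -]?)?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}$"),
}


def detect_pii_fields(rows, threshold=0.8):
    """Return {field: pii_type} for fields whose non-empty sampled values
    match a single pattern in at least `threshold` of cases."""
    matches = {}  # (field, pii_type) -> number of matching values
    totals = {}   # field -> number of non-empty values seen
    for row in rows:
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                continue  # empty values say nothing about the field
            text = str(value).strip()
            totals[field] = totals.get(field, 0) + 1
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.match(text):
                    key = (field, pii_type)
                    matches[key] = matches.get(key, 0) + 1
    flagged = {}
    for (field, pii_type), count in matches.items():
        if count / totals[field] >= threshold:
            flagged[field] = pii_type
    return flagged


# Made-up sample rows standing in for a sampled dataset.
sample = [
    {"id": 1, "contact": "ana@example.com", "tax_id": "123-45-6789", "city": "Springfield"},
    {"id": 2, "contact": "bo@example.org", "tax_id": "987-65-4321", "city": "Riverton"},
    {"id": 3, "contact": "cy@example.net", "tax_id": None, "city": "Lakeside"},
]
print(detect_pii_fields(sample))
```

Scoring each field by match ratio over a sample, rather than flagging on a single hit, keeps one stray value from marking a whole column as PII.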
Required Qualifications and Skills
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 5+ years of hands-on experience with Apache Spark and PySpark for big data processing.
- Advanced proficiency in Python for data processing, scripting, and integration with Spark applications.
- Proven expertise in working with Hadoop ecosystems, including HDFS, YARN, and related tools.
- Strong understanding of data privacy concepts, including PII identification techniques (e.g., regex patterns, entity recognition).
- Experience integrating APIs or services for tokenization/de-tokenization (e.g., RESTful services or custom microservices), and familiarity with cloud-based sensitive-data discovery tools such as Amazon Macie.
- Deep knowledge of handling large-scale data volumes, including data partitioning, shuffling, and broadcast joins in Spark.
- Solid understanding of query optimization strategies, such as cost-based optimization, predicate pushdown, and tuning Spark configurations (e.g., executor memory, parallelism).
- Proficiency in SQL for data querying.
- Experience with version control systems (e.g., Git) and agile methodologies.
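To make the tokenization/de-tokenization requirement above concrete, the sketch below shows a batched round trip. The `InMemoryTokenizer` class is a hypothetical local stand-in for an external tokenization service; its HMAC-based tokens, the `tok_` prefix, the in-memory vault, and the batch size are assumptions for illustration only, and a real integration would call the service over the network with its own token format. In a PySpark job, this kind of batching is commonly done per partition (for example with `mapPartitions`) so that each executor makes a few bulk calls instead of one call per row.

```python
import hashlib
import hmac


class InMemoryTokenizer:
    """Hypothetical local stand-in for an external tokenization service."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._vault = {}  # token -> original value, used for de-tokenization

    def tokenize_batch(self, values):
        tokens = []
        for value in values:
            digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
            # Truncated for readability; a real service defines its own format.
            token = "tok_" + digest[:16]
            self._vault[token] = value
            tokens.append(token)
        return tokens

    def detokenize_batch(self, tokens):
        return [self._vault[token] for token in tokens]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tokenize_column(values, service, batch_size=2):
    """Tokenize values with one service call per batch rather than per value."""
    tokens = []
    for batch in chunked(values, batch_size):
        tokens.extend(service.tokenize_batch(batch))
    return tokens


service = InMemoryTokenizer(secret=b"demo-only-secret")
emails = ["ana@example.com", "bo@example.org", "ana@example.com"]
tokens = tokenize_column(emails, service)
restored = service.detokenize_batch(tokens)
```

Because the stand-in is deterministic, equal inputs map to equal tokens, which keeps joins and group-bys on tokenized columns meaningful. Whether a real service behaves this way depends on its configuration.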
Preferred Qualifications
- Certifications in big data technologies (e.g., Databricks Certified Developer for Apache Spark, Cloudera Certified Data Engineer).
- Familiarity with cloud platforms like AWS, Azure, or GCP for big data processing.
- Knowledge of additional data security tools or frameworks (e.g., Apache Ranger, Kerberos for authentication).
- Experience with machine learning libraries in PySpark (e.g., MLlib) for advanced PII detection.
- Background in data governance or compliance roles.
What We Offer
- Competitive salary and benefits package.
- Opportunities for professional growth in a dynamic, innovative environment.
- Flexible work arrangements, including remote options.
- Access to cutting-edge tools and technologies for big data and AI.
-
Data Warehouse Developer
2 weeks ago
Whippany, New Jersey, United States Barclays Full time $120,000 - $175,000 per year
Join Barclays as a Data Warehouse Developer. You will play a key role in decommissioning legacy systems and transitioning to a strategic platform. This role requires you to be influential, enthusiastic, and self-motivated. You should be comfortable escalating issues to leadership when necessary and collaborating directly with team members to resolve...
-
Principal Engineer – Cloud Transformation
2 weeks ago
Whippany, New Jersey, United States Barclays Full time $200,000 - $280,000
Job Description
Purpose of the role
To drive technical excellence and innovation by leading the design and implementation of robust software solutions, providing mentorship to engineering teams, fostering cross-functional collaboration, and contributing to strategic planning to ensure the delivery of high-quality solutions aligned with business...
-
Senior Software Engineer
2 weeks ago
Whippany, New Jersey, United States Barclays Full time $170,000 - $230,000
Job Description
Purpose of the role
To design, develop and improve software, utilising various engineering methodologies, that provides business, platform, and technology capabilities for our customers and colleagues.
Accountabilities
Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and...
-
Release Engineer/Coordinator
2 days ago
Whippany, New Jersey, United States DS Technologies Inc Full time
Release Engineer/Coordinator
Location: Hybrid working at least 2 days a week from Whippany NJ office
End client: Barclays
High-level job description:
- Work on supporting Production Release coordination tasks for two US-based squads – involving multiple skill sets in Data/ETL and Java/API code artifacts
- Work closely with Scrum Master and Engineers –...
-
Senior Engineering Lead
7 days ago
Whippany, New Jersey, United States Barclays Full time
Embark on a transformative journey as a Senior Engineering Lead. At Barclays, our vision is clear - to redefine the future of banking and help craft innovative solutions. In this position, you will lead high performing teams to build secure, scalable platforms that power millions of transactions daily. You will influence strategic decisions, drive innovation...
-
Senior Engineering Lead
3 days ago
Whippany, New Jersey, United States Barclays Full time $170,000 - $230,000
Job Description
Purpose of the role
To design, develop and improve software, utilising various engineering methodologies, that provides business, platform, and technology capabilities for our customers and colleagues.
Accountabilities
Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and...
-
Software Engineer
2 weeks ago
Whippany, New Jersey, United States Barclays Full time $135,000 - $175,000
Job Description
Purpose of the role
To design, develop and improve software, utilising various engineering methodologies, that provides business, platform, and technology capabilities for our customers and colleagues.
Accountabilities
Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and...
-
Principal Engineer
7 days ago
Whippany, New Jersey, United States TalentAlly Full time
Barclays Corporate Digital Banking is hiring for a new Director role: Principal Engineer - Data and AI.
The Role Holder Will
- Produce options analysis and propose technical decisions for complex problems
- Hands-on engineer beta / PoC versions of GenAI solutions
- Demonstrate output to senior stakeholders
- Participate in Scaled Agile ceremonies to enable the AI...
-
Senior Extranet Networking Engineer
1 week ago
Whippany, New Jersey, United States Barclays Full time $170,000 - $230,000 per year
The Specialized Infrastructure team in Barclays is seeking a skilled Senior Extranet Networking Engineer with a background in designing and architecting network patterns in low-latency and high-frequency trading environments that encompass modern cybersecurity and firewall standards. You will play a key role by collaborating and engaging with key business...
-
Senior Connectivity Engineer
2 weeks ago
Whippany, New Jersey, United States Barclays Full time $170,000 - $230,000
Job Description
Purpose of the role
To design, develop and improve software, utilising various engineering methodologies, that provides business, platform, and technology capabilities for our customers and colleagues.
Accountabilities
Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and...