Cork, Ireland
Data Engineer - Machine Learning

It's fun to work in a company where people truly BELIEVE in what they're doing!

We're committed to bringing passion and customer focus to the business.

Corporate Overview

Proofpoint is a leading cybersecurity company protecting organizations’ greatest assets and biggest risks: vulnerabilities in people. With an integrated suite of cloud-based solutions, Proofpoint helps companies around the world stop targeted threats, safeguard their data, and make their users more resilient against cyber attacks. Leading organizations of all sizes, including more than half of the Fortune 1000, rely on Proofpoint for people-centric security and compliance solutions mitigating their most critical risks across email, the cloud, social media, and the web.

We are singularly devoted to helping our customers protect their greatest assets and biggest security risk: their people. That’s why we’re a leader in next-generation cybersecurity.

About the Team

The AI Forge is an internal machine learning group that consults across Proofpoint's entire product portfolio. We are a group of machine learning scientists and software engineers who love keeping up with the latest ML research, fostering a collaborative and creative workplace, and solving challenging problems. Over the past several years, we have developed data-driven product features, leveraging a range of model architectures from tree-based models to state-of-the-art transformers.

We are launching an initiative in Cork, Ireland to work on impactful projects and products that apply state-of-the-art AI in support of Proofpoint’s Human Centric Security focus. We are looking for talented and motivated individuals to join this new team.

About the Role

As a Data Pipeline Engineer at Proofpoint, you will develop and maintain large-scale data ingestion, processing, and training pipelines within our Privacy Attested AI Platform. Your work will be critical in preparing large volumes of data for model training and then facilitating the training of those models.

We welcome applications from candidates at all experience levels (junior to senior).

Responsibilities

Design and implement scalable, high-performance data pipelines for ingesting and processing multi-modal cybersecurity data (emails, URLs, forensic logs).

Develop distributed training architectures, ensuring efficient multi-GPU model training across cloud and on-prem environments.

Ensure data integrity through de-duplication, transformation, and validation techniques.

Work within our privacy-compliant data handling environment, and collaborate with the privacy team while building pipelines.

Optimize data pipelines for high-throughput AI model training workflows.

Collaborate with AI infrastructure engineers, Machine Learning Scientists, and cloud infrastructure teams to align data processing with AI objectives.

Work with distributed computing frameworks (Spark, Ray, Dask, etc.) to scale data processing across multiple cloud environments.

Implement monitoring and observability tools for data lineage tracking and pipeline performance.

Qualifications

Strong experience in Python, Go, or Java for data pipeline and distributed computing development.

Hands-on experience with data pipeline frameworks (Apache Kafka, Spark, Flink, Airflow, or similar).

Understanding of distributed computing architectures for AI training (Ray, Kubernetes, PyTorch Distributed, or similar).

Familiarity with cloud-based data storage solutions (AWS S3, GCP BigQuery, Azure Data Lake).

Understanding of data security, access controls, and encryption techniques.

Preferred Qualifications (Senior-Level Candidates)

Expertise in high-scale distributed data systems.

Experience with privacy-preserving techniques such as federated learning and secure enclaves.

Prior experience working on data pipelines and scalable compute architectures for AI model training.

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!
