Job Description:
Data Scientist / Data Warehouse Engineer (Unstructured Data Extraction & Processing)

Position Overview

Parsons Corporation is seeking a Data Scientist / Data Warehouse Engineer with a strong focus on unstructured data extraction and processing. The ideal candidate will design, develop, and maintain scalable data pipelines that integrate both structured and unstructured data from various sources. This role requires technical expertise in data engineering tools and best practices, as well as excellent communication and collaboration skills to work cross-functionally with data analysts, scientists, and stakeholders. Join a dedicated, distributed team of scientists, software architects, and software engineers responsible for developing a Generative Artificial Intelligence (GenAI) enabled capability to expedite the design of infrastructure projects such as highways and bridges.

Key Responsibilities

Unstructured Data Processing
- Extract, cleanse, and process unstructured data (e.g., text, logs, images) for use in analytics and machine learning.
- Develop and optimize custom ETL/ELT pipelines to handle complex data formats and large data volumes.

Data Pipeline Development
- Build robust and scalable data pipelines using Apache Spark, Hadoop, or Apache Beam.
- Automate workflows and schedule data processes using orchestration tools such as Apache Airflow, Prefect, or Luigi.

Data Warehousing & Storage
- Design, implement, and maintain modern data warehouse solutions (e.g., Databricks, Snowflake, Redshift, BigQuery).
- Manage both relational (SQL) and NoSQL databases for structured and unstructured data storage.

Cloud Integration
- Deploy and optimize data solutions on cloud platforms (Azure, AWS, or GCP).
- Leverage services like Azure Data Factory, AWS Glue, or Google Dataflow for seamless data ingestion and transformation.

Performance Optimization & Troubleshooting
- Monitor, diagnose, and improve data system performance and reliability.
- Collaborate with other teams to refine database queries, optimize ETL processes, and ensure data integrity.

Data Governance & Security
- Implement data quality checks, versioning, and security protocols in compliance with regulations (GDPR, CCPA).
- Ensure robust access controls and encryption measures for sensitive information.

Collaboration & Documentation
- Work closely with cross-functional teams to understand data requirements and deliver solutions.
- Document workflows, system designs, and troubleshooting procedures to support knowledge sharing and future maintenance.

Required Technical Skills

Programming
- Proficiency in Python for data processing and automation.
- Experience with shell scripting (e.g., Bash) is a plus.

Data Processing Frameworks
- Hands-on experience with Apache Spark, Hadoop, or Apache Beam.
- Familiarity with ETL/ELT processes and best practices.

Database & Querying
- Strong understanding of SQL, with experience in PostgreSQL, MySQL, or Oracle.
- Exposure to NoSQL databases like MongoDB, Cassandra, or DynamoDB.

Cloud Platforms
- Working knowledge of Azure (e.g., Data Factory, Synapse, Data Lake), AWS (e.g., S3, Redshift, Glue), or GCP (e.g., BigQuery, Dataflow).

Data Warehousing
- Experience with Databricks, Snowflake, Redshift, or BigQuery.

Data Pipelines & Orchestration
- Familiarity with workflow orchestration tools (Airflow, Prefect, Luigi).

Big Data Tools
- Proficiency working with distributed data systems like HDFS or cloud-native equivalents.

Version Control
- Skilled in Git for collaborative development and code versioning.

Experience

- Years of Experience: Minimum 4 years in data engineering, data warehousing, or a related field.
- Project Exposure: Demonstrated ability to build and optimize scalable data pipelines for both batch and real-time processing.
- Debugging & Optimization: Proven track record of diagnosing performance issues and optimizing data systems.
- Data Governance & Security: Experience implementing compliance with data privacy regulations and best practices in data quality and access controls.

Soft Skills

Problem-Solving
- Capable of independently troubleshooting complex data and system issues.

Communication
- Strong ability to collaborate with data analysts, scientists, and other engineers to translate business requirements into effective data solutions.

Documentation
- Competent in documenting data workflows, system designs, and troubleshooting steps clearly and concisely.

Team Collaboration
- Experience working in globally distributed, cross-functional teams, ideally within Agile or similar methodologies.

Education

- Bachelor’s or Master’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Equivalent practical experience can compensate for formal education in some cases.

Certifications (Optional but Valuable)

- AWS Certified Data Analytics – Specialty
- Google Professional Data Engineer
- Microsoft Azure Data Engineer Associate
- Databricks Certified Data Engineer Associate

Additional Considerations

- Analytical & Statistical Skills: A background in data analysis or data science is highly beneficial for designing effective data models and understanding business insights.
- Machine Learning Integration: Exposure to integrating machine learning pipelines, especially GenAI technology, is a plus.
- Innovative Mindset: Enthusiasm for exploring new tools, frameworks, and methodologies to continually optimize data solutions.

Why Join Us
- Impactful Role: Shape the architecture and strategy for unstructured data management and analytics, influencing key decisions and driving business value.
- Collaborative Environment: Work alongside a dynamic team of data professionals, leveraging cutting-edge technologies to solve real-world challenges.
- Professional Growth: Expand your technical acumen and leadership capabilities in a role that offers continuous learning and development opportunities.

Minimum Clearance Required to Start:
Not Applicable/None

Parsons is an equal opportunity employer committed to diversity in the workplace. Minority/Female/Disabled/Protected Veteran.