Data Engineer
Cummins Inc.
**DESCRIPTION**
**Key Responsibilities:**
+ Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
+ Continuously monitor and troubleshoot data quality and data integrity issues.
+ Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
+ Develop reliable, efficient, scalable, high-quality data pipelines with monitoring and alerting mechanisms using ETL/ELT tools or scripting languages (a minimal pipeline sketch follows this list).
+ Develop physical data models and implement data storage architectures as per design guidelines.
+ Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
+ Participate in testing and troubleshooting of data pipelines.
+ Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
+ Use agile development practices, such as DevOps, Scrum, Kanban, and continuous improvement cycles, for data-driven applications.
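For illustration, here is a minimal sketch of the kind of batch ETL pipeline with a data-quality gate that these responsibilities describe, written in PySpark for a Databricks-style environment. The source path, table name, and quality threshold are hypothetical placeholders, not references to any Cummins system.

```python
# Minimal batch ETL sketch: extract raw JSON, clean it, gate on quality,
# and load to a Delta table. All names and paths are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: ingest raw event data (source path is an assumption).
raw = spark.read.json("/mnt/raw/orders/")

# Transform: standardize types and drop malformed rows.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropna(subset=["order_id", "order_ts"])
)

# Data-quality gate: fail loudly if too many rows were dropped, feeding the
# monitoring and alerting called for above. The 95% threshold is illustrative.
raw_count, clean_count = raw.count(), clean.count()
if raw_count and clean_count / raw_count < 0.95:
    raise ValueError(f"Quality check failed: kept {clean_count}/{raw_count} rows")

# Load: append to a Delta table for downstream consumers.
clean.write.format("delta").mode("append").saveAsTable("analytics.orders")
```

Embedding the quality gate in the pipeline itself is one way to surface data-integrity issues early rather than discovering them downstream.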
**RESPONSIBILITIES**
**Competencies:**
+ **System Requirements Engineering:** Translate stakeholder needs into verifiable requirements; establish acceptance criteria; track status throughout the system lifecycle; assess impact of changes.
+ **Collaborates:** Build partnerships and work collaboratively with others to meet shared objectives.
+ **Communicates Effectively:** Develop and deliver multi-mode communications that convey a clear understanding of the unique needs of different audiences.
+ **Customer Focus:** Build strong customer relationships and deliver customer-centric solutions.
+ **Decision Quality:** Make good and timely decisions that keep the organization moving forward.
+ **Data Extraction:** Perform ETL activities on data from various sources and transform it for consumption by downstream applications and users.
+ **Programming:** Create, write, and test computer code, test scripts, and build scripts using industry standards and tools.
+ **Quality Assurance Metrics:** Apply measurement science to assess solution outcomes using IT Operations Management (ITOM) and SDLC standards, tools, metrics, and KPIs.
+ **Solution Documentation:** Document information and solutions based on knowledge gained during product development activities.
+ **Solution Validation Testing:** Validate configuration item changes or solutions using SDLC standards and metrics (a unit-testing sketch follows this list).
+ **Data Quality:** Identify, understand, and correct data flaws to support effective information governance.
+ **Problem Solving:** Solve problems using systematic analysis processes and industry-standard methodologies.
+ **Values Differences:** Recognize the value that different perspectives and cultures bring to an organization.
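As a purely illustrative example of the Programming and Solution Validation Testing competencies applied to ETL code: keeping transformations as small, self-contained functions makes them unit-testable with pytest and a local SparkSession. All function and column names below are hypothetical.

```python
# Unit-testing an ETL transformation in isolation with a local SparkSession.
import pytest
from pyspark.sql import SparkSession, DataFrame, functions as F

def standardize_orders(df: DataFrame) -> DataFrame:
    """Cast amount to double and drop rows missing an order_id."""
    return (df.withColumn("amount", F.col("amount").cast("double"))
              .dropna(subset=["order_id"]))

@pytest.fixture(scope="module")
def spark():
    # Local single-threaded session, sufficient for small test fixtures.
    return SparkSession.builder.master("local[1]").appName("etl_tests").getOrCreate()

def test_standardize_orders_drops_null_ids(spark):
    df = spark.createDataFrame(
        [("A1", "10.5"), (None, "3.0")], ["order_id", "amount"]
    )
    result = standardize_orders(df)
    assert result.count() == 1                       # null order_id row dropped
    assert dict(result.dtypes)["amount"] == "double"  # type was cast
```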
**Education, Licenses, Certifications:**
+ College, university, or equivalent degree in a relevant technical discipline, or equivalent relevant experience, is required.
+ This position may require licensing for compliance with export controls or sanctions regulations.
**Nice to Have Experience:**
+ Understanding of the ML lifecycle.
+ Exposure to Big Data open source technologies.
+ Familiarity with clustered compute cloud-based implementations.
+ Experience developing applications requiring large file movement in a cloud-based environment.
+ Exposure to building analytical solutions and IoT technology.
**Work Environment:**
+ Most work will be with stakeholders in the US, with a 2-3 hour overlap with US Eastern Time as needed.
+ This role will be Hybrid.
**QUALIFICATIONS**
**Experience:**
+ 3-5 years of experience in data engineering with a strong background in Azure Databricks and Scala/Python.
+ Hands-on experience with Spark (Scala/PySpark) and SQL.
+ Experience with Spark Streaming, Spark internals, and query optimization (a streaming sketch follows this list).
+ Proficiency in Azure Cloud Services.
+ Experience in Agile Development and Unit Testing of ETL.
+ Experience creating ETL pipelines with ML model integration.
+ Knowledge of Big Data storage strategies (optimization and performance).
+ Critical problem-solving skills.
+ Basic understanding of data models (SQL/NoSQL), including Delta Lake or Lakehouse architectures.
+ Quick learner.
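For illustration, a hedged sketch of the Spark Structured Streaming plus Delta Lake pattern these items describe; the Kafka topic, broker address, checkpoint path, and table name are all assumptions made for the example.

```python
# Structured Streaming sketch: read a Kafka topic, land it in a Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry_stream").getOrCreate()

# Read a Kafka topic as a stream (connection details are placeholders).
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "telemetry")
         .load()
         .select(F.col("value").cast("string").alias("payload"),
                 F.col("timestamp"))
)

# Write to a Delta table; the checkpoint location is what lets the stream
# recover with exactly-once semantics after a failure.
query = (
    events.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/telemetry")
          .outputMode("append")
          .toTable("lakehouse.telemetry_raw")
)
query.awaitTermination()
```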
**Job** Systems/Information Technology
**Organization** Cummins Inc.
**Role Category** Remote
**Job Type** Exempt - Experienced
**ReqID** 2409183
**Relocation Package** Yes