Anywhere Colombia, Colombia
9 days ago
Data Engineer

The Data Engineer is responsible for designing, building, and maintaining the infrastructure and systems used to collect, store, and process large datasets efficiently.

Education: Bachelor's degree in Computer Science and 8+ years of experience.

Experience:

Technical Skills
Programming Languages: Proficiency in Python, SQL, Java, or Scala for data manipulation and pipeline development.
Data Processing Frameworks: Experience with tools such as Apache Spark, Hadoop, or Apache Kafka for large-scale data processing.

Data Systems and Platforms
Databases: Knowledge of both relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).
Data Warehousing: Experience with platforms such as Snowflake, Amazon Redshift, and Azure Synapse.
Cloud Platforms: Familiarity with AWS and Azure for deploying and managing data pipelines. Good experience with Fabric is an advantage.
Distributed Computing: Experience with distributed computing systems such as Hadoop HDFS, Hive, or Spark.
Data Lakes: Managing and optimizing data lakes and Delta Lake storage for structured and unstructured data.

Data Modeling and Architecture
Expertise in designing efficient data models (e.g., star schema, snowflake schema) and maintaining data integrity. Understanding of modern data architectures such as Data Mesh or Lambda Architecture.

Data Pipeline Development
Building and automating ETL/ELT pipelines that extract data from diverse sources, transform it, and load it into target systems. Monitoring and troubleshooting pipeline performance and failures.

Workflow Orchestration
Hands-on experience with orchestration tools such as Azure Data Factory, AWS Glue jobs, AWS DMS, or Prefect to schedule and manage workflows.

Version Control and CI/CD
Using Git for version control and implementing CI/CD practices for data pipeline deployments.

Key Skills:

Proficiency in programming languages such as Python and SQL, and optionally Scala or Java. Proficiency in data processing frameworks such as Apache Spark and Hadoop is crucial for handling large-scale and real-time data. Expertise in ETL/ELT tools such as Azure Data Factory (ADF) and Fabric Data Pipeline is important for creating efficient, scalable data pipelines. A solid understanding of database systems, including relational databases such as MySQL and PostgreSQL and NoSQL solutions such as MongoDB and Cassandra, is fundamental. Experience with cloud platforms, including AWS and Azure, and their data-specific services (such as Amazon S3 and Azure Data Factory) is highly valuable; familiarity with Google BigQuery is also useful. Data modeling skills, including designing star and snowflake schemas, and knowledge of modern architectures such as Lambda and Data Mesh are critical for building scalable solutions.

Role and Responsibilities:

Responsible for designing, developing, and maintaining data pipelines and infrastructure that support our data-driven decision-making processes.
Design, build, and maintain data pipelines that extract, transform, and load data from various sources into our data warehouse and data lake.
Proficient in Databricks: creating notebooks, working with catalogs and native SQL, creating clusters, parameterizing notebooks, and administering Databricks.
Define security models and assign roles as required.
Responsible for creating data flows in Azure Synapse Analytics: integrating external source systems, creating external tables and data flows, and creating data models.
Schedule pipelines using jobs and triggers.
Design and develop data pipelines using Fabric pipelines and Spark notebooks that access multiple data sources.
Proficient in developing Databricks notebooks and optimizing data processing.
Develop and implement data models to ensure data integrity and consistency.
Manage and optimize data storage solutions, including databases and data warehouses.
Develop and implement data quality checks and validation procedures to ensure data accuracy and reliability.
Design and implement data infrastructure components, including data pipelines, data lakes, and data warehouses.
Collaborate with data scientists, analysts, and other stakeholders to understand business requirements and translate them into technical solutions.
Monitor Azure and Fabric data pipelines and Spark jobs, and implement fixes according to request priority.
Responsible for data monitoring activities; good knowledge of reporting tools such as Power BI and Tableau is required.
Responsible for understanding client requirements and architecting solutions on both the Azure and AWS cloud platforms.
Monitor and optimize data pipeline performance and scalability to ensure efficient data processing.