2. Position Title: Senior Data Engineer
Experience Required: 5-10 Years
Position Summary:
We are seeking a Senior Data Engineer with 5 to 10 years of hands-on experience in building and managing data pipelines and big data infrastructure. The ideal candidate will have a strong technical background in data engineering tools, big data ecosystems, and programming languages. This role focuses on designing, implementing, and optimizing complex data workflows using Apache Airflow, Apache NiFi, Kafka, PySpark, Spark Scala, and other key technologies. Proficiency in Python, Java, and shell scripting is highly preferred.
Key Responsibilities:
- Data Pipeline Design & Management: Design, build, and maintain scalable and reliable data pipelines to support both real-time and batch processing requirements.
- Workflow Optimization & Monitoring: Manage, monitor, and optimize workflows in Apache Airflow to ensure data quality, integrity, and system performance.
- Data Integration: Develop and integrate data flows using Apache NiFi to ensure seamless data ingestion and transformation processes.
- Real-Time Data Streaming: Work extensively with Apache Kafka for data streaming and messaging across various data sources.
- Data Processing Solutions: Implement data processing solutions using PySpark and Spark Scala to handle large-scale datasets and complex transformations.
- Automation & Scripting: Write efficient code in Python and Java to automate data workflows and support data engineering needs, utilizing shell scripting for operational tasks.
- Cross-functional Collaboration: Collaborate with cross-functional teams to understand data requirements and provide optimized engineering solutions.
- Data Security & Compliance: Ensure data security, compliance, and performance by following best practices in big data and distributed systems.
- Continuous Improvement: Continuously improve the performance, scalability, and reliability of data processing pipelines.
Required Skills and Experience:
- Apache Airflow: Extensive experience in managing, scheduling, and monitoring data pipelines.
- Apache NiFi: Strong experience in designing data flows for ingestion and transformation.
- Apache Kafka: In-depth knowledge of Kafka for real-time data streaming and messaging systems.
- PySpark & Spark Scala: Proficiency in using PySpark and Spark Scala for large-scale data processing.
- Programming Languages: Strong experience with Python and Java, with additional expertise in shell scripting.
- Big Data Knowledge: Familiarity with big data ecosystems and distributed data processing.
- Problem-Solving & Teamwork: Ability to work independently and collaboratively in a fast-paced environment, solving complex data engineering challenges.
Educational Qualifications:
- Required: Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.
Preferred Qualifications:
- Cloud Experience: Experience in cloud environments (AWS, GCP, Azure) with big data components.
- Version Control & CI/CD: Experience with version control tools (e.g., Git) and CI/CD practices in data engineering.
- Analytical & Communication Skills: Strong analytical, problem-solving, and communication skills.