Design, develop, and optimize large-scale data pipelines for efficient data collection, processing, and storage.
Maintain and monitor existing data pipelines, ensuring high data availability and consistency.
Data Processing and Analysis:
Develop and refine data processing algorithms capable of handling and analyzing vast amounts of data.
Utilize distributed computing frameworks such as Hadoop and Spark for scalable data processing.
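As an illustration of the kind of Spark work this involves, here is a minimal PySpark sketch of a distributed aggregation; the input file, the user_id column, and the output path are hypothetical placeholders.

# Minimal PySpark aggregation sketch (illustrative only).
# Assumes a working Spark installation and a CSV of events with a user_id column.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event-aggregation").getOrCreate()

# Read raw events and count events per user as a simple distributed aggregation.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
counts = events.groupBy("user_id").agg(F.count("*").alias("event_count"))

# Write the result as Parquet for downstream consumers.
counts.write.mode("overwrite").parquet("event_counts/")
spark.stop()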
Data Storage and Management:
Design and maintain robust, large-scale data storage solutions using technologies such as HDFS, NoSQL databases like Cassandra, and BigQuery.
Implement data management practices that ensure data integrity and security.
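For illustration, a minimal sketch of loading processed results into BigQuery using the google-cloud-bigquery client; the project, dataset, table, and file names are hypothetical, and credentials are assumed to be configured via Application Default Credentials.

# Minimal BigQuery load sketch (illustrative only; all identifiers are hypothetical).
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.event_counts"  # hypothetical project.dataset.table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load a Parquet file into the table and block until the job completes.
with open("event_counts.parquet", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)
load_job.result()

print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")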
Collaboration and Communication:
Collaborate with data scientists, analysts, and other stakeholders to accurately understand data requirements and deliver effective solutions.
Communicate complex data concepts and insights clearly to non-technical stakeholders.
Stay informed on the latest trends and advancements in big data technologies.
Continuously evaluate and enhance the existing data infrastructure and processes to improve performance and scalability.
Requirements:
Bachelor's degree or higher in Computer Science, Data Science, or a related field.
A minimum of 2 years of relevant professional experience is preferred.
Proven experience in designing and maintaining data pipelines and storage solutions.
Strong technical proficiency with distributed computing frameworks like Spark.
Expertise in data storage and management systems such as HDFS, BigQuery, and NoSQL databases like Cassandra.