Big Data Processing: Implement and manage big data processing systems. Experience or strong interest in big data implementation is required.
Data Pipeline Implementation: Develop and maintain robust data processing pipelines. Candidates should have experience or a strong interest in data pipeline architectures.
Batch vs. Real-Time Processing: Clearly articulate the differences in implementation strategies between batch processing and real-time APIs.
Streaming Data Responsibilities: Explain the separation of responsibilities between producers and consumers in streaming data processes.
Database Management: Build and operate databases storing over 10 GiB of data, ensuring efficiency and scalability.
Data Platform Operations: Operate platforms such as Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Experience in managing these or similar platforms is highly desirable.
Qualifications and Skills
Experience with Big Data: Proven track record in handling large-scale data projects, with specific skills in time-series databases, streaming data processing, and multi-tiered database architectures.
Data Warehousing and Data Lakes: Hands-on experience with data warehouse and data lake technologies, including an understanding of the Lambda architecture.
Technical Proficiency: Strong technical skills in relevant big data technologies and frameworks.
Problem Solving: Excellent analytical and problem-solving skills, with the ability to manage complex data challenges.
Communication: Effective communication skills, with the ability to document and explain data processes clearly to both technical and non-technical stakeholders.
Nice to Have
Cloud Experience:
    Experience with cloud platforms such as AWS, Google Cloud Platform (GCP), or Microsoft Azure.
    Knowledge of cloud-based data storage solutions (e.g., S3, Google Cloud Storage, Azure Blob Storage).
    Familiarity with cloud-based data processing services (e.g., AWS Lambda, Google Cloud Dataflow, Azure Data Factory).
    Experience with cloud infrastructure automation and management tools (e.g., Terraform, CloudFormation, Ansible).
Machine Learning Integration: Understanding of integrating machine learning models into data pipelines.
DevOps Practices: Experience with DevOps practices and tools for continuous integration and deployment (CI/CD).
Data Security: Knowledge of data security best practices and compliance standards in cloud environments.
Visualization Tools: Experience with data visualization tools and platforms (e.g., Tableau, Power BI, Looker).
Programming Languages: Proficiency in additional programming languages relevant to data processing and backend development (e.g., Scala, Go, Rust).