Overview: We are looking for a skilled and detail-oriented ETL Testing Specialist to join our dynamic team. This role involves working with large datasets and ensuring the accuracy, completeness, and performance of our ETL processes. The ideal candidate should have hands-on experience with Azure Databricks and proficiency in either Python or PySpark. The ETL Testing Specialist will play a critical role in validating data transformation processes, performing data quality checks, and ensuring that data pipelines are functioning as expected.
Key Responsibilities:
- Collaborate with Data Engineers and ETL developers to understand business requirements, data flows, and the data transformation process.
- Design, develop, and execute manual and automated ETL test scripts to validate the correctness of data transformation logic, data loads, and data migration.
- Validate data quality by performing data profiling, reconciliation, and accuracy checks across multiple environments.
- Identify and report defects related to ETL processes, provide detailed bug reports, and track them to resolution.
- Use Azure Databricks to run and validate ETL workflows, ensuring data consistency, integrity, and performance.
- Write and optimize Python or PySpark scripts for data transformation and testing.
- Perform regression, performance, and load testing for ETL processes to ensure optimal performance of data pipelines.
- Ensure that data is processed within specified time frames and meets business requirements.
- Work closely with cross-functional teams to ensure seamless integration of ETL processes and data pipelines.
- Develop and maintain ETL test plans, test cases, and documentation to ensure thorough testing coverage.
- Provide support for post-production testing and validation of data migration or data warehouse updates.
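The reconciliation and data quality checks described above can be sketched in plain Python. This is a minimal, hypothetical illustration only: the `reconcile` helper, table contents, and key names are all invented for the example, and in practice the rows would come from SQL queries or PySpark DataFrames rather than hard-coded lists.

```python
# Minimal sketch of a source-to-target reconciliation check, a common
# ETL validation step. Row data here is a hard-coded stand-in for rows
# that would normally be queried from the source and target systems.

def reconcile(source_rows, target_rows, key):
    """Compare keys and per-key values between source and target rows."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    missing = sorted(src.keys() - tgt.keys())      # in source, never loaded
    unexpected = sorted(tgt.keys() - src.keys())   # loaded with no source row
    mismatched = sorted(                           # loaded but values differ
        k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
    )
    return {"missing": missing, "unexpected": unexpected,
            "mismatched": mismatched}

source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 999}]

report = reconcile(source, target, key="id")
# report == {"missing": [], "unexpected": [], "mismatched": [2]}
```

In an automated test suite, checks like this would typically run as assertions after each pipeline stage, failing the build (and generating a defect report) whenever the returned lists are non-empty.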
Mandatory Skills:
- Azure Databricks: Hands-on experience with Azure Databricks for managing and orchestrating data pipelines and performing data transformations.
- Python or PySpark: Proficiency in Python or PySpark for developing test scripts, data manipulation, and automating ETL testing processes.
- Strong understanding of ETL concepts, data transformation, data validation, and testing.
- Experience with SQL and relational databases (e.g., SQL Server, MySQL, or Oracle) to query, validate, and verify data.
- Familiarity with testing methodologies, such as unit testing, functional testing, and data validation testing.
- Experience with Data Warehousing concepts and testing.
- Familiarity with version control tools like Git and with CI/CD pipelines.