ETL / ELT Quiz (200 Questions)

200 exercises, 10 free

1. In ETL, when does the Transformation step occur?
2. ELT stands for:
3. A key difference between ETL and ELT is:
4. ETL is often chosen when:
5. ELT is often preferred with modern cloud warehouses because:
6. The main steps in ETL are:
7. One advantage of ELT over ETL is:
8. When choosing ETL vs. ELT, one key consideration is:
9. ETL was traditionally favored in legacy systems because:
10. One drawback of ETL is:
11. ELT leverages the target system's resources to:
12. Both ETL and ELT ultimately aim to:
13. ETL pipelines often run on a schedule to:
14. ELT is more flexible when requirements change because:
15. ELT often aligns well with modern data lake or warehouse strategies because:
16. One disadvantage of ELT is:
17. Deciding between ETL and ELT depends on:
18. ELT often pairs well with:
19. ETL pipelines might struggle with changing requirements because:
20. Ultimately, ETL and ELT are chosen based on:
21. Data extraction involves:
22. Incremental extraction means:
23. Handling schema changes in extraction involves:
24. Extracting from relational databases often uses:
25. APIs for extraction may require handling:
26. Files (CSV, JSON) extraction involves:
27. Handling authentication during extraction might involve:
28. When a source system is slow, extraction strategies might include:
29. Ensuring data integrity during extraction might involve:
30. Source system locks and concurrency issues during extraction can be mitigated by:
31. Metadata extraction during the extraction phase involves:
32. Logging extraction events helps by:
33. Scheduling extractions often involves:
34. Dealing with unreliable sources may involve:
35. Handling different file formats (CSV, JSON, XML) requires:
36. Some ETL tools come with built-in source connectors to:
37. Minimizing network transfers during extraction might involve:
38. Validating extracted data ensures:
39. Metadata like source timestamps help by:
40. Ensuring a stable extraction process might mean:
41. Data transformation often includes:
42. Converting date formats and normalizing data units is part of:
43. Business rules in transformations might include:
44. SQL-based transformations in the ELT approach use:
45. Handling Slowly Changing Dimensions (SCD) in transformations means:
46. Joining multiple data sources in transformation steps is common to:
47. Applying data quality checks mid-transformation ensures:
48. Debugging transformation logic often involves:
49. Using frameworks like Spark for transformation helps with:
50. Ensuring transformations are idempotent means:
51. Applying aggregations (e.g., sum, avg) during transformations helps to:
52. Handling character encoding issues (UTF-8 vs. ASCII) in transformations ensures:
53. Versioning transformation logic means:
54. Pushdown transformations refer to:
55. Idempotent transformations mean if rerun:
56. Reusability in transformations can be achieved by:
57. Handling schema evolution during transformation involves:
58. Transformation performance optimization may include:
59. Debugging transformation errors might use:
60. Once transformations are finalized:
61. Data loading involves:
62. Bulk loading can improve performance by:
63. Incremental loading (upserts) means:
64. Managing indexes during load may involve:
65. Timing loads during off-peak hours can:
66. Transactions during load ensure:
67. Verifying load success might involve:
68. Partitioned loading improves performance by:
69. Using target-specific load utilities (e.g., COPY command in Redshift) can:
70. Handling load failures might involve:
71. Distinguishing between full refresh and incremental refresh loading means:
72. Post-load validations ensure:
73. Notifications on load completion or errors help by:
74. Balancing load tasks means:
75. Late-arriving data might be handled by:
76. Atomicity of loads ensures:
77. Archiving or purging old data during load cycles is done to:
78. Ensuring consistency and atomicity of loads might require:
79. In ELT scenarios, loading raw data first allows:
80. After successful load:
81. Traditional ETL tools (Informatica, Talend) often:
82. Modern ELT tools integrate with cloud warehouses like:
83. Orchestration tools (Airflow, Luigi) help by:
84. SaaS integration platforms (Fivetran, Stitch) often:
85. Using Python scripts for ETL can be advantageous for:
86. Dockerizing ETL jobs provides:
87. CI/CD pipelines for ETL code mean:
88. On-prem ETL tools vs. cloud-native solutions differ in:
89. Data virtualization tools help by:
90. Evaluating open-source vs. commercial ETL solutions involves:
91. Performance benchmarks among ETL tools help to:
92. Leveraging Spark or Flink in transformations is useful for:
93. DataOps integrates ETL/ELT with:
94. Integration with data catalogs helps ETL/ELT by:
95. Using message queues (Kafka) in extraction helps by:
96. ETL in a microservices architecture might mean:
97. Impact of an orchestration tool’s scheduling features:
98. Version control of ETL scripts and configs helps with:
99. Selecting a tool based on data volume and complexity means:
100. Considering team skill sets for tool selection means:
101. Identifying bottlenecks in ETL/ELT pipelines often involves checking:
102. Parallelization strategies might include:
103. Partitioning data for parallel processing helps by:
104. Efficient file formats like Parquet or ORC improve performance by:
105. Compressing data before transfer reduces:
106. Memory management for large ETL jobs can be improved by:
107. Caching intermediate results might help if:
108. Query optimization in ELT scenarios includes:
109. Minimizing unnecessary data movement means:
110. Scheduling ETL jobs off-peak can improve performance by:
111. Monitoring runtime metrics (CPU, memory, throughput) helps:
112. Choosing incremental over full loads can improve performance by:
113. Using columnar storage in the target system helps because:
114. Monitoring runtime with trend analysis helps:
115. Retry and backoff strategies during extraction and loading help by:
116. Eliminating unnecessary transformations means:
117. Code profiling in ETL scripts helps by:
118. Reducing unnecessary data movement (ELT vs. ETL) can improve performance by:
119. Adopting streaming ETL for continuous processing improves performance for:
120. Automating performance regression tests means:
121. Implementing validation checks at extraction ensures:
122. Data cleansing involves:
123. Standardizing reference data during transformation helps:
124. Data lineage means:
125. Auditing changes and maintaining historical versions of data allows:
126. Data quality metrics (completeness, consistency) help by:
127. Error-handling pipelines (quarantining bad records) mean:
128. Role-based access controls in ETL/ELT governance ensure:
129. Metadata management helps by:
130. Ensuring consistency between source and target schemas prevents:
131. Self-service data quality checks mean:
132. SLA definitions for data timeliness and correctness ensure:
133. Logging quality metrics allows:
134. Governance frameworks (like DAMA) applied to ETL/ELT mean:
135. Aligning ETL/ELT practices with organizational policies ensures:
136. Continuous improvement cycles for data quality involve:
137. Ensuring completeness means:
138. Consistency checks ensure:
139. Accuracy checks might compare extracted data against:
140. Integrating with data governance tools means:
141. Encrypting data in transit ensures:
142. Using secure protocols (TLS/SSL) for data extraction from APIs prevents:
143. Masking or tokenizing PII fields in transformations ensures:
144. Applying column-level encryption or hashing can:
145. Strict access controls on ETL pipelines mean:
146. Complying with regulations like GDPR may involve:
147. Auditing and logging who accessed ETL data ensures:
148. Using secrets managers for credentials instead of hardcoding prevents:
149. Minimizing data movement of sensitive records may mean:
150. Regular compliance audits and ETL process reviews ensure:
151. Using VPCs or private networking for data transfers ensures:
152. Complying with HIPAA for healthcare data might require:
153. Separation of duties in ETL/ELT management means:
154. Using temporary credentials or tokens instead of static keys enhances security by:
155. Applying the principle of least privilege to ETL system accounts means:
156. Sanitizing logs and debug info means:
157. Ensuring data disposal and retention policies are followed means:
158. Using anonymized or synthetic test data prevents:
159. Regular compliance audits might involve:
160. Storing sensitive data only in encrypted form reduces risk if:
161. Real-time ETL differs from batch ETL by:
162. Change Data Capture (CDC) techniques detect:
163. Tools like Debezium or Attunity assist with:
164. Using Kafka or Kinesis in streaming ETL means:
165. Micro-batching vs. true streaming differs by:
166. Handling out-of-order events in streaming ETL often requires:
167. Ensuring exactly-once delivery in streaming ETL means:
168. State management in streaming transformations involves:
169. Windowing functions (tumbling, sliding windows) in streaming help by:
170. Dealing with backpressure and rate limiting ensures:
171. Transforming on the fly vs. storing raw events first is a choice between:
172. Choosing storage sinks for streaming outputs (NoSQL, data lakes) depends on:
173. Monitoring streaming pipelines for lag and throughput helps by:
174. Recovery and fault tolerance in streaming ETL might use:
175. Schema evolution in a streaming environment means:
176. Watermarks in event-time processing help by:
177. Integration with stream processing frameworks (Flink, Spark Structured Streaming) means:
178. Real-time alerting on data quality in streams allows:
179. Balancing latency vs. completeness means:
180. Continuous integration and deployment for streaming pipelines means:
181. Unit tests for individual transformation logic mean:
182. Integration tests across the entire ETL pipeline ensure:
183. Using mock sources and targets in tests allows:
184. Regression testing ensures:
185. Load testing ETL pipelines helps by:
186. Synthetic test data generation helps test:
187. Integrating tests with CI/CD means:
188. Automated error detection and retries in pipelines mean:
189. Monitoring tools (e.g., Airflow UI) provide:
190. Alerting on job failures or SLA breaches means:
191. Logging best practices (structured logs) aid in:
192. Version control of ETL pipeline definitions means:
193. Auditing job runs (who triggered them, when) provides:
194. Scalability tests ensure:
195. Using checksums and row counts for validation in tests means:
196. Canary deployments for ETL changes mean:
197. Performance metrics (CPU, memory) in monitoring help identify:
198. Trend analysis in monitoring dashboards means:
199. Security scans on ETL code (linters, secret detection) ensure:
200. Continuous improvement loops from monitoring data mean: