ETL / ELT Quiz (200 Questions)

200 exercises, 10 free
  1. In ETL, when does the Transformation step occur?
  2. ELT stands for:
  3. A key difference between ETL and ELT is:
  4. ETL is often chosen when:
  5. ELT is often preferred with modern cloud warehouses because:
  6. The main steps in ETL are:
  7. One advantage of ELT over ETL is:
  8. When choosing ETL vs. ELT, one key consideration is:
  9. ETL was traditionally favored in legacy systems because:
  10. One drawback of ETL is:
  11. ELT leverages the target system's resources to:
  12. Both ETL and ELT ultimately aim to:
  13. ETL pipelines often run on a schedule to:
  14. ELT is more flexible when requirements change because:
  15. ELT often aligns well with modern data lake or warehouse strategies because:
  16. One disadvantage of ELT is:
  17. Deciding between ETL and ELT depends on:
  18. ELT often pairs well with:
  19. ETL pipelines might struggle with changing requirements because:
  20. Ultimately, ETL and ELT are chosen based on:
  21. Data extraction involves:
  22. Incremental extraction means:
  23. Handling schema changes in extraction involves:
  24. Extracting from relational databases often uses:
  25. APIs for extraction may require handling:
  26. Extracting from files (CSV, JSON) involves:
  27. Handling authentication during extraction might involve:
  28. When a source system is slow, extraction strategies might include:
  29. Ensuring data integrity during extraction might involve:
  30. Source system locks and concurrency issues during extraction can be mitigated by:
  31. Metadata extraction during the extraction phase involves:
  32. Logging extraction events helps by:
  33. Scheduling extractions often involves:
  34. Dealing with unreliable sources may involve:
  35. Handling different file formats (CSV, JSON, XML) requires:
  36. Some ETL tools come with built-in source connectors to:
  37. Minimizing network transfers during extraction might involve:
  38. Validating extracted data ensures:
  39. Metadata like source timestamps help by:
  40. Ensuring a stable extraction process might mean:
  41. Data transformation often includes:
  42. Converting date formats and normalizing data units is part of:
  43. Business rules in transformations might include:
  44. SQL-based transformations in the ELT approach use:
  45. Handling Slowly Changing Dimensions (SCD) in transformations means:
  46. Joining multiple data sources in transformation steps is common to:
  47. Applying data quality checks mid-transformation ensures:
  48. Debugging transformation logic often involves:
  49. Using frameworks like Spark for transformation helps with:
  50. Ensuring transformations are idempotent means:
  51. Applying aggregations (e.g., sum, avg) during transformations helps to:
  52. Handling character encoding issues (UTF-8 vs. ASCII) in transformations ensures:
  53. Versioning transformation logic means:
  54. Pushdown transformations refer to:
  55. Idempotent transformations mean that, if rerun:
  56. Reusability in transformations can be achieved by:
  57. Handling schema evolution during transformation involves:
  58. Transformation performance optimization may include:
  59. Debugging transformation errors might use:
  60. Once transformations are finalized:
  61. Data loading involves:
  62. Bulk loading can improve performance by:
  63. Incremental loading (upserts) means:
  64. Managing indexes during load may involve:
  65. Timing loads during off-peak hours can:
  66. Transactions during load ensure:
  67. Verifying load success might involve:
  68. Partitioned loading improves performance by:
  69. Using target-specific load utilities (e.g., COPY command in Redshift) can:
  70. Handling load failures might involve:
  71. Distinguishing between full refresh and incremental refresh loading means:
  72. Post-load validations ensure:
  73. Notifications on load completion or errors help by:
  74. Balancing load tasks means:
  75. Late-arriving data might be handled by:
  76. Atomicity of loads ensures:
  77. Archiving or purging old data during load cycles is done to:
  78. Ensuring consistency and atomicity of loads might require:
  79. In ELT scenarios, loading raw data first allows:
  80. After a successful load:
  81. Traditional ETL tools (Informatica, Talend) often:
  82. Modern ELT tools integrate with cloud warehouses like:
  83. Orchestration tools (Airflow, Luigi) help by:
  84. SaaS integration platforms (Fivetran, Stitch) often:
  85. Using Python scripts for ETL can be advantageous for:
  86. Dockerizing ETL jobs provides:
  87. CI/CD pipelines for ETL code mean:
  88. On-prem ETL tools vs. cloud-native solutions differ in:
  89. Data virtualization tools help by:
  90. Evaluating open-source vs. commercial ETL solutions involves:
  91. Performance benchmarks among ETL tools help to:
  92. Leveraging Spark or Flink in transformations is useful for:
  93. DataOps integrates ETL/ELT with:
  94. Integration with data catalogs helps ETL/ELT by:
  95. Using message queues (Kafka) in extraction helps by:
  96. ETL in a microservices architecture might mean:
  97. Impact of an orchestration tool’s scheduling features:
  98. Version control of ETL scripts and configs helps with:
  99. Selecting a tool based on data volume and complexity means:
  100. Considering team skill sets for tool selection means:
  101. Identifying bottlenecks in ETL/ELT pipelines often involves checking:
  102. Parallelization strategies might include:
  103. Partitioning data for parallel processing helps by:
  104. Efficient file formats like Parquet or ORC improve performance by:
  105. Compressing data before transfer reduces:
  106. Memory management for large ETL jobs can be improved by:
  107. Caching intermediate results might help if:
  108. Query optimization in ELT scenarios includes:
  109. Minimizing unnecessary data movement means:
  110. Scheduling ETL jobs off-peak can improve performance by:
  111. Monitoring runtime metrics (CPU, memory, throughput) helps:
  112. Choosing incremental over full loads can improve performance by:
  113. Using columnar storage in the target system helps because:
  114. Monitoring runtime with trend analysis helps:
  115. Retry and backoff strategies during extraction and loading help by:
  116. Eliminating unnecessary transformations means:
  117. Code profiling in ETL scripts helps by:
  118. Reducing unnecessary data movement (ELT vs. ETL) can improve performance by:
  119. Adopting streaming ETL for continuous processing improves performance for:
  120. Automating performance regression tests means:
  121. Implementing validation checks at extraction ensures:
  122. Data cleansing involves:
  123. Standardizing reference data during transformation helps:
  124. Data lineage means:
  125. Auditing changes and maintaining historical versions of data allows:
  126. Data quality metrics (completeness, consistency) help by:
  127. Error-handling pipelines (quarantining bad records) mean:
  128. Role-based access controls in ETL/ELT governance ensure:
  129. Metadata management helps by:
  130. Ensuring consistency between source and target schemas prevents:
  131. Self-service data quality checks mean:
  132. SLA definitions for data timeliness and correctness ensure:
  133. Logging quality metrics allows:
  134. Governance frameworks (like DAMA) applied to ETL/ELT mean:
  135. Aligning ETL/ELT practices with organizational policies ensures:
  136. Continuous improvement cycles for data quality involve:
  137. Ensuring completeness means:
  138. Consistency checks ensure:
  139. Accuracy checks might compare extracted data against:
  140. Integrating with data governance tools means:
  141. Encrypting data in transit ensures:
  142. Using secure protocols (TLS/SSL) for data extraction from APIs prevents:
  143. Masking or tokenizing PII fields in transformations ensures:
  144. Applying column-level encryption or hashing can:
  145. Strict access controls on ETL pipelines mean:
  146. Complying with regulations like GDPR may involve:
  147. Auditing and logging who accessed ETL data ensures:
  148. Using secrets managers for credentials instead of hardcoding prevents:
  149. Minimizing data movement of sensitive records may mean:
  150. Regular compliance audits and ETL process reviews ensure:
  151. Using VPCs or private networking for data transfers ensures:
  152. Complying with HIPAA for healthcare data might require:
  153. Separation of duties in ETL/ELT management means:
  154. Using temporary credentials or tokens instead of static keys enhances security by:
  155. Applying the principle of least privilege to ETL system accounts means:
  156. Sanitizing logs and debug info means:
  157. Ensuring data disposal and retention policies are followed means:
  158. Using anonymized or synthetic test data prevents:
  159. Regular compliance audits might involve:
  160. Storing sensitive data only in encrypted form reduces risk if:
  161. Real-time ETL differs from batch ETL by:
  162. Change Data Capture (CDC) techniques detect:
  163. Tools like Debezium or Attunity assist with:
  164. Using Kafka or Kinesis in streaming ETL means:
  165. Micro-batching vs. true streaming differs by:
  166. Handling out-of-order events in streaming ETL often requires:
  167. Ensuring exactly-once delivery in streaming ETL means:
  168. State management in streaming transformations involves:
  169. Windowing functions (tumbling, sliding windows) in streaming help by:
  170. Dealing with backpressure and rate limiting ensures:
  171. Transforming on the fly vs. storing raw events first is a choice between:
  172. Choosing storage sinks for streaming outputs (NoSQL, data lakes) depends on:
  173. Monitoring streaming pipelines for lag and throughput helps by:
  174. Recovery and fault tolerance in streaming ETL might use:
  175. Schema evolution in a streaming environment means:
  176. Watermarks in event-time processing help by:
  177. Integration with stream processing frameworks (Flink, Spark Structured Streaming) means:
  178. Real-time alerting on data quality in streams allows:
  179. Balancing latency vs. completeness means:
  180. Continuous integration and deployment for streaming pipelines means:
  181. Unit tests for individual transformation logic mean:
  182. Integration tests across the entire ETL pipeline ensure:
  183. Using mock sources and targets in tests allows:
  184. Regression testing ensures:
  185. Load testing ETL pipelines helps by:
  186. Synthetic test data generation helps test:
  187. Integrating tests with CI/CD means:
  188. Automated error detection and retries in pipelines mean:
  189. Monitoring tools (e.g., Airflow UI) provide:
  190. Alerting on job failures or SLA breaches means:
  191. Logging best practices (structured logs) aid in:
  192. Version control of ETL pipeline definitions means:
  193. Auditing job runs (who triggered them, when) provides:
  194. Scalability tests ensure:
  195. Using checksums and row counts for validation in tests means:
  196. Canary deployments for ETL changes mean:
  197. Performance metrics (CPU, memory) in monitoring help identify:
  198. Trend analysis in monitoring dashboards means:
  199. Security scans on ETL code (linters, secret detection) ensure:
  200. Continuous improvement loops from monitoring data mean: