CER-0303
When external Delta tables are dropped from Databricks Unity Catalog or the legacy Hive metastore, only the table metadata is removed — the underlying data files in cloud object storage (such as S3, ADLS, or GCS) remain untouched and continue to incur per-GB-month storage charges. This behavior is by design: external tables decouple metadata from data lifecycle management, meaning Databricks explicitly does not delete the underlying storage when an external table is dropped. The result is orphaned storage — files that no longer have any catalog reference, are not consumed by any downstream pipeline, and deliver no business value, yet continue to accumulate charges indefinitely.
This pattern is particularly prevalent in environments using medallion architecture (bronze/silver/gold layers), where tables are frequently recreated during pipeline evolution, schema experimentation, or migration between environments. Development and test workloads compound the problem, as teams routinely create and abandon external table references without cleaning up the associated storage. Unlike managed tables in Unity Catalog — which have a retention period with recovery capability before automatic deletion — external tables offer no such safety net. The orphaned storage is structurally invisible to standard cost dashboards because it appears as generic object storage charges, not as Databricks-specific line items. Over time, this silent accumulation can represent a meaningful share of an organization's total storage spend.
Importantly, Databricks VACUUM operations do not address this pattern. VACUUM cleans up old file versions within active Delta tables, but it cannot act on storage paths that have been completely disconnected from catalog metadata through external table drops. The only way to reclaim this storage is to manually identify and delete the orphaned files in cloud storage.
This inefficiency involves two independent billing dimensions:
When an external Delta table is dropped, the metadata removal has no effect on the underlying storage billing. The cloud provider continues to charge for every byte of data in the orphaned storage path at the applicable per-GB-month rate. Because external tables are often used for large-scale data — particularly in data lake and lakehouse architectures — the orphaned storage volumes can be substantial.