Writing Tables Hands On
- This chapter explains how to write, update, and manage data in Delta tables using Spark SQL in Databricks.
- You will learn how to overwrite, append, and upsert records with ACID-compliant operations. All examples are demonstrated using the bookstore dataset.
Creating Delta Tables
To begin, create a Delta table using a CTAS
statement from well-structured source files (e.g., Parquet):
CREATE TABLE orders
USING DELTA
AS SELECT * FROM parquet.`/path/to/orders/`
````
* The table will be managed.
* Schema is auto-inferred from Parquet.
---
## Overwriting Table Data
### Method 1: CREATE OR REPLACE TABLE
```sql
CREATE OR REPLACE TABLE orders
AS SELECT * FROM parquet.`/path/to/orders-new/`
- Fully replaces table content.
- Schema can change.
- Generates a new version in the table's history.
Advantages:
- Time Travel support for previous versions
- Atomic write operation
- No recursive directory deletion
- Old data is retained
Table History Inspection
Method 2: INSERT OVERWRITE
- Replaces existing table content.
- Must match current schema.
- Does not create new tables.
Advantages:
- Schema enforcement
- Prevents unintended schema changes
- Creates a new version in table history
History Confirmation
Schema Enforcement Differences
If schema mismatch occurs (e.g., added column):
INSERT OVERWRITE orders
SELECT *, current_timestamp() AS modified_at
FROM parquet.`/path/to/orders-updated/`
Error:
A schema mismatch detected when writing to the Delta table
INSERT OVERWRITE
enforces strict schema matchCREATE OR REPLACE
allows schema evolution
Appending New Records
INSERT INTO
- Adds new rows to the existing table.
- Total count increases.
- Simple and fast operation.
Warning: Does not prevent duplicates. Re-running will duplicate records.
Merging Records (Upserts)
MERGE INTO Customers Table
Used to update existing records and insert new ones from a source view.
Step 1: Create Temporary View
CREATE OR REPLACE TEMP VIEW customer_updates
AS
SELECT * FROM json.`/path/to/customers-updated.json`
Step 2: Perform Merge
MERGE INTO customers AS target
USING customer_updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND target.email IS NULL AND source.email IS NOT NULL THEN
UPDATE SET target.email = source.email,
target.last_updated = current_timestamp()
WHEN NOT MATCHED THEN
INSERT *
Results:
- 100 rows updated
- 201 rows inserted
- 0 rows deleted
Note: Operation is atomic and ACID-compliant.
MERGE INTO Books Table
Filter records before inserting to avoid duplicates and apply custom logic.
Step 1: Create Temporary View
Step 2: Merge With Conditional Insert
MERGE INTO books AS target
USING books_updates AS source
ON target.book_id = source.book_id AND target.title = source.title
WHEN NOT MATCHED AND source.category = 'Computer Science' THEN
INSERT *
Results:
- 3 new books inserted (category: Computer Science)
Idempotency Check
Re-running the same merge:
Result:
0 records inserted — duplicates prevented.
Summary
Operation | Statement | Schema Change | Prevents Duplicates | Notes |
---|---|---|---|---|
Overwrite Table | CREATE OR REPLACE TABLE |
Yes | No | Allows schema evolution |
Overwrite Table | INSERT OVERWRITE |
No | No | Strict schema match |
Append Records | INSERT INTO |
No | No | Fast but not idempotent |
Upsert (Insert/Update) | MERGE INTO |
No | Yes | Best for deduplication and conditional logic |
Note: Delta Lake enables reliable, performant table writes through transactional guarantees and table versioning.