This will be a fairly straightforward blog post covering different ways of copying data from a shortcut delta table into a delta table created automatically via a notebook.
Select statement with a join
%%sql
SELECT
party.recid AS PartyId
,party.name AS Name
,COALESCE(party.namealias, '') AS ShortName
,COALESCE(postal.countryregionid, '') AS Country
,COALESCE(postal.state, '') AS State
,COALESCE(postal.city, '') AS City
,COALESCE(postal.street, '') AS Street
,COALESCE(postal.zipcode, '') AS PostCode
,COALESCE(phone.locator, '') AS PhoneNumber
,COALESCE(email.locator, '') AS Email
FROM dirpartytable party
LEFTOUTERJOIN logisticspostaladdress postal ON postal.location= party.primaryaddresslocation
AND postal.validto >current_date() -- filters only valid(effective) addresses
LEFTOUTERJOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphone
LEFTOUTERJOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail
You should see a table result showing below your query.
Create table if not exists
This is a one of copy and will not copy data if the table exists already.
%%sql
CREATETABLEIFNOTEXISTS fact_dirpartytable
USING DELTA AS
SELECT
party.recid AS PartyId
,party.name AS Name
,COALESCE(party.namealias, '') AS ShortName
,COALESCE(postal.countryregionid, '') AS Country
,COALESCE(postal.state, '') AS State
,COALESCE(postal.city, '') AS City
,COALESCE(postal.street, '') AS Street
,COALESCE(postal.zipcode, '') AS PostCode
,COALESCE(phone.locator, '') AS PhoneNumber
,COALESCE(email.locator, '') AS Email
FROM dirpartytable party
LEFTOUTERJOIN logisticspostaladdress postal ON postal.location= party.primaryaddresslocation
AND postal.validto >current_date() -- filters only valid(effective) addresses
LEFTOUTERJOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphone
LEFTOUTERJOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail
Create table if not exists - use merge
This does a copy similar to the above but uses a merge to match the records
%%sql
-- Step 1: Create Delta table
CREATETABLEIFNOTEXISTS fact3_dirpartytable (
PartyId LONG,
Name STRING,
ShortName STRING,
Country STRING,
State STRING,
City STRING,
Street STRING,
PostCode STRING,
PhoneNumber STRING,
Email STRING
) USING delta;
-- Step 2: Create temporary view
CREATEORREPLACETEMPORARYVIEW temp_dirpartytable AS
SELECT
party.recid AS PartyId
,party.name AS Name
,COALESCE(party.namealias, '') AS ShortName
,COALESCE(postal.countryregionid, '') AS Country
,COALESCE(postal.state, '') AS State
,COALESCE(postal.city, '') AS City
,COALESCE(postal.street, '') AS Street
,COALESCE(postal.zipcode, '') AS PostCode
,COALESCE(phone.locator, '') AS PhoneNumber
,COALESCE(email.locator, '') AS Email
FROM dirpartytable party
LEFTOUTERJOIN logisticspostaladdress postal ON postal.location= party.primaryaddresslocation
AND postal.validto >current_date() -- filters only valid(effective) addresses
LEFTOUTERJOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphone
LEFTOUTERJOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;
-- Step 3: Merge into delta table
MERGEINTO fact3_dirpartytable AS target
USING temp_dirpartytable AS source
ON target.PartyId = source.PartyId
WHENMATCHEDTHEN
UPDATESET target.Name = source.Name, target.ShortName = source.ShortName
WHENNOTMATCHEDTHEN
INSERT (PartyId, Name, ShortName) VALUES (source.PartyId, source.Name, source.ShortName);
This will do an update, insert but will not handle deletes.
Create table, Delete and Insert data
This creates the table, then deletes the data in full and inserts it all again.
%%sql
-- Step 1: Create Delta table
CREATETABLEIFNOTEXISTS fact4_dirpartytable (
PartyId LONG,
Name STRING,
ShortName STRING,
Country STRING,
State STRING,
City STRING,
Street STRING,
PostCode STRING,
PhoneNumber STRING,
Email STRING
) USING delta;
-- Step 2: Delete data from the Delta table
DELETEFROM fact4_dirpartytable;
-- Step 3: Create temporary view
INSERTINTO fact4_dirpartytable
SELECT
party.recid AS PartyId
,party.name AS Name
,COALESCE(party.namealias, '') AS ShortName
,COALESCE(postal.countryregionid, '') AS Country
,COALESCE(postal.state, '') AS State
,COALESCE(postal.city, '') AS City
,COALESCE(postal.street, '') AS Street
,COALESCE(postal.zipcode, '') AS PostCode
,COALESCE(phone.locator, '') AS PhoneNumber
,COALESCE(email.locator, '') AS Email
FROM dirpartytable party
LEFTOUTERJOIN logisticspostaladdress postal ON postal.location= party.primaryaddresslocation
AND postal.validto >current_date() -- filters only valid(effective) addresses
LEFTOUTERJOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphone
LEFTOUTERJOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;
This runs fast but however not what you may want to do on a regular basis.
Create a temporary view and use SinkModifiedOn
Create a temporary view within your notebook to use as part of complex queries.
This query joins 4 tables together and each table has its own SinkModifiedOn field. I wanted to create a view that gave me the greatest (max) SingModifiedOn date time. This is to later allow me to do an incremental update.
CREATEORREPLACETEMPORARYVIEW temp_dirpartytable AS
SELECT
party.SinkModifiedOn AS party_SinkModifiedOn,
postal.SinkModifiedOn AS postal_SinkModifiedOn,
phone.SinkModifiedOn AS phone_SinkModifiedOn,
email.SinkModifiedOn AS email_SinkModifiedOn,
GREATEST(party.SinkModifiedOn, postal.SinkModifiedOn, phone.SinkModifiedOn, email.SinkModifiedOn) AS SinkModifiedOn,
party.recid AS PartyId
,party.name AS Name
,COALESCE(party.namealias, '') AS SearchName
,COALESCE(postal.countryregionid, '') AS Country
,COALESCE(postal.state, '') AS State
,COALESCE(postal.city, '') AS City
,COALESCE(postal.street, '') AS Street
,COALESCE(postal.zipcode, '') AS PostCode
,COALESCE(phone.locator, '') AS PhoneNumber
,COALESCE(email.locator, '') AS Email
FROM dirpartytable party
LEFTOUTERJOIN logisticspostaladdress postal ON postal.location= party.primaryaddresslocation
AND postal.validto >current_date() -- filters only valid(effective) addresses
LEFTOUTERJOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphone
LEFTOUTERJOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;
SELECT*FROM temp_dirpartytable
WHERE SinkModifiedOn >='2024-09-03T02:39:16Z';
This would be good for a transactional table where there are no deletes. You get the last SinkModifiedOn date time field for all the related table. Then filter based on the last run you have in your destination table. You could then do incremental updates.