maandag 26 juni 2023

Fabric : Shortcuts and referencing SQL Endpoints in Fabric


In this blogpost I'm investigating the reference between Warehouse and Lakehouse. You can reference in Warehouse, Lakehouse tables (and views) and viceversa. But it is not the same experience. It happens in different ways. In Warehouse you can reference a table in the Lakehouse with SELECT * FROM <Lakehouse>.<schema>.<tablename> (or<viewname>) and in the Lakehouse you can link a table from the Warehouse with a shortcut. 

Using data of a Lakehouse object in Warehouse

The first experiment is using Lakehouse data in Warehouse. It's a bit confusing because the option is + warehouses, but (in this example) I'm referencing a SQL Endpoint of the Lakehouse in the warehouse. Microsoft sees SQL endpoints as the same as a warehouse in Fabric. Click on the + Warehouses.

The select the SQL endpoint of the Lakehouse.

Now the selected SQL endpoint is in the Warehouse

And now I can query the data from the Lakehouse in the Warehouse. This can be seen in the following screenshot.

Can I change the data of the Lakehouse with Warehouse? Let's see if I can do that.

No, you cannot. The error message is "Data Manipulation Language (DML) statements are not supported for this table type in this version of SQL Server." SQL Server? Anyway it's not possible.

What happens when I change the name of the table in the Lakehouse? I change the name with the Rename option in Lakehouse

Let's rename it into "product_test".

And now check the query in Warehouse. I had to refresh the Lakehouse in Warehouse. If I don't do that, the query executes succesfully surprisingly. It keeps working 'correctly' until I refresh the Lakehouse. It's a bit confusing. May be it's a timing issue? It's also a bit awkward that whenever I leave the Warehouse item, the Lakehouse disappears and I have add it everytime when I enter the Warehouse again. In the end it will give this error.

Yet another thing is the option to remove the item form the dataset. It seems that Fabric is saying that the product table is added to the standard dataset? Let's try to remove it.

It will give an error : 

The error message : "Unable to update BI model with these changes. Please try again later or contact support."

I'm not sure whether the error is correct now because I was renaming the table now and then. May be Fabric is now confused, now;-). Is it supposed to be that when Lakehouse is 'connected' that you can add tables of the Lakehouse to the Warehouse standard dataset? Then, it seems to me that this is not completely correcty implemented yet.

Using data of Warehouse in Lakehouse

Next is how we can use data from the Warehouse in the Lakehouse.

And then I have to tell Microsoft Fabric which location where I can find the object.

Select the Warehouse.

Select the tables (or views). I select the complete schema as a shortcut.

And now the tables are available in the Lakehouse 

Can I query the tables in the notebook? And use these in the notebook? No I can't seem to get this working properly. I'm not sure whether what I'm doing wrong or that this is not properly implemented yet by Microsoft.

I think that I should create a shortcut directly to a table and not at a higher level at the Warehouse. When I reference the table 'sales' directly it's working. 

So the table is shortcutted from the Warehouse in the Lakehouse and I can use this in my notebook. Interesting!

Final Thoughts

The first thing that comes to my mind is why "shortcuts" in Lakehouse and why a "reference a SQL Endpoint" in Warehouse. Why not use "shortcuts" everywhere (and use the SQL endpoint under water?). It would have been a consistent user experience in my opinion if Microsoft would used shortcuts everywhere. It's a bit confusing. But the idea of linking data is great! Now you could say that the data warehouse guys (or girls) develop the truth (for instance a customer table) and you can use these in the Lakehouse for data engineering a Machine learning model.


Geen opmerkingen:

Een reactie posten