Building a Scalable Data Warehouse in the Cloud with PaaS(sion) - Part 3 of 3
Cloud computing is the future happening in the present. While it may seem complicated in the context of data warehousing, nothing could be further from the truth! If you haven't read the previous parts of this article, click here!
Azure Analysis Services – tabular in the cloud.
If you haven't had contact with the tabular model yet, don't worry! Its logic, modeling approach, and principles of operation are much simpler than those of traditional multidimensional OLAP, so getting to know it is just a matter of time. In the multidimensional approach we had to deal with the peculiar abstraction level of data cubes, which took time to master. The tabular approach resembles the relational model, and is therefore more intuitive and natural for most of us. What is the secret of the efficiency of such a solution, you may ask? The in-memory VertiPaq engine is responsible for this, with a well-balanced compression ratio and advanced multithreaded processing algorithms. In addition, if we need it, our model can run in DirectQuery mode, where analytical queries are executed directly against the source database rather than calculated in AAS.

Azure Analysis Services supports the latest tabular compatibility level, 1400, so many limitations of previous versions are now history. We can manage the model through the TOM library or with TMSL scripts, which express commands as JSON objects and are therefore much easier to write. Thanks to the available methods of processing selected partitions, we do not have to reprocess the entire model every time. The DAX language lets us define calculated columns and measures and query data from the model. Of course, we do not have to limit ourselves to it, as tabular also supports MDX.
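To make the partition-level processing idea concrete, here is a minimal sketch that assembles a TMSL `refresh` command as JSON. All object names (`SalesDW`, `FactSales`, the partition names) are hypothetical; the resulting script could be executed against the server via SSMS, the TOM library, or any XMLA client.

```python
import json

def tmsl_refresh(database: str, table: str, partitions: list) -> str:
    """Build a TMSL 'refresh' command that reprocesses only the given
    partitions of one table, instead of the whole model."""
    command = {
        "refresh": {
            "type": "full",  # reload data and recalculate dependent structures
            "objects": [
                {"database": database, "table": table, "partition": p}
                for p in partitions
            ],
        }
    }
    return json.dumps(command, indent=2)

# Hypothetical names for illustration only.
script = tmsl_refresh("SalesDW", "FactSales", ["FactSales_2023", "FactSales_2024"])
print(script)
```

Because the command targets individual partitions, a nightly load only has to touch the newest slice of a fact table, which is exactly why processing stays fast even as the model grows.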
The Azure portal offers several Analysis Services pricing tiers, divided by purpose and hardware parameters. Each is billed for the time our server is actually running, which of course means we pay only for actual use. The server can be started and stopped at any time, and we can likewise change its performance level whenever we need to. The data we keep in memory is highly compressed, so depending on its characteristics we can accommodate up to hundreds of gigabytes of information. In our project we also used the Azure Automation platform, a service that lets us automate processes with PowerShell or Python. We can build flexible solutions for running pipelines, processing data models, automatically reloading partitions, or even starting and stopping other Azure services.
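As a sketch of what such an automation runbook might decide, here is the pure scheduling logic for pausing the server outside working hours. The business-hour window and workday set are assumptions; an actual runbook would combine this check with calls to the Azure management API to suspend or resume the Analysis Services server.

```python
from datetime import datetime, time

# Assumed business window for illustration; tune to your own usage pattern.
BUSINESS_START = time(7, 0)
BUSINESS_END = time(19, 0)
WORKDAYS = range(0, 5)  # Monday..Friday

def server_should_run(now: datetime) -> bool:
    """Return True if the Analysis Services server should be online
    at the given moment (workday, inside business hours)."""
    return now.weekday() in WORKDAYS and BUSINESS_START <= now.time() < BUSINESS_END

# A scheduled runbook would run this periodically and suspend the server
# whenever it returns False, so we stop paying for idle compute.
print(server_should_run(datetime(2024, 3, 6, 12, 0)))  # Wednesday noon -> True
```

This is where the pay-for-use billing really pays off: a server that sleeps nights and weekends costs roughly a third of one that runs around the clock.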
Power BI – the cherry on top.
Efficient loading into Azure SQL Data Warehouse and a well-prepared data model in Azure Analysis Services are half the battle. Dry data brings no benefit without the tools to analyze and present it. Fortunately, Power BI works great here! In our solution we use Power BI Embedded and embed reports directly in SharePoint Online, which gives us an easy way to share data with users right in their company portal. Naturally, Excel and other desktop tools are not a problem here either. Most importantly, whichever client we use, all interactions, slicing, filtering, and so on run in the cloud against our precalculated in-memory tabular model, very quickly and on datasets that long ago exceeded a billion rows!
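To illustrate the embedding step, here is a minimal sketch of the configuration object a host page (for example a SharePoint Online web part) hands to the Power BI JavaScript client. The IDs and token are placeholders; in a real setup the embed token is minted server-side by calling the Power BI REST API with the application's credentials.

```python
def build_embed_config(report_id: str, workspace_id: str, embed_token: str) -> dict:
    """Assemble the client-side embed configuration for a Power BI report.
    All identifiers here are hypothetical placeholders."""
    return {
        "type": "report",
        "id": report_id,
        "embedUrl": (
            "https://app.powerbi.com/reportEmbed"
            f"?reportId={report_id}&groupId={workspace_id}"
        ),
        "accessToken": embed_token,
        "tokenType": 1,  # 1 = embed token (app owns data), 0 = Azure AD token
    }

config = build_embed_config("11111111-aaaa-4bbb-8ccc-222222222222",
                            "33333333-dddd-4eee-8fff-444444444444",
                            "<embed-token-from-rest-api>")
```

The key point from the article stands regardless of the client: every slicer click in such a report becomes a DAX query answered by the in-memory model in Azure, not by the user's machine.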