Building a Scalable Data Warehouse in the Cloud with PaaS(sion) - Part 2 of 3
Table of Contents

Building a Scalable Data Warehouse in the Cloud with PaaS(sion) - Part 2 of 3

Cloud computing is the future in the present. While it seems rather complicated in the context of data warehousing, nothing could be further from the truth!
If you didn’t read the first part of this article –
Click Here!

Mass parallel processing – SQL Server on steroids

The technology of parallel data processing on many connected servers is not new. But in the case of a product like SQL Server, it can arouse interest, and rightly so! Imagine one non-trivial SQL query that has to return complex aggregates from hundreds of billions of rows. In Azure Data Warehouse, it will first go to the primary server – Control Node, where it will be divided into tasks for 60 smaller databases called distributions, which are connected to a specified number of compute servers known as Compute Nodes. The number of computing units depends on our subscription in Azure, where in the simplest variant (DWU100) one Compute Node will perform tasks on all 60 databases, and in the highest (DW6000) each database will be used for tasks individually by 60 computing units.


1

In addition, if our requirements are even bigger, we can change the compute model to a performance-optimized model (Compute Optimized Performace Tier) with just a few clicks, where the distribution data is kept on local storage caches and tempdb structures based on extremely fast NVMe SSDs. Their performance in action can be viewed here.

Data Factory v2 and Data Warehouse – an outstanding team!

The advantages of MPP architecture are also used by the ADFv2 mechanisms. PolyBase support allows you to load data directly into distribution databases in parallel, bypassing Control Node. The parameterization logic, based on the JSON format, allows you to easily and quickly create universal solutions. Automatic column and type mapping help you create generic components for loading data from different sources to stage.


1

In the ELT approach (Extract, Load, Transform), all transformations are carried out at the warehouse, just after loading the data. It gives us complete freedom to manage changes in logic using pure T-SQL code, but also gives incomparably higher performance compared to traditional SQL Server or even Integration Services. The “CREATE TABLE AS SELECT” statement allows us to create tables easily and quickly. We can also directly add and reload partitions without the need for special functions and schemes, all thanks to the simplified syntax. We just need to remember that our data is already divided into 60 parts (after all, we have 60 different distributions), which is important for performance in Columnstore tables. Of course, for smaller sets we can use a traditional clustered index or leave them as heaps.

The entire flow of loading and processing data can be run directly from Azure Data Factory v2. Thanks to error handling, timeouts, and repeat configuration we can quickly and effectively implement logic for unexpected situations.

Do you like it and need more?
Don’t worry! The last part of this article will be available next week!


Learn it. Know it. Done.

Liked the article? subscribe to updates!
360° IT Check is a weekly publication where we bring you the latest and greatest in the world of tech. We cover topics like emerging technologies & frameworks, news about innovative startups, and other topics which affect the world of tech directly or indirectly.

Like what you’re reading? Make sure to subscribe to our weekly newsletter!
Categories:
Share

Join 17,850 tech enthusiasts for your weekly dose of tech news

By filling in the above fields and clicking “Subscribe”, you agree to the processing by ITMAGINATION of your personal data contained in the above form for the purposes of sending you messages in the form of newsletter subscription, in accordance with our Privacy Policy.
Thank you! Your submission has been received!
We will send you at most one email per week with our latest tech news and insights.

In the meantime, feel free to explore this page or our Resources page for eBooks, technical guides, GitHub Demos, and more!
Oops! Something went wrong while submitting the form.

Related articles

Our Partners & Certifications
Microsoft Gold Partner Certification 2021 for ITMAGINATION
ITMAGINATION Google Cloud Partner
AWS Partner Network ITMAGINATION
ISO 9001 ITMAGINATIONISO-IEC 27001:2013 ITMAGINATION
© 2024 ITMAGINATION. All Rights Reserved. Privacy Policy