Data & Analytics

17th May 2023

Power BI Meta Data extraction using Python

Share:

Power BI Meta Data extraction using Python

In this blog we are going to learn about Power BI.pbit files, Power BI desktop file Meta data, Extraction of Power BI Meta data and saving it as an excel file using .pbit file and a simple Python code using libraries like Pandas, OS, Regex, JSON and dax_extract.

What is Power BI and .pbix files?

Power BI is a market leading business intelligence tool by Microsoft for Cleaning, Modifying and Visualizing raw data to come up with actionable insights. Power BI comes with its own data transformation engine called power query and a formula expression language called DAX (Data Analysis Expressions).

DAX gives power BI the ability to calculate new columns, dynamic measures, and tables inside Power Bi desktop.

By default, Power BI report files are saved with .pbix extension which is a renamed version of a ZIP file which contains multiple components, such as the visuals, report canvas, model metadata, and data.

What is Power BI .pbit file

.pbit is a template file created by Power Bi desktop which is also a renamed version of a ZIP file that contains all the Meta data for the Power BI report but doesn’t contain the data itself. Once we extract .pbit file we get a DataModelSchema file along with other files which contain all the Meta data of a Power BI desktop files.

Later in this blog we will be using these .pbit and DataModelSchema files to extract Power BI desktop Meta data.

What is the Meta data in a Power BI Desktop file

Regarding what you see in the Report View in a Power BI desktop, meta data is everything. You can think of all the information as meta data, including the name, source, expression, data type, calculated tables, calculated columns, calculated measures, relationships and lineage between the model’s various tables, hierarchies, parameters, etc.

We will mainly concentrate on extracting Calculated Measures, Calculated Columns, and Relationships in this blog.

Extraction of Meta data using Python

Python was used to process and extract the JSON from the.pbit file and DataModelSchema. We first converted JSON to a Python dictionary before extracting the necessary Meta data.

Below are the steps we will need to achieve the requirement:

 

1. Exporting .pbix file as .pbit file

There are two ways to save our power BI desktop file as .pbit file.

  • Once we are in Power BI desktop, we have an option to save our file as power BI template(.pbit) file
  • We can go to File–>Export–>Power BI Template and save the .pbit file at the desired directory.

2. Unzipping .pbit file to get DataModelSchema file

We can directly unzip the .pbit file using the 7z-Zip file manager or any other file manager. Once we Unzip the file, we will get a folder with the same name as that of the .pbit file. Inside the folder we will get the DataModelSchema file, we will have to change its extension to .txt for reading in python.

3. Reading .pbit and Data model schema file in python

We have an option to directly read the .pbit file in python using the dax_extract library. Second option to read the text file in python and using the JSON module convert it into a Python dictionary. Code can be found in the GitHub repository link given at the end of this file.

4. Extracting Measures from the dictionary

The dictionary that we get consists details of all the tables as separate lists, Individuals tables have details related to the columns and measures belonging to that table, we can loop on each table one by one and get details of columns, Measures etc. Below is an example of the Python code can be found in the GitHub Repository link given at the end of this file.

  table Number table Name Measure Name Measure Expression
0 5 Query Data % Query Resolved CALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
1 5 Query Data Special Query Percentage CALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
2 6 Asset Data Client Retention Rate CALCULATE(COUNT(‘Asset Data'[Client ID]),’Asse…

 

5. Extracting calculated columns from the Dictionary

Like how we extracted the measures we can loop on each table and get details of all the calculated columns. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file.

 

  table no Table Name name expression
6 2 Calendar Day DAY(‘Calendar'[Date])
7 2 Calendar Month MONTH(‘Calendar'[Date])
8 2 Calendar Quarter CONCATENATE(“Q”,QUARTER(‘Calendar'[Date]) )
9 2 Calendar Year YEAR(‘Calendar'[Date])

 

Also Read:  Certainty in streaming real-time ETL

6. Extracting relationships from the dictionary

Data for relationships is available in the model key of the data dictionary and can be easily extracted. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file. 

 

  From Table From Column To Table To Column State
0 Operational Data Refresh Date LocalDateTable_50948e70-816c-4122-bb48-2a2e442… Date ready
1 Operational Data Client ID Client Data Client ID ready
2 Query Data Query Date Calendar Date ready
3 Asset Data Client ID Client Data Client ID ready
4 Asset Data Contract Maturity Date LocalDateTable_d625a62f-98f2-4794-80e3-4d14736… Date ready
5 Asset Data Enrol Date Calendar Date ready

 

7. Saving Extracted data as an Excel file

All the extracted data can be saved in empty lists and these lists can be used to derive a Pandas data frame. This Pandas data frame can be exported as Excel and easily used for reference and validation purposes in a complex model. Below snapshot gives an idea of how this can be done.

Do you want to know more about Power BI meta data using Python? Then reach out to our experts today.

Click here

Conclusion

In this blog we learnt about extracting metadata from .pbit and DataModelSchema file. We have created a Python script that allows users to enter the file location of .pbit and DataModelSchema file and then metadata extraction along with excel generation can be automated. The code can be found on the below GitHub also sample excel files can be downloaded from below GitHub link. Hope this is helpful and will see you soon with another interesting topic.

 

Author

Yash Kumar Shrivastava

Share:

Latest Blogs

How to Leverage DevOps in Successful Application Modernization 

Product Engineering

5th May 2025

How to Leverage DevOps in Successful Application Modernization 

Read More
Transformer Models in Multimodal AI: Challenges and Innovation 

Gen AI

5th May 2025

Transformer Models in Multimodal AI: Challenges and Innovation 

Read More
Minimalist UX Design: Striking a Perfect Balance in Design 

Product Engineering

5th May 2025

Minimalist UX Design: Striking a Perfect Balance in Design 

Read More

Related Blogs

How fortune 500 companies are accelerating AI innovation with databricks 

Data & Analytics

2nd May 2025

How fortune 500 companies are accelerating AI innovation with databricks 

The AI revolution isn’t coming—it’s here, and Fortune 500 companies are in an arms race...

Read More
Optimizing ETL Workflows with Databricks and Delta Lake: Faster, Reliable, Scalable

Data & Analytics

13th Mar 2025

Optimizing ETL Workflows with Databricks and Delta Lake: Faster, Reliable, Scalable

ETL workflows form the backbone of data-driven decision-making in the modern data ecosystem. Although ETL...

Read More
Explainable AI in Finance: Ensuring Accountability and Compliance

Data & Analytics

24th Jan 2025

Explainable AI in Finance: Ensuring Accountability and Compliance

AI transforms the financial sector by enabling optimized decision-making, automating processes, and uncovering insights from...

Read More
Array ( [0] => Array ( [f_s_link] => https://x.com/IndiumSoftware [f_social_icon] => i-x ) [1] => Array ( [f_s_link] => https://www.instagram.com/indium.tech/ [f_social_icon] => i-insta ) [2] => Array ( [f_s_link] => https://www.linkedin.com/company/indiumsoftware/ [f_social_icon] => i-linkedin ) [3] => Array ( [f_s_link] => https://www.facebook.com/indiumsoftware/ [f_social_icon] => i-facebook ) )