Hi there, great to meet you here. My name is Neil, and I currently work as a Data Engineer in the cloud. Previously I worked in a DevOps capacity, focusing on building useful CI/CD pipelines, automations, APIs and more. Here I’d like to share some tips and tricks from my career, which I hope you find helpful!
Purview Lineage: Part A Databricks Manual Lineage
Purview is Microsoft's unified data governance solution for managing and governing your multi-cloud, SaaS and on-prem data. You can create a holistic and up-to-date view of your data landscape with automated data discovery, data classification and end-to-end lineage. This gives data users valuable, trustworthy data management. While the auto-scanned lineage is useful most of the time, there are always cases where you need to manually generate your lineage graph....
Databricks row and column level security
Recently I had a chat with a client about access control for their reports and dashboards. Interestingly, it turned out that the client currently handles this by creating similar reports and granting access to people in different security groups. This is not the best idea because it produces redundant reports. The ideal solution is to implement row and column level security on the table, so that people in different access groups have visibility to only a subset of the rows in the table or view....
Azure networking: Hub and spoke topology with Terraform
The hub and spoke topology has been widely adopted for enterprise production deployments. In this lab, let's put on our network/infrastructure engineer hat and get our hands dirty with the Azure hub and spoke topology, using one of the popular IaC tools, Terraform. Let's have a look at the high-level architecture first. (Figure: Overall architecture of the lab.) As the name suggests, the essence of the topology is that all traffic is routed to the hub before it gets forwarded to a spoke....
Secure Databricks cluster with vNet injection and access resources via Azure private endpoint
I recently worked on an interesting topic: security hardening Databricks using secure cluster connectivity + vNet injection. This configuration allows the cluster to access Azure Data Lake Storage (I know, right?! What a popular combination!) and Key Vault through private endpoints. In this post, in a lab environment, we will find out how to put a Databricks cluster inside an existing Azure virtual network and access private endpoints deployed inside it....
Consume WebSocket stream and send to Prometheus in Python
Recently I was tasked with consuming data from a WebSocket, analyzing it and then sending the results to Prometheus. The theory is pretty straightforward: read data from the WebSocket API as a stream, analyze it, extract the data points and send them to Prometheus for visualization. In this blog you will have all the steps and code needed to reproduce this flow. With this in mind, I decided to use Python to achieve all of this....