Azure Data Lake Store is a cloud repository where you can store data of any size and any type. It acts as a Hadoop Distributed File System (HDFS) for the cloud and is available on demand. Data stored in Data Lake Store is easily accessible to Azure Data Lake Analytics and Azure HDInsight. It will also be possible to integrate it with other Hadoop distributions and projects such as Hortonworks, Cloudera, Spark, Storm, and Flume.
Below are the steps to create an Azure Data Lake Store and manage it using the Azure portal and the Azure CLI.
Creating Azure Data Lake Store using portal
On the Azure portal, go to New -> Data + storage -> Data Lake Store.
- Enter a name for Data Lake Store.
- Select a resource group or create a new resource group.
- Select East US 2 as the location.
- Click Create.
Using Data Explorer to manage Data files
Data Explorer in the Azure portal helps you view and manage the data files in Azure Data Lake Store. It supports uploading, previewing, downloading, renaming, and deleting data files, as well as managing access to them.
Uploading a File:
To upload a data file, click the Upload button. On the Upload files pane, select the file to upload and click Start upload.
File Preview:
Click on the file to preview its data. The data format and the number of rows to display can be set using the format settings.
Getting PATH of a file:
To get the path and the WebHDFS path of a file, open the properties of the file. The path can be used in HDInsight clusters.
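As a rough illustration of what these two paths usually look like, the sketch below builds them from an account name and a file path. It assumes the account is served from <account>.azuredatalakestore.net and that the two paths use the adl:// and swebhdfs:// schemes. The function names are made up for this example, so compare the result with what the properties pane actually shows.

```shell
# Host name pattern assumed for Data Lake Store accounts.
adls_host() {
  printf '%s.azuredatalakestore.net' "$1"
}

# adl:// path of a file; $1 = account name, $2 = path inside the store.
adl_path() {
  printf 'adl://%s/%s\n' "$(adls_host "$1")" "${2#/}"
}

# swebhdfs:// path of the same file (WebHDFS over HTTPS).
webhdfs_path() {
  printf 'swebhdfs://%s/%s\n' "$(adls_host "$1")" "${2#/}"
}

adl_path testdatalakestore /mynewfolder/TestData.csv
webhdfs_path testdatalakestore /mynewfolder/TestData.csv
```

The leading slash of the store path is stripped before joining, so /mynewfolder/TestData.csv and mynewfolder/TestData.csv give the same result.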
Managing Azure Data Lake Store using Azure CLI
Installing Azure CLI
- Install Microsoft Azure Cross-platform Command Line Tools. Below is the link to download the Web Platform Installer.
http://go.microsoft.com/?linkid=9828653&clcid=0x409
- Install Node.js. Below is the link to the installer
https://nodejs.org/dist/v5.1.0/node-v5.1.0-x64.msi
Managing Azure Data Lake Store from CLI
Open a new command prompt or PowerShell window and run the command azure login. It returns a message with a device login URL and a code. Open a browser, go to https://aka.ms/devicelogin, enter the code from the message, and log in.
After a successful login, return to the PowerShell window.
Switch to Azure Resource Manager mode using the command below
[code]azure config mode arm[/code]
If you have more than one subscription for your account, you need to set the subscription to use.
To list all your subscriptions
[code]azure account list[/code]
To set the subscription
[code]azure account set <subscriptionname>[/code]
You need a resource group to create a new Data Lake Store account. You can create a new one or use an existing one. Below is the command to create a new resource group.
[code]azure group create <resourceGroupName> <location>[/code]
e.g. [code]azure group create myDataLakeResourceGroup "East US 2"[/code]
Note that the Azure Data Lake preview is available only in the East US 2 location.
Use the command below to create a new Data Lake Store account
[code]azure datalake store account create <dataLakeStoreAccountName> <location> <resourceGroup>[/code]
e.g. [code]azure datalake store account create testdatalakestore "East US 2" myDataLakeResourceGroup[/code]
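The setup commands so far can be chained in one small script. The following is only a sketch: it assumes you have already run azure login and selected a subscription, and the run helper, the create_datalake_store function, and the DRY_RUN variable are names invented for this example. By default it only prints the commands, so you can review them before running anything.

```shell
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# $1 = resource group, $2 = location, $3 = Data Lake Store account name
create_datalake_store() {
  run azure config mode arm &&
  run azure group create "$1" "$2" &&
  run azure datalake store account create "$3" "$2" "$1"
}

create_datalake_store myDataLakeResourceGroup "East US 2" testdatalakestore
```

Set DRY_RUN=0 before calling the function to actually execute the commands. The && chaining stops at the first command that fails, so the account is not created if the resource group could not be.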
Below are the commands to manage the Azure Data Lake Store
Create folder
[code]azure datalake store filesystem create <dataLakeStoreAccountName> <path> --folder[/code]
e.g. [code]azure datalake store filesystem create testdatalakestore /mynewfolder --folder[/code]
Upload data file
[code]azure datalake store filesystem import <dataLakeStoreAccountName> "<source path>" "<destination path>"[/code]
e.g. [code]azure datalake store filesystem import testdatalakestore "C:\Data\TestData.csv" "/mynewfolder/TestData.csv"[/code]
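The import command takes one file at a time, so uploading several files means repeating it. Below is a minimal sketch of a loop that does this from a bash-style shell. It assumes POSIX-style local paths (Linux, macOS, or Git Bash on Windows), and the azure_cmd wrapper, the upload_files function, and the DRY_RUN variable are hypothetical names used only for this example. By default the commands are printed, not executed.

```shell
# Wrapper around the CLI: prints the command unless DRY_RUN=0 is set.
azure_cmd() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    azure "$@"
  else
    echo "azure $*"
  fi
}

# $1 = account name, $2 = destination folder, remaining args = local files
upload_files() {
  local account="$1" folder="${2%/}" file
  shift 2
  for file in "$@"; do
    # Keep the local file name as the name inside the destination folder.
    azure_cmd datalake store filesystem import "$account" "$file" "$folder/$(basename "$file")"
  done
}

upload_files testdatalakestore /mynewfolder ./data/jan.csv ./data/feb.csv
```

Each local file keeps its own name in the destination folder, for example ./data/jan.csv becomes /mynewfolder/jan.csv.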
List files
[code]azure datalake store filesystem list <dataLakeStoreAccountName> <path>[/code]
e.g. [code]azure datalake store filesystem list testdatalakestore /mynewfolder[/code]
Download a file
[code]azure datalake store filesystem export <dataLakeStoreAccountName> <source_path> <destination_path>[/code]
e.g. [code]azure datalake store filesystem export testdatalakestore /mynewfolder/TestData.csv "C:\downloads\TestData.csv"[/code]
Rename a file
[code]azure datalake store filesystem move <dataLakeStoreAccountName> <path/old_file_name> <path/new_file_name>[/code]
e.g. [code]azure datalake store filesystem move testdatalakestore /mynewfolder/TestData.csv /mynewfolder/TestData_copy.csv[/code]
Delete a file
[code]azure datalake store filesystem delete <dataLakeStoreAccountName> <path>[/code]
e.g. [code]azure datalake store filesystem delete testdatalakestore /mynewfolder/TestData_copy.csv[/code]
View access control list
[code]azure datalake store permissions show <dataLakeStoreAccountName> <path>[/code]
e.g. [code]azure datalake store permissions show testdatalakestore /[/code]
Delete Data Lake Store account
[code]azure datalake store account delete <dataLakeStoreAccountName>[/code]
e.g. [code]azure datalake store account delete testdatalakestore[/code]