Sunday, September 9, 2018

Azure Batch AI

Today I tried Azure Batch AI and want to share what I learned. The most confusing part for me was pricing. The official documentation says there is no additional charge for Azure Batch AI itself, but it is important to understand when and how the underlying resources are billed.

https://azure.microsoft.com/en-us/pricing/details/batch-ai/

1. Create Azure Batch AI in your Azure Portal
1-1. Go to the Azure Portal and launch Azure Cloud Shell. This CLI is handy for managing Azure resources. I prefer Linux, so I chose Bash here.

1-2. Create a resource group for this Azure Batch AI test in East US 2

1-3. Create a Batch AI workspace in the resource group you just created. Afterwards, the workspace appears in your Azure Portal.
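As a rough sketch of steps 1-2 and 1-3, the Python snippet below assembles the two Azure CLI invocations as argument lists and prints them. The names `batchai-test-rg` and `batchai-test-ws` are placeholders of mine, and the `az batchai workspace create` flags are recalled from the 2018-era CLI, so please check them against `az batchai workspace create --help` before relying on them.

```python
import shlex


def setup_commands(resource_group, workspace, location="eastus2"):
    """Return the az CLI calls for steps 1-2 and 1-3 as argument lists."""
    return [
        # Step 1-2: resource group in East US 2.
        ["az", "group", "create",
         "--name", resource_group, "--location", location],
        # Step 1-3: Batch AI workspace inside that resource group.
        # Assumption: flag names as in the 2018-era `az batchai` command group.
        ["az", "batchai", "workspace", "create",
         "--resource-group", resource_group, "--workspace", workspace],
    ]


# Placeholder names; nothing is executed here, the commands are only printed.
for cmd in setup_commands("batchai-test-rg", "batchai-test-ws"):
    print(shlex.join(cmd))
```

Each printed line can be pasted into Cloud Shell, or each list can be passed to `subprocess.run(cmd, check=True)` on a machine where `az` is installed and logged in.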

2. Prepare storage for input and output files

2-1. Create a storage account

2-2. Create a Blob container to hold the input files

2-3. Create a Blob container to store the output files

2-4. Go to /clouddrive and download train_mnist.py into that directory

2-5. Upload train_mnist.py from the Azure Cloud Shell "clouddrive" directory to the Blob container "inputs"
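The storage steps 2-1, 2-2, 2-3 and 2-5 might look roughly like the sketch below, again written as a Python helper that only builds and prints the Azure CLI calls. The account name `batchaiteststore`, the container name `outputs`, and the `Standard_LRS` SKU are my assumptions (only the "inputs" container is named above). The download in step 2-4 is left out because the source URL is not given here. The sketch also assumes the CLI can authenticate against the storage account, for example through the `AZURE_STORAGE_CONNECTION_STRING` environment variable.

```python
import shlex


def storage_commands(resource_group, account, local_dir="clouddrive",
                     script="train_mnist.py", location="eastus2"):
    """Return the az CLI calls for steps 2-1, 2-2, 2-3 and 2-5."""
    return [
        # Step 2-1: storage account (the SKU is an assumption).
        ["az", "storage", "account", "create",
         "--name", account, "--resource-group", resource_group,
         "--location", location, "--sku", "Standard_LRS"],
        # Step 2-2: container for input files.
        ["az", "storage", "container", "create",
         "--name", "inputs", "--account-name", account],
        # Step 2-3: container for output files (name assumed).
        ["az", "storage", "container", "create",
         "--name", "outputs", "--account-name", account],
        # Step 2-5: upload the training script from the clouddrive directory.
        ["az", "storage", "blob", "upload",
         "--account-name", account, "--container-name", "inputs",
         "--file", f"{local_dir}/{script}", "--name", script],
    ]


# Placeholder names; storage account names must be globally unique.
for cmd in storage_commands("batchai-test-rg", "batchaiteststore"):
    print(shlex.join(cmd))
```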

3. Create Cluster and Experiment

3-1. Use the following command to create a single-node cluster of size NC6 (NVIDIA Tesla K80 GPU) and generate SSH keys

You can watch "CURRENT NODE COUNT" change as the node is allocated. Keep in mind that you are charged whenever the current node count is one or more, regardless of the experiment or job status.
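To make the billing rule concrete, here is a small sketch that adds up node-hours from a timeline of node-count readings. It is a simplified model of my own: it assumes the count stays constant between readings, and the rate of 0.90 USD per node-hour is a made-up number, not the real NC6 price.

```python
from datetime import datetime


def billed_node_hours(samples, end):
    """Sum node-hours from (timestamp, current_node_count) readings.

    Each count is assumed to hold until the next reading; the last one
    holds until `end`. Job or experiment status plays no role.
    """
    points = list(samples) + [(end, 0)]
    total = 0.0
    for (start, count), (stop, _) in zip(points, points[1:]):
        total += count * (stop - start).total_seconds() / 3600
    return total


timeline = [
    (datetime(2018, 9, 9, 10, 0), 1),   # cluster scaled up to 1 node
    (datetime(2018, 9, 9, 10, 30), 1),  # job finished, node still allocated
    (datetime(2018, 9, 9, 13, 0), 0),   # cluster scaled down to 0
]
hours = billed_node_hours(timeline, end=datetime(2018, 9, 9, 14, 0))
rate = 0.90  # hypothetical USD per node-hour, for illustration only
print(f"{hours:.1f} node-hours, about ${hours * rate:.2f}")
```

In this example the job runs for only half an hour, but the node stays allocated for three hours, and all three hours count.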

3-2. Create an experiment with the following command

3-3. Create job.json, which tells the job where to find the input file and where to write the output, and then open it with vim

3-4. Run the job with the following command
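Steps 3-1, 3-2 and 3-4 could be sketched as below. This is my reconstruction, not necessarily the exact commands used in this post: the flags are recalled from the 2018-era `az batchai` command group and should be verified with `--help`, all resource names and the user name are placeholders, and any options for mounting the storage account are omitted because the original command is not shown here.

```python
import shlex


def training_commands(resource_group, workspace, cluster, experiment, job, user):
    """Return the az CLI calls for steps 3-1, 3-2 and 3-4 as argument lists."""
    common = ["--resource-group", resource_group, "--workspace", workspace]
    return [
        # Step 3-1: one Standard_NC6 node, SSH keys generated by the CLI.
        ["az", "batchai", "cluster", "create", "--name", cluster, *common,
         "--vm-size", "Standard_NC6", "--target", "1",
         "--user-name", user, "--generate-ssh-keys"],
        # Step 3-2: experiment that groups the jobs.
        ["az", "batchai", "experiment", "create", "--name", experiment, *common],
        # Step 3-4: submit the job described by job.json from step 3-3.
        ["az", "batchai", "job", "create", "--name", job, *common,
         "--experiment", experiment, "--cluster", cluster,
         "--config-file", "job.json"],
    ]


# Placeholder names; the commands are printed, not executed.
for cmd in training_commands("batchai-test-rg", "batchai-test-ws",
                             "nc6cluster", "mnist-experiment",
                             "mnist-job", "azureuser"):
    print(shlex.join(cmd))
```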
3-5. After the job finishes, you have to scale the node count down to zero. Otherwise, it keeps charging you! Go to the cluster and choose "Scale"

Change "Target number of nodes" to 0 and click "Save"

You can see the current node count dropping to zero

Within a few minutes, you can confirm that the current node count is zero, which means you are no longer being charged for the cluster nodes
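If you would rather not keep refreshing the portal, the wait can be automated. Below is a minimal sketch of a polling loop. The `get_node_count` callable is hypothetical: in practice it would have to wrap something that reads the cluster's current node count (for example, a CLI call such as `az batchai cluster show`, assuming that command is available to you). Here it is replaced with canned readings so the snippet runs on its own.

```python
import time


def wait_until_scaled_down(get_node_count, timeout_s=600, poll_s=15,
                           sleep=time.sleep):
    """Poll until get_node_count() returns 0.

    Returns True once the count is zero, or False if it is still non-zero
    after roughly `timeout_s` seconds of waiting.
    """
    waited = 0
    while True:
        if get_node_count() == 0:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Demo with canned readings instead of a real cluster query,
# and a no-op sleep so it finishes immediately.
readings = iter([1, 1, 0])
done = wait_until_scaled_down(lambda: next(readings), sleep=lambda s: None)
print("scaled down:", done)
```

A `False` result is the case to worry about: the nodes are still allocated, so charges are still accruing.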