https://azure.microsoft.com/en-us/pricing/details/batch-ai/
data:image/s3,"s3://crabby-images/44d54/44d549029fc6d239d0819e54b40876569db89423" alt=""
1. Create Azure Batch AI in the Azure Portal
1-1. Go to the Azure Portal and launch Azure Cloud Shell. This CLI is useful for managing Azure resources. I prefer Linux, so I choose Bash here.
1-2. Create a resource group for this Azure Batch AI test in East US 2
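Step 1-2 can be sketched with the Azure CLI roughly as follows. The resource group name `batchai-test-rg` is a hypothetical placeholder, and the small `run` helper is my own addition, not part of the Azure CLI: it only prints the command unless `DRY_RUN=0` is set, so nothing is created by accident.

```shell
# Hypothetical name; pick your own.
RESOURCE_GROUP="batchai-test-rg"
LOCATION="eastus2"   # East US 2

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Create the resource group that will hold everything in this test.
run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
```

Run it once as is to review the command, then again with `DRY_RUN=0` exported to actually create the group.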
data:image/s3,"s3://crabby-images/b6370/b63706b4dd0eeb4ddae394251c1b4324fb1ba583" alt=""
1-3. Create a Batch AI workspace in the resource group you just created. You can then confirm in the Azure Portal that Azure Batch AI has been created.
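A minimal sketch of step 1-3, assuming your CLI version still ships the `az batchai` command group. The workspace and resource group names are hypothetical placeholders, and `run` is a dry-run helper of my own that prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical names; reuse the resource group from step 1-2.
RESOURCE_GROUP="batchai-test-rg"
WORKSPACE="batchai-workspace"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Create the Batch AI workspace inside the resource group.
run az batchai workspace create \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP"
```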
data:image/s3,"s3://crabby-images/a7202/a72021e3cbfa187270e924f357f54580770c7933" alt=""
2. Prepare storage for input and output files
2-1. Create a storage account
2-2. Create a Blob container to hold the input files
2-3. Create a Blob container to store the output files
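Steps 2-1 to 2-3 might look like the sketch below. The container name `inputs` is the one used later in this post. The storage account name, the `outputs` container name, and the `Standard_LRS` SKU are my assumptions. Storage account names must be globally unique and consist of 3 to 24 lowercase letters and digits. `run` is a dry-run helper of my own that prints each command unless `DRY_RUN=0` is set.

```shell
# Hypothetical names; the storage account name must be globally unique.
RESOURCE_GROUP="batchai-test-rg"
STORAGE_ACCOUNT="mybatchaistorage"
LOCATION="eastus2"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# 2-1: storage account (Standard_LRS is an assumed, low-cost SKU).
run az storage account create \
  --name "$STORAGE_ACCOUNT" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION" \
  --sku Standard_LRS

# 2-2: container for input files.
run az storage container create --name inputs --account-name "$STORAGE_ACCOUNT"

# 2-3: container for output files ("outputs" is an assumed name).
run az storage container create --name outputs --account-name "$STORAGE_ACCOUNT"
```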
data:image/s3,"s3://crabby-images/c9bd8/c9bd87e8f7fc8c014db89aee537c02cf18d77b54" alt=""
2-4. Go to /clouddrive and download train_mnist.py into that directory
2-5. Upload train_mnist.py from the Azure Cloud Shell "clouddrive" directory to the Blob container "inputs"
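The upload in step 2-5 can be sketched as below. I am assuming the Cloud Shell file share is mounted at `$HOME/clouddrive` and that the CLI can look up the storage account key for you. The storage account name is a hypothetical placeholder, and `run` is a dry-run helper of my own that prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical storage account name from step 2-1.
STORAGE_ACCOUNT="mybatchaistorage"
# Assumed location of the Cloud Shell file share.
CLOUDDRIVE="$HOME/clouddrive"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Upload the training script to the "inputs" container under the same file name.
run az storage blob upload \
  --account-name "$STORAGE_ACCOUNT" \
  --container-name inputs \
  --file "$CLOUDDRIVE/train_mnist.py" \
  --name train_mnist.py
```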
data:image/s3,"s3://crabby-images/8fc91/8fc9198e3a19ffc04163d1bc381e32f993b3fa35" alt=""
3. Create Cluster and Experiment
3-1. Create the cluster with the following command, which creates a single-node NC6 cluster (NVIDIA Tesla K80 GPU) and generates SSH keys
You will see "CURRENT NODE COUNT" change. Keep in mind that you are charged whenever the current node count is one or more, regardless of the experiment or job status.
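A sketch of the cluster creation in step 3-1, assuming the `az batchai` command group is available in your CLI version. The cluster, workspace, resource group, and admin user names are hypothetical placeholders. `run` is a dry-run helper of my own: it prints each command unless `DRY_RUN=0` is set, which matters here because a running node is billed.

```shell
# Hypothetical names; reuse the workspace and resource group from section 1.
RESOURCE_GROUP="batchai-test-rg"
WORKSPACE="batchai-workspace"
CLUSTER="nc6-cluster"
ADMIN_USER="batchaiuser"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# One Standard_NC6 node (Tesla K80 GPU); SSH keys are generated if missing.
run az batchai cluster create \
  --name "$CLUSTER" \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP" \
  --vm-size Standard_NC6 \
  --target 1 \
  --user-name "$ADMIN_USER" \
  --generate-ssh-keys

# Check the cluster state and node counts while it scales up.
run az batchai cluster show \
  --name "$CLUSTER" \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP" \
  --output table
```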
data:image/s3,"s3://crabby-images/c4284/c42843b39d494e0ee8454c58a8f53460769eaae4" alt=""
data:image/s3,"s3://crabby-images/58238/582387cc903cd37347c16910545526758c3c2188" alt=""
3-2. Create an experiment with the following command
3-3. Create job.json, which specifies where to find the input file and where to write the output, and then open job.json with vim
3-4. Run the job with the following command
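Steps 3-2 and 3-4 can be sketched as below, assuming the `az batchai` command group is available and that the `job.json` from step 3-3 already exists in the current directory. All resource names are hypothetical placeholders. `run` is a dry-run helper of my own that prints each command unless `DRY_RUN=0` is set.

```shell
# Hypothetical names; reuse the ones chosen in the earlier steps.
RESOURCE_GROUP="batchai-test-rg"
WORKSPACE="batchai-workspace"
CLUSTER="nc6-cluster"
EXPERIMENT="mnist-experiment"
JOB="mnist-job"
STORAGE_ACCOUNT="mybatchaistorage"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# 3-2: create the experiment that groups the jobs.
run az batchai experiment create \
  --name "$EXPERIMENT" \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP"

# 3-4: submit the job described by job.json to the cluster.
run az batchai job create \
  --name "$JOB" \
  --cluster "$CLUSTER" \
  --experiment "$EXPERIMENT" \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP" \
  --config-file job.json \
  --storage-account-name "$STORAGE_ACCOUNT"
```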
3-5. After the job finishes, you have to scale the node count down to zero. Otherwise, you keep being charged! Go to the cluster and choose "Scale"
data:image/s3,"s3://crabby-images/d5dac/d5dac462a22a6c3609d2ee95040f0f515837a026" alt=""
Change "Target number of nodes" to 0 and click "Save"
data:image/s3,"s3://crabby-images/f8470/f8470e9e9beeaeac609aaf4b443eec1b965095d4" alt=""
You can see the current node count changing to zero
data:image/s3,"s3://crabby-images/32644/326443f99492afce62a1712222f523d731107b0b" alt=""
In a few minutes, you can confirm that the current node count is zero, which means you are no longer being charged for the nodes
data:image/s3,"s3://crabby-images/f5e23/f5e23bc6f77d490cec8604840cbd5049f2264a33" alt=""
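If you prefer to stay in Cloud Shell, the scale-down from step 3-5 can also be sketched with the CLI instead of the Portal, assuming the `az batchai` command group is available. The names are hypothetical placeholders, and `run` is a dry-run helper of my own that prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical names; reuse the ones from step 3-1.
RESOURCE_GROUP="batchai-test-rg"
WORKSPACE="batchai-workspace"
CLUSTER="nc6-cluster"

# Helper (not part of the Azure CLI): print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Set the target node count to zero so the GPU node stops being billed.
run az batchai cluster resize \
  --name "$CLUSTER" \
  --workspace "$WORKSPACE" \
  --resource-group "$RESOURCE_GROUP" \
  --target 0
```

After resizing, it is still worth checking in the Portal that the current node count really reaches zero.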