This function takes the task type, dataset, training parameters and GPU node that the user enters and launches a training task on the GPU cluster, which the user can then monitor and manage. The panel for this function is shown below. Supported task types include Image Classification and Depth Estimation. Select a dataset that you have uploaded to the server in the designated format, a GPU node and the training parameters, then click Start Training!
Training Task Management Panel
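The inputs collected by this panel can be pictured as a small task description that is checked before it is submitted. The following is only a minimal sketch: the `TrainingTask` class, its field names, the task type strings and the example parameter names are hypothetical and are not taken from the platform itself.

```python
from dataclasses import dataclass, field

# Task types named in the text; the string spellings are assumptions.
SUPPORTED_TASK_TYPES = {"image_classification", "depth_estimation"}


@dataclass
class TrainingTask:
    """Hypothetical container for the values entered in the panel."""
    task_type: str
    dataset: str    # name of a dataset already uploaded to the server
    gpu_node: str   # identifier of the GPU node chosen by the user
    params: dict = field(default_factory=dict)  # training parameters

    def validate(self):
        """Raise ValueError if a required field is missing or unsupported."""
        if self.task_type not in SUPPORTED_TASK_TYPES:
            raise ValueError(f"unsupported task type: {self.task_type!r}")
        if not self.dataset:
            raise ValueError("a dataset must be selected")
        if not self.gpu_node:
            raise ValueError("a GPU node must be selected")
        return self


# Example: the kind of request the Start Training! button might produce.
task = TrainingTask(
    task_type="image_classification",
    dataset="my_dataset",
    gpu_node="node-01",
    params={"epochs": 10, "learning_rate": 0.01},
).validate()
```

Validating before submission means that a missing dataset or an unsupported task type is reported to the user immediately rather than after the job has been queued on the cluster.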
Function Two: Training Task and Machine Status Monitoring
This function allows users to monitor running tasks, model convergence and machine status. The panel is shown below. The model convergence view reports information such as the value of the loss function and the accuracy on the training set and the test set. Machine status shows GPU utilization, power consumption, etc.
Training Task and Machine Status Monitoring Panel
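As an illustration of where machine status figures such as GPU utilization and power consumption can come from, the sketch below reads them with the `nvidia-smi` command-line tool on a node with NVIDIA GPUs. This is an assumption made for the example; the text does not say how the platform collects these values, and the function names here are hypothetical.

```python
import subprocess

# Query GPU utilization (%) and power draw (W), one CSV line per GPU.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(csv_text):
    """Turn lines such as '87, 215.43' into a list of per-GPU readings."""
    readings = []
    for line in csv_text.strip().splitlines():
        util, power = (part.strip() for part in line.split(","))
        # Note: some GPUs report '[N/A]' for power; that case is not handled here.
        readings.append({"utilization_pct": float(util), "power_w": float(power)})
    return readings


def query_gpu_status():
    """Run nvidia-smi on the local node and return the parsed readings."""
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return parse_gpu_status(result.stdout)
```

Keeping the parsing step separate from the command invocation lets it be exercised with sample text on a machine without a GPU, for example `parse_gpu_status("87, 215.43\n12, 48.10\n")`.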
Function Three: Model Inference Service
After a model has been trained, the final network model parameters are saved in the cluster’s file system. To run the model inference service, select a model and upload the input data. The panel for this function is shown below. In the left section, users select a model and upload an image as the input for inference. The inference result is shown on the right side. For Image Classification, the service shows the few categories that the model considers most probable. For Depth Estimation, the service presents a depth map of the input image, which users can download to their local machine.
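For Image Classification, picking the most probable categories can be sketched as a softmax over the model's raw output scores followed by a top-k selection. This is a generic illustration, not the service's actual code: the function name, the label list and the default of three categories are assumptions.

```python
import math


def top_k_categories(logits, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Sort by probability, highest first, and keep the first k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Example with made-up scores for four hypothetical categories.
for label, prob in top_k_categories(
    [2.0, 0.5, 1.0, -1.0], ["cat", "dog", "fox", "owl"], k=2
):
    print(f"{label}: {prob:.3f}")
```

Showing a short ranked list rather than a single label lets users see how confident the model is and which alternatives it considered.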