Scheduler#

class Runner(check_interrupted, check_reload, reset_reload, celery, device_manager, remote_job_params, jobs_db, redis_data, no_sleep=False)[source]#

Checks for new inference and classifier jobs and schedules them when there is a free device. Progress of running jobs is tracked, and dependencies between requests are taken into account. Furthermore, a request is not executed again if an equivalent request has already finished successfully, is currently waiting for execution, or is being executed at the moment.

However, if the same request has failed in the past, then it is executed again.

Internally, hashes and IDs are used to identify requests. A hash denotes the meaning of a request and is the same for two equivalent requests. An ID, on the other hand, is different for any two requests, even if they have the same meaning. A request that duplicates some other request is simply marked as successful if its equivalent has completed successfully in the past. If its equivalent is running at the moment, the new request adopts its status.

Dependencies are resolved based on IDs rather than hashes. This way, if some job failed in the past, a new job that depends on that job will not automatically fail, but will wait for the execution of the new equivalent job.
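The distinction between hashes and IDs described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the status constants, the choice of SHA-256 over a canonical JSON encoding, and the treatment of waiting duplicates are hypothetical and are not taken from the Runner implementation.

```python
import hashlib
import json
import uuid
from typing import Dict, Iterable, Optional

# Hypothetical status values; the real scheduler may use different ones.
SUCCESS, FAILED, RUNNING, WAITING = "success", "failed", "running", "waiting"


def request_hash(payload: dict) -> str:
    """Hash of the request's meaning: equal for equivalent requests."""
    # Sorting keys makes the encoding independent of dictionary order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_request_id() -> str:
    """ID of one concrete request: different even for equivalent requests."""
    return uuid.uuid4().hex


def resolve_duplicate(req_hash: str, status_by_hash: Dict[str, str]) -> Optional[str]:
    """Return the status a duplicate adopts, or None if it must be executed."""
    previous = status_by_hash.get(req_hash)
    # Assumption: waiting equivalents are adopted the same way as running ones.
    if previous in (SUCCESS, RUNNING, WAITING):
        return previous
    return None  # unknown hash or failed equivalent: execute again


def dependencies_satisfied(dep_ids: Iterable[str], status_by_id: Dict[str, str]) -> bool:
    """Dependencies are looked up by ID, so an old failure does not leak in."""
    return all(status_by_id.get(dep_id) == SUCCESS for dep_id in dep_ids)
```

In this sketch, a job that depends on the ID of a freshly resubmitted request simply waits until that ID succeeds, regardless of whether an older request with the same hash failed.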

Parameters:

check_interrupted (Callable[[], bool]) – A function that, when called without arguments, returns True if the runner should stop processing new requests and monitoring running jobs, and False if the runner should continue operating.
celery (CeleryJobManager) – Interface for communication with Celery.
device_manager (DeviceManager) – Device manager.
remote_job_params (RemoteJobParams) – Parameters relevant for scheduling Celery jobs.
jobs_db (JobsDBInterface) – Interface for communication with PostgreSQL.
redis_data (Tuple[InferenceQueue, StatusMap, ClassifierDeps]) – Interfaces for communication with Redis.
no_sleep (bool) – True if there should be no delay between iterations of scheduling new jobs and checking existing jobs, False otherwise.

run()[source]#: Main method, which calls the private methods of the class.
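The internals of run() are not documented here, so the following is only a sketch of how a polling loop could honour check_interrupted and no_sleep. The names run_loop, iteration, and delay_seconds are hypothetical, and a threading.Event is just one convenient way to supply the check_interrupted callable.

```python
import threading
import time
from typing import Callable, List


def run_loop(
    check_interrupted: Callable[[], bool],
    iteration: Callable[[], None],
    no_sleep: bool = False,
    delay_seconds: float = 1.0,  # hypothetical; the real delay is not documented
) -> int:
    """Repeat `iteration` until `check_interrupted()` returns True.

    Returns the number of completed iterations.
    """
    count = 0
    while not check_interrupted():
        iteration()  # e.g. schedule new jobs, then check running ones
        count += 1
        if not no_sleep:
            time.sleep(delay_seconds)
    return count


# An Event gives a thread-safe flag; its is_set method has the
# Callable[[], bool] shape expected for check_interrupted.
stop = threading.Event()
calls: List[int] = []


def one_iteration() -> None:
    calls.append(len(calls))
    if len(calls) == 3:
        stop.set()  # request shutdown after the third iteration


iterations = run_loop(stop.is_set, one_iteration, no_sleep=True)
```

Passing no_sleep=True, as in the usage lines, is mainly useful in tests, where waiting between iterations would only slow things down.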

class CeleryJobManager[source]#

Manages communication with Celery.

abstract start_job(job_name, args, queue)[source]#

Starts a job using Celery.

Parameters:

job_name (str) – Name of the remote function.
args (tuple) – Arguments passed to the remote function.
queue (str) – Name of the queue into which the job will be put.

Returns:

Celery ID, which can be used with get_status to obtain the job status.

Return type:

str

abstract get_status(celery_id)[source]#

Returns the current status of the job and the result if relevant.

Parameters:: celery_id (str) – Job ID assigned by Celery.
Returns:: Job status and result if relevant.
Return type:: AsyncResult
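Because both methods are abstract, a concrete subclass has to supply them. The sketch below shows an in-memory stand-in of the kind that might be used in tests. It is not the project's implementation: the class names JobManagerInterface and InMemoryJobManager and the finish helper are hypothetical, the interface is re-declared here only so that the block is self-contained, and get_status returns a plain state string rather than an AsyncResult.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Dict


class JobManagerInterface(ABC):
    """Assumed shape of the interface, based on the documented signatures."""

    @abstractmethod
    def start_job(self, job_name: str, args: tuple, queue: str) -> str:
        """Start a job and return its ID."""

    @abstractmethod
    def get_status(self, celery_id: str) -> Any:
        """Return the current status of the job."""


class InMemoryJobManager(JobManagerInterface):
    """Hypothetical test double that records jobs instead of running them."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def start_job(self, job_name: str, args: tuple, queue: str) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {
            "name": job_name,
            "args": args,
            "queue": queue,
            "state": "PENDING",  # same spelling as Celery's state names
        }
        return job_id

    def get_status(self, celery_id: str) -> str:
        return self._jobs[celery_id]["state"]

    def finish(self, celery_id: str, state: str = "SUCCESS") -> None:
        """Test helper: pretend the worker finished the job."""
        self._jobs[celery_id]["state"] = state
```

A Celery-backed subclass would presumably wrap calls such as app.send_task(job_name, args=args, queue=queue) and app.AsyncResult(celery_id), but that is an assumption about the implementation rather than something stated in this reference.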

class DeviceManager(gpu_ids_string, max_cpu_jobs)[source]#

Tracks which CPU and GPU devices are currently in use and which are free. If no GPU device is specified, GPU jobs are treated as CPU jobs. In that case, the GPU-related methods report a free GPU whenever a CPU slot is free, even though no GPUs are available.

Parameters:

gpu_ids_string (str) – Specification of GPU devices to use.
max_cpu_jobs (int) – Maximal number of concurrent CPU jobs.

any_gpu_free()[source]#: True if any GPU device is free, False otherwise. If no GPU devices are specified, the result indicates whether there is a free CPU slot.

any_cpu_free()[source]#: True if any CPU device is free, False otherwise.

get_free_gpu()[source]#

Requests a free GPU device or a CPU slot if there are no GPU devices specified.

Returns:: ID of the free device.
Return type:: DeviceID

get_free_cpu()[source]#

Requests a free CPU slot.

Returns:: Value None, which denotes the use of a CPU.
Return type:: DeviceID

release_device(device_id)[source]#

Makes the specified device available for other jobs.

Parameters:: device_id (DeviceID) – Device to release.
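The bookkeeping described by these methods, including the fallback to CPU slots when no GPU is specified, can be sketched as follows. This is an illustrative toy, not the DeviceManager source: ToyDeviceManager is a hypothetical name, it takes an already parsed list of GPU IDs instead of gpu_ids_string, and raising RuntimeError when nothing is free is an assumption, since the reference does not say what happens in that case.

```python
from typing import List, Optional

# Assumption: a device ID is a GPU ID string, or None for a CPU slot.
DeviceID = Optional[str]


class ToyDeviceManager:
    def __init__(self, gpu_ids: List[str], max_cpu_jobs: int) -> None:
        self._has_gpus = bool(gpu_ids)
        self._free_gpus = list(gpu_ids)
        self._free_cpu_slots = max_cpu_jobs

    def any_cpu_free(self) -> bool:
        return self._free_cpu_slots > 0

    def any_gpu_free(self) -> bool:
        if not self._has_gpus:
            # No GPUs configured: GPU jobs are treated as CPU jobs.
            return self.any_cpu_free()
        return bool(self._free_gpus)

    def get_free_cpu(self) -> DeviceID:
        if not self.any_cpu_free():
            raise RuntimeError("no free CPU slot")  # assumed behaviour
        self._free_cpu_slots -= 1
        return None  # None denotes the use of a CPU

    def get_free_gpu(self) -> DeviceID:
        if not self._has_gpus:
            return self.get_free_cpu()
        if not self._free_gpus:
            raise RuntimeError("no free GPU device")  # assumed behaviour
        return self._free_gpus.pop(0)

    def release_device(self, device_id: DeviceID) -> None:
        if device_id is None:
            self._free_cpu_slots += 1
        else:
            self._free_gpus.append(device_id)
```

A scheduler built on such a class would typically call any_gpu_free() before get_free_gpu(), and call release_device() once the job holding the device has finished or failed.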

static get_gpu_ids_from_string(gpu_ids_string)[source]#

Parses the string that specifies which GPU devices to use. The resulting strings can be used to set the environment variable ‘CUDA_VISIBLE_DEVICES’. The returned device IDs always look like ‘0’, ‘1’, ‘2’, … even if the specified devices have different IDs.

This is because, when the ‘device_ids’ property is used within docker-compose, GPU devices are remapped to start at 0.

> (xzyao) Updated: Now it runs on baremetal without docker-compose, so we don’t remap the id to avoid conflicts with others

Parameters:: gpu_ids_string (str) – Specification of GPU devices to use.
Returns:: GPU IDs that will be used to select a GPU.
Return type:: Sequence[DeviceID]
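The format of gpu_ids_string is not specified in this reference. The sketch below assumes a comma-separated list such as "2, 3" and, following the update note above, keeps the IDs as given instead of remapping them to start at 0. The function name parse_gpu_ids and the variable env_for_job are hypothetical.

```python
import os
from typing import List


def parse_gpu_ids(gpu_ids_string: str) -> List[str]:
    """Split a comma-separated GPU specification into individual IDs.

    Assumption: IDs are separated by commas; blanks and empty entries
    are ignored. The IDs are returned unchanged (no remapping).
    """
    return [part.strip() for part in gpu_ids_string.split(",") if part.strip()]


ids = parse_gpu_ids("2, 3")

# Build an environment for a child process that should see only one GPU.
# os.environ itself is left untouched.
env_for_job = dict(os.environ, CUDA_VISIBLE_DEVICES=ids[0])
```

Copying the environment, rather than assigning to os.environ directly, keeps the restriction local to the job that is given env_for_job.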

class RemoteJobParams(general_inference_job_name, general_classifier_job_name, general_task2vec_job_name, general_finetune_job_name, general_queue_name)[source]#

property general_inference_job_name#: Alias for field number 0

property general_classifier_job_name#: Alias for field number 1

property general_task2vec_job_name#: Alias for field number 2

property general_finetune_job_name#: Alias for field number 3

property general_queue_name#: Alias for field number 4
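The "Alias for field number N" descriptions are what Python generates for named tuple fields, which suggests that RemoteJobParams is a named tuple. A minimal sketch under that assumption follows. The class name RemoteJobParamsSketch, the str field types, and all values are hypothetical.

```python
from typing import NamedTuple


class RemoteJobParamsSketch(NamedTuple):
    # Field order matches the documented constructor signature.
    general_inference_job_name: str
    general_classifier_job_name: str
    general_task2vec_job_name: str
    general_finetune_job_name: str
    general_queue_name: str


# Placeholder values, chosen only for illustration.
params = RemoteJobParamsSketch(
    general_inference_job_name="inference",
    general_classifier_job_name="classifier",
    general_task2vec_job_name="task2vec",
    general_finetune_job_name="finetune",
    general_queue_name="general",
)

# Fields can be read by name or by position.
queue_by_name = params.general_queue_name
queue_by_index = params[4]
```

Reading fields by name is usually preferable, because it keeps working if fields are later added or reordered.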