anatools.anaclient.datasets module

Dataset Functions

cancel_dataset(self, datasetId, workspaceId=None)

Stop a running job.

Parameters
  • datasetId (str) – Dataset ID of the running job to stop.

  • workspaceId (str) – Workspace ID of the running job. If none is provided, the default workspace is used.

Returns

Returns True if the job was cancelled successfully.

Return type

bool

create_dataset(self, name, graphId, description='', runs=1, priority=1, seed=1, tags=[], workspaceId=None)

Create a new synthetic dataset using a graph in the workspace. This will start a new dataset job in the workspace.

Parameters
  • name (str) – Name for dataset.

  • graphId (str) – ID of the graph to create dataset from.

  • description (str) – Description for new dataset.

  • runs (int) – Number of times a channel will run within a single job. This is also the number of images created in the dataset.

  • priority (int) – Job priority.

  • seed (int) – Seed number.

  • tags (list[str]) – Tags for new dataset.

  • workspaceId (str) – Workspace ID of the staged graph’s workspace. If none is provided, the current workspace is used.

Returns

Success or failure message about dataset creation.

Return type

str
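
A minimal sketch of starting a dataset job through create_dataset. The helper start_dataset, the graph ID "graph-123", and the StubClient class are hypothetical; the stub exists only so the snippet runs without credentials, and in practice client would be an authenticated anatools client. The check that runs is at least 1 is an assumption, not documented behavior.

```python
def start_dataset(client, name, graph_id, runs=1, seed=1, tags=None):
    """Start a dataset job and return the message from create_dataset."""
    # ASSUMPTION: a job needs at least one run; the reference does not say so.
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Pass a fresh list so no list object is shared between calls.
    return client.create_dataset(
        name=name,
        graphId=graph_id,
        description="",
        runs=runs,
        priority=1,
        seed=seed,
        tags=list(tags or []),
    )


class StubClient:
    """Hypothetical stand-in for the real client; records the last call."""

    def create_dataset(self, name, graphId, description="", runs=1,
                       priority=1, seed=1, tags=None, workspaceId=None):
        self.last_call = {"name": name, "graphId": graphId,
                          "runs": runs, "seed": seed, "tags": tags}
        return "stub: dataset job submitted"


stub = StubClient()
message = start_dataset(stub, "demo", "graph-123", runs=10, tags=["demo"])
```

Fixing seed makes it easier to request the same job again later, assuming the graph itself is unchanged.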

create_mixed_dataset(self, name, parameters, description='', seed=None, tags=None, workspaceId=None)

Creates a new dataset using the samples provided in the parameters dict. The dict must have the form:

{
  "datasetId1": {"samples": <int>, "classes": [<class1>, <class2>, …]},
  "datasetId2": {"samples": <int>},
  …
}

Parameters
  • name (str) – The name of the new mixed dataset.

  • parameters (dict) – A dictionary keyed by dataset ID, where each value has the form {"samples": <int>, "classes": [<class1>, <class2>, …]}. The "classes" entry may be omitted.

  • description (str) – Description for new dataset.

  • seed (int) – The seed for the mixed dataset, used to set the random seed.

  • tags (list[str]) – A list of tags to apply to the new dataset.

  • workspaceId (str) – The workspace the dataset is in.

Returns

The dataset ID of the new mixed dataset.

Return type

str
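
The parameters dict can be assembled programmatically. The sketch below is one way to do it; build_mixed_parameters is a hypothetical helper, the dataset IDs and class names are placeholders, and the requirement that samples be a positive integer is an assumption.

```python
def build_mixed_parameters(selections):
    """Build the parameters dict expected by create_mixed_dataset.

    selections is an iterable of (datasetId, samples) or
    (datasetId, samples, classes) tuples.
    """
    parameters = {}
    for selection in selections:
        dataset_id, samples, *rest = selection
        # ASSUMPTION: samples must be a positive integer.
        if not isinstance(samples, int) or samples < 1:
            raise ValueError(f"samples for {dataset_id!r} must be a positive int")
        entry = {"samples": samples}
        # Only add "classes" when a non-empty class list was given.
        if rest and rest[0]:
            entry["classes"] = list(rest[0])
        parameters[dataset_id] = entry
    return parameters


params = build_mixed_parameters([
    ("datasetId1", 100, ["car", "truck"]),
    ("datasetId2", 50),
])
```

The resulting dict would then be passed as the parameters argument, for example client.create_mixed_dataset(name="mixed", parameters=params).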

delete_dataset(self, datasetId, workspaceId=None)

Delete an existing dataset.

Parameters
  • datasetId (str) – Dataset ID of dataset to delete.

  • workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the current workspace is used.

Returns

Returns True if the dataset was deleted successfully.

Return type

bool

download_dataset(self, datasetId, workspaceId=None, localDir=None)

Download a dataset.

Parameters
  • datasetId (str) – Dataset ID of dataset to download.

  • workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the default workspace is used.

  • localDir (str) – Directory to download the dataset to. If none is provided, the current working directory is used.

Returns

Returns the path the dataset was downloaded to.

Return type

str

edit_dataset(self, datasetId, description=None, name=None, pause=None, priority=None, tags=None, workspaceId=None)

Update dataset properties.

Parameters
  • datasetId (str) – ID of the dataset to update.

  • name (str) – New name for dataset.

  • description (str) – New description.

  • tags (list) – New tags for dataset.

  • pause (bool) – Pauses the dataset job if it is running.

  • priority (int) – New priority for dataset job (1-3).

  • workspaceId (str) – Workspace ID of the dataset to update. If none is provided, the current workspace is used.

Returns

Returns True if the dataset was updated successfully.

Return type

bool
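
Because every editable property defaults to None, a caller can pass only the fields that should change. The sketch below builds such a keyword dict and checks the documented 1-3 priority range; build_edit_kwargs is a hypothetical helper, not part of anatools.

```python
def build_edit_kwargs(name=None, description=None, tags=None,
                      pause=None, priority=None):
    """Return keyword arguments for edit_dataset, omitting unset fields."""
    # The reference documents priority as 1-3.
    if priority is not None and priority not in (1, 2, 3):
        raise ValueError("priority must be 1, 2, or 3")
    candidates = {
        "name": name,
        "description": description,
        "tags": tags,
        "pause": pause,
        "priority": priority,
    }
    # Keep False and empty values; drop only fields that were never set.
    return {key: value for key, value in candidates.items() if value is not None}


kwargs = build_edit_kwargs(name="renamed", pause=False, priority=2)
```

A call would then look like client.edit_dataset(datasetId, **kwargs), where client and datasetId come from the caller.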

get_dataset_files(self, datasetId, path=None, workspaceId=None, cursor=None, limit=100)

Gets a list of files that are contained in the specified dataset.

Parameters
  • datasetId (str) – ID of the dataset to list files for.

  • path (str) – Directory path in the dataset, e.g. “images”.

  • workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace is used.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of files to retrieve.

Returns

List of file names.

Return type

[str]
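
The cursor and limit parameters suggest page-by-page retrieval. The sketch below shows one possible loop; it ASSUMES that the cursor is the last file name of the previous page and that a page shorter than limit marks the end, neither of which is stated in this reference. list_all_files and StubClient are hypothetical, and the stub only mimics that assumed behavior so the snippet runs offline.

```python
def list_all_files(client, dataset_id, path=None, page_size=100):
    """Collect every file name by following the cursor page by page."""
    files = []
    cursor = None
    while True:
        page = client.get_dataset_files(
            datasetId=dataset_id, path=path, cursor=cursor, limit=page_size
        )
        if not page:
            break
        files.extend(page)
        if len(page) < page_size:
            break
        # ASSUMPTION: the next cursor is the last item of the current page.
        cursor = page[-1]
    return files


class StubClient:
    """Hypothetical stand-in that pages through an in-memory list."""

    def __init__(self, names):
        self._names = list(names)

    def get_dataset_files(self, datasetId, path=None, workspaceId=None,
                          cursor=None, limit=100):
        start = 0 if cursor is None else self._names.index(cursor) + 1
        return self._names[start:start + limit]


names = ["a.png", "b.png", "c.png", "d.png", "e.png"]
all_files = list_all_files(StubClient(names), "datasetId1", page_size=2)
```

If the real service uses a different cursor convention, only the line that updates cursor needs to change.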

get_dataset_jobs(self, organizationId=None, workspaceId=None, datasetId=None, cursor=None, limit=None, filters=None, fields=None)

Queries the organization or workspace for active dataset jobs based on the provided parameters. If neither organizationId nor workspaceId is provided, the current workspace is used.

Parameters
  • organizationId (str) – Queries an organization for active dataset jobs.

  • workspaceId (str) – Queries a workspace for active dataset jobs.

  • datasetId (str) – Dataset ID to filter.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of dataset jobs to return.

  • filters (dict) – Filter criteria; only matching items are returned.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Information about the active dataset jobs.

Return type

str

get_dataset_log(self, datasetId, runId, saveLogFile=False, workspaceId=None, fields=None)

Shows dataset log information to the user.

Parameters
  • datasetId (str) – The dataset the run belongs to.

  • runId (str) – The run to retrieve the log for.

  • saveLogFile (bool) – If True, saves log file to current working directory.

  • workspaceId (str) – The workspace the run belongs to.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Log information for the specified run.

Return type

list[dict]

get_dataset_runs(self, datasetId, state=None, workspaceId=None, fields=None)

Shows all dataset run information to the user. Can filter by state.

Parameters
  • datasetId (str) – The dataset to retrieve runs for.

  • state (str) – Filter run list by status.

  • workspaceId (str) – The workspace the dataset is in.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

List of runs associated with datasetId.

Return type

list[dict]
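
Since the return value is a list of dicts, a quick summary by state is straightforward. The sketch below ASSUMES each run dict carries a "state" key; the key name, the runId values, and the state values shown are placeholders rather than documented fields.

```python
from collections import Counter


def count_runs_by_state(runs):
    """Count runs per state; runs without a state are grouped as 'unknown'."""
    # ASSUMPTION: the state is stored under the "state" key of each run dict.
    return Counter(run.get("state", "unknown") for run in runs)


# Placeholder data shaped like a possible get_dataset_runs result.
sample_runs = [
    {"runId": "r1", "state": "success"},
    {"runId": "r2", "state": "failed"},
    {"runId": "r3", "state": "success"},
    {"runId": "r4"},
]
counts = count_runs_by_state(sample_runs)
```

With a real client, the input would come from client.get_dataset_runs(datasetId), optionally narrowed with the state parameter.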

get_datasets(self, datasetId=None, workspaceId=None, filters=None, cursor=None, limit=None, fields=None)

Queries the workspace datasets based on the provided parameters.

Parameters
  • datasetId (str) – Dataset ID to filter.

  • workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace is used.

  • filters (dict) – Filter criteria; only matching items are returned.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of datasets to return.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Information about the datasets that match the query parameters.

Return type

list[dict]

upload_dataset(self, filename, description=None, tags=None, workspaceId=None)

Uploads a compressed file to the datasets library in the workspace.

Parameters
  • filename (str) – Path to the dataset folder or file for uploading. Must be zip or tar file types.

  • workspaceId (str) – Workspace ID to upload the dataset to. Defaults to the current workspace.

  • description (str) – Description for new dataset.

  • tags (list[str]) – Tags for new dataset.

Returns

The unique identifier for this dataset.

Return type

str
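
Since the upload must be a zip or tar file, a local folder may need to be archived first. The sketch below uses only the Python standard library; prepare_upload and the folder name my_dataset are hypothetical, and whether upload_dataset would also accept a bare folder is not confirmed here.

```python
import os
import shutil
import tarfile
import tempfile
import zipfile


def prepare_upload(path):
    """Return the path of a zip or tar archive for the given file or folder."""
    if os.path.isdir(path):
        # make_archive appends ".zip" to the base name and returns the archive path.
        return shutil.make_archive(os.path.normpath(path), "zip", root_dir=path)
    # Inspect the file contents rather than trusting the extension.
    if zipfile.is_zipfile(path) or tarfile.is_tarfile(path):
        return path
    raise ValueError(f"{path!r} is not a folder, zip file, or tar file")


# Demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "my_dataset")
    os.makedirs(os.path.join(folder, "images"))
    with open(os.path.join(folder, "images", "0.txt"), "w") as handle:
        handle.write("placeholder")
    archive = prepare_upload(folder)
    archive_is_zip = zipfile.is_zipfile(archive)
    archive_name = os.path.basename(archive)
```

The returned path would then be passed as filename, for example client.upload_dataset(filename=archive, description="demo").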