anatools.anaclient.datasets module

Dataset Functions

cancel_dataset(self, datasetId, workspaceId=None)

Stop a running job.

Parameters
  • datasetId (str) – Dataset ID of the running job to stop.

  • workspaceId (str) – Workspace ID of the running job. If none is provided, the default workspace is used.

Returns

Returns True if the job was cancelled successfully.

Return type

bool

create_dataset(self, name, graphId, description='', runs=1, priority=1, seed=1, compressDataset=True, tags=[], workspaceId=None)

Create a new synthetic dataset using a graph in the workspace. This will start a new dataset job in the workspace.

Parameters
  • name (str) – Name for dataset.

  • graphId (str) – ID of the graph to create dataset from.

  • description (str) – Description for new dataset.

  • runs (int) – Number of times a channel will run within a single job. This is also the number of different images created in the dataset.

  • priority (int) – Job priority.

  • seed (int) – Seed number.

  • compressDataset (bool) – Whether to compress the dataset. If false, the dataset asset will be accessed through mount_workspaces.

  • tags (list[str]) – Tags for new dataset.

  • workspaceId (str) – Workspace ID of the staged graph’s workspace. If none is provided, the current workspace is used.

Returns

Success or failure message about dataset creation.

Return type

str
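
As an illustration of the seed parameter, the sketch below starts one dataset job per seed value. The helper name create_seed_sweep, the "-seed<N>" naming scheme, and the "seed-sweep" tag are hypothetical; `client` is assumed to be an authenticated anatools client exposing create_dataset with the signature above.

```python
def create_seed_sweep(client, name, graph_id, seeds, runs=1, workspace_id=None):
    """Start one dataset job per seed and return {seed: returned message}.

    Hypothetical helper: the naming scheme and tag are illustrative only.
    """
    results = {}
    for seed in seeds:
        results[seed] = client.create_dataset(
            name=f"{name}-seed{seed}",
            graphId=graph_id,
            runs=runs,
            seed=seed,
            tags=["seed-sweep"],  # pass a fresh list rather than relying on the default
            workspaceId=workspace_id,
        )
    return results

# Usage, assuming `ana` is an authenticated client:
# messages = create_seed_sweep(ana, "baseline", graph_id="<graphId>", seeds=[1, 2, 3], runs=10)
```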

create_mixed_dataset(self, name, parameters, description='', seed=None, tags=None, workspaceId=None)

Creates a new dataset using the samples provided in the parameters dict. The dict must have the form:

{
    "datasetId1": {"samples": <int>, "classes": [<class1>, <class2>, …]},
    "datasetId2": {"samples": <int>},
    …
}

Parameters
  • name (str) – The name of the new mixed dataset.

  • parameters (dict) – A dictionary of datasetId keys with values of {"samples": <int>, "classes": [<class1>, <class2>, …]}. The "classes" key may be omitted.

  • description (str) – Description for new dataset.

  • seed (int) – The seed for the mixed dataset, used to set the random seed.

  • tags (list[str]) – A list of tags to apply to the new dataset.

  • workspaceId (str) – The workspace the dataset is in.

Returns

The dataset ID of the new mixed dataset.

Return type

str
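
The parameters dict can be assembled programmatically. The sketch below is one way to do it. The helper build_mixed_parameters is hypothetical, and the check that samples is a positive integer is an assumption, not a documented constraint.

```python
def build_mixed_parameters(selections):
    """Build the `parameters` dict for create_mixed_dataset.

    `selections` is an iterable of (datasetId, samples, classes) tuples;
    pass classes=None to omit the optional "classes" key.
    """
    parameters = {}
    for dataset_id, samples, classes in selections:
        # Assumption: a sample count must be a positive integer.
        if not isinstance(samples, int) or samples < 1:
            raise ValueError(f"samples for {dataset_id!r} must be a positive int")
        entry = {"samples": samples}
        if classes:
            entry["classes"] = list(classes)
        parameters[dataset_id] = entry
    return parameters


parameters = build_mixed_parameters([
    ("datasetId1", 100, ["class1", "class2"]),
    ("datasetId2", 50, None),
])

# Usage, assuming `ana` is an authenticated client:
# new_id = ana.create_mixed_dataset(name="mixed", parameters=parameters, seed=7)
```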

delete_dataset(self, datasetId, workspaceId=None)

Delete an existing dataset.

Parameters
  • datasetId (str) – Dataset ID of dataset to delete.

  • workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the current workspace is used.

Returns

Returns True if the dataset was deleted successfully.

Return type

bool

download_dataset(self, datasetId, workspaceId=None, localDir=None)

Download a dataset.

Parameters
  • datasetId (str) – Dataset ID of dataset to download.

  • workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the default workspace is used.

  • localDir (str) – Path to download the dataset to. If none is provided, the current working directory is used.

Returns

Returns the path the dataset was downloaded to.

Return type

str

Raises
  • ValueError – If datasetId is not provided.

  • Exception – If the dataset cannot be downloaded (e.g., still running, failed, or not found).
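
Because a dataset that is still running cannot be downloaded, callers may want to treat that failure as recoverable. The sketch below wraps the call accordingly. The helper try_download_dataset is hypothetical, and creating localDir up front is a convenience of this sketch rather than documented behavior.

```python
import os


def try_download_dataset(client, dataset_id, local_dir=None, workspace_id=None):
    """Return the downloaded path, or None if the dataset could not be downloaded."""
    if local_dir is not None:
        # Convenience of this sketch: make sure the target directory exists.
        os.makedirs(local_dir, exist_ok=True)
    try:
        return client.download_dataset(
            datasetId=dataset_id, workspaceId=workspace_id, localDir=local_dir
        )
    except ValueError:
        # Documented: raised when datasetId is not provided. Treat as a caller bug.
        raise
    except Exception as error:
        # Documented: dataset still running, failed, or not found.
        print(f"Could not download {dataset_id}: {error}")
        return None

# Usage, assuming `ana` is an authenticated client:
# path = try_download_dataset(ana, "abc123", local_dir="downloads")
```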

download_dataset_file(self, datasetId, filepath, workspaceId=None, localDir=None)

Download a single file from a dataset.

This allows downloading individual files from a dataset rather than downloading the entire dataset archive. Use get_dataset_files() to list available files in a dataset.

Parameters
  • datasetId (str) – Dataset ID of the dataset containing the file.

  • filepath (str) – Relative path to the file within the dataset (e.g., “images/000000-1-image.png”).

  • workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the default workspace is used.

  • localDir (str) – Path to download the file to. If none is provided, the current working directory is used.

Returns

Returns the path the file was downloaded to.

Return type

str

Raises
  • ValueError – If datasetId or filepath is not provided.

  • Exception – If the file cannot be downloaded (e.g., not found).

Examples

>>> # First, list available files
>>> files = ana.get_dataset_files(datasetId='abc123', path='images')
>>> print(files)
['000000-1-image.png', '000001-1-image.png', ...]
>>>
>>> # Then download a specific file
>>> path = ana.download_dataset_file(datasetId='abc123', filepath='images/000000-1-image.png')
>>> print(path)
'/home/user/000000-1-image.png'
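
The two calls above can be combined to fetch every file in one dataset directory. This is a minimal sketch: the helper download_directory is hypothetical, it assumes the returned names are relative to `path` (as in the example output above), and it only covers the first page of results returned by get_dataset_files.

```python
import posixpath


def download_directory(client, dataset_id, directory, local_dir=None, workspace_id=None):
    """Download each file listed under `directory` and return the local paths."""
    names = client.get_dataset_files(
        datasetId=dataset_id, path=directory, workspaceId=workspace_id
    )
    downloaded = []
    for name in names:
        # Dataset paths use forward slashes, e.g. "images/000000-1-image.png".
        remote_path = posixpath.join(directory, name)
        downloaded.append(
            client.download_dataset_file(
                datasetId=dataset_id,
                filepath=remote_path,
                workspaceId=workspace_id,
                localDir=local_dir,
            )
        )
    return downloaded

# Usage, assuming `ana` is an authenticated client:
# paths = download_directory(ana, "abc123", "images", local_dir="images")
```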

edit_dataset(self, datasetId, description=None, name=None, pause=None, priority=None, tags=None, workspaceId=None)

Update dataset properties.

Parameters
  • datasetId (str) – Dataset ID to edit the name, description or tags for.

  • name (str) – New name for dataset.

  • description (str) – New description.

  • tags (list) – New tags for dataset.

  • pause (bool) – Pauses the dataset job if it is running.

  • priority (int) – New priority for dataset job (1-3).

  • workspaceId (str) – Workspace ID of the dataset to update. If none is provided, the current workspace is used.

Returns

Returns True if the dataset was updated successfully.

Return type

bool
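
Since priority is documented as a value from 1 to 3, a caller can reject out-of-range values before contacting the service. The helper set_priority below is a hypothetical sketch of that check. How the service itself handles invalid values is not stated here.

```python
def set_priority(client, dataset_id, priority, workspace_id=None):
    """Validate the documented 1-3 range, then update the dataset job priority."""
    if priority not in (1, 2, 3):
        raise ValueError("priority must be 1, 2, or 3")
    return client.edit_dataset(
        datasetId=dataset_id, priority=priority, workspaceId=workspace_id
    )

# Usage, assuming `ana` is an authenticated client:
# updated = set_priority(ana, "abc123", 3)
```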

get_dataset_files(self, datasetId, path=None, workspaceId=None, cursor=None, limit=100)

Gets a list of the files contained in the specified dataset.

Parameters
  • datasetId (str) – Dataset ID of the dataset to list files for.

  • path (str) – Directory path in the dataset, e.g. “images”.

  • workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace is used.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of files to retrieve.

Returns

List of file names.

Return type

list[str]

get_dataset_jobs(self, organizationId=None, workspaceId=None, datasetId=None, cursor=None, limit=None, filters=None, fields=None)

Queries the organization or workspace for active dataset jobs based on the provided parameters. If neither organizationId nor workspaceId is provided, the current workspace is used.

Parameters
  • organizationId (str) – Queries an organization for active dataset jobs.

  • workspaceId (str) – Queries a workspace for active dataset jobs.

  • datasetId (str) – Dataset ID to filter.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of dataset jobs to return.

  • filters (dict) – Filter criteria; only matching items are returned.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Information about the active dataset jobs.

Return type

str

get_dataset_log(self, datasetId, runId, saveLogFile=False, workspaceId=None, fields=None)

Shows dataset log information to the user.

Parameters
  • datasetId (str) – The dataset the run belongs to.

  • runId (str) – The run to retrieve the log for.

  • saveLogFile (bool) – If True, saves log file to current working directory.

  • workspaceId (str) – The workspace the run belongs to.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Log information for the specified run.

Return type

list[dict]

get_dataset_runs(self, datasetId, state=None, workspaceId=None, fields=None)

Shows all dataset run information to the user. Can filter by state.

Parameters
  • datasetId (str) – The dataset to retrieve runs for.

  • state (str) – Filter run list by status.

  • workspaceId (str) – The workspace the dataset is in.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

List of runs associated with datasetId.

Return type

list[dict]
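
get_dataset_runs and get_dataset_log can be chained to gather the logs for every run in a given state. The sketch below is hedged: the helper collect_logs is hypothetical, and it assumes each run dict carries its identifier under a "runId" key, which this page does not confirm. Valid values for `state` are likewise not listed here, so the caller supplies one.

```python
def collect_logs(client, dataset_id, state=None, run_id_key="runId", workspace_id=None):
    """Return {runId: log} for every run of the dataset matching `state`."""
    runs = client.get_dataset_runs(
        datasetId=dataset_id, state=state, workspaceId=workspace_id
    )
    logs = {}
    for run in runs:
        run_id = run[run_id_key]  # assumed key name; adjust if the payload differs
        logs[run_id] = client.get_dataset_log(
            datasetId=dataset_id, runId=run_id, workspaceId=workspace_id
        )
    return logs

# Usage, assuming `ana` is an authenticated client and "<state>" is a valid run state:
# logs = collect_logs(ana, "abc123", state="<state>")
```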

get_datasets(self, datasetId=None, workspaceId=None, filters=None, cursor=None, limit=None, fields=None)

Queries the workspace datasets based on the provided parameters.

Parameters
  • datasetId (str) – Dataset ID to filter.

  • workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace is used.

  • filters (dict) – Filter criteria; only matching items are returned.

  • cursor (str) – Cursor for pagination.

  • limit (int) – Maximum number of datasets to return.

  • fields (list) – List of fields to return, leave empty to get all fields.

Returns

Information about the datasets that match the query parameters.

Return type

list[dict]
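
The cursor and limit parameters suggest page-by-page retrieval. The generator below sketches one way to walk the results under a loud assumption: that the cursor is the datasetId of the last item on the previous page and that the service returns the items following it. This page does not document the cursor format, so verify that behavior before relying on it. The helper iter_datasets is hypothetical.

```python
def iter_datasets(client, page_size=50, workspace_id=None, id_key="datasetId"):
    """Yield dataset dicts one page at a time until the results are exhausted."""
    cursor = None
    while True:
        page = client.get_datasets(
            workspaceId=workspace_id, cursor=cursor, limit=page_size
        )
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        # Assumption: the next cursor is the ID of the last item just returned.
        cursor = page[-1][id_key]

# Usage, assuming `ana` is an authenticated client:
# for dataset in iter_datasets(ana, page_size=25):
#     print(dataset["datasetId"])
```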

upload_dataset(self, filename, description=None, tags=None, workspaceId=None)

Uploads a compressed file to the datasets library in the workspace.

Parameters
  • filename (str) – Path to the dataset folder or file for uploading. Must be zip or tar file types.

  • description (str) – Description for new dataset.

  • tags (list[str]) – Tags for new dataset.

  • workspaceId (str) – WorkspaceId to upload dataset to. Defaults to current.

Returns

The unique identifier for this dataset.

Return type

str
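
Because uploads must be zip or tar files, a local check before uploading can catch mistakes early. The sketch below uses the standard library's zipfile.is_zipfile and tarfile.is_tarfile. The helper upload_archive is hypothetical and covers only the single-archive-file case, not the folder case mentioned for filename.

```python
import os
import tarfile
import zipfile


def upload_archive(client, filename, description=None, tags=None, workspace_id=None):
    """Upload `filename` only if it is an existing zip or tar archive."""
    if not os.path.isfile(filename):
        raise ValueError(f"{filename} is not a file")
    if not (zipfile.is_zipfile(filename) or tarfile.is_tarfile(filename)):
        raise ValueError(f"{filename} is not a zip or tar archive")
    return client.upload_dataset(
        filename=filename, description=description, tags=tags, workspaceId=workspace_id
    )

# Usage, assuming `ana` is an authenticated client:
# dataset_id = upload_archive(ana, "my_dataset.zip", description="Imported data")
```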