anatools.anaclient.datasets module
Dataset Functions
- cancel_dataset(self, datasetId, workspaceId=None)
Stop a running job.
- Parameters
datasetId (str) – Dataset ID of the running job to stop.
workspaceId (str) – Workspace ID of the running job. If none is provided, the default workspace will be used.
- Returns
Returns True if the job was cancelled successfully.
- Return type
bool
- create_dataset(self, name, graphId, description='', runs=1, priority=1, seed=1, tags=[], workspaceId=None)
Create a new synthetic dataset using a graph in the workspace. This will start a new dataset job in the workspace.
- Parameters
name (str) – Name for dataset.
graphId (str) – ID of the graph to create dataset from.
description (str) – Description for new dataset.
runs (int) – Number of times the channel runs within a single job. This is also the number of images created in the dataset.
priority (int) – Job priority.
seed (int) – Seed number.
tags (list[str]) – Tags for new dataset.
workspaceId (str) – Workspace ID of the staged graph’s workspace. If none is provided, the current workspace will be used.
- Returns
Success or failure message about dataset creation.
- Return type
str
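A minimal sketch of calling create_dataset from a script, assuming `client` is an already authenticated anatools client object. The wrapper `start_dataset_job` and the stand-in `RecordingClient` are hypothetical names used only for illustration; the stand-in records the keyword arguments instead of contacting the platform, so the snippet runs offline.

```python
from typing import List, Optional


def start_dataset_job(client, name: str, graph_id: str, runs: int = 1,
                      priority: int = 1, seed: int = 1,
                      tags: Optional[List[str]] = None,
                      workspace_id: Optional[str] = None) -> str:
    """Check arguments locally, then start a dataset job with create_dataset."""
    if runs < 1:
        # Each run produces one image, so at least one run is needed.
        raise ValueError("runs must be at least 1")
    return client.create_dataset(
        name=name,
        graphId=graph_id,
        runs=runs,
        priority=priority,
        seed=seed,
        tags=list(tags) if tags else [],  # always pass an explicit, fresh list
        workspaceId=workspace_id,         # None falls back to the current workspace
    )


class RecordingClient:
    """Hypothetical stand-in that records calls instead of starting real jobs."""

    def __init__(self):
        self.calls = []

    def create_dataset(self, **kwargs):
        self.calls.append(kwargs)
        return "dataset created"  # placeholder message


client = RecordingClient()
message = start_dataset_job(client, "vehicles-v1", "graph-123", runs=50, tags=["demo"])
```

Always passing an explicit list means the mutable default in the documented signature (`tags=[]`) is never relied on.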
- create_mixed_dataset(self, name, parameters, description='', seed=None, tags=None, workspaceId=None)
Creates a new dataset from samples of existing datasets, as specified in the parameters dict. The dict must have the form:
{
“datasetId1”: {“samples”: <int>, “classes”: [<class1>, <class2>, …]},
“datasetId2”: {“samples”: <int>},
…
}
- Parameters
name (str) – The name of the new mixed dataset.
parameters (dict) – A dictionary mapping dataset IDs to values of the form {“samples”: <int>, “classes”: [<class1>, <class2>, …]}. The “classes” key is optional.
description (str) – Description for new dataset.
seed (int) – Random seed used when creating the mixed dataset.
tags (list[str]) – A list of tags to apply to the new dataset.
workspaceId (str) – The workspace the dataset is in.
- Returns
The dataset ID of the new mixed dataset.
- Return type
str
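The parameters dict can be assembled with a small helper that enforces the documented shape: a positive integer `samples` per source dataset and an optional `classes` list. `mix_entry` is a hypothetical helper, `datasetId1`/`datasetId2` and the class names are placeholders, and the commented-out call assumes an authenticated `client`.

```python
from typing import Dict, List, Optional


def mix_entry(samples: int, classes: Optional[List[str]] = None) -> dict:
    """Build one value of the create_mixed_dataset parameters dict."""
    if isinstance(samples, bool) or not isinstance(samples, int) or samples < 1:
        raise ValueError("samples must be a positive integer")
    entry = {"samples": samples}
    if classes:  # "classes" may be omitted, as in the documented format
        entry["classes"] = list(classes)
    return entry


parameters: Dict[str, dict] = {
    "datasetId1": mix_entry(500, ["car", "truck"]),  # placeholder IDs and classes
    "datasetId2": mix_entry(250),
}

# new_dataset_id = client.create_mixed_dataset(
#     name="mixed-vehicles", parameters=parameters, seed=42, tags=["mixed"])
```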
- delete_dataset(self, datasetId, workspaceId=None)
Delete an existing dataset.
- Parameters
datasetId (str) – Dataset ID of dataset to delete.
workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the current workspace will be used.
- Returns
Returns True if the dataset was deleted successfully.
- Return type
bool
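Since delete_dataset returns a bool, a batch cleanup can collect the IDs that were not deleted instead of stopping at the first one. This is a sketch: `delete_datasets` and `FakeDeleteClient` are hypothetical, and whether the real client returns False or raises on failure is not stated here, so exceptions are simply left to propagate.

```python
from typing import Iterable, List, Optional


def delete_datasets(client, dataset_ids: Iterable[str],
                    workspace_id: Optional[str] = None) -> List[str]:
    """Delete several datasets; return the IDs for which delete_dataset did not return True."""
    failed = []
    for dataset_id in dataset_ids:
        # Exceptions raised by the client are not caught here and will propagate.
        if client.delete_dataset(dataset_id, workspaceId=workspace_id) is not True:
            failed.append(dataset_id)
    return failed


class FakeDeleteClient:
    """Hypothetical stand-in that 'deletes' IDs from an in-memory set."""

    def __init__(self, existing):
        self.existing = set(existing)

    def delete_dataset(self, datasetId, workspaceId=None):
        if datasetId in self.existing:
            self.existing.remove(datasetId)
            return True
        return False


client = FakeDeleteClient({"ds-a", "ds-b"})
failed = delete_datasets(client, ["ds-a", "ds-missing", "ds-b"])
```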
- download_dataset(self, datasetId, workspaceId=None, localDir=None)
Download a dataset.
- Parameters
datasetId (str) – Dataset ID of dataset to download.
workspaceId (str) – Workspace ID that the dataset is in. If none is provided, the default workspace will be used.
localDir (str) – Directory to download the dataset to. If none is provided, the current working directory will be used.
- Returns
Returns the path the dataset was downloaded to.
- Return type
str
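Whether download_dataset creates a missing localDir is not stated here, so this sketch creates the target folder itself before calling it. `download_to` and `FakeDownloadClient` are hypothetical, and the `.zip` file name returned by the stand-in is invented for the demo; the real method returns whatever path it actually downloaded to.

```python
import tempfile
from pathlib import Path
from typing import Optional


def download_to(client, dataset_id: str, target_dir: str,
                workspace_id: Optional[str] = None) -> str:
    """Create target_dir if it is missing, then download the dataset into it."""
    target = Path(target_dir).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)  # no-op when the folder exists
    return client.download_dataset(dataset_id, workspaceId=workspace_id,
                                   localDir=str(target))


class FakeDownloadClient:
    """Hypothetical stand-in that returns a made-up path instead of downloading."""

    def download_dataset(self, datasetId, workspaceId=None, localDir=None):
        return str(Path(localDir) / f"{datasetId}.zip")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "downloads" / "run-01"
    saved_to = download_to(FakeDownloadClient(), "dataset-123", str(target))
    target_created = target.is_dir()
    expected_parent = target.resolve()
```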
- edit_dataset(self, datasetId, description=None, name=None, pause=None, priority=None, tags=None, workspaceId=None)
Update dataset properties.
- Parameters
datasetId (str) – ID of the dataset to edit.
name (str) – New name for dataset.
description (str) – New description.
tags (list) – New tags for dataset.
pause (bool) – Pauses the dataset job if it is running.
priority (int) – New priority for dataset job (1-3).
workspaceId (str) – Workspace ID of the dataset to update. If none is provided, the current workspace will be used.
- Returns
Returns True if the dataset was updated successfully.
- Return type
bool
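The priority parameter is documented as 1-3, so a caller can reject out-of-range values before making a request. In this sketch `reprioritize` and `RecordingEditClient` are hypothetical, and it assumes (not confirmed here) that arguments left as None, such as `pause`, leave the corresponding property unchanged.

```python
from typing import Optional


def reprioritize(client, dataset_id: str, priority: int,
                 pause: Optional[bool] = None,
                 workspace_id: Optional[str] = None) -> bool:
    """Change a dataset job's priority after checking the documented 1-3 range."""
    if priority not in (1, 2, 3):
        raise ValueError(f"priority must be 1, 2 or 3, got {priority!r}")
    # ASSUMPTION: arguments left as None (e.g. pause) leave that property unchanged.
    return client.edit_dataset(dataset_id, priority=priority, pause=pause,
                               workspaceId=workspace_id)


class RecordingEditClient:
    """Hypothetical stand-in that records edit_dataset calls."""

    def __init__(self):
        self.calls = []

    def edit_dataset(self, datasetId, **kwargs):
        self.calls.append((datasetId, kwargs))
        return True


client = RecordingEditClient()
ok = reprioritize(client, "dataset-123", 3)
```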
- get_dataset_files(self, datasetId, path=None, workspaceId=None, cursor=None, limit=100)
Gets a list of the files contained in the specified dataset.
- Parameters
datasetId (str) – ID of the dataset to list files from.
path (str) – Directory path within the dataset, e.g. “images”.
workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace will be used.
cursor (str) – Cursor for pagination.
limit (int) – Maximum number of files to retrieve.
- Returns
List of file names.
- Return type
[str]
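Because get_dataset_files returns at most `limit` names per call, listing a large directory takes several calls. The loop below is a sketch that assumes the cursor for the next page is the last file name of the previous page and that a short page is the final one; neither rule is confirmed by this reference, so check the actual cursor semantics before relying on it. `iter_dataset_files` and `FakeFilesClient` are hypothetical.

```python
from typing import Iterator, List, Optional


def iter_dataset_files(client, dataset_id: str, path: Optional[str] = None,
                       workspace_id: Optional[str] = None,
                       page_size: int = 100) -> Iterator[str]:
    """Yield every file name under `path`, fetching one page per call.

    ASSUMPTION: the next page's cursor is the last name of the previous page,
    and a page shorter than page_size is the final page.
    """
    cursor = None
    while True:
        page = client.get_dataset_files(dataset_id, path=path,
                                        workspaceId=workspace_id,
                                        cursor=cursor, limit=page_size)
        yield from page
        if len(page) < page_size:
            break
        cursor = page[-1]


class FakeFilesClient:
    """Hypothetical stand-in that serves a fixed list with the assumed cursor rules."""

    def __init__(self, files: List[str]):
        self.files = sorted(files)

    def get_dataset_files(self, datasetId, path=None, workspaceId=None,
                          cursor=None, limit=100):
        start = 0 if cursor is None else self.files.index(cursor) + 1
        return self.files[start:start + limit]


all_files = [f"images/{i:04d}.png" for i in range(250)]
listed = list(iter_dataset_files(FakeFilesClient(all_files), "dataset-123", path="images"))
```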
- get_dataset_jobs(self, organizationId=None, workspaceId=None, datasetId=None, cursor=None, limit=None, filters=None, fields=None)
Queries the organization or workspace for active dataset jobs based on the provided parameters. If neither organizationId nor workspaceId is provided, the current workspace will be used.
- Parameters
organizationId (str) – Queries an organization for active dataset jobs.
workspaceId (str) – Queries a workspace for active dataset jobs.
datasetId (str) – Dataset ID to filter.
cursor (str) – Cursor for pagination.
limit (int) – Maximum number of dataset jobs to return.
filters (dict) – Filter to apply; only items that match it are returned.
fields (list) – List of fields to return; leave empty to return all fields.
- Returns
Information about the active dataset jobs.
- Return type
str
- get_dataset_log(self, datasetId, runId, saveLogFile=False, workspaceId=None, fields=None)
Shows dataset log information to the user.
- Parameters
datasetId (str) – The dataset the run belongs to.
runId (str) – The run to retrieve the log for.
saveLogFile (bool) – If True, saves the log file to the current working directory.
workspaceId (str) – The workspace the run belongs to.
fields (list) – List of fields to return; leave empty to return all fields.
- Returns
Log information for the specified runId.
- Return type
list[dict]
- get_dataset_runs(self, datasetId, state=None, workspaceId=None, fields=None)
Shows all dataset run information to the user. Can filter by state.
- Parameters
datasetId (str) – The dataset to retrieve runs for.
state (str) – Filter the run list by state.
workspaceId (str) – The workspace the dataset is in.
fields (list) – List of fields to return; leave empty to return all fields.
- Returns
List of runs associated with datasetId.
- Return type
list[dict]
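Since get_dataset_runs returns a list of dicts, its result can be summarized locally, and the run IDs of interest can then be passed to get_dataset_log. This sketch assumes each run dict has `runId` and `state` keys and that failed runs carry the state `"failed"`; those key and state names are guesses rather than values taken from this reference, and both helpers are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_by_state(runs: List[dict], state_key: str = "state") -> Dict[str, int]:
    """Count runs per state (ASSUMPTION: the state is stored under state_key)."""
    return dict(Counter(run.get(state_key, "unknown") for run in runs))


def run_ids_in_state(runs: List[dict], state: str, state_key: str = "state",
                     id_key: str = "runId") -> List[str]:
    """Return the IDs of runs in a given state (key names are assumptions)."""
    return [run[id_key] for run in runs if run.get(state_key) == state]


# runs = client.get_dataset_runs(datasetId, fields=["runId", "state"])
runs = [  # hypothetical sample data standing in for the call above
    {"runId": "r1", "state": "success"},
    {"runId": "r2", "state": "failed"},
    {"runId": "r3", "state": "success"},
]
summary = count_by_state(runs)
# for run_id in run_ids_in_state(runs, "failed"):
#     client.get_dataset_log(datasetId, run_id, saveLogFile=True)
```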
- get_datasets(self, datasetId=None, workspaceId=None, filters=None, cursor=None, limit=None, fields=None)
Queries the workspace datasets based on the provided parameters.
- Parameters
datasetId (str) – Dataset ID to filter.
workspaceId (str) – Workspace ID of the dataset’s workspace. If none is provided, the current workspace will be used.
filters (dict) – Filter to apply; only items that match it are returned.
cursor (str) – Cursor for pagination.
limit (int) – Maximum number of datasets to return.
fields (list) – List of fields to return; leave empty to return all fields.
- Returns
Information about the dataset based off the query parameters.
- Return type
list[dict]
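Requesting only the fields you need keeps get_datasets responses small, and the returned list can then be searched locally. The sketch assumes each dataset dict exposes `datasetId` and `name` keys (field names not confirmed here); `find_dataset_ids` is a hypothetical helper and the sample data is made up.

```python
from typing import List


def find_dataset_ids(datasets: List[dict], name: str) -> List[str]:
    """Return the IDs of every dataset whose name matches exactly.

    A list is returned in case several datasets share the same name.
    """
    return [d["datasetId"] for d in datasets if d.get("name") == name]


# datasets = client.get_datasets(fields=["datasetId", "name"])
datasets = [  # hypothetical sample standing in for the call above
    {"datasetId": "ds-1", "name": "vehicles-v1"},
    {"datasetId": "ds-2", "name": "vehicles-v2"},
    {"datasetId": "ds-3", "name": "vehicles-v1"},
]
matches = find_dataset_ids(datasets, "vehicles-v1")
```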
- upload_dataset(self, filename, description=None, tags=None, workspaceId=None)
Uploads a compressed file to the datasets library in the workspace.
- Parameters
filename (str) – Path to the dataset file to upload. Must be a zip or tar archive.
workspaceId (str) – Workspace ID to upload the dataset to. If none is provided, the current workspace will be used.
description (str) – Description for new dataset.
tags (list[str]) – Tags for new dataset.
- Returns
The unique identifier for this dataset.
- Return type
str
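upload_dataset expects a zip or tar archive, so a local dataset folder has to be compressed first. This sketch uses the standard library's `shutil.make_archive`; `archive_folder` is a hypothetical helper, and the commented-out upload call assumes an authenticated `client`.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_folder(folder: str, out_dir: str) -> str:
    """Compress `folder` into <out_dir>/<folder name>.zip and return the archive path."""
    src = Path(folder)
    if not src.is_dir():
        raise NotADirectoryError(folder)
    base_name = Path(out_dir) / src.name  # make_archive appends ".zip"
    # root_dir=src stores entries relative to the folder, e.g. "images/0001.png"
    return shutil.make_archive(str(base_name), "zip", root_dir=str(src))


# Demo with throwaway folders; point `folder` at a real dataset directory instead.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_dataset"
    (folder / "images").mkdir(parents=True)
    (folder / "images" / "0001.png").write_bytes(b"placeholder")
    out_dir = Path(tmp) / "archives"  # kept outside the folder being archived
    out_dir.mkdir()
    archive_path = archive_folder(str(folder), str(out_dir))
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.namelist()
    # dataset_id = client.upload_dataset(archive_path, description="...", tags=["demo"])
```

Keeping the output directory outside the folder being archived prevents the archive from trying to include itself.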