Package datatap
This module provides classes and methods for interacting with dataTap. This includes inspecting individual annotations, creating or importing new annotations, and creating or loading datasets for machine learning.
The visual data management platform from Zensors.
Join for free at app.datatap.dev.
The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.
Documentation
Full documentation is available at docs.datatap.dev.
Features
- [x] ⚡ Begin training instantly
- [x] 🔥 Works with all major ML frameworks (PyTorch, TensorFlow, etc.)
- [x] 🛰️ Real-time streaming to avoid large dataset downloads
- [x] 🌐 Universal data format for simple data exchange
- [x] 🎨 Combine data from multiple sources into a single dataset easily
- [x] 🧮 Rich ML utilities to compute PR curves, confusion matrices, and accuracy metrics
- [x] 💽 Free access to a variety of open datasets
Getting Started (Platform)
To begin, select a dataset from the dataTap repository.
Then copy the starter code based on your library preference.
Paste the starter code and start training.
Getting Started (API)
Install the client library.
pip install datatap
Register at app.datatap.dev. Then, go to Settings > Api Keys to find your personal API key.
export DATATAP_API_KEY="XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX"
Start using open datasets instantly.
from datatap import Api
api = Api()
coco = api.get_default_database().get_repository("_/coco")
dataset = coco.get_dataset("latest")
print("COCO: ", dataset)
Data Streaming Example
import itertools
from datatap import Api
api = Api()
dataset = (api
    .get_default_database()
    .get_repository("_/wider-person")
    .get_dataset("latest")
)
# Stream the "training" split lazily instead of downloading it up front
training_stream = dataset.stream_split("training")
for annotation in itertools.islice(training_stream, 5):
    print("Received annotation:", annotation)
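The loop in the streaming example stops after five annotations because itertools.islice pulls items from the stream lazily. The sketch below shows the same pattern with a stand-in generator so it can run without network access. fake_annotation_stream and the produced list are hypothetical names used only for illustration; they are not part of datatap.

```python
import itertools
from typing import Dict, Iterator, List

produced: List[int] = []  # records which items the generator actually yielded


def fake_annotation_stream() -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a dataset stream; not a datatap API."""
    for index in itertools.count():
        produced.append(index)
        yield {"annotation_id": index}


# islice requests only the first five items, so an endless generator is safe here.
first_five = list(itertools.islice(fake_annotation_stream(), 5))
for annotation in first_five:
    print("Received annotation:", annotation)
```

Only the items that are consumed are ever produced, which is the property that lets a streamed split be sampled without fetching the whole dataset.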
More Examples
Support and FAQ
Q. How do I resolve a missing API Key?
If you see the error "Exception: No API key available. Either provide it or use the [DATATAP_API_KEY] environment variable", then the dataTap library was not able to find your API key. You can find your API key on app.datatap.dev under Settings > Api Keys. You can either set it as the DATATAP_API_KEY environment variable or pass it as the first argument to the Api constructor.
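The lookup order described in this answer (explicit argument first, then the DATATAP_API_KEY environment variable, otherwise an error) can be sketched as follows. resolve_api_key is a hypothetical helper written for illustration; it is not the library's implementation, and only the environment variable name and the error text are taken from this page.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "DATATAP_API_KEY"


def resolve_api_key(api_key: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: explicit argument first, then the environment variable."""
    if api_key is not None:
        return api_key
    env_key = environ.get(ENV_VAR)
    if env_key:
        return env_key
    # Same message as the error quoted in the question above.
    raise Exception(
        "No API key available. Either provide it or use the "
        "[DATATAP_API_KEY] environment variable"
    )


print(resolve_api_key("key-from-argument", environ={}))
print(resolve_api_key(environ={ENV_VAR: "key-from-environment"}))
```

With the real library, the equivalent choices are exporting DATATAP_API_KEY before starting Python or passing the key as the first argument to Api.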
Q. Can dataTap be used offline?
Some functionality can be used offline, such as the droplet utilities and metrics. However, repository access and dataset streaming require internet access, even for local databases.
Q. Is dataTap accepting contributions?
dataTap currently uses a separate code review system for managing contributions. The team is looking into switching that system to GitHub to allow public contributions. Until then, we will actively monitor the GitHub issue tracker to help accommodate the community's needs.
Q. How can I get help using dataTap?
You can post a question in the issue tracker. The dataTap team actively monitors the repository, and will try to get back to you as soon as possible.
"""
This module provides classes and methods for interacting with dataTap. This includes inspecting individual annotations,
creating or importing new annotations, and creating or loading datasets for machine learning.
.. include:: ../README.md
"""
import sys as _sys

if _sys.version_info < (3, 7):
    print("\x1b[38;5;1mUsing an unsupported python version. Please install Python 3.7 or greater\x1b[0m")
    raise Exception("Invalid python version")

from .api.entities import Api

__all__ = [
    "Api",
    "api",
    "droplet",
    "geometry",
    "template",
    "utils",
]
Sub-modules
datatap.api
-
The datatap.api module provides two different interfaces for the API …
datatap.comet
datatap.droplet
-
This module provides classes for working with ML data. Specifically, it provides methods for creating new ML data objects, converting ML data objects …
datatap.examples
-
Example code
datatap.geometry
-
This module provides geometric primitives for storing or manipulating ML annotations …
datatap.metrics
-
The metrics module provides a number of utilities for analyzing droplets in the context of a broader training or evaluation job …
datatap.template
-
Templates are used to describe how a given annotation (or set of annotations) is structured …
datatap.tf
-
The datatap.tf module provides utilities for using dataTap with TensorFlow …
datatap.torch
-
The datatap.torch module provides utilities for using dataTap with PyTorch …
datatap.utils
-
A collection of primarily internal-use utilities.
Classes
class Api (api_key: Optional[str] = None, uri: Optional[str] = None)
-
The Api object is the primary method of interacting with the dataTap API.
The Api constructor takes two optional arguments.
The first, api_key, should be the current user's personal API key. In order to encourage good secret practices, this class will use the value found in the DATATAP_API_KEY environment variable if no key is passed in. Consider using environment variables or another secret manager for your API keys.
The second argument is uri. This should only be used if you would like to target a different API server than the default. For instance, if you are using a proxy to reach the API, you can use the uri argument to point toward your proxy.
This object encapsulates most of the logic for interacting with the API. For instance, to get a list of all datasets that a user has access to, you can run
from datatap import Api
api = Api()
print([
    dataset
    for database in api.get_database_list()
    for dataset in database.get_dataset_list()
])
For more details on the functionality provided by the Api object, take a look at its documentation.
class Api:
    """
    The `Api` object is the primary method of interacting with the dataTap API.

    The `Api` constructor takes two optional arguments.

    The first, `api_key`, should be the current user's personal API key. In
    order to encourage good secret practices, this class will use the value
    found in the `DATATAP_API_KEY` if no key is passed in. Consider using
    environment variables or another secret manager for your API keys.

    The second argument is `uri`. This should only be used if you would like
    to target a different API server than the default. For instance, if you
    are using a proxy to reach the API, you can use the `uri` argument to
    point toward your proxy.

    This object encapsulates most of the logic for interacting with API. For
    instance, to get a list of all datasets that a user has access to, you
    can run

    ```py
    from datatap import Api

    api = Api()
    print([
        dataset
        for database in api.get_database_list()
        for dataset in database.get_dataset_list()
    ])
    ```

    For more details on the functionality provided by the Api object, take a
    look at its documentation.
    """

    def __init__(self, api_key: Optional[str] = None, uri: Optional[str] = None):
        self.endpoints = ApiEndpoints(api_key, uri)

    def get_current_user(self) -> User:
        """
        Returns the current logged-in user.
        """
        return User.from_json(self.endpoints, self.endpoints.user.current())

    def get_database_list(self) -> List[Database]:
        """
        Returns a list of all databases that the current user has access to.
        """
        return [
            Database.from_json(self.endpoints, json_db)
            for json_db in self.endpoints.database.list()
        ]

    def get_default_database(self) -> Database:
        """
        Returns the default database for the user (this defaults to the public
        database).
        """
        # TODO(zwade): Have a way of specifying a per-user default
        current_user = self.get_current_user()
        if current_user.default_database is None:
            raise Exception("Trying to find the default database, but none is specified")
        return self.get_database_by_uid(current_user.default_database)

    def get_database_by_uid(self, uid: str) -> Database:
        """
        Queries a database by its UID and returns it.
        """
        return Database.from_json(self.endpoints, self.endpoints.database.query_by_uid(uid))

    @overload
    def get_database_by_name(self, name: str, allow_multiple: Literal[True]) -> List[Database]: ...
    @overload
    def get_database_by_name(self, name: str, allow_multiple: Literal[False] = False) -> Database: ...
    def get_database_by_name(self, name: str, allow_multiple: bool = False) -> Union[Database, List[Database]]:
        """
        Queries a database by its name and returns it. If `allow_multiple` is
        true, it will return a list of databases.
        """
        database_list = [
            Database.from_json(self.endpoints, database)
            for database in self.endpoints.database.query_by_name(name)
        ]
        if allow_multiple:
            return database_list
        else:
            return assert_one(database_list)
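The effect of the allow_multiple flag on get_database_by_name can be illustrated with a small stand-in that works on plain strings instead of Database objects. pick_database is hypothetical and not part of datatap. The sketch also assumes that assert_one rejects any list that does not contain exactly one item, which this page does not confirm.

```python
from typing import List, Union


def pick_database(matches: List[str], allow_multiple: bool = False) -> Union[str, List[str]]:
    """Hypothetical stand-in for the return logic of get_database_by_name."""
    if allow_multiple:
        # The caller asked for every match, so return the list unchanged.
        return matches
    # Assumption: like assert_one, insist on exactly one match.
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one match, found {len(matches)}")
    (only,) = matches
    return only


print(pick_database(["public"]))
print(pick_database(["team-a", "team-b"], allow_multiple=True))
```

Under that assumption, callers who expect a name to be ambiguous would pass allow_multiple=True and choose from the returned list themselves.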
Methods
def get_current_user(self) ‑> User
-
Returns the current logged-in user.
def get_database_by_name(self, name: str, allow_multiple: bool = False) ‑> Union[Database, List[Database]]
-
Queries a database by its name and returns it. If allow_multiple is true, it will return a list of databases.
def get_database_by_uid(self, uid: str) ‑> Database
-
Queries a database by its UID and returns it.
def get_database_list(self) ‑> List[Database]
-
Returns a list of all databases that the current user has access to.
def get_default_database(self) ‑> Database
-
Returns the default database for the user (this defaults to the public database).