# Dataset
The Dataset client is the interface for querying your ingests. It offers several ways to query your data:
# Entities
Returns a list of the available entities in Knolar.io.
import os
# KnolarIOSDK comes from the Knolar SDK package (its import is not shown here).
customer_name = 'demo-customer'
sdk_token = os.getenv('SDK_API_KEY')
sdk = KnolarIOSDK(customer_name, sdk_token)
results = sdk.dataset.entities()
# Select
This method returns the data for a specific period, without any additional filtering.
Example:
result = sdk.dataset.select(query_data={
"entity":"demo-ingest",
"id_name_field":"tag",
"ids":["XXXX.YYYYY.ZZZ","DDDD.EEEEE.FFFF"],
"metric_name_field":["value"],
"prefixes":[],
"start_date":"2021-04-26T00:00:00.000Z",
"end_date":"2021-04-27T11:29:26.600Z",
"fields":["count", "tag", "value"]
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- fields: (list) Extra fields to be returned.
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
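The parameters above can be assembled with a small helper before calling select. The following is a minimal sketch: build_select_query and parse_iso8601 are hypothetical names that are not part of the SDK, and the sketch assumes both start_date and end_date are timestamps rather than durations.

```python
from datetime import datetime


def parse_iso8601(value):
    """Parse an ISO 8601 timestamp such as 2021-04-26T00:00:00.000Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_select_query(entity, ids, start_date, end_date,
                       id_name_field="tag", metric_name_field=("value",),
                       prefixes=(), fields=()):
    """Hypothetical helper: assemble a query_data dict for a select call."""
    if parse_iso8601(start_date) > parse_iso8601(end_date):
        raise ValueError("start_date must not be later than end_date")
    return {
        "entity": entity,
        "id_name_field": id_name_field,
        "ids": list(ids),
        "metric_name_field": list(metric_name_field),
        "prefixes": list(prefixes),
        "start_date": start_date,
        "end_date": end_date,
        "fields": list(fields),
    }


query_data = build_select_query(
    entity="demo-ingest",
    ids=["XXXX.YYYYY.ZZZ", "DDDD.EEEEE.FFFF"],
    start_date="2021-04-26T00:00:00.000Z",
    end_date="2021-04-27T11:29:26.600Z",
    fields=["count", "tag", "value"],
)
# The dict can then be passed on, e.g. sdk.dataset.select(query_data=query_data)
```

Checking the date order locally catches a reversed range before any request is sent.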
# Last
This method returns the previous point given a point in time.
Example:
result = sdk.dataset.last(query_data={
"entity":"demo-ingest",
"id_name_field":"tag",
"ids":["01XXX.Y00000.ZZZ"],
"metric_name_field":["value"],
"prefixes":[],
"date":"2021-04-27",
"fields":["count", "description", "max", "min", "origin", "tag", "type", "units", "value", "value_str"]
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- fields: (list) Extra fields to be returned.
- offset: (int) The offset to be applied.
- date: (str) The point in time to search from, as used in the example above.
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
NOTE: start_date or end_date can include ISO 8601 durations when the other field includes a specific date. For example:
{
"start_date": "2020-01-01T00:00:00Z",
"end_date": "PT15M"
}
will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
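To make the duration rule concrete, the sketch below resolves such a pair into explicit timestamps locally. It is not the service's implementation: parse_duration and resolve_range are hypothetical helpers, only the PnDTnHnMnS subset of ISO 8601 durations is handled, and a duration in start_date is assumed to count backwards from end_date.

```python
import re
from datetime import datetime, timedelta

# Only the day/time subset of ISO 8601 durations (PnDTnHnMnS) is handled.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Return a timedelta for a duration such as PT15M, or None otherwise."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        return None
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


def parse_timestamp(text):
    """Parse an ISO 8601 timestamp, accepting a trailing Z for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def resolve_range(start_date, end_date):
    """Resolve a (start_date, end_date) pair where one side may be a duration."""
    start_delta = parse_duration(start_date)
    end_delta = parse_duration(end_date)
    if start_delta is not None and end_delta is not None:
        raise ValueError("only one of start_date and end_date may be a duration")
    if end_delta is not None:
        start = parse_timestamp(start_date)
        return start, start + end_delta
    if start_delta is not None:
        # Assumption: a duration in start_date counts backwards from end_date.
        end = parse_timestamp(end_date)
        return end - start_delta, end
    return parse_timestamp(start_date), parse_timestamp(end_date)


start, end = resolve_range("2020-01-01T00:00:00Z", "PT15M")
print(start.isoformat(), end.isoformat())
# 2020-01-01T00:00:00+00:00 2020-01-01T00:15:00+00:00
```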
# Next
This method returns the next point given a point in time.
Example:
result = sdk.dataset.next({
"entity":"demo-ingest",
"id_name_field":"tag",
"ids":["01XXX.Y00000.ZZZ"],
"metric_name_field":["value"],
"prefixes":[],
"date":"2021-04-26T11:29:26.600Z",
"offset":0,
"fields":["count", "description", "tag", "value"]
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- fields: (list) Extra fields to be returned.
- offset: (int) The offset to be applied.
- date: (str) The point in time to search from, as used in the example above.
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
NOTE: start_date or end_date can include ISO 8601 durations when the other field includes a specific date. For example:
{
"start_date": "2020-01-01T00:00:00Z",
"end_date": "PT15M"
}
will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
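As a local illustration of what "previous point" and "next point" mean for a given point in time, the sketch below looks them up in an in-memory list. It does not call the SDK. The helper names are hypothetical, the offset parameter is ignored, and treating a point exactly at the given date as the previous one (rather than the next) is an assumption, since the behaviour at the boundary is not stated here.

```python
from bisect import bisect_right


def previous_point(points, date):
    """Latest (timestamp, value) pair at or before date, or None."""
    timestamps = [timestamp for timestamp, _ in points]
    index = bisect_right(timestamps, date)
    return points[index - 1] if index > 0 else None


def next_point(points, date):
    """Earliest (timestamp, value) pair strictly after date, or None."""
    timestamps = [timestamp for timestamp, _ in points]
    index = bisect_right(timestamps, date)
    return points[index] if index < len(points) else None


# Points must be sorted by timestamp. UTC timestamps written in the same
# ISO 8601 layout sort correctly as plain strings.
points = [
    ("2021-04-26T10:00:00Z", 1.0),
    ("2021-04-26T11:00:00Z", 2.0),
    ("2021-04-26T12:00:00Z", 3.0),
]

print(previous_point(points, "2021-04-26T11:29:26Z"))  # ('2021-04-26T11:00:00Z', 2.0)
print(next_point(points, "2021-04-26T11:29:26Z"))      # ('2021-04-26T12:00:00Z', 3.0)
```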
# Snap
This method returns points for a specified period at a specified granularity. If there are no points at the specified granularity, the points are linearly interpolated.
result = sdk.dataset.snap(query_data={
"entity":"test-realtime-david-07",
"id_name_field":"tag",
"ids":["00XXX.Y00000.ZZZ"],
"metric_name_field":["value"],
"prefixes":[],
"granularity":"day",
"start_date":"2021-04-26T00:00:00.000Z",
"end_date":"2021-04-27T11:29:26.600Z"
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- granularity: (str) The way to aggregate the data, for example "day".
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
NOTE: start_date or end_date can include ISO 8601 durations when the other field includes a specific date. For example:
{
"start_date": "2020-01-01T00:00:00Z",
"end_date": "PT15M"
}
will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
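The linear interpolation that snap applies when no point exists at a requested instant can be illustrated locally. This is a sketch of the general technique, not the service's code: interpolate and snap_points are hypothetical names, the input is assumed to be sorted with strictly increasing timestamps, and instants outside the known points yield None.

```python
from datetime import datetime, timedelta, timezone


def interpolate(points, when):
    """Linearly interpolate the value at when from sorted (datetime, value) pairs."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= when <= t1:
            # Dividing two timedeltas gives the fraction of the interval elapsed.
            fraction = (when - t0) / (t1 - t0)
            return v0 + fraction * (v1 - v0)
    return None  # when lies outside the known points


def snap_points(points, start, end, step):
    """Return one (datetime, value) pair every step between start and end."""
    result = []
    when = start
    while when <= end:
        result.append((when, interpolate(points, when)))
        when += step
    return result


t0 = datetime(2021, 4, 26, 0, 0, tzinfo=timezone.utc)
known = [(t0, 10.0), (t0 + timedelta(hours=2), 20.0)]

snapped = snap_points(known, t0, t0 + timedelta(hours=2), timedelta(hours=1))
print([value for _, value in snapped])  # [10.0, 15.0, 20.0]
```

The 01:00 point does not exist in the input, so its value is taken halfway between the two neighbouring points.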
# Average
This method returns the average of a metric over the requested date range, aggregated at the requested granularity.
result = sdk.dataset.average(query_data={
"entity":"demo-ingest",
"id_name_field":"tag",
"ids":["01XXX.Y11001.ZZZ"],
"metric_name_field":["value"],
"prefixes":[],
"granularity":"day",
"start_date":"2021-04-26T11:29:26.596Z",
"end_date":"2021-04-26T12:31:42.357Z"
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- granularity: (str) The way to aggregate the data, for example "day".
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
NOTE: start_date or end_date can include ISO 8601 durations when the other field includes a specific date. For example:
{
"start_date": "2020-01-01T00:00:00Z",
"end_date": "PT15M"
}
will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
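What averaging at a "day" granularity amounts to can be sketched on local data: points are grouped into calendar-day buckets and each bucket is averaged. The helper average_by_day is hypothetical, and the sketch makes no claim about how the service handles time zones or empty buckets.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_by_day(points):
    """Group (ISO timestamp, value) pairs by calendar day and average each group."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        day = datetime.fromisoformat(timestamp.replace("Z", "+00:00")).date()
        buckets[day.isoformat()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}


points = [
    ("2021-04-26T11:30:00Z", 10.0),
    ("2021-04-26T12:30:00Z", 20.0),
    ("2021-04-27T09:00:00Z", 40.0),
]

print(average_by_day(points))  # {'2021-04-26': 15.0, '2021-04-27': 40.0}
```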
# Average-dimensions
This method returns the average of a metric for the requested date range and periodicity, including extra dimension fields.
result = sdk.dataset.average_dimensions(query_data={
"entity":"demo-ingest",
"id_name_field":"tag",
"ids":["01XXX.Y11001.ZZZ"],
"metric_name_field":["value"],
"prefixes":[],
"fields":["count", "description", "tag", "value"],
"start_date":"2021-04-26T11:29:26.596Z",
"end_date":"2021-04-26T12:31:42.357Z"
})
Parameters:
- query_data: (dict)
- entity: (str) The entity to be queried.
- id_name_field: (str) The tag used to filter.
- ids: (list) The values of the tag (id_name_field) to be returned.
- metric_name_field: (list) The names of the metric fields to be returned.
- fields: (list) Extra fields to be returned.
- start_date: (str) The earliest date to be returned in ISO 8601 format.
- end_date: (str) The latest date to be returned in ISO 8601 format.
NOTE: start_date or end_date can include ISO 8601 durations when the other field includes a specific date. For example:
{
"start_date": "2020-01-01T00:00:00Z",
"end_date": "PT15M"
}
will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
# Druid Native Query
You can also use Druid Native Queries with Knolar. Some examples are shown below.
#SCAN QUERY EXAMPLE
druid_native_query_scan_params = {
"queryType": "scan",
"dataSource": "demo-datasource",
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"virtualColumns": [],
"resultFormat": "list",
"limit": 100,
"order": "none",
"filter": None,
"columns": [
"__time",
"tag",
"value"
],
"legacy": False,
"descending": False,
"granularity": {
"type": "all"
}
}
results = sdk.dataset.druid_native_query(druid_native_query_scan_params)
#TIMESERIES QUERY EXAMPLE
druid_native_query_timeseries_params = {
"queryType": "timeseries",
"dataSource": "demo-ingest",
"granularity": "second",
"descending": True,
"limit": 100,
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "tag",
"value": "03XXX.YYYY_583.ZZ"
},
{
"type": "or",
"fields": [
{
"type": "selector",
"dimension": "tag",
"value": "01XXX.Y0000.111"
}
]
}
]
},
"intervals": [
"2021-06-30T08:23:32.096Z/2021-07-02T15:36:27.903Z"
]
}
native_query_timeseries_results = sdk.dataset.druid_native_query(query_data=druid_native_query_timeseries_params)
Parameters:
- query_data: (dict) A valid native query dict
# Allowed Druid Native Queries:
- Timeseries
- TopN
- GroupBy
- Scan
- Search
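Because only the query types listed above are allowed, a query dict can be checked locally before it is sent. This is a minimal sketch: check_native_query is a hypothetical helper, and it assumes the queryType strings follow Druid's own spelling (timeseries, topN, groupBy, scan, search).

```python
# queryType values as spelled in Druid native queries.
ALLOWED_QUERY_TYPES = {"timeseries", "topN", "groupBy", "scan", "search"}


def check_native_query(query):
    """Hypothetical pre-flight check for a Druid native query dict."""
    query_type = query.get("queryType")
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError(f"unsupported queryType: {query_type!r}")
    if "dataSource" not in query:
        raise ValueError("missing required key: dataSource")
    return query


query = check_native_query({
    "queryType": "timeseries",
    "dataSource": "demo-ingest",
    "granularity": "second",
    "intervals": ["2021-06-30T08:23:32.096Z/2021-07-02T15:36:27.903Z"],
})
# The checked dict can then be passed on,
# e.g. sdk.dataset.druid_native_query(query_data=query)
```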
# SQL Queries
You can also use SQL syntax to query your ingested data.
sql_query_params = 'Select * from "test-ingest-david-not-delete" limit 10'
results = sdk.dataset.sql_query(query_data=sql_query_params)
Parameters:
- query_data: (str) A valid SQL query string.
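Ingest names that contain hyphens have to be wrapped in double quotes inside the SQL string, as in the example above. The sketch below builds such a string. The helpers quote_identifier and build_select_all are hypothetical, and doubling embedded double quotes follows standard SQL identifier quoting.

```python
def quote_identifier(name):
    """Quote a SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select_all(datasource, limit):
    """Hypothetical helper: build a SELECT * query string with a row limit."""
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 0:
        raise ValueError("limit must be a non-negative integer")
    return f"SELECT * FROM {quote_identifier(datasource)} LIMIT {limit}"


sql = build_select_all("test-ingest-david-not-delete", 10)
print(sql)  # SELECT * FROM "test-ingest-david-not-delete" LIMIT 10
# The string can then be passed on, e.g. sdk.dataset.sql_query(query_data=sql)
```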