# Dataset

The Dataset client is the interface for querying your ingests. It offers several ways to query your data:

# Entities

This method returns a list of the entities available in Knolar.io.

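    import os

    # KnolarIOSDK comes from the Knolar SDK package and must be imported first.
    customer_name = 'demo-customer'
    sdk_token = os.getenv('SDK_API_KEY')  # API key read from the environment
    sdk = KnolarIOSDK(customer_name, sdk_token)
    results = sdk.dataset.entities()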
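
Because os.getenv returns None when SDK_API_KEY is unset, it can help to fail early instead of passing None to the client. read_sdk_token is a hypothetical convenience helper, not part of the Knolar SDK; a minimal sketch:

```python
import os


def read_sdk_token(env_var: str = "SDK_API_KEY") -> str:
    """Return the SDK token from the environment or raise if it is missing."""
    token = os.getenv(env_var)
    if not token:
        # os.getenv returns None for an unset variable; an empty string is
        # rejected as well, since it cannot be a usable key.
        raise RuntimeError(f"environment variable {env_var} is not set or empty")
    return token


# Usage (KnolarIOSDK must be imported from the Knolar SDK package):
# sdk = KnolarIOSDK("demo-customer", read_sdk_token())
```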

# Select

This method returns the raw data points between start_date and end_date, without aggregating them by period.

Example:

    result = sdk.dataset.select(query_data={
        "entity":"demo-ingest",
        "id_name_field":"tag",
        "ids":["XXXX.YYYYY.ZZZ","DDDD.EEEEE.FFFF"],
        "metric_name_field":["value"],
        "prefixes":[],
        "start_date":"2021-04-26T00:00:00.000Z",
        "end_date":"2021-04-27T11:29:26.600Z",
        "fields":["count", "tag", "value"]
    })

Parameters:

  • query_data: (dict)
    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • fields: (list) Extra fields to be returned.
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.
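
The query_data dictionary can also be assembled by a small helper that checks the dates before the request is sent. build_select_query is hypothetical (not part of the SDK); its defaults (id_name_field "tag", metric_name_field ["value"], empty prefixes and fields) only mirror the example above, and whether the service accepts an empty fields list is an assumption. Values beginning with P are passed through as ISO 8601 durations. A sketch:

```python
from datetime import datetime
from typing import Dict, List, Optional


def _check_iso8601(value: str) -> None:
    """Raise ValueError unless value is an ISO 8601 date/time or duration."""
    if value.startswith("P"):
        return  # treated as an ISO 8601 duration such as "PT15M"; not parsed here
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so it is normalized to an explicit UTC offset first.
    datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_select_query(
    entity: str,
    ids: List[str],
    start_date: str,
    end_date: str,
    id_name_field: str = "tag",
    metric_name_field: Optional[List[str]] = None,
    fields: Optional[List[str]] = None,
    prefixes: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Assemble a query_data dict with the keys documented for select."""
    _check_iso8601(start_date)
    _check_iso8601(end_date)
    return {
        "entity": entity,
        "id_name_field": id_name_field,
        "ids": list(ids),
        "metric_name_field": metric_name_field or ["value"],
        "prefixes": prefixes or [],  # empty list: all prefixes are returned
        "start_date": start_date,
        "end_date": end_date,
        "fields": fields or [],
    }


# Usage with the SDK client created earlier:
# result = sdk.dataset.select(query_data=build_select_query(
#     "demo-ingest", ["XXXX.YYYYY.ZZZ"],
#     "2021-04-26T00:00:00.000Z", "2021-04-27T11:29:26.600Z"))
```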

# Last

This method returns the previous point given a point in time.

Example:

    result = sdk.dataset.last(query_data={
        "entity":"demo-ingest",
        "id_name_field":"tag",
        "ids":["01XXX.Y00000.ZZZ"],
        "metric_name_field":["value"],
        "prefixes":[],
        "date":"2021-04-27",
        "fields":["count", "description", "max", "min", "origin", "tag", "type", "units", "value", "value_str"]
    })

Parameters:

  • query_data: (dict)
    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • fields: (list) Extra fields to be returned.
    • date: (str) The reference point in time in ISO 8601 format.
    • offset: (int) The offset to be applied.
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.
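
To make the "previous point" semantics concrete, here is a local sketch over an in-memory, sorted list of (timestamp, value) pairs. last_point is hypothetical and is not how Knolar implements the method; it assumes that a point exactly at the requested date counts as the previous point, and it ignores offset, whose exact meaning is not described here.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

Point = Tuple[datetime, float]


def last_point(points: List[Point], at: datetime) -> Optional[Point]:
    """Return the latest point whose timestamp is at or before `at`.

    `points` must be sorted by timestamp. Returns None if every point is later.
    """
    timestamps = [ts for ts, _ in points]
    # bisect_right gives the index just past any timestamps equal to `at`,
    # so the element before it is the last one with ts <= at.
    idx = bisect_right(timestamps, at)
    return points[idx - 1] if idx > 0 else None
```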

NOTE: Either start_date or end_date can be an ISO 8601 duration, provided that the other field is a specific date.

For example:

    {
        "start_date": "2020-01-01T00:00:00Z",
        "end_date": "PT15M"
    }

will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
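
The rule in this note can be sketched locally, assuming that a duration in end_date counts forward from start_date and a duration in start_date counts backward from end_date (the ISO 8601 start/duration and duration/end interval forms). parse_duration and resolve_range are hypothetical helpers and only understand days, hours, minutes and seconds (for example PT15M, PT1H30M, P1D):

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Only days plus the time part (PnDTnHnMnS) are supported; years, months and
# weeks are not handled by this sketch.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value: str) -> timedelta:
    """Convert an ISO 8601 duration such as "PT15M" into a timedelta."""
    match = _DURATION.match(value)
    parts = {} if match is None else {
        unit: float(amount) for unit, amount in match.groupdict().items() if amount is not None
    }
    if not parts:  # no match, or a bare "P" / "PT" with no amounts
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    return timedelta(**parts)


def _parse_instant(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resolve_range(start_date: str, end_date: str) -> Tuple[datetime, datetime]:
    """Return absolute (start, end) when one side is a duration."""
    if end_date.startswith("P"):
        start = _parse_instant(start_date)
        return start, start + parse_duration(end_date)
    end = _parse_instant(end_date)
    if start_date.startswith("P"):
        return end - parse_duration(start_date), end
    return _parse_instant(start_date), end
```

For the example above, resolve_range("2020-01-01T00:00:00Z", "PT15M") yields an end of 2020-01-01T00:15:00+00:00.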

# Next

This method returns the next point given a point in time.

Example:

    result = sdk.dataset.next(query_data={
        "entity":"demo-ingest",
        "id_name_field":"tag",
        "ids":["01XXX.Y00000.ZZZ"],
        "metric_name_field":["value"],
        "prefixes":[],
        "date":"2021-04-26T11:29:26.600Z",
        "offset":0,
        "fields":["count", "description", "tag", "value"]
    })

Parameters:

  • query_data: (dict)
    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • fields: (list) Extra fields to be returned.
    • date: (str) The reference point in time in ISO 8601 format.
    • offset: (int) The offset to be applied.
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.
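
A matching local sketch for next, again over a sorted in-memory list. next_point is hypothetical; it assumes the next point lies strictly after the requested date and reads offset as the number of additional points to skip, which is a guess rather than documented behavior.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

Point = Tuple[datetime, float]


def next_point(points: List[Point], at: datetime, offset: int = 0) -> Optional[Point]:
    """Return the first point strictly after `at`, skipping `offset` further points.

    `points` must be sorted by timestamp. Returns None when there is no such point.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    timestamps = [ts for ts, _ in points]
    # bisect_right skips any timestamps equal to `at`, so idx is the first ts > at.
    idx = bisect_right(timestamps, at) + offset
    return points[idx] if idx < len(points) else None
```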

NOTE: Either start_date or end_date can be an ISO 8601 duration, provided that the other field is a specific date.

For example:

    {
        "start_date": "2020-01-01T00:00:00Z",
        "end_date": "PT15M"
    }

will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.

# Snap

This method returns points sampled at the requested granularity over the requested period. If there are no points for a granularity interval, values are interpolated linearly.

Example:

    result = sdk.dataset.snap(query_data={
        "entity":"test-realtime-david-07",
        "id_name_field":"tag",
        "ids":["00XXX.Y00000.ZZZ"],
        "metric_name_field":["value"],
        "prefixes":[],
        "granularity":"day",
        "start_date":"2021-04-26T00:00:00.000Z",
        "end_date":"2021-04-27T11:29:26.600Z"
    })

Parameters:

  • query_data: (dict)
    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • granularity: (str) The time granularity used to aggregate the data, for example "day".
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.

NOTE: Either start_date or end_date can be an ISO 8601 duration, provided that the other field is a specific date.

For example:

    {
        "start_date": "2020-01-01T00:00:00Z",
        "end_date": "PT15M"
    }

will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
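
The linear interpolation described for snap can be illustrated with a small local function. snap below is hypothetical and is not the service's implementation: it works on numeric timestamps (for example epoch seconds, with step=86400 for "day"), keeps the stored value when a grid time matches a point exactly, and returns None outside the data range; how Knolar handles those edges is an assumption.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def snap(
    points: List[Tuple[float, float]], start: float, end: float, step: float
) -> List[Tuple[float, Optional[float]]]:
    """Sample sorted (timestamp, value) points every `step` from start to end."""
    if step <= 0:
        raise ValueError("step must be positive")
    times = [t for t, _ in points]
    steps = int((end - start) // step)
    result: List[Tuple[float, Optional[float]]] = []
    for k in range(steps + 1):
        t = start + k * step  # computed from k to avoid accumulating float error
        i = bisect_left(times, t)  # first index with times[i] >= t
        if i < len(times) and times[i] == t:
            value: Optional[float] = points[i][1]  # exact hit: keep the stored value
        elif 0 < i < len(times):
            (t0, v0), (t1, v1) = points[i - 1], points[i]
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)  # linear interpolation
        else:
            value = None  # outside the data range: no extrapolation
        result.append((t, value))
    return result
```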

# Average

This method returns the arithmetic mean of a metric over the requested date range, computed for each interval of the requested granularity.

Example:

    result = sdk.dataset.average(query_data={
        "entity":"demo-ingest",
        "id_name_field":"tag",
        "ids":["01XXX.Y11001.ZZZ"],
        "metric_name_field":["value"],
        "prefixes":[],
        "granularity":"day",
        "start_date":"2021-04-26T11:29:26.596Z",
        "end_date":"2021-04-26T12:31:42.357Z"
    })

Parameters:

  • query_data: (dict)
    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • granularity: (str) The time granularity used to aggregate the data, for example "day".
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.

NOTE: Either start_date or end_date can be an ISO 8601 duration, provided that the other field is a specific date.

For example:

    {
        "start_date": "2020-01-01T00:00:00Z",
        "end_date": "PT15M"
    }

will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
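
With the "day" granularity, the result is conceptually one average per day. daily_average is a hypothetical local helper that buckets timestamps by UTC calendar day; whether Knolar buckets days in UTC is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean
from typing import Dict, List, Tuple


def daily_average(points: List[Tuple[datetime, float]]) -> Dict[str, float]:
    """Average point values per UTC calendar day, keyed by "YYYY-MM-DD".

    Timestamps should be timezone-aware; naive ones are read as local time.
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for ts, value in points:
        day = ts.astimezone(timezone.utc).date().isoformat()
        buckets[day].append(value)
    # Sort so the days come back in chronological order.
    return {day: mean(values) for day, values in sorted(buckets.items())}
```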

# Average-dimensions

This method returns the arithmetic mean of a metric over the requested date range and periodicity, including the extra dimension fields listed in fields.

Example:

    result = sdk.dataset.average_dimensions(query_data={
        "entity":"demo-ingest",
        "id_name_field":"tag",
        "ids":["01XXX.Y11001.ZZZ"],
        "metric_name_field":["value"],
        "prefixes":[],
        "fields":["count", "description", "tag", "value"],
        "start_date":"2021-04-26T11:29:26.596Z",
        "end_date":"2021-04-26T12:31:42.357Z"
    })

Parameters:

  • query_data: (dict)
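    • entity: (str) The entity to be queried.
    • prefixes: (list) List of prefixes which will be returned. If empty, all prefixes will be returned.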
    • id_name_field: (str) The tag used to filter.
    • ids: (list) The values of the tag (id_name_field) to be returned.
    • metric_name_field: (list) The name of the metric fields to be returned.
    • fields: (list) Extra fields to be returned.
    • start_date: (str) The earliest date to be returned in ISO 8601 format.
    • end_date: (str) The latest date to be returned in ISO 8601 format.

NOTE: Either start_date or end_date can be an ISO 8601 duration, provided that the other field is a specific date.

For example:

    {
        "start_date": "2020-01-01T00:00:00Z",
        "end_date": "PT15M"
    }

will return the data between 2020-01-01T00:00:00Z and 2020-01-01T00:15:00Z.
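
One plausible reading of "including extra dimension fields" is that the metric is averaged separately for each distinct combination of the requested dimension values. average_by_dimensions is a hypothetical sketch of that reading, not a description of what the service actually computes:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_by_dimensions(
    rows: Iterable[Dict[str, object]], metric: str, dimensions: List[str]
) -> Dict[Tuple[object, ...], float]:
    """Average `metric` over rows that share the same values for `dimensions`."""
    groups: Dict[Tuple[object, ...], List[float]] = defaultdict(list)
    for row in rows:
        # Missing dimensions group under None rather than raising KeyError.
        key = tuple(row.get(dim) for dim in dimensions)
        groups[key].append(float(row[metric]))
    return {key: mean(values) for key, values in groups.items()}
```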

# Druid Native Query

You can also use Druid native queries with Knolar. Some examples are shown below.

    # Scan query example
    druid_native_query_scan_params = {
        "queryType": "scan",
        "dataSource": "demo-datasource",
        "intervals": {
            "type": "intervals",
            "intervals": [
                "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
            ]
        },
        "virtualColumns": [],
        "resultFormat": "list",
        "limit": 100,
        "order": "none",
        "filter": None,
        "columns": [
            "__time",
            "tag",
            "value"
        ],
        "legacy": False,
        "descending": False,
        "granularity": {
            "type": "all"
        }
    }

    results = sdk.dataset.druid_native_query(query_data=druid_native_query_scan_params)


    # Timeseries query example
    druid_native_query_timeseries_params = {
        "queryType": "timeseries",
        "dataSource": "demo-ingest",
        "granularity": "second",
        "descending": True,
        "limit": 100,
        # Keep rows whose tag is either of the two values
        "filter": {
            "type": "or",
            "fields": [
                {
                    "type": "selector",
                    "dimension": "tag",
                    "value": "03XXX.YYYY_583.ZZ"
                },
                {
                    "type": "selector",
                    "dimension": "tag",
                    "value": "01XXX.Y0000.111"
                }
            ]
        },
        "intervals": [
            "2021-06-30T08:23:32.096Z/2021-07-02T15:36:27.903Z"
        ]
    }

    native_query_timeseries_results = sdk.dataset.druid_native_query(query_data=druid_native_query_timeseries_params)

Parameters:

  • query_data: (dict) A valid native query dict
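
Instead of nesting selector filters, a Druid "in" filter matches any of several tag values in one clause. timeseries_query is a hypothetical helper; the in filter and the count and doubleSum aggregators are standard Druid native query features, but which query types Knolar accepts is an assumption to check against your deployment.

```python
from typing import Dict, List


def timeseries_query(
    data_source: str,
    tags: List[str],
    interval: str,
    metric: str = "value",
    granularity: str = "hour",
) -> Dict[str, object]:
    """Build a Druid native timeseries query restricted to the given tags."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        "intervals": [interval],  # "<start>/<end>" in ISO 8601
        # "in" keeps rows whose tag equals any of the listed values.
        "filter": {"type": "in", "dimension": "tag", "values": list(tags)},
        "aggregations": [
            {"type": "count", "name": "rows"},
            {"type": "doubleSum", "name": f"{metric}_sum", "fieldName": metric},
        ],
    }


# Usage with the SDK client:
# results = sdk.dataset.druid_native_query(query_data=timeseries_query(
#     "demo-ingest", ["03XXX.YYYY_583.ZZ", "01XXX.Y0000.111"],
#     "2021-06-30T08:23:32.096Z/2021-07-02T15:36:27.903Z"))
```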

# Allowed Druid Native Queries:

# SQL Queries

You can also use SQL syntax to query your ingested data.

Example:

    sql_query_params = 'SELECT * FROM "test-ingest-david-not-delete" LIMIT 10'

    results = sdk.dataset.sql_query(query_data=sql_query_params)

Parameters:

  • query_data: (str) A valid SQL query string.
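
Datasource names such as "test-ingest-david-not-delete" contain hyphens, so they must be double-quoted in Druid SQL, and a double quote inside an identifier is escaped by doubling it. quote_identifier and select_limit are hypothetical helpers that build such a query string; they only handle identifiers, so filter values should still not be concatenated from untrusted input.

```python
def quote_identifier(name: str) -> str:
    """Quote a Druid SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_limit(datasource: str, limit: int = 10) -> str:
    """Build a simple SELECT * ... LIMIT query for one datasource."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {quote_identifier(datasource)} LIMIT {limit}"


# Usage with the SDK client:
# results = sdk.dataset.sql_query(query_data=select_limit("demo-ingest", 10))
```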