Discussion:
[influxdb] select substring of a tag's value, for geohash-based spatial aggregation?
Sam Hatchett
2016-09-26 21:01:38 UTC
Permalink
In trying to combine InfluxDB with Grafana's World Map plugin for
displaying time series data spatially, I've identified the need to
spatially subsample because I just have too many geographical points to
display all at once - too much lag time to return the result set. The
location's geohash provides a great way to do this (encoded as a tag value,
so I can group-by geohash to return some aggregate of potentially hundreds
of location's values), and then I can select the max, mean, min, etc., for
each spatial region, but the level of spatial aggregation has to be decided
prior to writing data to Influx.

What I'm wondering about are the relative merits of workarounds, or of
finding some way to decide on the geohash length when the query is executed.

So for instance, have multiple tags: (geo8, geo9, geo10 ...) for each
series so that I can decide on spatial subsampling when querying and choose
the right length:
select "value" from "temperature" where ... group by "geo8"

But what I really want to do is store the full 12-character geohash for
each location and group by a substring of that geohash:
select "value" from "temperature" where ... group by substr("geohash",0,8)

It's also useful to note that I can currently limit the locations that are
queried by using a regex:
select "value" from "temperature" where "geohash" =~ /^gbsuv7s/

... but can't use a substring of the hash to do the group-by, which would
take me the rest of the way there.

Any thoughts or comments would be most appreciated. I don't think that
there's a really straightforward solution right now, but does using a
substring (or other arbitrary transformation) of a tag value for grouping
make sense to anyone else? Seems like there could be many more uses of this
type of approach.
--
Remember to include the InfluxDB version number with all issue reports
---
You received this message because you are subscribed to the Google Groups "InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/2ea6b448-16a6-45e9-85c1-afff36b1a4d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Sam Hatchett
2017-01-26 16:57:11 UTC
Permalink
Sorry, I seem to have let this issue languish. However I see in the v1.2
docs that the GROUP BY sections states: "Other supported features: Regular
Expressions" - which is wonderful. However I can't see any illustrative
examples for using this new feature, and I cannot find its mention in the
changelog - wonder if anyone can post some example queries to illustrate
how this works.

Thanks!
I think grouping by a regex of a tag value does make sense, and could be a
powerful tool. While a SUBSTRING() function would also perhaps work,
functions aren't currently supported outside the SELECT clause, there are
no text manipulation functions, and functions are only valid on field
values, not tags. So the regex approach is much less overall change for the
system.
Can you open a feature request
<https://github.com/influxdata/influxdb/issues/new> describing your use
case?
Post by Sam Hatchett
In trying to combine InfluxDB with Grafana's World Map plugin for
displaying time series data spatially, I've identified the need to
spatially subsample because I just have too many geographical points to
display all at once - too much lag time to return the result set. The
location's geohash provides a great way to do this (encoded as a tag value,
so I can group-by geohash to return some aggregate of potentially hundreds
of location's values), and then I can select the max, mean, min, etc., for
each spatial region, but the level of spatial aggregation has to be decided
prior to writing data to Influx.
What I'm wondering about are the relative merits of workarounds, or of
finding some way to decide on the geohash length when the query is executed.
So for instance, have multiple tags: (geo8, geo9, geo10 ...) for each
series so that I can decide on spatial subsampling when querying and choose
select "value" from "temperature" where ... group by "geo8"
But what I really want to do is store the full 12-character geohash for
select "value" from "temperature" where ... group by substr("geohash",0,8
)
It's also useful to note that I can currently limit the locations that
select "value" from "temperature" where "geohash" =~ /^gbsuv7s/
... but can't use a substring of the hash to do the group-by, which would
take me the rest of the way there.
Any thoughts or comments would be most appreciated. I don't think that
there's a really straightforward solution right now, but does using a
substring (or other arbitrary transformation) of a tag value for grouping
make sense to anyone else? Seems like there could be many more uses of this
type of approach.
--
Remember to include the InfluxDB version number with all issue reports
---
You received this message because you are subscribed to the Google Groups
"InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an
<javascript:>.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit
https://groups.google.com/d/msgid/influxdb/2ea6b448-16a6-45e9-85c1-afff36b1a4d4%40googlegroups.com
<https://groups.google.com/d/msgid/influxdb/2ea6b448-16a6-45e9-85c1-afff36b1a4d4%40googlegroups.com?utm_medium=email&utm_source=footer>
.
For more options, visit https://groups.google.com/d/optout.
--
Sean Beckett
Director of Support and Professional Services
InfluxDB
--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/3ad70437-3f8d-425c-8dbc-de6b641cc206%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
m***@gmail.com
2017-06-22 18:33:28 UTC
Permalink
Is there a docs on how to use the SUBSTRING() function? I am trying to get the substring of a key that I need to split on a delimiter.
--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/054c0c77-37da-4d00-97f5-6d92a60a3da9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Frank Inselbuch
2017-06-23 09:28:33 UTC
Permalink
It would be great if the query language would embrace more of the SQL92
standard (which includes substring function, for example).

https://en.wikipedia.org/wiki/SQL-92
Post by Sam Hatchett
In trying to combine InfluxDB with Grafana's World Map plugin for
displaying time series data spatially, I've identified the need to
spatially subsample because I just have too many geographical points to
display all at once - too much lag time to return the result set. The
location's geohash provides a great way to do this (encoded as a tag value,
so I can group-by geohash to return some aggregate of potentially hundreds
of location's values), and then I can select the max, mean, min, etc., for
each spatial region, but the level of spatial aggregation has to be decided
prior to writing data to Influx.
What I'm wondering about are the relative merits of workarounds, or of
finding some way to decide on the geohash length when the query is executed.
So for instance, have multiple tags: (geo8, geo9, geo10 ...) for each
series so that I can decide on spatial subsampling when querying and choose
select "value" from "temperature" where ... group by "geo8"
But what I really want to do is store the full 12-character geohash for
select "value" from "temperature" where ... group by substr("geohash",0,8)
It's also useful to note that I can currently limit the locations that are
select "value" from "temperature" where "geohash" =~ /^gbsuv7s/
... but can't use a substring of the hash to do the group-by, which would
take me the rest of the way there.
Any thoughts or comments would be most appreciated. I don't think that
there's a really straightforward solution right now, but does using a
substring (or other arbitrary transformation) of a tag value for grouping
make sense to anyone else? Seems like there could be many more uses of this
type of approach.
--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/7dd812f1-c1b9-4bf5-99c8-650caa0889a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Loading...