Discussion:
[influxdb] Minimum Downtime EC2 Instance Change Procedure
Mark Bell
2017-03-23 01:31:23 UTC
Permalink
I'm looking for the recommended way to change the underlying EC2 instance
for an InfluxDB server, in particular to move to a larger instance size.

Background:

* Data is stored on an EBS gp2 volume
* Influx data is ~150GB on disk, ~890k series
* InfluxDB 1.1.1 (can upgrade to 1.2.x prior to migration if it helps)

Temporary downtime on historical data availability is fine, but I'd like to
minimize downtime for data collection. Some downtime is likely unavoidable,
but I'd rather it be closer to 15 minutes than 2 hours.

In an ideal world, I'd set up the new node, update all reporters to point to
the new node, then backfill all data from the old node; it's this final
step I'm unclear on. The backup/restore docs don't really say what would
happen if I restored data to a node that already has data in it for the
same database (does it destroy it?).
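For reference, the 1.x offline backup/restore commands I'm considering look
like this (database name and paths are placeholders for my setup; as far as
I can tell, restore expects the target database to not already exist on the
new node, which is the crux of my question):

```shell
# On the old node: take an offline backup of the metastore and one database.
influxd backup -database mydb /tmp/influx_backup

# On the new node, with influxd stopped: restore metadata, then the database.
# Paths shown are the stock package defaults; adjust for your install.
influxd restore -metadir /var/lib/influxdb/meta /tmp/influx_backup
influxd restore -database mydb -datadir /var/lib/influxdb/data /tmp/influx_backup
```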

There are a few other ways I can think of to accomplish this, but I was
hoping someone has gone through a similar process at similar data sizes and
could provide some insight. I'll do a cold run of this from EBS snapshots
regardless, but I'd like to avoid traveling too far down the wrong path
when testing approaches.

Thanks in advance for any guidance.

Cheers,

Mark
--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/e3eb3c69-fd23-4932-a298-9e2f9e069fe1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Mark Bell
2017-04-06 14:26:21 UTC
Permalink
Hi Jack,

I wasn't aware of the new community site; I'll direct questions there in the future.

This isn't about resizing the disk, but rather changing the EC2 instance,
e.g. moving from m2.2xlarge -> c4.4xlarge.
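For context, the straightforward path I'm comparing against is the
stop / change-type / start dance (instance ID below is a placeholder),
which for an EBS-backed instance should mean a few minutes of total
downtime while the instance is stopped:

```shell
# Stop the instance (EBS-backed, so data on the gp2 volume survives the stop).
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# Change the instance type while stopped.
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --instance-type "{\"Value\": \"c4.4xlarge\"}"

# Start it back up on the new instance type.
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```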

It seems the best 'no downtime for writes' approach is to set up a new
clean node, redirect all writers to it, then backfill the data from the old
node. It's the backfill part I'm unclear on. The restore code doesn't seem
to clean out the target database, but it fails if shard IDs collide, so
it's unclear whether using backup/restore against a database with
pre-existing data like this is intended, or just a side effect of restore
not cleaning out the target database.

I think the main question is whether there is a better approach to a
migration like this, particularly a preferred way given the size and series
count of the database.
Mark,
I think AWS has elastic resize
<https://aws.amazon.com/blogs/aws/amazon-ebs-update-new-elastic-volumes-change-everything/>
on EBS volumes now, so you can do this w/o downtime. Does that help? In the
future, please ask questions like this over on our community site
<https://community.influxdata.com>.
j***@gmail.com
2017-04-07 18:53:12 UTC
Permalink
Mark,

What I would do is set up the new server, then switch DNS or routing to direct clients to the new setup, and use influx_inspect export and influx -import to migrate the data from the old server to the new one.
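Something along these lines (paths are the stock package defaults and the
database name is a placeholder; influx_inspect export writes line protocol
that influx -import can replay through the normal write path, which sidesteps
the shard ID collision issue entirely):

```shell
# On the old server: dump the database to compressed line protocol.
influx_inspect export \
    -datadir /var/lib/influxdb/data \
    -waldir /var/lib/influxdb/wal \
    -database mydb \
    -compress -out /tmp/mydb_export.gz

# On the new server: replay the export as ordinary writes.
influx -import -path /tmp/mydb_export.gz -compressed -precision ns
```

Because the import goes through the write path rather than copying shard
files, it can land safely in a database that is already receiving new writes.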

Do you have any uptime requirements?