# Geo Troubleshooting **[PREMIUM ONLY]**

NOTE: **Note:**
This list is an attempt to document all the moving parts that can go wrong.
We are working on getting all these steps verified automatically in a
rake task in the future.

Setting up Geo requires careful attention to detail, and sometimes it's easy to
miss a step.

Here is a list of steps you should take to attempt to fix the problem:

- Perform [basic troubleshooting](#basic-troubleshooting).
- Fix any [replication errors](#fixing-replication-errors).
- Fix any [Foreign Data Wrapper](#fixing-foreign-data-wrapper-errors) errors.
- Fix any [common](#fixing-common-errors) errors.

## Basic troubleshooting

Before attempting more advanced troubleshooting:

- Check [the health of the **secondary** node](#check-the-health-of-the-secondary-node).
- Check [if PostgreSQL replication is working](#check-if-postgresql-replication-is-working).

### Check the health of the **secondary** node

Visit the **primary** node's **Admin Area > Geo** (`/admin/geo/nodes`) in
your browser. We perform the following health checks on each **secondary** node

@@ -23,10 +31,12 @@ to help identify if something is wrong:


|
|

|
|
|
|
|
|
|
|
For information on how to resolve common errors reported from the UI, see [common errors](#common-errors).
|
|
For information on how to resolve common errors reported from the UI, see
|
|
|
|
[Fixing Common Errors](#fixing-common-errors).
|
|
|
|
|
|
|
|
If the UI is not working, or you are unable to log in, you can run the Geo
health check manually to get this information as well as a few more details.

This rake task can be run on an app node in the **primary** or **secondary**
Geo nodes:

@@ -36,7 +46,7 @@ sudo gitlab-rake gitlab:geo:check

Example output:

```text
Checking Geo ...

GitLab Geo is available ... yes
@@ -68,7 +78,7 @@ sudo gitlab-rake geo:status

Example output:

```text
http://secondary.example.com/
-----------------------------------------------------
GitLab Version: 11.10.4-ee
@@ -89,16 +99,21 @@ http://secondary.example.com/
Last status report was: 2 minutes ago
```

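If you want to read these fields from a script, the `Key: value` lines of the status output can be split into a dictionary. The following is a minimal sketch in Python, not part of GitLab: `parse_geo_status` is a hypothetical helper name, and the sample text is only the excerpt shown above, so the real output contains more fields.

```python
def parse_geo_status(text):
    """Collect 'Key: value' lines from `gitlab-rake geo:status` output."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the node URL, and the dashed separator row.
        if not line or line.startswith(("http://", "https://")) or set(line) == {"-"}:
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Excerpt copied from the example output; the full report has more lines.
sample = """\
http://secondary.example.com/
-----------------------------------------------------
GitLab Version: 11.10.4-ee
Last status report was: 2 minutes ago
"""

status = parse_geo_status(sample)
print(status["GitLab Version"])
print(status["Last status report was"])
```

A stale `Last status report was` value is usually the first thing to look at, because it suggests the **secondary** node has stopped reporting.
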
### Check if PostgreSQL replication is working

To check if PostgreSQL replication is working, check if:

- [Nodes are pointing to the correct database instance](#are-nodes-pointing-to-the-correct-database-instance).
- [Geo can detect the current node correctly](#can-geo-detect-the-current-node-correctly).

#### Are nodes pointing to the correct database instance?

You should make sure your **primary** Geo node points to the database instance
with write permissions.

Any **secondary** nodes should point only to read-only instances.

#### Can Geo detect the current node correctly?

Geo uses the defined node from the **Admin Area > Geo** screen, and tries to match
it with the value defined in the `/etc/gitlab/gitlab.rb` configuration file.

@@ -112,29 +127,38 @@ sudo gitlab-rails runner "puts Gitlab::Geo.current_node.inspect"

and expect something like:

```ruby
#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", port: 443, relative_url_root: "", primary: false, ...>
```

When you run the command above, `primary` should be `true` on
the **primary** node, and `false` on any **secondary** node.

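When checking several nodes, it can help to extract the `primary` attribute from the captured output rather than reading it by eye. The sketch below is illustrative only: `primary_flag` is a hypothetical helper, and it assumes the output has the `#<GeoNode ...>` shape shown above.

```python
import re


def primary_flag(inspect_output):
    """Return True/False for the `primary:` attribute, or None if it is absent."""
    match = re.search(r"\bprimary: (true|false)\b", inspect_output)
    if match is None:
        # No GeoNode was printed, so Geo did not match the current node.
        return None
    return match.group(1) == "true"


# Sample copied from the expected output above.
sample = (
    '#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", '
    'port: 443, relative_url_root: "", primary: false, ...>'
)

print(primary_flag(sample))
```

A result of `None` means no `primary:` attribute was found, which points to a mismatch between the node defined in the Admin Area and the value in `/etc/gitlab/gitlab.rb`.
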
## Fixing replication errors

The following sections outline troubleshooting steps for fixing replication
errors.

### Message: "ERROR: replication slots can only be used if max_replication_slots > 0"

This means that the `max_replication_slots` PostgreSQL variable needs to
be set on the **primary** database. In GitLab 9.4, we have made this setting
default to 1. You may need to increase this value if you have more than one
**secondary** node.

Be sure to restart PostgreSQL for this to take
effect. See the [PostgreSQL replication
setup][database-pg-replication] guide for more details.

### Message: "FATAL: could not start WAL streaming: ERROR: replication slot "geo_secondary_my_domain_com" does not exist"

This occurs when PostgreSQL does not have a replication slot for the
**secondary** node by that name.

You may want to rerun the [replication
process](database.md) on the **secondary** node.

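PostgreSQL slot names may only contain lowercase letters, digits, and underscores, so a name derived from a hostname has to be normalized the same way wherever it is used. The sketch below assumes the slot is named after the **secondary** node's hostname with a `geo_` prefix. That convention is inferred from the example message and is not prescribed by this page, and `slot_name_for` is a hypothetical helper.

```python
import re


def slot_name_for(hostname, prefix="geo_"):
    """Build a slot name using only lowercase letters, digits, and underscores."""
    # Replace every character PostgreSQL would reject with an underscore.
    cleaned = re.sub(r"[^a-z0-9_]", "_", hostname.lower())
    return prefix + cleaned


print(slot_name_for("secondary.my.domain.com"))
```

Comparing the generated name with the one in the error message is a quick way to spot a typo in the slot name passed during replication setup.
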
### Message: "Command exceeded allowed execution time" when setting up replication

This may happen while [initiating the replication process][database-start-replication] on the **secondary** node,
and indicates that your initial dataset is too large to be replicated within the default timeout (30 minutes).

@@ -153,7 +177,7 @@ sudo gitlab-ctl \

This will give the initial replication up to six hours to complete, rather than
the default thirty minutes. Adjust as required for your installation.

### Message: "PANIC: could not write to file 'pg_xlog/xlogtemp.123': No space left on device"

Determine if you have any unused replication slots in the **primary** database. Unused slots can cause large amounts of
log data to build up in `pg_xlog`. Removing them can reduce the amount of space used in `pg_xlog`.

@@ -184,11 +208,12 @@ Slots where `active` is `f` are not active.
SELECT pg_drop_replication_slot('<name_of_extra_slot>');
```

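If there are several slots to review, the drop statements can be generated from the slot list instead of being typed by hand. The following is a sketch under stated assumptions: the `(slot_name, active)` pairs are hypothetical sample data standing in for rows read from the `pg_replication_slots` view, and `drop_statements` is an illustrative helper. It only prints SQL for you to review and does not connect to the database.

```python
import re


def drop_statements(slots):
    """Return a drop statement for every slot whose `active` flag is false.

    `slots` is an iterable of (slot_name, active) pairs.
    """
    statements = []
    for name, active in slots:
        if active:
            continue
        # Slot names are limited to lowercase letters, digits, and underscores;
        # anything else is refused rather than interpolated into SQL.
        if not re.fullmatch(r"[a-z0-9_]+", name):
            raise ValueError(f"unexpected slot name: {name!r}")
        statements.append(f"SELECT pg_drop_replication_slot('{name}');")
    return statements


# Hypothetical rows: one slot in use, one left over from a removed node.
rows = [("geo_secondary_my_domain_com", True), ("old_secondary", False)]
for statement in drop_statements(rows):
    print(statement)
```

Review each generated statement before running it. A slot can be temporarily inactive because its **secondary** node is down, not because it is unused.
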
### Very large repositories never successfully synchronize on the **secondary** node

GitLab places a timeout on all repository clones, including project imports
and Geo synchronization operations. If a fresh `git clone` of a repository
on the primary takes more than a few minutes, you may be affected by this.

To increase the timeout, add the following line to `/etc/gitlab/gitlab.rb`
on the **secondary** node:

@@ -205,7 +230,7 @@ sudo gitlab-ctl reconfigure

This will increase the timeout to three hours (10800 seconds). Choose a time
long enough to accommodate a full clone of your largest repositories.

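The timeouts on this page are given in seconds, so it is worth checking the conversion before editing the configuration. This small sketch uses only the Python standard library. `timeout_seconds` is a hypothetical helper name.

```python
from datetime import timedelta


def timeout_seconds(hours):
    """Convert a timeout expressed in hours to whole seconds."""
    return int(timedelta(hours=hours).total_seconds())


print(timeout_seconds(3))  # the three-hour clone timeout: 10800
print(timeout_seconds(6))  # the six-hour replication timeout: 21600
```
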
### Resetting Geo **secondary** node replication

If you get a **secondary** node in a broken state and want to reset the replication state,
to start again from scratch, there are a few steps that can help you:

@@ -289,12 +314,16 @@ to start again from scratch, there are a few steps that can help you:
gitlab-ctl start
```

## Fixing Foreign Data Wrapper errors

This section documents ways to fix potential Foreign Data Wrapper errors.

### "Foreign Data Wrapper (FDW) is not configured" error

When setting up Geo, you might see this warning in the `gitlab-rake
gitlab:geo:check` output:

```text
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... foreign data wrapper is not configured
```

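The check output uses a `description ... result` layout, so failing checks can be filtered out of a long report mechanically. The sketch below assumes that passing checks end in `yes`, as in the example output earlier on this page. `failed_checks` is a hypothetical helper and the sample combines lines quoted on this page.

```python
def failed_checks(output):
    """Return (check, result) pairs whose result is anything other than 'yes'."""
    failures = []
    for line in output.splitlines():
        # Split on the last ' ... ' so descriptions containing dots stay intact.
        check, sep, result = line.rpartition(" ... ")
        result = result.strip()
        if sep and result and result.lower() != "yes":
            failures.append((check.strip(), result))
    return failures


# Lines quoted from the example outputs on this page.
sample = """\
Checking Geo ...

GitLab Geo is available ... yes
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... foreign data wrapper is not configured
"""

for check, result in failed_checks(sample):
    print(f"{check} -> {result}")
```
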
@@ -307,7 +336,7 @@ There are a few key points to remember:

By default, the Geo secondary and tracking database are running on the
same host on different ports: 5432 and 5431, respectively.

#### Checking configuration

NOTE: **Note:**
The following steps are for Omnibus installs only. Using Geo with source-based installs was **deprecated** in GitLab 11.5.

@@ -419,7 +448,7 @@ should see something like this:

- `geo_postgresql['fdw_external_user']`
- `geo_postgresql['fdw_external_password']`

#### Manual reload of FDW schema

If you're still unable to get FDW working, you may want to try a manual
reload of the FDW schema. To manually reload the FDW schema:

@@ -459,9 +488,25 @@ reload of the FDW schema. To manually reload the FDW schema:

[database-start-replication]: database.md#step-3-initiate-the-replication-process
[database-pg-replication]: database.md#postgresql-replication

### "Geo database has an outdated FDW remote schema" error

GitLab can error with a `Geo database has an outdated FDW remote schema` message.

For example:

```text
Geo database has an outdated FDW remote schema. It contains 229 of 236 expected tables. Please refer to Geo Troubleshooting.
```

To resolve this, run the following command:

```sh
sudo gitlab-rake geo:db:refresh_foreign_tables
```

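To confirm that the refresh closed the gap, you can compare the table counts in the message before and after running the command. This is a minimal sketch that assumes the message keeps the wording shown in the example. `missing_tables` is a hypothetical helper.

```python
import re


def missing_tables(message):
    """Return how many expected tables are absent, or None if no counts are found."""
    match = re.search(r"contains (\d+) of (\d+) expected tables", message)
    if match is None:
        return None
    found, expected = map(int, match.groups())
    return expected - found


# Message copied from the example above.
sample = (
    "Geo database has an outdated FDW remote schema. "
    "It contains 229 of 236 expected tables. Please refer to Geo Troubleshooting."
)

print(missing_tables(sample))
```

For the example message, the difference is 236 - 229 = 7 tables.
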
## Fixing common errors

This section documents common errors reported in the Admin UI and how to fix them.

### Geo database configuration file is missing

@@ -470,7 +515,6 @@ GitLab cannot find or doesn't have permission to access the `database_geo.yml` c

In an Omnibus GitLab installation, the file should be in `/var/opt/gitlab/gitlab-rails/etc`.
If it doesn't exist or inadvertent changes have been made to it, run `sudo gitlab-ctl reconfigure` to restore it to its correct state.

If this path is mounted on a remote volume, please check your volume configuration and that it has correct permissions.

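The two failure modes described here, a missing file and a file that cannot be read, can be told apart with a short check. The sketch below is illustrative. `describe_config` is a hypothetical helper, and the default path simply joins the Omnibus directory named above with `database_geo.yml`. The result reflects the permissions of the user running the script, so run it as the user that runs GitLab.

```python
import os


def describe_config(path="/var/opt/gitlab/gitlab-rails/etc/database_geo.yml"):
    """Report whether the Geo database configuration file exists and is readable."""
    # os.path.exists also returns False when a parent directory is not
    # accessible, which is common on a misconfigured remote volume.
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


print(describe_config())
```
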
### Geo node has a database that is writable which is an indication it is not configured for replication with the primary node.

@@ -511,4 +555,4 @@ If you are using GitLab Omnibus installation, something might have failed during

This error means the Geo Tracking Database doesn't have the FDW server and credentials
configured.

See ["Foreign Data Wrapper (FDW) is not configured" error](#foreign-data-wrapper-fdw-is-not-configured-error).