QAN issues¶
This section focuses on problems with QAN, such as queries not being retrieved.
Missing data¶
Why don’t I see any query-related information?¶
There might be multiple places where the problem might come from:
- connection problem between pmm-agent and pmm-managed
- PMM-agent cannot connect to the database.
- data source is not properly configured.
Why don’t I see the whole query?¶
Long query examples and fingerprints are truncated to 2048 symbols by default to reduce space usage. In this case, the Explain section will not work. Max query size can be configured using flag --max-query-length while adding a service.
Incorrect metrics: unrealistic query execution times¶
If you’re seeing query execution times that seem impossible (like 50,000+ seconds for simple SELECT statements), this is typically caused by metric calculation errors rather than actual performance issues.
This is because enabling query plans causes pg_stat_monitor to create multiple records for each query, leading to incorrect timing calculations.
To fix the issue, disable query plan collection:
-- Check if query plan collection is enabled
SHOW pg_stat_monitor.pgsm_enable_query_plan;
-- If it shows 'on', disable it
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = off;
SELECT pg_reload_conf();
-- Verify the change took effect
SHOW pg_stat_monitor.pgsm_enable_query_plan;
After disabling query plan collection, new metrics should show realistic execution times within minutes.
QAN service fails after upgrade¶
After upgrading PMM Server, the QAN service may fail to start with BACKOFF, FATAL, or EXITED status, preventing the QAN dashboard from loading. You’ll see the following error in /srv/logs/qan-api2.log, where x is the migration version number:
stdlog: Migrations: Dirty database version x. Fix and force version.
This happens when the ClickHouse schema migration is interrupted during the upgrade.
Resolution¶
- PMM 3.5.0 and later: The issue is fixed automatically. PMM detects and completes the interrupted schema migration upon restart.
-
Earlier versions: Use the following manual workaround:
-
Access the PMM container:
podman exec -it pmm-server /bin/bash -
Connect to ClickHouse:
clickhouse client --username=<clickhouse_user> --password=<clickhouse_password> -d pmm -
Fix the migration state. Replace
xwith the version number from your error logs.USE pmm; INSERT INTO schema_migrations (version, dirty, sequence) VALUES (x, 0, toUnixTimestamp(NOW())*1000000000); EXIT; -
Restart QAN:
supervisorctl restart qan-api2 -
Verify QAN is running:
supervisorctl status qan-api2
-
ClickHouse memory issues in low-memory environments¶
If you’re running PMM Server with less than 16 GB RAM and seeing “memory limit exceeded” errors in ClickHouse logs, switch to the low-memory configuration.
PMM includes two ClickHouse profiles:
- default: optimized for performance (16 GB+ RAM)
- low-memory: optimized for constrained environments, based on ClickHouse recommendations
Switch to low-memory configuration¶
docker exec -it -u pmm pmm-server ./switch-config.sh low
To switch back:
docker exec -it -u pmm pmm-server ./switch-config.sh default
The script stops ClickHouse, updates the configuration, and restarts the service.
Persistent configuration¶
If you run PMM Server with the --rm flag, run the switch script each time the container starts. For systemd, add to your unit file:
ExecStartPost=/usr/bin/docker exec -u pmm pmm-server ./switch-config.sh low
See Install PMM Server with Podman for systemd configuration examples.
Configuration details
Both configuration files are located in /etc/clickhouse-server/ inside the PMM Server container:
default-config.xml: default profilelow-memory-config.xml: low-memory profile
The script is available at /etc/clickhouse-server/switch-config.sh or /opt/switch-config.sh.
When switching profiles, both config.xml and users.xml are updated to point to the selected profile.