path: root/monitoring
Commit message (Author, Age, Files, Lines)
* monitoring: Update nvmeof alert limits in config (Vallari Agrawal, 9 days, 3 files, -24/+91)
|   Update these in config.libsonnet:
|   - NVMeoFMaxGatewaysPerGroup (4->8)
|   - NVMeoFMaxGatewaysPerCluster (4->32)
|   - NVMeoFMaxNamespaces (1024->2048)
|   - NVMeoFHighClientCount (32->128)
|   Also update prometheus_alerts.yml and test_alerts.yml accordingly.
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* monitoring: Add prometheus alert NVMeoFMultipleNamespacesOfRBDImage (Vallari Agrawal, 2024-12-18, 3 files, -0/+67)
|   NVMeoFMultipleNamespacesOfRBDImage alerts the user if an RBD image is used for multiple namespaces. This is an important alert for cases where namespaces are created on the same image for different gateway groups.
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* Merge pull request #60873 from rhcs-dashboard/fix-69074-main (afreen23, 2024-12-16, 3 files, -12/+18)
|\
| |   mgr/dashboard: Add ceph_daemon filter to rgw overview grafana panel queries
| |   Reviewed-by: Afreen Misbah <afreen@ibm.com>
| * mgr/dashboard: Add ceph_daemon filter to rgw overview grafana panel queries (Aashish Sharma, 2024-12-05, 3 files, -12/+18)
| |   Currently rgw_servers filtering is not working in the RGW Overview Grafana graphs. They show data for all the RGW services, even though the filter is set to a single service. This PR intends to solve this issue.
| |   Fixes: https://tracker.ceph.com/issues/69074
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | monitoring: Add alert NVMeoFTooManyNamespaces (Vallari Agrawal, 2024-11-19, 4 files, -5/+291)
|/
|   NVMeoFTooManyNamespaces alerts the user if the total number of namespaces across subsystems is more than 1024.
|   Change NVMeoFTooManySubsystems limit to 128 from 16.
|   Fixes: https://github.com/ceph/ceph-nvmeof/issues/948
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* monitoring: add tests for 2 new nvmeof alerts (Vallari Agrawal, 2024-11-11, 1 file, -0/+69)
|   Add test for alerts NVMeoFMissingListener and NVMeoFZeroListenerSubsystem to test_alerts.yml.
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* monitoring: add 2 new nvmeof alerts (Vallari Agrawal, 2024-11-11, 1 file, -0/+20)
|   Add NVMeoFMissingListener and NVMeoFZeroListenerSubsystem alerts to prometheus_alerts.libsonnet.
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* monitoring: add 2 nvmeof alerts to prometheus_alerts.yaml (Vallari Agrawal, 2024-11-11, 1 file, -0/+18)
|   - `NVMeoFMissingListener`: trigger if all listeners are not created for each gateway in a subsystem
|   - `NVMeoFZeroListenerSubsystem`: trigger if a subsystem has no listeners
|   Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
* Merge pull request #60100 from piyushagarwal1411/fix-68316-main (Aashish Sharma, 2024-11-05, 4 files, -2/+66)
|\
| |   mgr/dashboard: Add 'Browse Dashboards' button in multi-cluster and ceph-cluster Grafana dashboards
| |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| * mgr/dashboard: Add 'Browse Dashboards' button in multi-cluster and ceph-cluster Grafana dashboards (Piyush Agarwal, 2024-10-16, 4 files, -2/+66)
| |   Fixes: https://tracker.ceph.com/issues/68316
| |   Signed-off-by: piyushagarwal1411 <piyushagarwal14.pa@gmail.com>
| |   Signed-off-by: Piyush Agarwal <piyushagarwal14.pa@gmail.com>
* | Merge pull request #56849 from frittentheke/issue_64321_alerts (afreen23, 2024-10-21, 4 files, -748/+771)
|\ \
| |/
|/|
| |   Add multi-cluster support (showMultiCluster=True) to alerts
| |   Reviewed-by: Afreen Misbah <afreen@ibm.com>
| * Add multi-cluster support (showMultiCluster=True) to alerts (Christian Rohmann, 2024-10-21, 4 files, -748/+771)
| |   Following PR https://github.com/ceph/ceph/pull/55495, which fixed the dashboards with regard to multiple clusters storing their metrics in a single Prometheus instance, this PR addresses the same issues for alerts.
| |   Fixes: https://tracker.ceph.com/issues/64321
| |   Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
* | mgr/dashboard: Add Performance Details grafana charts for individual clusters in Manage-clusters page (Aashish Sharma, 2024-08-22, 3 files, -106/+10)
|/
|   Fixes: https://tracker.ceph.com/issues/67192
|   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* mgr/dashboard: Add a new chart for replication delta per shard in rgw sync overview grafana dashboard (Aashish Sharma, 2024-07-17, 2 files, -0/+133)
|   Fixes: https://tracker.ceph.com/issues/66994
|   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* Merge pull request #56014 from badone/wip-tracker-63591-pyyaml-cython_sources (Nizamudeen A, 2024-05-21, 3 files, -3/+3)
|\
| |   install-deps: Update Pyyaml version
| |   Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
| |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * install-deps: Update Pyyaml version (Brad Hubbard, 2024-03-07, 3 files, -3/+3)
| |   Move to 6.0.1 to overcome https://github.com/yaml/pyyaml/issues/601
| |   Fixes: https://tracker.ceph.com/issues/63591
| |   Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
* | mgr/dashboard: fix cluster filter typo in multi-cluster-overview grafana dashboard (Aashish Sharma, 2024-05-02, 2 files, -4/+4)
| |   Fixes: https://tracker.ceph.com/issues/65760
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | Merge pull request #56575 from cloudbehl/ceph-cluster-json-update (Aashish Sharma, 2024-05-02, 1 file, -29/+51)
|\ \
| | |   monitoring/ceph-mixin: Add cluster variable to ceph-cluster.json
| | |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| * | monitoring/ceph-mixin: Add cluster variable to ceph-cluster.json (cloudbehl, 2024-03-29, 1 file, -29/+51)
| | |   Fixes: https://tracker.ceph.com/issues/65218
| | |   Signed-off-by: cloudbehl <cloudbehl@gmail.com>
* | | Merge pull request #55495 from frittentheke/issue_64321 (Nizamudeen A, 2024-05-02, 38 files, -1692/+1457)
|\ \ \
| | | |   monitoring/ceph-mixin: Cleanup of variables, queries and tests (to fix showMultiCluster=True)
| | | |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| | | |   Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
| | | |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * | | Cleanup of variables, queries and tests to enable showMultiCluster=True (Christian Rohmann, 2024-04-22, 38 files, -1692/+1457)
| | | |   Rendering the dashboards with showMultiCluster=True allows them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label, and that functionality already existed. This just fixes some inconsistencies in applying the label filters.
| | | |   Additionally, this contains updates to the tests so that they succeed with both configurations and help avoid regressions with regard to multiCluster in the future.
| | | |   There are also some consistency cleanups here and there:
| | | |   * `datasource` was not used consistently
| | | |   * `cluster` label_values are determined from `ceph_health_status`
| | | |   * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label
| | | |   * `ceph_hosts` filter now uses label_values from any ceph_metadata metric, so that it shows not all instance values, but only those of hosts with some Ceph component / daemon
| | | |   * Enable showMultiCluster=True since `cluster` label is now always present, via https://github.com/ceph/ceph/pull/54964
| | | |   Improves: https://tracker.ceph.com/issues/64321
| | | |   Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
* | | | monitoring/ceph-mixin: set NVMeoFMaxGatewaysPerGroup to 4 (Adam King, 2024-04-22, 1 file, -1/+1)
|/ / /
| | |   Recommendation from the nvmeof team
| | |   Signed-off-by: Adam King <adking@redhat.com>
* / / mgr/dashboard: replace deprecated table panel in grafana with a newer table panel (Aashish Sharma, 2024-04-02, 13 files, -975/+2229)
|/ /
| |   Fixes: https://tracker.ceph.com/issues/65174
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | Merge pull request #55574 from ceph/feature-multi-cluster-management-monitoring (Nizamudeen A, 2024-03-06, 4 files, -2/+3092)
|\ \
| |/
|/|
| |   mgr/dashboard: introduce multi cluster management and monitoring in ceph dashboard
| |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * mgr/dashboard: introduce multi-cluster overview page (Nizamudeen A, 2024-03-05, 2 files, -2/+2)
| |   https://tracker.ceph.com/issues/64530
| |   Signed-off-by: Nizamudeen A <nia@redhat.com>
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
| * mgr/dashboard: Add a manage clusters page to the multi-cluster nav to list/connect/disconnect/edit clusters in multi-cluster setup (Aashish Sharma, 2024-02-22, 4 files, -2/+3092)
| |   Fixes: https://tracker.ceph.com/issues/64530
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | Merge pull request #55510 from pcuzner/add-nvmeof-alerts (Aashish Sharma, 2024-02-29, 5 files, -1/+715)
|\ \
| | |   ceph-mixin: Update mixin to include alerts for the nvmeof gateway(s)
| | |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| * | ceph-mixins: Update MIB to include nvmeof notification (Paul Cuzner, 2024-02-26, 1 file, -1/+8)
| | |   Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
| * | ceph-mixins: Add test cases for nvmeof alerts (Paul Cuzner, 2024-02-26, 1 file, -0/+423)
| | |   Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
| * | ceph-mixins: nvmeof alerts added (Paul Cuzner, 2024-02-26, 1 file, -0/+129)
| | |   Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
| * | ceph-mixins: Add nvmeof alerts (Paul Cuzner, 2024-02-26, 1 file, -0/+145)
| | |   Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
| * | ceph-mixins: Add vars to support nvmeof alerts (Paul Cuzner, 2024-02-25, 1 file, -0/+10)
| | |   Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
* | | mgr/dashboard: replace piechart plugin charts with native pie chart panel (Aashish Sharma, 2024-02-27, 4 files, -50/+220)
| |/
|/|
| |   Fixes: https://tracker.ceph.com/issues/64579
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | Merge pull request #55314 from cloudbehl/rgw-dashboard-json (Aashish Sharma, 2024-02-13, 5 files, -47/+47)
|\ \
| | |   mgr/dashboard: Fixing RGW graph panels
| | |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| * | mgr/dashboards: add generated json files (Aashish Sharma, 2024-02-07, 4 files, -29/+29)
| | |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
| * | mgr/dashboard: Fixing RGW graph panels (cloudbehl, 2024-01-25, 1 file, -18/+18)
| | |   - Fixing grafana panels for rgw dashboards
| | |   - Fixing RGW overview dashboard queries
| | |   fixes https://tracker.ceph.com/issues/64177
| | |   Signed-off-by: cloudbehl <cloudbehl@gmail.com>
* | | mgr/dashboard: Add RGW per user/bucket panels in grafana (Aashish Sharma, 2024-02-09, 5 files, -1/+7153)
| |/
|/|
| |   Fixes: https://tracker.ceph.com/issues/64359
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | monitoring: add new alerts (Guillaume Abrioux, 2024-01-25, 4 files, -0/+272)
|/
|   This adds new hardware monitoring alerts.
|   Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
* mgr/dashboard: upgrade from old 'graph' type panels to the new 'timeseries' panel (Aashish Sharma, 2023-12-22, 17 files, -74/+605)
|   The graph panel type is deprecated, and disappears after Grafana v9.1 (current version is 10.0) to prevent more old type panels being created. These should be migrated to the timeseries panel type, to avoid potential problems with future Grafana versions.
|   Fixes: https://tracker.ceph.com/issues/61720
|   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* monitoring: upgrade grafana container to 9.4.12 (Nizamudeen A, 2023-12-06, 1 file, -1/+1)
|   Fixes the CVEs mentioned here: https://grafana.com/blog/2023/06/06/grafana-security-release-new-grafana-versions-with-security-fixes-for-cve-2023-2183-and-cve-2023-2801/
|   Signed-off-by: Nizamudeen A <nia@redhat.com>
* Merge pull request #54355 from nobuto-m/info-rbd-stats-pools (Nizamudeen A, 2023-11-30, 3 files, -15/+32)
|\
| |   mgr/dashboard: info on why RBD graphs are empty
| |   Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
| |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * mgr/dashboard: info on why RBD graphs are empty (Nobuto Murata, 2023-11-06, 3 files, -15/+32)
| |   The RBD IO statistics graphs are empty out of the box, and that is on purpose. Instead of giving the impression that those graphs are broken, point users to the documentation explaining the optional steps to enable those statistics.
| |   https://docs.ceph.com/en/latest/mgr/prometheus/#rbd-io-statistics
| |   Signed-off-by: Nobuto Murata <nobuto.murata@canonical.com>
* | Merge pull request #51340 from Javlopez/feature/12087-upgrade-and-generate-grafana-dashboards (Aashish Sharma, 2023-11-20, 7 files, -58/+6541)
|\ \
| | |   monitoring: add new dashboards
| | |   Fixes: https://tracker.ceph.com/issues/63592
| | |   Reviewed-by: Aashish Sharma <aasharma@redhat.com>
| * | monitoring: update libsonnet files for generate ceph-cluster.json (Javier, 2023-10-21, 7 files, -58/+6541)
| | |   add ceph-cluster.libsonnet file to generate ceph-cluster.json
| | |   Fixes: https://tracker.ceph.com/issues/61443
| | |   Signed-off-by: Javier <sjavierlopez@gmail.com>
* | | Merge pull request #53650 from rhcs-dashboard/fix-62969-main (Aashish Sharma, 2023-11-17, 1 file, -2/+88)
|\ \ \
| |_|/
|/| |
| | |   mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard
| | |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * | mgr/dashboard: Show the OSD's Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard (Aashish Sharma, 2023-10-11, 1 file, -2/+88)
| |/
| |   Fixes: https://tracker.ceph.com/issues/62969
| |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | Merge pull request #53807 from rhcs-dashboard/fix-63088-main (Aashish Sharma, 2023-10-25, 7 files, -26/+26)
|\ \
| | |   mgr/dashboard: Consider null values as zero in grafana panels
| | |   Reviewed-by: Nizamudeen A <nia@redhat.com>
| * | mgr/dashboard: Consider null values as zero in grafana panels (Aashish Sharma, 2023-10-04, 7 files, -26/+26)
| | |   After upgrading from RHCS4 to RHCS5, some of the grafana charts broke. This is because RHCS5 does not generate a metric when its value is zero; as a result, the null value from that metric breaks the grafana charts or graphs. This PR fixes that issue.
| | |   Fixes: https://tracker.ceph.com/issues/63088
| | |   Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* | | mgr/dashboard: fix broken alert generator (Nizamudeen A, 2023-10-13, 4 files, -3/+9)
| |/
|/|
| |   Currently the alert generator is broken if you try to run `tox -ealerts-fix`. I fixed it and ran the command and it built a new json file as well.
| |   Signed-off-by: Nizamudeen A <nia@redhat.com>
* | Merge pull request #50132 from aruniiird/add-rbd-mirror-mon-alerts (Juan Miguel Olmo, 2023-10-10, 9 files, -15/+329)
|\ \
| |/
|/|   ceph-mixin: Add RBD Mirror monitoring alerts