ceph - ceph

	Commit message	Author	Age	Files	Lines
*	monitoring: upgrade grafana container to 9.4.12	Nizamudeen A	2023-12-06	1	-1/+1
\| \| \| \| \| \|	Fixes the CVEs mentioned here: https://grafana.com/blog/2023/06/06/grafana-security-release-new-grafana-versions-with-security-fixes-for-cve-2023-2183-and-cve-2023-2801/ Signed-off-by: Nizamudeen A <nia@redhat.com>
*	monitoring/grafana: update the grafana version	Nizamudeen A	2023-04-03	1	-1/+1
\| \| \| \|	Signed-off-by: Nizamudeen A <nia@redhat.com>
*	mgr/dashboard: upgrade grafana pie-chart and vonage-status-panel versions	Aashish Sharma	2022-04-06	1	-2/+2
\| \| \| \| \|	Fixes:https://tracker.ceph.com/issues/55195 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	monitoring/grafana: fix version	Ernesto Puerta	2022-04-04	1	-1/+1
\| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/55172 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
*	grafana/Makefile: don't push to docker	Ernesto Puerta	2022-04-01	1	-5/+1
\| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/55155 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
*	mgr/dashboard: fix transition-through-oci image workaround in grafana build	Aashish Sharma	2022-03-23	1	-10/+2
\| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/54311 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard/monitoring: update grafana version	Aashish Sharma	2022-03-21	1	-2/+2
\| \| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/54311 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard: monitoring: refactor into ceph-mixin	Arthur Outhenin-Chalandre	2022-02-03	37	-13649/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mixin is a way to bundle dashboards, prometheus rules and alerts into a jsonnet package. Shifting to mixin will allow easier integration with monitoring automation that some users may use. This commit moves `/monitoring/grafana/dashboards` and `/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts were also converted to Jsonnet in an automated way (from yaml to json to jsonnet). This commit minimises any change made to the generated files and should change neither the dashboards nor the Prometheus alerts. In the future some configuration will also be added to jsonnet to add more functionality to the dashboards or alerts (e.g. multi cluster). Fixes: https://tracker.ceph.com/issues/53374 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
*	Merge pull request #43707 from BenoitKnecht/ceph-mgr-service-id	Ernesto Puerta	2022-02-02	6	-51/+228
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	mgr: Fix ceph_daemon label in ceph_rgw_* metrics Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
\| *	monitoring/grafana: Add tests for radosgw panels	Benoît Knecht	2022-01-11	2	-0/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the expressions modified in c40290390d7 were not covered by any tests, especially those in the `radosgw-detail.json` dashboard. This commit fills in those gaps. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
\| *	monitoring/grafana: Update radosgw dashboards	Benoît Knecht	2022-01-11	5	-52/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the `ceph_daemon` label now replaced by `instance_id` on all `ceph_rgw_` metrics, we need to update the Grafana dashboards to get that label back from `ceph_rgw_metadata` using this type of construct: `ceph_rgw_req * on (instance_id) group_left(ceph_daemon) ceph_rgw_metadata` Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
* \|	monitoring/grafana: replace filestore osd count	Pere Diaz Bou	2022-01-18	2	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring/grafana: use Path class instead of split	Pere Diaz Bou	2022-01-18	1	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring/grafana: remove explicit str casting	Pere Diaz Bou	2022-01-18	1	-1/+6
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring/grafana: add generated json files	Pere Diaz Bou	2022-01-18	4	-27/+19
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring/grafana: ValueError instead of RuntimeError	Pere Diaz Bou	2022-01-18	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring/grafana: Replace missing legendFormat warning with error	Pere Diaz Bou	2022-01-18	2	-19/+26
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	monitoring: Add unit tests for OSD panels in ceph-cluster dashboard	Patrick Seidensal	2022-01-13	1	-0/+33
\| \| \| \| \| \| \| \|	Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
* \|	monitoring: fix display ceph_osd_in in Grafana panel	Patrick Seidensal	2022-01-13	2	-3/+14
\| \| \| \| \| \| \| \|	Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
* \|	mgr/prometheus: Fix regression with OSD/host details/overview dashboards	Patrick Seidensal	2022-01-13	7	-28/+260
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix issues with PromQL expressions and vector matching with the `ceph_disk_occupation` metric. As it turns out, `ceph_disk_occupation` cannot simply be used as expected, as there seem to be some edge cases for users that have several OSDs on a single disk. This leads to issues which cannot be approached by PromQL alone (many-to-many PromQL errors). The data we have expected is simply different in some rare cases. I have not found a PromQL-only solution to this issue. What we basically need is the following. 1. Match on labels `host` and `instance` to get one or more OSD names from a metadata metric (`ceph_disk_occupation`) to let a user know which OSDs belong to which disk. 2. Match on the label `ceph_daemon` of the `ceph_disk_occupation` metric, in which case the value of `ceph_daemon` must not refer to more than a single OSD. This is the exact opposite of requirement 1. As both operations are currently performed on a single metric, and there is no way to satisfy both requirements on a single metric, the intention of this commit is to extend the metric by providing a similar metric that satisfies one of the requirements. This enables the queries to differentiate between a vector matching operation to show a string to the user (where `ceph_daemon` could possibly be `osd.1` or `osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in the condition for the matching. Although the `ceph_daemon` label is used on a variety of daemons, only OSDs seem to be affected by this issue (only if more than one OSD is run on a single disk). This means that only the `ceph_disk_occupation` metadata metric seems to need to be extended and provided as two metrics. `ceph_disk_occupation` is supposed to be used for matching the `ceph_daemon` label value: `foo * on(ceph_daemon) group_left ceph_disk_occupation`. `ceph_disk_occupation_human` is supposed to be used for anything where the resulting data is displayed to be consumed by humans (graphs, alert messages, etc.): `foo * on(device,instance) group_left(ceph_daemon) ceph_disk_occupation_human`. Fixes: https://tracker.ceph.com/issues/52974 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
*	Merge pull request #44294 from rhcs-dashboard/feature-bluestore-onode	Ernesto Puerta	2022-01-11	3	-154/+936
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mgr/dashboard: monitoring:Implement BlueStore onode hit/miss counters into the dashboard Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Laura Flores <lflores@redhat.com> Reviewed-by: neha-ojha <NOT@FOUND> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
\| *	mgr/dashboard: monitoring:Implement BlueStore onode hit/miss counters into the dashboard	Aashish Sharma	2022-01-05	3	-154/+936
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide the details pulled from Bluestore stats in order to display the onode hit/miss counters Fixes: https://tracker.ceph.com/issues/53577 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* \|	Merge pull request #44190 from rhcs-dashboard/grafana-regex	Ernesto Puerta	2021-12-21	4	-3/+29
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \|	monitoring/grafana: improve grafana unit tests variable substitution Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
\| *	monitoring/grafana: doctest util regex	Pere Diaz Bou	2021-12-15	3	-2/+27
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
\| *	monitoring/grafana: rename tox promql test	Pere Diaz Bou	2021-12-14	1	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
\| *	monitoring/grafana: improve grafana unit tests variable substitution	Pere Diaz Bou	2021-12-14	1	-1/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* \|	mgr/dashboard: disable Promql test in ARM	Ernesto Puerta	2021-12-13	1	-0/+2
\|/ \| \| \| \| \| \| \|	Temporarily disable this test while debugging the issue (since https://github.com/ceph/ceph/pull/43669 originally passed the ARM check). Fixes: https://tracker.ceph.com/issues/53451 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
*	mgr/dashboard: introduce HAProxy metrics for RGW	Avan Thakkar	2021-12-09	3	-6/+821
\| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/53311 Signed-off-by: Avan Thakkar <athakkar@redhat.com>
*	monitoring/grafana: Grafana query tester	Pere Diaz Bou	2021-11-16	13	-3/+555
\| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
*	monitoring: ethernet bonding filter in Network Load	Pere Diaz Bou	2021-10-27	2	-14/+43
\| \| \| \|	Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
*	mgr/dashboard: monitoring: grafonnet refactoring for cephfs dashboards	Aashish Sharma	2021-10-19	3	-303/+351
\| \| \| \| \| \| \|	This PR intends to refactor cephfs dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard: monitoring: grafonnet refactoring for osds dashboards	Aashish Sharma	2021-10-19	3	-1642/+1769
\| \| \| \| \| \| \|	This PR intends to refactor osds dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard: monitoring: grafonnet refactoring for pools dashboards	Aashish Sharma	2021-10-19	3	-2196/+2248
\| \| \| \| \| \| \|	This PR intends to refactor pools dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard: monitoring: grafonnet refactoring for rbd dashboards	Aashish Sharma	2021-10-19	3	-1070/+1173
\| \| \| \| \| \| \|	This PR intends to refactor rbd dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	mgr/dashboard: monitoring: grafonnet refactoring for radosgw dashboards	Aashish Sharma	2021-10-19	4	-1096/+1270
\| \| \| \| \| \| \|	This PR intends to refactor radosgw dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
*	Merge pull request #43469 from rhcs-dashboard/hosts-grafana-dashboards	Ernesto Puerta	2021-10-18	3	-2050/+2150
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	mgr/dashboard: monitoring: grafonnet refactoring for hosts dashboards Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>
\| *	mgr/dashboard: monitoring: grafonnet refactoring for hosts dashboards	Aashish Sharma	2021-10-12	3	-2050/+2150
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR intends to refactor hosts dashboards using grafonnet Fixes:https://tracker.ceph.com/issues/52777 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* \|	mgr/dashboard: replace Client connections with active-stdby mgrs	Avan Thakkar	2021-10-11	1	-5/+20
\|/ \| \| \| \|	Fixes: https://tracker.ceph.com/issues/52121 Signed-off-by: Avan Thakkar <athakkar@redhat.com>
*	monitoring: update grafana-piechart-panel plugin	Patrick Seidensal	2021-09-10	1	-1/+1
\| \| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/51211 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
*	cmake: exclude "grafonnet-lib" target from "all"	Kefu Chai	2021-08-20	1	-1/+2
\| \| \| \| \| \| \| \|	so we don't build this target when running "make", and hence avoid accessing the internet in a build environment where internet access is not allowed. Signed-off-by: Kefu Chai <kchai@redhat.com>
*	cmake: silence build output when building external deps	Kefu Chai	2021-08-16	1	-1/+4
\| \| \| \| \| \| \| \|	when downloading/building grafonnet-lib, dpdk, spdk, liburing and fio, they dump lots of output during the configuration and building phases, all of which is irrelevant to us. so let's just silence it. Signed-off-by: Kefu Chai <kchai@redhat.com>
*	Merge pull request #41570 from jhrcz-ls/wip-cephfs-overview-use-rate	Ernesto Puerta	2021-08-12	1	-2/+2
\|\ \| \| \| \|	mgr/dashboard: cephfs MDS Workload to use rate for counter type metric
\| *	[mgr/dashboard] cephfs metrics in MDS Workload panels to use rate because of counter type metric	Jan Horáček	2021-07-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: https://tracker.ceph.com/issues/51954 Signed-off-by: Jan Horacek <jan.horacek@livesport.eu>
* \|	mgr/dashboard: fix grafonnet build error	Aashish Sharma	2021-08-12	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR intends to fix the issue caused by #42194 Fixes:https://tracker.ceph.com/issues/52238 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* \|	Merge pull request #42194 from rhcs-dashboard/add-grafonnet-grafana	Ernesto Puerta	2021-08-11	6	-436/+576
\|\ \ \| \| \| \| \| \|	mgr/dashboard: monitoring: replace Grafana JSON with Grafonnet based code
\| * \|	mgr/dashboard: monitoring: replace Grafana JSON with Grafonnet based Code	Aashish Sharma	2021-08-11	6	-436/+576
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR intends to add grafonnet to generate grafana JSON files Fixes: https://tracker.ceph.com/issues/45184 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
* \| \|	Merge pull request #41880 from david-caro/fix_cluster_grafana_dashboard	Ernesto Puerta	2021-08-02	1	-2/+2
\|\ \ \ \| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \|	monitoring/grafana/cluster: use per-unit max and limit values Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: p-se <NOT@FOUND>
\| * \|	monitoring/grafana/cluster: use per-unit max and limit values	David Caro	2021-06-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value we get is a per-unit value, so the limits and the max value should be over 1, not 100. Note that the value being shown was correct; it was the gauge that was not showing the correct indicators. Signed-off-by: David Caro <david@dcaro.es>
* \| \|	monitoring: fix Physical Device Latency unit	Seena Fallah	2021-07-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on the expr it should be seconds Signed-off-by: Seena Fallah <seenafallah@gmail.com>
* \| \|	Merge pull request #41838 from p-se/grafana-clean-up	Ernesto Puerta	2021-06-25	5	-457/+435
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	monitoring: Clean up Grafana dashboards Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: jan--f <NOT@FOUND> Reviewed-by: p-se <NOT@FOUND> Reviewed-by: Paul Cuzner <pcuzner@redhat.com>