I'm trying to deploy GlusterFS in a Kubernetes cluster that was deployed via kubespray. I have 3 VMs on bare metal running CentOS 7. I believe I followed all the prerequisites, but I'm getting the following once I run `./gk-deploy -g`:
./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.
Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.
The client machine that will run this script must have:
* Administrative access to an existing Kubernetes or OpenShift cluster
* Access to a python interpreter 'python'
Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
* 2222 - sshd (if running GlusterFS in a pod)
* 24007 - GlusterFS Management
* 24008 - GlusterFS RDMA
* 49152 to 49251 - Each brick for every volume on the host requires its own
port. For every new brick, one new port will be used starting at 49152. We
recommend a default range of 49152-49251 on each host, though you can adjust
this to fit your needs.
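For example, on CentOS 7 nodes using firewalld, opening these ports could look roughly like the sketch below (assuming the default public zone; adjust to your own zone/network):
```
# Sketch only: open the GlusterFS ports listed above via firewalld.
firewall-cmd --zone=public --permanent --add-port=2222/tcp
firewall-cmd --zone=public --permanent --add-port=24007-24008/tcp
firewall-cmd --zone=public --permanent --add-port=49152-49251/tcp
firewall-cmd --reload
```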
The following kernel modules must be loaded:
* dm_snapshot
* dm_mirror
* dm_thin_pool
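A minimal sketch for loading these on each node and persisting them across reboots (the modules-load.d file name is arbitrary):
```
# Sketch: load the required device-mapper modules now and at boot.
for m in dm_snapshot dm_mirror dm_thin_pool; do
  modprobe "$m"
done
printf 'dm_snapshot\ndm_mirror\ndm_thin_pool\n' > /etc/modules-load.d/glusterfs.conf
lsmod | grep dm_   # verify they are loaded
```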
For systems with SELinux, the following settings need to be considered:
* virt_sandbox_use_fusefs should be enabled on each node to allow writing to
remote GlusterFS volumes
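On CentOS 7 this can be set persistently with setsebool, for example (sketch; assumes the SELinux policy utilities are installed):
```
# Sketch: enable the boolean persistently (-P) and confirm it.
setsebool -P virt_sandbox_use_fusefs on
getsebool virt_sandbox_use_fusefs
```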
In addition, for an OpenShift deployment you must:
* Have 'cluster_admin' role on the administrative account doing the deployment
* Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
* Have a router deployed that is configured to allow apps to access services
running in the cluster
Do you wish to proceed with deployment?
[Y]es, [N]o? [Default: Y]: y
Using Kubernetes CLI.
2017-09-04 15:33:58.778503 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:33:58.778568 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:33:58.778582 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
Creating initial resources ... 2017-09-04 15:34:07.986783 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:07.986853 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:07.986867 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): error when creating "/root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
2017-09-04 15:34:08.288683 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.288765 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.288779 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
2017-09-04 15:34:08.479687 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.479766 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.479780 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" not labeled
OK
2017-09-04 15:34:08.751038 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.751103 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.751116 I | proto: duplicate proto type registered: google.protobuf.Timestamp
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Failed to label node 'sum-vm-1'
Looks like you've run this more than once? What happened the first time you ran it? Otherwise you somehow manually added a storagenode=glusterfs label on the node.
BTW this error should be able to be overcome if you have a recent enough version of this repo that has https://github.com/gluster/gluster-kubernetes/pull/339 merged.
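If a stale label from an earlier run is the culprit, clearing or overwriting it by hand is one way out; a sketch, using the node name from the log above:
```
# Sketch: a trailing '-' removes the label, so gk-deploy can re-add it cleanly.
kubectl label node sum-vm-1 storagenode-
# Alternatively, force the value and then re-run the script:
kubectl label node sum-vm-1 storagenode=glusterfs --overwrite
```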
@jarrpa I was getting the same thing the first time I ran it. Do you mind elaborating on the last part about overcoming it?
Given that your log ended with Failed to label node 'sum-vm-1', I thought that's what you were raising the issue about. The PR I linked gets rid of that message.
If you're raising the issue about the proto: duplicate proto lines, I have never seen that before and have no idea what it means. :( Did the deployment fail somehow?
@jarrpa I tried it and was getting this...
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... found.
heketi pod ... not found.
gluster-s3 pod ... not found.
...yes, that's normal output. Did the deployment fail somehow?
@jarrpa
it failed :(
Error from server (AlreadyExists): error when creating "STDIN": daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... pods not found.
...okay, reset your environment:
gk-deploy -gy --abort
rm -rf /etc/glusterfs /var/lib/glusterd
on every node. Then run gk-deploy -gvy. If it fails, paste the full output here along with the output of kubectl get deploy,ds,po -o wide.
Hi @jarrpa, it failed:
./gk-deploy -gvy
Using Kubernetes CLI.
2017-09-05 10:58:04.307027 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:04.307092 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:04.307108 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Checking status of namespace matching 'default':
default Active 11d
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':
Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':
Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
heketi pod ...
Checking status of pods matching '--selector=heketi=pod':
Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':
Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
2017-09-05 10:58:15.580733 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.580803 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.580816 I | proto: duplicate proto type registered: google.protobuf.Timestamp
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
2017-09-05 10:58:15.882600 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.882667 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.882679 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
2017-09-05 10:58:16.071163 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.071223 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.071237 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'sum-vm-1' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-1 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.277147 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.277213 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.277226 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-1" labeled
Marking 'sum-vm-2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-2 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.503202 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.503260 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.503273 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-2" labeled
Marking 'sum-vm-3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-3 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.715662 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.715720 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.715733 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-3" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
2017-09-05 10:58:16.928411 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.928467 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.928479 I | proto: duplicate proto type registered: google.protobuf.Timestamp
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-h996x 0/1 CrashLoopBackOff 5 5m
glusterfs-hmln9 0/1 CrashLoopBackOff 5 5m
Timed out waiting for pods matching '--selector=glusterfs=pod'.
pods not found.
kubectl get deploy,ds,po -o wide
2017-09-05 11:05:25.072539 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:05:25.072617 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:05:25.072632 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE CONTAINER(S) IMAGE(S) SELECTOR
ds/glusterfs 3 2 0 2 0 storagenode=glusterfs 7m glusterfs gluster/gluster-centos:latest glusterfs=pod,glusterfs-node=pod
NAME READY STATUS RESTARTS AGE IP NODE
po/glusterfs-h996x 0/1 CrashLoopBackOff 6 7m 192.168.1.240 sum-vm-1
po/glusterfs-hmln9 0/1 CrashLoopBackOff 6 7m 192.168.1.241 sum-vm-2
...Hm. So two of your GlusterFS pods are failing, and the third one is missing entirely. Is there anything useful if you run "kubectl describe" on the daemonset or pods?
# kubectl describe pod glusterfs-h996x --namespace=default
2017-09-05 11:20:25.629923 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:20:25.629983 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:20:25.629997 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name: glusterfs-h996x
Namespace: default
Node: sum-vm-1/192.168.1.240
Start Time: Tue, 05 Sep 2017 10:58:17 +0800
Labels: controller-revision-hash=1016952396
glusterfs=pod
glusterfs-node=pod
pod-template-generation=1
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"10ecb06a-91e6-11e7-8884-000c29580815","apiVersi...
Status: Running
IP: 192.168.1.240
Created By: DaemonSet/glusterfs
Controlled By: DaemonSet/glusterfs
Containers:
glusterfs:
Container ID: docker://597ef206a63bbc4a6416163fd2c60d6eecd7b9c260507107f0a5bdfcc38eb75e
Image: gluster/gluster-centos:latest
Image ID: docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 05 Sep 2017 11:19:23 +0800
Finished: Tue, 05 Sep 2017 11:19:23 +0800
Ready: False
Restart Count: 9
Requests:
cpu: 100m
memory: 100Mi
Liveness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Readiness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Environment: <none>
Mounts:
/dev from glusterfs-dev (rw)
/etc/glusterfs from glusterfs-etc (rw)
/etc/ssl from glusterfs-ssl (ro)
/run from glusterfs-run (rw)
/run/lvm from glusterfs-lvm (rw)
/sys/fs/cgroup from glusterfs-cgroup (ro)
/var/lib/glusterd from glusterfs-config (rw)
/var/lib/heketi from glusterfs-heketi (rw)
/var/lib/misc/glusterfsd from glusterfs-misc (rw)
/var/log/glusterfs from glusterfs-logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
glusterfs-heketi:
Type: HostPath (bare host directory volume)
Path: /var/lib/heketi
glusterfs-run:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
glusterfs-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
glusterfs-etc:
Type: HostPath (bare host directory volume)
Path: /etc/glusterfs
glusterfs-logs:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterfs
glusterfs-config:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd
glusterfs-dev:
Type: HostPath (bare host directory volume)
Path: /dev
glusterfs-misc:
Type: HostPath (bare host directory volume)
Path: /var/lib/misc/glusterfsd
glusterfs-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
glusterfs-ssl:
Type: HostPath (bare host directory volume)
Path: /etc/ssl
default-token-1n7cc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1n7cc
Optional: false
QoS Class: Burstable
Node-Selectors: storagenode=glusterfs
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute
node.alpha.kubernetes.io/unreachable:NoExecute
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
22m 22m 2 kubelet, sum-vm-1 Normal SuccessfulMountVolume (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-misc"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-dev"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-ssl"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-heketi"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-lvm"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-etc"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-run"
22m 22m 1 kubelet, sum-vm-1 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-config"
22m 1m 10 kubelet, sum-vm-1 spec.containers{glusterfs} Normal Pulled Container image "gluster/gluster-centos:latest" already present on machine
22m 1m 10 kubelet, sum-vm-1 spec.containers{glusterfs} Normal Created Created container
22m 1m 10 kubelet, sum-vm-1 spec.containers{glusterfs} Normal Started Started container
22m 13s 106 kubelet, sum-vm-1 spec.containers{glusterfs} Warning BackOff Back-off restarting failed container
22m 13s 106 kubelet, sum-vm-1 Warning FailedSync Error syncing pod
@jarrpa here's the output.
Does the OS you are using support systemd?
@Justluckyg can you do a describe on the pod? We are looking for an event that references 'dbus'.
@erinboyd I don't see how the OS supporting systemd matters, systemd is in the container. And he just gave us the describe of the pod.
@Justluckyg sorry for the delay, can you also do a describe of the glusterfs daemonset when it reaches such a state?
@jarrpa we ran into a similar issue with the service broker integration on a non-RHEL OS. The error was bubbling up via the container...
**kubectl describe pod glusterfs-2dccz**
2017-09-08 11:48:24.425706 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 11:48:24.425777 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 11:48:24.425790 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name: glusterfs-2dccz
Namespace: default
Node: sum-vm-2/192.168.1.241
Start Time: Fri, 08 Sep 2017 11:45:58 +0800
Labels: controller-revision-hash=1016952396
glusterfs=pod
glusterfs-node=pod
pod-template-generation=1
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"38954545-9448-11e7-8884-000c29580815","apiVersi...
Status: Running
IP: 192.168.1.241
Created By: DaemonSet/glusterfs
Controlled By: DaemonSet/glusterfs
Containers:
glusterfs:
Container ID: docker://46a4ffeef4c1a4682eb0ac780b49851c0384cc6e714fd8731467b052fb393f64
Image: gluster/gluster-centos:latest
Image ID: docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 08 Sep 2017 11:47:18 +0800
Finished: Fri, 08 Sep 2017 11:47:18 +0800
Ready: False
Restart Count: 4
Requests:
cpu: 100m
memory: 100Mi
Liveness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Readiness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Environment: <none>
Mounts:
/dev from glusterfs-dev (rw)
/etc/glusterfs from glusterfs-etc (rw)
/etc/ssl from glusterfs-ssl (ro)
/run from glusterfs-run (rw)
/run/lvm from glusterfs-lvm (rw)
/sys/fs/cgroup from glusterfs-cgroup (ro)
/var/lib/glusterd from glusterfs-config (rw)
/var/lib/heketi from glusterfs-heketi (rw)
/var/lib/misc/glusterfsd from glusterfs-misc (rw)
/var/log/glusterfs from glusterfs-logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
glusterfs-heketi:
Type: HostPath (bare host directory volume)
Path: /var/lib/heketi
glusterfs-run:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
glusterfs-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
glusterfs-etc:
Type: HostPath (bare host directory volume)
Path: /etc/glusterfs
glusterfs-logs:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterfs
glusterfs-config:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd
glusterfs-dev:
Type: HostPath (bare host directory volume)
Path: /dev
glusterfs-misc:
Type: HostPath (bare host directory volume)
Path: /var/lib/misc/glusterfsd
glusterfs-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
glusterfs-ssl:
Type: HostPath (bare host directory volume)
Path: /etc/ssl
default-token-1n7cc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1n7cc
Optional: false
QoS Class: Burstable
Node-Selectors: storagenode=glusterfs
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute
node.alpha.kubernetes.io/unreachable:NoExecute
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-etc"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-run"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-ssl"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-lvm"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-misc"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-dev"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-logs"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
2m 2m 1 kubelet, sum-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "glusterfs-heketi"
2m 2m 2 kubelet, sum-vm-2 Normal SuccessfulMountVolume (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
2m 1m 5 kubelet, sum-vm-2 spec.containers{glusterfs} Normal Pulled Container image "gluster/gluster-centos:latest" already present on machine
2m 1m 5 kubelet, sum-vm-2 spec.containers{glusterfs} Normal Created Created container
2m 1m 5 kubelet, sum-vm-2 spec.containers{glusterfs} Normal Started Started container
2m 7s 13 kubelet, sum-vm-2 spec.containers{glusterfs} Warning BackOff Back-off restarting failed container
2m 7s 13 kubelet, sum-vm-2 Warning FailedSync Error syncing pod
cat /etc/*-release
CentOS Linux release 7.3.1611 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.3.1611 (Core)
CentOS Linux release 7.3.1611 (Core)
@erinboyd @jarrpa please see the pod describe from the glusterfs daemonset above; I'm using CentOS 7 Core.
@erinboyd also, to answer your question about systemd: yes.
[[ $(systemctl) =~ -.mount ]] && echo yes || echo no
yes
@Justluckyg That's not the daemonset, that's the pod. I'd like the output of kubectl describe ds glusterfs.
@jarrpa oh, I didn't know that command; here you go:
kubectl describe ds glusterfs
2017-09-08 12:16:22.526270 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 12:16:22.526534 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 12:16:22.526548 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name: glusterfs
Selector: glusterfs=pod,glusterfs-node=pod
Node-Selector: storagenode=glusterfs
Labels: glusterfs=daemonset
Annotations: description=GlusterFS DaemonSet
tags=glusterfs
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: glusterfs=pod
glusterfs-node=pod
Containers:
glusterfs:
Image: gluster/gluster-centos:latest
Port: <none>
Requests:
cpu: 100m
memory: 100Mi
Liveness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Readiness: exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
Environment: <none>
Mounts:
/dev from glusterfs-dev (rw)
/etc/glusterfs from glusterfs-etc (rw)
/etc/ssl from glusterfs-ssl (ro)
/run from glusterfs-run (rw)
/run/lvm from glusterfs-lvm (rw)
/sys/fs/cgroup from glusterfs-cgroup (ro)
/var/lib/glusterd from glusterfs-config (rw)
/var/lib/heketi from glusterfs-heketi (rw)
/var/lib/misc/glusterfsd from glusterfs-misc (rw)
/var/log/glusterfs from glusterfs-logs (rw)
Volumes:
glusterfs-heketi:
Type: HostPath (bare host directory volume)
Path: /var/lib/heketi
glusterfs-run:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
glusterfs-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
glusterfs-etc:
Type: HostPath (bare host directory volume)
Path: /etc/glusterfs
glusterfs-logs:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterfs
glusterfs-config:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd
glusterfs-dev:
Type: HostPath (bare host directory volume)
Path: /dev
glusterfs-misc:
Type: HostPath (bare host directory volume)
Path: /var/lib/misc/glusterfsd
glusterfs-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
glusterfs-ssl:
Type: HostPath (bare host directory volume)
Path: /etc/ssl
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
30m 30m 1 daemon-set Normal SuccessfulCreate Created pod: glusterfs-vvr33
30m 30m 1 daemon-set Normal SuccessfulCreate Created pod: glusterfs-2dccz
30m 30m 1 daemon-set Normal SuccessfulCreate Created pod: glusterfs-2sgr0
@Justluckyg The gluster containers write their logs to the /var/log/glusterfs directories of the nodes they're running on. Can you inspect one of the nodes and see if the glusterd.log file shows any useful error messages?
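For example, something like this on one of the affected nodes (or through the pod, since /var/log/glusterfs is a hostPath mount; the pod name here is the one from the output above):
```
# Sketch: check the most recent glusterd log entries.
tail -n 100 /var/log/glusterfs/glusterd.log
# or, from the cluster side:
kubectl -n default exec glusterfs-h996x -- tail -n 100 /var/log/glusterfs/glusterd.log
```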
Hi @jarrpa, there's nothing in /var/log/glusterfs; here's what I saw in /var/log/containers:
[[email protected] containers]# tail glusterfs-2sgr0_default_glusterfs-0cba0fb47aadda22f5f0fe2aca8a260213a5849db32a7f0411e72f0e3dfe5847.log
{"log":"Couldn't find an alternative telinit implementation to spawn.\n","stream":"stderr","time":"2017-09-08T04:28:02.038319988Z"}
[[email protected] containers]# cd ..
[[email protected] log]# ls -la glusterfs/
total 4
drwxr-xr-x. 2 root root 6 Nov 15 2016 .
drwxr-xr-x. 12 root root 4096 Sep 4 11:28 ..
What's your version of Kube?
You might be running into this: https://github.com/gluster/gluster-kubernetes/issues/298
@jarrpa
Kubernetes v1.7.3+coreos.0
Docker version 1.13.1, build 092cba3
I did try downgrading to 1.12, but that version conflicts with the kubespray ansible I ran to deploy the cluster: https://github.com/kubernetes-incubator/kubespray
@jarrpa according to this, if I add this to the container config, it will get systemd to run:
env:
- name: SYSTEMD_IGNORE_CHROOT
  value: "1"
command:
- /usr/lib/systemd/systemd
- --system
How and where can I change it? Thanks!
Yes, but you don't want to do that. :) The systemd folks have said that they do not support running under the --system configuration when it's not actually PID 1. Unfortunately, right now there is no way around this. There are only two official workarounds: 1.) downgrade Docker, or 2.) pass a flag to all kubelets in your cluster: https://github.com/gluster/gluster-kubernetes/issues/298#issuecomment-325953404
Your other option is to wait for another release of Kube v1.8, as this PR should also solve the problem: https://github.com/kubernetes/kubernetes/pull/51634
Finally, if you want to get something working now, you can try out these experimental images I've put together more-or-less just for fun: https://github.com/gluster/gluster-kubernetes/issues/298#issuecomment-325985569
@jarrpa appreciate your response. When will 1.8 be released?
And as for your custom image, if I try to use that, I'll just replace the one I cloned initially, specifically the glusterfs-daemonset.yaml, right?
Looks like no date has been set yet. And yeah, replace the glusterfs-daemonset.yaml file.
@jarrpa hi again. I think I'm making some progress after cloning your yaml file.
I was getting the output below; how do I change the resource requests to something lower? I'm testing this on a desktop server only, before testing in our production env.
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6m 6m 1 daemon-set Normal SuccessfulCreate Created pod: glusterfs-wc5hm
6m 6m 1 daemon-set Normal SuccessfulCreate Created pod: glusterfs-t35z8
6m 2s 92 daemonset-controller Warning FailedPlacement failed to place pod on "sum-vm-3": Node didn't have enough resource: cpu, requested: 100, used: 895, capacity: 900
@jarrpa but then the two pods that have sufficient resources are still crashing; logs below:
[glusterd......] [2017-09-11 04:30:00.371392] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-11 04:30:00.371427] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-11 04:30:00.371435] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-11 04:30:00.371441] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-11 04:30:02.063040] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-11 04:30:02.063186] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2: type mgmt/glusterd
[glusterd......] 3: option rpc-auth.auth-glusterfs on
[glusterd......] 4: option rpc-auth.auth-unix on
[glusterd......] 5: option rpc-auth.auth-null on
[glusterd......] 6: option rpc-auth-allow-insecure on
[glusterd......] 7: option transport.socket.listen-backlog 128
[glusterd......] 8: option event-threads 1
[glusterd......] 9: option ping-timeout 0
[glusterd......] 10: option transport.socket.read-fail-log off
[glusterd......] 11: option transport.socket.keepalive-interval 2
[glusterd......] 12: option transport.socket.keepalive-time 10
[glusterd......] 13: option transport-type rdma
[glusterd......] 14: option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-11 04:30:02.064674] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-11 04:30:03.135422] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
@jarrpa is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?
And are the instructions below the same?
https://wiki.centos.org/HowTos/GlusterFSonCentOS
One more rather "dumb" question: according to the requirements, it should have a completely empty block device for storage. I'm running this on a test desktop server with a single disk, using vSphere ESXi to create the VMs and a cluster out of those VMs. I only added a virtual disk to the nodes and declared those in the topology.json.
Am I required to use a totally different physical block storage device for this to work?
@Justluckyg
is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?
Nope, gk-deploy helps with setting up the GlusterFS cluster.
Refer to this link for the prerequisites:
https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L514
Am I required to use a totally different physical block storage device for this to work?
Not required. You can specify a virtual disk in topology.json.
@SaravanaStorageNetwork does the virtual disk count as a valid block device for glusterfs to work?
@SaravanaStorageNetwork does the virtual disk count as a valid block device for glusterfs to work?
Yes
@Justluckyg Sorry for the delayed response, I'm travelling all this week. :)
I was getting the output below; how do I change the resource requests to something lower?
Just edit this line.
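If you'd rather adjust the DaemonSet that has already been created instead of editing the template, a sketch with arbitrary lower values (50m/50Mi are just examples):
```
# Sketch: lower the CPU/memory requests on the running DaemonSet.
kubectl -n default patch ds glusterfs --type=json -p '[
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"50m"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/memory","value":"50Mi"}
]'
```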
but then the two pods that have sufficient resources are still crashing
The logs you provided don't show the error condition; that's just normal glusterd output. Try kubectl logs -p.
On the block device: You just need UNUSED block devices. As long as the block devices you mention are not in use (e.g. by the OS) they can be virtual or physical; they just need to be accessible from the node.
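A quick sanity check on each node before handing a device to heketi might look like this (sketch; /dev/sdb is the device from your topology.json):
```
# Sketch: an unused device shows no filesystem, no mountpoint, and no LVM signature.
lsblk -f /dev/sdb
blkid /dev/sdb          # ideally prints nothing
pvs | grep sdb || true  # should not appear as an LVM physical volume
```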
@jarrpa here's the logs
kubectl logs -p glusterfs-1czwt
[glusterd......] [2017-09-13 05:17:19.840473] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-09-13 05:17:19.848096] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-09-13 05:17:19.848128] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-09-13 05:17:19.849940] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-13 05:17:19.849958] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-13 05:17:19.849974] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-13 05:17:19.849981] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-13 05:17:21.654311] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-13 05:17:21.654433] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2: type mgmt/glusterd
[glusterd......] 3: option rpc-auth.auth-glusterfs on
[glusterd......] 4: option rpc-auth.auth-unix on
[glusterd......] 5: option rpc-auth.auth-null on
[glusterd......] 6: option rpc-auth-allow-insecure on
[glusterd......] 7: option transport.socket.listen-backlog 128
[glusterd......] 8: option event-threads 1
[glusterd......] 9: option ping-timeout 0
[glusterd......] 10: option transport.socket.read-fail-log off
[glusterd......] 11: option transport.socket.keepalive-interval 2
[glusterd......] 12: option transport.socket.keepalive-time 10
[glusterd......] 13: option transport-type rdma
[glusterd......] 14: option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-13 05:17:21.655855] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-13 05:17:22.721654] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[glusterd......] [2017-09-13 05:17:22.783844] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: e535fee0-b219-4eae-b9cc-d454510a70ef
[glusterd......] volume set: success
[glusterd......] [2017-09-13 05:17:22.849589] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f9d37b904fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f9d37b8ffac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f9d3cbb7ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-09-13 05:17:22.866467] E [MSGID: 106025] [glusterd-op-sm.c:1097:glusterd_op_stage_set_volume] 0-management: Option with name: cluster.max-bricks-per-process does not exist
[glusterd......] [2017-09-13 05:17:22.866521] E [MSGID: 106301] [glusterd-syncop.c:1321:gd_stage_op_phase] 0-management: Staging of operation 'Volume Set' failed on localhost : option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] volume set: failed: option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] Setting max bricks per process failed
[glusterd......] Killing glusterd ...
[glusterd......] [2017-09-13 05:17:22.870984] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f9d3b9bc36d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f9d3d06ad05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f9d3d06aafa] ) 0-: received signum (15), shutting down
[glusterd......] [2017-09-13 05:17:22.906409] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f9d37b904fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f9d37b8ffac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f9d3cbb7ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] OK
[glusterd......] [2017-09-13 05:17:23.913910] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-09-13 05:17:23.919857] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-09-13 05:17:23.919907] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-09-13 05:17:23.923276] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-13 05:17:23.923314] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-13 05:17:23.923333] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-13 05:17:23.923343] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-13 05:17:25.579290] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-13 05:17:25.579417] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2: type mgmt/glusterd
[glusterd......] 3: option rpc-auth.auth-glusterfs on
[glusterd......] 4: option rpc-auth.auth-unix on
[glusterd......] 5: option rpc-auth.auth-null on
[glusterd......] 6: option rpc-auth-allow-insecure on
[glusterd......] 7: option transport.socket.listen-backlog 128
[glusterd......] 8: option event-threads 1
[glusterd......] 9: option ping-timeout 0
[glusterd......] 10: option transport.socket.read-fail-log off
[glusterd......] 11: option transport.socket.keepalive-interval 2
[glusterd......] 12: option transport.socket.keepalive-time 10
[glusterd......] 13: option transport-type rdma
[glusterd......] 14: option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-13 05:17:25.580817] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-13 05:17:26.655818] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[targetctl.....] Traceback (most recent call last):
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 429, in modprobe
[targetctl.....] kmod.Kmod().modprobe(module)
[targetctl.....] File "kmod/kmod.pyx", line 106, in kmod.kmod.Kmod.modprobe (kmod/kmod.c:3166)
[targetctl.....] File "kmod/kmod.pyx", line 82, in lookup (kmod/kmod.c:2393)
[targetctl.....] kmod.error.KmodError: Could not modprobe
[targetctl.....]
[targetctl.....] During handling of the above exception, another exception occurred:
[targetctl.....]
[targetctl.....] Traceback (most recent call last):
[targetctl.....] File "/usr/bin/targetctl", line 82, in <module>
[targetctl.....] main()
[targetctl.....] File "/usr/bin/targetctl", line 79, in main
[targetctl.....] funcs[sys.argv[1]](savefile)
[targetctl.....] File "/usr/bin/targetctl", line 47, in restore
[targetctl.....] errors = RTSRoot().restore_from_file(restore_file=from_file)
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/root.py", line 75, in __init__
[targetctl.....] modprobe('target_core_mod')
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 431, in modprobe
[targetctl.....] raise RTSLibError("Could not load module: %s" % module)
[targetctl.....] rtslib_fb.utils.RTSLibError: Could not load module: target_core_mod
[gluster-blockd] [2017-09-13 05:17:26.977966] ERROR: tcmu-runner running, but targetcli doesn't list user:glfs handler [at gluster-blockd.c+281 :<blockNodeSanityCheck>]
tcmu-runner has failed
gluster-blockd has failed
Exiting
Killing processes ... [glusterd......] [2017-09-13 05:17:27.707047] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f7c3a5e336d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f7c3bc91d05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f7c3bc91afa] ) 0-: received signum (15), shutting down
OK
@Justluckyg You're missing some kernel modules which are new requirements for the versions of Gluster running in my containers. See here.
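Going by the targetctl traceback above, target_core_mod is the module that failed to load; loading it on each node would look roughly like the sketch below (target_core_user is an assumption on my part, since tcmu-runner generally needs it):
```
# Sketch: load the iSCSI target modules and persist them across reboots.
modprobe target_core_mod
modprobe target_core_user   # assumption: needed by tcmu-runner
printf 'target_core_mod\ntarget_core_user\n' > /etc/modules-load.d/gluster-block.conf
```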
@jarrpa I was able to get the pods running after loading those modules (thanks! yay!). However, I'm having another problem, the same as the one he reported.
I tried manually starting the gluster daemon, and it won't start either.
I'm not completely sure how he resolved it, like tagging latest; which image and which yaml file did he modify?
Also, unlike him, I cannot do the peer probe; it says the daemon is not running.
Update:
I tried changing the heketi image tag to latest (also tried block) in both the heketi-deployment and deploy-heketi yaml files, but both gave the same error:
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-1254564305-1cfgl 1/1 Running 0 11s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-1254564305-1cfgl -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: a82854def37a9243e8013c021e7ccaca
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node sum-vm-1 ... Unable to create node: New Node doesn't have glusterd running
Creating node sum-vm-2 ... Unable to create node: New Node doesn't have glusterd running
Creating node sum-vm-3 ... Unable to create node: New Node doesn't have glusterd running
Error loading the cluster topology.
Please check the failed node or device and rerun this script
systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-09-15 14:28:40 +08; 1h 13min ago
Process: 879 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255)
Sep 15 14:28:39 sum-vm-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 15 14:28:40 sum-vm-1 GlusterFS[879]: [glusterfsd.c:1844:parse_cmdline] 0-glusterfs: ERROR: parsing the volfile failed [No such file or directory]
Sep 15 14:28:40 sum-vm-1 glusterd[879]: USAGE: /usr/sbin/glusterd [options] [mountpoint]
Sep 15 14:28:40 sum-vm-1 systemd[1]: glusterd.service: control process exited, code=exited status=255
Sep 15 14:28:40 sum-vm-1 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Sep 15 14:28:40 sum-vm-1 systemd[1]: Unit glusterd.service entered failed state.
Sep 15 14:28:40 sum-vm-1 systemd[1]: glusterd.service failed.
@jarrpa another update:
I deleted the virtual disks and recreated them, reset GlusterFS, deleted all those directories, and redeployed.
This is the new error I'm getting:
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-3f5kb 1/1 Running 0 42s
glusterfs-pgwrw 1/1 Running 0 42s
glusterfs-wf7v3 1/1 Running 0 42s
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /root/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-2199298601-6dlc3 1/1 Running 0 7s
OK
Determining heketi service URL ... Failed to communicate with deploy-heketi service.
[[email protected] ~]# kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
deploy-heketi 10.233.10.148 <none> 8080/TCP 10m
kubernetes 10.233.0.1 <none> 443/TCP 21d
[[email protected] ~]# kubectl describe svc deploy-heketi
Name: deploy-heketi
Namespace: default
Labels: deploy-heketi=service
glusterfs=heketi-service
Annotations: description=Exposes Heketi Service
Selector: deploy-heketi=pod
Type: ClusterIP
IP: 10.233.10.148
Port: deploy-heketi 8080/TCP
Endpoints: 10.233.124.175:8080
Session Affinity: None
Events: <none>
Can you curl the heketi port? E.g. curl http://10.233.10.148:8080/hello? If not, there might be a networking or firewall issue.
So I tried it again. I'm not sure what changed, but the first time I did it the heketi service was reachable (that was when I was getting the error that none of the nodes had glusterd running).
I aborted the deployment, deleted the directories, and tried deploying it again, and got the same error:
Determining heketi service URL ... Failed to communicate with deploy-heketi service.
What log can I check to see if it's the network or firewall that's causing it? My cluster was deployed via kubespray (using Calico as the default CNI).
Thanks @jarrpa
@Justluckyg Uncertain... but if you're using kubespray, you might be running into this?
@jarrpa hi, I re-ran my kubespray and gk-deploy and got past heketi this time, but now I'm getting this. The vdisks are all empty; I'm not sure what this error means:
Creating cluster ... ID: 63fd16cc8effbb6ea0ebb24cb4517c87
Creating node sum-vm-1 ... ID: 8f6818a2a227d4fea569f078edf2d98e
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-9h860: Couldn't find device with uuid QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN.
Couldn't find device with uuid qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6.
Can't open /dev/sdb exclusively. Mounted filesystem?
Creating node sum-vm-2 ... ID: 7a46eac46700ae27dafba52d3eb0b8e8
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-cg1pm: Can't open /dev/sdb exclusively. Mounted filesystem?
Creating node sum-vm-3 ... ID: 1e53192e8131b311e289d5257fb4e2f8
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-9l6nr: Can't open /dev/sdb exclusively. Mounted filesystem?
Error loading the cluster topology.
Please check the failed node or device and rerun this script
@Justluckyg vgs and pvs show nothing for those drives on the respective hosts?
@Justluckyg And did you completely destroy the deploy-heketi pod?
@jarrpa do you mean kubectl get pvs? Sorry, not sure what you meant.
@Justluckyg No. Those are commands you can run on the hosts to see if there are any LVM Volume Groups (vgs) or Physical Volumes (pvs) defined that are using the raw block devices.
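Concretely, on each host, read-only checks along these lines (sketch):
```
# Sketch: /dev/sdb should not appear in any of these listings if it is unused.
pvs
vgs
lsblk -f /dev/sdb
```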
Hi @jarrpa
The drive I allocated on each of my nodes is
/dev/sdb 50G 52M 47G 1% /kubernetes
and on each node I ran pvdisplay and vgdisplay; none of them shows /dev/sdb.
```
[[email protected] deploy]# vgdisplay
WARNING: Device for PV QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN not found or rejected by a filter.
WARNING: Device for PV qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6 not found or rejected by a filter.
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 3
Act PV 1
VG Size <182.99 GiB
PE Size 4.00 MiB
Total PE 46845
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 43006 / 167.99 GiB
VG UUID MGMTCn-29jM-W3ku-ppGI-p82g-CnOM-GnulFM
[[email protected] deploy]# pvdisplay
WARNING: Device for PV QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN not found or rejected by a filter.
WARNING: Device for PV qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6 not found or rejected by a filter.
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID cdn6Ss-tBS2-OSU9-GMJw-xTae-eLv0-EU8p58
--- Physical volume ---
PV Name [unknown]
VG Name cl
PV Size 84.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 21503
Free PE 21503
Allocated PE 0
PV UUID QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN
--- Physical volume ---
PV Name [unknown]
VG Name cl
PV Size 84.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 21503
Free PE 21503
Allocated PE 0
PV UUID qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6
[[email protected] ~]# vgdisplay
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size <15.00 GiB
PE Size 4.00 MiB
Total PE 3839
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 0 / 0
VG UUID 8YFCKa-zbof-o91I-90n3-2KvR-dSBX-4HLem3
[[email protected] ~]# pvdisplay
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID n0bqJP-hCY2-jNIf-rHr5-N9xZ-8zrb-adMiio
[[email protected] ~]# vgdisplay
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size <15.00 GiB
PE Size 4.00 MiB
Total PE 3839
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 0 / 0
VG UUID c2u8D8-5ixB-no7G-b2rH-8z3A-WfuW-XJKHcO
[[email protected] ~]# pvdisplay
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID s1uM06-PHtg-rtiu-vMCo-nYy2-vMYJ-McppA2
```
What does "/dev/sdb 50G 52M 47G 1% /kubernetes" mean? Is sdb already formatted and mounted on /kubernetes?
It was an additional vdisk I attached to each VM and mounted to /kubernetes, @jarrpa.
Ah! The disk must not be mounted. It must just be a raw block device, so no mount and no filesystem formatted onto it. Please "umount /kubernetes" and then do the "wipefs".
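So, roughly (sketch; this destroys whatever filesystem is on /dev/sdb, and if /etc/fstab has an entry for /kubernetes it should be removed too so the mount doesn't come back at boot):
```
# Sketch: unmount the disk and wipe its signatures so heketi sees a raw device.
umount /kubernetes
wipefs -a /dev/sdb
lsblk -f /dev/sdb   # should now show no FSTYPE and no MOUNTPOINT
```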
@jarrpa is this how it should look?
[[email protected] ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 16G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 15G 0 part
├─cl-root 253:0 0 13.4G 0 lvm /
└─cl-swap 253:1 0 1.6G 0 lvm [SWAP]
sdb 8:16 0 50G 0 disk
sr0 11:0 1 4.1G 0 rom
@Justluckyg Yup!
@jarrpa thanks, I just tried re-running the script and I'm not sure what's happening now; it's been stuck at creating node sum-vm-1 ... ID...
It's been like that for the last 10 minutes.
@Justluckyg Ugh... it feels like we're SO CLOSE.
What timezone are you in? Would you be comfortable with an audio call with screen sharing? This would probably get us to a solution much faster.
Otherwise, let's go through the basics one more time:
1. gk-deploy -g --abort
2. Wipe the /dev/sdb devices
3. gk-deploy -gvy
If this fails, capture the full output of:
* gk-deploy
* kubectl get -o wide --selector=glusterfs
* lsblk on each node
Ugh, sorry for the wrong terminology. Too much time downstream. :) Fixed.
Hi @jarrpa, I really appreciate your help on this... I tried it again. I even reset my k8s cluster, redeployed with gk-deploy, rebooted the servers, reloaded the modules, and re-ran the script, and the farthest I've gotten is still:
Creating cluster ... ID: 181034507612bba11108086588c1e1c9
Creating node sum-vm-1 ... ID: fffeb796201431210a556c9a23e4a555
I'm in GMT+8, Manila timezone, to answer your question.
What other logs can I look at? Should I look at the container logs to see where it's failing? There's no error; it just stays there until my SSH session times out.
kubectl get -o wide --selector=glusterfs
You must specify the type of resource to get. Valid resource types include:
* all
* certificatesigningrequests (aka 'csr')
* clusterrolebindings
* clusterroles
* clusters (valid only for federation apiservers)
* componentstatuses (aka 'cs')
* configmaps (aka 'cm')
* controllerrevisions
* cronjobs
* daemonsets (aka 'ds')
* deployments (aka 'deploy')
* endpoints (aka 'ep')
* events (aka 'ev')
* horizontalpodautoscalers (aka 'hpa')
* ingresses (aka 'ing')
* jobs
* limitranges (aka 'limits')
* namespaces (aka 'ns')
* networkpolicies (aka 'netpol')
* nodes (aka 'no')
* persistentvolumeclaims (aka 'pvc')
* persistentvolumes (aka 'pv')
* poddisruptionbudgets (aka 'pdb')
* podpreset
* pods (aka 'po')
* podsecuritypolicies (aka 'psp')
* podtemplates
* replicasets (aka 'rs')
* replicationcontrollers (aka 'rc')
* resourcequotas (aka 'quota')
* rolebindings
* roles
* secrets
* serviceaccounts (aka 'sa')
* services (aka 'svc')
* statefulsets
* storageclasses
* thirdpartyresources
error: Required resource not specified.
Use "kubectl explain <resource>" for a detailed description of that resource (e.g. kubectl explain pods).
See 'kubectl get -h' for help and examples.
This is the same on all 3 nodes:
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 100G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 99G 0 part
├─cl-root 253:0 0 13.4G 0 lvm /
└─cl-swap 253:1 0 1.6G 0 lvm [SWAP]
sdb 8:16 0 50G 0 disk
sr0 11:0 1 4.1G 0 rom
Of course I forgot something, sorry. :( Try getting the output of this: kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide. Also see if you can do curl http://<IP>:8080/hello, where IP is the IP address of your deploy-heketi service.
You can look at the container logs for the GlusterFS pods and the heketi pod, and also look for any events in their kubectl describe output.
I'm in US CDT (GMT-5), so we're about 13hrs apart. Depending on your schedule, I could get online for a call (generally) before 0000 CDT / 1300 PHT or starting at 0800 CDT / 2100 PHT. We could use Google Hangouts or another tool of your choice.
Hi @jarrpa
I recreated my entire topology. I think the HDD of my desktop (the server I'm testing this on) is problematic, so I set up the same thing in AWS (used m3.medium), with the same OS (CentOS), and deployed the cluster via kubespray (kubectl 1.7.5).
Now the heketi pod won't launch. :(
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-drw2p 1/1 Running 0 42s
glusterfs-q1wjb 1/1 Running 0 42s
glusterfs-qsks3 1/1 Running 0 42s
OK
sed: -e expression #2, char 20: unknown option to `s'
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=./heketi.json --from-file=topology.json=topology.json
2017-09-25 02:41:23.068495 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-25 02:41:23.068566 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-25 02:41:23.068585 I | proto: duplicate proto type registered: google.protobuf.Timestamp
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's/\${HEKETI_FSTAB}//var/lib/heketi/fstab/' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /home/centos/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
sed: -e expression #2, char 21: unknown option to `s'
google.protobuf.Timestamp
error: no objects passed to create
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
pod not found.
it may be related to #358 .. not sure though if there's a fix and what could have caused it..
I'm not using gluster:master but your experimental version without systemd, because when I tried to use master again in this setup, I was encountering the same issue with the glusterfs pods crashing.
These are the other kubectl outputs:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default glusterfs-drw2p 1/1 Running 0 15m
default glusterfs-q1wjb 1/1 Running 0 15m
default glusterfs-qsks3 1/1 Running 0 15m
kube-system calico-node-1qbdh 1/1 Running 0 1d
kube-system calico-node-8nrs5 1/1 Running 0 1d
kube-system calico-node-lxsdl 1/1 Running 0 1d
kube-system kube-apiserver-kube1 1/1 Running 0 1d
kube-system kube-controller-manager-kube1 1/1 Running 0 1d
kube-system kube-dns-3888408129-lmk1r 3/3 Running 0 1d
kube-system kube-dns-3888408129-thn7w 3/3 Running 0 1d
kube-system kube-proxy-kube1 1/1 Running 0 1d
kube-system kube-proxy-kube2 1/1 Running 0 1d
kube-system kube-proxy-kube3 1/1 Running 0 1d
kube-system kube-scheduler-kube1 1/1 Running 0 1d
kube-system kubedns-autoscaler-1629318612-fc4rj 1/1 Running 0 1d
kube-system kubernetes-dashboard-3941213843-59739 1/1 Running 0 1d
kube-system nginx-proxy-kube2 1/1 Running 0 1d
kube-system nginx-proxy-kube3 1/1 Running 0 1d
kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide
NAME READY STATUS RESTARTS AGE IP NODE
po/glusterfs-drw2p 1/1 Running 0 16m 172.16.100.24 kube3
po/glusterfs-q1wjb 1/1 Running 0 16m 172.16.100.23 kube2
po/glusterfs-qsks3 1/1 Running 0 16m 172.16.100.21 kube1
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE CONTAINER(S) IMAGE(S) SELECTOR
ds/glusterfs 3 3 3 3 3 storagenode=glusterfs 16m glusterfs jarrpa/gluster-fedora-minimal:block glusterfs=pod,glusterfs-node=pod
NAME TYPE DATA AGE
secrets/heketi-config-secret Opaque 2 15m
NAME SECRETS AGE
sa/heketi-service-account 1 16m
NAME AGE ROLE USERS GROUPS SERVICEACCOUNTS
clusterrolebindings/heketi-sa-view 16m edit default/heketi-service-account
it may be related to #358 .. not sure though if there's a fix and what could have caused it..
Yes. This may be the related one.
Are you trying the latest master and still seeing the issue?
@SaravanaStorageNetwork if I use gluster:master, the glusterfs pod will not even start; it just crashes. I'm using @jarrpa's #298 dev test image.
@jarrpa @SaravanaStorageNetwork if I use gluster:master:
kubectl logs -p glusterfs-2m15j
Couldn't find an alternative telinit implementation to spawn.
@Justluckyg Could you share your gk-deploy script (maybe through https://gist.github.com)?
I am sure the issue is with sed command usage which you mentioned here https://github.com/gluster/gluster-kubernetes/issues/341#issuecomment-331765212
Couldn't find an alternative telinit implementation to spawn.
sorry..I don't have much insight on this.
@Justluckyg You need to use a version of gk-deploy that has https://github.com/gluster/gluster-kubernetes/pull/354 merged. The latest version of master on gluster-kubernetes will have that.
You should be able to use my experimental images with that version to get it working.
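(Editorial sketch of the sed failure being discussed, not output from the thread: the broken template substitution uses '/' both as the sed delimiter and inside the replacement path, which is exactly what the fixed gk-deploy avoids by switching the delimiter to '#'.)
sed -e 's/\${HEKETI_FSTAB}//var/lib/heketi/fstab/' <<< 'fstab: ${HEKETI_FSTAB}'
# fails with: sed: -e expression #1: unknown option to `s'
sed -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' <<< 'fstab: ${HEKETI_FSTAB}'
# prints: fstab: /var/lib/heketi/fstab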
@jarrpa Hi Jose, I tried it again and so far heketi is replying back with "Hello from Heketi", however my deployment is still stuck at creating node..
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-2199298601-qs27x 1/1 Running 0 9s
OK
Determining heketi service URL ...
OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-2199298601-qs27x -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 46b1b30820b03a6d359988c96d74a792
Creating node node-vm-1 ... ID: 3e94d0b032a336a2048613ac59f180a
kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide
NAME READY STATUS RESTARTS AGE IP NODE
po/deploy-heketi-2199298601-qs27x 1/1 Running 0 18m 10.233.118.132 lenddo-vm-3
po/glusterfs-hq6vc 1/1 Running 0 19m 192.168.1.240 node-vm-1
po/glusterfs-rqk1b 1/1 Running 0 19m 192.168.1.242 node-vm-3
po/glusterfs-wdv59 1/1 Running 0 19m 192.168.1.241 node-vm-2
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
svc/deploy-heketi 10.233.13.136 <none> 8080/TCP 18m deploy-heketi=pod
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
deploy/deploy-heketi 1 1 1 1 18m deploy-heketi heketi/heketi:dev deploy-heketi=pod,glusterfs=heketi-pod
NAME DESIRED CURRENT READY AGE CONTAINER(S) IMAGE(S) SELECTOR
rs/deploy-heketi-2199298601 1 1 1 18m deploy-heketi heketi/heketi:dev deploy-heketi=pod,glusterfs=heketi-pod,pod-template-hash=2199298601
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE CONTAINER(S) IMAGE(S) SELECTOR
ds/glusterfs 3 3 3 3 3 storagenode=glusterfs 19m glusterfs jarrpa/gluster-fedora-minimal:block glusterfs=pod,glusterfs-node=pod
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
svc/deploy-heketi 10.233.13.136 <none> 8080/TCP 18m deploy-heketi=pod
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
deploy/deploy-heketi 1 1 1 1 18m deploy-heketi heketi/heketi:dev deploy-heketi=pod,glusterfs=heketi-pod
NAME TYPE DATA AGE
secrets/heketi-config-secret Opaque 3 18m
NAME SECRETS AGE
sa/heketi-service-account 1 19m
NAME AGE ROLE USERS GROUPS SERVICEACCOUNTS
clusterrolebindings/heketi-sa-view 19m edit default/heketi-service-account
Do you have time tonight (evening your timezone) for a call or google hangout? Thank you so much.
@Justluckyg As per the log, the next step is adding devices. I am not sure why it is failing.
Could you check the following on the nodes?
1. Check the output of:
cat /proc/partitions
vgs
pvs
lvs
2. There should not be any LVM volumes created on the devices you mentioned in the topology file.
If present, clear them using the
pvremove <device>
vgremove <vg>
lvremove <lv>
commands.
3. Also, wipefs those devices, like:
wipefs -a -f /dev/vdc
4. As gk-deploy is a bash script, you can add set -x at the beginning of the script to see what is going on (a trace sketch follows this list).
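(A hedged alternative to editing the script for item 4: the same trace can be captured by running it through bash -x; the log file name here is an arbitrary assumption.)
bash -x ./gk-deploy -gvy 2>&1 | tee gk-deploy-trace.log   # trace every command the script runs and keep a copy of the output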
Hi @SaravanaStorageNetwork, my /dev/sdb didn't appear in the pvs, vgs, and lvs output. I did wipe /dev/sdb and re-ran the script with set -x added at the beginning, and the script gets stuck all the same...
+ [[ Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1 == return\ [0-9]* ]]
+ output 'Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ opts=-e
+ [[ Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1 == \-\n ]]
+ out='Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ echo -e 'Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ [[ x != \x ]]
+ read -r line
Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1
+ [[ Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d == return\ [0-9]* ]]
+ output 'Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ opts=-e
+ [[ Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d == \-\n ]]
+ out='Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ echo -e 'Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ [[ x != \x ]]
+ read -r line
Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d
@Justluckyg
Could you check and share your topology.json file?
For example, I have shared mine here for comparison:
https://gist.github.com/SaravanaStorageNetwork/ddbc2446015dcf6c97bf83c7e62011da
Hi @SaravanaStorageNetwork
cat topology.json
{
"clusters": [
{
"nodes": [
{
"node": {
"hostnames": {
"manage": [
"node-vm-1"
],
"storage": [
"192.168.1.240"
]
},
"zone": 1
},
"devices": [
"/dev/sdb"
]
},
{
"node": {
"hostnames": {
"manage": [
"node-vm-2"
],
"storage": [
"192.168.1.241"
]
},
"zone": 1
},
"devices": [
"/dev/sdb"
]
},
{
"node": {
"hostnames": {
"manage": [
"node-vm-3"
],
"storage": [
"192.168.1.242"
]
},
"zone": 1
},
"devices": [
"/dev/sdb"
]
}
]
}
]
}
@Justluckyg Is there anything in the output of kubectl logs deploy-heketi-2199298601-qs27x that might give a clue?
@Justluckyg Also, today I'll be unavailable between 1900 CDT / 0800 PHT and 2200 CDT / 1100 PHT. If you can catch me outside that time but before 0000 CDT / 1300 PHT, I'd be happy to have a chat. Send me a chat request via Hangouts, my username is the email address on my profile. :)
@jarrpa thank you! I'll chat with you around 1100-1300 PHT 🙂
@jarrpa I used heketi:latest in both deploy-heketi-deployment.yaml and heketi-deployment.yaml; still the same, it is stuck at creating node..
@Justluckyg To answer a mystery I mentioned last night: it turns out that when heketi is using kubeexec (e.g. when it is managing GlusterFS through Kubernetes) it has no operation timeout and thus will hang indefinitely. Unfortunate, but good to know. :)
After our chat last night, I'm trying to bring in more people to look at this. Could you provide an updated comment with your current setup and the latest problem you're facing? Include the following:
* gk-deploy verbose output (don't run with set -x)
* kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
Hi @jarrpa here's what i have so far:
# cat /etc/*-release
CentOS Linux release 7.4.1708 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.4.1708 (Core)
CentOS Linux release 7.4.1708 (Core)
Linux kube2 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
(Tried both on bare metal via VMware ESX 5.5 and an Amazon CentOS AMI)
Used https://github.com/kubernetes-incubator/kubespray.git to deploy Kubernetes cluster.
Disabled selinux and firewalld
Git cloned https://github.com/gluster/gluster-kubernetes.git but was having an issue with the gluster centos image, where my glusterfs pods just continuously crashed and never got to run, so I used the experimental image jarrpa/gluster-fedora-minimal:block, which resolved my issue with creating the glusterfs pods and the heketi pod.
Whenever I reset, I run this command on the main node where the gluster git repo was pulled:
./gk-deploy -gy --abort
And these commands on all 3
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
wipefs /dev/sdb
I've also tried using the heketi:latest image and edited the deploy-heketi and heketi-deployment yaml files.
[[email protected] ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default deploy-heketi-4089530887-88mpq 1/1 Running 0 57s
default glusterfs-33301 1/1 Running 0 1m
default glusterfs-qdz41 1/1 Running 0 1m
default glusterfs-s5pm4 1/1 Running 0 1m
kube-system kube-apiserver-lenddo-vm-1 1/1 Running 1 2d
kube-system kube-controller-manager-lenddo-vm-1 1/1 Running 1 2d
kube-system kube-dns-2410490047-rvcr2 3/3 Running 0 2d
kube-system kube-dns-2410490047-tx8z7 3/3 Running 0 2d
kube-system kube-proxy-lenddo-vm-1 1/1 Running 1 2d
kube-system kube-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system kube-proxy-lenddo-vm-3 1/1 Running 0 2d
kube-system kube-scheduler-lenddo-vm-1 1/1 Running 1 2d
kube-system kubedns-autoscaler-4166808448-bd17h 1/1 Running 0 2d
kube-system kubernetes-dashboard-3307607089-3nzts 1/1 Running 0 2d
kube-system nginx-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system nginx-proxy-lenddo-vm-3 1/1 Running 0 2d
[[email protected] ~]# kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default po/deploy-heketi-4089530887-88mpq 1/1 Running 0 1m
default po/glusterfs-33301 1/1 Running 0 1m
default po/glusterfs-qdz41 1/1 Running 0 1m
default po/glusterfs-s5pm4 1/1 Running 0 1m
kube-system po/kube-apiserver-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-controller-manager-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-dns-2410490047-rvcr2 3/3 Running 0 2d
kube-system po/kube-dns-2410490047-tx8z7 3/3 Running 0 2d
kube-system po/kube-proxy-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system po/kube-proxy-lenddo-vm-3 1/1 Running 0 2d
kube-system po/kube-scheduler-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kubedns-autoscaler-4166808448-bd17h 1/1 Running 0 2d
kube-system po/kubernetes-dashboard-3307607089-3nzts 1/1 Running 0 2d
kube-system po/nginx-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system po/nginx-proxy-lenddo-vm-3 1/1 Running 0 2d
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.50.182 <none> 8080/TCP 1m
default svc/kubernetes 10.233.0.1 <none> 443/TCP 2d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 2d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 2d
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 1m
kube-system deploy/kube-dns 2 2 2 2 2d
kube-system deploy/kubedns-autoscaler 1 1 1 1 2d
kube-system deploy/kubernetes-dashboard 1 1 1 1 2d
NAMESPACE NAME DESIRED CURRENT READY AGE
default rs/deploy-heketi-4089530887 1 1 1 1m
kube-system rs/kube-dns-2410490047 2 2 2 2d
kube-system rs/kubedns-autoscaler-4166808448 1 1 1 2d
kube-system rs/kubernetes-dashboard-3307607089 1 1 1 2d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE
default ds/glusterfs 3 3 3 3 3 storagenode=glusterfs 1m
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.50.182 <none> 8080/TCP 1m
default svc/kubernetes 10.233.0.1 <none> 443/TCP 2d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 2d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 2d
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 1m
kube-system deploy/kube-dns 2 2 2 2 2d
kube-system deploy/kubedns-autoscaler 1 1 1 1 2d
kube-system deploy/kubernetes-dashboard 1 1 1 1 2d
NAMESPACE NAME TYPE DATA AGE
default secrets/default-token-s24hk kubernetes.io/service-account-token 3 2d
default secrets/heketi-config-secret Opaque 3 1m
default secrets/heketi-service-account-token-m664z kubernetes.io/service-account-token 3 1m
kube-public secrets/default-token-zc6gr kubernetes.io/service-account-token 3 2d
kube-system secrets/default-token-5ng2g kubernetes.io/service-account-token 3 2d
NAMESPACE NAME SECRETS AGE
default sa/default 1 2d
default sa/heketi-service-account 1 1m
kube-public sa/default 1 2d
kube-system sa/default 1 2d
NAMESPACE NAME AGE
error: clusterRoleBinding is not namespaced
[[email protected] ~]# kubectl logs deploy-heketi-4089530887-88mpq
Heketi v5.0.0-5-gb005e0f-release-5
[kubeexec] WARNING 2017/10/05 03:33:34 Rebalance on volume expansion has been enabled. This is an EXPERIMENTAL feature
[heketi] INFO 2017/10/05 03:33:34 Loaded kubernetes executor
[heketi] INFO 2017/10/05 03:33:34 Loaded simple allocator
[heketi] INFO 2017/10/05 03:33:34 GlusterFS Application Loaded
Listening on port 8080
[negroni] Started GET /clusters
[negroni] Completed 200 OK in 117.209µs
[negroni] Started POST /clusters
[negroni] Completed 201 Created in 1.461793ms
[negroni] Started POST /nodes
[heketi] INFO 2017/10/05 03:33:44 Adding node lenddo-vm-1
[negroni] Completed 202 Accepted in 1.39661ms
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:125: Started job 552abc6ff7662f71d8e06a7a4a415994
[negroni] Started GET /queue/552abc6ff7662f71d8e06a7a4a415994
[negroni] Completed 200 OK in 31.402µs
[heketi] INFO 2017/10/05 03:33:44 Added node 457cf8146e42ac83471c3ab63ccabd4a
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:129: Completed job 552abc6ff7662f71d8e06a7a4a415994 in 2.161574ms
[negroni] Started GET /queue/552abc6ff7662f71d8e06a7a4a415994
[negroni] Completed 303 See Other in 66.491µs
[negroni] Started GET /nodes/457cf8146e42ac83471c3ab63ccabd4a
[negroni] Completed 200 OK in 427.641µs
[negroni] Started POST /devices
[heketi] INFO 2017/10/05 03:33:44 Adding device /dev/sdb to node 457cf8146e42ac83471c3ab63ccabd4a
[negroni] Completed 202 Accepted in 1.857125ms
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:125: Started job 9d4566aefa452e84e9818be55f6ce6e7
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 43.837µs
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 52.87µs
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 50.684µs
[[email protected] ~]# kubectl logs glusterfs-33301
google.protobuf.Timestamp
[glusterd......] [2017-10-05 03:32:52.210335] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-10-05 03:32:52.660424] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-10-05 03:32:52.660471] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-10-05 03:32:52.666920] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-10-05 03:32:52.667183] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-10-05 03:32:52.667200] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-10-05 03:32:52.667210] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-10-05 03:32:55.183840] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:55.183880] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:55.183884] I [MSGID: 106514] [glusterd-store.c:2219:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 31004
[glusterd......] [2017-10-05 03:32:55.184042] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2: type mgmt/glusterd
[glusterd......] 3: option rpc-auth.auth-glusterfs on
[glusterd......] 4: option rpc-auth.auth-unix on
[glusterd......] 5: option rpc-auth.auth-null on
[glusterd......] 6: option rpc-auth-allow-insecure on
[glusterd......] 7: option transport.socket.listen-backlog 128
[glusterd......] 8: option event-threads 1
[glusterd......] 9: option ping-timeout 0
[glusterd......] 10: option transport.socket.read-fail-log off
[glusterd......] 11: option transport.socket.keepalive-interval 2
[glusterd......] 12: option transport.socket.keepalive-time 10
[glusterd......] 13: option transport-type rdma
[glusterd......] 14: option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-10-05 03:32:55.185552] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-10-05 03:32:56.270504] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[glusterd......] [2017-10-05 03:32:56.354918] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:56.354975] I [MSGID: 106477] [glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated UUID: 916c3695-1ec9-4dad-bbe5-2bc89afedb2a
[glusterd......] volume set: success
[glusterd......] [2017-10-05 03:32:56.413880] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f54a0fe94fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f54a0fe8fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f54a6010ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-10-05 03:32:56.440866] E [MSGID: 106025] [glusterd-op-sm.c:1097:glusterd_op_stage_set_volume] 0-management: Option with name: cluster.max-bricks-per-process does not exist
[glusterd......] [2017-10-05 03:32:56.440920] E [MSGID: 106301] [glusterd-syncop.c:1321:gd_stage_op_phase] 0-management: Staging of operation 'Volume Set' failed on localhost : option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] volume set: failed: option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] Setting max bricks per process failed
[glusterd......] Killing glusterd ...
[glusterd......] [2017-10-05 03:32:56.443655] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f54a0fe94fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f54a0fe8fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f54a6010ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-10-05 03:32:56.446377] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f54a4e1536d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f54a64c3d05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f54a64c3afa] ) 0-: received signum (15), shutting down
[glusterd......] OK
[glusterd......] [2017-10-05 03:32:57.455322] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-10-05 03:32:57.461623] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-10-05 03:32:57.461653] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-10-05 03:32:57.463549] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-10-05 03:32:57.463566] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-10-05 03:32:57.463572] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-10-05 03:32:57.463578] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-10-05 03:32:59.956252] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-10-05 03:32:59.956372] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2: type mgmt/glusterd
[glusterd......] 3: option rpc-auth.auth-glusterfs on
[glusterd......] 4: option rpc-auth.auth-unix on
[glusterd......] 5: option rpc-auth.auth-null on
[glusterd......] 6: option rpc-auth-allow-insecure on
[glusterd......] 7: option transport.socket.listen-backlog 128
[glusterd......] 8: option event-threads 1
[glusterd......] 9: option ping-timeout 0
[glusterd......] 10: option transport.socket.read-fail-log off
[glusterd......] 11: option transport.socket.keepalive-interval 2
[glusterd......] 12: option transport.socket.keepalive-time 10
[glusterd......] 13: option transport-type rdma
[glusterd......] 14: option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-10-05 03:32:59.957792] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-10-05 03:33:01.027934] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] main:816 : handler path: /usr/lib64/tcmu-runner
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [INFO] load_our_module:489 : no modules directory '/lib/modules/3.10.0-514.26.2.el7.x86_64', checking module target_core_user entry in '/sys/modules/'
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] load_our_module:492 : Module target_core_user already loaded
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] main:829 : 1 runner handlers found
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [ERROR] set_genl_features:279 : Could not set features. Error -10
[tcmu-runner...] 2017-10-05 03:33:01.045 250 [DEBUG] dbus_bus_acquired:437 : bus org.kernel.TCMUService1 acquired
[tcmu-runner...] 2017-10-05 03:33:01.045 250 [DEBUG] dbus_name_acquired:453 : name org.kernel.TCMUService1 acquired
[[email protected] deploy]# ./gk-deploy -gvy
Using Kubernetes CLI.
Checking status of namespace matching 'default':
default Active 2d
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':
Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':
Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
heketi pod ...
Checking status of pods matching '--selector=heketi=pod':
Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':
Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'lenddo-vm-1' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-1 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-1" labeled
Marking 'lenddo-vm-2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-2 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-2" labeled
Marking 'lenddo-vm-3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-3 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-3" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-33301 1/1 Running 0 41s
glusterfs-qdz41 1/1 Running 0 41s
glusterfs-s5pm4 1/1 Running 0 41s
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /root/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-4089530887-88mpq 1/1 Running 0 9s
OK
Determining heketi service URL ... 2017-10-05 11:33:43.820264 I | OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-88mpq -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: f77ba0ffa6a37fd10ec16c3726d401a3
Creating node lenddo-vm-1 ... ID: 457cf8146e42ac83471c3ab63ccabd4a
@Justluckyg Updated your comment to fix some formatting issues. :)
Thanks for the info! I'm assuming the topology file mentioned here is still current, yes?
When you wipe the drives, are you running wipefs /dev/sdb or wipefs -a /dev/sdb?
Also, we're still missing the following information:
@jarrpa yes, that topology is still current. The hostnames are different: lenddo-vm-1, lenddo-vm-2 and lenddo-vm-3.
I was only using wipefs /dev/sdb.
I wasn't using the -a flag.
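(Editorial note, since this turns out to matter: without options, wipefs only reports the signatures it finds and erases nothing; the -a flag is what actually wipes them.)
wipefs /dev/sdb       # read-only: lists detected filesystem/LVM signatures
wipefs -a /dev/sdb    # erases all detected signatures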
kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
[[email protected] gluster-kubernetes]# git log
commit b39f85357c3c46ba5ce508ac1b67e7d0a546ae6c
Merge: 9234f81 200b708
Author: Jose A. Rivera <[email protected]>
Date: Thu Sep 7 08:09:36 2017 -0500
Merge pull request #346 from SaravanaStorageNetwork/fix_kube_template
Fix default value for GB_GLFS_LRU_COUNT
[[email protected] gluster-kubernetes]# git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: deploy/gk-deploy
# modified: deploy/kube-templates/deploy-heketi-deployment.yaml
# modified: deploy/kube-templates/glusterfs-daemonset.yaml
# modified: deploy/kube-templates/heketi-deployment.yaml
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# deploy/gk-deploy-old
no changes added to commit (use "git add" and/or "git commit -a")
I replaced the gk-deploy script with the one from the master branch; it didn't make any difference.
OK... I had some progress after resetting gluster and adding -a to wipefs, @jarrpa:
Creating cluster ... ID: a34633291757811b9164303d76463ddb
Creating node lenddo-vm-1 ... ID: 751ed953092fe6059e9e9aaa9ef5e1b1
Adding device /dev/sdb ... OK
Creating node lenddo-vm-2 ... ID: 63e4da78a85015176422390f08a5fff0
Adding device /dev/sdb ... OK
Creating node lenddo-vm-3 ... ID: 7f3fc5d496be4403a6abd57c71925fa3
Adding device /dev/sdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-l58qj -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-l58qj -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n default create -f - 2>&1
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created
Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw 0/1 ContainerCreating 0 4m
And this is what I'm getting when I describe the heketi-storage pod:
kubectl describe pod heketi-storage-copy-job-fsrgw
2017-10-05 12:51:37.703881 I | proto: duplicate proto type registered: google.protobuf.Any
2017-10-05 12:51:37.703952 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-10-05 12:51:37.703967 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name: heketi-storage-copy-job-fsrgw
Namespace: default
Node: lenddo-vm-2/192.168.1.241
Start Time: Thu, 05 Oct 2017 12:46:44 +0800
Labels: controller-uid=2fc039df-a988-11e7-9a8d-000c29580815
job-name=heketi-storage-copy-job
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"heketi-storage-copy-job","uid":"2fc039df-a988-11e7-9a8d-000c29580815","...
Status: Pending
IP:
Created By: Job/heketi-storage-copy-job
Controlled By: Job/heketi-storage-copy-job
Containers:
heketi:
Container ID:
Image: heketi/heketi:dev
Image ID:
Port: <none>
Command:
cp
/db/heketi.db
/heketi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/db from heketi-storage-secret (rw)
/heketi from heketi-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-s24hk (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
heketi-storage:
Type: Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
EndpointsName: heketi-storage-endpoints
Path: heketidbstorage
ReadOnly: false
heketi-storage-secret:
Type: Secret (a volume populated by a Secret)
SecretName: heketi-storage-secret
Optional: false
default-token-s24hk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-s24hk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4m 4m 1 default-scheduler Normal Scheduled Successfully assigned heketi-storage-copy-job-fsrgw to lenddo-vm-2
4m 4m 1 kubelet, lenddo-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-s24hk"
4m 4m 1 kubelet, lenddo-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "heketi-storage-secret"
4m 4m 1 kubelet, lenddo-vm-2 Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 192.168.1.242:heketidbstorage /var/lib/kubelet/pods/2fd9b6c3-a988-11e7-9a8d-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/heketi-storage/heketi-storage-copy-job-fsrgw-glusterfs.log backup-volfile-servers=192.168.1.240:192.168.1.241:192.168.1.242]
Output: ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs
the following error information was pulled from the glusterfs log to help diagnose this issue:
/usr/sbin/glusterfs(+0x69a0)[0x7f08f82e69a0]
---------
4m 41s 9 kubelet, lenddo-vm-2 Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/2fd9b6c3-a988-11e7-9a8d-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
2m 34s 2 kubelet, lenddo-vm-2 Warning FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-fsrgw_default(2fd9b6c3-a988-11e7-9a8d-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-fsrgw". list of unattached/unmounted volumes=[heketi-storage]
2m 34s 2 kubelet, lenddo-vm-2 Warning FailedSync Error syncing pod
Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw 0/1 ContainerCreating 0 5m
Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.'
kubectl logs heketi-storage-copy-job-fsrgw
Error from server (BadRequest): container "heketi" in pod "heketi-storage-copy-job-fsrgw" is waiting to start: ContainerCreating
kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default po/deploy-heketi-4089530887-l58qj 1/1 Running 0 10m
default po/glusterfs-4n5f6 1/1 Running 0 10m
default po/glusterfs-kdkw3 1/1 Running 0 10m
default po/glusterfs-kh863 1/1 Running 0 10m
default po/heketi-storage-copy-job-fsrgw 0/1 ContainerCreating 0 9m
kube-system po/kube-apiserver-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-controller-manager-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-dns-2410490047-rvcr2 3/3 Running 0 2d
kube-system po/kube-dns-2410490047-tx8z7 3/3 Running 0 2d
kube-system po/kube-proxy-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kube-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system po/kube-proxy-lenddo-vm-3 1/1 Running 0 2d
kube-system po/kube-scheduler-lenddo-vm-1 1/1 Running 1 2d
kube-system po/kubedns-autoscaler-4166808448-bd17h 1/1 Running 0 2d
kube-system po/kubernetes-dashboard-3307607089-3nzts 1/1 Running 0 2d
kube-system po/nginx-proxy-lenddo-vm-2 1/1 Running 0 2d
kube-system po/nginx-proxy-lenddo-vm-3 1/1 Running 0 2d
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.27.89 <none> 8080/TCP 10m
default svc/heketi-storage-endpoints 10.233.63.78 <none> 1/TCP 9m
default svc/kubernetes 10.233.0.1 <none> 443/TCP 2d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 2d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 2d
NAMESPACE NAME DESIRED SUCCESSFUL AGE
default jobs/heketi-storage-copy-job 1 0 9m
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 10m
kube-system deploy/kube-dns 2 2 2 2 2d
kube-system deploy/kubedns-autoscaler 1 1 1 1 2d
kube-system deploy/kubernetes-dashboard 1 1 1 1 2d
NAMESPACE NAME DESIRED CURRENT READY AGE
default rs/deploy-heketi-4089530887 1 1 1 10m
kube-system rs/kube-dns-2410490047 2 2 2 2d
kube-system rs/kubedns-autoscaler-4166808448 1 1 1 2d
kube-system rs/kubernetes-dashboard-3307607089 1 1 1 2d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE
default ds/glusterfs 3 3 3 3 3 storagenode=glusterfs 10m
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.27.89 <none> 8080/TCP 10m
default svc/heketi-storage-endpoints 10.233.63.78 <none> 1/TCP 9m
default svc/kubernetes 10.233.0.1 <none> 443/TCP 2d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 2d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 2d
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 10m
kube-system deploy/kube-dns 2 2 2 2 2d
kube-system deploy/kubedns-autoscaler 1 1 1 1 2d
kube-system deploy/kubernetes-dashboard 1 1 1 1 2d
NAMESPACE NAME TYPE DATA AGE
default secrets/default-token-s24hk kubernetes.io/service-account-token 3 2d
default secrets/heketi-config-secret Opaque 3 10m
default secrets/heketi-service-account-token-lltzn kubernetes.io/service-account-token 3 10m
default secrets/heketi-storage-secret Opaque 1 9m
kube-public secrets/default-token-zc6gr kubernetes.io/service-account-token 3 2d
kube-system secrets/default-token-5ng2g kubernetes.io/service-account-token 3 2d
NAMESPACE NAME SECRETS AGE
default sa/default 1 2d
default sa/heketi-service-account 1 10m
kube-public sa/default 1 2d
kube-system sa/default 1 2d
NAMESPACE NAME AGE
error: clusterRoleBinding is not namespaced
@jarrpa so I aborted the glusterfs deployment again, deleted the files and dirs, ran wipefs -af /dev/sdb, and now I'm getting this. The output of lsblk is now different...
If I do vgs, I don't see any volume group for /dev/sdb.
How do I remove this?
Creating cluster ... ID: bf66059410f91c5cebbbd8f5e308a2cc
Creating node lenddo-vm-1 ... ID: 5f33e9ba54784a66863d2e5fca9156cd
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-hq076: Couldn't find device with uuid QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN.
Couldn't find device with uuid qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6.
Can't open /dev/sdb exclusively. Mounted filesystem?
Creating node lenddo-vm-2 ... ID: 32fccc28362fd7fa65fda8239358b492
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-vc0mb: Can't open /dev/sdb exclusively. Mounted filesystem?
Creating node lenddo-vm-3 ... ID: 23cf83e60be6c4851ab032094f06e43f
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-dvl3t: Can't open /dev/sdb exclusively. Mounted filesystem?
Error loading the cluster topology.
Please check the failed node or device and rerun this script.
[[email protected] deploy]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 100G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 99G 0 part
├─cl-root 253:0 0 13.4G 0 lvm /
└─cl-swap 253:1 0 1.6G 0 lvm [SWAP]
sdb 8:16 0 50G 0 disk
├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tmeta 253:2 0 12M 0 lvm
│ └─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33-tpool 253:4 0 2G 0 lvm
│ ├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33 253:5 0 2G 0 lvm
│ └─vg_157b3107d4607da4385c106dbf3c2efe-brick_b19693f10248b6144064cf6b97057c33 253:6 0 2G 0 lvm
└─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tdata 253:3 0 2G 0 lvm
└─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33-tpool 253:4 0 2G 0 lvm
├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33 253:5 0 2G 0 lvm
└─vg_157b3107d4607da4385c106dbf3c2efe-brick_b19693f10248b6144064cf6b97057c33 253:6 0 2G 0 l
Does vgremove of the corresponding VG of sdb not help?
@SaravanaStorageNetwork I tried vgremove vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tmeta
but it said it's not a valid VG.
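(Editorial aside, inferred only from the lsblk output above and not confirmed in the thread: the name passed to vgremove is a thin-pool metadata LV, not a volume group; the VG itself would be vg_157b3107d4607da4385c106dbf3c2efe. When vgs no longer reports the VG but lsblk still shows its mappings, the stale device-mapper entries can usually be removed directly, topmost entries first, before wiping the disk.)
dmsetup ls | grep vg_157b3107d4607da4385c106dbf3c2efe        # list the leftover mappings
dmsetup remove vg_157b3107d4607da4385c106dbf3c2efe-brick_b19693f10248b6144064cf6b97057c33
# ...then remove the remaining thin-pool entries (tp_*, *_tmeta, *_tdata) the same way, and finally:
wipefs -a /dev/sdb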
Check whether this helps (it's sdb in your case):
As long as you don't care what's on the drive at /dev/sdc, try this:
dd if=/dev/zero of=/dev/sdc bs=1m count=10
That will zero out the first 10 MB of the disk, including any LVM or RAID headers. Then reboot; the system should see that the disk is no longer a part of any LVM group.
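(One caveat on that quoted snippet, as an editorial note: these nodes are CentOS 7 and the device in this thread is /dev/sdb, not /dev/sdc, and GNU dd expects an uppercase size suffix, so the equivalent command here would be:)
dd if=/dev/zero of=/dev/sdb bs=1M count=10   # zero the first 10 MB, destroying any stale LVM/RAID headers
# then reboot (or otherwise re-scan the disk) so the kernel drops the old device-mapper entries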
@SaravanaStorageNetwork I was able to get past adding the devices on the nodes; however, this is what I'm getting now:
Creating cluster ... ID: 806ce4f0897b5fa7e66f782c267d79fe
Creating node lenddo-vm-1 ... ID: 8999d9d0190c29068a92351a0f374496
Adding device /dev/sdb ... OK
Creating node lenddo-vm-2 ... ID: fc1ab38c5952c471bec13dae1805225c
Adding device /dev/sdb ... OK
Creating node lenddo-vm-3 ... ID: c56d9a002ac915533331e788541386dc
Adding device /dev/sdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-t0jj6 -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-t0jj6 -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n default create -f - 2>&1
secret "heketi-storage-secret" created
job "heketi-storage-copy-job" created
Error from server (AlreadyExists): endpoints "heketi-storage-endpoints" already exists
Error from server (AlreadyExists): services "heketi-storage-endpoints" already exists
Failed on creating heketi storage resources.
You have some stale resources still present.
Try deleting the existing resources using the script's abort option, then check for any remaining resources.
If any are present, delete them manually and try again.
@SaravanaStorageNetwork where can I find the stale resources for heketi-storage-endpoints?
I always run this on all nodes whenever I abort:
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
Am I missing any other directories?
@Justluckyg Run gk-deploy -g --abort, then inspect the output of kubectl get svc,ep --all-namespaces to see if anything is left. If so, let us know, as that's a bug. :)
@Justluckyg For the record, "resources" in this case are Kubernetes objects like Pods and Endpoints.
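(A hedged sketch of that inspection and manual cleanup, using the object names from the AlreadyExists errors above.)
kubectl get svc,ep --all-namespaces | grep heketi        # anything still listed after the abort is stale
kubectl -n default delete svc heketi-storage-endpoints   # delete leftovers by hand if the abort missed them
kubectl -n default delete ep heketi-storage-endpoints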
@jarrpa thank you, I will try that tomorrow on prem (I don't have access remotely :()
By the way, would you know as well what would cause this:
Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw 0/1 ContainerCreating 0 5m
Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.'
kubectl logs heketi-storage-copy-job-fsrgw
Error from server (BadRequest): container "heketi" in pod "heketi-storage-copy-job-fsrgw" is waiting to start: ContainerCreating
The container never gets created for the heketi storage copy job.
@Justluckyg Updated your comment again for proper formatting. :)
If you do a kubectl describe on the job, it'll probably show that it can't mount the GlusterFS volume, which isn't surprising given the incomplete state of your setup.
@jarrpa when I describe the heketi-storage pod, this is what I'm getting:
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
18h 1m 492 kubelet, lenddo-vm-2 Warning FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-zd4js_default(0e39e43c-a9ac-11e7-b78f-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-zd4js". list of unattached/unmounted volumes=[heketi-storage]
18h 1m 492 kubelet, lenddo-vm-2 Warning FailedSync Error syncing pod
18h 1m 555 kubelet, lenddo-vm-2 Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/0e39e43c-a9ac-11e7-b78f-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
and when I do df -h on node vm 2:
[[email protected] ~]# df -h
df: ‘/var/lib/kubelet/pods/0e39e43c-a9ac-11e7-b78f-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage’: Transport endpoint is not connected
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl-root 14G 7.1G 6.4G 53% /
devtmpfs 910M 0 910M 0% /dev
tmpfs 920M 0 920M 0% /dev/shm
tmpfs 920M 102M 819M 11% /run
tmpfs 920M 0 920M 0% /sys/fs/cgroup
/dev/sda1 1014M 184M 831M 19% /boot
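(The df error above is the classic symptom of a stale FUSE mount left behind by the failed mount attempt. As an editorial aside, not advice from the thread, it can usually be cleared on that node with a lazy unmount of the path from the error message.)
umount -l /var/lib/kubelet/pods/0e39e43c-a9ac-11e7-b78f-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage   # lazy unmount of the stale mount point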
[[email protected] ~]# kubectl get endpoints
NAME ENDPOINTS AGE
deploy-heketi 10.233.118.150:8080 18h
heketi-storage-endpoints 192.168.1.240:1,192.168.1.241:1,192.168.1.242:1 23h
kubernetes 192.168.1.240:6443 3d
[[email protected] ~]# kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default po/deploy-heketi-4089530887-gl8cs 1/1 Running 0 18h
default po/glusterfs-kw67h 1/1 Running 0 18h
default po/glusterfs-tdfnn 1/1 Running 0 18h
default po/glusterfs-z4s1q 1/1 Running 0 18h
default po/heketi-storage-copy-job-zd4js 0/1 ContainerCreating 0 18h
kube-system po/kube-apiserver-lenddo-vm-1 1/1 Running 4 3d
kube-system po/kube-controller-manager-lenddo-vm-1 1/1 Running 5 3d
kube-system po/kube-dns-2410490047-f36cs 3/3 Running 3 21h
kube-system po/kube-dns-2410490047-m258q 3/3 Running 3 21h
kube-system po/kube-proxy-lenddo-vm-1 1/1 Running 4 3d
kube-system po/kube-proxy-lenddo-vm-2 1/1 Running 3 3d
kube-system po/kube-proxy-lenddo-vm-3 1/1 Running 3 3d
kube-system po/kube-scheduler-lenddo-vm-1 1/1 Running 4 3d
kube-system po/kubedns-autoscaler-4166808448-bd17h 0/1 Error 2 3d
kube-system po/kubernetes-dashboard-3307607089-gp79b 1/1 Running 1 21h
kube-system po/nginx-proxy-lenddo-vm-2 1/1 Running 3 3d
kube-system po/nginx-proxy-lenddo-vm-3 1/1 Running 3 3d
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.53.200 <none> 8080/TCP 18h
default svc/heketi-storage-endpoints 10.233.63.78 <none> 1/TCP 23h
default svc/kubernetes 10.233.0.1 <none> 443/TCP 3d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 3d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 3d
NAMESPACE NAME DESIRED SUCCESSFUL AGE
default jobs/heketi-storage-copy-job 1 0 18h
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 18h
kube-system deploy/kube-dns 2 2 2 2 3d
kube-system deploy/kubedns-autoscaler 1 1 1 0 3d
kube-system deploy/kubernetes-dashboard 1 1 1 1 3d
NAMESPACE NAME DESIRED CURRENT READY AGE
default rs/deploy-heketi-4089530887 1 1 1 18h
kube-system rs/kube-dns-2410490047 2 2 2 3d
kube-system rs/kubedns-autoscaler-4166808448 1 1 0 3d
kube-system rs/kubernetes-dashboard-3307607089 1 1 1 3d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE
default ds/glusterfs 3 3 3 3 3 storagenode=glusterfs 18h
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/deploy-heketi 10.233.53.200 <none> 8080/TCP 18h
default svc/heketi-storage-endpoints 10.233.63.78 <none> 1/TCP 23h
default svc/kubernetes 10.233.0.1 <none> 443/TCP 3d
kube-system svc/kube-dns 10.233.0.3 <none> 53/UDP,53/TCP 3d
kube-system svc/kubernetes-dashboard 10.233.33.172 <none> 80/TCP 3d
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default deploy/deploy-heketi 1 1 1 1 18h
kube-system deploy/kube-dns 2 2 2 2 3d
kube-system deploy/kubedns-autoscaler 1 1 1 0 3d
kube-system deploy/kubernetes-dashboard 1 1 1 1 3d
NAMESPACE NAME TYPE DATA AGE
default secrets/default-token-s24hk kubernetes.io/service-account-token 3 3d
default secrets/heketi-config-secret Opaque 3 18h
default secrets/heketi-service-account-token-3qr2x kubernetes.io/service-account-token 3 18h
default secrets/heketi-storage-secret Opaque 1 18h
kube-public secrets/default-token-zc6gr kubernetes.io/service-account-token 3 3d
kube-system secrets/default-token-7c50v kubernetes.io/service-account-token 3 21h
NAMESPACE NAME SECRETS AGE
default sa/default 1 3d
default sa/heketi-service-account 1 18h
kube-public sa/default 1 3d
kube-system sa/default 1 3d
NAMESPACE NAME AGE
error: clusterRoleBinding is not namespaced
@Justluckyg As I suspected. Yes, follow the guidance we just gave: run the abort, then check to see if there are any heketi svc or ep resources left. Report if there are any. Run the rm and wipefs -a commands on all storage nodes. Run the deploy again. If you get to this same state, with the copy job timing out, inspect the GlusterFS logs on the node being targeted for the mount. Also verify whether you can manually mount the gluster volume (mount -t glusterfs) from the node running the copy job.
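(A minimal sketch of that manual mount test; the server IP and volume name are taken from the failed mount arguments earlier in the thread, and the mount point is an arbitrary assumption.)
mkdir -p /mnt/heketi-test
mount -t glusterfs 192.168.1.242:/heketidbstorage /mnt/heketi-test
ls /mnt/heketi-test       # if this lists the volume contents, the volume itself is mountable from this node
umount /mnt/heketi-test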
@jarrpa I just did the above and I was getting the same result. I manually deleted the heketi service that didn't get deleted by the gk-deploy command, but the heketi-storage job is still failing.
[[email protected] ~]# kubectl describe pod heketi-storage-copy-job-gkszq
Name: heketi-storage-copy-job-gkszq
Namespace: default
Node: lenddo-vm-2/192.168.1.241
Start Time: Fri, 06 Oct 2017 12:08:03 +0800
Labels: controller-uid=f2f74b50-aa4b-11e7-9eb4-000c29580815
job-name=heketi-storage-copy-job
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"heketi-storage-copy-job","uid":"f2f74b50-aa4b-11e7-9eb4-000c29580815","...
Status: Pending
IP:
Created By: Job/heketi-storage-copy-job
Controlled By: Job/heketi-storage-copy-job
Containers:
heketi:
Container ID:
Image: heketi/heketi:dev
Image ID:
Port: <none>
Command:
cp
/db/heketi.db
/heketi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/db from heketi-storage-secret (rw)
/heketi from heketi-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-s24hk (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
heketi-storage:
Type: Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
EndpointsName: heketi-storage-endpoints
Path: heketidbstorage
ReadOnly: false
heketi-storage-secret:
Type: Secret (a volume populated by a Secret)
SecretName: heketi-storage-secret
Optional: false
default-token-s24hk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-s24hk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 default-scheduler Normal Scheduled Successfully assigned heketi-storage-copy-job-gkszq to lenddo-vm-2
5m 5m 1 kubelet, lenddo-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-s24hk"
5m 5m 1 kubelet, lenddo-vm-2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "heketi-storage-secret"
5m 5m 1 kubelet, lenddo-vm-2 Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 192.168.1.242:heketidbstorage /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/heketi-storage/heketi-storage-copy-job-gkszq-glusterfs.log backup-volfile-servers=192.168.1.240:192.168.1.241:192.168.1.242]
Output: ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs
the following error information was pulled from the glusterfs log to help diagnose this issue:
/usr/sbin/glusterfs(+0x69a0)[0x7f0d59d799a0]
---------
5m 1m 9 kubelet, lenddo-vm-2 Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
3m 1m 2 kubelet, lenddo-vm-2 Warning FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-gkszq_default(f2fc138d-aa4b-11e7-9eb4-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-gkszq". list of unattached/unmounted volumes=[heketi-storage]
3m 1m 2 kubelet, lenddo-vm-2 Warning FailedSync Error syncing pod
and these are the glusterd logs from that node:
[2017-10-06 04:07:58.903816] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-10-06 04:07:58.918264] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-10-06 04:07:58.918736] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2017-10-06 04:07:58.918837] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-10-06 04:07:58.918874] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-10-06 04:07:58.919237] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2017-10-06 04:07:58.919905] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2017-10-06 04:07:58.919973] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd service is stopped
[2017-10-06 04:07:58.920005] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting glustershd service
[2017-10-06 04:07:59.926164] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2017-10-06 04:07:59.926590] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2017-10-06 04:07:59.926736] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-10-06 04:07:59.926772] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-10-06 04:07:59.926899] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2017-10-06 04:07:59.927069] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-10-06 04:07:59.927103] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-10-06 04:08:02.146806] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f63cb759fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:02.156404] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7f63cb759f52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
I did the mount -t glusterfs from node2; it didn't make any difference...
@Justluckyg Okay, but you were able to mount, yes? And what was the exact name of the service you had to delete?
@jarrpa With the mount -t glusterfs? I only did that after trying to run the gk-deploy -gvy script while the heketi-storage-copy-job pod was stuck in ContainerCreating.
The heketi-service was the one I deleted.
@Justluckyg Right, but it worked? And what was the EXACT name of the service, was it heketi-storage-endpoints?
@jarrpa When I entered the command mount -t glusterfs it took it; there was no error. And yes for the service, it was heketi-storage-endpoints.
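For reference, since the kubelet mounts the volume through that service, a quick sanity check that heketi-storage-endpoints still resolves to the GlusterFS node IPs might look like the sketch below (the default namespace is an assumption based on the deployment output earlier in this thread):

```
# Sketch: verify the endpoints object behind the service the copy job mounts through.
# The "default" namespace is an assumption based on the deployment output above.
kubectl -n default get svc heketi-storage-endpoints
kubectl -n default get endpoints heketi-storage-endpoints -o yaml
# The endpoints should list the GlusterFS node IPs (192.168.1.240-242 in this setup).
```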
@Justluckyg Idea: Try mounting the GlusterFS volume again, on a node not running the copy job pod, and see if you can do an ls on the mounted directory.
@jarrpa Should I just do mount -t glusterfs, or is there anything after that? Because when I do that on node 3 and run df -h, I don't see any newly mounted directory.
@Justluckyg Yes: Create a new directory somewhere, then run mount -t glusterfs 192.168.1.242:heketidbstorage <SOME_DIR>. This should tell us if the GlusterFS volume can be accessed from the node you're working on. Afterwards, run umount <SOME_DIR> to unmount the volume.
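Put together, that manual mount test might look like the following sketch (the volfile server IP and volume name are taken from the kubelet mount arguments shown earlier; the directory name is arbitrary):

```
# Manual check that the heketidbstorage volume can be mounted from this node.
mkdir -p /tmp/gluster-test
mount -t glusterfs 192.168.1.242:heketidbstorage /tmp/gluster-test
ls /tmp/gluster-test     # should succeed if the volume is reachable (it may be empty)
umount /tmp/gluster-test # clean up when done
```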
@jarrpa It's unable to mount...
[root@<node> ~]# mkdir test
[root@<node> ~]# mount -t glusterfs 192.168.1.242:heketidbstorage test
Mount failed. Please check the log file for more details.
@Justluckyg Ah-ha! Try running the mount command again with -v.
@jarrpa It didn't take it:
[root@<node> ~]# mount -v -t glusterfs 192.168.1.242:heketidbstorage test
/sbin/mount.glusterfs: illegal option -- v
Usage: /sbin/mount.glusterfs <volumeserver>:<volumeid/volumeport> -o<options> <mountpoint>
Options:
man 8 /sbin/mount.glusterfs
To display the version number of the mount helper: /sbin/mount.glusterfs -V
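As an aside, mount.glusterfs takes its verbosity through mount options rather than a -v flag. A sketch of an equivalent debug mount, reusing the log-level and log-file options that appear in the kubelet's mount arguments earlier in this thread (the log path is arbitrary):

```
# mount.glusterfs has no -v flag, but verbosity can be requested via mount options,
# the same way the kubelet passed log-level/log-file in the failed mount above.
mount -t glusterfs \
  -o log-level=DEBUG,log-file=/tmp/heketidbstorage-mount.log \
  192.168.1.242:heketidbstorage /root/test
# Then inspect /tmp/heketidbstorage-mount.log for the actual failure.
```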
@Justluckyg Irritating... see if there are any logs in /var/log that mention heketidbstorage, something like grep -R heketidbstorage /var/log.
@jarrpa There's a bunch; I'm not sure which one will be helpful, so I put it here.
@Justluckyg Good start. Look for errors towards the end of /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_bb208485241a154b8d3070d2da837a53-brick_80e7daa7eb94cae8e4d1c81ccbdad92b-brick.log.
@Justluckyg Also /var/log/glusterfs/root-test.log.
@jarrpa No errors there, all informational lines. But in glusterfs/glusterd.log I'm getting this (same on the node where the copy-job is trying to create):
[2017-10-06 04:08:01.705565] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7fa89dc9afac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:01.898044] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7fa89dc9af52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[root@<node> glusterfs]# cat /var/log/glusterfs/root-test.log
[2017-10-06 13:02:02.140216] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=192.168.1.242 --volfile-id=heketidbstorage /root/test)
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-10-06 13:02:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.20
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f6a92360722]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6a92385ddd]
/lib64/libc.so.6(+0x35250)[0x7f6a909d4250]
/lib64/libglusterfs.so.0(gf_ports_reserved+0x142)[0x7f6a92386442]
/lib64/libglusterfs.so.0(gf_process_reserved_ports+0x7e)[0x7f6a923866be]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xc958)[0x7f6a86d50958]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(client_bind+0x93)[0x7f6a86d50d83]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xa153)[0x7f6a86d4e153]
/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f6a9212de19]
/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f6a9212ded9]
/usr/sbin/glusterfs(glusterfs_mgmt_init+0x24c)[0x7f6a928493ac]
/usr/sbin/glusterfs(glusterfs_volumes_init+0x46)[0x7f6a928442b6]
/usr/sbin/glusterfs(main+0x810)[0x7f6a92840860]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6a909c0b35]
There is no root-test.log on the node where the copy-job is trying to create (lenddo-vm-2).
@Justluckyg You can ignore those errors, they're "normal". :)
The root-test log would only appear on nodes where you've mounted a gluster volume to /root/test.
Do me a favor and check the version of Gluster running in the pods (gluster --version) and the version of the GlusterFS FUSE client on your nodes (mount.glusterfs -V).
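A sketch of those two checks is below; the glusterfs=pod label selector is an assumption based on gk-deploy's templates, so adjust it if your labels differ:

```
# Version of Gluster inside one of the GlusterFS pods.
# The glusterfs=pod label selector is an assumption; `kubectl get pods | grep glusterfs`
# works just as well if the label differs in your deployment.
kubectl get pods -l glusterfs=pod -o name
kubectl exec <glusterfs_pod_name> -- gluster --version

# Version of the GlusterFS FUSE client, run directly on each node.
mount.glusterfs -V
```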
@jarrpa It's the same on all 3:
[root@<node> glusterfs]# mount.glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:30:07
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@<node> glusterfs]# gluster --version
glusterfs 3.7.20 built on Jan 30 2017 15:30:09
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
@Justluckyg Hmm... that might be too old. Do you have any way of updating that?
@jarrpa I noticed upgrading it is not as easy as just doing a yum upgrade; it throws a lot of dependency errors.
@Justluckyg Darn. Well, that's my current thinking, unfortunately. Can you try to see if you can resolve the dependency issues?
@jarrpa Sure, I'll try that. Do you have any preferred stable version?
@Justluckyg Preferably at or newer than the version in the GlusterFS pods, which I think is 3.10.5.
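For reference, one common way to get a newer client than the CentOS 7 base repo provides is the CentOS Storage SIG repositories. The sketch below is only an outline; the centos-release-gluster310 package name is an assumption tied to the 3.10 series:

```
# Sketch of upgrading the GlusterFS client packages on CentOS 7 via the Storage SIG.
# The centos-release-gluster310 package name is an assumption tied to the 3.10 series;
# a generic centos-release-gluster package may also be available on your mirrors.
yum install -y centos-release-gluster310
yum update -y glusterfs-fuse glusterfs-libs glusterfs-client-xlators
mount.glusterfs -V   # confirm the client now reports the expected version
```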
Whew... after a month of testing and your unwavering support @jarrpa @SaravanaStorageNetwork... I finally completed the deployment and deployed my first storage PVC using your sample nginx hello-world application!
So your recommendation to upgrade Gluster ultimately resolved the issue where the heketi-storage pod was not creating, @jarrpa. I'm now running version 3.8.4 of the glusterfs-fuse package.
I can't thank you enough for helping me! Now I am off to testing this further; with that, I am glad to close this issue now :)
YEEEESSS!! HAH! :D Happy to hear that! Always feel free to come back any time you need additional help. Or if you just want to give us praise, we won't turn that down either. ;)
@jarrpa I have the right version of mount.glusterfs, and GlusterFS is running on 3 nodes. However, I still see the error: "Waiting for GlusterFS pods to start ... pods not found."
@verizonold If you are still having trouble, please open a new issue and provide information about your environment and what you've done, as well as the output of kubectl logs <heketi_pod>.
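A minimal sketch of gathering that output (pod names will differ per deployment):

```
# Locate the heketi (or deploy-heketi) pod and capture its logs for the new issue.
kubectl get pods | grep heketi
kubectl logs <heketi_pod_name> > heketi.log
```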