Gluster-kubernetes: glusterfs pods will not deploy


I'm trying to deploy GlusterFS into a Kubernetes cluster that was deployed via kubespray. I have 3 VMs (on bare metal, running CentOS 7). I believe I followed all the prerequisites, but I'm getting this once I run ./gk-deploy -g:

./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 * 2222  - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Management
 * 24008 - GlusterFS RDMA
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

The following kernel modules must be loaded:
 * dm_snapshot
 * dm_mirror
 * dm_thin_pool

For systems with SELinux, the following settings need to be considered:
 * virt_sandbox_use_fusefs should be enabled on each node to allow writing to
   remote GlusterFS volumes

In addition, for an OpenShift deployment you must:
 * Have 'cluster_admin' role on the administrative account doing the deployment
 * Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
 * Have a router deployed that is configured to allow apps to access services
   running in the cluster

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: y
Using Kubernetes CLI.
2017-09-04 15:33:58.778503 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:33:58.778568 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:33:58.778582 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... not found.
  heketi pod ... not found.
Creating initial resources ... 2017-09-04 15:34:07.986783 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:07.986853 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:07.986867 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): error when creating "/root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
2017-09-04 15:34:08.288683 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.288765 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.288779 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
2017-09-04 15:34:08.479687 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.479766 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.479780 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" not labeled
OK
2017-09-04 15:34:08.751038 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.751103 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.751116 I | proto: duplicate proto type registered: google.protobuf.Timestamp
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Failed to label node 'sum-vm-1'
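
(For reference, the prerequisites listed in the banner above can typically be satisfied on each CentOS 7 node with something like the following. This is only a sketch assuming firewalld and SELinux are active; adjust the zone and port range to your environment.)

```
# Open the GlusterFS ports listed by the deploy script (assumes firewalld)
firewall-cmd --permanent --add-port=2222/tcp
firewall-cmd --permanent --add-port=24007-24008/tcp
firewall-cmd --permanent --add-port=49152-49251/tcp
firewall-cmd --reload

# Load the required kernel modules now and persist them across reboots
for mod in dm_snapshot dm_mirror dm_thin_pool; do
    modprobe "$mod"
    echo "$mod" >> /etc/modules-load.d/glusterfs.conf
done

# Allow writing to fuse-mounted GlusterFS volumes under SELinux
setsebool -P virt_sandbox_use_fusefs on
```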
Justluckyg · 4 Sep 2017

Most helpful comment


YEEEESSS!! HAH! :D Happy to hear that! Always feel free to come back any time you need additional help. Or if you just want to give us praise, we won't turn that down either. ;)

jarrpa · 9 Oct 2017

All comments


Looks like you've run this more than once? What happened the first time you ran it? Otherwise you somehow manually added a storagenode=glusterfs label on the node.

BTW, this error can be overcome if you have a recent enough version of this repo, one that has https://github.com/gluster/gluster-kubernetes/pull/339 merged.

jarrpa · 5 Sep 2017

@jarrpa I was getting the same thing the first time I ran it. Do you mind elaborating on the last part about overcoming it?

Justluckyg · 5 Sep 2017

Given that your log ended with Failed to label node 'sum-vm-1' I thought that's what you were raising the Issue about. The PR I linked gets rid of that message.

If you're raising the issue about the proto: duplicate proto lines, I have never seen that before and have no idea what it means. :( Did the deployment fail somehow?

jarrpa · 5 Sep 2017

@jarrpa I tried it and was getting this...

 Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... found.
  heketi pod ... not found.
  gluster-s3 pod ... not found.

Justluckyg · 5 Sep 2017

...yes, that's normal output. Did the deployment fail somehow?

jarrpa · 5 Sep 2017

@jarrpa

it failed :(
Error from server (AlreadyExists): error when creating "STDIN": daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... pods not found.

Justluckyg · 5 Sep 2017

...okay, reset your environment:

  • Run gk-deploy -gy --abort
  • Run rm -rf /etc/glusterfs /var/lib/glusterd on every node

Then run gk-deploy -gvy. If it fails, paste the full output here along with the output of kubectl get deploy,ds,po -o wide.

jarrpa · 5 Sep 2017

hi @jarrpa it failed

./gk-deploy -gvy
Using Kubernetes CLI.
2017-09-05 10:58:04.307027 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:04.307092 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:04.307108 I | proto: duplicate proto type registered: google.protobuf.Timestamp

Checking status of namespace matching 'default':
default   Active    11d
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ...
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
2017-09-05 10:58:15.580733 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.580803 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.580816 I | proto: duplicate proto type registered: google.protobuf.Timestamp
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
2017-09-05 10:58:15.882600 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.882667 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.882679 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
2017-09-05 10:58:16.071163 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.071223 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.071237 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'sum-vm-1' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-1 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.277147 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.277213 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.277226 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-1" labeled
Marking 'sum-vm-2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-2 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.503202 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.503260 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.503273 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-2" labeled
Marking 'sum-vm-3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes sum-vm-3 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.715662 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.715720 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.715733 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-3" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
2017-09-05 10:58:16.928411 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.928467 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.928479 I | proto: duplicate proto type registered: google.protobuf.Timestamp
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-h996x   0/1       CrashLoopBackOff   5         5m
glusterfs-hmln9   0/1       CrashLoopBackOff   5         5m
Timed out waiting for pods matching '--selector=glusterfs=pod'.
pods not found.
kubectl get deploy,ds,po -o wide
2017-09-05 11:05:25.072539 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:05:25.072617 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:05:25.072632 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE       CONTAINER(S)   IMAGE(S)                        SELECTOR
ds/glusterfs   3         2         0         2            0           storagenode=glusterfs   7m        glusterfs      gluster/gluster-centos:latest   glusterfs=pod,glusterfs-node=pod

NAME                 READY     STATUS             RESTARTS   AGE       IP              NODE
po/glusterfs-h996x   0/1       CrashLoopBackOff   6          7m        192.168.1.240   sum-vm-1
po/glusterfs-hmln9   0/1       CrashLoopBackOff   6          7m        192.168.1.241   sum-vm-2
Justluckyg · 5 Sep 2017

...Hm. So two of your GlusterFS pods are failing, and the third one is missing entirely. Is there anything useful if you run "kubectl describe" on the daemonset or pods?

jarrpa · 5 Sep 2017
# kubectl describe pod glusterfs-h996x --namespace=default
2017-09-05 11:20:25.629923 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:20:25.629983 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:20:25.629997 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:       glusterfs-h996x
Namespace:  default
Node:       sum-vm-1/192.168.1.240
Start Time: Tue, 05 Sep 2017 10:58:17 +0800
Labels:     controller-revision-hash=1016952396
        glusterfs=pod
        glusterfs-node=pod
        pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"10ecb06a-91e6-11e7-8884-000c29580815","apiVersi...
Status:     Running
IP:     192.168.1.240
Created By: DaemonSet/glusterfs
Controlled By:  DaemonSet/glusterfs
Containers:
  glusterfs:
    Container ID:   docker://597ef206a63bbc4a6416163fd2c60d6eecd7b9c260507107f0a5bdfcc38eb75e
    Image:      gluster/gluster-centos:latest
    Image ID:       docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
    Port:       <none>
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 05 Sep 2017 11:19:23 +0800
      Finished:     Tue, 05 Sep 2017 11:19:23 +0800
    Ready:      False
    Restart Count:  9
    Requests:
      cpu:      100m
      memory:       100Mi
    Liveness:       exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:      exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:    <none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  glusterfs-heketi:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/heketi
  glusterfs-run:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:   HostPath (bare host directory volume)
    Path:   /run/lvm
  glusterfs-etc:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/glusterfs
  glusterfs-logs:
    Type:   HostPath (bare host directory volume)
    Path:   /var/log/glusterfs
  glusterfs-config:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/glusterd
  glusterfs-dev:
    Type:   HostPath (bare host directory volume)
    Path:   /dev
  glusterfs-misc:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/misc/glusterfsd
  glusterfs-cgroup:
    Type:   HostPath (bare host directory volume)
    Path:   /sys/fs/cgroup
  glusterfs-ssl:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/ssl
  default-token-1n7cc:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-1n7cc
    Optional:   false
QoS Class:  Burstable
Node-Selectors: storagenode=glusterfs
Tolerations:    node.alpha.kubernetes.io/notReady:NoExecute
        node.alpha.kubernetes.io/unreachable:NoExecute
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath           Type        Reason          Message
  --------- --------    -----   ----            -------------           --------    ------          -------
  22m       22m     2   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-misc"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-dev"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-ssl"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-heketi"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-lvm"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-etc"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-run"
  22m       22m     1   kubelet, sum-vm-1                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-config"
  22m       1m      10  kubelet, sum-vm-1   spec.containers{glusterfs}  Normal      Pulled          Container image "gluster/gluster-centos:latest" already present on machine
  22m       1m      10  kubelet, sum-vm-1   spec.containers{glusterfs}  Normal      Created         Created container
  22m       1m      10  kubelet, sum-vm-1   spec.containers{glusterfs}  Normal      Started         Started container
  22m       13s     106 kubelet, sum-vm-1   spec.containers{glusterfs}  Warning     BackOff         Back-off restarting failed container
  22m       13s     106 kubelet, sum-vm-1                   Warning     FailedSync      Error syncing pod

@jarrpa here's the output

Justluckyg · 5 Sep 2017

Does the OS you are using support systemd?

erinboyd · 7 Sep 2017

@Justluckyg can you do a describe on the pod? We are looking for an event that references 'dbus'.

erinboyd · 7 Sep 2017

@erinboyd I don't see how the OS supporting systemd matters, systemd is in the container. And he just gave us the describe of the pod.

@Justluckyg sorry for the delay, can you also do a describe of the glusterfs daemonset when it reaches such a state?

jarrpa · 7 Sep 2017

@jarrpa we ran into a similar issue with the service broker integration on a non-RHEL OS. The error was bubbling up via the container...

erinboyd · 8 Sep 2017
**kubectl describe pod glusterfs-2dccz**
2017-09-08 11:48:24.425706 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 11:48:24.425777 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 11:48:24.425790 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:       glusterfs-2dccz
Namespace:  default
Node:       sum-vm-2/192.168.1.241
Start Time: Fri, 08 Sep 2017 11:45:58 +0800
Labels:     controller-revision-hash=1016952396
        glusterfs=pod
        glusterfs-node=pod
        pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"38954545-9448-11e7-8884-000c29580815","apiVersi...
Status:     Running
IP:     192.168.1.241
Created By: DaemonSet/glusterfs
Controlled By:  DaemonSet/glusterfs
Containers:
  glusterfs:
    Container ID:   docker://46a4ffeef4c1a4682eb0ac780b49851c0384cc6e714fd8731467b052fb393f64
    Image:      gluster/gluster-centos:latest
    Image ID:       docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
    Port:       <none>
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 08 Sep 2017 11:47:18 +0800
      Finished:     Fri, 08 Sep 2017 11:47:18 +0800
    Ready:      False
    Restart Count:  4
    Requests:
      cpu:      100m
      memory:       100Mi
    Liveness:       exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:      exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:    <none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  glusterfs-heketi:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/heketi
  glusterfs-run:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:   HostPath (bare host directory volume)
    Path:   /run/lvm
  glusterfs-etc:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/glusterfs
  glusterfs-logs:
    Type:   HostPath (bare host directory volume)
    Path:   /var/log/glusterfs
  glusterfs-config:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/glusterd
  glusterfs-dev:
    Type:   HostPath (bare host directory volume)
    Path:   /dev
  glusterfs-misc:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/misc/glusterfsd
  glusterfs-cgroup:
    Type:   HostPath (bare host directory volume)
    Path:   /sys/fs/cgroup
  glusterfs-ssl:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/ssl
  default-token-1n7cc:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-1n7cc
    Optional:   false
QoS Class:  Burstable
Node-Selectors: storagenode=glusterfs
Tolerations:    node.alpha.kubernetes.io/notReady:NoExecute
        node.alpha.kubernetes.io/unreachable:NoExecute
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath           Type        Reason          Message
  --------- --------    -----   ----            -------------           --------    ------          -------
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-etc"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-run"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-ssl"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-lvm"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-misc"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-dev"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-logs"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
  2m        2m      1   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "glusterfs-heketi"
  2m        2m      2   kubelet, sum-vm-2                   Normal      SuccessfulMountVolume   (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
  2m        1m      5   kubelet, sum-vm-2   spec.containers{glusterfs}  Normal      Pulled          Container image "gluster/gluster-centos:latest" already present on machine
  2m        1m      5   kubelet, sum-vm-2   spec.containers{glusterfs}  Normal      Created         Created container
  2m        1m      5   kubelet, sum-vm-2   spec.containers{glusterfs}  Normal      Started         Started container
  2m        7s      13  kubelet, sum-vm-2   spec.containers{glusterfs}  Warning     BackOff         Back-off restarting failed container
  2m        7s      13  kubelet, sum-vm-2                   Warning     FailedSync      Error syncing pod
cat /etc/*-release
CentOS Linux release 7.3.1611 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.3.1611 (Core)
CentOS Linux release 7.3.1611 (Core)

@erinboyd @jarrpa please see the describe output of the glusterfs daemonset's pod above; I'm using CentOS 7 (Core).

@erinboyd also to answer your question about systemd, yes.

[[ `systemctl` =~ -\.mount ]] && echo yes || echo no
yes

Justluckyg · 8 Sep 2017

@Justluckyg That's not the daemonset, that's the pod. I'd like the output of kubectl describe ds glusterfs.

jarrpa · 8 Sep 2017

@jarrpa ohh, I didn't know that command; here you go:

kubectl describe ds glusterfs
2017-09-08 12:16:22.526270 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 12:16:22.526534 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 12:16:22.526548 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:       glusterfs
Selector:   glusterfs=pod,glusterfs-node=pod
Node-Selector:  storagenode=glusterfs
Labels:     glusterfs=daemonset
Annotations:    description=GlusterFS DaemonSet
        tags=glusterfs
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:   glusterfs=pod
        glusterfs-node=pod
  Containers:
   glusterfs:
    Image:  gluster/gluster-centos:latest
    Port:   <none>
    Requests:
      cpu:      100m
      memory:       100Mi
    Liveness:       exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:      exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:    <none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
  Volumes:
   glusterfs-heketi:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/heketi
   glusterfs-run:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
   glusterfs-lvm:
    Type:   HostPath (bare host directory volume)
    Path:   /run/lvm
   glusterfs-etc:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/glusterfs
   glusterfs-logs:
    Type:   HostPath (bare host directory volume)
    Path:   /var/log/glusterfs
   glusterfs-config:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/glusterd
   glusterfs-dev:
    Type:   HostPath (bare host directory volume)
    Path:   /dev
   glusterfs-misc:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/misc/glusterfsd
   glusterfs-cgroup:
    Type:   HostPath (bare host directory volume)
    Path:   /sys/fs/cgroup
   glusterfs-ssl:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/ssl
Events:
  FirstSeen LastSeen    Count   From        SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----        -------------   --------    ------          -------
  30m       30m     1   daemon-set          Normal      SuccessfulCreate    Created pod: glusterfs-vvr33
  30m       30m     1   daemon-set          Normal      SuccessfulCreate    Created pod: glusterfs-2dccz
  30m       30m     1   daemon-set          Normal      SuccessfulCreate    Created pod: glusterfs-2sgr0
Justluckyg · 8 Sep 2017

@Justluckyg The gluster containers write their logs to the /var/log/glusterfs directories of the nodes they're running on. Can you inspect one of the nodes and see if the glusterd.log file shows any useful error messages?
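
For example, something like this (the log path is the default one mentioned above; kubectl logs is an alternative if the host directory turns out to be empty):

```
# On the node running the failing pod
tail -n 100 /var/log/glusterfs/glusterd.log

# Or, from the machine with kubectl, the container's stdout/stderr
kubectl logs glusterfs-h996x
```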

jarrpa · 8 Sep 2017

hi @jarrpa, there's nothing in /var/log/glusterfs; here's what I saw in /var/log/containers:

[[email protected] containers]# tail glusterfs-2sgr0_default_glusterfs-0cba0fb47aadda22f5f0fe2aca8a260213a5849db32a7f0411e72f0e3dfe5847.log
{"log":"Couldn't find an alternative telinit implementation to spawn.\n","stream":"stderr","time":"2017-09-08T04:28:02.038319988Z"}
[[email protected] containers]# cd ..
[[email protected] log]# ls -la glusterfs/
total 4
drwxr-xr-x.  2 root root    6 Nov 15  2016 .
drwxr-xr-x. 12 root root 4096 Sep  4 11:28 ..
Justluckyg · 8 Sep 2017

What's your version of Kube?

jarrpa · 8 Sep 2017

@jarrpa
Kubernetes v1.7.3+coreos.0

Docker version 1.13.1, build 092cba3

I did try downgrading to 1.12, but that version conflicts with the kubespray Ansible playbooks I used to deploy the cluster: https://github.com/kubernetes-incubator/kubespray

Justluckyg · 8 Sep 2017

@jarrpa according to this, if I add this to the container config, it will get systemd to run:

env:
  - name: SYSTEMD_IGNORE_CHROOT
    value: "1"
command:
  - /usr/lib/systemd/systemd
  - --system

How and where can I change it? Thanks!!

Justluckyg · 8 Sep 2017

Yes, but you don't want to do that. :) The systemd folks have said that they do not support running under --system configuration when it's not actually PID 1. Unfortunately, right now there is not a way around this. There are only two official workarounds: 1.) downgrade Docker, or 2.) Pass a flag to all kubelets in your cluster https://github.com/gluster/gluster-kubernetes/issues/298#issuecomment-325953404

Your other option is to wait for another release of Kube v1.8, as this PR should also solve the problem: https://github.com/kubernetes/kubernetes/pull/51634

Finally, if you want to get something working now, you can try out these experimental images I've put together more-or-less just for fun: https://github.com/gluster/gluster-kubernetes/issues/298#issuecomment-325985569

jarrpa · 8 Sep 2017

@jarrpa appreciate your response. When will 1.8 be released?
And as for your custom image: if I try to use it, I'll just replace the one I cloned initially, specifically the glusterfs-daemonset.yaml, right?

Justluckyg · 9 Sep 2017

Looks like no date has been set yet. And yeah, replace the glusterfs-daemonset.yaml file.

jarrpa · 9 Sep 2017

@jarrpa hi again. I think I'm making some progress after cloning your yaml file.

I was getting the error below; how do I change the resource requests to something lower? I'm testing this on a desktop server before trying it in our production environment.

Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  6m        6m      1   daemon-set              Normal      SuccessfulCreate    Created pod: glusterfs-wc5hm
  6m        6m      1   daemon-set              Normal      SuccessfulCreate    Created pod: glusterfs-t35z8
  6m        2s      92  daemonset-controller            Warning     FailedPlacement     failed to place pod on "sum-vm-3": Node didn't have enough resource: cpu, requested: 100, used: 895, capacity: 900
Justluckyg · 11 Sep 2017

@jarrpa but then the 2 pods that have sufficient resources are still crashing; logs below:

[glusterd......] [2017-09-11 04:30:00.371392] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-11 04:30:00.371427] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-11 04:30:00.371435] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-11 04:30:00.371441] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-11 04:30:02.063040] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-11 04:30:02.063186] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-11 04:30:02.064674] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-11 04:30:03.135422] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
Justluckyg · 11 Sep 2017

@jarrpa is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?

And are the instructions below the same?
https://wiki.centos.org/HowTos/GlusterFSonCentOS

One more rather "dumb" question: according to the requirements, there should be a completely empty block storage device. I'm running this on a test desktop server with a single disk, using vSphere ESXi to create the VMs and form a cluster out of them. I only added a virtual disk to these nodes and declared it in the topology.json.

Am I required to use a totally different physical block storage device for this to work?

Justluckyg · 11 Sep 2017

@Justluckyg

Is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?

Nope, gk-deploy handles setting up the GlusterFS cluster for you.

Refer this link for pre-requisites:
https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L514

Am I required to use a totally different physical block storage device for this to work?

Not required. You can specify a virtual disk in topology.json.
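
For reference, a minimal topology.json for a three-node setup like the one in this thread might look roughly like the sketch below. The hostnames and the /dev/sdb device come from this thread; the third storage IP is a placeholder and all IPs must match your actual nodes.

```
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": { "manage": ["sum-vm-1"], "storage": ["192.168.1.240"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["sum-vm-2"], "storage": ["192.168.1.241"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["sum-vm-3"], "storage": ["192.168.1.242"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        }
      ]
    }
  ]
}
```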

SaravanaStorageNetwork · 11 Sep 2017

@SaravanaStorageNetwork does the virtual disk count as a valid block device for glusterfs to work?

Justluckyg · 11 Sep 2017

@SaravanaStorageNetwork does the virtual disk count as a valid block device for glusterfs to work?

Yes

SaravanaStorageNetwork · 11 Sep 2017

@Justluckyg Sorry for the delayed response, I'm travelling all this week. :)

I was getting the error below; how do I change the resource requests to something lower?

Just edit this line.
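
For example, in the copy of kube-templates/glusterfs-daemonset.yaml that gk-deploy uses, the container's requests block can be lowered before redeploying. The values below are illustrative only; the exact location of the block differs between versions of the template.

```
# under the glusterfs container spec:
resources:
  requests:
    memory: 100Mi
    cpu: 50m    # was 100m; lower it so the pod fits on the constrained node
```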

but then the 2 pods that have sufficient resources are still crashing

The logs you provided don't show the error condition; that's just normal glusterd output. Try kubectl logs -p.

On the block device: You just need UNUSED block devices. As long as the block devices you mention are not in use (e.g. by the OS) they can be virtual or physical; they just need to be accessible from the node.

jarrpa · 12 Sep 2017

@jarrpa here's the logs

kubectl logs -p glusterfs-1czwt
[glusterd......] [2017-09-13 05:17:19.840473] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-09-13 05:17:19.848096] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-09-13 05:17:19.848128] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-09-13 05:17:19.849940] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-13 05:17:19.849958] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-13 05:17:19.849974] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-13 05:17:19.849981] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-13 05:17:21.654311] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-13 05:17:21.654433] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-13 05:17:21.655855] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-13 05:17:22.721654] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[glusterd......] [2017-09-13 05:17:22.783844] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: e535fee0-b219-4eae-b9cc-d454510a70ef
[glusterd......] volume set: success
[glusterd......] [2017-09-13 05:17:22.849589] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f9d37b904fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f9d37b8ffac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f9d3cbb7ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-09-13 05:17:22.866467] E [MSGID: 106025] [glusterd-op-sm.c:1097:glusterd_op_stage_set_volume] 0-management: Option with name: cluster.max-bricks-per-process does not exist
[glusterd......] [2017-09-13 05:17:22.866521] E [MSGID: 106301] [glusterd-syncop.c:1321:gd_stage_op_phase] 0-management: Staging of operation 'Volume Set' failed on localhost : option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] volume set: failed: option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] Setting max bricks per process failed
[glusterd......] Killing glusterd ...
[glusterd......] [2017-09-13 05:17:22.870984] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f9d3b9bc36d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f9d3d06ad05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f9d3d06aafa] ) 0-: received signum (15), shutting down
[glusterd......] [2017-09-13 05:17:22.906409] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f9d37b904fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f9d37b8ffac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f9d3cbb7ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] OK
[glusterd......] [2017-09-13 05:17:23.913910] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-09-13 05:17:23.919857] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-09-13 05:17:23.919907] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-09-13 05:17:23.923276] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-13 05:17:23.923314] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-13 05:17:23.923333] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-13 05:17:23.923343] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-13 05:17:25.579290] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-13 05:17:25.579417] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-13 05:17:25.580817] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-13 05:17:26.655818] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[targetctl.....] Traceback (most recent call last):
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 429, in modprobe
[targetctl.....] kmod.Kmod().modprobe(module)
[targetctl.....] File "kmod/kmod.pyx", line 106, in kmod.kmod.Kmod.modprobe (kmod/kmod.c:3166)
[targetctl.....] File "kmod/kmod.pyx", line 82, in lookup (kmod/kmod.c:2393)
[targetctl.....] kmod.error.KmodError: Could not modprobe
[targetctl.....]
[targetctl.....] During handling of the above exception, another exception occurred:
[targetctl.....]
[targetctl.....] Traceback (most recent call last):
[targetctl.....] File "/usr/bin/targetctl", line 82, in <module>
[targetctl.....] main()
[targetctl.....] File "/usr/bin/targetctl", line 79, in main
[targetctl.....] funcs[sys.argv[1]](savefile)
[targetctl.....] File "/usr/bin/targetctl", line 47, in restore
[targetctl.....] errors = RTSRoot().restore_from_file(restore_file=from_file)
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/root.py", line 75, in __init__
[targetctl.....] modprobe('target_core_mod')
[targetctl.....] File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 431, in modprobe
[targetctl.....] raise RTSLibError("Could not load module: %s" % module)
[targetctl.....] rtslib_fb.utils.RTSLibError: Could not load module: target_core_mod
[gluster-blockd] [2017-09-13 05:17:26.977966] ERROR: tcmu-runner running, but targetcli doesn't list user:glfs handler [at gluster-blockd.c+281 :<blockNodeSanityCheck>]
tcmu-runner has failed
gluster-blockd has failed
Exiting
Killing processes ... [glusterd......] [2017-09-13 05:17:27.707047] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f7c3a5e336d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f7c3bc91d05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f7c3bc91afa] ) 0-: received signum (15), shutting down
OK
Justluckyg · 13 Sep 2017

@Justluckyg You're missing some kernel modules which are new requirements for the versions of Gluster running in my containers. See here.
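
For example (a hedged sketch: in addition to the dm_snapshot/dm_mirror/dm_thin_pool modules from the prerequisite banner, the targetctl traceback above points at the iSCSI target module used by gluster-blockd):

```
modprobe target_core_mod
echo target_core_mod >> /etc/modules-load.d/gluster.conf   # persist across reboots
```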

jarrpa · 15 Sep 2017

@jarrpa I was able to get the pods running after loading those modules (thanks! yey!). However, I'm having another problem, the same as the one he reported.

I tried manually starting the gluster daemon, and it won't start either.
I'm not completely sure how he resolved it, e.g. by tagging the latest image; which image and which yaml file did he modify?

Also, unlike him, I cannot do the peer probe; it says the daemon is not running.

Update:

I tried changing the heketi image tag to latest (also tried block) in both the heketi-deployment and deploy-heketi yaml files, but both gave the same error:

Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-1254564305-1cfgl   1/1       Running   0         11s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-1254564305-1cfgl -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: a82854def37a9243e8013c021e7ccaca
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node sum-vm-1 ... Unable to create node: New Node doesn't have glusterd running
Creating node sum-vm-2 ... Unable to create node: New Node doesn't have glusterd running
Creating node sum-vm-3 ... Unable to create node: New Node doesn't have glusterd running
Error loading the cluster topology.
Please check the failed node or device and rerun this script
systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-09-15 14:28:40 +08; 1h 13min ago
  Process: 879 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=255)

Sep 15 14:28:39 sum-vm-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 15 14:28:40 sum-vm-1 GlusterFS[879]: [glusterfsd.c:1844:parse_cmdline] 0-glusterfs: ERROR: parsing the volfile failed [No such file or directory]
Sep 15 14:28:40 sum-vm-1 glusterd[879]: USAGE: /usr/sbin/glusterd [options] [mountpoint]
Sep 15 14:28:40 sum-vm-1 systemd[1]: glusterd.service: control process exited, code=exited status=255
Sep 15 14:28:40 sum-vm-1 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Sep 15 14:28:40 sum-vm-1 systemd[1]: Unit glusterd.service entered failed state.
Sep 15 14:28:40 sum-vm-1 systemd[1]: glusterd.service failed.
Justluckyg · 15 Sep 2017

@jarrpa another update

So I deleted the virtual disks and recreated them, reset GlusterFS, deleted all those directories, and redeployed.

This is the new error I'm getting:

Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-3f5kb   1/1       Running   0         42s
glusterfs-pgwrw   1/1       Running   0         42s
glusterfs-wf7v3   1/1       Running   0         42s
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /root/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-2199298601-6dlc3   1/1       Running   0         7s
OK
Determining heketi service URL ... Failed to communicate with deploy-heketi service.
[[email protected] ~]# kubectl get svc
NAME            CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
deploy-heketi   10.233.10.148   <none>        8080/TCP   10m
kubernetes      10.233.0.1      <none>        443/TCP    21d
[[email protected] ~]# kubectl describe svc deploy-heketi
Name:           deploy-heketi
Namespace:      default
Labels:         deploy-heketi=service
            glusterfs=heketi-service
Annotations:        description=Exposes Heketi Service
Selector:       deploy-heketi=pod
Type:           ClusterIP
IP:         10.233.10.148
Port:           deploy-heketi   8080/TCP
Endpoints:      10.233.124.175:8080
Session Affinity:   None
Events:         <none>
Justluckyg · 15 Sep 2017

Can you curl the heketi port? e.g. curl http://10.233.10.148:8080/hello? If not there might be a networking or firewall issue.

jarrpa · 15 Sep 2017

So I tried it again; I'm not sure what changed, but the first time I did it the heketi service was reachable (that was when I was getting the "none of the nodes has glusterd running" error).
I aborted the deployment, deleted the directories, and tried deploying again, and got the same error:
Determining heketi service URL ... Failed to communicate with deploy-heketi service.

What log can I check to see whether it's the network or a firewall that's causing this? My cluster was deployed via kubespray (using Calico as the default CNI).

thanks @jarrpa

Justluckyg · 18 Sep 2017

@Justluckyg Uncertain... but if you're using kubespray, you might be running into this?

jarrpa · 18 Sep 2017

@jarrpa hi, I reran my kubespray and gk-deploy. I got past the heketi step this time, but now I'm getting this. The vdisks are all empty, so I'm not sure what this error means:

Creating cluster ... ID: 63fd16cc8effbb6ea0ebb24cb4517c87
Creating node sum-vm-1 ... ID: 8f6818a2a227d4fea569f078edf2d98e
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-9h860:   Couldn't find device with uuid QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN.
Couldn't find device with uuid qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6.
Can't open /dev/sdb exclusively.  Mounted filesystem?
Creating node sum-vm-2 ... ID: 7a46eac46700ae27dafba52d3eb0b8e8
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-cg1pm:   Can't open /dev/sdb exclusively.  Mounted filesystem?
Creating node sum-vm-3 ... ID: 1e53192e8131b311e289d5257fb4e2f8
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-9l6nr:   Can't open /dev/sdb exclusively.  Mounted filesystem?
Error loading the cluster topology.
Please check the failed node or device and rerun this script
Justluckyg · 18 Sep 2017

@Justluckyg vgs and pvs show nothing for those drives on the respective hosts?

jarrpa · 18 Sep 2017

@Justluckyg And did you completely destroy the deploy-heketi pod?

jarrpa · 18 Sep 2017

@jarrpa do you mean kubectl get pvs? sorry not sure what you meant

Justluckyg · 18 Sep 2017

@Justluckyg No. Those are commands you can run on the hosts to see if there are any LVM Volume Groups (vgs) or Physical Volumes (pvs) defined that are using the raw block devices.
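
For example, run on each node (the device name is the one from your topology.json):

```
vgs                  # any volume groups built on the device?
pvs                  # any LVM physical volumes on it?
lsblk -f /dev/sdb    # any filesystem signature or mountpoint on the device?
```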

jarrpa · 18 Sep 2017

Hi @jarrpa

The drive I allocated for each of my nodes is
/dev/sdb 50G 52M 47G 1% /kubernetes

and on each node I ran pvdisplay and vgdisplay; none of them shows /dev/sdb:

```
[[email protected] deploy]# vgdisplay
WARNING: Device for PV QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN not found or rejected by a filter.
WARNING: Device for PV qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6 not found or rejected by a filter.
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 3
Act PV 1
VG Size <182.99 GiB
PE Size 4.00 MiB
Total PE 46845
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 43006 / 167.99 GiB
VG UUID MGMTCn-29jM-W3ku-ppGI-p82g-CnOM-GnulFM

[[email protected] deploy]# pvdisplay
WARNING: Device for PV QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN not found or rejected by a filter.
WARNING: Device for PV qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6 not found or rejected by a filter.
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID cdn6Ss-tBS2-OSU9-GMJw-xTae-eLv0-EU8p58

--- Physical volume ---
PV Name [unknown]
VG Name cl
PV Size 84.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 21503
Free PE 21503
Allocated PE 0
PV UUID QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN

--- Physical volume ---
PV Name [unknown]
VG Name cl
PV Size 84.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 21503
Free PE 21503
Allocated PE 0
PV UUID qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6

[[email protected] ~]# vgdisplay
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size <15.00 GiB
PE Size 4.00 MiB
Total PE 3839
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 0 / 0
VG UUID 8YFCKa-zbof-o91I-90n3-2KvR-dSBX-4HLem3

[[email protected] ~]# pvdisplay
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID n0bqJP-hCY2-jNIf-rHr5-N9xZ-8zrb-adMiio

[[email protected] ~]# vgdisplay
--- Volume group ---
VG Name cl
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size <15.00 GiB
PE Size 4.00 MiB
Total PE 3839
Alloc PE / Size 3839 / <15.00 GiB
Free PE / Size 0 / 0
VG UUID c2u8D8-5ixB-no7G-b2rH-8z3A-WfuW-XJKHcO

[[email protected] ~]# pvdisplay
--- Physical volume ---
PV Name /dev/sda2
VG Name cl
PV Size <15.00 GiB / not usable 3.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 3839
Free PE 0
Allocated PE 3839
PV UUID s1uM06-PHtg-rtiu-vMCo-nYy2-vMYJ-McppA2

```

Justluckyg picture Justluckyg  ·  19 Sep 2017
0

What does "/dev/sdb 50G 52M 47G 1% /kubernetes" mean? Is sdb already formatted and mounted on /kubernetes?

jarrpa picture jarrpa  ·  19 Sep 2017
0

It was an additional vdisk I attached to each VM and mounted to /kubernetes, @jarrpa.

Justluckyg picture Justluckyg  ·  19 Sep 2017
0

Ah! The disk must not be mounted. It must just be a raw block device, so no mount and no filesystem formatted onto it. Please "umount /kubernetes" and then do the "wipefs".
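
For example, on each node (a sketch, assuming the extra disk is /dev/sdb and is mounted at /kubernetes):

```
umount /kubernetes
# If /etc/fstab has an entry for /dev/sdb or /kubernetes, remove it so the disk
# doesn't get remounted on the next boot
wipefs -a /dev/sdb
```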

jarrpa picture jarrpa  ·  19 Sep 2017
0

@jarrpa

Is this how it should look?

[[email protected] ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0           2:0    1    4K  0 disk
sda           8:0    0   16G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   15G  0 part
  ├─cl-root 253:0    0 13.4G  0 lvm  /
  └─cl-swap 253:1    0  1.6G  0 lvm  [SWAP]
sdb           8:16   0   50G  0 disk
sr0          11:0    1  4.1G  0 rom

Justluckyg picture Justluckyg  ·  19 Sep 2017
0

@Justluckyg Yup!

jarrpa picture jarrpa  ·  19 Sep 2017
0

@jarrpa Thanks, I just tried to re-run the script and I'm not sure what's happening now; it's been stuck at "Creating node sum-vm-1 ... ID ...".
It's been like that for the last 10 minutes.

Justluckyg picture Justluckyg  ·  19 Sep 2017
0

@Justluckyg Ugh... it feels like we're SO CLOSE.

What timezone are you in? Would you be comfortable with an audio call with screen sharing? This would probably get us to a solution much faster.

Otherwise, let's go through the basics one more time (there's a command sketch after the list):

  1. Delete all related resources in Kubernetes with gk-deploy -g --abort
  2. Unmount and wipe the /dev/sdb devices
  3. Delete the GlusterFS config directories on the hosts
  4. Re-run gk-deploy -gvy
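
Roughly, that sequence might look like this (a sketch; adjust the device name, mount point, and paths to your environment):

```
# 1. From the deploy/ directory on the admin machine, tear down the existing deployment
./gk-deploy -g --abort

# 2. and 3. On each storage node: unmount and wipe the disk, then remove GlusterFS/heketi state
umount /kubernetes        # only if the disk is still mounted
wipefs -a /dev/sdb
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs

# 4. Back on the admin machine, re-run the deployment verbosely
./gk-deploy -gvy
```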

If this fails, capture the full output of:

  1. gk-deploy
  2. kubectl get -o wide --selector=glusterfs.
  3. lsblk on each node
jarrpa picture jarrpa  ·  19 Sep 2017
0

Ugh, sorry for the wrong terminology. Too much time downstream. :) Fixed.

jarrpa picture jarrpa  ·  19 Sep 2017
0

Hi @jarrpa, I really appreciate your help on this... I tried it again: I even reset my k8s cluster and redeployed with gk-deploy, rebooted the servers, reloaded the modules, and re-ran the script, and the farthest I've gotten is still:

Creating cluster ... ID: 181034507612bba11108086588c1e1c9
Creating node sum-vm-1 ... ID: fffeb796201431210a556c9a23e4a555

I'm GMT+8, Manila timezone, to answer your question.

What other logs can I look at? Should I look at the container logs to see where it's failing? There's no error; it just stays there until my SSH session times out.

kubectl get -o wide --selector=glusterfs

You must specify the type of resource to get. Valid resource types include:

    * all
    * certificatesigningrequests (aka 'csr')
    * clusterrolebindings
    * clusterroles
    * clusters (valid only for federation apiservers)
    * componentstatuses (aka 'cs')
    * configmaps (aka 'cm')
    * controllerrevisions
    * cronjobs
    * daemonsets (aka 'ds')
    * deployments (aka 'deploy')
    * endpoints (aka 'ep')
    * events (aka 'ev')
    * horizontalpodautoscalers (aka 'hpa')
    * ingresses (aka 'ing')
    * jobs
    * limitranges (aka 'limits')
    * namespaces (aka 'ns')
    * networkpolicies (aka 'netpol')
    * nodes (aka 'no')
    * persistentvolumeclaims (aka 'pvc')
    * persistentvolumes (aka 'pv')
    * poddisruptionbudgets (aka 'pdb')
    * podpreset
    * pods (aka 'po')
    * podsecuritypolicies (aka 'psp')
    * podtemplates
    * replicasets (aka 'rs')
    * replicationcontrollers (aka 'rc')
    * resourcequotas (aka 'quota')
    * rolebindings
    * roles
    * secrets
    * serviceaccounts (aka 'sa')
    * services (aka 'svc')
    * statefulsets
    * storageclasses
    * thirdpartyresources
    error: Required resource not specified.
Use "kubectl explain <resource>" for a detailed description of that resource (e.g. kubectl explain pods).
See 'kubectl get -h' for help and examples.

This is the same on all 3 nodes:

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0           2:0    1    4K  0 disk
sda           8:0    0  100G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   99G  0 part
  ├─cl-root 253:0    0 13.4G  0 lvm  /
  └─cl-swap 253:1    0  1.6G  0 lvm  [SWAP]
sdb           8:16   0   50G  0 disk
sr0          11:0    1  4.1G  0 rom
Justluckyg picture Justluckyg  ·  22 Sep 2017
0

Of course I forgot something, sorry. :( Try getting the output of this: kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide. Also see if you can do curl http://<IP>:8080/hello where IP is the IP address of your deploy-heketi service.
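
For example (a sketch; the deploy-heketi service lives in the default namespace here):

```
# Find the ClusterIP of the deploy-heketi service
kubectl -n default get svc deploy-heketi
# Then check that heketi answers (substitute the ClusterIP from the output above)
curl http://<CLUSTER-IP>:8080/hello
```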

You can look at the container logs for the GlusterFS pods and the heketi pod, and also look for any events from the output of their kubectl describe output.

I'm in US CDT (GMT-5), so we're about 13hrs apart. Depending on your schedule, I could get online for a call (generally) before 0000 CDT / 1300 PHT or starting at 0800 CDT / 2100 PHT. We could use Google Hangouts or another tool of your choice.

jarrpa picture jarrpa  ·  22 Sep 2017
0

Hi @jarrpa,

I recreated my entire topology. I think the HDD of my desktop (the server where I'm testing this) is problematic, so I set up the same thing in AWS (used m3.medium), used the same OS (CentOS), and deployed the cluster via kubespray (kubectl 1.7.5).

Now the heketi pod won't launch. :(

Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-drw2p   1/1       Running   0         42s
glusterfs-q1wjb   1/1       Running   0         42s
glusterfs-qsks3   1/1       Running   0         42s
OK
sed: -e expression #2, char 20: unknown option to `s'
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret  --from-file=./heketi.json --from-file=topology.json=topology.json
2017-09-25 02:41:23.068495 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-25 02:41:23.068566 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-25 02:41:23.068585 I | proto: duplicate proto type registered: google.protobuf.Timestamp
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's/\${HEKETI_FSTAB}//var/lib/heketi/fstab/' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /home/centos/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
sed: -e expression #2, char 21: unknown option to `s'
google.protobuf.Timestamp
error: no objects passed to create
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
pod not found.

It may be related to #358... not sure though if there's a fix or what could have caused it.
I'm not using gluster:master but your experimental version without systemd, because when I tried to use master again in this setup I was encountering the same issue with the glusterfs pods crashing.

These are the other kubectl outputs:

kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
default       glusterfs-drw2p                         1/1       Running   0          15m
default       glusterfs-q1wjb                         1/1       Running   0          15m
default       glusterfs-qsks3                         1/1       Running   0          15m
kube-system   calico-node-1qbdh                       1/1       Running   0          1d
kube-system   calico-node-8nrs5                       1/1       Running   0          1d
kube-system   calico-node-lxsdl                       1/1       Running   0          1d
kube-system   kube-apiserver-kube1                    1/1       Running   0          1d
kube-system   kube-controller-manager-kube1           1/1       Running   0          1d
kube-system   kube-dns-3888408129-lmk1r               3/3       Running   0          1d
kube-system   kube-dns-3888408129-thn7w               3/3       Running   0          1d
kube-system   kube-proxy-kube1                        1/1       Running   0          1d
kube-system   kube-proxy-kube2                        1/1       Running   0          1d
kube-system   kube-proxy-kube3                        1/1       Running   0          1d
kube-system   kube-scheduler-kube1                    1/1       Running   0          1d
kube-system   kubedns-autoscaler-1629318612-fc4rj     1/1       Running   0          1d
kube-system   kubernetes-dashboard-3941213843-59739   1/1       Running   0          1d
kube-system   nginx-proxy-kube2                       1/1       Running   0          1d
kube-system   nginx-proxy-kube3                       1/1       Running   0          1d
kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide

NAME                 READY     STATUS    RESTARTS   AGE       IP              NODE
po/glusterfs-drw2p   1/1       Running   0          16m       172.16.100.24   kube3
po/glusterfs-q1wjb   1/1       Running   0          16m       172.16.100.23   kube2
po/glusterfs-qsks3   1/1       Running   0          16m       172.16.100.21   kube1

NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE       CONTAINER(S)   IMAGE(S)                              SELECTOR
ds/glusterfs   3         3         3         3            3           storagenode=glusterfs   16m       glusterfs      jarrpa/gluster-fedora-minimal:block   glusterfs=pod,glusterfs-node=pod

NAME                           TYPE      DATA      AGE
secrets/heketi-config-secret   Opaque    2         15m

NAME                        SECRETS   AGE
sa/heketi-service-account   1         16m

NAME                                 AGE       ROLE      USERS     GROUPS    SERVICEACCOUNTS
clusterrolebindings/heketi-sa-view   16m       edit                          default/heketi-service-account
Justluckyg picture Justluckyg  ·  25 Sep 2017
0

> It may be related to #358... not sure though if there's a fix or what could have caused it.

Yes, this may be the related one.
Are you trying the latest master and still seeing the issue?

SaravanaStorageNetwork picture SaravanaStorageNetwork  ·  25 Sep 2017
0

@SaravanaStorageNetwork If I use gluster:master, the glusterfs pods will not even start; they just crash. I'm using @jarrpa's #298 dev test image.

Justluckyg picture Justluckyg  ·  25 Sep 2017
0

@jarrpa @SaravanaStorageNetwork If I use gluster:master:

kubectl logs -p glusterfs-2m15j

Couldn't find an alternative telinit implementation to spawn.

Justluckyg picture Justluckyg  ·  25 Sep 2017
0

@Justluckyg Could you share your gk-deploy script (maybe through https://gist.github.com)?
I am sure the issue is with the sed command usage which you mentioned here: https://github.com/gluster/gluster-kubernetes/issues/341#issuecomment-331765212

> Couldn't find an alternative telinit implementation to spawn.

Sorry, I don't have much insight on this.

SaravanaStorageNetwork picture SaravanaStorageNetwork  ·  25 Sep 2017
0

@Justluckyg You need to use a version of gk-deploy that has https://github.com/gluster/gluster-kubernetes/pull/354 merged. The latest version of master on gluster-kubernetes will have that.

You should be able to use my experimental images with that version to get it working.
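
For example, to move your clone to the latest master (a sketch, run from inside the gluster-kubernetes checkout):

```
git checkout master
git pull origin master
# Confirm which commit you are now on
git log -1 --oneline
```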

jarrpa picture jarrpa  ·  25 Sep 2017
0

@jarrpa Hi Jose, I tried it again and so far heketi is replying back with the "Hello from Heketi", however my deployment is still stuck at creating the node:

Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-2199298601-qs27x   1/1       Running   0         9s
OK
Determining heketi service URL ...
OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-2199298601-qs27x -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 46b1b30820b03a6d359988c96d74a792
Creating node node-vm-1 ... ID: 3e94d0b032a336a2048613ac59f180a

kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --selector=glusterfs -o wide

NAME                                READY     STATUS    RESTARTS   AGE       IP               NODE
po/deploy-heketi-2199298601-qs27x   1/1       Running   0          18m       10.233.118.132   lenddo-vm-3
po/glusterfs-hq6vc                  1/1       Running   0          19m       192.168.1.240    node-vm-1
po/glusterfs-rqk1b                  1/1       Running   0          19m       192.168.1.242    node-vm-3
po/glusterfs-wdv59                  1/1       Running   0          19m       192.168.1.241    node-vm-2

NAME                CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE       SELECTOR
svc/deploy-heketi   10.233.13.136   <none>        8080/TCP   18m       deploy-heketi=pod

NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINER(S)    IMAGE(S)            SELECTOR
deploy/deploy-heketi   1         1         1            1           18m       deploy-heketi   heketi/heketi:dev   deploy-heketi=pod,glusterfs=heketi-pod

NAME                          DESIRED   CURRENT   READY     AGE       CONTAINER(S)    IMAGE(S)            SELECTOR
rs/deploy-heketi-2199298601   1         1         1         18m       deploy-heketi   heketi/heketi:dev   deploy-heketi=pod,glusterfs=heketi-pod,pod-template-hash=2199298601

NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE       CONTAINER(S)   IMAGE(S)                              SELECTOR
ds/glusterfs   3         3         3         3            3           storagenode=glusterfs   19m       glusterfs      jarrpa/gluster-fedora-minimal:block   glusterfs=pod,glusterfs-node=pod

NAME                CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE       SELECTOR
svc/deploy-heketi   10.233.13.136   <none>        8080/TCP   18m       deploy-heketi=pod

NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINER(S)    IMAGE(S)            SELECTOR
deploy/deploy-heketi   1         1         1            1           18m       deploy-heketi   heketi/heketi:dev   deploy-heketi=pod,glusterfs=heketi-pod

NAME                           TYPE      DATA      AGE
secrets/heketi-config-secret   Opaque    3         18m

NAME                        SECRETS   AGE
sa/heketi-service-account   1         19m

NAME                                 AGE       ROLE      USERS     GROUPS    SERVICEACCOUNTS
clusterrolebindings/heketi-sa-view   19m       edit                          default/heketi-service-account

Do you have time tonight (evening in your timezone) for a call or Google Hangout? Thank you so much.

Justluckyg picture Justluckyg  ·  3 Oct 2017
0

@Justluckyg As per the log, the next step is adding devices. I am not sure why it is failing.

Could you check the following on the nodes?


  1. cat /proc/partitions
     You should see the raw devices you mentioned in the topology file here. There should not be any partitions on them.
  2. Check:
     vgs
     pvs
     lvs
     There should not be any LVM volumes created on the devices you mentioned in the topology file. If any are present, clear them (in this order) using:
     lvremove <volume group>/<logical volume>
     vgremove <volume group>
     pvremove <device>
  3. Also, wipefs those devices, like:
     wipefs -a -f /dev/vdc
  4. As gk-deploy is a bash script, you can add set -x at the beginning of the script to see what is going on.
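
For example, you can get the same trace without editing the script (a sketch):

```
# Run gk-deploy under bash tracing and keep a copy of the output for later review
bash -x ./gk-deploy -gvy 2>&1 | tee gk-deploy-trace.log
```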

SaravanaStorageNetwork picture SaravanaStorageNetwork  ·  3 Oct 2017
0

Hi @SaravanaStorageNetwork, my /dev/sdb didn't appear in pvs, vgs, or lvs. I did wipe /dev/sdb and re-ran the script with set -x added at the beginning, and the script gets stuck as well...

+ [[ Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1 == return\ [0-9]* ]]
+ output 'Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ opts=-e
+ [[ Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1 == \-\n ]]
+ out='Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ echo -e 'Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1'
+ [[ x != \x ]]
+ read -r line
Creating cluster ... ID: 04036a8a2331c4c69da5a974473137c1
+ [[ Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d == return\ [0-9]* ]]
+ output 'Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ opts=-e
+ [[ Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d == \-\n ]]
+ out='Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ echo -e 'Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d'
+ [[ x != \x ]]
+ read -r line
Creating node node-vm-1 ... ID: b7569941348ceef2542942889d75248d
Justluckyg picture Justluckyg  ·  3 Oct 2017
0

@Justluckyg

Could you check and share your topology.json file?

For example, I have shared mine here for comparison:
https://gist.github.com/SaravanaStorageNetwork/ddbc2446015dcf6c97bf83c7e62011da

SaravanaStorageNetwork picture SaravanaStorageNetwork  ·  3 Oct 2017
0

Hi @SaravanaStorageNetwork

cat topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "node-vm-1"
              ],
              "storage": [
                "192.168.1.240"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "node-vm-2"
              ],
              "storage": [
                "192.168.1.241"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "node-vm-3"
              ],
              "storage": [
                "192.168.1.242"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        }
      ]
    }
  ]
}
Justluckyg picture Justluckyg  ·  3 Oct 2017
0

@Justluckyg Is there anything in the logs of kubectl logs deploy-heketi-2199298601-qs27x that might give a clue?

jarrpa picture jarrpa  ·  3 Oct 2017
1

@Justluckyg Also, today I'll be unavailable between 1900 CDT / 0800 PHT and 2200 CDT / 1100 PHT. If you can catch me outside that time but before 0000 CDT / 1300 PHT, I'd be happy to have a chat. Send me a chat request via Hangouts, my username is the email address on my profile. :)

jarrpa picture jarrpa  ·  3 Oct 2017
0

@jarrpa Thank you! I'll chat with you around 1100-1300 PHT 🙂

Justluckyg picture Justluckyg  ·  3 Oct 2017
0

@jarrpa I used heketi:latest in both deploy-heketi-deployment.yaml and heketi-deployment.yaml; still the same, it's stuck at creating the node.

Justluckyg picture Justluckyg  ·  4 Oct 2017
0

@Justluckyg To answer a mystery I mentioned last night: it turns out that when heketi is using kubeexec (e.g. when it is managing GlusterFS through Kubernetes) it has no operation timeout and thus will hang indefinitely. Unfortunate, but good to know. :)

After our chat last night, I'm trying to bring in more people to look at this. Could you provide an updated comment with your current setup and the latest problem you're facing? Include the following (there's a command sketch for collecting most of it after the list):

  • OS version
  • Kubernetes version
  • gluster-kubernetes git commit (with GitHub link to that tree)
  • Any modifications, including:

    • GlusterFS container image

    • heketi container image

    • Altered gk templates

  • Topology file
  • Description of problem
  • Mention anything you've already tried, including:

    • Deleting and rerunning the deployment (provide commands)

    • Changing container images

  • gk-deploy verbose output (don't run with set -x)
  • output of kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
  • Logs from relevant heketi and GlusterFS pods
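
Most of the version and state information can be gathered with something like this (a sketch; substitute your actual pod names):

```
cat /etc/*-release
kubectl version
git -C gluster-kubernetes log -1 --format=%H
kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
kubectl logs <deploy-heketi pod name>
kubectl logs <glusterfs pod name>
```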
jarrpa picture jarrpa  ·  4 Oct 2017
0

Hi @jarrpa, here's what I have so far:

# cat /etc/*-release
CentOS Linux release 7.4.1708 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.4.1708 (Core)
CentOS Linux release 7.4.1708 (Core)


Linux kube2 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

(Tried both on bare metal via VMware ESX 5.5 and on an Amazon CentOS AMI)

Used https://github.com/kubernetes-incubator/kubespray.git to deploy Kubernetes cluster.

Disabled selinux and firewalld

Git cloned https://github.com/gluster/gluster-kubernetes.git but was having an issue with the gluster-centos image where my glusterfs pods just continuously crashed and never got to run, so I used the experimental image jarrpa/gluster-fedora-minimal:block, which resolved my issue with creating the glusterfs pods and the heketi pod.

Whenever I reset, I use this command on the main node where the gluster-kubernetes repo was pulled:

./gk-deploy -gy --abort

And these commands on all 3 nodes:

rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
wipefs /dev/sdb

I've also tried using the heketi:latest image and edited the deploy-heketi and heketi-deployment YAML files.

[[email protected] ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
default       deploy-heketi-4089530887-88mpq          1/1       Running   0          57s
default       glusterfs-33301                         1/1       Running   0          1m
default       glusterfs-qdz41                         1/1       Running   0          1m
default       glusterfs-s5pm4                         1/1       Running   0          1m
kube-system   kube-apiserver-lenddo-vm-1              1/1       Running   1          2d
kube-system   kube-controller-manager-lenddo-vm-1     1/1       Running   1          2d
kube-system   kube-dns-2410490047-rvcr2               3/3       Running   0          2d
kube-system   kube-dns-2410490047-tx8z7               3/3       Running   0          2d
kube-system   kube-proxy-lenddo-vm-1                  1/1       Running   1          2d
kube-system   kube-proxy-lenddo-vm-2                  1/1       Running   0          2d
kube-system   kube-proxy-lenddo-vm-3                  1/1       Running   0          2d
kube-system   kube-scheduler-lenddo-vm-1              1/1       Running   1          2d
kube-system   kubedns-autoscaler-4166808448-bd17h     1/1       Running   0          2d
kube-system   kubernetes-dashboard-3307607089-3nzts   1/1       Running   0          2d
kube-system   nginx-proxy-lenddo-vm-2                 1/1       Running   0          2d
kube-system   nginx-proxy-lenddo-vm-3                 1/1       Running   0          2d
[[email protected] ~]# kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
default       po/deploy-heketi-4089530887-88mpq          1/1       Running   0          1m
default       po/glusterfs-33301                         1/1       Running   0          1m
default       po/glusterfs-qdz41                         1/1       Running   0          1m
default       po/glusterfs-s5pm4                         1/1       Running   0          1m
kube-system   po/kube-apiserver-lenddo-vm-1              1/1       Running   1          2d
kube-system   po/kube-controller-manager-lenddo-vm-1     1/1       Running   1          2d
kube-system   po/kube-dns-2410490047-rvcr2               3/3       Running   0          2d
kube-system   po/kube-dns-2410490047-tx8z7               3/3       Running   0          2d
kube-system   po/kube-proxy-lenddo-vm-1                  1/1       Running   1          2d
kube-system   po/kube-proxy-lenddo-vm-2                  1/1       Running   0          2d
kube-system   po/kube-proxy-lenddo-vm-3                  1/1       Running   0          2d
kube-system   po/kube-scheduler-lenddo-vm-1              1/1       Running   1          2d
kube-system   po/kubedns-autoscaler-4166808448-bd17h     1/1       Running   0          2d
kube-system   po/kubernetes-dashboard-3307607089-3nzts   1/1       Running   0          2d
kube-system   po/nginx-proxy-lenddo-vm-2                 1/1       Running   0          2d
kube-system   po/nginx-proxy-lenddo-vm-3                 1/1       Running   0          2d

NAMESPACE     NAME                       CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi          10.233.50.182   <none>        8080/TCP        1m
default       svc/kubernetes             10.233.0.1      <none>        443/TCP         2d
kube-system   svc/kube-dns               10.233.0.3      <none>        53/UDP,53/TCP   2d
kube-system   svc/kubernetes-dashboard   10.233.33.172   <none>        80/TCP          2d

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           1m
kube-system   deploy/kube-dns               2         2         2            2           2d
kube-system   deploy/kubedns-autoscaler     1         1         1            1           2d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           2d

NAMESPACE     NAME                                 DESIRED   CURRENT   READY     AGE
default       rs/deploy-heketi-4089530887          1         1         1         1m
kube-system   rs/kube-dns-2410490047               2         2         2         2d
kube-system   rs/kubedns-autoscaler-4166808448     1         1         1         2d
kube-system   rs/kubernetes-dashboard-3307607089   1         1         1         2d

NAMESPACE   NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE
default     ds/glusterfs   3         3         3         3            3           storagenode=glusterfs   1m

NAMESPACE     NAME                       CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi          10.233.50.182   <none>        8080/TCP        1m
default       svc/kubernetes             10.233.0.1      <none>        443/TCP         2d
kube-system   svc/kube-dns               10.233.0.3      <none>        53/UDP,53/TCP   2d
kube-system   svc/kubernetes-dashboard   10.233.33.172   <none>        80/TCP          2d

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           1m
kube-system   deploy/kube-dns               2         2         2            2           2d
kube-system   deploy/kubedns-autoscaler     1         1         1            1           2d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           2d

NAMESPACE     NAME                                         TYPE                                  DATA      AGE
default       secrets/default-token-s24hk                  kubernetes.io/service-account-token   3         2d
default       secrets/heketi-config-secret                 Opaque                                3         1m
default       secrets/heketi-service-account-token-m664z   kubernetes.io/service-account-token   3         1m
kube-public   secrets/default-token-zc6gr                  kubernetes.io/service-account-token   3         2d
kube-system   secrets/default-token-5ng2g                  kubernetes.io/service-account-token   3         2d

NAMESPACE     NAME                        SECRETS   AGE
default       sa/default                  1         2d
default       sa/heketi-service-account   1         1m
kube-public   sa/default                  1         2d
kube-system   sa/default                  1         2d

NAMESPACE   NAME      AGE
error: clusterRoleBinding is not namespaced

[[email protected] ~]# kubectl logs deploy-heketi-4089530887-88mpq
Heketi v5.0.0-5-gb005e0f-release-5
[kubeexec] WARNING 2017/10/05 03:33:34 Rebalance on volume expansion has been enabled.  This is an EXPERIMENTAL feature
[heketi] INFO 2017/10/05 03:33:34 Loaded kubernetes executor
[heketi] INFO 2017/10/05 03:33:34 Loaded simple allocator
[heketi] INFO 2017/10/05 03:33:34 GlusterFS Application Loaded
Listening on port 8080
[negroni] Started GET /clusters
[negroni] Completed 200 OK in 117.209µs
[negroni] Started POST /clusters
[negroni] Completed 201 Created in 1.461793ms
[negroni] Started POST /nodes
[heketi] INFO 2017/10/05 03:33:44 Adding node lenddo-vm-1
[negroni] Completed 202 Accepted in 1.39661ms
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:125: Started job 552abc6ff7662f71d8e06a7a4a415994
[negroni] Started GET /queue/552abc6ff7662f71d8e06a7a4a415994
[negroni] Completed 200 OK in 31.402µs
[heketi] INFO 2017/10/05 03:33:44 Added node 457cf8146e42ac83471c3ab63ccabd4a
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:129: Completed job 552abc6ff7662f71d8e06a7a4a415994 in 2.161574ms
[negroni] Started GET /queue/552abc6ff7662f71d8e06a7a4a415994
[negroni] Completed 303 See Other in 66.491µs
[negroni] Started GET /nodes/457cf8146e42ac83471c3ab63ccabd4a
[negroni] Completed 200 OK in 427.641µs
[negroni] Started POST /devices
[heketi] INFO 2017/10/05 03:33:44 Adding device /dev/sdb to node 457cf8146e42ac83471c3ab63ccabd4a
[negroni] Completed 202 Accepted in 1.857125ms
[asynchttp] INFO 2017/10/05 03:33:44 asynchttp.go:125: Started job 9d4566aefa452e84e9818be55f6ce6e7
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 43.837µs
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 52.87µs
[negroni] Started GET /queue/9d4566aefa452e84e9818be55f6ce6e7
[negroni] Completed 200 OK in 50.684µs

[[email protected] ~]# kubectl logs glusterfs-33301
google.protobuf.Timestamp
[glusterd......] [2017-10-05 03:32:52.210335] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-10-05 03:32:52.660424] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-10-05 03:32:52.660471] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-10-05 03:32:52.666920] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-10-05 03:32:52.667183] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-10-05 03:32:52.667200] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-10-05 03:32:52.667210] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-10-05 03:32:55.183840] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:55.183880] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:55.183884] I [MSGID: 106514] [glusterd-store.c:2219:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 31004
[glusterd......] [2017-10-05 03:32:55.184042] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-10-05 03:32:55.185552] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-10-05 03:32:56.270504] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[glusterd......] [2017-10-05 03:32:56.354918] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[glusterd......] [2017-10-05 03:32:56.354975] I [MSGID: 106477] [glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated UUID: 916c3695-1ec9-4dad-bbe5-2bc89afedb2a
[glusterd......] volume set: success
[glusterd......] [2017-10-05 03:32:56.413880] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f54a0fe94fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f54a0fe8fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f54a6010ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-10-05 03:32:56.440866] E [MSGID: 106025] [glusterd-op-sm.c:1097:glusterd_op_stage_set_volume] 0-management: Option with name: cluster.max-bricks-per-process does not exist
[glusterd......] [2017-10-05 03:32:56.440920] E [MSGID: 106301] [glusterd-syncop.c:1321:gd_stage_op_phase] 0-management: Staging of operation 'Volume Set' failed on localhost : option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] volume set: failed: option : cluster.max-bricks-per-process does not exist
[glusterd......] Did you mean cluster.max-op-version or ...min-free-inodes?
[glusterd......] Setting max bricks per process failed
[glusterd......] Killing glusterd ...
[glusterd......] [2017-10-05 03:32:56.443655] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f54a0fe94fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f54a0fe8fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f54a6010ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=all -o cluster.brick-multiplex=on --gd-workdir=/var/lib/glusterd
[glusterd......] [2017-10-05 03:32:56.446377] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x736d) [0x7f54a4e1536d] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xf5) [0x7f54a64c3d05] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x7f54a64c3afa] ) 0-: received signum (15), shutting down
[glusterd......] OK
[glusterd......] [2017-10-05 03:32:57.455322] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[glusterd......] [2017-10-05 03:32:57.461623] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[glusterd......] [2017-10-05 03:32:57.461653] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[glusterd......] [2017-10-05 03:32:57.463549] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-10-05 03:32:57.463566] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-10-05 03:32:57.463572] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-10-05 03:32:57.463578] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-10-05 03:32:59.956252] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-10-05 03:32:59.956372] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-10-05 03:32:59.957792] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-10-05 03:33:01.027934] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[glusterd......] Starting glusterd ... OK
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] main:816 : handler path: /usr/lib64/tcmu-runner
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [INFO] load_our_module:489 : no modules directory '/lib/modules/3.10.0-514.26.2.el7.x86_64', checking module target_core_user entry in '/sys/modules/'
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] load_our_module:492 : Module target_core_user already loaded
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [DEBUG] main:829 : 1 runner handlers found
[tcmu-runner...] 2017-10-05 03:33:01.042 250 [ERROR] set_genl_features:279 : Could not set features. Error -10
[tcmu-runner...] 2017-10-05 03:33:01.045 250 [DEBUG] dbus_bus_acquired:437 : bus org.kernel.TCMUService1 acquired
[tcmu-runner...] 2017-10-05 03:33:01.045 250 [DEBUG] dbus_name_acquired:453 : name org.kernel.TCMUService1 acquired
[[email protected] deploy]# ./gk-deploy -gvy
Using Kubernetes CLI.

Checking status of namespace matching 'default':
default   Active    2d
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ...
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'lenddo-vm-1' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-1 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-1" labeled
Marking 'lenddo-vm-2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-2 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-2" labeled
Marking 'lenddo-vm-3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-3 storagenode=glusterfs --overwrite 2>&1
node "lenddo-vm-3" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-33301   1/1       Running   0         41s
glusterfs-qdz41   1/1       Running   0         41s
glusterfs-s5pm4   1/1       Running   0         41s
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /root/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-4089530887-88mpq   1/1       Running   0         9s
OK
Determining heketi service URL ... 2017-10-05 11:33:43.820264 I | OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-88mpq -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: f77ba0ffa6a37fd10ec16c3726d401a3
Creating node lenddo-vm-1 ... ID: 457cf8146e42ac83471c3ab63ccabd4a
Justluckyg picture Justluckyg  ·  5 Oct 2017
0

@Justluckyg Updated your comment to fix some formatting issues. :)

Thanks for the info! I'm assuming the topology file mentioned here is still current, yes?

When you wipe the drives, are you running wipefs /dev/sdb or wipefs -a /dev/sdb?
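
(For reference: plain wipefs only reports the signatures it finds, while -a actually erases them.)

```
wipefs /dev/sdb       # read-only: lists any filesystem/LVM signatures on the device
wipefs -a /dev/sdb    # erases all detected signatures so the device is truly blank
```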

Also, we're still missing the following information:

  • Kubernetes version
  • gluster-kubernetes git commit (the actual commit hash you're using)
jarrpa picture jarrpa  ·  5 Oct 2017
0

@jarrpa Yes, that topology is still current. The hostnames are different: lenddo-vm-1, lenddo-vm-2, and lenddo-vm-3.

I was only using wipefs /dev/sdb; I wasn't using the -a flag.

kubectl version

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}


[[email protected] gluster-kubernetes]# git log
commit b39f85357c3c46ba5ce508ac1b67e7d0a546ae6c
Merge: 9234f81 200b708
Author: Jose A. Rivera <[email protected]>
Date:   Thu Sep 7 08:09:36 2017 -0500

    Merge pull request #346 from SaravanaStorageNetwork/fix_kube_template

    Fix default value for GB_GLFS_LRU_COUNT


[[email protected] gluster-kubernetes]# git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   deploy/gk-deploy
#   modified:   deploy/kube-templates/deploy-heketi-deployment.yaml
#   modified:   deploy/kube-templates/glusterfs-daemonset.yaml
#   modified:   deploy/kube-templates/heketi-deployment.yaml
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   deploy/gk-deploy-old
no changes added to commit (use "git add" and/or "git commit -a")

I replaced the gk-deploy script with the gk-deploy from the master branch; it didn't make any difference.

Justluckyg picture Justluckyg  ·  5 Oct 2017
0

OK... I had some progress after resetting gluster and adding -a to wipefs, @jarrpa:

Creating cluster ... ID: a34633291757811b9164303d76463ddb
Creating node lenddo-vm-1 ... ID: 751ed953092fe6059e9e9aaa9ef5e1b1
Adding device /dev/sdb ... OK
Creating node lenddo-vm-2 ... ID: 63e4da78a85015176422390f08a5fff0
Adding device /dev/sdb ... OK
Creating node lenddo-vm-3 ... ID: 7f3fc5d496be4403a6abd57c71925fa3
Adding device /dev/sdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-l58qj -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-l58qj -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n default create -f - 2>&1
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw   0/1       ContainerCreating   0         4m

And this is what I'm getting when I describe the heketi-storage pod:

kubectl describe pod heketi-storage-copy-job-fsrgw
2017-10-05 12:51:37.703881 I | proto: duplicate proto type registered: google.protobuf.Any
2017-10-05 12:51:37.703952 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-10-05 12:51:37.703967 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:       heketi-storage-copy-job-fsrgw
Namespace:  default
Node:       lenddo-vm-2/192.168.1.241
Start Time: Thu, 05 Oct 2017 12:46:44 +0800
Labels:     controller-uid=2fc039df-a988-11e7-9a8d-000c29580815
        job-name=heketi-storage-copy-job
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"heketi-storage-copy-job","uid":"2fc039df-a988-11e7-9a8d-000c29580815","...
Status:     Pending
IP:
Created By: Job/heketi-storage-copy-job
Controlled By:  Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:
    Image:      heketi/heketi:dev
    Image ID:
    Port:       <none>
    Command:
      cp
      /db/heketi.db
      /heketi
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s24hk (ro)
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  heketi-storage:
    Type:       Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:       heketidbstorage
    ReadOnly:       false
  heketi-storage-secret:
    Type:   Secret (a volume populated by a Secret)
    SecretName: heketi-storage-secret
    Optional:   false
  default-token-s24hk:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-s24hk
    Optional:   false
QoS Class:  BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  4m        4m      1   default-scheduler           Normal      Scheduled       Successfully assigned heketi-storage-copy-job-fsrgw to lenddo-vm-2
  4m        4m      1   kubelet, lenddo-vm-2            Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-s24hk"
  4m        4m      1   kubelet, lenddo-vm-2            Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "heketi-storage-secret"
  4m        4m      1   kubelet, lenddo-vm-2            Warning     FailedMount     MountVolume.SetUp failed for volume "heketi-storage" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 192.168.1.242:heketidbstorage /var/lib/kubelet/pods/2fd9b6c3-a988-11e7-9a8d-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/heketi-storage/heketi-storage-copy-job-fsrgw-glusterfs.log backup-volfile-servers=192.168.1.240:192.168.1.241:192.168.1.242]
Output: ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs

 the following error information was pulled from the glusterfs log to help diagnose this issue:
/usr/sbin/glusterfs(+0x69a0)[0x7f08f82e69a0]
---------

  4m    41s 9   kubelet, lenddo-vm-2        Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/2fd9b6c3-a988-11e7-9a8d-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
  2m    34s 2   kubelet, lenddo-vm-2        Warning FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-fsrgw_default(2fd9b6c3-a988-11e7-9a8d-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-fsrgw". list of unattached/unmounted volumes=[heketi-storage]
  2m    34s 2   kubelet, lenddo-vm-2        Warning FailedSync  Error syncing pod
Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw   0/1       ContainerCreating   0         5m
Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.'

kubectl logs heketi-storage-copy-job-fsrgw

Error from server (BadRequest): container "heketi" in pod "heketi-storage-copy-job-fsrgw" is waiting to start: ContainerCreating


kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces

NAMESPACE     NAME                                       READY     STATUS              RESTARTS   AGE
default       po/deploy-heketi-4089530887-l58qj          1/1       Running             0          10m
default       po/glusterfs-4n5f6                         1/1       Running             0          10m
default       po/glusterfs-kdkw3                         1/1       Running             0          10m
default       po/glusterfs-kh863                         1/1       Running             0          10m
default       po/heketi-storage-copy-job-fsrgw           0/1       ContainerCreating   0          9m
kube-system   po/kube-apiserver-lenddo-vm-1              1/1       Running             1          2d
kube-system   po/kube-controller-manager-lenddo-vm-1     1/1       Running             1          2d
kube-system   po/kube-dns-2410490047-rvcr2               3/3       Running             0          2d
kube-system   po/kube-dns-2410490047-tx8z7               3/3       Running             0          2d
kube-system   po/kube-proxy-lenddo-vm-1                  1/1       Running             1          2d
kube-system   po/kube-proxy-lenddo-vm-2                  1/1       Running             0          2d
kube-system   po/kube-proxy-lenddo-vm-3                  1/1       Running             0          2d
kube-system   po/kube-scheduler-lenddo-vm-1              1/1       Running             1          2d
kube-system   po/kubedns-autoscaler-4166808448-bd17h     1/1       Running             0          2d
kube-system   po/kubernetes-dashboard-3307607089-3nzts   1/1       Running             0          2d
kube-system   po/nginx-proxy-lenddo-vm-2                 1/1       Running             0          2d
kube-system   po/nginx-proxy-lenddo-vm-3                 1/1       Running             0          2d

NAMESPACE     NAME                           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi              10.233.27.89    <none>        8080/TCP        10m
default       svc/heketi-storage-endpoints   10.233.63.78    <none>        1/TCP           9m
default       svc/kubernetes                 10.233.0.1      <none>        443/TCP         2d
kube-system   svc/kube-dns                   10.233.0.3      <none>        53/UDP,53/TCP   2d
kube-system   svc/kubernetes-dashboard       10.233.33.172   <none>        80/TCP          2d

NAMESPACE   NAME                           DESIRED   SUCCESSFUL   AGE
default     jobs/heketi-storage-copy-job   1         0            9m

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           10m
kube-system   deploy/kube-dns               2         2         2            2           2d
kube-system   deploy/kubedns-autoscaler     1         1         1            1           2d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           2d

NAMESPACE     NAME                                 DESIRED   CURRENT   READY     AGE
default       rs/deploy-heketi-4089530887          1         1         1         10m
kube-system   rs/kube-dns-2410490047               2         2         2         2d
kube-system   rs/kubedns-autoscaler-4166808448     1         1         1         2d
kube-system   rs/kubernetes-dashboard-3307607089   1         1         1         2d

NAMESPACE   NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE
default     ds/glusterfs   3         3         3         3            3           storagenode=glusterfs   10m

NAMESPACE     NAME                           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi              10.233.27.89    <none>        8080/TCP        10m
default       svc/heketi-storage-endpoints   10.233.63.78    <none>        1/TCP           9m
default       svc/kubernetes                 10.233.0.1      <none>        443/TCP         2d
kube-system   svc/kube-dns                   10.233.0.3      <none>        53/UDP,53/TCP   2d
kube-system   svc/kubernetes-dashboard       10.233.33.172   <none>        80/TCP          2d

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           10m
kube-system   deploy/kube-dns               2         2         2            2           2d
kube-system   deploy/kubedns-autoscaler     1         1         1            1           2d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           2d

NAMESPACE     NAME                                         TYPE                                  DATA      AGE
default       secrets/default-token-s24hk                  kubernetes.io/service-account-token   3         2d
default       secrets/heketi-config-secret                 Opaque                                3         10m
default       secrets/heketi-service-account-token-lltzn   kubernetes.io/service-account-token   3         10m
default       secrets/heketi-storage-secret                Opaque                                1         9m
kube-public   secrets/default-token-zc6gr                  kubernetes.io/service-account-token   3         2d
kube-system   secrets/default-token-5ng2g                  kubernetes.io/service-account-token   3         2d

NAMESPACE     NAME                        SECRETS   AGE
default       sa/default                  1         2d
default       sa/heketi-service-account   1         10m
kube-public   sa/default                  1         2d
kube-system   sa/default                  1         2d

NAMESPACE   NAME      AGE
error: clusterRoleBinding is not namespaced
Justluckyg · 5 Oct 2017

@jarrpa So I aborted the GlusterFS deployment again, deleted the files and dirs, and ran wipefs -af /dev/sdb, and now I'm getting this. The output of lsblk is now different...
If I run vgs, I don't see any volume group for /dev/sdb.
How do I remove this?

Creating cluster ... ID: bf66059410f91c5cebbbd8f5e308a2cc
Creating node lenddo-vm-1 ... ID: 5f33e9ba54784a66863d2e5fca9156cd
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-hq076:   Couldn't find device with uuid QUKge9-Ytef-JM0q-fpAT-kc9c-cHD0-dGoZVN.
Couldn't find device with uuid qgu45O-hQg5-BZp2-g9oe-NXsN-a5lV-53Erq6.
Can't open /dev/sdb exclusively.  Mounted filesystem?
Creating node lenddo-vm-2 ... ID: 32fccc28362fd7fa65fda8239358b492
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-vc0mb:   Can't open /dev/sdb exclusively.  Mounted filesystem?
Creating node lenddo-vm-3 ... ID: 23cf83e60be6c4851ab032094f06e43f
Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-dvl3t:   Can't open /dev/sdb exclusively.  Mounted filesystem?
Error loading the cluster topology.
Please check the failed node or device and rerun this script.
[root@lenddo-vm-1 deploy]# lsblk
NAME                                                                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0                                                                                 2:0    1    4K  0 disk
sda                                                                                 8:0    0  100G  0 disk
├─sda1                                                                              8:1    0    1G  0 part /boot
└─sda2                                                                              8:2    0   99G  0 part
  ├─cl-root                                                                       253:0    0 13.4G  0 lvm  /
  └─cl-swap                                                                       253:1    0  1.6G  0 lvm  [SWAP]
sdb                                                                                 8:16   0   50G  0 disk
├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tmeta   253:2    0   12M  0 lvm
│ └─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33-tpool 253:4    0    2G  0 lvm
│   ├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33     253:5    0    2G  0 lvm
│   └─vg_157b3107d4607da4385c106dbf3c2efe-brick_b19693f10248b6144064cf6b97057c33  253:6    0    2G  0 lvm
└─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tdata   253:3    0    2G  0 lvm
  └─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33-tpool 253:4    0    2G  0 lvm
    ├─vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33     253:5    0    2G  0 lvm
    └─vg_157b3107d4607da4385c106dbf3c2efe-brick_b19693f10248b6144064cf6b97057c33  253:6    0    2G  0 lvm
Justluckyg · 5 Oct 2017

Does running vgremove on the corresponding VG devices of sdb not help?
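
For reference, a minimal sketch of what that cleanup could look like on a storage node (the VG name vg_157b3107d4607da4385c106dbf3c2efe is taken from the lsblk output above; note that vgremove wants the VG name itself, not the full thin-pool LV name):

```
# List volume groups / physical volumes to spot the stale heketi VG on /dev/sdb
vgs
pvs

# Remove the stale VG, then wipe the PV label so the disk is clean again
vgremove -y vg_157b3107d4607da4385c106dbf3c2efe
pvremove /dev/sdb
```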

SaravanaStorageNetwork · 5 Oct 2017

@SaravanaStorageNetwork I tried vgremove vg_157b3107d4607da4385c106dbf3c2efe-tp_b19693f10248b6144064cf6b97057c33_tmeta

but it said it's not a valid VG.

Justluckyg · 5 Oct 2017

Check whether this helps; it would be sdb in your case.

As long as you don't care what's on the drive at /dev/sdc, try this:

dd if=/dev/zero of=/dev/sdc bs=1M count=10
That will zero out the first 10 MB of the disk, including any LVM or RAID headers. Then reboot; the system should see that the disk is no longer part of any LVM group.

https://stackoverflow.com/questions/28421676/unable-to-remove-volume-group-due-to-incorrect-metadata-area-header-checksum
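
Adapted to this setup, a rough sketch would be (illustrative only; double-check that /dev/sdb really is the Gluster disk before zeroing it):

```
# Clear any filesystem/LVM signatures, then zero the first 10 MB of the disk
wipefs -af /dev/sdb
dd if=/dev/zero of=/dev/sdb bs=1M count=10

# Re-read the partition table (or simply reboot the node)
blockdev --rereadpt /dev/sdb
```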

SaravanaStorageNetwork · 5 Oct 2017

@SaravanaStorageNetwork I was able to get past adding the devices on the nodes; however, this is what I'm getting now:

Creating cluster ... ID: 806ce4f0897b5fa7e66f782c267d79fe
Creating node lenddo-vm-1 ... ID: 8999d9d0190c29068a92351a0f374496
Adding device /dev/sdb ... OK
Creating node lenddo-vm-2 ... ID: fc1ab38c5952c471bec13dae1805225c
Adding device /dev/sdb ... OK
Creating node lenddo-vm-3 ... ID: c56d9a002ac915533331e788541386dc
Adding device /dev/sdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-t0jj6 -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1

Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n default exec -i deploy-heketi-4089530887-t0jj6 -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n default create -f - 2>&1

secret "heketi-storage-secret" created
job "heketi-storage-copy-job" created
Error from server (AlreadyExists): endpoints "heketi-storage-endpoints" already exists
Error from server (AlreadyExists): services "heketi-storage-endpoints" already exists
Failed on creating heketi storage resources.
Justluckyg · 5 Oct 2017

You have some stale resources still present.

Try deleting the existing resources by using the abort option in the script, then check whether any resources remain.
If any are present, delete them manually and try again.

SaravanaStorageNetwork · 5 Oct 2017

@SaravanaStorageNetwork Where can I find the stale resources for heketi-storage-endpoints?

I always do this on all nodes whenever I abort:
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs

Am I missing any other directories?

Justluckyg · 5 Oct 2017

@Justluckyg Run gk-deploy -g --abort, then inspect the output of kubectl get svc,ep --all-namespaces to see if anything is left. If so, let us know, as that's a bug. :)
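
Something along these lines (a sketch; the resource names match what the deploy script creates in the default namespace):

```
# Tear down the failed deployment
./gk-deploy -g --abort

# Look for leftover heketi-related services and endpoints
kubectl get svc,ep --all-namespaces | grep heketi

# If anything is still there, remove it manually, for example:
kubectl -n default delete svc/heketi-storage-endpoints ep/heketi-storage-endpoints
```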

jarrpa · 5 Oct 2017

@Justluckyg For the record, "resources" in this case are Kubernetes objects like Pods and Endpoints.
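
In other words, the kind of thing you can list with kubectl, e.g.:

```
# Quick check for leftover heketi objects in the default namespace
kubectl -n default get pods,jobs,svc,endpoints,secrets | grep heketi
```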

jarrpa · 5 Oct 2017

@jarrpa Thank you, I will try that tomorrow on-prem (I don't have remote access :().

By the way, would you also know what would cause this:

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-fsrgw   0/1       ContainerCreating   0         5m
Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.

kubectl logs heketi-storage-copy-job-fsrgw

Error from server (BadRequest): container "heketi" in pod "heketi-storage-copy-job-fsrgw" is waiting to start: ContainerCreating

The container never gets created for the heketi storage copy job.

Justluckyg · 5 Oct 2017

@Justluckyg Updated your comment again for proper formatting. :)

If you do a kubectl describe on the job it'll probably show that it can't mount the GlusterFS volume, which isn't surprising given the incomplete state of your setup.
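
For example (the pod name is whatever the job spawned, taken from the earlier output):

```
# The Events section at the bottom of the describe output usually shows
# the FailedMount reason for the GlusterFS volume
kubectl -n default describe job/heketi-storage-copy-job
kubectl -n default get pods --selector=job-name=heketi-storage-copy-job
kubectl -n default describe pod heketi-storage-copy-job-fsrgw
```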

jarrpa · 5 Oct 2017

@jarrpa When I describe the heketi storage pod, this is what I'm getting:

QoS Class:  BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  18h       1m      492 kubelet, lenddo-vm-2            Warning     FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-zd4js_default(0e39e43c-a9ac-11e7-b78f-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-zd4js". list of unattached/unmounted volumes=[heketi-storage]
  18h       1m      492 kubelet, lenddo-vm-2            Warning     FailedSync  Error syncing pod
  18h       1m      555 kubelet, lenddo-vm-2            Warning     FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/0e39e43c-a9ac-11e7-b78f-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected

And when I run df -h on node lenddo-vm-2:

[root@lenddo-vm-2 ~]# df -h
df: ‘/var/lib/kubelet/pods/0e39e43c-a9ac-11e7-b78f-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage’: Transport endpoint is not connected
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   14G  7.1G  6.4G  53% /
devtmpfs             910M     0  910M   0% /dev
tmpfs                920M     0  920M   0% /dev/shm
tmpfs                920M  102M  819M  11% /run
tmpfs                920M     0  920M   0% /sys/fs/cgroup
/dev/sda1           1014M  184M  831M  19% /boot
[root@lenddo-vm-2 ~]# kubectl get endpoints

NAME                       ENDPOINTS                                         AGE
deploy-heketi              10.233.118.150:8080                               18h
heketi-storage-endpoints   192.168.1.240:1,192.168.1.241:1,192.168.1.242:1   23h
kubernetes                 192.168.1.240:6443                                3d

[root@lenddo-vm-2 ~]# kubectl get all,ds,svc,deploy,secret,sa,clusterrolebinding --all-namespaces

NAMESPACE     NAME                                       READY     STATUS              RESTARTS   AGE
default       po/deploy-heketi-4089530887-gl8cs          1/1       Running             0          18h
default       po/glusterfs-kw67h                         1/1       Running             0          18h
default       po/glusterfs-tdfnn                         1/1       Running             0          18h
default       po/glusterfs-z4s1q                         1/1       Running             0          18h
default       po/heketi-storage-copy-job-zd4js           0/1       ContainerCreating   0          18h
kube-system   po/kube-apiserver-lenddo-vm-1              1/1       Running             4          3d
kube-system   po/kube-controller-manager-lenddo-vm-1     1/1       Running             5          3d
kube-system   po/kube-dns-2410490047-f36cs               3/3       Running             3          21h
kube-system   po/kube-dns-2410490047-m258q               3/3       Running             3          21h
kube-system   po/kube-proxy-lenddo-vm-1                  1/1       Running             4          3d
kube-system   po/kube-proxy-lenddo-vm-2                  1/1       Running             3          3d
kube-system   po/kube-proxy-lenddo-vm-3                  1/1       Running             3          3d
kube-system   po/kube-scheduler-lenddo-vm-1              1/1       Running             4          3d
kube-system   po/kubedns-autoscaler-4166808448-bd17h     0/1       Error               2          3d
kube-system   po/kubernetes-dashboard-3307607089-gp79b   1/1       Running             1          21h
kube-system   po/nginx-proxy-lenddo-vm-2                 1/1       Running             3          3d
kube-system   po/nginx-proxy-lenddo-vm-3                 1/1       Running             3          3d

NAMESPACE     NAME                           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi              10.233.53.200   <none>        8080/TCP        18h
default       svc/heketi-storage-endpoints   10.233.63.78    <none>        1/TCP           23h
default       svc/kubernetes                 10.233.0.1      <none>        443/TCP         3d
kube-system   svc/kube-dns                   10.233.0.3      <none>        53/UDP,53/TCP   3d
kube-system   svc/kubernetes-dashboard       10.233.33.172   <none>        80/TCP          3d

NAMESPACE   NAME                           DESIRED   SUCCESSFUL   AGE
default     jobs/heketi-storage-copy-job   1         0            18h

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           18h
kube-system   deploy/kube-dns               2         2         2            2           3d
kube-system   deploy/kubedns-autoscaler     1         1         1            0           3d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           3d

NAMESPACE     NAME                                 DESIRED   CURRENT   READY     AGE
default       rs/deploy-heketi-4089530887          1         1         1         18h
kube-system   rs/kube-dns-2410490047               2         2         2         3d
kube-system   rs/kubedns-autoscaler-4166808448     1         1         0         3d
kube-system   rs/kubernetes-dashboard-3307607089   1         1         1         3d

NAMESPACE   NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE
default     ds/glusterfs   3         3         3         3            3           storagenode=glusterfs   18h

NAMESPACE     NAME                           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/deploy-heketi              10.233.53.200   <none>        8080/TCP        18h
default       svc/heketi-storage-endpoints   10.233.63.78    <none>        1/TCP           23h
default       svc/kubernetes                 10.233.0.1      <none>        443/TCP         3d
kube-system   svc/kube-dns                   10.233.0.3      <none>        53/UDP,53/TCP   3d
kube-system   svc/kubernetes-dashboard       10.233.33.172   <none>        80/TCP          3d

NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/deploy-heketi          1         1         1            1           18h
kube-system   deploy/kube-dns               2         2         2            2           3d
kube-system   deploy/kubedns-autoscaler     1         1         1            0           3d
kube-system   deploy/kubernetes-dashboard   1         1         1            1           3d

NAMESPACE     NAME                                         TYPE                                  DATA      AGE
default       secrets/default-token-s24hk                  kubernetes.io/service-account-token   3         3d
default       secrets/heketi-config-secret                 Opaque                                3         18h
default       secrets/heketi-service-account-token-3qr2x   kubernetes.io/service-account-token   3         18h
default       secrets/heketi-storage-secret                Opaque                                1         18h
kube-public   secrets/default-token-zc6gr                  kubernetes.io/service-account-token   3         3d
kube-system   secrets/default-token-7c50v                  kubernetes.io/service-account-token   3         21h

NAMESPACE     NAME                        SECRETS   AGE
default       sa/default                  1         3d
default       sa/heketi-service-account   1         18h
kube-public   sa/default                  1         3d
kube-system   sa/default                  1         3d

NAMESPACE   NAME      AGE
error: clusterRoleBinding is not namespaced
Justluckyg · 6 Oct 2017

@Justluckyg As I suspected. Yes, follow the guidance we just gave: run the abort, then check to see if there are any heketi svc or ep resources left. Report if there are any. Run the rm and wipefs -a commands on all storage nodes. Run the deploy again. If you get to this same state, with the copy job timing out, inspect the GlusterFS logs on the node being targeted for the mount. Also verify whether you can manually mount the gluster volume (mount -t glusterfs) from the node running the copy job.
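
Roughly, the sequence would be (a sketch of the steps above; the server IP 192.168.1.242, the volume name heketidbstorage, and /dev/sdb come from the earlier output, and the mount point is just an example directory):

```
# 1. Abort and check for leftover heketi services/endpoints
./gk-deploy -g --abort
kubectl get svc,ep --all-namespaces | grep heketi

# 2. On every storage node, clean out old state and wipe the disk
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
wipefs -a /dev/sdb

# 3. Re-run the deployment
./gk-deploy -g

# 4. If the copy job times out again, test the mount by hand from that node
mkdir -p /mnt/gluster-test
mount -t glusterfs 192.168.1.242:heketidbstorage /mnt/gluster-test
ls /mnt/gluster-test
umount /mnt/gluster-test
```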

jarrpa · 6 Oct 2017

@jarrpa I just did the above and was getting the same result. I manually deleted the heketi service that didn't get deleted by the gk-deploy command, but the heketi-storage copy job is still failing.

[root@lenddo-vm-1 ~]# kubectl describe pod heketi-storage-copy-job-gkszq

Name:       heketi-storage-copy-job-gkszq
Namespace:  default
Node:       lenddo-vm-2/192.168.1.241
Start Time: Fri, 06 Oct 2017 12:08:03 +0800
Labels:     controller-uid=f2f74b50-aa4b-11e7-9eb4-000c29580815
        job-name=heketi-storage-copy-job
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"heketi-storage-copy-job","uid":"f2f74b50-aa4b-11e7-9eb4-000c29580815","...
Status:     Pending
IP:
Created By: Job/heketi-storage-copy-job
Controlled By:  Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:
    Image:      heketi/heketi:dev
    Image ID:
    Port:       <none>
    Command:
      cp
      /db/heketi.db
      /heketi
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s24hk (ro)
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  heketi-storage:
    Type:       Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:       heketidbstorage
    ReadOnly:       false
  heketi-storage-secret:
    Type:   Secret (a volume populated by a Secret)
    SecretName: heketi-storage-secret
    Optional:   false
  default-token-s24hk:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-s24hk
    Optional:   false
QoS Class:  BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  5m        5m      1   default-scheduler           Normal      Scheduled       Successfully assigned heketi-storage-copy-job-gkszq to lenddo-vm-2
  5m        5m      1   kubelet, lenddo-vm-2            Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-s24hk"
  5m        5m      1   kubelet, lenddo-vm-2            Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "heketi-storage-secret"
  5m        5m      1   kubelet, lenddo-vm-2            Warning     FailedMount     MountVolume.SetUp failed for volume "heketi-storage" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 192.168.1.242:heketidbstorage /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/heketi-storage/heketi-storage-copy-job-gkszq-glusterfs.log backup-volfile-servers=192.168.1.240:192.168.1.241:192.168.1.242]
Output: ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs

 the following error information was pulled from the glusterfs log to help diagnose this issue:
/usr/sbin/glusterfs(+0x69a0)[0x7f0d59d799a0]
---------

  5m    1m  9   kubelet, lenddo-vm-2        Warning FailedMount MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
  3m    1m  2   kubelet, lenddo-vm-2        Warning FailedMount Unable to mount volumes for pod "heketi-storage-copy-job-gkszq_default(f2fc138d-aa4b-11e7-9eb4-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-gkszq". list of unattached/unmounted volumes=[heketi-storage]
  3m    1m  2   kubelet, lenddo-vm-2        Warning FailedSync  Error syncing pod

And these are the glusterd logs from that node:

[2017-10-06 04:07:58.903816] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-10-06 04:07:58.918264] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-10-06 04:07:58.918736] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2017-10-06 04:07:58.918837] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-10-06 04:07:58.918874] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-10-06 04:07:58.919237] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2017-10-06 04:07:58.919905] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2017-10-06 04:07:58.919973] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd service is stopped
[2017-10-06 04:07:58.920005] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting glustershd service
[2017-10-06 04:07:59.926164] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2017-10-06 04:07:59.926590] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2017-10-06 04:07:59.926736] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-10-06 04:07:59.926772] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-10-06 04:07:59.926899] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2017-10-06 04:07:59.927069] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-10-06 04:07:59.927103] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-10-06 04:08:02.146806] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f63cb759fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:02.156404] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7f63cb759f52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd


I did the mount -t glusterfs from node 2; it didn't make any difference...

Justluckyg · 6 Oct 2017

@Justluckyg Okay, but you were able to mount, yes? And what was the exact name of the service you had to delete?

jarrpa · 6 Oct 2017

@jarrpa With the mount -t glusterfs? I only did that after trying to run the gk-deploy -gvy script while the heketi storage copy job was stuck in ContainerCreating.

The heketi service was the one I deleted.

Justluckyg · 6 Oct 2017

@Justluckyg Right, but it worked? And what was the EXACT name of the service, was it heketi-storage-endpoints?

jarrpa · 6 Oct 2017

@jarrpa When I entered the mount -t glusterfs command it took it; there was no error. And yes, the service was heketi-storage-endpoints.

Justluckyg · 6 Oct 2017

@Justluckyg Idea: Try mounting the GlusterFS volume again, on a node not running the copy job pod, and see if you can do an ls on the mounted directory.

jarrpa · 6 Oct 2017

@jarrpa Should I just run mount -t glusterfs, or is there anything after that? Because when I do that on node 3 and then run df -h, I don't see any newly mounted directory.

Justluckyg · 6 Oct 2017

@Justluckyg Yes: Create a new directory somewhere, then run mount -t glusterfs 192.168.1.242:heketidbstorage <SOME_DIR>. This should tell us if the GlusterFS volume can be accessed from the node you're working on. Afterwards, run umount <SOME_DIR> to unmount the volume.

jarrpa · 6 Oct 2017

@jarrpa It's unable to mount...
[root@lenddo-vm-3 ~]# mkdir test
[root@lenddo-vm-3 ~]# mount -t glusterfs 192.168.1.242:heketidbstorage test
Mount failed. Please check the log file for more details.

Justluckyg · 6 Oct 2017

@Justluckyg Ah-ha! Try running the mount command again with -v.

jarrpa · 6 Oct 2017

@jarrpa It didn't take it:

[root@lenddo-vm-3 ~]# mount -v -t glusterfs 192.168.1.242:heketidbstorage test
/sbin/mount.glusterfs: illegal option -- v
Usage: /sbin/mount.glusterfs <volumeserver>:<volumeid/volumeport> -o<options> <mountpoint>
Options:
man 8 /sbin/mount.glusterfs
To display the version number of the mount helper: /sbin/mount.glusterfs -V
Justluckyg · 6 Oct 2017

@Justluckyg Irritating... see if there are any logs in /var/log that mention heketidbstorage, something like grep -R heketidbstorage /var/log.

jarrpa · 6 Oct 2017

@jarrpa There's a bunch, and I'm not sure which one will be helpful, so I put it here.

Justluckyg · 6 Oct 2017

@Justluckyg Good start. Look for errors towards the end of /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_bb208485241a154b8d3070d2da837a53-brick_80e7daa7eb94cae8e4d1c81ccbdad92b-brick.log.

jarrpa · 6 Oct 2017

@Justluckyg Also /var/log/glusterfs/root-test.log.

jarrpa · 6 Oct 2017

@jarrpa No errors there, all informational lines. But in glusterfs/glusterd.log I'm getting this,
and the same on the node where the copy job is trying to be created:

[2017-10-06 04:08:01.705565] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7fa89dc9afac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:01.898044] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7fa89dc9af52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

```
[root@lenddo-vm-3 glusterfs]# cat /var/log/glusterfs/root-test.log
[2017-10-06 13:02:02.140216] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=192.168.1.242 --volfile-id=heketidbstorage /root/test)
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-10-06 13:02:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.20
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f6a92360722]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6a92385ddd]
/lib64/libc.so.6(+0x35250)[0x7f6a909d4250]
/lib64/libglusterfs.so.0(gf_ports_reserved+0x142)[0x7f6a92386442]
/lib64/libglusterfs.so.0(gf_process_reserved_ports+0x7e)[0x7f6a923866be]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xc958)[0x7f6a86d50958]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(client_bind+0x93)[0x7f6a86d50d83]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xa153)[0x7f6a86d4e153]
/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f6a9212de19]
/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f6a9212ded9]
/usr/sbin/glusterfs(glusterfs_mgmt_init+0x24c)[0x7f6a928493ac]
/usr/sbin/glusterfs(glusterfs_volumes_init+0x46)[0x7f6a928442b6]
/usr/sbin/glusterfs(main+0x810)[0x7f6a92840860]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6a909c0b35]

/usr/sbin/glusterfs(+0x69a0)[0x7f6a928409a0]
```

There is no root-test.log on the node where the copy job is trying to be created, lenddo-vm-2.

Justluckyg · 6 Oct 2017

@Justluckyg You can ignore those errors, they're "normal". :)

The root-test log would only appear on nodes where you've mounted a gluster volume to /root/test.

Do me a favor and check the version of Gluster running in the pods (gluster --version) and the version of the GlusterFS FUSE client on your nodes (mount.glusterfs -V).
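
For instance (glusterfs-kw67h is one of the GlusterFS pod names from the earlier kubectl output; just a sketch):

```
# Gluster server version inside one of the GlusterFS pods
kubectl -n default exec -i glusterfs-kw67h -- gluster --version

# GlusterFS FUSE client version on each node
mount.glusterfs -V
```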

jarrpa · 6 Oct 2017

@jarrpa Same on all 3:

[root@lenddo-vm-3 glusterfs]# mount.glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:30:07
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@lenddo-vm-3 glusterfs]# gluster --version
glusterfs 3.7.20 built on Jan 30 2017 15:30:09
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Justluckyg · 6 Oct 2017

@Justluckyg Hmm... that might be too old. Do you have any way of updating that?

jarrpa · 6 Oct 2017

@jarrpa I noticed upgrading it is not as easy as just doing a yum upgrade, as it throws a lot of dependency errors.

Justluckyg · 6 Oct 2017

@Justluckyg Darn. Well, that's my current thinking, unfortunately. Can you try to see if you can resolve the dependency issues?

jarrpa · 6 Oct 2017

@jarrpa Sure, I'll try that. Do you have any preferred stable version?

Justluckyg · 6 Oct 2017

@Justluckyg Preferably at or newer than the version in the GlusterFS pods, which I think is 3.10.5.
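
On CentOS 7 the newer client packages typically come from the CentOS Storage SIG; something like the following is a sketch, assuming the centos-release-gluster310 release package is available in your repos:

```
# Enable the Storage SIG repo for GlusterFS 3.10, then upgrade the FUSE client
yum install -y centos-release-gluster310
yum upgrade -y glusterfs-fuse glusterfs-libs glusterfs-client-xlators
```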

jarrpa · 6 Oct 2017

Whew... after a month of testing and your unwavering support, @jarrpa @SaravanaStorageNetwork... I finally completed the deployment and deployed my first storage PVC using your sample nginx hello-world application!
So your recommendation to upgrade GlusterFS ultimately resolved the issue where the heketi-storage pod was not being created, @jarrpa. I'm now running version 3.8.4 of the glusterfs-fuse package.

I can't thank you enough for helping me! Now I am off to test this further, and with that I am glad to close this issue. :)

Justluckyg · 9 Oct 2017

YEEEESSS!! HAH! :D Happy to hear that! Always feel free to come back any time you need additional help. Or if you just want to give us praise, we won't turn that down either. ;)

jarrpa · 9 Oct 2017

@jarrpa I have the right version of mount.glusterfs, and GlusterFS is running on 3 nodes. However, I still see the error: "Waiting for GlusterFS pods to start ... pods not found."

verizonold · 6 Jun 2018

@verizonold If you are still having trouble, please open a new issue and provide information about your environment and what you've done, as well as the output of kubectl logs <heketi_pod>.

jarrpa · 11 Jun 2018