Kubernetes: Explain why no subnets are found for creating an ELB on AWS

0

Hi,
I created a Kubernetes cluster from coreos-aws (with an existing VPC and subnets). I can't create an ELB on it.

The file:

```
apiVersion: v1
kind: Service
metadata:
  name: nginxservice
  labels:
    name: nginxservice
spec:
  ports:
    - port: 80
  selector:
    app: nginx
  type: LoadBalancer
```

The command:

```
$ kubectl --kubeconfig=kubeconfig describe svc nginxservice
Name:               nginxservice
Namespace:          default
Labels:             name=nginxservice
Selector:           app=nginx
Type:               LoadBalancer
IP:                 172.18.200.63
Port:               <unnamed>   80/TCP
NodePort:           <unnamed>   31870/TCP
Endpoints:          172.18.29.3:80,172.18.31.3:80
Session Affinity:   None
Events:
  FirstSeen                         LastSeen                          Count   From                   SubobjectPath   Reason                      Message
  Wed, 20 Jul 2016 18:41:15 +0200   Wed, 20 Jul 2016 18:41:20 +0200   2       {service-controller }                  CreatingLoadBalancer        Creating load balancer
  Wed, 20 Jul 2016 18:41:15 +0200   Wed, 20 Jul 2016 18:41:20 +0200   2       {service-controller }                  CreatingLoadBalancerFailed  Error creating load balancer (will retry): Failed to create load balancer for service default/nginxservice: could not find any suitable subnets for creating the ELB
```

I manually added the missing KubernetesCluster tag on the subnet, without result. Can you add a clear message about what is missing?

sdouche · 20 Jul 2016

All comments

0

cc @pwittrock Not sure if this is support or an actual issue?

apelisse · 20 Jul 2016
0

@sdouche are you perhaps running out of free IP addresses in your subnet? Each ELB will also need a separate network interface in (each) target subnet, and I believe the rule is either 5 or 8 free addresses in the subnet for ELB creation to be allowed.
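
If you want to rule that out, the free-address count can be checked with the AWS CLI (a minimal sketch; the subnet ID is a placeholder):

```
# How many free IP addresses are left in the subnet? (subnet ID is a placeholder)
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 \
  --query 'Subnets[].AvailableIpAddressCount'
```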

@cgag ran into this recently during some operational work here at coreos and told me about it.

colhom · 21 Jul 2016
0

Hi @colhom,
I have 240 free IPs in the subnet. The issue is finding a suitable subnet, not a free IP.

sdouche · 21 Jul 2016
0

@sdouche could I see your diff that allows you to deploy to an existing subnet? I've been curious to see how folks are doing this; we use route tables and VPC peering heavily, so in our case we have no need to deploy to the same subnet.

colhom · 21 Jul 2016
0

Or are you just modifying the stack-template.json after render but prior to up?

colhom · 21 Jul 2016
0

I just modified the stack-template.json and removed the creation of network items (more details here: https://github.com/coreos/coreos-kubernetes/issues/340).

sdouche · 21 Jul 2016
10

@sdouche Are those subnets private? A public ELB can't be created in private subnets. K8s will get all subnets tagged with the correct KubernetesCluster tag, then ignore private subnets for a public ELB.

You can try to tag a public subnet with the correct KubernetesCluster tag, then wait for k8s to retry creating the ELB in that subnet.
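
For example (a minimal sketch; the subnet ID and cluster name below are placeholders, not values from this thread):

```
# Tag a public subnet so the cluster's cloud provider will consider it
# (subnet ID and cluster name are placeholders)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=your-cluster-name
```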

qqshfox · 21 Jul 2016
0

@qqshfox good point, it's a private subnet. Why are private subnets ignored? How do I create a private cluster?

sdouche · 21 Jul 2016
-20

You can create an internal ELB by using some magic k8s metadata tag.

qqshfox · 21 Jul 2016
3

"some magic k8s metadata tag"? What are they?

sdouche · 21 Jul 2016
2

@sdouche You have to tag your subnet with the "KubernetesCluster" tag. I see you used kube-aws before; you can look at that for inspiration on how to properly create your subnets. Also note that making a load balancer in a private subnet doesn't make much sense if you want to expose a service to the world (traffic can't route in).

pieterlange · 21 Jul 2016
0

Hi @pieterlange. OK, so if I want a private cluster, how do I expose services and pods without an ELB? Do I need to route the two overlay networks? How do I do that? I suppose with Flannel's aws backend.

sdouche · 21 Jul 2016
0

I do not understand what you're trying to accomplish, so it's a little bit difficult to help. Issues like these (this is starting to look like a support request) are better solved through Slack or Stack Overflow, as there's no actionable material for the developers here. I suggest closing the ticket and trying over there.

pieterlange · 21 Jul 2016
6

You're right, sorry. Back to the initial request: I think it would be better to write "could not find any public subnets for creating the ELB" (for a public ELB of course, which is the default option). What do you think?

sdouche · 21 Jul 2016
1

@justinsb WDYT?

pwittrock · 22 Jul 2016
11

How do I create a private ELB with private subnets?

manojlds · 7 Oct 2016
0

There is some information about private ELBs in https://github.com/kubernetes/kubernetes/issues/17620.

cknowles · 12 Nov 2016
0

Has anyone gotten this to work recently? I can get it to create the internal/private ELB but none of the node machines are added to the ELB. If I manually add them everything works fine, so it is set up properly except for adding the ASG for the nodes or adding the nodes themselves.

@justinsb Is there possibly some annotation I need to use to allow it to find the nodes it needs to add to the private ELB? I'm creating the cluster with _kubeadm_ to join the nodes and the AWS cloud provider integration. The subnets, VPCs and auto scaling groups are all tagged with "KubernetesCluster" and a name. That does propagate to the ELB, but none of the node instances are picked up. I don't see anything specific in the code to add the node ASG to the ELB based on an annotation...

druidsbane · 15 Nov 2016
9

I have the same problem. I've got Kubernetes running in a private subnet. To explain it a bit further (this is AWS-specific): our infrastructure team has created specific requirements regarding security. We need to have three layers (subnets) in one VPC. Diagram:

| type | connection | components |
| --- | --- | --- |
| public subnet | internet gateway | ELB |
| private subnet 1 | nat gateway | kubernetes (master/nodes) |
| private subnet 2 | direct connect | proxy for on premise server access |

For this to work I had to manually create an ELB in layer 1 (public subnet) and point it to the master nodes in layer 2 (private subnet 1). I also installed the dashboard, and this works fine together with the kubectl command line tool. (Both are exposed to the internet.)

However when I deploy an app (e.g. nginx) I get the following error:

_Error creating load balancer (will retry): Failed to create load balancer for service default/my-nginx: could not find any suitable subnets for creating the ELB_

The Kubernetes dashboard says the service-controller is the source of this. And when I run:

 $ kubectl get services

it outputs:

    NAME         CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    kubernetes   100.xx.x.1      <none>        443/TCP   3h
    my-nginx     100.xx.xx.99    <pending>     80/TCP    1h

Is there a way to tell the controller which subnet it should use to create the load balancer for the service?

cyberroadie · 19 Nov 2016
5

Still the same problem with provisioning ELBs for ingress for instances in private subnets.

But no worries, Kubernetes is built upon 15 years of experience of running production workloads at Google. Amazon will fix their ELBs sometime soon.

rokka-n · 13 Jan 2017
2

@cyberroadie How did you solve your problem? I am in the same situation and have no idea how to resolve it.

cemo · 17 Jan 2017
-6

Manually creating the routes via the AWS web interface.

cyberroadie · 17 Jan 2017
1

@cemo @rokka-n @cyberroadie I was able to fix this by tagging the subnets in AWS.

JayBee6 · 17 Jan 2017
0

@JayBee6 @cyberroadie

I gave it a try today and found two errors of mine. I prepared my environment with Terraform and kubeadm.

  1. In kubeadm I missed adding the aws cloud provider configuration.
  2. I checked the aws.go source code and found that it has some parts related to subnet configuration. It seems that I missed adding some tags to the subnets.

After that I successfully created an ELB, but there were no instances attached to it. Maybe I need to add some tags to the instances as well. Any idea about this @JayBee6?

cemo · 17 Jan 2017
0

@cemo did you create the load balancers manually? If yes, then you might have to. I have the following tags on my instances (a tagging sketch follows below):

KubernetesCluster : clustername
kz8s : clustername
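
A minimal sketch of applying those instance tags with the AWS CLI (the instance ID and cluster name are placeholders):

```
# Tag a worker instance with the same cluster tags
# (instance ID and cluster name are placeholders)
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=clustername Key=kz8s,Value=clustername
```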

JayBee6 · 17 Jan 2017
2

@JayBee6 I exposed my Kubernetes service as a LoadBalancer. It created an AWS ELB, but instances were not attached to it. I manually attached the instances and everything started to work.

cemo · 17 Jan 2017
98

Error creating load balancer (will retry): Failed to create load balancer for service default/my-nginx: could not find any suitable subnets for creating the ELB

@JayBee6 @cemo @rokka-n @cyberroadie @druidsbane @sdouche YMMV, but below is what I do and my understanding of targeting AWS ELBs created by k8s. I'm using clusters created with kube-aws (so AWS CloudFormation) to deploy to private subnets on existing VPCs. This sounds like a similar situation to the discussion above. I don't do anything manually, so the below might be useful.

When you are deploying into an existing AWS VPC there can be dozens of existing subnets, and k8s has no way to work out which one should house the external load balancers. AFAIK k8s only checks the subnets the cluster is actually housed in, so if none of those subnets are public (i.e. have an Internet gateway) then you'll get the error above.

This is easily fixed by tagging the public subnet(s) that you want k8s to use. Pick the public subnet you want and add _both_ of the following tags (the value of the second tag is blank); a CLI sketch follows the table below. I usually tag one DMZ subnet in each AZ the cluster occupies.

| Key | Value |
|-|-|
| KubernetesCluster | your-cluster-name |
| kubernetes.io/role/elb | |
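
For example, with the AWS CLI (a minimal sketch; the subnet ID and cluster name are placeholders, and the role tag is created with an empty value):

```
# Mark a public subnet as a target for external (internet-facing) ELBs
# (subnet ID and cluster name are placeholders; the role tag gets an empty value)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=your-cluster-name Key=kubernetes.io/role/elb,Value=
```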

@manojlds Likewise for internal ELBs. You add the annotation service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0" to your Service, and tag the preferred subnet(s) for internal load balancers with _both_ of the following tags (see the sketch after the table).

| Key | Value |
|-|-|
| KubernetesCluster | your-cluster-name |
| kubernetes.io/role/internal-elb | |
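
And a minimal sketch for the internal case (placeholder subnet ID and Service name; the annotation can also be applied with kubectl if it is not already in the manifest):

```
# Mark a private subnet as a target for internal ELBs
# (subnet ID and cluster name are placeholders; the role tag gets an empty value)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=your-cluster-name Key=kubernetes.io/role/internal-elb,Value=

# Ask for an internal ELB on an existing Service (placeholder Service name)
kubectl annotate service my-service \
  service.beta.kubernetes.io/aws-load-balancer-internal="0.0.0.0/0"
```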

Having the KubernetesCluster tag on these subnets is great, as it makes it easy to have the ELBs turn up in the right places when you have multiple k8s clusters in the same VPC. So I just tag the target subnets for ELBs as part of the cluster creation, and everything is automagic after that.

Real documentation for this is indeed lacking, but Go is highly readable. I found the stuff above by surfing the source code for interesting tags and following those constants around to see what they did.

whereisaaron · 23 Jan 2017
1

Well, that's just some tags hidden in the Go code; put it in production, man.
What could possibly go wrong with it?

rokka-n · 23 Jan 2017
0

I actually still have trouble getting this all set up. I have to set it up manually, as we already have quite an infrastructure and assets that we have to conform to.

At this time I've gotten to the point where I can get k8s to create an internal ELB pointing to the right subnets and whatnot. But it does not attach any instances where the pods have launched. I have my Kube minions tagged properly with KubernetesCluster : clustername, but I don't know what else is missing.

vchan2002 · 15 Feb 2017
0

I currently have a single VPC in AWS with multiple k8s clusters, each in its own private subnet. I also have a public subnet that I want all the individual clusters to use for deploying external ELBs.

If I don't set the "KubernetesCluster" tag, everything works as expected with the exception of not really knowing which k8s created resources belong to which k8s cluster.

Of course, setting the "KubernetesCluster" tag for all the relevant resources with the respective cluster name means I can easily identify which k8s resources belong to which k8s cluster, but how do I tag the shared public subnets?

Simply omitting the tag results in no ELB being created for a service, as there is no suitable place to put one. And adding the tag results in the public subnets being locked to a single k8s cluster.

Is there any way to "share" the public subnet?

benbooth493 · 16 Feb 2017
11

@benbooth493 In 1.5 there was no way to share a subnet for creating on-demand load balancers between two Kubernetes clusters. The KubernetesCluster tag name was hard coded. But it was easy enough to create a couple of smallish public subnets for each cluster for this purpose.

In 1.6 that is fixed with a new tag where you embed the cluster name in the tag name, e.g. 'kubernetes.io/cluster/MyCluster'; a tagging sketch follows the constants below.

```
// TagNameKubernetesClusterPrefix is the tag name we use to differentiate multiple
// logically independent clusters running in the same AZ.
// The tag key = TagNameKubernetesClusterPrefix + clusterID
// The tag value is an ownership value
const TagNameKubernetesClusterPrefix = "kubernetes.io/cluster/"

// TagNameKubernetesClusterLegacy is the legacy tag name we use to differentiate multiple
// logically independent clusters running in the same AZ.  The problem with it was that it
// did not allow shared resources.
const TagNameKubernetesClusterLegacy = "KubernetesCluster"
```
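
Tagging a shared subnet with the new-style 1.6+ tag might then look like this (a minimal sketch; the subnet ID and cluster name are placeholders):

```
# Tag a subnet for a 1.6+ cluster using the new per-cluster tag key
# (subnet ID and cluster name are placeholders; "shared" is the ownership value)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/MyCluster,Value=shared
```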

whereisaaron · 20 May 2017
19

I just had the same issue, but solved it in a different way.

I created my Kubernetes cluster in AWS using Kops. All my subnets and my ELB were tagged as described in this issue, but I still got Error creating load balancer (will retry): Failed to create load balancer for service blablabla: could not find any suitable subnets for creating the ELB.

The fix is described at https://github.com/kubernetes/kops/issues/2266. I added service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 to my service config file and Kubernetes could finally set the ELB as the endpoint of my service.

```
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  labels:
    app: elasticsearch
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  selector:
    app: elasticsearch
  ports:
  - name: http
    port: 9200
    protocol: TCP
  - name: transport
    port: 9300
    protocol: TCP
```

arthurjguerra · 2 Nov 2017
0

@whereisaaron what is an "ownership value" in this case?

2rs2ts · 9 Jan 2018
31

@2rs2ts the ownership value for the kubernetes.io/cluster/<your-cluster-id> tag is, I think, either "owned" or "shared". But I am not sure the value matters for 'hasClusterTag' to work.

You can read the code @2rs2ts to understand the process of finding a subnet.

  1. The AWS support first finds all the subnets you have associated with your cluster via the tag kubernetes.io/cluster/<your-cluster-id> (you'll note the same subnet can be tagged for use by more than one cluster this way). Subnets without this tag are ignored and not considered. If no subnets are tagged, only the current subnet is considered.

  2. For external load balancers (the default), any subnets that aren't public (i.e. whose routing table doesn't have an Internet Gateway route) are excluded. It then looks for the kubernetes.io/role/elb tag on the remaining subnets and picks one of those; if no public subnets are tagged, one gets picked at random.

  3. For internal load balancers (which you indicate you want using the service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 annotation), it first goes for those with the kubernetes.io/role/internal-elb tag, or else picks one at random.

This process is repeated for each AZ your cluster occupies.
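
To see what the controller has to work with, you can list the tagged subnets and inspect their route tables (a minimal sketch; the cluster name and subnet ID are placeholders):

```
# 1. Which subnets are tagged for this cluster? (cluster name is a placeholder)
aws ec2 describe-subnets \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/your-cluster-id"

# 2. Is a given subnet public, i.e. does its route table have an Internet Gateway route?
#    (subnet ID is a placeholder; look for an igw-* entry in the output)
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0123456789abcdef0" \
  --query "RouteTables[].Routes[].GatewayId"
```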

whereisaaron · 11 Jan 2018
0

I have a bit of a problem with this design: it's not clear how to use the shared value at all when you're running many clusters in a subnet without forcing each cluster to …. I have a system which bootstraps ephemeral single-node Kubernetes clusters in an auto scaling group and allows users to claim them on-demand for 1-24 hours (mostly used for experiments and running tests). The number of clusters can quickly be in the hundreds, and there is a tag limit of 50 on AWS resources.

A couple problems:

  1. More clusters than available tags for shared subnets.
  2. I need to add setup / teardown routines for shared resources into the resource provisioner. Right now that's a cloud-init script that runs at boot for init. For cleanup there is nothing.
  3. I have no way to control which subnets get assigned a kubernetes instance in the pool so the auto scaler might drop one into a subnet with more than 50 other clusters.

plombardi89 · 7 Feb 2018
0

IMHO @plombardi89 that is kind of a misuse of an autoscaler :smile: since 'claimed' nodes are pets, not cattle. However, if you want to go this way, then I suggest you have the autoscaler create t2.nano instances (somewhere), with a cloud-init script that uses a CloudFormation template to create a tiny subnet, with the one-node cluster and any subnet tags. When the t2.nano gets the scale-down or shutdown request, delete the CloudFormation stack to clean up the cluster and its tiny subnet.

whereisaaron · 7 Feb 2018
1

@whereisaaron I agree it's a bit of a misuse, but it's not a pets vs. cattle distinction IMO. We use the autoscaler to always ensure there are single-node instances of Kubernetes available to be claimed. A claim request detaches the instance from the autoscaler, and for hours it can be used by a developer or for automated testing. At the end of that period the instance is terminated and never heard from again. Using the autoscaler this way is nice because there is no code needed to manage the pool capacity. Most clusters, once claimed, are used for a handful of minutes before being discarded.

The only things shared by the claimed instances are the VPC and subnets. It feels like there should be another way to tell Kubernetes "hey, these subnets are perfectly valid to deploy ELBs into" that doesn't rely on tags... maybe a configuration flag, or a Dynamo table to track this information.

plombardi89 · 9 Feb 2018
1

I think tags are the correct mechanism @plombardi89; you'd have to propose a patch for hasClusterTag() / findSubnets() that supports a wildcard tag like just kubernetes.io/cluster or kubernetes.io/cluster/_all.

whereisaaron · 9 Feb 2018
0

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 14 Jun 2018
0

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 14 Jul 2018
0

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot · 13 Aug 2018
1

Relevant documentation in AWS: "Cluster VPC Considerations" https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

If using EKS, tagging of the VPC & subnets referenced in your EKS cluster appears to be automatic. However, it may be necessary to tag additional subnets.
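
If an additional subnet does need tagging by hand for EKS, it might look like this (a minimal sketch; the subnet ID and cluster name are placeholders, and the tag keys and values follow the AWS documentation linked above):

```
# Tag an extra subnet so the EKS cluster can place public ELBs in it
# (subnet ID and cluster name are placeholders; values follow the linked AWS docs)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-eks-cluster,Value=shared Key=kubernetes.io/role/elb,Value=1
```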

rehevkor5 · 12 Jun 2019
0

> @sdouche Are those subnets private? A public ELB can't be created in private subnets. K8s will get all subnets tagged with the correct KubernetesCluster tag, then ignore private subnets for a public ELB.
>
> You can try to tag a public subnet with the correct KubernetesCluster tag, then wait for k8s to retry creating the ELB in that subnet.

You are right. I tagged my public subnet with KubernetesCluster (cluster-name), then recreated ingress-nginx. Now I can't use the ELB to access my private-net application!

Trisia · 20 Sep 2019
0

@whereisaaron
As you said, the subnet needs a tag so that Kubernetes can find the correct subnets.

linbingdouzhe · 15 Oct 2019
1

I managed to restrict the internal load balancer to only the intra subnet with the help of tags:

```
public_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
  "kubernetes.io/role/elb"                        = "1"
}

private_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
}

intra_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
  "kubernetes.io/role/internal-elb"               = "1"
}
```

deveshmehta · 31 Jan 2020
0

I would just like to add that in Pivotal Container Service (PKS) you have to tag the ELB subnet this way:
kubernetes.io/cluster/service-instance_UUID
where UUID is something like 7as7dcc6-d46c-48b4-8e33-364f795a88e3,
and leave the value empty.
Hope this helps!

nimmichele · 26 Feb 2020