<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Marcus Noble]]></title><description><![CDATA[The blog of Marcus Noble, self-described tinkerer, platform engineer and all round average guy!]]></description><link>https://marcusnoble.co.uk</link><generator>metalsmith-feed</generator><lastBuildDate>Mon, 16 Mar 2026 08:06:28 GMT</lastBuildDate><atom:link href="https://marcusnoble.co.uk/feed.xml" rel="self" type="application/rss+xml"/><item><title><![CDATA[Investigating and fixing "StopPodSandbox from runtime service failed" Kubelet errors]]></title><description><![CDATA[<p>For months now (maybe years, who knows 🤷) I've had the following error sprinkled throughout my Kubelet logs across multiple of my worker nodes:</p>
<pre><code class="language-plain">StopPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to destroy network for sandbox &quot;055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708&quot;: failed to get network &quot;cbr0&quot; cached result: decoding version from network config: unexpected end of JSON input
</code></pre>
<p>As there was never any impact associated with this that I could see, I just ignored it... for a long time!</p>
<p>Well, fast forward to last Friday night when a sudden hyper-focus hit and I decided I wanted to know what was going on and how to stop it - here's how it went.</p>
<h2>A little context</h2>
<p>Before we dive into things I think it's worth providing a bit of context first so we all know what we're working with. The Kubernetes cluster in question is my &quot;homelab&quot; cluster that is running <a href="https://www.talos.dev">Talos Linux</a> within a <a href="https://www.proxmox.com/en/">Proxmox</a> cluster. The cluster has a single control plane node and 5 worker nodes. The worker nodes are where we'll be focussing. Beyond that there's nothing too special about this cluster. CNI is provided by <a href="https://github.com/flannel-io/flannel">Flannel</a> with default configuration. The only thing that is kinda unusual is my use of <a href="https://github.com/kubernetes-csi/csi-driver-smb">SMB for my CSI driver</a> - I only mention this because it's possible that a failure here <em>might</em> have been the thing that triggered the initial issue but I don't know for sure.</p>
<h2>Identifying the issue</h2>
<p>As I said, I had ignored this for months and not thought much of it until Friday. I was cleaning up some unrelated stuff in my cluster and was checking my Kubelet logs stored in <a href="https://grafana.com/docs/loki/latest/">Loki</a> to see if my cleaning up had been applied. This is how I (re-)noticed the high number of sandbox errors from various Kubelets. So I took a closer look with:</p>
<pre><code class="language-promql">{service=&quot;kubelet&quot;} |= &quot;failed to get network \&quot;cbr0\&quot; cached result&quot;
</code></pre>
<p>There were a LOT of these errors in the logs, almost 1000 per hour! 😳</p>
<p>Even though there were a lot of errors, only a handful of sandbox IDs were actually causing the problem, so it wasn't widespread, just frequent.</p>
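<p>To get a sense of the scale over time, a LogQL metric query along these lines can count the errors per node (a sketch assuming the same <code>service</code> label as above - the label holding the node name, here <code>instance</code>, may be named differently in your setup):</p>
<pre><code class="language-promql">sum by (instance) (
  count_over_time({service=&quot;kubelet&quot;} |= &quot;failed to get network \&quot;cbr0\&quot; cached result&quot; [1h])
)
</code></pre>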
<h2>Figuring out what was wrong</h2>
<p>Because I'm a Professional Kubernetes Platform Engineer™ I knew exactly how to tackle this... ask the internet! So I fired up DuckDuckGo, pasted in my error message and started wading through a LOT of useless results.</p>
<ul>
<li>I found a couple of <a href="https://github.com/projectcalico/calico/issues/4084">similar</a> <a href="https://github.com/projectcalico/calico/issues/7647">issues</a> that people had experienced with Calico, but they didn't seem to fit what I was seeing.</li>
<li>There was a <a href="https://github.com/containerd/containerd/issues/6565">closed issue</a> against Containerd where <a href="https://github.com/containerd/containerd/issues/6565#issuecomment-1632089810">this comment</a> gave me some hope of solving it with a loopback CNI, but that also turned out to be unrelated.</li>
</ul>
<p>These did give me enough info to start looking in the right places though. There was just one problem - Talos Linux doesn't have a shell you can SSH into, it's all API driven.</p>
<p><code>kubectl debug</code> to the rescue!!!</p>
<p>Thankfully, it's possible to mimic an SSH session using <code>kubectl debug</code> against the node. As I knew I'd need a bunch of container-related tools to help with debugging I launched a debug container using the <a href="https://github.com/raesene/alpine-containertools">alpine-containertools</a> image from <a href="https://www.mccune.org.uk">Rory McCune</a>. Due to how the debug command mounts the host filesystem I also had to override the <code>CONTAINER_RUNTIME_ENDPOINT</code> environment variable to point to the correct location.</p>
<pre><code class="language-bash">kubectl debug -it --image raesene/alpine-containertools \
  --env CONTAINER_RUNTIME_ENDPOINT=unix:///host/run/containerd/containerd.sock \
  node/talos-192-168-1-15 \
  -- sh
</code></pre>
<p>Now we can start poking around on the node. First, let's see if we can force-delete the sandbox pod ourselves.</p>
<pre><code class="language-bash">crictl rmp --force 055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708

Failed to stop sandbox before removing: rpc error: code = Unknown desc = failed to destroy network for sandbox &quot;055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708&quot;: failed to get network &quot;cbr0&quot; cached result: decoding version from network config: unexpected end of JSON input
</code></pre>
<p>No luck! But we've been able to reproduce what Kubelet is hitting at least.</p>
<p>I spent a bit more time poking at <code>crictl</code> to get an understanding of the state of things. Doing so I was able to identify which pods were broken on each node.</p>
<pre><code class="language-bash"># Get the full details about the pod in containerd
crictl inspectp --output=json 055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708
# List all pods known to containerd, and filter to those in a NotReady state
crictl pods | grep NotReady
</code></pre>
<p>Eventually, after wading through all the unrelated search results, one did seem relevant, a GitHub issue on <code>containerd</code>: &quot;<a href="https://github.com/containerd/containerd/issues/8197">Empty cache files causes &quot;KillPodSandbox&quot; errors when deleting pods</a>&quot;.</p>
<h2>Cleaning things up</h2>
<p>This final GitHub issue contained one important piece of information I was missing - the location of the network cache: <code>/var/lib/cni/results</code>. This is the directory where the CNI stores a JSON file for each network it creates. So let's go take a look:</p>
<pre><code class="language-bash">&gt; ls -la /host/var/lib/cni/results

total 148
drwx------    2 root     root          4096 Sep 27 05:48 .
drwx------    5 root     root            52 Jun 23  2022 ..
-rw-------    1 root     root          1647 Sep 27 05:47 cbr0-01d576d2f808a969bb7cb8da11d1ee3117ec4e9792d4aea33abec55318b74e01-eth0
-rw-------    1 root     root             0 Jun 15 21:28 cbr0-055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708-eth0
-rw-------    1 root     root          1608 Sep 27 05:48 cbr0-225486a2df0c2c8be5038245d099b4c48dc03ede648735a2b4676154650e7741-eth0
-rw-------    1 root     root          2045 Sep 27 05:46 cbr0-2e848e89543e7b6491d0b47eb2be7ec620865024d942012303ede16842aa1108-eth0
-rw-------    1 root     root          1638 Sep 27 05:48 cbr0-415cd18f24ec2740486a7bf254054f871bd8abe4c26bc6e27dbf1f5a10c29f69-eth0
-rw-------    1 root     root          1638 Sep 27 05:46 cbr0-47d021d26d2d1935025cee58941ff68ad28e09f1337b867e0865be56dde64a2a-eth0
-rw-------    1 root     root          1616 Sep 27 05:46 cbr0-5d40d2f86c6f4be7a85969f24e844ea049b41af14b97cfa3f5a655f52d4fc695-eth0
-rw-------    1 root     root          1616 Sep 27 05:48 cbr0-68f252ab057496036e8852a4f6103c1b6c564f6976d04676010858a5588a9a10-eth0
-rw-------    1 root     root          1596 Sep 27 05:48 cbr0-91d8805fe4b95c92f4c53e1321813b5fe9f71467c5692328bb7646649db57b22-eth0
-rw-------    1 root     root          1764 Sep 27 05:48 cbr0-a0eaa73bd701c9b1b1167efd49487b48074936211de1c6be37beec5184e42691-eth0
</code></pre>
<p>Notice anything suspicious? The size of the file associated with our failing sandbox ID is <code>0</code>. 🤔 That explains why the CNI is unable to load from the cache - it's empty. Taking a quick look at one of the other files shows a valid JSON object with various network configuration stored within.</p>
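<p>The tell-tale zero-byte files are easy to spot with <code>find</code>. Here's a sketch that recreates the symptom in a scratch directory rather than touching a real node (on the node itself, via the debug container, the directory would be <code>/host/var/lib/cni/results</code>):</p>
<pre><code class="language-bash"># Recreate the symptom in a scratch directory
cache_dir=$(mktemp -d)
echo '{}' > "$cache_dir/cbr0-aaa-eth0"   # stand-in for a healthy cached result
touch "$cache_dir/cbr0-bbb-eth0"         # empty file, mimicking the corrupt entry

# List only the zero-byte cache files - these are the broken ones
find "$cache_dir" -type f -size 0
</code></pre>
<p>Only the empty <code>cbr0-bbb-eth0</code> entry is printed, and the filename conveniently contains the sandbox ID that needs cleaning up.</p>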
<p>So let's delete this file - <code>rm -rf /host/var/lib/cni/results/cbr0-055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708-eth0</code></p>
<p>With that gone, let's try again to delete the pod:</p>
<pre><code class="language-bash">crictl rmp 055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708

Removed sandbox 055b221e44a28ce8d9120f771d5e1ef201f2457ce49c58999a0147104cca2708
</code></pre>
<p>SUCCESS! 🎉</p>
<p>Now that we know what to clean up, it's time to repeat the process for each failing sandbox ID on each node. Once that is done the kubelet logs are looking <em>muuuucccchhhhhhh</em> better!</p>
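<p>When there are several broken entries per node, deleting one file at a time gets tedious. Here's a hedged sketch of a cleanup loop, again demonstrated against a scratch directory (on a real node you'd point it at <code>/host/var/lib/cni/results</code>, after double-checking that the zero-byte files really are the broken ones):</p>
<pre><code class="language-bash"># Scratch directory standing in for /host/var/lib/cni/results
results_dir=$(mktemp -d)
echo '{}' > "$results_dir/cbr0-good-eth0"
touch "$results_dir/cbr0-bad1-eth0" "$results_dir/cbr0-bad2-eth0"

# Delete every zero-byte cache file, printing each one as it goes
find "$results_dir" -type f -size 0 -name 'cbr0-*' -print -delete

# Only the healthy entry remains
ls "$results_dir"
</code></pre>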
<h2>Preventing this happening again</h2>
<p>As I'm still unclear why or how these pods first failed I can't be sure it won't happen again. My current leading theories are that it was either caused by a previous power failure that left the nodes in a bad state, or related to my NAS crashing, as the few pods I took a close look at all had SMB volumes associated with them.</p>
<p>Regardless of what caused it I want to make sure I'm being more proactive going forward and leaving my cluster in a better state than it was before. I could set up a dashboard that includes the log query from Loki that we used at the start of this post but that requires me to check it often and... I'm lazy. What we really want is some alerts to annoy me when / if this happens again.</p>
<p>I already have quite a decent monitoring and alerting setup in my homelab made up of <a href="https://victoriametrics.com">VictoriaMetrics</a>, <a href="https://prometheus.io/docs/alerting/latest/alertmanager/">AlertManager</a> and various metric exporters so I'm at a good starting point.</p>
<p>But what I am lacking is containerd metrics.</p>
<p>A quick search led me to this <a href="https://collabnix.com/monitoring-containerd/">great post</a> by Abraham Dahunsi covering a lot of details about monitoring containerd. This gave me the information on what metrics to work with and what I can do with them, but I was still missing the actual metrics. In Talos Linux the containerd metrics aren't enabled by default but thankfully it's <a href="https://www.talos.dev/v1.11/talos-guides/configuration/containerd/">quite simple to enable</a>; I just needed to add the following to my machine config for each of my nodes:</p>
<pre><code class="language-yaml">machine:
  files:
    - content: |
        [metrics]
          address = &quot;0.0.0.0:11234&quot;
      path: /etc/cri/conf.d/20-customization.part
      op: create
</code></pre>
<p>I then just need to instruct <a href="https://docs.victoriametrics.com/victoriametrics/vmagent/">vmagent</a> to scrape this new endpoint:</p>
<pre><code class="language-yaml">    scrape_configs:
    - job_name: containerd
      scrape_interval: 30s
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}:11234/proxy/v1/metrics
      - source_labels: [__meta_kubernetes_node_name]
        action: replace
        target_label: kubernetes_node
</code></pre>
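<p>Before building alerts on top of these metrics it's worth a quick sanity check that every node's endpoint is actually being scraped. Assuming the job name from the scrape config above, this query should return nothing when all is well - any series it does return is a node whose containerd metrics endpoint isn't reachable:</p>
<pre><code class="language-promql">up{job=&quot;containerd&quot;} == 0
</code></pre>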
<p>Now that I have the metrics coming in, we can put them to use identifying issues.</p>
<p><strong>Identify failures starting a pod sandbox:</strong></p>
<pre><code class="language-promql">sum by(grpc_code, instance) (
  rate(grpc_server_handled_total{job=&quot;containerd&quot;, grpc_code!=&quot;OK&quot;, grpc_method=&quot;RunPodSandbox&quot;})
) &gt; 0
</code></pre>
<p><strong>Identify failures stopping or removing a pod sandbox:</strong> (This is the issue we handled in this post)</p>
<pre><code class="language-promql">sum by(grpc_code, grpc_method, instance) (
  rate(grpc_server_handled_total{job=&quot;containerd&quot;, grpc_code!=&quot;OK&quot;, grpc_method=~&quot;StopPodSandbox|RemovePodSandbox&quot;})
) &gt; 0
</code></pre>
<p><strong>Find all pods known to containerd that aren't known to Kubernetes:</strong></p>
<pre><code class="language-promql">(
  container_pids_current{namespace=&quot;k8s.io&quot;}
    unless on (container_id)
    label_replace(
        kube_pod_container_info,
        &quot;container_id&quot;, &quot;$1&quot;, &quot;container_id&quot;, &quot;containerd://(.+)&quot;
    )
    unless on (container_id)
    label_replace(
        kube_pod_init_container_info,
        &quot;container_id&quot;, &quot;$1&quot;, &quot;container_id&quot;, &quot;containerd://(.+)&quot;
    )
)
unless on (container_id)
(
  container_memory_swap_limit_bytes &gt; 0
)
</code></pre>
<p>This last one has a lot going on but it breaks down to:</p>
<ol>
<li>List all <code>pids</code> counts for all containers in the <code>k8s.io</code> namespace in containerd</li>
<li>Unless the <code>container_id</code> is also found on the <code>kube_pod_container_info</code> metric (where we need to strip the <code>containerd://</code> prefix)</li>
<li>Or the <code>container_id</code> is also found on the <code>kube_pod_init_container_info</code> metric</li>
<li>And finally, exclude anything whose <code>container_memory_swap_limit_bytes</code> is greater than 0, which filters out the sandbox containers (pause containers) that aren't exposed to Kubernetes anyway.</li>
</ol>
<p>These three then become new alerts that can notify me if any match.</p>
<pre><code class="language-yaml">- name: containerd
  rules:
  - alert: ContainerdPodStartFailed
    annotations:
      description: |
        Containerd on **{{ .Labels.instance }}** has failed to start a pod sandbox due to **{{ .Labels.grpc_code }}**.
    expr: |
      sum by(grpc_code, instance) (rate(grpc_server_handled_total{job=&quot;containerd&quot;,grpc_code!=&quot;OK&quot;, grpc_method=&quot;RunPodSandbox&quot;})) &gt; 0
    for: 5m
    labels:
      severity: notify

  - alert: ContainerdPodRemoveFailed
    annotations:
      description: |
        Containerd on **{{ .Labels.instance }}** has failed to {{ .Labels.grpc_method }} due to **{{ .Labels.grpc_code }}**.
    expr: |
      sum by(grpc_code, grpc_method, instance) (rate(grpc_server_handled_total{job=&quot;containerd&quot;, grpc_code!=&quot;OK&quot;, grpc_method=~&quot;StopPodSandbox|RemovePodSandbox&quot;})) &gt; 0
    for: 5m
    labels:
      severity: notify

  - alert: ContainerdHasLeftoverContainersRunning
    annotations:
      description: |
        Containerd on **{{ .Labels.instance }}** has extra containers that aren't known to Kubernetes.
    expr: |
      (
        container_pids_current{namespace=&quot;k8s.io&quot;}
          unless on (container_id)
        label_replace(
            kube_pod_container_info,
            &quot;container_id&quot;, &quot;$1&quot;, &quot;container_id&quot;, &quot;containerd://(.+)&quot;
        )
        unless on (container_id)
        label_replace(
            kube_pod_init_container_info,
            &quot;container_id&quot;, &quot;$1&quot;, &quot;container_id&quot;, &quot;containerd://(.+)&quot;
        )
      )
      unless on (container_id)
      (
        container_memory_swap_limit_bytes &gt; 0
      )
    for: 15m
    labels:
      severity: notify
</code></pre>
<h2>Things I learnt</h2>
<p>I'd never really used <code>crictl</code> much before this so it was good to get a better understanding of it. One thing I didn't realise was that it also has the concept of &quot;pods&quot; and &quot;containers&quot; which are different from, but map to, those found in Kubernetes. A &quot;pod&quot; seems to refer to the sandbox container - that is, the container that is created to set up all the networking etc. that is needed for the workloads (this is the &quot;pause container&quot;). Pods can be listed with <code>crictl pods</code> and inspected with <code>crictl inspectp [ID]</code>. Containers can be listed with <code>crictl ps</code> and inspected with <code>crictl inspect [ID]</code>.</p>
<p>Before this I didn't really know what a CNI <em>actually did</em> in terms of the things it creates, and I still mostly don't, but I've at least seen some hints of it by way of the network cache files it creates.</p>
<p>Debugging Talos when it goes wrong at the host level is <em>tricky</em> as you have limited access to the node itself. This is often good as it prevents you from being able to make live changes that could be forgotten or reverted accidentally but when you actually need to fix something like this it just gets in the way. I'm very thankful for <code>kubectl debug</code> (and Rory's wonderful <code>alpine-containertools</code> image 💙) that makes this debugging possible and relatively painless.</p>
]]></description><link>https://marcusnoble.co.uk/2025-09-28-investigating-and-fixing-stoppodsandbox-from-runtime-service-failed-kubelet-errors</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-09-28-investigating-and-fixing-stoppodsandbox-from-runtime-service-failed-kubelet-errors</guid><pubDate>Sun, 28 Sep 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[My tips on giving technical talks]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2025-05-10: Added advice from Josh Clark</p>
<p>2025-05-03: Added some advice from Lobste.rs comments</p>
<p>2025-05-03: Added advice from Aaron Patterson about repeating questions back</p>
<p>2025-05-01: Added advice from Márk Sági-Kazár</p>
<p>2025-05-01: Added suggestion from Oriel about going to the toilet before speaking.</p>
</details>
<p>I've been <a href="https://youtube.com/playlist?list=PLT41C0Ggz5wa66-AU5xapbOuzkUKUPLzi">giving talks</a> at meetups and conferences for a few years now. I started off after the encouragement of my friends giving their own talks and looking so cool doing it! It's taken a while but I think I'm at a stage now where I'm not only good at it (at least I hope so 😅) but I feel confident and comfortable while doing it. I want everyone to have that same confidence and I want to hear ALL OF YOU giving talks too! You have stories to tell, lessons to share and experience to pass on. So here are my learnings on how I approach giving a talk in front of a crowd of techies, mainly focussed on technical talks, but most of this should apply to public speaking in general.</p>
<p>Before we get into it, I just want to set some expectations.</p>
<p>These are things that have worked for me, including advice I've been given by other speakers over the years, but that doesn't always mean they will work for you. We're all unique in weird and wonderful ways so if you find something doesn't work for you or isn't comfortable then ignore it and focus on the things that do help. 💙</p>
<p>I'm also not going to cover (much) about how to come up with a talk idea or how to get it accepted at a conference or meetup. I also have difficulty here and don't feel confident enough to offer advice on this.</p>
<p>I'm also not a &quot;professional&quot; speaker. This isn't my full-time job, even if I sometimes do it on behalf of my company. But that is also why I think all of you can also be giving talks and sharing your knowledge with the wonderful community.</p>
<p>So, let's dive into it shall we?</p>
<h2>Coming up with the talk</h2>
<p>As I said above, I can't really give much advice here. I struggle with this a lot of the time but there is one thing I can say for sure...</p>
<p><strong>Talk about what you’re interested in and enjoy</strong> - if you don’t enjoy the subject your audience won’t either. It is also likely you'll have a terrible time giving the talk (not to mention writing the talk) if it's not something you enjoy.</p>
<p>One other thing I can say on this matter though is <strong>just because someone else has given a talk on a specific topic doesn't mean you cannot</strong>. We each bring a unique view to a topic and our delivery of the subject can resonate and connect with people differently.</p>
<h2>Writing the talk</h2>
<h3>The language used</h3>
<p>Make sure <strong>the language you use is inclusive</strong> and avoid words that trivialise actions. For example, avoid words like &quot;guys&quot; and &quot;master&quot; as much as you can to ensure you're not making anyone feel excluded - I like to use &quot;y'all&quot; when referring to a group of people. I also recommend limiting your use of words like &quot;basically&quot; and &quot;simply&quot; - these can feel alienating when the thing you're talking about doesn't seem basic or simple to the audience.</p>
<p>It's generally a good idea to <strong>avoid jokes</strong> unless you're <em>sure</em> they'll land and are designed to lift people up, not belittle anyone. A lot of the time jokes require some sort of pre-existing cultural knowledge that might not be the case for all your audience members. This is especially true at international conferences where you have people from different countries and languages in the audience.</p>
<p>Don't perpetuate stereotypes - avoid things like &quot;so simple even my mum can use it&quot; or similar comments.</p>
<p>Don't be afraid of saying &quot;um&quot; or pausing in your talk. In fact I suggest you make use of pauses in your speaking to add emphasis and break up sections.</p>
<h3>Your slides</h3>
<p>Design is not my strongest skill so I'm going to keep these brief and to the point:</p>
<ul>
<li>Make your font sizes BIG so that people at the back of a large room can see without struggling</li>
<li>Avoid paragraphs - no one is there to read, they're there to listen</li>
<li>Limit the amount of fonts and variations you use. Pick one main text font and one mono font. Use bold, italics and colour sparingly and to add emphasis where needed.</li>
</ul>
<p>If you're going to be presenting your slides on a projector then your <strong>slides will always look very washed out</strong> compared to what you see on your computer screen. Plan ahead for this and make sure your slides have good contrast between the text and background. If you're worried about this I suggest speaking with the organisers at the start of the day and testing out your slides in the room you'll be presenting.</p>
<p>I recommend <strong>including your social media handle</strong> on each of your slides if you can. People take photos of slides to remind themselves of the content later and there's a good chance they will have forgotten who gave the talk by then.</p>
<p>I also recommend having a final slide with links to get a copy of your slides if possible and where people can get in touch with you to ask questions after the talk.</p>
<p>Always remember that <strong>your slides are only there to reinforce what you're saying</strong> and are not the total content. They should enhance and summarise your points as you go so keep them brief and snappy.</p>
<p>On a similar note to avoiding paragraphs, <a href="https://lobste.rs/~stip">stip</a> points out on <a href="https://lobste.rs/s/ioyjfm/my_tips_on_giving_technical_talks#c_jixtpt">lobste.rs</a> that:</p>
<blockquote>
<p>you should limit code on slides as much as possible, ideally only to a few lines. In my opinion, most points can be made with just a function signature or a several-line toy function. Showing the audience more than that risks losing them while they try to figure out what is going on.</p>
</blockquote>
<blockquote>
<p>If you really need to put a long function on a slide, try and highlight the relevant section as you’re talking about it so the audience can follow along better.</p>
</blockquote>
<p>This is great advice! Techies in the audience will automatically try to read all the code and figure out what it does rather than listen to you talking. For this I like to use contrast of text colour to highlight the bits I want people to focus on and &quot;fade out&quot; the rest of the code that is just boilerplate or will be relevant later. I will often have the same code on multiple slides but each slide highlights a different bit of the code so I can walk the audience through it rather than dumping it on them all in one go.</p>
<h3>Technical</h3>
<p>Most presentation software has the ability to write <strong>speaker notes</strong>. I don't use these myself as I can't read fast enough to make them useful but if you feel more confident with them then I totally recommend using them. The main thing to keep in mind though is not to use them as a script but rather just short notes to remind you of your place in the talk.</p>
<p>Ensure you know how to make your slides fullscreen before giving the talk. This may sound obvious but if you're using Google Slides from Chrome on a Mac, for example, then you will actually need to enable the ability to go fullscreen in your browser first. I've seen this catch people out several times. If this is your setup then you want to go to <code>View</code> → <code>Always Show Toolbar in Full Screen</code> and make sure that is NOT checked. 😉</p>
<p><strong>Demos go wrong</strong> and conference <strong>WiFi sucks</strong>. Be sure to have offline alternatives of your slides and backup options for demos going wrong. A video recording of your demo is always a good choice. There's rarely ever a need to do a live demo, I certainly never do - my typing skills and spelling are terrible 🤣.</p>
<h3>Ending the talk</h3>
<p><strong>Always end the talk with a “thank you” and not “any questions”</strong>. When you ask for questions it causes confusion for the audience as to whether they should clap or not. Let the conference / meetup organiser ask if there are questions as they also know whether there’s time available and if there is a portable microphone for the audience to use.</p>
<h2>On the day</h2>
<h3>Setting up your tech</h3>
<p>Make sure you <strong>arrive prepared</strong>. This means having any adapters or dongles you need for your laptop to connect to the provided HDMI cable, that your laptop is sufficiently charged (at least 50%) and you have an offline copy of your slides available as we've already said the WiFi is unreliable.</p>
<p>Set your laptop and phone to <strong>Do Not Disturb mode</strong>. The last thing you want is an embarrassing notification to pop up for all to see.</p>
<p>Ensure your laptop is <strong>clear of clutter</strong> - arrange the windows you'll need during your talk, close or hide all others. It's never fun watching someone switch between multiple windows or tabs trying to find where their slides or demo went to.</p>
<h3>Taking the stage</h3>
<p>Just before you step up on to the stage I recommend <strong>removing your conference badge</strong> so it doesn't distract or get caught on anything and <strong>emptying your pockets</strong> to avoid things jingling or distracting you mid-talk.</p>
<p>Be sure to <strong>stay hydrated</strong> while giving your talk. If the organisers don't provide you with a bottle of water be sure to get your own drink to take up with you. Often the stage will be lit by very bright and very hot lights and you will feel your throat going dry in no time!</p>
<p>I can't believe I forgot this, so thank you to <a href="https://infosec.exchange/@barubary/114431272916715726">Oriel Jutty</a> for reminding me about this one - make sure you <strong>go to the toilet before your talk</strong>! Especially with the &quot;stay hydrated&quot; tip above, you really don't want to get a few minutes into your talk and then be distracted by your bladder. I usually end up going to the toilet several times in the hour running up to my talk! 😅</p>
<h3>Giving the talk</h3>
<p>I like to use a <strong>presentation remote</strong> to control my slides as I like to move around the stage when I'm talking. This isn't required but I recommend it rather than hiding behind your laptop the whole time clicking the keyboard. My current clicker is this <a href="https://www.amazon.co.uk/dp/B0D8HWPDG8">finger ring presentation remote</a>. The added benefit of using a remote is it gives you something to do with (one of) your hands.</p>
<p>On a similar note, <strong>don't bother with a laser pointer</strong>. They're rarely visible to all attendees and are completely lost on the recording of the talk (if there is one). If you do want to point to things on the slides I recommend looking into a software-based laser pointer or just using your mouse cursor.</p>
<h3>A quick word on microphones</h3>
<p>When it comes to giving a talk to a medium to large sized crowd you'll almost certainly be using a microphone to amplify your voice. You often don't get a choice in what microphone you can use but if you do here are my thoughts on the different styles:</p>
<ul>
<li>Podium - These are usually fixed to the podium and you need to always face that direction when talking. Remember not to talk while pointing at the screen behind you as people won't hear you.</li>
<li>Handheld - These should be held right up to your mouth, closer than you initially think. If you start to drift it away from your face the sound quality degrades very quickly. Be aware that these can be awkward if you plan to do live demos where typing is needed as one of your hands will need to be holding the mic.</li>
<li>Lapel - These clip on to your clothes, usually near the collar, so avoid loose or noisy clothing that could cause noise on the mic.</li>
<li>&quot;Britney&quot; style - Not sure of the correct name for these but they're the ones fitted over your ear with a small arm mic on them. These are my favourite as you don't need to hold them and you can move around with them, just be careful if coughing as the sound isn't nice.</li>
</ul>
<p>On a related point, <strong>try to avoid needing to play any sound from your laptop</strong> during your slides or demo. If you absolutely need sound for your talk then please discuss it with the organisers before the event so they can ensure it is catered for.</p>
<h3>Questions</h3>
<p>As said above, end your talk with a &quot;thank you&quot; and let the organiser initiate questions if there is time.</p>
<p>Be aware that you don't need to be all knowing - if you don't have an answer for a question it is <strong>totally acceptable to say &quot;I don't know&quot;</strong>.</p>
<p>If you think an answer might require considerable time or be confusing to explain from the stage then suggest that the person asking the question meets you after the session to discuss it further.</p>
<p>One thing <a href="https://bsky.app/profile/tenderlove.dev/post/3lo7btf5gfs2u">Aaron Patterson pointed out on Bluesky</a> that I hadn't included here is:</p>
<blockquote>
<p>during Q&amp;A try to repeat the questions. Audience members don't always have a mic so repeating will help everyone, it makes sure you understand the Q, and buys you time to think of an answer</p>
</blockquote>
<h2>After the talk</h2>
<p>So, you did it! You gave the talk! Congratulations! 🎉</p>
<p>Now what?</p>
<p>Well, first things first - it's always good to <strong>make yourself available</strong> immediately after your talk as it's possible people will want to come ask you questions or possibly give you feedback or thanks. I also like to take this time to &quot;unwind&quot; a little after my talk and make any notes on things that I think might need changing or improving for next time.</p>
<p>On that note - yes, there should be a next time. If your talk was a success and you enjoyed giving it then why stop there? There are always more people that will want to hear what you have to say so don't be afraid of reusing talks at other events. Just be sure not to simply copy and paste each time but work to improve and update as you learn and grow.</p>
<h3>My most important tip</h3>
<p>I forget who gave me this advice, many years ago, but it's stuck with me all this time - <strong>Celebrate every talk</strong> - I like to have a “little treat” after (doughnut, muffin, something tasty) but do whatever is nice and rewarding for you. You put a lot of work into it and regardless of how it went on the day you deserve to celebrate that achievement.</p>
<hr>
<p>If you have any tips of your own I'd love to hear them! Let me know on <a href="https://bsky.app/profile/averagemarcus.bsky.social">Bluesky</a> or <a href="https://k8s.social/@Marcus">Mastodon</a>.</p>
<p>Finally I'd like to leave you with a collection of resources I have learnt from over the years...</p>
<h2>Resources</h2>
<h3>Books</h3>
<ul>
<li><a href="https://app.thestorygraph.com/books/3af298e1-8c09-4f8a-b6d0-a770730619db">Resonate: Present Visual Stories That Transform Audiences by Nancy Duarte</a></li>
<li><a href="https://app.thestorygraph.com/books/3d295f69-0b78-4fbe-ad6a-b612d2f6d225">Slide:Ology: The Art and Science of Creating Great Presentations by Nancy Duarte</a></li>
<li><a href="https://demystifying-public-speaking.com/">Demystifying Public Speaking by Lara Hogan</a></li>
</ul>
<h3>Websites</h3>
<ul>
<li><a href="https://github.com/coryhouse/speaker-starter-kit">speaker-starter-kit</a></li>
<li><a href="https://speaking.io/">speaking.io</a></li>
<li><a href="https://www.selfdefined.app/">Self-Defined</a></li>
</ul>
<h3>Blog posts</h3>
<ul>
<li><a href="https://medium.com/@dellsystem/i-cant-read-your-slides-737acde6e9dc">I can’t read your slides</a></li>
<li>Preparing a talk - <a href="https://www.juliaferraioli.com/blog/2017/09/preparing-talk-before-you-start/">Part 1</a>, <a href="https://www.juliaferraioli.com/blog/2017/10/preparing-talk-writing-your-talk/">Part 2</a>, <a href="https://www.juliaferraioli.com/blog/2017/10/preparing-talk-presenting/">Part 3</a></li>
<li><a href="https://medium.com/samsung-internet-dev/public-speaking-for-beginners-8bdee16123ba">Public Speaking for Beginners</a></li>
<li><a href="https://amberwilson.co.uk/blog/inclusionary-exclusionary-language/">Inclusionary and Exclusionary Language</a></li>
<li><a href="https://alicebartlett.co.uk/blog/how-to-do-ok-at-slides">A white-label slide deck</a></li>
</ul>
<h3>Videos</h3>
<ul>
<li><a href="https://www.youtube.com/watch?v=AoeeLl5FC-M">The art of slide design</a></li>
</ul>
<h2>Advice from Friends</h2>
<p>After posting this I started asking for other people's tips and advice. I will update below as I get more to share.</p>
<h3><a href="https://bsky.app/profile/sagikazarmark.com">Márk Sági-Kazár</a></h3>
<blockquote>
<p>I see folks sometimes struggling with coming up with the content. They have the title and the topic, they have the thank you slide, but they have trouble with the slides in between. I usually recommend writing a draft first instead of thinking in terms of slides. Number one rule of writing is that you don't write and edit at the same time.</p>
<p>Personally, I use pen and paper, because it slows my thoughts and I don't bounce between ideas.</p>
</blockquote>
<p>I can totally see how this would be useful for many folks. I actually prefer the opposite myself: I throw together a bunch of slides with my various points on them and keep rearranging and refining them until they fit the story I want to tell. The key thing is to try various approaches and see what works for you. It might also not be the same approach for each talk you build - creativity can be fickle sometimes.</p>
<blockquote>
<p>Another common mistake is time: less experienced speakers can't estimate how long they are going to talk with a given set of slides. One reason for that is they don't rehearse (I'm not saying it's a must, but it definitely helps). Another reason (even when they rehearse) is they lose focus and they stretch out some of the slides. Rehearse and practice focusing on the key ideas on each slide.</p>
</blockquote>
<p>I should have included this in my post initially. Rehearsing a talk is a very personal thing, I've found, and everyone does it differently, but it is very important to have a good idea of how long your talk is going to last. My top tip for this is to open a Google Meet (or your video conference of choice), start the recording and then present the entire talk as though you were giving it in front of a crowd. You don't need anyone else in the call with you; just let the recording run until you're done, then once complete you can look at the timestamp and see how long your talk takes. You don't even have to watch yourself back if you don't want to. 😅</p>
<blockquote>
<p>But my number one tip (or rather the one I keep repeating the most): don't look at the flippin' screen behind you. 😄 If you are not wearing a mic (eg. at a small meetup) noone is going to hear you. If you are wearing a mic, it just looks bad, especially if there is a recording.</p>
</blockquote>
<p>😅 Yeah - I'm at fault for this quite often to be honest. I like to point at things and be quite physically expressive.</p>
<blockquote>
<p>Zoom in on your terminal windows.</p>
</blockquote>
<p>Bigger is always better here. Assume the person sat at the back of the room also has the worst possible eyesight.</p>
<blockquote>
<p>Disable true tone (sometimes it makes screen recording yellowish)</p>
</blockquote>
<p>Solid practical tip if you use true tone on your machine. Not something I've ever thought of before.</p>
<h3><a href="https://lobste.rs/~tomhukins">tomhukins</a> on <a href="https://lobste.rs/s/ioyjfm/my_tips_on_giving_technical_talks#c_dklog7">lobste.rs</a></h3>
<blockquote>
<p>In one of my talks, I completely forgot what I wanted to say. I noticed myself panicking, so took a deep breath, paused and looked at the audience. When I watched the video of my talk, it looked like a natural pause in the presentation. Although saying “um” as a pause is okay, silence works better.</p>
</blockquote>
<blockquote>
<p>I’m a native English speaker and I often present to people who aren’t. I try to speak slowly and clearly, and to use simple or common words whenever possible. I want to communicate an idea, not to sound clever.</p>
</blockquote>
<blockquote>
<p>Sharing links and contact details on a final slide makes sense: include a QR code for the audience to photograph.</p>
</blockquote>
<h3><a href="https://www.linkedin.com/in/je-clark/">Josh Clark</a> on <a href="https://www.linkedin.com/feed/update/urn:li:activity:7323429734250418176?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7323429734250418176%2C7326995404191870977%29&amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287326995404191870977%2Curn%3Ali%3Aactivity%3A7323429734250418176%29">LinkedIn</a></h3>
<blockquote>
<p>In demos, make sure the relevant info (e.g. terminal commands) is in the middle third of the screen. The top of the projector screen can often be washed out or reflecting the overhead lights in the room, and the bottom is often cut off by people's heads.</p>
</blockquote>
<p>This is actually really good advice for all your slides. Having padding around the edges of your presentation theme will help you deal with this and any wonky projector issues you might face.</p>
<blockquote>
<p>And when I'm juggling multiple windows (slides, terminal, wireshark, etc) in a presentation, I like to set them up in individual virtual desktops on my mac. In rehearsals, I make sure the order of those desktops is conducive to swiping between them.</p>
</blockquote>
<p>This is a great practical tip for how to handle the &quot;cleaning your desktop&quot; advice I have above! Much slicker than moving around windows or finding the appropriate tab or whatever.</p>
]]></description><link>https://marcusnoble.co.uk/2025-04-30-my-tips-on-giving-technical-talks</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-04-30-my-tips-on-giving-technical-talks</guid><pubDate>Wed, 30 Apr 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Building Social Media Bots With Node-RED]]></title><description><![CDATA[<p>I run several bots that post to social media and the number seems to keep growing. I've had a few people in the past ask me how I run them so I thought I'd finally get around to writing down how I manage it by leveraging <a href="https://nodered.org/">Node-RED</a>.</p>
<h2>Intro to the bots</h2>
<p>Before getting into the details, let's take a little look at what bots I actually run.</p>
<ul>
<li>Blog posts - Whenever I publish a new blog post I automatically create a new post on my personal <a href="https://bsky.app/profile/averagemarcus.bsky.social">Bluesky</a>, <a href="https://k8s.social/@Marcus">Mastodon</a> and <a href="https://www.linkedin.com/in/marcusnoble/">LinkedIn</a> profiles.</li>
<li><a href="https://cloudnative.now/">CloudNative.Now</a> - A monthly newsletter that I run that posts to <a href="https://bsky.app/profile/cloudnative.now">Bluesky</a> and <a href="https://k8s.social/@CloudNativeNow">Mastodon</a> whenever a new issue is released.</li>
<li><a href="https://conf.party/">Conf.Party</a> - Similar to above, this is a site I run to collect parties happening around conferences. It posts to <a href="https://mastodon.social/@confparty">Mastodon</a> and <a href="https://bsky.app/profile/conf.party">Bluesky</a> when new parties are added.</li>
<li>Kubernetes Releases - I have a bot post to <a href="https://k8s.social/@k8s_releases">Mastodon</a> and <a href="https://bsky.app/profile/k8s-releases.bsky.social">Bluesky</a> whenever there is a new Kubernetes release available on GitHub.</li>
<li>Mastodon Releases - I have a bot post to <a href="https://mastodon.social/@mastodon_releases">Mastodon</a> whenever there is a new Mastodon release available on GitHub.</li>
<li>Dropout.Tv Releases - A bot that posts to <a href="https://mastodon.social/@dropout_releases">Mastodon</a> and <a href="https://bsky.app/profile/dropout-releases.bsky.social">Bluesky</a> whenever there is a new video on the <a href="https://www.dropout.tv/new-releases">Dropout.TV new releases</a> page.</li>
</ul>
<p>Each bot posts a message built from a customised template that stays the same from post to post.</p>
<h2>How this looks in Node-RED</h2>
<p>Almost all the bots follow roughly the same flow:</p>
<ol>
<li>They are triggered by an Inject node with a 15-minute repeat interval.</li>
<li>An RSS feed with the source data is fetched (this is something like my blog RSS feed or the GitHub releases RSS feed).</li>
<li>Parse the RSS feed XML.</li>
<li>Iterate through each item in the feed and for each one check if the guid of that item already exists within a bot-specific context data store.</li>
<li>If the guid is not found in the existing list, include that item in the results, then add the guid to the list and save the list back to the context data store.</li>
<li>Split the returned list of items into individual items.</li>
<li>Build a message template for each social media platform.</li>
<li>Send that message to a platform-specific subflow that handles the actual API call needed for each platform. (More on this below)</li>
</ol>
<figure class="center" markdown="1">
<p><img src="/images/node-red-kubernetes-releases-bot.png" alt="Node-RED bot - Kubernetes Releases"></p>
<figcaption>The Kubernetes Releases bot in Node-RED</figcaption>
</figure>
<p>The exception to this is the Dropout TV bot. There isn't an RSS feed for the new releases so I had to do some web scraping instead, which took a bit of trial and error. I won't go into detail for this as it's very specific to this bot.</p>
<figure class="center" markdown="1">
<p><img src="/images/node-red-dropout-bot.png" alt="Node-RED bot - Dropout TV Releases"></p>
<figcaption>The considerably more complex Dropout TV bot in Node-RED</figcaption>
</figure>
<p>Let's take a look at each step in a bit more detail...</p>
<h3>1. Inject Node</h3>
<p>Nothing special here, it's an <a href="https://nodered.org/docs/user-guide/nodes#inject">Inject Node</a> with a repeat interval set to 15 minutes. Set this to whatever makes the most sense for your bot, but be respectful of the data source you're fetching from and don't call it too often.</p>
<h3>2 &amp; 3. Fetch RSS Feed</h3>
<p>Again, nothing special here, just an HTTP request node that fetches the RSS feed URL and passes the response on to an XML node to parse it into a JSON payload.</p>
<h3>4. Check for updates</h3>
<p>This is where we start to implement some logic. We use a Function Node with the following custom JavaScript:</p>
<pre><code class="language-js">// Grab our bots context data store or initialise it as an empty array
let prev = flow.get(&quot;kubernetes_releases&quot;) || [];
// Grab all the guids from our incoming RSS feed data
let guids = msg.payload.feed.entry.map(a =&gt; a.id[0])
// Create an array to store new posts in to return at the end
let newPosts = [];

// For each of the guids check if our existing context data contains it already
// If not, add the guid to the existing and add the whole post to the newPosts array
guids.forEach((guid, i) =&gt; {
    if (!prev.includes(guid)) {
        prev.push(guid)
        newPosts.push(msg.payload.feed.entry[i]);
    }
});

// Make sure we update our context data store with our updated list of entries
flow.set(&quot;kubernetes_releases&quot;, prev);

// Return all the new posts
return {
    payload: newPosts
};
</code></pre>
<h3>5. Split</h3>
<p>Use a Split Node to split the array returned from the previous function into individual messages - set the node to split the array using a fixed length of 1.</p>
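<p>Conceptually (just a sketch, not the node's actual implementation), the Split node turns one message whose payload is an array into one message per element:</p>

```javascript
// What a Split node with a fixed length of 1 does, roughly: one
// incoming message with an array payload becomes N outgoing messages,
// each carrying a single element as its payload.
function split(msg) {
  return msg.payload.map(item => ({ ...msg, payload: item }));
}

const out = split({ topic: "releases", payload: ["v1.32.0", "v1.31.4"] });
// out[0].payload is "v1.32.0", out[1].payload is "v1.31.4"
```

<p>Each of those messages then flows through the template and posting steps independently.</p>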
<h3>6. Generate Post Message</h3>
<p>This will likely differ for each bot, as you'll want to build something that makes sense in the context of your bot.</p>
<p>The result of this function will also vary depending on the platform as different platforms require different values for authentication and similar. I'll cover these in more detail below when I look at each provider individually.</p>
<p>As an example, here's the function I use for the Mastodon release bot:</p>
<pre><code class="language-js">// Grab the title of the Release from the incoming RSS item.
// Depending on the feed structure this might be different.
let title = msg.payload.title[0];

// Handle some special cases where the release version contains &quot;rc&quot;
// as we want to make it clear that this is a Release Candidate release.
let variant = &quot;&quot;;
if (title.includes('rc')) {
  variant = &quot; Release Candidate&quot;;
}

// Build the post text
let postBody = `New Mastodon${variant} Release

:mastodon: ${title} :mastodon:

${msg.payload.link[0]['$'].href}

#Mastodon #Mastoadmin
`;

// Return the payload that will be passed into the Mastodon subflow
return {
  // The Mastodon instance the bot belongs to
  instance: &quot;mastodon.social&quot;,
  // The dev access token for the bot
  password: &quot;XXXXXXXXXXXXXXX&quot;,
  // The post text
  body: postBody
}
</code></pre>
<h3>7. Post to Platform</h3>
<p>This is different for each platform so we'll look at those individually below...</p>
<h2>Posting to Mastodon</h2>
<figure class="center" markdown="1">
<p><img src="/images/node-red-post-to-mastodon.png" alt="Node-RED - Post to Mastodon subflow"></p>
<figcaption>Reusable Subflow for posting a status update to Mastodon</figcaption>
</figure>
<p>Posting an update to Mastodon is fairly simple. You'll first need to generate a new application access key (Settings -&gt; Development -&gt; New application) and make sure it has the <code>write:statuses</code> scope. Make a note of the access token as we'll be using this in our payload.</p>
<p>This subflow expects 3 properties on the input message:</p>
<ul>
<li><code>instance</code> - This is the Mastodon instance the bot belongs to and to which the API request will be made.</li>
<li><code>password</code> - This is the access token of our development application.</li>
<li><code>body</code> - The actual text to post as the status update</li>
</ul>
<p>With these 3 values we then first format the API payload in a function node using the following JavaScript:</p>
<pre><code class="language-js">return {
  payload: {
    status: msg.body,
    visibility: &quot;public&quot;
  },
  url: `https://${msg.instance}/api/v1/statuses`,
  headers: {
    &quot;Authorization&quot;: &quot;Bearer &quot; + msg.password
  }
}
</code></pre>
<p>This is then passed to an http request node with the method set to <code>POST</code> which will perform the API call and then finally we output it to a debug node so we can check for any errors returned from the Mastodon API.</p>
<h2>Posting to Bluesky</h2>
<figure class="center" markdown="1">
<p><img src="/images/node-red-post-to-bluesky.png" alt="Node-RED - Post to Bluesky subflow"></p>
<figcaption>Reusable Subflow for posting a status update to Bluesky</figcaption>
</figure>
<p>The Bluesky API is somewhat more complex to work with than Mastodon's.</p>
<p>First off, we'll need to generate a new <a href="https://bsky.app/settings/app-passwords">App Password</a> to authenticate our bot (no need to grant direct message access).</p>
<p>We also need the <a href="https://en.wikipedia.org/wiki/Decentralized_identifier">Decentralized Identifier</a> (DID) of our bot user. There are a couple of ways to get the DID: you can either use the <a href="https://atproto-browser.vercel.app/">ATProto Browser</a> website and enter the link to the user profile, or you can make a curl request to get the profile details, e.g.:</p>
<pre><code class="language-sh">HANDLE='averagemarcus.bsky.social'
DID_URL=&quot;https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile&quot;
curl -G --silent --data-urlencode &quot;actor=$HANDLE&quot; &quot;$DID_URL&quot;
</code></pre>
<p>The DID will look something like <code>did:plc:mtepw4cvbmdvu7zygmm5xbop</code>.</p>
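<p>If you prefer, the same lookup can be done in JavaScript (a sketch assuming Node 18+ for the global <code>fetch</code>; <code>buildProfileUrl</code> and <code>resolveDid</code> are just illustrative names):</p>

```javascript
// Build the public getProfile request URL for a given handle
function buildProfileUrl(handle) {
  const url = new URL("https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile");
  url.searchParams.set("actor", handle);
  return url.toString();
}

// Fetch the profile and return just the DID
async function resolveDid(handle) {
  const res = await fetch(buildProfileUrl(handle));
  if (!res.ok) throw new Error(`getProfile failed: ${res.status}`);
  const profile = await res.json();
  return profile.did;
}
```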
<p>This subflow expects 3 properties on the input message:</p>
<ul>
<li><code>appPassword</code> - The password we just generated for our bot.</li>
<li><code>did</code> - The DID of our bot.</li>
<li><code>body</code> - The actual text to post as the status update</li>
</ul>
<h3>Creating an API session</h3>
<p>The first thing we need to do when working with the Bluesky API is to create a new session, using our DID and app password. In a function node we format these into the required payload:</p>
<pre><code class="language-js">msg.payload = {
  &quot;identifier&quot;: msg.did,
  &quot;password&quot;: msg.appPassword
};

return msg;
</code></pre>
<p>We then pass this to an http request node with the following values set:</p>
<ul>
<li>Method = <code>POST</code></li>
<li>URL = <code>https://bsky.social/xrpc/com.atproto.server.createSession</code></li>
<li>Headers = <code>Content-Type: application/json</code></li>
</ul>
<p>And finally have this node output to a json node to parse the response into the payload for us to make use of in the next stage.</p>
<h3>Building the post message</h3>
<p>This is where things get <em>really</em> complicated. The AT Protocol isn't as user-friendly as the Mastodon API. You can't just pass it a block of text and have it turn that into a nicely processed status post. Instead, you need to send your text along with all the details of the hashtags, links, media and mentions that are contained in your post.</p>
<p>To be able to do this, we need a couple of extra node modules to be made available to our Node-RED setup. In your Node-RED config directory make sure the following packages are also installed:</p>
<ul>
<li><code>node-fetch</code></li>
<li><code>@atproto/api</code></li>
</ul>
<p>Now, in our <code>Format post</code> function node we need to reference these two packages in our &quot;Setup&quot; tab:</p>
<figure class="center" markdown="1">
<p><img src="/images/node-red-bluesky-format-post-setup.png" alt="Node-RED - Format Bluesky post setup tab"></p>
<figcaption>Make sure both modules are made available to the function node</figcaption>
</figure>
<p>The function itself then looks something like this:</p>
<pre><code class="language-js">const getBlueskyAgent = async () =&gt; {
  const agent = new atproto.AtpAgent({
    service: &quot;https://bsky.social&quot;,
  })

  await agent.login({
    identifier: msg.did,
    password: msg.appPassword,
  })

  return agent
}

const getUrlMetadata = async (url) =&gt; {
  const req = await fetch(`https://cardyb.bsky.app/v1/extract?url=${url}`)
  const metadata = await req.json()

  return metadata
}

const getBlueskyEmbedCard = async (url, agent) =&gt; {
  if (!url) return

  try {
    const metadata = await getUrlMetadata(url)
    const blob = await fetch(metadata.image.replaceAll('&amp;', '%26')).then(r =&gt; r.blob())
    const { data } = await agent.uploadBlob(blob, { encoding: &quot;image/jpeg&quot; })

    return {
      $type: &quot;app.bsky.embed.external&quot;,
      external: {
        uri: url,
        title: metadata.title,
        description: metadata.description,
        thumb: data.blob,
      },
    }
  } catch (error) {
    console.error(&quot;Error fetching embed card:&quot;, error)
    return
  }
}


const agent = await getBlueskyAgent()
const rt = new atproto.RichText({ text: msg.body })
await rt.detectFacets(agent)

let url = rt.facets.find(f =&gt; f.features &amp;&amp; f.features.length &amp;&amp; f.features[0].uri);
if (url) {
  url = url.features[0].uri
}

return {
  payload: {
    &quot;collection&quot;: &quot;app.bsky.feed.post&quot;,
    &quot;repo&quot;: msg.did,
    &quot;record&quot;: {
      &quot;createdAt&quot;: (new Date),
      &quot;$type&quot;: &quot;app.bsky.feed.post&quot;,
      text: rt.text,
      facets: rt.facets,
      embed: await getBlueskyEmbedCard(url, agent),
    }
  },
  headers: {
    &quot;Content-Type&quot;: &quot;application/json&quot;,
    &quot;Authorization&quot;: &quot;Bearer &quot; + msg.payload.accessJwt
  }
};
</code></pre>
<p>There's a lot going on here so I'm going to break it down...</p>
<p>First we have a function that generates a new Bluesky client for us to use to handle most of the actual work. We use our DID and app password as the login credentials to be able to perform any actions that need authorisation.</p>
<pre><code class="language-js">const getBlueskyAgent = async () =&gt; {
  const agent = new atproto.AtpAgent({
    service: &quot;https://bsky.social&quot;,
  })

  await agent.login({
    identifier: msg.did,
    password: msg.appPassword,
  })

  return agent
}
</code></pre>
<p>Next we have a couple of helper functions for URL previews. The first calls a service that generates metadata about a URL:</p>
<pre><code class="language-js">const getUrlMetadata = async (url) =&gt; {
  const req = await fetch(`https://cardyb.bsky.app/v1/extract?url=${url}`)
  const metadata = await req.json()

  return metadata
}
</code></pre>
<p>This returns something along the lines of the following:</p>
<pre><code class="language-json">{
  &quot;error&quot;: &quot;&quot;,
  &quot;likely_type&quot;: &quot;html&quot;,
  &quot;url&quot;: &quot;https://marcusnoble.co.uk/index.html&quot;,
  &quot;title&quot;: &quot;Blog&quot;,
  &quot;description&quot;: &quot;The blog of Marcus Noble, self-described tinkerer, platform engineer and all round average guy!&quot;,
  &quot;image&quot;: &quot;https://cardyb.bsky.app/v1/image?url=https%3A%2F%2Fopengraph.cluster.fun%2Fopengraph%2F%3FsiteTitle%3DMarcus%252BNoble%26title%3DBlog%26tags%3D%26image%3Dhttps%253A%252F%252Fmarcusnoble.co.uk%252Fimages%252Fmarcus.jpg%26bluesky%3D%2540averagemarcus.bsky.social%26fediverse%3D%2540marcus%2540k8s.social%26github%3DAverageMarcus%26website%3Dwww.MarcusNoble.co.uk%26bgColor%3D%2523ffffff%26fgColor%3D%2523263943&quot;
}
</code></pre>
<p>This output is used in the next helper function to get our preview card details:</p>
<pre><code class="language-js">const getBlueskyEmbedCard = async (url, agent) =&gt; {
  if (!url) return

  try {
    const metadata = await getUrlMetadata(url)
    const blob = await fetch(metadata.image.replaceAll('&amp;', '%26')).then(r =&gt; r.blob())
    const { data } = await agent.uploadBlob(blob, { encoding: &quot;image/jpeg&quot; })

    return {
      $type: &quot;app.bsky.embed.external&quot;,
      external: {
        uri: url,
        title: metadata.title,
        description: metadata.description,
        thumb: data.blob,
      },
    }
  } catch (error) {
    console.error(&quot;Error fetching embed card:&quot;, error)
    return
  }
}
</code></pre>
<p>This builds an &quot;embed&quot; record for us to use with our post and will cause Bluesky clients to show a preview card for a URL contained in our post message.</p>
<p>With the helper functions defined we can start building our message. First off, we initialise a new client and parse our message body for any &quot;facets&quot;. In the AT Protocol, &quot;facets&quot; are things like hashtags, URLs and mentions - the things that end up being links when rendered. We need to define these manually in our payload to the API, so the <code>@atproto/api</code> module helps us generate these.</p>
<pre><code class="language-js">const agent = await getBlueskyAgent()
const rt = new atproto.RichText({ text: msg.body })
await rt.detectFacets(agent)
</code></pre>
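<p>To give a feel for what <code>detectFacets</code> is doing for us, here's a hand-rolled sketch that builds a single link facet. The important detail is that facet offsets are UTF-8 <em>byte</em> positions, not JavaScript string indices - they diverge as soon as the text contains emoji or other multi-byte characters, which is exactly the fiddly work the module takes care of:</p>

```javascript
// Build a single link facet by hand (illustrative only - detectFacets
// handles links, hashtags and mentions for us).
function buildLinkFacet(text) {
  const match = text.match(/https?:\/\/\S+/);
  if (!match) return null;

  // Facet offsets are byte positions in the UTF-8 encoded text
  const encoder = new TextEncoder();
  const byteStart = encoder.encode(text.slice(0, match.index)).length;
  const byteEnd = byteStart + encoder.encode(match[0]).length;

  return {
    index: { byteStart, byteEnd },
    features: [{ $type: "app.bsky.richtext.facet#link", uri: match[0] }],
  };
}

const facet = buildLinkFacet("Read more: https://example.com");
// facet.index is { byteStart: 11, byteEnd: 30 }
```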
<p>With the facets generated we then want to check if they contain any URLs and if so grab the first one found. We will use this to generate the preview card using our helper functions above.</p>
<pre><code class="language-js">let url = rt.facets.find(f =&gt; f.features &amp;&amp; f.features.length &amp;&amp; f.features[0].uri);
if (url) {
  url = url.features[0].uri
}
</code></pre>
<p>Finally we build our actual API payload. There's a few things going on here that I don't want to go into <em>too</em> much detail about but if you're interested then take a look at the <a href="https://docs.bsky.app/docs/advanced-guides/posts">Bluesky docs</a> or the <a href="https://atproto.blue/en/latest/atproto/atproto_client.models.app.bsky.feed.post.html">ATProto SDK docs</a>.</p>
<pre><code class="language-js">return {
  payload: {
    &quot;collection&quot;: &quot;app.bsky.feed.post&quot;,
    &quot;repo&quot;: msg.did,
    &quot;record&quot;: {
      &quot;createdAt&quot;: (new Date),
      &quot;$type&quot;: &quot;app.bsky.feed.post&quot;,
      text: rt.text,
      facets: rt.facets,
      embed: await getBlueskyEmbedCard(url, agent),
    }
  },
  headers: {
    &quot;Content-Type&quot;: &quot;application/json&quot;,
    &quot;Authorization&quot;: &quot;Bearer &quot; + msg.payload.accessJwt
  }
};
</code></pre>
<p>The main thing to be aware of here is that the &quot;collection&quot; value of <code>app.bsky.feed.post</code> tells the API that this is a status post, the &quot;repo&quot; is our bot user and the &quot;record&quot; contains our post text, the &quot;facets&quot; we generated with the <code>@atproto/api</code> module and an &quot;embed&quot; containing details of a URL preview card if a URL was found in our message.</p>
<p>Finally, the <code>Authorization</code> header makes use of the <code>accessJwt</code> from the session we created in the previous stage.</p>
<p>With the message now, finally, built we can pass it on to an http request node that sends a <code>POST</code> request to <code>https://bsky.social/xrpc/com.atproto.repo.createRecord</code> and then sends the returned output to a debug node so we can check for errors.</p>
<h2>Posting to LinkedIn</h2>
<figure class="center" markdown="1">
<p><img src="/images/node-red-post-to-linkedin.png" alt="Node-RED - Post to LinkedIn subflow"></p>
<figcaption>Reusable Subflow for posting a status update to LinkedIn</figcaption>
</figure>
<p>I actually wrote about this last week - <a href="https://marcusnoble.co.uk/2025-02-02-posting-to-linkedin-via-the-api/">Posting to LinkedIn via the API</a>. The Node-RED implementation is just formatting the payload and then sending the HTTP request to the LinkedIn API.</p>
<pre><code class="language-js">let body = msg.body;
let urn = msg.urn;
let access_token = msg.access_token;

return {
  payload: {
    &quot;author&quot;: &quot;urn:li:person:&quot;+urn,
    &quot;lifecycleState&quot;: &quot;PUBLISHED&quot;,
    &quot;specificContent&quot;: {
      &quot;com.linkedin.ugc.ShareContent&quot;: {
        &quot;shareCommentary&quot;: {
          &quot;text&quot;: body
        },
        &quot;shareMediaCategory&quot;: &quot;ARTICLE&quot;,
        &quot;media&quot;: [
          {
            &quot;status&quot;: &quot;READY&quot;,
            &quot;originalUrl&quot;: msg.link
          }
        ]
      }
    },
    &quot;visibility&quot;: { &quot;com.linkedin.ugc.MemberNetworkVisibility&quot;: &quot;PUBLIC&quot; }
  },
  url: `https://api.linkedin.com/v2/ugcPosts`,
  headers: {
    &quot;Authorization&quot;: &quot;Bearer &quot; + access_token,
    &quot;LinkedIn-Version&quot;: &quot;202210&quot;,
    &quot;X-Restli-Protocol-Version&quot;: &quot;2.0.0&quot;
  }
}
</code></pre>
<h2>Twitter</h2>
<p>It's dead. The API isn't accessible anymore. Move on.</p>
<h2>✨ Bonus ✨ - Archiving to Wayback Machine</h2>
<p>As several of my bots are triggered from new posts being created on either this blog or new newsletter posts on <a href="https://cloud.native.now">CloudNative.Now</a> I realised I could leverage the same flow to also ensure that the <a href="https://web.archive.org/">Wayback Machine</a> on <a href="https://archive.org">Archive.org</a> grabs a copy of the new page to cache it.</p>
<figure class="center" markdown="1">
<p><img src="/images/node-red-blog-flow.png" alt="Node-RED - Flow used for new blog posts"></p>
<figcaption>In addition to sending to the social media subflows I also send to a Wayback Machine subflow</figcaption>
</figure>
<p>The subflow to handle the Wayback Machine calls isn't too complex:</p>
<figure class="center" markdown="1">
<p><img src="/images/node-red-wayback-machine.png" alt="Node-RED - Wayback Machine subflow"></p>
<figcaption>Subflow to cache page to Wayback Machine</figcaption>
</figure>
<p>The subflow splits the incoming call into two flows - one that caches the new page and another that re-caches the homepage as this is also updated with the new post included.</p>
<p>Formatting the URL for the new page is <em>very</em> simple: we just take the URL of the page from the incoming message and prepend the Wayback Machine save URL - <code>https://web.archive.org/save/</code>:</p>
<pre><code class="language-js">return {
    url: `https://web.archive.org/save/${msg.payload.link[0]}`
};
</code></pre>
<p>This is then output to an http request node that calls the Wayback Machine save URL with our new page URL, triggering it to create a new cached copy for us.</p>
<p>The homepage flow is <em>slightly</em> more work; we just need to derive the homepage link from the new post URL:</p>
<pre><code class="language-js">let link = msg.payload.link[0].replace('https://', '').split('/')[0]

return {
    url: `https://web.archive.org/save/https://${link}`
};
</code></pre>
<p>And then again pass this on to the http request node to trigger the cache process.</p>
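<p>As an aside, the same homepage derivation can be done with the URL API instead of string splitting - a small sketch (behaviour should match for regular http(s) links; <code>homepageSaveUrl</code> is just an illustrative name):</p>

```javascript
// Derive the Wayback Machine save URL for the homepage of a post.
// URL's `origin` gives us scheme + host in one go.
function homepageSaveUrl(postUrl) {
  const homepage = new URL(postUrl).origin;
  return `https://web.archive.org/save/${homepage}`;
}

const saveUrl = homepageSaveUrl("https://marcusnoble.co.uk/2025-02-08-building-social-media-bots-with-node-red");
// saveUrl is "https://web.archive.org/save/https://marcusnoble.co.uk"
```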
<h2>Wrap Up</h2>
<p>Hopefully this is helpful to some of y'all. Even if you don't end up using Node-RED for your bots there should be enough info here on how to make use of the different APIs to help you along.</p>
<p>I'd love to hear about what bots y'all are making - let me know on <a href="https://bsky.app/profile/averagemarcus.bsky.social">Bluesky</a> or <a href="https://k8s.social/@Marcus">Mastodon</a>!</p>
]]></description><link>https://marcusnoble.co.uk/2025-02-08-building-social-media-bots-with-node-red</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-02-08-building-social-media-bots-with-node-red</guid><pubDate>Sat, 08 Feb 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Posting to LinkedIn via the API]]></title><description><![CDATA[<p>I have a habit of automating as much of my life as I can. As part of that I have some small automations that handle posting to social media automatically when I publish a new blog post. (More on this in a future post). In the past this was Twitter but that is dead so these days I post to my Mastodon and Bluesky accounts automatically whenever a new blog post is available on my website. Maybe that's how you ended up here today! One platform I had been avoiding for a long time was LinkedIn. I had the impression that they didn't have a free, personal API available for users to use to post status updates. Well, it turns out I was mistaken and it's actually not <em>too</em> difficult to set up; there's just a LOT of outdated information out there. So let's fix this...</p>
<h2>Creating a new App</h2>
<p>I'm going to assume you already have a LinkedIn account; if not, you will need to create one or this whole process is pretty pointless. 😅 Once logged in, head over to <a href="https://www.linkedin.com/developers/apps/new"><code>https://www.linkedin.com/developers/apps/new</code></a> and fill out the details.</p>
<blockquote>
<p>Note: For this you require a &quot;company page&quot; that the app will be associated with. You can <a href="https://www.linkedin.com/company/setup/new">create one</a> easily enough to act as a placeholder for the app.</p>
</blockquote>
<p>Once your app is created the first thing you'll need to do is head to the &quot;Settings&quot; tab and follow the process to verify your app. Not sure why this isn't part of the app creation flow but whatever. 🤷</p>
<p>With your app verified you now need to enable the &quot;products&quot; you'll be using, so switch to the &quot;Products&quot; tab. For sharing posts to our LinkedIn profile we're going to need to enable the following:</p>
<ul>
<li>Share on LinkedIn</li>
<li>Sign In with LinkedIn using OpenID Connect</li>
</ul>
<p>Finally, switch over to the &quot;Auth&quot; tab and scroll down to the &quot;OAuth 2.0 scopes&quot; section. You should hopefully see something similar to the following:</p>
<figure class="center" markdown="1">
<p><img src="/images/linkedin-scopes.png" alt="A screenshot of the OAuth 2.0 scopes section showing the following scopes listed: openid, profile, w_member_social and email"></p>
<figcaption>Expected LinkedIn app scopes</figcaption>
</figure>
<h2>Creating an Access Token</h2>
<p>Now that we have an app created we need to generate a new access token to use when making API calls. Thankfully, this is now very simple as LinkedIn provide a tool to generate a token, whereas previously users were required to run a server to handle callbacks, etc.</p>
<p>Navigate to <a href="https://www.linkedin.com/developers/tools/oauth/token-generator"><code>https://www.linkedin.com/developers/tools/oauth/token-generator</code></a>, select your app from the dropdown, check all the scopes available and click the &quot;Request access token&quot; button.</p>
<p>You will likely need to log in again here but once done you should be presented with an access token that will expire in 2 months.</p>
<p>Keep this safe and DO NOT SHARE IT.</p>
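If you plan to script against the API, one option is to keep the token in a file with tight permissions and source it, rather than pasting it into commands or leaving it in your shell history. Just a sketch - the file path and placeholder value here are arbitrary:

```shell
# Store the access token in a file only you can read, then source it
# (the path and placeholder value are arbitrary examples)
TOKEN_FILE="$(mktemp -d)/linkedin.env"
printf 'ACCESS_TOKEN=%s\n' 'paste-your-token-here' > "$TOKEN_FILE"
chmod 600 "$TOKEN_FILE"
. "$TOKEN_FILE"
echo "Token loaded"
```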
<h2>Using the API</h2>
<p>Now that we have an access token we can finally make calls to the API!</p>
<p>First, let's get our own user profile details, as we're going to need our <a href="https://www.ietf.org/rfc/rfc2141.txt">URN</a> to use when posting status updates.</p>
<pre><code class="language-shell">curl --silent -H &quot;Authorization: Bearer ${ACCESS_TOKEN}&quot; \
  &quot;https://api.linkedin.com/v2/userinfo&quot;

{
    &quot;sub&quot;: &quot;XXXXXXX&quot;,
    &quot;email_verified&quot;: true,
    &quot;name&quot;: &quot;☁️Marcus Noble&quot;,
    &quot;locale&quot;: {
        &quot;country&quot;: &quot;US&quot;,
        &quot;language&quot;: &quot;en&quot;
    },
    &quot;given_name&quot;: &quot;☁️Marcus&quot;,
    &quot;family_name&quot;: &quot;Noble&quot;,
    &quot;email&quot;: &quot;&quot;,
    &quot;picture&quot;: &quot;&quot;
}
</code></pre>
<p>Replace the <code>${ACCESS_TOKEN}</code> with your newly generated token. The response you get back should look similar to the above JSON but obviously with your LinkedIn details. Make a note of the <code>sub</code> value as we'll be using this next.</p>
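If you're scripting this, <code>jq</code> can pull the <code>sub</code> value out for you (a sketch assuming <code>jq</code> is installed; the response is mocked here with the example JSON rather than a live API call):

```shell
# Extract the URN ("sub") from the userinfo response with jq.
# A real call would pipe the curl output from above into jq;
# here we mock the response with the example JSON.
RESPONSE='{"sub": "XXXXXXX", "email_verified": true, "name": "Marcus Noble"}'
URN=$(printf '%s' "$RESPONSE" | jq -r '.sub')
echo "$URN"
# → XXXXXXX
```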
<p>To post a status update to your profile:</p>
<pre><code class="language-shell">ACCESS_TOKEN=&quot;xxx&quot;
URN=&quot;xxx&quot;
POST_BODY=&quot;This is a test post&quot;

curl -X POST \
  -H &quot;LinkedIn-Version: 202210&quot; \
  -H &quot;X-Restli-Protocol-Version: 2.0.0&quot; \
  -H &quot;Authorization: Bearer ${ACCESS_TOKEN}&quot; \
  --data '{&quot;author&quot;: &quot;urn:li:person:'${URN}'&quot;,&quot;lifecycleState&quot;: &quot;PUBLISHED&quot;,&quot;specificContent&quot;: {&quot;com.linkedin.ugc.ShareContent&quot;: {&quot;shareCommentary&quot;: {&quot;text&quot;: &quot;'${POST_BODY}'&quot;},&quot;shareMediaCategory&quot;: &quot;NONE&quot;}},&quot;visibility&quot;: {&quot;com.linkedin.ugc.MemberNetworkVisibility&quot;: &quot;PUBLIC&quot;}}' \
  &quot;https://api.linkedin.com/v2/ugcPosts&quot;
</code></pre>
<p>Fill in the environment variables with your appropriate values and run the curl command to post a new status update to your profile.</p>
<blockquote>
<p>Note: If you include any <code>&quot;</code> in your post body you will need to escape them or it will cause the JSON to be invalid.</p>
</blockquote>
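Rather than escaping quotes by hand, you can let <code>jq</code> build the JSON body for you so any special characters are escaped automatically. A sketch, assuming <code>jq</code> is installed (the URN value is a placeholder):

```shell
# Build the ugcPosts request body with jq so quotes and newlines in the
# post text are escaped automatically (the URN here is a placeholder)
URN="XXXXXXX"
POST_BODY='A post with "quotes" in it'
DATA=$(jq -n --arg author "urn:li:person:${URN}" --arg text "$POST_BODY" '{
  author: $author,
  lifecycleState: "PUBLISHED",
  specificContent: {"com.linkedin.ugc.ShareContent": {shareCommentary: {text: $text}, shareMediaCategory: "NONE"}},
  visibility: {"com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"}
}')
echo "$DATA"
```

You can then pass this to the earlier curl command with <code>--data "$DATA"</code>.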
<p>For full details on what is available when sharing to LinkedIn check out the <a href="https://learn.microsoft.com/en-gb/linkedin/consumer/integrations/self-serve/share-on-linkedin">official documentation</a>.</p>
]]></description><link>https://marcusnoble.co.uk/2025-02-02-posting-to-linkedin-via-the-api</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-02-02-posting-to-linkedin-via-the-api</guid><pubDate>Sun, 02 Feb 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing Cloud Native Now]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2025-01-17: Added details about Cloudflare</p>
</details>
<p>Last week I <a href="https://bsky.app/profile/did:plc:mtepw4cvbmdvu7zygmm5xbop/post/3lf5iccpxj22u">announced</a> the launch of a new project I've been working on - ✨ <a href="https://cloudnative.now/">Cloud Native Now</a> ✨ - a new monthly newsletter that will provide a roundup of all the happenings in the cloud native world. This newsletter is my attempt at keeping myself, and others, up-to-date on all the latest news, tools and events happening in the cloud native world. A new issue will be published each month on the last Friday of that month and contain a roundup of articles, announcements, tools, tutorials, events and CFPs relating to cloud native technologies and the community.</p>
<figure class="center" markdown="1">
<p><img src="/images/Cloud_Native_Now_-_Square.jpg" alt="The Cloud Native Now logo"></p>
</figure>
<p>I hope y'all will <a href="https://cloudnative.now/about/#/portal/signup">subscribe to the email newsletter</a> or add the <a href="https://cloudnative.now/rss/">RSS feed</a> to your favourite feed reader to make sure you don’t miss anything! Or, if you prefer, you'll be able to view the entire <a href="https://cloudnative.now/archive/">archive</a> directly on the website.</p>
<p>There is also a <a href="https://bsky.app/profile/cloudnative.now?ref=cloudnative.now">Bluesky</a> and <a href="https://k8s.social/@CloudNativeNow?ref=cloudnative.now">Mastodon</a> account if you prefer to follow those. If you have any suggestions for items to include or any improvements / changes that you think would be good then please do reach out. I'd love to hear from you!</p>
<h2>Technical Details</h2>
<p>Now, I'm sure y'all are interested in all those tasty technical details!</p>
<h3>Infrastructure</h3>
<p>There's nothing too wild going on, the majority of things are running on a <a href="https://www.civo.com/">Civo</a> Kubernetes cluster along with a MySQL database and object store. The <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/tree/main/infra">infrastructure</a> is managed using <a href="https://opentofu.org/">OpenTofu</a> and applied by a <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/blob/main/.github/workflows/opentofu.yaml">GitHub action</a> when changes are merged into the main branch.</p>
<p>OpenTofu also <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/blob/611eb9c2350bca125e1200f77f2fdd4977cf5386/infra/main.tf#L164-L167">handles</a> installing <a href="https://fluxcd.io/">Flux</a> into the cluster which points back to the same repo and handles managing Kubernetes resources via GitOps. The setup is mostly the same as outlined in my previous post - <a href="https://marcusnoble.co.uk/2025-01-03-bootstrapping-a-civo-cluster-with-opentofu-and-flux/">Bootstrapping a Civo cluster with OpenTofu and Flux</a>. Along with installing Flux into the cluster, OpenTofu also creates a few <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/blob/611eb9c2350bca125e1200f77f2fdd4977cf5386/infra/main.tf#L169-L213">Kubernetes Secrets</a> containing various credentials that will be needed by applications (Civo creds for cluster-autoscaler, database credentials, object store credentials).</p>
<p>Flux is then setup to handle some <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/blob/611eb9c2350bca125e1200f77f2fdd4977cf5386/flux/kube-system/kustomization.yaml">core applications</a> (sealed-secrets, metrics server, etc.) and then applies some extra Kustomizations that handles installing <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/tree/611eb9c2350bca125e1200f77f2fdd4977cf5386/apps">apps</a> and <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/tree/611eb9c2350bca125e1200f77f2fdd4977cf5386/extra-config">extra-config</a> (these are things that require apps to be installed first so have a dependency).</p>
<h3>Applications</h3>
<p>There aren't many applications that make up Cloud Native Now; the main one is a self-hosted <a href="https://ghost.org/">Ghost</a>-powered site that handles the website, newsletter and RSS feed. This provides a very nice content management system for creating posts with very little effort. For the actual <em>sending</em> of the newsletter, <a href="https://www.mailgun.com/">Mailgun</a> is the service of choice here. The posts and content for Ghost are all stored in a Civo-managed MySQL database and object store. For the theme, I forked the <a href="https://github.com/TryGhost/Alto">Alto</a> theme and applied some custom tweaks in my <a href="https://github.com/NamelessPlanet/CloudNativeNow-Theme">own repo</a>.</p>
<p>Within the cluster there is also <a href="https://github.com/kubernetes/ingress-nginx/">ingress-nginx</a> (for ingress), <a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler">cluster-autoscaler</a> (for dynamic worker node sizing) and <a href="https://github.com/cert-manager/cert-manager">cert-manager</a> (for SSL certificates).</p>
<p>Backups are very basic right now - a couple of <a href="https://github.com/NamelessPlanet/CloudNativeNow-Gitops/blob/main/apps/ghost-backup.yaml">CronJobs</a> that run regularly to perform a MySQL dump and a copy of the object store content into another backup object store (with some versioning in place).</p>
<p>To handle the collation of entries for the newsletter throughout the month I've put together a custom <a href="https://www.getgrist.com/">Grist</a> project that I use to record links and data as I find them. Unfortunately, this isn't public and I don't have anything to share. To make this easier to work with I built a custom <a href="https://github.com/NamelessPlanet/url-to-grist">url-to-grist</a> tool that allows me to share a URL with it and it'll automatically import it into Grist and attempt to grab some metadata about the page to help with putting together the newsletter.</p>
<p>I'm also re-using my existing <a href="https://nodered.org/">Node-RED</a> setup to have new posts (based on the RSS feed) automatically posted to both the Mastodon and Bluesky accounts. Again, this is unfortunately not public but I'd be happy to chat about it if anyone is interested.</p>
<p>One area that still needs improvement is monitoring. Currently there is some <a href="https://github.com/prometheus/blackbox_exporter">blackbox</a> monitoring from my local infrastructure to ensure pages are reachable and I have some (temporary) <a href="https://github.com/prometheus/mysqld_exporter">MySQL monitoring</a> to keep an eye on the state of the database. This is an area I'll be focussing more on over the next few months to ensure uptime is maintained as much as possible.</p>
<h3>Issues along the way</h3>
<p>It hasn't been all smooth sailing so far. I had some trial-and-error with OpenTofu and getting the infra set up how I wanted. This was <em>mostly</em> due to me having not worked with it for a long time and just bumbling through it.</p>
<p>One major issue I have experienced a few times over the past week or so is <a href="https://bsky.app/profile/cloudnative.now/post/3lfcv22dm722t">database connectivity problems</a>. It's still unclear exactly what happened and I've been working with Civo support to try and figure this out but it looks like somehow the database was being overwhelmed. I've now got monitoring in place to try and catch this happening again but so far it hasn't occurred again.</p>
<p>It looks like I had also misconfigured the email sending at some point between launching and publishing this blog post. 😅 This is now fixed and sign-ups should be working again.</p>
<h3>The Future</h3>
<p>There's several things I already have planned as well as some vague ideas for things I might do, depending on how popular this ends up becoming.</p>
<p>Besides improving the monitoring as mentioned above <s>I plan to also implement <a href="https://www.cloudflare.com/">Cloudflare</a> as a cache in front of the website to ensure the database doesn't get overloaded too much</s> (see below).</p>
<p>Eventually, after a couple of issues or so to get a feel for what people want, I'm hoping to also set up an accompanying podcast where I talk with a different guest each month about all the happenings in that month's newsletter. I want to use this as a way to talk with all the ✨ wonderful ✨ humans in our cloud native community and get a wide and diverse array of thoughts and opinions to complement (and maybe contrast) with my own. I'm really hoping to not have this just become &quot;another white guy with a podcast&quot; but something that can really showcase the amazing individuals and efforts within our community.</p>
<p>If things go well and this ends up taking off, I have some loose plans on how to cover the costs as things grow. I guarantee that all newsletter editions will always be freely available without an account but I'm hoping to later introduce a paid membership that comes with some extra goodies. Not sure what <em>exactly</em> yet but I'm thinking along the lines of being able to add comments to the posts, the warm feeling of helping keep things running and <em>maybe</em> some physical rewards (postcards, stickers, etc.) if I can figure out how to send to the EU. 😅 There's also a possibility of sponsored posts and job listings in the future but I need to come up with some requirements that these would need to meet first to ensure quality.</p>
<p>And who knows... maybe I'll go wild and look at putting together that <a href="https://bsky.app/profile/salisburyheavyindustries.com/post/3lf5j56yo5k2w">conference</a> I've always wanted to!</p>
<h3>Updates</h3>
<p>I have now switched to using <a href="https://www.cloudflare.com/">Cloudflare</a> for the nameserver and proxying requests through their edge servers with a 2 hour cache added to most of the site. This should allow the site to remain up, in a read-only state, during any periods where the database (or anything else) is having trouble. I did have a small issue when setting this up that resulted in infinite redirects - turns out I needed to switch from &quot;Flexible&quot; to &quot;Full&quot; in the Cloudflare SSL configuration, otherwise nginx kept trying to redirect to the SSL port as Cloudflare was sending traffic to port 80.</p>
<p>For those interested, this is the specific expression I'm using for the caching in Cloudflare with an Edge TTL of 2 hours:</p>
<pre><code>(http.request.full_uri wildcard &quot;https://cloudnative.now/*&quot; and not starts_with(http.request.uri.path, &quot;/ghost/&quot;))
or
(http.request.full_uri wildcard &quot;https://www.cloudnative.now/*&quot; and not starts_with(http.request.uri.path, &quot;/ghost/&quot;))
</code></pre>
]]></description><link>https://marcusnoble.co.uk/2025-01-14-announcing-cloud-native-now</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-01-14-announcing-cloud-native-now</guid><pubDate>Tue, 14 Jan 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Bootstrapping a Civo cluster with OpenTofu and Flux]]></title><description><![CDATA[<p>As part of a side project I'm currently working on I needed to spin up a new Kubernetes cluster that I could manage via GitOps. I decided to take this opportunity to take a look at <a href="https://opentofu.org/">OpenTofu</a> and see how it handles as it's been several years now since I last used Terraform. My plan was to use OpenTofu to scaffold a fairly basic <a href="https://www.civo.com/">Civo</a> Kubernetes cluster and then use <a href="https://fluxcd.io/">Flux</a> to handle installing workloads into the cluster. It took me a little trial-and-error so I thought I'd write up my final setup to help others avoid the issues and to help my future self when I come to do this again in a couple years!</p>
<p>If you'd rather skip this blog post and just see the final result you can find all the code in the repo <a href="https://github.com/AverageMarcus/bootstrap-civo-cluster-blog-post-example">bootstrap-civo-cluster-blog-post-example</a>.</p>
<h2>Pre-Requisites</h2>
<p>For the purposes of this post the following are assumed:</p>
<ul>
<li>You have an account with Civo (if not, you <a href="https://dashboard.civo.com/signup">can sign up now</a> and get <strong>$250</strong> credit to play with)</li>
<li>You have your <a href="https://dashboard.civo.com/security">Civo API Token</a> set as the <code>CIVO_TOKEN</code> environment variable in your terminal</li>
<li>You have the OpenTofu CLI <a href="https://opentofu.org/docs/intro/install/">installed</a></li>
<li>You have a GitHub repo created to contain all your GitOps source (this can be private, if you prefer)</li>
<li>A <a href="https://github.com/settings/tokens">GitHub personal access token</a></li>
</ul>
<h2>Repo Structure</h2>
<p>First we'll set up our repo file structure so that we have the directories we need for all of our different GitOps resources.</p>
<p>From the root of your new repo:</p>
<pre><code class="language-sh">mkdir -p {apps,flux,infra}
</code></pre>
<p>You should end up with something like the following:</p>
<pre><code>📂 .
├── 📂 apps
├── 📂 flux
├── 📂 infra
</code></pre>
<h2>Infrastructure</h2>
<p>The first thing we'll want to do is get our infrastructure set up using OpenTofu. All our infra code will live within our new <code>infra</code> directory so let's go ahead and create the files we'll be working with:</p>
<pre><code class="language-sh">cd infra
touch main.tf # This is our main file containing our resources
touch providers.tf # This contains the config for the providers we will be using
touch variables.tf # This will describe any input variables we plan to use
touch output.tf # This contains all the output variables we will generate from our new infrastructure
</code></pre>
<p>With our new project ready, let's start setting up the providers we will be using. For this we will be making use of two providers: <a href="https://registry.terraform.io/providers/civo/civo/latest/docs">Civo</a> and <a href="https://registry.terraform.io/providers/fluxcd/flux/latest/docs">Flux</a>. In our <code>providers.tf</code> let's define each of these providers (along with their current versions at the time of writing).</p>
<pre><code class="language-tf">terraform {
  required_providers {
    civo = {
      source  = &quot;civo/civo&quot;
      version = &quot;&gt;= 1.1.3&quot;
    }
    flux = {
      source  = &quot;fluxcd/flux&quot;
      version = &quot;&gt;= 1.2&quot;
    }
  }
}
</code></pre>
<p>We also need to configure each of the providers with various details. For now we're just going to add placeholders as they will reference some resources we haven't yet created so we'll come back to fill these in later. For now, add the following placeholders to the bottom of your <code>providers.tf</code> file:</p>
<pre><code class="language-tf">provider &quot;civo&quot; {
}

provider &quot;flux&quot; {
}
</code></pre>
<p>With our providers defined we can now initialize our project and have the providers downloaded so we can start working.</p>
<pre><code class="language-sh">tofu init
</code></pre>
<p>Once this is done you'll notice some new files have been created in our <code>infra</code> directory:</p>
<pre><code>📂 .
├── 📂 .terraform
│  └── 📂 providers
├── 📄 .terraform.lock.hcl
├── 📄 main.tf
├── 📄 output.tf
├── 📄 providers.tf
└── 📄 variables.tf
</code></pre>
<p>The <code>.terraform.lock.hcl</code> is our <a href="https://opentofu.org/docs/language/files/dependency-lock/">dependency lock file</a> that we will want to commit into git, but the <code>.terraform</code> directory contains the downloaded providers and ideally we would leave that out of our repo and generate it as needed. So let's take a moment to create a <code>.gitignore</code> in the root of our repo to ensure we are only going to commit the files we care about.</p>
<pre><code>**/.terraform/*
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
.terraform.tfstate.lock.info
.terraformrc
terraform.rc
</code></pre>
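If you want to double-check these rules behave as expected, <code>git check-ignore</code> will report which pattern matches a given path. A quick sanity check in a throwaway repo (just a sketch using a subset of the rules above):

```shell
# Sanity-check the ignore rules in a throwaway repo; the paths passed to
# check-ignore don't need to exist for the patterns to be evaluated
TMP=$(mktemp -d) && cd "$TMP" && git init -q .
cat > .gitignore <<'EOF'
**/.terraform/*
*.tfstate
*.tfstate.*
EOF
git check-ignore -v infra/terraform.tfstate infra/.terraform/providers
```

The <code>-v</code> flag prints the matching pattern alongside each path, which is handy when a rule isn't doing what you expect.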
<p>Great! Now let's start creating some infrastructure!</p>
<p>Let's move over to our <code>main.tf</code> where we're going to define three resources - a <a href="https://registry.terraform.io/providers/civo/civo/latest/docs/resources/network">Civo Network</a>, a <a href="https://registry.terraform.io/providers/civo/civo/latest/docs/resources/firewall">Civo Firewall</a> and a <a href="https://registry.terraform.io/providers/civo/civo/latest/docs/resources/kubernetes_cluster">Civo Kubernetes Cluster</a>. In all of my examples, I will be using the imaginative name of <code>example</code> so replace it with whatever makes sense for you.</p>
<pre><code class="language-tf">resource &quot;civo_network&quot; &quot;example&quot; {
  label = &quot;example-network&quot;
}

resource &quot;civo_firewall&quot; &quot;example&quot; {
  name                 = &quot;example-firewall&quot;
  create_default_rules = true
  network_id           = civo_network.example.id
}

resource &quot;civo_kubernetes_cluster&quot; &quot;example&quot; {
  name        = &quot;ExampleCluster&quot;

  firewall_id = civo_firewall.example.id
  network_id  = civo_network.example.id

  cluster_type       = &quot;k3s&quot;
  kubernetes_version = &quot;1.30.5-k3s1&quot;
  cni                = &quot;flannel&quot;

  write_kubeconfig = true

  pools {
    size       = &quot;g4s.kube.small&quot;
    node_count = 2
  }

  # This allows us to make use of cluster autoscaler.
  # If you don't plan to use cluster autoscaler you can remove this `lifecycle` block.
  lifecycle {
    ignore_changes = [
      pools[&quot;node_count&quot;],
    ]
  }
}
</code></pre>
<p>It's looking good so far but before we attempt to create our infra we need to specify which Civo region we'll be using. Rather than specifying this on every resource we can configure it on our Civo provider, which will then apply to all the resources it creates unless explicitly set on a resource.</p>
<p>To make things more configurable, let's create a variable that we can use to set the region at runtime. In our <code>variables.tf</code> let's add a new <code>region</code> variable and give it a default value so we don't always need to provide it:</p>
<pre><code class="language-tf">variable &quot;region&quot; {
  description = &quot;The region to create all our Civo resources in&quot;
  type        = string
  default     = &quot;LON1&quot;
}
</code></pre>
<p>With this in place we can return back to our <code>providers.tf</code> and set the region on our Civo provider by updating the <code>civo</code> placeholder we added earlier:</p>
<pre><code class="language-tf">provider &quot;civo&quot; {
  region = var.region
}
</code></pre>
<p>We're now at a point where we could create some infrastructure. Let's first confirm everything is looking good by running <code>tofu plan</code> to get an overview of what will be created. You should see something similar to the following:</p>
<pre><code>✨ tofu plan

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

OpenTofu will perform the following actions:

  # civo_firewall.example will be created
  + resource &quot;civo_firewall&quot; &quot;example&quot; {
      + create_default_rules = true
      + id                   = (known after apply)
      + name                 = &quot;example-firewall&quot;
      + network_id           = (known after apply)
      + region               = (known after apply)
    }

  # civo_kubernetes_cluster.example will be created
  + resource &quot;civo_kubernetes_cluster&quot; &quot;example&quot; {
      + api_endpoint           = (known after apply)
      + cluster_type           = &quot;k3s&quot;
      + cni                    = &quot;flannel&quot;
      + created_at             = (known after apply)
      + dns_entry              = (known after apply)
      + firewall_id            = (known after apply)
      + id                     = (known after apply)
      + installed_applications = (known after apply)
      + kubeconfig             = (sensitive value)
      + kubernetes_version     = &quot;1.30.5-k3s1&quot;
      + master_ip              = (known after apply)
      + name                   = &quot;ExampleCluster&quot;
      + network_id             = (known after apply)
      + num_target_nodes       = (known after apply)
      + ready                  = (known after apply)
      + region                 = (known after apply)
      + status                 = (known after apply)
      + target_nodes_size      = (known after apply)
      + write_kubeconfig       = true

      + pools {
          + instance_names      = (known after apply)
          + label               = (known after apply)
          + node_count          = 2
          + public_ip_node_pool = (known after apply)
          + size                = &quot;g4s.kube.small&quot;
        }
    }

  # civo_network.example will be created
  + resource &quot;civo_network&quot; &quot;example&quot; {
      + cidr_v4        = (known after apply)
      + default        = (known after apply)
      + id             = (known after apply)
      + label          = &quot;example-network&quot;
      + name           = (known after apply)
      + nameservers_v4 = (known after apply)
      + region         = (known after apply)
    }

Plan: 3 to add, 0 to change, 0 to destroy.
</code></pre>
<p>Looking good! But, before we go ahead and apply this there's one more thing we should do first - define some outputs!</p>
<p>We'll need at least a couple of output values defined so we can access the new cluster. Let's create some outputs that report the cluster name, its API endpoint and the KubeConfig file we can use to access it with <code>kubectl</code>. We'll define these in our <code>output.tf</code> file:</p>
<pre><code class="language-tf">output &quot;kubernetes_name&quot; {
  value       = civo_kubernetes_cluster.example.name
  description = &quot;The name of the Kubernetes cluster&quot;
}

output &quot;kubernetes_api_endpoint&quot; {
  value       = civo_kubernetes_cluster.example.api_endpoint
  description = &quot;The API endpoint of the Kubernetes cluster&quot;
}

output &quot;kubernetes_kubeconfig&quot; {
  value       = civo_kubernetes_cluster.example.kubeconfig
  description = &quot;The KubeConfig for the Kubernetes cluster&quot;
  sensitive   = true
}
</code></pre>
<p>Note that for the <code>kubernetes_kubeconfig</code> we've set <code>sensitive = true</code>. This will prevent the contents from being printed out into any logs and leaking our credentials, but the value will still be saved into the state so be careful.</p>
<p>Ok, I think we are ready to create some infra! You can choose to skip this and apply everything together at the end but if you want to check that things are working as expected you can run <code>tofu apply</code> and then follow the instructions in the output.</p>
<p>If all goes well you should see OpenTofu creating the new Civo Kubernetes cluster.</p>
<pre><code>Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
</code></pre>
<p>You may notice that a new file has also been created - <code>terraform.tfstate</code>. Assuming you correctly created the <code>.gitignore</code> earlier this file should be ignored by git and not committed into our repo as it will contain sensitive values!</p>
<p>Now is a good time to commit in your changes so far before moving on to the next section!</p>
<h2>Installing Flux</h2>
<p>Now that we have a Kubernetes cluster created we can install Flux into it to handle our GitOps'd workloads.</p>
<p>Go back to our <code>providers.tf</code> where we'll configure our Flux provider to make use of our new Kubernetes cluster. For this we are going to reference output values from our Civo Kubernetes Cluster and make use of two built-in OpenTofu functions - <a href="https://opentofu.org/docs/language/functions/yamldecode/">yamldecode</a> and <a href="https://opentofu.org/docs/language/functions/base64decode/">base64decode</a>.</p>
<pre><code class="language-tf">provider &quot;flux&quot; {
  kubernetes = {
    host = civo_kubernetes_cluster.example.api_endpoint

    client_certificate = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).users[0].user.client-certificate-data
    )
    client_key = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).users[0].user.client-key-data
    )
    cluster_ca_certificate = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).clusters[0].cluster.certificate-authority-data
    )
  }
}
</code></pre>
<p>As you can see above, we're making use of the <code>api_endpoint</code> output value for the host and we're using the <code>kubeconfig</code> output value with our two functions to parse the specific values we need from the YAML and then decode them from base64 for use by Flux.</p>
<p>Flux also needs a git repo configured that it will use for all its GitOps source. To make this dynamic, let's define some variables in our <code>variables.tf</code> that we can use to configure this at runtime. We're going to add three new variables - one to pass in a GitHub token and the other two to define the repo.</p>
<pre><code class="language-tf">variable &quot;github_token&quot; {
  description = &quot;GitHub token&quot;
  sensitive   = true
  type        = string
  default     = &quot;&quot;
}

variable &quot;github_org&quot; {
  description = &quot;GitHub organization&quot;
  type        = string
  default     = &quot;AverageMarcus&quot;
}

variable &quot;github_repository&quot; {
  description = &quot;GitHub repository&quot;
  type        = string
  default     = &quot;bootstrap-civo-cluster-blog-post-example&quot;
}
</code></pre>
<p>Here you can see I've set default values that point to the example code repo for this blog post.</p>
<p>Now, let's go back to our Flux provider config and add the git config, using our variables:</p>
<pre><code class="language-tf">provider &quot;flux&quot; {
  kubernetes = {
    host = civo_kubernetes_cluster.example.api_endpoint

    client_certificate = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).users[0].user.client-certificate-data
    )
    client_key = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).users[0].user.client-key-data
    )
    cluster_ca_certificate = base64decode(
      yamldecode(civo_kubernetes_cluster.example.kubeconfig).clusters[0].cluster.certificate-authority-data
    )
  }

  git = {
    url = &quot;https://github.com/${var.github_org}/${var.github_repository}.git&quot;
    http = {
      username = &quot;git&quot;
      password = var.github_token
    }
  }
}
</code></pre>
<p>Great! We're almost there! The last thing we need to do to set up Flux is define a <a href="https://registry.terraform.io/providers/fluxcd/flux/latest/docs/resources/bootstrap_git"><code>flux_bootstrap_git</code></a> resource. Back in our <code>main.tf</code> add the following to the bottom:</p>
<pre><code class="language-tf">resource &quot;flux_bootstrap_git&quot; &quot;example&quot; {
  embedded_manifests = true

  # This is the path within our repo where Flux will watch by default
  path               = &quot;flux&quot;
}
</code></pre>
<p>Now, let's install Flux!</p>
<p>Before we do so, this is a good point to commit and push your latest changes, as Flux is about to add files into your repo.</p>
<p>We're going to run apply again but this time we need to pass in our GitHub personal access token so that it has access to write to our repo.</p>
<pre><code class="language-sh">tofu apply -var=&quot;github_token=${GITHUB_TOKEN}&quot;
</code></pre>
<p>Once done, you should see a new directory created in your repo on GitHub containing the Flux files - <code>flux/flux-system</code>. Let's pull these down into our local copy before we continue - <code>git pull</code>.</p>
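<p>If you're curious what was committed, a Flux bootstrap typically produces a layout like the following (shown here as an illustrative sketch - exact file contents vary between Flux versions):</p>
<pre><code class="language-plain">flux/
└── flux-system/
    ├── gotk-components.yaml   # the Flux controllers themselves
    ├── gotk-sync.yaml         # the GitRepository + Kustomization pointing back at this repo
    └── kustomization.yaml
</code></pre>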
<h2>Confirming Our Cluster Is Set Up</h2>
<p>Now that we've got our Civo infrastructure created and Flux set up and installed, it's a good time to take a look in our cluster and confirm everything looks as we expect.</p>
<p>We can extract our cluster's KubeConfig by grabbing it from our outputs:</p>
<pre><code class="language-sh">tofu output -raw kubernetes_kubeconfig &gt; ~/.kube/civo-example.yaml
export KUBECONFIG=~/.kube/civo-example.yaml
</code></pre>
<p>With our new KubeConfig set, let's take a little look at our cluster. Feel free to skip over this section if you're happy that your cluster is running.</p>
<pre><code class="language-sh">✨ kubectl get nodes
NAME                                                  STATUS   ROLES    AGE   VERSION
k3s-examplecluster-94d9-fc72b3-node-pool-910d-4znod   Ready    &lt;none&gt;   23m   v1.30.5+k3s1
k3s-examplecluster-94d9-fc72b3-node-pool-910d-esen4   Ready    &lt;none&gt;   23m   v1.30.5+k3s1

✨ kubectl get namespaces
NAME              STATUS   AGE
default           Active   24m
flux-system       Active   6m29s
kube-node-lease   Active   24m
kube-public       Active   24m
kube-system       Active   24m

✨ kubectl get gitrepo --all-namespaces
NAMESPACE     NAME          URL                                                                             AGE     READY   STATUS
flux-system   flux-system   https://github.com/AverageMarcus/bootstrap-civo-cluster-blog-post-example.git   6m57s   True    stored artifact for revision 'main@sha1:d1f3a4272b26ebc97b2698dd59dbbc4f462485fa'

✨ kubectl get kustomizations --all-namespaces
NAMESPACE     NAME          AGE     READY   STATUS
flux-system   flux-system   6m46s   True    Applied revision: main@sha1:d1f3a4272b26ebc97b2698dd59dbbc4f462485fa
</code></pre>
<h2>Installing Workloads</h2>
<p>What's the point of having a cluster set up if we don't install anything on it? Finally, we're going to define some workloads in our git repo and have Flux automatically pick them up and install them into our cluster for us.</p>
<p>First thing we're going to do is define a new <code>Kustomization</code> that points to an <code>apps</code> directory in our repo. Create a new file at <code>flux/apps.yaml</code> with the following contents:</p>
<pre><code class="language-yaml">apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 5m0s
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
</code></pre>
<p>This will create a new Kustomization in our cluster that will use the already-existing <code>flux-system</code> GitRepository that is pointing to our GitHub repo.</p>
<p>Before we commit that, let's first add an application into our <code>apps</code> directory for it to pick up. For our example, we're going to install <a href="https://github.com/kubernetes/ingress-nginx">ingress-nginx</a> as a Helm chart. Create the file <code>apps/ingress-nginx.yaml</code> with the following:</p>
<pre><code class="language-yaml">apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: nginx
  namespace: default
spec:
  url: https://kubernetes.github.io/ingress-nginx
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nginx
  namespace: default
spec:
  interval: 5m
  targetNamespace: kube-system
  chart:
    spec:
      chart: ingress-nginx
      version: &quot;4.11.3&quot;
      sourceRef:
        kind: HelmRepository
        name: nginx
        namespace: default
  values: {}
</code></pre>
<p>This will create two new resources in our cluster - a <code>HelmRepository</code> and a <code>HelmRelease</code>. These two together will install ingress-nginx into our cluster with the version <code>4.11.3</code> and the default values for the chart. If you want to provide any values, you can do so now.</p>
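<p>As an example of providing values, you could run two replicas of the controller by filling in the empty <code>values</code> block like so (the <code>controller.replicaCount</code> key comes from the ingress-nginx chart's values - double-check the chart's own documentation for the full set):</p>
<pre><code class="language-yaml">spec:
  # ... rest of the HelmRelease spec as above ...
  values:
    controller:
      replicaCount: 2
</code></pre>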
<p>Commit those changes and push them up to GitHub.</p>
<p>It may take a few minutes but eventually Flux will pick up the new resources and start processing them. You can watch for the changes with the following:</p>
<pre><code class="language-sh">kubectl get kustomizations --all-namespaces --watch
</code></pre>
<p>Once the Kustomization is there and showing as ready we can then check for ingress-nginx being installed:</p>
<pre><code class="language-sh">✨ kubectl get pods --namespace kube-system
NAME                                                          READY   STATUS    RESTARTS        AGE
civo-ccm-7967db4cfc-md7jg                                     1/1     Running   2 (6m49s ago)   38m
civo-csi-controller-0                                         4/4     Running   0               38m
civo-csi-node-b29h9                                           2/2     Running   0               38m
civo-csi-node-h9b8p                                           2/2     Running   0               38m
coredns-7b98449c4-t5zz4                                       1/1     Running   0               38m
kube-system-nginx-ingress-nginx-controller-7ff945b767-tc5fz   1/1     Running   0               72s
</code></pre>
<p>🎉</p>
<p>There's our app, managed by GitOps with Flux, running in our Civo cluster created as Infrastructure as Code with OpenTofu!</p>
<h2>Wrapping Up</h2>
<p>We've now got a cluster set up where we can easily install and manage workloads via git. From here you can add more applications under the <code>apps</code> directory and they will be automatically picked up and processed by Flux.</p>
<p>Want to take things further?</p>
<ul>
<li>How about setting up <a href="https://docs.renovatebot.com/">Renovate</a> to handle automatic version upgrades?</li>
<li>Maybe you need more infra for your workloads - how about a <a href="https://registry.terraform.io/providers/civo/civo/latest/docs/resources/database">Database</a> or an <a href="https://registry.terraform.io/providers/civo/civo/latest/docs/resources/object_store">Object Store</a>?</li>
<li>What about using <a href="https://opentofu.org/docs/language/state/remote/">Remote State</a> and leveraging GitHub Actions to automatically apply changes in the <code>infra</code> directory?</li>
</ul>
<p>These are out of scope for this post but if I get the time they may feature in a future post. 😉</p>
<p>Hopefully this is helpful for someone! I'd love to hear how you get on and any feedback you have, let me know on Mastodon at <a href="https://k8s.social/@Marcus">@Marcus@k8s.social</a> or Bluesky at <a href="https://bsky.app/profile/averagemarcus.bsky.social">@averagemarcus.bsky.social</a>!</p>
]]></description><link>https://marcusnoble.co.uk/2025-01-03-bootstrapping-a-civo-cluster-with-opentofu-and-flux</link><guid isPermaLink="true">https://marcusnoble.co.uk/2025-01-03-bootstrapping-a-civo-cluster-with-opentofu-and-flux</guid><pubDate>Fri, 03 Jan 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[My Recommended Kubernetes Resources for Newbies]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2024-06-24: Added links to video channels DevOpsToolkit, You Choose and Enlightning. Added link to KubeHuddle conference.</p>
</details>
<p>Recently, a friend of mine asked me what resources I'd recommend to start learning about Kubernetes. He was a victim of the layoffs that seem to be so prevalent right now and has experience as a classic SysOps / SysAdmin engineer but no exposure to Kubernetes yet, and he wanted to learn to help improve his job-hunting prospects.</p>
<p>I wasn't sure what to recommend at first - it's been a long time since I was learning Kubernetes for the first time and I wasn't sure what was still useful and relevant - but what follows is what I ended up sharing with him, and now with all of you.</p>
<blockquote>
<p>If you have suggestions for more resources to include here please reach out to me on Mastodon at <a href="https://k8s.social/@Marcus">@Marcus@k8s.social</a>!</p>
</blockquote>
<p>(I have also previously written about <a href="/2021-09-02-my-recommended-go-resources/">my recommended Go resources</a>, which might also be of interest.)</p>
<h2>📚 Books</h2>
<ul>
<li><a href="https://www.amazon.com/Kubernetes-Book-Version-November-2018-ebook/dp/B072TS9ZQZ">The Kubernetes Book</a> by Nigel Poulton</li>
<li><a href="https://www.amazon.com/Docker-Deep-Dive-Nigel-Poulton-ebook/dp/B01LXWQUFF/">Docker Deep Dive</a> by Nigel Poulton (if you also need to get up to speed with Docker / Containers)</li>
<li><a href="https://www.amazon.com/Quick-Start-Kubernetes-Nigel-Poulton-ebook/dp/B08T21NW4Z">Quick Start Kubernetes</a> by Nigel Poulton</li>
<li><a href="https://www.amazon.com/Understanding-Kubernetes-visual-way-sketchnotes/dp/B0BB619188">Understanding Kubernetes in a visual way</a> by Aurélie Vache</li>
<li><a href="https://www.cncf.io/phippy/the-childrens-illustrated-guide-to-kubernetes/">The Illustrated Children’s Guide to Kubernetes</a> - don't let its child-focussed format fool you, this is a great (free) book! And there's a whole <a href="https://www.cncf.io/phippy/">series of books</a> on related topics available.</li>
</ul>
<p>The following might not be quite beginner friendly and should be picked up after you understand the basics:</p>
<ul>
<li><a href="https://www.amazon.co.uk/Kubernetes-Best-Practices-Blueprints-Applications/dp/1098142160">Kubernetes Best Practices</a> by Brendan Burns, Eddie Villalba, Dave Strebel &amp; Lachlan Evenson</li>
<li><a href="https://www.amazon.co.uk/Kubernetes-Patterns-Reusable-Designing-Applications/dp/1492050288/">Kubernetes Patterns</a> by Bilgin Ibryam and Roland Huß</li>
</ul>
<h2>🧰 Services, Tools and Libraries</h2>
<ul>
<li><a href="https://www.civo.com/">Civo</a> - you can get $250 worth of credit when you sign up to play with their managed Kubernetes offerings.</li>
<li><a href="https://kind.sigs.k8s.io/">Kind</a> - local testing / experimenting with Kubernetes running inside a container.</li>
<li><a href="https://www.talos.dev/">Talos</a> - My favourite OS for running Kubernetes on. This is what powers my homelab cluster.</li>
<li><a href="https://k3s.io/">k3s</a> - a great, lightweight Kubernetes distribution that you can even run on a Raspberry Pi!</li>
<li><a href="https://github.com/derailed/k9s/">k9s</a> - an interactive terminal tool for working with a Kubernetes cluster</li>
</ul>
<h2>🔗 Websites / Blog Posts / Tutorials</h2>
<ul>
<li><a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/">Official Kubernetes tutorials</a></li>
<li><a href="https://www.civo.com/learn">Civo Learn</a> - Tutorials</li>
<li><a href="https://www.civo.com/academy">Civo Academy</a> - A series of workshops to help beginners learn Kubernetes</li>
<li><a href="https://kube.academy/">Kube Academy</a> from VMware has a lot of tutorials to follow</li>
<li><a href="https://www.pluralsight.com">Pluralsight</a> - if you have a subscription there are several good classes on there related to Kubernetes. I can recommend the ones by <a href="https://nigelpoulton.com/">Nigel Poulton</a>.</li>
<li>I've previously written a blog post, <a href="/2022-07-04-managing-kubernetes-without-losing-your-cool/">Managing Kubernetes without losing your cool</a>, that has some tools and tips I recommend for everyone working with Kubernetes.</li>
</ul>
<h2>📺 Videos</h2>
<ul>
<li><a href="https://www.youtube.com/@RawkodeAcademy">Rawkode Academy</a> - David covers more than just Kubernetes and has a lot of useful videos about all things cloud native (and more)!</li>
<li><a href="https://www.youtube.com/@KunalKushwaha">Kunal Kushwaha</a> - Kunal is a powerhouse when it comes to making super helpful videos. There's a lot of stuff here, not just Kubernetes.</li>
<li><a href="https://www.youtube.com/@AnaisUrlichs">Anaïs Urlichs</a> - Lots of great, beginner-friendly videos about Kubernetes and related tools. The <a href="https://www.youtube.com/watch?v=SeQevrW176A">Full Kubernetes tutorial on Docker, KinD, kubectl, Helm, Prometheus, Grafana</a> in particular might be helpful.</li>
<li><a href="https://www.youtube.com/@cncf/playlists">CNCF</a> - Collections of conference talks from previous KubeCon, Kubernetes Community Days and other conferences.</li>
<li><a href="https://www.youtube.com/@DevOpsToolkit">DevOpsToolkit</a> - Viktor is great at introducing complex topics in an approachable way</li>
<li><a href="https://www.youtube.com/playlist?list=PLyicRj904Z9-FzCPvGpVHgRQVYJpVmx3Z">You Choose</a> - a series of videos going through lots of the CNCF Landscape tools and pitting them against each other.</li>
<li><a href="https://tanzu.vmware.com/content/enlightning">Enlightning</a> - Whitney does a great job of going through lots of CNCF technologies with the help of her lightboard.</li>
</ul>
<h2>📽️ Conferences</h2>
<ul>
<li>KubeCon + CloudNativeCon is great for meeting people in the community if you have a chance to attend. There are events available in <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/">North America</a>, <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe-2025/">Europe</a>, <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-open-source-summit-ai-dev-china/">China</a> and <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-india/">India</a>.</li>
<li>I'm a huge fan of <a href="https://www.cncf.io/kcds/">Kubernetes Community Days</a>! These are smaller, local Kubernetes conferences hosted all over the world.</li>
<li><a href="https://cloud-native.rejekts.io/">Cloud Native Rejekts</a> is a recent favourite of mine and is set up as the &quot;b-side&quot; conference to KubeCon. This great, community-focussed conference runs for two days before the main KubeCon conference.</li>
<li><a href="https://kubehuddle.com/">KubeHuddle</a> - a wonderful, community-focussed conference</li>
</ul>
<h2>📰 Newsletters</h2>
<ul>
<li><a href="https://www.cncf.io/kubeweekly/">KubeWeekly</a> from CNCF</li>
</ul>
<hr>
<p>Hopefully these are helpful for someone! I'd love to hear what your favourite resources are, let me know on Mastodon at <a href="https://k8s.social/@Marcus">@Marcus@k8s.social</a>!</p>
]]></description><link>https://marcusnoble.co.uk/2024-06-24-my-recommended-kubernetes-resources-for-newbies</link><guid isPermaLink="true">https://marcusnoble.co.uk/2024-06-24-my-recommended-kubernetes-resources-for-newbies</guid><pubDate>Mon, 24 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[My First EMF Camp]]></title><description><![CDATA[<p>Last weekend I finally went to my first ever <a href="https://www.emfcamp.org/">EMF Camp</a> after wanting to go for 4+ years now. I had a thoroughly good time while I was there and wanted to share my experience to encourage others to go in the future.</p>
<h2>What is it?</h2>
<p>For those that don't know, the event is described as &quot;Electromagnetic Field is a non-profit camping festival for those with an inquisitive mind or an interest in making things&quot;. What this <em>basically</em> means is it's a multi-day camping festival for nerds of all kinds! The event takes place from Thursday to Monday (if you want to stay that long) in a large deer park in Eastnor, in a lovely part of the UK. Throughout the long weekend you can attend <a href="https://www.emfcamp.org/schedule/2024">talks and workshops</a> on a large range of diverse and interesting topics, dance and party the night away at a couple of different music areas, play some fun and geeky games around the site and marvel at the creativity and imagination of the other attendees and their installations.</p>
<p>The event is 100% run by volunteers and a lot of love and effort goes into making it successful for everyone. All attendees are encouraged to volunteer for a shift helping out at various things around the site from bar shifts to infrastructure support.</p>
<h2>The run-up</h2>
<p>As this was my first time I was quite anxious in the run-up to it, as I'm not really one for camping or for festivals, but I was very excited that several people I knew were also going, including several old friends from my JSOxford days! 💙</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-jsoxford.jpg" alt="A photo of Ben, Pete, Dan, Myself and Seren looking very happy at EMF Camp"></p>
<figcaption>Mini JSOxford reunion!</figcaption>
</figure>
<p>I was also helping <a href="https://k8s.social/@sammachin@chaos.social">Sam Machin</a> with some of the phone infrastructure for the event by setting up and configuring a Kubernetes cluster that ran on-site and handled almost all of the applications that allowed the various telephone networks to work and be built on top of. As I'm sure you can imagine, I really enjoyed helping out with this and already have a bunch of ideas on how to make it better and more reliable for the next event! We did have to deal with a couple of bugs during the weekend but for the most part it went well! 😅</p>
<h2>On-site</h2>
<p>The main thing you'll notice while walking around the site is that there is no shortage of interesting things to look at or interact with. Some of the things are organised by the event and others are just cool things that people have brought along with them for the enjoyment of others.</p>
<p>Some highlights include...</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-van.jpg" alt="A photo of a camper van covered in colourful, bright lights"></p>
<figcaption>A light up van with controllable lights from a web app!</figcaption>
</figure>
<figure class="center" markdown="1">
<p><img src="../images/emf-blacksmith.jpg" alt="A white tent covering 4 anvils and two smelting stations. A man in a leather apron is using some tools to interact with the hot coals"></p>
<figcaption>Blacksmithing!</figcaption>
</figure>
<figure class="center" markdown="1">
<p><img src="../images/emf-lasers.jpg" alt="A photo of a campsite with some tents at night with several green lasers being shot into the sky"></p>
<figcaption>Lasers!!!</figcaption>
</figure>
<figure class="center" markdown="1">
<p><img src="../images/emf-tesla-coil.jpg" alt="Close up photo of the top section of a Tesla coil with several strands of electricity coming off of it"></p>
<figcaption>A musical Tesla coil</figcaption>
</figure>
<figure class="center" markdown="1">
<p><img src="../images/emf-laser-duck-hunt.jpg" alt="A photo of a hillside at night seen above some food stalls. Projected onto the hillside using lasers is the game &amp;quot;Duck Hunt&amp;quot;"></p>
<figcaption>Duck Hunt! With LASERS!</figcaption>
</figure>
<p>... plus many more cool, interesting and straight up weird and wonderful things.</p>
<p>One thing EMF Camp has that you likely won't see at many other festivals is a very impressive amount of network infrastructure that serves the whole site. Almost the entire site is covered in <a href="https://www.emfcamp.org/about/internet">WiFi</a> backed by a 40Gb fibre connection to the internet (into the middle of a field!!! 🤯) that is useable by all attendees. The site is also covered in an impressive network of <a href="https://www.emfcamp.org/about/phones">phone systems</a> operated by the POC (Phones Operations Centre) that includes DECT coverage on the whole site, the ability to use SIP clients, a plain old telephone service and a 2G GSM network. It was also possible for attendees to build apps using the phone network that other attendees could then use over the weekend. Along with all of this was a fantastic network of <a href="https://www.emfcamp.org/about/power">electric</a> hookup points across the site for all to use that ended up leading to my most millennial camping experience...</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-airfryer.jpg" alt="A photo of a small black air fryer sat on a table inside a tent"></p>
<figcaption>Breakfast was easy with an AirFryer</figcaption>
</figure>
<p>While we're talking about food, it's worth noting that the <a href="https://www.emfcamp.org/about/food">food and drink</a> options on site were amazing. There were several fantastic food stalls open most of the time and two bars available to get drinks from.</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-food.jpg" alt="A small carton of food with a fork in it. The carton contains chicken and is covered in a green sauce with chillis and seeds sprinkled on top. A lemon slice is placed at the side."></p>
<figcaption>This was so delicious I had to get it again the next day!</figcaption>
</figure>
<p>While I didn't get to see as many talks as I might have liked (the seats weren't very comfortable) I did manage to see some of my wonderful friends speak!</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-jo.jpg" alt="A photo of Jo Franchetti stood on a stage behind a podium with their presentation showing on a screen to the right"></p>
<figcaption>
<p><a href="https://www.emfcamp.org/schedule/2024/321-is-everything-difficult-or-is-it-just-me">Jo Franchetti - Is everything difficult, or is it just me?</a></p>
</figcaption>
</figure>
<figure class="center" markdown="1">
<p><img src="../images/emf-terence.jpg" alt="A photo of Terence Eden stood behind a laptop at a podium"></p>
<figcaption>
<p><a href="https://www.emfcamp.org/schedule/2024/16-lessons-learned-open-sourcing-the-uks-covid-tracing-app">Terence Eden - Lessons learned Open Sourcing the UK's Covid Tracing App</a></p>
</figcaption>
</figure>
<p>I also managed to catch a showing of &quot;<a href="https://festivalofthespokennerd.com/show/an-evening-of-unnecessary-detail/">An Evening Of Unnecessary Detail</a>&quot; which I very much enjoyed.</p>
<figure class="center" markdown="1">
<p><img src="../images/emf-evening-unnecessary.jpg" alt="A photo of a stage with a crowd of people at the bottom. A screen to the left is showing the title image for an evening of unnecessary detail"></p>
<figcaption>An Evening Of Unnecessary Detail</figcaption>
</figure>
<h2>Things I think could be improved</h2>
<p>On the whole everything I experienced was great! There's only some very minor annoyances that I'd like to see improved next time around.</p>
<p>While the toilets on site were quite nice (besides the fertiliser ones that grossed me out) I found them quite tight to squeeze into. Not a big deal compared to camping I've done in the past though.</p>
<p>There was a &quot;night market&quot; on site where some vendors were set up selling some funky, geeky crafts. I loved this idea but the room they were in was so small and crowded that I couldn't handle being in there for very long at all. It was a shame as I'd have liked to have chatted with some of the vendors while I was there.</p>
<p>As a non-(alcoholic)-drinker I would have liked to have seen more non-alcoholic options available in the bars. The selection they had was perfectly fine, it just would have been nice to have some 0% beers or similar.</p>
<p>The programmable badges had issues. This was kinda to be expected as they were brand new and very ambitious but it was still frustrating not really being able to use them right away. I'm looking forward to re-using mine in 2026 when I'm sure the software will be rock solid!</p>
<p>Not something for the event itself to improve on but something for me next time - I'd like to take part more in the various activities around the site. I was taking things steady this year as I didn't want to get overwhelmed by everything but I do now wish I'd taken part in the Clippy Murder Mystery or the capture-the-flag style game with the badges. There's so much going on all over the site that you're bound to miss out on something you wish you'd managed to do.</p>
<h2>Summary</h2>
<p>If it wasn't obvious by now, I had a fantastic weekend at EMF Camp and will certainly be attending the next one in 2026 if possible. I hope to see even more of my friends (new and old) there so we can hang out and nerd out on all the cool experiences that EMF Camp has on offer.</p>
]]></description><link>https://marcusnoble.co.uk/2024-06-07-my-first-emf-camp</link><guid isPermaLink="true">https://marcusnoble.co.uk/2024-06-07-my-first-emf-camp</guid><pubDate>Fri, 07 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Custom Renovate datasource]]></title><description><![CDATA[<p>I'm quite a fan of <a href="https://docs.renovatebot.com/">Renovate</a> for automating my dependancy updates. I've been using it quite extensively - at <a href="https://giantswarm.io/">Giant Swarm</a>, with my personal infrastructure and with <a href="https://k8s.social/">k8s.social</a> - to keep things up-to-date in the various Kubernetes clusters I manage. This has been working great for container images and Helm charts, with both being managed via GitOps and Renovate automating the version updates via PRs, but I have now found myself wanting to automate the updating of the Kubernetes version for my <a href="https://www.civo.com/">Civo</a> cluster. Renovate doesn't have built in support for this but we can make it work with the use of custom datasources!</p>
<p>Before we dig in to the Renovate configuration, lets take a look at what exactly we're trying to achieve...</p>
<h2>The Goal</h2>
<p>All of the infrastructure for <a href="https://k8s.social/">k8s.social</a> is managed in git and <a href="https://github.com/k8s-social/infra">available on GitHub</a>. The Civo Kubernetes cluster is managed using <a href="https://www.pulumi.com/">Pulumi</a>, using Go, and applied from a <a href="https://github.com/k8s-social/infra/blob/main/.github/workflows/deploy.yaml">GitHub Action</a>. The Kubernetes version is included in the <a href="https://github.com/k8s-social/infra/blob/e141372aea1e1399d6024f854b28b1fbdd50c70c/main.go#L94"><code>main.go</code> file</a>:</p>
<pre><code class="language-golang">k8SCluster, err := civo.NewKubernetesCluster(ctx, &quot;k8sCluster&quot;, &amp;civo.KubernetesClusterArgs{
    Name:       pulumi.String(&quot;k8s.social&quot;),
    KubernetesVersion: pulumi.String(&quot;1.27.1-k3s1&quot;),
})
if err != nil {
    return err
}
</code></pre>
<p>(Lines removed for clarity)</p>
<p>So as you can see above, we're currently using version <code>1.27.1-k3s1</code> but I'm aware that <code>1.28.2-k3s1</code> is now available to use. Rather than me having to update that value myself each time a new release is published (and to prevent me forgetting), let's have Renovate update that string for us automatically when it detects a new version.</p>
<h2>Getting the versions</h2>
<p>The first thing we need, before we can even start on the Renovate configuration, is some way of getting a list of available versions. As Renovate supports using a JSON file from a URL (spoiler!) it would be best if we had an API we could call to get the versions.</p>
<p>As it turns out, Civo does have an <a href="https://www.civo.com/api/kubernetes#list-available-versions">API you can call to get a list of versions</a>. Unfortunately, this requires you to be authenticated when calling it and I'd rather not provide Renovate with my API token just to get a list of available versions.</p>
<p>Instead I put together a very quick API app, <a href="https://github.com/AverageMarcus/civo-versions">civo-versions</a>, that calls the Civo API with authentication and returns the versions as a JSON object. If y'all want to make use of it yourself it is available at <code>https://civo-versions.cluster.fun/</code>. (Different endpoints are available for different filtering, take a look at the <a href="https://github.com/AverageMarcus/civo-versions/blob/main/README.md">README.md</a> for more info).</p>
<h2>Creating the custom datasource in Renovate</h2>
<blockquote>
<p>ℹ️ <strong>Note:</strong> As I write this, Renovate is currently in the process of renaming / migrating the <code>regexManagers</code> to be <code>customManagers</code> (<a href="https://github.com/renovatebot/renovate/issues/19066">see PR</a>). This makes the documentation currently quite confusing and hard to follow as things aren't quite completed yet. For the purpose of this post I shall be using the old <code>regexManagers</code>, which should still work for a while as there is a migration step built in to Renovate to convert it to the new <code>customManagers</code>.</p>
</blockquote>
<p>Now that we have somewhere that lists the available versions we can configure Renovate!</p>
<p>For this to work we'll be making use of two properties in the Renovate configuration: <a href="https://docs.renovatebot.com/configuration-options/#customdatasources"><code>customDatasources</code></a> and <a href="https://docs.renovatebot.com/configuration-options/#custommanagers"><code>regexManagers</code></a>.</p>
<pre><code class="language-json">{
    &quot;customDatasources&quot;: {

    },
    &quot;regexManagers&quot;: [

    ]
}
</code></pre>
<p>According to the <a href="https://docs.renovatebot.com/modules/datasource/custom/">documentation</a> the resulting JSON must match the following format (with the <code>version</code> being the only required field):</p>
<pre><code class="language-json">{
  &quot;releases&quot;: [
    {
      &quot;version&quot;: &quot;v1.0.0&quot;,
      &quot;isDeprecated&quot;: true,
      &quot;releaseTimestamp&quot;: &quot;2022-12-24T18:21Z&quot;,
      &quot;changelogUrl&quot;: &quot;https://github.com/demo-org/demo/blob/main/CHANGELOG.md#v0710&quot;,
      &quot;sourceUrl&quot;: &quot;https://github.com/demo-org/demo&quot;,
      &quot;sourceDirectory&quot;: &quot;monorepo/folder&quot;
    }
  ],
  &quot;sourceUrl&quot;: &quot;https://github.com/demo-org/demo&quot;,
  &quot;sourceDirectory&quot;: &quot;monorepo/folder&quot;,
  &quot;changelogUrl&quot;: &quot;https://github.com/demo-org/demo/blob/main/CHANGELOG.md&quot;,
  &quot;homepage&quot;: &quot;https://demo.org&quot;
}
</code></pre>
<p>While Renovate does have capabilities to perform transformations on the JSON it retrieves, I decided to make things easier and have <code>civo-versions</code> return the JSON in a compatible format to start with. Because of this, our <code>customDatasources</code> will look quite sparse:</p>
<pre><code class="language-json">{
  &quot;customDatasources&quot;: {
    &quot;civo-k3s&quot;: {
      &quot;defaultRegistryUrlTemplate&quot;: &quot;https://civo-versions.cluster.fun/k3s/&quot;
    }
  }
}
</code></pre>
<p>Here we're giving our custom datasource the name of <code>civo-k3s</code> and the <code>defaultRegistryUrlTemplate</code> points to the URL where we can get a list of the current k3s versions available from Civo. If our returned JSON wasn't already in the required format we could use <code>transformTemplates</code> to perform some manipulation of the data - see the <a href="https://docs.renovatebot.com/modules/datasource/custom/">documentation</a> for more details if needed.</p>
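<p>For illustration only (our endpoint doesn't need it): if an API returned a bare JSON array of version strings, a <code>transformTemplates</code> entry - a <a href="https://jsonata.org/">JSONata</a> expression - could reshape it into the expected <code>releases</code> format. This is a hypothetical sketch rather than tested config, so treat it as a starting point:</p>
<pre><code class="language-json">{
  &quot;customDatasources&quot;: {
    &quot;civo-k3s&quot;: {
      &quot;defaultRegistryUrlTemplate&quot;: &quot;https://civo-versions.cluster.fun/k3s/&quot;,
      &quot;transformTemplates&quot;: [
        &quot;{ \&quot;releases\&quot;: $map($, function($v) { { \&quot;version\&quot;: $v } }) }&quot;
      ]
    }
  }
}
</code></pre>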
<p>Now that we have our new custom datasource we can make use of it just like we would with the built in datasources.</p>
<p>We'll be using the regex manager to parse <code>main.go</code> and extract the line that contains our Kubernetes version string, then we'll use our new custom datasource to replace the version if a newer one is found.</p>
<pre><code class="language-json">{
  &quot;customDatasources&quot;: {
    &quot;civo-k3s&quot;: {
      &quot;defaultRegistryUrlTemplate&quot;: &quot;https://civo-versions.cluster.fun/k3s/&quot;
    }
  },
  &quot;regexManagers&quot;: [
    {
      &quot;fileMatch&quot;: [&quot;main.go&quot;],
      &quot;matchStrings&quot;: [&quot;KubernetesVersion: pulumi.String\\(\&quot;(?&lt;currentValue&gt;\\S+)-k3s1\&quot;\\)&quot;],
      &quot;datasourceTemplate&quot;: &quot;custom.civo-k3s&quot;,
      &quot;depNameTemplate&quot;: &quot;civo-k3s&quot;,
      &quot;extractVersionTemplate&quot;: &quot;^(?&lt;version&gt;.*)-k3s1$&quot;
    }
  ]
}
</code></pre>
<p>In the above you can see the following:</p>
<ul>
<li><code>fileMatch</code> tells Renovate which files we want to try and replace versions within.</li>
<li><code>matchStrings</code> is a list of regex strings to match on. The <code>(?&lt;currentValue&gt;\\S+)</code> capture group is used to extract the current version number that Renovate will replace.</li>
<li><code>datasourceTemplate</code> is the name of the datasource we want to use for this manager. For custom datasources it is the name we defined prefixed with <code>custom.</code>.</li>
<li><code>depNameTemplate</code> is the name to use for the dependency. This will be used in the PR title / body, for example.</li>
<li><code>extractVersionTemplate</code> is a regex applied to each version returned by the datasource; here it strips the <code>-k3s1</code> suffix so the result matches the format of <code>currentValue</code>.</li>
</ul>
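<p>To sanity-check the regex locally before pushing the config you can approximate the combined effect of <code>matchStrings</code> and <code>extractVersionTemplate</code> with a quick <code>sed</code> one-liner (the version number here is invented for illustration - this is just a local approximation, not what Renovate itself runs):</p>
<pre><code class="language-bash"># Hypothetical line as it would appear in main.go
line='KubernetesVersion: pulumi.String(&quot;1.27.1-k3s1&quot;)'

# Capture the value inside the quotes and strip the -k3s1 suffix
echo &quot;$line&quot; | sed -n 's/.*pulumi\.String(&quot;\(.*\)-k3s1&quot;).*/\1/p'
# prints: 1.27.1
</code></pre>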
<h2>Putting it all together</h2>
<p>With all this in place and committed into git we should now have Renovate creating <a href="https://github.com/k8s-social/infra/pull/28">PRs to bump our Kubernetes version</a>. 🎉</p>
<figure class="center" markdown="1">
<p><img src="/images/renovate-kubernetes-bump.png" alt="Screenshot of PR opened by Renovate to update the Kubernetes version"></p>
<figcaption>Renovate very quickly opened this PR as soon as the new configuration was pushed</figcaption>
</figure>
<h2>Bonus</h2>
<p>To make the use of this Civo version datasource easier for others I've added it to my <a href="https://github.com/AverageMarcus/renovate-config">renovate-config</a> repo that contains reusable Renovate configuration presets.</p>
<p>My <a href="https://github.com/AverageMarcus/renovate-config/blob/main/civo.json">Civo preset</a> contains multiple datasources covering both k3s and Talos cluster types as well as limiting the versions just to stable releases. To make use of the presets, add the following to your Renovate config's <a href="https://github.com/k8s-social/infra/blob/e141372aea1e1399d6024f854b28b1fbdd50c70c/renovate.json#L6"><code>extends</code> property</a>:</p>
<pre><code>github&gt;averagemarcus/renovate-config:civo
</code></pre>
<h2>Resources</h2>
<ul>
<li><a href="https://docs.renovatebot.com/modules/datasource/custom/">Renovate custom datasource documentation</a></li>
<li><a href="https://docs.renovatebot.com/configuration-options/#custommanagers">Renovate <code>customManager</code> documentation</a></li>
<li><a href="https://www.civo.com/api/kubernetes#list-available-versions">Civo API documentation</a></li>
<li><a href="https://github.com/AverageMarcus/renovate-config">My personal Renovate presets</a></li>
<li><a href="https://github.com/AverageMarcus/civo-versions">civo-versions</a></li>
</ul>
]]></description><link>https://marcusnoble.co.uk/2023-10-04-custom-renovate-datasource</link><guid isPermaLink="true">https://marcusnoble.co.uk/2023-10-04-custom-renovate-datasource</guid><pubDate>Wed, 04 Oct 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Migrating Redis data within Kubernetes]]></title><description><![CDATA[<p>I’ve recently been seeing some stability issues with Redis that I have running for the <a href="https://k8s.social">k8s.social</a> Mastodon instances. After looking into it I realised that I had it configured in a master/replica architecture but I wasn’t actually making any use of the replicas as Mastodon was configured to do everything via the master. There were two things wrong with this - firstly, I was wasting resources by having the replicas running but, more importantly, I had created a single point of failure. When the Redis master went down, so did Mastodon.</p>
<p>Not good!</p>
<p>To improve on this situation I decided to switch to making use of <a href="https://redis.io/docs/management/sentinel/">Redis Sentinel</a> to take advantage of its failover capabilities. I wanted to keep the same configuration for everything else so I was sure it would behave the same with Mastodon. This mainly meant I wanted to keep a <strong>single endpoint</strong> that Mastodon connected to Redis with and that Redis was configured with persistence in <strong>Append Only File (AOF)</strong> mode.</p>
<p>In most situations where I’ve used Redis it’s been as a caching layer, so to switch to a new setup all I would need to do is point my app at the new endpoint. Mastodon, however, makes use of persistence in Redis and uses it as a data store for various things including users’ home feeds. So I needed to migrate across the existing data.</p>
<h2>Migration Steps</h2>
<p>After hours of trial and error I finally got a set of steps to follow that allowed me to migrate the data safely with only a small window of time where I missed out on new updates from the live server. This was acceptable for me in this instance but might not always be suitable so please keep that in mind.</p>
<blockquote>
<p>Before you plan to follow these steps yourself, please make sure you also take a look at the <strong>Gotchas</strong> below!</p>
</blockquote>
<h3>Assumptions</h3>
<ul>
<li>You are using the <a href="https://github.com/bitnami/charts/tree/main/bitnami/redis">Redis Helm chart by Bitnami</a></li>
<li>Your Redis instance is running in a Kubernetes cluster that has enough resources available to run another instance alongside it</li>
<li>You have Redis configured in master/replica mode with Append Only File persistence.</li>
<li>Your current Redis instance is deployed into the <code>redis</code> namespace.</li>
<li>Your data is in the Redis database <code>1</code></li>
</ul>
<p>Not all of these are required to follow these steps (I wasn’t actually using the Helm charts) but it helps to establish some common groundwork.</p>
<h3>Steps</h3>
<ol>
<li>Setup Redis Sentinel alongside old setup (ideally in a new namespace) with the following configuration changes:</li>
</ol>
<pre><code>1. appendonly config set to no (we’ll change this later)
2. Replicas set to 1
</code></pre>
<ol start="2">
<li>Once the new PVC has been created and Redis has started, scale down Redis Sentinel’s StatefulSet replicas to <code>0</code></li>
<li>On old Redis master, snapshot the current state:</li>
</ol>
<pre><code>1. kubectl exec -n redis -it redis-master-0 -- bash
2. redis-cli -a ${REDIS_PASSWORD} -n 1 SAVE
</code></pre>
<ol start="4">
<li>
<p>On your local machine copy backup: <code>kubectl cp redis/redis-master-0:/data/dump.rdb ./dump.rdb</code></p>
</li>
<li>
<p>Launch a temporary pod that mounts the new PVC to upload dump into:</p>
<pre><code class="language-bash">kubectl run --namespace redis-sentinel -i --rm --tty temp --overrides='
  {
      &quot;apiVersion&quot;: &quot;v1&quot;,
      &quot;kind&quot;: &quot;Pod&quot;,
      &quot;metadata&quot;: {
          &quot;name&quot;: &quot;temp&quot;
      },
      &quot;spec&quot;: {
          &quot;containers&quot;: [{
             &quot;command&quot;: [
                  &quot;tail&quot;,
                  &quot;-f&quot;,
                  &quot;/dev/null&quot;
             ],
             &quot;image&quot;: &quot;bitnami/minideb&quot;,
             &quot;name&quot;: &quot;mycontainer&quot;,
             &quot;volumeMounts&quot;: [{
                 &quot;mountPath&quot;: &quot;/mnt&quot;,
                 &quot;name&quot;: &quot;redisdata&quot;
              }]
          }],
          &quot;restartPolicy&quot;: &quot;Never&quot;,
          &quot;volumes&quot;: [{
              &quot;name&quot;: &quot;redisdata&quot;,
              &quot;persistentVolumeClaim&quot;: {
                  &quot;claimName&quot;: &quot;redis-data-redis-node-0&quot;
              }
          }]
      }
  }' --image=&quot;bitnami/minideb&quot;
</code></pre>
</li>
<li>
<p>On your local machine upload backup to the temporary pod:</p>
</li>
</ol>
<pre><code>1. kubectl cp ./dump.rdb redis-sentinel/temp:/mnt/dump.rdb
</code></pre>
<ol start="7">
<li>Delete the temporary pod (<code>kubectl delete pod -n redis-sentinel temp</code>) and restore the Redis Sentinel’s StatefulSet to <code>1</code> replica</li>
<li>Once the new Redis Sentinel is running, instruct it to switch to appendonly:</li>
</ol>
<pre><code>1. kubectl exec -n redis-sentinel -it redis-node-0 -- bash
2. redis-cli -a $REDIS_PASSWORD -n 1 bgrewriteaof
</code></pre>
<p>(👆 This step seemed to be missing from all the backup/restore guides I found for Redis and was required to continue making use of the Append Only File.)</p>
<ol start="9">
<li>Update the Redis Sentinel’s config to set <code>appendonly</code> back to <code>yes</code></li>
<li>Restart the Redis Sentinel pod</li>
<li>Once Redis is up and running scale the StatefulSet back up to <code>3</code> (or however many replicas you plan to run)</li>
<li>Deploy <a href="https://github.com/flant/redis-sentinel-proxy">redis-sentinel-proxy</a> to act as a single entrypoint to Redis that points to the current leader. (You can take a look at <a href="https://github.com/k8s-social/gitops/blob/18c201d815829b49e2b9f1316f1edac1e80a3d42/manifests/redis-sentinel/proxy.yaml#L46-L55C14">my configuration</a> if you’re not sure how it needs to be set up)</li>
<li>Update your applications to point to the new redis-sentinel-proxy Service.</li>
</ol>
<h2>Gotchas</h2>
<h3>CoreDNS Changes</h3>
<p>Redis Sentinel makes use of a <a href="https://kubernetes.io/docs/concepts/services-networking/service/#headless-services">headless Kubernetes service</a> to discover and connect to the Redis replicas. For this to work correctly, CoreDNS must be configured with <code>endpoint_pod_names</code> within the <code>kubernetes</code> block (see the <a href="https://coredns.io/plugins/kubernetes/">CoreDNS documentation</a> for more info). This allows pods to be resolved by hostnames such as <code>redis-node-0.redis-headless.redis.svc</code> instead of <code>1-2-3-4.redis-headless.redis.svc</code>.</p>
<p>If the <code>endpoint_pod_names</code> property already exists in your CoreDNS config you don’t need to do anything, but it wasn’t there by default in my cluster.</p>
<p><strong>Example:</strong></p>
<pre><code class="language-plain">kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods insecure
    endpoint_pod_names
    fallthrough in-addr.arpa ip6.arpa
}
</code></pre>
<h3>Re-enabling <code>appendonly</code></h3>
<p>It turns out that simply re-enabling <code>appendonly yes</code> and performing a server restart will lead to all your previous data (in the dump.rdb) being ignored and Redis starting fresh with the Append Only File mode.</p>
<p>To get around this issue you must first inform Redis that you are enabling <code>appendonly</code> by performing <code>redis-cli -a $REDIS_PASSWORD -n 1 bgrewriteaof</code>.</p>
<p>This wasn’t mentioned in any of the backup / restore instructions I found for Redis until I came across “<a href="https://pellegrino.link/2020/06/10/2020-redis-persistence-migrating-rdb-to-aof.html">Migrating to AOF from RDB with Redis</a>” by Laurent. I went through two attempts where I lost the data before I realised this was needed so I’m very thankful for discovering this post!</p>
<h3>No write endpoint</h3>
<p>It turns out that Redis Sentinel, and more specifically the Bitnami Helm chart, doesn’t have a service specifically for the leader instance. This means if you attempt to use the <code>redis.redis-sentinel.svc</code> Service for both reads and writes you’ll eventually receive an error telling you that Redis is in read-only mode as the service load balances across all the pods, not just the leader. Instead you’re expected to have your app query Sentinel to request the endpoint of the current leader. This was a problem for me as I couldn’t make changes to Mastodon to support this. Instead I deployed <a href="https://github.com/flant/redis-sentinel-proxy">redis-sentinel-proxy</a> that acts as a small proxy to the current write-capable pod and updates as it fails over to a new pod.</p>
<p>There is now a slight concern that this has become the single point of failure. The proxy is fairly lightweight and launches quickly so I’m pretty confident that if errors do occur a replacement will be available quickly. I’ve made sure it is running with at least 2 replicas with a pod anti-affinity to hopefully avoid node failures taking it down.</p>
<h3>Migration window</h3>
<p>I didn’t set up any replication from the old Redis to the new; instead I made use of a snapshot from the old and restored it into the new. Because of this there was a small window of time where new data was being created in the old Redis that wouldn’t be available in the new while I was in the process of switching over. For my purposes this was acceptable and I scheduled the move during a period of time I knew saw less activity, but I know this approach won’t be suitable for all situations.</p>
<p>To minimize this window I worked with only a single Redis Sentinel replica during the migration and I set <code>sentinel.getMasterTimeout</code> to a very small value to ensure that the first pod came up quickly so I could move traffic over. Just remember to put that timeout back up once you’re done!</p>
<h3>The scripts in the Helm chart didn’t work correctly</h3>
<p>It’s entirely possible I did something wrong here but the <code>prestop</code> scripts within the Bitnami chart didn’t work correctly and weren’t actually triggering the failover on shutdown. I only discovered this while I was testing out <a href="https://github.com/flant/redis-sentinel-proxy">redis-sentinel-proxy</a> and noticed there were a couple of minutes where it was still attempting to connect to the old leader that had shut down.</p>
<p>From what I could tell, the main thing missing was the prestop script for the Sentinel container didn’t trigger the failover command. In the end I decided not to use the Helm chart but instead use it to generate manifests to use directly. <a href="https://github.com/k8s-social/gitops/blob/18c201d815829b49e2b9f1316f1edac1e80a3d42/manifests/redis-sentinel/config.yaml#L466-L560">You can see the scripts I use on GitHub</a>.</p>
<h2>Summary</h2>
<p>In the end, after a few hours of trial and error, the actual migration process was fairly smooth. I think I had a migration window of about a minute or less where new data was lost.</p>
<p>I was pretty annoyed that none of the backup / restore guides I came across actually handled the AOF issues. They all stated that it needed disabling at the start to generate the dump.rdb but there was then no mention about it for the resulting Redis instance upon restoring. Hopefully this post will help someone else avoid that issue in the future.</p>
<p>I found the Bitnami Helm chart to be a pain to work with. It has so many different configurations it was hard to know for sure what the result would be, plus the above issue about the scripts not working. I feel better having used it as a base and generating the yaml manifests to then further work with.</p>
<p>My only outstanding concern now is the redis-sentinel-proxy being a single point of failure. It’s only a small concern though: it has been running smoothly for the past few days and it’s configured to be resilient enough to handle failures, but I’ll be keeping an eye on it going forward.</p>
<p>If you’ve found this useful, or found issues with it, I’d love to hear! You can find me on Mastodon at <a href="https://k8s.social/@Marcus">@marcus@k8s.social</a>.</p>
<h2>Resources &amp; References</h2>
<ul>
<li><a href="https://docs.bitnami.com/kubernetes/infrastructure/redis/administration/backup-restore/">https://docs.bitnami.com/kubernetes/infrastructure/redis/administration/backup-restore/</a></li>
<li><a href="https://github.com/bitnami/charts/tree/main/bitnami/redis">https://github.com/bitnami/charts/tree/main/bitnami/redis</a></li>
<li><a href="https://hub.docker.com/r/flant/redis-sentinel-proxy">https://hub.docker.com/r/flant/redis-sentinel-proxy</a> / <a href="https://github.com/flant/redis-sentinel-proxy">https://github.com/flant/redis-sentinel-proxy</a></li>
<li><a href="https://pellegrino.link/2020/06/10/2020-redis-persistence-migrating-rdb-to-aof.html">https://pellegrino.link/2020/06/10/2020-redis-persistence-migrating-rdb-to-aof.html</a></li>
</ul>
]]></description><link>https://marcusnoble.co.uk/2023-09-04-migrating-redis-data-within-kubernetes</link><guid isPermaLink="true">https://marcusnoble.co.uk/2023-09-04-migrating-redis-data-within-kubernetes</guid><pubDate>Mon, 04 Sep 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Managing Kubernetes without losing your cool]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2022-07-15: Added tweet from Ian Coldwater with <code>nsenter</code> example</p>
</details>
<p>This post is based on a <a href="https://www.youtube.com/watch?v=SLysG0QWiG4">webinar I've previously given</a> where I go through some of my favourite tips for working with Kubernetes clusters all day long. The goal of all of these techniques is to make my life easier and (hopefully) less error prone. I start off with 5 tips that are applicable to anyone working with Kubernetes and can be picked up right away. From there I move on to a couple that would benefit from having some old-skool Linux sys-admin experience. Finally I finish off with some more advanced techniques that require some previous programming experience.</p>
<h2>#0 - Pay someone else to do it</h2>
<p>Yeah, ok, this one is a little tongue-in-cheek as this is what we provide at <a href="https://www.giantswarm.io/">Giant Swarm</a> but it is worth thinking about.</p>
<p>If you have dozens or even hundreds of clusters, on top of standard development work, you’re going to be stretched thin. Getting someone else to manage things while you focus on what makes your business money can often be the right choice.</p>
<h2>#1 - Love your Terminal</h2>
<p>It doesn’t matter if your shell of choice is Bash, ZSH, Fish or even Powershell you’ll benefit greatly from learning the ins-and-outs of your terminal of choice.</p>
<p>Many shell implementations have some variation of “rc” files - often referred to as “dotfiles”. In bash, it’s a file called <code>.bashrc</code>, in ZSH it’s <code>.zshrc</code>. Generally these are found in your users home directory. These files are interpreted by your shell each time a new session (tab/window) is opened and allows you to set up defaults and customisations. One of my favourite uses for these session files is to create custom aliases for functions (or groups of functions) that I use often to help me (try to) avoid typos and “fat fingering” when typing out long commands.</p>
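<p>For example, a few illustrative aliases of the kind that might live in a <code>.bashrc</code> or <code>.zshrc</code> (these particular ones are invented for demonstration rather than taken from my own dotfiles):</p>
<pre><code class="language-bash"># Save keystrokes on frequently used commands
alias kgp='kubectl get pods'
alias kgpa='kubectl get pods --all-namespaces'

# Follow the most recent log lines from a pod
alias klf='kubectl logs -f --tail=100'
</code></pre>
<p>After adding these, open a new shell session (or <code>source</code> the file) and the short form behaves exactly like typing out the full command.</p>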
<p>Many people share their custom “dotfiles” on GitHub or similar, and these are a great source of neat tricks and handy shortcuts. My dotfiles can be found on GitHub at <a href="https://github.com/averagemarcus/dotfiles">AverageMarcus/dotfiles</a>.</p>
<h2>#2 - Learn to Love <code>kubectl</code></h2>
<p>Everything you might want to do with a Kubernetes cluster can be accomplished with the help of <code>kubectl</code>. Learning how to master this handy little CLI can go a long way to mastering Kubernetes cluster management.</p>
<p>The official documentation offers a single page view of all built in commands that is a great place to start if you’re not sure of a command: <a href="https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands">kubectl-commands</a>.</p>
<p>Building on tip #1, my most used command is <code>k</code>, an alias I have set up for <code>kubectl</code> to save on keystrokes. Add the following to your relevant ‘rc’ file:</p>
<pre><code class="language-bash">alias k='kubectl'
</code></pre>
<p>Once this is in your terminal session, you can do something like the following:</p>
<pre><code class="language-bash">k get pods
</code></pre>
<p>(You’ll see more of this alias in the other examples used in this post)</p>
<p><code>kubectl</code> even has a very handy command to find out everything you need to know about the properties of any Kubernetes resource - <a href="https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#explain">kubectl explain</a></p>
<pre><code class="language-bash"># k explain pods.spec.containers
KIND: Pod
VERSION: v1

RESOURCE: containers &lt;[]Object&gt;

DESCRIPTION:
</code></pre>
<p>I find this most useful when you can’t access the official docs or when you know exactly which field you’re interested in.</p>
<h2>#3 - Multiple Kubeconfigs</h2>
<p>When dealing with multiple Kubernetes clusters your kubeconfig file can very quickly become an unwieldy mess.</p>
<p>Thankfully there are some tools available to make it more manageable by making it quick and easy to switch between cluster context.</p>
<ul>
<li>kubectx and kubens - <a href="https://github.com/ahmetb/kubectx">https://github.com/ahmetb/kubectx</a></li>
<li>kubeswitch - <a href="https://github.com/danielfoehrKn/kubeswitch">https://github.com/danielfoehrKn/kubeswitch</a></li>
<li>kubie - <a href="https://github.com/sbstp/kubie">https://github.com/sbstp/kubie</a></li>
</ul>
<p><code>kubeswitch</code> is my current favourite as it supports loading kubeconfigs from a directory of yaml files, allowing me to export the kubeconfig of each cluster to their own file. I had previously had this same multi-file setup with just <code>kubectl</code> by combining multiple files in the <code>KUBECONFIG</code> environment variable with a <code>:</code> (see below) but I found that many tools didn’t support this and attempted to treat the value as a single file path.</p>
<p>Multi-file kubeconfig hack, expecting files to be in the <code>~/.kube/clusters</code> directory:</p>
<pre><code class="language-bash">## Merge multiple kubeconfigs
function join_by { local d=$1; shift; echo -n &quot;$1&quot;; shift; printf &quot;%s&quot; &quot;${@/#/$d}&quot;; }
KUBECONFIG=&quot;$HOME/.kube/config&quot;
OTHER_CLUSTERS=&quot;$(join_by :$HOME/.kube/clusters/ $(echo $HOME/.kube/clusters/$(/bin/ls $HOME/.kube/clusters)))&quot;
export KUBECONFIG=$KUBECONFIG:$OTHER_CLUSTERS
</code></pre>
<p>I recommend using one of the tools listed above rather than this shell hack.</p>
<h2>#4 - k9s</h2>
<p><a href="https://github.com/derailed/k9s/">k9s</a> is one of my favourite tools when working with clusters. It provides an interactive terminal that supports all resource types and actions, lots of keybinding and filtering. Once you learn all the shortcuts and features, this handy tool will save you a lot of time when working with problematic clusters.</p>
<figure class="center" markdown="1">
<p><img src="../images/k9s.gif" alt="Preview of k9s in action, showing a few of the actions available"></p>
<figcaption>k9s in action!</figcaption>
</figure>
<h2>#5 - <code>kubectl</code> plugins</h2>
<p>Just like Kubernetes itself, <code>kubectl</code> can be extended to provide more powerful functionality via a plugin mechanism, there’s even a package manager available to help discover plugins - <a href="https://github.com/kubernetes-sigs/krew">Krew</a>.</p>
<p>But you don’t need a package manager, or really anything too fancy, to add plugins to <code>kubectl</code>. Any executable in your <code>$PATH</code> that is prefixed with <code>kubectl-</code> becomes a <code>kubectl</code> plugin.</p>
<p>For example, assume we wanted to make a simple “hello, world” style plugin, we just need to create an executable somewhere in our path and name it something like <code>kubectl-hello.sh</code> and then we automatically have the <code>kubectl hello</code> command available to us.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# kubectl-hello.sh
echo &quot;Hello from kubectl&quot;

---

&gt; kubectl hello
&quot;Hello from kubectl&quot;
</code></pre>
<p>This example uses a shell script but your plugins can be built in any language you like as long as it’s executable and has the needed prefix.</p>
<p>One thing to note with plugins though - if you use tab completion / autocomplete in your shell with <code>kubectl</code> it generally won’t work with the plugin commands.</p>
<p>If you want to get started with some <code>kubectl</code> plugins you can take a look at the <a href="https://krew.sigs.k8s.io/plugins/">Krew plugin directory</a> to see if a plugin already exists for your needed task. Some of my favourite plugins are:</p>
<ul>
<li><a href="https://github.com/stern/stern">stern</a> - Multi-pod/container log tailing</li>
<li><a href="https://github.com/ahmetb/kubectl-tree">tree</a> - Show hierarchy of resources based on ownerReferences</li>
<li><a href="https://github.com/replicatedhq/outdated">outdated</a> - Find containers with outdated images</li>
<li><a href="https://github.com/giantswarm/kubectl-gs">gs</a> - Giant Swarm’s plugin for working with our managed clusters</li>
</ul>
<h2>#6 - <code>kshell</code> / <code>kubectl debug</code></h2>
<p>There are many times where I need to launch a temporary pod in a cluster to aid with debugging, usually related to networking issues. To help with this I have a handy Bash alias to easily launch a temporary pod and drop me right into its shell.</p>
<pre><code class="language-bash">alias kshell='kubectl run -it --image bash --restart Never --rm shell'
</code></pre>
<p>Now I can run <code>kshell</code> to get a new debugging pod running the <code>bash</code> container image. Need more tools than are available in the <code>bash</code> image? Replace <code>bash</code> with something like <code>ubuntu</code>.</p>
<pre><code class="language-bash"># kshell
If you don't see a command prompt, try pressing enter.
bash-5.1# nslookup google.com
Server: 1.1.1.1
Address: 1.1.1.1:53

Non-authoritative answer:
Name: google.com
Address: 142.250.187.206
</code></pre>
<p>This is great for more general debugging of a cluster, especially with networking issues such as DNS resolution or network policies.</p>
<p>That covers general cluster debugging, but what about debugging a specific pod? Well we’ve got a couple of tools we can use depending on the situation - <code>kubectl exec</code> or <code>kubectl debug</code>.</p>
<p><code>kubectl exec</code> lets us drop into the shell of a running container, just like we do with <code>kshell</code> above, but this isn’t always possible.</p>
<pre><code class="language-bash"># kubectl exec my-broken-pod -it -- sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec……
</code></pre>
<p>The container needs to have a shell we can drop into which isn’t always the case and we’ll be presented with something similar to the error above.</p>
<p>Similarly, if our pod is CrashLooping we’ll only have access to the shell until the pod is restarted, at which point we’ll be kicked back out to our local shell session.</p>
<p>Thankfully, if running Kubernetes 1.23 or later we can make use of <code>kubectl debug</code> to potentially investigate these problem pods.</p>
<pre><code class="language-bash"># kubectl debug -it --image bash my-broken-pod
Defaulting debug container name to debugger-gprmk. If you don't see a command prompt, try pressing enter.
bash-5.1#
</code></pre>
<p>This injects an <a href="https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/">Ephemeral Container</a> into our chosen pod that can then run alongside our broken container.</p>
<p><code>kubectl debug</code> has a few different modes:</p>
<ul>
<li>
<p>The default launches an “ephemeral container” within the pod you’re debugging</p>
<pre><code class="language-bash">kubectl debug
</code></pre>
</li>
<li>
<p>The <code>copy-to</code> flag creates a copy of the pod with some values replaced (e.g. the image used)</p>
<pre><code class="language-bash">kubectl debug --copy-to
</code></pre>
</li>
<li>
<p>Specifying the node launches a pod in the nodes host namespace to debug the host machine</p>
<pre><code class="language-bash">kubectl debug node/my-node
</code></pre>
</li>
</ul>
<p>There are some limitations though, for example <code>kubectl debug</code> can’t access the whole filesystem of our failing pod, it’ll only be able to access mounted volumes if specified.</p>
<p>When to use what:</p>
<table role="grid">
<thead>
<tr>
<th></th>
<th style="text-align:center"><strong>kshell</strong></th>
<th style="text-align:center"><strong>kubectl exec</strong></th>
<th style="text-align:center"><strong>kubectl debug</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiple workloads experiencing network issues</td>
<td style="text-align:center">✅</td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
</tr>
<tr>
<td>Workload not running as expected but not CrashLooping and isn’t a stripped down image (e.g. not Scratch / Distroless)</td>
<td style="text-align:center"></td>
<td style="text-align:center">✅</td>
<td style="text-align:center"></td>
</tr>
<tr>
<td>Workload not running as expected but not CrashLooping and has an image based on Scratch / Distroless or similar</td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
<td style="text-align:center">✅</td>
</tr>
<tr>
<td>Workload is CrashLooping</td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
<td style="text-align:center">✅</td>
</tr>
</tbody>
</table>
<h2>#7 - <code>kube-ssh</code></h2>
<p>Sometimes investigating the cluster isn’t enough to find the cause of a problem. Sometimes the host machines hold the answer. If we have ssh access to our machines this is a simple task, but that’s not always the case. It’s a common practice to disable ssh and similar services on a cluster and treat all machines as immutable and replaceable. This is great from a security perspective but not so much when trying to debug a problem that seems limited to a single node (or subset of nodes).</p>
<p>If our cluster is running Kubernetes 1.23 or later then we can make use of the <code>kubectl debug</code> command as described above with the node we want to debug.</p>
<p>But if we aren’t yet running 1.23 there are some alternatives.</p>
<p>I have a project called <a href="https://github.com/AverageMarcus/kube-ssh">kube-ssh</a> that gives “ssh-like” access to a node’s underlying host machine and Giant Swarm has a similar <code>kubectl</code> plugin, <a href="https://github.com/giantswarm/kubectl-enter">kubectl-enter</a>.</p>
<p><code>kube-ssh</code> uses a few of the tips already suggested in this post to launch a new pod in the cluster with elevated permissions and makes use of <code>nsenter</code> to switch into the host’s Linux namespaces.</p>
<pre><code class="language-bash">sh -c &quot;$(curl -sSL https://raw.githubusercontent.com/AverageMarcus/kube-ssh/master/ssh.sh)&quot;
[0] - ip-10-18-21-146.eu-west-1.compute.internal
[1] - ip-10-18-21-234.eu-west-1.compute.internal
[2] - ip-10-18-21-96.eu-west-1.compute.internal
Which node would you like to connect to? 1

If you don't see a command prompt, try pressing enter.
[root@ip-10-18-21-234 ~]#
</code></pre>
<p>(Please don’t blindly execute a script from the internet like that, examine it first at least)</p>
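<p>Under the hood, the pod spec it launches looks roughly like this (a simplified sketch rather than the exact manifest from the script; <code>&lt;node-name&gt;</code> is a placeholder). With <code>hostPID</code> enabled, PID 1 inside the pod is the host’s init process, so <code>nsenter</code> targeting it drops the shell into the host’s namespaces:</p>
<pre><code class="language-json">{
  &quot;spec&quot;: {
    &quot;hostPID&quot;: true,
    &quot;nodeName&quot;: &quot;&lt;node-name&gt;&quot;,
    &quot;restartPolicy&quot;: &quot;Never&quot;,
    &quot;containers&quot;: [{
      &quot;name&quot;: &quot;shell&quot;,
      &quot;image&quot;: &quot;alpine&quot;,
      &quot;stdin&quot;: true,
      &quot;tty&quot;: true,
      &quot;securityContext&quot;: { &quot;privileged&quot;: true },
      &quot;command&quot;: [&quot;nsenter&quot;, &quot;-t&quot;, &quot;1&quot;, &quot;-m&quot;, &quot;-u&quot;, &quot;-i&quot;, &quot;-n&quot;, &quot;-p&quot;, &quot;--&quot;, &quot;sh&quot;, &quot;-l&quot;]
    }]
  }
}
</code></pre>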
<p>There are some caveats though:</p>
<ul>
<li>The underlying host needs a shell, just like we mentioned above with <code>kubectl exec</code>. (So this won’t work if you’re running <a href="https://talos.dev">Talos</a> for example)</li>
<li>You require enough permissions to launch pods with privileged securityContext - RBAC, PSPs and Admission Controllers could all potentially block this. (This could also be considered a benefit to this approach over traditional SSH)</li>
<li>Not a real SSH session</li>
</ul>
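<p>For illustration, the core of this technique is a privileged pod sharing the host’s PID namespace that uses <code>nsenter</code> to run a shell inside the host’s namespaces. Below is a minimal sketch of such a manifest - the pod name, image and node name are placeholders, and this isn’t the exact manifest <code>kube-ssh</code> generates:</p>

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-shell  # placeholder name
spec:
  nodeName: my-worker-node  # placeholder: pin the pod to the node to inspect
  hostPID: true  # share the host's PID namespace so the host's PID 1 is visible
  containers:
  - name: shell
    image: alpine:latest
    # Enter the mount, UTS, IPC, network and PID namespaces of the
    # host's PID 1, then start a shell there.
    command: ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "sh"]
    stdin: true
    tty: true
    securityContext:
      privileged: true
```

<p>Once it’s running, attach with <code>kubectl attach -it debug-shell</code> and remember to delete the pod when you’re done.</p>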
<p>Shortly after I posted this blog post, <a href="https://twitter.com/IanColdwater">Ian Coldwater</a> tweeted out a version of this technique that's small enough to fit in a tweet! 🤯</p>
<figure class="center" markdown="1">
<p><a href="https://twitter.com/IanColdwater/status/1545065196561080321"><img src="/images/tweets/1545065196561080321.svg" alt=""></a></p>
<figcaption>Timely Tweet by Ian Coldwater</figcaption>
</figure>
<h2>#8 - Admission Webhooks</h2>
<p>Kubernetes has two types of dynamic admission webhooks (there are also CRD conversion webhooks but we’re not talking about those here):</p>
<ul>
<li>ValidatingWebhook - the ability to block requests to the API server if they fail to meet given criteria.</li>
<li>MutatingWebhook - Modify requests before passing them on to the API server.</li>
</ul>
<p>These webhooks allow you to implement more advanced access control than is possible with RBAC, PSPs, etc. as well as the ability to modify requests to the Kubernetes API on-the-fly. With this you can take more control over your clusters in an automated way, for example you could:</p>
<ul>
<li>Add default labels to resources as they’re created.</li>
<li>Prevent the <code>latest</code> tag being used on images.</li>
<li>Enforce all pods have resource requests/limits specified.</li>
<li>“Hotfix” for security issues (e.g. mutating all pods to include a <code>LOG4J_FORMAT_MSG_NO_LOOKUPS</code> env var to prevent Log4Shell exploit).</li>
</ul>
<p>With these webhooks you’re also able to do subtractive access control - take away a user’s ability to perform a certain action - something that isn’t possible with RBAC, which is only additive. I’ve actually written about this previously, where we needed to remove a specific permission from the cluster-admin role: <a href="https://marcusnoble.co.uk/2022-01-20-restricting-cluster-admin-permissions/">Restricting Cluster Admin Permissions</a></p>
<p>There are two general ways to go about implementing webhooks in your cluster: either you build something bespoke with custom logic or you leverage a third-party tool that abstracts away some of the low-level details. I recommend starting with a third-party solution and only looking at a custom implementation if that doesn’t fulfil your needs.</p>
<p>The two current most popular options are <a href="https://kyverno.io/">Kyverno</a> and <a href="https://open-policy-agent.github.io/gatekeeper/website/docs/">OPA Gatekeeper</a>.</p>
<p>Example Kyverno policy taken from my <a href="https://marcusnoble.co.uk/2022-01-20-restricting-cluster-admin-permissions/">Restricting cluster-admin permissions </a> blog post:</p>
<pre><code class="language-yaml">apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-bulk-certconfigs-delete
  annotations:
    policies.kyverno.io/description: Block delete all bug in CLI
spec:
  rules:
  - name: block-bulk-certconfigs-delete
    match:
      any:
      - resources:
          kinds: [CertConfig]
    preconditions:
      any:
      - key: &quot;&quot;
        operator: Equals
        value: &quot;&quot;
    validate:
      message: |
        Your current kubectl-gs version contains a critical bug
      deny:
        conditions:
        - key: &quot;&quot;
          operator: In
          value: [DELETE]
</code></pre>
<p>There are some things to be aware of though. It’s completely possible (and frustratingly common) to take down a cluster with incorrectly configured webhooks.</p>
<p>Where possible, always try to exclude the <code>kube-system</code> namespace from your webhooks to ensure critical components aren’t accidentally blocked by them.</p>
<p>For example, I’ve dealt with cases where a validating webhook applied to all pods in all namespaces. During a cluster upgrade the pods that provided the logic for the webhook weren’t yet running when the apiserver pod was being created, and the webhook’s own pods were blocked from launching because the webhook couldn’t be called for their launch either. This blocked pod cascaded into more and more failures in the cluster until eventually manual intervention was needed.</p>
<p>On a similar note, be aware of the <code>failurePolicy</code> property on the webhook resources. By default this is set to <code>Fail</code>, which will block whatever action is being targeted if the webhook’s service is unavailable.</p>
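<p>Both of these recommendations live on the webhook configuration itself. As a hedged sketch - the names, service and rules here are all placeholders, and the <code>kubernetes.io/metadata.name</code> label assumes a reasonably recent Kubernetes version:</p>

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy  # placeholder
webhooks:
- name: example-policy.example.com  # placeholder
  # Skip the kube-system namespace entirely so critical
  # components can never be blocked by this webhook.
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system"]
  # Allow requests through if the webhook backend is unavailable
  # (the default is Fail, which blocks them).
  failurePolicy: Ignore
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE"]
  clientConfig:
    service:
      name: example-webhook  # placeholder
      namespace: example
      path: /validate
  admissionReviewVersions: ["v1"]
  sideEffects: None
```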
<p>There is also a <code>reinvocationPolicy</code> property that can be set if changes made by a <code>MutatingWebhook</code> need to trigger a previously executed webhook to re-run with the latest changes. Use this sparingly as you don’t want to be continuously running webhooks all the time.</p>
<p>The last thing to be aware of - the order webhooks are executed in isn’t guaranteed. <code>MutatingWebhooks</code> are run before <code>ValidatingWebhooks</code>, but within those two phases there’s nothing to ensure a specific running order. In practice they’re run in alphabetical order, but that shouldn’t be counted on.</p>
<p>Unfortunately these recommendations are rarely followed by the various open source projects that make use of webhooks, so you need to be careful about how these applications are configured in your cluster. Oh, and if you have the bright idea, like I did, to implement a webhook that enforces these recommendations, you can stop now. <a href="https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/rules/rules.go#L121">Webhooks aren’t performed against other webhook resources</a>.</p>
<h2>#9 - Kubernetes API</h2>
<p>All Kubernetes operations are done via the API - <code>kubectl</code> uses it, in-cluster controllers use it, the Kubernetes scheduler uses it and <strong><em>you can use it too!</em> ✨</strong></p>
<p>Many agree that the API is what makes Kubernetes so powerful and successful. It provides a solid foundation with some core resources and functionality, then lets others extend it and build more advanced and powerful features on top.</p>
<p>The API can be extended by either:</p>
<ul>
<li>the creation of Custom Resource Definitions (CRDs) and controllers.</li>
<li>implementing an Aggregation Layer (such as what metrics-server implements) - I won’t be covering this in this post.</li>
</ul>
<p>It’s easy to start experimenting with the Kubernetes API. <code>kubectl</code> actually exposes a <code>--raw</code> flag that we can use to perform an action against a raw API endpoint without having to worry about handling authentication, as this is pulled from your <code>KUBECONFIG</code> details. For example, the following is equivalent to <code>kubectl get pods -n default</code>:</p>
<pre><code class="language-bash"># kubectl get --raw /api/v1/namespaces/default/pods
{&quot;kind&quot;:&quot;PodList&quot;,&quot;apiVersion&quot;:&quot;v1&quot;,&quot;metadata&quot;:{&quot;selfLink&quot;:...
</code></pre>
<p>The following table shows which <code>kubectl</code> commands to use for the different HTTP method verbs.</p>
<table role="grid">
<thead>
<tr>
<th><strong>HTTP Method</strong></th>
<th><strong>Kubectl command</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>GET</td>
<td>kubectl get --raw</td>
</tr>
<tr>
<td>POST</td>
<td>kubectl create --raw</td>
</tr>
<tr>
<td>DELETE</td>
<td>kubectl delete --raw</td>
</tr>
<tr>
<td>PUT</td>
<td>kubectl replace --raw</td>
</tr>
</tbody>
</table>
<p>There is even a <code>kubectl</code> command to list out the available APIs in a given cluster in case you weren’t sure.</p>
<pre><code class="language-bash"># kubectl api-resources
NAME                              SHORTNAMES         APIVERSION                             NAMESPACED   KIND
apiservices                                          apiregistration.k8s.io/v1              false        APIService
applications                      app,apps           argoproj.io/v1alpha1                   true         Application
applicationsets                   appset,appsets     argoproj.io/v1alpha1                   true         ApplicationSet
appprojects                       appproj,appprojs   argoproj.io/v1alpha1                   true         AppProject
bindings                                             v1                                     true         Binding
certificatesigningrequests        csr                certificates.k8s.io/v1                 false        CertificateSigningRequest
clusterinterceptors               ci                 triggers.tekton.dev/v1alpha1           false        ClusterInterceptor
clusterrolebindings                                  rbac.authorization.k8s.io/v1           false        ClusterRoleBinding
clusterroles                                         rbac.authorization.k8s.io/v1           false        ClusterRole
</code></pre>
<p>Given the above results, the API endpoints are formatted like so:</p>
<p><code>/{API_VERSION}/namespaces/{NAMESPACE}/{RESOURCE_KIND}/{NAME}</code></p>
<p>If a resource isn’t namespaced (i.e. it’s cluster-scoped), or if you want to perform the request across all namespaces, then you can remove the <code>namespaces/{NAMESPACE}/</code> part of the URL.</p>
<p>Similarly, if you want to list resources rather than retrieve a single resource then you remove the <code>/{NAME}</code> from the end of the URL.</p>
<blockquote>
<p>Be aware that the <code>{API_VERSION}</code> has a bit of a gotcha! If the <code>APIVERSION</code> column in the <code>kubectl</code> output just says <code>v1</code> then this is one of the “core” resources and the API path will start with <code>/api/v1/</code>. But if the resource is not one of the “core” ones, for example deployments are <code>apps/v1</code>, then the endpoint starts slightly differently: <code>/apis/apps/v1</code>. Take note that this path starts with <code>apis</code> (with an “s”) followed by the API version, whereas the core ones start with <code>api</code> (without an “s”).</p>
</blockquote>
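<p>The path-building rules above can be captured in a small shell helper. This is purely illustrative - <code>api_path</code> is not a real <code>kubectl</code> feature, just a sketch of the logic:</p>

```shell
# Build a Kubernetes API path from an APIVERSION column value,
# an optional namespace, a (plural) resource name and an optional object name.
api_path() {
  local version="$1" namespace="$2" resource="$3" name="$4"
  local path

  # "Core" resources show a bare v1 and live under /api/v1;
  # everything else lives under /apis/<group>/<version>.
  if [ "$version" = "v1" ]; then
    path="/api/v1"
  else
    path="/apis/$version"
  fi

  # Cluster-scoped (or all-namespace) requests skip the namespaces segment.
  [ -n "$namespace" ] && path="$path/namespaces/$namespace"
  path="$path/$resource"
  # List requests skip the object name.
  [ -n "$name" ] && path="$path/$name"
  echo "$path"
}

api_path v1 default pods                          # → /api/v1/namespaces/default/pods
api_path apps/v1 default deployments my-app       # → /apis/apps/v1/namespaces/default/deployments/my-app
api_path rbac.authorization.k8s.io/v1 "" clusterroles  # → /apis/rbac.authorization.k8s.io/v1/clusterroles
```

<p>Any of these paths can then be passed straight to <code>kubectl get --raw</code>.</p>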
<p>You may be wondering why we’d want to use the API directly rather than making use of <code>kubectl</code>. Well, there are at least a couple of good reasons:</p>
<ul>
<li>the API allows us to work with Kubernetes in any programming language we want and without the need for additional applications.</li>
<li>the <code>kubectl</code> tool is designed primarily to be used by humans, meaning most of the output is best suited for a person to read rather than a computer. Similarly, the commands available in <code>kubectl</code> don’t always map to a single API call; often several calls are made “behind the scenes” before the result is returned to the user.</li>
</ul>
<p>Generally it’s unlikely we’ll make raw HTTP requests to the Kubernetes API server; instead we’ll make use of a client library in our programming language of choice. For example:</p>
<ul>
<li><a href="https://github.com/Kubernetes/client-go">kubernetes/client-go</a> - the official Golang module for interacting with the Kubernetes API</li>
<li><a href="https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs">Kubernetes Provider</a> for Terraform (actually uses the above Go module under the hood)</li>
<li><a href="https://github.com/kubernetes-client">kubernetes-client</a> organisation on GitHub has many official clients in different languages</li>
</ul>
<h2>#10 - Custom Resources and Operators</h2>
<p><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/">Custom Resource Definitions (CRDs)</a> and <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/operator/">Operators</a> give us the ability to implement abstractions and higher-order representations of concepts. You can define your own resource types with CRDs that then can be handled just like any other resource in a Kubernetes cluster and you can implement the operator pattern to add logic to those custom (or default, built-in) resources.</p>
<p>This is a whole topic on its own and not something I can do justice to here, so I highly recommend taking a look at the <a href="https://blog.container-solutions.com/kubernetes-operators-explained">Kubernetes Operators Explained</a> blog post from <a href="https://www.container-solutions.com/">Container Solutions</a>. Below is a diagram from that blog post explaining the process of an operator.</p>
<figure class="center" markdown="1">
<p><img src="../images/kubernetes_operators_diagram.png" alt="Workflow diagram of the operator pattern in Kubernetes"></p>
<figcaption>Credit: Container Solutions</figcaption>
</figure>
<p>An operator implements the standard reconciliation loop logic, as found in almost all parts of Kubernetes, to watch for changes to resources or external factors and make adjustments accordingly - either to the resource’s current state or to some external system in response to a change to a resource.</p>
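<p>Stripped of all the Kubernetes machinery, the loop itself is a simple idea: compare desired state with observed state and emit whatever actions close the gap. A purely illustrative Go sketch, with a map of replica counts standing in for the cluster:</p>

```go
package main

import "fmt"

// reconcile compares desired state against actual state and returns
// the actions needed to converge them. In a real operator this logic
// would run inside a watch loop driven by the Kubernetes API.
func reconcile(desired, actual map[string]int) []string {
	var actions []string
	for name, want := range desired {
		got := actual[name]
		switch {
		case got < want:
			actions = append(actions, fmt.Sprintf("scale up %s: %d -> %d", name, got, want))
		case got > want:
			actions = append(actions, fmt.Sprintf("scale down %s: %d -> %d", name, got, want))
		}
		// got == want: nothing to do - the loop is idempotent.
	}
	return actions
}

func main() {
	desired := map[string]int{"web": 3}
	actual := map[string]int{"web": 1}
	for _, action := range reconcile(desired, actual) {
		fmt.Println(action) // scale up web: 1 -> 3
	}
}
```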
<p>There are several popular frameworks available to handle most of the boilerplate logic needed to create an operator:</p>
<ul>
<li><a href="https://kubebuilder.io/">Kubebuilder</a> is pretty much the most commonly used, especially if working with any of the cluster-api providers.</li>
<li><a href="https://operatorframework.io/">Operator Framework</a></li>
<li><a href="https://kudo.dev/">Kudo</a></li>
<li><a href="https://metacontroller.github.io/metacontroller/intro.html">Metacontroller</a></li>
</ul>
<p>To learn more on this subject I can recommend the following YouTube videos:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=LLVoyXjYlYM">Writing a Kubernetes Operator from scratch</a></li>
<li><a href="https://www.youtube.com/watch?v=KBTXBUVNF2I">Tutorial: From Zero to Operator in 90 Minutes!</a></li>
<li><a href="https://www.youtube.com/watch?v=8JFRw9dZU_s">Smooth Operator: A Rough Guide to Kubernetes Operators</a></li>
</ul>
<h2>Wrap-up</h2>
<p>Hopefully there has been at least something from these points that you can take away and make use of today. If you have any tips you rely on when working with Kubernetes I'd love to hear them, reach out to me on Twitter <a href="https://twitter.com/Marcus_Noble_">@Marcus_Noble_</a> or on Mastodon <a href="https://k8s.social/@Marcus">@marcus@k8s.social</a>.</p>
]]></description><link>https://marcusnoble.co.uk/2022-07-04-managing-kubernetes-without-losing-your-cool</link><guid isPermaLink="true">https://marcusnoble.co.uk/2022-07-04-managing-kubernetes-without-losing-your-cool</guid><pubDate>Mon, 04 Jul 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Restricting cluster-admin Permissions]]></title><description><![CDATA[<p>If you've managed multi-user / multi-tenant Kubernetes clusters then there's a good chance you've come across <a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/">RBAC</a> (Role-Based Access Control). RBAC provides a robust method of granting permissions to users, groups or service accounts within a cluster. These permissions can either be cluster-wide, with <code>ClusterRole</code>, or namespace-scoped, with <code>Role</code>. Roles can be combined to build up all the rules stating what the associated entity is allowed to perform. These rules are additive: starting from a base of no permissions, they build up what is allowed to be performed, and there's no syntax to take away a permission granted by another rule.</p>
<p>Generally, and by default, operators of the cluster are assigned to the <code>cluster-admin</code> ClusterRole. This gives the user access and permission to do all operations on all resources in the cluster.</p>
<pre><code class="language-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
</code></pre>
<p>There's a very good reason for this: an admin generally needs to be able to set up and manage the cluster, including the ability to define and assign roles. But what if we need to <em>block</em> an action performed by cluster admins? We can't do it with RBAC, which only allows for <strong>adding</strong> of permissions, not taking them away.</p>
<p>Well, recently at <a href="https://giantswarm.io/">Giant Swarm</a> we had an <a href="https://github.com/giantswarm/kubectl-gs/pull/632">issue in one of our CLIs</a> used by cluster admins that incorrectly deleted more resources than intended. This was causing issues authenticating with the cluster and we needed to get a fix in place <em>fast</em>. The problem was we had no control over when people would update their CLI even once we'd released the fix.</p>
<p>We couldn't hot-fix this by updating RBAC rules as we couldn't subtract the specific permission, but what we could do was leverage an <a href="https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/"><strong>admission controller</strong></a> to block the request to the API.</p>
<p>At Giant Swarm we're pretty big users of <a href="https://kyverno.io/">Kyverno</a> (which deploys an admission controller) and use it for a lot of validation and defaulting of resources in all our clusters. Not only can Kyverno validate and block based on a resource's values but it can also validate the <em>operation</em> being performed, and by whom.</p>
<p>With this functionality available we were able to deploy a cluster policy that would block cluster-admin (and anyone else) from performing the specific delete all action that was causing us issues.</p>
<pre><code class="language-yaml">apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-bulk-certconfigs-delete
  annotations:
    policies.kyverno.io/title: Block bulk certconfig deletes
    policies.kyverno.io/subject: CertConfigs
    policies.kyverno.io/description: &gt;-
      A bug in an old kubectl-gs version causes all certconfigs to
      be deleted on performing a login, this policy blocks that
      action while still allowing the single delete.
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: block-bulk-certconfigs-delete
    match:
      any:
      - resources:
          kinds:
          - CertConfig
    preconditions:
      any:
      - key: &quot;&quot;
        operator: Equals
        value: &quot;&quot;
    validate:
      message: &quot;Your current kubectl-gs version contains a critical bug, please update to the latest version using `kubectl gs selfupdate`&quot;
      deny:
        conditions:
        - key: &quot;&quot;
          operator: In
          value:
          - DELETE
</code></pre>
<p>There are a few things going on here so I'm going to explain each bit individually.</p>
<p>First up we specify the Kind of resource we want this policy to apply to, in our case that's <code>CertConfig</code> but it can be anything within the cluster. It's also possible to target specific groups and API versions if needed.</p>
<pre><code class="language-yaml">match:
  any:
  - resources:
      kinds:
      - CertConfig
</code></pre>
<p>Next up we're setting a precondition that allows us to do some filtering based on the details of the request. In this instance we're only interested in requests that don't have a name specified (this is the case when operating on a list of resources rather than a single resource) which allows us to target only those &quot;delete all&quot; requests.</p>
<pre><code class="language-yaml">preconditions:
  any:
  - key: &quot;&quot;
    operator: Equals
    value: &quot;&quot;
</code></pre>
<p>Finally we have the actual validation rule. We're specifying a <code>deny</code> rule that'll block requests (matching the previous <code>match</code> and <code>preconditions</code>) with the <code>DELETE</code> operation. We're also able to define a message that'll be returned to the client as the reason for why the admission controller rejected the request. We're using this to inform users about the bug and encouraging them to upgrade.</p>
<pre><code class="language-yaml">validate:
  message: &quot;Your current kubectl-gs version contains a critical bug, please update to the latest version using `kubectl gs selfupdate`&quot;
  deny:
    conditions:
    - key: &quot;&quot;
      operator: In
      value:
      - DELETE
</code></pre>
<p>With this single policy deployed to our clusters we've been able to block the bug in our CLI, even when the user performing the action has cluster-admin level permissions.</p>
<h2>Other Use Cases</h2>
<p>While we needed this to work around a bug in our client there are other situations where this approach could be useful.</p>
<ul>
<li>A policy that prevents deletion of any resource that has a <code>do-not-delete: &quot;true&quot;</code> annotation on it, to prevent accidental deletion of critical resources (such as persistent volumes or secrets).</li>
<li>A policy that prevents fetching the details of secrets in a specific namespace while still allowing it in every other namespace (including those that may not yet exist).</li>
<li>A policy that blocks deletes, updates or patches from everyone except a specific user, which can be used to prevent others &quot;cleaning up&quot; resources that you may be trying to debug.</li>
</ul>
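<p>As an untested sketch of the first idea, a Kyverno policy along these lines could protect annotated resources - the policy name is illustrative and the JMESPath expressions would need verifying against your Kyverno version:</p>

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-protected-deletes  # illustrative name
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: block-protected-deletes
    match:
      any:
      - resources:
          kinds:
          - PersistentVolume
          - Secret
    preconditions:
      any:
      # Only consider DELETE requests.
      - key: "{{ request.operation }}"
        operator: Equals
        value: DELETE
    validate:
      message: "This resource is annotated do-not-delete; remove the annotation first."
      deny:
        conditions:
        # Deny when the object being deleted carries the annotation.
        - key: "{{ request.oldObject.metadata.annotations.\"do-not-delete\" || '' }}"
          operator: Equals
          value: "true"
```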
<p>So, if you haven't already, I recommend taking a look at <a href="https://kyverno.io">Kyverno</a> and especially taking a look at the <a href="https://kyverno.io/policies/">example Policies</a> that they have for an idea of what can be done.</p>
]]></description><link>https://marcusnoble.co.uk/2022-01-20-restricting-cluster-admin-permissions</link><guid isPermaLink="true">https://marcusnoble.co.uk/2022-01-20-restricting-cluster-admin-permissions</guid><pubDate>Thu, 20 Jan 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[My Recommended Go Resources]]></title><description><![CDATA[<p>I was recently asked by a colleague at work if there are any resources I recommend for getting familiar with Go. It turned into quite a list so I thought I'd share it with everyone (and hopefully keep updating it) in the hope that others will find it useful.</p>
<h2>📚 Books</h2>
<p>My favourite book for learning Go is - ✨ <strong><a href="https://www.gopl.io/">The Go Programming Language</a></strong> ✨ by Alan A. A. Donovan &amp; Brian W. Kernighan.
This was the book I used to pick up Go in the first place. If you already have previous programming experience it's quite a nice book: first read about the specifics of the Go language, then flip through as needed to the various topics (such as Goroutines) that you might not know about.</p>
<p>Some 🆓 books I've seen recommended:</p>
<ul>
<li><strong><a href="https://threedots.tech/go-with-the-domain/">Go With The Domain</a></strong> by <a href="https://threedots.tech/">Three Dots Labs</a></li>
<li><strong><a href="https://www.packtpub.com/product/go-design-patterns/9781786466204">Go Design Patterns</a></strong> by Mario Castro Contreras  (Free with trial)</li>
</ul>
<p>I've also got the following books but haven't yet had a chance to read any of them so can't comment on their usefulness <em>just yet</em>:</p>
<ul>
<li><strong><a href="https://lets-go.alexedwards.net/">Let's Go</a></strong> by Alex Edwards</li>
<li><strong><a href="https://lets-go-further.alexedwards.net/">Let's Go Further</a></strong> by Alex Edwards</li>
<li><strong><a href="https://education.ardanlabs.com/courses/ultimate-go-notebook">Ultimate Go Notebook</a></strong> by <a href="https://education.ardanlabs.com/">Ardan Labs</a></li>
</ul>
<p>There's also <strong><a href="https://openfaas.gumroad.com/l/everyday-golang">Everyday Golang</a></strong> by <a href="https://www.alexellis.io/">Alex Ellis</a> which I haven't had a chance to pick up yet but if it's anything like Alex's other writing it'll be a great resource!</p>
<h2>🧰 Tools and Libraries</h2>
<p>In terms of useful resources, tools and libraries I use a lot:</p>
<ul>
<li><a href="https://play.golang.org/">The Go Playground</a> is a fantastic tool for learning and experimenting.</li>
<li><a href="https://gofiber.io/">Fiber</a> - My favourite web server framework. Very similar to the Express framework available for NodeJS.</li>
<li><a href="https://github.com/spf13/viper">Viper</a> - Config management (env vars, config files, cli flags).</li>
<li><a href="https://github.com/spf13/cobra">Cobra</a> - Framework for building amazing CLI tools.</li>
<li><a href="https://github.com/spf13/afero">Afero</a> - Filesystem framework (very useful for tests).</li>
<li><a href="https://github.com/rs/zerolog">ZeroLog</a> - Very useful logging framework. Support log levels, metadata attributes and different log output styles.</li>
<li><a href="https://gorm.io/">GORM</a> - ORM database library with official support for MySQL, PostgreSQL, SQLite and SQL Server.</li>
</ul>
<h2>🔗 Websites / Blog Posts</h2>
<ul>
<li><a href="https://awesome-go.com/">Awesome Go</a> - List of all things Go.</li>
<li><a href="https://bhupesh-v.github.io/embedding-static-files-in-golang/">Embedding Static Files in Golang</a> - Embedding static files into a Go binary.</li>
<li><a href="https://github.com/nikolaydubina/go-recipes">Go Recipes</a> - a collection of handy Go commands you can run to help with things like module updates.</li>
<li><a href="https://gist.github.com/AverageMarcus/78fbcf45e72e09d9d5e75924f0db4573">Multi-arch Dockerfile</a> - Example multi-stage Dockerfile for building tiny, multi-arch container images for Go applications.</li>
<li><a href="https://golangweekly.com/">Go Weekly Newsletter</a> - A fantastic weekly newsletter on all things Go (RSS feed also available).</li>
<li><a href="https://blog.alexellis.io/golang-writing-unit-tests/">Writing Unit Tests</a> - Writing Go unit tests.</li>
</ul>
<h2>🧑‍💻 Suggested by others</h2>
<ul>
<li><a href="https://github.com/quii/learn-go-with-tests">https://github.com/quii/learn-go-with-tests</a> - Learn Go with TDD.</li>
<li><a href="https://gobyexample.com/">https://gobyexample.com/</a> - Common program examples.</li>
<li><a href="https://changelog.com/gotime">https://changelog.com/gotime</a> - Good podcast on the Go community.</li>
<li>My friend <a href="https://twitter.com/MichaelCade1">Michael Cade</a> has shared his <a href="https://www.upthestack.io/2024/06/24/My-Journey-to-Learning-Golang.html">Journey To Learning Golang</a> that has some great tips included!</li>
</ul>
<p>If anyone has any other resources they recommend I'd love to hear about them and update this list. Please feel free to reach out to me on Twitter at <a href="https://twitter.com/Marcus_Noble_">@Marcus_Noble_</a>.</p>
]]></description><link>https://marcusnoble.co.uk/2021-09-02-my-recommended-go-resources</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-09-02-my-recommended-go-resources</guid><pubDate>Thu, 02 Sep 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Migrating from Docker to Podman]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2021-09-01: Added note about socket bugfix PR</p>
<p>2021-09-01: Added troubleshooting section about port forwarding bug</p>
<p>2021-09-04: Added note about switching to Podman on Windows</p>
<p>2021-09-04: Added update about port forwarding PR</p>
<p>2021-09-04: Added note about M1 Mac support</p>
<p>2021-09-04: Added volume mount limitation</p>
<p>2021-09-04: Added podman-macos utility</p>
</details>
<p>Docker has <a href="https://www.docker.com/blog/updating-product-subscriptions/">recently announced</a> that Docker Desktop will soon require a subscription and, based on the size of your company, may require a paid subscription. (It remains free for personal use)</p>
<p>There has been quite a bit of reaction to this news:</p>
<figure class="center" markdown="1">
<p><a href="https://twitter.com/QuinnyPig/status/1432720164169076755"><img src="/images/tweets/1432720164169076755.svg" alt=""></a></p>
<figcaption>Corey isn't too impressed with the news</figcaption>
</figure>
<figure class="center" markdown="1">
<p><a href="https://twitter.com/manuel_zapf/status/1432974196632604676"><img src="/images/tweets/1432974196632604676.svg" alt=""></a></p>
<figcaption>Manuel makes a good point about paying for what we rely on</figcaption>
</figure>
<p>Depending on which side your opinions lie, you might be looking for alternatives. Well it just so happens that <a href="https://podman.io">Podman</a> posted this well-timed tweet:</p>
<figure class="center" markdown="1">
<p><a href="https://twitter.com/Podman_io/status/1432800271873323010"><img src="/images/tweets/1432800271873323010.svg" alt=""></a></p>
<figcaption>Well timed announcement</figcaption>
</figure>
<p>So, let's give it a whirl...</p>
<h2>Replacing Docker with Podman (on Mac)</h2>
<blockquote>
<p>Note: This currently doesn't work for Macs with an M1 CPU. I've come across this post in my search - <a href="https://www.cloudassembler.com/post/podman-machine-mac-m1/">Running Podman Machine on the Mac M1</a> - but I've not confirmed if it works or not.</p>
</blockquote>
<ol>
<li>
<p><code>brew install podman</code></p>
</li>
<li>
<p>Wait while brew downloads, builds and installs...</p>
</li>
<li>
<p>Create a virtual machine for Podman to run from:</p>
<pre><code class="language-sh">✨ podman machine init

Downloading VM image: fedora-coreos-34.20210821.1.1-qemu.x86_64.qcow2.xz: done
Extracting compressed file

🕙 took 2m44s
</code></pre>
</li>
<li>
<p>Start the virtual machine and set up the connection to Podman:</p>
<pre><code class="language-sh">✨ podman machine start

INFO[0000] waiting for clients...
INFO[0000] listening tcp://0.0.0.0:7777
INFO[0000] new connection from  to /var/folders/x_/bfc7v6kn4fs0rl9k77whs0nw0000gn/T/podman/qemu_podman-machine-default.sock
Waiting for VM ...
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]

🕙 took 34s
</code></pre>
</li>
<li>
<p><code>alias docker=podman</code> (Add this to your <code>.bashrc</code> (if using Bash), <code>.zshrc</code> (if using ZSH) or whatever the correct file for your shell is)</p>
</li>
<li>
<p>🎉</p>
</li>
</ol>
<h2>Replacing Docker with Podman (on Windows)</h2>
<p>I don't currently have access to a Windows machine where I can test this out but <a href="https://twitter.com/frank_k_p">Frank</a> sent me this <a href="https://twitter.com/frank_k_p/status/1433490007088668672">on Twitter</a> that covers the process needed for those on Windows with WLS2 - <a href="https://www.redhat.com/sysadmin/podman-windows-wsl2">How to run Podman on Windows with WSL2</a>.</p>
<h2>Troubleshooting</h2>
<p>Ok, so it's not all <em>completely</em> pain free, there are a few issues you might hit...</p>
<h3>Failed to parse config</h3>
<pre><code class="language-sh">Error: failed to parse query parameter 'X-Registry-Config': &quot;n/a&quot;: error storing credentials in temporary auth file (server: &quot;https://index.docker.io/v1/&quot;, user: &quot;&quot;): key https://index.docker.io/v1/ contains http[s]:// prefix
</code></pre>
<p>Podman seems more strict than Docker when parsing the config file, check the <code>~/.docker/config.json</code> file for the key with the <code>https://</code> prefix (as mentioned in the error message) and remove it.</p>
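<p>For example, an offending config might look something like this (purely illustrative) - removing the <code>https://index.docker.io/v1/</code> entry (and running <code>podman login docker.io</code> again if needed) clears the error:</p>

```json
{
  "auths": {
    "https://index.docker.io/v1/": {},
    "ghcr.io": {}
  }
}
```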
<h3>Sock already exists</h3>
<pre><code class="language-sh">✨ podman machine start
ERRO[0000] &quot;/var/folders/x_/bfc7v6kn4fs0rl9k77whs0nw0000gn/T/podman/qemu_podman-machine-default.sock&quot; already exists
panic: interface conversion: net.Conn is nil, not *net.UnixConn
</code></pre>
<p>This seems to happen (for me at least) when I've previously run <code>podman machine stop</code>. It looks like the sock file isn't correctly being removed. Running <code>rm</code> on the file mentioned in the error message will be enough to get you going again.</p>
<blockquote>
<p>UPDATE: Looks like this will be fixed in an upcoming release. - <a href="https://github.com/containers/podman/pull/11342">PR</a></p>
</blockquote>
<h3>Volume mounts</h3>
<pre><code class="language-sh">✨ podman run --rm -it -v $(pwd):/usr/share/nginx/html:ro --publish 8000:80 docker.io/library/nginx:latest
Error: statfs /Users/marcus/web: no such file or directory
</code></pre>
<p>Podman machine currently has no support for mounting volumes from the host machine (your Mac) into the container on the virtual machine. Instead, it attepts to mount a directory matching what you specified from the <em>virtual machine</em> rather than your Mac.</p>
<p>This is a fairly big issue if you're looking for a smooth transition from Docker Desktop.</p>
<p>There's currently a fairly active <a href="https://github.com/containers/podman/issues/8016">issue</a> about this limitation but as of right now there doesn't seem to be a nice workaround or solution.</p>
<h3>Automatic published port forwarding</h3>
<pre><code class="language-sh">✨ podman run --rm -it --publish 8000:80 docker.io/library/nginx:latest &amp;
✨ curl http://localhost:8000

curl: (7) Failed to connect to localhost port 8000: Connection refused
</code></pre>
<p>The latest version of Podman at the time of writing (<a href="https://github.com/containers/podman/releases/tag/v3.3.1">v3.3.1</a>) has a bug where the automatic port forwarding from host to VM when publishing a port with the <code>-p / --publish</code> flag doesn't work.</p>
<p>There are currently a couple of workarounds for this:</p>
<p>The first is passing in the <code>--network bridge</code> flag to the podman command, e.g.</p>
<pre><code class="language-sh">✨ podman run --rm -it --publish 8000:80 --network bridge docker.io/library/nginx:latest
</code></pre>
<p>The other, more permanent option is to add <code>rootless_networking = &quot;cni&quot;</code> under the <code>[containers]</code> section of your <code>~/.config/containers/containers.conf</code> file.</p>
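<p>For reference, the relevant section of <code>~/.config/containers/containers.conf</code> would look something like this (create the file if it doesn't already exist):</p>
<pre><code>[containers]
rootless_networking = &quot;cni&quot;
</code></pre>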
<p>To follow the progress of this bug, please refer to the <a href="https://github.com/containers/podman/issues/11396">issue</a>. <strong>UPDATE</strong>: This has now been merged and is expected to be released in v3.3.2 in the next few days or so.</p>
<h2>Short-name resolution</h2>
<pre><code class="language-sh">Error: error creating build container: short-name resolution enforced but cannot prompt without a TTY
</code></pre>
<p>Ok, this is the big one and the major issue you'll likely hit making the switch today from Docker to Podman. Let's dive into it in a bit more detail...</p>
<p>First we need to understand what a short-name is in this context. It refers to container images that don't have a full domain name prefixed. You've likely come across these quite a lot before - e.g. <code>alpine:latest</code>, <code>ubuntu:12</code>, <code>giantswarm/pause:latest</code>, etc.</p>
<p>When using Docker, these images are actually first prefixed with <code>docker.io</code> (or <code>docker.io/library</code> for those official images without a namespace) before being pulled.</p>
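<p>Roughly, that expansion rule looks like this (a simplified sketch - it ignores names that already include a registry domain):</p>
<pre><code class="language-sh">expand_short_name() {
  case &quot;$1&quot; in
    */*) echo &quot;docker.io/$1&quot; ;;          # namespaced, e.g. giantswarm/pause
    *)   echo &quot;docker.io/library/$1&quot; ;;  # official image, e.g. alpine
  esac
}
expand_short_name alpine:latest            # docker.io/library/alpine:latest
expand_short_name giantswarm/pause:latest  # docker.io/giantswarm/pause:latest
</code></pre>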
<p>Podman doesn't have this as a default. It can work in the same way as Docker but needs a bit of configuring.</p>
<p>It's worth briefly pausing here to explain <em>why</em> this behavior is different. Podman takes the <strong>secure by default</strong> attitude to configuration and installation, and this difference is a prime example of that mindset. You've likely heard in the news over the past few years about some of the supply chain hacks that have had a big impact on some companies and projects. One of the common attack vectors is tricking users into installing what they think is a legitimate package but actually contains malicious code. The use of short names for images opens up the risk of accidentally pulling the wrong image from the wrong registry.</p>
<p>To mitigate this risk Podman has a feature where it will prompt you asking which registry you'd like to pull the short-name image from and will then save that choice to speed things up later. (On a side note, there's a repo where the community is trying to collate some of the most widely used short-name aliases - <a href="https://github.com/containers/shortnames">https://github.com/containers/shortnames</a>)</p>
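<p>Those aliases are just entries in a registries config file, along these lines (format as used in that repo):</p>
<pre><code>[aliases]
&quot;alpine&quot; = &quot;docker.io/library/alpine&quot;
&quot;ubuntu&quot; = &quot;docker.io/library/ubuntu&quot;
</code></pre>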
<p>So, Podman has this handy feature to help out with security so why are we seeing an error? Well, when running Podman on MacOS (or Windows) we're actually running it in a Linux VM and remotely connecting to Podman running in that machine. Because of this we don't have an interactive terminal connected to the underlying Podman engine, so it has no way to prompt us to choose a registry.</p>
<h3>Fix</h3>
<p>There's a couple of solutions for this:</p>
<ol>
<li>
<p>Instead of using short names we could switch to using fully prefixed images (this includes updating any <code>FROM</code> commands in our Dockerfiles also).</p>
</li>
<li>
<p>The other approach is to reduce this security feature to be on-par with the experience we're used to with Docker.</p>
</li>
</ol>
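<p>The first option is mostly mechanical - for example, a one-off rewrite of a <code>Dockerfile</code> could look like this (a sketch; swap in your own image names):</p>
<pre><code class="language-sh"># rewrite a short-name FROM line to a fully-qualified one (keeps a .bak copy)
sed -i.bak 's|^FROM ubuntu:|FROM docker.io/library/ubuntu:|' Dockerfile
</code></pre>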
<p>As the first solution really just relies on you changing the image names you're referencing, which will depend on how you're working, I'll focus on the second solution.</p>
<p>With our machine created and started (as outlined above) we need to access the machine to make a small configuration change. Thankfully Podman makes this quite easy:</p>
<pre><code class="language-sh">podman machine ssh
</code></pre>
<p>This will drop you into an SSH session within the virtual machine created for Podman. Once in this machine we want to make a change to the <code>/etc/containers/registries.conf</code> file. If we take a look at the file contents we'll see the final lines of it (at the time of writing) as follows:</p>
<pre><code># Enforcing mode for short names is default for Fedora 34 and newer
short-name-mode=&quot;enforcing&quot;
</code></pre>
<p>The <code>short-name-mode</code> property has 3 possible values:</p>
<ul>
<li><strong>enforcing</strong>: If no alias is found and more than one unqualified-search registry is set, prompt the user to select one registry to pull from. If the user cannot be prompted (i.e., stdin or stdout are not a TTY), Podman will throw an error.</li>
<li><strong>permissive</strong>: Behaves as enforcing but will not throw an error if the user cannot be prompted. Instead, Podman will try all unqualified-search registries in the given order. Note that no alias will be recorded.</li>
<li><strong>disabled</strong>: Podman will try all unqualified-search registries in the given order, and no alias will be recorded. This is pretty much the same behavior of Podman before short names were introduced.</li>
</ul>
<p>If we want Podman to perform more like Docker we'll want to change this value to <code>permissive</code>:</p>
<pre><code class="language-sh">sudo sed -i 's/short-name-mode=&quot;enforcing&quot;/short-name-mode=&quot;permissive&quot;/g' /etc/containers/registries.conf
</code></pre>
<p>There's one more property in this file that it's worth at least being aware of.</p>
<pre><code>unqualified-search-registries = [&quot;registry.fedoraproject.org&quot;, &quot;registry.access.redhat.com&quot;, &quot;docker.io&quot;, &quot;quay.io&quot;]
</code></pre>
<p>This property contains a list of all the registries that will be checked (in order) when looking up a short name image. <strong>Be sure the values in here are ones that you trust!</strong></p>
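<p>If you only ever pull from Docker Hub you could trim this down to a single entry - and, given the behaviour of <code>enforcing</code> mode described above, with only one unqualified-search registry configured there's nothing left to prompt about either:</p>
<pre><code>unqualified-search-registries = [&quot;docker.io&quot;]
</code></pre>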
<p>With that change made we can <code>exit</code> from the virtual machine and Podman should then search for any short name images using these registries from now on.</p>
<h2>GUI Replacement</h2>
<p>For those that like to have a graphical UI to manage / monitor their running containers <a href="https://github.com/heyvito">Victor</a> has released <a href="https://github.com/heyvito/podman-macos">podman-macos</a> that provides a tiny taskbar utility for Podman.</p>
<figure class="center" markdown="1">
<p><img src="/images/podman-macos.png" alt=""></p>
<figcaption>Podman GUI for MacOS</figcaption>
</figure>
<h2>Wrap Up</h2>
<p>I'm sure there are many more inconsistencies but so far I'm pretty impressed. I plan to try using Podman instead of Docker for a while and see how I get on. I'll try and update this post with anything more I find out.</p>
<p>If anyone wants to share their experiences with Podman please reach out to me on Twitter at <a href="https://twitter.com/Marcus_Noble_">@Marcus_Noble_</a>.</p>
]]></description><link>https://marcusnoble.co.uk/2021-09-01-migrating-from-docker-to-podman</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-09-01-migrating-from-docker-to-podman</guid><pubDate>Wed, 01 Sep 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Job hunting / hiring during a pandemic]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2021-09-02: Added resources</p>
</details>
<p>I’ve recently accepted a new job offer and wanted to share a little about my experience job hunting during a global pandemic. Even after more than a year it was clear that some companies still hadn’t adjusted or taken the time to stop and look at the approach they were taking and the effect it had on the interview candidate. Hopefully some of this will be of use to someone looking to hire during similar situations.</p>
<p>Most of what’s to come is aimed at those doing the hiring rather than the hunting but I think it could be of benefit for both sides to be aware of what might come up.</p>
<p>In no particular order…</p>
<h2>Be clear about remote positions</h2>
<p>In a world where we’ve all been forced to work remotely it’s good to know what a role is actually going to be like when some normalcy returns to the world.</p>
<p>Is the role you’re advertising a permanent remote position? If it is, is it fully remote or are some days in the office expected / required?</p>
<p>Be clear about <em>where</em> you can hire remotely. Are there any legal or financial reasons why you can or can’t employ someone in a given country? If so, state those up front and save everyone some time.</p>
<p>I’d say it’s also good to mention any “core hours” or the time zones the majority of the team work in. Not everyone will be comfortable either with the asynchronous communication or having to shift their working day earlier or later. On a similar note, make sure to mention if contact with external people is part of the role and if there are any time constraints around that.</p>
<h2>Longer interview process</h2>
<p>One thing that became clear to me fairly early on was how much more spread out the whole interview process was.</p>
<p>It’s not uncommon for a company to want to perform a few different interviews (e.g. technical, behavioural, etc.) or with a few different interviewers. In face-to-face interviews these normally tend to happen all in the same day, one after another. In a remote approach this was <em>much</em> more dragged out with interviews spread over multiple weeks. As an example, I had one application spread over a month and a half with a total of 7 hours of interviews plus time taken on a take-home programming exercise.</p>
<p>This all amounted to dragging out all the usual stress/anxiety that comes with interviewing. If the candidate is currently unemployed (not uncommon during this pandemic) then any delay to the process could have a serious impact on their life and their family.</p>
<h2>Pants &amp; T-shirt</h2>
<p>Ok, so it’s not all bad.</p>
<p>While I don’t recommend actually sitting in your pants (or underwear for the American readers) and a t-shirt while on an interview call I did notice there was much less worry or expectation about what to wear. Where previously I’d feel compelled to wear a shirt and smart trousers to an interview I could instead wear shorts and a t-shirt and feel much more comfortable in what I was wearing.</p>
<p>This may seem like a small thing but that extra level of comfort adds wonders to confidence during the interview.</p>
<h2>Audio is most important</h2>
<p>One thing that really struck me was how important it was that the audio worked, rather than the video.</p>
<p>While the video does allow both parties to pick up on non-verbal cues, it’s much more important that both can hear each other clearly so neither misunderstand.</p>
<p>Google Meet is especially bad in this area from my experience. When the connection degrades both the video and the audio suffer, leading to several rounds of &quot;sorry, can you repeat that please?&quot;. In contrast I’ve found Zoom to fare quite well in this area as in a <a href="https://support.zoom.us/hc/en-us/articles/207368756-Using-QoS-DSCP-Marking">default setup the audio is prioritised slightly higher than video and screen share traffic</a>.</p>
<p>If free / open source is more your thing then <a href="https://jitsi.org/jitsi-meet/">Jitsi Meet</a> prioritises audio, even to the point where it’ll stop the video if required.</p>
<p>On a similar note, make sure you have a stable internet connection and try to avoid tethering from a mobile where possible. If unavoidable make sure to let the other person know at the start and suggest switching to audio only.</p>
<h2>Include the company name in email subjects</h2>
<p>Of the emails and calendar invites I received, only about half included the company name in the subject; most were instead titled something like &quot;Interview with Marcus Noble&quot;. I couldn’t see at a glance which company it was related to and as I’d applied to a lot of different places it was quite a chore to keep things organised.</p>
<p>When sending calendar invites I much prefer to know the company and the type of interview (if relevant) than the person I’m interviewing with. Information about the interviewer and anything else relevant can be contained in the main body content but when looking at the overview of my calendar I want to know quickly which interviews I need to prepare for that day.</p>
<h2>Explain the process</h2>
<p>This is something that almost all the companies I spoke with did and it’s super valuable for the candidate.</p>
<p>Knowing exactly what to expect and when means the candidate can just focus on showing their best self during the interview process and not having to get anxious about all the worst case scenarios that flood their mind. If you can include this information along with the initial job specification then that’s even better!</p>
<h2>Have a single contact person</h2>
<p>I noticed a vast number of different people involved in the whole interviewing process. It wasn’t uncommon to interact with a technical recruiter, the hiring manager and multiple different people from various teams. Having a single person to contact if needed, and knowing who that is from the start, goes a long way to relieving some stress on the candidate’s side.</p>
<h2>Scheduling is hard</h2>
<p>Pretty much every company I interviewed with used something similar to <a href="https://calendly.com/">Calendly</a> to make it easier for them to arrange the best time for the interview.</p>
<p>The problem though is the interview candidate has to update their availability on <em>all</em> of these. So once one company has scheduled a call you then need to go through all the others you have given your availability to and update them.</p>
<p>Ideally it’d be better if the candidate could use a single service for their availability but I can’t see that happening in reality.</p>
<h2>Competitive salary</h2>
<blockquote>
<p>(Ok this one is more of a general rant)</p>
</blockquote>
<p>If your salary is so competitive then brag about it! Put it in the job spec and save wasted time on both sides.</p>
<p>Not willing to share your salary? Then ask the candidates what they’re looking for; if it’s more than you can afford then thank them for their interest and let them know you don’t currently have any roles suitable. <strong>DO NOT</strong> use this as a way of underpaying people. If someone is asking for a value below what you think the role is worth, don’t give them that low salary to save money. You’ll get a much happier and more productive worker if you pay them fairly.</p>
<h2>LinkedIn</h2>
<p>I’ve not had much luck with LinkedIn but your mileage may vary.</p>
<p>Most of the people contacting me on there were just automatic messages and largely irrelevant to what I was looking for.</p>
<p>I forget where I first saw this suggested but adding an Emoji into your name is a really good way of filtering out the automatic messages from the personal ones. Anything that was sent to &quot;☁️ Marcus&quot; was immediately ignored as they were mostly completely irrelevant.</p>
<h2>Ghosting / Delay in Responses</h2>
<p>One thing that was especially frustrating was the companies that didn't seem like they had the time or interest to actually take part in the hiring process. I had a few companies that didn't even respond to an application and a good few others that were slow (over a week) to respond at each stage.</p>
<p>Not only does this add to the longer process mentioned above it also makes candidates either worry about the position or move on to something else.</p>
<p>If you don't plan to proceed with an application then let that candidate know. Even a basic boilerplate email would be enough so the candidate can focus elsewhere.</p>
<hr>
<p>All-in-all the process wasn’t <em>too</em> bad. Being able to do all the interviews in a setting I was comfortable in was a big positive, but I also understand that I’m in a very privileged position where I have a quiet space to do that and don't currently <em>need</em> a new job.</p>
<p>I'm not sure how different the situation would be if I wasn't in this privileged position but I can't imagine it being pleasant. It's worth keeping in mind when you're hiring that you may be automatically excluding some quality candidates due to either dragging out the whole process or not being flexible enough. A &quot;we embrace diversity&quot; message at the bottom of your job spec isn't enough.</p>
<p>If anyone is currently in the process of job hunting (or hiring) and wants to chat about any of this or just get some general advice I'm always happy to help where I can (though I'm far from an expert). Feel free to message me on <a href="https://twitter.com/Marcus_Noble_">Twitter</a> (DMs open).</p>
<h2>Bonus: Resources for those job hunting</h2>
<ul>
<li><a href="https://technicalinterviews.dev/">De-Coding the Technical Interview Process</a> by <a href="https://twitter.com/EmmaBostian">Emma Bostian</a> - a fantastic book (although examples are web dev based) outlining a lot of what to expect in the interview process.</li>
<li><a href="https://techinterviewhandbook.org/">Tech Interview Handbook</a> - Wish I'd come across this sooner. Lots of great information (especially liking the <a href="https://techinterviewhandbook.org/questions-to-ask/">Questions to ask</a> page) but does seem fairly USA centric.</li>
</ul>
]]></description><link>https://marcusnoble.co.uk/2021-08-10-job-hunting-hiring-during-a-pandemic</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-08-10-job-hunting-hiring-during-a-pandemic</guid><pubDate>Tue, 10 Aug 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Multicloud Kubernetes]]></title><description><![CDATA[<details>
<summary>Changelog</summary>
<p>2021-08-08: Added latency details</p>
</details>
<p>I've been using Scaleway's <a href="https://www.scaleway.com/en/kubernetes-kapsule/">Kapsule</a> managed Kubernetes offering for my personal projects for a while now (this blog is running on it) so I was pretty excited when they announced a new managed Kubernetes offering dubbed <a href="https://blog.scaleway.com/k8s-multi-cloud/">Kosmos</a>. What makes Kosmos <em>really</em> interesting is that it's sold as a multi-cloud Kubernetes offering.</p>
<blockquote>
<p>Kubernetes Kosmos is the first-ever managed Kubernetes engine, allowing you to attach an instance or dedicated server from any cloud provider to a Scaleway’s Kubernetes control plane.</p>
<p>With Kubernetes Kosmos, you can:</p>
<ul>
<li>Deploy clusters across multiple providers</li>
<li>Attach nodes from any cloud providers</li>
<li>Benefit from scalability &amp; stability of cross-provider infrastructure</li>
<li>Access a fully managed &amp; redundant Control Plane</li>
</ul>
</blockquote>
<p>This is quite a powerful approach as it allows the user to take advantage of the best functionality of each cloud provider on offer. But on top of that you are able to use a managed control plane with your on-premise instances.</p>
<p>This is all made possible thanks to the <a href="https://github.com/squat/kilo">Kilo</a> project that's built on top of WireGuard to provide multi-cloud networking.</p>
<h2>Let's give it a try</h2>
<p>Creation of a Kosmos cluster is a fairly easy process via the Scaleway web console. Select the cluster type you want to create (Kosmos), the region to deploy into and the version of Kubernetes you want to use.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/CreateCluster.png" alt="Create a Cluster form"></p>
<figcaption>Create a Cluster</figcaption>
</figure>
<p>You also have the option of giving your cluster a name, description and some tags to differentiate it from any other clusters you may have.</p>
<p>During the initial creation you have the option of adding a Scaleway managed node pool. You can choose to skip this and instead set up your node pools later.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/CreationInProgress.png" alt="Cluster Creating"></p>
<figcaption>The cluster will take a few moments to become ready</figcaption>
</figure>
<p>Once the cluster has finished creating and is ready to use you can add more node pools. One thing I did find interesting when adding a second Scaleway managed node pool was that I could pick a different region from the one my cluster was deployed into.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/MultiRegionPools.png" alt="Node pools in two different regions"></p>
<figcaption>Two node pools in two different regions</figcaption>
</figure>
<p>Above you can see two node pools: the one called <code>default</code> I created along with the Kosmos cluster; the other was created afterwards in a different region to the control plane. So far we have a multi-region Kubernetes cluster.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/KubectlMultiRegionPools.png" alt="Kubectl showing two nodes in two different regions"></p>
<figcaption>Our two nodes in two different pools</figcaption>
</figure>
<h2>Multi-cloud</h2>
<p>Now that we have Kosmos set up it's time to give the multi-cloud functionality a try. The process for this isn't much more than adding another node pool to the cluster but this time we need to select the &quot;Multi-Cloud&quot; type instead of &quot;Scaleway&quot; for our new pool.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/AddPool.png" alt="Adding a new multi-cloud node pool"></p>
<figcaption>Select the "Multi-Cloud" type</figcaption>
</figure>
<p>Once the pool is created you can select &quot;Add a node&quot; which will provide you with instructions on how to register an external node with your Kosmos cluster.</p>
<p>It's worth noting here that currently only <strong>Ubuntu 20.04</strong> is supported as the host OS (20.10 currently fails with an <code>apparmor_parser</code> error) and the setup script needs to be run as root (I find it best to run the setup script from the init userdata if your cloud provider supports it).</p>
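<p>As an example, that user data could be something like the following cloud-config (a sketch - the script URL and arguments here are placeholders for the ones Scaleway shows you in the &quot;Add a node&quot; instructions):</p>
<pre><code>#cloud-config
runcmd:
  - curl -sSL -o /root/multicloud-init.sh &quot;&lt;script URL from the Scaleway console&gt;&quot;
  - bash /root/multicloud-init.sh &lt;arguments from the Scaleway console&gt;
</code></pre>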
<figure class="center" markdown="1">
<p><img src="/images/kosmos/AddExternalNode.png" alt="Instructions on adding an external node"></p>
<figcaption>The commands you need to run to register your new node</figcaption>
</figure>
<h3>Adding an external node</h3>
<p>For our first multi-cloud nodes I'm going to be using <a href="https://civo.com/">Civo</a>.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/CivoNode.png" alt="New node form in Civo"></p>
<figcaption>Make sure you select an instance with at least 2GB of memory (not 1GB as I have here) so you have enough for Kubelet</figcaption>
</figure>
<p>The <code>multicloud-init.sh</code> script provided by Scaleway doesn't provide much in the way of progress updates, and the external node can take quite a while to show up in kubectl, so to catch anything going wrong I recommend keeping an eye on <code>journalctl -f</code> for your first few nodes.</p>
<p>After a while your new nodes should pop up in kubectl.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/KubectlCivo.png" alt="New Civo nodes showing in Kubectl"></p>
<figcaption>You can now see two external nodes hosted with Civo (the cool-wilson pool)</figcaption>
</figure>
<p>The init script doesn't currently provide any way of passing in node labels for Kubelet so to make things clearer I've manually added a <code>cloud</code> label to each node.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/CloudLabel.png" alt="Kubectl showing all nodes and their cloud label"></p>
<figcaption>Two Scaleway managed nodes and two Civo nodes</figcaption>
</figure>
<p>While we're at it, lets add another cloud provider into the mix. This time I'm going to add a bare metal instance from <a href="https://www.equinix.com/services/edge-services/equinix-metal">Equinix Metal</a>.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/KubectlEquinix.png" alt="Kubectl showing all nodes and their cloud label"></p>
<figcaption>As before I've manually added the node label</figcaption>
</figure>
<p>Finally, let's add an on-premise instance to our cluster (or rather a virtual machine running on my Mac).</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/KubectlLocal.png" alt="Kubectl showing all nodes and their cloud label"></p>
<figcaption>As before I've manually added the node label</figcaption>
</figure>
<p>We've now got a single Kubernetes cluster with worker nodes spread across (at least) 5 physical locations, 3 different cloud providers and a mix of different resource capabilities.</p>
<h2>Making use of the cluster</h2>
<p>The main thing I wanted to test with this multi-cloud approach was how ingress handled pods being scheduled elsewhere. Turns out there are no surprises here. A LoadBalancer is created and hosted by Scaleway that then points to the appropriate Kubernetes Service resource which in turn points to the appropriate pod no matter where it happens to be scheduled.</p>
<p>It'd be good to do some more testing at some point on latency and network traffic costs to work out if this is a cost-effective approach or not. If anyone does dig into this more please do let me know!</p>
<h2>Some improvements needed</h2>
<p>It's not all perfect, after all it is still an early beta.</p>
<p>I've already mentioned some of the limitations of the <code>multicloud-init.sh</code> script such as not being able to add node labels at creation time. These are fairly easy to work around though and likely to be supported in the future. The lack of progress visibility when adding a new external node is a bit of a pain when first trying out the service but not really an issue once you've got everything set up right.</p>
<p>One thing I did notice that wasn't ideal: if an external node is deleted in the external cloud, the associated node resource in the cluster doesn't get removed and instead just changes to a &quot;NotReady&quot; status.</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/NotReady.png" alt="Kubectl showing a deleted node as NotReady"></p>
<figcaption>When I deleted one of the external instances the node remains in the NotReady state.</figcaption>
</figure>
<p>This doesn't seem like a big deal at first but it does leave some pods in a pending state while they wait for the node to become ready again. If you're taking advantage of autoscaling of your external nodes this is likely to crop up fairly quickly and could cause unexpected issues.</p>
<p>The last issue I hit was when trying to add an ARM based machine (either a local Raspberry Pi or a cloud based ARM instance). The <code>multicloud-init.sh</code> script doesn't currently support architectures other than x86_64 <em>but</em> a small tweak to the script and we can get an ARM node set up...</p>
<h2>Adding ARM support</h2>
<p>To get ARM-based instances working we need to make a few changes to the <code>multicloud-init.sh</code> script provided by Scaleway. Below you can see the diff of the changes I used to get an ARM instance running on Equinix Metal.</p>
<pre><code>7a8,12
&gt; os_arch=&quot;amd64&quot;
&gt; if [[ &quot;$(arch)&quot; != &quot;x86_64&quot; ]]; then
&gt;   os_arch=&quot;arm64&quot;
&gt; fi
&gt;
90c95,99
&lt;   apt-get install -y containerd.io &gt; /dev/null 2&gt;&amp;1
---
&gt;   if [[ &quot;${os_arch}&quot; == &quot;amd64&quot; ]]; then
&gt;     apt-get install -y containerd.io &gt; /dev/null 2&gt;&amp;1
&gt;   else
&gt;     apt-get install -y containerd &gt; /dev/null 2&gt;&amp;1
&gt;   fi
221c230
&lt;   curl -L &quot;https://storage.googleapis.com/kubernetes-release/release/v${kubernetes_version}/bin/linux/amd64/kubectl&quot; \
---
&gt;   curl -L &quot;https://storage.googleapis.com/kubernetes-release/release/v${kubernetes_version}/bin/linux/${os_arch}/kubectl&quot; \
223c232
&lt;   curl -L &quot;https://storage.googleapis.com/kubernetes-release/release/v${kubernetes_version}/bin/linux/amd64/kubelet&quot; \
---
&gt;   curl -L &quot;https://storage.googleapis.com/kubernetes-release/release/v${kubernetes_version}/bin/linux/${os_arch}/kubelet&quot; \
</code></pre>
<p>This currently only works with 64-bit ARM instances (so not all Raspberry Pis) but it shouldn't take too much to expand it to correctly support more architectures. Hopefully Scaleway will provide support for ARM soon as we're seeing a lot of advances in ARM machines lately.</p>
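<p>The architecture detection at the top of that diff can be tried standalone (here using <code>uname -m</code>, which reports the same machine type as <code>arch</code>):</p>
<pre><code class="language-sh"># map the machine type to the architecture names used in the download URLs
os_arch=&quot;amd64&quot;
if [ &quot;$(uname -m)&quot; != &quot;x86_64&quot; ]; then
  os_arch=&quot;arm64&quot;
fi
echo &quot;${os_arch}&quot;
</code></pre>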
<figure class="center" markdown="1">
<p><img src="/images/kosmos/KubectlARM.png" alt="Kubectl showing an ARM based node"></p>
<figcaption>Our new ARM node</figcaption>
</figure>
<p>There is still one small issue with ARM instances after using this modified script - the <code>node-problem-detector</code> DaemonSet used by Scaleway currently only targets x86 so will fail to run on our new ARM instance.</p>
<h2>Cross-cloud latency</h2>
<p><a href="https://twitter.com/markboost">Mark Boost</a> asked on Twitter about the general performance and latency involved with multi-cloud clusters so I did a quick and dirty test to see how things generally behave.</p>
<p>My test involved 4 different node pools, each with a single instance:</p>
<ul>
<li>Scaleway - par1 = 1x DEV1_M</li>
<li>Scaleway - ams1 = 1x DEV1_M</li>
<li>Civo - lon1 = 1x Small</li>
<li>Civo - nyc1 = 1x Small</li>
</ul>
<p>(Note: The Kosmos control plane is hosted in Scaleway par1 region)</p>
<pre><code>✨ k get nodes --output custom-columns=Name:.metadata.name,Cloud:&quot;.metadata.labels.topology\.kubernetes\.io/region&quot;
Name                                             Cloud
scw-multicloud-civo-lon-00a1d7529f8340109990e1   civo-lon1
scw-multicloud-civo-lon-8deec4c46be0445fb89c2f   civo-nyc1
scw-multicloud-scaleway-ams1-6d6b9ac4eeed4f50b   nl-ams
scw-multicloud-scaleway-par1-fd031cee6aa64843b   fr-par
</code></pre>
<p>For my test case I'm leveraging <a href="https://linkerd.io/">Linkerd</a> along with their emojivoto demo application. The Linkerd pods are running on one of the two Scaleway instances and for each test I will target a different node for the emojivoto deployments. Each test is only run for a very short period of time (a few minutes) so results are very much ballpark figures and not to be given <em>too</em> much weight.</p>
<p>My main reason for this approach is that I'm lazy and Linkerd gives a fairly nice set of dashboards for latency with very little effort.</p>
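<p>One way to target a specific node for a deployment is a <code>nodeSelector</code> on the pod template, using the region label visible on the nodes above - roughly:</p>
<pre><code>spec:
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/region: civo-lon1
</code></pre>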
<p>Ok, now for the results...</p>
<p>First up we have all deployments scheduled onto the <strong>Scaleway-ams1</strong> node:</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/Latency-1.png" alt=""></p>
<figcaption>Scaleway ams1</figcaption>
</figure>
<p>You can see this is fairly bumpy to start with as the pods get started but the latencies are all fairly low, as we'd expect.</p>
<p>Next up we have all pods scheduled onto the <strong>Scaleway-par1</strong> node:</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/Latency-2.png" alt=""></p>
<figcaption>Scaleway par1</figcaption>
</figure>
<p>This is the same region as our control plane. All the latencies are very low with only a little bit of fluctuation.</p>
<p>We then have the first of our external nodes - <strong>Civo-lon1</strong>:</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/Latency-3.png" alt=""></p>
<figcaption>Civo lon1</figcaption>
</figure>
<p>This seems to have much more fluctuation in the latency but overall it is still very low, with many requests in line with those of the nodes hosted on Scaleway.</p>
<p>The last node to try is <strong>Civo-nyc1</strong>:</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/Latency-4.png" alt=""></p>
<figcaption>Civo nyc1</figcaption>
</figure>
<p>This one looks to be trending slightly slower but still within the same sort of range as the other nodes. It's worth pointing out that this node is physically the furthest away, with all of the other nodes being within Europe.</p>
<p>Finally, as we have 4 nodes and 4 different deployments, I wanted to test how things would look with the application spread over all the available nodes. This next result shows the latencies with each of the deployments scheduled to a different node:</p>
<figure class="center" markdown="1">
<p><img src="/images/kosmos/Latency-5.png" alt=""></p>
<figcaption>Spread over all nodes</figcaption>
</figure>
<p>Right away you can see that this has much higher latency, with the Y-axis at least double that of any of the previous results. That being said, everything is still within 100ms, which for this very small test is at least within an acceptable range.</p>
<h2>Final Thoughts</h2>
<p>Combining Kosmos with something like <a href="https://github.com/kubernetes-sigs/cluster-api/">cluster-api</a> or <a href="https://github.com/gardener/gardener">Gardener</a> could make managing multi-cloud resources much easier.</p>
<p>Overall I'm pretty impressed with Kosmos so far, I think it has a lot of potential and I expect some of the other cloud providers will offer something similar eventually. It seems like a simpler alternative to Kubernetes federation when multi-tenancy isn't a requirement.</p>
]]></description><link>https://marcusnoble.co.uk/2021-08-08-multicloud-kubernetes</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-08-08-multicloud-kubernetes</guid><pubDate>Sun, 08 Aug 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[T.I.L. CLI flag handling in Bash using getopts]]></title><description><![CDATA[<p>I'm not sure how I've never come across this before, but while looking through the <a href="https://www.scaleway.com/en/betas/#kuberneteskosmos">Scaleway Kosmos</a> multi-cloud init script I discovered the <a href="https://www.man7.org/linux/man-pages/man1/getopts.1p.html"><code>getopts</code></a> utility.</p>
<p><code>getopts</code> makes it easier to parse arguments passed to a shell script by defining which letters your script supports. It supports both boolean and string style arguments but only supports single letter flags. (e.g. <code>-h</code> and not <code>--help</code>)</p>
<p>Example usage:</p>
<pre><code class="language-sh">#!/bin/bash

NAME=&quot;World&quot;
FORCE=false

showHelp() {
    echo &quot;Usage: example.sh [args]&quot;
    exit 0
}

while getopts 'hfn:' FLAG
do
  case $FLAG in
    h) showHelp ;;
    f) FORCE=true ;;
    n) NAME=$OPTARG ;;
    *) echo &quot;Unsupported argument flag passed&quot; ;;
  esac
done

echo &quot;Hello, $NAME&quot;

</code></pre>
<p>Notice the <code>:</code> following the <code>n</code>? That indicates that a value should follow the argument flag (<code>n</code> in this example) and will be made available as the <code>OPTARG</code> variable.</p>
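<p>As a quick sanity check, the loop above can be exercised in a self-contained snippet by using <code>set --</code> to stand in for real command-line arguments (the flag values here are just illustrative):</p>

```shell
#!/bin/bash
# Simulate running the script as: ./example.sh -f -n Marcus
set -- -f -n "Marcus"

NAME="World"
FORCE=false

# 'fn:' = accept -f (boolean) and -n (expects a value, exposed as $OPTARG)
while getopts 'fn:' FLAG
do
  case $FLAG in
    f) FORCE=true ;;
    n) NAME=$OPTARG ;;
    *) echo "Unsupported argument flag passed" ;;
  esac
done

echo "Hello, $NAME (force=$FORCE)"
# prints: Hello, Marcus (force=true)
```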
]]></description><link>https://marcusnoble.co.uk/2021-08-04-t-i-l-cli-flag-handling-in-bash-using-getopts</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-08-04-t-i-l-cli-flag-handling-in-bash-using-getopts</guid><pubDate>Wed, 04 Aug 2021 19:49:37 GMT</pubDate></item><item><title><![CDATA[T.I.L. YAML keys allow for spaces in them]]></title><description><![CDATA[<p>While browsing through some of <a href="https://github.com/frenck">Frenck's</a> <a href="https://github.com/frenck/home-assistant-config">Home Assistant Config</a> for ideas I came across <a href="https://github.com/frenck/home-assistant-config/blob/a963e1cb3e2acf7beda2b466b334218ac27ee42f/config/integrations/automation.yaml#L7">this interesting line of YAML</a>:</p>
<pre><code class="language-yaml">---
# This handles the loading of my automations
#
# https://www.home-assistant.io/docs/automation/
#
automation: !include ../automations.yaml
automation split: !include_dir_list ../automations    # &lt;--
</code></pre>
<p>I found myself staring at this for a while, followed by searching the <a href="https://www.home-assistant.io/">Home Assistant</a> documentation website to see if <code>split</code> was a special keyword I wasn't aware of.</p>
<p>And then it dawned on me! As all JSON is valid YAML, and JSON keys can be pretty much any string, it makes sense that YAML supports it.</p>
<p>The above example converted to JSON using <a href="https://mikefarah.gitbook.io/yq">yq</a> looks like this:</p>
<pre><code class="language-json">// yq config.yaml -o json
{
  &quot;automation&quot;: &quot;../automations.yaml&quot;,
  &quot;automation split&quot;: &quot;../automations&quot;
}
</code></pre>
<p>Knowing this, I decided to try out a few more variations to see what works...</p>
<p>YAML:</p>
<pre><code class="language-yaml">---
123: Valid
---: also valid
5.5: yup! this too
#how about this?: nope, this is treated as a comment
//: yeah, totally valid
✨: yep!
[1]: Works
[1, 2]: Still works, treated as string
{another}: This one is interesting
</code></pre>
<p>JSON:</p>
<pre><code class="language-json">{
  &quot;123&quot;: &quot;Valid&quot;,
  &quot;---&quot;: &quot;also valid&quot;,
  &quot;5.5&quot;: &quot;yup! this too&quot;,
  &quot;//&quot;: &quot;yeah, totally valid&quot;,
  &quot;✨&quot;: &quot;yep!&quot;,
  &quot;[1]&quot;: &quot;Works&quot;,
  &quot;[1, 2]&quot;: &quot;Still works, treated as string&quot;,
  &quot;{\&quot;another\&quot;=&gt;nil}&quot;: &quot;This one is interesting&quot;
}
</code></pre>
<p>Depending on the library used, varying results can be generated. For example, yamlonline (update: website no longer online) returns the following for the same input:</p>
<pre><code class="language-json">{
	&quot;1&quot;: &quot;Works&quot;,
	&quot;123&quot;: &quot;Valid&quot;,
	&quot;---&quot;: &quot;also valid&quot;,
	&quot;5.5&quot;: &quot;yup! this too&quot;,
	&quot;//&quot;: &quot;yeah, totally valid&quot;,
	&quot;✨&quot;: &quot;yep!&quot;,
	&quot;1,2&quot;: &quot;Still works, treated as string&quot;,
	&quot;[object Object]&quot;: &quot;This one is interesting&quot;
}
</code></pre>
]]></description><link>https://marcusnoble.co.uk/2021-05-11-t-i-l-yaml-keys-allow-for-spaces-in-them</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-05-11-t-i-l-yaml-keys-allow-for-spaces-in-them</guid><pubDate>Tue, 11 May 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[T.I.L. Kubernetes label length]]></title><description><![CDATA[<p>It turns out that label <em>values</em> in Kubernetes have a limit of 63 characters!</p>
<p>I discovered this today when none of my nodes seemed to be connecting to the control plane. I eventually found that the hostname of the node was longer than 63 characters (mainly due to multiple subdomain levels) and so the <code>kubernetes.io/hostname</code> label being automatically added to the node was causing Kubernetes to reject it.</p>
<p>If you hit this like me, the hostname used for the label can be <a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname">overridden using the <code>--hostname-override</code> flag on kubelet</a> or by setting the value of the label yourself with the <code>--node-labels</code> flag.</p>
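<p>A quick way to check whether a hostname would run into this limit before joining a node (the hostname below is a made-up example):</p>

```shell
#!/bin/bash
# Hypothetical hostname with several subdomain levels.
HOSTNAME_LABEL="worker-1.eu-west.internal.cluster.example.com.extra.long.subdomain"

# Kubernetes label values are limited to 63 characters.
if [ "${#HOSTNAME_LABEL}" -gt 63 ]; then
  echo "Hostname is ${#HOSTNAME_LABEL} characters - too long for a label value"
fi
```

On a real node you'd compare <code>$(hostname)</code> instead, and fall back to <code>--hostname-override</code> if it's too long.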
]]></description><link>https://marcusnoble.co.uk/2021-04-20-t-i-l-kubernetes-label-length</link><guid isPermaLink="true">https://marcusnoble.co.uk/2021-04-20-t-i-l-kubernetes-label-length</guid><pubDate>Tue, 20 Apr 2021 14:10:37 GMT</pubDate></item><item><title><![CDATA[T.I.L. How to get the favicon of any site]]></title><description><![CDATA[<p>If you ever find yourself needing to display a small icon for a 3rd party URL but don't want to have to crawl the site to pull out the favicon URL then you can make use of a Google CDN:</p>
<pre><code>https://s2.googleusercontent.com/s2/favicons?domain_url=https://marcusnoble.co.uk/
</code></pre>
<p>Example: <img src="https://s2.googleusercontent.com/s2/favicons?domain_url=https://marcusnoble.co.uk/" alt=""></p>
<p>You can even provide any page, not just the root URL.</p>
<p>e.g. <code>https://s2.googleusercontent.com/s2/favicons?domain_url=https://marcusnoble.co.uk/2020-11-10-t-i-l-how-to-get-the-favicon-of-any-site/</code></p>
<p><img src="https://s2.googleusercontent.com/s2/favicons?domain_url=https://marcusnoble.co.uk/2020-11-10-t-i-l-how-to-get-the-favicon-of-any-site/" alt=""></p>
]]></description><link>https://marcusnoble.co.uk/2020-11-10-t-i-l-how-to-get-the-favicon-of-any-site</link><guid isPermaLink="true">https://marcusnoble.co.uk/2020-11-10-t-i-l-how-to-get-the-favicon-of-any-site</guid><pubDate>Tue, 10 Nov 2020 08:49:37 GMT</pubDate></item></channel></rss>