Tracing through Ginkgo Tests with OpenTelemetry

Testing distributed services is hard. Troubleshooting test failures is challenging when the failure isn’t from the immediate service under test, but from another dependent service.

One way to ease troubleshooting test failures is by using tracing. Tracing can help see the flow of requests between tests and services. This process works for services created in the same repository as the tests and can work with external services.

Tracing can benefit test cases by tracking which requests are being made by tests, but tracing shines when the services under test also support tracing.

In this post, we’ll:

  • setup OpenTelemetry Collector and Jaeger all-in-one for collecting and viewing traces
  • create a microservice, named microservice-a, that makes an HTTP request to another microservice, microservice-b
  • create another microservice, named microservice-b, that randomly returns a message or an error
  • create a Ginkgo test suite that makes an HTTP request to microservice-a

The microservices and Ginkgo will all publish spans using the OpenTelemetry SDK to the OpenTelemetry Collector, which will then send the spans to Jaeger for viewing.

Note: a trace is a collection of related spans

Here’s a diagram of what this final flow will look like:

sequenceDiagram
  participant Ginkgo
  participant microservice a
  participant microservice b
  participant OpenTelemetry Collector
  participant Jaeger
  Ginkgo->>microservice a: HTTP GET /
  microservice a->>microservice b: HTTP GET /
  microservice b-->>microservice a: message
  microservice a-->>Ginkgo: message
  microservice a->>OpenTelemetry Collector: spans
  microservice b->>OpenTelemetry Collector: spans
  Ginkgo->>OpenTelemetry Collector: spans
  OpenTelemetry Collector->>Jaeger: spans

Here is an example of what a failing trace looks like:

example trace in Jaeger

Feel free to skip to Create Ginkgo suite with TracerProvider set up if you want to jump straight to the Ginkgo test suite with tracing. The source code for this post is available at ginkgo-gomega-open-telemetry.

Set up tracing provider and propagator

We’ll create a couple of functions to create a propagator and tracer provider. The functions will be used by both microservices and Ginkgo.

The propagator will enable the attachment and extraction OpenTelemetry headers from HTTP requests to create connected spans across services. The tracing provider will export spans to the OpenTelemetry Collector listening on port 44317.

Create a new directory named internal by running:

1
mkdir internal

Then, create a new file named internal/tracing.go with the following content:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
package internal

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func GetPropagator() propagation.TextMapPropagator {
	return propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	)
}

func GetTracerProvider(serviceName string) (*sdktrace.TracerProvider, error) {
	resource := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName(serviceName),
	)

	exporter, err := otlptracegrpc.New(
		context.Background(),
		otlptracegrpc.WithEndpoint("localhost:44317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create exporter: %w", err)
	}

	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(
			exporter,
		),
		sdktrace.WithResource(resource),
	)

	return tracerProvider, nil
}

The above code has two functions. GetPropagator will enable OpenTelemetry’s trace state and trace flags to be attached and extracted from HTTP headers. GetTracerProvider creates a tracer provider that exports spans to the OpenTelemetry Collector running a gRPC server on localhost:44317 without TLS enabled.

Create microservices that publish spans

We won’t dive too much into what these microservices are doing. They mostly exist, so we have examples to test via Ginkgo. The main thing to note is that microservice-a makes a request to microservice-b. microservice-b returns a message, and microservice-a wraps that reponse and returns another response. microservice-b will randomly return an error so that we can see errors within Ginkgo tests.

Let’s create our first microservice, microservice-a, that will make an HTTP request to microservice-b and return a message wrapping the message from microservice-b.

Create a cmd/microservice-a directory via:

1
mkdir -p cmd/microservice-a

Create cmd/microservice-a/main.go file with the following content:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	"github.com/dustinspecker/ginkgo-gomega-open-telemetry/internal"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
)

type Response struct {
	Message string `json:"message"`
}

func getMessage(ctx context.Context) (string, error) {
	url := "http://localhost:8081/"
	jsonMessageResponse, err := otelhttp.Get(ctx, url)
	if err != nil {
		return "", fmt.Errorf("failed to create request: %w", err)
	}
	defer jsonMessageResponse.Body.Close()

	if jsonMessageResponse.StatusCode >= 300 {
		return "", fmt.Errorf("failed request %s: %s", url, jsonMessageResponse.Status)
	}

	response := Response{}
	if err := json.NewDecoder(jsonMessageResponse.Body).Decode(&response); err != nil {
		return "", fmt.Errorf("failed decoding response: %w", err)
	}

	return response.Message, nil
}

func main() {
	tracerProvider, err := internal.GetTracerProvider("microservice-a")
	if err != nil {
		log.Fatalf("failed to get tracer: %v", err)
	}
	// Shutdown tracerProvider at the end of main since log.Fatalf is used
	// when listener stops

	otel.SetTracerProvider(tracerProvider)

	prop := internal.GetPropagator()
	otel.SetTextMapPropagator(prop)

	mux := http.NewServeMux()
	mux.Handle("/", otelhttp.WithRouteTag("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		message, err := getMessage(r.Context())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		response := Response{
			Message: fmt.Sprintf("message from microservice-a: %s", message),
		}
		jsonResponse, err := json.Marshal(response)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		w.Header().Add("Content-Type", "application/json")
		w.Write(jsonResponse)
	})))

	err = http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "server", otelhttp.WithMessageEvents(otelhttp.ReadEvents, otelhttp.WriteEvents)))

	tracerProvider.Shutdown(context.Background())

	// err should never be nil since ListenAndServe always returns a non-nil error
	if err != nil {
		log.Fatalf("failed to listen and serve: %v", err)
	}
}

This code sets up OpenTelemetry tracing, starts a HTTP server listening on port 8080, and requests to http://localhost:8080 will make a request to http://localhost:8081/ and return a message wrapping that message.

Let’s create the second microservice, microservice-b, that will return a message or an error.

Create a cmd/microservice-b directory via:

1
mkdir -p cmd/microservice-b

Create cmd/microservice-b/main.go with the following content:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
package main

import (
	"context"
	"encoding/json"
	"errors"
	"log"
	"math/rand"
	"net/http"

	"github.com/dustinspecker/ginkgo-gomega-open-telemetry/internal"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

type Response struct {
	Message string `json:"message"`
}

func main() {
	tracerProvider, err := internal.GetTracerProvider("microservice-b")
	if err != nil {
		log.Fatalf("failed to get tracer: %v", err)
	}
	// Shutdown tracerProvider at the end of main since log.Fatalf is used
	// when listener stops

	otel.SetTracerProvider(tracerProvider)

	prop := internal.GetPropagator()
	otel.SetTextMapPropagator(prop)

	mux := http.NewServeMux()

	mux.Handle("/", otelhttp.WithRouteTag("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if rand.Intn(2) == 1 {
			span := trace.SpanFromContext(r.Context())
			span.RecordError(errors.New("random error"))
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}

		response := Response{
			Message: "Hello, World!",
		}

		jsonResponse, err := json.Marshal(response)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		w.Header().Add("Content-Type", "application/json")
		w.Write(jsonResponse)
	})))

	err = http.ListenAndServe(":8081", otelhttp.NewHandler(mux, "server", otelhttp.WithMessageEvents(otelhttp.ReadEvents, otelhttp.WriteEvents)))

	tracerProvider.Shutdown(context.Background())

	// err should never be nil since ListenAndServe always returns a non-nil error
	if err != nil {
		log.Fatalf("failed to listen and serve: %v", err)
	}
}

It’s very similar to microservice-a but will randomly return an error. microservice-b will listen on port 8081.

Setup OpenTelemetry tracing collection

OpenTelemetry Collector supports several of receivers and exporters to convert different tracing formats. We want to receive OpenTelemetry traces via gRPC and export them to Jaeger as OpenTelemetry traces via gRPC.

Note: I’m saying OpenTelemetry Collector, but we’re actually using OpenTelemetry Collector Contrib which has additional receivers and exporters.

OpenTelemetry Collector may seem unnecessary since Jaeger can support OpenTelemetry traces directly, but it’s a good example of how to use the it to convert between different tracing formats. In a follow-up post, I’ll show how to publish metric data to Prometheus, which will be a better use case of OpenTelemetry Collector.

Create a new file named open-telemetry-config.yaml with the following content:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:44317

exporters:
  otlp/2:
    endpoint: 0.0.0.0:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - otlp/2

This instructs OpenTelemetry Collector to listen on port 44317 for OpenTelemetry traces via gRPC and export them to Jaeger on port 4317.

Now, let’s start Jaeger and the OpenTelemetry Collector using Docker containers.

Start Jaeger all-in-one by running the following command in a new terminal:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
docker run \
  --interactive \
  --name jaeger \
  --rm \
  --tty \
  --env COLLECTOR_OTLP_ENABLED=true \
  --publish 0.0.0.0:16686:16686 \
  --publish 0.0.0.0:4317:4317 \
  --publish 0.0.0.0:44317:44317 \
  jaegertracing/all-in-one:1.35

Start the OpenTelemetry Collector by running the following command in a new terminal:

1
2
3
4
5
6
7
8
docker run \
  --interactive \
  --name opten-telemetry-collector \
  --network container:jaeger \
  --rm \
  --tty \
  --volume "$PWD/open-telemetry-config.yaml:/open-telemetry-config.yaml" \
  otel/opentelemetry-collector-contrib:0.104.0 --config /open-telemetry-config.yaml

Jaeger all-in-one and OpenTelemetry Collector will run in the same network so that the OpenTelemetry Collector can send traces to Jaeger. A few ports will be published as well, so that spans may be pushed to OpenTelemetry Collector and Jaeger’s UI is available.

These Docker commands should work regardless of whether you use Docker Desktop.

Create Ginkgo suite with TracerProvider set up

Let’s create a new Ginkgo suite and some example test cases. Start by bootstrapping a Ginkgo suite and example test file via:

1
2
3
4
5
mkdir test
cd test
ginkgo bootstrap
ginkgo generate example
cd ..

Let’s modify our Ginkgo suite to set up tracing via OpenTelemetry. Modify ./test/test_suite_test.go to match:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
package test_test

import (
	"context"
	"fmt"
	"testing"

	"github.com/dustinspecker/ginkgo-gomega-open-telemetry/internal"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

var (
	// Note: test cases should not directly use suiteCtx or suiteSpan
	suiteCtx  context.Context
	suiteSpan trace.Span

	// Note: test cases may create children spans by providing testCtx to tracer.Start
	testCtx context.Context
	// Note: test cases may attach attributes to testSpan if desired through testSpan.SetAttributes
	testSpan trace.Span
)

func TestTest(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Test Suite")
}

// setup tracer provider, propagator, and create suite-level span
// each test case will have a child span of the suite-level span
var _ = BeforeSuite(func() {
	tracerProvider, err := internal.GetTracerProvider("test")
	Expect(err).NotTo(HaveOccurred(), "error creating tracer provider")
	DeferCleanup(func() {
		Expect(tracerProvider.Shutdown(context.Background())).To(Succeed(), "error shutting down tracer provider")
	})
	otel.SetTracerProvider(tracerProvider)

	propagator := internal.GetPropagator()
	otel.SetTextMapPropagator(propagator)

	suiteCtx, suiteSpan = tracerProvider.Tracer("test").Start(context.Background(), "test-suite")
	DeferCleanup(func() {
		suiteSpan.End()
	})
})

// create a span for the test case
var _ = BeforeEach(func() {
	testCtx, testSpan = otel.GetTracerProvider().Tracer("test").Start(suiteCtx, CurrentSpecReport().FullText())
	DeferCleanup(func() {
		if CurrentSpecReport().Failed() {
			testSpan.SetStatus(codes.Error, CurrentSpecReport().Failure.Message)
		} else {
			testSpan.SetStatus(codes.Ok, "test passed")
		}

		testSpan.End()

		traceLink := fmt.Sprintf("http://localhost:16686/trace/%s", testSpan.SpanContext().TraceID().String())
		GinkgoWriter.Println("visit trace:", traceLink)
	})
})

A few things to note from above:

  • BeforeSuite will create a suite-level span that all test cases will be children of
  • BeforeEach will create a test-level span that will be a child of the suite-level span
    • A span named testSpan is available for tests to attach attributes
    • A context named testCtx is available for tests to create children spans
  • During test clean up, the span will report having an error if the test case failed
  • During test clean up, Ginkgo will print a helpful link for viewing traces the console

Please let me know if you know a good way to avoid these global variables!

Next, let’s create some example test cases to demonstrate tracing. Modify the ./test/example_test.go file to match:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
package test_test

import (
	"net/http"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

var _ = Describe("Example", func() {
	It("request to microservice-a should succeed", func() {
		response, err := otelhttp.Get(testCtx, "http://localhost:8080")
		Expect(err).ToNot(HaveOccurred(), "error making request to microservice-a")
		DeferCleanup(func() {
			Expect(response.Body.Close()).To(Succeed(), "error closing response body")
		})

		Expect(response).To(HaveHTTPStatus(http.StatusOK), "request to microservice-a should return 200 OK")
	})

	It("this test always passes", func() {
		Expect(true).To(BeTrue(), "should always be true")
	})
})

This test file has two test cases. The first test case make an HTTP request to microservice-a, which then makes another request to microservice-b. This test case uses otelhttp.Get so that OpenTelemetry will automatically create a span named HTTP Get for the request made by the test case. The second test case always passes and is strictly to demonstrate additional spans existing for multiple test cases.

I want to call out that each test should always be using testCtx when invoking functions taking a Context.

Execute test cases

Let’s see our tracing in action. Start up our microservices by running the following commands in separate terminals:

1
go run cmd/microservice-a/main.go
1
go run cmd/microservice-b/main.go

Then run our Ginkgo test suite by running:

1
ginkgo run -v ./test/

Click the link printed to the console to view the trace in Jaeger. Alternatively, you may navigate to https://localhost:16686 to search for the trace.

The trace will look something like this when failing:

example trace in Jaeger

Feel free to re-run ginkgo run -v ./test/ a few times to see success and failure scenarios.

Merging a span’s context and Ginkgo’s SpecContext

So far, we’ve only been providing context created by our tracer. Ginkgo supports SpecContext’s for handling test timeouts. We can merge the two contexts so that our HTTP requests have a span context and will also cancel if a timeout is exceeded.

Install the onecontext package via:

1
go get github.com/teivah/onecontext

Update ./test/example_test.go with the following changes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
 package test_test

 import (
 	"net/http"
 	"time"

 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
+	"github.com/teivah/onecontext"
 	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
 )

 var _ = Describe("Example", func() {
-	It("request to microservice-a should succeed", func() {
+	It("request to microservice-a should succeed", func(specContext SpecContext) {
+		ctx, cancel := onecontext.Merge(testCtx, specContext)
+		DeferCleanup(cancel)
+
-		response, err := otelhttp.Get(testCtx, "http://localhost:8080")
+		response, err := otelhttp.Get(ctx, "http://localhost:8080")
 		Expect(err).ToNot(HaveOccurred(), "error making request to microservice-a")
 		DeferCleanup(func() {
 			Expect(response.Body.Close()).To(Succeed(), "error closing response body")
 		})

 		Expect(response).To(HaveHTTPStatus(http.StatusOK), "request to microservice-a should return 200 OK")
-	})
+	}, SpecTimeout(5*time.Second))

 	It("this test always passes", func() {
 		Expect(true).To(BeTrue(), "should always be true")
 	})
 })

Then run tests like before with ginkgo run -v ./test/.

Please let me know if you find a better way to merge contexts! I would love to have an alternative that Ginkgo itself could handle.


I hope this post has shed some light on how to trace through Ginkgo tests using OpenTelemetry. As mentioned in the introduction, tracing can be a powerful tool for troubleshooting test failures in distributed systems if each service is also publishing spans.

Another thing about this approach is that if it’s hard to troubleshoot test failures, it’s probably hard to troubleshoot production issues. This approach guides you towards improving tracing for your tests by improving the tracing of the product, ultimately improving supporting your production services.

A future post will cover how to publish metrics to Prometheus using OpenTelemetry Collector. We’ll publish data per test case.

If you have any questions or feedback, please get in touch with me on LinkedIn, Twitter, or GitHub. Also, let me know if you have found other techniques to improve troubleshooting and testing distributed services.

updatedupdated2024-07-182024-07-18