Types of automated testing

There are many ways to organize your automated tests. You can write unit tests, integration tests, and end-to-end tests. The requirements depend on the product, of course, but I believe that the optimal solution is usually a mix of multiple methods.

To be clear, it’s a great thing that automated tests are being written in the first place. Too often I see testing neglected completely. But doing tests the “wrong” way can make them slow, fragile, unreliable, or simply useless. This causes frustration and wastes time. It’s all about the return on investment: the best result for the least amount of effort.

First, let’s go through the testing methods that I most commonly use.

Unit tests

Unit tests focus on testing a single component, usually a method or function. Sometimes the component can be a bit larger, such as a class or a UI component (in the frontend world).

Unit tests should be simple, fast, easy to write, and easy to read. They’re good for testing corner cases and different classes of valid and invalid inputs. Unit tests should be reliable, meaning they should not fail randomly. They should be atomic and isolated from other tests, with minimal dependencies on the outside world. It’s important that unit tests are fast, so they can be executed repeatedly while writing the implementation.
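As a rough sketch of these properties, here is what such a test might look like with Vitest (one of the runners listed below); the validateEmail function and its rules are hypothetical, made up purely for illustration:

```typescript
import { describe, expect, it } from "vitest";

// Hypothetical pure function under test (not from a real project).
function validateEmail(input: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

describe("validateEmail", () => {
  // Each case is fast, isolated, and covers one class of input.
  it("accepts a plain address", () => {
    expect(validateEmail("user@example.com")).toBe(true);
  });

  it("rejects a missing domain", () => {
    expect(validateEmail("user@")).toBe(false);
  });

  it("rejects empty input", () => {
    expect(validateEmail("")).toBe(false);
  });
});
```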

Unit tests can be white box (structural) or black box (behavioral), meaning they focus either on the code being tested or on its inputs and outputs. In my opinion, white box tests achieve better coverage, but they can be too tightly tied to the implementation, breaking when the code is refactored. Again, a combination of both methods is probably best.

Mocks and fakes

A note about mocking: I usually prefer custom “fakes” over “mocks” for breaking dependencies, except in very simple cases or when the interface being mocked is very large. The difference shows when you start defining behavior for the mock: with a mocking framework, this usually becomes quite verbose. Fakes are more flexible, usually with less code. But it’s important to keep the fake simple; otherwise it will start having bugs of its own. The whole point is that the fake/mock is simpler than the functionality it replaces.
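To illustrate the difference, here is a small sketch (Vitest syntax, with a hypothetical UserRepository interface): the fake is a tiny in-memory implementation that keeps state on its own, while the mock needs every return value scripted up front.

```typescript
import { expect, it, vi } from "vitest";

interface UserRepository {
  getById(id: string): Promise<string | null>;
  save(id: string, name: string): Promise<void>;
}

// Fake: a small in-memory implementation that behaves like the real thing.
class FakeUserRepository implements UserRepository {
  private users = new Map<string, string>();
  async getById(id: string) { return this.users.get(id) ?? null; }
  async save(id: string, name: string) { this.users.set(id, name); }
}

// Mock: every interaction has to be scripted, which gets verbose quickly.
const mockRepo: UserRepository = {
  getById: vi.fn().mockResolvedValueOnce(null).mockResolvedValueOnce("Alice"),
  save: vi.fn().mockResolvedValue(undefined),
};

it("fake keeps state across calls without extra scripting", async () => {
  const repo = new FakeUserRepository();
  await repo.save("1", "Alice");
  expect(await repo.getById("1")).toBe("Alice");
});

it("mock returns exactly what was scripted, in order", async () => {
  expect(await mockRepo.getById("1")).toBeNull();
  expect(await mockRepo.getById("1")).toBe("Alice");
});
```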

Examples

  • Algorithms
  • Validation logic
  • Regular expressions
  • User interface components (such as React components)

Tools

  • Test runners: xUnit, JUnit, MSTest, Jest, Vitest
  • Mocking libraries: NSubstitute

Integration tests

Integration tests focus on testing the interaction of multiple components. In web development, for me this usually means testing an API or a single page. Major dependencies such as the backend, the SQL database, and distributed services should still be mocked. The SQL database can be included when it makes sense: when it’s relevant to the outcome of the test and doesn’t slow down execution too much.
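As a sketch of what an API-level integration test can look like, here is a minimal example using express and supertest together with Vitest. None of these tool choices come from this article; they simply stand in for whatever HTTP stack and runner the project actually uses, and the /api/users/:id endpoint is hypothetical.

```typescript
import express from "express";
import request from "supertest";
import { expect, it } from "vitest";

// Hypothetical app wiring: routing and JSON serialization are exercised
// together, while heavy dependencies (database, external services) would
// be replaced with fakes.
const app = express();
app.get("/api/users/:id", (req, res) => {
  res.json({ id: req.params.id, name: "Test User" });
});

it("returns the user as JSON", async () => {
  const response = await request(app).get("/api/users/42");
  expect(response.status).toBe(200);
  expect(response.body).toEqual({ id: "42", name: "Test User" });
});
```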

In my opinion, it’s a good idea to clearly separate integration tests from unit tests. The integration test set can be slower to execute, and there might be some flakiness involved. Usually I would set the CI server to fail the build if unit tests fail, but continue if integration tests fail. Ideally, integration tests should be reliable as well, but in my experience that can be difficult because of timing issues in the frontend. For example, when we were integration testing SignalR APIs, the API would sometimes just hang randomly.
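One possible way to keep the two sets separate is at the test runner level. The sketch below uses a Vitest config and a *.integration.test.ts naming convention; both are assumptions rather than anything this article prescribes. The CI server can then run the two sets as separate steps and treat only the unit step as build-breaking.

```typescript
// vitest.config.ts: the build-breaking step runs only the fast unit tests.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["src/**/*.test.ts"],
    exclude: ["src/**/*.integration.test.ts", "**/node_modules/**"],
  },
});

// A second config (run with e.g. `vitest run --config vitest.integration.config.ts`)
// can include only the *.integration.test.ts files and run as a
// non-blocking CI step.
```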

Examples

  • API tests
  • Authentication
  • Serialization (JSON)
  • User interface navigation (routing)

Tools

End-to-end tests

End-to-end (E2E) tests exercise the whole application, usually in a dedicated environment. E2E tests can be heavy: running a test set can take hours. In my experience, it’s also very difficult to make them stable and reliable, and some flakiness is common.

Personally, I try to keep the number of E2E tests small and their scope limited. E2E tests are good for core use cases and smoke tests.

Tools

  • Robot Framework

Rules of good tests

In the ideal situation, all tests would follow all of these rules, but in reality we have to make compromises because of performance or the amount of work involved.

Tests should be

  • Reliable: if a test fails, it means something has changed or something is broken. A test should not fail randomly; random failures are what “flakiness” refers to.
  • Atomic: tests should not affect other tests, and the order of tests should not matter. Any test should be executable individually or together with other tests (see the sketch after this list).
  • Simple: tests should be easier to understand, read, and write than the system under test (SUT).
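As a small illustration of the atomic rule, here is a sketch (Vitest, with a hypothetical Cart class) where each test builds its own state instead of relying on what a previous test left behind:

```typescript
import { beforeEach, describe, expect, it } from "vitest";

// Hypothetical class under test.
class Cart {
  private items: string[] = [];
  add(item: string) { this.items.push(item); }
  count() { return this.items.length; }
}

describe("Cart", () => {
  let cart: Cart;

  // Fresh state per test: the tests pass in any order, alone or together.
  beforeEach(() => {
    cart = new Cart();
  });

  it("starts empty", () => {
    expect(cart.count()).toBe(0);
  });

  it("counts added items", () => {
    cart.add("book");
    expect(cart.count()).toBe(1);
  });
});
```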

Flakiness

Ideally, tests should not fail randomly. Random failures decrease the reliability and usability of the tests. Test failure should indicate that something has broken.

However, in the real world, making all tests 100% reliable seems very difficult. For example, JavaScript user interface code is always asynchronous; asynchronous code can create race conditions, and race conditions may cause deadlocks or timeouts. It’s possible to make asynchronous code behave synchronously in tests, for example by using fake timers to control every async step, but in my opinion this ties the tests too tightly to the implementation. When tests are strongly tied to the implementation, refactoring the code becomes harder. So in my experience, there is no perfect solution.
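To make the trade-off concrete, here is a sketch of the fake-timer approach using Vitest’s timer API and a hypothetical debounced save: the test becomes deterministic, but it now has to know the implementation’s exact delay.

```typescript
import { afterEach, expect, it, vi } from "vitest";

// Hypothetical debounced save: flushes once after 500 ms of inactivity.
function debounceSave(save: () => void, delayMs = 500) {
  let handle: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (handle) clearTimeout(handle);
    handle = setTimeout(save, delayMs);
  };
}

afterEach(() => {
  vi.useRealTimers();
});

it("saves once after the debounce delay", () => {
  vi.useFakeTimers();
  const save = vi.fn();
  const trigger = debounceSave(save);

  trigger();
  trigger();

  // Deterministic, but coupled to the 500 ms implementation detail.
  vi.advanceTimersByTime(500);
  expect(save).toHaveBeenCalledTimes(1);
});
```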

The best solution I have found is to clearly separate “flaky” tests from reliable ones, for example into a different project or under a different naming convention. I usually consider integration and E2E tests potentially flaky. Sometimes it’s just a matter of mocking away the dependencies that cause asynchronous or flaky behavior.

Scope of tests

There is often a choice of how much to cover with a single test. Unit tests try to be as limited as possible. The Testing Library ideology is about testing the system as it’s used, which implies bringing in related components when testing the user interface. You can achieve 100% line coverage with any of these methods, but the interaction of components also matters. I’ve seen many unit tests that don’t really test anything (pass in a null, assert that another method was called with a null value).

Testing larger components means fewer tests overall, and the tests resemble real usage better. The problem with large tests is that they become slower, more complicated, and less reliable. Choosing the right method for each case is key.

I prefer a combination of unit tests and integration tests: one or two integration tests for the overall functionality, and corner cases covered by unit tests.
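A sketch of that split, with a hypothetical registerUser flow and its validateName helper: one broader test covers the overall behavior, and the corner cases stay as cheap unit tests against the validation alone.

```typescript
import { describe, expect, it } from "vitest";

// Hypothetical validation and registration logic.
function validateName(name: string): boolean {
  const trimmed = name.trim();
  return trimmed.length >= 2 && trimmed.length <= 50;
}

function registerUser(name: string): { ok: boolean; error?: string } {
  if (!validateName(name)) return { ok: false, error: "invalid name" };
  return { ok: true };
}

// One broader test for the overall functionality.
it("registers a user with a valid name", () => {
  expect(registerUser("Alice")).toEqual({ ok: true });
});

// Corner cases as small unit tests against the validation alone.
describe("validateName corner cases", () => {
  it.each(["", " ", "A", "x".repeat(51)])("rejects %j", (name) => {
    expect(validateName(name)).toBe(false);
  });
});
```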

Who writes the tests?

Do you need E2E tests at all? Who should write them: QA or developers? It’s always good to have some outside perspective, and QA may be more skilled at critical thinking anyway, able to break the code in ways the developer might not imagine.

However, I often see that understanding the code helps write more useful (white box) test cases. And when things are not working, developers are usually better at figuring out why, thanks to debuggers and monitoring tools. For example, when testing Material UI (MUI) components, we noticed that the component state was not updating in tests. It took a debugger to figure out that the cause was a built-in delay: unnoticeable to a human user, but enough to break the test script.

Developers can usually write tools to assist with test setup. For example, QA might set up users through public APIs or HTML forms, which can be slow. Developers, meanwhile, can write a special test setup API that creates the users directly in the database. One dilemma I faced was whether these test setup APIs should also be tested, because I’ve seen them break too. I think it makes sense to test the test infrastructure.
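As a sketch of such a helper, here is a hypothetical test setup endpoint (express, with a made-up UserRepository interface) that is only wired up outside production and creates users directly instead of driving the slow public signup flow:

```typescript
import express from "express";

// Hypothetical repository abstraction over the real database.
interface UserRepository {
  insert(user: { name: string; email: string }): Promise<{ id: string }>;
}

export function addTestSetupRoutes(app: express.Express, repo: UserRepository) {
  // Never expose test setup endpoints in production.
  if (process.env.NODE_ENV === "production") return;

  // POST /test-setup/users { name, email } -> 201 { id }
  app.post("/test-setup/users", express.json(), async (req, res) => {
    const created = await repo.insert({
      name: req.body.name,
      email: req.body.email,
    });
    res.status(201).json(created);
  });
}
```

Since a helper like this is test infrastructure of its own, it deserves at least a smoke test, as noted above.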

Collaboration between developers and QA is definitely useful. Perhaps QA could serve as consultants for developers, suggesting test cases and pair programming them with developers. In any case, it’s important that the code is manually tested at least once by someone other than the person who wrote it, but that someone can also be another developer.
