Testing the libraries
Before starting, if you don’t know Braden, you should check out his newsletter. He shares insightful articles on computational development in the AEC space. As with the previous article, Braden continues to collaborate on this series, contributing his sharp insights and experience throughout.
Now that we’ve established our monolithic project structure and automated build process, we need to ensure our multi-platform libraries work correctly across all target environments, and the way to do that is to test them!
Once your solution is developed and the code is in place, the crucial next step is testing. This isn’t a one-size-fits-all process, especially when developing tools for Grasshopper (GH) and/or Dynamo (Dyn) within the Architecture, Engineering, and Construction (AEC) sector. The nuance arises not just from varying codebases, but from the direct impact these tools can have on project outcomes and safety. For instance, a visual programming script that automates geometric outputs demands a different testing approach than one that performs complex engineering calculations.
In the AEC world, where tools may support entire teams or companies, the stakes are exceptionally high. Thorough testing of both function and results is paramount. While a non-functional tool is an inconvenience, one that produces erroneous results, potentially unverified by a professional, can have catastrophic consequences.
Therefore, this article aims to provide an approach to testing GH/Dyn components, fostering a mindset for how to strategically validate what you’ve built, rather than a rigid, step-by-step plan.
Testing is complex for GH and/or Dyn libraries
While large AEC software vendors incorporate rigorous testing, within many mid to large-sized architecture and engineering firms, the testing of internally developed custom scripts, plugins, and computational tools often takes a backseat. These tools, often created to address specific project needs or automate tasks, may be viewed as one-off or ad hoc solutions. However, they often become integral to workflows and can persist for a long time. In this context, dedicated testing is frequently overlooked, perceived as a time-consuming activity that’s a luxury when facing tight project budgets and deadlines.
Despite its importance, testing often has a negative reputation for being a waste of time. This perception may stem from the challenges associated with testing and the unclear communication of its value within our industry. Since testing is more closely related to product and technology development, and our field has not fully embraced this connection, its significance may remain overlooked.
Testing GH or Dyn libraries isn’t just about testing the code; it’s also about testing the underlying infrastructure. You also have to test the workflow of these components, including how users will use them and how they behave in different host environments, such as Grasshopper alone versus Grasshopper in the RhinoInside environment.
Risk-based testing strategy
Before diving into testing types, you need to assess what actually needs testing based on risk levels. Not all components carry the same risk, and your testing effort should reflect this reality. The fundamental question that should drive every testing decision is: “If this component fails or produces incorrect results, what’s the worst that could happen?” This single question will immediately clarify your testing priorities and help you allocate your limited time and resources effectively.
Consider the spectrum of consequences in AEC software development. At one extreme, you have structural calculations where incorrect beam sizing could lead to catastrophic failure, injury, or death. Code compliance verification components carry similar weight—a miscalculation in seismic loads or wind resistance could result in buildings that don’t meet safety standards. Quantity takeoffs might seem less critical until you realise that incorrect material calculations could cost hundreds of thousands of dollars on a large project, potentially bankrupting a contractor or causing project delays that ripple through entire development schedules.
The picture becomes more nuanced when we consider data transformations and geometric operations. While these components can have equally catastrophic consequences—a parser that incorrectly converts structural data into geometry could result in buildings being constructed with wrong dimensions or orientations—they differ fundamentally in their detectability. Unit conversion errors in geometric operations are often immediately visible when you look at the resulting model: a beam that should be 6 meters long but appears as 6 feet will be glaringly obvious. Boolean operations that fail typically produce visually apparent artifacts: missing geometry, self-intersecting surfaces, or clearly incorrect shapes that jump out during visual inspection.
This visual detectability doesn’t reduce the inherent risk of these components, but it does change the testing strategy required. A parser that converts structural data into the wrong geometry carries the same potential for catastrophic failure as a structural calculation, but because the errors are immediately apparent to users, they’re caught much earlier in the workflow. In contrast, a miscalculation in a complex structural formula might only be detected by a subject matter expert who carefully reviews the numerical outputs—something that doesn’t always happen in fast-paced project environments.
At the lowest risk level sit the components that enhance user experience but don’t directly impact engineering decisions. Visualisation helpers, interface elements, and documentation generators can fail without serious consequences beyond user annoyance. A colour-coding component that breaks might make drawings less readable, but it won’t compromise the integrity of the underlying design.
This risk-based thinking should fundamentally shape how you approach testing. High-risk components demand comprehensive unit testing, rigorous integration testing, multiple validation approaches, and extensive user testing with subject matter experts. Medium-risk components need solid testing coverage but can accept some gaps in edge cases. Low-risk components might only need smoke tests to ensure they don’t crash the system. The key insight is that treating all components equally in your testing strategy is both inefficient and potentially dangerous—you’ll either over-test trivial components or under-test critical ones.
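To make this mapping explicit in a codebase, the risk tiers and their required checks can be written down as data rather than left as tribal knowledge. The sketch below is purely illustrative: the tier names, the classification rule, and the required checks are assumptions to adapt to your own practice, not a prescribed standard.

```csharp
// Illustrative sketch only: tier names, classification rule, and required
// checks are assumptions to adapt to your own components and governance.
using System.Collections.Generic;

public enum RiskLevel { High, Medium, Low }

public static class TestingPolicy
{
    // Minimum testing activities a component must pass before release,
    // keyed by its risk tier.
    public static readonly IReadOnlyDictionary<RiskLevel, string[]> RequiredChecks =
        new Dictionary<RiskLevel, string[]>
        {
            [RiskLevel.High]   = new[] { "unit", "integration", "SME validation", "user testing" },
            [RiskLevel.Medium] = new[] { "unit", "integration" },
            [RiskLevel.Low]    = new[] { "smoke" },
        };

    // A crude classification rule: silent errors in engineering results are
    // the highest risk; visually obvious failures get caught earlier.
    public static RiskLevel Classify(bool affectsEngineeringResults, bool errorsVisuallyObvious)
    {
        if (affectsEngineeringResults && !errorsVisuallyObvious) return RiskLevel.High;
        if (affectsEngineeringResults) return RiskLevel.Medium;
        return RiskLevel.Low;
    }
}
```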
Three testing types for three different requirements
- Unit/Logic tests – For isolated logic (e.g., logical operations, conversions, data operations).
- Integration tests – For workflows (geometrical operations) and end-to-end behaviour.
- User tests – Rarely automated, but valuable through real project use and user feedback; they cover the ‘UX’ of the node and the clarity of its documentation and information.

Unit/Logic tests
These are the most straightforward types of tests. It’s what most people have come to associate with the testing process. You put X in a function, and you test that you get Y out. It’s basically when you want to test the raw logic of your components. There are plenty of books, talks, tutorials, and resources available if you want to expand your knowledge on this topic, but for AEC development, there are specific patterns that prove particularly valuable.
The key insight for unit testing Grasshopper and Dynamo components is understanding what constitutes a “unit” in visual programming environments. Unlike traditional software, where you test individual functions, your units are often the core logic methods that power your components, extracted from the visual programming interface.
Key unit testing patterns
Boundary Value Testing is critical because AEC applications deal with real-world constraints. Always test the extreme values your component might encounter. These edge cases often reveal calculation errors or overflow conditions that could cause serious issues in production. Engineering disasters like the Tacoma Narrows Bridge collapse happened when a structure met conditions its design had not accounted for, which is exactly why testing boundary conditions in software is equally crucial.
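As a minimal sketch of the pattern, assume a hypothetical BeamSizing.RequiredDepth helper (the name and the span-to-depth rule are invented here purely for illustration): boundary tests pin its behaviour down at zero, at the largest span it claims to support, and just past the boundary.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical helper under test, included only so the sketch compiles on its own.
public static class BeamSizing
{
    public static double RequiredDepth(double spanMetres)
    {
        if (spanMetres < 0.0)
            throw new ArgumentOutOfRangeException(nameof(spanMetres));
        return spanMetres / 20.0; // illustrative span-to-depth rule of thumb
    }
}

[TestClass]
public class BeamSizingBoundaryTests
{
    [TestMethod]
    public void RequiredDepth_ZeroSpan_ReturnsZero()
    {
        Assert.AreEqual(0.0, BeamSizing.RequiredDepth(0.0), 1e-9);
    }

    [TestMethod]
    public void RequiredDepth_MaximumSupportedSpan_StillComputes()
    {
        // The upper bound of what the component claims to support.
        Assert.IsTrue(BeamSizing.RequiredDepth(100.0) > 0.0);
    }

    [TestMethod]
    public void RequiredDepth_NegativeSpan_FailsLoudly()
    {
        // Values past the boundary should throw, never return a silent result.
        Assert.ThrowsException<ArgumentOutOfRangeException>(
            () => BeamSizing.RequiredDepth(-1.0));
    }
}
```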
Tolerance testing addresses the fundamental challenge of working with floating-point precision in calculations. Your tests should verify that your components handle these tolerance comparisons correctly, as geometric operations failing due to precision issues are a common source of frustration. This is especially important to consider for the manufacturing sector, where tolerances are tight.
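A minimal sketch of tolerance-aware assertions, assuming a model tolerance of 0.001 units (in a real test this would come from the active document rather than a constant):

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ToleranceTests
{
    // Assumed model absolute tolerance; read it from the document in real tests.
    private const double Tolerance = 0.001;

    [TestMethod]
    public void AccumulatedOffsets_StayWithinModelTolerance()
    {
        // Summing many small offsets is rarely bit-exact, so compare with a
        // delta instead of demanding exact equality.
        double position = 0.0;
        for (var i = 0; i < 1000; i++)
            position += 0.001; // 1 mm increments

        Assert.AreEqual(1.0, position, Tolerance);
    }

    [TestMethod]
    public void NearlyCoincidentValues_TreatedAsEqual()
    {
        // Distance-style comparison mirrors how coincidence is usually decided
        // inside a modelling kernel.
        Assert.IsTrue(Math.Abs(1.0000004 - 1.0) < Tolerance);
    }
}
```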
Unit conversion testing might seem trivial, but it’s where many real-world errors occur. Test not just the mathematical correctness of conversions, but also edge cases like converting between the same units, handling zero values, and ensuring that chains of conversions (meters to feet to inches) maintain accuracy. These seemingly simple operations can cause expensive mistakes when they fail silently, as demonstrated by NASA’s Mars Climate Orbiter disaster, where a $327 million mission was lost because one system produced pound-force values while the other expected newtons in the data exchanged between Lockheed Martin and NASA.
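A short sketch of round-trip and reference-value checks, using a hypothetical UnitConvert helper (invented for illustration, not part of any real library):

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical conversion helper, included so the sketch is self-contained.
public static class UnitConvert
{
    public const double FeetPerMetre = 3.280839895013123; // 1 m = 1 / 0.3048 ft

    public static double MetresToFeet(double metres) => metres * FeetPerMetre;
    public static double FeetToMetres(double feet) => feet / FeetPerMetre;
}

[TestClass]
public class UnitConversionTests
{
    [TestMethod]
    public void MetresToFeetAndBack_RoundTripsWithinTolerance()
    {
        const double original = 6.0; // a 6 m beam, not a 6 ft one
        var roundTripped = UnitConvert.FeetToMetres(UnitConvert.MetresToFeet(original));
        Assert.AreEqual(original, roundTripped, 1e-9);
    }

    [TestMethod]
    public void ZeroValue_ConvertsToZero()
    {
        Assert.AreEqual(0.0, UnitConvert.MetresToFeet(0.0), 0.0);
    }

    [TestMethod]
    public void KnownReferenceValue_Matches()
    {
        // 1 foot is exactly 0.3048 m by definition.
        Assert.AreEqual(0.3048, UnitConvert.FeetToMetres(1.0), 1e-12);
    }
}
```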
Integration tests
Alright, here’s where things get complicated very quickly, because there are many ways to test how your users should be using your components. A big difference between writing code and creating GH/Dyn components is that there is always an implied way of using these components.
The challenge with integration testing for GH/Dyn components lies not just in testing the individual component logic, but in validating entire workflows within their host applications. Unlike traditional software testing, where you control the entire execution environment, here you’re testing within someone else’s program, with all the licensing, installation, and environment complexities that this entails.
The four pathways to automated integration testing
When it comes to automating tests that touch Rhino + Grasshopper (and by extension, similar challenges exist for Dynamo), there are four mainstream approaches. Each bootstraps the host application in a different place, so their trade-offs revolve around where the application lives, how licenses are consumed, and how portable the workflow is to CI.
Option 1: Embed Rhino with RhinoCore (headless, local)
How it works: Your test runner (MSTest, xUnit, NUnit) calls new RhinoCore(args, RuntimeMode.Headless); Rhino spins up inside the same process, and you can create documents, load GHAs, etc.
[System.STAThread]
[TestMethod]
public void TestMethod1()
{
    // Boot headless Rhino inside the test process for the lifetime of this test.
    using (new RhinoCore())
    {
        var rTree = new RTree<int>(2);
        var treePoints = RTreeTestData.Points;
        var numTrees = treePoints.Length / 2;

        // Insert each (x, y) pair from the test data into the spatial index.
        for (var i = 0; i < numTrees; i++)
            rTree.Insert(new Point3(treePoints[2 * i], treePoints[2 * i + 1], 0.0), i);

        Assert.IsTrue(numTrees >= 2);
    }
}
This approach shines for developers who want fast interactive debugging: you stay inside VS/VS Code Test Explorer and can set breakpoints in component code, giving you immediate feedback that’s perfect for test-driven development cycles. However, this convenience comes with significant constraints. You’ll need Rhino installed and licensed on every dev/CI machine since headless mode still checks out a license. While RhinoCore technically runs on macOS, most test-runner launchers and many plugins assume Win64, making this effectively Windows-only today.
Option 2: RhinoCompute (remote execution)
RhinoCompute is a REST geometry server that exposes 2400+ RhinoCommon API calls through a stateless web service. You start compute.geometry.exe on a workstation or build server and hit it from tests via GrasshopperCompute.EvaluateDefinition() or direct REST API calls. A practical example of this approach can be found in hrntsm’s GH-UnitTest-by-RhinoCompute repository.
RhinoCompute excels as a lightweight testing solution where only the rhino3dm client libraries are needed; no Rhino installation is required on developer laptops or CI runners. This approach scales horizontally beautifully, allowing you to pool multiple compute instances or point many repositories at one central Rhino server. The language-agnostic nature means any HTTP client (C#, Python, JS) can drive the same definition or bespoke endpoint, making it a natural fit for microservices architectures and containerised deployment strategies. You also retain access to existing Rhino/Grasshopper plugins through the online interface.
The trade-offs center around infrastructure complexity and performance. You still need a licensed Rhino somewhere, network latency becomes a factor, and the compute server itself is extra infrastructure to stand up and maintain.
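To make the shape of such a test concrete, here is a rough sketch assuming the compute.rhino3d C# client (the Rhino.Compute namespace), a compute server reachable at the address shown, and a facade_panels.gh definition of your own that the server can resolve; none of these specifics come from the repositories mentioned above, so adapt them to your setup.

```csharp
// A rough sketch, not a drop-in test: it assumes the compute.rhino3d C# client
// (Rhino.Compute namespace) is referenced, a compute server is reachable at
// the address below, and "facade_panels.gh" is a definition the server can resolve.
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Rhino.Compute;

[TestClass]
public class FacadeDefinitionComputeTests
{
    [TestMethod]
    public void FacadeDefinition_EvaluatesAndReturnsOutputs()
    {
        // Point the client at your compute instance (local or a build server).
        ComputeServer.WebAddress = "http://localhost:8081/";

        // A definition with baked-in defaults keeps the input-tree plumbing out
        // of this sketch; real tests would also pass GrasshopperDataTree inputs
        // and assert on specific output values.
        var inputs = new List<GrasshopperDataTree>();
        var outputs = GrasshopperCompute.EvaluateDefinition("facade_panels.gh", inputs);

        Assert.IsNotNull(outputs);
        Assert.IsTrue(outputs.Count > 0, "The definition should return at least one output tree.");
    }
}
```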
Option 3: Rhino.Testing NuGet fixture
Add the Rhino.Testing package and decorate test classes with [RhinoTestFixture]; the library reads Rhino.Testing.Configs.xml to find RhinoSystemDirectory, boots RhinoCore headless once per test assembly, and can auto-load Eto, RDK, Grasshopper, or specific plugins. You can see this approach in action in the SimpleRhinoTests repository, which provides a complete working example.
In practice, this means you can maintain a set of Grasshopper test definitions that exercise your components end to end, alongside conventional unit tests that cover their underlying logic.
The Rhino.Testing approach offers minimal boilerplate, where one attribute replaces custom SetupFixture code, and it works seamlessly across VS, Rider, and VS Code. Built-in Grasshopper helpers like RunGrasshopper() and TestGrasshopper() can execute .gh files and surface GHReport for assertions, making it particularly valuable for visual programming testing. Cross-framework targeting supports both net48 and net7.0-windows, so you can test plugins that ship with dual builds.
The limitations mirror some of Option 1’s constraints: it’s still local and still needs a license. It’s also an early-days project tagged as beta, and it’s currently NUnit-only.
Option 4: GitHub Actions workflow (setup-rhino3d + Rhino.Testing)
The setup-rhino3d action downloads and silently installs the latest Rhino service release on a Windows runner; your job then restores NuGet packages (including Rhino.Testing) and runs dotnet test.
The GitHub Actions approach delivers a complete CI loop, where every push and pull request automatically exercises your Grasshopper nodes and Rhino plugins, with no manual VM setup required. This is made possible by the composite action, which hides installer flags and cache keys behind a simple YAML configuration. It works seamlessly with public runners because the Rhino installer is pulled at workflow time, eliminating the need for pre-baked images. You can add matrix jobs for net48 vs net7.0.
However, this convenience comes with notable constraints. It’s Windows-only today, with macOS/Linux support tracked but not yet available. You’ll need license automation through Cloud Zoo email or Zoo server keys, which requires managing encrypted secrets in your repository. There’s also a cold-start penalty, where downloading and installing Rhino adds 3-5 minutes to each CI run; caching helps, but it’s not free. Finally, runner security becomes a real concern in enterprise environments.
Thoughts on the options
The complexity of testing workflows in visual programming environments means that you’ll likely need multiple approaches. Start with the simplest option that covers your immediate needs, then expand your testing infrastructure as your tool matures and your user base grows. Remember, the goal isn’t perfect automation; it’s building confidence that your components work correctly in the hands of professionals who depend on them for critical decisions.
Rule of thumb: Use Option 1 or 3 for day-to-day tests on your workstation; switch to Option 4 when you need many workers and want your tests fully integrated into your CI/CD pipeline.
For Dynamo, the situation is different but still includes structured testing options. Dynamo provides a comprehensive testing infrastructure through the DynamoVisualProgramming.Tests NuGet package, which enables local testing using NUnit with specialised base classes, like GeometricTestBase for unit tests that use the geometry library and SystemTestBase for system tests that can start Dynamo and evaluate .dyn files. While this allows for thorough local testing of Dynamo packages, there are currently no automated remote-execution alternatives similar to RhinoCompute’s approach. That being said, Autodesk’s Dynamo as a service is forthcoming, which may eventually enable automated testing scenarios similar to those offered by RhinoCompute for Grasshopper.
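As a point of comparison with the Grasshopper options above, here is a minimal sketch of a Dynamo-side unit test, assuming the DynamoVisualProgramming.Tests package and the ProtoGeometry (Autodesk.DesignScript.Geometry) library are referenced; namespaces and setup details can vary between Dynamo versions, so treat it as a starting point rather than a recipe.

```csharp
using Autodesk.DesignScript.Geometry;
using NUnit.Framework;
using TestServices; // assumed namespace of GeometricTestBase; verify against your Dynamo version

[TestFixture]
public class GeometryLogicTests : GeometricTestBase
{
    // GeometricTestBase boots Dynamo's geometry library so DesignScript
    // geometry can be created outside the Dynamo application.
    [Test]
    public void LineBetweenTwoPoints_HasExpectedLength()
    {
        var start = Point.ByCoordinates(0, 0, 0);
        var end = Point.ByCoordinates(3, 4, 0);

        var line = Line.ByStartPointEndPoint(start, end);

        Assert.AreEqual(5.0, line.Length, 1e-9);
    }
}
```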
User tests
Okay, then we move on to the most complicated type of testing: user testing. Since we create components intended for use in projects, there is still a need to encourage users to test them. No matter how many test scripts we write, they may not cover the wide variety of situations that real-world projects and users will encounter.
The complexity of reviewing functionality (nodes) in Dyn and GH packages comes from the need for multiple types of review. Whatever the nodes focus on, whether logic that operates on data or on geometry, even when every test passes and you are sure the node works perfectly, getting feedback from end users who actually test it is crucial. Why is this important? Often, the way a final user intends to use the node differs from the developer’s expectations. Therefore, the testing phase must encompass functionality, usability, and documentation reviews.
Users need to test the node in real-world situations, especially its edge cases. This is where valuable feedback regarding usability is obtained, revealing that certain inputs might be unclear or that the node’s design may not align with how the end user will actually use it. This highlights the need for both usability evaluations and a review of how users interpret the documentation.
The challenge in reviewing nodes stems from the requirement for two distinct types of assessments: a code review by a developer and a user review conducted by subject matter experts (SMEs). This duality complicates the testing phase and makes it difficult to fully automate. While unit tests and integrated testing can be beneficial, the final user review often yields the most insightful feedback.
To facilitate this, it’s essential to involve the final user in the testing process. Since these users are typically not developers, it’s impractical to expect them to download the repository, build the solution, and debug like a developer would.
Before describing an effective solution that has been tested in the past, it is essential to set up systematic user testing with clear deliverables rather than ad-hoc feedback collection. Provide users with specific scenarios to test, each with defined expectations, such as a scenario where the expected output is an appropriate error message or warning and the success criterion is graceful handling without crashes.
Complement these scenarios with structured feedback forms that capture specific information: what scenario was being tested, what inputs were used, what was the expected versus actual output, whether any errors or unexpected behaviour occurred, how intuitive the component was to use, and what documentation was unclear. This systematic approach ensures you gather actionable feedback rather than vague impressions, making it possible to identify specific areas for improvement and validate that your components work correctly in real-world situations.
One effective solution that has yielded good results is the deployment of alpha packages during the pull request (PR) process.
Generally, when developing solutions of this nature, a code management tool such as GitHub, Azure DevOps, or Bitbucket is used. During the typical development process, a PR is opened, and as part of this phase, specific pipelines are created to build alpha packages tailored for the open PR. This is especially useful when changes are made or user testing of the nodes is required. These pipelines generate packages that users can easily install using either provided or internally developed package managers.
The SMEs will test the node by following a checklist of best practices and providing feedback. Since it is not feasible to give everyone a GitHub account, a dedicated feedback page is included in the tool’s web documentation. This page is used to log feedback, which is then linked to the PR using its number. The PR number is incorporated into the alpha package name and serves as a reference that users must include when submitting their feedback. This action triggers other automation processes in the background to push the feedback to the PR as comments.
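If the feedback form is treated as a small data contract rather than free text, it can be versioned alongside the code and pushed to the PR automatically. The record below is a hypothetical shape for such an entry; the field names are illustrative and should mirror whatever your feedback page actually collects.

```csharp
using System;

// Hypothetical structure for one feedback entry; adapt the fields to your
// own feedback page and automation.
public sealed record UserTestFeedback(
    string ScenarioTested,           // which of the provided test scenarios was run
    string InputsUsed,               // the actual inputs or files the SME used
    string ExpectedOutput,
    string ActualOutput,
    bool ErrorOrUnexpectedBehaviour,
    int UsabilityRating,             // e.g. 1 (confusing) to 5 (intuitive)
    string UnclearDocumentation,     // anything the tester had to guess at
    string AlphaPackageVersion,      // contains the PR number, linking feedback to the PR
    DateTime SubmittedOn);
```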
While the process of approval is an interesting topic, it falls under the area of governance, which is not the focus of this article. However, it is certainly a significant point of discussion and decision for enterprises.
Quality gates
Before any component moves from development to production, establish clear criteria that define when it’s truly ready for release. These quality gates should require that all high and medium-risk tests pass without exception, performance benchmarks are consistently met, no critical or high-severity bugs remain unresolved, code coverage targets are achieved for the component’s risk level, and user acceptance criteria are satisfied through real-world testing scenarios. The key is making these criteria measurable and binary; there should be no ambiguity about whether a component meets the standard.
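One way to keep these criteria measurable and binary is to encode the gate itself as data that a release pipeline can evaluate; the sketch below is illustrative only, mirroring the criteria listed above.

```csharp
// Illustrative only: each criterion is an explicit boolean, so a component
// either clears the gate or it does not; there is no "almost ready".
public sealed record QualityGate(
    bool AllHighAndMediumRiskTestsPass,
    bool PerformanceBenchmarksMet,
    bool NoCriticalOrHighSeverityBugsOpen,
    bool CoverageTargetMetForRiskLevel,
    bool UserAcceptanceCriteriaSatisfied)
{
    public bool IsReleasable =>
        AllHighAndMediumRiskTestsPass
        && PerformanceBenchmarksMet
        && NoCriticalOrHighSeverityBugsOpen
        && CoverageTargetMetForRiskLevel
        && UserAcceptanceCriteriaSatisfied;
}
```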
Conclusion
Testing Grasshopper and Dynamo components requires a multi-layered approach that goes beyond traditional software testing. By implementing risk-based testing strategies, comprehensive unit and integration tests, and systematic user feedback collection, you can build robust tools that meet the demanding requirements of the AEC industry.
Remember: the goal isn’t perfect test coverage—it’s building confidence that your tools will perform correctly when professionals depend on them for critical decisions. Start with high-risk components, implement concrete testing patterns, and gradually expand your testing coverage based on real user needs and feedback.
In the next articles, we will move on to other aspects of development and deployment.