NVTX++  3.0
C++ convenience wrappers for NVTX.
NVTX++ Documentation

Quick Start

To add NVTX ranges to your code, use the nvtx3::thread_range RAII object. A range begins when the object is created, and ends when the object is destroyed.

#include "nvtx3.hpp"
#include <chrono>
#include <thread>

void some_function(){
   // Begins an NVTX range with the message "some_function"
   // The range ends when some_function() returns and `r` is destroyed
   nvtx3::thread_range r{"some_function"};
   for(int i = 0; i < 6; ++i){
      // Each iteration gets its own range, nested inside `r`
      nvtx3::thread_range loop{"loop range"};
      std::this_thread::sleep_for(std::chrono::seconds{1});
   }
} // Range ends when `r` is destroyed

The example code above generates the following timeline view in Nsight Systems:

example_range.png

Alternatively, use the Convenience Macros like NVTX3_FUNC_RANGE() to add ranges to your code that automatically use the name of the enclosing function as the range's message.

#include "nvtx3.hpp"
void some_function(){
   // Creates a range with a message "some_function" that ends when the
   // enclosing function returns
   NVTX3_FUNC_RANGE();
   ...
}

Overview

The NVTX library provides a set of functions for users to annotate their code to aid in performance profiling and optimization. These annotations provide information to tools like Nsight Systems to improve visualization of application timelines.

Ranges are one of the most commonly used NVTX constructs for annotating a span of time. For example, imagine a user wanted to see every time a function, my_function, is called and how long it takes to execute. This can be accomplished with an NVTX range created on the entry to the function and terminated on return from my_function using the push/pop C APIs:

void my_function(...){
   nvtxRangePushA("my_function"); // Begins NVTX range
   // do work
   nvtxRangePop(); // Ends NVTX range
}

One of the challenges with using the NVTX C API is that the range must be ended manually with nvtxRangePop(). This is error-prone if my_function() has multiple return statements or can throw exceptions, because nvtxRangePop() must be called before every possible exit point.

NVTX++ solves this inconvenience through the "RAII" technique by providing a nvtx3::thread_range class that begins a range at construction and ends the range on destruction. The above example then becomes:

void my_function(...){
   nvtx3::thread_range r{"my_function"}; // Begins NVTX range
   // do work
} // Range ends on exit from `my_function` when `r` is destroyed

The range object r is deterministically destroyed whenever my_function returns—ending the NVTX range without manual intervention. For more information, see Ranges and nvtx3::domain_thread_range.
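
To make the RAII behavior concrete without depending on NVTX, the following is a minimal sketch that assumes a hypothetical scoped_range guard and a hypothetical event_log() stand-in for the profiler backend. Neither name is part of NVTX++; the sketch only illustrates that every exit path (early return, exception, normal return) ends the range.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler backend: records push/pop events.
inline std::vector<std::string>& event_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical RAII guard (NOT the nvtx3 implementation): "pushes" a range
// in the constructor and "pops" it in the destructor.
class scoped_range {
 public:
  explicit scoped_range(std::string const& name) {
    event_log().push_back("push " + name);
  }
  ~scoped_range() { event_log().push_back("pop"); }

  // A copy would produce a second, unmatched pop, so copying is disabled.
  scoped_range(scoped_range const&) = delete;
  scoped_range& operator=(scoped_range const&) = delete;
};

// Three different exits; the destructor of `r` runs on each of them.
int work(int x) {
  scoped_range r{"work"};
  if (x < 0) { return -1; }                                    // early return
  if (x == 0) { throw std::invalid_argument{"x must be nonzero"}; }  // exception
  return 2 * x;                                                // normal return
}
```

Calling work() with a negative, zero, or positive argument leaves a matching "pop" entry after each "push work" entry in the log, with no explicit cleanup code at any of the exit points.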

Another inconvenience of the NVTX C API is that several constructs expect the user to initialize an object at the beginning of an application and reuse that object throughout the application's lifetime. Examples include Domains, Categories, and Registered messages.

Example:

nvtxDomainHandle_t D = nvtxDomainCreateA("my domain");
// Reuse `D` throughout the rest of the application

This can be problematic if the user application or library does not have an explicit initialization function called before all other functions to ensure that these long-lived objects are initialized before being used.

NVTX++ makes use of the "construct on first use" technique to alleviate this inconvenience. In short, a function-local static object is constructed upon the first invocation of a function, and the function returns a reference to that same object on the first and all future invocations. See the documentation for nvtx3::registered_message, nvtx3::domain, nvtx3::named_category, and https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use for more information.

Using construct on first use, the above example becomes:

struct my_domain{ static constexpr char const* name{"my domain"}; };
// The first invocation of `domain::get` for the type `my_domain` will
// construct a `nvtx3::domain` object and return a reference to it. Future
// invocations simply return a reference.
nvtx3::domain const& D = nvtx3::domain::get<my_domain>();
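
The mechanism behind such a get function can be sketched in a few lines. The following is an illustration only, assuming a hypothetical demo_domain class in place of nvtx3::domain (the construction counter exists purely to make the behavior observable); it is not how NVTX++ is necessarily implemented.

```cpp
#include <string>

// Hypothetical stand-in for nvtx3::domain.
class demo_domain {
 public:
  // Returns the single instance associated with the tag type `Tag`.
  template <typename Tag>
  static demo_domain const& get() {
    // Function-local static: constructed on the first call only.
    // Since C++11 this initialization is thread-safe.
    static demo_domain const instance{Tag::name};
    return instance;
  }

  std::string const& name() const { return name_; }

  // Number of demo_domain objects constructed so far.
  static int constructions() { return count(); }

 private:
  explicit demo_domain(char const* name) : name_{name} { ++count(); }

  static int& count() {
    static int n = 0;
    return n;
  }

  std::string name_;
};

// Tag types: each distinct tag yields a distinct instantiation of `get`,
// and therefore a distinct static object.
struct my_domain { static constexpr char const* name{"my domain"}; };
struct other_domain { static constexpr char const* name{"other domain"}; };
```

Because the static object lives inside a function template, no explicit initialization call is needed: whichever caller reaches demo_domain::get<my_domain>() first triggers the construction, and every later caller receives a reference to the same object.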

For more information about NVTX and how it can be used, see https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvtx and https://devblogs.nvidia.com/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/.

Ranges

Ranges are used to describe a span of time during the execution of an application. Common examples are using ranges to annotate the time it takes to execute a function or an iteration of a loop.

NVTX++ uses RAII to automate the generation of ranges that are tied to the lifetime of objects, similar to std::lock_guard in the C++ Standard Library.

Thread Range

nvtx3::domain_thread_range is a class that begins a range upon construction and ends the range at destruction. This is one of the most commonly used constructs in NVTX++ and is useful for annotating spans of time on a particular thread. These ranges can be nested to arbitrary depths.

nvtx3::thread_range is an alias for a nvtx3::domain_thread_range in the global NVTX domain. For more information about Domains, see Domains.

Various attributes of a range can be configured by constructing a nvtx3::domain_thread_range with a nvtx3::event_attributes object. For more information, see Event Attributes.

Example:

void some_function(){
   // Creates a range for the duration of `some_function`
   nvtx3::thread_range r{};
   while(true){
      // Creates a range for every loop iteration
      // `loop_range` is nested inside `r`
      nvtx3::thread_range loop_range{};
   }
}

Process Range

nvtx3::domain_process_range is identical to nvtx3::domain_thread_range with the exception that a domain_process_range can be created and destroyed on different threads. This is useful to annotate spans of time that can bridge multiple threads.

nvtx3::domain_thread_ranges should be preferred unless one needs the ability to begin and end a range on different threads.

Marks

nvtx3::mark allows annotating an instantaneous event in an application's timeline, for example to indicate when a mutex is locked or unlocked.

std::mutex global_lock;
void lock_mutex(){
   global_lock.lock();
   // Marks an event immediately after the mutex is locked
   nvtx3::mark<my_domain>("lock_mutex");
}

Domains

Similar to C++ namespaces, Domains allow for scoping NVTX events. By default, all NVTX events belong to the "global" domain. Libraries and applications should scope their events to use a custom domain to differentiate where the events originate from.

It is common for a library or application to have only a single domain and for the name of that domain to be known at compile time. Therefore, Domains in NVTX++ are represented by tag types.

For example, to define a custom domain, simply define a new concrete type (a class or struct) with a static member called name that contains the desired name of the domain.

struct my_domain{ static constexpr char const* name{"my domain"}; };

For any NVTX++ construct that can be scoped to a domain, the type my_domain can be passed as an explicit template argument to scope it to the custom domain.

The tag type nvtx3::domain::global represents the global NVTX domain.

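// By default, `domain_thread_range` belongs to the global domain
nvtx3::domain_thread_range<> r0{};
// Alias for a `domain_thread_range` in the global domain
nvtx3::thread_range r1{};
// `r` belongs to the custom domain
nvtx3::domain_thread_range<my_domain> r{};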

When using a custom domain, it is recommended to define type aliases for NVTX constructs in the custom domain.

using my_thread_range = nvtx3::domain_thread_range<my_domain>;
using my_registered_message = nvtx3::registered_message<my_domain>;
using my_named_category = nvtx3::named_category<my_domain>;

See nvtx3::domain for more information.

Event Attributes

NVTX events can be customized with various attributes to provide additional information (such as a custom message) or to control visualization of the event (such as the color used). These attributes can be specified per-event via arguments to a nvtx3::event_attributes object.

NVTX events can be customized via four "attributes": color, message, payload, and category. Each is described in its own section below.

It is possible to construct a nvtx3::event_attributes from any number of attribute objects (nvtx3::color, nvtx3::message, nvtx3::payload, nvtx3::category) in any order. If an attribute is not specified, a tool-specific default value is used. See nvtx3::event_attributes for more information.

// Custom color, message
nvtx3::event_attributes attr{nvtx3::rgb{127, 255, 0},
                             "message"};
// Custom color, message, payload, category
nvtx3::event_attributes attr{nvtx3::rgb{127, 255, 0},
                             "message",
                             nvtx3::payload{42},
                             nvtx3::category{1}};
// Arguments can be in any order
nvtx3::event_attributes attr{nvtx3::payload{42},
                             "message",
                             nvtx3::rgb{127, 255, 0}};
// "First wins" with multiple arguments of the same type
nvtx3::event_attributes attr{nvtx3::payload{42}, nvtx3::payload{7}}; // payload is 42
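
One way to obtain the "any order, first wins" behavior is a variadic constructor that applies its arguments from last to first. The sketch below assumes hypothetical toy_payload, toy_category, and toy_attributes types; it illustrates the idea only and is not taken from the NVTX++ implementation.

```cpp
#include <cstdint>

// Hypothetical attribute types, loosely mirroring the names in the text.
struct toy_payload { std::int64_t value; };
struct toy_category { std::uint32_t id; };

// Hypothetical sketch of an attribute bundle (NOT nvtx3::event_attributes).
class toy_attributes {
 public:
  // No arguments: every attribute keeps its default value.
  toy_attributes() = default;

  // Delegate to the constructor for the remaining arguments first, then
  // apply `first` last. Since the first argument is applied last, it
  // overwrites any later argument of the same type: "first wins".
  template <typename First, typename... Rest>
  explicit toy_attributes(First const& first, Rest const&... rest)
      : toy_attributes(rest...) {
    apply(first);
  }

  std::int64_t payload_value() const { return payload_; }
  std::uint32_t category_id() const { return category_; }

 private:
  // One overload per attribute type; overload resolution picks the slot,
  // which is why argument order does not matter.
  void apply(toy_payload const& p) { payload_ = p.value; }
  void apply(toy_category const& c) { category_ = c.id; }

  std::int64_t payload_{0};
  std::uint32_t category_{0};
};
```

With this design, toy_attributes{toy_payload{42}, toy_payload{7}} ends up holding the payload 42, and swapping a payload and a category argument produces an equivalent object.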

message

A nvtx3::message allows associating a custom message string with an NVTX event.

Example:

// Create an `event_attributes` with the custom message "my message"
nvtx3::event_attributes attr{nvtx3::message{"my message"}};
// strings and string literals implicitly assumed to be a `nvtx3::message`
nvtx3::event_attributes attr{"my message"};

Registered Messages

Associating a nvtx3::message with an event requires copying the contents of the message every time the message is used, i.e., copying the entire message string. This may cause non-trivial overhead in performance sensitive code.

To eliminate this overhead, NVTX allows registering a message string, yielding a "handle" that is inexpensive to copy that may be used in place of a message string. When visualizing the events, tools such as Nsight Systems will take care of mapping the message handle to its string.

A message should be registered once and the handle reused throughout the rest of the application. This can be done by either explicitly creating static nvtx3::registered_message objects, or using the nvtx3::registered_message::get construct on first use helper (recommended).

Similar to Domains, nvtx3::registered_message::get requires defining a custom tag type with a static message member whose value will be the contents of the registered string.

Example:

// Explicitly constructed, static `registered_message`
static nvtx3::registered_message<my_domain> static_message{"my message"};
// Or use construct on first use:
// Define a tag type with a `message` member string to register
struct my_message{ static constexpr char const* message{ "my message" }; };
// Uses construct on first use to register the contents of
// `my_message::message`
nvtx3::registered_message<my_domain> const& msg =
   nvtx3::registered_message<my_domain>::get<my_message>();
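
The trade-off between copying a string per event and passing a handle can be illustrated with a toy registry. This sketch assumes a hypothetical toy_registry class whose handles are plain indices; real NVTX handles are opaque and are resolved by the profiling tool, not by the application.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry (NOT the NVTX implementation): stores each
// registered string once and hands out a small, cheap-to-copy handle.
class toy_registry {
 public:
  using handle = std::size_t;

  // Pays the cost of storing the string once, at registration time.
  handle register_string(std::string text) {
    strings_.push_back(std::move(text));
    return strings_.size() - 1;
  }

  // What a tool would do when displaying an event: map the handle back
  // to its string. Throws std::out_of_range for an unknown handle.
  std::string const& resolve(handle h) const { return strings_.at(h); }

 private:
  std::vector<std::string> strings_;
};
```

After registration, each event only needs to carry the handle, which is as cheap to copy as an integer regardless of the length of the message.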

color

Associating a nvtx3::color with an event allows controlling how the event is visualized in a tool such as Nsight Systems. This is a convenient way to visually differentiate among different events.

// Define a color via rgb color values
nvtx3::color c{nvtx3::rgb{127, 255, 0}};
// rgb color values can be passed directly to an `event_attributes`
nvtx3::event_attributes attr{nvtx3::rgb{127, 255, 0}};

category

A nvtx3::category is simply an integer id that allows for fine-grained grouping of NVTX events. For example, one might use separate categories for IO, memory allocation, compute, etc.

Named Categories

A nvtx3::named_category associates a name string with a category id to help differentiate among categories.

For any given category id Id, a named_category{Id, "name"} should only be constructed once and reused throughout an application. This can be done by either explicitly creating static nvtx3::named_category objects, or using the nvtx3::named_category::get construct on first use helper (recommended).

Similar to Domains, nvtx3::named_category::get requires defining a custom tag type with static name and id members.

// Explicitly constructed, static `named_category`
static nvtx3::named_category static_category{42, "my category"};
// OR use construct on first use:
// Define a tag type with `name` and `id` members
struct my_category{
   static constexpr char const* name{"my category"}; // category name
   static constexpr nvtx3::category::id_type id{42}; // category id
};
// Use construct on first use to name the category id `42`
// with name "my category"
// (the variable is named `cat` so it does not hide the tag type)
auto const& cat = nvtx3::named_category<my_domain>::get<my_category>();
// `attr` is associated with category id `42`
nvtx3::event_attributes attr{cat};

payload

A nvtx3::payload allows associating a user-defined numerical value with an event.

// Constructs a payload from the `int32_t` value 42
nvtx3::event_attributes attr{nvtx3::payload{42}};

Example

Putting it all together:

// Define a custom domain tag type
struct my_domain{ static constexpr char const* name{"my domain"}; };
// Define a named category tag type
struct my_category{
   static constexpr char const* name{"my category"};
   static constexpr uint32_t id{42};
};
// Define a registered message tag type
struct my_message{ static constexpr char const* message{"my message"}; };
// For convenience, use aliases for domain scoped objects
using my_thread_range = nvtx3::domain_thread_range<my_domain>;
using my_registered_message = nvtx3::registered_message<my_domain>;
using my_named_category = nvtx3::named_category<my_domain>;
// Default values for all attributes
nvtx3::event_attributes attr{};
my_thread_range r0{attr};
// Custom (unregistered) message, and unnamed category
nvtx3::event_attributes attr1{"message", nvtx3::category{2}};
my_thread_range r1{attr1};
// Alternatively, pass arguments of `event_attributes` ctor directly to
// `my_thread_range`
my_thread_range r2{"message", nvtx3::category{2}};
// construct on first use a registered message
auto msg = my_registered_message::get<my_message>();
// construct on first use a named category
auto category = my_named_category::get<my_category>();
// Use registered message and named category
my_thread_range r3{msg, category, nvtx3::rgb{127, 255, 0}};
// Any number of arguments in any order
my_thread_range r{nvtx3::rgb{127, 255, 0}, msg};

Convenience Macros

Oftentimes users want to quickly and easily add NVTX ranges to their library or application to aid in profiling and optimization.

A convenient way to do this is to use the NVTX3_FUNC_RANGE and NVTX3_FUNC_RANGE_IN macros. These macros take care of constructing an nvtx3::domain_thread_range with the name of the enclosing function as the range's message.

void some_function(){
   // Automatically generates an NVTX range for the duration of the function
   // using "some_function" as the event's message.
   NVTX3_FUNC_RANGE();
}
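
For readers curious how a macro can pick up the enclosing function's name, the following is a minimal sketch built on the standard predefined identifier __func__. The names DEMO_FUNC_RANGE, func_range_guard, and macro_log are hypothetical and exist only for this illustration; the sketch does not reproduce how NVTX3_FUNC_RANGE is actually defined. Note that the exact contents of __func__ are implementation-defined, although common compilers produce the unqualified function name.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler backend.
inline std::vector<std::string>& macro_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical RAII guard that records the begin and end of a range.
class func_range_guard {
 public:
  explicit func_range_guard(char const* name) {
    macro_log().push_back(std::string{"begin "} + name);
  }
  ~func_range_guard() { macro_log().push_back("end"); }

  func_range_guard(func_range_guard const&) = delete;
  func_range_guard& operator=(func_range_guard const&) = delete;
};

// Hypothetical macro: declares a guard in the current scope whose message
// is `__func__`, i.e. the name of the function the macro is expanded in.
#define DEMO_FUNC_RANGE() func_range_guard demo_func_range_guard_{__func__}

void traced_function() {
  DEMO_FUNC_RANGE();
  macro_log().push_back("body");
}
```

A macro is needed here because __func__ must be evaluated at the point of use; a regular helper function would see its own name rather than the caller's.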