The Art of Render Graphs

While building a game engine (especially when using modern graphics APIs like Vulkan or DirectX), you will sooner or later stumble upon an annoying issue: a frame is made of many operations that look unrelated to each other but still have to run in the right order. For example, you need a pass to create shadows, another pass to render objects using the created shadow maps, and a pass to render your UI from ImGui.

You may say that's not an issue: it's just three different functions, easy. Between these calls I can just transition images as needed, right? Well, that will work, of course. After all, those are just three simple passes. But what if I decide to add another one, like a post-processing pass, which must run before the UI? (Of course, we don't want post-processing applied to our UI components!)

If you haven't spotted the issue yet, here it is: now you have to add a special transition between the object pass and post-processing, and another one between post-processing and the UI. Every new pass means revisiting the transitions around it.

And this is where we need to introduce an automated way to seamlessly add passes anywhere and in any order, while letting the system sort them correctly. We simply declare that shadows must be rendered before objects because the object pass depends on the shadow data, and the render graph takes care of the rest.

On top of that, the system should be able to cull passes that are not actually used. For example, if shadows are disabled in the scene, it makes no sense to generate or even request a shadow map texture. That entire pass should be removed automatically.

You could, of course, choose to not add the shadow pass at all while building the render graph. However, if every pass has to manually check whether all of its inputs are enabled or valid, you quickly end up with countless conditional checks scattered across the frame code. Render graphs solve this by allowing you to describe dependencies explicitly and let the graph compiler handle ordering and culling for you.

 

In this post, I will be using Vulkan, because that is what I was working with when I stumbled upon this issue while building my game engine.

When I started learning about game engines, everyone I told about render graphs said I didn't need one. That made me want to build it even more, and I don't regret it. It taught me a great technique and made implementing things like tiled culling very easy: I just declare things, and most of the hard logic is generated automatically.

 

But why a graph? Let's write a formal definition of the render graph: A render graph is a directed acyclic graph (DAG) where each node represents a rendering pass, and edges represent resource dependencies between passes.


Passes do not perform transitions, allocation, or synchronization themselves. Instead, they describe what they read and write, and the graph compiler derives execution order, resource lifetimes, and barriers.

I hope the image makes this easier to picture. If not, don't worry, we are going to walk through it:

  • Squares are passes.
  • An arrow from Pass A to Pass B means Pass B depends on a resource produced by Pass A.
  • Passes that are not connected have been culled, so they won't be executed.

Another question would be: why all this complexity? Wouldn't simply tracking resources in submission order work? Why a graph at all? I didn't understand the answer until I remembered how Unity structures its render passes.

You see, we have two different kinds of passes:

  1. Engine passes: passes that are essential for the engine, like G-buffer, shadow, skybox, culling, etc.
  2. Game passes: custom post-processing or any special work that has to be done in a render pass.
    • By default, game passes would always be added after the engine passes.
    • But say I want a post-processing effect for the skybox only. That has to happen before the lit pass, so we also need constraints that tell the graph exactly where the pass belongs.

In my implementation I decided, as mentioned earlier, to support reordering. It felt like the right way to do it, because a real game may have tens of render passes, and the order in which they are defined may not always be the right one. Specifying the order explicitly is typically optional.

Now, let's head into the implementation. I decided to start with class-based passes because I like working that way, and later I added a compatibility layer for function-based passes.

 

Let's start with RenderPass.h, which declares the abstract class that all render passes derive from:

class RenderPass {
public:
    explicit RenderPass(const std::string &name) : _name(name) {
    }

    virtual ~RenderPass() = default;

    virtual void Setup(RenderGraphBuilder &builder) = 0;

    virtual void Execute(CommandBuffer& cmd, const RenderContext& context) = 0;

    const std::string &GetName() const { return _name; }


    virtual bool HasSideEffect() {
        return true;
    }
protected:
    std::string _name;
};

Two functions do the real work:

  • Setup: declares the textures the pass uses (their definitions and usages).
  • Execute: records the needed operations into the command buffer.

Let’s move on to the builder, which is a proxy class for the render graph. Its role is to create and import textures and, most importantly, to define their valid usage.
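
As an illustration, here is a minimal sketch of what a concrete pass could look like. Everything besides RenderPass itself is a stand-in I made up so the snippet compiles on its own: the tiny RenderGraphBuilder, CommandBuffer, RenderContext and the ScenePass name are assumptions, and CreateTexture takes only a name here instead of a full RGTextureDesc.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own.
struct CommandBuffer { std::vector<std::string> log; };
struct RenderContext {};
enum class AccessType { ShaderRead, ColorAttachmentWrite, ReadWrite };
struct RGHandle {
    uint32_t id = UINT32_MAX;
    bool IsValid() const noexcept { return id != UINT32_MAX; }
};

// Simplified builder: hands out increasing ids and records declared usages.
struct RenderGraphBuilder {
    std::vector<std::pair<RGHandle, AccessType>> dependencies;
    uint32_t nextId = 0;
    RGHandle CreateTexture(const std::string & /*name*/) { return RGHandle{nextId++}; }
    void AddDependency(RGHandle handle, AccessType access) {
        dependencies.emplace_back(handle, access);
    }
};

class RenderPass {
public:
    explicit RenderPass(std::string name) : _name(std::move(name)) {}
    virtual ~RenderPass() = default;
    virtual void Setup(RenderGraphBuilder &builder) = 0;
    virtual void Execute(CommandBuffer &cmd, const RenderContext &context) = 0;
    const std::string &GetName() const { return _name; }
    virtual bool HasSideEffect() { return true; }
protected:
    std::string _name;
};

// A pass that only declares the color target it renders into.
class ScenePass final : public RenderPass {
public:
    ScenePass() : RenderPass("Scene") {}

    void Setup(RenderGraphBuilder &builder) override {
        _sceneColor = builder.CreateTexture("SceneColor");
        builder.AddDependency(_sceneColor, AccessType::ColorAttachmentWrite);
    }

    void Execute(CommandBuffer &cmd, const RenderContext &) override {
        assert(_sceneColor.IsValid() && "Setup must run before Execute");
        cmd.log.push_back("draw scene"); // real code would record draw calls here
    }

private:
    RGHandle _sceneColor;
};
```

Note that the pass never transitions or allocates anything itself; it only says what it writes, which is the whole point of the design.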

class RenderGraphBuilder {
public:
    [[nodiscard]] RGHandle CreateTexture(const RGTextureDesc &desc) const;

    [[nodiscard]] RGHandle FindTexture(const std::string &name) const;
    
    void AddDependency(RGHandle handle, AccessType access);
    
private:
    std::vector<std::pair<RGHandle, AccessType> > _dependencies;
    
};

It's a simple class with three functions:

  • CreateTexture: takes information about a texture (size, type, format, and name) and returns a handle to that texture.
  • FindTexture: finds an image that was created by another pass so it can be used in the current pass.
  • AddDependency: takes a texture that was created or found and declares how the pass will use it, for example whether it will be read or written.

I know you may have many questions. I promise I will try to answer them all, but let's continue for now.
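
To make the three functions concrete, here is a small sketch of one pass creating a texture and another pass finding it by name. This is not my real builder: TextureRegistry and SketchBuilder are hypothetical names, textures are reduced to a name, and FindTexture simply returns an invalid handle when nothing matches.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class AccessType { ShaderRead, ColorAttachmentWrite, ReadWrite };

struct RGHandle {
    uint32_t id = UINT32_MAX;
    bool IsValid() const noexcept { return id != UINT32_MAX; }
};

// Hypothetical registry shared by every builder that belongs to one graph.
struct TextureRegistry {
    std::unordered_map<std::string, uint32_t> idsByName;

    RGHandle Create(const std::string &name) {
        // try_emplace keeps the existing id if the name was already registered.
        const auto nextId = static_cast<uint32_t>(idsByName.size());
        return RGHandle{idsByName.try_emplace(name, nextId).first->second};
    }

    RGHandle Find(const std::string &name) const {
        const auto it = idsByName.find(name);
        return it == idsByName.end() ? RGHandle{} : RGHandle{it->second};
    }
};

// One builder per pass: it records what that pass declares.
class SketchBuilder {
public:
    explicit SketchBuilder(TextureRegistry &registry) : _registry(registry) {}

    RGHandle CreateTexture(const std::string &name) { return _registry.Create(name); }
    RGHandle FindTexture(const std::string &name) const { return _registry.Find(name); }

    void AddDependency(RGHandle handle, AccessType access) {
        assert(handle.IsValid() && "only declare handles that exist");
        _dependencies.emplace_back(handle, access);
    }

    const std::vector<std::pair<RGHandle, AccessType>> &Dependencies() const {
        return _dependencies;
    }

private:
    TextureRegistry &_registry;
    std::vector<std::pair<RGHandle, AccessType>> _dependencies;
};
```

A scene pass would call CreateTexture("SceneColor") and declare a ColorAttachmentWrite, and a post-processing pass would call FindTexture("SceneColor") and declare a ShaderRead on the same handle. That shared handle is what later becomes an edge in the graph.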

Let's define AccessType. As mentioned above, AccessType describes how the texture will be used. I won't list every value here; instead I will define three that we will use later in our example:

enum class AccessType {
    ShaderRead, // Generic Read (defaults to ReadOnlyOptimal)
    ColorAttachmentWrite, // Generic Write (defaults to ColorAttachmentWrite)
    ReadWrite, // Generic ReadWrite (defaults to General)

};
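
The graph later needs to know whether an access reads, writes, or does both. A minimal sketch of that classification for just the three values above could look like this. IsWriteAccess mirrors the helper in the full listing at the end of this post; IsReadAccess is a hypothetical companion I added for symmetry.

```cpp
enum class AccessType {
    ShaderRead,
    ColorAttachmentWrite,
    ReadWrite,
};

// True when the pass modifies the texture through this access.
constexpr bool IsWriteAccess(AccessType access) {
    return access == AccessType::ColorAttachmentWrite || access == AccessType::ReadWrite;
}

// True when the pass consumes existing contents of the texture.
constexpr bool IsReadAccess(AccessType access) {
    return access == AccessType::ShaderRead || access == AccessType::ReadWrite;
}

// ReadWrite counts as both, which matters for ordering and for culling.
static_assert(IsWriteAccess(AccessType::ReadWrite) && IsReadAccess(AccessType::ReadWrite));
static_assert(!IsWriteAccess(AccessType::ShaderRead));
```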

 

Now, let's start the real work. We will start simple: a RenderGraph that can reorder and cull.

But first, something worth mentioning: in my game engine I have a class called ResourceManager, which is responsible for creating and destroying textures. I won't go into it here, but keep it in mind, because I will assume it exists: my render graph creates its own textures through it and manages their lifetimes. The graph-side entry that wraps a resource manager texture is called RGResource.

Let's start by defining some needed structs:

struct RGHandle {
    uint32_t id = UINT32_MAX;
    [[nodiscard]] bool IsValid() const noexcept { return id != UINT32_MAX; }
};

struct PassDependency {
    RGHandle resource;
    AccessType access; // How the pass uses it (Read, Write, ReadWrite)
};

struct CompiledPass {
    RenderPass *pass;
    std::vector<vk::ImageMemoryBarrier2> imageBarriers;
};

Let's go over these structs:

  • RGHandle: the handle given to passes to represent a graph resource.
  • PassDependency: holds the data passed to the AddDependency function mentioned above.
  • CompiledPass: holds the render pass together with the image barriers that must be executed before the pass itself, to make sure its images are in the correct layout.

Now we have everything we need to start implementing the algorithm. We define a function called Compile that does the following:

def Compile():
    # 1. DISCOVERY & DEPENDENCY CAPTURE
    for pass in passes:
        # Run the developer's Setup function
        # This fills the 'builder._dependencies' vector with (RGHandle, AccessType)
        pass.Setup(builder)
        
        # NOTE: Transient textures are NOT created here. 
        StoreDescriptions(builder.createdResources)
        
        # Record which pass index touches which resource handle
        RecordResourceTouches(pass.index, builder.dependencies)

    # 2. LIFETIME ANALYSIS
    for resource in allResources:
        # Determine 'firstPass' and 'lastPass' where this resource is used
        # This prevents allocating memory for textures before they are needed
        resource.firstUse = min(passIndex for (passIndex, access) in touches[resource])
        resource.lastUse = max(passIndex for (passIndex, access) in touches[resource])

    # 3. BUILD DEPENDENCY GRAPH (Adjacency List)
    for resource in allResources:
        # Sort usages by pass index (e.g., if used in Pass 0, then Pass 5, then Pass 10)
        usages = SortUsagesByPass(resource)
        
        for i in range(len(usages)):
            for j in range(i + 1, len(usages)):
                passA, accessA = usages[i]
                passB, accessB = usages[j]

                # Optimization: If both passes only READ, there is no dependency (RAR is fine)
                if not accessA.isWrite() and not accessB.isWrite():
                    continue
                
                # Otherwise, Pass B depends on Pass A
                # Add edge: A -> B and increase B's In-Degree (number of incoming dependencies)
                AddEdge(passA, passB)
                InDegree[passB] += 1

    # 4. CULLING PASSES
    # 4.1 Identify "Root" Passes that MUST run:
    for pass in passes:
        if pass.HasSideEffect(): # (e.g., Present to Screen)
            MarkAsNeeded(pass)
        elif pass.WritesToImportedResource(): # (e.g., Writing to Swapchain)
            MarkAsNeeded(pass)

    # 4.2 Propagate status backwards:
    # Use a Queue to find all passes that produce data for the "Needed" roots
    # If Pass B is needed and depends on Pass A, then Pass A is now needed.
    PropagateNeededStatus(ReverseAdjacencyGraph)

    # 5. PHYSICAL ALLOCATION & FINALIZATION
    # Now that we know which passes/resources are actually used:
    for resource in allResources:
        if resource.isTransient() and resource.isNeededByActivePass():
            # Grab a real texture from the pool or create a new one in Vulkan
            AllocatePhysicalTexture(resource.desc)

    # Generate Barriers, Topologically Sort the needed passes, and build Final Execution List
    FinalizeCompiledGraph()

This is a simple overview of the algorithm itself. If you don't get it yet, don't worry, we will cover each part on its own.

1- Setup Passes And Record Their Dependencies

We take the passes defined by the user and run their setup functions. This gives us a vector that represents all resources. For each resource, we store a list of passes that use it. Each entry in that list is a pair containing the pass index and the type of access that pass needs.

This is what the function looks like:

void RenderGraph::setupPassesAndRecordDependencies(
    std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const auto passCount = static_cast<uint32_t>(_passes.size());
    _passDeps.assign(passCount, {});
    _explicitEdges.clear();
    resourceTouchList.assign(0, {});

    for (uint32_t p = 0; p < passCount; ++p) {
        auto &pass = _passes[p];
        RenderGraphBuilder builder(this, p);
        pass->Setup(builder);

        for (auto &[handle, accessType]: builder._dependencies) {
            if (!handle.IsValid()) continue;
            uint32_t rid = handle.id;
            if (rid >= resourceTouchList.size()) {
                resourceTouchList.resize(rid + 1);
            }
            resourceTouchList[rid].emplace_back(p, accessType);
            _passDeps[p].push_back({handle, accessType});
        }
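    }

    // Resources that no pass touched still need an (empty) entry, because the
    // later steps index this list by every resource id in _resources.
    if (resourceTouchList.size() < _resources.size()) {
        resourceTouchList.resize(_resources.size());
    }
}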

 

2- Compute The Lifetime of a Resource

This step is vital for the way I’ve set things up. When you ask the graph to create a texture, creating it right away is inefficient. That pass might end up being culled, and by waiting, we can facilitate texture reuse (aliasing) throughout the frame.

So this step does the following: for each resource, we go through the touch list created in the first step and figure out the first and the last pass that use it.

void RenderGraph::computeResourceLifetimes(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const uint32_t resCount = static_cast<uint32_t>(_resources.size());
    _resourceFirstUse.assign(resCount, UINT32_MAX);
    _resourceLastUse.assign(resCount, 0);

    for (uint32_t r = 0; r < resCount; ++r) {
        if (resourceTouchList[r].empty()) continue;
        uint32_t first = UINT32_MAX, last = 0;
        for (auto &pr: resourceTouchList[r]) {
            first = std::min(first, pr.first);
            last = std::max(last, pr.first);
        }
        _resourceFirstUse[r] = first;
        _resourceLastUse[r] = last;
    }
}
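
To see why lifetimes help with reuse, here is a small standalone sketch of the same computation with one extra check. Lifetime, ComputeLifetimes and CanShareMemory are hypothetical names made up for this example, and the overlap test is only the simplest possible aliasing rule, not what my allocator actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class AccessType { ShaderRead, ColorAttachmentWrite, ReadWrite };
using TouchList = std::vector<std::pair<uint32_t, AccessType>>;

struct Lifetime {
    uint32_t firstUse = UINT32_MAX; // UINT32_MAX means no pass ever touched it
    uint32_t lastUse = 0;
    bool IsUsed() const noexcept { return firstUse != UINT32_MAX; }
};

// One Lifetime per resource, derived from the pass indices in its touch list.
std::vector<Lifetime> ComputeLifetimes(const std::vector<TouchList> &touchLists) {
    std::vector<Lifetime> lifetimes(touchLists.size());
    for (std::size_t r = 0; r < touchLists.size(); ++r) {
        for (const auto &touch : touchLists[r]) {
            lifetimes[r].firstUse = std::min(lifetimes[r].firstUse, touch.first);
            lifetimes[r].lastUse = std::max(lifetimes[r].lastUse, touch.first);
        }
    }
    return lifetimes;
}

// Two used resources can share memory only if their pass ranges never overlap.
bool CanShareMemory(const Lifetime &a, const Lifetime &b) {
    assert(a.IsUsed() && b.IsUsed());
    return a.lastUse < b.firstUse || b.lastUse < a.firstUse;
}
```

For example, a texture used by passes 0 to 2 and another used by passes 3 to 4 never live at the same time, so the second one could reuse the memory of the first.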

 

3- Building The Graph

In this step we add all the passes to the graph as nodes, and the resources they share become the edges that connect them.

Here is the code; I will explain it right after :)

void RenderGraph::buildAdjacencyGraph(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList,
    std::vector<std::unordered_set<uint32_t> > &adj, std::vector<uint32_t> &indeg) {
    const uint32_t resCount = static_cast<uint32_t>(_resources.size());
    for (uint32_t r = 0; r < resCount; ++r) {
        auto touches = resourceTouchList[r];
        if (touches.size() < 2) continue;

        // Sort by pass index to ensure deterministic processing order
        std::ranges::sort(touches, [](auto &a, auto &b) { return a.first < b.first; });

        const bool isTransient = _resources[r].transient && !_resources[r].imported;

        for (size_t i = 0; i < touches.size(); ++i) {
            for (size_t j = i + 1; j < touches.size(); ++j) {
                uint32_t passA_idx = touches[i].first;
                AccessType accessA = touches[i].second;

                uint32_t passB_idx = touches[j].first;
                AccessType accessB = touches[j].second;

                bool aWrites = IsWriteAccess(accessA);
                bool bWrites = IsWriteAccess(accessB);

                // If neither writes, no dependency (Read-Read is fine)
                if (!aWrites && !bWrites) continue;

                // Determine dependency direction
                // Default: Insertion Order (Pass A -> Pass B)
                // This covers: Write->Write, Write->Read (if A writes), Read->Write (if Imported/Persistent)
                uint32_t src = passA_idx;
                uint32_t dst = passB_idx;

                if (isTransient) {
                    // Special case for Transient Resources:
                    // They are created (Written) and then Consumed (Read).
                    // If we have a Read and a Write, and the Read comes "first" in the list,
                    // it implies the Read depends on the Write that happens later (because it's transient, it has no prior state).
                    // So we must invert to Write -> Read.
                    if (!aWrites && bWrites) {
                        // A is Reader, B is Writer.
                        // Dependency: B -> A.
                        src = passB_idx;
                        dst = passA_idx;
                    }
                }

                // Add edge src -> dst
                if (adj[src].insert(dst).second) {
                    indeg[dst]++;
                }
            }
        }
    }
}

transient means the image was created inside the graph, while imported means the image was brought into the graph from outside, like swapchain images for example.

We loop over the resources, and for each resource we walk its touch list (sorted by pass index, so pass A always comes before pass B) and build the graph according to these rules:

  • Write → Read: If pass A writes to a texture and pass B reads it, an edge A → B is created.
  • Write → Write: If passes A and B both write to the same texture, we keep the order in which they were added (A → B) to avoid a race condition.
  • Read → Write: For imported or persistent textures, if pass A reads and pass B writes, we also keep the insertion order (A → B) so the write cannot overtake the read.
  • Read → Read: If both passes only read, we skip this iteration; no edge is added because there is no dependency between them.

There is a special case, which handles passes that were added out of order. For a transient texture, if pass A was added before pass B but pass A reads the texture while pass B writes it, the insertion order is wrong: a transient texture has no content before it is written. In that case the algorithm flips the edge to B → A.
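
The rules for a single pair of touches can be isolated into one small function. This is only a sketch that restates the inner loop body of buildAdjacencyGraph; Edge and DeriveEdge are hypothetical names, and it assumes the touches are already sorted so that passA comes before passB.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct Edge {
    uint32_t src; // must run first
    uint32_t dst; // runs after src
};

// Returns the dependency edge between two touches of one resource,
// or std::nullopt when the two passes are independent.
std::optional<Edge> DeriveEdge(uint32_t passA, bool aWrites,
                               uint32_t passB, bool bWrites,
                               bool isTransient) {
    assert(passA < passB && "touches are expected to be sorted by pass index");

    // Read -> Read: nothing to order.
    if (!aWrites && !bWrites) return std::nullopt;

    // Transient texture read "before" its write: the reader was added out of
    // order, so the writer has to run first.
    if (isTransient && !aWrites && bWrites) return Edge{passB, passA};

    // Write -> Read, Write -> Write, and Read -> Write on imported or
    // persistent textures all keep insertion order.
    return Edge{passA, passB};
}
```

So a transient texture read by pass 0 and written by pass 1 yields the edge 1 → 0, while the same pattern on a swapchain image keeps 0 → 1.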

 

4- Pass Culling

As mentioned earlier, culling matters because it keeps the frame code readable: with small tweaks you decide what the frame looks like. But how do we actually cull a pass? Let's have a look at the code first:

std::vector<bool> RenderGraph::cullPasses(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const auto passCount = static_cast<uint32_t>(_passes.size());
    std::vector<bool> needed(passCount, false);
    std::vector<std::unordered_set<uint32_t> > dependencies(passCount); // consumer -> producers
    std::queue<uint32_t> q;

    // 1. Build Data Dependencies from Resources
    const auto resCount = static_cast<uint32_t>(_resources.size());

    for (uint32_t r = 0; r < resCount; ++r) {
        auto touches = resourceTouchList[r];
        if (touches.size() < 2) continue;

        std::ranges::sort(touches, [](auto &a, auto &b) { return a.first < b.first; });

        int32_t lastWriter = -1;

        for (const auto &[passIdx, access]: touches) {
            const bool isWrite = IsWriteAccess(access);
            const bool isRead = (access == AccessType::Read ||
                                 access == AccessType::ShaderRead ||
                                 access == AccessType::ReadWrite ||
                                 access == AccessType::DepthAttachmentRead ||
                                 access == AccessType::DepthStencilAttachmentRead ||
                                 access == AccessType::TransferRead ||
                                 access == AccessType::ComputeShaderRead ||
                                 access == AccessType::FragmentShaderReadSampledImageOrUniformTexelBuffer);

            // If it reads, it depends on the last writer
            if (isRead && lastWriter != -1 && lastWriter != static_cast<int32_t>(passIdx)) {
                dependencies[passIdx].insert(lastWriter);
            }

            // If it writes, it becomes the new writer
            if (isWrite) {
                lastWriter = static_cast<int32_t>(passIdx);
            }
        }
    }

    // 2. Identify Root Needed Passes
    for (uint32_t p = 0; p < passCount; ++p) {
        bool isRoot = false;

        // Condition A: Side Effects
        if (_passes[p]->HasSideEffect()) {
            isRoot = true;
        }
        // Condition B: Writes to Imported or Persistent Resource
        else {
            for (const auto &dep: _passDeps[p]) {
                RGResource *res = GetResource(dep.resource);
                if (!res) continue;

                // If resource is NOT transient (i.e. Imported or Persistent)
                if (!res->transient || res->imported) {
                    if (IsWriteAccess(dep.access)) {
                        isRoot = true;
                        break;
                    }
                }
            }
        }

        if (isRoot) {
            if (!needed[p]) {
                needed[p] = true;
                q.push(p);
            }
        }
    }

    while (!q.empty()) {
        uint32_t u = q.front();
        q.pop();

        for (uint32_t v: dependencies[u]) {
            if (!needed[v]) {
                needed[v] = true;
                q.push(v);
            }
        }
    }

    return needed;
}

The algorithm works by building a Consumer → Producer map. For each resource, it tracks which pass wrote to it last (the producer). Any pass that then reads the resource is a consumer that depends on that last writer.

Some passes must be treated differently; we call them roots. There are two cases where a pass is treated as a root:

  1. It has a side effect. HasSideEffect is a virtual function defined in RenderPass; if it returns true, the pass is always kept. Note that the base class returns true by default, so a pass has to override it to become cullable.
  2. It writes to an imported or persistent image. For example, a pass that renders a second camera into an image rather than to the screen. Its result is consumed outside the graph, so the pass must be treated as a root.

Once we have identified all our roots, the algorithm works backward using BFS to mark passes this way: if pass A is needed and pass A depends on pass B, then pass B is also needed.

The function returns a vector of bools where each index tells whether the corresponding pass is needed.
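
Stripped of the resource bookkeeping, the backward propagation is just a BFS over the consumer → producers map. Here is a minimal standalone sketch of that part; MarkNeededPasses and its parameters are hypothetical names, and the roots are passed in directly instead of being detected from side effects or imported writes.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// producersOf[c] lists the passes that produce data read by pass c.
// roots are the passes that must run no matter what.
std::vector<bool> MarkNeededPasses(const std::vector<std::vector<uint32_t>> &producersOf,
                                   const std::vector<uint32_t> &roots) {
    std::vector<bool> needed(producersOf.size(), false);
    std::queue<uint32_t> pending;

    for (uint32_t root : roots) {
        assert(root < producersOf.size());
        if (!needed[root]) {
            needed[root] = true;
            pending.push(root);
        }
    }

    // Walk backward: whatever a needed pass reads from is needed too.
    while (!pending.empty()) {
        const uint32_t consumer = pending.front();
        pending.pop();
        for (uint32_t producer : producersOf[consumer]) {
            if (!needed[producer]) {
                needed[producer] = true;
                pending.push(producer);
            }
        }
    }
    return needed;
}
```

Take four passes: Shadow (0), Scene (1, reads the shadow map), Debug (2, nobody reads its output) and UI (3, reads the scene color and is the only root). The result marks 0, 1 and 3 as needed and leaves Debug culled.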

 

5- Topologically Sort The Remaining Passes (Kahn’s Algorithm)

In this step, we determine a valid execution sequence for all needed passes. This ensures a pass is executed only after all of its dependencies have been executed.

The code for it is kinda simple:

void RenderGraph::buildTopologicalOrder(const std::vector<bool> &needed,
                                        const std::vector<std::unordered_set<uint32_t> > &adj,
                                        std::vector<uint32_t> &indeg) {
    const uint32_t passCount = static_cast<uint32_t>(_passes.size());
    std::queue<uint32_t> q;
    for (uint32_t p = 0; p < passCount; ++p) {
        if (needed[p] && indeg[p] == 0) {
            q.push(p);
        }
    }

    _passOrder.clear();
    while (!q.empty()) {
        uint32_t u = q.front();
        q.pop();
        _passOrder.push_back(u);
        for (uint32_t v: adj[u]) {
            if (needed[v]) {
                indeg[v]--;
                if (indeg[v] == 0) {
                    q.push(v);
                }
            }
        }
    }

    size_t neededCount = std::ranges::count(needed, true);
    ASSERT_EX(_passOrder.size() == neededCount && "Cycle detected in passes");
}

We start from the needed passes whose in-degree is zero, meaning nothing has to run before them. For each such pass, we basically “peel” it off the graph and append it to the final execution order. As we do that, we notify every pass that was waiting on it by decrementing its dependency count. When a pass’s count drops to zero, all of its inputs are ready, so it gets pushed onto the queue. If some needed passes never reach zero, the graph contains a cycle, which is what the assertion at the end checks.
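
One thing to watch for is edges that come from culled passes: if they are still counted in the in-degree, a needed pass can wait forever on a pass that will never run. The sketch below is a standalone variation of Kahn's algorithm that recounts in-degrees over needed passes only. SortNeededPasses is a hypothetical name, and it reports a cycle by returning std::nullopt instead of asserting.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <queue>
#include <vector>

// adj[u] lists the passes that must run after pass u.
// Returns the execution order of the needed passes, or nullopt on a cycle.
std::optional<std::vector<uint32_t>> SortNeededPasses(
    const std::vector<std::vector<uint32_t>> &adj, const std::vector<bool> &needed) {
    assert(adj.size() == needed.size());
    const auto passCount = static_cast<uint32_t>(adj.size());

    // Count only edges whose both ends survived culling.
    std::vector<uint32_t> indeg(passCount, 0);
    uint32_t neededCount = 0;
    for (uint32_t u = 0; u < passCount; ++u) {
        if (!needed[u]) continue;
        ++neededCount;
        for (uint32_t v : adj[u]) {
            if (needed[v]) ++indeg[v];
        }
    }

    std::queue<uint32_t> ready;
    for (uint32_t p = 0; p < passCount; ++p) {
        if (needed[p] && indeg[p] == 0) ready.push(p);
    }

    std::vector<uint32_t> order;
    while (!ready.empty()) {
        const uint32_t u = ready.front();
        ready.pop();
        order.push_back(u);
        for (uint32_t v : adj[u]) {
            if (needed[v] && --indeg[v] == 0) ready.push(v);
        }
    }

    if (order.size() != neededCount) return std::nullopt; // cycle
    return order;
}
```

With edges 0 → 1, 1 → 3 and 2 → 3, and pass 2 culled, this returns the order 0, 1, 3. The edge from the culled pass 2 is simply ignored.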

 

These are the most important and the hardest steps in a render graph. The remaining steps are allocating textures and building barriers. This is my full code:

#include <algorithm>
#include <unordered_set>
#include <rendering/RenderGraph.h>
#include <rendering/RenderContext.h>
#include <rendering/Renderer.h>
#include <queue>
#include <future>
#include <map>
using namespace engine::rendering::graph;


struct ResourceAccessInfo {
    vk::PipelineStageFlags2 stageMask = vk::PipelineStageFlagBits2::eTopOfPipe;
    vk::AccessFlags2 accessMask = {};
    vk::ImageLayout layout = vk::ImageLayout::eUndefined;
};

inline bool IsWriteAccess(AccessType access) {
    return access == AccessType::Write || access == AccessType::ReadWrite ||
           access == AccessType::ColorAttachmentWrite ||
           access == AccessType::DepthAttachmentWrite || access == AccessType::DepthStencilAttachmentWrite ||
           access == AccessType::TransferWrite || access == AccessType::ComputeShaderWrite;
}

ResourceAccessInfo GetResourceAccessInfo(AccessType access, const RGTextureDesc &desc) {
    ResourceAccessInfo info;

    switch (access) {
        case AccessType::Read: // Default to Shader Read
        case AccessType::ShaderRead:
            info.accessMask = vk::AccessFlagBits2::eShaderRead;
            info.stageMask = vk::PipelineStageFlagBits2::eFragmentShader;
            info.layout = vk::ImageLayout::eShaderReadOnlyOptimal;
            break;
        case AccessType::Write: // Default to Color Attachment Write
        case AccessType::ColorAttachmentWrite:
            info.accessMask = vk::AccessFlagBits2::eColorAttachmentWrite;
            info.stageMask = vk::PipelineStageFlagBits2::eColorAttachmentOutput;
            info.layout = vk::ImageLayout::eColorAttachmentOptimal;
            break;
        case AccessType::ReadWrite: // Default to General
            info.accessMask = vk::AccessFlagBits2::eShaderRead | vk::AccessFlagBits2::eShaderWrite;
            info.stageMask = vk::PipelineStageFlagBits2::eComputeShader |
                             vk::PipelineStageFlagBits2::eFragmentShader;
            info.layout = vk::ImageLayout::eGeneral;
            break;
        case AccessType::DepthAttachmentRead:
            info.accessMask = vk::AccessFlagBits2::eDepthStencilAttachmentRead;
            info.stageMask = vk::PipelineStageFlagBits2::eEarlyFragmentTests |
                             vk::PipelineStageFlagBits2::eLateFragmentTests;
            info.layout = vk::ImageLayout::eDepthStencilReadOnlyOptimal;
            break;
        case AccessType::DepthAttachmentWrite:
        case AccessType::DepthStencilAttachmentWrite:
            info.accessMask = vk::AccessFlagBits2::eDepthStencilAttachmentWrite;
            info.stageMask = vk::PipelineStageFlagBits2::eEarlyFragmentTests |
                             vk::PipelineStageFlagBits2::eLateFragmentTests;
            info.layout = vk::ImageLayout::eDepthStencilAttachmentOptimal;
            break;
        case AccessType::DepthStencilAttachmentRead:
            info.accessMask = vk::AccessFlagBits2::eDepthStencilAttachmentRead;
            info.stageMask = vk::PipelineStageFlagBits2::eEarlyFragmentTests |
                             vk::PipelineStageFlagBits2::eLateFragmentTests;
            info.layout = vk::ImageLayout::eDepthStencilReadOnlyOptimal;
            break;
        case AccessType::TransferRead:
            info.accessMask = vk::AccessFlagBits2::eTransferRead;
            info.stageMask = vk::PipelineStageFlagBits2::eTransfer;
            info.layout = vk::ImageLayout::eTransferSrcOptimal;
            break;
        case AccessType::TransferWrite:
            info.accessMask = vk::AccessFlagBits2::eTransferWrite;
            info.stageMask = vk::PipelineStageFlagBits2::eTransfer;
            info.layout = vk::ImageLayout::eTransferDstOptimal;
            break;
        case AccessType::ComputeShaderRead:
            info.accessMask = vk::AccessFlagBits2::eShaderRead;
            info.stageMask = vk::PipelineStageFlagBits2::eComputeShader;
            info.layout = vk::ImageLayout::eShaderReadOnlyOptimal;
            break;
        case AccessType::ComputeShaderWrite:
            info.accessMask = vk::AccessFlagBits2::eShaderWrite;
            info.stageMask = vk::PipelineStageFlagBits2::eComputeShader;
            info.layout = vk::ImageLayout::eGeneral;
            break;
        case AccessType::FragmentShaderReadSampledImageOrUniformTexelBuffer:
            info.accessMask = vk::AccessFlagBits2::eShaderSampledRead;
            info.stageMask = vk::PipelineStageFlagBits2::eFragmentShader;
            info.layout = vk::ImageLayout::eShaderReadOnlyOptimal;
            break;
        case AccessType::Present:
            info.stageMask = vk::PipelineStageFlagBits2::eBottomOfPipe;
            info.accessMask = {};
            info.layout = vk::ImageLayout::ePresentSrcKHR;
            break;
    }

    if (info.layout == vk::ImageLayout::eUndefined) {
        info.layout = vk::ImageLayout::eGeneral;
    }

    return info;
}

vk::ImageMemoryBarrier2 MakeBarrierForResourceTransition(const RGResource &res, AccessType prevAccess,
                                                         AccessType curAccess,
                                                         engine::rendering::resources::ResourceManager *
                                                         resourceManager) {
    ResourceAccessInfo prevInfo = GetResourceAccessInfo(prevAccess, res.desc);
    ResourceAccessInfo curInfo = GetResourceAccessInfo(curAccess, res.desc);

    vk::ImageMemoryBarrier2 b{};
    b.srcStageMask = prevInfo.stageMask;
    b.srcAccessMask = prevInfo.accessMask;
    b.dstStageMask = curInfo.stageMask;
    b.dstAccessMask = curInfo.accessMask;
    b.oldLayout = prevInfo.layout;
    b.newLayout = curInfo.layout;
    b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

    b.image = nullptr;
    if (res.imported) {
        if (res.physical_texture) {
            b.image = res.physical_texture->image;
        }
    } else if (res.resourceHandle.IsValid()) {
        engine::rendering::resources::Texture *texture = resourceManager->GetTexture(res.resourceHandle);
        ASSERT_EX(texture && "Texture is invalid");
        b.image = texture->image;
    }

    b.subresourceRange.aspectMask = res.desc.aspect;
    b.subresourceRange.baseArrayLayer = 0;
    b.subresourceRange.layerCount = res.desc.arrayLayers;
    b.subresourceRange.baseMipLevel = 0;
    b.subresourceRange.levelCount = 1;

    return b;
}


RGHandle RenderGraph::RegisterPersistentTexture(
    resources::Texture *texture) {
    // Create a description from the existing texture's properties
    RGTextureDesc desc{};
    desc.name = texture->name;
    desc.extent = vk::Extent3D(texture->imageExtent.width, texture->imageExtent.height, texture->imageExtent.depth);
    desc.format = static_cast<vk::Format>(texture->imageFormat);
    desc.usage = texture->usage;
    desc.aspect = vk::ImageAspectFlagBits::eColor;

    // Check if a placeholder already exists for this name
    for (auto &res: _resources) {
        if (res.name == texture->name) {
            // Fill in the placeholder
            res.desc = desc;
            res.imported = true;
            res.transient = false;
            res.physical_texture = texture;
            res.oldLayout = vk::ImageLayout::eUndefined;
            return res.handle;
        }
    }


    RGResource resource{};
    resource.handle.id = static_cast<uint32_t>(_resources.size());
    resource.name = texture->name;
    resource.imported = true;
    resource.transient = false;
    resource.physical_texture = texture;
    resource.desc = desc;
    resource.oldLayout = vk::ImageLayout::eUndefined;

    _resources.push_back(resource);
    return resource.handle;
}

void RenderGraph::AddRunAfterDependency(uint32_t currentPassIndex, const std::string &targetPassName) {
    // Current runs AFTER Target -> Edge: Target -> Current
    // otherName -> passIdx
    _explicitEdges.push_back({currentPassIndex, targetPassName, false});
}

void RenderGraph::AddRunBeforeDependency(uint32_t currentPassIndex, const std::string &targetPassName) {
    // Current runs BEFORE Target -> Edge: Current -> Target
    // passIdx -> otherName
    _explicitEdges.push_back({currentPassIndex, targetPassName, true});
}

uint32_t RenderGraph::getPassIndex(const std::string &name) const {
    for (uint32_t i = 0; i < _passes.size(); ++i) {
        if (_passes[i]->GetName() == name) return i;
    }
    return UINT32_MAX;
}

void RenderGraph::UpdateImportedResource(RGHandle handle, resources::Texture *new_texture) {
    RGResource *res = GetResource(handle);
    if (res && res->imported) {
        res->physical_texture = new_texture;
    }
}


void RenderGraph::setupPassesAndRecordDependencies(
    std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const auto passCount = static_cast<uint32_t>(_passes.size());
    _passDeps.assign(passCount, {});
    _explicitEdges.clear();
    resourceTouchList.assign(0, {}); // Clear resourceTouchList as well

    for (uint32_t p = 0; p < passCount; ++p) {
        auto &pass = _passes[p];
        RenderGraphBuilder builder(this, p);
        pass->Setup(builder);

        for (auto &[handle, accessType]: builder._dependencies) {
            if (!handle.IsValid()) continue;
            uint32_t rid = handle.id;
            if (rid >= resourceTouchList.size()) {
                resourceTouchList.resize(rid + 1);
            }
            resourceTouchList[rid].emplace_back(p, accessType);
            _passDeps[p].push_back({handle, accessType});
        }
    }
}

void RenderGraph::computeResourceLifetimes(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const uint32_t resCount = static_cast<uint32_t>(_resources.size());
    _resourceFirstUse.assign(resCount, UINT32_MAX);
    _resourceLastUse.assign(resCount, 0);

    for (uint32_t r = 0; r < resCount; ++r) {
        if (resourceTouchList[r].empty()) continue;
        uint32_t first = UINT32_MAX, last = 0;
        for (auto &pr: resourceTouchList[r]) {
            first = std::min(first, pr.first);
            last = std::max(last, pr.first);
        }
        _resourceFirstUse[r] = first;
        _resourceLastUse[r] = last;
    }
}

void RenderGraph::buildAdjacencyGraph(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList,
    std::vector<std::unordered_set<uint32_t> > &adj, std::vector<uint32_t> &indeg) {
    const uint32_t resCount = static_cast<uint32_t>(_resources.size());
    for (uint32_t r = 0; r < resCount; ++r) {
        auto touches = resourceTouchList[r];
        if (touches.size() < 2) continue;

        // Sort by pass index to ensure deterministic processing order
        std::ranges::sort(touches, [](auto &a, auto &b) { return a.first < b.first; });

        const bool isTransient = _resources[r].transient && !_resources[r].imported;

        for (size_t i = 0; i < touches.size(); ++i) {
            for (size_t j = i + 1; j < touches.size(); ++j) {
                uint32_t passA_idx = touches[i].first;
                AccessType accessA = touches[i].second;

                uint32_t passB_idx = touches[j].first;
                AccessType accessB = touches[j].second;

                bool aWrites = IsWriteAccess(accessA);
                bool bWrites = IsWriteAccess(accessB);

                // If neither writes, no dependency (Read-Read is fine)
                if (!aWrites && !bWrites) continue;

                // Determine dependency direction
                // Default: Insertion Order (Pass A -> Pass B)
                // This covers: Write->Write, Write->Read (if A writes), Read->Write (if Imported/Persistent)
                uint32_t src = passA_idx;
                uint32_t dst = passB_idx;

                if (isTransient) {
                    // Special case for Transient Resources:
                    // They are created (Written) and then Consumed (Read).
                    // If we have a Read and a Write, and the Read comes "first" in the list,
                    // it implies the Read depends on the Write that happens later (because it's transient, it has no prior state).
                    // So we must invert to Write -> Read.
                    if (!aWrites && bWrites) {
                        // A is Reader, B is Writer.
                        // Dependency: B -> A.
                        src = passB_idx;
                        dst = passA_idx;
                    }
                }

                // Add edge src -> dst
                if (adj[src].insert(dst).second) {
                    indeg[dst]++;
                }
            }
        }
    }

    // Apply explicit constraints
    for (const auto &edge: _explicitEdges) {
        uint32_t targetIdx = getPassIndex(edge.otherPassName);
        if (targetIdx == UINT32_MAX) {
            // Target pass not found (e.g., culled or typo)
            continue;
        }

        uint32_t u, v;
        if (edge.runBefore) {
            // Current(passIdx) RUNS BEFORE Target -> Edge: Current -> Target
            u = edge.passIdx;
            v = targetIdx;
        } else {
            // Current(passIdx) RUNS AFTER Target -> Edge: Target -> Current
            u = targetIdx;
            v = edge.passIdx;
        }

        if (adj[u].insert(v).second) {
            indeg[v]++;
        }
    }
}


std::vector<bool> RenderGraph::cullPasses(
    const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList) {
    const auto passCount = static_cast<uint32_t>(_passes.size());
    std::vector<bool> needed(passCount, false);
    std::vector<std::unordered_set<uint32_t> > dependencies(passCount); // consumer -> producers
    std::queue<uint32_t> q;

    // 1. Build Data Dependencies from Resources
    const auto resCount = static_cast<uint32_t>(_resources.size());

    for (uint32_t r = 0; r < resCount; ++r) {
        auto touches = resourceTouchList[r];
        if (touches.size() < 2) continue;

        std::ranges::sort(touches, [](auto &a, auto &b) { return a.first < b.first; });

        int32_t lastWriter = -1;

        for (const auto &[passIdx, access]: touches) {
            const bool isWrite = IsWriteAccess(access);
            const bool isRead = (access == AccessType::Read ||
                                 access == AccessType::ShaderRead ||
                                 access == AccessType::ReadWrite ||
                                 access == AccessType::DepthAttachmentRead ||
                                 access == AccessType::DepthStencilAttachmentRead ||
                                 access == AccessType::TransferRead ||
                                 access == AccessType::ComputeShaderRead ||
                                 access == AccessType::FragmentShaderReadSampledImageOrUniformTexelBuffer);

            // If it reads, it depends on the last writer
            if (isRead && lastWriter != -1 && lastWriter != static_cast<int32_t>(passIdx)) {
                dependencies[passIdx].insert(lastWriter);
            }

            // If it writes, it becomes the new writer
            if (isWrite) {
                lastWriter = static_cast<int32_t>(passIdx);
            }
        }
    }

    // 2. Add Explicit Dependencies
    for (const auto &edge: _explicitEdges) {
        uint32_t targetIdx = getPassIndex(edge.otherPassName);
        if (targetIdx == UINT32_MAX) continue;

        uint32_t u, v; // u -> v (u executes before v)
        if (edge.runBefore) {
            u = edge.passIdx;
            v = targetIdx;
        } else {
            u = targetIdx;
            v = edge.passIdx;
        }
        // v depends on u
        dependencies[v].insert(u);
    }

    // 3. Identify Root Needed Passes
    for (uint32_t p = 0; p < passCount; ++p) {
        bool isRoot = false;

        // Condition A: Side Effects
        if (_passes[p]->HasSideEffect()) {
            isRoot = true;
        }
        // Condition B: Writes to Imported or Persistent Resource
        else {
            for (const auto &dep: _passDeps[p]) {
                RGResource *res = GetResource(dep.resource);
                if (!res) continue;

                // If resource is NOT transient (i.e. Imported or Persistent)
                if (!res->transient || res->imported) {
                    if (IsWriteAccess(dep.access)) {
                        isRoot = true;
                        break;
                    }
                }
            }
        }

        if (isRoot) {
            if (!needed[p]) {
                needed[p] = true;
                q.push(p);
            }
        }
    }

    // 4. Propagate "needed" backwards: every producer of a needed pass is needed too
    while (!q.empty()) {
        uint32_t u = q.front();
        q.pop();

        for (uint32_t v: dependencies[u]) {
            if (!needed[v]) {
                needed[v] = true;
                q.push(v);
            }
        }
    }

    return needed;
}

void RenderGraph::buildTopologicalOrder(const std::vector<bool> &needed,
                                        const std::vector<std::unordered_set<uint32_t> > &adj,
                                        std::vector<uint32_t> &indeg) {
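    const uint32_t passCount = static_cast<uint32_t>(_passes.size());

    // Recount in-degrees using only edges between needed passes. An edge that starts
    // at a culled pass would otherwise never be released, the sort would stall, and
    // the cycle assert below would fire even though no cycle exists.
    indeg.assign(passCount, 0);
    for (uint32_t u = 0; u < passCount; ++u) {
        if (!needed[u]) continue;
        for (uint32_t v: adj[u]) {
            if (needed[v]) indeg[v]++;
        }
    }

    std::queue<uint32_t> q;
    for (uint32_t p = 0; p < passCount; ++p) {
        if (needed[p] && indeg[p] == 0) {
            q.push(p);
        }
    }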

    _passOrder.clear();
    while (!q.empty()) {
        uint32_t u = q.front();
        q.pop();
        _passOrder.push_back(u);
        for (uint32_t v: adj[u]) {
            if (needed[v]) {
                indeg[v]--;
                if (indeg[v] == 0) {
                    q.push(v);
                }
            }
        }
    }

    size_t neededCount = std::ranges::count(needed, true);
    ASSERT_EX(_passOrder.size() == neededCount && "Cycle detected in passes");
}

void RenderGraph::buildBarriers(const std::vector<std::vector<std::pair<uint32_t, AccessType> > > &resourceTouchList,
                                const std::vector<bool> &needed) {
    _barriersPerPass.assign(_passOrder.size(), {});
    const auto resCount = static_cast<uint32_t>(_resources.size());

    // Build a map from Pass Index -> Execution Order Index for sorting execution-dependent barriers
    std::unordered_map<uint32_t, uint32_t> passIndexToOrderIndex;
    passIndexToOrderIndex.reserve(_passOrder.size());
    for (uint32_t i = 0; i < _passOrder.size(); ++i) {
        passIndexToOrderIndex[_passOrder[i]] = i;
    }

    // Group resources by their PHYSICAL handle index
    // Key: Physical Resource Index, Value: List of Logical Resource Indices
    std::map<uint32_t, std::vector<uint32_t> > physicalGroups;


    for (uint32_t r = 0; r < resCount; ++r) {
        if (!_resources[r].imported && _resources[r].resourceHandle.IsValid()) {
            physicalGroups[_resources[r].resourceHandle.index].push_back(r);
        }
    }

    // Helper to process a timeline of touches
    auto ProcessTimeline = [&](const std::vector<uint32_t> &logicalResources, bool isImported) {
        // Collect ALL touches from ALL logical resources in this group
        struct Touch {
            uint32_t passIdx;
            AccessType access;
            uint32_t logicalResIdx;
        };
        std::vector<Touch> combinedTouches;
        combinedTouches.reserve(logicalResources.size());
        for (uint32_t currResIdx: logicalResources) {
            const auto &touches = resourceTouchList[currResIdx];
            for (const auto &t: touches) {
                if (needed[t.first]) {
                    combinedTouches.push_back({t.first, t.second, currResIdx});
                }
            }
        }

        if (combinedTouches.empty()) return;

        // Sort by EXECUTION ORDER
        std::ranges::sort(combinedTouches, [&](const auto &a, const auto &b) {
            return passIndexToOrderIndex.at(a.passIdx) < passIndexToOrderIndex.at(b.passIdx);
        });


        // Initial transition for the first use
        {
            const auto &firstTouch = combinedTouches[0];
            const auto &firstRes = _resources[firstTouch.logicalResIdx];
            ResourceAccessInfo curInfo = GetResourceAccessInfo(firstTouch.access, firstRes.desc);

            vk::ImageMemoryBarrier2 b{};
            b.srcStageMask = vk::PipelineStageFlagBits2::eTopOfPipe;
            b.srcAccessMask = {};
            b.dstStageMask = curInfo.stageMask;
            b.dstAccessMask = curInfo.accessMask;
            b.oldLayout = firstRes.oldLayout;
            b.newLayout = curInfo.layout;
            b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
            b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

            b.image = nullptr;
            if (firstRes.imported && firstRes.physical_texture) {
                b.image = firstRes.physical_texture->image;
                _resources[firstTouch.logicalResIdx].oldLayout = curInfo.layout;
            } else if (firstRes.resourceHandle.IsValid()) {
                resources::Texture *texture = _resourceManager->GetTexture(firstRes.resourceHandle);
                if (texture) {
                    b.image = texture->image;
                    // Transient resources don't really need oldLayout persistence across frames usually,
                    // unless we want to optimize 'start of frame' barriers.
                    // But here, we just set the init layout.
                }
            }

            if (b.image) {
                b.subresourceRange.aspectMask = firstRes.desc.aspect;
                b.subresourceRange.baseArrayLayer = 0;
                b.subresourceRange.layerCount = firstRes.desc.arrayLayers;
                b.subresourceRange.baseMipLevel = 0;
                b.subresourceRange.levelCount = 1;

                uint32_t orderIdx = passIndexToOrderIndex.at(firstTouch.passIdx);
                _barriersPerPass[orderIdx].push_back(b);
            }
        }

        if (combinedTouches.size() < 2) return;

        for (size_t i = 1; i < combinedTouches.size(); ++i) {
            const auto &prevTouch = combinedTouches[i - 1];
            const auto &curTouch = combinedTouches[i];

            const auto &currRes = _resources[curTouch.logicalResIdx];

            auto barrier = MakeBarrierForResourceTransition(currRes, prevTouch.access, curTouch.access,
                                                            _resourceManager);

            if (currRes.imported) {
                ResourceAccessInfo curInfo = GetResourceAccessInfo(curTouch.access, currRes.desc);
                _resources[curTouch.logicalResIdx].oldLayout = curInfo.layout;
            }

            uint32_t orderIdx = passIndexToOrderIndex.at(curTouch.passIdx);
            _barriersPerPass[orderIdx].push_back(barrier);
        }
    };

    // 1. Process Imported Resources (Unique)
    for (uint32_t r = 0; r < resCount; ++r) {
        if (_resources[r].imported) {
            std::vector<uint32_t> group = {r};
            ProcessTimeline(group, true);
        }
    }

    // 2. Process Transient Physical Groups
    for (auto &group: physicalGroups | std::views::values) {
        if (!group.empty()) {
            ProcessTimeline(group, false);
        }
    }
}

void RenderGraph::buildCompiledPasses() {
    _compiledPasses.clear();
    _compiledPasses.reserve(_passOrder.size());

    for (size_t i = 0; i < _passOrder.size(); ++i) {
        uint32_t passIdx = _passOrder[i];
        CompiledPass cp;
        cp.pass = _passes[passIdx].get();
        cp.imageBarriers = _barriersPerPass[i];
        _compiledPasses.push_back(std::move(cp));
    }
}

void RenderGraph::finalizePasses() {
    // Keep only the passes that survived culling, in execution order.
    // Each index appears exactly once in _passOrder, so moving directly is safe.
    std::vector<std::unique_ptr<RenderPass> > finalPasses;
    finalPasses.reserve(_passOrder.size());
    for (const auto idx: _passOrder) {
        finalPasses.push_back(std::move(_passes[idx]));
    }
    _passes.swap(finalPasses);
}


void RenderGraph::Compile(uint32_t frameIndex) {
    const uint32_t passCount = static_cast<uint32_t>(_passes.size());

    // 1. Setup passes and record resource touches
    // Note: resCount is computed AFTER setup, because Setup may create new resources.
    std::vector<std::vector<std::pair<uint32_t, AccessType> > > resourceTouchList;
    setupPassesAndRecordDependencies(resourceTouchList);

    // 1.5 Validate: no placeholder resources remain unfilled
    for (const auto &res: _resources) {
        if (!res.imported && res.desc.format == vk::Format::eUndefined) {
            ASSERT_EX(false && "RenderGraph: Resource declared but never created");
        }
    }

    const auto resCount = static_cast<uint32_t>(_resources.size());

    // Ensure resourceTouchList covers all resources (some may have been created but not yet touched)
    resourceTouchList.resize(resCount);

    // 2. Compute resource lifetimes (first and last use)
    computeResourceLifetimes(resourceTouchList);

    // 3. Build dependency graph (adjacency list and in-degrees)
    std::vector<std::unordered_set<uint32_t> > adj(passCount);
    std::vector<uint32_t> indeg(passCount, 0);
    buildAdjacencyGraph(resourceTouchList, adj, indeg);

    // 4. Cull unneeded passes
    std::vector<bool> needed = cullPasses(resourceTouchList);

    // 5. Topologically sort the remaining passes
    buildTopologicalOrder(needed, adj, indeg);

    // 5.5 Allocate physical resources for transient resources with intra-frame aliasing
    {
        // Build a map from pass index -> execution order index
        std::unordered_map<uint32_t, uint32_t> passToExecOrder;
        passToExecOrder.reserve(_passOrder.size());
        for (uint32_t i = 0; i < _passOrder.size(); ++i) {
            passToExecOrder[_passOrder[i]] = i;
        }

        // Recompute resource lifetimes in execution order (not raw pass index)
        const uint32_t totalResCount = static_cast<uint32_t>(_resources.size());
        std::vector<uint32_t> execFirstUse(totalResCount, UINT32_MAX);
        std::vector<uint32_t> execLastUse(totalResCount, 0);
        std::vector<bool> resIsNeeded(totalResCount, false);

        for (uint32_t r = 0; r < totalResCount; ++r) {
            if (!_resources[r].transient || _resources[r].imported) continue;
            if (r >= resourceTouchList.size()) continue;

            for (auto &[passIdx, accessType]: resourceTouchList[r]) {
                if (!needed[passIdx]) continue;
                auto it = passToExecOrder.find(passIdx);
                if (it == passToExecOrder.end()) continue;

                uint32_t execIdx = it->second;
                execFirstUse[r] = std::min(execFirstUse[r], execIdx);
                execLastUse[r] = std::max(execLastUse[r], execIdx);
                resIsNeeded[r] = true;
            }
        }

        // Collect needed transient resources and sort by first use in execution order
        struct ResAlloc {
            uint32_t resIdx;
            uint32_t firstUse; // execution order
            uint32_t lastUse; // execution order
        };
        std::vector<ResAlloc> toAllocate;
        for (uint32_t r = 0; r < totalResCount; ++r) {
            if (!_resources[r].transient || _resources[r].imported) continue;
            if (!resIsNeeded[r]) continue;
            toAllocate.push_back({r, execFirstUse[r], execLastUse[r]});
        }

        std::ranges::sort(toAllocate, [](const ResAlloc &a, const ResAlloc &b) {
            return a.firstUse < b.firstUse;
        });

        // Active textures sorted by end time (earliest end first)
        // Each entry: {endTime, desc, handle}
        struct ActiveEntry {
            uint32_t endTime;
            RGTextureDesc desc;
            resources::RGTextureHandle handle;
        };
        auto cmp = [](const ActiveEntry &a, const ActiveEntry &b) { return a.endTime > b.endTime; };
        std::priority_queue<ActiveEntry, std::vector<ActiveEntry>, decltype(cmp)> activeTextures(cmp);

        // Frame-local free pool for intra-frame reuse
        std::unordered_map<RGTextureDesc, std::vector<resources::RGTextureHandle>,
            RGTextureDescHash, RGTextureDescEqual> frameFreePool;

        for (auto &alloc: toAllocate) {
            auto &res = _resources[alloc.resIdx];

            // Expire active textures whose lifetime has ended before this resource starts
            while (!activeTextures.empty() && activeTextures.top().endTime < alloc.firstUse) {
                auto expired = activeTextures.top();
                activeTextures.pop();
                frameFreePool[expired.desc].push_back(expired.handle);
            }

            // Try to reuse from frame-local free pool first (intra-frame aliasing)
            bool reused = false;
            if (frameFreePool.contains(res.desc)) {
                auto &freeList = frameFreePool[res.desc];
                if (!freeList.empty()) {
                    res.resourceHandle = freeList.back();
                    freeList.pop_back();
                    reused = true;
                }
            }

            // Then try the persistent inter-frame pool
            if (!reused) {
                if (frameIndex < _texturePools.size() && _texturePools[frameIndex].contains(res.desc)) {
                    auto &pool = _texturePools[frameIndex][res.desc];
                    if (!pool.empty()) {
                        res.resourceHandle = pool.back();
                        pool.pop_back();
                        reused = true;
                    }
                }
            }

            // Finally, create a new texture
            if (!reused) {
                VmaAllocationCreateInfo allocInfo{};
                allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;
                auto &desc = res.desc;
                res.resourceHandle = _resourceManager->CreateTexture(
                    desc.name, desc.format, desc.usage, desc.extent,
                    desc.aspect, allocInfo, desc.arrayLayers, 1, desc.sampleCount, desc.flags);
            }

            // Track this texture as active until its last use
            activeTextures.push({alloc.lastUse, res.desc, res.resourceHandle});
        }
    }

    // 6. Build barriers between passes
    buildBarriers(resourceTouchList, needed);

    // 7. Create final compiled pass list for execution
    buildCompiledPasses();

    // 8. Prune the original pass list to only contain active passes in order
    finalizePasses();
}


void RenderGraph::Execute(CommandBuffer &cmd, const WindowDefinition &extent, Renderer *renderer) {
    PROFILE_CLASS_SCOPE();
    // Create the context for this frame's execution
    RenderContext context(*this, extent);

    if (renderer && renderer->GetJobSystem()) {
        auto jobSystem = renderer->GetJobSystem();
        std::vector<std::future<void> > futures;
        std::vector<vk::CommandBuffer> secondaryBuffers(_compiledPasses.size());

        // 1. Dispatch parallel recording (Barriers + Passes)
        for (size_t i = 0; i < _compiledPasses.size(); ++i) {
            futures.push_back(jobSystem->Execute([&, i]() {
                uint32_t threadIndex = engine::core::JobSystem::GetThreadIndex();

                // 1. Allocate Secondary Buffer
                vk::CommandBuffer cmdBuf = renderer->AllocateSecondaryCommandBuffer(threadIndex);

                vk::CommandBufferInheritanceInfo inheritanceInfo{};
                vk::CommandBufferBeginInfo beginInfo{};
                beginInfo.flags = vk::CommandBufferUsageFlagBits::eOneTimeSubmit;
                beginInfo.pInheritanceInfo = &inheritanceInfo;

                cmdBuf.begin(beginInfo);

                // Access the compiled pass data (Ensure this is read-only or thread-safe!)
                const auto &cp = _compiledPasses[i];

                // 2. RECORD BARRIERS (Parallelized)
                // We record the barrier into the secondary buffer *before* the pass executes.
                if (!cp.imageBarriers.empty()) {
                    vk::DependencyInfo depInfo{};
                    depInfo.imageMemoryBarrierCount = static_cast<uint32_t>(cp.imageBarriers.size());
                    depInfo.pImageMemoryBarriers = cp.imageBarriers.data();

                    // IMPORTANT: Use 'cmdBuf' (secondary), not 'cmd' (primary)
                    cmdBuf.pipelineBarrier2(depInfo);
                }

                // 3. RECORD PASS
                CommandBuffer wrapper(renderer->GetContext()->device, cmdBuf);
                cp.pass->Execute(wrapper, context);

                cmdBuf.end();

                secondaryBuffers[i] = cmdBuf;
            }));
        }

        // 2. Wait for all threads and Execute
        for (size_t i = 0; i < _compiledPasses.size(); ++i) {
            futures[i].get(); // Wait for recording to finish
        }
        cmd.GetHandle().executeCommands(secondaryBuffers);
    } else {
        // Serial Path (Fallback)
        for (const auto &cp: _compiledPasses) {
            if (!cp.imageBarriers.empty()) {
                vk::DependencyInfo depInfo{};
                depInfo.imageMemoryBarrierCount = static_cast<uint32_t>(cp.imageBarriers.size());
                depInfo.pImageMemoryBarriers = cp.imageBarriers.data();

                cmd.GetHandle().pipelineBarrier2(depInfo);
            }

            // Execute the pass (user-recorded draw/dispatch commands)
            cp.pass->Execute(cmd, context);
        }
    }
}


RGResource *RenderGraph::GetResource(const RGHandle &handle) {
    if (!handle.IsValid() || handle.id >= _resources.size()) return nullptr;
    return &_resources[handle.id];
}

engine::rendering::resources::Texture *RenderGraph::getPhysicalTexture(const std::string &name) {
    for (const auto &res: _resources) {
        if (res.name == name) {
            if (res.resourceHandle.IsValid()) {
                return _resourceManager->GetTexture(res.resourceHandle);
            }
        }
    }
    return nullptr;
}

RGHandle RenderGraph::getTextureHandle(const std::string &name) {
    // Check if the resource already exists
    for (const auto &res: _resources) {
        if (res.name == name) {
            return res.handle;
        }
    }

    // Forward reference: create a placeholder resource
    RGResource placeholder{};
    placeholder.handle.id = static_cast<uint32_t>(_resources.size());
    placeholder.name = name;
    placeholder.imported = false;
    placeholder.transient = true;
    placeholder.desc.name = name;
    placeholder.desc.format = vk::Format::eUndefined; // Marks as placeholder
    _resources.push_back(placeholder);
    return placeholder.handle;
}

RGHandle RenderGraph::CreateTexture(const RGTextureDesc &desc) {
    // Check if a placeholder already exists for this name
    for (auto &res: _resources) {
        if (res.name == desc.name) {
            // Fill in the placeholder with the real description
            res.desc = desc;
            res.transient = true;
            return res.handle;
        }
    }

    // No placeholder found, create new resource
    RGResource resource{};
    resource.handle.id = static_cast<uint32_t>(_resources.size());
    resource.name = desc.name;
    resource.desc = desc;
    resource.imported = false;
    resource.transient = true;
    _resources.push_back(resource);
    return resource.handle;
}

void RenderGraph::Reset(uint32_t frameIndex) {

    std::unordered_set<uint32_t> returnedHandles;
    for (auto &res: _resources) {
        if (res.transient && !res.imported && res.resourceHandle.IsValid()) {
            if (returnedHandles.insert(res.resourceHandle.index).second) {
                if (frameIndex < _texturePools.size()) {
                    _texturePools[frameIndex][res.desc].push_back(res.resourceHandle);
                }
            }
        }
    }

    // Clear all internal state to prepare for a new build
    _passes.clear();
    _resources.clear();
    _passOrder.clear();
    _compiledPasses.clear();
    _resourceFirstUse.clear();
    _resourceLastUse.clear();
    _passDeps.clear();
    _barriersPerPass.clear();
}
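
If you want to look at the culling and sorting steps without the Vulkan details around them, the following is a minimal sketch of the same idea. It is not the engine code: `MiniGraph` and `CullAndSort` are hypothetical names made up for this example, passes are reduced to plain indices, and every edge is treated as a hard "runs before" dependency.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the graph: passes are plain indices and
// an edge {u, v} means "u must run before v".
struct MiniGraph {
    uint32_t passCount = 0;
    std::vector<std::pair<uint32_t, uint32_t>> edges;
    std::vector<bool> isRoot; // e.g. the pass writes an imported image or has side effects
};

// Returns the surviving passes in a valid execution order.
// If the needed passes contain a cycle, the result is shorter than the needed count.
std::vector<uint32_t> CullAndSort(const MiniGraph &g) {
    assert(g.isRoot.size() == g.passCount);

    std::vector<std::vector<uint32_t>> producers(g.passCount), consumers(g.passCount);
    for (auto [u, v] : g.edges) {
        assert(u < g.passCount && v < g.passCount);
        consumers[u].push_back(v);
        producers[v].push_back(u);
    }

    // 1. Culling: flood-fill backwards from the roots along producer links.
    std::vector<bool> needed(g.passCount, false);
    std::queue<uint32_t> q;
    for (uint32_t p = 0; p < g.passCount; ++p) {
        if (g.isRoot[p]) {
            needed[p] = true;
            q.push(p);
        }
    }
    while (!q.empty()) {
        uint32_t v = q.front();
        q.pop();
        for (uint32_t u : producers[v]) {
            if (!needed[u]) {
                needed[u] = true;
                q.push(u);
            }
        }
    }

    // 2. Kahn's algorithm, counting only edges between needed passes.
    std::vector<uint32_t> indeg(g.passCount, 0);
    for (auto [u, v] : g.edges) {
        if (needed[u] && needed[v]) ++indeg[v];
    }
    for (uint32_t p = 0; p < g.passCount; ++p) {
        if (needed[p] && indeg[p] == 0) q.push(p);
    }

    std::vector<uint32_t> order;
    while (!q.empty()) {
        uint32_t u = q.front();
        q.pop();
        order.push_back(u);
        for (uint32_t v : consumers[u]) {
            if (needed[v] && --indeg[v] == 0) q.push(v);
        }
    }
    return order;
}
```

As a usage example, register five passes in a scrambled order: 0 = UI (the only root), 1 = objects, 2 = shadows, 3 = post-processing, 4 = an unused debug view. With the edges {2,1}, {1,3}, {3,0} and {2,4}, `CullAndSort` returns 2, 1, 3, 0: shadows first, UI last, and the debug view is dropped because nothing needed depends on it.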

 

This render graph supports the following:

  1. Textures defined per frame in flight.
  2. Texture reuse within the same frame when the descriptions match, which avoids duplicate allocations.
  3. Multithreaded command recording.
  4. Pass ordering constraints (force a pass to run before or after a specific pass).
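
The texture reuse in point 2 is the least obvious of these, so here is a small standalone sketch of the idea, under a few assumptions. `TransientTex` and `AssignPhysicalSlots` are hypothetical names for this example only, the texture description is flattened into a single string key instead of a full `RGTextureDesc`, and a "slot" is just a number standing in for a physical texture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transient texture. `desc` plays the role of the
// texture description (format, extent, usage...) flattened into one comparable key.
struct TransientTex {
    std::string desc;
    uint32_t firstUse; // execution-order index of the first pass touching it
    uint32_t lastUse;  // execution-order index of the last pass touching it
};

// Returns, for each input texture, the index of the physical slot assigned to it.
// Two textures share a slot only if their descriptions match and their
// lifetimes do not overlap.
std::vector<uint32_t> AssignPhysicalSlots(const std::vector<TransientTex> &texs) {
    // Visit textures in order of first use.
    std::vector<uint32_t> byStart(texs.size());
    for (uint32_t i = 0; i < byStart.size(); ++i) byStart[i] = i;
    std::stable_sort(byStart.begin(), byStart.end(), [&](uint32_t a, uint32_t b) {
        return texs[a].firstUse < texs[b].firstUse;
    });

    // Min-heap of {lastUse, texture index} for textures that are still alive.
    using Active = std::pair<uint32_t, uint32_t>;
    std::priority_queue<Active, std::vector<Active>, std::greater<Active>> active;
    std::unordered_map<std::string, std::vector<uint32_t>> freeSlots; // desc -> free slots
    std::vector<uint32_t> slotOf(texs.size(), 0);
    uint32_t nextSlot = 0;

    for (uint32_t idx : byStart) {
        const TransientTex &t = texs[idx];
        assert(t.firstUse <= t.lastUse);

        // Release every slot whose owner finished strictly before this texture starts.
        while (!active.empty() && active.top().first < t.firstUse) {
            uint32_t done = active.top().second;
            active.pop();
            freeSlots[texs[done].desc].push_back(slotOf[done]);
        }

        // Reuse a matching free slot if one exists, otherwise "allocate" a new one.
        auto &pool = freeSlots[t.desc];
        if (!pool.empty()) {
            slotOf[idx] = pool.back();
            pool.pop_back();
        } else {
            slotOf[idx] = nextSlot++;
        }
        active.push({t.lastUse, idx});
    }
    return slotOf;
}
```

For example, take three textures with the same description and lifetimes [0, 1], [2, 3] and [1, 2], plus one shadow map alive for the whole frame. The first two never overlap, so they end up in the same slot, and the frame needs three physical textures instead of four.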

If you’ve made it this far, you were interested enough to be asking an important question: why did I build it with this feature set? You’re right, it’s complicated. That is a lot of features for a solo developer, but I like to build things properly, and I didn’t lose any performance while upgrading the graph and adding new features.

This article was updated on February 19, 2026