Hand-Coding a Next-Gen Engine from Scratch (19)

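In the previous article we implemented a simple Allocator based on a chain of blocks.

Next, let's implement our memory management module: the Memory Manager.

Following our earlier discussion, we design the Memory Manager as the component in charge of all dynamically allocated memory. (Strictly speaking, it does not manage everything. The stack, automatic variables, objects created before the Memory Manager itself, and certain global objects are outside its control. Those global objects include the ones representing the modules of our architecture, i.e. the various *Manager classes.)

It does this by managing a set of Allocators, each of which represents one allocation strategy. An Allocator acquires resources in units of pages (Page) and hands them out in units of blocks (Block).

The biggest advantage of this structure is that new kinds of Allocator are easy to add. We can also implement a new memory allocation strategy quickly by changing the mapping between allocation requests (Request) and allocators (Allocator), i.e. the Allocator Lookup Policy.

Another advantage is that binding Allocators to threads gives us per-thread local heaps quickly (similar in spirit to Thread Local Storage). Because such a heap is owned exclusively by one thread, it needs no locking, which can greatly speed up the thread's execution.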
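Here is a minimal sketch of such a thread-bound heap. It assumes fixed 64-byte blocks, and the names `ThreadLocalPool`, `GetThreadPool` and `kBlockSize` are hypothetical, not part of the engine. C++11 `thread_local` gives each thread its own free list, so `Allocate`/`Free` never take a lock. The system heap is only touched when the private list is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

constexpr std::size_t kBlockSize = 64;   // hypothetical fixed block size

struct FreeBlock { FreeBlock* pNext; };

class ThreadLocalPool {
public:
    ~ThreadLocalPool()
    {
        // return the blocks cached on this thread's free list to the system
        while (m_pFreeList) {
            FreeBlock* pNext = m_pFreeList->pNext;
            std::free(m_pFreeList);
            m_pFreeList = pNext;
        }
    }

    void* Allocate()
    {
        if (m_pFreeList == nullptr) {
            // private list is empty: only now go to the system heap
            ++m_nUpstreamCalls;
            return std::malloc(kBlockSize);
        }
        FreeBlock* pBlock = m_pFreeList;   // pop from this thread's list, no lock
        m_pFreeList = pBlock->pNext;
        return pBlock;
    }

    void Free(void* p)
    {
        // push onto this thread's list for reuse, no lock
        FreeBlock* pBlock = static_cast<FreeBlock*>(p);
        pBlock->pNext = m_pFreeList;
        m_pFreeList = pBlock;
    }

    std::size_t UpstreamCalls() const { return m_nUpstreamCalls; }

private:
    FreeBlock*  m_pFreeList = nullptr;
    std::size_t m_nUpstreamCalls = 0;
};

// one independent pool per thread; no other thread can see it
inline ThreadLocalPool& GetThreadPool()
{
    thread_local ThreadLocalPool pool;
    return pool;
}

} // namespace sketch
```

A block freed on a different thread simply joins that thread's list. This sketch tolerates that because every block comes from `std::malloc`, but a real engine would probably want stricter ownership rules.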

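This structure can also be extended vertically. As in reference 2, a little modification lets us form hierarchical (parent/child) and sibling relationships between Allocators.

The point of such a hierarchy is this. Split a very complex process into simple, short segments, and the memory access pattern of each segment tends to follow a recognizable pattern. Some segments make frequent small allocations. Some use memory in large chunks, some hardly allocate at all, and others allocate heavily in bursts. Managing all these frequencies and intensities at the same level makes the situation very complex, highly uncertain and hard to predict. Suppose instead we characterize them and, as far as possible, group processes of similar frequency and intensity at the same level. Their random fluctuations within that level then tend to offset one another, and the level as a whole behaves in a relatively predictable way. This effect grows stronger as the number and variability of processes running in parallel increase.

Our game engine is designed to run multiple modules asynchronously and in parallel on multiple threads. The modules' tasks differ greatly, and so do their execution frequencies. The rendering module runs every frame and uses many large memory blocks, but those buffers are usually short-lived. The scene loading module runs on a much longer cycle, and its data structures may stay in memory for minutes or even tens of minutes. Logic modules such as AI are typical compute modules that allocate and free many small buffers at high frequency.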
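The per-frame rendering buffers described above suit a linear ("frame") allocator. This is a hypothetical sketch, not the engine's code. `FrameAllocator` bumps an offset inside one preallocated buffer, and `Reset()`, called once per frame, releases everything at once. The alignment passed to `Allocate` must be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : m_buffer(capacity) {}

    // Returns nullptr when this frame's budget is exhausted.
    // alignment must be a power of two.
    void* Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t))
    {
        const std::uintptr_t base    = reinterpret_cast<std::uintptr_t>(m_buffer.data());
        const std::uintptr_t current = base + m_offset;
        const std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;   // round up
        const std::size_t    end     = static_cast<std::size_t>(aligned - base) + size;
        if (end > m_buffer.size()) {
            return nullptr;
        }
        m_offset = end;   // bump the offset; individual blocks are never freed
        return m_buffer.data() + (aligned - base);
    }

    // Called once per frame: all of last frame's transient buffers vanish in O(1).
    void Reset() { m_offset = 0; }

    std::size_t Used() const { return m_offset; }

private:
    std::vector<std::byte> m_buffer;   // acquired once, reused every frame
    std::size_t            m_offset = 0;
};
```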

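At the same time, a game scene is made of scene objects, and many of our modules process data one scene object at a time. A given module processes different scene objects in much the same way, so their memory access patterns are similar. It is therefore natural to organize them as siblings in the memory management hierarchy.

Now let's turn these ideas into code. We don't have any other modules yet, so we don't need to implement everything designed above. For now we will organize the Allocator from the previous article into our Memory Manager. The result is the most basic memory management module: single-level, supporting different allocation sizes, and not thread-safe.

The code is mainly based on reference 1. It is wrapped to fit our architecture and naming conventions, with some changes for cross-platform support.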

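MemoryManager.hpp

#pragma once
#include "IRuntimeModule.hpp"
#include "Allocator.hpp"
#include <new>
#include <utility>

namespace My {
    class MemoryManager : implements IRuntimeModule
    {
    public:
        // get raw memory from a suitable Allocator, then construct a T in it
        // with placement new, forwarding the constructor arguments unchanged
        template<typename T, typename... Arguments>
        T* New(Arguments&&... parameters)
        {
            return new (Allocate(sizeof(T))) T(std::forward<Arguments>(parameters)...);
        }

        // destroy the object explicitly, then hand its memory back to the same Allocator
        template<typename T>
        void Delete(T *p)
        {
            if (!p) return;
            p->~T();
            Free(p, sizeof(T));
        }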

    public:
        virtual ~MemoryManager() {}

        virtual int Initialize();
        virtual void Finalize();
        virtual void Tick();

        void* Allocate(size_t size);
        void  Free(void* p, size_t size);
    private:
        static size_t*        m_pBlockSizeLookup;
        static Allocator*     m_pAllocators;
    private:
        static Allocator* LookUpAllocator(size_t size);
    };
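}

MemoryManager.cpp

#include "MemoryManager.hpp"
#include <cstdlib>

using namespace My;

// definitions of the static members declared in MemoryManager.hpp
size_t*    My::MemoryManager::m_pBlockSizeLookup = nullptr;
Allocator* My::MemoryManager::m_pAllocators      = nullptr;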

namespace My {
    static const uint32_t kBlockSizes[] = {
        // 4-increments
        4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48,
        52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 

        // 32-increments
        128, 160, 192, 224, 256, 288, 320, 352, 384, 
        416, 448, 480, 512, 544, 576, 608, 640, 

        // 64-increments
        704, 768, 832, 896, 960, 1024
    };

    static const uint32_t kPageSize  = 8192;
    static const uint32_t kAlignment = 4;

    // number of elements in the block size array
    static const uint32_t kNumBlockSizes = 
        sizeof(kBlockSizes) / sizeof(kBlockSizes[0]);

    // largest valid block size
    static const uint32_t kMaxBlockSize = 
        kBlockSizes[kNumBlockSizes - 1];
}

int My::MemoryManager::Initialize()
{
    // one-time initialization
    static bool s_bInitialized = false;
    if (!s_bInitialized) {
        // initialize block size lookup table
        m_pBlockSizeLookup = new size_t[kMaxBlockSize + 1];
        size_t j = 0;
        for (size_t i = 0; i <= kMaxBlockSize; i++) {
            if (i > kBlockSizes[j]) ++j;
            m_pBlockSizeLookup[i] = j;
        }

        // initialize the allocators
        m_pAllocators = new Allocator[kNumBlockSizes];
        for (size_t i = 0; i < kNumBlockSizes; i++) {
            m_pAllocators[i].Reset(kBlockSizes[i], kPageSize, kAlignment);
        }

        s_bInitialized = true;
    }

    return 0;
}

void My::MemoryManager::Finalize()
{
    delete[] m_pAllocators;
    delete[] m_pBlockSizeLookup;
    m_pAllocators      = nullptr;
    m_pBlockSizeLookup = nullptr;
}

void My::MemoryManager::Tick()
{
}

Allocator* My::MemoryManager::LookUpAllocator(size_t size)
{

    // check eligibility for lookup
    if (size <= kMaxBlockSize)
        return m_pAllocators + m_pBlockSizeLookup[size];
    else
        return nullptr;
}

void* My::MemoryManager::Allocate(size_t size)
{
    Allocator* pAlloc = LookUpAllocator(size);
    if (pAlloc)
        return pAlloc->Allocate();
    else
        return malloc(size);
}

void My::MemoryManager::Free(void* p, size_t size)
{
    Allocator* pAlloc = LookUpAllocator(size);
    if (pAlloc)
        pAlloc->Free(p);
    else
        free(p);
}
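To see what the lookup table actually does, here is the table-building loop from `Initialize()` pulled out into free functions. This sketch assumes the block-size table is unchanged, and `BuildBlockSizeLookup` and `BlockSizeFor` are illustrative names, not part of the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// same table as in MemoryManager.cpp
const std::uint32_t kBlockSizes[] = {
    4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48,
    52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96,
    128, 160, 192, 224, 256, 288, 320, 352, 384,
    416, 448, 480, 512, 544, 576, 608, 640,
    704, 768, 832, 896, 960, 1024
};
const std::size_t   kNumBlockSizes = sizeof(kBlockSizes) / sizeof(kBlockSizes[0]);
const std::uint32_t kMaxBlockSize  = kBlockSizes[kNumBlockSizes - 1];

// lookup[size] = index of the smallest block size that can hold `size` bytes
inline std::vector<std::size_t> BuildBlockSizeLookup()
{
    std::vector<std::size_t> lookup(kMaxBlockSize + 1);
    std::size_t j = 0;
    for (std::size_t i = 0; i <= kMaxBlockSize; i++) {
        if (i > kBlockSizes[j]) ++j;   // move to the next size class
        lookup[i] = j;
    }
    return lookup;
}

// Block size actually used for a request, or 0 when the request is larger
// than kMaxBlockSize and would be passed to malloc instead.
inline std::uint32_t BlockSizeFor(const std::vector<std::size_t>& lookup, std::size_t size)
{
    return size <= kMaxBlockSize ? kBlockSizes[lookup[size]] : 0;
}

} // namespace sketch
```

So a 13-byte request costs a 16-byte block, and anything above 1024 bytes bypasses the Allocators and goes to `malloc`.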

References

  1. Memory Management part 2 of 3: C-Style Interface | Ming-Lun “Allen” Chou
  2. How tcmalloc Works
  3. Memory management
  4. operator new, operator new[]

This work is licensed under a Creative Commons Attribution 4.0 International License.

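Hand-Coding a Next-Gen Game Engine from Scratch (18)

As promised in the previous article, this time we take a first look at memory management in a game engine.
First, why does a game engine need its own memory management? One reason is to load content far larger than physical memory (open-world games, for example). There are also many performance and debugging considerations, which reference 1 discusses in some detail, and with humor.
Reference 1 was written about two years ago, of course, and many of its numbers have changed since. The bandwidth between a modern CPU and main memory is typically a few tens of GB/s, while the bandwidth between a GPU and its memory (VRAM) has soared to several hundred GB/s. Even so, memory access still lags far behind the compute capability of CPUs and GPUs.
More importantly, a game engine (at runtime) lives in a cycle of a few milliseconds to a dozen or so. VR games require 120 fps, about 8.3 ms per frame; 60 fps gives about 16.7 ms; 30 fps gives at most 33 ms. In such a system every millisecond is precious and worth fighting for. (Of course, as we pointed out in the previous article, not every task has to run at this rhythm.)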
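
The frame budgets quoted above are simply 1000 ms divided by the frame rate. A small compile-time check follows (the helper `FrameBudgetMs` is hypothetical):

```cpp
#include <cassert>

// Frame-time budget in milliseconds for a target frame rate.
constexpr double FrameBudgetMs(double framesPerSecond)
{
    return 1000.0 / framesPerSecond;
}

static_assert(FrameBudgetMs(30.0)  > 33.3 && FrameBudgetMs(30.0)  < 33.4, "30 fps is about 33.3 ms");
static_assert(FrameBudgetMs(60.0)  > 16.6 && FrameBudgetMs(60.0)  < 16.7, "60 fps is about 16.7 ms");
static_assert(FrameBudgetMs(120.0) > 8.3  && FrameBudgetMs(120.0) < 8.4,  "120 fps is about 8.3 ms");
```
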
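This also explains why this series uses C/C++, a "middle-level language": it is a fairly good compromise between performance, control, and maintainability.
Let me add a little background on malloc/new and why they are relatively inefficient.
A major job of the operating system is to manage the computer's hardware resources. When an application needs a hardware resource, it has to ask the operating system, and the interface for making such requests is called a system call.

In modern operating systems, for security reasons, the OS and user programs do not run at the same privilege level. The OS holds all privileges, while a user program runs in a virtual environment provided by the OS. The memory addresses a user program sees are not real physical addresses but a virtual address space tailored to that program. In two different programs, the same address does not point to the same physical memory (or page file) location.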

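So when we call malloc/new and the allocator has to get more memory from the OS, our thread does not simply barge into the kernel and grab a chunk of memory. Figuratively speaking, we fill in a request form, drop it into the in-tray, and wait. The OS processes the requests, puts the result in the out-tray, and calls our number so we can collect it, much like dealing with a government office in everyday life.

From the caller's point of view these system calls look synchronous: the thread blocks until the operation completes, then it is unblocked and the function returns, just like an ordinary function call. Behind the scenes, however, switching into the kernel, doing the work, and switching back is far more involved than an ordinary call.

Passing parameters in this process also often involves copying. The OS and the user program work in different address spaces, so passing a pointer (address) directly is of little use.

Therefore, an important way to improve a program's efficiency on the CPU side is to reduce system calls. A common technique is to acquire all the resources needed in one go during initialization, then allocate and manage them internally.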
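Standard C++17 offers a ready-made version of this "acquire once, sub-allocate internally" idea in `<memory_resource>`. The sketch below only illustrates the principle and is not what the engine uses. `std::pmr::monotonic_buffer_resource` serves every allocation from a caller-supplied buffer. With `std::pmr::null_memory_resource()` as its upstream, it never falls back to the system heap and throws `std::bad_alloc` instead. `SumWithPreallocatedBuffer` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Builds a vector of 0..count-1 whose storage comes entirely from `buffer`,
// then returns the sum of its elements.
inline long long SumWithPreallocatedBuffer(std::byte* buffer, std::size_t bufferSize, int count)
{
    // no upstream: running out of buffer throws std::bad_alloc
    std::pmr::monotonic_buffer_resource arena(buffer, bufferSize,
                                              std::pmr::null_memory_resource());
    std::pmr::vector<int> values(&arena);
    values.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        values.push_back(i);
    }
    long long sum = 0;
    for (int v : values) {
        sum += v;
    }
    return sum;   // the arena releases everything at once when it goes out of scope
}
```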

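Reference 4 presents a memory management method based on a chain of blocks (a "block chain", i.e. a linked list of free blocks). We start by bringing it into our engine. This is core engine functionality, so we create the files under Framework/Common: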

Allocator.hpp

#pragma once
#include <cstddef>
#include <cstdint>

namespace My {

    struct BlockHeader {
        // union-ed with data
        BlockHeader* pNext;
    };

    struct PageHeader {
        PageHeader* pNext;
        BlockHeader* Blocks() {
                return reinterpret_cast<BlockHeader*>(this + 1);
        }
    };

    class Allocator {
        public:
                // debug patterns
                static const uint8_t PATTERN_ALIGN = 0xFC;
                static const uint8_t PATTERN_ALLOC = 0xFD;
                static const uint8_t PATTERN_FREE  = 0xFE;

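                // default constructor: an empty allocator, call Reset() before use
                // (needed, e.g., to create arrays of Allocators with new[])
                Allocator() : m_pPageList(nullptr), m_pFreeList(nullptr),
                              m_szDataSize(0), m_szPageSize(0), m_szAlignmentSize(0),
                              m_szBlockSize(0), m_nBlocksPerPage(0),
                              m_nPages(0), m_nBlocks(0), m_nFreeBlocks(0) {}
                Allocator(size_t data_size, size_t page_size, size_t alignment);
                ~Allocator();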

                // resets the allocator to a new configuration
                void Reset(size_t data_size, size_t page_size, size_t alignment);

                // alloc and free blocks
                void* Allocate();
                void  Free(void* p);
                void  FreeAll();
        private:
#if defined(_DEBUG)
                // fill a free page with debug patterns
                void FillFreePage(PageHeader* pPage);

                // fill a block with debug patterns
                void FillFreeBlock(BlockHeader* pBlock);

                // fill an allocated block with debug patterns
                void FillAllocatedBlock(BlockHeader* pBlock);
#endif

                // gets the next block
                BlockHeader* NextBlock(BlockHeader* pBlock);

                // the page list
                PageHeader* m_pPageList;

                // the free block list
                BlockHeader* m_pFreeList;

                size_t      m_szDataSize;
                size_t      m_szPageSize;
                size_t      m_szAlignmentSize;
                size_t      m_szBlockSize;
                uint32_t    m_nBlocksPerPage;

                // statistics
                uint32_t    m_nPages;
                uint32_t    m_nBlocks;
                uint32_t    m_nFreeBlocks;

                // disable copy & assignment
                Allocator(const Allocator& clone) = delete;
                Allocator &operator=(const Allocator &rhs) = delete;
    };
}

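Compared with reference 4, the code above mainly changes the following:

  1. Variables are renamed to match the engine's overall naming style
  2. More explicit data types, such as uint32_t, and more portable ones, such as size_t, are used
  3. The required, cross-platform C++ standard headers are included
  4. Debug-only code is marked with preprocessor directives so that it is compiled only in debug builds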

Allocator.cpp

#include "Allocator.hpp"
#include <cassert>
#include <cstring>

#ifndef ALIGN
#define ALIGN(x, a)         (((x) + ((a) - 1)) & ~((a) - 1))
#endif

using namespace My;

My::Allocator::Allocator(size_t data_size, size_t page_size, size_t alignment)
        : m_pPageList(nullptr), m_pFreeList(nullptr)
{
    Reset(data_size, page_size, alignment);
}

My::Allocator::~Allocator()
{
    FreeAll();
}

void My::Allocator::Reset(size_t data_size, size_t page_size, size_t alignment)
{
    FreeAll();

    m_szDataSize = data_size;
    m_szPageSize = page_size;

    size_t minimal_size = (sizeof(BlockHeader) > m_szDataSize) ? sizeof(BlockHeader) : m_szDataSize;
    // this magic only works when alignment is 2^n, which should generally be the case
    // because most CPUs/GPUs also require the alignment to be 2^n,
    // but we still use an assert to guarantee it
#if defined(_DEBUG)
    assert(alignment > 0 && ((alignment & (alignment-1))) == 0);
#endif
    m_szBlockSize = ALIGN(minimal_size, alignment);

    m_szAlignmentSize = m_szBlockSize - minimal_size;

    m_nBlocksPerPage = static_cast<uint32_t>((m_szPageSize - sizeof(PageHeader)) / m_szBlockSize);
}

void* My::Allocator::Allocate()
{
    if (!m_pFreeList) {
        // allocate a new page
        PageHeader* pNewPage = reinterpret_cast<PageHeader*>(new uint8_t[m_szPageSize]);
        ++m_nPages;
        m_nBlocks += m_nBlocksPerPage;
        m_nFreeBlocks += m_nBlocksPerPage;

#if defined(_DEBUG)
        FillFreePage(pNewPage);
#endif

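        // link the new page in front of the page list; pNext must be set here
        // because FillFreePage() (which would clear it) only runs in debug builds
        pNewPage->pNext = m_pPageList;
        m_pPageList = pNewPage;

        BlockHeader* pBlock = pNewPage->Blocks();
        // link each block in the page to the next one; the loop stops at the
        // last block, which terminates the list, so nothing is written past the page
        for (uint32_t i = 0; i < m_nBlocksPerPage - 1; i++) {
            pBlock->pNext = NextBlock(pBlock);
            pBlock = NextBlock(pBlock);
        }
        pBlock->pNext = nullptr;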

        m_pFreeList = pNewPage->Blocks();
    }

    BlockHeader* freeBlock = m_pFreeList;
    m_pFreeList = m_pFreeList->pNext;
    --m_nFreeBlocks;

#if defined(_DEBUG)
    FillAllocatedBlock(freeBlock);
#endif

    return reinterpret_cast<void*>(freeBlock);
}

void My::Allocator::Free(void* p)
{
    BlockHeader* block = reinterpret_cast<BlockHeader*>(p);

#if defined(_DEBUG)
    FillFreeBlock(block);
#endif

    block->pNext = m_pFreeList;
    m_pFreeList = block;
    ++m_nFreeBlocks;
}

void My::Allocator::FreeAll()
{
    PageHeader* pPage = m_pPageList;
    while(pPage) {
        PageHeader* _p = pPage;
        pPage = pPage->pNext;

        delete[] reinterpret_cast<uint8_t*>(_p);
    }

    m_pPageList = nullptr;
    m_pFreeList = nullptr;

    m_nPages        = 0;
    m_nBlocks       = 0;
    m_nFreeBlocks   = 0;
}

#if defined(_DEBUG)
void My::Allocator::FillFreePage(PageHeader *pPage)
{
    // page header
    pPage->pNext = nullptr;
 
    // blocks
    BlockHeader *pBlock = pPage->Blocks();
    for (uint32_t i = 0; i < m_nBlocksPerPage; i++)
    {
        FillFreeBlock(pBlock);
        pBlock = NextBlock(pBlock);
    }
}
 
void My::Allocator::FillFreeBlock(BlockHeader *pBlock)
{
    // block header + data
    std::memset(pBlock, PATTERN_FREE, m_szBlockSize - m_szAlignmentSize);
 
    // alignment
    std::memset(reinterpret_cast<uint8_t*>(pBlock) + m_szBlockSize - m_szAlignmentSize, 
                PATTERN_ALIGN, m_szAlignmentSize);
}
 
void My::Allocator::FillAllocatedBlock(BlockHeader *pBlock)
{
    // block header + data
    std::memset(pBlock, PATTERN_ALLOC, m_szBlockSize - m_szAlignmentSize);
 
    // alignment
    std::memset(reinterpret_cast<uint8_t*>(pBlock) + m_szBlockSize - m_szAlignmentSize, 
                PATTERN_ALIGN, m_szAlignmentSize);
}
#endif
My::BlockHeader* My::Allocator::NextBlock(BlockHeader *pBlock)
{
    return reinterpret_cast<BlockHeader *>(reinterpret_cast<uint8_t*>(pBlock) + m_szBlockSize);
}

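Compared with reference 4, the code above mainly changes the following:

  1. It reflects the changes in the header file
  2. It uses a more efficient alignment calculation (with a precondition; see the code comments)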
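Below is a sketch of the alignment arithmetic used in `Reset()`, written as `constexpr` functions so it can be checked in isolation. `AlignUp`, `IsPowerOfTwo` and `BlocksPerPage` are hypothetical helpers. The header sizes are passed in explicitly because `sizeof(BlockHeader)` and `sizeof(PageHeader)` depend on the pointer size; the usage example assumes 8 bytes, as on a typical 64-bit build.

```cpp
#include <cassert>
#include <cstddef>

// Same trick as the ALIGN macro: only valid when `a` is a power of two,
// because then ~(a - 1) is a mask that clears exactly the low bits.
constexpr std::size_t AlignUp(std::size_t x, std::size_t a)
{
    return (x + (a - 1)) & ~(a - 1);
}

constexpr bool IsPowerOfTwo(std::size_t a)
{
    return a > 0 && (a & (a - 1)) == 0;
}

// Mirrors the arithmetic in Allocator::Reset(): a block must hold at least a
// BlockHeader, is rounded up to the alignment, and the page header comes first.
constexpr std::size_t BlocksPerPage(std::size_t dataSize, std::size_t pageSize,
                                    std::size_t alignment,
                                    std::size_t blockHeaderSize,
                                    std::size_t pageHeaderSize)
{
    return (pageSize - pageHeaderSize) /
           AlignUp(dataSize > blockHeaderSize ? dataSize : blockHeaderSize, alignment);
}
```

With a non-power-of-two alignment the mask trick silently gives wrong answers: `AlignUp(13, 12)` yields 16, which is not a multiple of 12. That is why the assert in `Reset()` matters.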

That's it. Next, we modify our CMakeLists.txt to add the new file:

C:\Users\Tim.AzureAD\Source\Repos\GameEngineFromScratch>gvim Framework\Common\CMakeLists.txt
add_library(Common
Allocator.cpp
BaseApplication.cpp
GraphicsManager.cpp
main.cpp
)

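Then we can try compiling to see whether it builds. The build process was described in detail in article (5), so I won't repeat it here.

In practice, however, we will need more than one kind of memory Allocator. The objects in our program come in different sizes, and a single fixed Block Size cannot satisfy all of them. If the Block Size is too small, it obviously cannot hold larger objects; if it is too large, memory is wasted. Reference 5 also touches on this point.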
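The trade-off can be made concrete with a small, hypothetical measurement helper, `MeasureFixedBlockWaste`. It rounds every request up to one fixed block size, then counts the wasted bytes and the requests that do not fit at all. The request sizes in the example are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct WasteReport {
    std::size_t wastedBytes = 0;   // internal fragmentation
    std::size_t unserved    = 0;   // requests larger than the block
};

inline WasteReport MeasureFixedBlockWaste(const std::vector<std::size_t>& requests,
                                          std::size_t blockSize)
{
    WasteReport report;
    for (std::size_t size : requests) {
        if (size > blockSize) {
            ++report.unserved;                   // block too small
        } else {
            report.wastedBytes += blockSize - size;  // block too large
        }
    }
    return report;
}
```

A small block leaves large requests unserved, while a large block serves everything but wastes most of the memory, which is why the Memory Manager uses a whole table of block sizes.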

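In addition, some of our buffers are for the CPU, some for the GPU, and some for both. Some only need fast reads, such as textures; some need fast writes, such as render targets; and some need guaranteed synchronization, such as fences. We need to implement these controls.

Furthermore, as described in the previous article, our modules will execute their tasks asynchronously and in parallel. Modern CPUs are multi-core, and making full use of that requires multithreading. Our Allocator, however, is not yet thread-safe. Thread-safe code needs some mutual-exclusion (locking) mechanism, but such mechanisms are often inefficient and prone to deadlocks. We need to strike a balance here.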
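The most direct way to make an allocator thread-safe is to wrap every call in a mutex. The sketch below shows that approach; `LockedAllocator` and `MallocBackend` are hypothetical names. It also shows the approach's cost: every thread now serializes on one lock, which is the overhead that per-thread heaps try to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Serializes every call to a non-thread-safe allocator.
template <typename Inner>
class LockedAllocator {
public:
    void* Allocate(std::size_t size)
    {
        std::lock_guard<std::mutex> lock(m_mutex);   // one thread at a time
        return m_inner.Allocate(size);
    }

    void Free(void* p, std::size_t size)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_inner.Free(p, size);
    }

    // Unsynchronized access, intended only for inspection after all threads are done.
    const Inner& GetInner() const { return m_inner; }

private:
    std::mutex m_mutex;
    Inner      m_inner;
};

// Stand-in for a non-thread-safe allocator.
struct MallocBackend {
    std::size_t outstanding = 0;   // plain counter: racy without the wrapper

    void* Allocate(std::size_t size) { ++outstanding; return std::malloc(size); }
    void  Free(void* p, std::size_t) { --outstanding; std::free(p); }
};
```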

We will continue to discuss these issues and improve our memory management code in later articles.

References

  1. Writing a Game Engine from Scratch – Part 2: Memory
  2. Gallery of Processor Cache Effects
  3. Game Engine Architecture
  4. Memory Management part 1 of 3: The Allocator | Ming-Lun “Allen” Chou
  5. Memory Management part 2 of 3: C-Style Interface | Ming-Lun “Allen” Chou


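Hand-Coding a Next-Gen Game Engine from Scratch (17)

So far we have learned the basics of programming two graphics APIs, DirectX and OpenGL, on Windows and Linux. (The Linux + DirectX combination does not exist.)

The original plan was to continue with Vulkan and with three more platforms: macOS, Android and iOS. (Because of NDAs, PS4/PSV will not be covered here in detail. I will write the related code but not publish it for now. Later I may find a way to make it available separately to people with the relevant developer qualifications, or publish it once both platforms are obsolete...)

However, we have already spent nearly ten articles on this graphics side quest, and many readers may get bored if it goes on. So let's put the side quest on hold for now and go back to advancing the main storyline. After all, grinding mobs all the time gets boring too.

So far we have explored these graphics APIs in a fairly flat, breadth-first way, with the goal of designing the corresponding modules of our engine. Let's now summarize what we have seen.

First, in articles (7) and (8) we looked at creating the basic drawing context, the window, on Windows and Linux. Every graphics API appears to require this step.

In fact, there are two cases we haven't considered:

  1. Full-screen rendering. Consoles in particular often have no window management system at all and draw directly to the full screen, because consoles are generally single-task systems and a window management system adds considerable overhead
  2. Off-screen rendering, e.g. services like PS Now or Nvidia's game streaming. With advances in cloud computing and the growth of the Internet, playing games in the cloud is clearly taking the stage. In this mode, rendering happens neither in a window nor full-screen, but simply into a memory buffer

Based on what we have learned so far, a rough game engine workflow looks like this: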

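  1. First we need a cross-platform module that creates this basic context for us under different OS + graphics API combinations (it may be a window, a full-screen FrameBuffer, or an off-screen buffer)
  2. Then we query and enumerate the platform's hardware capabilities and find the canvas (surface) formats the platform hardware (here specifically the GPU) supports. The FrameBuffer of the context created in step 1 must use one of these formats so that the GPU can draw on it
  3. Using the platform's graphics API, the CPU creates the Heaps/Buffers/Views needed for drawing and generates resource descriptors (RootSignature or Descriptor). It fills the descriptors with each resource's metadata and passes them to the GPU
  4. Based on the scene description, the CPU loads vertex data, indices, textures, shaders, etc., and unpacks them into buffers the GPU can see (that is, buffers registered in the descriptors)
  5. The frame loop starts
  6. The CPU reads user input (not yet covered in earlier articles) and updates the position and state of user-controllable scene objects
  7. The CPU runs the game logic (including animation and AI) and updates the position and state of the corresponding objects
  8. The CPU culls objects to find those that need to be drawn (the visible ones)
  9. The CPU translates the position and state of visible objects into constants and uploads them to a GPU-visible constant buffer
  10. The CPU creates a buffer that records GPU drawing commands (a CommandList) and records the drawing commands
  11. The CPU creates a Fence and the associated Event for CPU-GPU synchronization
  12. The CPU submits the CommandList with the recorded drawing commands, then waits for the GPU to finish drawing (by watching the Fence)
  13. The CPU submits the rendered result for display (Flip or Present)
  14. The frame loop ends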

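Now let's look again at the top-level design we made in article (4):

1. Input management module: gets user input
2. Strategy module: executes strategies
3. Scene management module: manages and updates the scene
4. Rendering module: performs rendering and screen output
5. Audio module: manages sound, mixing and playback
6. Network module: manages network communication
7. File I/O module: manages resource loading and saving/restoring parameters
8. Memory management module: schedules and manages resources in memory
9. Driver module: drives the other modules based on time, events, etc.
10. Auxiliary module: debugging, log output, and other auxiliary functions
11. Application module: abstracts configuration files, platform-specific notifications, window creation, and other parts that interface with a specific platform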

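Against this design, let's partition the 14 steps above:

Steps 1-2 belong in (11. Application module). From our experience on the side quest so far, platform context creation is a separate set of APIs for both DirectX and OpenGL (DirectX: Win32 + DXGI; OpenGL: Xlib/XCB + an OpenGL loader). Moreover, this part depends heavily on the platform (operating system) and is poorly standardized. Separating it from the graphics rendering module makes the rendering module more platform-independent.

Steps 3-4 look like they belong in the Initialize method of (4. Rendering module), and at first that seems fine. But step 4 loads resources according to the scene description. So far we have only drawn a single, unchanging geometric object, whereas real game scenes change:

  1. Traditional games are divided into chapters (levels), and each level is limited to a known size (because of the size of the heaps we can use, i.e. the memory limit). Traditional games load between levels, and during loading we can destroy the rendering module and recreate it from the new scene description
  2. Open-world games of recent years require seamless dynamic scene loading, so the heap allocations and resource descriptors of step 4 change dynamically too. In this mode we cannot destroy and recreate the rendering module; instead we need the ability to change these heaps and descriptors dynamically

Since we claim to be developing a next-gen engine, we should obviously support the second case. So step 4 cannot go in the rendering module's Initialize; it should come after (5. The frame loop starts).

However, as mentioned at the start of this series, games are soft real-time systems. A missed deadline is not a matter of life and death, but the processing time available per frame is still very tight. Creating heaps and loading resources are very time-consuming and cannot be done within a single frame. And since this work isn't needed all the time, putting it inside the frame loop is not appropriate either.

So where should it go? Clearly it should be a separate step outside our frame loop; in other words, it should run in parallel with frame rendering. At the same time, frame rendering needs these resources. So although the two run in parallel, in some respects they constrain each other, i.e. they have a serial relationship.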

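Another lesson from the side quest is that creating these resources is not the same thing as binding them. The GPU accesses drawing resources through descriptor lookup tables, whose layout DX12 declares in the RootSignature. This is also clearly reflected in newer versions of OpenGL, and especially in Vulkan (although, for a change of pace, our side quest hasn't gotten there yet).

Three consequences follow. First, resources not registered in any descriptor table can be loaded, unloaded, and modified at any time. Second, resources registered in a descriptor table that has not yet been submitted to the GPU can also be modified (this implies we have several descriptor tables). Finally, resources registered in the table the GPU is currently using can still be changed, either before the GPU uses them or after it has finished with them. Replacing a descriptor is in many cases just replacing an address, which is a fast operation.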
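The three cases above can be illustrated with a simplified model. In this hypothetical sketch, descriptors are plain 64-bit GPU addresses rather than real D3D12 handles, and there is one descriptor table per frame in flight (`FrameTables`, `kFrameCount` and `kTableSize` are illustrative names). The CPU only writes the table the GPU is not reading, and switching tables is just changing an index. A real renderer would additionally wait on a Fence before reusing a table.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFrameCount = 2;   // frames in flight
constexpr std::uint32_t kTableSize  = 4;   // descriptors per table

using DescriptorTable = std::array<std::uint64_t, kTableSize>;   // "descriptor" = address

struct FrameTables {
    std::array<DescriptorTable, kFrameCount> tables{};
    std::uint32_t gpuFrame = 0;   // index of the table the GPU is reading

    // Safe to edit: the table for the next frame, not the one in use.
    DescriptorTable& Writable() { return tables[(gpuFrame + 1) % kFrameCount]; }

    // Switching tables only changes an index, like binding a different table.
    void Flip() { gpuFrame = (gpuFrame + 1) % kFrameCount; }

    const DescriptorTable& InUse() const { return tables[gpuFrame]; }
};
```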

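So, to work this way, we clearly need at least:

1. A module that manages all descriptors and the resources (buffers) they represent. In our top-level design, the best fit is (8. Memory management module)

2. An execution model that can drive each module at its own arbitrary rate. It should run the modules in parallel most of the time and let them cooperate (serially) when needed. This execution model should also be easy to debug and reduce the chance of race conditions (9. Driver module)

In fact, the issues discussed above are one of the main reasons graphics APIs have evolved into what they are today. Earlier graphics APIs were simple and easy to use, but they completely encapsulated memory management and the execution model, leaving applications little control in these areas. The newest APIs expose these to the application, so it can optimize deeply according to its own needs.

Memory management and the execution model are also the soul of an operating system, so we can borrow from OS design experience here. Fortunately, a game engine faces a far more specific and limited range of possibilities than an operating system. We therefore don't have to make things overly complicated and can apply more targeted optimizations.

In the next article we will sort out and implement memory management in detail.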

— (EOF)–


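Hand-Coding a Next-Gen Game Engine from Scratch (16)

In the previous article we upgraded our graphics interface from DX11 to DX12, but we could still only draw a triangle. In this article we continue improving the code so that it can draw a torus (a doughnut).

We don't have a file I/O module yet, so to keep the code as simple as possible we generate the torus vertex data procedurally at run time. This requires some linear algebra, so we bring back the math library we used earlier for drawing the cube. We also define a class called SimpleMesh to store the generated model data.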

#include "DirectXMath.h"
#include "Mesh.h"

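We also need to allocate more heaps to hold our vertex data. Since we will use textures and animation too, we need heaps for textures (including samplers) and for animation data (constants). First, we add global variables to hold this configuration.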

ComPtr<ID3D12DescriptorHeap>    g_pDsvHeap;                         // descriptor heap for depth stencil views (DSV)
ComPtr<ID3D12DescriptorHeap>    g_pCbvSrvHeap;                      // descriptor heap for constant buffer and shader resource views (CBV/SRV)
ComPtr<ID3D12DescriptorHeap>    g_pSamplerHeap;                     // descriptor heap for samplers
ComPtr<ID3D12Resource>          g_pIndexBuffer;                     // the pointer to the index buffer
D3D12_INDEX_BUFFER_VIEW         g_IndexBufferView;                  // a view of the index buffer
ComPtr<ID3D12Resource>          g_pTextureBuffer;                   // the pointer to the texture buffer
ComPtr<ID3D12Resource>          g_pDepthStencilBuffer;              // the pointer to the depth stencil buffer
ComPtr<ID3D12Resource>          g_pConstantUploadBuffer;            // the pointer to the constant buffer (in an upload heap)

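Creating the 3D model data

We are going to add lighting, computing local illumination from the surface normals. First we modify our vertex structure to include normal and tangent data. Because we will also apply a texture, the vertex needs space for the texture coordinates (uv) as well.

A torus is generated by combining a vector rotating around the Z axis with a vector rotating around the Y axis, so we generate it with a nested loop.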

Torus – Wikipedia
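For reference, the composition of the two rotations in `BuildTorusMesh` can be written out without DirectXMath. This is a sketch: `Float3` and `TorusPoint` are hypothetical names. The sign of `z` assumes DirectXMath's row-vector rotation convention; flipping it produces the same surface.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// A circle of radius innerRadius (angle innerTheta, around Y) is offset by
// outerRadius and swept around the Z axis (angle outerTheta).
inline Float3 TorusPoint(float outerRadius, float innerRadius,
                         float outerTheta, float innerTheta)
{
    const float ring = outerRadius + innerRadius * std::cos(innerTheta);   // distance from Z axis
    return { ring * std::cos(outerTheta),
             ring * std::sin(outerTheta),
             -innerRadius * std::sin(innerTheta) };
}
```

Every generated point lies at distance `innerRadius` from the ring circle of radius `outerRadius`, which is exactly the defining property of a torus.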

struct SimpleMeshVertex
{
    XMFLOAT3    m_position;
    XMFLOAT3    m_normal;
    XMFLOAT4    m_tangent;
    XMFLOAT2    m_uv;
};

SimpleMesh torus;
void BuildTorusMesh(
                float outerRadius, float innerRadius, 
                uint16_t outerQuads, uint16_t innerQuads, 
                float outerRepeats, float innerRepeats,
                SimpleMesh* pDestMesh) 
{
    const uint32_t outerVertices = outerQuads + 1;
    const uint32_t innerVertices = innerQuads + 1;
    const uint32_t vertices = outerVertices * innerVertices;
    const uint32_t numInnerQuadsFullStripes = 1;
    const uint32_t innerQuadsLastStripe = 0;
    const uint32_t triangles = 2 * outerQuads * innerQuads; // 2 triangles per quad

    pDestMesh->m_vertexCount            = vertices;
    pDestMesh->m_vertexStride           = sizeof(SimpleMeshVertex);
    pDestMesh->m_vertexAttributeCount   = kVertexElemCount;
    pDestMesh->m_vertexBufferSize       = pDestMesh->m_vertexCount * pDestMesh->m_vertexStride;

    pDestMesh->m_indexCount             = triangles * 3;            // 3 vertices per triangle
    pDestMesh->m_indexType              = IndexSize::kIndexSize16;  // whenever possible, use smaller index 
                                                                    // to save memory and enhance cache performance.
    pDestMesh->m_primitiveType          = PrimitiveType::kPrimitiveTypeTriList;
    pDestMesh->m_indexBufferSize        = pDestMesh->m_indexCount * sizeof(uint16_t);

    // build vertices 
    pDestMesh->m_vertexBuffer = new uint8_t[pDestMesh->m_vertexBufferSize];

    SimpleMeshVertex* outV = static_cast<SimpleMeshVertex*>(pDestMesh->m_vertexBuffer);
    const XMFLOAT2 textureScale = XMFLOAT2(outerRepeats / (outerVertices - 1.0f), innerRepeats / (innerVertices - 1.0f));
    for (uint32_t o = 0; o < outerVertices; ++o)
    {
        const float outerTheta = o * 2 * XM_PI / (outerVertices - 1);
        const XMMATRIX outerToWorld = XMMatrixTranslation(outerRadius, 0, 0) * XMMatrixRotationZ(outerTheta);

        for (uint32_t i = 0; i < innerVertices; ++i)
        {
            const float innerTheta = i * 2 * XM_PI / (innerVertices - 1);
            const XMMATRIX innerToOuter = XMMatrixTranslation(innerRadius, 0, 0) * XMMatrixRotationY(innerTheta);
            const XMMATRIX localToWorld = innerToOuter * outerToWorld;
            XMVECTOR v = XMVectorSet(0.0f, 0.0f, 0.0f, 1.0f);
            v = XMVector4Transform(v, localToWorld);
            XMStoreFloat3(&outV->m_position, v);
            v = XMVectorSet(1.0f, 0.0f, 0.0f, 0.0f);
            v = XMVector4Transform(v, localToWorld);
            XMStoreFloat3(&outV->m_normal, v);
            v = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);
            v = XMVector4Transform(v, localToWorld);
            XMStoreFloat4(&outV->m_tangent, v);
            outV->m_uv.x = o * textureScale.x;
            outV->m_uv.y = i * textureScale.y;
            ++outV;
        }
    }

    // build indices
    pDestMesh->m_indexBuffer = new uint8_t[pDestMesh->m_indexBufferSize];

    uint16_t* outI = static_cast<uint16_t*>(pDestMesh->m_indexBuffer);
    uint16_t const numInnerQuadsStripes = numInnerQuadsFullStripes + (innerQuadsLastStripe > 0 ? 1 : 0);
    for (uint16_t iStripe = 0; iStripe < numInnerQuadsStripes; ++iStripe)
    {
        uint16_t const innerVertex0 = iStripe * innerQuads;

        for (uint16_t o = 0; o < outerQuads; ++o)
        {
            for (uint16_t i = 0; i < innerQuads; ++i)
            {
                const uint16_t index[4] = {
                    static_cast<uint16_t>((o + 0) * innerVertices + innerVertex0 + (i + 0)),
                    static_cast<uint16_t>((o + 0) * innerVertices + innerVertex0 + (i + 1)),
                    static_cast<uint16_t>((o + 1) * innerVertices + innerVertex0 + (i + 0)),
                    static_cast<uint16_t>((o + 1) * innerVertices + innerVertex0 + (i + 1)),
                };
                outI[0] = index[0];
                outI[1] = index[2];
                outI[2] = index[1];
                outI[3] = index[1];
                outI[4] = index[2];
                outI[5] = index[3];
                outI += 6;
            }
        }
    }
}
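The mesh uses 16-bit indices, so it is worth checking that the vertex count stays within range. The hypothetical compile-time helpers below (`TorusCounts`, `CountTorus` and `Fits16BitIndices`) mirror the counts computed in `BuildTorusMesh`.

```cpp
#include <cassert>
#include <cstdint>

struct TorusCounts {
    std::uint32_t vertices;
    std::uint32_t indices;
};

constexpr TorusCounts CountTorus(std::uint32_t outerQuads, std::uint32_t innerQuads)
{
    // (quads + 1) vertices per ring because the seam vertices are duplicated;
    // 2 triangles per quad, 3 indices per triangle
    return { (outerQuads + 1) * (innerQuads + 1), 2 * outerQuads * innerQuads * 3 };
}

constexpr bool Fits16BitIndices(TorusCounts c)
{
    return c.vertices <= 65536u;   // the largest index is vertices - 1
}
```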

Next we need to tell the GPU about our new vertex layout. We do this by updating the corresponding input element descriptions (the input layout).

// create the input layout object
    D3D12_INPUT_ELEMENT_DESC ied[] =
    {
        {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
        {"NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
        {"TANGENT", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 24, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
        {"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 40, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
    };
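The `AlignedByteOffset` values 0, 12, 24 and 40 follow from the member sizes of `SimpleMeshVertex`. The check below uses a plain-float mirror struct, `VertexMirror` (a hypothetical name). It assumes `XMFLOAT3`, `XMFLOAT4` and `XMFLOAT2` are tightly packed floats with no extra alignment. D3D12 also accepts `D3D12_APPEND_ALIGNED_ELEMENT` in place of hand-computed offsets.

```cpp
#include <cassert>
#include <cstddef>

// Same member order and sizes as SimpleMeshVertex.
struct VertexMirror {
    float position[3];   // XMFLOAT3
    float normal[3];     // XMFLOAT3
    float tangent[4];    // XMFLOAT4
    float uv[2];         // XMFLOAT2
};

static_assert(offsetof(VertexMirror, position) == 0,  "POSITION offset");
static_assert(offsetof(VertexMirror, normal)   == 12, "NORMAL offset");
static_assert(offsetof(VertexMirror, tangent)  == 24, "TANGENT offset");
static_assert(offsetof(VertexMirror, uv)       == 40, "TEXCOORD offset");
static_assert(sizeof(VertexMirror) == 48, "vertex stride");
```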

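We also need to set aside a buffer in the heaps we requested for passing the vertex index data to the GPU. In DX12 these buffers are always created in a very similar way.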

// create index buffer
    {
       ThrowIfFailed(g_pDev->CreateCommittedResource(
           &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
           D3D12_HEAP_FLAG_NONE,
           &CD3DX12_RESOURCE_DESC::Buffer(torus.m_indexBufferSize),
           D3D12_RESOURCE_STATE_COPY_DEST,
           nullptr,
           IID_PPV_ARGS(&g_pIndexBuffer)));
    
       ThrowIfFailed(g_pDev->CreateCommittedResource(
           &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
           D3D12_HEAP_FLAG_NONE,
           &CD3DX12_RESOURCE_DESC::Buffer(torus.m_indexBufferSize),
           D3D12_RESOURCE_STATE_GENERIC_READ,
           nullptr,
           IID_PPV_ARGS(&pIndexBufferUploadHeap)));
    
       // Copy data to the intermediate upload heap and then schedule a copy 
       // from the upload heap to the index buffer.
       D3D12_SUBRESOURCE_DATA indexData = {};
       indexData.pData      = torus.m_indexBuffer;
       indexData.RowPitch   = torus.m_indexBufferSize;   // for a buffer: its total size in bytes
       indexData.SlicePitch = indexData.RowPitch;
    
       UpdateSubresources<1>(g_pCommandList.Get(), g_pIndexBuffer.Get(), pIndexBufferUploadHeap.Get(), 0, 0, 1, &indexData);
       g_pCommandList->ResourceBarrier(1, 
                       &CD3DX12_RESOURCE_BARRIER::Transition(g_pIndexBuffer.Get(),
                               D3D12_RESOURCE_STATE_COPY_DEST,
                               D3D12_RESOURCE_STATE_INDEX_BUFFER));
    
       // initialize the index buffer view
       g_IndexBufferView.BufferLocation = g_pIndexBuffer->GetGPUVirtualAddress();
       g_IndexBufferView.Format         = DXGI_FORMAT_R16_UINT;
       g_IndexBufferView.SizeInBytes    = torus.m_indexBufferSize;
    }

Finally, when recording the drawing commands, we tell the GPU where our index buffer is.

g_pCommandList->IASetIndexBuffer(&g_IndexBufferView);

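Creating the constants

To animate our model we need constants, namely the MVP matrices and the lighting data. First we define the layout of the constants.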

struct SimpleConstants
{
    XMFLOAT4X4  m_modelView;
    XMFLOAT4X4  m_modelViewProjection;
    XMFLOAT4    m_lightPosition;
    XMFLOAT4    m_lightColor;
    XMFLOAT4    m_ambientColor;
    XMFLOAT4    m_lightAttenuation;
};

uint8_t*    g_pCbvDataBegin = nullptr;
SimpleConstants g_ConstantBufferData;

Next we define some global variables to hold the key transformation matrices needed to compute the MVP, and write a function to initialize them.

XMMATRIX g_mWorldToViewMatrix;
XMMATRIX g_mViewToWorldMatrix;
XMMATRIX g_mLightToWorldMatrix;
XMMATRIX g_mProjectionMatrix;
XMMATRIX g_mViewProjectionMatrix;

void InitConstants() {
    g_mViewToWorldMatrix = XMMatrixIdentity();
    const XMVECTOR lightPositionX = XMVectorSet(-1.5f, 4.0f, 9.0f, 1.0f);
    const XMVECTOR lightTargetX   = XMVectorSet( 0.0f, 0.0f, 0.0f, 1.0f);
    const XMVECTOR lightUpX       = XMVectorSet( 0.0f, 1.0f, 0.0f, 0.0f);
    g_mLightToWorldMatrix = XMMatrixInverse(nullptr, XMMatrixLookAtRH(lightPositionX, lightTargetX, lightUpX));

    const float g_depthNear = 1.0f;
    const float g_depthFar  = 100.0f;
    const float aspect      = static_cast<float>(nScreenWidth)/static_cast<float>(nScreenHeight);
    g_mProjectionMatrix  = XMMatrixPerspectiveOffCenterRH(-aspect, aspect, -1, 1, g_depthNear, g_depthFar);
    const XMVECTOR eyePos         = XMVectorSet( 0.0f, 0.0f, 2.5f, 1.0f);
    const XMVECTOR lookAtPos      = XMVectorSet( 0.0f, 0.0f, 0.0f, 1.0f);
    const XMVECTOR upVec          = XMVectorSet( 0.0f, 1.0f, 0.0f, 0.0f);
    g_mWorldToViewMatrix = XMMatrixLookAtRH(eyePos, lookAtPos, upVec);
    g_mViewToWorldMatrix = XMMatrixInverse(nullptr, g_mWorldToViewMatrix);

    g_mViewProjectionMatrix = g_mWorldToViewMatrix * g_mProjectionMatrix;
}

We must update these constants every frame, i.e. change the model's orientation, to produce the animation.

// this is the function used to update the constants
void Update()
{
    const float rotationSpeed = XM_PI * 2.0f / 120;   // one full turn every 120 frames
    static float rotationAngle = 0.0f;
    
    rotationAngle += rotationSpeed;
    if (rotationAngle >= XM_PI * 2.0f) rotationAngle -= XM_PI * 2.0f;
    const XMMATRIX m = XMMatrixRotationRollPitchYaw(rotationAngle, rotationAngle, 0.0f);
    XMStoreFloat4x4(&g_ConstantBufferData.m_modelView, XMMatrixTranspose(m * g_mWorldToViewMatrix));
    XMStoreFloat4x4(&g_ConstantBufferData.m_modelViewProjection, XMMatrixTranspose(m * g_mViewProjectionMatrix));
    XMVECTOR v = XMVectorSet(0.0f, 0.0f, 0.0f, 1.0f);
    v = XMVector4Transform(v, g_mLightToWorldMatrix);
    v = XMVector4Transform(v, g_mWorldToViewMatrix);
    XMStoreFloat4(&g_ConstantBufferData.m_lightPosition, v);
    g_ConstantBufferData.m_lightColor       = XMFLOAT4(1.0f, 1.0f, 1.0f, 1.0f);
    g_ConstantBufferData.m_ambientColor     = XMFLOAT4(0.0f, 0.0f, 0.7f, 1.0f);
    g_ConstantBufferData.m_lightAttenuation = XMFLOAT4(1.0f, 0.0f, 0.0f, 0.0f);
    
    memcpy(g_pCbvDataBegin, &g_ConstantBufferData, sizeof(g_ConstantBufferData));
}

Likewise, we need a buffer visible to both the CPU and the GPU for passing these constants to the GPU.

    // Create the constant buffer.
    {
        size_t sizeConstantBuffer = (sizeof(SimpleConstants) + 255) & ~255; // CB size is required to be 256-byte aligned.
        ThrowIfFailed(g_pDev->CreateCommittedResource(
            &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
            D3D12_HEAP_FLAG_NONE,
            &CD3DX12_RESOURCE_DESC::Buffer(sizeConstantBuffer),
            D3D12_RESOURCE_STATE_GENERIC_READ,
            nullptr,
            IID_PPV_ARGS(&g_pConstantUploadBuffer)));

        for (uint32_t i = 0; i < nFrameCount; i++)
        {
            CD3DX12_CPU_DESCRIPTOR_HANDLE cbvHandle(g_pCbvSrvHeap->GetCPUDescriptorHandleForHeapStart(), i + 1, g_nCbvSrvDescriptorSize);
            // Describe and create a constant buffer view.
            D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
            cbvDesc.BufferLocation = g_pConstantUploadBuffer->GetGPUVirtualAddress();
            cbvDesc.SizeInBytes = static_cast<UINT>(sizeConstantBuffer);
            g_pDev->CreateConstantBufferView(&cbvDesc, cbvHandle);
        }

        // Map and initialize the constant buffer. We don't unmap this until the
        // app closes. Keeping things mapped for the lifetime of the resource is okay.
        CD3DX12_RANGE readRange(0, 0);      // We do not intend to read from this resource on the CPU.
        ThrowIfFailed(g_pConstantUploadBuffer->Map(0, &readRange, reinterpret_cast<void**>(&g_pCbvDataBegin)));
    }

// 1 SRV + how many CBVs we have
    uint32_t nFrameResourceDescriptorOffset = 1 + g_nFrameIndex;
    CD3DX12_GPU_DESCRIPTOR_HANDLE cbvSrvHandle(g_pCbvSrvHeap->GetGPUDescriptorHandleForHeapStart(), nFrameResourceDescriptorOffset, g_nCbvSrvDescriptorSize);

    g_pCommandList->SetGraphicsRootDescriptorTable(2, cbvSrvHandle);
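The `(size + 255) & ~255` expression rounds up to the 256-byte granularity D3D12 requires for constant buffer views. Note that all the per-frame CBVs above point at the same address. A hypothetical variation gives each frame its own 256-byte-aligned slot in one upload buffer. The CPU could then write the next frame's constants while the GPU may still be reading the current ones. `AlignConstantBufferSize` and `FrameSlotOffset` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCbAlignment = 256;   // D3D12 CBV size/placement granularity

constexpr std::size_t AlignConstantBufferSize(std::size_t size)
{
    return (size + kCbAlignment - 1) & ~(kCbAlignment - 1);
}

// Byte offset of frame `frameIndex`'s slot inside a shared upload buffer.
constexpr std::uint64_t FrameSlotOffset(std::uint32_t frameIndex, std::size_t constantsSize)
{
    return static_cast<std::uint64_t>(frameIndex) * AlignConstantBufferSize(constantsSize);
}
```

SimpleConstants is 2 × 64 + 4 × 16 = 192 bytes, so each slot would occupy 256 bytes.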

Creating the texture

Next, let's create a texture. Here we create a basic 512×512 chessboard (checkerboard) texture.

const uint32_t nTextureWidth = 512;
const uint32_t nTextureHeight = 512;
const uint32_t nTexturePixelSize = 4;       // R8G8B8A8

// Generate a simple black and white checkerboard texture.
uint8_t* GenerateTextureData()
{
    const uint32_t nRowPitch = nTextureWidth * nTexturePixelSize;
    const uint32_t nCellPitch = nRowPitch >> 3;         // The width of a cell in the checkerboard texture (in bytes).
    const uint32_t nCellHeight = nTextureHeight >> 3;   // The height of a cell in the checkerboard texture (in rows).
    const uint32_t nTextureSize = nRowPitch * nTextureHeight;
    uint8_t* pData = new uint8_t[nTextureSize];         // the caller owns this buffer (delete[])

    for (uint32_t n = 0; n < nTextureSize; n += nTexturePixelSize)
    {
        uint32_t x = n % nRowPitch;
        uint32_t y = n / nRowPitch;
        uint32_t i = x / nCellPitch;
        uint32_t j = y / nCellHeight;

        if (i % 2 == j % 2)
        {
            pData[n] = 0x00;        // R
            pData[n + 1] = 0x00;    // G
            pData[n + 2] = 0x00;    // B
            pData[n + 3] = 0xff;    // A
        }
        else
        {
            pData[n] = 0xff;        // R
            pData[n + 1] = 0xff;    // G
            pData[n + 2] = 0xff;    // B
            pData[n + 3] = 0xff;    // A
        }
    }

    return pData;
}

Then, again, we create a buffer on a heap to pass the texture to the GPU.

// Generate the texture
    uint8_t* pTextureData = GenerateTextureData();

    // Create the texture and sampler.
    {
        // Describe and create a Texture2D.
        D3D12_RESOURCE_DESC textureDesc = {};
        textureDesc.MipLevels = 1;
        textureDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        textureDesc.Width = nTextureWidth;
        textureDesc.Height = nTextureHeight;
        textureDesc.Flags = D3D12_RESOURCE_FLAG_NONE;
        textureDesc.DepthOrArraySize = 1;
        textureDesc.SampleDesc.Count = 1;
        textureDesc.SampleDesc.Quality = 0;
        textureDesc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;

        ThrowIfFailed(g_pDev->CreateCommittedResource(
            &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
            D3D12_HEAP_FLAG_NONE,
            &textureDesc,
            D3D12_RESOURCE_STATE_COPY_DEST,
            nullptr,
            IID_PPV_ARGS(&g_pTextureBuffer)));

        const UINT subresourceCount = textureDesc.DepthOrArraySize * textureDesc.MipLevels;
        const UINT64 uploadBufferSize = GetRequiredIntermediateSize(g_pTextureBuffer.Get(), 0, subresourceCount);

        ThrowIfFailed(g_pDev->CreateCommittedResource(
            &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
            D3D12_HEAP_FLAG_NONE,
            &CD3DX12_RESOURCE_DESC::Buffer(uploadBufferSize),
            D3D12_RESOURCE_STATE_GENERIC_READ,
            nullptr,
            IID_PPV_ARGS(&pTextureUploadHeap)));

        // Copy data to the intermediate upload heap and then schedule a copy 
        // from the upload heap to the Texture2D.
        D3D12_SUBRESOURCE_DATA textureData = {};
        textureData.pData = pTextureData;
        textureData.RowPitch = nTextureWidth * nTexturePixelSize;
        textureData.SlicePitch = textureData.RowPitch * nTextureHeight;

        UpdateSubresources(g_pCommandList.Get(), g_pTextureBuffer.Get(), pTextureUploadHeap.Get(), 0, 0, subresourceCount, &textureData);
        g_pCommandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(g_pTextureBuffer.Get(), D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));

        // Describe and create a sampler.
        D3D12_SAMPLER_DESC samplerDesc = {};
        samplerDesc.Filter = D3D12_FILTER_MIN_MAG_MIP_LINEAR;
        samplerDesc.AddressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP;
        samplerDesc.AddressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP;
        samplerDesc.AddressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP;
        samplerDesc.MinLOD = 0;
        samplerDesc.MaxLOD = D3D12_FLOAT32_MAX;
        samplerDesc.MipLODBias = 0.0f;
        samplerDesc.MaxAnisotropy = 1;
        samplerDesc.ComparisonFunc = D3D12_COMPARISON_FUNC_ALWAYS;
        g_pDev->CreateSampler(&samplerDesc, g_pSamplerHeap->GetCPUDescriptorHandleForHeapStart());

        // Describe and create a SRV for the texture.
        D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
        srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
        srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
        srvDesc.Texture2D.MipLevels = 1;
        CD3DX12_CPU_DESCRIPTOR_HANDLE srvHandle(g_pCbvSrvHeap->GetCPUDescriptorHandleForHeapStart());
        g_pDev->CreateShaderResourceView(g_pTextureBuffer.Get(), &srvDesc, srvHandle);
    }
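Under the hood, UpdateSubresources (a d3dx12.h helper) copies our tightly packed rows into the upload heap. For buffer-to-texture copies, D3D12 requires every row in the source buffer to start at a multiple of D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes). That is one reason the size returned by GetRequiredIntermediateSize can be larger than width × height × pixel size. Below is a minimal standalone sketch of that row repacking, assuming 4-byte RGBA8 pixels. AlignUp and CopyRowsToPitchedBuffer are hypothetical helpers, not the actual d3dx12.h code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same value as D3D12_TEXTURE_DATA_PITCH_ALIGNMENT in d3d12.h.
constexpr std::size_t kPitchAlignment = 256;

// Round value up to the next multiple of alignment (alignment must be a power of two).
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Copy 'height' tightly packed rows of 'rowBytes' bytes each into a staging
// buffer whose rows start every AlignUp(rowBytes, kPitchAlignment) bytes.
// Padding bytes are left as zero.
std::vector<std::uint8_t> CopyRowsToPitchedBuffer(const std::uint8_t* src,
                                                  std::size_t rowBytes,
                                                  std::size_t height)
{
    const std::size_t dstPitch = AlignUp(rowBytes, kPitchAlignment);
    std::vector<std::uint8_t> staging(dstPitch * height, 0);
    for (std::size_t y = 0; y < height; ++y)
    {
        std::memcpy(staging.data() + y * dstPitch, src + y * rowBytes, rowBytes);
    }
    return staging;
}
```

For example, a 100-pixel-wide RGBA8 row is 400 bytes, so in the upload buffer each row starts 512 bytes after the previous one.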

Updating the Root Signature

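The root signature is a concept introduced in DX12. It works much like a function declaration in a header file: it tells DX how the resources our shaders use are laid out. We have added two new kinds of resources, a texture and constants, so we need to update it.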

  CD3DX12_DESCRIPTOR_RANGE1 ranges[3];
        ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC);
        ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER, 1, 0);
        ranges[2].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 6, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC);

        CD3DX12_ROOT_PARAMETER1 rootParameters[3];
        rootParameters[0].InitAsDescriptorTable(1, &ranges[0], D3D12_SHADER_VISIBILITY_PIXEL);
        rootParameters[1].InitAsDescriptorTable(1, &ranges[1], D3D12_SHADER_VISIBILITY_PIXEL);
        rootParameters[2].InitAsDescriptorTable(1, &ranges[2], D3D12_SHADER_VISIBILITY_ALL);
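The ranges above declare one SRV starting at shader register t0, one sampler at s0, and six CBVs at b0 through b5. The Init arguments are range type, descriptor count, base shader register, and so on. When the PSO is created, every register the shaders actually use must be covered by the root signature, or creation fails. The toy model below checks only that coverage and ignores register spaces and table offsets. RangeModel and IsBound are hypothetical names, not D3D12 API.

```cpp
#include <cassert>
#include <vector>

// Toy model of a descriptor range: register class ('t' = SRV, 's' = sampler,
// 'b' = CBV), number of descriptors, and first shader register.
struct RangeModel
{
    char     registerClass;
    unsigned numDescriptors;
    unsigned baseRegister;
};

// True if some range covers register 'index' of the given class, i.e. the
// shader's use of that register is "declared" by the root signature.
bool IsBound(const std::vector<RangeModel>& ranges, char registerClass, unsigned index)
{
    for (const RangeModel& r : ranges)
    {
        if (r.registerClass == registerClass &&
            index >= r.baseRegister &&
            index < r.baseRegister + r.numDescriptors)
        {
            return true;
        }
    }
    return false;
}
```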

Creating Heaps for the New Resource Types

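The descriptors for all graphics resources must be created in descriptor heaps. Each kind of descriptor has its own requirements for how its heap is managed (including memory read/write protection, executable flags, and which GPU units access it), so we create a separate heap for each kind. (The texture SRV and the constant buffer CBVs can share one CBV/SRV/UAV heap. Samplers and depth-stencil views need heaps of their own type.)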

// Describe and create a depth stencil view (DSV) descriptor heap.
    D3D12_DESCRIPTOR_HEAP_DESC dsvHeapDesc = {};
    dsvHeapDesc.NumDescriptors = 1;
    dsvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV;
    dsvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    ThrowIfFailed(g_pDev->CreateDescriptorHeap(&dsvHeapDesc, IID_PPV_ARGS(&g_pDsvHeap)));

    // Describe and create a shader resource view (SRV) and constant 
    // buffer view (CBV) descriptor heap.
    D3D12_DESCRIPTOR_HEAP_DESC cbvSrvHeapDesc = {};
    cbvSrvHeapDesc.NumDescriptors =
        nFrameCount                                     // FrameCount Cbvs.
        + 1;                                            // + 1 for the Srv(Texture).
    cbvSrvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    cbvSrvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
    ThrowIfFailed(g_pDev->CreateDescriptorHeap(&cbvSrvHeapDesc, IID_PPV_ARGS(&g_pCbvSrvHeap)));

    // Describe and create a sampler descriptor heap.
    D3D12_DESCRIPTOR_HEAP_DESC samplerHeapDesc = {};
    samplerHeapDesc.NumDescriptors = 1;
    samplerHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER;
    samplerHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
    ThrowIfFailed(g_pDev->CreateDescriptorHeap(&samplerHeapDesc, IID_PPV_ARGS(&g_pSamplerHeap)));

    g_nCbvSrvDescriptorSize = g_pDev->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

    ThrowIfFailed(g_pDev->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&g_pCommandAllocator)));
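A descriptor heap is an array of fixed-size descriptors. The size of one descriptor depends on the hardware and the heap type, which is why the code queries GetDescriptorHandleIncrementSize instead of hard-coding it. Getting the handle of the n-th descriptor is plain pointer arithmetic. The sketch below mirrors what the three-argument CD3DX12_CPU_DESCRIPTOR_HANDLE constructor computes, for example when rtvHandle selects the current back buffer. CpuHandle and OffsetHandle are hypothetical stand-ins, and the increment sizes used in the example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a SIZE_T 'ptr'.
struct CpuHandle { std::size_t ptr; };

// Handle of descriptor 'index' in a heap: start + index * incrementSize.
CpuHandle OffsetHandle(CpuHandle heapStart, int index, std::uint32_t incrementSize)
{
    return CpuHandle{ heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```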

Creating the Depth/Stencil Buffer

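We need depth testing for hidden-surface removal. So we create a 32-bit float depth buffer, bind it together with the render target, and clear it to 1.0 before drawing.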

	// Create the depth stencil view.
	{
		D3D12_DEPTH_STENCIL_VIEW_DESC depthStencilDesc = {};
		depthStencilDesc.Format = DXGI_FORMAT_D32_FLOAT;
		depthStencilDesc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2D;
		depthStencilDesc.Flags = D3D12_DSV_FLAG_NONE;

		D3D12_CLEAR_VALUE depthOptimizedClearValue = {};
		depthOptimizedClearValue.Format = DXGI_FORMAT_D32_FLOAT;
		depthOptimizedClearValue.DepthStencil.Depth = 1.0f;
		depthOptimizedClearValue.DepthStencil.Stencil = 0;

		ThrowIfFailed(g_pDev->CreateCommittedResource(
			&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
			D3D12_HEAP_FLAG_NONE,
			&CD3DX12_RESOURCE_DESC::Tex2D(DXGI_FORMAT_D32_FLOAT, nScreenWidth, nScreenHeight, 1, 0, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL),
			D3D12_RESOURCE_STATE_DEPTH_WRITE,
			&depthOptimizedClearValue,
			IID_PPV_ARGS(&g_pDepthStencilBuffer)
			));

		g_pDev->CreateDepthStencilView(g_pDepthStencilBuffer.Get(), &depthStencilDesc, g_pDsvHeap->GetCPUDescriptorHandleForHeapStart());
	}
CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(g_pRtvHeap->GetCPUDescriptorHandleForHeapStart(), g_nFrameIndex, g_nRtvDescriptorSize);
    CD3DX12_CPU_DESCRIPTOR_HANDLE dsvHandle(g_pDsvHeap->GetCPUDescriptorHandleForHeapStart());
    g_pCommandList->OMSetRenderTargets(1, &rtvHandle, FALSE, &dsvHandle);
g_pCommandList->ClearDepthStencilView(g_pDsvHeap->GetCPUDescriptorHandleForHeapStart(), D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);
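What the depth buffer does can be sketched on the CPU. Each pixel remembers the depth of the nearest surface drawn so far, and a new fragment is kept only if it passes the comparison. The sketch assumes the LESS comparison, which is the default depth function in CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT), and a buffer cleared to 1.0 as in the call above. DepthColorBuffer is a hypothetical type. On the GPU the same test runs in fixed-function hardware, and it only takes effect if the pipeline state object enables depth testing (DepthEnable) and sets DSVFormat to match the depth buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal software depth buffer: one float per pixel plus the color that won.
struct DepthColorBuffer
{
    std::size_t width, height;
    std::vector<float> depth;
    std::vector<std::uint32_t> color;

    DepthColorBuffer(std::size_t w, std::size_t h)
        : width(w), height(h), depth(w * h, 1.0f), color(w * h, 0) {}

    // Counterpart of clearing with D3D12_CLEAR_FLAG_DEPTH.
    void ClearDepth(float value) { depth.assign(depth.size(), value); }

    // LESS test: keep the fragment only if it is nearer than what is stored,
    // so nearer surfaces hide farther ones regardless of draw order.
    bool WriteFragment(std::size_t x, std::size_t y, float z, std::uint32_t rgba)
    {
        const std::size_t i = y * width + x;
        if (z < depth[i])
        {
            depth[i] = z;
            color[i] = rgba;
            return true;
        }
        return false;
    }
};
```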

Drawing

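Finally, we update the draw call so that vertices are drawn in the order given by the index buffer rather than in the order they appear in the vertex buffer.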

  g_pCommandList->DrawIndexedInstanced(torus.m_indexCount, 1, 0, 0, 0);
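With an index buffer, each vertex of the torus is stored once, and every triangle that shares it refers to it by index. This shrinks the vertex buffer and lets the GPU's post-transform vertex cache reuse results. The sketch below shows which vertices an indexed draw fetches. IndexedFetchOrder is a hypothetical helper. The sketch assumes an index buffer is bound with IASetIndexBuffer elsewhere in the code, which this excerpt does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertices fetched by an indexed draw: for i in [0, indexCount),
// fetched vertex = indices[startIndex + i] + baseVertex. indexCount,
// startIndex and baseVertex correspond to the IndexCountPerInstance,
// StartIndexLocation and BaseVertexLocation arguments of DrawIndexedInstanced.
std::vector<std::int64_t> IndexedFetchOrder(const std::vector<std::uint32_t>& indices,
                                            std::size_t indexCount,
                                            std::size_t startIndex,
                                            std::int32_t baseVertex)
{
    assert(startIndex + indexCount <= indices.size());
    std::vector<std::int64_t> order;
    order.reserve(indexCount);
    for (std::size_t i = 0; i < indexCount; ++i)
    {
        order.push_back(static_cast<std::int64_t>(indices[startIndex + i]) + baseVertex);
    }
    return order;
}
```

For instance, a quad needs only 4 unique vertices plus 6 indices, because its two triangles share an edge.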

Shader

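We have changed the vertex data layout and introduced lighting, so our shaders must change too. For convenience, we can give the vertex shader (VS) and the pixel shader (PS) different entry-point functions instead of the default main. That way both shaders can live in one file.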

Create a new file named simple.hlsl with the following code:

#include "cbuffer2.h"
#include "vsoutput2.hs"
#include "illum.hs"

v2p VSMain(a2v input) {
    v2p output;

	output.Position = mul(float4(input.Position.xyz, 1), m_modelViewProjection);
	float3 vN = normalize(mul(float4(input.Normal, 0), m_modelView).xyz);
	float3 vT = normalize(mul(float4(input.Tangent.xyz, 0), m_modelView).xyz);
	output.vPosInView = mul(float4(input.Position.xyz, 1), m_modelView).xyz;

	output.vNorm = vN;
	output.vTang = float4(vT, input.Tangent.w);

	output.TextureUV = input.TextureUV;

	return output;
}

SamplerState samp0 : register(s0);
Texture2D colorMap : register(t0);

float4 PSMain(v2p input) : SV_TARGET
{
	float3 lightRgb = m_lightColor.xyz;
	float4 lightAtten = m_lightAttenuation;
	float3 ambientRgb = m_ambientColor.rgb;
	float  specPow = 30;

	const float3 vN = input.vNorm;
	const float3 vT = input.vTang.xyz;
	const float3 vB = input.vTang.w * cross(vN, vT);
	float3 vL = m_lightPosition.xyz - input.vPosInView;
	const float3 vV = normalize(float3(0,0,0) - input.vPosInView);
	float d = length(vL); vL = normalize(vL);
	float attenuation = saturate(1.0f/(lightAtten.x + lightAtten.y * d + lightAtten.z * d * d) - lightAtten.w);

	float4 normalGloss = { 1.0f, 0.2f, 0.2f, 0.0f };
	normalGloss.xyz = normalGloss.xyz * 2.0f - 1.0f;
	normalGloss.y = -normalGloss.y; // normal map has green channel inverted

	float3 vBumpNorm = normalize(normalGloss.x * vT + normalGloss.y * vB + normalGloss.z * vN);
	float3 vGeomNorm = normalize(vN);

	float3 diff_col = colorMap.Sample(samp0, input.TextureUV.xy).xyz;
	float3 spec_col = 0.4 * normalGloss.w + 0.1;
	float3 vLightInts = attenuation * lightRgb * BRDF2_ts_nphong_nofr(vBumpNorm, vGeomNorm, vL, vV, diff_col, spec_col, specPow);
	vLightInts += (diff_col * ambientRgb);

	return float4(vLightInts, 1);
}
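The attenuation line in PSMain is easy to check on the CPU. Below is a direct C++ transcription. It assumes m_lightAttenuation holds the constant, linear and quadratic coefficients in x, y, z and an offset in w. cbuffer2.h is not shown here, so this layout is inferred from how the shader uses the value. Subtracting w before clamping presumably lets the light fade to exactly zero at a finite distance instead of only approaching zero.

```cpp
#include <algorithm>
#include <cassert>

// C++ transcription of the attenuation line in PSMain:
//   saturate(1 / (a.x + a.y * d + a.z * d * d) - a.w)
// kc, kl, kq play the role of a.x, a.y, a.z; cutoff is a.w.
float Attenuation(float kc, float kl, float kq, float cutoff, float d)
{
    const float falloff = 1.0f / (kc + kl * d + kq * d * d) - cutoff;
    return std::min(std::max(falloff, 0.0f), 1.0f);  // HLSL saturate()
}
```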

Because we renamed the entry points, we must tell DX about the change. This happens where our code compiles the shaders at runtime:

    HRESULT hr = D3DCompileFromFile(
        L"simple.hlsl",
        nullptr,
        D3D_COMPILE_STANDARD_FILE_INCLUDE,
        "VSMain",
        "vs_5_0",
        compileFlags,
        0,
        &vertexShader,
        &error);
    // error is a ComPtr<ID3DBlob>: it releases itself, so no manual Release() here.
    // The blob holds ANSI text, hence OutputDebugStringA. Warnings are printed too.
    if (error) { OutputDebugStringA(static_cast<LPCSTR>(error->GetBufferPointer())); }
    ThrowIfFailed(hr);

    hr = D3DCompileFromFile(
        L"simple.hlsl",
        nullptr,
        D3D_COMPILE_STANDARD_FILE_INCLUDE,
        "PSMain",
        "ps_5_0",
        compileFlags,
        0,
        &pixelShader,
        &error);
    if (error) { OutputDebugStringA(static_cast<LPCSTR>(error->GetBufferPointer())); }
    ThrowIfFailed(hr);

If the shader code fails to compile, the compiler's error messages are written out through OutputDebugString. You can see this output when debugging in Visual Studio.

Building

The build commands are as follows:

Debug build

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>cl /EHsc /Debug /Zi /D_DEBUG_SHADER -I./DirectXMath helloengine_d3d12.cpp user32.lib d3d12.lib d3dcompiler.lib dxgi.lib

Release build

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>cl /EHsc -I./DirectXMath helloengine_d3d12.cpp user32.lib d3d12.lib d3dcompiler.lib dxgi.lib

(The complete code for this article is in the article_16 branch on GitHub.)

References

  1. Torus – Wikipedia
  2. Microsoft/DirectX-Graphics-Samples

This work is licensed under a Creative Commons Attribution 4.0 International License.

Hand-Coding a Next-Generation Game Engine from Scratch (15)

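In the previous article we used the OpenGL 4.0 interface on Windows to render a rotating cube. (Of course, this did not actually need any 4.0-level features. The graphics API version and the feature level are two different things.)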

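By now we should have a reasonable grasp of the basic rendering flow. We have not yet touched on texturing, lighting, tessellation or async compute, but the skeleton of the rendering pipeline is essentially in place.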

Next, let's look at the leading edge of graphics APIs: DirectX 12 and Vulkan. Since we are writing a next-generation engine, we need to consider using these newest APIs.

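We did not start with the newest APIs because learning is a process. Graphics APIs did not reach their current form in one step, and we need to understand how they got here. This not only eases the learning curve but also shows us where things are heading. Predicting that direction correctly is one of the key criteria for judging whether an architecture is good.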

My main reference for DX12 was [*1] Tutorial: Migrating Your Apps to DirectX* 12 – Part 1.

-- (Digression begins) --

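To my surprise, I came across Lv Wenwei's name there. He is the technical director of Snail Games (蜗牛). My very first assignment after joining SIE in 2014 was to go to Suzhou and support their port of 《九阳神功》 (Jiu Yang Shen Gong), even though, having just joined the company, I knew next to nothing about PS4 development.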

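Mr. Lv was refined and soft-spoken, and technically he seemed very capable. That story from three years ago ended with me calling in a Taiwanese colleague from SIE Japan (who was also my mentor at SIE) to save the day.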

-- (Digression ends) --

Let's start coding. I upgraded my code while working through [*2] Creating a basic Direct3D 12 component.

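Our code has grown fairly long, so from this article on I will no longer paste all of it. Please download the complete code from GitHub. The branch for this article is article_15.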

First, we replace the header files.

// include the basic windows header file
 #include <windows.h>
 #include <windowsx.h>
+#include <stdio.h>
 #include <tchar.h>
 #include <stdint.h>

-#include <d3d11.h>
-#include <d3d11_1.h>
+#include <d3d12.h>
+#include "d3dx12.h"
+#include <DXGI1_4.h>
 #include <d3dcompiler.h>
 #include <DirectXMath.h>
 #include <DirectXPackedVector.h>
 #include <DirectXColors.h>

+#include <wrl/client.h>
+
+#include <string>
+#include <exception>
+

We removed the DX11 headers and added the DX12 ones.

Starting with DirectX 10, Microsoft introduced DXGI (DirectX Graphics Infrastructure). [*3]

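DXGI relates to Direct3D somewhat the way an OpenGL loader relates to OpenGL. The former sets up the environment for rendering (enumerating adapters, creating swap chains), and the latter does the actual drawing.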

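d3dx12.h is a helper header. It is not part of DX12 itself; Microsoft distributes it through GitHub. Its purpose is to make DX12 easier to use and keep our code more concise. A copy is also included in the article_15 branch.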

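As for whether the file name in an #include goes inside <> or "": use <> for system or SDK headers, and "" for files that live in our own project. (With "", compilers typically search the directory of the including file first and then fall back to the <> search paths.) This is a small detail, but getting it wrong can occasionally cause strange problems.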

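The next header, wrl/client.h, belongs to WRL, the Windows Runtime C++ Template Library. It is another helper library, offering templates that follow common design patterns. Here we mainly use its ComPtr template, a smart pointer that manages the lifetime of COM interfaces for us.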

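That's right, COM again. In fact, the way OpenGL queries and binds API entry points at runtime, described earlier, is quite similar to COM. The central idea of COM is that every object exposes one well-known interface, IUnknown. Its QueryInterface method looks up the object's other interfaces, which makes it possible to discover and call entry points at runtime.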
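To make the IUnknown idea concrete, here is a toy, portable model of the pattern. One object implements several interfaces, QueryInterface hands out whichever one you ask for by ID, and a reference count decides when the object deletes itself. This is a sketch, not real COM: IUnknownLike, IDrawable and Triangle are hypothetical names, and strings and bools stand in for the 128-bit IIDs and HRESULTs that real COM uses.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Toy COM-style base interface.
struct IUnknownLike
{
    virtual bool QueryInterface(const std::string& iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;  // objects are destroyed via Release()
};

struct IDrawable : IUnknownLike
{
    virtual int Draw() = 0;
};

class Triangle final : public IDrawable
{
public:
    bool QueryInterface(const std::string& iid, void** ppv) override
    {
        if (iid == "IUnknown")       *ppv = static_cast<IUnknownLike*>(this);
        else if (iid == "IDrawable") *ppv = static_cast<IDrawable*>(this);
        else { *ppv = nullptr; return false; }
        AddRef();  // every interface handed out holds a reference
        return true;
    }
    unsigned long AddRef() override { return ++m_refCount; }
    unsigned long Release() override
    {
        const unsigned long count = --m_refCount;
        if (count == 0) delete this;
        return count;
    }
    int Draw() override { return 3; }  // pretend we drew 3 vertices
private:
    std::atomic<unsigned long> m_refCount{1};
};
```

Because QueryInterface calls AddRef on what it returns, every successful query must be balanced by a Release. ComPtr does exactly this bookkeeping automatically.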

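Next comes error handling with C++ exceptions. By COM convention, almost every COM method returns an HRESULT that reports whether the call succeeded. Our earlier code checked this value after every call and, on failure, logged something and then returned or aborted. That style is highly portable, but it fills the code with checks that have nothing to do with what the code is actually trying to do, which hurts readability. Here, following Microsoft's official samples, we throw a C++ exception whenever an HRESULT reports failure. Note, though, that C++ exceptions are not very portable. This code is platform-specific anyway, and it is a side quest whose main purpose is to level up and explore the map, so we'll go with it.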
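ThrowIfFailed itself is not shown in this excerpt. Microsoft's DirectX samples ship a helper of that name. The version below is our own portable sketch, not a copy of theirs. HResult and Failed are stand-ins so it compiles outside Windows; FAILED(hr) likewise just tests for a negative value. On Windows you would use HRESULT and FAILED from <windows.h> instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Portable stand-ins for HRESULT and FAILED().
using HResult = std::int32_t;
inline bool Failed(HResult hr) { return hr < 0; }

// Exception that carries the failing HRESULT for logging.
class HrException : public std::runtime_error
{
public:
    explicit HrException(HResult hr)
        : std::runtime_error(Describe(hr)), m_hr(hr) {}
    HResult Error() const { return m_hr; }
private:
    static std::string Describe(HResult hr)
    {
        char buf[32];
        std::snprintf(buf, sizeof(buf), "HRESULT 0x%08X",
                      static_cast<unsigned>(static_cast<std::uint32_t>(hr)));
        return buf;
    }
    HResult m_hr;
};

// Turn a failed COM-style return code into an exception; success codes pass.
inline void ThrowIfFailed(HResult hr)
{
    if (Failed(hr)) throw HrException(hr);
}
```

With this in place, one try/catch around initialization replaces the per-call checks, and the HRESULT is still available in the handler for logging.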

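Next come the global variable definitions. Because we are still exploring, we use the most direct, "flat" style: essentially C, with no class wrappers. Once the division of the graphics module is settled, most of these variables will move into classes. Others, such as resolution and color depth, will become configurable startup parameters.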

// global declarations
+const uint32_t nFrameCount     = 2;
+const bool     bUseWarpDevice = true;
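+D3D12_VIEWPORT                  g_ViewPort = {0.0f, 0.0f,
+                                        static_cast<float>(nScreenWidth),
+                                        static_cast<float>(nScreenHeight),
+                                        0.0f, 1.0f};                   // viewport structure (MinDepth = 0, MaxDepth = 1)
+D3D12_RECT                      g_ScissorRect = {0, 0,
+                                        static_cast<LONG>(nScreenWidth),
+                                        static_cast<LONG>(nScreenHeight)};  // scissor rectangle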
-IDXGISwapChain          *g_pSwapchain = nullptr;              // the pointer to the swap chain interface
+ComPtr<IDXGISwapChain3>         g_pSwapChain = nullptr;             // the pointer to the swap chain interface
-ID3D11Device            *g_pDev       = nullptr;              // the pointer to our Direct3D device interface
+ComPtr<ID3D12Device>            g_pDev       = nullptr;             // the pointer to our Direct3D device interface
-ID3D11DeviceContext     *g_pDevcon    = nullptr;              // the pointer to our Direct3D device context
-
-ID3D11RenderTargetView  *g_pRTView    = nullptr;
+ComPtr<ID3D12Resource>          g_pRenderTargets[nFrameCount];      // the pointer to rendering buffer. [descriptor]
+uint32_t    g_nRtvDescriptorSize;
-ID3D11InputLayout       *g_pLayout    = nullptr;              // the pointer to the input layout
-ID3D11VertexShader      *g_pVS        = nullptr;              // the pointer to the vertex shader
-ID3D11PixelShader       *g_pPS        = nullptr;              // the pointer to the pixel shader

+ComPtr<ID3D12CommandAllocator>  g_pCommandAllocator;                // the pointer to command buffer allocator
+ComPtr<ID3D12CommandQueue>      g_pCommandQueue;                    // the pointer to command queue
+ComPtr<ID3D12RootSignature>     g_pRootSignature;                   // a graphics root signature defines what resources are bound to the pipeline
+ComPtr<ID3D12DescriptorHeap>    g_pRtvHeap;                         // an array of descriptors of GPU objects
+ComPtr<ID3D12PipelineState>     g_pPipelineState;                   // an object maintains the state of all currently set shaders
+                                                                    // and certain fixed function state objects
+                                                                    // such as the input assembler, tessellator, rasterizer and output manager
+ComPtr<ID3D12GraphicsCommandList>   g_pCommandList;                 // a list to store GPU commands, which will be submitted to GPU to execute when done
+
+
-ID3D11Buffer            *g_pVBuffer   = nullptr;              // Vertex Buffer
+ComPtr<ID3D12Resource>          g_pVertexBuffer;                         // the pointer to the vertex buffer
+D3D12_VERTEX_BUFFER_VIEW        g_VertexBufferView;                 // a view of the vertex buffer

The biggest change is that DX12 exposes the GPU's command queue and no longer synchronizes the CPU and GPU automatically.

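In earlier DX versions we only had one timeline, the CPU's, and every API call appeared to be synchronous. DX12 adds a second timeline, the GPU's. Most drawing APIs have changed from synchronous calls into "recording": they only append commands to a command list. When that list is submitted to the GPU's queue for execution is up to us. Hence several new variables for synchronizing the CPU and the GPU: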

+// Synchronization objects
+uint32_t            g_nFrameIndex;
+HANDLE              g_hFenceEvent;
+ComPtr<ID3D12Fence> g_pFence;
+uint32_t            g_nFenceValue;

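Another change concerns creation and management. DX12 no longer distinguishes between the various resources the GPU references while executing draw commands, such as vertex buffers and render targets. All of them are handled through a single interface, ID3D12Resource. This is because we are now very close to the graphics driver, and at that level they are all just buffers with little real difference between them.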

Shader loading changes little. As a demonstration, though, this time we compile the shaders at runtime.

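The input layout is no longer a separate object. Instead it becomes part of the Pipeline State Object (PSO), which ties the GPU's functional blocks together into one rendering pipeline.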

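Because drawing APIs are now recorded and executed asynchronously, we must make sure these resources stay available until the GPU has actually finished drawing. In this example, we force the GPU to finish executing these commands first and synchronize with it through a fence.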

// this is the function that loads and prepares the shaders
 void InitPipeline() {
-    // load and compile the two shaders
-    ID3DBlob *VS, *PS;
-
-    D3DReadFileToBlob(L"copy.vso", &VS);
-    D3DReadFileToBlob(L"copy.pso", &PS);
-
-    // encapsulate both shaders into shader objects
-    g_pDev->CreateVertexShader(VS->GetBufferPointer(), VS->GetBufferSize(), NULL, &g_pVS);
-    g_pDev->CreatePixelShader(PS->GetBufferPointer(), PS->GetBufferSize(), NULL, &g_pPS);
+    ThrowIfFailed(g_pDev->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&g_pCommandAllocator)));
+
+    // create an empty root signature
+    CD3DX12_ROOT_SIGNATURE_DESC rsd;
+    rsd.Init(0, nullptr, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
+
+    ComPtr<ID3DBlob> signature;
+    ComPtr<ID3DBlob> error;
+    ThrowIfFailed(D3D12SerializeRootSignature(&rsd, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));
+    ThrowIfFailed(g_pDev->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&g_pRootSignature)));
+
+    // load the shaders
+#if defined(_DEBUG)
+    // Enable better shader debugging with the graphics debugging tools.
+    UINT compileFlags = D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION;
+#else
+    UINT compileFlags = 0;
+#endif
+    ComPtr<ID3DBlob> vertexShader;
+    ComPtr<ID3DBlob> pixelShader;
+
+    HRESULT hr = D3DCompileFromFile(
+        GetAssetFullPath(L"copy.vs").c_str(),
+        nullptr,
+        D3D_COMPILE_STANDARD_FILE_INCLUDE,
+        "main",
+        "vs_5_0",
+        compileFlags,
+        0,
+        &vertexShader,
+        &error);
+    // error is a ComPtr: calling Release() on it manually would release it twice.
+    // The blob holds ANSI text, hence OutputDebugStringA.
+    if (error) { OutputDebugStringA(static_cast<LPCSTR>(error->GetBufferPointer())); }
+    ThrowIfFailed(hr);
+
+    hr = D3DCompileFromFile(
+        GetAssetFullPath(L"copy.ps").c_str(),
+        nullptr,
+        D3D_COMPILE_STANDARD_FILE_INCLUDE,
+        "main",
+        "ps_5_0",
+        compileFlags,
+        0,
+        &pixelShader,
+        &error);
+    if (error) { OutputDebugStringA(static_cast<LPCSTR>(error->GetBufferPointer())); }
+    ThrowIfFailed(hr);

-    // set the shader objects
-    g_pDevcon->VSSetShader(g_pVS, 0, 0);
-    g_pDevcon->PSSetShader(g_pPS, 0, 0);

     // create the input layout object
-    D3D11_INPUT_ELEMENT_DESC ied[] =
+    D3D12_INPUT_ELEMENT_DESC ied[] =
     {
-        {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
-        {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
+        {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
+        {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
     };

-    g_pDev->CreateInputLayout(ied, 2, VS->GetBufferPointer(), VS->GetBufferSize(), &g_pLayout);
-    g_pDevcon->IASetInputLayout(g_pLayout);
-
-    VS->Release();
-    PS->Release();
+    // describe and create the graphics pipeline state object (PSO)
+    D3D12_GRAPHICS_PIPELINE_STATE_DESC psod = {};
+    psod.InputLayout    = { ied, _countof(ied) };
+    psod.pRootSignature = g_pRootSignature.Get();
+    psod.VS             = { reinterpret_cast<UINT8*>(vertexShader->GetBufferPointer()), vertexShader->GetBufferSize() };
+    psod.PS             = { reinterpret_cast<UINT8*>(pixelShader->GetBufferPointer()), pixelShader->GetBufferSize() };
+    psod.RasterizerState= CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
+    psod.BlendState     = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
+    psod.DepthStencilState.DepthEnable  = FALSE;
+    psod.DepthStencilState.StencilEnable= FALSE;
+    psod.SampleMask     = UINT_MAX;
+    psod.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
+    psod.NumRenderTargets = 1;
+    psod.RTVFormats[0]  = DXGI_FORMAT_R8G8B8A8_UNORM;
+    psod.SampleDesc.Count = 1;
+    ThrowIfFailed(g_pDev->CreateGraphicsPipelineState(&psod, IID_PPV_ARGS(&g_pPipelineState)));
+
+    ThrowIfFailed(g_pDev->CreateCommandList(0,
+                D3D12_COMMAND_LIST_TYPE_DIRECT,
+                g_pCommandAllocator.Get(),
+                g_pPipelineState.Get(),
+                IID_PPV_ARGS(&g_pCommandList)));
+
+    ThrowIfFailed(g_pCommandList->Close());
 }

 // this is the function that creates the shape to render
@@ -116,31 +257,127 @@ void InitGraphics() {
         {XMFLOAT3(-0.45f, -0.5f, 0.0f), XMFLOAT4(0.0f, 0.0f, 1.0f, 1.0f)}
     };


-    // create the vertex buffer
-    D3D11_BUFFER_DESC bd;
-    ZeroMemory(&bd, sizeof(bd));

-    bd.Usage = D3D11_USAGE_DYNAMIC;                // write access access by CPU and GPU
-    bd.ByteWidth = sizeof(VERTEX) * 3;             // size is the VERTEX struct * 3
-    bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;       // use as a vertex buffer
-    bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;    // allow CPU to write in buffer
-    g_pDev->CreateBuffer(&bd, NULL, &g_pVBuffer);       // create the buffer

+    const UINT vertexBufferSize = sizeof(OurVertices);
+
+    // Note: using upload heaps to transfer static data like vert buffers is not
+    // recommended. Every time the GPU needs it, the upload heap will be marshalled
+    // over. Please read up on Default Heap usage. An upload heap is used here for
+    // code simplicity and because there are very few verts to actually transfer.
+    ThrowIfFailed(g_pDev->CreateCommittedResource(
+        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
+        D3D12_HEAP_FLAG_NONE,
+        &CD3DX12_RESOURCE_DESC::Buffer(vertexBufferSize),
+        D3D12_RESOURCE_STATE_GENERIC_READ,
+        nullptr,
+        IID_PPV_ARGS(&g_pVertexBuffer)));
+
-    // copy the vertices into the buffer
-    D3D11_MAPPED_SUBRESOURCE ms;
-    g_pDevcon->Map(g_pVBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms);    // map the buffer
-    memcpy(ms.pData, OurVertices, sizeof(VERTEX) * 3);                       // copy the data
-    g_pDevcon->Unmap(g_pVBuffer, NULL);                                      // unmap the buffer

+    // copy the vertices into the buffer
+    uint8_t *pVertexDataBegin;
+    CD3DX12_RANGE readRange(0, 0);                  // we do not intend to read this buffer on CPU
+    ThrowIfFailed(g_pVertexBuffer->Map(0, &readRange,
+                reinterpret_cast<void**>(&pVertexDataBegin)));               // map the buffer
+    memcpy(pVertexDataBegin, OurVertices, vertexBufferSize);                 // copy the data
+    g_pVertexBuffer->Unmap(0, nullptr);                                      // unmap the buffer
+
+    // initialize the vertex buffer view
+    g_VertexBufferView.BufferLocation = g_pVertexBuffer->GetGPUVirtualAddress();
+    g_VertexBufferView.StrideInBytes  = sizeof(VERTEX);
+    g_VertexBufferView.SizeInBytes    = vertexBufferSize;
+
+    // create synchronization objects and wait until assets have been uploaded to the GPU
+    ThrowIfFailed(g_pDev->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&g_pFence)));
+    g_nFenceValue = 1;
+
+    // create an event handle to use for frame synchronization
+    g_hFenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
+    if (g_hFenceEvent == nullptr)
+    {
+        ThrowIfFailed(HRESULT_FROM_WIN32(GetLastError()));
+    }


+    // wait for the command list to execute; we are reusing the same command
+    // list in our main loop but for now, we just want to wait for setup to
+    // complete before continuing.
+    WaitForPreviousFrame();
+}

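A fence is essentially a variable in memory that both the GPU and the CPU can read and write. Cache control keeps the CPU and GPU from seeing different values for it. In modern computer architectures both CPUs and GPUs have complex cache hierarchies, and these can cause the value a CPU or GPU sees to differ from the value actually stored in memory.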

https://en.m.wikipedia.org/wiki/Cache_(computing)

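A fence is a variable that is guaranteed not to suffer from this problem. When the GPU reaches the Signal command queued after the rendering work, it writes the new value into the fence. Once the CPU sees that value, it knows the GPU has finished rendering and the related resources can be released or reused.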
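WaitForPreviousFrame is called but not listed here. Much of this code closely follows Microsoft's D3D12HelloTriangle sample. In that sample, the function signals the fence from the command queue with the current value and then increments the value. If GetCompletedValue() is still below the signaled value, it blocks on an event through SetEventOnCompletion and WaitForSingleObject. The CPU-only sketch below models that handshake, with a std::thread playing the GPU and a condition variable standing in for the Win32 event. FenceModel and SubmitAndWait are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// CPU-side model of a fence: a value the "GPU" advances and the CPU waits on.
class FenceModel
{
public:
    std::uint64_t GetCompletedValue()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_completed;
    }

    // Called by the GPU-side thread when it reaches a queued Signal(value).
    void Signal(std::uint64_t value)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_completed = value;
        }
        m_cv.notify_all();
    }

    // Counterpart of SetEventOnCompletion + WaitForSingleObject.
    void WaitFor(std::uint64_t value)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return m_completed >= value; });
    }

private:
    std::mutex              m_mutex;
    std::condition_variable m_cv;
    std::uint64_t           m_completed = 0;
};

// One frame: the "GPU" does work, then signals fenceValue; the CPU blocks
// until the fence reaches it. Returns the fence value for the next frame.
std::uint64_t SubmitAndWait(FenceModel& fence, std::uint64_t fenceValue, int& gpuWork)
{
    std::thread gpu([&fence, fenceValue, &gpuWork] {
        gpuWork += 1;              // pretend to execute the command list
        fence.Signal(fenceValue);  // like queue->Signal(fence, fenceValue)
    });
    fence.WaitFor(fenceValue);     // CPU waits for the GPU to catch up
    gpu.join();
    return fenceValue + 1;
}
```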

Next we create the device, the command queue and the swap chain.

// this function prepare graphic resources for use
-HRESULT CreateGraphicsResources(HWND hWnd)
+void CreateGraphicsResources(HWND hWnd)
+{
+    if (g_pSwapChain.Get() == nullptr)
+    {
+#if defined(_DEBUG)
+        // Enable the D3D12 debug layer.
+        {
+            ComPtr<ID3D12Debug> debugController;
+            if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
             {
-    HRESULT hr = S_OK;
-    if (g_pSwapchain == nullptr)
+                debugController->EnableDebugLayer();
+            }
+        }
+#endif
+
+        ComPtr<IDXGIFactory4> factory;
+        ThrowIfFailed(CreateDXGIFactory1(IID_PPV_ARGS(&factory)));
+
+        if (bUseWarpDevice)
+        {
+            ComPtr<IDXGIAdapter> warpAdapter;
+            ThrowIfFailed(factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter)));
+
+            ThrowIfFailed(D3D12CreateDevice(
+                warpAdapter.Get(),
+                D3D_FEATURE_LEVEL_11_0,
+                IID_PPV_ARGS(&g_pDev)
+                ));
+        }
+        else
         {
+            ComPtr<IDXGIAdapter1> hardwareAdapter;
+            GetHardwareAdapter(factory.Get(), &hardwareAdapter);
+
+            ThrowIfFailed(D3D12CreateDevice(
+                hardwareAdapter.Get(),
+                D3D_FEATURE_LEVEL_11_0,
+                IID_PPV_ARGS(&g_pDev)
+                ));
+        }
+
+        // Describe and create the command queue.
+        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
+        queueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
+        queueDesc.Type  = D3D12_COMMAND_LIST_TYPE_DIRECT;
+
+        ThrowIfFailed(g_pDev->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&g_pCommandQueue)));
+
         // create a struct to hold information about the swap chain
         DXGI_SWAP_CHAIN_DESC scd;

@@ -148,103 +385,109 @@ HRESULT CreateGraphicsResources(HWND hWnd)
         ZeroMemory(&scd, sizeof(DXGI_SWAP_CHAIN_DESC));

         // fill the swap chain description struct
-        scd.BufferCount = 1;                                    // one back buffer
-        scd.BufferDesc.Width = SCREEN_WIDTH;
-        scd.BufferDesc.Height = SCREEN_HEIGHT;
+        scd.BufferCount = nFrameCount;                           // back buffer count
+        scd.BufferDesc.Width = nScreenWidth;
+        scd.BufferDesc.Height = nScreenHeight;
         scd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;     // use 32-bit color
         scd.BufferDesc.RefreshRate.Numerator = 60;
         scd.BufferDesc.RefreshRate.Denominator = 1;
         scd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;      // how swap chain is to be used
+        scd.SwapEffect  = DXGI_SWAP_EFFECT_FLIP_DISCARD;        // D3D12 swap chains must use a flip-model effect
+                                                                // (DXGI_SWAP_EFFECT_FLIP_DISCARD or DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL)
         scd.OutputWindow = hWnd;                                // the window to be used
-        scd.SampleDesc.Count = 4;                               // how many multisamples
+        scd.SampleDesc.Count = 1;                               // flip-model swap chains such as DXGI_SWAP_EFFECT_FLIP_DISCARD
+                                                                // do not support multisampled back buffers
         scd.Windowed = TRUE;                                    // windowed/full-screen mode
-        scd.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;     // allow full-screen switching
+        scd.Flags    = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;  // allow full-screen transition

-        const D3D_FEATURE_LEVEL FeatureLevels[] = { D3D_FEATURE_LEVEL_11_1,
-                                                    D3D_FEATURE_LEVEL_11_0,
-                                                    D3D_FEATURE_LEVEL_10_1,
-                                                    D3D_FEATURE_LEVEL_10_0,
-                                                    D3D_FEATURE_LEVEL_9_3,
-                                                    D3D_FEATURE_LEVEL_9_2,
-                                                    D3D_FEATURE_LEVEL_9_1};
-        D3D_FEATURE_LEVEL FeatureLevelSupported;
-
-        HRESULT hr = S_OK;
-
-        // create a device, device context and swap chain using the information in the scd struct
-        hr = D3D11CreateDeviceAndSwapChain(NULL,
-                                      D3D_DRIVER_TYPE_HARDWARE,
-                                      NULL,
-                                      0,
-                                      FeatureLevels,
-                                      _countof(FeatureLevels),
-                                      D3D11_SDK_VERSION,
-                                      &scd,
-                                      &g_pSwapchain,
-                                      &g_pDev,
-                                      &FeatureLevelSupported,
-                                      &g_pDevcon);
-
-        if (hr == E_INVALIDARG) {
-            hr = D3D11CreateDeviceAndSwapChain(NULL,
-                                      D3D_DRIVER_TYPE_HARDWARE,
-                                      NULL,
-                                      0,
-                                      &FeatureLevelSupported,
-                                      1,
-                                      D3D11_SDK_VERSION,
+        ComPtr<IDXGISwapChain> swapChain;
+        ThrowIfFailed(factory->CreateSwapChain(
+                    g_pCommandQueue.Get(),                      // Swap chain needs the queue so that it can force a flush on it
                     &scd,
-                                      &g_pSwapchain,
-                                      &g_pDev,
-                                      NULL,
-                                      &g_pDevcon);
-        }
+                    &swapChain
+                    ));
+
+        ThrowIfFailed(swapChain.As(&g_pSwapChain));

-        if (hr == S_OK) {
+        g_nFrameIndex = g_pSwapChain->GetCurrentBackBufferIndex();
         CreateRenderTarget();
-            SetViewPort();
         InitPipeline();
         InitGraphics();
     }
 }
-    return hr;
-}

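Because we now use the ComPtr smart pointer, we no longer have to release anything by hand. The interface is released automatically when the variable is reassigned or destroyed, for these globals at program exit.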

void DiscardGraphicsResources()
 {
-    SafeRelease(&g_pLayout);
-    SafeRelease(&g_pVS);
-    SafeRelease(&g_pPS);
-    SafeRelease(&g_pVBuffer);
-    SafeRelease(&g_pSwapchain);
-    SafeRelease(&g_pRTView);
-    SafeRelease(&g_pDev);
-    SafeRelease(&g_pDevcon);
+    WaitForPreviousFrame();
+
+    CloseHandle(g_hFenceEvent);
 }

Next, we record the drawing commands into the command list.

+void PopulateCommandList()
 {
+    // command list allocators can only be reset when the associated
+    // command lists have finished execution on the GPU; apps should use
+    // fences to determine GPU execution progress.
+    ThrowIfFailed(g_pCommandAllocator->Reset());
+
+    // however, when ExecuteCommandList() is called on a particular command
+    // list, that command list can then be reset at any time and must be before
+    // re-recording.
+    ThrowIfFailed(g_pCommandList->Reset(g_pCommandAllocator.Get(), g_pPipelineState.Get()));
+
+    // Set necessary state.
+    g_pCommandList->SetGraphicsRootSignature(g_pRootSignature.Get());
+    g_pCommandList->RSSetViewports(1, &g_ViewPort);
+    g_pCommandList->RSSetScissorRects(1, &g_ScissorRect);
+
+    // Indicate that the back buffer will be used as a render target.
+    g_pCommandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
+                g_pRenderTargets[g_nFrameIndex].Get(),
+                D3D12_RESOURCE_STATE_PRESENT,
+                D3D12_RESOURCE_STATE_RENDER_TARGET));
+
+    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(g_pRtvHeap->GetCPUDescriptorHandleForHeapStart(), g_nFrameIndex, g_nRtvDescriptorSize);
+    g_pCommandList->OMSetRenderTargets(1, &rtvHandle, FALSE, nullptr);
+
     // clear the back buffer to a deep blue
     const FLOAT clearColor[] = {0.0f, 0.2f, 0.4f, 1.0f};
-    g_pDevcon->ClearRenderTargetView(g_pRTView, clearColor);
+    g_pCommandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

     // do 3D rendering on the back buffer here
     {
         // select which vertex buffer to display
-        UINT stride = sizeof(VERTEX);
-        UINT offset = 0;
-        g_pDevcon->IASetVertexBuffers(0, 1, &g_pVBuffer, &stride, &offset);
+        g_pCommandList->IASetVertexBuffers(0, 1, &g_VertexBufferView);

         // select which primitive type we are using
-        g_pDevcon->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
+        g_pCommandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

         // draw the vertex buffer to the back buffer
-        g_pDevcon->Draw(3, 0);
+        g_pCommandList->DrawInstanced(3, 1, 0, 0);
+    }
+
+    // Indicate that the back buffer will now be used to present.
+    g_pCommandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
+                g_pRenderTargets[g_nFrameIndex].Get(),
+                D3D12_RESOURCE_STATE_RENDER_TARGET,
+                D3D12_RESOURCE_STATE_PRESENT));
+
+    ThrowIfFailed(g_pCommandList->Close());
 }

Then we submit the recorded drawing commands (the command list) and swap the frame buffers to present the image.

// this is the function used to render a single frame
void RenderFrame()
{
+    // record all the commands we need to render the scene into the command list
+    PopulateCommandList();
+
+    // execute the command list
+    ID3D12CommandList *ppCommandLists[] = { g_pCommandList.Get() };
+    g_pCommandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
+
     // swap the back buffer and the front buffer
-    g_pSwapchain->Present(0, 0);
+    ThrowIfFailed(g_pSwapChain->Present(1, 0));
+
+    WaitForPreviousFrame();
 }

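The draw command has been upgraded from Draw to DrawInstanced. The latter can draw a group of identical objects with a single draw call. This is very effective for crowds of environment objects such as trees, grass, fences, ground tiles, or crowds of bystanders, because it saves a lot of CPU-GPU synchronization overhead. In VR the same object appears in the view of both eyes, and the same technique is used there to reduce draw calls.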
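To make the difference concrete, here is a minimal sketch built on a hypothetical `MockCommandList` that only records what would be submitted. It is not a D3D12 interface; only the parameter order of its `DrawInstanced` follows `ID3D12GraphicsCommandList::DrawInstanced`. Drawing 1000 trees one by one costs 1000 submissions, while the instanced version costs one, for the same total vertex work.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a command list: it records submissions only,
// so the effect of instancing on the number of draw calls becomes visible.
struct MockCommandList {
    struct DrawCmd { uint32_t vertexCountPerInstance, instanceCount; };
    std::vector<DrawCmd> commands;

    // Same parameter order as ID3D12GraphicsCommandList::DrawInstanced.
    void DrawInstanced(uint32_t vertexCountPerInstance, uint32_t instanceCount,
                       uint32_t /*startVertex*/, uint32_t /*startInstance*/) {
        commands.push_back({vertexCountPerInstance, instanceCount});
    }
};

// One submission per tree.
void DrawTreesOneByOne(MockCommandList& cl, uint32_t treeCount, uint32_t vertsPerTree) {
    for (uint32_t i = 0; i < treeCount; ++i)
        cl.DrawInstanced(vertsPerTree, 1, 0, 0);
}

// A single submission; in a real shader each copy is told apart by
// SV_InstanceID and usually reads per-instance data such as a world matrix.
void DrawTreesInstanced(MockCommandList& cl, uint32_t treeCount, uint32_t vertsPerTree) {
    cl.DrawInstanced(vertsPerTree, treeCount, 0, 0);
}

// The amount of vertex work is identical either way.
uint64_t TotalVertices(const MockCommandList& cl) {
    uint64_t total = 0;
    for (const auto& c : cl.commands)
        total += uint64_t(c.vertexCountPerInstance) * c.instanceCount;
    return total;
}
```

What instancing saves is the per-call overhead on the CPU side, not vertex shading on the GPU.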

The build command line is as follows (Visual Studio toolchain):

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>cl /EHsc helloengine_d3d12.cpp user32.lib d3d12.lib dxgi.lib d3dcompiler.lib

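Because this code relies on many Microsoft-specific features, it cannot be compiled with clang (including clang-cl) for now. Since DX12 is Windows-only anyway, that does not matter.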

That's it: we have completed the upgrade from DX11 to DX12. The next article will add textures.

(– EOF –)

References:

  1. Tutorial: Migrating Your Apps to DirectX* 12 – Part 1
  2. Creating a basic Direct3D 12 component
  3. DXGI
  4. DirectXMath Programming Guide

This work is licensed under a Creative Commons Attribution 4.0 International License.

Writing a Next-Gen Game Engine from Scratch (14)

Continuing from the previous article, here is a brief walkthrough of the code.

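First, as before, we include the OpenGL header. Note in particular that Clang treats header paths as case-sensitive by default, so the include must be written exactly like this: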

#include <GL/gl.h>

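As mentioned earlier, OpenGL is implemented inside the graphics card driver. Drivers differ between vendors, and even a single vendor ships many driver versions. Our program therefore should not, and indeed cannot, link directly against the driver. OpenGL is instead a binary-level API: at run time we have to look up where each API function lives, that is, its address in memory.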

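Our program looks long this time, but more than half of it does nothing except this lookup. With an OpenGL loader such as GLEW, all of that code could be omitted. Still, before reaching for convenient tools, it is quite interesting to understand what they actually do.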
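The mechanism itself is small. The sketch below mimics it without any GPU: `FakeGetProcAddress`, `DriverAdd`/`DriverMul` and the `glFake*` names are all hypothetical stand-ins for `wglGetProcAddress` and real driver entry points. It shows the three ingredients that this program repeats for every OpenGL function: a typedef for the signature, a global function pointer, and a lookup by name followed by a cast and a null check.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical "driver" functions. In a real program they live in the GPU
// vendor's DLL and we only learn their addresses at run time.
static int DriverAdd(int a, int b) { return a + b; }
static int DriverMul(int a, int b) { return a * b; }

using GenericProc = void (*)();

// Stand-in for wglGetProcAddress: maps a name to an address, or nullptr.
GenericProc FakeGetProcAddress(const std::string& name) {
    static const std::map<std::string, GenericProc> table = {
        {"glFakeAdd", reinterpret_cast<GenericProc>(&DriverAdd)},
        {"glFakeMul", reinterpret_cast<GenericProc>(&DriverMul)},
    };
    auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}

// Same pattern as PFNGL...PROC: a typedef for the signature, then a global pointer.
typedef int (*PFNGLFAKEADDPROC)(int, int);
typedef int (*PFNGLFAKEMULPROC)(int, int);
PFNGLFAKEADDPROC glFakeAdd = nullptr;
PFNGLFAKEMULPROC glFakeMul = nullptr;

// Mirrors the structure of LoadExtensionList(): fail if any entry point is missing.
bool LoadFakeFunctions() {
    glFakeAdd = reinterpret_cast<PFNGLFAKEADDPROC>(FakeGetProcAddress("glFakeAdd"));
    if (!glFakeAdd) return false;
    glFakeMul = reinterpret_cast<PFNGLFAKEMULPROC>(FakeGetProcAddress("glFakeMul"));
    if (!glFakeMul) return false;
    return true;
}
```

After a successful load, `glFakeAdd(2, 3)` is called like an ordinary function even though its address was only found at run time.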

/////////////
// DEFINES //
/////////////
#define WGL_DRAW_TO_WINDOW_ARB         0x2001
#define WGL_ACCELERATION_ARB           0x2003
#define WGL_SWAP_METHOD_ARB            0x2007
#define WGL_SUPPORT_OPENGL_ARB         0x2010
#define WGL_DOUBLE_BUFFER_ARB          0x2011
#define WGL_PIXEL_TYPE_ARB             0x2013
#define WGL_COLOR_BITS_ARB             0x2014
#define WGL_DEPTH_BITS_ARB             0x2022
#define WGL_STENCIL_BITS_ARB           0x2023
#define WGL_FULL_ACCELERATION_ARB      0x2027
#define WGL_SWAP_EXCHANGE_ARB          0x2028
#define WGL_TYPE_RGBA_ARB              0x202B
#define WGL_CONTEXT_MAJOR_VERSION_ARB  0x2091
#define WGL_CONTEXT_MINOR_VERSION_ARB  0x2092
#define GL_ARRAY_BUFFER                   0x8892
#define GL_STATIC_DRAW                    0x88E4
#define GL_FRAGMENT_SHADER                0x8B30
#define GL_VERTEX_SHADER                  0x8B31
#define GL_COMPILE_STATUS                 0x8B81
#define GL_LINK_STATUS                    0x8B82
#define GL_INFO_LOG_LENGTH                0x8B84
#define GL_TEXTURE0                       0x84C0
#define GL_BGRA                           0x80E1
#define GL_ELEMENT_ARRAY_BUFFER           0x8893

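These define a set of identifiers (enumerant values). The values are fixed in advance by the specifications: the GL_ values by the OpenGL specification, and the WGL_ values by the WGL extension specifications (such as WGL_ARB_pixel_format and WGL_ARB_create_context). They are simply conventions with no deeper reason behind them. Graphics card vendors write their drivers according to these conventions, and we must write our program according to them as well, so that the two sides can talk to each other.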

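WGL is Microsoft's Windows-specific interface that connects OpenGL to the Windows windowing system (the Windows counterpart of GLX on X11). Functions and constants prefixed with wgl/WGL_ belong to it.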

Below are the type definitions (function pointer types) of the OpenGL APIs. These, too, are dictated by the OpenGL specification.

//////////////
// TYPEDEFS //
//////////////
typedef BOOL (WINAPI   * PFNWGLCHOOSEPIXELFORMATARBPROC) (HDC hdc, const int *piAttribIList, const FLOAT *pfAttribFList, UINT nMaxFormats, int *piFormats, UINT *nNumFormats);
typedef HGLRC (WINAPI  * PFNWGLCREATECONTEXTATTRIBSARBPROC) (HDC hDC, HGLRC hShareContext, const int *attribList);
typedef BOOL (WINAPI   * PFNWGLSWAPINTERVALEXTPROC) (int interval);
typedef void (APIENTRY * PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (APIENTRY * PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer);
typedef void (APIENTRY * PFNGLBINDVERTEXARRAYPROC) (GLuint array);
typedef void (APIENTRY * PFNGLBUFFERDATAPROC) (GLenum target, ptrdiff_t size, const GLvoid *data, GLenum usage);
typedef void (APIENTRY * PFNGLCOMPILESHADERPROC) (GLuint shader);
typedef GLuint(APIENTRY * PFNGLCREATEPROGRAMPROC) (void);
typedef GLuint(APIENTRY * PFNGLCREATESHADERPROC) (GLenum type);
typedef void (APIENTRY * PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers);
typedef void (APIENTRY * PFNGLDELETEPROGRAMPROC) (GLuint program);
typedef void (APIENTRY * PFNGLDELETESHADERPROC) (GLuint shader);
typedef void (APIENTRY * PFNGLDELETEVERTEXARRAYSPROC) (GLsizei n, const GLuint *arrays);
typedef void (APIENTRY * PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (APIENTRY * PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (APIENTRY * PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers);
typedef void (APIENTRY * PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint *arrays);
typedef GLint(APIENTRY * PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const char *name);
typedef void (APIENTRY * PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, char *infoLog);
typedef void (APIENTRY * PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params);
typedef void (APIENTRY * PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, char *infoLog);
typedef void (APIENTRY * PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params);
typedef void (APIENTRY * PFNGLLINKPROGRAMPROC) (GLuint program);
typedef void (APIENTRY * PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const char* *string, const GLint *length);
typedef void (APIENTRY * PFNGLUSEPROGRAMPROC) (GLuint program);
typedef void (APIENTRY * PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const GLvoid *pointer);
typedef void (APIENTRY * PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const char *name);
typedef GLint(APIENTRY * PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const char *name);
typedef void (APIENTRY * PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRY * PFNGLACTIVETEXTUREPROC) (GLenum texture);
typedef void (APIENTRY * PFNGLUNIFORM1IPROC) (GLint location, GLint v0);
typedef void (APIENTRY * PFNGLGENERATEMIPMAPPROC) (GLenum target);
typedef void (APIENTRY * PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (APIENTRY * PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (APIENTRY * PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value);

Then we use these function pointer types to declare our own function pointers, which store the call (jump) addresses of the OpenGL APIs:

PFNGLATTACHSHADERPROC glAttachShader;
PFNGLBINDBUFFERPROC glBindBuffer;
PFNGLBINDVERTEXARRAYPROC glBindVertexArray;
PFNGLBUFFERDATAPROC glBufferData;
PFNGLCOMPILESHADERPROC glCompileShader;
PFNGLCREATEPROGRAMPROC glCreateProgram;
PFNGLCREATESHADERPROC glCreateShader;
PFNGLDELETEBUFFERSPROC glDeleteBuffers;
PFNGLDELETEPROGRAMPROC glDeleteProgram;
PFNGLDELETESHADERPROC glDeleteShader;
PFNGLDELETEVERTEXARRAYSPROC glDeleteVertexArrays;
PFNGLDETACHSHADERPROC glDetachShader;
PFNGLENABLEVERTEXATTRIBARRAYPROC glEnableVertexAttribArray;
PFNGLGENBUFFERSPROC glGenBuffers;
PFNGLGENVERTEXARRAYSPROC glGenVertexArrays;
PFNGLGETATTRIBLOCATIONPROC glGetAttribLocation;
PFNGLGETPROGRAMINFOLOGPROC glGetProgramInfoLog;
PFNGLGETPROGRAMIVPROC glGetProgramiv;
PFNGLGETSHADERINFOLOGPROC glGetShaderInfoLog;
PFNGLGETSHADERIVPROC glGetShaderiv;
PFNGLLINKPROGRAMPROC glLinkProgram;
PFNGLSHADERSOURCEPROC glShaderSource;
PFNGLUSEPROGRAMPROC glUseProgram;
PFNGLVERTEXATTRIBPOINTERPROC glVertexAttribPointer;
PFNGLBINDATTRIBLOCATIONPROC glBindAttribLocation;
PFNGLGETUNIFORMLOCATIONPROC glGetUniformLocation;
PFNGLUNIFORMMATRIX4FVPROC glUniformMatrix4fv;
PFNGLACTIVETEXTUREPROC glActiveTexture;
PFNGLUNIFORM1IPROC glUniform1i;
PFNGLGENERATEMIPMAPPROC glGenerateMipmap;
PFNGLDISABLEVERTEXATTRIBARRAYPROC glDisableVertexAttribArray;
PFNGLUNIFORM3FVPROC glUniform3fv;
PFNGLUNIFORM4FVPROC glUniform4fv;

PFNWGLCHOOSEPIXELFORMATARBPROC wglChoosePixelFormatARB;
PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB;
PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT;

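The OpenGL API is divided into core APIs, extension APIs, and platform APIs. Core APIs are mature APIs that have been officially incorporated into some version of the OpenGL specification. Extension APIs are those added by vendors that have not yet been incorporated into a version of OpenGL. Platform APIs are tightly bound to the platform, for example those that create contexts or change the resolution or refresh rate. The functions above whose names start with wgl are platform APIs.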

As graphics cards evolve and OpenGL versions are upgraded, this set of APIs keeps changing.[*1]
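Because the set of entry points changes over time and each one needs the same declare/look-up/check boilerplate, a common alternative (not what this article's code does) is an X-macro: list every function once, then expand the list twice, once for the declarations and once for the loading code. A minimal sketch, with `LookupProc` and the `*Fake` functions as hypothetical stand-ins for `wglGetProcAddress` and real GL entry points:

```cpp
#include <cassert>
#include <cstring>

using GenericProc = void (*)();

// Hypothetical driver functions and lookup, standing in for wglGetProcAddress.
static int ImplGetVersion() { return 400; }
static int ImplAdd(int a, int b) { return a + b; }

GenericProc LookupProc(const char* name) {
    if (std::strcmp(name, "glGetVersionFake") == 0) return reinterpret_cast<GenericProc>(&ImplGetVersion);
    if (std::strcmp(name, "glAddFake") == 0) return reinterpret_cast<GenericProc>(&ImplAdd);
    return nullptr;  // e.g. an extension this driver does not expose
}

// The single list: return type, name, parameter list.
#define FAKE_GL_FUNCTIONS(X)          \
    X(int, glGetVersionFake, (void))  \
    X(int, glAddFake, (int a, int b))

// Expansion 1: global function pointer declarations.
#define DECLARE_PROC(ret, name, params) ret (*name) params = nullptr;
FAKE_GL_FUNCTIONS(DECLARE_PROC)
#undef DECLARE_PROC

// Expansion 2: the loading code, so declarations and loading can never drift apart.
bool LoadAllProcs() {
    bool ok = true;
#define LOAD_PROC(ret, name, params)                              \
    name = reinterpret_cast<ret (*) params>(LookupProc(#name));   \
    ok = ok && (name != nullptr);
    FAKE_GL_FUNCTIONS(LOAD_PROC)
#undef LOAD_PROC
    return ok;
}
```

Adding a new entry point then means adding one line to the list instead of a typedef, a declaration, and a load-and-check block.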

typedef struct VertexType
{
	VectorType position;
	VectorType color;
} VertexType;

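This defines how many data items a vertex contains (formally called attributes) and their types; we already used it in earlier articles. The vertex data layout has a big impact on rendering performance. While the GPU works, it loads these data one after another into internal registers and hands them to its shader execution units, which run the shader code we wrote. The more complex this structure is, the more internal registers it occupies, so fewer shader instances can run in parallel, and the overall rendering time grows.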

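Epic ran into exactly this problem while making Paragon with UE4. From other perspectives, the generality of the vertex layout and of the shaders is also very important. UE4 uses physically based rendering extensively, so its vertex layout is fairly complex and contains many parameters. Shaders are also bound at run time. In other words, UE4 does not pair vertex layouts and shaders one-to-one as we do here. As a result, a considerable share of the vertex attributes loaded into registers at run time is never used by most shaders. This wastes GPU registers, reduces the number of shaders that can run in parallel, and lengthens rendering time.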

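Therefore, in the Paragon project Epic used special techniques to strip the unneeded vertex attributes so that they are never loaded into GPU registers. This only works under certain preconditions, however. On AMD's GCN architecture, for example, there is a stage called LS that runs before the vertex shader and handles data loading, so the goal can be reached by modifying that LS stage.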

const char VS_SHADER_SOURCE_FILE[] = "color.vs";
const char PS_SHADER_SOURCE_FILE[] = "color.ps";

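These are the file names of the VS and PS shader programs. In our earlier DX examples, we compiled the shaders to binary offline and loaded the binaries onto the GPU at run time. Here we demonstrate another approach: read the shader source code directly, compile it at run time, and then set it on the GPU. In essence this allows the program to generate shader code dynamically for its own use while running, much like JIT compilation in Java/JavaScript. Many online games today support "hot updates", and such hot updates often rely on exactly this technique. It is not limited to shaders, either: game logic, UI and so on can be updated the same way (for example, with the Lua language).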

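From a security standpoint, however, this approach carries a huge risk: it means the program can execute arbitrary code, including unknown code. For that reason, some mature platforms such as PS4 disable this capability, so only code compiled offline can be used.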
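Reading the source is the easy half of run-time compilation. A minimal sketch of a file-to-string helper; `LoadShaderSource` is a hypothetical name, since the article's own helper is not shown in this part:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole shader source file into a string.
// Returns false if the file cannot be opened, so the caller can report the
// error instead of handing an empty string to the shader compiler.
bool LoadShaderSource(const std::string& path, std::string& out) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) return false;
    std::ostringstream buffer;
    buffer << file.rdbuf();  // copy the entire file contents
    out = buffer.str();
    return true;
}
```

The string would then be handed to the driver with `const char* src = source.c_str(); glShaderSource(shader, 1, &src, nullptr);` followed by `glCompileShader(shader);`. A null `length` argument tells OpenGL that the string is null-terminated.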

float g_worldMatrix[16];
float g_viewMatrix[16];
float g_projectionMatrix[16];

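These are the so-called MVP (model, view, projection) matrices. They are the key to projecting 3D space onto 2D, to camera work, and to animation. The images we rendered in earlier articles were all static; with the MVP matrices added here, we can render animation.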

What follows are helper functions (subroutines). Let's skip them for now and look at our main function (WinMain).

The first big change you will notice is that we create the window twice.

    // fill in the struct with the needed information
    wc.cbSize = sizeof(WNDCLASSEX);
    wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
    wc.lpfnWndProc = DefWindowProc;
    wc.hInstance = hInstance;
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)COLOR_WINDOW;
    wc.lpszClassName = _T("Temporary");

    // register the window class
    RegisterClassEx(&wc);

    // create the temporary window for OpenGL extension setup.
    hWnd = CreateWindowEx(WS_EX_APPWINDOW,
                          _T("Temporary"),    // name of the window class
                          _T("Temporary"),   // title of the window
                          WS_OVERLAPPEDWINDOW,    // window style
                          0,    // x-position of the window
                          0,    // y-position of the window
                          640,    // width of the window
                          480,    // height of the window
                          NULL,    // we have no parent window, NULL
                          NULL,    // we aren't using menus, NULL
                          hInstance,    // application handle
                          NULL);    // used with multiple windows, NULL

    // Don't show the window.
    ShowWindow(hWnd, SW_HIDE);

    InitializeExtensions(hWnd);

    DestroyWindow(hWnd);
    hWnd = NULL;

    // clear out the window class for use
    ZeroMemory(&wc, sizeof(WNDCLASSEX));

    // fill in the struct with the needed information
    wc.cbSize = sizeof(WNDCLASSEX);
    wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
    wc.lpfnWndProc = WindowProc;
    wc.hInstance = hInstance;
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)COLOR_WINDOW;
    wc.lpszClassName = _T("Hello, Engine!");

    // register the window class
    RegisterClassEx(&wc);

    // create the window and use the result as the handle
    hWnd = CreateWindowEx(WS_EX_APPWINDOW,
        _T("Hello, Engine!"),    // name of the window class
        _T("Hello, Engine!"),   // title of the window
        WS_OVERLAPPEDWINDOW,    // window style
        300,    // x-position of the window
        300,    // y-position of the window
        960,    // width of the window
        540,    // height of the window
        NULL,    // we have no parent window, NULL
        NULL,    // we aren't using menus, NULL
        hInstance,    // application handle
        NULL);    // used with multiple windows, NULL

    InitializeOpenGL(hWnd, 960, 540, SCREEN_DEPTH, SCREEN_NEAR, true);

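This is because on Windows, the functions we need in order to choose a pixel format and create a modern OpenGL context (wglChoosePixelFormatARB, wglCreateContextAttribsARB) are themselves OpenGL extension functions, and wglGetProcAddress only returns valid addresses while an OpenGL rendering context is current, which in turn requires a Device Context (DC). So we first create a temporary window whose only purpose is to obtain a DC (and the temporary rendering context that must be current for the lookup), and load all the OpenGL API entry addresses into the function pointers we defined in advance.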

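There are several points to note here:

  1. The style of the registered window class (wc.style) must include CS_OWNDC. This flag forces Windows to allocate a separate DC for every window created with that window class. Without it, the DC would be shared among all windows of that class, which is dangerous (changing the frame buffer format of one window would also change the frame buffer format of the other windows of the same class).
  2. Before the wgl functions can be used, the pixel format of the frame buffer must be set. So after creating the first window, InitializeExtensions sets a default pixel format (at that point none of the OpenGL entry points have been found, so there is no way to know which pixel formats the card supports). Unfortunately, SetPixelFormat can be called only once per window. Therefore, once InitializeExtensions has obtained the OpenGL entry points and we want to give the frame buffer the format we actually prefer, we must close the current (temporary) window and create a new one (that is, create a new DC).
  3. Windows of the same window class share one message handler (the WindowProc registered with the class; see "About Window Classes" in the Microsoft documentation).
    So if both windows used the same window class, destroying the temporary window would run our own WindowProc for its WM_DESTROY. A typical WM_DESTROY handler calls PostQuitMessage, which leaves a WM_QUIT in the thread's message queue. When the second window is created right after the first one is destroyed (as in this example), its message loop would pick up that WM_QUIT and the second window would close immediately. This is why the temporary window is registered under its own class, "Temporary", with DefWindowProc as its handler.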

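Next we load the shader programs, compile them, and bind them. Binding here means setting the relevant GPU registers to point at our program's address in memory, and binding the format of the shader program's input parameters (in this example those of the VS shader, namely the two attributes of our VertexType: position and color).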

     // Bind the shader input variables.
        glBindAttribLocation(g_shaderProgram, 0, "inputPosition");
        glBindAttribLocation(g_shaderProgram, 1, "inputColor");

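This essentially tells the GPU how to interpret the vertex data we feed it. (Otherwise, from the GPU's point of view the input is just a block of memory, a sequence of bytes; it would not know where each vertex's data starts and ends, or what each part means.)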
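What the GPU needs in order to slice that byte stream is the stride between consecutive vertices and the byte offset of each attribute, which is exactly what `glVertexAttribPointer` receives. A sketch assuming `VectorType` is three tightly packed floats (its definition is not shown in this section); `AttributeLayout`, `kStride` and `kLayout` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: VectorType is three tightly packed floats.
struct VectorType { float x, y, z; };

struct VertexType {
    VectorType position;  // attribute location 0 -> "inputPosition"
    VectorType color;     // attribute location 1 -> "inputColor"
};

// Everything needed to cut the raw byte stream into vertices and attributes.
struct AttributeLayout {
    unsigned location;       // matches the glBindAttribLocation index
    int componentCount;      // three floats per attribute
    std::size_t byteOffset;  // start of the attribute inside one vertex
};

constexpr std::size_t kStride = sizeof(VertexType);  // bytes from one vertex to the next
constexpr AttributeLayout kLayout[] = {
    {0, 3, offsetof(VertexType, position)},
    {1, 3, offsetof(VertexType, color)},
};
```

With these values the color attribute would be described as `glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, kStride, (void*)kLayout[1].byteOffset)`, where location 1 matches `glBindAttribLocation(g_shaderProgram, 1, "inputColor")`.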

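Then we initialize our vertex data and index data. Index data is not mandatory; our previous article, for example, did not use any. In a real game, however, objects are fairly complex and each vertex is shared by several triangles. Compared with vertex data carrying many attributes, index data is much lighter. Rather than letting vertex data repeat over and over (every 3 vertices form a triangle, and adjacent triangles share 1-2 vertices, so those vertices would appear repeatedly), it is better to let the indices repeat. (The vertex-index scheme first tells the GPU the data of all distinct vertices, then tells it which points to connect into faces, much like the connect-the-dots-by-number drawing games we played as kids.)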

So a cube has 8 distinct vertices and 6 faces. Splitting each face into two triangles along a diagonal gives 12 triangles; each triangle needs 3 indices, so 36 indices in total.
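The saving is easy to quantify. A small sketch, assuming the 24-byte `VertexType` used here (two three-float attributes) and the 16-bit indices of the `indices[]` array; the function names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Without indices, every triangle corner stores a full copy of its vertex.
constexpr std::size_t NonIndexedBytes(std::size_t triangles, std::size_t vertexSize) {
    return triangles * 3 * vertexSize;
}

// With an index buffer: each distinct vertex once, plus one 16-bit index
// per triangle corner (uint16_t, as in the indices[] array).
constexpr std::size_t IndexedBytes(std::size_t uniqueVertices, std::size_t triangles,
                                   std::size_t vertexSize) {
    return uniqueVertices * vertexSize + triangles * 3 * sizeof(std::uint16_t);
}
```

For the cube that is 12 × 3 × 24 = 864 bytes without indices versus 8 × 24 + 36 × 2 = 264 bytes with them, and the gap grows as vertices carry more attributes.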

     VertexType vertices[] = {
   // Position:  x       y      z  Color:r    g      b
            {{  1.0f,  1.0f,  1.0f }, { 1.0f, 0.0f, 0.0f }},
            {{  1.0f,  1.0f, -1.0f }, { 0.0f, 1.0f, 0.0f }},
            {{ -1.0f,  1.0f, -1.0f }, { 0.0f, 0.0f, 1.0f }},
            {{ -1.0f,  1.0f,  1.0f }, { 1.0f, 1.0f, 0.0f }},
            {{  1.0f, -1.0f,  1.0f }, { 1.0f, 0.0f, 1.0f }},
            {{  1.0f, -1.0f, -1.0f }, { 0.0f, 1.0f, 1.0f }},
            {{ -1.0f, -1.0f, -1.0f }, { 0.5f, 1.0f, 0.5f }},
            {{ -1.0f, -1.0f,  1.0f }, { 1.0f, 0.5f, 1.0f }},
        };
        uint16_t indices[] = { 1, 2, 3, 3, 2, 6, 6, 7, 3, 3, 0, 1, 0, 3, 7, 7, 6, 4, 4, 6, 5, 0, 7, 4, 1, 0, 4, 1, 4, 5, 2, 1, 5, 2, 5, 6 };

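Ordering the indices takes some skill. Modern GPUs contain all kinds of caches for speed. The GPU transforms vertex coordinates in index order (usually by multiplying them by the MVP matrix we provide, which first moves and rotates the points in space as we instruct and then projects them into 2D screen coordinates). But as we have seen, indices repeat, and recomputing the vertex behind a repeated index is pointless. In fact the GPU caches earlier results to some extent: if a later index is found in this cache, the earlier result is reused instead of being recomputed.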

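However, real games have tens of millions up to hundreds of millions of vertices, and no GPU can have a cache big enough to remember them all; in practice only a small number of the most recent results are kept. So the order of the indices matters: repeated indices should be placed as close together as possible. There are algorithms for this; we will not go into them here and leave them for a later article.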
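A toy model makes the effect of ordering measurable. The sketch below assumes a simple FIFO post-transform cache (real hardware differs in size and replacement policy) and counts how often the vertex shader would have to run. ACMR (average cache miss ratio, shader invocations per triangle) is the usual metric; the function names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified post-transform vertex cache: a FIFO holding the indices of the
// last `cacheSize` transformed vertices.
std::size_t CountCacheMisses(const std::vector<std::uint16_t>& indices, std::size_t cacheSize) {
    std::deque<std::uint16_t> cache;
    std::size_t misses = 0;
    for (std::uint16_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;                 // hit: reuse the already transformed vertex
        ++misses;                     // miss: the vertex shader runs again
        if (cacheSize == 0) continue;
        if (cache.size() == cacheSize) cache.pop_front();  // evict the oldest entry
        cache.push_back(index);
    }
    return misses;
}

// Vertex shader invocations per triangle: 3.0 means no reuse at all;
// well-ordered large regular meshes approach roughly 0.5.
double ACMR(const std::vector<std::uint16_t>& indices, std::size_t cacheSize) {
    return double(CountCacheMisses(indices, cacheSize)) / double(indices.size() / 3);
}
```

With a cache of 16 entries the cube's 36 indices cost only 8 transforms (one per distinct vertex); with no cache they cost 36.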

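Also note that, to lighten the GPU's load, most surfaces have a front and a back side. When back-face culling is enabled (our code enables it with glEnable(GL_CULL_FACE); in OpenGL it is off by default), the GPU simply discards triangles whose back side faces us. The GPU decides front versus back by projecting the 3 vertices into screen space and checking whether, in index order, they run counter-clockwise or clockwise. Which winding counts as the front is configurable with glFrontFace (OpenGL's default is counter-clockwise). Therefore, when building the indices, we first flatten the 3D geometry (like the UV unwrapping done when making textures), and then order every triangle's indices in the same front-facing direction on that plane (counter-clockwise under the default setting).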
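The front/back decision comes down to the sign of the triangle's area after projection. A minimal sketch (names are illustrative) that assumes a y-up 2D coordinate system:

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c) with y pointing up.
// Positive: counter-clockwise, negative: clockwise, zero: degenerate.
float SignedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

enum class Winding { CCW, CW };

// With glFrontFace(GL_CCW), OpenGL's default, a CCW triangle is front-facing;
// glFrontFace(GL_CW), as used in InitializeOpenGL, flips the rule.
bool IsFrontFacing(Vec2 a, Vec2 b, Vec2 c, Winding front) {
    float area = SignedArea2(a, b, c);
    return front == Winding::CCW ? area > 0.0f : area < 0.0f;
}
```

Swapping any two indices of a triangle flips the sign, which is why one wrongly ordered triangle shows up as a hole once culling is on.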

(Image source: the Internet)

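Next comes computing and binding the MVP. The MVP is typically updated once per frame (it may also stay unchanged), and during the drawing of a single frame it is a constant. That is why the place where the MVP is stored on the GPU is called a Constant Buffer. "Constant" here means constant within one frame (the same holds for lighting data and the like). In GLSL such per-draw constants are declared as uniform variables.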

 // Update world matrix to rotate the model
    rotateAngle += PI / 120;
    float rotationMatrixY[16];
    float rotationMatrixZ[16];
    MatrixRotationY(rotationMatrixY, rotateAngle);
    MatrixRotationZ(rotationMatrixZ, rotateAngle);
    MatrixMultiply(g_worldMatrix, rotationMatrixZ, rotationMatrixY);

    // Generate the view matrix based on the camera's position.
    CalculateCameraPosition();

    // Set the color shader as the current shader program and set the matrices that it will use for rendering.
    glUseProgram(g_shaderProgram);
    SetShaderParameters(g_worldMatrix, g_viewMatrix, g_projectionMatrix);

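Here we use a local static variable, rotateAngle, to store an angle that depends on time (time being the independent variable). From this angle we compute the rotation matrices about the Y and Z axes; their product is our M matrix, which animates the model. Note that OpenGL conventionally uses a right-handed coordinate system, unlike DX.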
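A sketch of the same idea with hypothetical helpers (`RotationY`, `RotationZ`, `Multiply`, `TransformPoint`). The article's `MatrixRotationY`/`MatrixMultiply` are not shown here and may use a different storage layout or multiplication order. This version stores matrices column-major, as `glUniformMatrix4fv` expects with `transpose` set to `GL_FALSE`, and treats `Multiply(out, a, b)` as out = a × b, so the right-hand matrix is applied to the vertex first:

```cpp
#include <cassert>
#include <cmath>

// Column-major 4x4 matrix: element (row, col) lives at m[col * 4 + row].
using Mat4 = float[16];

void Identity(Mat4 m) {
    for (int i = 0; i < 16; ++i) m[i] = (i % 5 == 0) ? 1.0f : 0.0f;
}

// Right-handed rotation about Y by `angle` radians.
void RotationY(Mat4 m, float angle) {
    Identity(m);
    float c = std::cos(angle), s = std::sin(angle);
    m[0] = c;  m[8] = s;    // row 0: ( c, 0, s)
    m[2] = -s; m[10] = c;   // row 2: (-s, 0, c)
}

// Right-handed rotation about Z by `angle` radians.
void RotationZ(Mat4 m, float angle) {
    Identity(m);
    float c = std::cos(angle), s = std::sin(angle);
    m[0] = c; m[4] = -s;    // row 0: (c, -s, 0)
    m[1] = s; m[5] = c;     // row 1: (s,  c, 0)
}

// out = a * b (safe even if out aliases a or b).
void Multiply(Mat4 out, const Mat4 a, const Mat4 b) {
    float tmp[16];
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[k * 4 + row] * b[col * 4 + k];
            tmp[col * 4 + row] = sum;
        }
    for (int i = 0; i < 16; ++i) out[i] = tmp[i];
}

// Transform a point (implicit w = 1) by an affine matrix.
void TransformPoint(const Mat4 m, const float in[3], float out[3]) {
    for (int row = 0; row < 3; ++row)
        out[row] = m[0 * 4 + row] * in[0] + m[1 * 4 + row] * in[1]
                 + m[2 * 4 + row] * in[2] + m[3 * 4 + row];
}
```

Rotating the point (1, 0, 0) by 90 degrees with Rz × Ry gives (0, 0, -1), while Ry × Rz gives (0, 1, 0), so the order of the two factors matters.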

At the same time, we compute the View matrix from the camera's current pose. This matrix determines where we observe the scene from.

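The P matrix is computed in advance from the viewport state and the camera's FOV. As long as the camera FOV, the screen resolution, and the viewport stay the same, it does not change.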
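For comparison, a sketch of a right-handed, OpenGL-convention perspective matrix (the formula documented for gluPerspective). This is not the article's `BuildPerspectiveFovLHMatrix`, which builds a left-handed matrix and whose body is not shown here; `PerspectiveFovRH` and `NdcDepth` are illustrative names:

```cpp
#include <cassert>
#include <cmath>

// Right-handed perspective projection, column-major (m[col * 4 + row]).
// The camera looks down -Z; after the divide by w, z = -zNear maps to -1
// and z = -zFar maps to +1 in normalized device coordinates.
void PerspectiveFovRH(float m[16], float fovY, float aspect, float zNear, float zFar) {
    const float f = 1.0f / std::tan(fovY / 2.0f);
    for (int i = 0; i < 16; ++i) m[i] = 0.0f;
    m[0]  = f / aspect;
    m[5]  = f;
    m[10] = (zFar + zNear) / (zNear - zFar);
    m[11] = -1.0f;                                 // w_clip = -z_eye
    m[14] = 2.0f * zFar * zNear / (zNear - zFar);
}

// NDC depth of a point on the view axis at eye-space depth zEye.
float NdcDepth(const float m[16], float zEye) {
    float zClip = m[10] * zEye + m[14];
    float wClip = m[11] * zEye + m[15];
    return zClip / wClip;
}
```

The inputs that decide P are exactly those named above: the FOV, the aspect ratio of the viewport, and the near/far planes, so P only needs rebuilding when one of them changes.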

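In SetShaderParameters, we look up the locations of the shader's MVP variables (in effect, the addresses of the placeholders), which already reside in memory visible to the GPU, and overwrite them with the results we computed.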

Finally, we issue the draw command, telling the GPU to start drawing the frame:

     // Render the vertex buffer using the index buffer.
        glDrawElements(GL_TRIANGLES, g_indexCount, GL_UNSIGNED_SHORT, 0);

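So a standard (basic) drawing flow goes like this: the CPU first sets up the drawing context (frame buffer, GPU state, and so on) and decides which objects to draw in the frame; it loads the vertex and index data of those objects into a region of memory, exposes that region to the GPU, and binds it to specific GPU registers; it computes each object's MVP and updates the result in the shader that draws the object; finally it issues draw commands and lets the GPU draw the objects.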

Finally, let's look at the shaders. OpenGL shaders are written in a language called GLSL, whose basic syntax again comes from C/C++. Compared with DX's HLSL, the main differences are:

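  1. There are no language extensions for binding to special registers (such as HLSL's semantics). Instead there are special built-in variables that play the role of register bindings; their names start with "gl_";
  2. Input and output variables are declared as global variables rather than as function parameters;
  3. Type names differ. HLSL names vector types by appending the component count to the base type, as in float3, whereas GLSL uses names such as vec3.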
////////////////////////////////////////////////////////////////////////////////
// Filename: color.vs
////////////////////////////////////////////////////////////////////////////////

#version 400

/////////////////////
// INPUT VARIABLES //
/////////////////////
in vec3 inputPosition;
in vec3 inputColor;

//////////////////////
// OUTPUT VARIABLES //
//////////////////////
out vec3 color;

///////////////////////
// UNIFORM VARIABLES //
///////////////////////
uniform mat4 worldMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;

////////////////////////////////////////////////////////////////////////////////
// Vertex Shader
////////////////////////////////////////////////////////////////////////////////
void main(void)
{
	// Calculate the position of the vertex against the world, view, and projection matrices.
	gl_Position = worldMatrix * vec4(inputPosition, 1.0f);
	gl_Position = viewMatrix * gl_Position;
	gl_Position = projectionMatrix * gl_Position;

	// Store the input color for the pixel shader to use.
	color = inputColor;
}

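This shader multiplies the vertex coordinates by the MVP matrices, projecting them into clip space (which the pipeline then maps to 2D viewport coordinates); it also copies the color attribute through unchanged.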
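The same computation can be checked on the CPU. A sketch mirroring color.vs with column-major matrices, the storage GLSL's `mat4` uses; `Mul`, `VertexShader` and `Translation` are illustrative helpers, not part of the article's code:

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: m[col * 4 + row]

// GLSL's `mat4 * vec4` for column-major storage.
Vec4 Mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

// CPU mirror of color.vs: world first, then view, then projection.
Vec4 VertexShader(const Mat4& world, const Mat4& view, const Mat4& proj,
                  float x, float y, float z) {
    Vec4 p = Mul(world, Vec4{x, y, z, 1.0f});
    p = Mul(view, p);
    return Mul(proj, p);  // clip space; the GPU divides by w afterwards
}

// Translation matrix: the offset sits in the fourth column.
Mat4 Translation(float tx, float ty, float tz) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = tx; m[13] = ty; m[14] = tz;
    return m;
}
```

Feeding the origin through a world translation of (1, 2, 3), a view translation of (0, 0, -10) and an identity projection yields (1, 2, -7, 1), showing that each matrix is applied in the same order as the three lines of the shader.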

////////////////////////////////////////////////////////////////////////////////
// Filename: color.ps
////////////////////////////////////////////////////////////////////////////////
#version 400


/////////////////////
// INPUT VARIABLES //
/////////////////////
in vec3 color;


//////////////////////
// OUTPUT VARIABLES //
//////////////////////
out vec4 outputColor;


////////////////////////////////////////////////////////////////////////////////
// Pixel Shader
////////////////////////////////////////////////////////////////////////////////
void main(void)
{
	outputColor = vec4(color, 1.0f);
}

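This PS shader likewise outputs the color unchanged. Because the frame buffer format in this example is RGBA while the vertex color has only three components (RGB), we append an A channel with the value 1.0, meaning fully opaque.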

(– EOF –)

References:

  1. Load OpenGL Functions
  2. ARB assembly language

This work is licensed under a Creative Commons Attribution 4.0 International License.

Writing a Next-Gen Game Engine from Scratch (13)

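In the previous article we drew a basic rectangle in space with OpenGL on Linux. In this article we look at how to use OpenGL on Windows, and at how to use a newer version (4.0) of OpenGL.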

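Note that both DX and OpenGL require hardware (GPU) support, and different generations of graphics cards support different versions of the graphics APIs. The code that follows may therefore not run on some machines. In particular, over a remote login or with X forwarding, applications that need GPU acceleration generally cannot run correctly without some special tricks.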

Also, a correction to an earlier article: the free edition of VMware Workstation Player does not allow GPU acceleration to be turned on manually.

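The OpenGL API is implemented by the graphics driver. So our program does not actually link against these APIs; instead it looks up their entry addresses at run time. The API used for this lookup is generally what gl.h and libGL (opengl32.lib on Windows) provide, but it differs from system to system.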

Since OpenGL and DX are parallel alternatives, we take helloengine_win.c as our starting point rather than helloengine_d*d.cpp. Copy helloengine_win.c to helloengine_opengl.cpp and start editing.

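(Most of the code in this article is based on rastertek.com/gl40tut03. The purpose of this article is to explore how to use OpenGL on Windows; its code will not directly become part of our engine's official code.)[*1]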

#include <windows.h>
 #include <windowsx.h>
 #include <tchar.h>
+#include <GL/gl.h>
+#include <fstream>
+
+#include "math.h"
+
+using namespace std;
+
+/////////////
+// DEFINES //
+/////////////
+#define WGL_DRAW_TO_WINDOW_ARB         0x2001
+#define WGL_ACCELERATION_ARB           0x2003
+#define WGL_SWAP_METHOD_ARB            0x2007
+#define WGL_SUPPORT_OPENGL_ARB         0x2010
+#define WGL_DOUBLE_BUFFER_ARB          0x2011
+#define WGL_PIXEL_TYPE_ARB             0x2013
+#define WGL_COLOR_BITS_ARB             0x2014
+#define WGL_DEPTH_BITS_ARB             0x2022
+#define WGL_STENCIL_BITS_ARB           0x2023
+#define WGL_FULL_ACCELERATION_ARB      0x2027
+#define WGL_SWAP_EXCHANGE_ARB          0x2028
+#define WGL_TYPE_RGBA_ARB              0x202B
+#define WGL_CONTEXT_MAJOR_VERSION_ARB  0x2091
+#define WGL_CONTEXT_MINOR_VERSION_ARB  0x2092
+#define GL_ARRAY_BUFFER                   0x8892
+#define GL_STATIC_DRAW                    0x88E4
+#define GL_FRAGMENT_SHADER                0x8B30
+#define GL_VERTEX_SHADER                  0x8B31
+#define GL_COMPILE_STATUS                 0x8B81
+#define GL_LINK_STATUS                    0x8B82
+#define GL_INFO_LOG_LENGTH                0x8B84
+#define GL_TEXTURE0                       0x84C0
+#define GL_BGRA                           0x80E1
+#define GL_ELEMENT_ARRAY_BUFFER           0x8893
+
+//////////////
+// TYPEDEFS //
+//////////////
+typedef BOOL (WINAPI   * PFNWGLCHOOSEPIXELFORMATARBPROC) (HDC hdc, const int *piAttribIList, const FLOAT *pfAttribFList, UINT nMaxFormats, int *piFormats, UINT *nNumFormats);
+typedef HGLRC (WINAPI  * PFNWGLCREATECONTEXTATTRIBSARBPROC) (HDC hDC, HGLRC hShareContext, const int *attribList);
+typedef BOOL (WINAPI   * PFNWGLSWAPINTERVALEXTPROC) (int interval);
+typedef void (APIENTRY * PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader);
+typedef void (APIENTRY * PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer);
+typedef void (APIENTRY * PFNGLBINDVERTEXARRAYPROC) (GLuint array);
+typedef void (APIENTRY * PFNGLBUFFERDATAPROC) (GLenum target, ptrdiff_t size, const GLvoid *data, GLenum usage);
+typedef void (APIENTRY * PFNGLCOMPILESHADERPROC) (GLuint shader);
+typedef GLuint(APIENTRY * PFNGLCREATEPROGRAMPROC) (void);
+typedef GLuint(APIENTRY * PFNGLCREATESHADERPROC) (GLenum type);
+typedef void (APIENTRY * PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers);
+typedef void (APIENTRY * PFNGLDELETEPROGRAMPROC) (GLuint program);
+typedef void (APIENTRY * PFNGLDELETESHADERPROC) (GLuint shader);
+typedef void (APIENTRY * PFNGLDELETEVERTEXARRAYSPROC) (GLsizei n, const GLuint *arrays);
+typedef void (APIENTRY * PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader);
+typedef void (APIENTRY * PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index);
+typedef void (APIENTRY * PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers);
+typedef void (APIENTRY * PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint *arrays);
+typedef GLint(APIENTRY * PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const char *name);
+typedef void (APIENTRY * PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, char *infoLog);
+typedef void (APIENTRY * PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params);
+typedef void (APIENTRY * PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, char *infoLog);
+typedef void (APIENTRY * PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params);
+typedef void (APIENTRY * PFNGLLINKPROGRAMPROC) (GLuint program);
+typedef void (APIENTRY * PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const char* *string, const GLint *length);
+typedef void (APIENTRY * PFNGLUSEPROGRAMPROC) (GLuint program);
+typedef void (APIENTRY * PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const GLvoid *pointer);
+typedef void (APIENTRY * PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const char *name);
+typedef GLint(APIENTRY * PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const char *name);
+typedef void (APIENTRY * PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
+typedef void (APIENTRY * PFNGLACTIVETEXTUREPROC) (GLenum texture);
+typedef void (APIENTRY * PFNGLUNIFORM1IPROC) (GLint location, GLint v0);
+typedef void (APIENTRY * PFNGLGENERATEMIPMAPPROC) (GLenum target);
+typedef void (APIENTRY * PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index);
+typedef void (APIENTRY * PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value);
+typedef void (APIENTRY * PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value);
+
+PFNGLATTACHSHADERPROC glAttachShader;
+PFNGLBINDBUFFERPROC glBindBuffer;
+PFNGLBINDVERTEXARRAYPROC glBindVertexArray;
+PFNGLBUFFERDATAPROC glBufferData;
+PFNGLCOMPILESHADERPROC glCompileShader;
+PFNGLCREATEPROGRAMPROC glCreateProgram;
+PFNGLCREATESHADERPROC glCreateShader;
+PFNGLDELETEBUFFERSPROC glDeleteBuffers;
+PFNGLDELETEPROGRAMPROC glDeleteProgram;
+PFNGLDELETESHADERPROC glDeleteShader;
+PFNGLDELETEVERTEXARRAYSPROC glDeleteVertexArrays;
+PFNGLDETACHSHADERPROC glDetachShader;
+PFNGLENABLEVERTEXATTRIBARRAYPROC glEnableVertexAttribArray;
+PFNGLGENBUFFERSPROC glGenBuffers;
+PFNGLGENVERTEXARRAYSPROC glGenVertexArrays;
+PFNGLGETATTRIBLOCATIONPROC glGetAttribLocation;
+PFNGLGETPROGRAMINFOLOGPROC glGetProgramInfoLog;
+PFNGLGETPROGRAMIVPROC glGetProgramiv;
+PFNGLGETSHADERINFOLOGPROC glGetShaderInfoLog;
+PFNGLGETSHADERIVPROC glGetShaderiv;
+PFNGLLINKPROGRAMPROC glLinkProgram;
+PFNGLSHADERSOURCEPROC glShaderSource;
+PFNGLUSEPROGRAMPROC glUseProgram;
+PFNGLVERTEXATTRIBPOINTERPROC glVertexAttribPointer;
+PFNGLBINDATTRIBLOCATIONPROC glBindAttribLocation;
+PFNGLGETUNIFORMLOCATIONPROC glGetUniformLocation;
+PFNGLUNIFORMMATRIX4FVPROC glUniformMatrix4fv;
+PFNGLACTIVETEXTUREPROC glActiveTexture;
+PFNGLUNIFORM1IPROC glUniform1i;
+PFNGLGENERATEMIPMAPPROC glGenerateMipmap;
+PFNGLDISABLEVERTEXATTRIBARRAYPROC glDisableVertexAttribArray;
+PFNGLUNIFORM3FVPROC glUniform3fv;
+PFNGLUNIFORM4FVPROC glUniform4fv;
+
+PFNWGLCHOOSEPIXELFORMATARBPROC wglChoosePixelFormatARB;
+PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB;
+PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT;
+
+typedef struct VertexType
+{
+       VectorType position;
+       VectorType color;
+} VertexType;
+
+HDC     g_deviceContext = 0;
+HGLRC   g_renderingContext = 0;
+char    g_videoCardDescription[128];
+
+const bool VSYNC_ENABLED = true;
+const float SCREEN_DEPTH = 1000.0f;
+const float SCREEN_NEAR = 0.1f;
+
+int     g_vertexCount, g_indexCount;
+unsigned int g_vertexArrayId, g_vertexBufferId, g_indexBufferId;
+
+unsigned int g_vertexShader;
+unsigned int g_fragmentShader;
+unsigned int g_shaderProgram;
+
+const char VS_SHADER_SOURCE_FILE[] = "color.vs";
+const char PS_SHADER_SOURCE_FILE[] = "color.ps";
+
+float g_positionX = 0, g_positionY = 0, g_positionZ = -10;
+float g_rotationX = 0, g_rotationY = 0, g_rotationZ = 0;
+float g_worldMatrix[16];
+float g_viewMatrix[16];
+float g_projectionMatrix[16];
+
+bool InitializeOpenGL(HWND hwnd, int screenWidth, int screenHeight, float screenDepth, float screenNear, bool vsync)
+{
+        int attributeListInt[19];
+        int pixelFormat[1];
+        unsigned int formatCount;
+        int result;
+        PIXELFORMATDESCRIPTOR pixelFormatDescriptor;
+        int attributeList[5];
+        float fieldOfView, screenAspect;
+        char *vendorString, *rendererString;
+
+
+        // Get the device context for this window.
+        g_deviceContext = GetDC(hwnd);
+        if(!g_deviceContext)
+        {
+                return false;
+        }
+
+        // Support for OpenGL rendering.
+        attributeListInt[0] = WGL_SUPPORT_OPENGL_ARB;
+        attributeListInt[1] = TRUE;
+
+        // Support for rendering to a window.
+        attributeListInt[2] = WGL_DRAW_TO_WINDOW_ARB;
+        attributeListInt[3] = TRUE;
+
+        // Support for hardware acceleration.
+        attributeListInt[4] = WGL_ACCELERATION_ARB;
+        attributeListInt[5] = WGL_FULL_ACCELERATION_ARB;
+
+        // Support for 24bit color.
+        attributeListInt[6] = WGL_COLOR_BITS_ARB;
+        attributeListInt[7] = 24;
+
+        // Support for 24 bit depth buffer.
+        attributeListInt[8] = WGL_DEPTH_BITS_ARB;
+        attributeListInt[9] = 24;
+
+        // Support for double buffer.
+        attributeListInt[10] = WGL_DOUBLE_BUFFER_ARB;
+        attributeListInt[11] = TRUE;
+
+        // Support for swapping front and back buffer.
+        attributeListInt[12] = WGL_SWAP_METHOD_ARB;
+        attributeListInt[13] = WGL_SWAP_EXCHANGE_ARB;
+
+        // Support for the RGBA pixel type.
+        attributeListInt[14] = WGL_PIXEL_TYPE_ARB;
+        attributeListInt[15] = WGL_TYPE_RGBA_ARB;
+
+        // Support for a 8 bit stencil buffer.
+        attributeListInt[16] = WGL_STENCIL_BITS_ARB;
+        attributeListInt[17] = 8;
+
+        // Null terminate the attribute list.
+        attributeListInt[18] = 0;
+
+
+        // Query for a pixel format that fits the attributes we want.
+        result = wglChoosePixelFormatARB(g_deviceContext, attributeListInt, NULL, 1, pixelFormat, &formatCount);
+        if(result != 1)
+        {
+                return false;
+        }
+
+        // If the video card/display can handle our desired pixel format then we set it as the current one.
+        result = SetPixelFormat(g_deviceContext, pixelFormat[0], &pixelFormatDescriptor);
+        if(result != 1)
+        {
+                return false;
+        }
+
+        // Set the 4.0 version of OpenGL in the attribute list.
+        attributeList[0] = WGL_CONTEXT_MAJOR_VERSION_ARB;
+        attributeList[1] = 4;
+        attributeList[2] = WGL_CONTEXT_MINOR_VERSION_ARB;
+        attributeList[3] = 0;
+
+        // Null terminate the attribute list.
+        attributeList[4] = 0;
+
+        // Create a OpenGL 4.0 rendering context.
+        g_renderingContext = wglCreateContextAttribsARB(g_deviceContext, 0, attributeList);
+        if(g_renderingContext == NULL)
+        {
+                return false;
+        }
+
+        // Set the rendering context to active.
+        result = wglMakeCurrent(g_deviceContext, g_renderingContext);
+        if(result != 1)
+        {
+                return false;
+        }
+
+        // Set the depth buffer to be entirely cleared to 1.0 values.
+        glClearDepth(1.0f);
+
+        // Enable depth testing.
+        glEnable(GL_DEPTH_TEST);
+
+        // Set the polygon winding to front facing for the left handed system.
+        glFrontFace(GL_CW);
+
+        // Enable back face culling.
+        glEnable(GL_CULL_FACE);
+        glCullFace(GL_BACK);
+
+        // Initialize the world/model matrix to the identity matrix.
+        BuildIdentityMatrix(g_worldMatrix);
+
+        // Set the field of view and screen aspect ratio.
+        fieldOfView = PI / 4.0f;
+        screenAspect = (float)screenWidth / (float)screenHeight;
+
+        // Build the perspective projection matrix.
+        BuildPerspectiveFovLHMatrix(g_projectionMatrix, fieldOfView, screenAspect, screenNear, screenDepth);
+
+        // Get the name of the video card.
+        vendorString = (char*)glGetString(GL_VENDOR);
+        rendererString = (char*)glGetString(GL_RENDERER);
+        // Store the video card name in a class member variable so it can be retrieved later.
+        strcpy_s(g_videoCardDescription, vendorString);
+        strcat_s(g_videoCardDescription, " - ");
+        strcat_s(g_videoCardDescription, rendererString);
+
+        // Turn on or off the vertical sync depending on the input bool value.
+        if(vsync)
+        {
+                result = wglSwapIntervalEXT(1);
+        }
+        else
+        {
+                result = wglSwapIntervalEXT(0);
+        }
+
+        // Check if vsync was set correctly.
+        if(result != 1)
+        {
+                return false;
+        }
+
+        return true;
+}
+
+bool LoadExtensionList()
+{
+        // Load the OpenGL extensions that this application will be using.
+        wglChoosePixelFormatARB = (PFNWGLCHOOSEPIXELFORMATARBPROC)wglGetProcAddress("wglChoosePixelFormatARB");
+        if(!wglChoosePixelFormatARB)
+        {
+                return false;
+        }
+
+        wglCreateContextAttribsARB = (PFNWGLCREATECONTEXTATTRIBSARBPROC)wglGetProcAddress("wglCreateContextAttribsARB");
+        if(!wglCreateContextAttribsARB)
+        {
+                return false;
+        }
+
+        wglSwapIntervalEXT = (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
+        if(!wglSwapIntervalEXT)
+        {
+                return false;
+        }
+
+        glAttachShader = (PFNGLATTACHSHADERPROC)wglGetProcAddress("glAttachShader");
+        if(!glAttachShader)
+        {
+                return false;
+        }
+
+        glBindBuffer = (PFNGLBINDBUFFERPROC)wglGetProcAddress("glBindBuffer");
+        if(!glBindBuffer)
+        {
+                return false;
+        }
+
+        glBindVertexArray = (PFNGLBINDVERTEXARRAYPROC)wglGetProcAddress("glBindVertexArray");
+        if(!glBindVertexArray)
+        {
+                return false;
+        }
+
+        glBufferData = (PFNGLBUFFERDATAPROC)wglGetProcAddress("glBufferData");
+        if(!glBufferData)
+        {
+                return false;
+        }
+
+        glCompileShader = (PFNGLCOMPILESHADERPROC)wglGetProcAddress("glCompileShader");
+        if(!glCompileShader)
+        {
+                return false;
+        }
+
+        glCreateProgram = (PFNGLCREATEPROGRAMPROC)wglGetProcAddress("glCreateProgram");
+        if(!glCreateProgram)
+        {
+                return false;
+        }
+
+        glCreateShader = (PFNGLCREATESHADERPROC)wglGetProcAddress("glCreateShader");
+        if(!glCreateShader)
+        {
+                return false;
+        }
+
+        glDeleteBuffers = (PFNGLDELETEBUFFERSPROC)wglGetProcAddress("glDeleteBuffers");
+        if(!glDeleteBuffers)
+        {
+                return false;
+        }
+
+        glDeleteProgram = (PFNGLDELETEPROGRAMPROC)wglGetProcAddress("glDeleteProgram");
+        if(!glDeleteProgram)
+        {
+                return false;
+        }
+
+        glDeleteShader = (PFNGLDELETESHADERPROC)wglGetProcAddress("glDeleteShader");
+        if(!glDeleteShader)
+        {
+                return false;
+        }
+
+        glDeleteVertexArrays = (PFNGLDELETEVERTEXARRAYSPROC)wglGetProcAddress("glDeleteVertexArrays");
+        if(!glDeleteVertexArrays)
+        {
+                return false;
+        }
+
+        glDetachShader = (PFNGLDETACHSHADERPROC)wglGetProcAddress("glDetachShader");
+        if(!glDetachShader)
+        {
+                return false;
+        }
+
+        glEnableVertexAttribArray = (PFNGLENABLEVERTEXATTRIBARRAYPROC)wglGetProcAddress("glEnableVertexAttribArray");
+        if(!glEnableVertexAttribArray)
+        {
+                return false;
+        }
+
+        glGenBuffers = (PFNGLGENBUFFERSPROC)wglGetProcAddress("glGenBuffers");
+        if(!glGenBuffers)
+        {
+                return false;
+        }
+
+        glGenVertexArrays = (PFNGLGENVERTEXARRAYSPROC)wglGetProcAddress("glGenVertexArrays");
+        if(!glGenVertexArrays)
+        {
+                return false;
+        }
+
+        glGetAttribLocation = (PFNGLGETATTRIBLOCATIONPROC)wglGetProcAddress("glGetAttribLocation");
+        if(!glGetAttribLocation)
+        {
+                return false;
+        }
+
+        glGetProgramInfoLog = (PFNGLGETPROGRAMINFOLOGPROC)wglGetProcAddress("glGetProgramInfoLog");
+        if(!glGetProgramInfoLog)
+        {
+                return false;
+        }
+
+        glGetProgramiv = (PFNGLGETPROGRAMIVPROC)wglGetProcAddress("glGetProgramiv");
+        if(!glGetProgramiv)
+        {
+                return false;
+        }
+
+        glGetShaderInfoLog = (PFNGLGETSHADERINFOLOGPROC)wglGetProcAddress("glGetShaderInfoLog");
+        if(!glGetShaderInfoLog)
+        {
+                return false;
+        }
+
+        glGetShaderiv = (PFNGLGETSHADERIVPROC)wglGetProcAddress("glGetShaderiv");
+        if(!glGetShaderiv)
+        {
+                return false;
+        }
+
+        glLinkProgram = (PFNGLLINKPROGRAMPROC)wglGetProcAddress("glLinkProgram");
+        if(!glLinkProgram)
+        {
+                return false;
+        }
+
+        glShaderSource = (PFNGLSHADERSOURCEPROC)wglGetProcAddress("glShaderSource");
+        if(!glShaderSource)
+        {
+                return false;
+        }
+
+        glUseProgram = (PFNGLUSEPROGRAMPROC)wglGetProcAddress("glUseProgram");
+        if(!glUseProgram)
+        {
+                return false;
+        }
+
+        glVertexAttribPointer = (PFNGLVERTEXATTRIBPOINTERPROC)wglGetProcAddress("glVertexAttribPointer");
+        if(!glVertexAttribPointer)
+        {
+                return false;
+        }
+
+        glBindAttribLocation = (PFNGLBINDATTRIBLOCATIONPROC)wglGetProcAddress("glBindAttribLocation");
+        if(!glBindAttribLocation)
+        {
+                return false;
+        }
+
+        glGetUniformLocation = (PFNGLGETUNIFORMLOCATIONPROC)wglGetProcAddress("glGetUniformLocation");
+        if(!glGetUniformLocation)
+        {
+                return false;
+        }
+
+        glUniformMatrix4fv = (PFNGLUNIFORMMATRIX4FVPROC)wglGetProcAddress("glUniformMatrix4fv");
+        if(!glUniformMatrix4fv)
+        {
+                return false;
+        }
+
+        glActiveTexture = (PFNGLACTIVETEXTUREPROC)wglGetProcAddress("glActiveTexture");
+        if(!glActiveTexture)
+        {
+                return false;
+        }
+
+        glUniform1i = (PFNGLUNIFORM1IPROC)wglGetProcAddress("glUniform1i");
+        if(!glUniform1i)
+        {
+                return false;
+        }
+
+        glGenerateMipmap = (PFNGLGENERATEMIPMAPPROC)wglGetProcAddress("glGenerateMipmap");
+        if(!glGenerateMipmap)
+        {
+                return false;
+        }
+
+        glDisableVertexAttribArray = (PFNGLDISABLEVERTEXATTRIBARRAYPROC)wglGetProcAddress("glDisableVertexAttribArray");
+        if(!glDisableVertexAttribArray)
+        {
+                return false;
+        }
+
+        glUniform3fv = (PFNGLUNIFORM3FVPROC)wglGetProcAddress("glUniform3fv");
+        if(!glUniform3fv)
+        {
+                return false;
+        }
+
+        glUniform4fv = (PFNGLUNIFORM4FVPROC)wglGetProcAddress("glUniform4fv");
+        if(!glUniform4fv)
+        {
+                return false;
+        }
+
+        return true;
+}
+
+void FinalizeOpenGL(HWND hwnd)
+{
+        // Release the rendering context.
+        if(g_renderingContext)
+        {
+                wglMakeCurrent(NULL, NULL);
+                wglDeleteContext(g_renderingContext);
+                g_renderingContext = 0;
+        }
+
+        // Release the device context.
+        if(g_deviceContext)
+        {
+                ReleaseDC(hwnd, g_deviceContext);
+                g_deviceContext = 0;
+        }
+}
+
+void GetVideoCardInfo(char* cardName)
+{
+        strcpy_s(cardName, 128, g_videoCardDescription);
+        return;
+}
+
+bool InitializeExtensions(HWND hwnd)
+{
+        HDC deviceContext;
+        PIXELFORMATDESCRIPTOR pixelFormat = {};  // zero-initialized rather than left uninitialized
+        int error;
+        HGLRC renderContext;
+        bool result;
+
+
+        // Get the device context for this window.
+        deviceContext = GetDC(hwnd);
+        if(!deviceContext)
+        {
+                return false;
+        }
+
+        // Set a temporary default pixel format.
+        error = SetPixelFormat(deviceContext, 1, &pixelFormat);
+        if(error != 1)
+        {
+                return false;
+        }
+
+        // Create a temporary rendering context.
+        renderContext = wglCreateContext(deviceContext);
+        if(!renderContext)
+        {
+                return false;
+        }
+
+        // Set the temporary rendering context as the current rendering context for this window.
+        error = wglMakeCurrent(deviceContext, renderContext);
+        if(error != 1)
+        {
+                return false;
+        }
+
+        // Initialize the OpenGL extensions needed for this application.  Note that a temporary rendering context was needed to do so.
+        result = LoadExtensionList();
+        if(!result)
+        {
+                return false;
+        }
+
+        // Release the temporary rendering context now that the extensions have been loaded.
+        wglMakeCurrent(NULL, NULL);
+        wglDeleteContext(renderContext);
+        renderContext = NULL;
+
+        // Release the device context for this window.
+        ReleaseDC(hwnd, deviceContext);
+        deviceContext = 0;
+
+        return true;
+}
+
+void OutputShaderErrorMessage(HWND hwnd, unsigned int shaderId, const char* shaderFilename)
+{
+        int logSize, i;
+        char* infoLog;
+        ofstream fout;
+        wchar_t newString[128];
+        unsigned int error;
+        size_t convertedChars;
+
+
+        // Get the size of the string containing the information log for the failed shader compilation message.
+        glGetShaderiv(shaderId, GL_INFO_LOG_LENGTH, &logSize);
+
+        // Increment the size by one to handle also the null terminator.
+        logSize++;
+
+        // Create a char buffer to hold the info log.
+        infoLog = new char[logSize];
+        if(!infoLog)
+        {
+                return;
+        }
+
+        // Now retrieve the info log.
+        glGetShaderInfoLog(shaderId, logSize, NULL, infoLog);
+
+        // Open a file to write the error message to.
+        fout.open("shader-error.txt");
+
+        // Write out the error message (the info log is null-terminated).
+        fout << infoLog;
+
+        // Close the file and release the info log buffer.
+        fout.close();
+        delete [] infoLog;
+
+        // Convert the shader filename to a wide character string.
+        error = mbstowcs_s(&convertedChars, newString, 128, shaderFilename, 128);
+        if(error != 0)
+        {
+                return;
+        }
+
+        // Pop a message up on the screen to notify the user to check the text file for compile errors.
+        MessageBoxW(hwnd, L"Error compiling shader.  Check shader-error.txt for message.", newString, MB_OK);
+
+        return;
+}
+
+void OutputLinkerErrorMessage(HWND hwnd, unsigned int programId)
+{
+        int logSize, i;
+        char* infoLog;
+        ofstream fout;
+
+
+        // Get the size of the string containing the information log for the failed program link.
+        glGetProgramiv(programId, GL_INFO_LOG_LENGTH, &logSize);
+
+        // Increment the size by one to handle also the null terminator.
+        logSize++;
+
+        // Create a char buffer to hold the info log.
+        infoLog = new char[logSize];
+        if(!infoLog)
+        {
+                return;
+        }
+
+        // Now retrieve the info log.
+        glGetProgramInfoLog(programId, logSize, NULL, infoLog);
+
+        // Open a file to write the error message to.
+        fout.open("linker-error.txt");
+
+        // Write out the error message (the info log is null-terminated).
+        fout << infoLog;
+
+        // Close the file and release the info log buffer.
+        fout.close();
+        delete [] infoLog;
+
+        // Pop a message up on the screen to notify the user to check the text file for linker errors.
+        MessageBox(hwnd, _T("Error linking shader program.  Check linker-error.txt for message."), _T("Linker Error"), MB_OK);
+}
+
+char* LoadShaderSourceFile(const char* filename)
+{
+        ifstream fin;
+        int fileSize;
+        char input;
+        char* buffer;
+
+
+        // Open the shader source file.
+        fin.open(filename);
+
+        // If it could not open the file then exit.
+        if(fin.fail())
+        {
+                return 0;
+        }
+
+        // Initialize the size of the file.
+        fileSize = 0;
+
+        // Read the first element of the file.
+        fin.get(input);
+
+        // Count the number of elements in the text file.
+        while(!fin.eof())
+        {
+                fileSize++;
+                fin.get(input);
+        }
+
+        // Close the file for now.
+        fin.close();
+
+        // Initialize the buffer to read the shader source file into.
+        buffer = new char[fileSize+1];
+        if(!buffer)
+        {
+                return 0;
+        }
+
+        // Open the shader source file again.
+        fin.open(filename);
+
+        // Read the shader text file into the buffer as a block.
+        fin.read(buffer, fileSize);
+
+        // Close the file.
+        fin.close();
+
+        // Null terminate the buffer.
+        buffer[fileSize] = '\0';
+
+        return buffer;
+}
+
+bool InitializeShader(HWND hwnd, const char* vsFilename, const char* fsFilename)
+{
+        const char* vertexShaderBuffer;
+        const char* fragmentShaderBuffer;
+        int status;
+
+        // Load the vertex shader source file into a text buffer.
+        vertexShaderBuffer = LoadShaderSourceFile(vsFilename);
+        if(!vertexShaderBuffer)
+        {
+                return false;
+        }
+
+        // Load the fragment shader source file into a text buffer.
+        fragmentShaderBuffer = LoadShaderSourceFile(fsFilename);
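+        if(!fragmentShaderBuffer)
+        {
+                // Release the vertex shader source loaded above before bailing out.
+                delete [] vertexShaderBuffer;
+                return false;
+        }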
+
+        // Create a vertex and fragment shader object.
+        g_vertexShader = glCreateShader(GL_VERTEX_SHADER);
+        g_fragmentShader = glCreateShader(GL_FRAGMENT_SHADER);
+
+        // Copy the shader source code strings into the vertex and fragment shader objects.
+        glShaderSource(g_vertexShader, 1, &vertexShaderBuffer, NULL);
+        glShaderSource(g_fragmentShader, 1, &fragmentShaderBuffer, NULL);
+
+        // Release the vertex and fragment shader buffers.
+        delete [] vertexShaderBuffer;
+        vertexShaderBuffer = 0;
+
+        delete [] fragmentShaderBuffer;
+        fragmentShaderBuffer = 0;
+
+        // Compile the shaders.
+        glCompileShader(g_vertexShader);
+        glCompileShader(g_fragmentShader);
+
+        // Check to see if the vertex shader compiled successfully.
+        glGetShaderiv(g_vertexShader, GL_COMPILE_STATUS, &status);
+        if(status != 1)
+        {
+                // If it did not compile then write the syntax error message out to a text file for review.
+                OutputShaderErrorMessage(hwnd, g_vertexShader, vsFilename);
+                return false;
+        }
+
+        // Check to see if the fragment shader compiled successfully.
+        glGetShaderiv(g_fragmentShader, GL_COMPILE_STATUS, &status);
+        if(status != 1)
+        {
+                // If it did not compile then write the syntax error message out to a text file for review.
+                OutputShaderErrorMessage(hwnd, g_fragmentShader, fsFilename);
+                return false;
+        }
+
+        // Create a shader program object.
+        g_shaderProgram = glCreateProgram();
+
+        // Attach the vertex and fragment shader to the program object.
+        glAttachShader(g_shaderProgram, g_vertexShader);
+        glAttachShader(g_shaderProgram, g_fragmentShader);
+
+        // Bind the shader input variables.
+        glBindAttribLocation(g_shaderProgram, 0, "inputPosition");
+        glBindAttribLocation(g_shaderProgram, 1, "inputColor");
+
+        // Link the shader program.
+        glLinkProgram(g_shaderProgram);
+
+        // Check the status of the link.
+        glGetProgramiv(g_shaderProgram, GL_LINK_STATUS, &status);
+        if(status != 1)
+        {
+                // If it did not link then write the syntax error message out to a text file for review.
+                OutputLinkerErrorMessage(hwnd, g_shaderProgram);
+                return false;
+        }
+
+        return true;
+}
+
+void ShutdownShader()
+{
+        // Detach the vertex and fragment shaders from the program.
+        glDetachShader(g_shaderProgram, g_vertexShader);
+        glDetachShader(g_shaderProgram, g_fragmentShader);
+
+        // Delete the vertex and fragment shaders.
+        glDeleteShader(g_vertexShader);
+        glDeleteShader(g_fragmentShader);
+
+        // Delete the shader program.
+        glDeleteProgram(g_shaderProgram);
+}
+
+bool SetShaderParameters(float* worldMatrix, float* viewMatrix, float* projectionMatrix)
+{
+        int location;  // glGetUniformLocation returns a GLint; -1 means "not found"
+
+        // Set the world matrix in the vertex shader.
+        location = glGetUniformLocation(g_shaderProgram, "worldMatrix");
+        if(location == -1)
+        {
+                return false;
+        }
+        glUniformMatrix4fv(location, 1, false, worldMatrix);
+
+        // Set the view matrix in the vertex shader.
+        location = glGetUniformLocation(g_shaderProgram, "viewMatrix");
+        if(location == -1)
+        {
+                return false;
+        }
+        glUniformMatrix4fv(location, 1, false, viewMatrix);
+
+        // Set the projection matrix in the vertex shader.
+        location = glGetUniformLocation(g_shaderProgram, "projectionMatrix");
+        if(location == -1)
+        {
+                return false;
+        }
+        glUniformMatrix4fv(location, 1, false, projectionMatrix);
+
+        return true;
+}
+
+bool InitializeBuffers()
+{
+        VertexType vertices[] = {
+                       {{  1.0f,  1.0f,  1.0f }, { 1.0f, 0.0f, 0.0f }},
+                       {{  1.0f,  1.0f, -1.0f }, { 0.0f, 1.0f, 0.0f }},
+                       {{ -1.0f,  1.0f, -1.0f }, { 0.0f, 0.0f, 1.0f }},
+                       {{ -1.0f,  1.0f,  1.0f }, { 1.0f, 1.0f, 0.0f }},
+                       {{  1.0f, -1.0f,  1.0f }, { 1.0f, 0.0f, 1.0f }},
+                       {{  1.0f, -1.0f, -1.0f }, { 0.0f, 1.0f, 1.0f }},
+                       {{ -1.0f, -1.0f, -1.0f }, { 0.5f, 1.0f, 0.5f }},
+                       {{ -1.0f, -1.0f,  1.0f }, { 1.0f, 0.5f, 1.0f }},
+               };
+        uint16_t indices[] = { 1, 2, 3, 3, 2, 6, 6, 7, 3, 3, 0, 1, 0, 3, 7, 7, 6, 4, 4, 6, 5, 0, 7, 4, 1, 0, 4, 1, 4, 5, 2, 1, 5, 2, 5, 6 };
+
+        // Set the number of vertices in the vertex array.
+        g_vertexCount = sizeof(vertices) / sizeof(VertexType);
+
+        // Set the number of indices in the index array.
+        g_indexCount = sizeof(indices) / sizeof(uint16_t);
+
+        // Allocate an OpenGL vertex array object.
+        glGenVertexArrays(1, &g_vertexArrayId);
+
+        // Bind the vertex array object to store all the buffers and vertex attributes we create here.
+        glBindVertexArray(g_vertexArrayId);
+
+        // Generate an ID for the vertex buffer.
+        glGenBuffers(1, &g_vertexBufferId);
+
+        // Bind the vertex buffer and load the vertex (position and color) data into the vertex buffer.
+        glBindBuffer(GL_ARRAY_BUFFER, g_vertexBufferId);
+        glBufferData(GL_ARRAY_BUFFER, g_vertexCount * sizeof(VertexType), vertices, GL_STATIC_DRAW);
+
+        // Enable the two vertex array attributes.
+        glEnableVertexAttribArray(0);  // Vertex position.
+        glEnableVertexAttribArray(1);  // Vertex color.
+
+        // Specify the location and format of the position portion of the vertex buffer.
+        glBindBuffer(GL_ARRAY_BUFFER, g_vertexBufferId);
+        glVertexAttribPointer(0, 3, GL_FLOAT, false, sizeof(VertexType), 0);
+
+        // Specify the location and format of the color portion of the vertex buffer.
+        glBindBuffer(GL_ARRAY_BUFFER, g_vertexBufferId);
+        glVertexAttribPointer(1, 3, GL_FLOAT, false, sizeof(VertexType), (void*)(3 * sizeof(float)));
+
+        // Generate an ID for the index buffer.
+        glGenBuffers(1, &g_indexBufferId);
+
+        // Bind the index buffer and load the index data into it.
+        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, g_indexBufferId);
+        glBufferData(GL_ELEMENT_ARRAY_BUFFER, g_indexCount* sizeof(uint16_t), indices, GL_STATIC_DRAW);
+
+        return true;
+}
+
+void ShutdownBuffers()
+{
+        // Disable the two vertex array attributes.
+        glDisableVertexAttribArray(0);
+        glDisableVertexAttribArray(1);
+
+        // Release the vertex buffer.
+        glBindBuffer(GL_ARRAY_BUFFER, 0);
+        glDeleteBuffers(1, &g_vertexBufferId);
+
+        // Release the index buffer.
+        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
+        glDeleteBuffers(1, &g_indexBufferId);
+
+        // Release the vertex array object.
+        glBindVertexArray(0);
+        glDeleteVertexArrays(1, &g_vertexArrayId);
+
+        return;
+}
+
+void RenderBuffers()
+{
+        // Bind the vertex array object that stored all the information about the vertex and index buffers.
+        glBindVertexArray(g_vertexArrayId);
+
+        // Render the vertex buffer using the index buffer.
+        glDrawElements(GL_TRIANGLES, g_indexCount, GL_UNSIGNED_SHORT, 0);
+
+        return;
+}
+
+void CalculateCameraPosition()
+{
+    VectorType up, position, lookAt;
+    float yaw, pitch, roll;
+    float rotationMatrix[9];
+
+
+    // Setup the vector that points upwards.
+    up.x = 0.0f;
+    up.y = 1.0f;
+    up.z = 0.0f;
+
+    // Setup the position of the camera in the world.
+    position.x = g_positionX;
+    position.y = g_positionY;
+    position.z = g_positionZ;
+
+    // Setup where the camera is looking by default.
+    lookAt.x = 0.0f;
+    lookAt.y = 0.0f;
+    lookAt.z = 1.0f;
+
+    // Set the yaw (Y axis), pitch (X axis), and roll (Z axis) rotations in radians.
+    pitch = g_rotationX * 0.0174532925f;
+    yaw   = g_rotationY * 0.0174532925f;
+    roll  = g_rotationZ * 0.0174532925f;
+
+    // Create the rotation matrix from the yaw, pitch, and roll values.
+    MatrixRotationYawPitchRoll(rotationMatrix, yaw, pitch, roll);
+
+    // Transform the lookAt and up vector by the rotation matrix so the view is correctly rotated at the origin.
+    TransformCoord(lookAt, rotationMatrix);
+    TransformCoord(up, rotationMatrix);
+
+    // Translate the rotated camera position to the location of the viewer.
+    lookAt.x = position.x + lookAt.x;
+    lookAt.y = position.y + lookAt.y;
+    lookAt.z = position.z + lookAt.z;
+
+    // Finally create the view matrix from the three updated vectors.
+    BuildViewMatrix(position, lookAt, up, g_viewMatrix);
+}
+
+void Draw()
+{
+       static float rotateAngle = 0.0f;
+
+    // Set the color to clear the screen to.
+    glClearColor(0.2f, 0.3f, 0.4f, 1.0f);
+    // Clear the screen and depth buffer.
+    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
+
+       // Update world matrix to rotate the model
+       rotateAngle += PI / 120;
+       float rotationMatrixY[16];
+       float rotationMatrixZ[16];
+       MatrixRotationY(rotationMatrixY, rotateAngle);
+       MatrixRotationZ(rotationMatrixZ, rotateAngle);
+       MatrixMultiply(g_worldMatrix, rotationMatrixZ, rotationMatrixY);
+
+    // Generate the view matrix based on the camera's position.
+       CalculateCameraPosition();
+
+    // Set the color shader as the current shader program and set the matrices that it will use for rendering.
+       glUseProgram(g_shaderProgram);
+    SetShaderParameters(g_worldMatrix, g_viewMatrix, g_projectionMatrix);
+
+    // Render the model using the color shader.
+    RenderBuffers();
+
+    // Present the back buffer to the screen since rendering is complete.
+    SwapBuffers(g_deviceContext);
+}

 // the WindowProc function prototype
 LRESULT CALLBACK WindowProc(HWND hWnd,
@@ -25,32 +1049,75 @@ int WINAPI WinMain(HINSTANCE hInstance,

        // fill in the struct with the needed information
        wc.cbSize = sizeof(WNDCLASSEX);
-    wc.style = CS_HREDRAW | CS_VREDRAW;
+       wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
+    wc.lpfnWndProc = DefWindowProc;
+    wc.hInstance = hInstance;
+    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
+    wc.hbrBackground = (HBRUSH)COLOR_WINDOW;
+    wc.lpszClassName = _T("Temporary");
+
+    // register the window class
+    RegisterClassEx(&wc);
+
+    // create the temporary window for OpenGL extension setup.
+    hWnd = CreateWindowEx(WS_EX_APPWINDOW,
+                          _T("Temporary"),    // name of the window class
+                          _T("Temporary"),   // title of the window
+                          WS_OVERLAPPEDWINDOW,    // window style
+                          0,    // x-position of the window
+                          0,    // y-position of the window
+                          640,    // width of the window
+                          480,    // height of the window
+                          NULL,    // we have no parent window, NULL
+                          NULL,    // we aren't using menus, NULL
+                          hInstance,    // application handle
+                          NULL);    // used with multiple windows, NULL
+
+    // Don't show the window.
+       ShowWindow(hWnd, SW_HIDE);
+
+    InitializeExtensions(hWnd);
+
+       DestroyWindow(hWnd);
+       hWnd = NULL;
+
+       // clear out the window class for use
+       ZeroMemory(&wc, sizeof(WNDCLASSEX));
+
+       // fill in the struct with the needed information
+       wc.cbSize = sizeof(WNDCLASSEX);
+       wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
        wc.lpfnWndProc = WindowProc;
        wc.hInstance = hInstance;
        wc.hCursor = LoadCursor(NULL, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)COLOR_WINDOW;
-    wc.lpszClassName = _T("WindowClass1");
+       wc.lpszClassName = _T("Hello, Engine!");

        // register the window class
        RegisterClassEx(&wc);

        // create the window and use the result as the handle
-    hWnd = CreateWindowEx(0,
-                          _T("WindowClass1"),    // name of the window class
+       hWnd = CreateWindowEx(WS_EX_APPWINDOW,
+               _T("Hello, Engine!"),    // name of the window class
                _T("Hello, Engine!"),   // title of the window
                WS_OVERLAPPEDWINDOW,    // window style
                300,    // x-position of the window
                300,    // y-position of the window
-                          500,    // width of the window
-                          400,    // height of the window
+               960,    // width of the window
+               540,    // height of the window
                NULL,    // we have no parent window, NULL
                NULL,    // we aren't using menus, NULL
                hInstance,    // application handle
                NULL);    // used with multiple windows, NULL

+    InitializeOpenGL(hWnd, 960, 540, SCREEN_DEPTH, SCREEN_NEAR, true);
+
        // display the window on the screen
        ShowWindow(hWnd, nCmdShow);
+       SetForegroundWindow(hWnd);
+
+    InitializeShader(hWnd, VS_SHADER_SOURCE_FILE, PS_SHADER_SOURCE_FILE);
+    InitializeBuffers();

     // enter the main loop:

@@ -67,6 +1134,10 @@ int WINAPI WinMain(HINSTANCE hInstance,
         DispatchMessage(&msg);
     }

+    ShutdownBuffers();
+    ShutdownShader();
+    FinalizeOpenGL(hWnd);
+
     // return this part of the WM_QUIT message to Windows
     return msg.wParam;
 }
@@ -79,14 +1150,8 @@ LRESULT CALLBACK WindowProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lPara
     {
     case WM_PAINT:
         {
-               PAINTSTRUCT ps;
-               HDC hdc = BeginPaint(hWnd, &ps);
-               RECT rec = { 20, 20, 60, 80 };
-               HBRUSH brush = (HBRUSH) GetStockObject(BLACK_BRUSH);
-
-               FillRect(hdc, &rec, brush);
-
-               EndPaint(hWnd, &ps);
+          Draw();
+                 return 0;
         } break;
         // this message is read when the window is closed
     case WM_DESTROY:

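LoadExtensionList above repeats the same fetch, cast and null-check sequence for every entry point. The following is a minimal sketch, not the code used in this series, of a template helper that folds each sequence into a single call. `GenericProc`, `ProcLoader` and `LoadProc` are hypothetical names; the loader is passed in as a function pointer so the sketch compiles without `<windows.h>`, and in the Windows build it would be a small wrapper around `wglGetProcAddress`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the PROC type that wglGetProcAddress returns:
// a generic function pointer that must be cast to the real signature before use.
using GenericProc = void (*)();

// Hypothetical loader signature; in the Windows build this would wrap wglGetProcAddress.
using ProcLoader = GenericProc (*)(const char* name);

// Looks up `name` through `loader`, casts the result to the caller's
// function-pointer type and reports whether the lookup succeeded.
template <typename FnPtr>
bool LoadProc(ProcLoader loader, const char* name, FnPtr& out)
{
    out = reinterpret_cast<FnPtr>(loader(name));
    return out != nullptr;
}
```

With such a helper, each block in LoadExtensionList would shrink to a line like `if(!LoadProc(loader, "glBindBuffer", glBindBuffer)) return false;`. According to the OpenGL wiki, some drivers return small sentinel values (1, 2, 3 or -1) instead of NULL for unsupported functions, so a stricter wrapper might reject those values as well.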
In addition, we add a new math header (math.h) that implements some basic linear algebra operations:

#pragma once
#include <math.h>

#ifndef PI
#define PI 3.14159265358979323846f
#endif

#ifndef TWO_PI
#define TWO_PI (3.14159265358979323846f * 2.0f)
#endif

typedef struct VectorType
{
	union {
		struct { float x, y, z; };
		struct { float r, g, b; };
	};
} VectorType;

void MatrixRotationYawPitchRoll(float* matrix, float yaw, float pitch, float roll)
{
	float cYaw, cPitch, cRoll, sYaw, sPitch, sRoll;


	// Get the cosine and sin of the yaw, pitch, and roll.
	cYaw = cosf(yaw);
	cPitch = cosf(pitch);
	cRoll = cosf(roll);

	sYaw = sinf(yaw);
	sPitch = sinf(pitch);
	sRoll = sinf(roll);

	// Calculate the yaw, pitch, roll rotation matrix.
	matrix[0] = (cRoll * cYaw) + (sRoll * sPitch * sYaw);
	matrix[1] = (sRoll * cPitch);
	matrix[2] = (cRoll * -sYaw) + (sRoll * sPitch * cYaw);
	
	matrix[3] = (-sRoll * cYaw) + (cRoll * sPitch * sYaw);
	matrix[4] = (cRoll * cPitch);
	matrix[5] = (sRoll * sYaw) + (cRoll * sPitch * cYaw);
	
	matrix[6] = (cPitch * sYaw);
	matrix[7] = -sPitch;
	matrix[8] = (cPitch * cYaw);

	return;
}

void TransformCoord(VectorType& vector, float* matrix)
{
	float x, y, z;


	// Transform the vector by the 3x3 matrix.
	x = (vector.x * matrix[0]) + (vector.y * matrix[3]) + (vector.z * matrix[6]);
	y = (vector.x * matrix[1]) + (vector.y * matrix[4]) + (vector.z * matrix[7]);
	z = (vector.x * matrix[2]) + (vector.y * matrix[5]) + (vector.z * matrix[8]);

	// Store the result in the reference.
	vector.x = x;
	vector.y = y;
	vector.z = z;

	return;
}
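To see where the closed-form entries of MatrixRotationYawPitchRoll come from, the sketch below rebuilds the same 3x3 matrix as a product of three elementary rotations. It assumes the row-vector convention that TransformCoord uses (v' = v·M, row-major storage), in which the product Rz(roll)·Rx(pitch)·Ry(yaw) applies roll first, then pitch, then yaw. `Mat3`, `Vec3`, `RotX`, `RotY`, `RotZ`, `Mul`, `YawPitchRoll` and `Transform` are illustrative names, not part of the engine.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 3x3 rotation stored row-major and applied to row vectors (v' = v * M),
// the same convention MatrixRotationYawPitchRoll and TransformCoord use.
using Mat3 = std::array<float, 9>;
using Vec3 = std::array<float, 3>;

// Elementary rotations about X, Y and Z in that row-vector convention.
Mat3 RotX(float a)
{
    const float c = std::cos(a), s = std::sin(a);
    return {1.0f, 0.0f, 0.0f,  0.0f, c, s,  0.0f, -s, c};
}

Mat3 RotY(float a)
{
    const float c = std::cos(a), s = std::sin(a);
    return {c, 0.0f, -s,  0.0f, 1.0f, 0.0f,  s, 0.0f, c};
}

Mat3 RotZ(float a)
{
    const float c = std::cos(a), s = std::sin(a);
    return {c, s, 0.0f,  -s, c, 0.0f,  0.0f, 0.0f, 1.0f};
}

// Plain row-major matrix product a * b.
Mat3 Mul(const Mat3& a, const Mat3& b)
{
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i * 3 + j] += a[i * 3 + k] * b[k * 3 + j];
    return r;
}

// Row vectors multiply from the left, so roll is applied first, then pitch, then yaw.
Mat3 YawPitchRoll(float yaw, float pitch, float roll)
{
    return Mul(Mul(RotZ(roll), RotX(pitch)), RotY(yaw));
}

// Row vector times matrix, the same arithmetic as TransformCoord.
Vec3 Transform(const Vec3& v, const Mat3& m)
{
    return {v[0] * m[0] + v[1] * m[3] + v[2] * m[6],
            v[0] * m[1] + v[1] * m[4] + v[2] * m[7],
            v[0] * m[2] + v[1] * m[5] + v[2] * m[8]};
}
```

Multiplying out the three factors reproduces the nine expressions in MatrixRotationYawPitchRoll term by term. One consequence that is easy to check: with yaw = π/2 and the other two angles at zero, the default lookAt vector (0, 0, 1) used by CalculateCameraPosition is rotated to (1, 0, 0).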

void BuildViewMatrix(VectorType position, VectorType lookAt, VectorType up, float* result)
{
	VectorType zAxis, xAxis, yAxis;
	float length, result1, result2, result3;


	// zAxis = normal(lookAt - position)
	zAxis.x = lookAt.x - position.x;
	zAxis.y = lookAt.y - position.y;
	zAxis.z = lookAt.z - position.z;
	length = sqrt((zAxis.x * zAxis.x) + (zAxis.y * zAxis.y) + (zAxis.z * zAxis.z));
	zAxis.x = zAxis.x / length;
	zAxis.y = zAxis.y / length;
	zAxis.z = zAxis.z / length;

	// xAxis = normal(cross(up, zAxis))
	xAxis.x = (up.y * zAxis.z) - (up.z * zAxis.y);
	xAxis.y = (up.z * zAxis.x) - (up.x * zAxis.z);
	xAxis.z = (up.x * zAxis.y) - (up.y * zAxis.x);
	length = sqrt((xAxis.x * xAxis.x) + (xAxis.y * xAxis.y) + (xAxis.z * xAxis.z));
	xAxis.x = xAxis.x / length;
	xAxis.y = xAxis.y / length;
	xAxis.z = xAxis.z / length;

	// yAxis = cross(zAxis, xAxis)
	yAxis.x = (zAxis.y * xAxis.z) - (zAxis.z * xAxis.y);
	yAxis.y = (zAxis.z * xAxis.x) - (zAxis.x * xAxis.z);
	yAxis.z = (zAxis.x * xAxis.y) - (zAxis.y * xAxis.x);

	// -dot(xAxis, position)
	result1 = ((xAxis.x * position.x) + (xAxis.y * position.y) + (xAxis.z * position.z)) * -1.0f;

	// -dot(yAxis, position)
	result2 = ((yAxis.x * position.x) + (yAxis.y * position.y) + (yAxis.z * position.z)) * -1.0f;

	// -dot(zAxis, position)
	result3 = ((zAxis.x * position.x) + (zAxis.y * position.y) + (zAxis.z * position.z)) * -1.0f;

	// Set the computed values in the view matrix.
	result[0]  = xAxis.x;
	result[1]  = yAxis.x;
	result[2]  = zAxis.x;
	result[3]  = 0.0f;

	result[4]  = xAxis.y;
	result[5]  = yAxis.y;
	result[6]  = zAxis.y;
	result[7]  = 0.0f;

	result[8]  = xAxis.z;
	result[9]  = yAxis.z;
	result[10] = zAxis.z;
	result[11] = 0.0f;

	result[12] = result1;
	result[13] = result2;
	result[14] = result3;
	result[15] = 1.0f;
}

void BuildIdentityMatrix(float* matrix)
{
	matrix[0] = 1.0f;
	matrix[1] = 0.0f;
	matrix[2] = 0.0f;
	matrix[3] = 0.0f;

	matrix[4] = 0.0f;
	matrix[5] = 1.0f;
	matrix[6] = 0.0f;
	matrix[7] = 0.0f;

	matrix[8] = 0.0f;
	matrix[9] = 0.0f;
	matrix[10] = 1.0f;
	matrix[11] = 0.0f;

	matrix[12] = 0.0f;
	matrix[13] = 0.0f;
	matrix[14] = 0.0f;
	matrix[15] = 1.0f;

	return;
}


void BuildPerspectiveFovLHMatrix(float* matrix, float fieldOfView, float screenAspect, float screenNear, float screenDepth)
{
	matrix[0] = 1.0f / (screenAspect * tan(fieldOfView * 0.5f));
	matrix[1] = 0.0f;
	matrix[2] = 0.0f;
	matrix[3] = 0.0f;

	matrix[4] = 0.0f;
	matrix[5] = 1.0f / tan(fieldOfView * 0.5f);
	matrix[6] = 0.0f;
	matrix[7] = 0.0f;

	matrix[8] = 0.0f;
	matrix[9] = 0.0f;
	matrix[10] = screenDepth / (screenDepth - screenNear);
	matrix[11] = 1.0f;

	matrix[12] = 0.0f;
	matrix[13] = 0.0f;
	matrix[14] = (-screenNear * screenDepth) / (screenDepth - screenNear);
	matrix[15] = 0.0f;

	return;
}
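BuildPerspectiveFovLHMatrix follows the Direct3D-style left-handed convention, which maps the near plane to depth 0 and the far plane to depth 1 after the perspective divide, whereas OpenGL's default clip volume spans -1 to 1 in z. The sketch below uses a hypothetical helper, `NdcDepthLH`, that evaluates only the four matrix elements affecting depth, to make that mapping concrete:

```cpp
#include <cassert>
#include <cmath>

// Perspective-divided depth produced by the LH projection matrix for a
// view-space point at depth z (row vector [x y z 1] times the matrix).
// Only m[10], m[11], m[14] and m[15] contribute to z' and w'.
float NdcDepthLH(float z, float screenNear, float screenDepth)
{
    const float m10 = screenDepth / (screenDepth - screenNear);
    const float m11 = 1.0f;
    const float m14 = (-screenNear * screenDepth) / (screenDepth - screenNear);
    const float m15 = 0.0f;

    const float clipZ = z * m10 + m14;  // z' = z * m[10] + 1 * m[14]
    const float clipW = z * m11 + m15;  // w' = z * m[11] + 1 * m[15]
    return clipZ / clipW;
}
```

Solving NdcDepthLH(z) = -1 gives z = n·f / (2f - n), roughly n/2 when f is much larger than n. So under OpenGL's default convention this matrix still renders, but geometry somewhat closer than screenNear is not clipped, and the [-1, 0) half of the depth range is spent on the thin slab between n·f / (2f - n) and n.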


void MatrixRotationY(float* matrix, float angle)
{
	matrix[0] = cosf(angle);
	matrix[1] = 0.0f;
	matrix[2] = -sinf(angle);
	matrix[3] = 0.0f;

	matrix[4] = 0.0f;
	matrix[5] = 1.0f;
	matrix[6] = 0.0f;
	matrix[7] = 0.0f;

	matrix[8] = sinf(angle);
	matrix[9] = 0.0f;
	matrix[10] = cosf(angle);
	matrix[11] = 0.0f;

	matrix[12] = 0.0f;
	matrix[13] = 0.0f;
	matrix[14] = 0.0f;
	matrix[15] = 1.0f;

	return;
}


void MatrixTranslation(float* matrix, float x, float y, float z)
{
	matrix[0] = 1.0f;
	matrix[1] = 0.0f;
	matrix[2] = 0.0f;
	matrix[3] = 0.0f;

	matrix[4] = 0.0f;
	matrix[5] = 1.0f;
	matrix[6] = 0.0f;
	matrix[7] = 0.0f;

	matrix[8] = 0.0f;
	matrix[9] = 0.0f;
	matrix[10] = 1.0f;
	matrix[11] = 0.0f;

	matrix[12] = x;
	matrix[13] = y;
	matrix[14] = z;
	matrix[15] = 1.0f;

	return;
}


void MatrixRotationZ(float* matrix, float angle)
{
	matrix[0] = cosf(angle);
	matrix[1] = -sinf(angle);
	matrix[2] = 0.0f;
	matrix[3] = 0.0f;

	matrix[4] = sinf(angle);
	matrix[5] = cosf(angle);
	matrix[6] = 0.0f;
	matrix[7] = 0.0f;

	matrix[8] = 0.0f;
	matrix[9] = 0.0f;
	matrix[10] = 1.0f;
	matrix[11] = 0.0f;

	matrix[12] = 0.0f;
	matrix[13] = 0.0f;
	matrix[14] = 0.0f;
	matrix[15] = 1.0f;

	return;
}


void MatrixMultiply(float* result, float* matrix1, float* matrix2)
{
	result[0] = (matrix1[0] * matrix2[0]) + (matrix1[1] * matrix2[4]) + (matrix1[2] * matrix2[8]) + (matrix1[3] * matrix2[12]);
	result[1] = (matrix1[0] * matrix2[1]) + (matrix1[1] * matrix2[5]) + (matrix1[2] * matrix2[9]) + (matrix1[3] * matrix2[13]);
	result[2] = (matrix1[0] * matrix2[2]) + (matrix1[1] * matrix2[6]) + (matrix1[2] * matrix2[10]) + (matrix1[3] * matrix2[14]);
	result[3] = (matrix1[0] * matrix2[3]) + (matrix1[1] * matrix2[7]) + (matrix1[2] * matrix2[11]) + (matrix1[3] * matrix2[15]);

	result[4] = (matrix1[4] * matrix2[0]) + (matrix1[5] * matrix2[4]) + (matrix1[6] * matrix2[8]) + (matrix1[7] * matrix2[12]);
	result[5] = (matrix1[4] * matrix2[1]) + (matrix1[5] * matrix2[5]) + (matrix1[6] * matrix2[9]) + (matrix1[7] * matrix2[13]);
	result[6] = (matrix1[4] * matrix2[2]) + (matrix1[5] * matrix2[6]) + (matrix1[6] * matrix2[10]) + (matrix1[7] * matrix2[14]);
	result[7] = (matrix1[4] * matrix2[3]) + (matrix1[5] * matrix2[7]) + (matrix1[6] * matrix2[11]) + (matrix1[7] * matrix2[15]);

	result[8] = (matrix1[8] * matrix2[0]) + (matrix1[9] * matrix2[4]) + (matrix1[10] * matrix2[8]) + (matrix1[11] * matrix2[12]);
	result[9] = (matrix1[8] * matrix2[1]) + (matrix1[9] * matrix2[5]) + (matrix1[10] * matrix2[9]) + (matrix1[11] * matrix2[13]);
	result[10] = (matrix1[8] * matrix2[2]) + (matrix1[9] * matrix2[6]) + (matrix1[10] * matrix2[10]) + (matrix1[11] * matrix2[14]);
	result[11] = (matrix1[8] * matrix2[3]) + (matrix1[9] * matrix2[7]) + (matrix1[10] * matrix2[11]) + (matrix1[11] * matrix2[15]);

	result[12] = (matrix1[12] * matrix2[0]) + (matrix1[13] * matrix2[4]) + (matrix1[14] * matrix2[8]) + (matrix1[15] * matrix2[12]);
	result[13] = (matrix1[12] * matrix2[1]) + (matrix1[13] * matrix2[5]) + (matrix1[14] * matrix2[9]) + (matrix1[15] * matrix2[13]);
	result[14] = (matrix1[12] * matrix2[2]) + (matrix1[13] * matrix2[6]) + (matrix1[14] * matrix2[10]) + (matrix1[15] * matrix2[14]);
	result[15] = (matrix1[12] * matrix2[3]) + (matrix1[13] * matrix2[7]) + (matrix1[14] * matrix2[11]) + (matrix1[15] * matrix2[15]);

	return;
}
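Here are two quick numerical checks of the helpers above. They are written as a standalone sketch with hypothetical names (Mat4, RotationY, Translation, Multiply, TransformPoint, ProjectedDepth) that mirror the original functions.

The first check is about multiplication order. These matrices treat a point as a row vector multiplied on the left, and keep the translation in elements 12 to 14. So MatrixMultiply(result, A, B) produces a transform that applies A first and then B.

The second check is about depth. In BuildPerspectiveFovLHMatrix the output depth is z' = z * matrix[10] + matrix[14] and w' = z, because matrix[11] = 1 and matrix[15] = 0. After the perspective divide, the near plane should therefore land at 0 and the far plane at 1.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;   // row-major, same layout as the float[16] helpers

Mat4 RotationY(float angle)           // mirrors MatrixRotationY
{
    float c = std::cos(angle), s = std::sin(angle);
    return { c, 0, -s, 0,
             0, 1,  0, 0,
             s, 0,  c, 0,
             0, 0,  0, 1 };
}

Mat4 Translation(float x, float y, float z)   // mirrors MatrixTranslation
{
    return { 1, 0, 0, 0,
             0, 1, 0, 0,
             0, 0, 1, 0,
             x, y, z, 1 };
}

Mat4 Multiply(const Mat4& a, const Mat4& b)   // mirrors MatrixMultiply: result = a * b
{
    Mat4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Row vector (x, y, z, 1) times m. Returns x, y, z; w stays 1 for these affine matrices.
std::array<float, 3> TransformPoint(const std::array<float, 3>& p, const Mat4& m)
{
    std::array<float, 3> out{};
    for (int col = 0; col < 3; ++col)
        out[col] = p[0] * m[col] + p[1] * m[4 + col] + p[2] * m[8 + col] + m[12 + col];
    return out;
}

// Depth after the perspective divide, using only the matrix[10] and matrix[14]
// terms of BuildPerspectiveFovLHMatrix.
float ProjectedDepth(float z, float screenNear, float screenDepth)
{
    float m10 = screenDepth / (screenDepth - screenNear);
    float m14 = (-screenNear * screenDepth) / (screenDepth - screenNear);
    return (z * m10 + m14) / z;   // divide by w' = z
}
```

With a quarter turn, Multiply(RotationY(pi/2), Translation(5, 0, 0)) sends (1, 0, 0) to (5, 0, -1), while the reverse order sends it to (0, 0, -6). ProjectedDepth returns 0 at z = screenNear and 1 at z = screenDepth.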

Build (with the Visual Studio build tools):

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>cl /EHsc /Z7 opengl32.lib user32.lib gdi32.lib helloengine_opengl.cpp

Build (with clang-cl):

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang-cl /EHsc -o helloengine_opengl helloengine_opengl.cpp user32.lib gdi32.lib opengl32.lib

Build (with Clang):

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang -o helloengine_opengl helloengine_opengl.cpp -luser32 -lgdi32 -lopengl32
helloengine_opengl-99755a.o : warning LNK4217: locally defined symbol ___std_terminate imported in function "int `public: virtual __thiscall std::basic_filebuf<char,struct std::char_traits<char> >::~basic_filebuf<char,struct std::char_traits<char> >(void)'::`1'::dtor$8" (?dtor$8@?0???1?$basic_filebuf@DU?$char_traits@D@std@@@std@@UAE@XZ@4HA)
helloengine_opengl-99755a.o : warning LNK4217: locally defined symbol __CxxThrowException@8 imported in function "class std::codecvt<char,char,struct _Mbstatet> const & __cdecl std::use_facet<class std::codecvt<char,char,struct _Mbstatet> >(class std::locale const &)" (??$use_facet@V?$codecvt@DDU_Mbstatet@@@std@@@std@@YAABV?$codecvt@DDU_Mbstatet@@@0@ABVlocale@0@@Z)

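This build emits two warnings. Our code is currently Windows-only, and Clang has some minor compatibility issues with the MSVC C++ exception-handling model. Building with clang-cl /EHsc, as above, avoids them. The warnings can be safely ignored.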

Shader programs (they must be placed in the same directory as the source code):

color.vs

////////////////////////////////////////////////////////////////////////////////
// Filename: color.vs
////////////////////////////////////////////////////////////////////////////////

#version 400

/////////////////////
// INPUT VARIABLES //
/////////////////////
in vec3 inputPosition;
in vec3 inputColor;

//////////////////////
// OUTPUT VARIABLES //
//////////////////////
out vec3 color;

///////////////////////
// UNIFORM VARIABLES //
///////////////////////
uniform mat4 worldMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;

////////////////////////////////////////////////////////////////////////////////
// Vertex Shader
////////////////////////////////////////////////////////////////////////////////
void main(void)
{
	// Calculate the position of the vertex against the world, view, and projection matrices.
	gl_Position = worldMatrix * vec4(inputPosition, 1.0f);
	gl_Position = viewMatrix * gl_Position;
	gl_Position = projectionMatrix * gl_Position;

	// Store the input color for the pixel shader to use.
	color = inputColor;
}

color.ps

////////////////////////////////////////////////////////////////////////////////
// Filename: color.ps
////////////////////////////////////////////////////////////////////////////////
#version 400


/////////////////////
// INPUT VARIABLES //
/////////////////////
in vec3 color;


//////////////////////
// OUTPUT VARIABLES //
//////////////////////
out vec4 outputColor;


////////////////////////////////////////////////////////////////////////////////
// Pixel Shader
////////////////////////////////////////////////////////////////////////////////
void main(void)
{
	outputColor = vec4(color, 1.0f);
}
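The vertex shader above (color.vs) computes worldMatrix * vec4(inputPosition, 1.0f), which is the column-vector convention. The C++ matrix helpers, by contrast, keep the translation in elements 12 to 14, which is the row-vector convention.

The two agree if the 16 floats are uploaded unchanged with glUniformMatrix4fv(location, 1, GL_FALSE, matrix). That is an assumption here, since the upload code is not part of this excerpt. With transpose set to GL_FALSE, GLSL reads the array column-major, so it effectively sees the transpose M^T, and M^T v is exactly v M written as a column.

The sketch below checks that identity on the CPU. Both function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// C++ side: row vector v times row-major m, the layout used by the helper functions.
Vec4 RowVectorTimes(const Vec4& v, const Mat4& m)
{
    Vec4 r{};
    for (int col = 0; col < 4; ++col)
        for (int k = 0; k < 4; ++k)
            r[col] += v[k] * m[k * 4 + col];
    return r;
}

// GLSL side: the same 16 floats read column-major (element [col * 4 + row]),
// multiplied by a column vector, as in "worldMatrix * vec4(...)".
Vec4 ColumnMajorTimes(const Mat4& data, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += data[col * 4 + row] * v[col];
    return r;
}
```

Both functions produce the same result for any matrix. Transposing at upload time (GL_TRUE) would instead require writing position * worldMatrix in the shader.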

The final result looks like this:

Because of the screen-capture tool, the animation has a reduced color palette and shows visible color banding:

For reasons of length, the explanation of this code is left to a later article.

Hand-Coding a Next-Gen Game Engine from Scratch (12)

In the previous article we used D3D on Windows to draw a triangle in 3D space.

In this article we use OpenGL on Linux to draw a rectangle in 3D space.

The code for this article is on GitHub, in the article_12 branch of:

netwarm007/GameEngineFromScratch

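As before, we reuse the earlier code as much as possible. This makes the differences easier to compare later, when we work out the concrete design of the graphics module.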

First go into the Platform/Linux directory and copy helloengine_xcb.c to helloengine_opengl.cpp. Then make the following changes:

 #include <stdlib.h>
 #include <string.h>
 
+#include <X11/Xlib.h>
+#include <X11/Xlib-xcb.h>
 #include <xcb/xcb.h>
 
+#include <GL/gl.h> 
+#include <GL/glx.h> 
+#include <GL/glu.h>
+
+#define GLX_CONTEXT_MAJOR_VERSION_ARB       0x2091
+#define GLX_CONTEXT_MINOR_VERSION_ARB       0x2092
+typedef GLXContext (*glXCreateContextAttribsARBProc)(Display*, GLXFBConfig, GLXContext, Bool, const int*);
+
+// Helper to check for extension string presence.  Adapted from:
+//   http://www.opengl.org/resources/features/OGLextensions/
+static bool isExtensionSupported(const char *extList, const char *extension)
+{
+  const char *start;
+  const char *where, *terminator;
+  
+  /* Extension names should not have spaces. */
+  where = strchr(extension, ' ');
+  if (where || *extension == '\0')
+    return false;
+
+  /* It takes a bit of care to be fool-proof about parsing the
+     OpenGL extensions string. Don't be fooled by sub-strings,
+     etc. */
+  for (start=extList;;) {
+    where = strstr(start, extension);
+
+    if (!where)
+      break;
+
+    terminator = where + strlen(extension);
+
+    if ( where == start || *(where - 1) == ' ' )
+      if ( *terminator == ' ' || *terminator == '\0' )
+        return true;
+
+    start = terminator;
+  }
+
+  return false;
+}
+
+static bool ctxErrorOccurred = false;
+static int ctxErrorHandler(Display *dpy, XErrorEvent *ev)
+{
+    ctxErrorOccurred = true;
+    return 0;
+}
+
+void DrawAQuad() {
+    glClearColor(1.0, 1.0, 1.0, 1.0); 
+    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); 
+
+    glMatrixMode(GL_PROJECTION); 
+    glLoadIdentity(); 
+    glOrtho(-1., 1., -1., 1., 1., 20.); 
+
+    glMatrixMode(GL_MODELVIEW); 
+    glLoadIdentity(); 
+    gluLookAt(0., 0., 10., 0., 0., 0., 0., 1., 0.); 
+
+    glBegin(GL_QUADS); 
+    glColor3f(1., 0., 0.); 
+    glVertex3f(-.75, -.75, 0.); 
+    glColor3f(0., 1., 0.); 
+    glVertex3f( .75, -.75, 0.); 
+    glColor3f(0., 0., 1.); 
+    glVertex3f( .75, .75, 0.); 
+    glColor3f(1., 1., 0.); 
+    glVertex3f(-.75, .75, 0.); 
+    glEnd(); 
+} 
+
 int main(void) {
     xcb_connection_t    *pConn;
     xcb_screen_t        *pScreen;
@@ -11,41 +85,145 @@ int main(void) {
     xcb_gcontext_t      foreground;
     xcb_gcontext_t      background;
     xcb_generic_event_t *pEvent;
+    xcb_colormap_t colormap;
     uint32_t        mask = 0;
-       uint32_t                values[2];
+    uint32_t        values[3];
     uint8_t         isQuit = 0;
 
-       char title[] = "Hello, Engine!";
+    char title[] = "Hello, Engine![OpenGL]";
     char title_icon[] = "Hello, Engine! (iconified)";
 
+    Display *display;
+    int default_screen;
+    GLXContext context;
+    GLXFBConfig *fb_configs;
+    GLXFBConfig fb_config;
+    int num_fb_configs = 0;
+    XVisualInfo *vi;
+    GLXDrawable drawable;
+    GLXWindow glxwindow;
+    glXCreateContextAttribsARBProc glXCreateContextAttribsARB;
+    const char *glxExts;
+
+    // Get a matching FB config
+    static int visual_attribs[] =
+    {
+      GLX_X_RENDERABLE    , True,
+      GLX_DRAWABLE_TYPE   , GLX_WINDOW_BIT,
+      GLX_RENDER_TYPE     , GLX_RGBA_BIT,
+      GLX_X_VISUAL_TYPE   , GLX_TRUE_COLOR,
+      GLX_RED_SIZE        , 8,
+      GLX_GREEN_SIZE      , 8,
+      GLX_BLUE_SIZE       , 8,
+      GLX_ALPHA_SIZE      , 8,
+      GLX_DEPTH_SIZE      , 24,
+      GLX_STENCIL_SIZE    , 8,
+      GLX_DOUBLEBUFFER    , True,
+      //GLX_SAMPLE_BUFFERS  , 1,
+      //GLX_SAMPLES         , 4,
+      None
+    };
+
+    int glx_major, glx_minor;
+
+    /* Open Xlib Display */ 
+    display = XOpenDisplay(NULL);
+    if(!display)
+    {
+        fprintf(stderr, "Can't open display\n");
+        return -1;
+    }
+
+    // FBConfigs were added in GLX version 1.3.
+    if (!glXQueryVersion(display, &glx_major, &glx_minor) || 
+       ((glx_major == 1) && (glx_minor < 3)) || (glx_major < 1))
+    {
+        fprintf(stderr, "Invalid GLX version\n");
+        return -1;
+    }
+
+    default_screen = DefaultScreen(display);
+
+    /* Query framebuffer configurations */
+    fb_configs = glXChooseFBConfig(display, default_screen, visual_attribs, &num_fb_configs);
+    if(!fb_configs || num_fb_configs == 0)
+    {
+        fprintf(stderr, "glXChooseFBConfig failed\n");
+        return -1;
+    }
+
+    /* Pick the FB config/visual with the most samples per pixel */
+    {
+        int best_fbc = -1, worst_fbc = -1, best_num_samp = -1, worst_num_samp = 999;
+
+        for (int i=0; i<num_fb_configs; ++i)
+        {
+            XVisualInfo *vi = glXGetVisualFromFBConfig(display, fb_configs[i]);
+            if (vi)
+            {
+                int samp_buf, samples;
+                glXGetFBConfigAttrib(display, fb_configs[i], GLX_SAMPLE_BUFFERS, &samp_buf);
+                glXGetFBConfigAttrib(display, fb_configs[i], GLX_SAMPLES, &samples);
+      
+                printf( "  Matching fbconfig %d, visual ID 0x%lx: SAMPLE_BUFFERS = %d,"
+                        " SAMPLES = %d\n", 
+                        i, vi -> visualid, samp_buf, samples);
+
+                if (best_fbc < 0 || (samp_buf && samples > best_num_samp))
+                    best_fbc = i, best_num_samp = samples;
+                if (worst_fbc < 0 || !samp_buf || samples < worst_num_samp)
+                    worst_fbc = i, worst_num_samp = samples;
+                XFree( vi );
+            }
+        }
+
+        fb_config = fb_configs[best_fbc];
+    }
+
+    /* Get a visual */
+    vi = glXGetVisualFromFBConfig(display, fb_config);
+    printf("Chosen visual ID = 0x%lx\n", vi->visualid);
+
     /* establish connection to X server */
-       pConn = xcb_connect(0, 0);
+    pConn = XGetXCBConnection(display);
+    if(!pConn)
+    {
+        XCloseDisplay(display);
+        fprintf(stderr, "Can't get xcb connection from display\n");
+        return -1;
+    }
 
-       /* get the first screen */
-       pScreen = xcb_setup_roots_iterator(xcb_get_setup(pConn)).data;
+    /* Acquire event queue ownership */
+    XSetEventQueueOwner(display, XCBOwnsEventQueue);
+
+    /* Find XCB screen */
+    xcb_screen_iterator_t screen_iter = 
+        xcb_setup_roots_iterator(xcb_get_setup(pConn));
+    for(int screen_num = vi->screen;
+        screen_iter.rem && screen_num > 0;
+        --screen_num, xcb_screen_next(&screen_iter));
+    pScreen = screen_iter.data;
 
     /* get the root window */
     window = pScreen->root;
 
-       /* create black (foreground) graphic context */
-       foreground = xcb_generate_id(pConn);
-       mask = XCB_GC_FOREGROUND | XCB_GC_GRAPHICS_EXPOSURES;
-       values[0] = pScreen->black_pixel;
-       values[1] = 0;
-       xcb_create_gc(pConn, foreground, window, mask, values);
+    /* Create XID's for colormap */
+    colormap = xcb_generate_id(pConn);
 
-       /* create which (background) graphic context */
-       background = xcb_generate_id(pConn);
-       mask = XCB_GC_BACKGROUND | XCB_GC_GRAPHICS_EXPOSURES;
-       values[0] = pScreen->white_pixel;
-       values[1] = 0;
-       xcb_create_gc(pConn, background, window, mask, values);
+    xcb_create_colormap(
+        pConn,
+        XCB_COLORMAP_ALLOC_NONE,
+        colormap,
+        window,
+        vi->visualid 
+        );
 
     /* create window */
     window = xcb_generate_id(pConn);
-       mask = XCB_CW_BACK_PIXEL | XCB_CW_EVENT_MASK;
-       values[0] = pScreen->white_pixel;
-       values[1] = XCB_EVENT_MASK_EXPOSURE | XCB_EVENT_MASK_KEY_PRESS;
+    mask = XCB_CW_EVENT_MASK  | XCB_CW_COLORMAP;
+    values[0] = XCB_EVENT_MASK_EXPOSURE | XCB_EVENT_MASK_KEY_PRESS;
+    values[1] = colormap;
+    values[2] = 0;
     xcb_create_window (pConn,                   /* connection */
                        XCB_COPY_FROM_PARENT,    /* depth */
                        window,                  /* window ID */
@@ -54,9 +232,11 @@ int main(void) {
                        640, 480,                /* width, height */
                        10,                      /* boarder width */
                        XCB_WINDOW_CLASS_INPUT_OUTPUT, /* class */
-                                          pScreen->root_visual,        /* visual */
+                       vi->visualid,            /* visual */
                        mask, values);           /* masks */
 
+    XFree(vi);
+
     /* set the title of the window */
     xcb_change_property(pConn, XCB_PROP_MODE_REPLACE, window,
                 XCB_ATOM_WM_NAME, XCB_ATOM_STRING, 8,
@@ -72,13 +252,120 @@ int main(void) {
 
     xcb_flush(pConn);
 

+    /* Get the default screen's GLX extension list */
+    glxExts = glXQueryExtensionsString(display, default_screen);
+
+    /* NOTE: It is not necessary to create or make current to a context before
+       calling glXGetProcAddressARB */
+    glXCreateContextAttribsARB = (glXCreateContextAttribsARBProc)
+           glXGetProcAddressARB( (const GLubyte *) "glXCreateContextAttribsARB" );
+
+    /* Create OpenGL context */
+    ctxErrorOccurred = false;
+    int (*oldHandler)(Display*, XErrorEvent*) =
+        XSetErrorHandler(&ctxErrorHandler);
+
+    if (!isExtensionSupported(glxExts, "GLX_ARB_create_context") ||
+       !glXCreateContextAttribsARB )
+    {
+        printf( "glXCreateContextAttribsARB() not found"
+            " ... using old-style GLX context\n" );
+        context = glXCreateNewContext(display, fb_config, GLX_RGBA_TYPE, 0, True);
+        if(!context)
+        {
+            fprintf(stderr, "glXCreateNewContext failed\n");
+            return -1;
+        }
+    }
+    else
+    {
+        int context_attribs[] =
+          {
+            GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
+            GLX_CONTEXT_MINOR_VERSION_ARB, 0,
+            None
+          };
+
+        printf( "Creating context\n" );
+        context = glXCreateContextAttribsARB(display, fb_config, 0,
+                                          True, context_attribs );
+
+        XSync(display, False);
+        if (!ctxErrorOccurred && context)
+          printf( "Created GL 3.0 context\n" );
+        else
+        {
+          /* GLX_CONTEXT_MAJOR_VERSION_ARB = 1 */
+          context_attribs[1] = 1;
+          /* GLX_CONTEXT_MINOR_VERSION_ARB = 0 */
+          context_attribs[3] = 0;
+
+          ctxErrorOccurred = false;
+
+          printf( "Failed to create GL 3.0 context"
+                  " ... using old-style GLX context\n" );
+          context = glXCreateContextAttribsARB(display, fb_config, 0, 
+                                            True, context_attribs );
+        }
+    }
+
+    XSync(display, False);
+
+    XSetErrorHandler(oldHandler);
+
+    if (ctxErrorOccurred || !context)
+    {
+        printf( "Failed to create an OpenGL context\n" );
+        return -1;
+    }
+
+    /* Verifying that context is a direct context */
+    if (!glXIsDirect (display, context))
+    {
+        printf( "Indirect GLX rendering context obtained\n" );
+    }
+    else
+    {
+        printf( "Direct GLX rendering context obtained\n" );
+    }
+
+    /* Create GLX Window */
+    glxwindow = 
+            glXCreateWindow(
+                display,
+                fb_config,
+                window,
+                0
+                );
+
+    if(!glxwindow)
+    {
+        xcb_destroy_window(pConn, window);
+        glXDestroyContext(display, context);
+
+        fprintf(stderr, "glXCreateWindow failed\n");
+        return -1;
+    }
+
+    drawable = glxwindow;
+
+    /* make OpenGL context current */
+    if(!glXMakeContextCurrent(display, drawable, drawable, context))
+    {
+        xcb_destroy_window(pConn, window);
+        glXDestroyContext(display, context);
+
+        fprintf(stderr, "glXMakeContextCurrent failed\n");
+        return -1;
+    }
+
+
-       while((pEvent = xcb_wait_for_event(pConn)) && !isQuit) {
+    while(!isQuit && (pEvent = xcb_wait_for_event(pConn))) {
         switch(pEvent->response_type & ~0x80) {
         case XCB_EXPOSE:
             {       
-                       xcb_rectangle_t rect = { 20, 20, 60, 80 };
-                       xcb_poly_fill_rectangle(pConn, window, foreground, 1, &rect);
-                       xcb_flush(pConn);
+                DrawAQuad();
+                glXSwapBuffers(display, drawable);
             }
             break;
         case XCB_KEY_PRESS:
@@ -88,6 +375,8 @@ int main(void) {
         free(pEvent);
     }
 
+
+    /* Cleanup */
     xcb_disconnect(pConn);
 
     return 0;
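isExtensionSupported does more than a single strstr call because GLX extension names can be prefixes of one another. For example, GLX_ARB_create_context is a prefix of GLX_ARB_create_context_profile, so a bare substring search could report an extension that is not actually present.

The sketch below restates the same whole-token check in C++ under a hypothetical name (HasExtensionToken), so it can be tested without an X server. The extension strings used to exercise it are made up.

```cpp
#include <cassert>
#include <cstring>

// Accept a match only if it is a whole space-separated token,
// not a prefix or suffix of a longer extension name.
bool HasExtensionToken(const char* extList, const char* extension)
{
    if (std::strchr(extension, ' ') || *extension == '\0')
        return false;                       // extension names never contain spaces

    const std::size_t len = std::strlen(extension);
    for (const char* start = extList;;) {
        const char* where = std::strstr(start, extension);
        if (!where)
            return false;
        const char* terminator = where + len;
        bool startsToken = (where == extList || *(where - 1) == ' ');  // preceded by list start or a space
        bool endsToken   = (*terminator == ' ' || *terminator == '\0');
        if (startsToken && endsToken)
            return true;
        start = terminator;                 // partial hit: keep searching after it
    }
}
```

With the list "GLX_ARB_create_context_profile GLX_EXT_visual_info", strstr does find "GLX_ARB_create_context". HasExtensionToken correctly rejects it.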

A few key points:

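First, as we said earlier, the X Window System (current version: 11) is designed as a client-server architecture, and the graphics card is hidden behind the X server. If we follow this architecture, we cannot access the graphics card directly. Instead we go through an X extension called GLX, which sends 3D drawing commands to the X server as X protocol extension requests; the X server then forwards them to the graphics card.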

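For a game, which is a soft real-time system, this architecture is overly complex. X therefore adopted the DRI (Direct Rendering Infrastructure) architecture, first shipped with XFree86 4.0 around 2000. With DRI, local rendering can send OpenGL commands directly to the graphics driver without going through the X server. DRI2 followed around 2008, and further revisions came later. See the link below for details.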

GLX – Wikipedia

However, GLX was written before XCB existed, so it is tightly bound to Xlib. XCB, for its part, was designed as a replacement for Xlib.

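So until an XCB-based GLX exists, we have to use XCB and Xlib together. XCB creates and manages the basic X window, while Xlib + GLX creates the OpenGL-related graphics resources. This is why we added so many header files to the code.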

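In the code, we first specify the format of the framebuffer we want. The framebuffer is the region of memory that holds the rendering result and from which the displayed image is finally produced: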

    // Get a matching FB config
    static int visual_attribs[] =
    {
      GLX_X_RENDERABLE    , True,
      GLX_DRAWABLE_TYPE   , GLX_WINDOW_BIT,
      GLX_RENDER_TYPE     , GLX_RGBA_BIT,
      GLX_X_VISUAL_TYPE   , GLX_TRUE_COLOR,
      GLX_RED_SIZE        , 8,
      GLX_GREEN_SIZE      , 8,
      GLX_BLUE_SIZE       , 8,
      GLX_ALPHA_SIZE      , 8,
      GLX_DEPTH_SIZE      , 24,
      GLX_STENCIL_SIZE    , 8,
      GLX_DOUBLEBUFFER    , True,
      //GLX_SAMPLE_BUFFERS  , 1,
      //GLX_SAMPLES         , 4,
      None
    };
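visual_attribs follows the usual GLX convention for attribute lists: a flat array of key/value pairs, terminated by a single None key (None is defined as 0 in the X11 headers). Commenting out a pair, as done above for GLX_SAMPLE_BUFFERS and GLX_SAMPLES, simply removes that requirement.

Below is a minimal sketch of walking such a list. FindAttrib is a hypothetical helper, and the key numbers in the checks are made up, not real GLX constants.

```cpp
#include <cassert>

// Look up `key` in a zero-terminated key/value list laid out like
// visual_attribs: {key, value, key, value, ..., 0}.
bool FindAttrib(const int* attribs, int key, int* value)
{
    for (const int* p = attribs; p[0] != 0; p += 2) {   // step over one key/value pair
        if (p[0] == key) {
            *value = p[1];
            return true;
        }
    }
    return false;                                       // reached the terminating 0 key
}
```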

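The next piece of code lists every framebuffer configuration on the default screen that meets the criteria above. It then picks the best one, meaning the one with the most samples per pixel: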

    /* Query framebuffer configurations */
    fb_configs = glXChooseFBConfig(display, default_screen, visual_attribs, &num_fb_configs);
    if(!fb_configs || num_fb_configs == 0)
    {
        fprintf(stderr, "glXChooseFBConfig failed\n");
        return -1;
    }

    /* Pick the FB config/visual with the most samples per pixel */
    {
        int best_fbc = -1, worst_fbc = -1, best_num_samp = -1, worst_num_samp = 999;

        for (int i=0; i<num_fb_configs; ++i)
        {
            XVisualInfo *vi = glXGetVisualFromFBConfig(display, fb_configs[i]);
            if (vi)
            {
                int samp_buf, samples;
                glXGetFBConfigAttrib(display, fb_configs[i], GLX_SAMPLE_BUFFERS, &samp_buf);
                glXGetFBConfigAttrib(display, fb_configs[i], GLX_SAMPLES, &samples);

                printf( "  Matching fbconfig %d, visual ID 0x%lx: SAMPLE_BUFFERS = %d,"
                        " SAMPLES = %d\n",
                        i, vi -> visualid, samp_buf, samples);

                if (best_fbc < 0 || (samp_buf && samples > best_num_samp))
                    best_fbc = i, best_num_samp = samples;
                if (worst_fbc < 0 || !samp_buf || samples < worst_num_samp)
                    worst_fbc = i, worst_num_samp = samples;
                XFree( vi );
            }
        }

        fb_config = fb_configs[best_fbc];
    }
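The selection loop keeps the first config that has a visual. It then switches to any later config that has a sample buffer and more samples than the best so far. (worst_fbc and worst_num_samp are computed but never used afterwards.)

Below, the same rule is pulled out into a pure function over a hypothetical FbInfo record, so it can be checked without an X server.

```cpp
#include <cassert>
#include <vector>

struct FbInfo {          // hypothetical stand-in for one fb_configs[i]
    bool hasVisual;      // glXGetVisualFromFBConfig returned non-NULL
    int  sampleBuffers;  // value of GLX_SAMPLE_BUFFERS
    int  samples;        // value of GLX_SAMPLES
};

// Returns the index the loop above would pick, or -1 if no config has a visual.
int PickBestConfig(const std::vector<FbInfo>& configs)
{
    int best = -1, bestSamples = -1;
    for (int i = 0; i < static_cast<int>(configs.size()); ++i) {
        const FbInfo& c = configs[i];
        if (!c.hasVisual)
            continue;
        if (best < 0 || (c.sampleBuffers && c.samples > bestSamples)) {
            best = i;
            bestSamples = c.samples;
        }
    }
    return best;
}
```

The -1 case matters. If no config yields a visual, the original code would read fb_configs[-1], so it is worth checking best_fbc before indexing.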

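Everything above goes through Xlib, but we want to use XCB to create and manage the window, so the next step bridges the two. We take the XCB connection that underlies the Xlib Display and hand event-queue ownership to XCB. We then locate the XCB screen matching the chosen visual's screen, so that XCB and Xlib both refer to the same screen: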

    /* establish connection to X server */
    pConn = XGetXCBConnection(display);
    if(!pConn)
    {
        XCloseDisplay(display);
        fprintf(stderr, "Can't get xcb connection from display\n");
        return -1;
    }

    /* Acquire event queue ownership */
    XSetEventQueueOwner(display, XCBOwnsEventQueue);

    /* Find XCB screen */
    xcb_screen_iterator_t screen_iter =
        xcb_setup_roots_iterator(xcb_get_setup(pConn));
    for(int screen_num = vi->screen;
        screen_iter.rem && screen_num > 0;
        --screen_num, xcb_screen_next(&screen_iter));
    pScreen = screen_iter.data;
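The for loop above has an empty body. It only advances the iterator vi->screen times, stopping early if rem runs out, so screen_iter.data ends up at the XCB screen with the same index as the visual Xlib picked.

Below is the same pattern with a hypothetical FakeScreenIterator that mimics the data, rem and index fields of xcb_screen_iterator_t, using plain ints as screens.

```cpp
#include <cassert>

struct FakeScreenIterator {   // hypothetical stand-in for xcb_screen_iterator_t
    const int* data;          // current "screen"
    int rem;                  // screens remaining, including the current one
    int index;
};

void FakeScreenNext(FakeScreenIterator* it)   // mimics xcb_screen_next
{
    --it->rem;
    ++it->data;
    ++it->index;
}

// Same loop shape as the original: all the work happens in the loop header.
const int* FindScreen(FakeScreenIterator it, int screenNum)
{
    for (; it.rem && screenNum > 0; --screenNum, FakeScreenNext(&it))
        ;
    return it.data;
}
```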

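Then we create the window with XCB. This is essentially the same as in part (9), except that the window now uses the colormap and visual (vi->visualid) that match the chosen framebuffer config.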

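Next we use Xlib + GLX to create the OpenGL rendering context for this window. It replaces the foreground and background graphics contexts from part (9).

The code looks somewhat complicated because contexts for OpenGL versions before 3.0 are created differently from those for 3.0 and later. We could always create a low-version context, but the lower the version, the fewer OpenGL features we can use. So the code probes what is available and picks the best creation path based on the result: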

    /* Get the default screen's GLX extension list */
    glxExts = glXQueryExtensionsString(display, default_screen);

    /* NOTE: It is not necessary to create or make current to a context before
       calling glXGetProcAddressARB */
    glXCreateContextAttribsARB = (glXCreateContextAttribsARBProc)
           glXGetProcAddressARB( (const GLubyte *) "glXCreateContextAttribsARB" );

    /* Create OpenGL context */
    ctxErrorOccurred = false;
    int (*oldHandler)(Display*, XErrorEvent*) =
        XSetErrorHandler(&ctxErrorHandler);

    if (!isExtensionSupported(glxExts, "GLX_ARB_create_context") ||
       !glXCreateContextAttribsARB )
    {
        printf( "glXCreateContextAttribsARB() not found"
            " ... using old-style GLX context\n" );
        context = glXCreateNewContext(display, fb_config, GLX_RGBA_TYPE, 0, True);
        if(!context)
        {
            fprintf(stderr, "glXCreateNewContext failed\n");
            return -1;
        }
    }
    else
    {
        int context_attribs[] =
          {
            GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
            GLX_CONTEXT_MINOR_VERSION_ARB, 0,
            None
          };

        printf( "Creating context\n" );
        context = glXCreateContextAttribsARB(display, fb_config, 0,
                                          True, context_attribs );

        XSync(display, False);
        if (!ctxErrorOccurred && context)
          printf( "Created GL 3.0 context\n" );
        else
        {
          /* GLX_CONTEXT_MAJOR_VERSION_ARB = 1 */
          context_attribs[1] = 1;
          /* GLX_CONTEXT_MINOR_VERSION_ARB = 0 */
          context_attribs[3] = 0;

          ctxErrorOccurred = false;

          printf( "Failed to create GL 3.0 context"
                  " ... using old-style GLX context\n" );
          context = glXCreateContextAttribsARB(display, fb_config, 0,
                                            True, context_attribs );
        }
    }

    XSync(display, False);

    XSetErrorHandler(oldHandler);

    if (ctxErrorOccurred || !context)
    {
        printf( "Failed to create an OpenGL context\n" );
        return -1;
    }
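Structurally, the code above is a fallback ladder. It tries the context version it wants first. If creation fails (detected through the temporary X error handler plus a NULL check), it retries with an older version.

Below is an X-free sketch of that pattern. CreateWithFallback is hypothetical, and tryCreate stands in for "call glXCreateContextAttribsARB, XSync, then check the error flag".

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using GLVersion = std::pair<int, int>;   // (major, minor)

// Returns the first version in `preferred` for which tryCreate succeeds,
// or (0, 0) if every attempt fails.
GLVersion CreateWithFallback(const std::vector<GLVersion>& preferred,
                             const std::function<bool(GLVersion)>& tryCreate)
{
    for (const GLVersion& v : preferred)
        if (tryCreate(v))
            return v;
    return {0, 0};
}
```

The original hard-codes a ladder of 3.0 then 1.0. Listing several versions in order, such as {4, 5}, {3, 3}, {3, 0}, would extend it without adding more nested branches.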

Then, so that GLX can render into the window we created through XCB, we wrap that window in a GLX window object with glXCreateWindow:

    /* Create GLX Window */
    glxwindow =
            glXCreateWindow(
                display,
                fb_config,
                window,
                0
                );

    if(!glxwindow)
    {
        xcb_destroy_window(pConn, window);
        glXDestroyContext(display, context);

        fprintf(stderr, "glXCreateWindow failed\n");
        return -1;
    }

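Next we tell OpenGL (that is, the graphics card) which drawing surface to render to, by making the context current on the GLX window: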

    drawable = glxwindow;

    /* make OpenGL context current */
    if(!glXMakeContextCurrent(display, drawable, drawable, context))
    {
        xcb_destroy_window(pConn, window);
        glXDestroyContext(display, context);

        fprintf(stderr, "glXMakeContextCurrent failed\n");
        return -1;
    }
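Each failure branch above repeats the same cleanup calls (xcb_destroy_window, glXDestroyContext). One common C++ alternative is to register the cleanup once and dismiss it when setup succeeds.

The sketch below shows this with a hypothetical ScopeGuard class. A hypothetical SetupWindow function, with a log vector standing in for the real X/GLX calls, demonstrates the behavior.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs `cleanup` when the guard goes out of scope, unless Dismiss() was called.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { if (active_) cleanup_(); }
    void Dismiss() { active_ = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> cleanup_;
    bool active_ = true;
};

// Simulated setup: makeCurrentOk stands in for the result of glXMakeContextCurrent.
bool SetupWindow(bool makeCurrentOk, std::vector<std::string>& calls)
{
    ScopeGuard destroyWindow([&] { calls.push_back("destroy window"); });
    ScopeGuard destroyContext([&] { calls.push_back("destroy context"); });
    if (!makeCurrentOk)
        return false;            // both guards fire, in reverse order of declaration
    destroyContext.Dismiss();    // success: keep both resources alive
    destroyWindow.Dismiss();
    return true;
}
```

Guards run in reverse order of declaration. Declaring the window guard first therefore destroys the context before the window.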

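After that, XCB processes the window's event queue. The XCB_EXPOSE handler draws with OpenGL functions and then presents the result with glXSwapBuffers.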

    while(!isQuit && (pEvent = xcb_wait_for_event(pConn))) {
        switch(pEvent->response_type & ~0x80) {
        case XCB_EXPOSE:
            {
                DrawAQuad();
                glXSwapBuffers(display, drawable);
            }
            break;
        case XCB_KEY_PRESS:
            isQuit = 1;
            break;
        }
        free(pEvent);
    }
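The expression response_type & ~0x80 strips the high bit. The X server sets that bit on events delivered through a SendEvent request, so masking it lets synthetic and real events share one case label.

The codes below are the core X11 protocol values (KeyPress = 2, Expose = 12), which XCB exposes as XCB_KEY_PRESS and XCB_EXPOSE. The helper name EventCode is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kKeyPress = 2;   // XCB_KEY_PRESS
constexpr std::uint8_t kExpose   = 12;  // XCB_EXPOSE

// Mirrors "pEvent->response_type & ~0x80": drop the SendEvent flag bit.
constexpr std::uint8_t EventCode(std::uint8_t responseType)
{
    return static_cast<std::uint8_t>(responseType & ~0x80);
}
```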

The command line to compile this program:

[tim@localhost Linux]$ clang -o helloengine_opengl helloengine_opengl.cpp -lxcb -lX11 -lX11-xcb -lGL -lGLU

Before compiling, install the development packages with apt or yum: libGL-dev, libGLU-dev, libX11-dev, libX11-xcb-dev and libxcb-dev. Package names differ slightly between distributions.

To debug, add the "-g" option and then debug with gdb.

The result:

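Compared with Direct 3D in the previous article, notice that we did not supply any shader program at all. The actual drawing takes only the few lines below, which is far simpler and clearer than Direct 3D's long series of API calls.

This is what we mentioned before: OpenGL is a relatively high-level abstraction. It lets us concentrate on what to draw, but it also hides much of the actual processing. That makes it very convenient for fields such as CAD and scientific simulation. For more complex applications, however, and especially for graphics work like games that needs deep optimization, it is somewhat over-abstracted.

(Of course, what we use here is the simplest, fixed-function-pipeline OpenGL. Newer versions of OpenGL also support GPU programming, which we will cover later.)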

void DrawAQuad() {
    glClearColor(1.0, 1.0, 1.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(-1., 1., -1., 1., 1., 20.);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0., 0., 10., 0., 0., 0., 0., 1., 0.);

    glBegin(GL_QUADS);
    glColor3f(1., 0., 0.);
    glVertex3f(-.75, -.75, 0.);
    glColor3f(0., 1., 0.);
    glVertex3f( .75, -.75, 0.);
    glColor3f(0., 0., 1.);
    glVertex3f( .75, .75, 0.);
    glColor3f(1., 1., 0.);
    glVertex3f(-.75, .75, 0.);
    glEnd();
}
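Here is why the quad ends up on screen. gluLookAt(0, 0, 10, ...) puts the eye at z = 10 looking toward the origin, so the quad at z = 0 sits at eye-space z = -10. That is inside the near/far range [1, 20] passed to glOrtho.

The sketch below applies the glOrtho matrix as given in the OpenGL reference to one corner of the quad, mapping it to normalized device coordinates. The helper name OrthoToNdc is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Apply the glOrtho(l, r, b, t, n, f) matrix to an eye-space point.
// An orthographic projection keeps w = 1, so the result is already in NDC.
std::array<double, 3> OrthoToNdc(double l, double r, double b, double t, double n, double f,
                                 double x, double y, double z)
{
    return {  2.0 / (r - l) * x - (r + l) / (r - l),
              2.0 / (t - b) * y - (t + b) / (t - b),
             -2.0 / (f - n) * z - (f + n) / (f - n) };
}
```

The corner (0.75, 0.75) keeps its x and y, because the ortho box spans [-1, 1]. Its depth maps to -1/19, well inside the [-1, 1] clip range.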

Hand-Coding a Next-Gen Game Engine from Scratch (11)

In the previous article we drew a 2D shape with Direct 2D. Next, we draw a 3D shape with Direct 3D.

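As before, we reuse our code by copying it: copy helloengine_d2d.cpp to helloengine_d3d.cpp, then make the following changes. This article uses the D3D 11 interface, because jumping straight to D3D 12 would make the learning curve too steep: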

@@ -2,13 +2,39 @@
 #include <windows.h>
 #include <windowsx.h>
 #include <tchar.h>
+#include <stdint.h>

-#include <d2d1.h>
+#include <d3d11.h>
+#include <d3d11_1.h>
+#include <d3dcompiler.h>
+#include <DirectXMath.h>
+#include <DirectXPackedVector.h>
+#include <DirectXColors.h>

-ID2D1Factory                   *pFactory = nullptr;
-ID2D1HwndRenderTarget  *pRenderTarget = nullptr;
-ID2D1SolidColorBrush   *pLightSlateGrayBrush = nullptr;
-ID2D1SolidColorBrush   *pCornflowerBlueBrush = nullptr;
+using namespace DirectX;
+using namespace DirectX::PackedVector;
+
+const uint32_t SCREEN_WIDTH  =  960;
+const uint32_t SCREEN_HEIGHT =  480;
+
+// global declarations
+IDXGISwapChain          *g_pSwapchain = nullptr;              // the pointer to the swap chain interface
+ID3D11Device            *g_pDev       = nullptr;              // the pointer to our Direct3D device interface
+ID3D11DeviceContext     *g_pDevcon    = nullptr;              // the pointer to our Direct3D device context
+
+ID3D11RenderTargetView  *g_pRTView    = nullptr;
+
+ID3D11InputLayout       *g_pLayout    = nullptr;              // the pointer to the input layout
+ID3D11VertexShader      *g_pVS        = nullptr;              // the pointer to the vertex shader
+ID3D11PixelShader       *g_pPS        = nullptr;              // the pointer to the pixel shader
+
+ID3D11Buffer            *g_pVBuffer   = nullptr;              // Vertex Buffer
+
+// vertex buffer structure
+struct VERTEX {
+        XMFLOAT3    Position;
+        XMFLOAT4    Color;
+};

 template<class T>
 inline void SafeRelease(T **ppInterfaceToRelease)
@@ -21,32 +47,164 @@ inline void SafeRelease(T **ppInterfaceToRelease)
     }
 }

+void CreateRenderTarget() {
+    HRESULT hr;
+    ID3D11Texture2D *pBackBuffer;
+
+    // Get a pointer to the back buffer
+    g_pSwapchain->GetBuffer( 0, __uuidof( ID3D11Texture2D ),
+                                 ( LPVOID* )&pBackBuffer );
+
+    // Create a render-target view
+    g_pDev->CreateRenderTargetView( pBackBuffer, NULL,
+                                          &g_pRTView );
+    pBackBuffer->Release();
+
+    // Bind the view
+    g_pDevcon->OMSetRenderTargets( 1, &g_pRTView, NULL );
+}
+
+void SetViewPort() {
+    D3D11_VIEWPORT viewport;
+    ZeroMemory(&viewport, sizeof(D3D11_VIEWPORT));
+
+    viewport.TopLeftX = 0;
+    viewport.TopLeftY = 0;
+    viewport.Width = SCREEN_WIDTH;
+    viewport.Height = SCREEN_HEIGHT;
+
+    g_pDevcon->RSSetViewports(1, &viewport);
+}
+
+// this is the function that loads and prepares the shaders
+void InitPipeline() {
+    // load and compile the two shaders
+    ID3DBlob *VS, *PS;
+
+    D3DReadFileToBlob(L"copy.vso", &VS);
+    D3DReadFileToBlob(L"copy.pso", &PS);
+
+    // encapsulate both shaders into shader objects
+    g_pDev->CreateVertexShader(VS->GetBufferPointer(), VS->GetBufferSize(), NULL, &g_pVS);
+    g_pDev->CreatePixelShader(PS->GetBufferPointer(), PS->GetBufferSize(), NULL, &g_pPS);
+
+    // set the shader objects
+    g_pDevcon->VSSetShader(g_pVS, 0, 0);
+    g_pDevcon->PSSetShader(g_pPS, 0, 0);
+
+    // create the input layout object
+    D3D11_INPUT_ELEMENT_DESC ied[] =
     {
+        {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
+        {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
+    };


+    g_pDev->CreateInputLayout(ied, 2, VS->GetBufferPointer(), VS->GetBufferSize(), &g_pLayout);
+    g_pDevcon->IASetInputLayout(g_pLayout);

+    VS->Release();
+    PS->Release();
+}

+// this is the function that creates the shape to render
+void InitGraphics() {
+    // create a triangle using the VERTEX struct
+    VERTEX OurVertices[] =
+    {
+        {XMFLOAT3(0.0f, 0.5f, 0.0f), XMFLOAT4(1.0f, 0.0f, 0.0f, 1.0f)},
+        {XMFLOAT3(0.45f, -0.5f, 0.0f), XMFLOAT4(0.0f, 1.0f, 0.0f, 1.0f)},
+        {XMFLOAT3(-0.45f, -0.5f, 0.0f), XMFLOAT4(0.0f, 0.0f, 1.0f, 1.0f)}
+    };

+    // create the vertex buffer
+    D3D11_BUFFER_DESC bd;
+    ZeroMemory(&bd, sizeof(bd));

+    bd.Usage = D3D11_USAGE_DYNAMIC;                // GPU reads, CPU writes (via Map)
+    bd.ByteWidth = sizeof(VERTEX) * 3;             // size is the VERTEX struct * 3
+    bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;       // use as a vertex buffer
+    bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;    // allow CPU to write in buffer
+
+    g_pDev->CreateBuffer(&bd, NULL, &g_pVBuffer);       // create the buffer
+
+    // copy the vertices into the buffer
+    D3D11_MAPPED_SUBRESOURCE ms;
+    g_pDevcon->Map(g_pVBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms);    // map the buffer
+    memcpy(ms.pData, OurVertices, sizeof(VERTEX) * 3);                       // copy the data
+    g_pDevcon->Unmap(g_pVBuffer, NULL);                                      // unmap the buffer
+}
+
+// this function prepare graphic resources for use
HRESULT CreateGraphicsResources(HWND hWnd)
{
    HRESULT hr = S_OK;
-    if (pRenderTarget == nullptr)
-    {
-        RECT rc;
-        GetClientRect(hWnd, &rc);
-        D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left,
-                        rc.bottom - rc.top);
-        hr = pFactory->CreateHwndRenderTarget(
-            D2D1::RenderTargetProperties(),
-            D2D1::HwndRenderTargetProperties(hWnd, size),
-            &pRenderTarget);
-        if (SUCCEEDED(hr))
-        {
-            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::LightSlateGray), &pLightSlateGrayBrush);
-        }

-        if (SUCCEEDED(hr))
-        {
-            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::CornflowerBlue), &pCornflowerBlueBrush);
+    if (g_pSwapchain == nullptr)
+    {
+        // create a struct to hold information about the swap chain
+        DXGI_SWAP_CHAIN_DESC scd;
+
+        // clear out the struct for use
+        ZeroMemory(&scd, sizeof(DXGI_SWAP_CHAIN_DESC));
+
+        // fill the swap chain description struct
+        scd.BufferCount = 1;                                    // one back buffer
+        scd.BufferDesc.Width = SCREEN_WIDTH;
+        scd.BufferDesc.Height = SCREEN_HEIGHT;
+        scd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;     // use 32-bit color
+        scd.BufferDesc.RefreshRate.Numerator = 60;
+        scd.BufferDesc.RefreshRate.Denominator = 1;
+        scd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;      // how swap chain is to be used
+        scd.OutputWindow = hWnd;                                // the window to be used
+        scd.SampleDesc.Count = 4;                               // how many multisamples
+        scd.Windowed = TRUE;                                    // windowed/full-screen mode
+        scd.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;     // allow full-screen switching
+
+        const D3D_FEATURE_LEVEL FeatureLevels[] = { D3D_FEATURE_LEVEL_11_1,
+                                                    D3D_FEATURE_LEVEL_11_0,
+                                                    D3D_FEATURE_LEVEL_10_1,
+                                                    D3D_FEATURE_LEVEL_10_0,
+                                                    D3D_FEATURE_LEVEL_9_3,
+                                                    D3D_FEATURE_LEVEL_9_2,
+                                                    D3D_FEATURE_LEVEL_9_1};
+        D3D_FEATURE_LEVEL FeatureLevelSupported;
+
+        // create a device, device context and swap chain using the information in the scd struct
+        hr = D3D11CreateDeviceAndSwapChain(NULL,
+                                      D3D_DRIVER_TYPE_HARDWARE,
+                                      NULL,
+                                      0,
+                                      FeatureLevels,
+                                      _countof(FeatureLevels),
+                                      D3D11_SDK_VERSION,
+                                      &scd,
+                                      &g_pSwapchain,
+                                      &g_pDev,
+                                      &FeatureLevelSupported,
+                                      &g_pDevcon);
+
+        if (hr == E_INVALIDARG) {
+            // a D3D 11.0 runtime rejects D3D_FEATURE_LEVEL_11_1; retry without it
+            hr = D3D11CreateDeviceAndSwapChain(NULL,
+                                      D3D_DRIVER_TYPE_HARDWARE,
+                                      NULL,
+                                      0,
+                                      &FeatureLevels[1],
+                                      _countof(FeatureLevels) - 1,
+                                      D3D11_SDK_VERSION,
+                                      &scd,
+                                      &g_pSwapchain,
+                                      &g_pDev,
+                                      &FeatureLevelSupported,
+                                      &g_pDevcon);
+        }
+
+        if (hr == S_OK) {
+            CreateRenderTarget();
+            SetViewPort();
+            InitPipeline();
+            InitGraphics();
         }
     }
     return hr;
@@ -54,11 +212,40 @@ HRESULT CreateGraphicsResources(HWND hWnd)

 void DiscardGraphicsResources()
 {
-    SafeRelease(&pRenderTarget);
-    SafeRelease(&pLightSlateGrayBrush);
-    SafeRelease(&pCornflowerBlueBrush);
+    SafeRelease(&g_pLayout);
+    SafeRelease(&g_pVS);
+    SafeRelease(&g_pPS);
+    SafeRelease(&g_pVBuffer);
+    SafeRelease(&g_pSwapchain);
+    SafeRelease(&g_pRTView);
+    SafeRelease(&g_pDev);
+    SafeRelease(&g_pDevcon);
 }

+// this is the function used to render a single frame
+void RenderFrame()
+{
+    // clear the back buffer to a deep blue
+    const FLOAT clearColor[] = {0.0f, 0.2f, 0.4f, 1.0f};
+    g_pDevcon->ClearRenderTargetView(g_pRTView, clearColor);
+
+    // do 3D rendering on the back buffer here
+    {
+        // select which vertex buffer to display
+        UINT stride = sizeof(VERTEX);
+        UINT offset = 0;
+        g_pDevcon->IASetVertexBuffers(0, 1, &g_pVBuffer, &stride, &offset);
+
+        // select which primitive type we are using
+        g_pDevcon->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
+
+        // draw the vertex buffer to the back buffer
+        g_pDevcon->Draw(3, 0);
+    }
+
+    // swap the back buffer and the front buffer
+    g_pSwapchain->Present(0, 0);
+}

 // the WindowProc function prototype
 LRESULT CALLBACK WindowProc(HWND hWnd,
@@ -77,9 +264,6 @@ int WINAPI WinMain(HINSTANCE hInstance,
     // this struct holds information for the window class
     WNDCLASSEX wc;

-    // initialize COM
-    if (FAILED(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE))) return -1;
-
     // clear out the window class for use
     ZeroMemory(&wc, sizeof(WNDCLASSEX));

@@ -97,17 +281,17 @@ int WINAPI WinMain(HINSTANCE hInstance,

     // create the window and use the result as the handle
     hWnd = CreateWindowEx(0,
-                          _T("WindowClass1"),    // name of the window class
-                          _T("Hello, Engine![Direct 2D]"),   // title of the window
-                          WS_OVERLAPPEDWINDOW,    // window style
-                          100,    // x-position of the window
-                          100,    // y-position of the window
-                          960,    // width of the window
-                          540,    // height of the window
-                          NULL,    // we have no parent window, NULL
-                          NULL,    // we aren't using menus, NULL
-                          hInstance,    // application handle
-                          NULL);    // used with multiple windows, NULL
+                          _T("WindowClass1"),                   // name of the window class
+                          _T("Hello, Engine![Direct 3D]"),      // title of the window
+                          WS_OVERLAPPEDWINDOW,                  // window style
+                          100,                                  // x-position of the window
+                          100,                                  // y-position of the window
+                          SCREEN_WIDTH,                         // width of the window
+                          SCREEN_HEIGHT,                        // height of the window
+                          NULL,                                 // we have no parent window, NULL
+                          NULL,                                 // we aren't using menus, NULL
+                          hInstance,                            // application handle
+                          NULL);                                // used with multiple windows, NULL

     // display the window on the screen
     ShowWindow(hWnd, nCmdShow);
@@ -127,9 +311,6 @@ int WINAPI WinMain(HINSTANCE hInstance,
         DispatchMessage(&msg);
     }

-    // uninitialize COM
-    CoUninitialize();
-
     // return this part of the WM_QUIT message to Windows
     return msg.wParam;
 }
@@ -144,108 +325,27 @@ LRESULT CALLBACK WindowProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lPara
     switch(message)
     {
        case WM_CREATE:
-               if (FAILED(D2D1CreateFactory(
-                                       D2D1_FACTORY_TYPE_SINGLE_THREADED, &pFactory)))
-               {
-                       result = -1; // Fail CreateWindowEx.
-               }
                wasHandled = true;
-        result = 1;
         break;

        case WM_PAINT:
-           {
-                       HRESULT hr = CreateGraphicsResources(hWnd);
-                       if (SUCCEEDED(hr))
-                       {
-                               PAINTSTRUCT ps;
-                               BeginPaint(hWnd, &ps);
-
-                               // start build GPU draw command
-                               pRenderTarget->BeginDraw();
-
-                               // clear the background with white color
-                               pRenderTarget->Clear(D2D1::ColorF(D2D1::ColorF::White));
-
-                // retrieve the size of drawing area
-                D2D1_SIZE_F rtSize = pRenderTarget->GetSize();
-
-                // draw a grid background.
-                int width = static_cast<int>(rtSize.width);
-                int height = static_cast<int>(rtSize.height);
-
-                for (int x = 0; x < width; x += 10)
-                {
-                    pRenderTarget->DrawLine(
-                        D2D1::Point2F(static_cast<FLOAT>(x), 0.0f),
-                        D2D1::Point2F(static_cast<FLOAT>(x), rtSize.height),
-                        pLightSlateGrayBrush,
-                        0.5f
-                        );
-                }
-
-                for (int y = 0; y < height; y += 10)
-                {
-                    pRenderTarget->DrawLine(
-                        D2D1::Point2F(0.0f, static_cast<FLOAT>(y)),
-                        D2D1::Point2F(rtSize.width, static_cast<FLOAT>(y)),
-                        pLightSlateGrayBrush,
-                        0.5f
-                        );
-                }
-
-                // draw two rectangles
-                D2D1_RECT_F rectangle1 = D2D1::RectF(
-                     rtSize.width/2 - 50.0f,
-                     rtSize.height/2 - 50.0f,
-                     rtSize.width/2 + 50.0f,
-                     rtSize.height/2 + 50.0f
-                     );
-
-                 D2D1_RECT_F rectangle2 = D2D1::RectF(
-                     rtSize.width/2 - 100.0f,
-                     rtSize.height/2 - 100.0f,
-                     rtSize.width/2 + 100.0f,
-                     rtSize.height/2 + 100.0f
-                     );
-
-                // draw a filled rectangle
-                pRenderTarget->FillRectangle(&rectangle1, pLightSlateGrayBrush);
-
-                // draw a outline only rectangle
-                pRenderTarget->DrawRectangle(&rectangle2, pCornflowerBlueBrush);
-
-                               // end GPU draw command building
-                               hr = pRenderTarget->EndDraw();
-                               if (FAILED(hr) || hr == D2DERR_RECREATE_TARGET)
-                               {
-                                       DiscardGraphicsResources();
-                               }
-
-                               EndPaint(hWnd, &ps);
-                       }
-           }
+               if (SUCCEEDED(CreateGraphicsResources(hWnd)))
+                   RenderFrame();
                wasHandled = true;
         break;

        case WM_SIZE:
-               if (pRenderTarget != nullptr)
+               if (g_pSwapchain != nullptr)
                {
-                       RECT rc;
-                       GetClientRect(hWnd, &rc);
-
-                       D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left, rc.bottom - rc.top);
-
-                       pRenderTarget->Resize(size);
+                   DiscardGraphicsResources();
                }
                wasHandled = true;
         break;

        case WM_DESTROY:
                DiscardGraphicsResources();
-               if (pFactory) {pFactory->Release(); pFactory=nullptr; }
                PostQuitMessage(0);
-        result = 1;
                wasHandled = true;
         break;

A brief explanation:

(A systematic tutorial on D3D programming is here: Getting Started with Direct3D.)

Basic window creation is unchanged. With Direct3D 11 we neither create a factory nor call CoInitializeEx() (see the note on COM further down), so WM_CREATE does essentially nothing.

We added the following new functions (subroutines):

    CreateRenderTarget();
    SetViewPort();
    InitPipeline();
    InitGraphics();

The first one is already familiar. It creates the render target, that is, the canvas.

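The second one sets the viewport, which determines how the rendered image maps onto the canvas. For now we assign the whole canvas to a single viewport. Real games often have split-screen multiplayer modes, and these divide one canvas into several viewports. VR is another typical case. It renders one image for the left eye and one for the right eye, so it also splits the canvas into two viewports.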

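The third one initializes the rendering pipeline, which is the GPU's assembly line. The most basic form of GPU 3D rendering has three stages: vertex transformation, rasterization and pixel shading. Vertex transformation and pixel shading are programmable. Rasterization is a fixed hardware function and cannot be programmed. This initialization function loads two GPU programs from disk, "copy.vso" and "copy.pso". They correspond to the vertex shading and pixel shading stages. We write these programs ourselves in HLSL, Microsoft's C-like GPU programming language, and their contents are explained below.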

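The fourth one passes in the vertex data of the model we actually want to draw. We are drawing a triangle, so there are 3 vertices. Note that Direct3D uses a left-handed coordinate system: the x axis points right, the y axis points up, and the z axis points into the screen. This is rather unusual, and we will compare it with other graphics APIs when we get to them.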

Coordinate Systems (Direct3D 9)
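The split-screen and VR cases described above both come down to cutting one canvas into several rectangles. The sketch below assumes a hypothetical `Viewport` struct that mirrors the fields of `D3D11_VIEWPORT` so that it compiles without Windows headers. `SplitIntoColumns` is an illustrative helper and not part of the engine. In a real D3D11 program you would typically fill one `D3D11_VIEWPORT` per player or eye and call `RSSetViewports(1, &vp)` before drawing that view.

```cpp
#include <vector>

// Hypothetical stand-in mirroring the fields of D3D11_VIEWPORT.
struct Viewport {
    float TopLeftX;
    float TopLeftY;
    float Width;
    float Height;
    float MinDepth;
    float MaxDepth;
};

// Splits a canvas into `count` equal-width columns: count == 2 gives a
// left-eye/right-eye VR layout or a two-player side-by-side split screen.
std::vector<Viewport> SplitIntoColumns(float canvasWidth, float canvasHeight, int count)
{
    std::vector<Viewport> viewports;
    if (count <= 0) return viewports;

    const float columnWidth = canvasWidth / static_cast<float>(count);
    for (int i = 0; i < count; ++i) {
        // each column spans the full height and the full [0, 1] depth range
        viewports.push_back({columnWidth * static_cast<float>(i), 0.0f,
                             columnWidth, canvasHeight, 0.0f, 1.0f});
    }
    return viewports;
}
```

For a 960x540 canvas, `SplitIntoColumns(960.0f, 540.0f, 2)` yields two 480x540 viewports starting at x = 0 and x = 480.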

Our vertex structure looks like this:

// vertex buffer structure
struct VERTEX {
        XMFLOAT3    Position;
        XMFLOAT4    Color;
};
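The memory layout of this struct is what the input layout, and the `stride` passed to `IASetVertexBuffers` in `RenderFrame()`, must describe. The sketch below assumes hypothetical `Float3`/`Float4` types that stand in for DirectXMath's `XMFLOAT3`/`XMFLOAT4`, which are likewise plain structs of floats. It computes the byte offsets with `offsetof`. A `D3D11_INPUT_ELEMENT_DESC` for COLOR would typically use offset 12, or `D3D11_APPEND_ALIGNED_ELEMENT`.

```cpp
#include <cstddef>

// Hypothetical stand-ins for XMFLOAT3 / XMFLOAT4: plain structs of floats.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Mirrors the VERTEX struct above.
struct Vertex {
    Float3 Position;   // read by the vertex shader through the POSITION semantic
    Float4 Color;      // read by the vertex shader through the COLOR semantic
};

// Byte offsets an input layout has to describe, and the per-vertex stride.
constexpr std::size_t kPositionOffset = offsetof(Vertex, Position);
constexpr std::size_t kColorOffset    = offsetof(Vertex, Color);
constexpr std::size_t kStride         = sizeof(Vertex);

// All members are 4-byte floats, so mainstream compilers insert no padding.
static_assert(kPositionOffset == 0, "position comes first");
static_assert(kColorOffset == 12, "color follows three floats");
static_assert(kStride == 28, "3 + 4 floats, no padding");
```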

So we initialize the vertices like this:

// create a triangle using the VERTEX struct
VERTEX OurVertices[] =
{
    {XMFLOAT3(0.0f, 0.5f, 0.0f), XMFLOAT4(1.0f, 0.0f, 0.0f, 1.0f)},
    {XMFLOAT3(0.45f, -0.5f, 0.0f), XMFLOAT4(0.0f, 1.0f, 0.0f, 1.0f)},
    {XMFLOAT3(-0.45f, -0.5f, 0.0f), XMFLOAT4(0.0f, 0.0f, 1.0f, 1.0f)}
};

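The origin of this coordinate system is the center of the viewport. We have only one viewport, so that is the center of the screen. By default the viewport covers [-1, 1] along the x and y axes (z runs from 0 to 1 in Direct3D). A value of 0.5 therefore lies halfway between the center and the edge, which is a quarter of the canvas width or height away from the center. The second part of each vertex is its color, with the channels R (red), G (green), B (blue) and A (alpha, i.e. transparency), each in the range [0, 1]. This channel order is specified in the input layout part of the code. So we can see that: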

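  1. The first vertex is at the top center and is red
  2. The second vertex is at the bottom right and is green
  3. The third vertex is at the bottom left and is blue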

Next, let's look at the GPU programs we wrote (called shaders):

copy.vs

#include "cbuffer.h"
#include "vsoutput.hs"

v2p main(a2v input) {
	v2p output;
	output.position = float4(input.position, 1.0);
	output.color = input.color;

	return output;
}

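As you can see, this program simply passes its input through unchanged. The input comes from our application (the VERTEX we defined above). The output goes to the next stage of the pipeline, which in our example (after rasterization) is the pixel shader: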

copy.ps

#include "vsoutput.hs"

float4 main(v2p input) : SV_TARGET
{
    return input.color;
}

The pixel shader likewise just outputs the color it receives.

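Note that we use two header files. One defines the data structure the application passes to the vertex shader. The other defines the structure the vertex shader passes to the pixel shader. Their contents are as follows: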

cbuffer.h

struct a2v {
	float3 position : POSITION;
	float4 color	: COLOR;
};

vsoutput.hs

struct v2p {
	float4 position : SV_POSITION;
	float4 color	: COLOR;
};

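Notice the odd syntax after the colons, which standard C/C++ does not support. These are HLSL semantics, and they bind each variable to a predefined slot (register) in the GPU pipeline. The GPU is not fully programmable: its pipeline mixes programmable stages with fixed-function ones. Whenever our output has to be consumed by a fixed-function stage, the variable must be bound to one of these predefined slots. Two examples:

- The vertex shader's SV_POSITION output is used by the rasterizer to interpolate the coordinates of points inside the triangle.
- The pixel shader's SV_TARGET output is written to the render target and eventually shown on screen.

Semantics without the SV_ prefix, such as POSITION and COLOR, are matched by name. For vertex shader inputs they are matched against the input layout. Between stages, the vertex shader's outputs are matched against the pixel shader's inputs.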

For details, see the HLSL tutorial:

HLSL (Windows)

and the description of the GPU rendering pipeline:

en.m.wikipedia.org/wiki

On Windows, shaders (in D3D format) are compiled like this:

fxc /T vs_5_0 /Zi /Fo copy.vso copy.vs
fxc /T ps_5_0 /Zi /Fo copy.pso copy.ps

If fxc.exe cannot be found, the DirectX development components are probably not installed. Rerun the Visual Studio installer and select them.

The code is compiled like this (debug build). If you get a flood of DirectXMath-related errors, read on:

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang-cl -I./DirectXMath/Inc -c -Z7 -o helloengine_d3d.obj helloengine_d3d.cpp
D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>link -debug user32.lib d3d11.lib d3dcompiler.lib helloengine_d3d.obj

Release build:

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang -I./DirectXMath/Inc -l user32.lib -l d3d11.lib -l d3dcompiler.lib -o helloengine_d3d.exe helloengine_d3d.cpp

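Unlike our Direct2D version, this program never initializes COM. Direct3D 11 objects are COM-style interfaces: they derive from IUnknown, which is why SafeRelease() works on them. However, the device and swap chain are not created through the COM runtime. They come from D3D11CreateDeviceAndSwapChain(), a function exported by d3d11.dll that goes through the Direct3D runtime to the graphics driver. So we remove the COM initialization code and no longer link ole32.lib.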

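Note the new include directory, ./DirectXMath. The DirectXMath library that currently ships with Visual Studio (strictly speaking, with the Windows SDK) seems to be an older version that does not compile with clang, because it uses features clang does not support. So we pull the latest version from GitHub: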

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>git submodule add https://github.com/Microsoft/DirectXMath.git DirectXMath
D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>git submodule init DirectXMath
D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>git submodule update DirectXMath

Recompiling should then work.

Running the program displays the triangle.

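The inside of the triangle shows a smooth color gradient. We gave the GPU only 3 vertices, so for every other point the GPU interpolates from those 3 vertices, using barycentric coordinates. Drawing lines from the point to the 3 vertices divides the triangle into 3 smaller triangles (one collapses to zero area when the point lies on an edge). For each vertex, the area of the small triangle opposite it divided by the area of the whole triangle gives that vertex's weight, a value in [0, 1]. The 3 vertex colors are then blended with these weights. This interpolation is applied not only to position but also to color, which produces the result we see.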

We have now stepped into the world of 3D. Save the code and get ready for the next part.

References:

  1. DirectX 11 Tutorials
  2. Direct3D Tutorials

Hand-Coding a Next-Generation Game Engine from Scratch (10)

In the previous article we drew a rectangle in a window using GDI (Windows) and XCB (Linux).

However, those rectangles were drawn by the CPU rather than the GPU, so that way of drawing is slow.

Starting with this article we use the GPU to draw. First let's look at DirectX, which is specific to the Windows platform.

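In its early days DirectX was split into several modules: DirectDraw for 2D graphics, Direct3D for 3D graphics, DirectShow for multimedia playback, and so on. DirectDraw has since been deprecated; Direct2D, a separate and newer API, now fills its role.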

Let's first see what drawing code written with Direct2D looks like.

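D2D only provides a way to draw graphics with the GPU. Window creation and similar tasks stay the same as before, so we can reuse most of the helloengine_win.c code we wrote earlier. We should also keep helloengine_win.c itself, so that we can easily switch between CPU drawing and GPU drawing for comparison later.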

So first make a copy of helloengine_win.c and name it helloengine_d2d.cpp. We switch to the .cpp extension because the d2d header we are about to use must be compiled as C++.

Then modify the file as follows (lines marked "+" on the left are added, lines marked "-" are removed):

(The code in this article is based on the MSDN Direct2D tutorial.)

@@ -3,6 +3,63 @@
 #include <windowsx.h>
 #include <tchar.h>

+#include <d2d1.h>
+
+ID2D1Factory           *pFactory = nullptr;
+ID2D1HwndRenderTarget  *pRenderTarget = nullptr;
+ID2D1SolidColorBrush   *pLightSlateGrayBrush = nullptr;
+ID2D1SolidColorBrush   *pCornflowerBlueBrush = nullptr;
+
+template<class T>
+inline void SafeRelease(T **ppInterfaceToRelease)
+{
+    if (*ppInterfaceToRelease != nullptr)
+    {
+        (*ppInterfaceToRelease)->Release();
+
+        (*ppInterfaceToRelease) = nullptr;
+    }
+}
+
+HRESULT CreateGraphicsResources(HWND hWnd)
+{
+    HRESULT hr = S_OK;
+    if (pRenderTarget == nullptr)
+    {
+        RECT rc;
+        GetClientRect(hWnd, &rc);
+
+        D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left,
+                        rc.bottom - rc.top);
+
+        hr = pFactory->CreateHwndRenderTarget(
+            D2D1::RenderTargetProperties(),
+            D2D1::HwndRenderTargetProperties(hWnd, size),
+            &pRenderTarget);
+
+        if (SUCCEEDED(hr))
+        {
+            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::LightSlateGray), &pLightSlateGrayBrush);
+
+        }
+
+        if (SUCCEEDED(hr))
+        {
+            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::CornflowerBlue), &pCornflowerBlueBrush);
+
+        }
+    }
+    return hr;
+}
+
+void DiscardGraphicsResources()
+{
+    SafeRelease(&pRenderTarget);
+    SafeRelease(&pLightSlateGrayBrush);
+    SafeRelease(&pCornflowerBlueBrush);
+}
+
+
 // the WindowProc function prototype
 LRESULT CALLBACK WindowProc(HWND hWnd,
                          UINT message,
@@ -20,6 +77,9 @@ int WINAPI WinMain(HINSTANCE hInstance,
     // this struct holds information for the window class
     WNDCLASSEX wc;

+    // initialize COM
+    if (FAILED(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE))) return -1;
+
     // clear out the window class for use
     ZeroMemory(&wc, sizeof(WNDCLASSEX));

@@ -28,7 +88,7 @@ int WINAPI WinMain(HINSTANCE hInstance,
     wc.style = CS_HREDRAW | CS_VREDRAW;
     wc.lpfnWndProc = WindowProc;
     wc.hInstance = hInstance;
-    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
+    wc.hCursor = LoadCursor(nullptr, IDC_ARROW);
     wc.hbrBackground = (HBRUSH)COLOR_WINDOW;
     wc.lpszClassName = _T("WindowClass1");

@@ -38,12 +98,12 @@ int WINAPI WinMain(HINSTANCE hInstance,
     // create the window and use the result as the handle
     hWnd = CreateWindowEx(0,
                           _T("WindowClass1"),    // name of the window class
-                          _T("Hello, Engine!"),   // title of the window
+                          _T("Hello, Engine![Direct 2D]"),   // title of the window
                           WS_OVERLAPPEDWINDOW,    // window style
-                          300,    // x-position of the window
-                          300,    // y-position of the window
-                          500,    // width of the window
-                          400,    // height of the window
+                          100,    // x-position of the window
+                          100,    // y-position of the window
+                          960,    // width of the window
+                          540,    // height of the window
                           NULL,    // we have no parent window, NULL
                           NULL,    // we aren't using menus, NULL
                           hInstance,    // application handle
@@ -58,7 +118,7 @@ int WINAPI WinMain(HINSTANCE hInstance,
     MSG msg;

     // wait for the next message in the queue, store the result in 'msg'
-    while(GetMessage(&msg, NULL, 0, 0))
+    while(GetMessage(&msg, nullptr, 0, 0))
     {
         // translate keystroke messages into the right format
         TranslateMessage(&msg);
@@ -67,6 +127,9 @@ int WINAPI WinMain(HINSTANCE hInstance,
         DispatchMessage(&msg);
     }

+    // uninitialize COM
+    CoUninitialize();
+
     // return this part of the WM_QUIT message to Windows
     return msg.wParam;
 }
@@ -74,30 +137,126 @@ int WINAPI WinMain(HINSTANCE hInstance,
 // this is the main message handler for the program
 LRESULT CALLBACK WindowProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
 {
+    LRESULT result = 0;
+    bool wasHandled = false;
+
     // sort through and find what code to run for the message given
     switch(message)
     {
+       case WM_CREATE:
+               if (FAILED(D2D1CreateFactory(
+                                       D2D1_FACTORY_TYPE_SINGLE_THREADED, &pFactory)))
+               {
+                       result = -1; // Fail CreateWindowEx.
+                       return result;
+               }
+               wasHandled = true;
+        result = 0;
+        break;
+
        case WM_PAINT:
            {
-               PAINTSTRUCT ps;
-               HDC hdc = BeginPaint(hWnd, &ps);
-               RECT rec = { 20, 20, 60, 80 };
-               HBRUSH brush = (HBRUSH) GetStockObject(BLACK_BRUSH);
-
-               FillRect(hdc, &rec, brush);
-
-               EndPaint(hWnd, &ps);
-           } break;
-        // this message is read when the window is closed
-        case WM_DESTROY:
-            {
-                // close the application entirely
-                PostQuitMessage(0);
-                return 0;
-            } break;
+                       HRESULT hr = CreateGraphicsResources(hWnd);
+                       if (SUCCEEDED(hr))
+                       {
+                               PAINTSTRUCT ps;
+                               BeginPaint(hWnd, &ps);
+
+                               // start build GPU draw command
+                               pRenderTarget->BeginDraw();
+
+                               // clear the background with white color
+                               pRenderTarget->Clear(D2D1::ColorF(D2D1::ColorF::White));
+
+                // retrieve the size of drawing area
+                D2D1_SIZE_F rtSize = pRenderTarget->GetSize();
+
+                // draw a grid background.
+                int width = static_cast<int>(rtSize.width);
+                int height = static_cast<int>(rtSize.height);
+
+                for (int x = 0; x < width; x += 10)
+                {
+                    pRenderTarget->DrawLine(
+                        D2D1::Point2F(static_cast<FLOAT>(x), 0.0f),
+                        D2D1::Point2F(static_cast<FLOAT>(x), rtSize.height),
+                        pLightSlateGrayBrush,
+                        0.5f
+                        );
+                }
+
+                for (int y = 0; y < height; y += 10)
+                {
+                    pRenderTarget->DrawLine(
+                        D2D1::Point2F(0.0f, static_cast<FLOAT>(y)),
+                        D2D1::Point2F(rtSize.width, static_cast<FLOAT>(y)),
+                        pLightSlateGrayBrush,
+                        0.5f
+                        );
+                }
+
+                // draw two rectangles
+                D2D1_RECT_F rectangle1 = D2D1::RectF(
+                     rtSize.width/2 - 50.0f,
+                     rtSize.height/2 - 50.0f,
+                     rtSize.width/2 + 50.0f,
+                     rtSize.height/2 + 50.0f
+                     );
+
+                 D2D1_RECT_F rectangle2 = D2D1::RectF(
+                     rtSize.width/2 - 100.0f,
+                     rtSize.height/2 - 100.0f,
+                     rtSize.width/2 + 100.0f,
+                     rtSize.height/2 + 100.0f
+                     );
+
+                // draw a filled rectangle
+                pRenderTarget->FillRectangle(&rectangle1, pLightSlateGrayBrush);
+
+                // draw a outline only rectangle
+                pRenderTarget->DrawRectangle(&rectangle2, pCornflowerBlueBrush);
+
+                               // end GPU draw command building
+                               hr = pRenderTarget->EndDraw();
+                               if (FAILED(hr) || hr == D2DERR_RECREATE_TARGET)
+                               {
+                                       DiscardGraphicsResources();
+                               }
+
+                               EndPaint(hWnd, &ps);
+                       }
+           }
+               wasHandled = true;
+        break;
+
+       case WM_SIZE:
+               if (pRenderTarget != nullptr)
+               {
+                       RECT rc;
+                       GetClientRect(hWnd, &rc);
+
+                       D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left, rc.bottom - rc.top);
+
+                       pRenderTarget->Resize(size);
+               }
+               wasHandled = true;
+        break;
+
+       case WM_DESTROY:
+               DiscardGraphicsResources();
+               if (pFactory) {pFactory->Release(); pFactory=nullptr; }
+               PostQuitMessage(0);
+        result = 0;
+               wasHandled = true;
+        break;
+
+    case WM_DISPLAYCHANGE:
+        InvalidateRect(hWnd, nullptr, false);
+        wasHandled = true;
+        break;
     }

     // Handle any messages the switch statement didn't
-    return DefWindowProc (hWnd, message, wParam, lParam);
+    if (!wasHandled) { result = DefWindowProc (hWnd, message, wParam, lParam); }
+    return result;
 }

A brief explanation:

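First, we include d2d1.h, the main Direct2D header. It declares the Direct2D interfaces. When compiled as C++, it also pulls in helper functions in the D2D1 namespace (D2D1::ColorF, D2D1::SizeU and so on), which we use below.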

Then we define the following 4 global variables:

ID2D1Factory           *pFactory = nullptr;
ID2D1HwndRenderTarget  *pRenderTarget = nullptr;
ID2D1SolidColorBrush   *pLightSlateGrayBrush = nullptr;
ID2D1SolidColorBrush   *pCornflowerBlueBrush = nullptr;

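The first is an interface to a factory, in the sense of the Factory design pattern. The second through fourth should look familiar by now: a render target (the canvas) and two brushes. However, all of them are COM interfaces. The objects themselves live inside the COM component, and our program only holds interface pointers to them.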

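The D2D library is provided as a so-called "COM component". Very roughly, this resembles reflection in Java. It is not just a dynamic library. A client can ask an object at runtime, through IUnknown::QueryInterface, whether it supports a given interface, instead of relying only on what a header declared at compile time.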
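The two core ideas of COM are querying for an interface at runtime and reference counting with AddRef/Release. The sketch below shows both in portable C++. Everything in it is a hypothetical simplification: real COM identifies interfaces by GUIDs (IIDs), `IUnknown::QueryInterface` returns an HRESULT, and none of the names below exist in any Windows header.

```cpp
#include <atomic>
#include <cstring>

// Hypothetical interface IDs; real COM uses GUIDs.
using InterfaceId = const char*;

// Simplified stand-in for IUnknown.
struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;   // clients call Release(), never delete
};

// Simplified stand-in for a brush interface.
struct IBrushLike : IUnknownLike {
    virtual float Opacity() const = 0;
};

class SolidBrush final : public IBrushLike {
public:
    bool QueryInterface(InterfaceId iid, void** ppv) override {
        if (std::strcmp(iid, "IUnknown") == 0) {
            *ppv = static_cast<IUnknownLike*>(this);
        } else if (std::strcmp(iid, "IBrush") == 0) {
            *ppv = static_cast<IBrushLike*>(this);
        } else {
            *ppv = nullptr;   // unsupported: the caller finds out at runtime
            return false;
        }
        AddRef();             // each pointer handed out is a new reference
        return true;
    }
    unsigned long AddRef() override { return ++refCount_; }
    unsigned long Release() override {
        const unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;   // the object frees itself
        return remaining;
    }
    float Opacity() const override { return 1.0f; }
private:
    ~SolidBrush() override = default;
    std::atomic<unsigned long> refCount_{1};   // the creator owns the first reference
};
```

Every successful query adds a reference, so each pointer obtained must be released once. The object deletes itself when the last reference goes away.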

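COM itself is a complex concept. Microsoft developed it back then to help Office win the office-software market. The best-known COM-based technology in Office is OLE (Object Linking and Embedding).

Embedding means we can put a video, or any other file Office does not natively support, inside an Office document. It then appears as an icon or a static picture (a snapshot). Double-clicking the icon or picture launches software that supports the format so it can play or edit the file. This happens in one of two ways: in place, directly inside the Office document, or standalone, outside Office.

In the in-place case, a program that supports the format is launched in the background with its window hidden. Office hands a small part of its client area to this program, a rectangle like the Rect we used before. The background program then handles drawing and user input for that area. In other words, Office's WM_PAINT handler splits the window's client area into the part Office draws itself and the part drawn through OLE, and the OLE part is handed to the background application via COM.

For example, if the embedded OLE object is a video, playing it inside the Office document actually starts Windows Media Player in the background with its UI hidden. Windows Media Player decodes and plays the video as usual. The only difference is that the final image is drawn into a rectangle inside the Office document instead of into its own window.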

The most common use is putting a video, an Excel sheet, a Word document and so on into a PPT. That is exactly what OLE does.

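Linking works much like embedding, except that the object is stored separately on disk instead of being saved inside the Office document. When we drag an Excel file into PPT, Office asks whether to insert it as an embedded object or as a link. With a link, you can edit the file outside at any time, and the changes show up automatically in the document that links to it. For example, if you put an Excel sheet into PPT as a link and later modify that Excel file, the data in the PPT's Excel object changes too.

Strictly speaking, both embedding and linking are part of OLE, as the name says. DDE (Dynamic Data Exchange) is an older mechanism based on window messages, and OLE 1 was built on it. The COINIT_DISABLE_OLE1DDE flag discussed below refers to this.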

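Beyond these uses, a great many other things are built on COM: many Windows system components, DirectShow filters, Internet Explorer plug-ins, Office add-ins, and so on. .NET was also designed to interoperate closely with COM. COM was later extended by COM+, and by DCOM for distributed processing across several computers. In Windows Server, managing other servers from a single server relies largely on DCOM.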

-- (Digression begins) --

When I first started working, my project team was developing a desktop PC code-named "Morpheus" (sold as the "VAIO Type X").

vaio.sony.co.jp/Product

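This machine could record 7 TV channels simultaneously and without gaps, 24 hours a day, 7 days a week (Tokyo had only 7 free-to-air channels). A full set, including the display, sold for about 1 million yen, roughly 70,000 to 80,000 RMB at the exchange rate of the time.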

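When I joined the company, the project was about halfway done. The hardware design was basically finalized, but the software had only just started. Then the company suddenly decided to take part in a VAIO marketing event and show off this giant machine. Showing only the hardware would have been rather dull, so they wanted to demonstrate the recording feature, even though it was not finished yet.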

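So we had to put together a substitute for the demo quickly. Other VAIO models of the time could also record, but only one channel per machine, so recording 7 channels at once meant running 7 PCs simultaneously for a week. If we just switched the 7 PCs on and left them running, each recording would be one continuous stream, not split into programs according to the electronic program guide (EPG). Hiring 7 people to press start and stop by hand might have been feasible in China, but in Japan it would have been expensive. With three shifts, that would have taken 21 people.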

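I solved the problem with DCOM. I wrote a program that starts and stops the recording, then added an 8th machine that downloaded and parsed the EPG and used DCOM to control that program on the other 7 PCs.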

-- (End of digression) --

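Powerful as COM is, the idea was not unique to Microsoft. It addresses much the same problem as CORBA, a distributed-object standard from the same era, and COM/DCOM is often described as Microsoft's counterpart to it.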

+
+template<class T>
+inline void SafeRelease(T **ppInterfaceToRelease)
+{
+    if (*ppInterfaceToRelease != nullptr)
+    {
+        (*ppInterfaceToRelease)->Release();
+
+        (*ppInterfaceToRelease) = nullptr;
+    }
+}
+

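This is the first function we have written that uses C++ templates, which are C++'s mechanism for generic programming. I won't go into detail here. If you are interested, see a C++ book.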
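The template is instantiated separately for each pointer type (`ID2D1HwndRenderTarget`, `ID2D1SolidColorBrush`, ...) and works for any type with a `Release()` member. The sketch below exercises it with a hypothetical `FakeInterface` that only counts `Release()` calls. It shows that the pointer is reset to `nullptr`, so a second call is harmless.

```cpp
// Same template as above.
template<class T>
inline void SafeRelease(T **ppInterfaceToRelease)
{
    if (*ppInterfaceToRelease != nullptr)
    {
        (*ppInterfaceToRelease)->Release();
        (*ppInterfaceToRelease) = nullptr;   // prevents a double Release()
    }
}

// Hypothetical stand-in for a COM interface: it only counts Release() calls.
struct FakeInterface {
    int releaseCount = 0;
    unsigned long Release() { ++releaseCount; return 0; }
};
```

Because the caller's pointer is cleared, `DiscardGraphicsResources()` can safely run more than once, for example on a failed `EndDraw()` and later again on `WM_DESTROY`.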

+HRESULT CreateGraphicsResources(HWND hWnd)
+{
+    HRESULT hr = S_OK;
+    if (pRenderTarget == nullptr)
+    {
+        RECT rc;
+        GetClientRect(hWnd, &rc);
+
+        D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left,
+                        rc.bottom - rc.top);
+
+        hr = pFactory->CreateHwndRenderTarget(
+            D2D1::RenderTargetProperties(),
+            D2D1::HwndRenderTargetProperties(hWnd, size),
+            &pRenderTarget);
+
+        if (SUCCEEDED(hr))
+        {
+            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::LightSlateGray), &pLightSlateGrayBrush);
+
+        }
+
+        if (SUCCEEDED(hr))
+        {
+            hr = pRenderTarget->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::CornflowerBlue), &pCornflowerBlueBrush);
+
+        }
+    }
+    return hr;
+}

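This function creates the canvas and brushes needed for drawing. GPU drawing actually involves a lot of work at this point, but D2D wraps it all for us, so the calls stay simple and very close to GDI. The flip side is that we lose a lot of control, which is the problem the newer DX12 mentioned earlier sets out to solve.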

+void DiscardGraphicsResources()
+{
+    SafeRelease(&pRenderTarget);
+    SafeRelease(&pLightSlateGrayBrush);
+    SafeRelease(&pCornflowerBlueBrush);
+}
+

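This function releases the GPU resources behind the canvas and brushes, using the generic function defined above. In this example, there are two main situations where these resources must be released: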

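  1. EndDraw() fails, for example with D2DERR_RECREATE_TARGET when the render target has to be recreated. An ordinary window resize is handled in WM_SIZE with pRenderTarget->Resize() instead.
  2. The window is destroyed (the program exits)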
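Together, CreateGraphicsResources() and DiscardGraphicsResources() form a lazy create/discard lifecycle: resources are created on the first paint and dropped when they become invalid, and the next paint recreates them. The sketch below models that lifecycle with a hypothetical `GraphicsResources` class and plain heap objects in place of GPU resources. It illustrates the pattern only and is not the D2D code.

```cpp
#include <memory>

// Hypothetical stand-in for a GPU resource; `generation` counts re-creations.
struct Resource { int generation; };

class GraphicsResources {
public:
    // Like CreateGraphicsResources(): does nothing if the target already exists.
    void Create() {
        if (!target_) {
            ++generation_;
            target_ = std::make_unique<Resource>(Resource{generation_});
            brush_  = std::make_unique<Resource>(Resource{generation_});
        }
    }
    // Like DiscardGraphicsResources(): safe to call at any time.
    void Discard() {
        target_.reset();
        brush_.reset();
    }
    bool Ready() const { return target_ && brush_; }
    int Generation() const { return generation_; }
private:
    std::unique_ptr<Resource> target_;
    std::unique_ptr<Resource> brush_;
    int generation_ = 0;
};
```

Calling `Create()` on every paint is cheap because it returns immediately once the resources exist. After a `Discard()`, the next paint rebuilds them.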
+    // initialize COM
+    if (FAILED(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE))) return -1;
+

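This initializes COM. Every thread that uses COM must make this call before any other COM call. COM is different from an ordinary dynamic library: it is a fairly independent system with its own machinery. (In fact, ordinary dynamic libraries also have to be loaded before use. Many programmers who only write for Windows are unaware of this, because Microsoft hides it so thoroughly.)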

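The first parameter is reserved and must be nullptr, which is 0. C++ is strongly typed: 0 is an integer, while a null pointer should have pointer type. So C++11 introduced nullptr to replace the old 0 and satisfy type matching.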
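Here is a small example of why the pointer type of `nullptr` matters. With two overloads, passing `0` selects the integer version even when a null pointer was meant, while `nullptr` selects the pointer version. The `Describe` functions are hypothetical. The old `NULL` macro is typically just `0` (or a compiler-specific equivalent) in C++, so it has the same problem as `0`.

```cpp
#include <string>

// Two overloads: one takes an integer, the other a pointer.
std::string Describe(int)   { return "int overload"; }
std::string Describe(char*) { return "pointer overload"; }
```

`Describe(0)` returns "int overload" and `Describe(nullptr)` returns "pointer overload".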

The second parameter combines two flags.

COINIT_APARTMENTTHREADED

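This tells COM to run in the so-called STA (single-threaded apartment) mode. Very roughly, you can think of the COM objects as running in step with our program, because they are only ever called on our thread. For details, see the COM documentation, such as this official page: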

COINIT enumeration

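Roughly speaking, in an STA the COM runtime creates a hidden window on our thread, a bit like the hidden background program in the OLE example above. Calls arriving from other threads are delivered through this thread's message queue. This is why an STA thread needs a message loop, and our GetMessage()/DispatchMessage() loop serves that purpose.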

COINIT_DISABLE_OLE1DDE

This turns off some obsolete COM features (DDE support for OLE1) to avoid unnecessary overhead.
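The two flags are separate bits combined with `|`. The sketch below copies the `COINIT` values from `objbase.h` into a local enum so that it compiles without Windows headers. `CoInitFlags` and `HasFlag` are hypothetical names.

```cpp
#include <cstdint>

// Local copies of the COINIT bit values from objbase.h.
enum CoInitFlags : std::uint32_t {
    kApartmentThreaded = 0x2,  // COINIT_APARTMENTTHREADED
    kDisableOle1Dde    = 0x4,  // COINIT_DISABLE_OLE1DDE
    kSpeedOverMemory   = 0x8,  // COINIT_SPEED_OVER_MEMORY
};

// True if `flag` is set in the combined value.
inline bool HasFlag(std::uint32_t flags, CoInitFlags flag)
{
    return (flags & flag) != 0;
}
```

`kApartmentThreaded | kDisableOle1Dde` is `0x6`, and each flag can be tested on its own with `HasFlag`.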

Since we initialize COM when the application starts, we need to uninitialize it where the application ends:

+    // uninitialize COM
+    CoUninitialize();
+

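Next we create the factory during window creation, because we need it before we can create the canvas and brushes. In the earlier GDI and XCB code these objects were created inside our own program, so no factory was needed. With D2D, the objects are created inside a COM component that sits outside our program. We know little about these objects, so we have to go through a factory; think of it as "outsourcing". WM_CREATE is the message the system sends to our message handler when we call the CreateWindowEx() system API. Returning -1 from it makes CreateWindowEx() fail.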

+       case WM_CREATE:
+               if (FAILED(D2D1CreateFactory(
+                                       D2D1_FACTORY_TYPE_SINGLE_THREADED, &pFactory)))
+               {
+                       result = -1; // Fail CreateWindowEx.
+                       return result;
+               }
+               wasHandled = true;
+        result = 0;
+        break;
+

Then comes the part with the biggest changes, the WM_PAINT handler:

+            HRESULT hr = CreateGraphicsResources(hWnd);
+            if (SUCCEEDED(hr))
+            {
+                PAINTSTRUCT ps;
+                BeginPaint(hWnd, &ps);
+
+                // start building GPU draw commands
+                pRenderTarget->BeginDraw();
+
+                // clear the background with white color
+                pRenderTarget->Clear(D2D1::ColorF(D2D1::ColorF::White));
+
+                // retrieve the size of the drawing area
+                D2D1_SIZE_F rtSize = pRenderTarget->GetSize();
+
+                // draw a grid background
+                int width = static_cast<int>(rtSize.width);
+                int height = static_cast<int>(rtSize.height);
+
+                for (int x = 0; x < width; x += 10)
+                {
+                    pRenderTarget->DrawLine(
+                        D2D1::Point2F(static_cast<FLOAT>(x), 0.0f),
+                        D2D1::Point2F(static_cast<FLOAT>(x), rtSize.height),
+                        pLightSlateGrayBrush,
+                        0.5f
+                        );
+                }
+
+                for (int y = 0; y < height; y += 10)
+                {
+                    pRenderTarget->DrawLine(
+                        D2D1::Point2F(0.0f, static_cast<FLOAT>(y)),
+                        D2D1::Point2F(rtSize.width, static_cast<FLOAT>(y)),
+                        pLightSlateGrayBrush,
+                        0.5f
+                        );
+                }
+
+                // draw two rectangles
+                D2D1_RECT_F rectangle1 = D2D1::RectF(
+                    rtSize.width/2 - 50.0f,
+                    rtSize.height/2 - 50.0f,
+                    rtSize.width/2 + 50.0f,
+                    rtSize.height/2 + 50.0f
+                    );
+
+                D2D1_RECT_F rectangle2 = D2D1::RectF(
+                    rtSize.width/2 - 100.0f,
+                    rtSize.height/2 - 100.0f,
+                    rtSize.width/2 + 100.0f,
+                    rtSize.height/2 + 100.0f
+                    );
+
+                // draw a filled rectangle
+                pRenderTarget->FillRectangle(&rectangle1, pLightSlateGrayBrush);
+
+                // draw an outline-only rectangle
+                pRenderTarget->DrawRectangle(&rectangle2, pCornflowerBlueBrush);
+
+                // end GPU draw command building
+                hr = pRenderTarget->EndDraw();
+                if (FAILED(hr) || hr == D2DERR_RECREATE_TARGET)
+                {
+                    DiscardGraphicsResources();
+                }
+
+                EndPaint(hWnd, &ps);
+            }
+        }
+        wasHandled = true;
+        break;

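At first glance this looks like a lot of changes, but it is actually very similar to GDI drawing. The difference is that every drawing call now goes through the pRenderTarget interface. pRenderTarget is an interface that the D2D COM component provides to us, which means the actual GPU drawing commands are produced inside D2D, while we merely hand the commands and parameters over to D2D ("outsourcing"). Note also that these calls do not draw immediately: D2D batches the commands in the render target and submits them to the GPU when EndDraw() is called, which is also where device-loss errors such as D2DERR_RECREATE_TARGET are reported.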
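To make the batching behaviour concrete, here is a toy, portable model of a render target: draw calls between BeginDraw and EndDraw are only recorded, and EndDraw submits the whole batch at once. `ToyRenderTarget`, `DrawCommand` and `DrawGrid` are hypothetical names; the real `ID2D1RenderTarget` is far more involved. `DrawGrid` reproduces the two grid loops from the WM_PAINT handler (one line every 10 pixels).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct DrawCommand { Point from, to; float strokeWidth; };

class ToyRenderTarget
{
public:
    ToyRenderTarget(float w, float h) : m_width(w), m_height(h) {}

    void BeginDraw() { m_recording = true; m_batch.clear(); }

    // Only recorded; nothing reaches the "GPU" yet.
    void DrawLine(Point a, Point b, float stroke)
    {
        assert(m_recording && "DrawLine outside BeginDraw/EndDraw");
        m_batch.push_back({a, b, stroke});
    }

    // Submits the whole recorded batch in one go.
    void EndDraw()
    {
        m_submitted += m_batch.size();
        m_batch.clear();
        m_recording = false;
    }

    std::size_t PendingCount() const { return m_batch.size(); }
    std::size_t SubmittedCount() const { return m_submitted; }
    float Width() const { return m_width; }
    float Height() const { return m_height; }

private:
    float m_width, m_height;
    bool m_recording = false;
    std::vector<DrawCommand> m_batch;
    std::size_t m_submitted = 0;
};

// Same loops as the WM_PAINT handler: a line every 10 pixels.
void DrawGrid(ToyRenderTarget& rt)
{
    int width = static_cast<int>(rt.Width());
    int height = static_cast<int>(rt.Height());
    for (int x = 0; x < width; x += 10)
        rt.DrawLine({static_cast<float>(x), 0.0f},
                    {static_cast<float>(x), rt.Height()}, 0.5f);
    for (int y = 0; y < height; y += 10)
        rt.DrawLine({0.0f, static_cast<float>(y)},
                    {rt.Width(), static_cast<float>(y)}, 0.5f);
}
```

With a 100×50 target, the grid records 10 vertical lines (x = 0 to 90) and 5 horizontal lines (y = 0 to 40), and none of the 15 commands is submitted until EndDraw().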

+    case WM_SIZE:
+        if (pRenderTarget != nullptr)
+        {
+            RECT rc;
+            GetClientRect(hWnd, &rc);
+
+            D2D1_SIZE_U size = D2D1::SizeU(rc.right - rc.left, rc.bottom - rc.top);
+
+            pRenderTarget->Resize(size);
+        }
+        wasHandled = true;
+        break;

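This handles window resizing. When the window size changes, we need to tell D2D to resize the render target (our canvas) so that it matches the new client area. Resize() only changes the size of the render target's surface; the brushes created earlier remain usable, which is why this handler does not recreate them.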
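The size passed to Resize() is derived from the client rectangle exactly as in the WM_SIZE handler: width = right - left, height = bottom - top. The sketch mirrors the two structs with hypothetical local types (`Rect` for RECT, `SizeU` for D2D1_SIZE_U) so that it compiles anywhere; `ClientSize` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for RECT (LONG fields) and D2D1_SIZE_U (UINT32 fields).
struct Rect { std::int32_t left, top, right, bottom; };
struct SizeU { std::uint32_t width, height; };

// Same computation as the WM_SIZE handler. GetClientRect always reports
// left = top = 0, so this is effectively (right, bottom).
SizeU ClientSize(const Rect& rc)
{
    assert(rc.right >= rc.left && rc.bottom >= rc.top);
    return SizeU{ static_cast<std::uint32_t>(rc.right - rc.left),
                  static_cast<std::uint32_t>(rc.bottom - rc.top) };
}
```

A minimized window can report an empty client rectangle, in which case the computed size is 0×0.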

+    case WM_DISPLAYCHANGE:
+        InvalidateRect(hWnd, nullptr, false);
+        wasHandled = true;
+        break;

InvalidateRect tells the system that the window's client area (client rect) needs to be redrawn, and WM_DISPLAYCHANGE is sent when the display resolution changes.
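InvalidateRect does not paint immediately: it marks the client area as needing a repaint, and Windows delivers WM_PAINT only when no other messages are waiting in the queue, so several invalidations collapse into a single paint. The toy model below captures that behaviour with a dirty flag; `ToyWindow`, `Post`, `Invalidate` and `PumpOnce` are hypothetical names, not Win32 APIs.

```cpp
#include <cassert>
#include <deque>

// Toy model: invalidation only sets a flag; a paint is synthesized when
// the queue has no other messages, and repeated invalidations coalesce.
class ToyWindow
{
public:
    void Post(int message) { m_queue.push_back(message); }
    void Invalidate() { m_dirty = true; }   // like InvalidateRect

    // Processes one queued message or one pending paint.
    // Returns false when there is nothing to do.
    bool PumpOnce()
    {
        if (!m_queue.empty()) {
            m_queue.pop_front();
            ++m_otherHandled;
            return true;
        }
        if (m_dirty) {
            m_dirty = false;   // like BeginPaint validating the region
            ++m_paints;
            return true;
        }
        return false;
    }

    int Paints() const { return m_paints; }
    int OtherHandled() const { return m_otherHandled; }

private:
    std::deque<int> m_queue;
    bool m_dirty = false;
    int m_paints = 0;
    int m_otherHandled = 0;
};
```

Invalidating twice and posting one other message results in that message being handled first, followed by exactly one paint.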

+    if (!wasHandled) { result = DefWindowProc (hWnd, message, wParam, lParam); }
+    return result;

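This part is the catch-all. The Windows message queue carries a great many messages, including the COM-related messages mentioned above, and our code gives custom handling to only some of them. Every message we do not handle is passed here to the system's default processing. This step is very important: without it, the window will not even be created successfully.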
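The handled-flag pattern can be modeled portably: each case we care about sets `wasHandled`, and everything else falls through to the default handler. In the sketch, `DefaultProc` is a hypothetical stand-in for DefWindowProc and the message IDs are local constants with the same values as in WinUser.h. The model shows why the catch-all matters for creation: DefWindowProc returns TRUE for WM_NCCREATE, whereas a window procedure that returned 0 for it would make CreateWindowEx fail.

```cpp
#include <cassert>
#include <cstdint>

using Msg = std::uint32_t;
using LResult = std::intptr_t;

// Local copies of a few message IDs (same values as in WinUser.h).
constexpr Msg kWmCreate   = 0x0001;
constexpr Msg kWmPaint    = 0x000F;
constexpr Msg kWmNcCreate = 0x0081;

// Hypothetical stand-in for DefWindowProc: returns TRUE for WM_NCCREATE
// so that window creation can continue.
LResult DefaultProc(Msg message)
{
    return message == kWmNcCreate ? 1 : 0;
}

LResult WindowProc(Msg message, int& paintCount)
{
    LResult result = 0;
    bool wasHandled = false;

    switch (message)
    {
    case kWmCreate:
        wasHandled = true;   // e.g. create the D2D factory
        break;
    case kWmPaint:
        ++paintCount;        // e.g. draw with the render target
        wasHandled = true;
        break;
    }

    // Catch-all: everything not handled above goes to the default handler.
    if (!wasHandled) { result = DefaultProc(message); }
    return result;
}
```

Dropping the final `if (!wasHandled)` line would make WM_NCCREATE return 0 in this model, which in real Windows aborts window creation.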

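That completes all the code changes. Next, we compile it. Because we use COM, we additionally need to link ole32.lib; because we use D2D1, we also need to link d2d1.lib. We no longer use GDI, so gdi32.lib is not needed anymore.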

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang -l user32.lib -l ole32.lib -l d2d1.lib -o helloengine_d2d.exe helloengine_d2d.cpp

Running helloengine_d2d.exe gives the following result:

And with that, we have made our first close contact with the GPU.

Save the program.

Debugging on Windows

Our program has now grown fairly complex, so it may well contain bugs of all kinds, and the way to fix bugs is to debug.

Although we compile with Clang, we can still debug with Visual Studio. Here is how:

First, split the build into two steps. Start by using clang to generate the .obj file (that is, compile only):

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>clang-cl -c -Z7 -o helloengine_d2d.obj helloengine_d2d.cpp

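Note that the tool used here is actually clang-cl, a compatibility driver for clang that accepts the command-line options of cl.exe, the compiler shipped with Visual Studio. The -Z7 option embeds CodeView debug information in the object file.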

llvm.org/devmtg/2014-04

Then link with the Visual Studio linker:

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>link -debug user32.lib ole32.lib d2d1.lib helloengine_d2d.obj

The directory now contains a .pdb file, which is the debug symbol database used by Visual Studio.

We can start the Visual Studio debugger with the following command:

D:\wenli\Source\Repos\GameEngineFromScratch\Platform\Windows>devenv /debug helloengine_d2d.exe

From here on, it is no different from ordinary debugging in Visual Studio.

References:

  1. Direct2D (Windows)