<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>木白的知识小屋</title>
        <link>https://tangly1024.com/</link>
        <description>A newbie who loves learning feedId:69298335644214273+userId:69290848602792960</description>
        <lastBuildDate>Tue, 21 Apr 2026 04:36:35 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>zh-CN</language>
        <copyright>All rights reserved 2026, 木白  This message is used to verify that this feed (feedId:69298335644214273) belongs to me (userId:69290848602792960). Join me in enjoying the next generation information browser https://follow.is.</copyright>
        <item>
            <title><![CDATA[Partridge Sky: Urging the Young to Cherish Their Years]]></title>
            <link>https://tangly1024.com/essay/old-see-young</link>
            <guid>https://tangly1024.com/essay/old-see-young</guid>
            <pubDate>Wed, 10 Sep 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-26a9379410e1804b856fec02bbb82551"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-26a9379410e18006b1aaf637843d57b2">
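<b>In youth, how could I know how precious those brocade-zither years were?</b> I tossed the time away as lightly as dust.
Drunk before the flowers on a thousand cups of wine, I idled away spring after spring before the mirror.</div><div class="notion-text notion-block-26a9379410e180faab1ec3ecca54bbca">The song of golden threads, the lady in the jade tower: <b>looking back, all that has passed is left only as traces.</b>
Now old, I face in vain the moon that has watched the world change; it shines alone on my cold window, and my regret runs deep.</div><div class="notion-blank notion-block-26a9379410e180afab8bcc488a05beb6"> </div></main></div>]]></content:encoded>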
        </item>
        <item>
            <title><![CDATA[Two Key Techniques for Multi-Scale Processing: Pyramid Pooling vs. Feature Pyramid Networks]]></title>
            <link>https://tangly1024.com/ai/pyraidpool_FPN</link>
            <guid>https://tangly1024.com/ai/pyraidpool_FPN</guid>
            <pubDate>Tue, 18 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Pyramid pooling and feature pyramid networks]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-1ba9379410e180bdb1c9e5bd063a8296"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-callout notion-gray_background_co notion-block-1ba9379410e180429161f8c4d20fe469"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="😀">😀</span></div><div class="notion-callout-text">In deep-learning vision tasks, handling content at multiple scales has long been one of the core challenges. Pyramid pooling and the Feature Pyramid Network (FPN) are two important multi-scale techniques that play key roles in image classification, object detection, and image segmentation. This article examines their core ideas, structures, strengths and weaknesses, and typical applications.</div></div><div class="notion-blank notion-block-1ba9379410e1805880f5e0fa335ec20d"> </div><hr class="notion-hr notion-block-1ba9379410e180ed9fd3edaa715dc256"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e180099d98d2fe36ecaa5b" data-id="1ba9379410e180099d98d2fe36ecaa5b"><span><div id="1ba9379410e180099d98d2fe36ecaa5b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180099d98d2fe36ecaa5b" title="1. Pyramid Pooling"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1. 
Pyramid Pooling</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180e3b5f6d5333a0ed1ce" data-id="1ba9379410e180e3b5f6d5333a0ed1ce"><span><div id="1ba9379410e180e3b5f6d5333a0ed1ce" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180e3b5f6d5333a0ed1ce" title="Core Idea"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Core Idea</span></span></h4><div class="notion-text notion-block-1ba9379410e18073b9afdaa3bced0e57">Pyramid pooling pools a single feature map at several grid scales to extract information at different granularities, which makes the model more robust to changes in input scale. Its classic example is Spatial Pyramid Pooling (SPP).</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e1809a942fd479b39301a2" data-id="1ba9379410e1809a942fd479b39301a2"><span><div id="1ba9379410e1809a942fd479b39301a2" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1809a942fd479b39301a2" title="Structure"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Structure</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180d7b931c7f517791510"><li><b>Multi-level pooling</b>: the feature map is divided into grids of different sizes (for example 4×4, 2×2, and 1×1), and every grid cell is pooled (usually with max pooling), producing features at different granularities.</li></ul><ul 
class="notion-list notion-list-disc notion-block-1ba9379410e1805da12bef8a08cad897"><li><b>Fixed output length</b>: however the input image size changes, the concatenated result has a fixed length. With 4×4, 2×2, and 1×1 grids over a C-channel map it is always C × (16 + 4 + 1) = 21C values, which fully connected layers can consume directly.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e18080a4adefe81ddef350" data-id="1ba9379410e18080a4adefe81ddef350"><span><div id="1ba9379410e18080a4adefe81ddef350" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18080a4adefe81ddef350" title="Applications"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Applications</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180bf919eea6f2ee629f1"><li><b>Classification and detection</b>: SPP-Net uses pyramid pooling so that the network accepts images of arbitrary size. For detection, it computes the convolutional feature map once per image and pools each candidate region (Region of Interest, RoI), whatever its size, into a fixed-length vector, avoiding repeated convolution for every region.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180879695d1846fb62ac7"><li><b>Segmentation</b>: DeepLab's ASPP (Atrous Spatial Pyramid Pooling) module captures context at several scales to improve segmentation accuracy. Despite its name, ASPP gathers this context mainly with parallel atrous (dilated) convolutions at different dilation rates rather than with pooling grids.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180febf99e7c16ab34205" data-id="1ba9379410e180febf99e7c16ab34205"><span><div id="1ba9379410e180febf99e7c16ab34205" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180febf99e7c16ab34205" title="Advantages"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 
4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Advantages</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180c78ba2ecd348987d9d"><li><b>Adapts to varying input sizes</b>: input images do not have to be cropped or resized to a fixed size, and redundant computation is reduced.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1808e9763e344ebdb274c"><li><b>Fuses multi-scale information</b>: pooling at several scales makes the model more robust, which is particularly useful when object scale varies widely.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180538086e2f98ccc749d" data-id="1ba9379410e180538086e2f98ccc749d"><span><div id="1ba9379410e180538086e2f98ccc749d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180538086e2f98ccc749d" title="Disadvantages"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Disadvantages</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e1806489c7fc75845bc717"><li><b>Limited to a single feature map</b>: it extracts multi-scale information from one feature map and does not explicitly combine semantic information from different network levels.</li></ul><div class="notion-blank notion-block-1ba9379410e1805d9bf0f8abd416b46d"> </div><div class="notion-blank notion-block-1ba9379410e1801c9e84d2ef00db6149"> </div><hr class="notion-hr notion-block-1ba9379410e180bb93a9eb9e4935cca1"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1807c8f6fc7f2cfd9cf5d" data-id="1ba9379410e1807c8f6fc7f2cfd9cf5d"><span><div id="1ba9379410e1807c8f6fc7f2cfd9cf5d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1807c8f6fc7f2cfd9cf5d" title="2. 
Feature Pyramid Network (FPN)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2. Feature Pyramid Network (FPN)</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180a4aee6f4c42ce87ab0" data-id="1ba9379410e180a4aee6f4c42ce87ab0"><span><div id="1ba9379410e180a4aee6f4c42ce87ab0" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180a4aee6f4c42ce87ab0" title="Core Idea"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Core Idea</span></span></h4><div class="notion-text notion-block-1ba9379410e1806981eee869d9da060f">FPN adds a top-down pathway and lateral connections to a backbone network, building a multi-scale feature pyramid in which high-level semantics are combined with low-level detail.</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e18006b64be87440db2868" data-id="1ba9379410e18006b64be87440db2868"><span><div id="1ba9379410e18006b64be87440db2868" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18006b64be87440db2868" title="Structure"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 
0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Structure</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e18043a380df688896e830"><li><b>Multi-level prediction</b>: detection or segmentation is performed at every pyramid level (for example P2 to P5), usually with a prediction head shared across levels, so each level handles objects of a matching scale.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180908327f6c814fa2c70"><li><b>Feature fusion</b>: higher-level features are upsampled and merged with lower-level features, producing maps that are both semantically rich and high-resolution. In the original FPN, the coarser map is upsampled 2× (nearest neighbor) and added element-wise to the lower-level map after a 1×1 convolution, and a 3×3 convolution then smooths the result; some variants concatenate instead of adding.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e1805d81a4cbfde99d39f8" data-id="1ba9379410e1805d81a4cbfde99d39f8"><span><div id="1ba9379410e1805d81a4cbfde99d39f8" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1805d81a4cbfde99d39f8" title="Applications"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Applications</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180469bb4c33ef575fa28"><li><b>Object detection</b>: for example, Faster R-CNN with FPN, which substantially improves small-object detection.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180daa19cf456d1e6411c"><li><b>Instance segmentation</b>: for example, Mask R-CNN, which uses pyramid features to generate precise masks.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180498965ce740ae568c1" data-id="1ba9379410e180498965ce740ae568c1"><span><div id="1ba9379410e180498965ce740ae568c1" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180498965ce740ae568c1" title="Advantages"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Advantages</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e18055bcc7eaf6f755dd80"><li><b>Multi-scale feature fusion</b>: it uses high-level semantics and low-level detail together, which suits objects of widely varying sizes.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1804dbc39f227c459c60a"><li><b>End-to-end training</b>: the pyramid is built inside the network from a single-scale input and trained jointly with the task heads, so no separate image pyramid or extra processing stage is needed.</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e18071b939eb8fd9a8bd52" data-id="1ba9379410e18071b939eb8fd9a8bd52"><span><div id="1ba9379410e18071b939eb8fd9a8bd52" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18071b939eb8fd9a8bd52" title="Disadvantages"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Disadvantages</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180829a0cf5583b029fe9"><li><b>More computation</b>: building and predicting on several pyramid levels adds computational cost compared with using a single feature map.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e18017af3ff4356078d31c"><li><b>Sensitive to the fusion method</b>: how features are fused (for example addition vs. concatenation) can noticeably affect performance.</li></ul><hr class="notion-hr notion-block-1ba9379410e180448ca3f11cf5b2e121"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e18019a18df8b49711d9b4" data-id="1ba9379410e18019a18df8b49711d9b4"><span><div id="1ba9379410e18019a18df8b49711d9b4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18019a18df8b49711d9b4" title="3. Key Differences"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">3. Key Differences</span></span></h3><table class="notion-simple-table notion-block-1ba9379410e1806e9c27c4c332c6d564"><tbody><tr class="notion-simple-table-row notion-block-1ba9379410e180cab57bc894f4a1bc09"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Aspect</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Pyramid pooling</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>FPN</b></div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e1805080b4e67ff87e35fd"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Core goal</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Integrating multi-scale information within a single feature map</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Building and fusing a multi-level feature pyramid</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e18080ad79f9a70d43f9c7"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Structure</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Hierarchical pooling of a single feature map</div></td><td 
class="" style="width:120px"><div class="notion-simple-table-cell">Multi-level fusion through a top-down pathway and lateral connections</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180b499b2cf0af5232341"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Typical tasks</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Classification and detection (SPP), segmentation (ASPP)</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Object detection, instance segmentation</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e1809e9133ca8f93834581"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Output</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Fixed-length feature vector (SPP); ASPP outputs a feature map</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Multi-scale feature maps, with predictions made at each level</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180c88e49ca81aac2cf92"><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Information flow</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">One-way (from the input through the pooling layers)</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Two-way (a bottom-up backbone plus a top-down pathway that carries high-level semantics to lower levels)</div></td></tr></tbody></table><hr class="notion-hr notion-block-1ba9379410e1809c8e5af766ad4c5362"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1804e894af6e9303a2e0e" data-id="1ba9379410e1804e894af6e9303a2e0e"><span><div id="1ba9379410e1804e894af6e9303a2e0e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1804e894af6e9303a2e0e" title="4. Practical Comparison"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">4. Practical Comparison</span></span></h3><ul class="notion-list notion-list-disc notion-block-1ba9379410e180dda50fd4eae8a75df8"><li><b>Pyramid pooling</b></li><ul class="notion-list notion-list-disc notion-block-1ba9379410e180dda50fd4eae8a75df8"><div class="notion-text notion-block-1ba9379410e180a382e2f2e8b873bf13">Commonly used to handle varying input sizes (as in SPP) or to add multi-scale context (as in ASPP); it suits classification and segmentation tasks that need to enrich a single feature map. Its variant ASPP performs especially well in segmentation.</div></ul></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180df91a9ce30e48305ae"><li><b>FPN</b></li><ul class="notion-list notion-list-disc notion-block-1ba9379410e180df91a9ce30e48305ae"><div class="notion-text notion-block-1ba9379410e180a185a9e10708ba1370">Designed for multi-scale prediction: feature fusion lets it explicitly handle objects of different scales. It is a standard component of many object detectors (especially for small objects) and is also widely used in instance segmentation.</div></ul></ul><hr class="notion-hr notion-block-1ba9379410e180ed8228ee6fe4ece105"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1804f8315e1624c4e1cff" data-id="1ba9379410e1804f8315e1624c4e1cff"><span><div id="1ba9379410e1804f8315e1624c4e1cff" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1804f8315e1624c4e1cff" title="5. Summary"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">5. Summary</span></span></h3><ul class="notion-list notion-list-disc notion-block-1ba9379410e1800aa300d4bf062c83ef"><li><b>Pyramid pooling</b>: multi-scale pooling makes a single feature map more robust to scale changes, which suits feature extraction in classification and segmentation.</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180ebb968e2a4296ff7b9"><li><b>FPN</b>: a pyramid structure fuses features from multiple levels to enable end-to-end multi-scale prediction, and it performs particularly well in object detection.</li></ul><div class="notion-text notion-block-1ba9379410e180fba963f85138b90039">Both techniques address the multi-scale problem, but with different emphases: pyramid pooling focuses on "feature extraction", while FPN focuses on "feature fusion and prediction". In practice, the task drives the choice: FPN is a near-standard component in object detection, while pyramid pooling and its variants (such as ASPP) stand out in segmentation. Understanding the core idea and the suitable scenarios of each helps in making better design choices in real projects.</div></main></div>]]></content:encoded>
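The fixed-length property of SPP described in this article can be illustrated with a minimal sketch. It assumes a plain-Python feature map stored as a list of C channels, each an H×W list of lists (a real network would use a tensor library); the function name `spatial_pyramid_max_pool` and the default levels (4, 2, 1) are illustrative choices, not code from SPP-Net. Each channel is max-pooled over an n×n grid of bins for every level n, so the output always has C × 21 values whatever H and W are.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one H x W channel (hypothetical plain-list layout)


def spatial_pyramid_max_pool(feature_map: List[Matrix],
                             levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Max-pool every channel over an n x n grid for each pyramid level n
    and concatenate the results into one fixed-length vector."""
    pooled: List[float] = []
    for channel in feature_map:                # C channels, each H x W
        h, w = len(channel), len(channel[0])
        for n in levels:                       # e.g. 4x4, 2x2, 1x1 grids
            for i in range(n):
                # Row bin i spans floor(i*h/n) up to ceil((i+1)*h/n), exclusive,
                # so no bin is empty and together the bins cover every row.
                r0, r1 = (i * h) // n, -((-(i + 1) * h) // n)
                for j in range(n):
                    c0, c1 = (j * w) // n, -((-(j + 1) * w) // n)
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


# Two 3-channel maps with different spatial sizes (5x5 and 13x13).
small = [[[float(r * 5 + c) for c in range(5)] for r in range(5)] for _ in range(3)]
large = [[[float(r * 13 + c) for c in range(13)] for r in range(13)] for _ in range(3)]
print(len(spatial_pyramid_max_pool(small)), len(spatial_pyramid_max_pool(large)))  # 63 63
```

Both sample maps have 3 channels, so both produce 3 × 21 = 63 values despite their different spatial sizes. Bin edges use floor for the start and ceiling for the end, so every bin is non-empty and the bins cover the whole map; neighboring bins can overlap by one row or column when H or W is not a multiple of n.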
        </item>
        <item>
            <title><![CDATA[Dense vs. Sparse Compute: Demystifying the Two Core Paradigms of Modern Computing]]></title>
            <link>https://tangly1024.com/technology/dense-vs-sparse-computing-modern-compute-paradigms</link>
            <guid>https://tangly1024.com/technology/dense-vs-sparse-computing-modern-compute-paradigms</guid>
            <pubDate>Tue, 18 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Dense compute and sparse compute]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-1ba9379410e18032bf14e5ce99d09810"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-callout notion-gray_background_co notion-block-1ba9379410e180c7963ad5d8b0bd780e"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="😀">😀</span></div><div class="notion-callout-text">Introduction: the double helix of compute evolution. With GPU compute growing about 60% a year (source: NVIDIA's 2023 financial report), compute is splitting into distinct types, and that split is reshaping computer architecture. When we process an image, every pixel must be computed; when recommending products, 90% of the data may be irrelevant. These two situations gave rise to the divide between dense and sparse compute. Understanding these &quot;compute twins&quot; is key to making sense of today's AI chip wars.</div></div><div class="notion-blank notion-block-1ba9379410e18041b929d486e538b484"> </div><div class="notion-blank notion-block-1ba9379410e180619f89e95c18ca5593"> </div><hr class="notion-hr notion-block-1ba9379410e1806f9a7cc76315e03f4d"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e18015952dd961bbb7aa08" data-id="1ba9379410e18015952dd961bbb7aa08"><span><div id="1ba9379410e18015952dd961bbb7aa08" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18015952dd961bbb7aa08" title="Chapter 1 Fundamentals: How the Two Kinds of Compute Differ"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Chapter 1 Fundamentals: How the Two Kinds of Compute Differ</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e1808d80cae68b7dccfefd" data-id="1ba9379410e1808d80cae68b7dccfefd"><span><div id="1ba9379410e1808d80cae68b7dccfefd" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#1ba9379410e1808d80cae68b7dccfefd" title="1.1 Dense Compute: A Precision-Guided Saturation Strike"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1.1 Dense Compute: A Precision-Guided Saturation Strike</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e18058a58cd6ce9db1fe93"><li><b>Definition</b>: uniform computation over every element of contiguously stored data</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180689d60cf5f3e7c9fdd"><li><b>Core characteristics</b>:</li><ul class="notion-list notion-list-disc notion-block-1ba9379410e180689d60cf5f3e7c9fdd"><li>100% compute density: every element is computed, with nothing skipped</li><li>Regular data flow: matrices and tensors are strictly aligned in memory</li><li>Deterministic latency: execution time can be predicted accurately</li></ul></ul><div class="notion-text notion-block-1ba9379410e1808d8dbcc6f8119c9057"><b>Typical case</b>: when ray-tracing 4K video, every one of the 3840×2160 pixels in each frame needs ray-tracing computation</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180489591c4601c4b80f2" data-id="1ba9379410e180489591c4601c4b80f2"><span><div id="1ba9379410e180489591c4601c4b80f2" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180489591c4601c4b80f2" title="1.2 Sparse Compute: Precision Surgery That Skips the Unnecessary"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1.2 Sparse Compute: Precision Surgery That Skips the Unnecessary</span></span></h4><ul class="notion-list 
notion-list-disc notion-block-1ba9379410e180e39a2ac84a386a3921"><li><b>Definition</b>: computation that is dynamically filtered by conditions, so only the elements that matter (typically the nonzeros) are processed</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1801bb50ffd619e0d18ca"><li><b>Core characteristics</b>:</li><ul class="notion-list notion-list-disc notion-block-1ba9379410e1801bb50ffd619e0d18ca"><li>Non-uniform compute density: useful work can fall below 10% of the total</li><li>Data-dependent skipping: the computation path changes at run time with the data</li><li>Compressed storage formats: encodings such as CSR (Compressed Sparse Row)</li></ul></ul><div class="notion-text notion-block-1ba9379410e180a9a151c47934f150cd"><b>Typical case</b>: in natural language processing, skipping the wasted computation on padding tokens</div><hr class="notion-hr notion-block-1ba9379410e180a6973bfd74afb818bc"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1809bbff3d6ff2f8783cc" data-id="1ba9379410e1809bbff3d6ff2f8783cc"><span><div id="1ba9379410e1809bbff3d6ff2f8783cc" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1809bbff3d6ff2f8783cc" title="Chapter 2 Technical Anatomy: The Architectural DNA of Each"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Chapter 2 Technical Anatomy: The Architectural DNA of Each</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e18015a934dc353517be16" data-id="1ba9379410e18015a934dc353517be16"><span><div id="1ba9379410e18015a934dc353517be16" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18015a934dc353517be16" title="2.1 Hardware Implementations Compared"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 
010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2.1 Hardware Implementations Compared</span></span></h4><table class="notion-simple-table notion-block-1ba9379410e1809f8a47de7f8fc8a069"><tbody><tr class="notion-simple-table-row notion-block-1ba9379410e18089a6c3cf49d6b7e5a4"><td class="" style="width:120px"><div class="notion-simple-table-cell">Aspect</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Dense compute architecture</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Sparse compute architecture</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e18086aa0ff557ebda797c"><td class="" style="width:120px"><div class="notion-simple-table-cell">Compute units</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Fixed pipelines (e.g., a GPU's array of SMs)</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Reconfigurable logic (e.g., dynamic routing on an FPGA)</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180229679f958817599d6"><td class="" style="width:120px"><div class="notion-simple-table-cell">Memory system</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">High-bandwidth stacked HBM</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Smart caches (with data pre-filtering)</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180ffa815daa7116a0c9f"><td class="" style="width:120px"><div class="notion-simple-table-cell">Execution model</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">SIMD (single instruction, multiple data)</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">SPMD (single program, multiple data)</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e18044a355fdb358f1ff2e"><td class="" style="width:120px"><div class="notion-simple-table-cell">Representative hardware</div></td><td class="" style="width:120px"><div 
class="notion-simple-table-cell">NVIDIA A100 Tensor Cores</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Google TPU SparseCore</div></td></tr></tbody></table><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180ebac2ae5865b5a9a0e" data-id="1ba9379410e180ebac2ae5865b5a9a0e"><span><div id="1ba9379410e180ebac2ae5865b5a9a0e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180ebac2ae5865b5a9a0e" title="2.2 Key Technical Breakthroughs"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2.2 Key Technical Breakthroughs</span></span></h4><div class="notion-text notion-block-1ba9379410e180228f42d60bf49f7a64"><b>Three key tools for dense optimization</b>:</div><ol start="1" class="notion-list notion-list-numbered notion-block-1ba9379410e180e1b2f4dad205ef5ee3" style="list-style-type:decimal"><li>Matrix tiling: splitting large matrices into small sub-blocks (for example 32×32) to improve cache hit rates; the best tile size depends on the hardware</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1ba9379410e18054a145e6741fd512d5" style="list-style-type:decimal"><li>Tensor Core mixed precision: on the V100, Tensor Cores multiply FP16 values and accumulate the products in FP32</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1ba9379410e180219dfdf93faa5ea012" style="list-style-type:decimal"><li>Wavefront scheduling: a parallel-execution optimization in AMD's CDNA architecture</li></ol><div class="notion-text notion-block-1ba9379410e180a9b93cf6e59317bd56"><b>Three key innovations in sparse compute</b>:</div><ol start="1" class="notion-list notion-list-numbered notion-block-1ba9379410e180efa427fafdef0e9c5a" style="list-style-type:decimal"><li>2:4 structured sparsity: the weight-pruning pattern supported by NVIDIA's Ampere architecture, in which at most two of every four consecutive weights are nonzero</li></ol><ol start="2" class="notion-list notion-list-numbered 
notion-block-1ba9379410e180c9a348ec4abf3ccc78" style="list-style-type:decimal"><li>Dynamic activation prediction: zero-skipping in Huawei's Da Vinci architecture</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1ba9379410e18051bd7ae420100c3673" style="list-style-type:decimal"><li>Probabilistic memory access: Cerebras's sparse dataflow engine</li></ol><hr class="notion-hr notion-block-1ba9379410e1809c8887c1e31b6e3d7f"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1805fb685dcfc8bc5c68e" data-id="1ba9379410e1805fb685dcfc8bc5c68e"><span><div id="1ba9379410e1805fb685dcfc8bc5c68e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1805fb685dcfc8bc5c68e" title="Chapter 3 Application Battlegrounds: The Contest for Compute Across Domains"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Chapter 3 Application Battlegrounds: The Contest for Compute Across Domains</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e1806c880ed1a25e3cac92" data-id="1ba9379410e1806c880ed1a25e3cac92"><span><div id="1ba9379410e1806c880ed1a25e3cac92" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1806c880ed1a25e3cac92" title="3.1 Where Dense Compute Dominates"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">3.1 Where Dense Compute Dominates</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e18028a280d8ab428d7ec3"><li><b>Scientific computing</b>: in the WRF weather-forecasting model, partitioning the global grid produces 10<sup>18</sup> computation points</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180319c8ec7ea5c44be90"><li><b>Graphics rendering</b>: the RTX 4090's 129 TFLOPS of compute supports real-time 8K ray tracing</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e180cab518c4ee0d6df93a"><li><b>Autonomous driving</b>: BEV (bird's-eye-view) perception models must continuously process input streams from multiple cameras</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180f0a8fcd489d6694a55" data-id="1ba9379410e180f0a8fcd489d6694a55"><span><div id="1ba9379410e180f0a8fcd489d6694a55" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180f0a8fcd489d6694a55" title="3.2 Emerging Territory for Sparse Compute"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">3.2 Emerging Territory for Sparse Compute</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180c48d98e4cc8805c89e"><li><b>Recommender systems</b>: the ad system of Alimama (Alibaba's advertising platform) runs real-time inference over trillion-scale feature dimensions</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1807883dfdd13ed6cb14b"><li><b>Knowledge graphs</b>: Meta's ESKG engine handles queries over 24 billion entity relationships</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1802aa4b4da4199480b11"><li><b>Large language models</b>: dynamic sparsity in GPT-4's attention matrices reaches 73%</li></ul><hr class="notion-hr notion-block-1ba9379410e180cead1dffc5ce5a2dce"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1804fad29ea6c9efed40e" data-id="1ba9379410e1804fad29ea6c9efed40e"><span><div id="1ba9379410e1804fad29ea6c9efed40e" class="notion-header-anchor"></div><a class="notion-hash-link" 
href="#1ba9379410e1804fad29ea6c9efed40e" title="Chapter 4 Performance Showdown: What Measured Data Reveals"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Chapter 4 Performance Showdown: What Measured Data Reveals</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180eda505efdd819acbd4" data-id="1ba9379410e180eda505efdd819acbd4"><span><div id="1ba9379410e180eda505efdd819acbd4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180eda505efdd819acbd4" title="4.1 Compute Efficiency Comparison (A100 vs. TPU v4)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">4.1 Compute Efficiency Comparison (A100 vs. TPU v4)</span></span></h4><table class="notion-simple-table notion-block-1ba9379410e1807f8502dcb9f9300bbd"><tbody><tr class="notion-simple-table-row notion-block-1ba9379410e180d6bec3f5d38d512678"><td class="" style="width:120px"><div class="notion-simple-table-cell">Metric</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Dense mode (A100)</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Sparse mode (TPU v4)</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180bca233d7313645ddcd"><td class="" style="width:120px"><div class="notion-simple-table-cell">Peak throughput</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">312 TFLOPS</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">420 TOPS</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e1804c9671eec48f34a044"><td class="" style="width:120px"><div class="notion-simple-table-cell">Energy efficiency</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">3.2 TFLOPS/W</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">15.8 TOPS/W</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180a186b7d6f066df4112"><td class="" style="width:120px"><div class="notion-simple-table-cell">Effective bandwidth</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">1.5 TB/s</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">680 GB/s</div></td></tr><tr class="notion-simple-table-row notion-block-1ba9379410e180a39f95c3792fac13e4"><td class="" style="width:120px"><div class="notion-simple-table-cell">Typical latency</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">12 μs</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">8 μs</div></td></tr></tbody></table><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e18010ae63f05b44b56804" data-id="1ba9379410e18010ae63f05b44b56804"><span><div id="1ba9379410e18010ae63f05b44b56804" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e18010ae63f05b44b56804" title="4.2 Cost Analysis (Training a 175B-Parameter Model)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">4.2 Cost Analysis (Training a 175B-Parameter Model)</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e180228ae3c816057a9010"><li><b>Dense cluster</b>: 4,096 A100s, drawing 7.2 MW, at a cost of $4.6 million</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e18034afa8dc786956aecc"><li><b>Sparse cluster</b>: only 1,024 TPU v4 chips, drawing 1.1 MW, at a cost of $2.1 million</li></ul><hr class="notion-hr notion-block-1ba9379410e1804b873ad00600536c68"/><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-1ba9379410e1800595ebe38a90bebcd6" data-id="1ba9379410e1800595ebe38a90bebcd6"><span><div id="1ba9379410e1800595ebe38a90bebcd6" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e1800595ebe38a90bebcd6" title="Chapter 5 Future Trends: The Dawn of Converged Computing"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Chapter 5 Future Trends: The Dawn of Converged Computing</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180638ce6e50de7207888" data-id="1ba9379410e180638ce6e50de7207888"><span><div id="1ba9379410e180638ce6e50de7207888" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180638ce6e50de7207888" title="5.1 Latest Industry Developments"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">5.1 Latest Industry Developments</span></span></h4><ul class="notion-list notion-list-disc notion-block-1ba9379410e18081ba2dc1567a8c9e30"><li><b>NVIDIA</b>: the Hopper architecture supports dynamic sparsification</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e18015982de0b5b89da447"><li><b>AMD</b>: CDNA3 introduces instructions for accelerating sparse matrix operations</li></ul><ul class="notion-list notion-list-disc notion-block-1ba9379410e1809eb2eac2b7b1b58156"><li><b>Intel</b>: Ponte Vecchio integrates a Flexible Sparsity engine</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-1ba9379410e180ed9ef6da89e6c76c68" data-id="1ba9379410e180ed9ef6da89e6c76c68"><span><div id="1ba9379410e180ed9ef6da89e6c76c68" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180ed9ef6da89e6c76c68" title="5.2 Directions for Convergence"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">5.2 Directions for Convergence</span></span></h4><ol start="1" class="notion-list notion-list-numbered notion-block-1ba9379410e180179167c1017f66ebae" style="list-style-type:decimal"><li><b>Dynamic sparsity awareness</b>: automatically identify, at runtime, computation flows that can be sparsified</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1ba9379410e180faba32d6e335500d43" style="list-style-type:decimal"><li><b>Hybrid dense/sparse scheduling</b>: run the critical path with dense computation and enable sparse optimizations on non-critical paths</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1ba9379410e1804b8794dd5948c3866a" style="list-style-type:decimal"><li><b>Processing-in-memory design</b>: Samsung's HBM-PIM performs sparse filtering on the memory side</li></ol><hr class="notion-hr notion-block-1ba9379410e18009bea1e7bf98d5530b"/><h3 class="notion-h 
notion-h2 notion-h-indent-0 notion-block-1ba9379410e180afa157d024f59e61d5" data-id="1ba9379410e180afa157d024f59e61d5"><span><div id="1ba9379410e180afa157d024f59e61d5" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ba9379410e180afa157d024f59e61d5" title="Conclusion: The Dialectical Unity of Compute"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Conclusion: The Dialectical Unity of Compute</span></span></h3><div class="notion-text notion-block-1ba9379410e1803c8852e00b65d9dda2">While the industry is still arguing over the roadmaps of the &quot;sparse camp&quot; and the &quot;dense camp&quot;, the real future belongs to &quot;dual practitioners&quot; who can harness both kinds of compute. Just as CPU scalar computing and GPU vector computing eventually converged, the line between dense and sparse is blurring in the new generation of AI chips (such as Tesla Dojo). Understanding the nature of these technical twins will help us prepare for the Zettascale era (10<sup>21</sup> operations per second).</div><div class="notion-blank notion-block-1ba9379410e18059af4bc91e00d5e8a8"> </div><div class="notion-text notion-block-1ba9379410e180db8166f0182740cad3"><b>Suggestions for further study</b>:</div><ol start="1" class="notion-list notion-list-numbered notion-block-1ba9379410e18058ade9e45109846abb" style="list-style-type:decimal"><li>Study Open SparTA, an open-source framework for sparse programming</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1ba9379410e18024bc8cf05a0d79a91d" style="list-style-type:decimal"><li>Experiment with PyTorch's torch.sparse module</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1ba9379410e180c08b45ca55f4a31c74" style="list-style-type:decimal"><li>Follow the sparse-inference track of the MLPerf benchmarks</li></ol><div class="notion-blank notion-block-1ba9379410e1804288a3cc61cfc43b30"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Letter to Younger Students on Haigao's Fortieth Anniversary]]></title>
            <link>https://tangly1024.com/essay/海高四秩风雪征程寄后学书</link>
            <guid>https://tangly1024.com/essay/海高四秩风雪征程寄后学书</guid>
            <pubDate>Thu, 13 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[To the younger generation]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-1b59379410e18047ae26f2e00466b56a"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-1b59379410e1803d8676d39b58f2733a" data-id="1b59379410e1803d8676d39b58f2733a"><span><div id="1b59379410e1803d8676d39b58f2733a" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1b59379410e1803d8676d39b58f2733a" title="🎋 A Letter to Younger Students on Haigao's Fortieth Anniversary"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">🎋 A Letter to Younger Students on Haigao's Fortieth Anniversary</span></span></h2><hr class="notion-hr notion-block-1b59379410e180c4833de31d8e4309d1"/><div class="notion-blank notion-block-1b59379410e180b4aac1e1fbc1fa0acf"> </div><blockquote class="notion-quote notion-block-1b59379410e180f7a0d5e53b04aa8325"><div>The year is Yisi; the season, late spring.</div><div class="notion-text notion-block-1b59379410e180739ea4ce34fabe78dc">As my alma mater's splendid new hall is completed, I hear the bells murmuring under its snowy eaves, and I seem to glimpse forty years of figures lit by the green study lamp.</div><div class="notion-text notion-block-1b59379410e180059039e719e545030a">I remember carrying my books to school in the northern frontier, crunching through three inches of jade-white snow, breathing on ice to make my inkstone, my lessons accompanied by the surging of the pines;</div><div class="notion-text notion-block-1b59379410e18058b6bcf1c0ef754403">now, steering a starbound raft and climbing Baidu's stairway of clouds a thousand times over, I finally see that the resolve of the cold-window scholar who bored through walls for light has become the myriad lights of the Milky Way.</div></blockquote><hr class="notion-hr notion-block-1b59379410e18070aff6d0e5509f35e7"/><div class="notion-blank notion-block-1b59379410e180839504ed6d4c37ab17"> </div><blockquote class="notion-quote notion-block-1b59379410e180d58e79daa50d824256"><div>I have heard it said: a person of purpose should cherish ambitions that soar beyond the clouds; a gentleman would rather perch on a frost-covered branch.</div><div class="notion-text 
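notion-block-1b59379410e1804c8b24e9e775df8805">We were born in a land of snow; we should stand tall like the snow-laden pines, and even as the wind's blades carve our bones, keep hearts as pure as ice.</div><div class="notion-text notion-block-1b59379410e180688e4bc36214b40f82">Once I did my sums on a frozen inkstone, exploring every marvel of the Nine Chapters;</div><div class="notion-text notion-block-1b59379410e180b48b3bd4d12603df35">now I analyze the mathematics of chaos, and have not let the passing seasons go to waste.</div><div class="notion-text notion-block-1b59379410e1806aba14fb03b16d5b20"><b>Students, remember well</b>:</div><div class="notion-text notion-block-1b59379410e180878935c6dfb3d2d774">a young heart should hold the whole Milky Way within a mustard seed;</div><div class="notion-text notion-block-1b59379410e1806389a2f79bd728fd60">the true nobility of humble families lies in planting a paradise of books on a threadbare mat.</div></blockquote><hr class="notion-hr notion-block-1b59379410e18063957ec4093aa3dc63"/><div class="notion-blank notion-block-1b59379410e180788e28e898aa083c12"> </div><blockquote class="notion-quote notion-block-1b59379410e180cb8fdafe7711e2d2d6"><div>I have lived in the Yan-Yun region for nine years; whenever I unravel the secrets of numbers, I think of the jade-like clicks of the abacus.</div><div class="notion-text notion-block-1b59379410e180f1a8c0f35c7a292d45">Do not complain that the north wind is bitter; know that the winter plum blossom shows its spirit best when it breaks through the snow.</div><div class="notion-text notion-block-1b59379410e180c1834dfa0d142e5024">When you open a book, trace the warp and weft of heaven and earth; when you lay down your brush, consider the fortunes of ordinary people.</div><div class="notion-text notion-block-1b59379410e18018859cd75e8a1db834">If one day you ride the vast ocean of data, remember the hoofprints pressed into frost beneath the moon;</div><div class="notion-text notion-block-1b59379410e1804ea577c6460acef0bd">if you chart the threads of intelligence, do not forget the icy moon that mirrors your original aspiration.</div></blockquote><hr class="notion-hr notion-block-1b59379410e180499250e9daf1d57424"/><div class="notion-blank notion-block-1b59379410e180ce8f15e8e1c3844b49"> </div><blockquote class="notion-quote notion-block-1b59379410e180a48354f719a8faa924"><div>Now I send you this letter by the wild goose:</div><div class="notion-text notion-block-1b59379410e1809a92fec49821842c67">scholarship is like forging the Kunwu sword: only after ten years do you hear the dragon's cry;</div><div class="notion-text notion-block-1b59379410e180aaa1f5d17f5db4d5ad">putting knowledge into practice is like weaving the celestial weaver's brocade: ten thousand passes of the shuttle at last reveal the phoenix pattern.</div><div class="notion-text notion-block-1b59379410e18082acf1e29f6477d189">My only wish is that, reading this letter, you will see the perseverance of warming cold hands by a snowy window, and keep the broad vision of setting cloud sails across the sea.</div></blockquote><hr class="notion-hr notion-block-1b59379410e180af8738e3a4c2e9aa2f"/><div class="notion-blank notion-block-1b59379410e1803e9d59e5a2510c3319"> </div><div class="notion-blank notion-block-1b59379410e18086bf00c1d8f4728014"> </div><div class="notion-text 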
notion-block-1b59379410e180d8980cfd5547cb8942"><em>Paper and ink are limited, but my feelings are boundless</em>:</div><div class="notion-text notion-block-1b59379410e180448ef0eb860ee27824">may Haigao's torch be passed on without end, <b>like the spring waters of Jingpo Lake, flowing on ceaselessly</b>;</div><div class="notion-text notion-block-1b59379410e180ae9552e534b0ad7a75">may the character of our young people endure, <b>like the pine waves of the Wanda Mountains, green year after year</b>.</div><div class="notion-blank notion-block-1b59379410e18001a875dab55d881fc1"> </div><div class="notion-text notion-block-1b59379410e1807188a1c27d78a89beb">Respectfully presented, with washed hands, by a wanderer far from home, now at Baidu</div><div class="notion-text notion-block-1b59379410e180b0b63aefeb85d8f40b">Late spring, 2025 CE</div><div class="notion-blank notion-block-1b59379410e18073bfd9f490d862b85a"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[KVCache in DeepSeek FlashMLA]]></title>
            <link>https://tangly1024.com/technology/deepseek_flashmla_kvcache</link>
            <guid>https://tangly1024.com/technology/deepseek_flashmla_kvcache</guid>
            <pubDate>Mon, 24 Feb 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Advantages of FlashMLA's KVCache]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-1a49379410e1801caa04e58f358103d4"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-1a49379410e1809eb299ed6eba2273f3">In <b>FlashMLA</b>, the KVCache is heavily optimized for the <b>Hopper GPU architecture</b> and for <b>serving variable-length sequences</b>. The core goals are <b>efficient memory management</b> and <b>maximum compute throughput</b>.</div><hr class="notion-hr notion-block-1a49379410e180598543c8c32254b5f3"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e180eeb667ef4480c4cd80" data-id="1a49379410e180eeb667ef4480c4cd80"><span><div id="1a49379410e180eeb667ef4480c4cd80" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e180eeb667ef4480c4cd80" title="▎ Paged Storage for the KVCache"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ Paged Storage for the KVCache</b></span></span></h4><ol start="1" class="notion-list notion-list-numbered notion-block-1a49379410e180b1ae71d5ecaac842b0" style="list-style-type:decimal"><li><b>Problem</b>: a conventional KVCache stores each sequence contiguously → with <b>variable-length sequences</b> (for example, serving a 100-token request and a 1,000-token request at the same time), GPU memory fragments easily and utilization stays low.</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1a49379410e180f7aee0ceb598a2004b" style="list-style-type:decimal"><li><b>FlashMLA's solution</b>: manage the KVCache in pages.

✅ <b>Effect</b>: no wasted GPU memory, and the cache can grow flexibly as sequences get longer.</li><ol class="notion-list notion-list-numbered notion-block-1a49379410e180f7aee0ceb598a2004b" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-1a49379410e180cd9e08df29fbd91f42"><li><b>Block storage</b>: the KVCache is cut into fixed-size blocks (for example, each block holds the K/V of 64 tokens), similar to <b>memory paging</b>.</li></ul><ul class="notion-list notion-list-disc notion-block-1a49379410e180d3bd64e1a988e48a1d"><li><b>Block table</b>: a table maps logical blocks to physical blocks, and blocks are allocated on demand (similar to inodes in a file system).</li></ul></ol></ol><hr class="notion-hr notion-block-1a49379410e18051a217e875f02b0587"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e1807286c6f278b39ee054" data-id="1a49379410e1807286c6f278b39ee054"><span><div id="1a49379410e1807286c6f278b39ee054" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e1807286c6f278b39ee054" title="▎ On-Demand Processing of Variable-Length Sequences"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ On-Demand Processing of Variable-Length Sequences</b></span></span></h4><ol start="1" class="notion-list notion-list-numbered notion-block-1a49379410e180a2952ddd6ba7ce7317" style="list-style-type:decimal"><li><b>Traditional bottleneck</b>: when sequences of different lengths are processed together, GPU computation is padded to the longest sequence → a large amount of wasted computation.</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1a49379410e180eead74cb1deb4af70f" style="list-style-type:decimal"><li><b>FlashMLA's optimizations</b>:</li><ol class="notion-list notion-list-numbered notion-block-1a49379410e180eead74cb1deb4af70f" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-1a49379410e1807896a9e37bc4fc8689"><li><b>Metadata scheduling</b>: the <code class="notion-inline-code">get_mla_metadata</code> function builds a scheduling plan (<code class="notion-inline-code">tile_scheduler_metadata</code>) from the actual sequence lengths (<code class="notion-inline-code">cache_seqlens</code>) and the number of KV heads (<code class="notion-inline-code">h_kv</code>).</li></ul><ul class="notion-list notion-list-disc notion-block-1a49379410e180e7ba42ca37c2525892"><li><b>Split computation</b>: <code class="notion-inline-code">num_splits</code>, also returned by <code class="notion-inline-code">get_mla_metadata</code>, divides the work into multiple subtasks sized for the GPU's parallel cores, so no resources sit idle.</li></ul></ol></ol><hr class="notion-hr notion-block-1a49379410e1809db1eafb52800a4316"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e18006837ed92212ff4750" data-id="1a49379410e18006837ed92212ff4750"><span><div id="1a49379410e18006837ed92212ff4750" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e18006837ed92212ff4750" title="▎ Peak Performance: Memory Bandwidth vs. Compute Density"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ Peak Performance: Memory Bandwidth vs. Compute Density</b></span></span></h4><ol start="1" class="notion-list notion-list-numbered notion-block-1a49379410e1800f8301d0cd3229f90f" style="list-style-type:decimal"><li><b>Optimizations for two regimes</b>:</li><ol class="notion-list notion-list-numbered notion-block-1a49379410e1800f8301d0cd3229f90f" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-1a49379410e180bbaf0bc34e85d1ff77"><li><b>Memory-bound</b> (up to 3000 GB/s on H800 SXM5): block-wise prefetching and coalesced memory accesses bring K/V loading close to the GPU's memory-bandwidth limit.</li></ul><ul class="notion-list notion-list-disc notion-block-1a49379410e180cc8faaccd4bf1d8d4d"><li><b>Compute-bound</b> (up to 580 TFLOPS on H800 SXM5): Hopper's Tensor Cores run the attention computation as efficient matrix multiplications while hiding latency.</li></ul></ol></ol><ol 
start="2" class="notion-list notion-list-numbered notion-block-1a49379410e180938f90dc21d59d0a6a" style="list-style-type:decimal"><li><b>Key techniques</b>:</li><ol class="notion-list notion-list-numbered notion-block-1a49379410e180938f90dc21d59d0a6a" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-1a49379410e180cbaa49e13aae5c0b95"><li><b>Asynchronous loading</b>: while the current block is being computed, the K/V data of the next block is prefetched.</li></ul><ul class="notion-list notion-list-disc notion-block-1a49379410e18075861dcea5273986bc"><li><b>Reduced-precision storage</b>: the K/V in each block are stored in BF16, which halves memory use compared with FP32 while keeping FP32's dynamic range.</li></ul></ol></ol><hr class="notion-hr notion-block-1a49379410e180b9a15cc01ac87d2c38"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e180d59c92ee39acb14e18" data-id="1a49379410e180d59c92ee39acb14e18"><span><div id="1a49379410e180d59c92ee39acb14e18" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e180d59c92ee39acb14e18" title="▎ Code Flow Example"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ Code Flow Example</b></span></span></h4><div class="notion-text notion-block-1a49379410e180aca4d9e2f7338703e3">Taking the attention computation of a single layer as an example:</div><hr class="notion-hr notion-block-1a49379410e180988bc4df1fbefd7ccd"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e180a7b319cf9039cb25fb" data-id="1a49379410e180a7b319cf9039cb25fb"><span><div id="1a49379410e180a7b319cf9039cb25fb" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e180a7b319cf9039cb25fb" title="▎ Everyday Analogy: A Library and Its Shelves"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ Everyday Analogy: A Library and Its Shelves</b></span></span></h4><div class="notion-text notion-block-1a49379410e180f0821ff484a09878af">Imagine looking for a book in a library:</div><ul class="notion-list notion-list-disc notion-block-1a49379410e180a48f3fe2bfa28320ce"><li><b>Traditional approach</b>: every time you borrow a book, you start from the first book and check them one by one (analogous to repeated computation without a cache).</li></ul><ul class="notion-list notion-list-disc notion-block-1a49379410e180ee8690d97e12b5379f"><li><b>FlashMLA's approach</b>:</li><ul class="notion-list notion-list-disc notion-block-1a49379410e180ee8690d97e12b5379f"><ol start="1" class="notion-list notion-list-numbered notion-block-1a49379410e1800ab57fc952f3706cfe" style="list-style-type:decimal"><li>The shelves are divided into fixed-size compartments (block storage), and each compartment is labeled (block table).</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1a49379410e180a592cfc5bee53a3017" style="list-style-type:decimal"><li>The librarian records the location of every book in advance (metadata scheduling), so when you need one you go straight to the labeled compartment (efficient memory access).</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1a49379410e1809ba32dec8c700d4c2a" style="list-style-type:decimal"><li>Many readers (GPU threads) look up books by label at the same time without interfering with one another (parallel computation).</li></ol></ul></ul><hr class="notion-hr notion-block-1a49379410e18013913be139b66c33b6"/><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-1a49379410e18050b765ef5372f20fc2" data-id="1a49379410e18050b765ef5372f20fc2"><span><div id="1a49379410e18050b765ef5372f20fc2" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1a49379410e18050b765ef5372f20fc2" title="▎ Summary: Advantages of FlashMLA's KVCache"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>▎ Summary: Advantages of FlashMLA's KVCache</b></span></span></h4><ol start="1" class="notion-list notion-list-numbered notion-block-1a49379410e18089a936ca7cae69f2bf" style="list-style-type:decimal"><li><b>Adapts to variable lengths</b>: paging keeps GPU memory utilization high when long and short sequences are served together.</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-1a49379410e180729e09d9ef4a80d875" style="list-style-type:decimal"><li><b>Top speed</b>: optimized memory access and split computation extract the most from Hopper GPU hardware.</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-1a49379410e1804da683f0716693d4ed" style="list-style-type:decimal"><li><b>Industrial-scale serving</b>: supports large-scale concurrent serving (for example, handling thousands of conversation requests of different lengths at once).</li></ol><div class="notion-blank notion-block-1a49379410e180538851df46656b84bb"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Age After Age We See Each Other Off; Not Even Sages Can Escape]]></title>
            <link>https://tangly1024.com/essay/rip_sorry6000</link>
            <guid>https://tangly1024.com/essay/rip_sorry6000</guid>
            <pubDate>Wed, 04 Dec 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-1529379410e180d2a510c139b9f3a41e"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><details class="notion-toggle notion-block-1529379410e1818bb091e56435f7f3f4"><summary><em><b>Vast and endless, yin and yang move on; a lifetime is like the morning dew.</b></em></summary><div></div></details><details class="notion-toggle notion-block-1529379410e1802cab8ec30483792f9e"><summary><em><b>Human life is but a brief sojourn; it lacks the permanence of metal and stone.</b></em></summary><div></div></details><details class="notion-toggle notion-block-1529379410e1805f8b14c50b0c9b8753"><summary><em><b>Age after age, each generation sees the last one off; not even sages and worthies can escape.</b></em></summary><div></div></details><hr class="notion-hr notion-block-1529379410e180088872f202562cf1c4"/><div class="notion-blank notion-block-1529379410e180f6885dfdce58e85b85"> </div><div class="notion-blank notion-block-1529379410e18056bcffcbd0c23ba58b"> </div></main></div>]]></content:encoded>
        </item>
    </channel>
</rss>