This article briefly walks through how PostgreSQL executes a manual vacuum, following the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap. The focus is lazy_scan_heap, which scans an open heap relation and prunes every page in the heap.
Macro definitions
Vacuum and Analyze statement options
/* ----------------------
* Vacuum and Analyze Statements
*
* Even though these are nominally two statements, it's convenient to use
* just one node type for both. Note that at least one of VACOPT_VACUUM
* and VACOPT_ANALYZE must be set in options.
* ----------------------
*/
typedef enum VacuumOption
{
VACOPT_VACUUM = 1 << 0, /* do VACUUM */
VACOPT_ANALYZE = 1 << 1, /* do ANALYZE */
VACOPT_VERBOSE = 1 << 2, /* print progress info */
VACOPT_FREEZE = 1 << 3, /* FREEZE option */
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
} VacuumOption;
xl_heap_freeze_tuple
This struct represents a 'freeze plan': the information needed to freeze a single tuple during vacuum.
/*
* This struct represents a 'freeze plan', which is what we need to know about
* a single tuple being frozen during vacuum.
*/
/* 0x01 was XLH_FREEZE_XMIN */
#define XLH_FREEZE_XVAC 0x02
#define XLH_INVALID_XVAC 0x04
typedef struct xl_heap_freeze_tuple
{
TransactionId xmax;
OffsetNumber offset;
uint16 t_infomask2;
uint16 t_infomask;
uint8 frzflags;
} xl_heap_freeze_tuple;
lazy_scan_heap scans an open heap relation and prunes every page in the heap. Its work includes:
1. Truncating DEAD tuples to DEAD line pointers
2. Defragmenting pages
3. Setting commit status bits (see heap_page_prune)
4. Building lists of dead tuples and of pages with free space
5. Calculating statistics on the number of live tuples in the heap, and marking pages as all-visible where appropriate
6. Invoking index vacuuming and calling lazy_vacuum_heap to reclaim dead line pointers
The processing flow is as follows:
1. Initialize local variables
2. Get the total number of blocks (nblocks)
3. Initialize statistics and the related arrays (vacrelstats/frozen)
4. Compute the next block that cannot be skipped (next_unskippable_block)
5. Loop over every block
5.1 If we have reached next_unskippable_block, compute the next unskippable block;
otherwise, if skipping_blocks is true and a page check is not being forced, skip to the next block
5.2 If we are close to overrunning the available space for dead-tuple TIDs, do a round of vacuuming before tackling this page
5.2.1 Loop over the index relations, calling lazy_vacuum_index on each
5.2.2 Call lazy_vacuum_heap to remove tuples from the heap relation
5.2.3 Reset the vacrelstats->num_dead_tuples counter to 0
5.2.4 Vacuum the FSM so the newly-freed space is visible on upper-level FSM pages
5.3 Read the buffer with ReadBufferExtended
5.4 If the buffer cleanup lock cannot be acquired:
A. if aggressive is false and a page check is not forced, move on to the next block;
B. if aggressive is true or a page check is forced, take a share lock; if no tuples need freezing, update the scan statistics and skip the block;
C. if aggressive is false (we are here only because of a forced page check), count the page as pin-skipped without advancing scanned_pages, and skip the block;
D. otherwise call LockBufferForCleanup on the buffer and fall through to normal processing
5.5 If the page is new, handle it (re-initialize it or mark the buffer dirty) and continue with the next block
5.6 If the page is empty, handle it (set the all-visible flags, etc.) and continue with the next block
5.7 Call heap_page_prune to prune all HOT-update chains on the page
5.8 Loop over the line pointers on the page
5.8.1 If the line pointer is unused, continue with the next tuple
5.8.2 If the line pointer is a redirect, continue with the next tuple
5.8.3 If the line pointer is dead, call lazy_record_dead_tuple to record the tuple for deletion, clear all_visible, and continue with the next tuple
5.8.4 Initialize the tuple variable
5.8.5 Call HeapTupleSatisfiesVacuum to determine the tuple's status and set the relevant flags accordingly
5.8.6 If tupgone is true, record the tuple for deletion; otherwise call heap_prepare_freeze_tuple to decide whether the tuple needs freezing, and record its offset if so
5.9 If the freeze count is > 0, walk the line pointers to be frozen and execute the freezes; write a WAL record if logging is required
5.10 If there are no indexes, vacuum the page right away instead of doing a second scan
5.11 Synchronize the visibility map using the all_visible and all_visible_according_to_vm flags
After the loop:
6. Free the frozen array
7. Update statistics
8. Vacuum the last batch of dead tuples
9. Vacuum the FSM
10. Do the post-vacuum cleanup, updating statistics for each index
11. Write the server log entry
/*
* lazy_scan_heap() -- scan an open heap relation
*
* This routine prunes each page in the heap, which will among other
* things truncate dead tuples to dead line pointers, defragment the
* page, and set commit status bits (see heap_page_prune). It also builds
* lists of dead tuples and pages with free space, calculates statistics
* on the number of live tuples in the heap, and marks pages as
* all-visible if appropriate. When done, or when we run low on space for
* dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
* to reclaim dead line pointers.
* If there are no indexes then we can reclaim line pointers on the fly;
* dead line pointers need only be retained until all index pointers that
* reference them have been killed.
*/
static void
lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
Relation *Irel, int nindexes, bool aggressive)
{
BlockNumber nblocks,//total number of blocks
blkno;//current block number
HeapTupleData tuple;//tuple
char *relname;//relation name
TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;//relation's frozen XID
TransactionId relminmxid = onerel->rd_rel->relminmxid;//relation's minimum multixact ID
BlockNumber empty_pages,//number of empty pages
vacuumed_pages,//number of pages vacuumed
next_fsm_block_to_vacuum;//next block to vacuum in the FSM
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
nkeep, /* dead-but-not-removable tuples */
nunused; /* unused item pointers */
IndexBulkDeleteResult **indstats;
int i;//loop variable
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;//visibility map buffer
BlockNumber next_unskippable_block;//block number
bool skipping_blocks;//are we skipping blocks?
xl_heap_freeze_tuple *frozen;//array of freeze plans
StringInfoData buf;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64 initprog_val[3];
//initialize the PGRUsage variable
pg_rusage_init(&ru0);
//get the relation name
relname = RelationGetRelationName(onerel);
//log the operation
if (aggressive)
ereport(elevel,
(errmsg("aggressively vacuuming \"%s.%s\"",
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
ereport(elevel,
(errmsg("vacuuming \"%s.%s\"",
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
//initialize counters
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
//get the total number of blocks in the relation
nblocks = RelationGetNumberOfBlocks(onerel);
//initialize statistics
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
vacrelstats->nonempty_pages = 0;
vacrelstats->latestRemovedXid = InvalidTransactionId;
//allocate space to track dead tuples, sized by the number of blocks
lazy_space_alloc(vacrelstats, nblocks);
//allocate memory for the frozen array
frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
initprog_val[1] = nblocks;//total number of blocks
initprog_val[2] = vacrelstats->max_dead_tuples;//maximum number of dead tuples
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
/*
* Except when aggressive is set, we want to skip pages that are
* all-visible according to the visibility map, but only when we can skip
* at least SKIP_PAGES_THRESHOLD consecutive pages. Since we're reading
* sequentially, the OS should be doing readahead for us, so there's no
* gain in skipping a page now and then; that's likely to disable
* readahead and so be counterproductive. Also, skipping even a single
* page means that we can't update relfrozenxid, so we only want to do it
* if we can skip a goodly number of pages.
*
* When aggressive is set, we can't skip pages just because they are
* all-visible, but we can still skip pages that are all-frozen, since
* such pages do not need freezing and do not affect the value that we can
* safely set for relfrozenxid or relminmxid.
*
* Before entering the main loop, establish the invariant that
* next_unskippable_block is the next block number >= blkno that we can't
* skip based on the visibility map, either all-visible for a regular scan
* or all-frozen for an aggressive scan. We set it to nblocks if there's
* no such block. We also set up the skipping_blocks flag correctly at
* this stage.
*
* Note: The value returned by visibilitymap_get_status could be slightly
* out-of-date, since we make this test before reading the corresponding
* heap page or locking the buffer. This is OK. If we mistakenly think
* that the page is all-visible or all-frozen when in fact the flag's just
* been cleared, we might fail to vacuum the page. It's easy to see that
* skipping a page when aggressive is not set is not a very big deal; we
* might leave some dead tuples lying around, but the next vacuum will
* find them. But even when aggressive *is* set, it's still OK if we miss
* a page whose all-frozen marking has just been cleared. Any new XIDs
* just added to that page are necessarily newer than the GlobalXmin we
* computed, so they'll have no effect on the value to which we can safely
* set relfrozenxid. A similar argument applies for MXIDs and relminmxid.
*
* We will scan the table's last page, at least to the extent of
* determining whether it has tuples or not, even if it should be skipped
* according to the above rules; except when we've already determined that
* it's not worth trying to truncate the table. This avoids having
* lazy_truncate_heap() take access-exclusive lock on the table to attempt
* a truncation that just fails immediately because there are tuples in
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
//first block that cannot be skipped
next_unskippable_block = 0;
if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
//page skipping has not been disabled
while (next_unskippable_block < nblocks)
{
uint8 vmstatus;//visibility map status
vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
&vmbuffer);
if (aggressive)
{
if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
break;//stop at the first block that is not all-frozen
}
else
{
if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
break;//for a regular scan, stop at the first block that is not all-visible
}
vacuum_delay_point();
next_unskippable_block++;
}
}
if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
skipping_blocks = true;//the skippable run is long enough
else
skipping_blocks = false;//too short to be worth skipping
for (blkno = 0; blkno < nblocks; blkno++)
{
//process each block
Buffer buf;//buffer
Page page;//page
OffsetNumber offnum,//offset
maxoff;
bool tupgone,
hastup;
int prev_dead_count;//dead-tuple count before scanning this page
int nfrozen;//number of tuples to freeze
Size freespace;//free space
bool all_visible_according_to_vm = false;//all-visible according to the VM?
bool all_visible;//all-visible?
bool all_frozen = true; /* provided all_visible is also true */
bool has_dead_tuples;//any dead tuples?
TransactionId visibility_cutoff_xid = InvalidTransactionId;//newest xmin on the page
/* see note above about forcing scanning of last page */
//true when this is the last block and truncation may be attempted
#define FORCE_CHECK_PAGE() \
(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
//update the progress counter
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
if (blkno == next_unskippable_block)
{
//we have reached next_unskippable_block
/* Time to advance next_unskippable_block */
next_unskippable_block++;
//find the next block that cannot be skipped
if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
{
uint8 vmskipflags;
vmskipflags = visibilitymap_get_status(onerel,
next_unskippable_block,
&vmbuffer);
if (aggressive)
{
if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
break;
}
else
{
if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
break;
}
vacuum_delay_point();
next_unskippable_block++;
}
}
/*
* We know we can't skip the current block. But set up
* skipping_blocks to do the right thing at the following blocks.
*/
if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
skipping_blocks = true;
else
skipping_blocks = false;
/*
* Normally, the fact that we can't skip this block must mean that
* it's not all-visible. But in an aggressive vacuum we know only
* that it's not all-frozen, so it might still be all-visible.
*/
if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
all_visible_according_to_vm = true;
}
else
{
//we have not yet reached next_unskippable_block
/*
* The current block is potentially skippable; if we've seen a
* long enough run of skippable blocks to justify skipping it, and
* we're not forced to check it, then go ahead and skip.
* Otherwise, the page must be at least all-visible if not
* all-frozen, so we can set all_visible_according_to_vm = true.
*/
if (skipping_blocks && !FORCE_CHECK_PAGE())
{
/*
* Tricky, tricky. If this is in aggressive vacuum, the page
* must have been all-frozen at the time we checked whether it
* was skippable, but it might not be any more. We must be
* careful to count it as a skipped all-frozen page in that
* case, or else we'll think we can't update relfrozenxid and
* relminmxid. If it's not an aggressive vacuum, we don't
* know whether it was all-frozen, so we have to recheck; but
* in this case an approximate answer is OK.
*/
if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
vacrelstats->frozenskipped_pages++;//count it as a skipped all-frozen page
continue;//on to the next block
}
all_visible_according_to_vm = true;
}
vacuum_delay_point();
/*
* If we are close to overrunning the available space for dead-tuple
* TIDs, pause and do a cycle of vacuuming before we tackle this page.
*/
if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
vacrelstats->num_dead_tuples > 0)
{
//there are dead tuples, and the remaining dead-tuple space is nearly full:
//num_dead_tuples + MaxHeapTuplesPerPage > max_dead_tuples
const int hvp_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_NUM_INDEX_VACUUMS
};
int64 hvp_val[2];
/*
* Before beginning index vacuuming, we release any pin we may
* hold on the visibility map page. This isn't necessary for
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);
/* Report that we are now vacuuming indexes */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
/* Remove index entries */
//loop over the index relations and vacuum each one:
//remove index entries pointing at vacrelstats->dead_tuples and update run-time statistics
for (i = 0; i < nindexes; i++)
lazy_vacuum_index(Irel[i],
&indstats[i],
vacrelstats);
/*
* Report that we are now vacuuming the heap. We also increase
* the number of index scans here; note that by using
* pgstat_progress_update_multi_param we can update both
* parameters atomically.
*/
hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
hvp_val[1] = vacrelstats->num_index_scans + 1;
pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
/*
* Forget the now-vacuumed tuples, and press on, but be careful
* not to reset latestRemovedXid since we want that value to be
* valid.
*/
vacrelstats->num_dead_tuples = 0;//reset the counter
vacrelstats->num_index_scans++;//one more index scan
/*
* Vacuum the Free Space Map to make newly-freed space visible on
* upper-level FSM pages. Note we have not yet processed blkno.
*/
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
}
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway. However, it's
* possible that (a) next_unskippable_block is covered by a different
* VM page than the current block or (b) we released our pin and did a
* cycle of index vacuuming.
*/
visibilitymap_pin(onerel, blkno, &vmbuffer);
//read the buffer via ReadBufferExtended, using the vacuum access strategy
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
/* We need buffer cleanup lock so that we can prune HOT chains. */
//ConditionalLockBufferForCleanup is like LockBufferForCleanup, but does not wait for the lock
if (!ConditionalLockBufferForCleanup(buf))
{
//----------- could not get the cleanup lock
/*
* If we're not performing an aggressive scan to guard against XID
* wraparound, and we don't want to forcibly check the page, then
* it's OK to skip vacuuming pages we get a lock conflict on. They
* will be dealt with in some future vacuum.
*/
if (!aggressive && !FORCE_CHECK_PAGE())
{
//not an aggressive scan and no forced page check:
//release the buffer and count the page as pin-skipped
ReleaseBuffer(buf);
vacrelstats->pinskipped_pages++;
continue;
}
/*
* Read the page with share lock to see if any xids on it need to
* be frozen. If not we just skip the page, after updating our
* scan statistics. If there are some, we wait for cleanup lock.
*
* We could defer the lock request further by remembering the page
* and coming back to it later, or we could even register
* ourselves for multiple buffers and then service whichever one
* is received first. For now, this seems good enough.
*
* If we get here with aggressive false, then we're just forcibly
* checking the page, and so we don't want to insist on getting
* the lock; we only need to know if the page contains tuples, so
* that we can update nonempty_pages correctly. It's convenient
* to use lazy_check_needs_freeze() for both situations, though.
*/
//lock the buffer in share mode
LockBuffer(buf, BUFFER_LOCK_SHARE);
//lazy_check_needs_freeze scans the page for tuples that must be frozen to avoid XID wraparound
if (!lazy_check_needs_freeze(buf, &hastup))
{
//no tuples need freezing
UnlockReleaseBuffer(buf);
vacrelstats->scanned_pages++;
vacrelstats->pinskipped_pages++;
if (hastup)
vacrelstats->nonempty_pages = blkno + 1;
//skip this block
continue;
}
if (!aggressive)
{
/*
* Here, we must not advance scanned_pages; that would amount
* to claiming that the page contains no freezable tuples.
*/
UnlockReleaseBuffer(buf);
vacrelstats->pinskipped_pages++;
if (hastup)
vacrelstats->nonempty_pages = blkno + 1;
continue;
}
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockBufferForCleanup(buf);
/* drop through to normal processing */
}
//update statistics
vacrelstats->scanned_pages++;
vacrelstats->tupcount_pages++;
//get the page
page = BufferGetPage(buf);
if (PageIsNew(page))
{
//-------------- a newly-initialized (all-zeroes) page
/*
* An all-zeroes page could be left over if a backend extends the
* relation but crashes before initializing the page. Reclaim such
* pages for use.
*
* We have to be careful here because we could be looking at a
* page that someone has just added to the relation and not yet
* been able to initialize (see RelationGetBufferForTuple). To
* protect against that, release the buffer lock, grab the
* relation extension lock momentarily, and re-lock the buffer. If
* the page is still uninitialized by then, it must be left over
* from a crashed backend, and we can initialize it.
*
* We don't really need the relation lock when this is a new or
* temp relation, but it's probably not worth the code space to
* check that, since this surely isn't a critical path.
*
* Note: the comparable code in vacuum.c need not worry because
* it's got exclusive lock on the whole relation.
*/
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
//momentarily grab the relation extension lock
LockRelationForExtension(onerel, ExclusiveLock);
//and release it again
UnlockRelationForExtension(onerel, ExclusiveLock);
//re-acquire the cleanup lock on the buffer
LockBufferForCleanup(buf);
//check again whether the page is still new
if (PageIsNew(page))
{
//still new: it must be left over from a crashed backend, so initialize it
ereport(WARNING,
(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
relname, blkno)));
PageInit(page, BufferGetPageSize(buf), 0);
empty_pages++;
}
//get the free space
freespace = PageGetHeapFreeSpace(page);
//mark the buffer dirty
MarkBufferDirty(buf);
UnlockReleaseBuffer(buf);
//record the page's free space in the FSM
RecordPageWithFreeSpace(onerel, blkno, freespace);
//next block
continue;
}
if (PageIsEmpty(page))
{
//----------------- an empty page
empty_pages++;
freespace = PageGetHeapFreeSpace(page);
/* empty pages are always all-visible and all-frozen */
if (!PageIsAllVisible(page))
{
//the page is not marked all-visible: fix that
START_CRIT_SECTION();
/* mark buffer dirty before writing a WAL record */
MarkBufferDirty(buf);
/*
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page
* due to an ERROR. Since heap extension is not WAL-logged,
* recovery might try to replay our record setting the page
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the
* page has been previously WAL-logged, and if not, do that
* now.
*/
if (RelationNeedsWAL(onerel) &&
PageGetLSN(page) == InvalidXLogRecPtr)
//the page needs WAL but has no LSN yet: log it now
log_newpage_buffer(buf, true);
//set the page's all-visible flag
PageSetAllVisible(page);
//set the visibility map bits
visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(onerel, blkno, freespace);
//next block
continue;
}
/*
* Prune all HOT-update chains in this page.
*
* We count tuples removed by the pruning step as removed by VACUUM.
*/
tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
&vacrelstats->latestRemovedXid);
/*
* Now scan the page to collect vacuumable items and check for tuples
* requiring freezing.
*/
all_visible = true;
has_dead_tuples = false;
nfrozen = 0;
hastup = false;
prev_dead_count = vacrelstats->num_dead_tuples;
maxoff = PageGetMaxOffsetNumber(page);//get the highest line pointer offset
/*
* Note: If you change anything in the loop below, also look at
* heap_page_is_all_visible to see if that needs to be changed.
*/
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
ItemId itemid;
itemid = PageGetItemId(page, offnum);
/* Unused items require no processing, but we count 'em */
if (!ItemIdIsUsed(itemid))
{
//unused: just count it and move on
nunused += 1;
continue;
}
/* Redirect items mustn't be touched */
if (ItemIdIsRedirected(itemid))
{
hastup = true; /* this page won't be truncatable */
continue;
}
//set the tuple's TID
ItemPointerSet(&(tuple.t_self), blkno, offnum);
/*
* DEAD item pointers are to be vacuumed normally; but we don't
* count them in tups_vacuumed, else we'd be double-counting (at
* least in the common case where heap_page_prune() just freed up
* a non-HOT tuple).
*/
if (ItemIdIsDead(itemid))
{
//record the dead tuple: appends to vacrelstats->dead_tuples
//and increments vacrelstats->num_dead_tuples
lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
all_visible = false;
continue;
}
Assert(ItemIdIsNormal(itemid));
//fetch the tuple data
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(onerel);
tupgone = false;
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise
* VACUUM and ANALYZE may produce wildly different reltuples
* values, e.g. when there are many recently-dead tuples.
*
* The logic here is a bit simpler than acquire_sample_rows(), as
* VACUUM can't run inside a transaction block, which makes some
* cases impossible (e.g. in-progress insert from the same
* transaction).
*/
//determine the tuple's status for VACUUM: essentially, whether the
//tuple could still be visible to any running transaction
switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_DEAD:
/*
* Ordinarily, DEAD tuples would have been removed by
* heap_page_prune(), but it's possible that the tuple
* state changed since heap_page_prune() looked. In
* particular an INSERT_IN_PROGRESS tuple could have
* changed to DEAD if the inserter aborted. So this
* cannot be considered an error condition.
*
* If the tuple is HOT-updated then it must only be
* removed by a prune operation; so we keep it just as if
* it were RECENTLY_DEAD. Also, if it's a heap-only
* tuple, we choose to keep it, because it'll be a lot
* cheaper to get rid of it in the next pruning pass than
* to treat it like an indexed tuple.
*
* If this were to happen for a tuple that actually needed
* to be deleted, we'd be in trouble, because it'd
* possibly leave a tuple below the relation's xmin
* horizon alive. heap_prepare_freeze_tuple() is prepared
* to detect that case and abort the transaction,
* preventing corruption.
*/
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
nkeep += 1;
else
tupgone = true; /* we can delete the tuple */
//a dead tuple exists: clear all_visible
all_visible = false;
break;
case HEAPTUPLE_LIVE:
/*
* Count it as live. Not only is this natural, but it's
* also what acquire_sample_rows() does.
*/
live_tuples += 1;
/*
* Is the tuple definitely visible to all transactions?
*
* NB: Like with per-tuple hint bits, we can't set the
* PD_ALL_VISIBLE flag if the inserter committed
* asynchronously. See SetHintBits for more info. Check
* that the tuple is hinted xmin-committed because of
* that.
*/
if (all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
//xmin not committed: clear all_visible
all_visible = false;
break;
}
/*
* The inserter definitely committed. But is it old
* enough that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin, OldestXmin))
{
//xmin is not old enough for everyone to see it: clear all_visible
all_visible = false;
break;
}
/* Track newest xmin on page. */
//TransactionIdFollows(a, b) is true when a is newer than b
if (TransactionIdFollows(xmin, visibility_cutoff_xid))
visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
/*
* If tuple is recently deleted then we must not remove it
* from relation.
*/
nkeep += 1;
all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
* This is an expected case during concurrent vacuum.
*
* We do not count these rows as live, because we expect
* the inserting transaction to update the counters at
* commit, and we assume that will happen only after we
* report our results. This assumption is a bit shaky,
* but it is what acquire_sample_rows() does, so be
* consistent.
*/
all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
all_visible = false;
/*
* Count such rows as live. As above, we assume the
* deleting transaction will commit and update the
* counters after we report.
*/
live_tuples += 1;
break;
default:
//no other states are possible
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
if (tupgone)
{
//record the dead tuple: appends to vacrelstats->dead_tuples
//and increments vacrelstats->num_dead_tuples
lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
&vacrelstats->latestRemovedXid);
tups_vacuumed += 1;
has_dead_tuples = true;
}
else
{
bool tuple_totally_frozen;//is the tuple fully frozen?
num_tuples += 1;
hastup = true;
/*
* Each non-removable tuple must be checked to see if it needs
* freezing. Note we already have exclusive buffer lock.
*/
if (heap_prepare_freeze_tuple(tuple.t_data,
relfrozenxid, relminmxid,
FreezeLimit, MultiXactCutoff,
&frozen[nfrozen],
&tuple_totally_frozen))
frozen[nfrozen++].offset = offnum;
if (!tuple_totally_frozen)
all_frozen = false;
}
} /* scan along page */
/*
* If we froze any tuples, mark the buffer dirty, and write a WAL
* record recording the changes. We must log the changes to be
* crash-safe against future truncation of CLOG.
*/
if (nfrozen > 0)
{
//some tuples need freezing: apply the freeze plans
START_CRIT_SECTION();
//mark the buffer dirty
MarkBufferDirty(buf);
/* execute collected freezes */
//apply each collected freeze
for (i = 0; i < nfrozen; i++)
{
ItemId itemid;
HeapTupleHeader htup;
itemid = PageGetItemId(page, frozen[i].offset);
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, &frozen[i]);
}
/* Now WAL-log freezing if necessary */
if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
recptr = log_heap_freeze(onerel, buf, FreezeLimit,
frozen, nfrozen);
PageSetLSN(page, recptr);
}
END_CRIT_SECTION();
}
/*
* If there are no indexes then we can vacuum the page right now
* instead of doing a second scan.
*/
if (nindexes == 0 &&
vacrelstats->num_dead_tuples > 0)
{
//------------- no indexes and dead tuples present: vacuum the page now
/* Remove tuples from heap */
lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
has_dead_tuples = false;
/*
* Forget the now-vacuumed tuples, and press on, but be careful
* not to reset latestRemovedXid since we want that value to be
* valid.
*/
vacrelstats->num_dead_tuples = 0;//reset the counter
vacuumed_pages++;//one more page vacuumed
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
* the current block, we haven't yet updated its FSM entry (that
* happens further down), so passing end == blkno is correct.
*/
if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
//vacuum the FSM over this block range
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
blkno);
next_fsm_block_to_vacuum = blkno;
}
}
//get the page's free space
freespace = PageGetHeapFreeSpace(page);
//the if/else chain below keeps the page-level bits and the VM in sync
/* mark page all-visible, if appropriate */
if (all_visible && !all_visible_according_to_vm)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
flags |= VISIBILITYMAP_ALL_FROZEN;
/*
* It should never be the case that the visibility map page is set
* while the page-level bit is clear, but the reverse is allowed
* (if checksums are not enabled). Regardless, set the both bits
* so that we get back in sync.
*
* NB: If the heap page is all-visible but the VM bit is not set,
* we don't need to dirty the heap page. However, if checksums
* are enabled, we do need to make sure that the heap page is
* dirtied before passing it to visibilitymap_set(), because it
* may be logged. Given that this situation should only happen in
* rare cases after a crash, it is not worth optimizing.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid, flags);
}
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-level bit is clear. However, it's possible that the bit
* got cleared after we checked it and before we took the buffer
* content lock, so we must recheck before jumping to the conclusion
* that something bad has happened.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
&& VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
relname, blkno);
visibilitymap_clear(onerel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
/*
* It's possible for the value returned by GetOldestXmin() to move
* backwards, so it's not wrong for us to see tuples that appear to
* not be visible to everyone yet, while PD_ALL_VISIBLE is already
* set. The real safe xmin value never moves backwards, but
* GetOldestXmin() is conservative and sometimes returns a value
* that's unnecessarily small, so if we see that contradiction it just
* means that the tuples that we think are not visible to everyone yet
* actually are, and the PD_ALL_VISIBLE flag is correct.
*
* There should never be dead tuples on a page with PD_ALL_VISIBLE
* set, however.
*/
else if (PageIsAllVisible(page) && has_dead_tuples)
{
elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_clear(onerel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
/*
* If the all-visible page is turned out to be all-frozen but not
* marked, we should so mark it. Note that all_frozen is only valid
* if all_visible is true, so we must check both.
*/
else if (all_visible_according_to_vm && all_visible && all_frozen &&
!VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
{
/*
* We can pass InvalidTransactionId as the cutoff XID here,
* because setting the all-frozen bit doesn't cause recovery
* conflicts.
*/
visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_FROZEN);
}
UnlockReleaseBuffer(buf);
/* Remember the location of the last page with nonremovable tuples */
if (hastup)
vacrelstats->nonempty_pages = blkno + 1;
/*
* If we remembered any tuples for deletion, then the page will be
* visited again by lazy_vacuum_heap, which will compute and record
* its post-compaction free space. If not, then we're done with this
* page, so remember its free space as-is. (This path will always be
* taken if there are no indexes.)
*/
if (vacrelstats->num_dead_tuples == prev_dead_count)
RecordPageWithFreeSpace(onerel, blkno, freespace);
} //end of the loop over blocks
/* report that everything is scanned and vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
pfree(frozen);
/* save stats for use later */
vacrelstats->tuples_deleted = tups_vacuumed;
vacrelstats->new_dead_tuples = nkeep;
/* now we can compute the new value for pg_class.reltuples */
vacrelstats->new_live_tuples = vac_estimate_reltuples(onerel,
nblocks,
vacrelstats->tupcount_pages,
live_tuples);
/* also compute total number of surviving heap entries */
vacrelstats->new_rel_tuples =
vacrelstats->new_live_tuples + vacrelstats->new_dead_tuples;
/*
* Release any remaining pin on visibility map page.
*/
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
if (vacrelstats->num_dead_tuples > 0)
{
const int hvp_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_NUM_INDEX_VACUUMS
};
int64 hvp_val[2];
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);
/* Report that we are now vacuuming indexes */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
/* Remove index entries */
for (i = 0; i < nindexes; i++)
lazy_vacuum_index(Irel[i],
&indstats[i],
vacrelstats);
/* Report that we are now vacuuming the heap */
hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
hvp_val[1] = vacrelstats->num_index_scans + 1;
pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
/* Remove tuples from heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
lazy_vacuum_heap(onerel, vacrelstats);
vacrelstats->num_index_scans++;
}
/*
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes.
*/
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
/* Do post-vacuum cleanup and statistics update for each index */
for (i = 0; i < nindexes; i++)
lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
/* If no indexes, make log report that lazy_vacuum_heap would've made */
if (vacuumed_pages)
ereport(elevel,
(errmsg("\"%s\": removed %.0f row versions in %u pages",
RelationGetRelationName(onerel),
tups_vacuumed, vacuumed_pages)));
/*
* This is pretty messy, but we split it up so that we can skip emitting
* individual parts of the message when not applicable.
*/
initStringInfo(&buf);
appendStringInfo(&buf,
_("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
nkeep, OldestXmin);
appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
nunused);
appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
"Skipped %u pages due to buffer pins, ",
vacrelstats->pinskipped_pages),
vacrelstats->pinskipped_pages);
appendStringInfo(&buf, ngettext("%u frozen page.\n",
"%u frozen pages.\n",
vacrelstats->frozenskipped_pages),
vacrelstats->frozenskipped_pages);
appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
RelationGetRelationName(onerel),
tups_vacuumed, num_tuples,
vacrelstats->scanned_pages, nblocks),
errdetail_internal("%s", buf.data)));
pfree(buf.data);
}
Test script: run vacuum while a pgbench stress test is in progress
-- session 1
pgbench -c 2 -C -f ./update.sql -j 1 -n -T 600 -U xdb testdb
-- session 2
17:52:59 (xdb@[local]:5432)testdb=# vacuum verbose t1;
Start gdb and set a breakpoint
(gdb) b lazy_scan_heap
Breakpoint 1 at 0x6bc38a: file vacuumlazy.c, line 470.
(gdb) c
Continuing.
Breakpoint 1, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1,
aggressive=false) at vacuumlazy.c:470
470 TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
(gdb)
Input parameters
1-relation
(gdb) p *onerel
$1 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50820}, rd_smgr = 0x2930270, rd_refcnt = 1, rd_backend = -1,
rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 1 '\001', rd_statvalid = false,
rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7f224a197bb8, rd_att = 0x7f224a0d8050, rd_id = 50820,
rd_lockInfo = {lockRelId = {relId = 50820, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0,
rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0,
rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x7f224a198fe8, rd_oidindex = 0, rd_pkindex = 0,
rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0,
rd_idattr = 0x0, rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0, rd_indextuple = 0x0,
rd_amhandler = 0, rd_indexcxt = 0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0,
rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0,
rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0,
pgstat_info = 0x2923e50}
(gdb)
2-options=5, i.e. VACOPT_VACUUM | VACOPT_VERBOSE
3-vacrelstats
(gdb) p *vacrelstats
$2 = {hasindex = true, old_rel_pages = 75, rel_pages = 0, scanned_pages = 0, pinskipped_pages = 0, frozenskipped_pages = 0,
tupcount_pages = 0, old_live_tuples = 10000, new_rel_tuples = 0, new_live_tuples = 0, new_dead_tuples = 0,
pages_removed = 0, tuples_deleted = 0, nonempty_pages = 0, num_dead_tuples = 0, max_dead_tuples = 0, dead_tuples = 0x0,
num_index_scans = 0, latestRemovedXid = 0, lock_waiter_detected = false}
(gdb)
4-Irel
(gdb) p *Irel
$3 = (Relation) 0x7f224a198688
(gdb) p **Irel
$4 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50823}, rd_smgr = 0x29302e0, rd_refcnt = 1, rd_backend = -1,
rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 0 '\000', rd_statvalid = false,
rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7f224a1988a0, rd_att = 0x7f224a1989b8, rd_id = 50823,
rd_lockInfo = {lockRelId = {relId = 50823, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0,
rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0,
rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x0, rd_oidindex = 0, rd_pkindex = 0, rd_replidindex = 0,
rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0,
rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x7f224a198d58, rd_indextuple = 0x7f224a198d20,
rd_amhandler = 330, rd_indexcxt = 0x28cb340, rd_amroutine = 0x28cb480, rd_opfamily = 0x28cb598, rd_opcintype = 0x28cb5b8,
rd_support = 0x28cb5d8, rd_supportinfo = 0x28cb600, rd_indoption = 0x28cb738, rd_indexprs = 0x0, rd_indpred = 0x0,
rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x28cb718,
rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x2923ec8}
(gdb)
5-nindexes=1, there is one index
6-aggressive=false, no need to scan every page
Next, initialize the relevant variables
(gdb) n
471 TransactionId relminmxid = onerel->rd_rel->relminmxid;
(gdb)
483 Buffer vmbuffer = InvalidBuffer;
(gdb)
488 const int initprog_index[] = {
(gdb)
495 pg_rusage_init(&ru0);
(gdb)
497 relname = RelationGetRelationName(onerel);
(gdb)
498 if (aggressive)
(gdb)
504 ereport(elevel,
(gdb)
509 empty_pages = vacuumed_pages = 0;
(gdb)
510 next_fsm_block_to_vacuum = (BlockNumber) 0;
(gdb)
511 num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
(gdb)
514 palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
(gdb)
513 indstats = (IndexBulkDeleteResult **)
(gdb)
516 nblocks = RelationGetNumberOfBlocks(onerel);
(gdb) p relminmxid
$5 = 1
(gdb) p ru0
$6 = {tv = {tv_sec = 1548669429, tv_usec = 578779}, ru = {ru_utime = {tv_sec = 0, tv_usec = 29531}, ru_stime = {tv_sec = 0,
tv_usec = 51407}, {ru_maxrss = 7488, __ru_maxrss_word = 7488}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0,
__ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1819, __ru_minflt_word = 1819}, {
ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 2664,
__ru_inblock_word = 2664}, {ru_oublock = 328, __ru_oublock_word = 328}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {
ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 70,
__ru_nvcsw_word = 70}, {ru_nivcsw = 3, __ru_nivcsw_word = 3}}}
(gdb) p relname
$7 = 0x7f224a197bb8 "t1"
(gdb)
Get the total number of blocks
(gdb) n
517 vacrelstats->rel_pages = nblocks;
(gdb) p nblocks
$8 = 75
(gdb)
Initialize statistics and the related arrays
(gdb) n
518 vacrelstats->scanned_pages = 0;
(gdb)
519 vacrelstats->tupcount_pages = 0;
(gdb)
520 vacrelstats->nonempty_pages = 0;
(gdb)
521 vacrelstats->latestRemovedXid = InvalidTransactionId;
(gdb)
523 lazy_space_alloc(vacrelstats, nblocks);
(gdb)
524 frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
(gdb)
527 initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
(gdb)
528 initprog_val[1] = nblocks;
(gdb)
529 initprog_val[2] = vacrelstats->max_dead_tuples;
(gdb)
530 pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
(gdb) p *vacrelstats
$9 = {hasindex = true, old_rel_pages = 75, rel_pages = 75, scanned_pages = 0, pinskipped_pages = 0,
frozenskipped_pages = 0, tupcount_pages = 0, old_live_tuples = 10000, new_rel_tuples = 0, new_live_tuples = 0,
new_dead_tuples = 0, pages_removed = 0, tuples_deleted = 0, nonempty_pages = 0, num_dead_tuples = 0,
max_dead_tuples = 21825, dead_tuples = 0x297e820, num_index_scans = 0, latestRemovedXid = 0, lock_waiter_detected = false}
(gdb)
Compute the next unskippable block. Block 0 cannot be skipped either (0 < 32), so skipping_blocks is set to false.
(gdb) n
576 next_unskippable_block = 0;
(gdb)
577 if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
(gdb)
579 while (next_unskippable_block < nblocks)
(gdb)
583 vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
(gdb)
585 if (aggressive)
(gdb) p vmstatus
$10 = 0 '\000'
(gdb) n
592 if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
(gdb)
593 break;
(gdb)
600 if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
(gdb) p next_unskippable_block
$11 = 0
(gdb) p SKIP_PAGES_THRESHOLD
$12 = 32
(gdb) n
603 skipping_blocks = false;
(gdb)
Start iterating over each block
Initialize the per-block variables
(gdb)
605 for (blkno = 0; blkno < nblocks; blkno++)
(gdb)
616 bool all_visible_according_to_vm = false;
(gdb)
618 bool all_frozen = true; /* provided all_visible is also true */
(gdb)
620 TransactionId visibility_cutoff_xid = InvalidTransactionId;
(gdb)
626 pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
(gdb)
628 if (blkno == next_unskippable_block)
(gdb)
blkno == next_unskippable_block, so compute the next unskippable block
(gdb) p blkno
$13 = 0
(gdb) p next_unskippable_block
$14 = 0
(gdb) n
631 next_unskippable_block++;
(gdb)
632 if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
(gdb)
634 while (next_unskippable_block < nblocks)
(gdb)
638 vmskipflags = visibilitymap_get_status(onerel,
(gdb)
641 if (aggressive)
(gdb) p vmskipflags
$15 = 0 '\000'
(gdb) n
648 if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
(gdb)
649 break;
(gdb)
660 if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
(gdb) p next_unskippable_block
$16 = 1
(gdb) n
1047 if (onerel->rd_rel->relhasoids &&
(gdb)
1132 if (tupgone)
(gdb)
tupgone is false; check whether the tuple needs freezing (it does not)
Advance the offset and continue iterating over the tuples
(gdb) p tupgone
$17 = false
(gdb) n
1144 num_tuples += 1;
(gdb)
1145 hastup = true;
(gdb)
1151 if (heap_prepare_freeze_tuple(tuple.t_data,
(gdb)
1154 &frozen[nfrozen],
(gdb) p nfrozen
$18 = 0
(gdb) n
1151 if (heap_prepare_freeze_tuple(tuple.t_data,
(gdb)
1158 if (!tuple_totally_frozen)
(gdb)
1159 all_frozen = false;
(gdb)
958 offnum = OffsetNumberNext(offnum))
(gdb)
956 for (offnum = FirstOffsetNumber;
(gdb)
This tuple is normal
(gdb) p offnum
$19 = 3
(gdb) n
962 itemid = PageGetItemId(page, offnum);
(gdb)
965 if (!ItemIdIsUsed(itemid))
(gdb)
972 if (ItemIdIsRedirected(itemid))
(gdb)
978 ItemPointerSet(&(tuple.t_self), blkno, offnum);
(gdb)
986 if (ItemIdIsDead(itemid))
(gdb)
993 Assert(ItemIdIsNormal(itemid));
(gdb)
995 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
(gdb)
996 tuple.t_len = ItemIdGetLength(itemid);
(gdb)
997 tuple.t_tableOid = RelationGetRelid(onerel);
(gdb)
999 tupgone = false;
(gdb)
Call HeapTupleSatisfiesVacuum to determine the tuple's status -- essentially, whether the tuple may be visible to all running transactions
This tuple is a live tuple
1012 switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
(gdb)
(gdb) n
1047 if (onerel->rd_rel->relhasoids &&
(gdb) n
1056 live_tuples += 1;
(gdb)
1067 if (all_visible)
(gdb) p all_visible
$20 = false
Break out of the loop
(gdb) b vacuumlazy.c:1168
Breakpoint 2 at 0x6bd4e7: file vacuumlazy.c, line 1168.
(gdb) c
Continuing.
Breakpoint 2, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1,
aggressive=false) at vacuumlazy.c:1168
1168 if (nfrozen > 0)
(gdb)
Update the statistics
(gdb) n
1203 if (nindexes == 0 &&
(gdb) p nfrozen
$23 = 0
(gdb) n
1232 freespace = PageGetHeapFreeSpace(page);
(gdb)
1235 if (all_visible && !all_visible_according_to_vm)
(gdb)
1268 else if (all_visible_according_to_vm && !PageIsAllVisible(page)
(gdb)
1290 else if (PageIsAllVisible(page) && has_dead_tuples)
(gdb)
1305 else if (all_visible_according_to_vm && all_visible && all_frozen &&
(gdb)
1318 UnlockReleaseBuffer(buf);
(gdb)
1321 if (hastup)
(gdb)
1322 vacrelstats->nonempty_pages = blkno + 1;
(gdb) p hastup
$24 = true
(gdb) n
1331 if (vacrelstats->num_dead_tuples == prev_dead_count)
(gdb)
1332 RecordPageWithFreeSpace(onerel, blkno, freespace);
Continue with the next block
(gdb)
605 for (blkno = 0; blkno < nblocks; blkno++)
(gdb) p blkno
$25 = 0
(gdb) n
616 bool all_visible_according_to_vm = false;
(gdb) p blkno
$26 = 1
(gdb)
Test (vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage && vacrelstats->num_dead_tuples > 0; the condition does not hold, so continue
...
(gdb)
701 vacuum_delay_point();
(gdb)
707 if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
(gdb) p vacrelstats->max_dead_tuples
$27 = 21825
(gdb) p vacrelstats->num_dead_tuples
$28 = 0
(gdb) p MaxHeapTuplesPerPage
No symbol "__builtin_offsetof" in current context.
(gdb)
Read the buffer with ReadBufferExtended
(gdb) n
783 visibilitymap_pin(onerel, blkno, &vmbuffer);
(gdb)
785 buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
(gdb)
789 if (!ConditionalLockBufferForCleanup(buf))
(gdb)
Acquire the buffer cleanup lock -- success!
Call heap_page_prune to prune all HOT-update chains on this page
(gdb) n
847 vacrelstats->scanned_pages++;
(gdb)
848 vacrelstats->tupcount_pages++;
(gdb)
850 page = BufferGetPage(buf);
(gdb)
852 if (PageIsNew(page))
(gdb)
894 if (PageIsEmpty(page))
(gdb)
938 tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
(gdb)
945 all_visible = true;
(gdb)
Iterate over the line pointers on the page
956 for (offnum = FirstOffsetNumber;
(gdb) p maxoff
$29 = 291
(gdb)
$30 = 291
(gdb) n
962 itemid = PageGetItemId(page, offnum);
(gdb) n
965 if (!ItemIdIsUsed(itemid))
(gdb)
972 if (ItemIdIsRedirected(itemid))
(gdb)
978 ItemPointerSet(&(tuple.t_self), blkno, offnum);
(gdb)
986 if (ItemIdIsDead(itemid))
(gdb)
993 Assert(ItemIdIsNormal(itemid));
(gdb)
995 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
(gdb)
996 tuple.t_len = ItemIdGetLength(itemid);
(gdb)
997 tuple.t_tableOid = RelationGetRelid(onerel);
(gdb)
999 tupgone = false;
(gdb)
1012 switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
(gdb)
1099 nkeep += 1;
(gdb)
1100 all_visible = false;
(gdb)
1101 break;
(gdb)
1132 if (tupgone)
(gdb)
1144 num_tuples += 1;
Break out of the loop
(gdb) c
Continuing.
Breakpoint 2, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1,
aggressive=false) at vacuumlazy.c:1168
1168 if (nfrozen > 0)
(gdb)
DONE!