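This section briefly walks through how PostgreSQL handles a manually issued VACUUM. It follows the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap->heap_execute_freeze_tuple and focuses on the last function, which performs the actual tuple freeze (the preparation has already been done by that point).
Macro definitions
Vacuum and Analyze command options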
/* ----------------------
* Vacuum and Analyze Statements
*
* Even though these are nominally two statements, it's convenient to use
* just one node type for both. Note that at least one of VACOPT_VACUUM
* and VACOPT_ANALYZE must be set in options.
* ----------------------
*/
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,     /* do VACUUM */
    VACOPT_ANALYZE = 1 << 1,    /* do ANALYZE */
    VACOPT_VERBOSE = 1 << 2,    /* print progress info */
    VACOPT_FREEZE = 1 << 3,     /* FREEZE option */
    VACOPT_FULL = 1 << 4,       /* FULL (non-concurrent) vacuum */
    VACOPT_SKIP_LOCKED = 1 << 5,    /* skip if cannot get lock */
    VACOPT_SKIPTOAST = 1 << 6,  /* don't process the TOAST table, if any */
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7   /* don't skip any pages */
} VacuumOption;
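Because each option is a distinct bit, a set of options is just the bitwise OR of its members, and membership is tested with a bitwise AND. The following is a minimal sketch of that idea, not PostgreSQL code: the enum is copied locally so the fragment compiles on its own, and the helpers vacopt_has and example_freeze_verbose_options are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum shown above, so the sketch is self-contained. */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,
    VACOPT_ANALYZE = 1 << 1,
    VACOPT_VERBOSE = 1 << 2,
    VACOPT_FREEZE = 1 << 3,
    VACOPT_FULL = 1 << 4,
    VACOPT_SKIP_LOCKED = 1 << 5,
    VACOPT_SKIPTOAST = 1 << 6,
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7
} VacuumOption;

/* Hypothetical helper: true if every bit in 'wanted' is set in 'options'. */
static bool
vacopt_has(int options, int wanted)
{
    return (options & wanted) == wanted;
}

/* Hypothetical helper: an option set combining VACUUM, FREEZE and VERBOSE. */
static int
example_freeze_verbose_options(void)
{
    int options = VACOPT_VACUUM | VACOPT_FREEZE | VACOPT_VERBOSE;

    /* The header comment requires VACUUM and/or ANALYZE to be present. */
    assert(options & (VACOPT_VACUUM | VACOPT_ANALYZE));
    return options;
}
```

Calling vacopt_has(options, VACOPT_FREEZE) on the value returned by example_freeze_verbose_options() yields true, while testing for VACOPT_FULL yields false.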
HeapTupleHeaderData
The heap tuple header. To avoid wasting space, its fields are laid out so that no alignment padding is needed between them.
/*
* Heap tuple header. To avoid wasting space, the fields should be
* laid out in such a way as to avoid structure padding.
*
* Datums of composite types (row types) share the same general structure
* as on-disk tuples, so that the same routines can be used to build and
* examine them. However the requirements are slightly different: a Datum
* does not need any transaction visibility information, and it does need
* a length word and some embedded type information. We can achieve this
* by overlaying the xmin/cmin/xmax/cmax/xvac fields of a heap tuple
* with the fields needed in the Datum case. Typically, all tuples built
* in-memory will be initialized with the Datum fields; but when a tuple is
* about to be inserted in a table, the transaction fields will be filled,
* overwriting the datum fields.
*
* The overall structure of a heap tuple looks like:
* fixed fields (HeapTupleHeaderData struct)
* nulls bitmap (if HEAP_HASNULL is set in t_infomask)
* alignment padding (as needed to make user data MAXALIGN'd)
* object ID (if HEAP_HASOID_OLD is set in t_infomask, not created
* anymore)
* user data fields
*
* We store five "virtual" fields Xmin, Cmin, Xmax, Cmax, and Xvac in three
* physical fields. Xmin and Xmax are always really stored, but Cmin, Cmax
* and Xvac share a field. This works because we know that Cmin and Cmax
* are only interesting for the lifetime of the inserting and deleting
* transaction respectively. If a tuple is inserted and deleted in the same
* transaction, we store a "combo" command id that can be mapped to the real
* cmin and cmax, but only by use of local state within the originating
* backend. See combocid.c for more details. Meanwhile, Xvac is only set by
* old-style VACUUM FULL, which does not have any command sub-structure and so
* does not need either Cmin or Cmax. (This requires that old-style VACUUM
* FULL never try to move a tuple whose Cmin or Cmax is still interesting,
* ie, an insert-in-progress or delete-in-progress tuple.)
*
* A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid
* is initialized with its own TID (location). If the tuple is ever updated,
* its t_ctid is changed to point to the replacement version of the tuple. Or
* if the tuple is moved from one partition to another, due to an update of
* the partition key, t_ctid is set to a special value to indicate that
* (see ItemPointerSetMovedPartitions). Thus, a tuple is the latest version
* of its row iff XMAX is invalid or
* t_ctid points to itself (in which case, if XMAX is valid, the tuple is
* either locked or deleted). One can follow the chain of t_ctid links
* to find the newest version of the row, unless it was moved to a different
* partition. Beware however that VACUUM might
* erase the pointed-to (newer) tuple before erasing the pointing (older)
* tuple. Hence, when following a t_ctid link, it is necessary to check
* to see if the referenced slot is empty or contains an unrelated tuple.
* Check that the referenced tuple has XMIN equal to the referencing tuple's
* XMAX to verify that it is actually the descendant version and not an
* unrelated tuple stored into a slot recently freed by VACUUM. If either
* check fails, one may assume that there is no live descendant version.
*
* t_ctid is sometimes used to store a speculative insertion token, instead
* of a real TID. A speculative token is set on a tuple that's being
* inserted, until the inserter is sure that it wants to go ahead with the
* insertion. Hence a token should only be seen on a tuple with an XMAX
* that's still in-progress, or invalid/aborted. The token is replaced with
* the tuple's real TID when the insertion is confirmed. One should never
* see a speculative insertion token while following a chain of t_ctid links,
* because they are not used on updates, only insertions.
*
* Following the fixed header fields, the nulls bitmap is stored (beginning
* at t_bits). The bitmap is *not* stored if t_infomask shows that there
* are no nulls in the tuple. If an OID field is present (as indicated by
* t_infomask), then it is stored just before the user data, which begins at
* the offset shown by t_hoff. Note that t_hoff must be a multiple of
* MAXALIGN.
*/
typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */
    union
    {
        CommandId   t_cid;      /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    }           t_field3;       /* union: t_cid and t_xvac share storage */
} HeapTupleFields;              /* transaction fields of the header */
typedef struct DatumTupleFields
{
    int32       datum_len_;     /* varlena header (do not touch directly!) */
    int32       datum_typmod;   /* -1, or identifier of a record type */
    Oid         datum_typeid;   /* composite type OID, or RECORDOID */

    /*
     * datum_typeid cannot be a domain over composite, only plain composite,
     * even if the datum is meant as a value of a domain-over-composite type.
     * This is in line with the general principle that CoerceToDomain does not
     * change the physical representation of the base type value.
     *
     * Note: field ordering is chosen with thought that Oid might someday
     * widen to 64 bits.
     */
} DatumTupleFields;
struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields t_heap;
        DatumTupleFields t_datum;
    }           t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple (or a
                                 * speculative insertion token) */

    /* Fields below here must match MinimalTupleData! */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
    uint16      t_infomask2;    /* number of attributes + various flags */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
    uint16      t_infomask;     /* various flag bits, see below */

#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
    uint8       t_hoff;         /* sizeof header incl. bitmap, padding */

    /* ^ - 23 bytes - ^ */

#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
    bits8       t_bits[FLEXIBLE_ARRAY_MEMBER];  /* bitmap of NULLs */

    /* MORE DATA FOLLOWS AT END OF STRUCT */
};
typedef HeapTupleHeaderData* HeapTupleHeader;
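The header comment says that five "virtual" fields fit into three physical ones because Cmin/Cmax and Xvac share a slot, and that the Datum fields overlay the transaction fields. The sketch below illustrates this with simplified stand-in typedefs (32-bit TransactionId, CommandId and Oid are assumptions made here); TupleChoice and write_cid_read_xvac are hypothetical names that do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's typedefs (assumed to be 32 bits). */
typedef uint32_t TransactionId;
typedef uint32_t CommandId;
typedef uint32_t Oid;

typedef struct HeapTupleFields
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    union
    {
        CommandId     t_cid;    /* Cmin, Cmax or a combo command ID */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    } t_field3;
} HeapTupleFields;

typedef struct DatumTupleFields
{
    int32_t datum_len_;
    int32_t datum_typmod;
    Oid     datum_typeid;
} DatumTupleFields;

/* Hypothetical name for the anonymous union behind t_choice. */
typedef union TupleChoice
{
    HeapTupleFields  t_heap;
    DatumTupleFields t_datum;
} TupleChoice;

/* Write through t_cid, read back through t_xvac: same storage. */
static TransactionId
write_cid_read_xvac(CommandId cid)
{
    HeapTupleFields f = {0};

    f.t_field3.t_cid = cid;
    return f.t_field3.t_xvac;
}
```

Under these assumptions both views of the union are 12 bytes, so overlaying them costs no extra space, and a value stored via t_cid is the same value seen via t_xvac.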
/*
The expanded struct layout is shown in the table below:
Field        Type             Length   Offset  Description
t_xmin       TransactionId    4 bytes  0       insert XID stamp
t_xmax       TransactionId    4 bytes  4       delete XID stamp
t_cid        CommandId        4 bytes  8       insert and/or delete CID stamp (overlays with t_xvac)
t_xvac       TransactionId    4 bytes  8       XID for VACUUM operation moving a row version
t_ctid       ItemPointerData  6 bytes  12      current TID of this or newer row version
t_infomask2  uint16           2 bytes  18      number of attributes, plus various flag bits
t_infomask   uint16           2 bytes  20      various flag bits
t_hoff       uint8            1 byte   22      offset to user data
Note: t_cid and t_xvac form a union and share the same storage.
*/
*/
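The offsets in the table can be checked with offsetof. The following sketch uses a simplified stand-in struct, HeapTupleHeaderSketch (a hypothetical name), which drops the DatumTupleFields overlay and flattens ItemPointerData into three 16-bit members; the widths of the stand-in typedefs are assumptions taken from the table, and the result relies on a typical ABI where 16-bit members are 2-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Simplified 6-byte tuple identifier: block number halves plus item offset. */
typedef struct ItemPointerData
{
    uint16_t bi_hi;
    uint16_t bi_lo;
    uint16_t ip_posid;
} ItemPointerData;

/* Simplified stand-in for HeapTupleHeaderData (heap fields only). */
typedef struct HeapTupleHeaderSketch
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    union
    {
        CommandId     t_cid;
        TransactionId t_xvac;
    } t_field3;
    ItemPointerData t_ctid;
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  t_hoff;
    uint8_t  t_bits[];          /* NULL bitmap starts right after t_hoff */
} HeapTupleHeaderSketch;

/* Size of the fixed part of the header, i.e. where t_bits begins. */
static size_t
fixed_header_bytes(void)
{
    return offsetof(HeapTupleHeaderSketch, t_bits);
}
```

With this layout fixed_header_bytes() comes out at 23, matching the "23 bytes" marker in the struct definition; t_hoff itself is then rounded up to a multiple of MAXALIGN once the NULL bitmap and padding are accounted for.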
// Example: t_infomask = \x0802, which is 2050 in decimal and 100000000010 in binary
// t_infomask flag bits (the binary value of each flag is shown on the left)
1 #define HEAP_HASNULL 0x0001 /* has null attribute(s) */
10 #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
100 #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
1000 #define HEAP_HASOID 0x0008 /* has an object-id field; renamed HEAP_HASOID_OLD in PostgreSQL 12 */
10000 #define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
100000 #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
1000000 #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
10000000 #define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */
/* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
HEAP_XMAX_KEYSHR_LOCK)
100000000 #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
1000000000 #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
10000000000 #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
100000000000 #define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
1000000000000 #define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */
10000000000000 #define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */
100000000000000 #define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
1000000000000000 #define HEAP_MOVED_IN 0x8000 /* moved from another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
1111111111110000 #define HEAP_XACT_MASK 0xFFF0 /* visibility-related bits */
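// \x0802 is 100000000010 in binary: bits 2 and 12 (counting from 1 at the low end) are set,
// so the tuple has variable-width attributes (HEAP_HASVARWIDTH) and its XMAX is invalid (HEAP_XMAX_INVALID)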
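The decoding of \x0802 above can be reproduced with a few bit tests. This is a minimal sketch rather than PostgreSQL code: only a handful of the flags are copied in, and infomask_has and xmin_is_frozen are hypothetical helper names. Note that HEAP_XMIN_FROZEN is a two-bit combination, so both bits must be set for xmin to count as frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A subset of the t_infomask flags listed above. */
#define HEAP_HASNULL        0x0001
#define HEAP_HASVARWIDTH    0x0002
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
#define HEAP_XMAX_INVALID   0x0800

/* Hypothetical helper: true if all bits of 'flag' are set in 'infomask'. */
static bool
infomask_has(uint16_t infomask, uint16_t flag)
{
    return (infomask & flag) == flag;
}

/* Hypothetical helper: frozen means COMMITTED and INVALID are both set. */
static bool
xmin_is_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}
```

For 0x0802 the tests for HEAP_HASVARWIDTH and HEAP_XMAX_INVALID succeed and the test for HEAP_HASNULL fails; a mask with only HEAP_XMIN_COMMITTED set is not reported as frozen.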
/*
* information stored in t_infomask2:
*/
#define HEAP_NATTS_MASK 0x07FF /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols
* modified, or tuple deleted */
#define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */
#define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */
// The same hexadecimal values shown in binary
11111111111 #define HEAP_NATTS_MASK 0x07FF
10000000000000 #define HEAP_KEYS_UPDATED 0x2000
100000000000000 #define HEAP_HOT_UPDATED 0x4000
1000000000000000 #define HEAP_ONLY_TUPLE 0x8000
1110000000000000 #define HEAP2_XACT_MASK 0xE000
1111111111111110 #define SpecTokenOffsetNumber 0xfffe
// The low 11 bits hold the number of attributes; a value of 3 there means the tuple has 3 attributes (columns)
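Since t_infomask2 packs the attribute count and several flags into one 16-bit word, reading or changing the count means masking with HEAP_NATTS_MASK. The sketch below shows this under the definitions listed above; natts_from_infomask2 and set_natts are hypothetical helper names written for this example.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_NATTS_MASK  0x07FF     /* 11 bits for number of attributes */
#define HEAP_HOT_UPDATED 0x4000
#define HEAP_ONLY_TUPLE  0x8000

/* Hypothetical helper: extract the attribute count from t_infomask2. */
static int
natts_from_infomask2(uint16_t infomask2)
{
    return infomask2 & HEAP_NATTS_MASK;
}

/* Hypothetical helper: replace the attribute count, keeping the flag bits. */
static uint16_t
set_natts(uint16_t infomask2, int natts)
{
    return (uint16_t) ((infomask2 & ~HEAP_NATTS_MASK) |
                       (natts & HEAP_NATTS_MASK));
}
```

For example, 0x8003 describes a heap-only tuple with 3 attributes: the count is read as 3 regardless of the flag bit, and changing the count to 5 gives 0x8005 with the flag preserved.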
xl_heap_freeze_tuple
xl_heap_freeze_tuple represents a 'freeze plan': the information needed to freeze a single tuple during vacuum.
/*
* This struct represents a 'freeze plan', which is what we need to know about
* a single tuple being frozen during vacuum.
*/
/* 0x01 was XLH_FREEZE_XMIN */
#define XLH_FREEZE_XVAC 0x02
#define XLH_INVALID_XVAC 0x04
typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frzflags;
} xl_heap_freeze_tuple;
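heap_execute_freeze_tuple performs the actual tuple freeze (the preparation was done earlier). The logic is simple: it sets xmax, sets xvac to the frozen or the invalid transaction ID when the corresponding flag is present in frzflags, and overwrites both infomask words with the values stored in the freeze plan.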
/*
* heap_execute_freeze_tuple
* Execute the prepared freezing of a tuple.
*
* Caller is responsible for ensuring that no other backend can access the
* storage underlying this tuple, either by holding an exclusive lock on the
* buffer containing it (which is what lazy VACUUM does), or by having it be
* in private storage (which is what CLUSTER and friends do).
*
* Note: it might seem we could make the changes without exclusive lock, since
* TransactionId read/write is assumed atomic anyway. However there is a race
* condition: someone who just fetched an old XID that we overwrite here could
* conceivably not finish checking the XID against pg_xact before we finish
* the VACUUM and perhaps truncate off the part of pg_xact he needs. Getting
* exclusive lock ensures no other backend is in process of checking the
* tuple status. Also, getting exclusive lock makes it safe to adjust the
* infomask bits.
*
* NB: All code in here must be safe to execute during crash recovery!
*/
void
heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
{
    HeapTupleHeaderSetXmax(tuple, frz->xmax);

    if (frz->frzflags & XLH_FREEZE_XVAC)
        HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);

    if (frz->frzflags & XLH_INVALID_XVAC)
        HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);

    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}
// Set the tuple's xmax
#define HeapTupleHeaderSetXmax(tup, xid) \
( \
(tup)->t_choice.t_heap.t_xmax = (xid) \
)
// Set the tuple's xvac (only valid for tuples that carry a HEAP_MOVED flag)
#define HeapTupleHeaderSetXvac(tup, xid) \
do { \
Assert((tup)->t_infomask & HEAP_MOVED); \
(tup)->t_choice.t_heap.t_field3.t_xvac = (xid); \
} while (0)
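To see the two-phase pattern in isolation, here is a minimal, self-contained sketch with simplified stand-in types. It is not PostgreSQL code: SketchTuple, SketchFreezePlan, plan_freeze_xmin, sketch_execute_freeze and freeze_example are all hypothetical names, the xvac path is left out, and plan_freeze_xmin is only a stand-in for the real preparation step (heap_prepare_freeze_tuple), which decides far more than this. The assumption illustrated is that the plan carries the final xmax and infomask values, so execution is a plain copy.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

#define HEAP_HASVARWIDTH    0x0002
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
#define HEAP_XMAX_INVALID   0x0800

/* Simplified tuple header: only the fields the freeze step touches. */
typedef struct SketchTuple
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    uint16_t      t_infomask2;
    uint16_t      t_infomask;
} SketchTuple;

/* Simplified freeze plan (no offset, no frzflags). */
typedef struct SketchFreezePlan
{
    TransactionId xmax;
    uint16_t      t_infomask2;
    uint16_t      t_infomask;
} SketchFreezePlan;

/* Stand-in for the preparation step: mark xmin frozen, keep the rest. */
static SketchFreezePlan
plan_freeze_xmin(const SketchTuple *tuple)
{
    SketchFreezePlan frz;

    frz.xmax = tuple->t_xmax;
    frz.t_infomask2 = tuple->t_infomask2;
    frz.t_infomask = tuple->t_infomask | HEAP_XMIN_FROZEN;
    return frz;
}

/* Execution step: copy the prepared values into the tuple header. */
static void
sketch_execute_freeze(SketchTuple *tuple, const SketchFreezePlan *frz)
{
    tuple->t_xmax = frz->xmax;
    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}

/* Plan and execute a freeze on a tuple inserted by transaction 100. */
static SketchTuple
freeze_example(void)
{
    SketchTuple t = {100, InvalidTransactionId, 3,
                     HEAP_HASVARWIDTH | HEAP_XMAX_INVALID};
    SketchFreezePlan frz = plan_freeze_xmin(&t);

    sketch_execute_freeze(&t, &frz);
    return t;
}
```

In this sketch the tuple keeps its original xmin value and is marked frozen purely through the infomask bits, which is why the execution step never writes t_xmin. Keeping execution this small also fits the requirement stated in the function's comment that the code must be safe to run during crash recovery.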
PG Source Code