PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑

發布時間：2021-11-10 16:16:29 來源：億速云閱讀：149 作者：iii 欄目：關系型數據庫

本篇內容主要講解“PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑”，感興趣的朋友不妨來看看。本文介紹的方法操作簡單快捷，實用性強。下面就讓小編來帶大家學習“PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑”吧!

一、數據結構

SpecialJoinInfo

 /*
  * "Special join" info.
  *
  * One-sided outer joins constrain the order of joining partially but not
  * completely.  We flatten such joins into the planner's top-level list of
  * relations to join, but record information about each outer join in a
  * SpecialJoinInfo struct.  These structs are kept in the PlannerInfo node's
  * join_info_list.
  *
  * Similarly, semijoins and antijoins created by flattening IN (subselect)
  * and EXISTS(subselect) clauses create partial constraints on join order.
  * These are likewise recorded in SpecialJoinInfo structs.
  *
  * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility
  * of planning for them, because this simplifies make_join_rel()'s API.
  *
  * min_lefthand and min_righthand are the sets of base relids that must be
  * available on each side when performing the special join.  lhs_strict is
  * true if the special join's condition cannot succeed when the LHS variables
  * are all NULL (this means that an outer join can commute with upper-level
  * outer joins even if it appears in their RHS).  We don't bother to set
  * lhs_strict for FULL JOINs, however.
  *
  * It is not valid for either min_lefthand or min_righthand to be empty sets;
  * if they were, this would break the logic that enforces join order.
  *
  * syn_lefthand and syn_righthand are the sets of base relids that are
  * syntactically below this special join.  (These are needed to help compute
  * min_lefthand and min_righthand for higher joins.)
  *
  * delay_upper_joins is set true if we detect a pushed-down clause that has
  * to be evaluated after this join is formed (because it references the RHS).
  * Any outer joins that have such a clause and this join in their RHS cannot
  * commute with this join, because that would leave noplace to check the
  * pushed-down clause.  (We don't track this for FULL JOINs, either.)
  *
  * For a semijoin, we also extract the join operators and their RHS arguments
  * and set semi_operators, semi_rhs_exprs, semi_can_btree, and semi_can_hash.
  * This is done in support of possibly unique-ifying the RHS, so we don't
  * bother unless at least one of semi_can_btree and semi_can_hash can be set
  * true.  (You might expect that this information would be computed during
  * join planning; but it's helpful to have it available during planning of
  * parameterized table scans, so we store it in the SpecialJoinInfo structs.)
  *
  * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
  * the inputs to make it a LEFT JOIN.  So the allowed values of jointype
  * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI.
  *
  * For purposes of join selectivity estimation, we create transient
  * SpecialJoinInfo structures for regular inner joins; so it is possible
  * to have jointype == JOIN_INNER in such a structure, even though this is
  * not allowed within join_info_list.  We also create transient
  * SpecialJoinInfos with jointype == JOIN_INNER for outer joins, since for
  * cost estimation purposes it is sometimes useful to know the join size under
  * plain innerjoin semantics.  Note that lhs_strict, delay_upper_joins, and
  * of course the semi_xxx fields are not set meaningfully within such structs.
  */
 
 typedef struct SpecialJoinInfo
 {
     NodeTag     type;
     Relids      min_lefthand;   /* base relids in minimum LHS for join */
     Relids      min_righthand;  /* base relids in minimum RHS for join */
     Relids      syn_lefthand;   /* base relids syntactically within LHS */
     Relids      syn_righthand;  /* base relids syntactically within RHS */
     JoinType    jointype;       /* always INNER, LEFT, FULL, SEMI, or ANTI */
     bool        lhs_strict;     /* joinclause is strict for some LHS rel */
     bool        delay_upper_joins;  /* can't commute with upper RHS */
     /* Remaining fields are set only for JOIN_SEMI jointype: */
     bool        semi_can_btree; /* true if semi_operators are all btree */
     bool        semi_can_hash;  /* true if semi_operators are all hash */
     List       *semi_operators; /* OIDs of equality join operators */
     List       *semi_rhs_exprs; /* righthand-side expressions of these ops */
 } SpecialJoinInfo;

RelOptInfo

 typedef enum RelOptKind
 {
     RELOPT_BASEREL,//基本關系(如基表/子查詢等)
     RELOPT_JOINREL,//連接產生的關系,要注意的是通過連接等方式產生的結果亦可以視為關系
     RELOPT_OTHER_MEMBER_REL,
     RELOPT_OTHER_JOINREL,
     RELOPT_UPPER_REL,//上層的關系
     RELOPT_OTHER_UPPER_REL,
     RELOPT_DEADREL
 } RelOptKind;
 
 /*
  * Is the given relation a simple relation i.e a base or "other" member
  * relation?
  */
 #define IS_SIMPLE_REL(rel) \
     ((rel)->reloptkind == RELOPT_BASEREL || \
      (rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
 
 /* Is the given relation a join relation? */
 #define IS_JOIN_REL(rel)    \
     ((rel)->reloptkind == RELOPT_JOINREL || \
      (rel)->reloptkind == RELOPT_OTHER_JOINREL)
 
 /* Is the given relation an upper relation? */
 #define IS_UPPER_REL(rel)   \
     ((rel)->reloptkind == RELOPT_UPPER_REL || \
      (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)
 
 /* Is the given relation an "other" relation? */
 #define IS_OTHER_REL(rel) \
     ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
      (rel)->reloptkind == RELOPT_OTHER_JOINREL || \
      (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)
 
 typedef struct RelOptInfo
 {
     NodeTag     type;//節點標識
 
     RelOptKind  reloptkind;//RelOpt類型
 
     /* all relations included in this RelOptInfo */
     Relids      relids;         /*Relids(rtindex)集合 set of base relids (rangetable indexes) */
 
     /* size estimates generated by planner */
     double      rows;           /*結果元組的估算數量 estimated number of result tuples */
 
     /* per-relation planner control flags */
     bool        consider_startup;   /*是否考慮啟動成本?是,需要保留啟動成本低的路徑 keep cheap-startup-cost paths? */
     bool        consider_param_startup; /*是否考慮參數化?的路徑 ditto, for parameterized paths? */
     bool        consider_parallel;  /*是否考慮并行處理路徑 consider parallel paths? */
 
     /* default result targetlist for Paths scanning this relation */
     struct PathTarget *reltarget;   /*掃描該Relation時默認的結果 list of Vars/Exprs, cost, width */
 
     /* materialization information */
     List       *pathlist;       /*訪問路徑鏈表 Path structures */
     List       *ppilist;        /*路徑鏈表中使用參數化路徑進行 ParamPathInfos used in pathlist */
     List       *partial_pathlist;   /* partial Paths */
     struct Path *cheapest_startup_path;//代價最低的啟動路徑
     struct Path *cheapest_total_path;//代價最低的整體路徑
     struct Path *cheapest_unique_path;//代價最低的獲取唯一值的路徑
     List       *cheapest_parameterized_paths;//代價最低的參數化?路徑鏈表
 
     /* parameterization information needed for both base rels and join rels */
     /* (see also lateral_vars and lateral_referencers) */
     Relids      direct_lateral_relids;  /*使用lateral語法,需依賴的Relids rels directly laterally referenced */
     Relids      lateral_relids; /* minimum parameterization of rel */
 
     /* information about a base rel (not set for join rels!) */
     //reloptkind=RELOPT_BASEREL時使用的數據結構
     Index       relid;          /* Relation ID */
     Oid         reltablespace;  /* 表空間 containing tablespace */
     RTEKind     rtekind;        /* 基表?子查詢?還是函數等等?RELATION, SUBQUERY, FUNCTION, etc */
     AttrNumber  min_attr;       /* 最小的屬性編號 smallest attrno of rel (often <0) */
     AttrNumber  max_attr;       /* 最大的屬性編號 largest attrno of rel */
     Relids     *attr_needed;    /* 數組 array indexed [min_attr .. max_attr] */
     int32      *attr_widths;    /* 屬性寬度 array indexed [min_attr .. max_attr] */
     List       *lateral_vars;   /* 關系依賴的Vars/PHVs LATERAL Vars and PHVs referenced by rel */
     Relids      lateral_referencers;    /*依賴該關系的Relids rels that reference me laterally */
     List       *indexlist;      /* 該關系的IndexOptInfo鏈表 list of IndexOptInfo */
     List       *statlist;       /* 統計信息鏈表 list of StatisticExtInfo */
     BlockNumber pages;          /* 塊數 size estimates derived from pg_class */
     double      tuples;         /* 元組數 */
     double      allvisfrac;     /* ? */
     PlannerInfo *subroot;       /* 如為子查詢,存儲子查詢的root if subquery */
     List       *subplan_params; /* 如為子查詢,存儲子查詢的參數 if subquery */
     int         rel_parallel_workers;   /* 并行執行,需要多少個workers? wanted number of parallel workers */
 
     /* Information about foreign tables and foreign joins */
     //FWD相關信息
     Oid         serverid;       /* identifies server for the table or join */
     Oid         userid;         /* identifies user to check access as */
     bool        useridiscurrent;    /* join is only valid for current user */
     /* use "struct FdwRoutine" to avoid including fdwapi.h here */
     struct FdwRoutine *fdwroutine;
     void       *fdw_private;
 
     /* cache space for remembering if we have proven this relation unique */
     //已知的,可保證唯一的Relids鏈表
     List       *unique_for_rels;    /* known unique for these other relid
                                      * set(s) */
     List       *non_unique_for_rels;    /* 已知的,不唯一的Relids鏈表 known not unique for these set(s) */
 
     /* used by various scans and joins: */
     List       *baserestrictinfo;   /* 如為基本關系,存儲約束條件 RestrictInfo structures (if base rel) */
     QualCost    baserestrictcost;   /* 解析約束表達式的成本? cost of evaluating the above */
     Index       baserestrict_min_security;  /* 最低安全等級 min security_level found in
                                              * baserestrictinfo */
     List       *joininfo;       /* 連接語句的約束條件信息 RestrictInfo structures for join clauses
                                  * involving this rel */
     bool        has_eclass_joins;   /* 是否存在等價類連接? T means joininfo is incomplete */
 
     /* used by partitionwise joins: */
     bool        consider_partitionwise_join;    /* 分區? consider partitionwise
                                                  * join paths? (if
                                                  * partitioned rel) */
     Relids      top_parent_relids;  /* Relids of topmost parents (if "other"
                                      * rel) */
 
     /* used for partitioned relations */
     //分區表使用
     PartitionScheme part_scheme;    /* 分區的schema Partitioning scheme. */
     int         nparts;         /* 分區數 number of partitions */
     struct PartitionBoundInfoData *boundinfo;   /* 分區邊界信息 Partition bounds */
     List       *partition_qual; /* 分區約束 partition constraint */
     struct RelOptInfo **part_rels;  /* 分區的RelOptInfo數組 Array of RelOptInfos of partitions,
                                      * stored in the same order of bounds */
     List      **partexprs;      /* 非空分區鍵表達式 Non-nullable partition key expressions. */
     List      **nullable_partexprs; /* 可為空的分區鍵表達式 Nullable partition key expressions. */
     List       *partitioned_child_rels; /* RT Indexes鏈表 List of RT indexes. */
 } RelOptInfo;

二、源碼解讀

join_search_one_level->...(如make_rels_by_clause_joins)->make_join_rel->populate_joinrel_with_paths函數為新生成的連接joinrel(給定參與連接的relations)構造訪問路徑.
輸入參數中的sjinfo(SpecialJoinInfo結構體)提供了有關連接的詳細信息，限制條件鏈表restrictlist(List)包含連接條件子句和適用于給定連接關系對的其他條件子句。

//-------------------------------------------------------------------- populate_joinrel_with_paths
/*
 * populate_joinrel_with_paths
 *    Add paths to the given joinrel for given pair of joining relations. The
 *    SpecialJoinInfo provides details about the join and the restrictlist
 *    contains the join clauses and the other clauses applicable for given pair
 *    of the joining relations.
 *    為新生成的連接joinrel(給定參與連接的relations)構造訪問路徑.
 *    SpecialJoinInfo提供了有關連接的詳細信息，
 *    限制條件鏈表包含連接條件子句和適用于給定連接關系對的其他條件子句。
 */
static void
populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
                            RelOptInfo *rel2, RelOptInfo *joinrel,
                            SpecialJoinInfo *sjinfo, List *restrictlist)
{
    /*
     * Consider paths using each rel as both outer and inner.  Depending on
     * the join type, a provably empty outer or inner rel might mean the join
     * is provably empty too; in which case throw away any previously computed
     * paths and mark the join as dummy.  (We do it this way since it's
     * conceivable that dummy-ness of a multi-element join might only be
     * noticeable for certain construction paths.)
     * 考慮使用每個rel分別作為外表和內表的路徑。
     * 根據連接類型的不同，一個可證明為空的外表或內表可能意味著連接結果也是為空;
     * 在這種情況下，丟棄任何以前計算過的路徑，并將連接標記為虛(dummy)連接。
     * 
     * Also, a provably constant-false join restriction typically means that
     * we can skip evaluating one or both sides of the join.  We do this by
     * marking the appropriate rel as dummy.  For outer joins, a
     * constant-false restriction that is pushed down still means the whole
     * join is dummy, while a non-pushed-down one means that no inner rows
     * will join so we can treat the inner rel as dummy.
     * 此外，可以證明的常量-false連接限制通常意味著我們可以跳過對連接的一個或兩個方面的表達式解析。
   * 我們通過將適當的rel標記為虛(dummy)來完成這一操作。
   * 對于外連接，被下推的常量-false限制仍然意味著整個連接是虛連接，
   * 而非下推的限制意味著沒有內部行會連接，因此我們可以將內部rel視為虛擬的。
   * 
     * We need only consider the jointypes that appear in join_info_list, plus
     * JOIN_INNER.
     * 只需要考慮出現在join_info_list和JOIN_INNER中的jointype。
     */
    switch (sjinfo->jointype)
    {
        case JOIN_INNER:
            if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                restriction_is_constant_false(restrictlist, joinrel, false))
            {
                mark_dummy_rel(joinrel);//設置為虛連接
                break;
            }
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);//添加路徑,rel1為外表,rel2為內表
            add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);//添加路徑,rel2為外表,rel1為內表
            break;
        case JOIN_LEFT://同上
            if (is_dummy_rel(rel1) ||
                restriction_is_constant_false(restrictlist, joinrel, true))
            {
                mark_dummy_rel(joinrel);
                break;
            }
            if (restriction_is_constant_false(restrictlist, joinrel, false) &&
                bms_is_subset(rel2->relids, sjinfo->syn_righthand))
                mark_dummy_rel(rel2);
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_LEFT, sjinfo,
                                 restrictlist);
            add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                 JOIN_RIGHT, sjinfo,
                                 restrictlist);
            break;
        case JOIN_FULL://同上
            if ((is_dummy_rel(rel1) && is_dummy_rel(rel2)) ||
                restriction_is_constant_false(restrictlist, joinrel, true))
            {
                mark_dummy_rel(joinrel);
                break;
            }
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_FULL, sjinfo,
                                 restrictlist);
            add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                 JOIN_FULL, sjinfo,
                                 restrictlist);

            /*
             * If there are join quals that aren't mergeable or hashable, we
             * may not be able to build any valid plan.  Complain here so that
             * we can give a somewhat-useful error message.  (Since we have no
             * flexibility of planning for a full join, there's no chance of
             * succeeding later with another pair of input rels.)
             * 如果有無法合并或不能合并的join quals，我們可能無法建立任何有效的計劃。
         * 在這里報錯，這樣就可以給出一些有用的錯誤消息。
       * (由于無法靈活地為全連接添加計劃，所以以后再加入另一對rels就沒有成功的機會了。)
             */
            if (joinrel->pathlist == NIL)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("FULL JOIN is only supported with merge-joinable or hash-joinable join conditions")));
            break;
        case JOIN_SEMI://半連接

            /*
             * We might have a normal semijoin, or a case where we don't have
             * enough rels to do the semijoin but can unique-ify the RHS and
             * then do an innerjoin (see comments in join_is_legal).  In the
             * latter case we can't apply JOIN_SEMI joining.
             * 可能有一個普通的半連接，或者我們沒有足夠的rels來做半連接，
       * 但是可以通過唯一化RHS，然后做一個innerjoin(請參閱join_is_legal中的注釋)。
       * 在后一種情況下，我們不能應用JOIN_SEMI join。
             */
            if (bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
                bms_is_subset(sjinfo->min_righthand, rel2->relids))
            {
                if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                    restriction_is_constant_false(restrictlist, joinrel, false))
                {
                    mark_dummy_rel(joinrel);
                    break;
                }
                add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                     JOIN_SEMI, sjinfo,
                                     restrictlist);
            }

            /*
             * If we know how to unique-ify the RHS and one input rel is
             * exactly the RHS (not a superset) we can consider unique-ifying
             * it and then doing a regular join.  (The create_unique_path
             * check here is probably redundant with what join_is_legal did,
             * but if so the check is cheap because it's cached.  So test
             * anyway to be sure.)
       * 如果我們知道如何唯一化RHS，一個輸入rel恰好是RHS(不是超集)，
       * 我們可以考慮唯一化它，然后進行常規連接。
       * (這里的create_unique_path檢查與join_is_legal的檢查可能是冗余的，
         * 但是如果是的話，這樣的檢查的成本就顯得很低，因為它可以緩存。所以不管怎樣，還是要檢查一下。
             */
            if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
                create_unique_path(root, rel2, rel2->cheapest_total_path,
                                   sjinfo) != NULL)
            {
                if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                    restriction_is_constant_false(restrictlist, joinrel, false))
                {
                    mark_dummy_rel(joinrel);
                    break;
                }
                add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                     JOIN_UNIQUE_INNER, sjinfo,
                                     restrictlist);
                add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                     JOIN_UNIQUE_OUTER, sjinfo,
                                     restrictlist);
            }
            break;
        case JOIN_ANTI://反連接
            if (is_dummy_rel(rel1) ||
                restriction_is_constant_false(restrictlist, joinrel, true))
            {
                mark_dummy_rel(joinrel);
                break;
            }
            if (restriction_is_constant_false(restrictlist, joinrel, false) &&
                bms_is_subset(rel2->relids, sjinfo->syn_righthand))
                mark_dummy_rel(rel2);
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_ANTI, sjinfo,
                                 restrictlist);
            break;
        default://非法的連接類型
            /* other values not expected here */
            elog(ERROR, "unrecognized join type: %d", (int) sjinfo->jointype);
            break;
    }

    /* 嘗試partitionwise技術. Apply partitionwise join technique, if possible. */
    try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
}


//------------------------------------------------------------------- add_paths_to_joinrel
/*
 * add_paths_to_joinrel
 *    Given a join relation and two component rels from which it can be made,
 *    consider all possible paths that use the two component rels as outer
 *    and inner rel respectively.  Add these paths to the join rel's pathlist
 *    if they survive comparison with other paths (and remove any existing
 *    paths that are dominated by these paths).
 *    給出組成連接的兩個組合rels,嘗試所有可能的路徑進行連接,比如分別設置為outer和inner表等.
 *    如果連接的路徑在與其他路徑的比較中可以留存下來，
 *    則將這些路徑添加到連接rel的路徑列表中(并刪除現有的由這些路徑控制的其他路徑)。
 * 
 * Modifies the pathlist field of the joinrel node to contain the best
 * paths found so far.
 * 更新joinrel->pathlist鏈表已容納最優的訪問路徑.
 *
 * jointype is not necessarily the same as sjinfo->jointype; it might be
 * "flipped around" if we are considering joining the rels in the opposite
 * direction from what's indicated in sjinfo.
 * jointype不需要與sjinfo->jointype一致,如果我們考慮加入與sjinfo所示相反方向的rels，它可能是“翻轉”的。
 * 
 * Also, this routine and others in this module accept the special JoinTypes
 * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
 * unique-ify the outer or inner relation and then apply a regular inner
 * join.  These values are not allowed to propagate outside this module,
 * however.  Path cost estimation code may need to recognize that it's
 * dealing with such a case --- the combination of nominal jointype INNER
 * with sjinfo->jointype == JOIN_SEMI indicates that.
 * 此外，這個處理過程和這個模塊中的其他過程接受特殊的JoinTypes(JOIN_UNIQUE_OUTER和JOIN_UNIQUE_INNER)，
 * 以表明應該對外部或內部關系進行唯一化，然后應用一個常規的內部連接。
 * 但是，這些值不允許傳播到這個模塊之外。
 * 訪問路徑成本估算過程可能需要認識到，
 * 它正在處理這樣的情況———名義上的INNER jointype與sjinfo->jointype == JOIN_SEMI的組合。
 */
void
add_paths_to_joinrel(PlannerInfo *root,
                     RelOptInfo *joinrel,
                     RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     JoinType jointype,
                     SpecialJoinInfo *sjinfo,
                     List *restrictlist)
{
    JoinPathExtraData extra;
    bool        mergejoin_allowed = true;
    ListCell   *lc;
    Relids      joinrelids;

    /*
     * PlannerInfo doesn't contain the SpecialJoinInfos created for joins
     * between child relations, even if there is a SpecialJoinInfo node for
     * the join between the topmost parents. So, while calculating Relids set
     * representing the restriction, consider relids of topmost parent of
     * partitions.
     * PlannerInfo不包含為子關系之間的連接創建的SpecialJoinInfo，
   * 即使最頂層的父關系之間有一個SpecialJoinInfo節點。
   * 因此，在計算表示限制條件的Relids集合時，需考慮分區的最頂層父類的Relids。
     */
    if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
        joinrelids = joinrel->top_parent_relids;
    else
        joinrelids = joinrel->relids;

    extra.restrictlist = restrictlist;
    extra.mergeclause_list = NIL;
    extra.sjinfo = sjinfo;
    extra.param_source_rels = NULL;

    /*
     * See if the inner relation is provably unique for this outer rel.
     * 判斷內表是否已被驗證為唯一.
     * 
     * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
     * matter since the executor can make the equivalent optimization anyway;
     * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
     * must be considering a semijoin whose inner side is not provably unique
     * (else reduce_unique_semijoins would've simplified it), so there's no
     * point in calling innerrel_is_unique.  However, if the LHS covers all of
     * the semijoin's min_lefthand, then it's appropriate to set inner_unique
     * because the path produced by create_unique_path will be unique relative
     * to the LHS.  (If we have an LHS that's only part of the min_lefthand,
     * that is *not* true.)  For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
     * letting that value escape this module.
     * 存在一些特殊的情況:
     * 1.對于JOIN_SEMI和JOIN_ANTI，這無關緊要，因為執行器無論如何都可以進行等價的優化;
     *   這些不需要在證明上花費時間證明。
   *  2.對于JOIN_UNIQUE_INNER，必須考慮一個內部不是唯一的半連接(否則reduce_unique_semijoin會簡化它)，
   *    所以調用innerrel_is_unique沒有任何意義。
   *    但是，如果LHS覆蓋了半連接的所有min_left，那么就應該設置inner_unique，
   *    因為create_unique_path生成的路徑相對于LHS是唯一的。
   *    (如果LHS只是min_left的一部分，那就不是真的)
   *    對于JOIN_UNIQUE_OUTER，傳遞JOIN_INNER以避免讓該值轉義這個模塊。
     */
    switch (jointype)
    {
        case JOIN_SEMI:
        case JOIN_ANTI:
            extra.inner_unique = false; /* well, unproven */
            break;
        case JOIN_UNIQUE_INNER:
            extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
                                               outerrel->relids);
            break;
        case JOIN_UNIQUE_OUTER:
            extra.inner_unique = innerrel_is_unique(root,
                                                    joinrel->relids,
                                                    outerrel->relids,
                                                    innerrel,
                                                    JOIN_INNER,
                                                    restrictlist,
                                                    false);
            break;
        default:
            extra.inner_unique = innerrel_is_unique(root,
                                                    joinrel->relids,
                                                    outerrel->relids,
                                                    innerrel,
                                                    jointype,
                                                    restrictlist,
                                                    false);
            break;
    }

    /*
     * Find potential mergejoin clauses.  We can skip this if we are not
     * interested in doing a mergejoin.  However, mergejoin may be our only
     * way of implementing a full outer join, so override enable_mergejoin if
     * it's a full join.
     * 尋找潛在的mergejoin條件。如果不允許Merge Join,則跳過。
   * 然而，mergejoin可能是實現完整外部連接的唯一方法，
     * 因此，如果它是完全連接，則不理會enable_mergejoin參數。
     */
    if (enable_mergejoin || jointype == JOIN_FULL)
        extra.mergeclause_list = select_mergejoin_clauses(root,
                                                          joinrel,
                                                          outerrel,
                                                          innerrel,
                                                          restrictlist,
                                                          jointype,
                                                          &mergejoin_allowed);

    /*
     * If it's SEMI, ANTI, or inner_unique join, compute correction factors
     * for cost estimation.  These will be the same for all paths.
     * 如果是半連接、反連接或inner_unique連接，則計算成本估算的相關因子,該值對所有路徑都是一樣的。
     */
    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI || extra.inner_unique)
        compute_semi_anti_join_factors(root, joinrel, outerrel, innerrel,
                                       jointype, sjinfo, restrictlist,
                                       &extra.semifactors);

    /*
     * Decide whether it's sensible to generate parameterized paths for this
     * joinrel, and if so, which relations such paths should require.  There
     * is usually no need to create a parameterized result path unless there
     * is a join order restriction that prevents joining one of our input rels
     * directly to the parameter source rel instead of joining to the other
     * input rel.  (But see allow_star_schema_join().)  This restriction
     * reduces the number of parameterized paths we have to deal with at
     * higher join levels, without compromising the quality of the resulting
     * plan.  We express the restriction as a Relids set that must overlap the
     * parameterization of any proposed join path.
     * 確定為這個連接生成參數化路徑是否合理，如果是，這些路徑應該需要哪些關系。
   * 通常不需要創建一個參數化的結果路徑，除非存在一個連接順序限制，
   * 阻止將一個關系直接連接到參數源rel，而不是連接到另一個input rel(但請參閱allow_star_schema_join())。
   * 這種限制減少了在更高的連接級別上必須處理的參數化路徑的數量，而不會影響最終計劃的質量。
   * 把這個限制表示為一個Relids集合，它必須與任何建議的連接路徑的參數化重疊。
     */
    foreach(lc, root->join_info_list)
    {
        SpecialJoinInfo *sjinfo2 = (SpecialJoinInfo *) lfirst(lc);

        /*
         * SJ is relevant to this join if we have some part of its RHS
         * (possibly not all of it), and haven't yet joined to its LHS.  (This
         * test is pretty simplistic, but should be sufficient considering the
         * join has already been proven legal.)  If the SJ is relevant, it
         * presents constraints for joining to anything not in its RHS.
     * 與這個連接相關的SJ，如果有它的部分RHS(可能不是全部)，并且還沒有加入它的LHS。
       * (這個驗證非常簡單，但是考慮到連接已經被證明是合法的，這個驗證就足夠了。)
     * 如果SJ是相關的，那么它就為連接到其RHS之外的任何內容提供了約束。
         */
        if (bms_overlap(joinrelids, sjinfo2->min_righthand) &&
            !bms_overlap(joinrelids, sjinfo2->min_lefthand))
            extra.param_source_rels = bms_join(extra.param_source_rels,
                                               bms_difference(root->all_baserels,
                                                              sjinfo2->min_righthand));

        /* 全連接在語法上約束左右兩邊.full joins constrain both sides symmetrically */
        if (sjinfo2->jointype == JOIN_FULL &&
            bms_overlap(joinrelids, sjinfo2->min_lefthand) &&
            !bms_overlap(joinrelids, sjinfo2->min_righthand))
            extra.param_source_rels = bms_join(extra.param_source_rels,
                                               bms_difference(root->all_baserels,
                                                              sjinfo2->min_lefthand));
    }

    /*
     * However, when a LATERAL subquery is involved, there will simply not be
     * any paths for the joinrel that aren't parameterized by whatever the
     * subquery is parameterized by, unless its parameterization is resolved
     * within the joinrel.  So we might as well allow additional dependencies
     * on whatever residual lateral dependencies the joinrel will have.
     * 然而，當涉及到一個LATERAL子查詢時，除非在joinrel中解析其參數化，
   * 否則joinrel的任何路徑都不會被子查詢參數化。
   * 因此，也可以允許對joinrel將擁有的任何剩余的LATERAL依賴進行額外依賴。
     */
    extra.param_source_rels = bms_add_members(extra.param_source_rels,
                                              joinrel->lateral_relids);

    /*
     * 1. Consider mergejoin paths where both relations must be explicitly
     * sorted.  Skip this if we can't mergejoin.
     * 1. 嘗試merge join訪問路徑，其中兩個關系必須執行顯式的排序。
     *    如果禁用merge join，則跳過。
     */
    if (mergejoin_allowed)
        sort_inner_and_outer(root, joinrel, outerrel, innerrel,
                             jointype, &extra);

    /*
     * 2. Consider paths where the outer relation need not be explicitly
     * sorted. This includes both nestloops and mergejoins where the outer
     * path is already ordered.  Again, skip this if we can't mergejoin.
     * (That's okay because we know that nestloop can't handle right/full
     * joins at all, so it wouldn't work in the prohibited cases either.)
     * 2. 考慮外部關系不需要顯式排序的路徑。
   *    這包括nestloop和mergejoin，它們的外部路徑已經排序。
   *    再一次的，如果禁用merge join，則跳過。
     *    (nestloop無法處理正確/完全連接，所以在禁止的情況下它也無法工作)
     */
    if (mergejoin_allowed)
        match_unsorted_outer(root, joinrel, outerrel, innerrel,
                             jointype, &extra);

#ifdef NOT_USED

    /*
     * 3. Consider paths where the inner relation need not be explicitly
     * sorted.  This includes mergejoins only (nestloops were already built in
     * match_unsorted_outer).
     * 3. 嘗試內部關系不需要顯式排序的路徑。這只包括mergejoin(在match_unsorted_outer中已經構建了nestloop)。
     * (已廢棄)
   * 
     * Diked out as redundant 2/13/2000 -- tgl.  There isn't any really
     * significant difference between the inner and outer side of a mergejoin,
     * so match_unsorted_inner creates no paths that aren't equivalent to
     * those made by match_unsorted_outer when add_paths_to_joinrel() is
     * invoked with the two rels given in the other order.
     */
    if (mergejoin_allowed)
        match_unsorted_inner(root, joinrel, outerrel, innerrel,
                             jointype, &extra);
#endif

    /*
     * 4. Consider paths where both outer and inner relations must be hashed
     * before being joined.  As above, disregard enable_hashjoin for full
     * joins, because there may be no other alternative.
   * 4. 考慮在連接之前必須對外部/內部關系進行散列處理的路徑。
   *    如上所述，對于完全連接，忽略enable_hashjoin，因為可能沒有其他選擇。
     */
    if (enable_hashjoin || jointype == JOIN_FULL)
        hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                             jointype, &extra);

    /*
     * 5. If inner and outer relations are foreign tables (or joins) belonging
     * to the same server and assigned to the same user to check access
     * permissions as, give the FDW a chance to push down joins.
     * 如果內部和外部關系是屬于同一服務器的外部表(或連接)，
     * 并分配給同一用戶以檢查訪問權限，則FDW有機會下推連接。
     */
    if (joinrel->fdwroutine &&
        joinrel->fdwroutine->GetForeignJoinPaths)
        joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
                                                 outerrel, innerrel,
                                                 jointype, &extra);

    /*
     * 6. Finally, give extensions a chance to manipulate the path list.
     * 6. 最后,調用擴展鉤子函數.
     */
    if (set_join_pathlist_hook)
        set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                               jointype, &extra);
}

三、跟蹤分析

SQL語句如下:

testdb=# explain verbose select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je 
from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je 
                        from t_grxx gr inner join t_jfxx jf 
                                       on gr.dwbh = dw.dwbh 
                                          and gr.grbh = jf.grbh) grjf 
order by dw.dwbh;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Merge Join  (cost=18841.64..21009.94 rows=99850 width=47)
   Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
   Merge Cond: ((dw.dwbh)::text = (gr.dwbh)::text)
   ->  Index Scan using t_dwxx_pkey on public.t_dwxx dw  (cost=0.29..399.62 rows=10000 width=20)
         Output: dw.dwmc, dw.dwbh, dw.dwdz
   ->  Materialize  (cost=18836.82..19336.82 rows=100000 width=31)
         Output: gr.grbh, gr.xm, gr.dwbh, jf.ny, jf.je
         ->  Sort  (cost=18836.82..19086.82 rows=100000 width=31)
               Output: gr.grbh, gr.xm, gr.dwbh, jf.ny, jf.je
               Sort Key: gr.dwbh
               ->  Hash Join  (cost=3465.00..8138.00 rows=100000 width=31)
                     Output: gr.grbh, gr.xm, gr.dwbh, jf.ny, jf.je
                     Hash Cond: ((jf.grbh)::text = (gr.grbh)::text)
                     ->  Seq Scan on public.t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)
                           Output: jf.ny, jf.je, jf.grbh
                     ->  Hash  (cost=1726.00..1726.00 rows=100000 width=16)
                           Output: gr.grbh, gr.xm, gr.dwbh
                           ->  Seq Scan on public.t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)
                                 Output: gr.grbh, gr.xm, gr.dwbh
(19 rows)

參與連接的有3張基表,分別是t_dwxx/t_grxx/t_jfxx,從執行計劃可見,由于存在order by dwbh排序子句,優化器"聰明"的選擇Merge Join.

啟動gdb,設置斷點,只考察level=3的情況(最終結果)

(gdb) b join_search_one_level
Breakpoint 1 at 0x755667: file joinrels.c, line 67.
(gdb) c
Continuing.

Breakpoint 1, join_search_one_level (root=0x1cae678, level=2) at joinrels.c:67
67    List    **joinrels = root->join_rel_level;
(gdb) c
Continuing.

Breakpoint 1, join_search_one_level (root=0x1cae678, level=3) at joinrels.c:67
67    List    **joinrels = root->join_rel_level;
(gdb)

跟蹤populate_joinrel_with_paths

(gdb) b populate_joinrel_with_paths
Breakpoint 2 at 0x75646d: file joinrels.c, line 780.

進入populate_joinrel_with_paths函數

(gdb) c
Continuing.

Breakpoint 2, populate_joinrel_with_paths (root=0x1cae678, rel1=0x1d10978, rel2=0x1d09610, joinrel=0x1d131b8, 
    sjinfo=0x7ffef59baf20, restrictlist=0x1d135e8) at joinrels.c:780
780   switch (sjinfo->jointype)

查看輸入參數
1.root:simple_rte_array數組,其中simple_rel_array_size = 6,存在6個Item,1->16734/t_dwxx,3->16742/t_grxx,4->16747/t_jfxx
2.rel1:1號和3號連接生成的Relation,即t_dwxx和t_grxx連接
3.rel2:4號RTE,即t_jfxx
4.joinrel:rel1和rel2通過build_join_rel函數生成的連接Relation
5.sjinfo:連接信息,連接類型為內連接JOIN_INNER
6.restrictlist:約束條件鏈表,t_grxx.grbh=t_jfxx.grbh

(gdb) p *root
$3 = {type = T_PlannerInfo, parse = 0x1cd7830, glob = 0x1cb8d38, query_level = 1, parent_root = 0x0, plan_params = 0x0, 
  outer_params = 0x0, simple_rel_array = 0x1d07af8, simple_rel_array_size = 6, simple_rte_array = 0x1d07b48, 
  all_baserels = 0x1d0ada8, nullable_baserels = 0x0, join_rel_list = 0x1d10e48, join_rel_hash = 0x0, 
  join_rel_level = 0x1d10930, join_cur_level = 3, init_plans = 0x0, cte_plan_ids = 0x0, multiexpr_params = 0x0, 
  eq_classes = 0x1d0a6d8, canon_pathkeys = 0x1d0ad28, left_join_clauses = 0x0, right_join_clauses = 0x0, 
  full_join_clauses = 0x0, join_info_list = 0x0, append_rel_list = 0x0, rowMarks = 0x0, placeholder_list = 0x0, 
  fkey_list = 0x0, query_pathkeys = 0x1d0ad78, group_pathkeys = 0x0, window_pathkeys = 0x0, distinct_pathkeys = 0x0, 
  sort_pathkeys = 0x1d0ad78, part_schemes = 0x0, initial_rels = 0x1d108c0, upper_rels = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
    0x0}, upper_targets = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, processed_tlist = 0x1cbb608, grouping_map = 0x0, 
  minmax_aggs = 0x0, planner_cxt = 0x1bfa040, total_table_pages = 1427, tuple_fraction = 0, limit_tuples = -1, 
  qual_security_level = 0, inhTargetKind = INHKIND_NONE, hasJoinRTEs = true, hasLateralRTEs = false, 
  hasDeletedRTEs = false, hasHavingQual = false, hasPseudoConstantQuals = false, hasRecursion = false, wt_param_id = -1, 
  non_recursive_path = 0x0, curOuterRels = 0x0, curOuterParams = 0x0, join_search_private = 0x0, partColsUpdated = false}
(gdb) p *root->simple_rte_array[1]
$4 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relid = 16734, relkind = 114 'r', tablesample = 0x0, subquery = 0x0, ...
...
(gdb) p *rel1->relids
$10 = {nwords = 1, words = 0x1d10b8c}
(gdb) p *rel1->relids->words
$11 = 10
(gdb) p *rel2->relids->words
$13 = 16
(gdb) p *joinrel->relids->words
$15 = 26
(gdb) p *sjinfo
$16 = {type = T_SpecialJoinInfo, min_lefthand = 0x1d10b88, min_righthand = 0x1d09518, syn_lefthand = 0x1d10b88, 
  syn_righthand = 0x1d09518, jointype = JOIN_INNER, lhs_strict = false, delay_upper_joins = false, semi_can_btree = false, 
  semi_can_hash = false, semi_operators = 0x0, semi_rhs_exprs = 0x0}
...
(gdb) p *(Var *)((RelabelType *)$args->head->data.ptr_value)->arg
$34 = {xpr = {type = T_Var}, varno = 3, varattno = 2, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, 
  varnoold = 3, varoattno = 2, location = 273}  -->t_grxx.grbh
(gdb) p *(Var *)((RelabelType *)$args->head->next->data.ptr_value)->arg
$35 = {xpr = {type = T_Var}, varno = 4, varattno = 1, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, 
  varnoold = 4, varoattno = 1, location = 283}  -->t_jfxx.grbh

進入JOIN_INNER分支,調用函數add_paths_to_joinrel

(gdb) 
789       add_paths_to_joinrel(root, joinrel, rel1, rel2,

進入add_paths_to_joinrel函數

(gdb) step
add_paths_to_joinrel (root=0x1cae678, joinrel=0x1d131b8, outerrel=0x1d10978, innerrel=0x1d09610, jointype=JOIN_INNER, 
    sjinfo=0x7ffef59baf20, restrictlist=0x1d135e8) at joinpath.c:126
126   bool    mergejoin_allowed = true;

判斷內表是否已被驗證為唯一

162   switch (jointype)
(gdb) 
182       extra.inner_unique = innerrel_is_unique(root,
(gdb) 
189       break;
(gdb) p extra.inner_unique
$36 = false

尋找潛在的mergejoin條件。如果不允許Merge Join,則跳過
merge join的條件是t_grxx.grbh=t_jfxx.grbh

(gdb) n
198   if (enable_mergejoin || jointype == JOIN_FULL)
(gdb) 
199     extra.mergeclause_list = select_mergejoin_clauses(root,
(gdb) 
211   if (jointype == JOIN_SEMI || jointype == JOIN_ANTI || extra.inner_unique)
(gdb) p *(Var *)((RelabelType *)$args->head->data.ptr_value)->arg
$47 = {xpr = {type = T_Var}, varno = 3, varattno = 2, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, 
  varnoold = 3, varoattno = 2, location = 273} -->t_grxx.grbh
(gdb) p *(Var *)((RelabelType *)$args->head->next->data.ptr_value)->arg
$48 = {xpr = {type = T_Var}, varno = 4, varattno = 1, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, 
  varnoold = 4, varoattno = 1, location = 283} -->t_jfxx.grbh

確定為這個連接生成參數化路徑是否合理，如果是，這些路徑應該需要哪些關系(結果為:NULL)

(gdb) 
261   extra.param_source_rels = bms_add_members(extra.param_source_rels,
(gdb) 
268   if (mergejoin_allowed)
(gdb) p *extra.param_source_rels
Cannot access memory at address 0x0

嘗試merge join訪問路徑，其中兩個關系必須執行顯式的排序.
注:joinrel->pathlist在執行前為NULL,執行后生成了訪問路徑.

(gdb) p *joinrel->pathlist
Cannot access memory at address 0x0
(gdb) n
269     sort_inner_and_outer(root, joinrel, outerrel, innerrel,
(gdb) 
279   if (mergejoin_allowed)
(gdb) p *joinrel->pathlist
$50 = {type = T_List, length = 1, head = 0x1d13850, tail = 0x1d13850}

其他實現邏輯類似,sort_inner_and_outer等函數的實現邏輯,后續再行詳細解讀.
最終結果是生成了2條訪問路徑,存儲在pathlist鏈表中.

324   if (set_join_pathlist_hook)
(gdb) 
327 }
(gdb) p *joinrel->pathlist
$51 = {type = T_List, length = 2, head = 0x1d13850, tail = 0x1d13930}
(gdb) p *(Node *)joinrel->pathlist->head->data.ptr_value
$52 = {type = T_HashPath}
(gdb) p *(HashPath *)joinrel->pathlist->head->data.ptr_value
$53 = {jpath = {path = {type = T_HashPath, pathtype = T_HashJoin, parent = 0x1d131b8, pathtarget = 0x1d133c8, 
      param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 99850, 
      startup_cost = 3762, total_cost = 10075.348750000001, pathkeys = 0x0}, jointype = JOIN_INNER, inner_unique = false, 
    outerjoinpath = 0x1d11f48, innerjoinpath = 0x1d0f548, joinrestrictinfo = 0x1d135e8}, path_hashclauses = 0x1d13aa0, 
  num_batches = 2, inner_rows_total = 100000}
(gdb) p *(Node *)joinrel->pathlist->head->next->data.ptr_value
$54 = {type = T_NestPath}
(gdb) p *(NestPath *)joinrel->pathlist->head->next->data.ptr_value
$55 = {path = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x1d131b8, pathtarget = 0x1d133c8, param_info = 0x0, 
    parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 99850, startup_cost = 39.801122856046675, 
    total_cost = 41318.966172885761, pathkeys = 0x1d0b818}, jointype = JOIN_INNER, inner_unique = false, 
  outerjoinpath = 0x1d119d8, innerjoinpath = 0x1d0f9d8, joinrestrictinfo = 0x0}

到此，相信大家對“PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑”有了更深的了解，不妨來實際操作一番吧！這里是億速云網站，更多相關內容可以進入相關頻道進行查詢，關注我們，繼續學習！

向AI問一下細節

中文字幕av专区_日韩电影在线播放_精品国产精品久久一区免费式_av在线免费观看网站

PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑

一、數據結構

二、源碼解讀

三、跟蹤分析

猜你喜歡

中文字幕av专区_日韩电影在线播放_精品国产精品久久一区免费式_av在线免费观看网站

PostgreSQL中哪個函數為連接新生成的joinrel構造訪問路徑

一、數據結構

二、源碼解讀

三、跟蹤分析

猜你喜歡

最新資訊

相關推薦

相關標簽