您好,登錄后才能下訂單哦!
本篇內容介紹了“Mapping怎么使用”的有關知識,在實際案例的操作過程中,不少人都會遇到這樣的困境,接下來就讓小編帶領大家學習一下如何處理這些情況吧!希望大家仔細閱讀,能夠學有所成!
1
背景背景
登錄狀態下,日志中會采集到用戶的登錄id(account),可以做到用戶身份的精確標識;而在匿名狀態下,日志中沒有采集到用戶的登錄id,準確標識用戶,成為一件極其棘手的事情;
在事件日志中,對用戶能產生標識作用的字段有:
app日志中,有deviceid,account
web日志中,有cookieid,ip,account
wxapp日志中,有openid,account
在現實中,一個用戶,可能處于如下極其復雜的狀態:
登錄狀態訪問app
匿名狀態訪問app
登錄狀態訪問web
匿名狀態訪問web
登錄狀態訪問wx小程序
匿名狀態訪問wx小程序
一個用戶可能擁有不止一臺終端設備
一臺終端設備上可能有多個用戶使用
一個用戶可能一段時間后更換手機
……
解決方案
適用場景
適合沒有用戶注冊體系,或者極少數用戶會進行多設備登錄的產品,如工具類產品、搜索引擎、部分小型電商等。
這也是絕大多數數據分析產品唯一提供的方案。
局限性
同一用戶在不同設備使用會被認為不同的用戶,對后續的分析統計有影響。
不同用戶在相同設備使用會被認為是一個用戶,也對后續的分析統計有影響。
但如果用戶跨設備使用或者多用戶共用設備不是產品的常見場景的話,可以忽略上述問題。
適用場景
成功關聯設備 ID 和登錄 ID 之后,用戶在該設備 ID 上或該登錄 ID 下的行為就會貫通,被認為是一個 全局 ID 發生的。在進行事件、漏斗、留存等用戶相關分析時也會算作一個用戶。
關聯設備 ID 和登錄 ID 的方法雖然實現了更準確的用戶追蹤,但是也會增加復雜度。
所以一般來說,我們建議只有當同時滿足以下條件時,才考慮進行 ID 關聯:
需要貫通一個用戶在一個設備上注冊前后的行為。
需要貫通一個注冊用戶在不同設備上登錄之后的行為
局限性
一個設備 ID 只能和一個登錄 ID 關聯,而事實上一臺設備可能有多個用戶使用。
一個登錄 ID 只能和一個設備 ID 關聯,而事實上一個用戶可能用一個登錄 ID 在多臺設備上登錄。
適用場景
一個用戶在多個設備上進行登錄是一種比較常見的場景,比如 Web 端和 App 端可能都需要進行登錄。支持一個登錄 ID 下關聯多設備 ID 之后,用戶在多設備下的行為就會貫通,被認為是一個ID 發生的。
局限性
一個設備 ID 只能和一個登錄 ID 關聯,而事實上一臺設備可能有多個用戶使用。
一個設備 ID 一旦跟某個登錄 ID 關聯或者一個登錄 ID 和一個設備 ID 關聯,就不能解除(自動解除)。
而事實上,設備 ID 和登錄 ID 的動態關聯才應該是更合理的。
基本原則,與方案3相同
修正之處,一個設備ID被綁定到某個登陸ID(A)之后,如果該設備在后續一段時間(比如一個月內)被一個新的登陸ID(B)更頻繁使用,則該設備ID會被調整至綁定登陸ID(B)
實現
create table wedw_dw.test_id_mapping( device_id string comment '設備ID',user_id string comment '用戶ID',page_url string comment '瀏覽頁面',login_time timestamp comment '登錄時間') partitioned by (date_id string)row format delimited fields terminated by ','stored as textfile;
+------------+----------+------------+------------------------+-------------+--+| device_id | user_id | page_url | login_time | date_id |+------------+----------+------------+------------------------+-------------+--+| device01 | user01 | pageview | 2020-09-01 09:01:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:02:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:03:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:04:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:03:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:04:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:05:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:06:00.0 | 2020-09-01 || device02 | null | pageview | 2020-09-01 09:07:00.0 | 2020-09-01 || device02 | user03 | pageview | 2020-09-01 09:08:00.0 | 2020-09-01 || device03 | null | pageview | 2020-09-01 09:01:00.0 | 2020-09-01 || device03 | null | pageview | 2020-09-01 09:02:00.0 | 2020-09-01 || device04 | null | pageview | 2020-09-01 09:08:00.0 | 2020-09-01 || device04 | null | pageview | 2020-09-01 09:09:00.0 | 2020-09-01 |+------------+----------+------------+------------------------+-------------+--+
一個賬號在一個設備上的訪問記錄條數無論是多少,只計算一次分數 對每個設備上,登錄過的 賬號,按時間先后順序,賦予不同的分值(依次遞減10分) 評分結果示例如下:
#計算設備ID對應的全局IDdrop table if exists wedw_tmp.login_info_1;create table wedw_tmp.login_info_1asselect t1.device_id,t1.user_idfrom( select device_id ,user_id ,page_url ,login_time ,row_number() over(partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id<> 'null') t1where rn = 1;+------------+----------+--+| device_id | user_id |+------------+----------+--+| device01 | user01 || device02 | user02 |+------------+----------+--+
#計算設備ID對應的每個用戶的登錄信息及評分drop table if exists wedw_tmp.login_info_2;create table wedw_tmp.login_info_2asselectt4.device_id,collect_list(t4.login_info) as login_infofrom( select t3.device_id ,concat(t3.user_id,'->',110-10*t3.rn,'->',t3.login_time) as login_info from ( select t2.device_id ,t2.user_id ,t2.login_time ,row_number() over(partition by device_id order by login_time asc) as rn from ( select device_id ,user_id ,min(login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id <> 'null' group by device_id,user_id ) t2 ) t3) t4group by t4.device_id;+------------+-------------------------------------------------------------------------+--+| device_id | login_info |+------------+-------------------------------------------------------------------------+--+| device01 | ["user01->100->2020-09-01 09:01:00"] || device02 | ["user02->100->2020-09-01 09:03:00","user03->90->2020-09-01 09:08:00"] |+------------+-------------------------------------------------------------------------+--+
# 創建guid映射表drop table if exists wedw_dw.device_guid;create table wedw_dw.device_guid( device_id string comment '設備ID',login_info array<string> comment '用戶登錄信息',guid string comment '全局ID')row format delimited fields terminated by ','stored as textfile;
# 將初始化好的數據插入guid映射表insert into table wedw_dw.device_guidselect t1.device_id,t2.login_info,t1.user_id as guidfromwedw_tmp.login_info_1 t1inner join wedw_tmp.login_info_2 t2on t1.device_id = t2.device_id;+------------+-------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+-------------------------------------------------------------------------+---------+--+| device01 | ["user01->100->2020-09-01 09:01:00"] | user01 || device02 | ["user02->100->2020-09-01 09:03:00","user03->90->2020-09-01 09:08:00"] | user02 |+------------+-------------------------------------------------------------------------+---------+--+
+------------+----------+------------+------------------------+-------------+--+| device_id | user_id | page_url | login_time | date_id |+------------+----------+------------+------------------------+-------------+--+| device01 | user01 | pageview | 2020-09-02 09:11:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:13:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:14:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:13:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:14:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:15:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:16:00.0 | 2020-09-02 || device02 | null | pageview | 2020-09-02 09:17:00.0 | 2020-09-02 || device02 | user02 | pageview | 2020-09-02 09:18:00.0 | 2020-09-02 || device03 | null | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 || device03 | user04 | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 |+------------+----------+------------+------------------------+-------------+--+
# 計算本次設備ID對應的全局IDdrop table if exists wedw_tmp.login_info_3;create table wedw_tmp.login_info_3asselect t1.device_id,t1.user_idfrom( select device_id ,user_id ,page_url ,login_time ,row_number() over(partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id<> 'null') t1where rn = 1;+------------+----------+--+| device_id | user_id |+------------+----------+--+| device01 | user01 || device02 | user03 || device03 | user04 |+------------+----------+--+
#計算本次設備ID對應的每個用戶的登錄信息及評分drop table if exists wedw_tmp.login_info_4;create table wedw_tmp.login_info_4asselectt4.device_id,collect_list(t4.login_info) as login_infofrom( select t3.device_id ,concat(t3.user_id,'->',110-10*t3.rn,'->',t3.login_time) as login_info from ( select t2.device_id ,t2.user_id ,t2.login_time ,row_number() over(partition by device_id order by login_time asc) as rn from ( select device_id ,user_id ,min(login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id <> 'null' group by device_id,user_id ) t2 ) t3) t4group by t4.device_id;+------------+-------------------------------------------------------------------------+--+| device_id | login_info |+------------+-------------------------------------------------------------------------+--+| device01 | ["user01->100->2020-09-02 09:11:00"] || device02 | ["user03->100->2020-09-02 09:13:00","user02->90->2020-09-02 09:18:00"] || device03 | ["user04->100->2020-09-02 09:12:00"] |+------------+-------------------------------------------------------------------------+--+
計算邏輯:1. 針對當日的日志數據,為每個設備上登錄過的每個賬號打分 評分規則:
一個賬號在一個設備上的訪問記錄條數無論是多少,只計算一次分數 對每個設備上,
登錄過的賬號,按時間先后順序,賦予不同的分值(依次遞減10分) 示例結果如下:
# 將本次計算好的guid映射結果插入到臨時表drop table if exists wedw_tmp.login_info_5;create table wedw_tmp.login_info_5select t1.device_id,t2.login_info,t1.user_id as guidfromwedw_tmp.login_info_3 t1inner join wedw_tmp.login_info_4 t2on t1.device_id = t2.device_id;+------------+-------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+-------------------------------------------------------------------------+---------+--+| device01 | ["user01->100->2020-09-02 09:11:00"] | user01 || device02 | ["user03->100->2020-09-02 09:13:00","user02->90->2020-09-02 09:18:00"] | user03 || device03 | ["user04->100->2020-09-02 09:12:00"] | user04 |+------------+-------------------------------------------------------------------------+---------+--+
# 兩次結果數據行轉列插到一個臨時表中drop table if exists wedw_tmp.login_info_6;create table wedw_tmp.login_info_6selectdevice_id,split(tmp.sub,'->')[0] as user_id,split(tmp.sub,'->')[1] as score,split(tmp.sub,'->')[2] as login_timefromwedw_dw.device_guidlateral view explode(login_info) tmp as subunion all selectdevice_id,split(tmp.sub,'->')[0] as user_id,split(tmp.sub,'->')[1] as score,split(tmp.sub,'->')[2] as login_timefromwedw_tmp.login_info_5lateral view explode(login_info) tmp as sub;+------------+----------+--------+----------------------+--+| device_id | user_id | score | login_time |+------------+----------+--------+----------------------+--+| device01 | user01 | 100 | 2020-09-01 09:01:00 || device02 | user02 | 100 | 2020-09-01 09:03:00 || device02 | user03 | 90 | 2020-09-01 09:08:00 || device01 | user01 | 100 | 2020-09-02 09:11:00 || device02 | user03 | 100 | 2020-09-02 09:13:00 || device02 | user02 | 90 | 2020-09-02 09:18:00 || device03 | user04 | 100 | 2020-09-02 09:12:00 |+------------+----------+--------+----------------------+--+
# 計算前面結果數據中每個設備下每個用戶的評分和及最早登錄時間drop table if exists wedw_tmp.login_info_7;create table wedw_tmp.login_info_7select device_id,user_id,sum(score) as score,min(login_time) as login_timefromwedw_tmp.login_info_6group by device_id,user_id;+------------+----------+--------+----------------------+--+| device_id | user_id | score | login_time |+------------+----------+--------+----------------------+--+| device01 | user01 | 200.0 | 2020-09-01 09:01:00 || device02 | user02 | 190.0 | 2020-09-01 09:03:00 || device02 | user03 | 190.0 | 2020-09-01 09:08:00 || device03 | user04 | 100.0 | 2020-09-02 09:12:00 |+------------+----------+--------+----------------------+--+
# 合并兩次結果數據 計算設備ID對應的guiddrop table if exists wedw_tmp.login_info_8;create table wedw_tmp.login_info_8select t2.device_id as device_id,t2.user_id as guidfrom( select t1.device_id ,t1.user_id ,t1.score ,t1.login_time ,row_number() over(partition by device_id order by t1.score desc) as rn from wedw_tmp.login_info_7 t1)t2where t2.rn = 1;+------------+---------+--+| device_id | guid |+------------+---------+--+| device01 | user01 || device02 | user02 || device03 | user04 |+------------+---------+--+
# 計算兩次結果數據合并后每個設備ID對應的用戶登錄信息drop table if exists wedw_tmp.login_info_9;create table wedw_tmp.login_info_9selectt1.device_id,collect_list(t1.login_info) login_infofrom( select device_id ,concat(user_id,'->',score,'->',login_time) as login_info from wedw_tmp.login_info_7) t1group by device_id;+------------+------------------------------------------------------------------------------+--+| device_id | collect_list(login_info) |+------------+------------------------------------------------------------------------------+--+| device03 | ["user04->100.0->2020-09-02 09:12:00"] || device02 | ["user02->190.0->2020-09-01 09:03:00","user03->190.0->2020-09-01 09:08:00"] || device01 | ["user01->200.0->2020-09-01 09:01:00"] |+------------+------------------------------------------------------------------------------+--+
# 將合并好的數據插入guid映射表insert overwrite table wedw_dw.device_guidselect t1.device_id as device_id,t2.login_info as login_info,t1.guid as guidfrom wedw_tmp.login_info_8 t1inner join wedw_tmp.login_info_9 t2on t1.device_id = t2.device_id;+------------+------------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+------------------------------------------------------------------------------+---------+--+| device03 | ["user04->100.0->2020-09-02 09:12:00"] | user04 || device02 | ["user02->190.0->2020-09-01 09:03:00","user03->190.0->2020-09-01 09:08:00"] | user02 || device01 | ["user01->200.0->2020-09-01 09:01:00"] | user01 |+------------+------------------------------------------------------------------------------+---------+--+
“Mapping怎么使用”的內容就介紹到這里了,感謝大家的閱讀。如果想了解更多行業相關的知識可以關注億速云網站,小編將為大家輸出更多高質量的實用文章!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。