
A Practical Application of gawk's gsub Function

Published: 2021-08-30 15:07:24  Source: 億速云 (Yisu Cloud)  Views: 143  Author: chen  Category: Website Servers

This article walks through a practical application of gawk's gsub function. The method is quick and simple to apply.

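While working on a data-cleaning task, I needed to find rows in two tables that are duplicates across several columns. The basic idea is to use an exists clause, along these lines:
select *
  from a
 where exists (select 1
          from b
         where a.col1 = b.col1
           and a.col2 = b.col2);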
The awkward part is that there are too many columns to match:
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER
Linux text processing can solve this.
First, put the column list into a text file:
root@bd-dev-mingshuo-183:/tmp#more 1
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER

This is where gawk's gsub function comes in.
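gsub replaces every match of a regular expression, much like sed 's/regex/replacement/g'.
The syntax is:
gsub(regular expression, substitution string, target string);
The first argument is the pattern to match, the second is the replacement text, and the third is the string to operate on. If the third argument is omitted, gsub operates on $0. It modifies the target in place and returns the number of substitutions made.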
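The behaviour described above can be checked with a small experiment. The following is a minimal sketch, not part of the original walkthrough: the column names and the `no_commas_here` line are made-up sample input, and the `AWK` variable is only a convenience that falls back to another awk when gawk is not installed (gsub is a POSIX awk function, so the results should match).

```shell
#!/bin/sh
# Use gawk when installed; gsub() is POSIX, so any awk should behave the same here.
AWK=$(command -v gawk || command -v awk)

# gsub() returns the number of substitutions; with no third argument it edits $0.
count_and_line=$(printf 'a.COL1,a.COL2,a.COL3\n' |
    "$AWK" '{ n = gsub(/,/, ";"); print n, $0 }')
echo "$count_and_line"    # prints: 2 a.COL1;a.COL2;a.COL3

# Used as a bare pattern, gsub() selects only the lines where it replaced
# something, so the comma-free second input line is not printed at all.
only_matched=$(printf 'a.COL1,a.COL2\nno_commas_here\n' |
    "$AWK" 'gsub(/,/, "\n")')
echo "$only_matched"
```

The second command is why a one-liner of the form `gawk 'gsub(...)'` prints anything: a non-zero return value makes the pattern true, and the default action prints the modified record.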

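Turn the single line of text into multiple lines. Because gsub is used here as the pattern, gawk prints every record in which at least one comma was replaced: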
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'
a.INFOCODE
a.SOURCENAME
a.SOURCETYPE
a.PUBLISHTYPE
a.NOTICEDATE
a.ENDDATE
a.NOTICETITLE
a.LANGUAGE
a.IMPORTLEVEL
a.SOURCEURL
a.ATTACHTYPE
a.ATTACHNAME
a.ATTACHSIZE
a.FORM
a.ACCESSORYNUM
a.NOTICESTATE
a.PUBLISHDATE
a.FILENUMBER

Duplicate each column:
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'|gawk -F'\n' '{print "on",$0,"=",$0,"and"}'
on a.INFOCODE = a.INFOCODE and
on a.SOURCENAME = a.SOURCENAME and
on a.SOURCETYPE = a.SOURCETYPE and
on a.PUBLISHTYPE = a.PUBLISHTYPE and
on a.NOTICEDATE = a.NOTICEDATE and
on a.ENDDATE = a.ENDDATE and
on a.NOTICETITLE = a.NOTICETITLE and
on a.LANGUAGE = a.LANGUAGE and
on a.IMPORTLEVEL = a.IMPORTLEVEL and
on a.SOURCEURL = a.SOURCEURL and
on a.ATTACHTYPE = a.ATTACHTYPE and
on a.ATTACHNAME = a.ATTACHNAME and
on a.ATTACHSIZE = a.ATTACHSIZE and
on a.FORM = a.FORM and
on a.ACCESSORYNUM = a.ACCESSORYNUM and
on a.NOTICESTATE = a.NOTICESTATE and
on a.PUBLISHDATE = a.PUBLISHDATE and
on a.FILENUMBER = a.FILENUMBER and

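Replace the table alias on the right-hand side with b (the leading "on" is dropped from the print statement here):
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'|gawk -F'\n' '{print $0,"=",$0,"and"}'|sed 's/= a/= b/g'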
a.INFOCODE = b.INFOCODE and
a.SOURCENAME = b.SOURCENAME and
a.SOURCETYPE = b.SOURCETYPE and
a.PUBLISHTYPE = b.PUBLISHTYPE and
a.NOTICEDATE = b.NOTICEDATE and
a.ENDDATE = b.ENDDATE and
a.NOTICETITLE = b.NOTICETITLE and
a.LANGUAGE = b.LANGUAGE and
a.IMPORTLEVEL = b.IMPORTLEVEL and
a.SOURCEURL = b.SOURCEURL and
a.ATTACHTYPE = b.ATTACHTYPE and
a.ATTACHNAME = b.ATTACHNAME and
a.ATTACHSIZE = b.ATTACHSIZE and
a.FORM = b.FORM and
a.ACCESSORYNUM = b.ACCESSORYNUM and
a.NOTICESTATE = b.NOTICESTATE and
a.PUBLISHDATE = b.PUBLISHDATE and
a.FILENUMBER = b.FILENUMBER and
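
As an alternative to the three-stage pipeline, the same conditions can be produced in a single awk pass. This is only a sketch under stated assumptions, not the method used above: it uses split() and sub() instead of gsub and sed, the `cols` variable holds a shortened, hypothetical three-column list standing in for the file `1`, and `AWK` falls back to another awk when gawk is absent.

```shell
#!/bin/sh
# Use gawk when installed; split() and sub() are POSIX awk functions.
AWK=$(command -v gawk || command -v awk)

# Hypothetical shortened column list; the real file holds 18 columns.
cols='a.INFOCODE,a.SOURCENAME,a.SOURCETYPE'

conditions=$(printf '%s\n' "$cols" | "$AWK" '{
    n = split($0, col, ",")            # one array element per column
    for (i = 1; i <= n; i++) {
        rhs = col[i]
        sub(/^a\./, "b.", rhs)         # swap only the leading table alias
        # append "and" to every condition except the last one
        printf "%s = %s%s\n", col[i], rhs, (i < n ? " and" : "")
    }
}')
echo "$conditions"
```

Anchoring the substitution at `^a\.` touches only the alias prefix, and the last condition is emitted without a trailing "and", so the output can be pasted into the where clause without further editing.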

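The processing itself is simple; the key points are the use of gawk's gsub function and the overall approach. Remember to remove the trailing "and" from the last line before pasting the conditions into the query.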

That covers this practical use of gawk's gsub function; the best way to get comfortable with it is to try it on a column list of your own.
