10-29
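At work I keep running into scenarios that depend on the working calendar: shift-scheduling systems, scheduled report jobs, a unified alerting channel, routine inspections, and so on. Until now I handled this rather crudely, distinguishing only Monday to Friday, Saturday and Sunday, and adjusting things by hand whenever a statutory holiday came up.
It doesn't seem so bad while someone else does the adjusting, but once the job lands on you, two rounds of it are enough to make you never want to do it again.
So I searched everywhere for a ready-made API. Unfortunately I couldn't find one, which is why I decided to go straight to the source. Luckily, a few lines of scraping plus a bit of regex are all it takes.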
The source code is here: https://github.com/itnotebooks/chinese-holiday
First, take a look at the content and layout of the target page:
https://www.gov.cn/zhengce/content/2022-12/08/content_5730844.htm
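After examining the target page, the overall approach is as follows:
1. Query the BaseSearch URL for the holiday-notice entries of the given year.
2. Request the URL of each holiday-notice page found in step 1.
3. Parse the page, locate the div container with id = UCAP-CONTENT, and read all of its p tags.
4. Use a regular expression to check whether each p tag's text starts with a Chinese-numeral ordinal (such as 一、 or 二、); the items that do are the concrete holiday arrangements.
5. Parse each arrangement and extract the year, month and day keywords. Going through the notices of past years turns up these patterns:
5.1 Rest days and working days are described with one of two sets of wording:
5.1.1 Rest day: 放假|补休|调休|公休 (holiday, make-up rest, swapped rest day, public rest day)
5.1.2 Working day: 上班 (go to work)
5.2 Specific dates are described in one of three forms:
5.2.1 [xxxx年]x月x日至[xxxx年][x月]x日 (a range; the bracketed year and month are optional)
5.2.2 x月x日(星期x)、x月x日(星期x) (individual dates, each followed by its weekday)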
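To make rule 5.2 concrete, here is a minimal sketch that turns a date expression in form 5.2.1 or 5.2.2 into individual days. It is not the repository's code: the function name `parseDateExpr`, the fixed UTC+8 zone, and filling a missing year from the notice's year are assumptions for illustration. A bare date such as 4月5日 goes through the same path as form 5.2.2.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// cst is a fixed UTC+8 zone (assumption: FixedZone avoids depending on tzdata).
var cst = time.FixedZone("CST", 8*3600)

// rangeRe matches form 5.2.1: [YYYY年]M月D日至[YYYY年][M月]D日.
// It is not anchored, so trailing words such as 放假 are ignored.
var rangeRe = regexp.MustCompile(`(?:(\d{4})年)?(\d{1,2})月(\d{1,2})日至(?:(\d{4})年)?(?:(\d{1,2})月)?(\d{1,2})日`)

// dayRe matches one [YYYY年]M月D日; used for form 5.2.2 and single dates.
var dayRe = regexp.MustCompile(`(?:(\d{4})年)?(\d{1,2})月(\d{1,2})日`)

// weekdayRe strips "(星期x)" annotations in full-width or ASCII parentheses.
var weekdayRe = regexp.MustCompile(`[((]星期.[))]`)

// atoi returns def for an empty (omitted) capture group.
func atoi(s string, def int) int {
	if s == "" {
		return def
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return def
	}
	return n
}

func date(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, cst)
}

// parseDateExpr expands one date expression into individual days.
// defaultYear fills in a year the expression leaves out.
func parseDateExpr(expr string, defaultYear int) ([]time.Time, error) {
	expr = strings.TrimSpace(weekdayRe.ReplaceAllString(expr, ""))
	if m := rangeRe.FindStringSubmatch(expr); m != nil {
		startY := atoi(m[1], defaultYear)
		startM := atoi(m[2], 0)
		start := date(startY, startM, atoi(m[3], 0))
		// The end inherits the start's year and month when they are omitted.
		end := date(atoi(m[4], startY), atoi(m[5], startM), atoi(m[6], 0))
		if end.Before(start) {
			return nil, fmt.Errorf("range ends before it starts: %q", expr)
		}
		var days []time.Time
		for d := start; !d.After(end); d = d.AddDate(0, 0, 1) {
			days = append(days, d)
		}
		return days, nil
	}
	// Form 5.2.2 (or a single date): every M月D日 occurrence is one day.
	var days []time.Time
	for _, m := range dayRe.FindAllStringSubmatch(expr, -1) {
		days = append(days, date(atoi(m[1], defaultYear), atoi(m[2], 0), atoi(m[3], 0)))
	}
	if len(days) == 0 {
		return nil, fmt.Errorf("no date found in %q", expr)
	}
	return days, nil
}

func main() {
	for _, expr := range []string{
		"2022年12月31日至2023年1月2日",
		"1月28日(星期六)、1月29日(星期日)",
	} {
		days, err := parseDateExpr(expr, 2023)
		if err != nil {
			fmt.Println(err)
			continue
		}
		for _, d := range days {
			fmt.Print(d.Format("2006-01-02"), " ")
		}
		fmt.Println()
	}
}
```

Trying the 至 range first and only then collecting every x月x日 keeps the two forms apart. A range whose end omits the year but rolls over into January is not handled by this sketch; it returns an error instead.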
Calling the site search API to get the URL of a given year's arrangement page
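```go
// Page through the search results (the original fragment ends in break, so it is the body of a loop)
for {
	// Search keywords and filters: "假期 <year>" (holiday + year), document code 国办发明电,
	// issued by 国务院办公厅 (General Office of the State Council), type 通知 (notice), sorted by publish time
	queryParams := map[string]interface{}{
		"t":           "zhengcelibrary_gw",
		"p":           strconv.Itoa(page_index),
		"n":           strconv.Itoa(10),
		"q":           fmt.Sprintf("假期 %d", s.Year),
		"pcodeJiguan": "国办发明电",
		"puborg":      "国务院办公厅",
		"filetype":    "通知",
		"sort":        "pubtime",
	}
	// This http.Get takes a params map and returns the body as a string, so it is
	// presumably the project's own helper package rather than net/http
	resp, err := http.Get(SearchUrl, queryParams)
	if err != nil {
		return nil, fmt.Errorf("SearchPageUrls: search request failed: %w", err)
	}
	if err := json.Unmarshal([]byte(resp), &response); err != nil {
		return nil, fmt.Errorf("SearchPageUrls: failed to parse response: %w", err)
	}
	if *response.Code != 200 {
		log.Printf("%s: %d: %s", SearchUrl, *response.Code, *response.Msg)
		return nil, nil
	}
	// Keep only results whose title mentions the target year
	for _, item := range response.SearchVO.ListVO {
		if strings.Contains(*item.Title, strconv.Itoa(s.Year)) {
			urls.Add(*item.Url)
		}
	}
	page_index += 1
	// Note: if TotalCount is the number of hits rather than the number of pages,
	// this check requests some empty pages before it stops
	if page_index >= int(*response.SearchVO.TotalCount) {
		break
	}
}
```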
Requesting the holiday-arrangement page for that year, fetching the page source, and locating the detailed content by element ID
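```go
r, err := http.Get(url, map[string]interface{}{})
if err != nil {
	return err
}
// Locate the div container with id = UCAP-CONTENT, then read all of its p tags.
// A failed soup Find returns a Root with Error set and a nil node, so check it
// before calling FindAll on the result.
content := soup.HTMLParse(r).Find("div", "id", "UCAP-CONTENT")
if content.Error != nil {
	return fmt.Errorf("UCAP-CONTENT container not found: %w", content.Error)
}
s.Container = content.FindAll("p")
```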
Parsing each entry and converting the dates
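```go
// Compile the patterns once instead of on every paragraph.
// Both full-width and ASCII punctuation are accepted, since the notices use full-width marks.
var (
	// An arrangement starts with a Chinese-numeral ordinal, e.g. "一、元旦:...";
	// group 1 = holiday name, group 2 = arrangement text
	itemRegex = regexp.MustCompile(`[一二三四五六七八九十]、(.+?)[::](.+)`)
	// Split the arrangement into clauses to keep each match simple
	sepRegex = regexp.MustCompile(`[,,。;;]`)
	// Rest-day clause; group 1 = the date expression
	restRegex = regexp.MustCompile(`(.+)(放假|补休|调休|公休)+(?:\d+天)?$`)
	// Working-day clause; group 1 = the date expression
	workRegex = regexp.MustCompile(`(.+)上班$`)
)

for _, p := range s.Container {
	// FullText also covers text nested in child tags, which Text() would miss
	text := strings.TrimSpace(p.FullText())
	if text == "" {
		continue
	}
	// Only entries starting with a Chinese-numeral ordinal are concrete arrangements
	match := itemRegex.FindStringSubmatch(text)
	if len(match) <= 2 {
		continue
	}
	for _, str := range sepRegex.Split(match[2], -1) {
		if str == "" {
			continue
		}
		// Rest days: parse the concrete dates
		if rest := restRegex.FindStringSubmatch(str); len(rest) > 2 {
			s.ExtractDates(match[1], rest[1], true)
			continue
		}
		// Working days: parse the concrete dates
		if work := workRegex.FindStringSubmatch(str); len(work) > 1 {
			s.ExtractDates(match[1], work[1], false)
		}
	}
}
```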
Output of a run for 2023:

```
2023/06/28 16:14:12 [ 2023 ] ====> http://www.gov.cn/zhengce/content/2022-12/08/content_5730844.htm
{"Year":2023,"Name":"元旦","Date":"2022-12-31T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"元旦","Date":"2023-01-02T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"元旦","Date":"2023-01-01T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-21T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-22T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-23T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-24T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-25T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-26T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-27T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"春节","Date":"2023-01-28T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"春节","Date":"2023-01-29T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"清明节","Date":"2023-04-05T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-04-29T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-05-03T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-04-30T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-05-01T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-05-02T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"劳动节","Date":"2023-04-23T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"劳动节","Date":"2023-05-06T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"端午节","Date":"2023-06-22T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"端午节","Date":"2023-06-23T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"端午节","Date":"2023-06-24T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"端午节","Date":"2023-06-25T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-09-29T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-06T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-09-30T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-01T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-02T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-03T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-04T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-05T00:00:00+08:00","IsOffDay":true}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-07T00:00:00+08:00","IsOffDay":false}
{"Year":2023,"Name":"中秋节、国庆节","Date":"2023-10-08T00:00:00+08:00","IsOffDay":false}
```
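Persisting the data to a database and wrapping it in an API: sample requests
Querying the 2023 New Year's Day (元旦) arrangement
Checking whether 2023-10-08 is a rest day
Checking whether 2023-09-30 is a rest day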
Example: automated handling in an inspection task