Retrieve data until it matches the next regex pattern (Python)

I retrieved error log data from a server. It has the following format:

Text file:

2018-01-09 04:50:25,226 [18] INFO messages starts here line1 \n   
    line2 above error continued in next line  
2018-01-09 04:50:29,226 [18] ERROR messages starts here line1 \n  
    line2 above error continued in next line  
2018-01-09 05:50:29,226 [18] ERROR messages starts here line1 \n 
    line2 above error continued in next line

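I need to retrieve the error/informational messages along with their date-time stamps.

I have written the Python code below. It works when an error message fits on a single line, but not when the same error is logged across multiple lines: in that case it returns only the first line, whereas I also need the following lines whenever they belong to the same error.

Any solution or idea would be helpful.

Here is my code: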

 import re

 with open('text.txt', 'r', encoding="Latin-1") as f:
     strr = re.findall(r'(\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})(\,\d{1,3}\s\[\d{1,3}\]\s)(INFO|ERROR)(.*)$', f.read(), re.MULTILINE)
 print(strr)

The code above gives this output:

[('2018-01-09 04:50:25', ',226 [18]', 'INFO', 'messages starts here line1'), ('2018-01-09 04:50:29', ',226 [18]', 'ERROR', 'messages starts here line1'), ('2018-01-09 05:50:29', ',226 [18]', 'ERROR', 'messages starts here line1')]

Whereas the output I expect is:

[('2018-01-09 04:50:25', ',226 [18]', 'INFO', 'messages starts here line1 line2 above error continued in next line'), ('2018-01-09 04:50:29', ',226 [18]', 'ERROR', 'messages starts here line1 line2 above error continued in next line'), ('2018-01-09 05:50:29', ',226 [18]', 'ERROR', 'messages starts here line1 line2 above error continued in next line')]

A regex with a lookahead for the next timestamp, in Python code:

import re

with open('text.txt', encoding='Latin-1') as f:
    text = f.read()

matches = re.findall(r'(\d{4}(?:-\d{2}){2}\s\d{2}(?::\d{2}){2})(,\d+[^\]]+\])\s(INFO|ERROR)\s([\S\s]+?)(?=\r?\n\d{4}(?:-\d{2}){2}|$)', text)

Output:

[('2018-01-09 04:50:25', ',226 [18]', 'INFO', 'messages starts here line1\nline2 above error continued in next line'), ('2018-01-09 04:50:29', ',226 [18]', 'ERROR', 'messages starts here line1\nline2 above error continued in next line'), ('2018-01-09 05:50:29', ',226 [18]', 'ERROR', 'messages starts here line1\nline2 above error continued in next line')]

Add \n to your regex:

(\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})(\,\d{1,3}\s\[\d{1,3}\]\s)(INFO|ERROR)(.*\n.*)
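As a quick check, here is a minimal sketch that runs this pattern against two records shaped like the ones in the question. The `sample` string and the variable names are illustrative, not from the original post.

```python
import re

# Illustrative sample: two records, each spanning exactly two lines.
sample = (
    "2018-01-09 04:50:25,226 [18] INFO messages starts here line1\n"
    "    line2 above error continued in next line\n"
    "2018-01-09 04:50:29,226 [18] ERROR messages starts here line1\n"
    "    line2 above error continued in next line\n"
)

# Same pattern as above; the last group is "rest of line, newline, next line".
pattern = re.compile(
    r'(\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})'
    r'(\,\d{1,3}\s\[\d{1,3}\]\s)(INFO|ERROR)(.*\n.*)'
)

records = pattern.findall(sample)
for timestamp, _, level, message in records:
    print(timestamp, level, repr(message))
```

Note that `(.*\n.*)` always captures exactly one extra line. If a record fits on one line, the group would swallow the first line of the following record, so this variant seems safe only when every record spans exactly two lines.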

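You can use a lookahead and search for matches between a <date1> structure (inclusive) and a <date2> structure (exclusive). In your case every log record starts with a <date> structure. You also need to remove the $, because with re.MULTILINE it matches at the end of every line.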
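The lookahead idea can be sketched as follows. This is one possible reading of the suggestion, not the answerer's own code: the `DATE` fragment, the `record_re` name and the sample text (which adds a one-line and a three-line record for illustration) are all assumptions.

```python
import re

# Made-up sample with records of one, two and three lines.
sample = (
    "2018-01-09 04:50:25,226 [18] INFO messages starts here line1\n"
    "    line2 above error continued in next line\n"
    "2018-01-09 04:50:29,226 [18] ERROR single line message\n"
    "2018-01-09 05:50:29,226 [18] ERROR messages starts here line1\n"
    "    line2 above error continued in next line\n"
    "    line3 still the same record\n"
)

DATE = r'\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}'

# The message group is lazy and stops right before the next line that
# starts with a date (lookahead, not consumed) or at the end of the text.
record_re = re.compile(
    r'^(' + DATE + r'),(\d+)\s\[(\d+)\]\s(INFO|ERROR)\s(.*?)(?=^' + DATE + r'|\Z)',
    re.MULTILINE | re.DOTALL,
)

records = [
    # Collapse newlines and indentation inside the message to single spaces.
    (m.group(1), m.group(4), ' '.join(m.group(5).split()))
    for m in record_re.finditer(sample)
]

for record in records:
    print(record)
```

Here `re.DOTALL` lets `.` cross line breaks, and `\Z` replaces `$` so that the end-of-text alternative cannot fire at the end of every line under `re.MULTILINE`.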

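Edit

You can do even better. Go through the file line by line. Once you find a <date> structure, start collecting a new log record and keep collecting until you see the next <date> structure. Join the log lines that belong to one record and run the regex on the result. Then move on to the next record.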
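A minimal sketch of that procedure, under the assumption that a new record is any line starting with a timestamp followed by a comma. The helper names `iter_records` and `parse` and the two regexes are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical helper patterns: one detects the start of a record,
# the other splits a joined record into its fields.
DATE_START = re.compile(r'\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},')
RECORD = re.compile(
    r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}),(\d+)\s\[(\d+)\]\s(INFO|ERROR)\s(.*)'
)


def iter_records(lines):
    """Yield one space-joined string per log record."""
    current = []
    for line in lines:
        # A new timestamp closes the record collected so far.
        if DATE_START.match(line) and current:
            yield ' '.join(current)
            current = []
        stripped = line.strip()
        if stripped:
            current.append(stripped)
    if current:
        yield ' '.join(current)


def parse(lines):
    """Return (timestamp, level, message) tuples for all records."""
    parsed = []
    for record in iter_records(lines):
        m = RECORD.match(record)
        if m:  # skips any header text before the first timestamp
            parsed.append((m.group(1), m.group(4), m.group(5)))
    return parsed


sample_lines = [
    "2018-01-09 04:50:25,226 [18] INFO messages starts here line1",
    "    line2 above error continued in next line",
    "2018-01-09 04:50:29,226 [18] ERROR messages starts here line1",
    "    line2 above error continued in next line",
]
result = parse(sample_lines)
print(result)
```

Because `iter_records` only iterates over lines, an open file object can be passed to `parse` directly, which avoids reading a large log into memory at once.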

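This may not be as neat as you would like, but nothing stops you from going through the log line by line and accumulating the error information as you go: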

import re

example = '''2018-01-09 04:50:25,226 [18] INFO messages starts here line1
    line2 above error continued in next line
2018-01-09 04:50:29,226 [18] ERROR messages starts here line1
    line2 above error continued in next line
2018-01-09 05:50:29,226 [18] ERROR messages starts here line1
    line2 above error continued in next line  '''

output = []

for line in example.splitlines():
    # re.match is applied to one line at a time, so no regex flags are needed
    match = re.match(r'(\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})'
                     r'(\,\d{1,3}\s\[\d{1,3}\]\s)(INFO|ERROR)(.*)',
                     line)
    if match:
        output.append(list(match.groups()))
    # Check that output already exists - in case of headers
    elif output:
        output[-1].append(line)

This leaves output as:

[['2018-01-09 04:50:25', ',226 [18] ', 'INFO', ' messages starts here line1', '    line2 above error continued in next line'], ['2018-01-09 04:50:29', ',226 [18] ', 'ERROR', ' messages starts here line1', '    line2 above error continued in next line'], ['2018-01-09 05:50:29', ',226 [18] ', 'ERROR', ' messages starts here line1', '    line2 above error continued in next line  ']]
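If the tuple shape requested in the question is needed, the continuation lines can be joined afterwards. The following is a small sketch, with `flatten` as a hypothetical helper name and a shortened copy of the list above as input.

```python
# Shortened copy of the list produced by the line-by-line loop.
output = [
    ['2018-01-09 04:50:25', ',226 [18] ', 'INFO', ' messages starts here line1',
     '    line2 above error continued in next line'],
    ['2018-01-09 05:50:29', ',226 [18] ', 'ERROR', ' messages starts here line1',
     '    line2 above error continued in next line  '],
]


def flatten(record):
    """Merge the first message line and its continuation lines into one string."""
    timestamp, prefix, level, *message_parts = record
    message = ' '.join(part.strip() for part in message_parts if part.strip())
    return (timestamp, prefix.strip(), level, message)


flattened = [flatten(record) for record in output]
print(flattened)
```

Stripping each part before joining removes the indentation of continuation lines and any trailing spaces, so every record ends up as a four-element tuple.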
