sed regex to match multiple fields and values, including quotes


Maguy IB

I have a (space-separated) input file with lines such as:

field1=value1 field2="value 2" field3='value 3' field4="value '4'" ...

The number of fields varies from line to line. To process such a file properly, I would ideally like to run it through sed and obtain tab-separated output such as:

field1 (tab) value1 (tab) field2 (tab) value 2 (tab) field3 (tab) value 3 (tab) field4 (tab) value '4'

The furthest I have got so far is something like sed "s/\([a-z][a-z]*\)=\(['\"]\{0,1\}\)\(..*?\)\2/\t\1\t\3/g", but that is still far from solving my problem. My difficulty is handling the presence or absence of delimiters (quotes) around the values. For the sake of elegance (or geekiness), I am sticking to sed, but would also consider an awk alternative.

Thanks in advance for any help,

Edit: I am shocked to say it, but @Jotne is right.

echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"

does not work; the line comes back unchanged:

field1=value1 field2="value 2" field3='value 3' field4="value '4'"
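One likely reason the line comes back unchanged: the field-name pattern [a-z][a-z]* accepts letters only, so a name ending in a digit, such as field1, is never directly followed by = and nothing matches. Below is a minimal sketch of a variant that also allows digits and underscores in names. It assumes GNU sed (for -E and for \t in the pattern and replacement), and kv_to_tsv is a hypothetical helper name, not something from the original post.

```shell
# Hypothetical helper: turn key=value pairs on stdin into tab-separated output.
# Assumes GNU sed: -E enables extended regexes, and \t is understood as a tab.
kv_to_tsv() {
  # Group 1: field name (letters, digits, underscores).
  # Group 3: double-quoted value, group 4: single-quoted value,
  # group 5: unquoted value (must not start with a quote or a space).
  # The second command strips the trailing tab left by the last pair.
  sed -E "s/([A-Za-z_][A-Za-z0-9_]*)=(\"([^\"]*)\"|'([^']*)'|([^ \"'][^ ]*)) ?/\1\t\3\4\5\t/g; s/\t\$//"
}

# Usage with the sample line from the question:
printf '%s\n' "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | kv_to_tsv
```

The three value alternatives are mutually exclusive, so the result does not depend on which one the regex engine tries first. Empty unquoted values (name= followed by a space) and unbalanced quotes are not handled by this sketch.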

However, the same expression does work on the following input (the underlying goal is to parse an audit.log file):

echo "type=USER_END msg=audit(1570385821.075:671): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'" | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"


type USER_END msg audit(1570385821.075:671): pid 32605 uid 0 auid 0 ses 399 msg op=PAM:session_close acct=root exe=/usr/sbin/cron hostname=? addr=? terminal=cron res=success
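Since an awk alternative would also be acceptable, here is a minimal sketch of one. It repeatedly calls match() to pull the next name=value pair off the line and strips one layer of surrounding quotes. The function name kv_to_tsv_awk is hypothetical, and \047 is used as the octal escape for a single quote so that the awk program can sit inside a single-quoted shell string.

```shell
# Hypothetical helper: awk version of the key=value to tab-separated conversion.
kv_to_tsv_awk() {
  awk '{
    out = ""
    # Find the next name=value pair: the value is double-quoted,
    # single-quoted (\047 is the single-quote character), or unquoted.
    while (match($0, /[A-Za-z_][A-Za-z0-9_]*=("[^"]*"|\047[^\047]*\047|[^ "\047][^ ]*)/)) {
      pair = substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + RLENGTH)      # drop what was consumed
      eq = index(pair, "=")                  # first "=" separates name and value
      key = substr(pair, 1, eq - 1)
      val = substr(pair, eq + 1)
      q = substr(val, 1, 1)
      # Strip one layer of surrounding quotes, if present.
      if (q == "\"" || q == "\047") val = substr(val, 2, length(val) - 2)
      out = out (out == "" ? "" : "\t") key "\t" val
    }
    print out
  }'
}

# Usage with the sample line from the question:
printf '%s\n' "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | kv_to_tsv_awk
```

As in the sed output above, a quoted value that itself contains name=value pairs (such as the msg='...' part of an audit record) is kept as a single value; splitting it further would need a second pass.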

