Unix Sort By Key Unexpected Behavior
Recently I needed to sort a file of entries by the date in each entry, where the date was a human-readable string spanning several whitespace-separated fields. Each line looked like this:
ENTRY-0009 FAIL Thu Sep 5 17:39:24 2019 PASS 7 FAIL 6
But the most obvious sort command
sort file.txt -k7 -k4M -k5 -k6
didn't produce results in the expected order: by year (2019), then month (Sep), then day (5), then time (17:39:24).
The full file to reproduce this is at the end.
Here's the abridged output of the above command:
$ sort file.txt -k7 -k4M -k5 -k6
ENTRY-0041 FAIL Sat Jul 13 16:13:35 2019 PASS 0 FAIL 1
ENTRY-0051 FAIL Sat Jul 13 20:31:09 2019 PASS 0 FAIL 1
ENTRY-0112 FAIL Mon Jul 15 07:50:46 2019 PASS 0 FAIL 1
ENTRY-0113 FAIL Mon Jul 15 07:59:11 2019 PASS 0 FAIL 1
ENTRY-0024 FAIL Mon Jul 15 08:04:43 2019 PASS 0 FAIL 1
ENTRY-0024 FAIL Mon Jul 15 08:12:14 2019 PASS 0 FAIL 1
ENTRY-0105 FAIL Tue Jul 16 07:30:09 2019 PASS 0 FAIL 1
ENTRY-0124 FAIL Tue Jul 16 07:36:08 2019 PASS 0 FAIL 1
ENTRY-0252 FAIL Sat Aug 10 10:47:59 2019 PASS 0 FAIL 1
ENTRY-0287 FAIL Fri Sep 27 17:10:00 2019 PASS 0 FAIL 1
ENTRY-0242 FAIL Fri Sep 27 17:12:00 2019 PASS 0 FAIL 1
ENTRY-0023 FAIL Sun Oct 27 12:13:11 2019 PASS 0 FAIL 1
ENTRY-0018 FAIL Thu Oct 17 15:22:08 2019 PASS 0 FAIL 2
ENTRY-0042 FAIL Sat Oct 26 12:42:20 2019 PASS 10 FAIL 2
ENTRY-0078 FAIL Sun Oct 27 08:01:55 2019 PASS 10 FAIL 2
ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0
ENTRY-0003 OK Wed Sep 4 20:57:44 2019 PASS 11 FAIL 0
ENTRY-0054 FAIL Sat Oct 26 13:58:55 2019 PASS 11 FAIL 1
ENTRY-0004 OK Thu Sep 5 17:46:48 2019 PASS 12 FAIL 0
ENTRY-0006 OK Thu Sep 5 17:49:08 2019 PASS 12 FAIL 0
ENTRY-0016 OK Fri Sep 6 16:42:10 2019 PASS 12 FAIL 0
ENTRY-0026 OK Fri Sep 6 16:44:48 2019 PASS 12 FAIL 0
ENTRY-0030 OK Fri Sep 6 16:55:26 2019 PASS 12 FAIL 0
ENTRY-0001 OK Thu Oct 17 15:07:28 2019 PASS 12 FAIL 0
...snip...
ENTRY-0047 OK Fri Oct 25 11:28:12 2019 PASS 12 FAIL 0
ENTRY-0060 OK Fri Oct 25 15:15:41 2019 PASS 12 FAIL 0
ENTRY-0003 OK Sat Oct 26 07:08:59 2019 PASS 12 FAIL 0
ENTRY-0059 OK Sat Oct 26 08:59:57 2019 PASS 12 FAIL 0
...snip...
ENTRY-0078 OK Sun Oct 27 08:03:14 2019 PASS 12 FAIL 0
ENTRY-0162 OK Wed Jul 17 09:02:36 2019 PASS 16 FAIL 0
ENTRY-0041 OK Sat Jul 13 16:14:24 2019 PASS 17 FAIL 0
ENTRY-0013 OK Sun Jul 14 10:39:49 2019 PASS 17 FAIL 0
...snip...
Clearly out of order!
Thankfully, the --debug flag was very illuminating.
$ sort file.txt -k7 -k4M -k5 -k6 --debug
Memory to be used for sorting: 8589934592
Using collate rules of en_US.UTF-8 locale
sort_method=heapsort
; k1=< 2019 PASS 12 FAIL 0>(20), k2=< 2019 PASS 0 FAIL 1>(19); ... snip ... cmp1=1
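As you can see, although we intended -k7 to refer to only the 7th column (the year), sort actually takes it to mean "the 7th column and everything after it," through the end of the line. The first key therefore includes the PASS and FAIL counts, which is why the output above is grouped by those counts rather than by date.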
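To see the effect in isolation, here is a minimal sketch using three lines copied from file.txt. The sample function and the open_ended variable are names made up for this illustration, and LC_ALL=C is an assumption added to pin byte-wise collation so the result doesn't depend on the locale.

```shell
#!/bin/sh
# Three lines copied from file.txt; all share the same year in field 7.
sample() {
  printf '%s\n' \
    'ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0' \
    'ENTRY-0041 OK Sat Jul 13 16:14:24 2019 PASS 17 FAIL 0' \
    'ENTRY-0041 FAIL Sat Jul 13 16:13:35 2019 PASS 0 FAIL 1'
}

# -k7 has no end position, so the first key is "2019 PASS <n> FAIL <m>".
# The PASS count, compared as text, decides the order; -k4M is only
# consulted when two first keys are identical.
open_ended=$(sample | LC_ALL=C sort -k7 -k4M)

printf '%s\n' "$open_ended"
```

The Sep 4 line lands between the two Jul 13 lines because its PASS count, 11, sorts as text between 0 and 17.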
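The solution: give the key an end position so that it covers exactly that column. Here -k7.1,7.5 runs from character 1 through character 5 of field 7 (the leading blank plus the four digits of the year):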
$ sort file.txt -k7.1,7.5 -k4M -k5 -k6 --debug
Memory to be used for sorting: 8589934592
Using collate rules of en_US.UTF-8 locale
sort_method=heapsort
; k1=< 2019>(5), k2=< 2019>(5); ...snip... cmp1=-1
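The offsets 7.1,7.5 work because, by default, a field includes the blank that precedes it, so field 7 here is the five characters " 2019" (the debug output above reports a key length of 5). A field-only end position, -k7,7, should select the same key without any character counting. Below is a small sketch that checks the two forms agree on a few lines from file.txt. The names sample, by_chars, and by_field are illustrative, and LC_ALL=C is an assumption to keep collation predictable.

```shell
#!/bin/sh
# Three lines copied from file.txt.
sample() {
  printf '%s\n' \
    'ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0' \
    'ENTRY-0041 OK Sat Jul 13 16:14:24 2019 PASS 17 FAIL 0' \
    'ENTRY-0041 FAIL Sat Jul 13 16:13:35 2019 PASS 0 FAIL 1'
}

# Key bounded by character offsets within field 7 (as in the post).
by_chars=$(sample | LC_ALL=C sort -k7.1,7.5 -k4M -k5 -k6)

# Key bounded by field number only: start at field 7, end at field 7.
by_field=$(sample | LC_ALL=C sort -k7,7 -k4M -k5 -k6)

if [ "$by_chars" = "$by_field" ]; then
  echo "both forms give the same order"
fi
printf '%s\n' "$by_field"
```

On these lines both commands put the two Jul 13 entries first, in time order, followed by the Sep 4 entry.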
With the key bounded, the output comes out in date order:
$ sort file.txt -k7.1,7.5 -k4M -k5 -k6
ENTRY-0001 OK Sat Jul 13 13:31:15 2019 PASS 17 FAIL 0
ENTRY-0041 FAIL Sat Jul 13 16:13:35 2019 PASS 0 FAIL 1
...
ENTRY-0024 FAIL Mon Jul 15 08:12:14 2019 PASS 0 FAIL 1
ENTRY-0105 FAIL Tue Jul 16 07:30:09 2019 PASS 0 FAIL 1
ENTRY-0105 OK Tue Jul 16 07:30:27 2019 PASS 17 FAIL 0
ENTRY-0124 FAIL Tue Jul 16 07:36:08 2019 PASS 0 FAIL 1
ENTRY-0102 OK Tue Jul 16 08:26:46 2019 PASS 17 FAIL 0
ENTRY-0162 OK Wed Jul 17 09:02:36 2019 PASS 16 FAIL 0
ENTRY-0204 OK Wed Jul 17 09:06:45 2019 PASS 17 FAIL 0
ENTRY-0252 FAIL Sat Aug 10 10:47:59 2019 PASS 0 FAIL 1
ENTRY-0260 OK Sat Aug 10 11:27:30 2019 PASS 17 FAIL 0
ENTRY-0222 OK Sat Aug 10 11:28:43 2019 PASS 17 FAIL 0
ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0
...snip...
ENTRY-0030 OK Fri Sep 6 16:55:26 2019 PASS 12 FAIL 0
ENTRY-0287 FAIL Fri Sep 27 17:10:00 2019 PASS 0 FAIL 1
ENTRY-0242 FAIL Fri Sep 27 17:12:00 2019 PASS 0 FAIL 1
ENTRY-0001 OK Thu Oct 17 15:07:28 2019 PASS 12 FAIL 0
ENTRY-0018 FAIL Thu Oct 17 15:22:08 2019 PASS 0 FAIL 2
ENTRY-0018 OK Thu Oct 17 15:22:22 2019 PASS 17 FAIL 0
ENTRY-0004 OK Fri Oct 18 10:25:05 2019 PASS 12 FAIL 0
...snip...
ENTRY-0047 OK Thu Oct 24 10:24:53 2019 PASS 12 FAIL 0
ENTRY-0043 OK Thu Oct 24 10:38:35 2019 PASS 17 FAIL 0
ENTRY-0047 OK Fri Oct 25 11:28:12 2019 PASS 12 FAIL 0
ENTRY-0060 OK Fri Oct 25 15:15:41 2019 PASS 12 FAIL 0
ENTRY-0003 OK Sat Oct 26 07:08:59 2019 PASS 12 FAIL 0
...snip...
ENTRY-0038 OK Sat Oct 26 14:06:41 2019 PASS 12 FAIL 0
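Note that -k4M, -k5, and -k6 are open-ended in the same way, and that -k5 compares the day as text. If single-digit days are not padded (an assumption; it depends on how the dates were written), a text comparison puts 27 before 4. A more defensive variant bounds every key and compares the year and day numerically. This is a sketch rather than a drop-in replacement: sample, text_day, and numeric_day are illustrative names, and LC_ALL=C is an assumption to pin collation.

```shell
#!/bin/sh
# Three lines copied from file.txt, with unpadded single-digit days.
sample() {
  printf '%s\n' \
    'ENTRY-0287 FAIL Fri Sep 27 17:10:00 2019 PASS 0 FAIL 1' \
    'ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0' \
    'ENTRY-0252 FAIL Sat Aug 10 10:47:59 2019 PASS 0 FAIL 1'
}

# Every key bounded, but the day is still compared as text:
# "27" sorts before "4" because '2' < '4'.
text_day=$(sample | LC_ALL=C sort -k7,7 -k4,4M -k5,5 -k6,6)

# Year and day compared numerically (n): 4 sorts before 27.
numeric_day=$(sample | LC_ALL=C sort -k7,7n -k4,4M -k5,5n -k6,6)

printf 'text day:\n%s\n\nnumeric day:\n%s\n' "$text_day" "$numeric_day"
```

With the text comparison the Sep 27 line comes out ahead of the Sep 4 line. With the numeric keys the order is Aug 10, Sep 4, Sep 27.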
Here's the file for you to play with:
$ cat file.txt
ENTRY-0001 OK Sat Jul 13 13:31:15 2019 PASS 17 FAIL 0
ENTRY-0041 FAIL Sat Jul 13 16:13:35 2019 PASS 0 FAIL 1
ENTRY-0041 OK Sat Jul 13 16:14:24 2019 PASS 17 FAIL 0
ENTRY-0051 FAIL Sat Jul 13 20:31:09 2019 PASS 0 FAIL 1
ENTRY-0013 OK Sun Jul 14 10:39:49 2019 PASS 17 FAIL 0
ENTRY-0019 OK Sun Jul 14 10:47:27 2019 PASS 17 FAIL 0
ENTRY-0112 FAIL Mon Jul 15 07:50:46 2019 PASS 0 FAIL 1
ENTRY-0114 OK Mon Jul 15 07:57:02 2019 PASS 17 FAIL 0
ENTRY-0113 FAIL Mon Jul 15 07:59:11 2019 PASS 0 FAIL 1
ENTRY-0110 OK Mon Jul 15 08:00:01 2019 PASS 17 FAIL 0
ENTRY-0024 FAIL Mon Jul 15 08:04:43 2019 PASS 0 FAIL 1
ENTRY-0035 OK Mon Jul 15 08:07:11 2019 PASS 17 FAIL 0
ENTRY-0028 OK Mon Jul 15 08:09:36 2019 PASS 17 FAIL 0
ENTRY-0024 FAIL Mon Jul 15 08:12:14 2019 PASS 0 FAIL 1
ENTRY-0105 FAIL Tue Jul 16 07:30:09 2019 PASS 0 FAIL 1
ENTRY-0105 OK Tue Jul 16 07:30:27 2019 PASS 17 FAIL 0
ENTRY-0124 FAIL Tue Jul 16 07:36:08 2019 PASS 0 FAIL 1
ENTRY-0102 OK Tue Jul 16 08:26:46 2019 PASS 17 FAIL 0
ENTRY-0162 OK Wed Jul 17 09:02:36 2019 PASS 16 FAIL 0
ENTRY-0204 OK Wed Jul 17 09:06:45 2019 PASS 17 FAIL 0
ENTRY-0252 FAIL Sat Aug 10 10:47:59 2019 PASS 0 FAIL 1
ENTRY-0260 OK Sat Aug 10 11:27:30 2019 PASS 17 FAIL 0
ENTRY-0222 OK Sat Aug 10 11:28:43 2019 PASS 17 FAIL 0
ENTRY-0001 OK Wed Sep 4 20:37:36 2019 PASS 11 FAIL 0
ENTRY-0003 OK Wed Sep 4 20:57:44 2019 PASS 11 FAIL 0
ENTRY-0009 FAIL Thu Sep 5 17:39:24 2019 PASS 7 FAIL 6
ENTRY-0010 FAIL Thu Sep 5 17:42:58 2019 PASS 7 FAIL 6
ENTRY-0004 OK Thu Sep 5 17:46:48 2019 PASS 12 FAIL 0
ENTRY-0006 OK Thu Sep 5 17:49:08 2019 PASS 12 FAIL 0
ENTRY-0016 OK Fri Sep 6 16:42:10 2019 PASS 12 FAIL 0
ENTRY-0026 OK Fri Sep 6 16:44:48 2019 PASS 12 FAIL 0
ENTRY-0030 OK Fri Sep 6 16:55:26 2019 PASS 12 FAIL 0
ENTRY-0287 FAIL Fri Sep 27 17:10:00 2019 PASS 0 FAIL 1
ENTRY-0242 FAIL Fri Sep 27 17:12:00 2019 PASS 0 FAIL 1
ENTRY-0001 OK Thu Oct 17 15:07:28 2019 PASS 12 FAIL 0
ENTRY-0018 FAIL Thu Oct 17 15:22:08 2019 PASS 0 FAIL 2
ENTRY-0018 OK Thu Oct 17 15:22:22 2019 PASS 17 FAIL 0
ENTRY-0004 OK Fri Oct 18 10:25:05 2019 PASS 12 FAIL 0
ENTRY-0011 OK Fri Oct 18 15:42:59 2019 PASS 17 FAIL 0
ENTRY-0004 OK Fri Oct 18 21:47:37 2019 PASS 12 FAIL 0
ENTRY-0013 OK Sat Oct 19 08:15:40 2019 PASS 12 FAIL 0
ENTRY-0011 OK Sat Oct 19 08:19:07 2019 PASS 12 FAIL 0
ENTRY-0007 OK Mon Oct 21 14:38:53 2019 PASS 12 FAIL 0
ENTRY-0022 OK Mon Oct 21 14:56:12 2019 PASS 12 FAIL 0
ENTRY-0008 OK Tue Oct 22 07:35:59 2019 PASS 12 FAIL 0
ENTRY-0009 OK Tue Oct 22 07:38:24 2019 PASS 12 FAIL 0
ENTRY-0052 OK Thu Oct 24 09:57:03 2019 PASS 12 FAIL 0
ENTRY-0047 FAIL Thu Oct 24 10:23:50 2019 PASS 8 FAIL 4
ENTRY-0047 OK Thu Oct 24 10:24:53 2019 PASS 12 FAIL 0
ENTRY-0043 OK Thu Oct 24 10:38:35 2019 PASS 17 FAIL 0
ENTRY-0047 OK Fri Oct 25 11:28:12 2019 PASS 12 FAIL 0
ENTRY-0060 OK Fri Oct 25 15:15:41 2019 PASS 12 FAIL 0
ENTRY-0003 OK Sat Oct 26 07:08:59 2019 PASS 12 FAIL 0
ENTRY-0059 OK Sat Oct 26 08:59:57 2019 PASS 12 FAIL 0
ENTRY-0058 FAIL Sat Oct 26 09:03:51 2019 PASS 8 FAIL 4
ENTRY-0069 OK Sat Oct 26 12:08:27 2019 PASS 12 FAIL 0
ENTRY-0054 OK Sat Oct 26 12:12:37 2019 PASS 12 FAIL 0
ENTRY-0042 FAIL Sat Oct 26 12:42:20 2019 PASS 10 FAIL 2
ENTRY-0055 OK Sat Oct 26 13:50:00 2019 PASS 12 FAIL 0
ENTRY-0053 OK Sat Oct 26 13:56:53 2019 PASS 12 FAIL 0
ENTRY-0054 FAIL Sat Oct 26 13:58:55 2019 PASS 11 FAIL 1
ENTRY-0038 OK Sat Oct 26 14:06:41 2019 PASS 12 FAIL 0
ENTRY-0078 FAIL Sun Oct 27 08:01:55 2019 PASS 10 FAIL 2
ENTRY-0078 OK Sun Oct 27 08:03:14 2019 PASS 12 FAIL 0
ENTRY-0023 FAIL Sun Oct 27 12:13:11 2019 PASS 0 FAIL 1
Ches Koblents
March 2, 2020