Channel: Fast way of finding lines in one file that are not in another? - Stack Overflow

Fast way of finding lines in one file that are not in another?


I have two large files (sets of filenames), each with roughly 30,000 lines. I am looking for a fast way to find the lines in file1 that are not present in file2.

For example, if this is file1:

line1
line2
line3

And this is file2:

line1
line4
line5

Then my result/output should be:

line2
line3

This works:

grep -v -f file2 file1

But it is very, very slow when used on my large files.
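A possible contributor to the slowness, stated here as an assumption rather than something measured on the real files, is that grep treats every line of file2 as a regular expression and matches it anywhere within a line. That also means a pattern such as line1 would wrongly exclude a line such as line10. A minimal sketch using fixed-string, whole-line matching, with the example files recreated in a scratch directory (the tmpdir and result names are illustrative only):

```shell
# Recreate the example files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' line1 line2 line3 > "$tmpdir/file1"
printf '%s\n' line1 line4 line5 > "$tmpdir/file2"

# -F: treat each line of file2 as a fixed string, not a regular expression
# -x: a pattern must match the whole line, not a substring
# -v: print the lines of file1 that do NOT match
# -f: read the patterns from file2
result=$(grep -F -x -v -f "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Whether this is fast enough for 30,000 lines depends on the grep implementation, so it is worth timing against the real files.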

I suspect there is a good way to do this with diff, but the output should be just the lines, nothing else, and I cannot find a switch for that.

Can anyone help me find a fast way of doing this, using bash and basic Linux binaries?

EDIT: To follow up on my own question, this is the best way I have found so far using diff:

diff file2 file1 | grep '^>' | sed 's/^>\ //'

Surely, there must be a better way?
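Two candidate approaches using standard tools are sketched below, as possibilities to try rather than a definitive answer. The first uses comm, which needs sorted input and therefore does not preserve the original line order. The second uses awk with an in-memory lookup table and keeps the order of file1. The example files are recreated in a scratch directory, and the names tmpdir, comm_result, awk_result and seen are illustrative only.

```shell
# Recreate the example files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' line1 line2 line3 > "$tmpdir/file1"
printf '%s\n' line1 line4 line5 > "$tmpdir/file2"

# comm requires sorted input; sort into copies so the originals stay untouched.
# LC_ALL=C makes sort and comm agree on the collation order.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -2 suppresses lines found only in the second file, -3 suppresses lines
# common to both, which leaves the lines found only in the first file.
comm_result=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# awk alternative: while reading the first argument (file2), record each line
# as a key; while reading the second (file1), print lines that were not recorded.
# Caveat: the NR == FNR test misfires if file2 is empty.
awk_result=$(awk 'NR == FNR { seen[$0] = 1; next } !($0 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$comm_result"
printf '%s\n' "$awk_result"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Both variants compare whole lines exactly, so trailing whitespace or differing line endings in either file will make otherwise identical lines count as different.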
