About diff
diff analyzes two files and prints the lines that are different. Essentially, it outputs a set of instructions for how to change one file to make it identical to the second file.
It does not actually change the files; however, it can optionally generate a script (with the -e option) for the program ed (or ex), which can be used to apply the changes.
How diff Works
Let's say we have two files, file1.txt and file2.txt.
If file1.txt contains the following four lines of text:
    I need to buy apples.
    I need to run the laundry.
    I need to wash the dog.
    I need to get the car detailed.
...and file2.txt contains these four lines:
    I need to buy apples.
    I need to do the laundry.
    I need to wash the car.
    I need to get the dog detailed.
...then we can use diff to automatically display for us which lines differ between the two files with this command:
diff file1.txt file2.txt
...and the output will be:
    2,4c2,4
    < I need to run the laundry.
    < I need to wash the dog.
    < I need to get the car detailed.
    ---
    > I need to do the laundry.
    > I need to wash the car.
    > I need to get the dog detailed.
Let's take a look at what this output means. The important thing to remember is that when diff is describing these differences to you, it's doing so in a prescriptive context: it's telling you how to change the first file to make it match the second file.
The first line of the diff output will contain:
- line numbers corresponding to the first file,
- a letter (a for add, c for change, or d for delete), and
- line numbers corresponding to the second file.
In our output above, "2,4c2,4" means: "Lines 2 through 4 in the first file need to be changed to match lines 2 through 4 in the second file." It then tells us what those lines are in each file:
- Lines preceded by a < are lines from the first file;
- lines preceded by > are lines from the second file.
The three dashes ("---") merely separate the lines of file 1 and file 2.
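To try this yourself, here is a minimal sketch that recreates the two files in a scratch directory and runs the same comparison. The scratch directory (made with mktemp -d, which we assume is available) and the variable names are our own choices for illustration; they are not part of diff.

```shell
# Create a scratch directory so we don't touch any real files.
workdir=$(mktemp -d)

printf '%s\n' \
  'I need to buy apples.' \
  'I need to run the laundry.' \
  'I need to wash the dog.' \
  'I need to get the car detailed.' > "$workdir/file1.txt"

printf '%s\n' \
  'I need to buy apples.' \
  'I need to do the laundry.' \
  'I need to wash the car.' \
  'I need to get the dog detailed.' > "$workdir/file2.txt"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from stopping a script that runs with "set -e".
output=$(diff "$workdir/file1.txt" "$workdir/file2.txt" || true)

# The first line is the change command: line range, letter, line range.
change_command=$(printf '%s\n' "$output" | head -n 1)
echo "$change_command"    # prints 2,4c2,4

rm -rf "$workdir"
```

The remaining lines of $output are the "<" lines, the "---" separator, and the ">" lines described above.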
Let's look at another example. Let's say our two files look like this:
file1.txt:
    I need to go to the store.
    I need to buy some apples.
    When I get home, I'll wash the dog.
file2.txt:
    I need to go to the store.
    I need to buy some apples.
    Oh yeah, I also need to buy grated cheese.
    When I get home, I'll wash the dog.
diff file1.txt file2.txt
Output:
    2a3
    > Oh yeah, I also need to buy grated cheese.
Here, the output is telling us "After line 2 in the first file, a line needs to be added: line 3 from the second file." It then shows us what that line is.
Now let's see what it looks like when diff tells us we need to delete a line.
file1:
    I need to go to the store.
    I need to buy some apples.
    When I get home, I'll wash the dog.
    I promise.
file2:
    I need to go to the store.
    I need to buy some apples.
    When I get home, I'll wash the dog.
Our command:
diff file1.txt file2.txt
The output:
    4d3
    < I promise.
Here, the output is telling us "You need to delete line 4 in the first file so that both files sync up at line 3." It then shows us the contents of the line that needs to be deleted.
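Because diff always describes how to turn the first file into the second, swapping the two arguments turns an "add" into a "delete". The sketch below illustrates this with the grated-cheese files from the earlier example; the scratch directory and variable names are our own, and we assume mktemp -d is available.

```shell
workdir=$(mktemp -d)

printf '%s\n' \
  'I need to go to the store.' \
  'I need to buy some apples.' \
  "When I get home, I'll wash the dog." > "$workdir/file1.txt"

printf '%s\n' \
  'I need to go to the store.' \
  'I need to buy some apples.' \
  'Oh yeah, I also need to buy grated cheese.' \
  "When I get home, I'll wash the dog." > "$workdir/file2.txt"

# file1 -> file2: a line has to be added after line 2.
forward=$(diff "$workdir/file1.txt" "$workdir/file2.txt" || true)

# file2 -> file1: the same line has to be deleted instead.
backward=$(diff "$workdir/file2.txt" "$workdir/file1.txt" || true)

printf '%s\n' "$forward"     # first line: 2a3
printf '%s\n' "$backward"    # first line: 3d2

rm -rf "$workdir"
```

In the second run the grated-cheese line is prefixed with "<" rather than ">", because it now belongs to the first file named on the command line.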
Viewing diff Output In Context
The examples above show the default output of diff. It's intended to be read by a computer, not a human, so for human purposes, sometimes it helps to see the context of the changes.
GNU diff, which is the version most Linux users will be using, offers two different ways to do this: "context mode" and "unified mode".
To view differences in context mode, use the -c option. For instance, let's say file1.txt and file2.txt contain the following:
file1.txt:
    apples
    oranges
    kiwis
    carrots
file2.txt:
    apples
    kiwis
    carrots
    grapefruits
Let's look at the contextual output for the diff of these two files. Our command is:
diff -c file1.txt file2.txt
And our output looks like this:
    *** file1.txt 2014-08-21 17:58:29.764656635 -0400
    --- file2.txt 2014-08-21 17:58:50.768989841 -0400
    ***************
    *** 1,4 ****
      apples
    - oranges
      kiwis
      carrots
    --- 1,4 ----
      apples
      kiwis
      carrots
    + grapefruits
The first two lines of this output show us information about our "from" file (file 1) and our "to" file (file 2). It lists the file name, modification date, and modification time of each of our files, one per line. The "from" file is indicated by "***", and the "to" file is indicated by "---".
The line "***************" is just a separator.
The next line has three asterisks ("***"), followed by a line range from the first file (in this case lines 1 through 4, separated by a comma), and then four asterisks ("****").
Then it shows us the contents of those lines. If the line is unchanged, it's prefixed by two spaces. If the line is changed, however, it's prefixed by an indicative character and a space. The character meanings are as follows:
| character | meaning | 
|---|---|
| ! | Indicates that this line is part of a group of one or more lines that needs to change. There is a corresponding group of lines prefixed with "!" in the other file's context as well. | 
| + | Indicates a line in the second file that needs to be added to the first file. | 
| - | Indicates a line in the first file that needs to be deleted. | 
After the lines from the first file, there are three dashes ("---"), then a line range, then four dashes ("----"). This indicates the line range in the second file that will sync up with our changes in the first file.
If there is more than one section that needs to change, diff will show these sections one after the other. Lines from the first file will still be indicated with "***", and lines from the second file with "---".
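The timestamps in the header lines change every time the files are touched, which can be distracting when you compare outputs. As a small sketch, the --label option (listed in the options table further down) replaces each file name and timestamp with text of your choosing. The labels "old" and "new", the scratch directory, and the variable names are our own assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' apples oranges kiwis carrots > "$workdir/file1.txt"
printf '%s\n' apples kiwis carrots grapefruits > "$workdir/file2.txt"

# The first --label applies to the first file, the second to the second file.
context=$(diff -c --label old --label new \
  "$workdir/file1.txt" "$workdir/file2.txt" || true)

printf '%s\n' "$context"
rm -rf "$workdir"
```

The header lines should now read "*** old" and "--- new"; the rest of the output is the same as in the example above, including the "- oranges" and "+ grapefruits" lines.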
Unified Mode
Unified mode (the -u option) is similar to context mode, but it doesn't display any redundant information. Here's an example, using the same input files as our last example:
file1.txt:
    apples
    oranges
    kiwis
    carrots
file2.txt:
    apples
    kiwis
    carrots
    grapefruits
Our command:
diff -u file1.txt file2.txt
The output:
    --- file1.txt 2014-08-21 17:58:29.764656635 -0400
    +++ file2.txt 2014-08-21 17:58:50.768989841 -0400
    @@ -1,4 +1,4 @@
     apples
    -oranges
     kiwis
     carrots
    +grapefruits
The output is similar to above, but as you can see, the differences are "unified" into one set.
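Because every line in the body of a unified diff starts with a single marker character (a space, "-", or "+"), the output is easy to summarize with ordinary text tools. The following is a rough sketch, not a robust parser: it skips the two header lines and counts body lines by prefix. The variable names and the scratch directory are our own.

```shell
workdir=$(mktemp -d)
printf '%s\n' apples oranges kiwis carrots > "$workdir/file1.txt"
printf '%s\n' apples kiwis carrots grapefruits > "$workdir/file2.txt"

unified=$(diff -u "$workdir/file1.txt" "$workdir/file2.txt" || true)

# Skip the two header lines ("---" and "+++"), then count lines by prefix.
body=$(printf '%s\n' "$unified" | tail -n +3)
added=$(printf '%s\n' "$body" | grep -c '^+')
removed=$(printf '%s\n' "$body" | grep -c '^-')

echo "added: $added, removed: $removed"    # prints added: 1, removed: 1
rm -rf "$workdir"
```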
Finding Differences In Directory Contents
diff can also compare directories if you provide directory names instead of file names. See the Examples section.
Using diff To Create An Editing Script
The -e option tells diff to output a script, which can be used by the editing programs ed or ex, that contains a sequence of commands. The commands are a combination of c (change), a (add), and d (delete) which, when executed by the editor, will modify the contents of file1 (the first file specified on the diff command line) so that it matches the contents of file2 (the second file specified).
Let's say we have two files with the following contents:
file1.txt:
    Once upon a time, there was a girl named Persephone.
    She had black hair.
    She loved her mother more than anything.
    She liked to sit outside in the sunshine with her cat, Daisy.
    She dreamed of being a painter when she grew up.
file2.txt:
    Once upon a time, there was a girl named Persephone.
    She had red hair.
    She loved chocolate chip cookies more than anything.
    She liked to sit outside in the sunshine with her cat, Daisy.
    She would look up into the clouds and dream of being a world-famous baker.
We can run the following command to analyze the two files with diff and produce a script to create a file identical to file2.txt from the contents of file1.txt:
diff -e file1.txt file2.txt
...and the output will look like this:
    5c
    She would look up into the clouds and dream of being a world-famous baker.
    .
    2,3c
    She had red hair.
    She loved chocolate chip cookies more than anything.
    .
Notice that the changes are listed in reverse order: the changes closer to the end of the file are listed first, and changes closer to the beginning of the file are listed last. This order is to preserve line numbering; if we made the changes at the beginning of the file first, that might change the line numbers later in the file. So the script starts at the end, and works backwards.
Here, the script is telling the editing program: "change line 5 to (the following line), and change lines 2 through 3 to (the following two lines)."
Next, we should save the script to a file. We can redirect the diff output to a file using the > operator, like this:
diff -e file1.txt file2.txt > my-ed-script.txt
This command will not display anything on the screen (unless there is an error); instead, the output is redirected to the file my-ed-script.txt. If my-ed-script.txt doesn't exist, it will be created; if it exists already, it will be overwritten.
If we now check the contents of my-ed-script.txt with the cat command...
cat my-ed-script.txt
...we will see the same script we saw displayed above.
There's still one thing missing, though: we need the script to tell ed to actually write the file. All that's missing from the script is the w command, which will write the changes. We can add this to our script by echoing the letter "w" and using the >> operator to add it to our file. (The >> operator is similar to the > operator. It redirects output to a file, but instead of overwriting the destination file, it appends to the end of the file.) The command looks like this:
echo "w" >> my-ed-script.txt
Now, we can check to see that our script has changed by running the cat command again:
cat my-ed-script.txt
    5c
    She would look up into the clouds and dream of being a world-famous baker.
    .
    2,3c
    She had red hair.
    She loved chocolate chip cookies more than anything.
    .
    w
Now our script, when issued to ed, will make the changes and write the changes to disk.
So how do we get ed to do this?
We can issue this script to ed with the following command, telling it to overwrite our original file. The dash ("-") is the traditional spelling of the -s option, which tells ed to suppress its diagnostic output (such as byte counts). ed always reads its commands from standard input, and the < operator directs our script to that input. In essence, the system enters whatever is in our script as input to the editing program. The command looks like this:
ed - file1.txt < my-ed-script.txt
This command displays nothing, but if we look at the contents of our original file...
cat file1.txt
    Once upon a time, there was a girl named Persephone.
    She had red hair.
    She loved chocolate chip cookies more than anything.
    She liked to sit outside in the sunshine with her cat, Daisy.
    She would look up into the clouds and dream of being a world-famous baker.
...we can see that file1.txt now matches file2.txt exactly.
Warning! In this example, ed overwrote the contents of our original file, file1.txt. After running the script, the original text of file1.txt disappears, so make sure you understand what you're doing before running these commands!
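One way to stay on the safe side is to apply the script to a copy and check the result before touching the original. The sketch below does that with a shortened, three-line version of the story. The file names (file1-copy.txt, script.ed), the scratch directory, and the result variable are our own assumptions, and the ed step is skipped if ed is not installed. It uses -s, the modern spelling of the "-" option.

```shell
workdir=$(mktemp -d)

printf '%s\n' \
  'She had black hair.' \
  'She loved her mother more than anything.' \
  'She liked to sit outside.' > "$workdir/file1.txt"

printf '%s\n' \
  'She had red hair.' \
  'She loved chocolate chip cookies more than anything.' \
  'She liked to sit outside.' > "$workdir/file2.txt"

# Work on a copy so the original file1.txt survives.
cp "$workdir/file1.txt" "$workdir/file1-copy.txt"

# Build the ed script and append the "w" (write) command.
diff -e "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/script.ed" || true
echo w >> "$workdir/script.ed"

if command -v ed >/dev/null 2>&1; then
  ed -s "$workdir/file1-copy.txt" < "$workdir/script.ed"
  # cmp -s is silent and succeeds only if the two files are byte-identical.
  if cmp -s "$workdir/file1-copy.txt" "$workdir/file2.txt"; then
    result=identical
  else
    result=different
  fi
else
  result=skipped    # ed is not installed on this system
fi

# Confirm the original still contains its old text.
if grep -q 'black hair' "$workdir/file1.txt"; then original_intact=yes; else original_intact=no; fi

echo "copy vs file2.txt: $result, original intact: $original_intact"
rm -rf "$workdir"
```

Once the copy looks right, it can be moved over the original, or the script can be run against the original as shown above.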
Commonly-Used diff Options
Here are some useful diff options to take note of:
| -b | Ignore any changes which only change the amount of whitespace (such as spaces or tabs). | 
| -w | Ignore whitespace entirely. | 
| -B | Ignore blank lines when calculating differences. | 
| -y | Display output in two columns. | 
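The difference between -b and -w is easy to miss, so here is a small sketch that compares three one-line files and records diff's exit status (0 for "same", 1 for "different") in each case. The helper function status_of, the file names, and the scratch directory are hypothetical names of our own.

```shell
workdir=$(mktemp -d)
printf 'hello world\n'    > "$workdir/one-space.txt"
printf 'hello    world\n' > "$workdir/many-spaces.txt"
printf 'helloworld\n'     > "$workdir/no-space.txt"

# Hypothetical helper: print diff's exit status for the given arguments.
status_of() {
  if diff "$@" >/dev/null; then echo 0; else echo $?; fi
}

# Without options, a change in spacing counts as a difference.
plain=$(status_of "$workdir/one-space.txt" "$workdir/many-spaces.txt")
# -b ignores a change in the amount of whitespace...
b_amount=$(status_of -b "$workdir/one-space.txt" "$workdir/many-spaces.txt")
# ...but not whitespace that disappears completely.
b_removed=$(status_of -b "$workdir/one-space.txt" "$workdir/no-space.txt")
# -w ignores whitespace entirely.
w_removed=$(status_of -w "$workdir/one-space.txt" "$workdir/no-space.txt")

echo "$plain $b_amount $b_removed $w_removed"    # prints 1 0 1 0
rm -rf "$workdir"
```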
These are only some of the most commonly-used diff options. What follows is a complete list of diff options and their function.
diff syntax
diff [OPTION]... FILES
Options
| --normal | Output a "normal" diff, which is the default. |
| -q, --brief | Report only whether the files differ, without showing the differences. If there are no differences, output nothing. |
| -s, --report-identical-files | Report when two files are the same. |
| -c, -C NUM, --context[=NUM] | Provide NUM (default 3) lines of context. |
| -u, -U NUM, --unified[=NUM] | Provide NUM (default 3) lines of unified context. |
| -e, --ed | Output an ed script. |
| -n, --rcs | Output an RCS-format diff. |
| -y, --side-by-side | Format output in two columns. |
| -W, --width=NUM | Output at most NUM (default 130) print columns. |
| --left-column | Output only the left column of common lines. |
| --suppress-common-lines | Do not output lines common between the two files. |
| -p, --show-c-function | For files that contain C code, show which C function each change is in. |
| -F, --show-function-line=RE | Show the most recent line matching regular expression RE. |
| --label LABEL | When displaying output, use the label LABEL instead of the file name. This option can be issued more than once for multiple labels. |
| -t, --expand-tabs | Expand tabs to spaces in output. |
| -T, --initial-tab | Make tabs line up by prepending a tab if necessary. |
| --tabsize=NUM | Define a tab stop as NUM (default 8) columns. |
| --suppress-blank-empty | Suppress spaces or tabs before empty output lines. |
| -l, --paginate | Pass output through pr to paginate. |
| -r, --recursive | Recursively compare any subdirectories found. |
| -N, --new-file | If a specified file does not exist, perform the diff as if it is an empty file. |
| --unidirectional-new-file | Same as -N, but only applies to the first file. |
| --ignore-file-name-case | Ignore case when comparing file names. |
| --no-ignore-file-name-case | Consider case when comparing file names. |
| -x, --exclude=PAT | Exclude files that match file name pattern PAT. |
| -X, --exclude-from=FILE | Exclude files that match any file name pattern in file FILE. |
| -S, --starting-file=FILE | Start with file FILE when comparing directories. |
| --from-file=FILE1 | Compare FILE1 to all operands; FILE1 can be a directory. |
| --to-file=FILE2 | Compare all operands to FILE2; FILE2 can be a directory. |
| -i, --ignore-case | Ignore case differences in file contents. |
| -E, --ignore-tab-expansion | Ignore changes due to tab expansion. |
| -b, --ignore-space-change | Ignore changes in the amount of white space. |
| -w, --ignore-all-space | Ignore all white space. |
| -B, --ignore-blank-lines | Ignore changes whose lines are all blank. |
| -I, --ignore-matching-lines=RE | Ignore changes whose lines all match regular expression RE. |
| -a, --text | Treat all files as text. |
| --strip-trailing-cr | Strip trailing carriage return on input. |
| -D, --ifdef=NAME | Output merged file with "#ifdef NAME" diffs. |
| --GTYPE-group-format=GFMT | Format GTYPE input groups with GFMT. |
| --line-format=LFMT | Format all input lines with LFMT. |
| --LTYPE-line-format=LFMT | Format LTYPE input lines with LFMT. |
These format options provide fine-grained control over the output of diff, generalizing -D/--ifdef. LTYPE is old, new, or unchanged. GTYPE can be any of the LTYPE values, or the value changed.
GFMT (but not LFMT) may contain:
| %[-][WIDTH][.[PREC]]{doxX}LETTER | printf-style spec for LETTER | 
LETTERs are as follows for the new group; use the lower-case letter for the old group:
| F | First line number. | 
| L | Last line number. | 
| N | Number of lines = L - F + 1. | 
| E | F - 1 | 
| M | L + 1 | 
| %(A=B?T:E) | If A equals B then T else E. | 
LFMT (only) may contain:
| %L | Contents of line. | 
| %l | Contents of line, excluding any trailing newline. | 
| %[-][WIDTH][.[PREC]]{doxX}n | printf-style spec for input line number. | 
Both GFMT and LFMT may contain:
| %% | A literal %. | 
| %c'C' | The single character C. | 
| %c'\OOO' | The character with octal code OOO. | 
| C | The character C (other characters represent themselves). | 
| -d, --minimal | Try hard to find a smaller set of changes. | 
| --horizon-lines=NUM | Keep NUM lines of the common prefix and suffix. | 
| --speed-large-files | Assume large files and many scattered small changes. | 
| --help | Display a help message and exit. | 
| -v, --version | Output version information and exit. | 
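The line-format options are easier to understand with an example. The sketch below uses the three LTYPE variants (old, new, and unchanged) to label every line of the fruit files from earlier. The "DEL" and "ADD" markers, the scratch directory, and the variable names are our own choices, and these options are specific to GNU diff.

```shell
workdir=$(mktemp -d)
printf '%s\n' apples oranges kiwis carrots > "$workdir/file1.txt"
printf '%s\n' apples kiwis carrots grapefruits > "$workdir/file2.txt"

# %L stands for the contents of the line, including its trailing newline.
annotated=$(diff \
  --old-line-format='DEL %L' \
  --new-line-format='ADD %L' \
  --unchanged-line-format='    %L' \
  "$workdir/file1.txt" "$workdir/file2.txt" || true)

printf '%s\n' "$annotated"
rm -rf "$workdir"
```

With these files, the output should list all five distinct lines once, with "DEL oranges" and "ADD grapefruits" marked and the unchanged lines indented by four spaces.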
FILES takes the form "FILE1 FILE2" or "DIR1 DIR2" or "DIR FILE..." or "FILE... DIR".
If the --from-file or --to-file options are given, there are no restrictions on FILE(s). If a FILE is a dash ("-"), diff reads from standard input.
Exit status is either 0 if inputs are the same, 1 if different, or 2 if diff encounters any trouble.
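The exit status is what makes diff useful in scripts. Here is a minimal sketch of a hypothetical helper function, compare, that turns the three possible statuses into words. The function name, file names, and scratch directory are assumptions of ours for illustration.

```shell
workdir=$(mktemp -d)
printf 'same\n'  > "$workdir/a.txt"
printf 'same\n'  > "$workdir/b.txt"
printf 'other\n' > "$workdir/c.txt"

# Hypothetical helper: describe how two files compare, based on diff's exit status.
compare() {
  status=0
  diff -q "$1" "$2" >/dev/null 2>&1 || status=$?
  case $status in
    0) echo "identical" ;;
    1) echo "different" ;;
    *) echo "trouble" ;;
  esac
}

same_result=$(compare "$workdir/a.txt" "$workdir/b.txt")
diff_result=$(compare "$workdir/a.txt" "$workdir/c.txt")
missing_result=$(compare "$workdir/a.txt" "$workdir/no-such-file.txt")

echo "$same_result $diff_result $missing_result"    # prints identical different trouble
rm -rf "$workdir"
```

Capturing the status with "|| status=$?" rather than testing diff directly lets the script tell "different" (1) apart from "trouble" (2), such as a missing file.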
diff examples
Here's an example of using diff to examine the differences between two files side by side using the -y option, given the following input files:
file1.txt:
    apples
    oranges
    kiwis
    carrots
file2.txt:
    apples
    kiwis
    carrots
    grapefruits
diff -y file1.txt file2.txt
Output:
apples            apples
oranges         <
kiwis             kiwis
carrots           carrots
                > grapefruits
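For long files, the side-by-side view can be trimmed down to just the lines that differ. The sketch below combines -y with --suppress-common-lines and a narrower width (-W 40), all of which appear in the options table above. The width value, scratch directory, and variable names are our own choices.

```shell
workdir=$(mktemp -d)
printf '%s\n' apples oranges kiwis carrots > "$workdir/file1.txt"
printf '%s\n' apples kiwis carrots grapefruits > "$workdir/file2.txt"

# -W 40 limits the output to 40 columns;
# --suppress-common-lines hides the lines both files share.
side_by_side=$(diff -y -W 40 --suppress-common-lines \
  "$workdir/file1.txt" "$workdir/file2.txt" || true)

printf '%s\n' "$side_by_side"

# Count the non-empty lines that remain.
changed_lines=$(printf '%s\n' "$side_by_side" | grep -c .)
echo "$changed_lines"    # prints 2

rm -rf "$workdir"
```

Only the "oranges" line (marked with "<") and the "grapefruits" line (marked with ">") should remain.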
And as promised, here is an example of using diff to compare two directories:
diff dir1 dir2
Output:
    Only in dir1: tab2.gif
    Only in dir1: tab3.gif
    Only in dir1: tab4.gif
    Only in dir1: tape.htm
    Only in dir1: tbernoul.htm
    Only in dir1: tconner.htm
    Only in dir1: tempbus.psd
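To experiment with directory comparison without risking real data, the following sketch builds two small directories and compares them with -r (recursive) and -q (brief). The directory layout, file names, and variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/dir1" "$workdir/dir2"

printf 'one\n' > "$workdir/dir1/only-here.txt"    # exists in dir1 only
printf 'old\n' > "$workdir/dir1/shared.txt"       # exists in both, contents differ
printf 'new\n' > "$workdir/dir2/shared.txt"

# -r descends into subdirectories; -q reports only whether files differ.
report=$(diff -rq "$workdir/dir1" "$workdir/dir2" || true)
printf '%s\n' "$report"

rm -rf "$workdir"
```

The report should contain one "Only in ..." line for only-here.txt and one "Files ... differ" line for shared.txt. The exact paths depend on where mktemp creates the scratch directory.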
 