Chapter 4 Tracking Changes

Make sure that you are still in the test_one directory. If you removed the repository, simply create a new one.

4.1 Including a file in a repository

Now create a file called file_one.txt that contains some notes about machine learning. I’m going to use Atom as my editor, but feel free to use any other text editor. Note that this does not have to be the core.editor you set globally earlier.

> atom file_one.txt

Now that the editor is open, type the text below into file_one.txt:

“According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed.”

After saving, the file contains this single line. We can confirm that the file exists by listing the contents of the directory in the Git Shell:

> ls
# file_one.txt

To view the contents of the file use the following command:

> cat file_one.txt
# According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed.
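If you would rather not open an editor at all, the same repository and file can be created straight from the command line. The sketch below assumes a Bash-like shell such as the Git Shell, and it works in a throwaway directory created with mktemp rather than in your own test_one folder:

```shell
set -eu
# Hypothetical scratch location, so your own test_one directory is untouched
cd "$(mktemp -d)"
mkdir test_one
cd test_one
git init --quiet

# Write the single line of notes without opening an editor
echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt

ls                 # prints: file_one.txt
cat file_one.txt   # prints the sentence written above
```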

Now if we check the status of our project, Git will inform us that it’s noticed the new file:

> git status
# On branch master
# 
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#     file_one.txt
# nothing added to commit but untracked files present (use "git add" to track)

The “untracked files” message means that there’s a file in the directory that Git isn’t keeping track of. We can tell Git to track a file using git add:

> git add file_one.txt

and then check that the right thing happened:

> git status
# On branch master
# 
# Initial commit
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
# 
#   new file:   file_one.txt
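A more compact way to watch a file move from untracked to staged is git status --short, which prints a two-character status code in front of each path. The sketch below again uses a throwaway repository and assumes a Bash-like shell:

```shell
set -eu
# Throwaway repository for illustration only
cd "$(mktemp -d)"
git init --quiet
echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt

git status --short   # prints: ?? file_one.txt   (untracked)
git add file_one.txt
git status --short   # prints: A  file_one.txt   (new file, staged)
```

Here ?? marks an untracked file, while A in the first column marks a new file that has been added to the staging area.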

Git now knows that it’s supposed to keep track of file_one.txt, but it hasn’t recorded these changes as a commit yet. To get it to do that, we need to run one more command:

> git commit -m "Start notes on file_one as a base"
# [master (root-commit) 339dcbf] Start notes on file_one as a base
#  1 file changed, 1 insertion(+)
#  create mode 100644 file_one.txt

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision), and its short identifier here is 339dcbf (your commit will have a different identifier).

We use the -m flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we run git commit without the -m option, Git will open whatever editor we configured as core.editor so that we can write a longer message.

Good commit messages start with a brief summary (fewer than 50 characters) of the changes made in the commit. If you want to go into more detail, add a blank line between the summary line and your additional notes.
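One way to write such a summary-plus-detail message without leaving the command line is to pass -m more than once: Git joins the values as separate paragraphs, so the first becomes the summary line and the second becomes the body. A minimal sketch in a throwaway repository, where the user name, email address, and body text are hypothetical placeholders:

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
# Hypothetical identity, stored only in this throwaway repository's config
git config user.name "Example User"
git config user.email "user@example.com"

echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt
git add file_one.txt

# First -m: summary line (under 50 characters); second -m: body paragraph.
# Git separates the two paragraphs with a blank line.
git commit --quiet \
  -m "Start notes on file_one as a base" \
  -m "Quote Arthur Samuel's 1959 definition as a reference point."

git log -1 --format=%B   # full message: summary, blank line, body
```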

Now if we run git status it gives us some useful information:

> git status
# On branch master
# nothing to commit, working tree clean

This essentially tells us that everything is up to date. If we want to know what we’ve done recently, we can ask Git to show us the project’s history using git log:

> git log

In my case this provides:

# commit 339dcbf7860f9f55fe306a1e47c07aa5fd47ccf1
# Author: Kevin Kotze <kevinkotze@gmail.com>
# Date:   Wed Jun 14 20:38:00 2017 +0200
#
#     Start notes on file_one as a base

git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
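To see the relationship between the short and full identifiers for yourself, git log --oneline condenses each commit to its abbreviated identifier and summary line, and git rev-parse prints the identifier of a given commit. A sketch under the same throwaway-repository assumptions, with a hypothetical identity:

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
# Hypothetical identity for this throwaway repository only
git config user.name "Example User"
git config user.email "user@example.com"
echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt
git add file_one.txt
git commit --quiet -m "Start notes on file_one as a base"

git log --oneline                    # <short id> Start notes on file_one as a base
full=$(git rev-parse HEAD)           # full identifier (40 hex digits with SHA-1)
short=$(git rev-parse --short HEAD)  # abbreviated identifier, as printed by git commit
echo "$full begins with $short"
```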

4.2 Making and reviewing changes

To make a few changes to the file, I’ll open it in Atom once again:

> atom file_one.txt

After I’ve saved my change, I can take a look at the new text using the command:

> cat file_one.txt
# According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed. 
# Today, this area of research largely considers the construction of algorithms that learn from and make predictions about the underlying data.

Note that I’ve added a second line. If we check the status now, Git tells us that a file it already tracks has been modified:

> git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
# 
#         modified:   file_one.txt
# 
# no changes added to commit (use "git add" and/or "git commit -a")

The last line is the key phrase, as it tells us that there are “no changes added to commit”. In other words, we have changed this file, but we haven’t told Git that we want to save those changes (which we do with git add), nor have we saved them (which we do with git commit). Before doing either, it is good practice to review our changes, which we do using git diff. Run without arguments, git diff shows the differences between the current state of the file and the version Git has staged, which here is simply the most recently committed version:

> git diff
# diff --git a/file_one.txt b/file_one.txt
# index 4660794..3d2d2c7 100644
# --- a/file_one.txt
# +++ b/file_one.txt
# @@ -1 +1,2 @@
#  According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed.
# +Today, this area of research largely considers the construction of algorithms that learn from and make predictions about the underlying data.

The output looks cryptic because it encodes the information needed to reconstruct one file from the other. If we break it down into pieces:

  1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
  2. The second line tells exactly which versions of the file Git is comparing, where 4660794 and 3d2d2c7 are unique computer-generated labels for those versions.
  3. The third and fourth lines once again show the name of the file being changed.
  4. The remaining lines are the most interesting: the line starting with @@ gives the range of lines affected in the old and new versions, and the lines below it show the actual differences. In particular, the + marker in the first column shows where we added a line.
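To reproduce this kind of diff, the sketch below appends the second line with echo instead of an editor and then asks for the patch, together with git diff --numstat, which summarises the number of added and deleted lines per file. As before, the repository, identity, and scratch directory are assumptions for illustration:

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
# Hypothetical identity for this throwaway repository only
git config user.name "Example User"
git config user.email "user@example.com"
echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt
git add file_one.txt
git commit --quiet -m "Start notes on file_one as a base"

# Append the second line instead of editing the file in Atom
echo "Today, this area of research largely considers the construction of algorithms that learn from and make predictions about the underlying data." >> file_one.txt

git diff             # full patch; the new line starts with +
git diff --numstat   # tab-separated: lines added, lines deleted, path
```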

After reviewing our change, we first need to add it with git add before we can commit it. We can then check the status of the repository:

> git add file_one.txt
> git commit -m "Add a modern interpretation"
> git status
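The complete edit, review, and commit cycle from this section can be scripted as follows. It is a sketch that assumes a Bash-like shell, a throwaway repository, and a hypothetical identity; once the change has been added, git diff --staged (a synonym for git diff --cached) shows what is about to be committed:

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
# Hypothetical identity for this throwaway repository only
git config user.name "Example User"
git config user.email "user@example.com"
echo "According to Arthur Samuel (1959), machine learning gives computers the ability to learn without being explicitly programmed." > file_one.txt
git add file_one.txt
git commit --quiet -m "Start notes on file_one as a base"

echo "Today, this area of research largely considers the construction of algorithms that learn from and make predictions about the underlying data." >> file_one.txt
git diff                 # review the unstaged change
git add file_one.txt
git diff --staged        # the same change, now staged
git commit --quiet -m "Add a modern interpretation"

git status --short       # prints nothing: the working tree is clean
git log --oneline        # two commits, newest first
```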

4.3 Staging areas

Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture them in logical portions rather than only in large batches. For example, suppose that we have completed some data cleaning and want to commit those changes, but we don’t want to commit the work we’re doing on the visualisation (which we haven’t finished yet). To allow for this, Git has a special staging area where it keeps track of changes that have been added to the current change set but not yet committed.

If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot and makes a permanent record of it (as a commit). If you don’t have anything staged when you type git commit, Git will suggest using git commit -a (or git commit --all), which stages every modified or deleted tracked file and commits it in one step. This is a bit like gathering everyone for the picture! However, it’s almost always better to add things to the staging area explicitly, because with -a you might commit changes you forgot you made. If you make liberal use of git commit --all, you might find yourself searching for “git undo commit” more than you would like!
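The data-cleaning example above can be sketched with two hypothetical note files, cleaning.txt and plotting.txt. Both are modified, but only the finished cleaning work is staged and committed, so the plotting change stays in the working directory. As with the earlier sketches, this assumes a Bash-like shell and uses a throwaway repository with a made-up identity:

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
# Hypothetical identity for this throwaway repository only
git config user.name "Example User"
git config user.email "user@example.com"

# Two hypothetical note files, committed once so Git tracks both
echo "Cleaning notes" > cleaning.txt
echo "Plotting notes" > plotting.txt
git add cleaning.txt plotting.txt
git commit --quiet -m "Add cleaning and plotting notes"

# Change both files; only the cleaning work is finished
echo "Rows with missing values are dropped." >> cleaning.txt
echo "TODO: choose a chart type" >> plotting.txt

git add cleaning.txt                # stage only the finished change
git diff --staged --name-only       # prints: cleaning.txt
git commit --quiet -m "Describe missing-value handling"

git status --short                  # prints: " M plotting.txt" (modified, not staged)
```

Running git commit -a instead would have swept the unfinished plotting change into the same commit, since it stages every modified tracked file.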

The Git Staging Area