3 Command Line and Git
3.1 Command Line
The command line is a direct way to type instructions for your computer to immediately execute. Comfort with some basics can help you more easily use R, Git, and any other language or software.
- If you are a Windows user, open Git Bash. There are other options, but practicing in Git Bash (or your Git application of choice) will help you learn Git later on.
- If you are a Mac user, open Terminal.
- If you are a Linux user, open Terminal or your system’s shell application.
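The practice exercises below rely on a handful of standard shell commands. As a sketch, this block runs them inside a throwaway directory created by mktemp, so none of your own files are touched; the folder and file names (sandbox, a, b, note.txt) are placeholders, not part of the exercises.

```shell
# Work in a throwaway directory so nothing in your own folders changes
WORK=$(mktemp -d)
cd "$WORK"

pwd                                  # print the working directory
mkdir -p sandbox/a sandbox/b         # make two folders (-p also creates the parent)
cd sandbox                           # move into a folder
ls                                   # list its contents (the folders a and b)
echo 'Hello world!' > a/note.txt     # create a file containing one line of text
cp a/note.txt b/note-copy.txt        # copy a file under a new name
mv a/note.txt b/                     # move a file into another folder
rm b/note.txt                        # delete a file (there is no trash bin)
cd ..                                # move up one level
ls sandbox/b                         # only note-copy.txt remains
```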
3.1.2 Vim
Vim is a text editor accessible through the command line. It is often the default text editor in Git Bash on Windows, Terminal on macOS, and Linux shells. There are other options (Nano, Emacs, Notepad++), and you are free to choose the one that works best for you. I will highlight the basics of Vim as it is the most universal option.
Open a file in Vim. Vim starts in normal mode, where keystrokes are commands rather than text. If the file does not exist, Vim opens an empty buffer and creates the file when you save it.
vim file.txt
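To edit the file, type i to enter insert mode and begin editing. To exit without saving, press Esc and type :q! followed by Enter (a plain :q refuses to quit when there are unsaved changes). To exit with saving (write and quit), press Esc and type :wq followed by Enter.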
3.1.3 Naming Conventions
Only use letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-) in your file and folder names, plus the period before a file extension. In particular, avoid spaces. This will make navigating your file system from the command line much smoother. It is also good practice to have a consistent naming system.
| Convention | Example |
|---|---|
| Dash case | my-file.txt |
| CamelCase | myFile.txt, MyFile.txt |
| Snake case | my_file.txt |
| Flat case | myfile.txt |
| UPPERCASE | MYFILE.txt |
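To see why spaces cause friction, the sketch below (with hypothetical folder names, in a temporary directory) creates folders from the command line. An unquoted space splits one name into two separate arguments, so a name containing a space has to be quoted or escaped every time you type it.

```shell
WORK=$(mktemp -d)
cd "$WORK"

mkdir my folder          # creates TWO folders, "my" and "folder"
mkdir "my folder"        # quoting creates one folder whose name contains a space
mkdir my-folder          # dash case: no quoting needed

cd "my folder" && cd ..  # every later use of the name must be quoted...
cd my\ folder && cd ..   # ...or the space escaped with a backslash
cd my-folder && cd ..    # while the dash-case name just works
ls
```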
3.1.4 Practice Exercises
- Open the command line on your computer. Print your working directory and list the files there.
- Navigate to where you want to have a folder with the work for this class (e.g., Documents).
- Create a folder called temp. Navigate inside the temp folder.
- With one line, create two folders inside the temp folder called temp1 and temp2.
- Create a file called test.txt and save it to temp/temp1. Challenge yourself by creating this file from the command line using Vim or another editor. Type "Hello world!" and save the file.
- Move test.txt from temp1 to temp2.
- Make a copy of test.txt called test-copy.txt. Save it in temp1.
- Delete the file test.txt.
- Confirm you are in the path from the second exercise. Now create the folder in which you will save all your class material. Call it whatever you would like.
3.2 Git
Git is a distributed version control system. It documents the complete history of a project, including an archive of changes and previous versions of files. Economists commonly use Git across sectors.
Git has several advantages over other formal and casual methods of version control. It is a free tool that stores project files together with a record of all changes, who made the changes, and why those changes were made. Each change to a file is documented, and once it is entered in the database, it cannot be altered without Git noticing. The disadvantage is that it takes some time and practice to learn how to use Git. This handout describes Git and introduces the essential Git commands.
3.2.1 Setting Up Git
3.2.1.1 Installation
Git must be installed on your local machine, i.e., your personal computer. This section explains the most flexible way to use Git, which is through the command line (a direct place for you to type instructions to your computer). There are many GUIs (graphical user interfaces) available as well. Once you understand how to use Git on the command line, you should be able to learn any GUIs or other Git-related tools easily.
Mac computers should already have Git installed. You will access Git through the Terminal application (this application is how Mac users access the command line). Open Terminal and make sure you have Git installed by typing the command below. It is fine if your version is different from mine.
git --version
git version 2.39.5 (Apple Git-154)
If it is not installed, you will get instructions on how to install it.
Machines with Windows operating systems require more setup. Git Bash allows you to use Git in the command line the same way a Mac or Linux user would. The below steps are copied from Matoso (2019).
- Download the Git installer.
- Execute the file with the default options in the “Select Components” page.
- The option “Windows Explorer integration > Context menu entries” makes it possible to right click on a folder and open Git Bash to that folder’s location.
- In “Adjusting your path environment” select “Use Git Bash only.”
- Select “Checkout as-is, commit Unix-style line endings.” This helps prevent compatibility issues due to different line endings with files from other operating systems.
- Click “Next” and then “Finish.”
You can also use another program like PowerShell or Command Prompt. Git Bash seems to be the preferred application.
See page 17 of Chacon and Straub (2020) for installation instructions for RPM-based and Debian-based distributions (Linux).
3.2.1.2 Configuration
Once Git is installed, you can configure several options with the config command. At a minimum, you will need to set your user name and email.
git config --global user.name "Your Name"
git config --global user.email "youremail@example.com"
The --global option means that this configuration applies to every repository on your computer, regardless of the project. You can always check your values.
git config user.name
git config user.email
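Settings made without --global are stored in a single repository and take precedence over the global ones there. A minimal sketch, assuming a throwaway repository in a temporary folder so your real global settings are untouched; the name and email are placeholders.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q project
cd project

# Without --global, values go into project/.git/config and
# override the global values, but only inside this repository
git config user.name "Project Persona"
git config user.email "persona@example.com"

git config user.name                  # prints: Project Persona
git config --show-origin user.name    # also shows which file the value came from
```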
GitHub no longer accepts your account password when you use Git from the command line over HTTPS. Instead, you need a Personal Access Token (PAT), which you enter whenever Git asks for your password.
- Navigate to the settings of your GitHub account.
- On the left-hand side menu, scroll all the way to the bottom and click on “Developer Settings.”
- Click on “Personal access tokens.”
- Click on “Generate new token.” You may have to sign in with your GitHub credentials.
- Give the token a name, set the expiration for the appropriate amount of time, and select all scopes. Many of these are applicable to repositories that publish packages or software, but it is easier just to select all of them.
- Click on “Generate token.”
- Your token will appear on the screen. Copy it into a location where you can easily access it. This can be some note-taking software or a text file on your computer. Once you exit this screen, you will not be able to see this token again.
3.2.2 Characteristics of Git
- Instead of storing the changes to files across versions, Git stores snapshots of the project over time. The figures below from Chacon and Straub (2020) visualize what saving changes would look like and how Git stores data. Effectively, a picture of the whole project is taken each time you “save” your project; files that have not changed are stored as a link to the previous identical copy rather than duplicated.
- All the project files and the project’s history are stored locally on your computer. This minimizes reliance on connecting to an external server, aiding speed and stability. It also allows you to keep track of changes even without an internet connection.
- Once “saved,” a snapshot cannot be altered without Git noticing. This gives Git integrity; you cannot accidentally corrupt or lose information without a warning or error message. Everything in a Git database is checksummed with a SHA-1 hash, a 40-character string of hexadecimal digits (0-9 and a-f) computed from the contents. Here is an example SHA-1 hash: 6c658d1e96acb313eed5e9d13d723275b6479d04.
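You can watch the checksumming at work with git hash-object, which prints the ID Git would assign to a piece of content without storing it. In this sketch (the text strings are arbitrary, and the repository is assumed to use Git's default SHA-1 object format), identical content always gets the identical hash, while changing a single character produces a completely different one.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q demo
cd demo

# Compute (but do not store) the object ID for some content
h1=$(echo 'Hello world!' | git hash-object --stdin)
h2=$(echo 'Hello world!' | git hash-object --stdin)
h3=$(echo 'Hello world?' | git hash-object --stdin)

echo "$h1"   # 40 hexadecimal characters
echo "$h2"   # identical: same content, same hash
echo "$h3"   # one character changed, completely different hash
```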
3.2.3 Repositories
A repository is a folder whose contents are under Git’s version control. You can turn any folder on your computer into a Git repository. See page 245 of Chacon and Straub (2020) for instructions on this. In this class, we will talk about how to set up a Git repository on an external server (in our case, GitHub) and clone it on your local machine. Using a website like GitHub as a server for your Git-controlled repository has its advantages. The ability to control who can access your repository helps streamline collaborations. There are many interactive features on the website to help keep track of issues and changes for the project. Even without collaborators, using GitHub provides a natural backup for your own work.
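For reference, turning a local folder into a repository takes a single command, git init. A sketch with a placeholder folder name, created in a temporary directory:

```shell
WORK=$(mktemp -d)
cd "$WORK"
mkdir my-project
cd my-project

git init        # turn the current folder into a Git repository
ls -a           # the new hidden .git folder will hold the version history
git status      # nothing is tracked yet
```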
Once the repository is set up on GitHub, you need to clone it to your local machine. Navigate to the directory where you want to store the repository and clone it there.[1]
git clone https://github.com/aziff/example.git
This copies the entire repository, including its full history of changes, to your local machine. A hidden directory, .git, stores all the version control information; its presence is what makes a folder a repository tracked by Git. To clone a private repository (and to push to any repository), you will usually need to enter your PAT when Git asks for a password.
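You can rehearse cloning without a GitHub account or a PAT. In this sketch, a bare repository (one that holds only the history, with no working files) in a temporary folder stands in for the repository on GitHub; the names remote.git and example are placeholders.

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the repository on GitHub
git init -q --bare remote.git

# Clone it; with GitHub you would pass the https://github.com/... URL instead
git clone remote.git example
cd example
ls -a           # includes the hidden .git directory
git remote -v   # "origin" points back to where you cloned from
```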
3.2.4 Save and Track Changes
Each file in the repository can be tracked or untracked. If a file is tracked, then Git includes it in the project’s snapshots and records changes to it; it is backed up on the server once you push those snapshots. If a file is untracked, then Git does not “know” about it and any changes made to it are not stored.
- Tracked Files. A tracked file can be in three states: unmodified, modified, or staged. If a file is unmodified, then the copy of the file on your local machine is identical to the last snapshot of the file in the repository. Once you make changes to the file, it is modified. The file becomes staged once you mark those modifications for inclusion in the next snapshot; committing then records the staged changes in the repository, and the file is unmodified again. Figure 3 is from Chacon and Straub (2020) and visualizes these three states.
- Untracked Files. There are certain files that should remain untracked. Because Git takes a snapshot of the whole repository, keeping track of very large or complex files is burdensome. Generally speaking, do not track datasets, PDFs, images, auxiliary files, or Microsoft Office files (more details on this below).
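The short form of git status makes these states easy to see with a two-letter code: the left letter describes the staging area and the right letter describes your working copy. A sketch with a placeholder file name and a placeholder identity set only for this throwaway repository:

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q states
cd states
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "first draft" > notes.txt
git status --short    # ?? notes.txt   -> untracked
git add notes.txt
git status --short    # A  notes.txt   -> staged (new file)
git commit -q -m "Add notes"
git status --short    # no output      -> tracked and unmodified
echo "second draft" >> notes.txt
git status --short    # " M notes.txt" -> modified, not yet staged
```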
3.2.5 Basic Git Workflow
Checking the status of your local repository is an important tool to help you navigate the Git workflow.
git status
If you have untracked files, git status lists them as such. To start tracking new files, or to stage the changes to files that are already tracked, you first need to add them.
git add file.txt
If you run git status again, you will now see the file listed under “Changes to be committed.” This means that the file is staged. You can also add all the contents of a folder.
git add dir1
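Here are some shortcuts. The period (.) stands for the current directory, so git add . stages every change in the current directory and all of its subdirectories. The asterisk (*) is a placeholder that matches any sequence of characters, so *.txt matches every file ending in .txt.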
git add .
git add *.txt
Again, you can check the status of the repository to ensure you staged what you wanted. Use the --short (or -s) flag if you want a more condensed output. To see exactly what changed in files that are modified but not yet staged, use git diff.
git status
git status --short
git status -s
git diff
You can modify a file further even once it is staged. You will just need to add the file again to keep track of the additional changes.
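The sketch below (placeholder file and identity, throwaway repository) shows the difference between git diff, which compares your working copy with the staging area, and git diff --staged, which compares the staging area with the last commit. It also shows what happens when you edit a file after staging it.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q staging
cd staging
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

printf 'line 1\n' > file.txt
git add file.txt
git commit -q -m "Add file"

printf 'line 2\n' >> file.txt
git diff               # unstaged change: +line 2
git add file.txt
git diff               # now empty, because everything is staged
git diff --staged      # staged change: +line 2

printf 'line 3\n' >> file.txt   # edit again after staging
git status --short              # MM: staged changes plus newer unstaged ones
git add file.txt                # stage the additional change
git status --short              # "M " : everything is staged again
```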
Once you have staged everything you want, the next step is to commit your changes. Any file that is staged will be included in the commit.
git commit
This will open a text editor (often Vim) with information about the commit, including which files are staged. Type a message describing the changes, then save and quit. When you exit the editor, Git creates the commit with that message; if you leave the message empty, Git aborts the commit. It is often convenient to write the commit message inline with the -m option.
git commit -m "Initial commit."
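To see that a commit contains only what was staged, this sketch creates two files but stages just one of them before committing. File names and identity are placeholders in a throwaway repository.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q commits
cd commits
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "analysis" > analysis.R
echo "scratch"  > scratch.txt
git add analysis.R                 # stage only one of the two files
git commit -m "Initial commit."    # commits exactly what was staged

git show --stat --oneline HEAD     # analysis.R is the only file in the commit
git status --short                 # ?? scratch.txt is still untracked
```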
Already, this commit provides a record of the current version of the repository. This is useful even without saving on GitHub. However, GitHub is useful for collaboration and to back up your local machine. To “save” to GitHub, you will need to push your commits.
git push
# Specifying the remote (origin) and the branch (master; newer repositories often name it main)
git push origin master
This push will be rejected if anyone else on the project has pushed work that you have not yet integrated into your files. It is thus good practice to update your files before changing anything.
git pull
This is good practice even if you are working independently across multiple machines (e.g., you work on your laptop and in the Econ cluster). To review, the workflow should be as follows.
- Update your local repository: git pull
- Make your changes.
- Stage your changes: git add file.txt (or git add . for everything)
- Commit your changes: git commit -m "Commit message."
- Push your changes: git push
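The loop above can be rehearsed on one computer. In this sketch a bare repository stands in for GitHub, and two clones named laptop and cluster (hypothetical names) stand in for two machines or two collaborators. The first push uses -u origin HEAD, which pushes the current branch under the same name and remembers it as the default, so the sketch works whether your default branch is called master or main.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare remote.git            # stand-in for the GitHub repository

# Machine 1 makes the first commit and pushes it
git clone -q remote.git laptop
cd laptop
git config user.name "Example Student"   # placeholder identity
git config user.email "student@example.com"
echo "v1" > results.txt
git add results.txt
git commit -q -m "First results."
git push -q -u origin HEAD               # -u sets origin as this branch's upstream
cd ..

# Machine 2 gets its own copy of the project
git clone -q remote.git cluster

# Machine 1 follows the loop: pull, change, add, commit, push
cd laptop
git pull -q
echo "v2" > results.txt
git add results.txt
git commit -q -m "Update results."
git push -q
cd ..

# Machine 2 only sees the new commit after pulling
cd cluster
git pull -q
cat results.txt                          # v2
```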
The command git log allows you to view the commit history of your repository. See Chacon and Straub (2020) for details on how to format its output. Another option is to view the repository on GitHub’s website.
git log
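A few common ways to format git log output, sketched on a throwaway repository with three placeholder commits. In the --pretty format string, %h, %an, and %s stand for the abbreviated hash, the author name, and the commit subject.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q history
cd history
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

# Three small placeholder commits
for step in one two three; do
  echo "$step" >> log-demo.txt
  git add log-demo.txt
  git commit -q -m "Step $step"
done

git log --oneline                           # one line per commit: short hash and message
git log --pretty=format:'%h %an %s' -n 2    # hash, author, and message of the last two commits
git log --stat                              # also lists the files each commit changed
```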
3.2.6 Deleting and Renaming Tracked Files
To delete tracked files, use git rm. This deletes the file from your working directory and stages its removal. Once you commit this change, the file will no longer be tracked.
git rm results.txt
If the file has been modified or it is already staged, you need to add the force flag, -f.
git rm -f results.txt
To stop tracking a file (remove it from the staging area and from future snapshots) without deleting it from your computer, use the --cached flag. This issue often arises, especially if you forgot to update your .gitignore file (more on this below).
git rm --cached results.txt
If you move or rename a file outside of Git, Git first views this as deleting the original file and creating a new, untracked file with the new name. You can use git mv to do the move and stage it in one step. It is not strictly necessary, but it is a convenience that may be helpful when you want a specific commit message to go with changing a file’s location or name.
git mv original-name.txt new-name.txt
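The three commands side by side, as a sketch in a throwaway repository with placeholder file names: git rm deletes the file and stages the removal, git rm --cached stops tracking while leaving the file on disk, and git mv stages a rename.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q cleanup
cd cleanup
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "old"   > results.txt
echo "draft" > original-name.txt
echo "tmp"   > scratch.txt
git add results.txt original-name.txt scratch.txt
git commit -q -m "Add files"

git rm -q scratch.txt                   # stop tracking AND delete from disk
git rm -q --cached results.txt          # stop tracking but keep the file on disk
git mv original-name.txt new-name.txt   # rename, staged in one step
git status --short                      # shows the rename, both removals, and results.txt as untracked
git commit -q -m "Remove scratch, untrack results, rename draft"
```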
3.2.7 Should all files be tracked?
In short: no. Keeping track of every change for certain types of files is burdensome and can greatly impede your ability to use Git efficiently within a repository. There may also be auxiliary files that you never reference anyway, such as log or output files that are automatically generated. Here is a list of some file types that are generally a good idea to avoid tracking.[2]
- Operating system files: Thumbs.db, .DS_Store
- Application files: .Rhistory, .Rapp.history, .RData
- Data files: .xlsx, .csv, .dta
- Binary files: .pdf, .docx, .pptx, image files
You can create a file inside your repository that tells Git which files it should never track. This file is called .gitignore. The period at the beginning means that it is a hidden file (it will not show up in your file viewer unless you have set your options to view hidden files). To create a .gitignore file, navigate to your repository and open a new file with Vim.
vim .gitignore
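In the .gitignore file, blank lines and lines starting with # are ignored. The asterisk (*) is a placeholder that matches any characters, a trailing slash (/) matches a directory, and a leading exclamation mark (!) makes an exception to an earlier rule.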
# Ignore all Excel files
*.xlsx
# Ignore all files in the directory named Data
Data/
# Track Codebook.xlsx even though Excel files are ignored
!Codebook.xlsx
There are always exceptions to the above. Perhaps for your project, you want to have a codebook in Excel available to anyone who views the code. If the codebook will not change too much, then it is fine to track it. Another example is that you may want to track the images for the figures needed for your paper. If you use .jpg or .png, then you can view these images on GitHub itself. Even with these exceptions, it is good practice to maintain a .gitignore file, inserting the exceptions or adding individual files with the force flag (git add -f) as needed.
When starting a project and setting up a repository, you can reference this list to populate your .gitignore. Checking git status frequently will help you keep the .gitignore updated and useful.
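You can test ignore rules without opening an editor. This sketch writes the example rules from this section into a .gitignore with a here-document, creates some empty placeholder files, and then asks Git which paths it ignores; git check-ignore -v also reports the rule responsible for each match.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q ignore-demo
cd ignore-demo

# Same rules as the example above, written from the command line
cat > .gitignore <<'EOF'
# Ignore all Excel files
*.xlsx
# Ignore all files in the directory named Data
Data/
# Track Codebook.xlsx even though Excel files are ignored
!Codebook.xlsx
EOF

mkdir Data
touch results.xlsx Codebook.xlsx Data/raw.csv analysis.R

git status --short --untracked-files=all       # only .gitignore, Codebook.xlsx, analysis.R
git check-ignore -v results.xlsx Data/raw.csv  # the rule that ignores each path
```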
3.2.8 Practice Exercises
- On GitHub, create your own (private and empty) repository for your assignments. Call it “Assignments-First-Last” with your first and last names.
- Clone this repository to the folder you created for this class.
- From the command line, create a .gitignore file. Make it so that your repository will ignore all .csv and .xlsx files.
- From the command line, create a file called README.md. Write whatever you would like to describe the repository. If in doubt, write: This repository contains my assignments for the Summer 2022 R and workflow minicourse.
- Make it so these changes show up on GitHub (hint: three steps). Use git status for guidance.
- Check that these changes show up on GitHub.
- Navigate to the repository’s settings on GitHub and add me as a collaborator (Settings > Collaborators > Add people). My GitHub username is aziff.
3.3 Troubleshooting Git
Generally, there are few things in Git that cannot be undone. But when you make a mistake or run into trouble, be mindful that some of the fixes are themselves irreversible. When in doubt, make a defensive copy of your repository before troubleshooting.
3.3.1 “I committed before making all my changes, but I haven’t pushed yet.”
You can always make a new commit by staging (git add) and committing (git commit) the other changes you wanted to make. It is also possible to change the original commit. Make your changes and stage them with git add. Then, commit with the --amend option. This results in one commit, with the amended commit completely replacing the original commit. Only do this while the original commit has not been pushed, because amending replaces the commit with a new one.
git commit -m "Initial commit."
git add forgotten-file
# Amend without changing the commit message
git commit --amend --no-edit
# Amend with an updated commit message
git commit --amend -m "Initial commit proofread for typos."
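A sketch of the amend fix in a throwaway repository, with placeholder file names: the first commit has a deliberate typo in its message and is missing a file, and --amend replaces it so the history still contains a single commit.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q amend-demo
cd amend-demo
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "main" > main.R
git add main.R
git commit -q -m "Intial commit."             # deliberate typo, and a file is forgotten

echo "helper" > forgotten-file
git add forgotten-file
git commit -q --amend -m "Initial commit."    # replaces the commit: new message, both files

git log --oneline                             # still a single commit
git ls-tree --name-only HEAD                  # lists both forgotten-file and main.R
```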
3.3.2 “I staged a file by accident, but I haven’t committed yet.”
Suppose you typed git add * and accidentally staged a file you did not want to stage. The command git reset HEAD followed by the file name unstages that file. Your changes to the file are kept in your working directory. Be careful with this command! I advise against using the --hard option, which discards uncommitted changes.
git reset HEAD results.txt
git status
Alternatively, the newer git restore command (available since Git 2.23) unstages the specified file in the same way.
git restore --staged results.txt
git status
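A sketch of the accident and the fix, using placeholder files in a throwaway repository; git restore --staged assumes Git 2.23 or later.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q unstage-demo
cd unstage-demo
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "code" > analysis.R
git add analysis.R
git commit -q -m "Add analysis"

echo "big output" > results.txt
echo "more code" >> analysis.R
git add *                          # oops: this stages results.txt too
git status --short                 # results.txt shows as staged (A)

git restore --staged results.txt   # unstage it; the file itself is untouched
git status --short                 # results.txt is untracked again (??), analysis.R stays staged
cat results.txt                    # big output
```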
3.3.3 “I want to discard my changes to a file since the last commit.”
Suppose you changed some files in the repository and you do not want to keep those changes. In other words, you want to revert some files back to the version in the most recent commit. The command below accomplishes this. This is another potentially dangerous command as it deletes work done locally. Do not use this command unless you are 100% sure you want to delete your changes. Because the changes were never committed, Git cannot recover them.
git checkout -- results.txt
An alternative approach is to use git restore to revert the file back to the version of the last commit. As with git checkout, proceed with caution as local changes will be overwritten.
git restore results.txt
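A sketch of discarding a local edit in a throwaway repository (placeholder file and identity). Once git restore runs, the overwritten text is gone, because Git never stored it.

```shell
WORK=$(mktemp -d)
cd "$WORK"
git init -q discard-demo
cd discard-demo
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "committed version" > results.txt
git add results.txt
git commit -q -m "Add results"

echo "experiment I regret" > results.txt   # overwrite the file locally
git restore results.txt                    # throw the local edit away for good
cat results.txt                            # committed version
```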
3.4 Further Reading
As for any software, there are plentiful resources online, including StackExchange or other forums, as well as AI assistants like ChatGPT for specific troubleshooting. The textbook Chacon and Straub (2020) is excellent if you want to learn more about the details of Git commands and the underlying system. If you want to see the official help documentation for a command, you can access the manual from the command line by typing git help <verb>. For example, this is how you would get the manual on add.
git help add
Sometimes, you just want to check the available options.
git add -h
There are so many capabilities of Git. The above will help you in this course, but they are just the basics. The textbook Chacon and Straub (2020) covers much of what is possible in Git in an accessible format.
3.4.1 References
[1] This method of cloning uses the HTTPS protocol. If you are cloning a repository from GitHub to your local machine, this will be fine. If you want to clone a repository to a server or network share, you may need to use the SSH protocol instead. Follow the directions here for that.