Must know CLI-commands for Data Scientists
Learn the most important Linux CLI commands that help you become a more efficient and productive data scientist!
Intro
CLI stands for Command Line Interface. It is a program that accepts text input to execute operating system functions. Bash is a popular Unix shell to run CLI commands.
- Use
manto view the manual of a command. Example:man manshows the manual for themancommand. Use Space to skip to the next page and :q to quit. - Use
histto show the history of commands used - Use “Up Key” to show the last command
- Use “Tab Key” for auto-completion
- Use
>to redirect the output from screen to a file (e.g.ls > out.txtwrites the output of the ls command into the file out.txt) - Use
|to “pipe” the output from one command as input for the next command (e.g.head -n 10 info.txt | tail -n 2takes the first ten lines of info.txt and then shows the last 2 lines of those 10 lines) - Use
Ctrl + Cto stop a running program
Files & Directories
pwd (print working directory)
pwd prints the absolute path of your current working directory
$ pwd
/home/myuser
ls (list)
ls lists all contents of your current directory (the one displayed by pwd)
$ ls
data info.txt R
ls /homelists all contents of the directory /home (absolute directory)ls datalists all contents of the directory data (relative directory)ls -allists all contents of the current directory, including hidden files and showing additional informationls -Rlists everything underneath a directory
cd (change directory)
Use cd to change directory
$ pwd
/home/myuser
$ cd tmp
$ pwd
/home/myuser/tmp
- Use
cd ..to change to parent directory - Use
cd ./Rto change to directory R (from your current directory) - Use
cd ~to change to your home directory
cp (copy)
cp let you copy a file or directory
$ ls
data info.txt R
$ cp info.txt info2.txt
$ ls
data info.txt info2.txt R
You can copy multiple files into a directory too. In this example, we copy the files info.txt and info2.txt into the data directory.
$ ls
data info.txt info2.txt R
$ cp info.txt info2.txt data
$ ls data
info.txt info2.txt
mv (move)
mv let you move (and rename) a file or directory
To move file info2.txt into data directory:
$ ls
data info.txt info2.txt R
$ mv info2.txt data
$ ls
data info.txt R
To rename file info.txt to info3:
$ ls
data info.txt R
$ mv info.txt info3.txt
$ ls
data info3.txt R
To rename a directory:
$ ls
data info.txt R
$ mv data data2
$ ls
data2 info.txt R
rm (remove)
Use rm to remove (delete) a file or directory
To delete a single file:
$ ls
data info.txt info2.txt info3.txt R
$ rm info3.txt
$ ls
data info.txt info2.txt R
To delete multiple files:
$ ls
data info.txt info2.txt R
$ rm info.txt info2.txt
$ ls
data R
To delete a directory you must use rmdir (to prevent you accidentally deleting an entire directory)
rmdir (remove directory)
Use rmdir to remove a directory
$ ls
data data2 info.txt info2.txt R
$ rmdir data2
$ ls
data info.txt info2.txt R
mkdir (make directory)
Use mkdir to remove a directory
$ ls
data info.txt info2.txt R
$ mkdir data2
$ ls
data data2 info.txt info2.txt R
Wildcards
You can use wildcards to define which files to use.
*is used for “any string”?is used for “any single character”[...]is used for a list of single characters (e.g. [234] accepts the character 2,3, and 4){..., ...}is used for a list of patterns (e.g. {*.txt, *.csv} accepts all files with extensions txt and csv)
View content
cat (concatenate)
Use cat to view the content of a text-file:
$ ls
data info.txt info2.txt R
$ cat info.txt
This is written in info.txt
You can “concatenate” the content of multiple text-files:
$ ls
data info.txt info2.txt R
$ cat info.txt info2.txt
This is written in info.txt
This is written in info2.txt
head (view top lines)
Use head to view the top 10 lines of a text-file:
$ ls
data info.txt info2.txt info3.txt R
$ head info3.txt
Line 1 ...
Line 2 ...
Line 3 ...
Line 4 ...
Line 5 ...
Line 6 ...
Line 7 ...
Line 8 ...
Line 9 ...
Line10 ...
To control the number of lines:
$ head -n 3 info3.txt
Line 1 ...
Line 2 ...
Line 3 ...
tail (view bottom lines)
Use tail to view the bottom 10 lines of a text file. You can control the number of lines too:
$ tail -n 3 info3.txt
Line 18 ...
Line 19 ...
Line 20 ...
cut (view columns)
Use cut to view columns of a text file
$ cat data.csv
1, "Robert", 100
2, "Joe", 200
3, "Jack", 150
$ cut -d , f 2-3 data.csv
"Robert", 100
"Joe", 200
"Jack", 150
- -d = defining the delimiter (in this case we use “,”)
- -f = defining the fields (columns) to view (in this case columns 2-3)
Search content
grep (search text)
Use grep to search for text or patterns in text files
grep Robert data.csvwill print all lines containing the string Robert in data.csvgrep -c Robert data.csvwill search for the string Robert in data.csv and returns the count of matchninggrep -c -i Robert data.csvwill search for the string Robert in data.csv and returns the count of matchning, ignoring upper/lower casegrep -v Robert data.csvwill print all lines NOT containing the string Robert
wc (word count)
Use wc to count newlines, words and bytes
$ cat info.txt
hello world
$ wc info.txt
1 2 12 info.txt
sort & uniq (Sort & Unique)
sort
Use sort to sort lines of a text-file
sort data.csvsorts lines in data.csvsort -r data.csvsorts lines in data.csv and reverse ordersort -b data.csvsorts lines in data.csv and ignore leading blankssort -f data.csvsorts lines in data.csv case insensitivesort -n data.csvsorts lines in data.csv compare according to string numerical value
uniq
Use uniq to remove duplicate lines from a text-file
uniq data.csvremoves duplicate lines from a text-fileuniq -c data.csvremoves duplicate lines from a text-file and print count
Variables
Use echo to print the value of a variable
Environment variables:
echo $USERprints the user-nameecho $HOMEprints the home directoryecho $PWDprints the name of the present working directoryecho $SHELLprints the name of the shell
Create your own variables:
$ myname=Bill
$ echo $myname
Bill
You can use variables in commands:
$ lines=3
$ echo $lines
3
$ head -n $lines data.csv
1, "Robert", 100
2, "Joe", 200
3, "Jack", 150
Loops
$ for filetype in gif txt csv; do echo $filetype; done
gif
txt
csv