Ubuntu for beginners: Part 1

I personally like working on Linux operating systems (OS). I use Ubuntu, which is one of the most popular Linux distributions. See the list of Linux distributions.

This post is specifically about Ubuntu, a Linux operating system that is well suited to beginners.

Let’s start with Ubuntu installation. There are three types of installation: standalone, virtual machine, and dual-boot.

As a prerequisite for installing Ubuntu, you need a bootable flash drive or DVD. I prefer a flash drive.

Basic steps

  • Download the latest version of the Ubuntu desktop OS by clicking on this link. The download is in ISO format.
  • Create a bootable flash drive or DVD from the ISO. If you are using Windows, you can install a program called Universal USB Installer (UUI), which will create the bootable flash drive for you.
  • To learn how to create the bootable flash drive using UUI, follow this article.

Once you have the bootable flash drive or DVD, it’s time to start the installation.

Standalone installation

Virtual machine installation

  • Using Oracle VirtualBox or VMware, you can install Ubuntu and run it on Windows. See this video.

Dual-Boot installation

  • See this link, which is useful when you want to install Ubuntu alongside Windows.

Windows users who don’t want to install Ubuntu but still want to learn or play around with a Linux-like environment should try Cygwin.

Ubuntu Basics

How to open the terminal

  1. Open the Dash by clicking the Ubuntu icon in the upper-left, type “terminal”, and select the Terminal application from the results that appear.
  2. Or press the keyboard shortcut Ctrl + Alt + T.

In the terminal prompt you can see your user name. In my case it is jalaj.

When you open the terminal you start in your home directory. Here ~ stands for your home directory, whose path is /home/yourusername. In my case ~ is the same as /home/jalaj, which is useful to know when reading the basic commands below.

To see which directory you are currently in, use the following command.

$ pwd
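
To make the relationship between ~, the home directory, and pwd concrete, here is a small sketch written as a script (so the `$` prompt is dropped). It assumes a POSIX-style shell such as bash with the HOME variable set; the variable names are only illustrative.

```shell
# Sketch: compare the ~ shortcut, $HOME, and the output of pwd.
tilde_path=~/test        # an unquoted ~ is expanded by the shell
home_path="$HOME/test"   # $HOME holds the same home directory path
quoted_path="~/test"     # inside quotes, ~ is NOT expanded

echo "Unquoted ~/test : $tilde_path"
echo "Using HOME      : $home_path"
echo "Quoted ~/test   : $quoted_path"
echo "Current location: $(pwd)"
```

The first two lines of output should match, while the quoted version stays a literal ~/test. That is why paths containing ~ should be left unquoted in commands.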

When you open the file explorer (Files), the Home icon on the left side shows that you are in your home directory.

When you press Ctrl + L in the file explorer, the location bar shows the full path of the current directory.

Basic Commands:

Make directory or folder

$ mkdir /home/yoursystemusername/test

or, equivalently, using the ~ shortcut:

$ mkdir ~/test
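
As a hedged extension of the commands above, the sketch below shows the -p flag of mkdir, which creates missing parent directories in one go. It runs in a throwaway directory created with mktemp so that nothing in your home directory is touched; the directory names (notes, 2016, missing) are made up for illustration.

```shell
# Work in a throwaway directory (assumes mktemp is available, as on Ubuntu).
work_dir=$(mktemp -d)

# -p creates any missing parent directories and does not complain
# if the directory already exists.
mkdir -p "$work_dir/test/notes/2016"
mkdir -p "$work_dir/test/notes/2016"   # running it again is harmless

# Plain mkdir fails when the parent directory does not exist.
if mkdir "$work_dir/missing/child" 2>/dev/null; then
    plain_mkdir_result="created"
else
    plain_mkdir_result="failed"
fi
echo "mkdir without -p: $plain_mkdir_result"

# Show the directory tree that was created.
ls -R "$work_dir"
```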

Change directory or jump from one location to another

$ cd /home/jalaj/test
$ cd ~/test
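
Besides absolute paths and ~, cd also accepts relative paths, `..` for the parent directory, and `-` for the previous directory. The following sketch, written as a script without the `$` prompt, walks through these in a temporary directory; the directory names are illustrative assumptions.

```shell
# Create a small directory tree in a throwaway location.
work_dir=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$work_dir/test/demo"

cd "$work_dir/test/demo"   # absolute path
cd ..                      # go up one level, into .../test
after_dotdot=$(pwd)

cd demo                    # relative path: demo inside the current directory
after_relative=$(pwd)

cd - >/dev/null            # jump back to the previous directory (.../test)
after_dash=$(pwd)

echo "After cd ..  : $after_dotdot"
echo "After cd demo: $after_relative"
echo "After cd -   : $after_dash"
```

Running cd with no argument at all takes you back to your home directory.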

List directories and files

$ ls  # List the files and directories in the current directory

Command with flags   Description
ls -a                list all files, including hidden files starting with ‘.’
ls -d                list directories themselves, not their contents (use with ‘*/’)
ls -i                list each file’s inode index number
ls -l                list in long format, showing permissions
ls -la               list in long format, including hidden files
ls -lh               list in long format with human-readable file sizes
ls -ls               list in long format with allocated size in blocks
ls -r                list in reverse order
ls -R                list the directory tree recursively
ls -s                list allocated file size in blocks
ls -S                sort by file size
ls -t                sort by modification time
ls -X                sort by extension name

# List directory using the ~ shortcut for your home directory
$ ls ~/test
# List directory using absolute path
$ ls /home/yoursystemusername/test

# List root directory
$ ls / 

# List parent directory
$ ls ..
# List your home directory, i.e. /home/yoursystemusername
$ ls ~ 

# List with long format
$ ls -l 

# Show hidden files
$ ls -a 

# List with long format and show hidden files
$ ls -la 

# Sort by date/time
$ ls -t 

# Sort by file size
$ ls -S 

# List everything in the current directory, including the contents of each sub-directory
$ ls * 

# Recursive directory tree list
$ ls -R 

# List only text files with wildcard
$ ls *.txt 

# List directories only
$ ls -d */ 

# List files and directories with full path (quotes protect paths that contain spaces)
$ ls -d "$PWD"/*

# List in long format (with permissions), sorted by modification time, oldest first
$ ls -ltr 
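
To see how a few of these ls variants differ, the sketch below builds a tiny directory and captures what each command reports. It is written as a script, so the `$` prompt is dropped; the file names (notes.txt, todo.txt, .hidden_config, docs) are made up for illustration.

```shell
# Build a small playground in a throwaway directory.
work_dir=$(cd "$(mktemp -d)" && pwd)
cd "$work_dir"
mkdir docs
touch notes.txt todo.txt .hidden_config

visible=$(ls)            # hidden files (names starting with '.') are left out
all_entries=$(ls -a)     # includes .hidden_config as well as . and ..
text_files=$(ls *.txt)   # the shell expands *.txt before ls runs
only_dirs=$(ls -d */)    # */ matches directories only

echo "ls       :" $visible
echo "ls -a    :" $all_entries
echo "ls *.txt :" $text_files
echo "ls -d */ :" $only_dirs
```

Note that the wildcards are expanded by the shell, not by ls itself, which is why `ls *.txt` reports an error when no file matches.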

Remove directory or files

# delete file
$ rm /home/jalaj/test/test.txt

# Forcefully delete write-protected file
$ rm -f /home/jalaj/test/test.txt

# If you are already in the ~/test directory
$ rm -f ./test.txt # The current directory is referred to as ./

# Remove an empty directory (plain rm refuses to remove a directory)
$ rmdir /home/jalaj/test/demo

# remove directory recursively
$ rm -r /home/jalaj/test

# remove directory recursively and forcefully
$ rm -rf /home/jalaj/test

# Remove all files in the working directory.
# With -i, rm asks for confirmation before deleting each file.
$ rm -i *
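
Because rm deletes permanently and does not use the trash, it is worth practising in a scratch directory first. The following sketch is one possible way to do that, assuming mktemp is available; the file names are illustrative. It also shows that plain rm refuses to remove a directory unless -r is given.

```shell
# Practise in a throwaway directory, never directly on real data.
work_dir=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$work_dir/test/demo"
touch "$work_dir/test/test.txt" "$work_dir/test/demo/old.log"

# Plain rm refuses to remove a directory.
if rm "$work_dir/test/demo" 2>/dev/null; then
    plain_rm_result="removed"
else
    plain_rm_result="refused"
fi
echo "rm on a directory without -r: $plain_rm_result"

# Look before you delete: list exactly what is about to be removed.
ls -R "$work_dir/test"

# -r removes the directory and everything inside it. There is no undo.
rm -r "$work_dir/test"
```

Listing the target with ls before running rm -r is a cheap habit that catches mistyped paths.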

Copy files or directory

$ cp [FLAGS] SOURCE DESTINATION

Command with flags   Description
cp -a                archive mode: copy recursively and preserve file attributes
cp -f                force copy by removing the destination file if needed
cp -i                interactive – ask before overwrite
cp -l                link files instead of copying
cp -L                follow symbolic links
cp -n                no file overwrite
cp -R                recursive copy (including hidden files)
cp -u                update – copy when source is newer than destination
cp -v                verbose – print informative messages

# Copy single file main.c to destination directory bak
$ cp main.c bak

# Copy 2 files main.c and def.h to destination absolute path directory
$ cp main.c def.h /home/jalaj/test/ 

# Copy all C files in current directory to subdirectory bak
$ cp *.c bak

# Copy directory src to absolute path directory /home/jalaj/test/ (-R is required for directories)
$ cp -R src /home/jalaj/test/

# Copy all files and directories in dev recursively to subdirectory bak
$ cp -R dev bak

# Force file copy to directory
$ cp -f test.txt bak

# Interactive prompt before file overwrite
$ cp -i test.c bak
cp: overwrite 'bak/test.c'? y

# Update all files in current directory
# - copy only newer files to destination directory bak
$ cp -u * bak
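
A common beginner surprise is that cp skips directories unless -R is given. The sketch below demonstrates this in a temporary directory; the names src, lib, and copy_of_src are hypothetical and chosen only for the example.

```shell
# Set up a small source tree in a throwaway directory.
work_dir=$(cd "$(mktemp -d)" && pwd)
cd "$work_dir"
mkdir -p src/lib
echo "int main(void) { return 0; }" > src/main.c

# Without -R, cp does not copy a directory and reports an error.
if cp src copy_of_src 2>/dev/null; then
    plain_cp_result="copied"
else
    plain_cp_result="skipped"
fi
echo "cp on a directory without -R: $plain_cp_result"

# With -R, the directory and everything below it is copied.
cp -R src copy_of_src

# Unlike mv, cp leaves the original in place.
ls -R src copy_of_src
```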

Move files or directory 

$ mv [FLAGS] SOURCE DESTINATION

Command with flags   Description
mv -f                force move by overwriting the destination file without a prompt
mv -i                interactive – prompt before overwrite
mv -u                update – move when source is newer than destination
mv -v                verbose – print source and destination files

# Move main.c def.h files to /home/jalaj/test/ directory
$ mv main.c def.h /home/jalaj/test

# Move all C files in current directory to subdirectory bak
$ mv *.c bak

# Move all files in subdirectory bak to current directory
$ mv bak/* . 

# Rename file main.c to main.bak
$ mv main.c main.bak

# Rename directory bak to bak2:
$ mv bak bak2

# Update - move when main.c is newer:
$ mv -u main.c bak

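# Move main.c and print the source and destination (use -i instead to prompt before overwriting bak/main.c)
# Typical output; newer GNU coreutils versions prefix it with "renamed"
$ mv -v main.c bak
'main.c' -> 'bak/main.c'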
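
The same mv command either renames or moves, depending on whether the destination is an existing directory. The sketch below illustrates the difference in a temporary directory; the file names are assumptions made for the example.

```shell
# Set up two files and a backup directory in a throwaway location.
work_dir=$(cd "$(mktemp -d)" && pwd)
cd "$work_dir"
mkdir bak
echo "v1" > main.c
echo "notes" > notes.txt

# Destination does not exist: mv renames the file.
mv main.c main.bak

# Destination is an existing directory: mv moves the file into it.
mv main.bak bak

# Destination is a new name inside a directory: move and rename in one step.
mv notes.txt bak/notes-old.txt

ls -R "$work_dir"
```

Because the outcome depends on what already exists at the destination, it helps to run ls on the destination first, or to use mv -i so that an existing file is never overwritten silently.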

I hope this helps. Part 2 is coming soon!

Introduction to Data Science

Introduction 

Data science has created a lot of buzz over the past few years. Many questions come to mind when we hear the term: Why does this field create so much buzz? What kind of data is needed for data science? What are its important aspects? What are its applications? What techniques are available to solve data science problems? How can anybody get into data science? Let’s go through these questions one by one.

What is Data Science?

Let’s go back to 1990, when the World Wide Web was emerging. Over the last 25 years, people have slowly and gradually adopted this powerful invention and made it better.

Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race [1].

Nowadays the World Wide Web is a major source of data. People all over the world use the web every day, and this usage generates lots and lots of data. According to an EMC report [2], the digital universe held 4.4 zettabytes (ZB) of data in 2013. This includes historical data as well as real-time data. Every day we interact with data, whether through social media, web search, news, blogs, videos, images, documents, etc.

Now we have more than enough data from which to extract knowledge. By analysing the data with proper scientific techniques, we can find hidden patterns or facts that help us answer previously unanswerable questions.

“This scientific way of analysing data or extracting knowledge out of data is called Data science.”

OR

“Data science is all about making sense out of the data or extracting the knowledge from the data using data science techniques.”

What kind of Data is needed for Data science?

There are three kinds of data available on the web:

Structured data

  • This kind of data is highly organised and is stored in tables.
  • A data model explicitly determines the structure of the data.
  • This kind of data has relational keys and is stored in relational databases.
  • Examples: student information database, employee information database, etc.

Semi-structured data

  • Semi-structured data is a form of structured data, but it does not follow the strict tabular structure of relational databases.
  • It contains tags, other markers, or key-value pairs to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as a self-describing structure [3].
  • Examples: XML, JSON, CSV

Unstructured data

  • Most of the data we find on the web does not follow any structure.
  • This kind of data does not fit neatly into traditional relational databases.
  • Examples: satellite images, scientific data, photos, videos, radar data, mobile data, text, web content, social media, etc.

Mostly semi-structured and unstructured data sets are used for solving data science problems. Only a small set of applications uses structured data.

Why does data science field create a lot of buzz?

In the current era we have lots of data, cheap but efficient hardware, and tools and techniques that have emerged in the last few years to answer previously unanswerable questions. These are the factors that create the buzz around data science.

Aspects of the Data Science

Data science is an umbrella term; the field contains many other fields within it.

Data science includes statistics, programming, machine learning, natural language processing (NLP), text mining, visualisation, big data, data ingestion, data munging, and tools for data science.

Figure: The Data Science Clock [C.1]

Data science techniques

Data science techniques mainly include statistics, machine learning, and deep learning for solving problems such as speech recognition, image recognition, various NLP applications, etc.

Data science tool kit

Those who come from a technical background can use the following tools:

  • Scripting language for rapid prototyping (Scala or Python)
  • R – statistics programming tool
  • Hadoop framework
  • Spark
  • Deep learning libraries: TensorFlow, Torch, Deeplearning4j, etc.
  • Node.js
  • Social media libraries
  • Basic machine learning libraries

Those who come from a non-technical background can use the following tools [4]:

  • RapidMiner
  • DataRobot
  • BigML
  • Google Cloud Prediction API
  • H2O
  • Weka

Applications

  • Internet search – ranking algorithms
  • Digital advertisement – statistical techniques are heavily used
  • Recommender systems – machine learning techniques are mostly used
  • Image recognition – deep neural network / deep and wide neural network
  • Speech recognition – deep neural network / deep and wide neural network / linguistic techniques
  • Gaming – machine learning / deep neural network / deep and wide neural network
  • Credit risk modelling – statistics and machine learning
  • Fraud detection – statistics, machine learning, and graph theory
  • Social media intelligence – NLP, sentiment analysis, influence detection, etc.
  • Intelligent chat bots – statistics, machine learning, NLP, and deep learning
  • Self-driving cars – rule-based systems
  • Robots – under research

From the next post onward, I am going to start a tutorial series for data science beginners.

This tutorial series includes

  • Ubuntu for beginners
  • Tool kit list and installation guide
  • Regular expression guide
  • Scraping of the data
  • Data cleaning / pre-processing
  • Basics of  statistics
  • Basics of Machine learning techniques
  • Apply machine learning techniques on pre-processed data
  • Basics of Deep learning

References:

[1] http://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#4460ae776c1d

[2] http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm

[3] https://en.wikipedia.org/wiki/Semi-structured_data

[4] https://www.analyticsvidhya.com/blog/2016/05/19-data-science-tools-for-people-dont-understand-coding/

Copyrights:

[C.1] The Data Science Clock by Jamie Whitehorn is licensed under a Creative Commons Attribution 4.0 International License. See the Data Science Clock.
Permissions beyond the scope of this license may be available at http://www.exploringdatascience.com/about/copyright/.