AWS/GCS/Azure Basic Concepts and Prerequisites for Interviews

Monday, June 10, 2019

My GitHub basics and cheat sheet

Git is used for source control management (SCM).

The following are a few things you need to know:

1. cloning - pulling down a copy of the source code from the remote repo

2. adding - once changes are made to the source code, they must be added (staged) before they can be committed.

3. committing - records the staged changes in the local repository

4. pushing - pushing changes from the local repo to the remote repo

5. branching - separate groups of changes are kept in separate branches, e.g. a new branch for feature development.

6. merging - merging a branch into the master branch

7. pull requests - implemented by GitHub, to allow review before merge.
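The steps above can be tried end to end without a network. This is only a sketch: it assumes git is installed, uses a local bare repository as a stand-in for the remote, and the file names, user name and email are made up for illustration. Pull requests are a GitHub feature, so they are not part of it.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def workflow_demo():
    """Clone, add, commit, push, branch and merge in a throwaway directory."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        remote = tmp / "remote.git"           # hypothetical stand-in for the remote repo
        git("init", "--bare", str(remote), cwd=tmp)
        git("clone", str(remote), "work", cwd=tmp)   # 1. cloning
        work = tmp / "work"
        git("config", "user.name", "Example User", cwd=work)
        git("config", "user.email", "user@example.com", cwd=work)
        git("config", "commit.gpgsign", "false", cwd=work)
        # The default branch may be master or main depending on configuration.
        default_branch = git("symbolic-ref", "--short", "HEAD", cwd=work)

        (work / "app.txt").write_text("v1\n")
        git("add", "app.txt", cwd=work)                      # 2. adding
        git("commit", "-m", "first commit", cwd=work)        # 3. committing
        git("push", "-u", "origin", default_branch, cwd=work)  # 4. pushing

        git("checkout", "-b", "feature", cwd=work)           # 5. branching
        (work / "feature.txt").write_text("new feature\n")
        git("add", "feature.txt", cwd=work)
        git("commit", "-m", "add feature", cwd=work)

        git("checkout", default_branch, cwd=work)
        git("merge", "feature", cwd=work)                    # 6. merging
        git("push", cwd=work)

        # Commit subjects now visible on the remote, newest first.
        return git("log", "--format=%s", default_branch, cwd=remote).splitlines()


if __name__ == "__main__":
    print(workflow_demo())
```

Using a bare repository as the remote keeps the whole exercise on one machine, which is handy for practising before touching a real project.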

Installing Git:

sudo yum -y install git

git --version -> to verify installation

Once installed, set the name and email associated with your git commits.
git config --global user.name "<user name>"
git config --global user.email <user email>

Once configured, you need to set up key-based access. Use an SSH key pair to authenticate with the remote git server.

generate a key pair using the ssh-keygen command:
ssh-keygen -t rsa -b 4096

copy the contents of the public key: ~/.ssh/id_rsa.pub
go to github.com and open Settings -> SSH and GPG keys.
click New SSH key, enter a name, and paste the copied content into the Key field.

Basics of Git File System:

When you create a local repo, a .git folder is added to that directory.

When you add a file to the directory, you need to run "git add" to tell git to track this file.

git add puts the file in the staging area, a virtual location that git uses to track changes to the file before they are committed.

Within your repository, you will have lines of development, known as branches.
When you commit the added file, it gets stored on the current branch.

Git repository listing (contents of the .git directory):
COMMIT_EDITMSG - file that holds the message of the most recent commit
config - configuration specific to this repository
description - description of the repo (used by tools such as GitWeb)
hooks - scripts that you create to trigger automated custom actions
index - files in the staging area
logs - history of changes to branch tips and HEAD

to create a git repository:
git init "directory"

git init --bare "directory"
initializes a bare git repository, which contains no working area
used as a shared central repository that many contributors push to.

git config
command used to configure various elements of your git environment
user.name and user.email are recorded in every commit to track who made the change

git config --global user.name "kunal"
git config --global user.email "kunal@gmail.com"

view your configuration:
git config --list

You can set the above two values per repo as well as globally. Git stores them as key-value pairs in its config files, and a per-repo value overrides the global one.

git add "filename.txt"
- now tracked by git, in the staging area

git status
- to see which files are staged, modified or untracked

git does not track directories, only files.

touch src/.keep -> a placeholder file is a common way to get an otherwise empty directory into git
then add .keep to the staging area

git rm
- removes the file from the working directory and the staging area
git rm -f
- forces the removal even if the file has uncommitted changes

git status -s
- less verbose

A - added
M - modified. If you modify a file after adding it, the status shows AM: only the staged version is committed, and the later modification is left out until you add the file again.

git status -v
- verbose output

git commit
- opens a text editor, type the commit message there
the message of the commit being made is written to the COMMIT_EDITMSG file

git commit -m "message"

git rm --cached "filename"
- removes the file from the staging area so git stops tracking it, but keeps it in the local working directory

git commit -a -m "message"
- stages all modified tracked files and commits them in one step

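git - ignoring files -> .git/info/exclude
- ignore patterns that apply only to your local copy of the repo and are not shared

.gitignore file -> used to ignore file patterns; it is committed, so the patterns are shared with everyone using the repo
git add .gitignore
git commit -m "add .gitignore"

git check-ignore -v <path>
- shows whether the path is ignored by git, and which pattern matches it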

tag
acts like a sticky note. Often marked with the version number of a release

annotated tag
git tag -a <tag-name> -m <message>
- meant for releases or for reference in automation
stored as a full object in the git database, with the tagger name, date and message

lightweight tag
git tag <tag-name>
- a simple label pointing at a commit; it takes no message (passing -m creates an annotated tag instead)

Branch
git branch <branch-name>
creates a new branch; master is the default branch

git checkout <branch name>
switches to another branch

to see commits in just one line
git log --oneline --decorate

git merge <branch>
combines the commits from the named branch into the current branch

git branch -d branch_name
deletes the specified branch

git rebase <branch>
- replays the changes made on the current branch on top of another branch.
It re-creates the commits as new commits on top of <branch>, giving a linear history

reverting a commit:
git revert <commit>
- creates a new commit that undoes the changes of the given commit

git revert HEAD -> reverts the most recent commit

git log --oneline
git revert HEAD~2 reverts the commit two levels down from HEAD
or simply copy the commit id to revert

git diff
view the differences between two commits, files, blobs or between the working tree and staging area

git diff --summary commit1 commit2

git gc
cleans out old objects that can no longer be referenced by the db and compresses the content within the .git directory to save disk space

git gc --prune
cleans out unreferenced objects that are older than two weeks (the default cutoff)

git gc --auto
checks whether the repo needs cleaning and runs the cleanup only if it does

Git log
git log --oneline

git clone <local-repo> <new-repo>
- work on a copy, to avoid messing up the prod copy

cloning remote repositories
git clone <repo-url>
the remote project will be downloaded to our system

forking - making your own copy of a repo under your GitHub account

git fetch
fetches new commit information down from the remote server for the current repo, does not merge anything into your local branches.

-> tells if you are up-to-date with the remote repo

git pull
-> does a fetch and merges the fetched commits into the current branch as well

-> must do fetch or pull before pushing changes to the remote repo.

git push -u <remote> <local-branch>
pushes local changes to the remote repository; -u sets the remote branch as the upstream for later pushes and pulls

pull request
Now that you have pushed your contributions to the remote repository, you need to create a pull request to get them incorporated.

Tuesday, November 13, 2018

Hard Link vs Soft Link

Q.) Soft Link vs Hard Link

Soft Link:

A soft (symbolic) link is a file that points to another file by path. It is like a shortcut in Windows.

Deleting a soft link does not affect the actual file or directory it points to.

The inode of the link is different from the inode of the original file.

Deleting the original file leaves the symlink dangling.

A soft link can link to files as well as directories and can span filesystems.

Displayed in console:
lrwxrwxrwx 12 12 root abc.txt -> def.txt

How to create it:

ln -s <Source> <LinkName>

Hard Link:

Both the hard link and the actual file share the same inode.

If the source file is deleted, the hard link still exists and the data stays accessible.

Cannot span different filesystems.

Can only link files, not directories.

How to create it:

ln <Source> <Destination>
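The differences between the two link types can be checked with a short script. This is a sketch that assumes a Unix-like filesystem; the file names are made up. os.link and os.symlink do the same job as ln and ln -s.

```python
import os
import tempfile


def link_demo():
    """Compare a hard link and a soft link to the same source file."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "def.txt")
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "abc.txt")
        with open(source, "w") as f:
            f.write("data")

        os.link(source, hard)     # same as: ln def.txt hard.txt
        os.symlink(source, soft)  # same as: ln -s def.txt abc.txt

        # A hard link shares the inode of the source file.
        hard_shares_inode = os.stat(source).st_ino == os.stat(hard).st_ino
        # lstat looks at the link itself instead of following it.
        soft_shares_inode = os.stat(source).st_ino == os.lstat(soft).st_ino

        os.remove(source)
        # The data is still reachable through the hard link ...
        with open(hard) as f:
            hard_still_readable = f.read() == "data"
        # ... while the soft link now points at nothing.
        soft_is_dangling = os.path.islink(soft) and not os.path.exists(soft)

        return hard_shares_inode, soft_shares_inode, hard_still_readable, soft_is_dangling


if __name__ == "__main__":
    print(link_demo())
```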

Monday, September 3, 2018

What happens when you type www.amazon.com in browser?

1. You type www.amazon.com into the address bar of the browser.

2. Your browser checks the cache for a DNS record to find the corresponding IP address.
DNS is a database that maintains a list of domain names and the IP addresses they map to.

In order to find the DNS record, the browser checks four caches:

2.1 Browser cache: the browser keeps DNS records for a fixed duration for websites you have previously visited.

2.2 OS cache: after its own cache, the browser asks the OS for the DNS record by making a system call (e.g. gethostbyname on Windows).

2.3 Router cache: the request then goes to the router, which keeps its own cache of DNS records.

2.4 ISP cache: if the router has no record either, the cache of the ISP's DNS server is checked.

3. If the record is not cached, the ISP's DNS server initiates a DNS query to find the IP address of the server that hosts amazon.com.
The purpose of the DNS query is to search multiple DNS servers until it finds the correct IP address of the website. This type of search is called a recursive search.
The ISP's DNS server is called the DNS recursor, whose main job is to find the correct IP address for the domain name by asking the name servers for each level in turn.
Root Domain = .
Top Level Domain = com, org
Second Level Domain = amazon, google
Third Level Domain = www, download, ...
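The domain levels listed above also give the order in which a recursor walks the hierarchy. A minimal sketch, with a made-up function name, that only splits the name and does no real DNS lookup:

```python
from typing import List


def resolution_steps(hostname: str) -> List[str]:
    """Return the zones a recursor walks through, from the root downwards."""
    labels = hostname.rstrip(".").split(".")
    steps = ["."]  # the root domain comes first
    # Build up the name from right to left: com. -> amazon.com. -> www.amazon.com.
    for i in range(len(labels) - 1, -1, -1):
        steps.append(".".join(labels[i:]) + ".")
    return steps


if __name__ == "__main__":
    for zone in resolution_steps("www.amazon.com"):
        print(zone)
```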

4. The browser initiates a TCP connection with the server.
HTTP connections are most commonly made over TCP.
The connection is established with a three-way handshake.

4.1 The client sends a SYN packet to the server over the internet, asking whether it is open for new connections.

4.2 If the server has open ports that can accept new connections, it acknowledges the SYN packet by responding with a SYN/ACK packet.

4.3 The client responds to the SYN/ACK packet by sending an ACK to the server.

A TCP connection is then established for data transmission.
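In application code the handshake is not written by hand; the operating system performs it inside connect() and accept(). A small sketch on the loopback interface, assuming local sockets are allowed; the messages are made up:

```python
import socket
import threading


def handshake_demo() -> bytes:
    """Open a TCP connection to a local server and exchange one message."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        # accept() hands back a connection whose handshake has completed.
        conn, _ = server.accept()
        with conn:
            conn.recv(1024)
            conn.sendall(b"hello from server")

    worker = threading.Thread(target=serve)
    worker.start()

    # create_connection() calls connect(): the kernel sends SYN, waits for
    # SYN/ACK and replies with ACK before the call returns.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"hello from client")
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(handshake_demo())
```

Running tcpdump or Wireshark on the loopback interface while this executes should show the SYN, SYN/ACK and ACK packets.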

5. The browser sends an HTTP request to the web server.
The browser now sends a GET or POST request to the web server, asking for data or posting form data respectively. The request also carries headers:
User-Agent header: info about the browser
Accept header: types of content the browser will accept in the response
Connection header: asks the server to keep the TCP connection alive
Cookie header: passes the cookie information
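On the wire, such a request is plain text. The sketch below builds one by hand to show where the headers go; the function name, the User-Agent value and the cookie are made up, and a real browser sends many more headers.

```python
def build_get_request(host, path="/", cookie=None):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",             # request line: method, path, version
        f"Host: {host}",                    # required in HTTP/1.1
        "User-Agent: example-browser/1.0",  # hypothetical browser identity
        "Accept: text/html",                # content types the client accepts
        "Connection: keep-alive",           # keep the TCP connection open
    ]
    if cookie:
        lines.append(f"Cookie: {cookie}")
    # Header lines end with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_get_request("www.amazon.com", cookie="session=abc"))
```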

6. The server handles the request and sends back the response.
The server runs a web server, which passes the request to a request handler. The request handler is a program that reads the request and generates the response accordingly.

7. The server sends out an HTTP response.
The server sends the response along with a status code, the compression type (Content-Encoding), how to cache the page (Cache-Control), any cookies to set, privacy information, etc.

HTTP Status Codes:

1xx - Informational Message only
2xx - Success messages
3xx - Redirects the client to another URL
4xx - error on the client's part
5xx - error on the server's part
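The first digit of the status code decides the class, so a lookup on code // 100 is enough. A small sketch with a made-up function name:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map an HTTP status code to its class using the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, status_class(code))
```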

8. Browser displays the HTML content

Sunday, September 2, 2018

How do you troubleshoot if you are not able to connect to a database

In Amazon RDS, the following could be reasons why you might not be able to connect:

1. Your instance is still booting up and getting ready. This can take about 20 minutes.

2. Use the command "netstat -an | grep 3306" on the EC2 instance. If the status is SYN_SENT, check the firewall rules on the instance or the security group.

3. If you are trying to access the RDS instance from the internet, ensure that it is created in a public subnet.

4. Incorrect authentication. If you connect from the instance with a wrong username or password, it will say access denied for user.

5. DNS is not able to resolve the endpoint: it will say unknown MySQL server host.
Ensure that the endpoint is correct, else check your DNS. You can use tools like nslookup or netcat (nc).
nc -zv <endpoint> <port> -> "name or service not known" means the name could not be resolved
You can also use telnet to see if it is listening on that port or not.

6. Check whether the RDS DB instance is healthy:
number of connections
amount of CPU or memory used
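The network-level checks above (nc, telnet) can be scripted. The sketch below only tells the failure modes apart at TCP level and does not log in to the database; the function name and return labels are made up. A "timeout" roughly corresponds to the SYN_SENT case (firewall or security group), "dns-failure" to the unknown host case.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Try a TCP connection and report how it went."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"           # something is listening and reachable
    except socket.gaierror:
        return "dns-failure"        # the name could not be resolved
    except socket.timeout:
        return "timeout"            # packets silently dropped, e.g. by a firewall
    except ConnectionRefusedError:
        return "refused"            # host reachable, nothing listening on the port
    except OSError:
        return "unreachable"        # any other network error


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your own RDS endpoint and port.
    print(check_endpoint("localhost", 3306))
```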

How would you ensure High availability of databases

Q.) High availability of databases

Highly available databases use an architecture that continues to function normally even when there are hardware or software failures within the system.

They differ from a traditional RDBMS built on a single server. One common way to provide availability is a master/replica architecture.

In the master/replica model, only the master is available for data updates; if it fails, a replica takes over as the new master.

Another approach is a masterless architecture that uses clustering, where a group of servers is combined and any server can respond to read or write requests. Data is then replicated across all servers in the cluster, providing system redundancy and minimizing the possibility of downtime.
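The master/replica model can be pictured with a toy in-memory cluster. This is only a sketch: the class and node names are made up, replication is assumed to be synchronous, and real systems also have to detect the failure and agree on which replica to promote.

```python
class Cluster:
    """Toy master/replica cluster that keeps a copy of the data on each node."""

    def __init__(self, names):
        self.master = names[0]
        self.replicas = list(names[1:])
        self.data = {name: {} for name in names}

    def write(self, key, value):
        # Only the master accepts updates, then copies them to every replica.
        self.data[self.master][key] = value
        for replica in self.replicas:
            self.data[replica][key] = value

    def read(self, key, node=None):
        # Reads go to the master unless a specific node is asked for.
        return self.data[node or self.master].get(key)

    def fail_master(self):
        # The failed master drops out and the first replica is promoted.
        del self.data[self.master]
        self.master = self.replicas.pop(0)
        return self.master


if __name__ == "__main__":
    cluster = Cluster(["db1", "db2", "db3"])
    cluster.write("order-1", "paid")
    print(cluster.fail_master(), cluster.read("order-1"))
```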

CAP Theorem: - Dr Eric Brewer
It is impossible for a distributed system to provide all three of the following simultaneously:
Consistency: multiple values for the same piece of data do not occur

Availability: every request gets a response; the system keeps operating fully

Partition Tolerance: responds correctly despite node and/or network failures

Configure RDS for high availability
1. Make it Multi-AZ
2. On failure, RDS fails over to the standby instance; this usually takes 60 - 120 sec.

Q.)How do you ensure uptime of your DB

1. Practise routine maintenance

2. Use management and monitoring tools

3. Make the systems more secure

4. Quality hardware

5. Plan carefully

6. Competent staff

7. Follow change management process

8. Estimating server capacity limits correctly

9. Redundancy of equipment - horizontal scaling

AWS FAQ - Servers and troubleshooting

Q.) Difference between application servers and web servers

Web, application and database servers can all run on the same machine or be distributed across physical machines.

Web server
The server on which your website is hosted. It has web server software installed, such as Apache or IIS.
Deals with HTTP(S) requests. It implements the HTTP specification and knows how to handle HTTP request and response objects/headers.

Application Server
The server that runs the applications you have created, which use your database, web services, etc.
Can also support HTTP requests, but also other protocols, such as RMI/RPC.

Other capabilities like load balancing, clustering, session failover, connection pooling etc. that used to be in the realm of application servers are becoming available on web servers as well, directly or through third-party products.

Q.) Things to check if your computer is running slow?

1. Too many startup programs

2. Your hard drive is full or nearly full
No temp space left.

3. Browser has too many add-ons

4. Running too many programs at once

5. Your antivirus program could be running scans in the background too frequently

6. Disk fragmentation
Defragmenting the disk minimizes head travel, which reduces the time it takes to read files from and write files to the disk.

Q.) Blue screen causes?
Fatal system error
A state where the OS can no longer operate safely
Usually hardware or driver related
Use system restore
Rollback/uninstall device driver
Check that there is enough free space left on the drive where windows is installed
Scan your system for viruses
Return BIOS and hardware settings to default
Hardware that is not fitted properly can also cause a sudden crash

Q.) Device Manager
Find all your hardware information in one place.
An extension of the Microsoft Management Console that provides a central and organized view of all the hardware installed in the computer that Windows recognizes, such as HDD, keyboard, USB drive.

It can be used for:
changing hardware configuration
managing drivers
disabling and enabling hardware

It is like a master list of the hardware that Windows understands.
It is the place to go if a device is not working correctly, e.g. to update a driver or disable the device.
A yellow exclamation point is shown when Windows finds a problem with the device.
A disabled device is shown with a red cross or a black arrow.
It also shows error codes if a device has a conflict over system resources.

Q.) Virtual Memory
A shortage of RAM is compensated for by space on the hard disk drive.
Memory can run out if multiple programs run simultaneously.
The OS divides memory into pages, each containing a fixed number of addresses. Pages that do not fit in RAM are stored on disk in a page file or swap file; when such a page is needed, the OS copies it from disk to main memory and translates the virtual address into a real address.
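The translation step can be sketched with a toy page table. The 4096-byte page size and the class name are assumptions for illustration; a real OS does this in hardware with the MMU and also evicts pages when RAM is full, which is left out here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class VirtualMemory:
    """Toy page table: maps virtual page numbers to physical frame numbers."""

    def __init__(self):
        self.page_table = {}
        self.next_free_frame = 0
        self.page_faults = 0

    def translate(self, virtual_address):
        # Split the address into a page number and an offset inside the page.
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.page_table:
            # Page fault: the page is "copied from disk" into the next free frame.
            self.page_faults += 1
            self.page_table[page] = self.next_free_frame
            self.next_free_frame += 1
        # Real address = start of the frame plus the unchanged offset.
        return self.page_table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    vm = VirtualMemory()
    print(vm.translate(8200), vm.translate(100), vm.page_faults)
```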

AWS FAQ - Firewall/ WAF

Q.1) What is a firewall? What is WAF? How is WAF different from iptables? Why might a WAF be a better solution?
Firewall: A software program or device that prevents unauthorized access to or from a private network. It is a tool that can be used to enhance the security of computers connected to a network.

It is a network security device that monitors incoming and outgoing network traffic and decides whether to allow or block specific traffic based on a defined set of security rules.

It scans data packets and makes sure they don't contain anything malicious.

It can be hardware, software or both.

Packet Filtering Firewall
Only checks the sender and receiver IP addresses and the port number
Allowed addresses and ports are listed in Access Control Lists.
Already implemented in routers
Does not check the data portion or the payload

Application/Proxy Firewall
Hides us from attackers on the internet
Does not disclose our IP address
Checks the data packet payload as well. Hence, slower than the packet filtering firewall

Hybrid Firewall
Combines packet filtering and application firewall in series

Web Application Firewall
Applies rules to the HTTP conversation
Allows or denies based on expected input
Helps prevent SQL injection
It is an appliance or plugin that sits between the organization's network and servers, directing access to and from the applications and services. It monitors and filters out content that does not meet the firewall's criteria. It can specifically monitor and filter the contents of specific web applications.

A regular firewall typically looks at layer 3 and layer 4, such as IP address and port. For HTTP requests, once "allow tcp port 80" is cleared, it is not interested in what is passed through.
A WAF works at layer 7 and is concerned with security in terms of the content of the HTTP request. It prevents attacks like cross-site scripting and SQL injection.
It shields the web server from the kind of manipulated and malicious requests that attackers use to compromise the web server.
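The difference between the two layers can be shown with a toy check. The port list and the three patterns are made up for illustration; a real WAF uses far larger and smarter rule sets, so treat this as a sketch of the idea only.

```python
import re

ALLOWED_PORTS = {80, 443}  # what a layer 3/4 rule might allow

# A few naive signatures of the kind a WAF rule might look for.
SUSPICIOUS_PATTERNS = [
    re.compile(r"('|%27)\s*(or|and)\s+\d+\s*=\s*\d+", re.IGNORECASE),  # ' OR 1=1
    re.compile(r"union\s+select", re.IGNORECASE),                       # SQL injection
    re.compile(r"<\s*script", re.IGNORECASE),                           # cross-site scripting
]


def packet_filter_allows(dst_port):
    """Layer 3/4 view: only the port matters, the payload is never read."""
    return dst_port in ALLOWED_PORTS


def waf_allows(query_string):
    """Layer 7 view: inspect the content of the HTTP request."""
    return not any(p.search(query_string) for p in SUSPICIOUS_PATTERNS)


if __name__ == "__main__":
    attack = "id=1' OR 1=1 --"
    # The packet filter lets the request through on port 80, the WAF does not.
    print(packet_filter_allows(80), waf_allows(attack))
```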

Iptables is an extremely flexible firewall utility for the Linux operating system.
It uses policy chains to allow or block traffic.

When a connection tries to establish itself, iptables looks for a matching rule in its list; if it doesn't find one, it resorts to the default action.

Iptables uses three different chains: INPUT, OUTPUT and FORWARD
INPUT chain: used to control the behavior of incoming connections
FORWARD chain: used for incoming connections that are not delivered locally, e.g. on a router
OUTPUT chain: used for outgoing connections

iptables --policy <INPUT|OUTPUT|FORWARD> <ACCEPT|DROP>
sets the default action of a chain. Actions (targets) available to rules:
ACCEPT - allow the connection
DROP - drop the connection and don't let the sender know
REJECT - drop the connection and let the sender know (usable in rules, not as a chain policy)

iptables -A = append to the rules
iptables starts from the top of the list and goes down until it finds a matching rule
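The first-match behaviour with a default policy can be modelled in a few lines. This is a simplified sketch, not how iptables is implemented: the class is made up and it only matches on source address and destination port.

```python
import ipaddress


class Chain:
    """Toy model of one iptables chain: ordered rules plus a default policy."""

    def __init__(self, policy="ACCEPT"):
        self.policy = policy
        self.rules = []

    def append(self, target, source=None, dport=None):
        # Like "iptables -A": the rule goes to the end of the list.
        network = ipaddress.ip_network(source) if source else None
        self.rules.append((network, dport, target))

    def decide(self, src_ip, dport):
        address = ipaddress.ip_address(src_ip)
        # Walk the rules from the top; the first matching rule wins.
        for network, port, target in self.rules:
            if network is not None and address not in network:
                continue
            if port is not None and port != dport:
                continue
            return target
        # No rule matched: fall back to the chain's default policy.
        return self.policy


if __name__ == "__main__":
    chain = Chain(policy="ACCEPT")
    chain.append("DROP", source="10.10.10.0")  # block one address
    chain.append("DROP", dport=22)             # block ssh from anywhere
    print(chain.decide("192.168.1.5", 22), chain.decide("192.168.1.5", 80))
```

Because the first match wins, the order of the rules matters: a broad ACCEPT rule placed above a DROP rule would hide it.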

example to block all connections from an IP address:
iptables -A INPUT -s 10.10.10.0 -j DROP

block ssh connections from any IP address:
iptables -A INPUT -p tcp --dport ssh -j DROP

Use connection states when you want to allow two-way communication but only one-way connection setup

save the iptables config: sudo service iptables save

list the currently configured iptables rules: iptables -L

to clear all the rules: iptables -F

packet and byte information: iptables -L -v