dbms-notes: writing blocks to disk: NFS

Configuring NFS on Ubuntu

How NFS works:

Typically, NFS allows a client machine (quark) to request transparent access to data stored on a server machine (dirac).
For this to take place successfully:

The server (dirac) runs NFS daemon processes (nfsd and mountd) in order to make its data available to clients.
The sysadmin determines what to make available, and exports names and parameters of directories to be shared, normally using the /etc/exports configuration file and the exportfs command.
The sysadmin configures the server (using hosts.deny, hosts.allow) so that it can recognize and approve validated clients.
The client machine requests access to exported data, typically by issuing a mount command.

Client quark mounts the /usr/home directory from host dirac on the local directory /home

# mount -t nfs dirac:/usr/home/ /home

To mount the remote directory:

mount connects to mountd daemon, running on dirac.
mountd checks whether quark has permission to mount /usr/home. If so, it returns a file handle.
When someone tries to access the file /home/jdoe/login.sh on quark, the kernel places an RPC call to nfsd on the NFS server (dirac):

rpc_call(file handle, file name, UID, GID) - User and Group IDs must be the same on both hosts.

If all goes well, users on the client machine can then view and interact with mounted filesystems on the server within the parameters permitted.

Client and server NFS functionality is implemented in the kernel, with supporting daemons that are started from user space at system boot.
These daemons register themselves with the portmapper, a service that maps the RPC program numbers of programs involved in remote procedure calls to the network ports on which they listen.

mountd - Runs on the NFS server. Processes clients' mount requests.
nfsd (NFS daemon) - Runs on the NFS server. Services clients' file access requests.
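
To make the portmapper's role concrete, here is a minimal sketch in Python of the kind of lookup table it maintains. The class and method names are hypothetical and this is not how portmap itself is implemented. The program numbers (100000 for the portmapper, 100003 for NFS, 100005 for mountd) and ports 111 and 2049 are the conventional assignments; the mountd port is an arbitrary example, since mountd is usually given a dynamic port.

```python
# Toy model of the portmapper registry: RPC services register the port they
# listen on, and clients ask the portmapper where to find a given program.
# All names here are illustrative assumptions, not the real portmap code.

PORTMAPPER, NFS, MOUNTD = 100000, 100003, 100005  # conventional RPC program numbers


class ToyPortmapper:
    def __init__(self):
        # (program number, version, protocol) -> port
        self._table = {(PORTMAPPER, 2, "tcp"): 111}

    def register(self, program, version, protocol, port):
        """Called by a daemon (e.g. nfsd or mountd) when it starts."""
        self._table[(program, version, protocol)] = port

    def getport(self, program, version, protocol):
        """Called by a client; returns 0 when the program is not registered."""
        return self._table.get((program, version, protocol), 0)


pm = ToyPortmapper()
pm.register(NFS, 3, "tcp", 2049)      # nfsd conventionally listens on port 2049
pm.register(MOUNTD, 3, "tcp", 40001)  # arbitrary example port for mountd

print(pm.getport(MOUNTD, 3, "tcp"))   # where should mount send its request?
print(pm.getport(NFS, 2, "udp"))      # 0: nothing registered for this combination
```

A client such as mount first asks the portmapper for mountd's port and only then contacts mountd itself, which is why portmap must be reachable from the client.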

Installing and Configuring NFS Server:

(Step 1): Check whether your kernel has NFS support compiled in. One way to do this is to query the kernel interface on the proc filesystem.

$ cat /proc/filesystems | grep nfs
nodev   nfs
nodev   nfs4
nodev   nfsd

-- If kernel support for NFS is available, you should see the lines above.
-- If no results are displayed, install NFS server support:

$ sudo apt-get install portmap nfs-kernel-server

(Step 2): Configure NFS Server: define shared directories

Now you need to tell the NFS server which directories should be available for mounting, and which parameters should control client access to them.
You do this by exporting the files, that is, listing filesystems and access controls in the /etc/exports file.

# exports file for dirac. 
# Each line defines a directory and the hosts allowed to mount it

/home      quark.math.usm.edu(rw,sync)  proton.math.usm.edu(rw,sync)
/usr/TeX   *.math.usm.edu
/home/ftp  *(ro)

In the exports file above:

*.math.usm.edu -- matches all hosts in the domain math.usm.edu
rw - allow both read and write requests on the exported directory. The default is read-only (ro).
sync - Reply to requests only after changes have been committed to stable storage.
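
The structure of an exports line (a directory followed by host patterns, each with an optional option list in parentheses and no spaces inside them) can be sketched with a small Python parser. This is only an illustration: the function names are hypothetical, and the shell-style matching done by fnmatch is an approximation of how the NFS server matches host patterns.

```python
from fnmatch import fnmatch


def parse_exports_line(line):
    """Split one /etc/exports line into (directory, [(host_pattern, options)])."""
    line = line.split("#", 1)[0].strip()        # drop comments and blank lines
    if not line:
        return None
    directory, *clients = line.split()
    entries = []
    for client in clients:
        if "(" in client:
            host, opts = client.rstrip(")").split("(", 1)
            options = [o for o in opts.split(",") if o]
        else:
            host, options = client, []          # no option list given
        entries.append((host or "*", options))
    return directory, entries


def options_for(entries, hostname):
    """Return the options of the first pattern matching hostname, or None."""
    for pattern, options in entries:
        if fnmatch(hostname, pattern):
            return options
    return None


directory, entries = parse_exports_line(
    "/home quark.math.usm.edu(rw,sync) proton.math.usm.edu(rw,sync)"
)
print(directory, options_for(entries, "quark.math.usm.edu"))
print(options_for(entries, "other.math.usm.edu"))   # None: host is not listed
```

Because whitespace separates one client entry from the next, an option list written with spaces, such as (rw, sync), would be split into several malformed entries.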

(Step 3): export the shares.
After modifying /etc/exports, run the command

$ sudo exportfs -ra

(Step 4): Edit /etc/default/portmap to enable access to portmap from remote machines.
By default, portmap listens only for RPC calls coming from the loopback interface (127.0.0.1). To accept calls from other machines:
(a) comment out the "-i 127.0.0.1" entry in the file;
(b) restart portmap; and
(c) restart the NFS kernel server:

edit /etc/default/portmap
$ sudo /etc/init.d/portmap restart
$ sudo /etc/init.d/nfs-kernel-server restart

Configuring NFS Clients

(Step 1): Install NFS Client

$ sudo apt-get install portmap nfs-common

(Step 2 - optional): Configure portmap to allow connections to the NFS server.

/etc/hosts.deny - list of hosts that are not allowed to access the system. Edit the file to block all clients by default; only those that you explicitly authorize (in /etc/hosts.allow) will then be able to connect.

portmap: ALL

/etc/hosts.allow - list of hosts authorized to access the server

portmap: <nfs Server IP address>
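
The way the two files combine can be sketched as follows. This is a simplified model written for illustration: it handles only single "daemon: client" lines with exact addresses or the ALL keyword, and the function names are hypothetical. The address 192.0.2.10 is a placeholder standing in for the NFS server's IP address.

```python
def parse_rules(text):
    """Parse simple 'daemon: client' lines into (daemon, client) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue                       # skip blank or malformed lines
        daemon, client = (part.strip() for part in line.split(":", 1))
        rules.append((daemon, client))
    return rules


def matches(rules, daemon, client_ip):
    return any(d in (daemon, "ALL") and c in (client_ip, "ALL") for d, c in rules)


def access_granted(allow_text, deny_text, daemon, client_ip):
    if matches(parse_rules(allow_text), daemon, client_ip):
        return True      # 1. a match in hosts.allow grants access
    if matches(parse_rules(deny_text), daemon, client_ip):
        return False     # 2. otherwise, a match in hosts.deny refuses access
    return True          # 3. no match in either file: access is granted


hosts_allow = "portmap: 192.0.2.10"   # placeholder for the NFS server address
hosts_deny = "portmap: ALL"

print(access_granted(hosts_allow, hosts_deny, "portmap", "192.0.2.10"))
print(access_granted(hosts_allow, hosts_deny, "portmap", "192.0.2.99"))
```

Since hosts.allow is consulted first, the "portmap: ALL" entry in hosts.deny only affects hosts that were not explicitly authorized.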

Mounting a remote filesystem manually:

From the client:

$ sudo mount dirac.math.usm.edu:/users/home /home

Configure auto mounting during startup:

You can set up automatic nfs mounting by including entries in /etc/fstab.
The /etc/fstab file is used to statically define the file systems that will be automatically mounted at boot time.
It contains a list of all available disks and disk partitions, and indicates how they are to be integrated into the overall system's file system.
During machine startup, the mount program reads /etc/fstab file to determine which options should be used when mounting the specified device.

# device name   mount point     fs-type      options       dump-freq pass-num                                          
# servername:dir /mntpoint        nfs          rw,hard,intr   0         0

dirac:/users/home  /home  nfs  rw,hard,intr  0  0

Just like other /etc/fstab mounts, NFS mounts in /etc/fstab have 6 columns, listed in order as follows:

The filesystem to be mounted (dirac:/users/home)
The mountpoint (/home)
The filesystem type (nfs)
The mount options, separated by commas with no spaces (rw,hard,intr)
The dump frequency, used by the dump backup utility (0 - do not dump)
The order in which the filesystem is checked by fsck at boot time (0 - do not perform fsck)

Options:

rw - read/write
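hard - the share is mounted so that if the server becomes unavailable, a program accessing it waits until the server is available again.
intr - allows a signal to interrupt an operation that is waiting on an unresponsive server.

See man mount and man nfs for more details.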
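
The six-column layout of an /etc/fstab entry can be illustrated with a short Python sketch. The names FstabEntry and parse_fstab_line are hypothetical, and the sketch is stricter than the real format in that it insists on all six columns being present.

```python
from collections import namedtuple

FstabEntry = namedtuple("FstabEntry", "device mountpoint fstype options dump passno")


def parse_fstab_line(line):
    """Parse one fstab line into its six columns; return None for comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()                      # columns are separated by whitespace
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    device, mountpoint, fstype, options, dump, passno = fields
    return FstabEntry(device, mountpoint, fstype, options.split(","), int(dump), int(passno))


entry = parse_fstab_line("dirac:/users/home  /home  nfs  rw,hard,intr  0  0")
print(entry.device, entry.mountpoint, entry.options)
```

Because whitespace separates the columns, an options field written as "rw, hard, intr" would be read as three separate columns, which is why the options must be joined by commas only.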

Network File System (NFS) - Concepts

What is NFS

NFS is a platform-independent remote file system technology created by Sun Microsystems in the 1980s.
It is a client/server application that provides shared file storage for clients across a network.
It was designed to simplify the sharing of filesystem resources in a network of non-homogeneous machines.
It is implemented using Remote Procedure Calls (RPC) carried over TCP/IP, and on each client the remote files are made available through the Virtual File System (VFS), a kernel interface that hides whether a file is local or remote.
It allows an application to access files on remote hosts in the same way it accesses local files.

NFS Servers: Computers that share files

During the late 1980s and 1990s, a common setup was to configure a powerful workstation with lots of local disks, and often without a graphical display, as an NFS server.
"Thin," diskless workstations would then mount the remote file systems provided by the NFS Servers and transparently use them as if they were local files.

NFS Simplifies management:

Instead of duplicating common directories such as /usr/local on every system, NFS provides a single copy of the directory that is shared by all systems on the network.
It simplifies backup procedures - Instead of setting up backups for the local contents of each workstation (of /home, for example), with NFS a sysadmin needs to back up only the server's disks.

NFS Clients: Computers that access shared files

NFS uses a mixture of kernel support and user-space daemons on the client side.
Multiple clients can mount the same remote file system so that users can share files.
Mounting can be done automatically, for example at boot time or when a user logs in (e.g. /home could be a shared directory mounted by each client).
An NFS client:

(a) mounts a remote file system onto the client's local file system name space, and
(b) provides an interface so that files in the remote file system are accessed as if they were local files.

----
Goals of NFS design:

Compatibility:
Easy deployment:
Machine and OS independence:
Efficiency:

NFS Versions

Version 1: Used only inside Sun Microsystems.
Version 2: Specified in RFC 1094 (1989)
Version 3: Released 1995 (RFC 1813)
Version 4: Released 2000 (RFC 3010)

NFS design: NFS Protocol, Server, Client

NFS Protocol

Uses Remote Procedure Call (RPC) mechanisms
RPCs are synchronous (client application blocks while waits for the server response)
NFS uses a stateless protocol (the server does not keep track of past requests) - This simplifies crash recovery: all the client needs to do is resubmit the last request.
In this way, the client cannot differentiate between a server that crashed and recovered and one that is just slow.
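
A minimal sketch of why statelessness makes recovery simple, assuming a toy in-memory server. All names are hypothetical. Each READ request carries the file handle, offset and count, so the same request can be resent unchanged after a crash and yields the same answer.

```python
class StatelessServer:
    """Toy server: every request carries all it needs; nothing is kept per client."""

    def __init__(self, files):
        self.files = files          # file handle -> bytes
        self.up = True

    def read(self, fh, offset, count):
        if not self.up:
            raise TimeoutError("no reply")      # a crashed server looks like a slow one
        return self.files[fh][offset:offset + count]


def read_with_retry(server, fh, offset, count, max_tries=5, on_timeout=None):
    """Resend the identical request until a reply arrives."""
    for _ in range(max_tries):
        try:
            return server.read(fh, offset, count)
        except TimeoutError:
            if on_timeout:
                on_timeout(server)  # stand-in for "wait a while and try again"
    raise TimeoutError("server did not answer")


def reboot(server):
    server.up = True                # the server recovers with no memory of the client


server = StatelessServer({"fh1": b"hello, nfs"})
server.up = False                   # the server crashes before the request arrives
print(read_with_retry(server, "fh1", 7, 3, on_timeout=reboot))   # b'nfs'
```

The client does nothing special for the crash case: the retry loop is the whole recovery procedure.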

New File system interface

The original Unix file system interface was modified in order to implement NFS as an extension of the Unix file system.
NFS was built into the Unix kernel by separating generic file systems operations from specific implementations. With this the kernel can treat all filesystems and nodes in the same way and new file systems can be added to the kernel easily:

A Virtual File System (VFS) interface: defines the operations that can be done on a filesystem.
A Virtual node (vnode) interface: defines the operations that can be done on a file within a filesystem.

A vnode is a logical structure that abstracts whether a file or directory is implemented by a local or a remote file system. In this sense, applications need to "see" only the vnode interface, and the actual location of the file (local or remote file system) is irrelevant to the application.
In addition, this interface allows a computer to transparently access different types of local file systems (e.g. ext2, ext3, Reiserfs, msdos, proc, etc).
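
The idea behind the vnode interface can be sketched in Python with an abstract base class. This is an analogy rather than the kernel's actual data structure, and every name below is hypothetical: the caller works against one interface, and only the concrete class knows whether the bytes come from local storage or from an RPC.

```python
from abc import ABC, abstractmethod


class Vnode(ABC):
    """Operations that can be done on a file, independent of where it lives."""

    @abstractmethod
    def read(self, offset, count):
        ...


class LocalVnode(Vnode):
    def __init__(self, data):
        self._data = data                       # stands in for blocks on a local disk

    def read(self, offset, count):
        return self._data[offset:offset + count]


class RemoteVnode(Vnode):
    def __init__(self, rpc_read, file_handle):
        self._rpc_read = rpc_read               # function that performs the READ RPC
        self._fh = file_handle

    def read(self, offset, count):
        return self._rpc_read(self._fh, offset, count)


def first_bytes(vnode, n):
    """Application-side code: identical for local and remote files."""
    return vnode.read(0, n)


remote_store = {"fh-42": b"remote bytes"}       # pretend this lives on the server


def fake_rpc_read(fh, offset, count):
    return remote_store[fh][offset:offset + count]


print(first_bytes(LocalVnode(b"local bytes"), 5))            # b'local'
print(first_bytes(RemoteVnode(fake_rpc_read, "fh-42"), 6))   # b'remote'
```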

NFS Client
Uses a mounter program. The mounter:

takes a remote file system identification host:path;
sends an RPC to host, asking for (1) a file handle for path and (2) the server's network address;
marks the mount point in the local file system as a remote file system associated with host address:path pair.

Diagram of NFS architecture

NFS Remote Procedure Calls
The NFS client uses RPCs to implement each file system operation.
Consider the user program code below:

fd <- OPEN ("f", READONLY)
READ (fd, buf, n)
CLOSE (fd)

An application opens file "f", sends a read request, and closes the file.
The file "f" is a remote file, but this information is irrelevant to the application.
The virtual file system holds a map with the host address and file handle (dirfh) of each mounted remote file system.
The sequence of steps to obtain the file are listed below:

The Virtual File System finds that file "f" is on a remote file system, and passes the request to the NFS client.
The NFS client sends a lookup request (LOOKUP(dirfh, "f")) to the NFS server, passing the file handle (dirfh) of the remote file system and the name of the file to be read.
The NFS server receives the LOOKUP request, extracts the file system identifier and inode number from dirfh, and asks the identified file system to use that inode number to find the local directory's inode information.
The NFS server searches the directory identified by the inode number for file "f".
If file is found, the server creates a handle for "f" and sends it back to the client.
The NFS client allocates the first unused entry in the program's file descriptor table, stores a reference to f's file handle in that entry, and returns the index for the entry (fd) to the user program.
Next, the user program calls READ(fd, buf, n).
The NFS client sends the RPC READ(fh,0,n).
The NFS server looks up the inode for fh, reads the data, and sends it in a reply message.
When the user program calls to close the file (CLOSE(fd)), the NFS client does not issue an RPC, since the program did not modify the file.
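
The sequence of steps above can be sketched as a toy client and server in Python. Everything here is a simplified assumption made for illustration: the class names are hypothetical, the RPCs are plain method calls, and a file handle is modelled as a (file system id, inode number) pair.

```python
class ToyNfsServer:
    """Files live in an inode table; a file handle is (file system id, inode number)."""

    def __init__(self):
        # inode 1 is the exported directory, inode 2 is the file "f"
        self.inodes = {1: {"f": 2}, 2: b"contents of f"}
        self.rpc_log = []

    def lookup(self, dirfh, name):
        self.rpc_log.append("LOOKUP")
        fsid, dir_inode = dirfh
        inode = self.inodes[dir_inode].get(name)    # search the directory for name
        if inode is None:
            raise FileNotFoundError(name)
        return (fsid, inode)                        # handle for the file

    def read(self, fh, offset, count):
        self.rpc_log.append("READ")
        _, inode = fh
        return self.inodes[inode][offset:offset + count]


class ToyNfsClient:
    def __init__(self, server, dirfh):
        self.server, self.dirfh = server, dirfh
        self.fd_table = []                          # index = fd, value = [fh, offset] or None

    def open(self, name):
        entry = [self.server.lookup(self.dirfh, name), 0]
        for fd, slot in enumerate(self.fd_table):
            if slot is None:                        # reuse the first unused entry
                self.fd_table[fd] = entry
                return fd
        self.fd_table.append(entry)
        return len(self.fd_table) - 1

    def read(self, fd, n):
        fh, offset = self.fd_table[fd]
        data = self.server.read(fh, offset, n)      # READ(fh, offset, n)
        self.fd_table[fd][1] += len(data)
        return data

    def close(self, fd):
        self.fd_table[fd] = None                    # local bookkeeping only, no RPC


server = ToyNfsServer()
client = ToyNfsClient(server, dirfh=("fs0", 1))
fd = client.open("f")
buf = client.read(fd, 8)
client.close(fd)
print(buf, server.rpc_log)   # b'contents' ['LOOKUP', 'READ']
```

The log shows two RPCs for three calls: the CLOSE never reaches the server, because the file was not modified.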

References:
Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon. Design and Implementation of the Sun Network Filesystem. Proceedings of the Summer 1985 USENIX Conference, Portland OR, June 1985, pp. 119-130.
Saltzer, Jerome H. and M. Frans Kaashoek. 2009. Principles of computer system design.

TCP/IP Networking (I)

TCP/IP Architecture

Gateways

IP Addresses

TCP/IP Architecture

The TCP/IP protocol suite has a four-layer structure linking an application to the physical network.
Each layer has its own independent data structures.
Conceptually, each layer is speaking directly to its counterpart on the other machine. In this sense, it is ignorant of what goes on after the data is sent.
For example, in the Application layer, an NFS client talks to an NFS server and knows only the details of the NFS protocol they both use.
As data packets are transported from the application to the physical network, each layer adds some control information in the form of a header.
Once the packet reaches its destination in the physical network, each layer reads and removes its corresponding header before passing the packet up the stack until it is received by the application.
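
The header handling described above can be sketched in a few lines of Python. The bracketed tags are stand-ins for real headers, which are binary structures with many fields, and the function names are hypothetical.

```python
def send_down(app_data):
    """Each layer prepends its own header to what it received from the layer above."""
    segment = b"[TCP]" + app_data       # Transport layer
    datagram = b"[IP]" + segment        # Internet layer
    frame = b"[ETH]" + datagram         # Network Access layer
    return frame


def remove_header(header, packet):
    """A layer reads and removes only its own header."""
    if not packet.startswith(header):
        raise ValueError(f"expected header {header!r}")
    return packet[len(header):]


def pass_up(frame):
    """On the receiving host the headers come off in the reverse order."""
    datagram = remove_header(b"[ETH]", frame)
    segment = remove_header(b"[IP]", datagram)
    return remove_header(b"[TCP]", segment)


frame = send_down(b"NFS READ")
print(frame)             # b'[ETH][IP][TCP]NFS READ'
print(pass_up(frame))    # b'NFS READ'
```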

Application Layer

This layer contains all application protocols (often providing user services) that use the Transport layer.
Examples of application protocols include FTP, HTTP, DNS, NFS, SMTP, and Telnet.
To send data, the application calls a Transport layer protocol, such as TCP.
Application layer protocols usually treat transport and lower layer protocols as "black boxes." In this sense, they assume that a stable network connection exists across which to communicate.

Transport Layer

TCP and UDP are the most important protocols in this layer, delivering data between the application and internet layers.
TCP provides reliable data delivery service with error detection and error correction. It delivers data received from IP to the correct application (identified by a port number).
UDP provides a connectionless delivery service.
When called by an application, TCP wraps the data into a TCP packet.
A TCP packet (also called TCP segment) contains a TCP header followed by the application data (including header).
TCP then hands the packet to IP.
TCP keeps track of what data belongs to what process.
It is also responsible for ensuring that the packets are delivered with the correct contents and put in the right order before handing them off to the receiving application.
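
The reordering step can be sketched as follows, assuming each segment is tagged with the byte offset (sequence number) of its first byte. This is a toy model with a hypothetical function name; real TCP also handles acknowledgements, retransmission timers, and sequence number wraparound.

```python
def reassemble(segments, start=0):
    """Rebuild the byte stream from (sequence_number, payload) pairs.

    Segments may arrive out of order or duplicated. Delivery stops at the
    first gap, where real TCP would wait for a retransmission.
    """
    received = {}
    for seq, payload in segments:
        if payload:
            received.setdefault(seq, payload)   # ignore duplicates and empty segments
    stream = b""
    expected = start
    while expected in received:
        payload = received[expected]
        stream += payload
        expected += len(payload)                # next byte the application is waiting for
    return stream


arrived = [(5, b" wor"), (0, b"hello"), (9, b"ld"), (5, b" wor")]
print(reassemble(arrived))                      # b'hello world'
print(reassemble([(0, b"he"), (4, b"o")]))      # b'he' (bytes 2-3 are still missing)
```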

Internet Layer

This layer sits above the Network Access layer and provides the packet delivery service on which TCP/IP networks are built.
It provides a routing mechanism allowing for packets to be transmitted across one or more different networks.

The Internet Protocol (IP) runs in this layer and provides a way to transport datagrams across the network.
It is a connectionless protocol and does not provide error control, relying on protocols in the other layers to provide error detection and recovery.
Source and destination may be in the same or different networks.
The IP protocol performs the functions of (a) host addressing and identification, and (b) packet routing (transporting packets from source to destination).
After receiving a TCP packet, IP wraps it up and prepends an IP header, creating an IP datagram.
Moving the data down the stack, IP hands it off to the hardware driver, which runs in the Network Access layer.

The IP layer has to figure out how to send the packet.
Is the destination on a different physical network?

Then IP needs to find the appropriate gateway and send the packet to it.

Is the destination on the local Ethernet network?

Then IP uses the Address Resolution Protocol (ARP) to determine which Ethernet card's MAC address is associated with the datagram's destination IP address.

How does it work?

ARP broadcasts an ARP packet across the entire local network asking which MAC address belongs to a particular IP address.
Although every machine receives this broadcast, only the one whose IP address matches will respond. The answer is then stored by the IP layer in its internal ARP table.
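
The lookup-then-cache behaviour can be sketched with a toy model in Python. The class names are hypothetical and the "broadcast" is just a dictionary lookup; the address pair is sample data only.

```python
class ToyLan:
    """Hosts on one Ethernet segment: IP address -> MAC address."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.broadcasts = 0

    def who_has(self, ip):
        self.broadcasts += 1            # every host sees the request...
        return self.hosts.get(ip)       # ...but only the owner of the address answers


class ArpCache:
    def __init__(self, lan):
        self.lan = lan
        self.table = {}                 # the ARP table: IP address -> MAC address

    def resolve(self, ip):
        if ip not in self.table:        # cache miss: ask the network
            mac = self.lan.who_has(ip)
            if mac is None:
                raise LookupError(f"no host answered for {ip}")
            self.table[ip] = mac
        return self.table[ip]


lan = ToyLan({"194.113.47.147": "98:0:bd:bd:8c:d2"})
arp = ArpCache(lan)
print(arp.resolve("194.113.47.147"))
print(arp.resolve("194.113.47.147"))    # answered from the table this time
print(lan.broadcasts)                   # 1
```

Real ARP tables also expire their entries after a while, which this sketch leaves out.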

You can look at the ARP table at any time by running the command:

jdoe@quark:~$ arp -a
home (194.113.47.147) at 98:0:bd:bd:8c:d2 [ether] on eth0
jdoe@quark:~$
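
The delivery decision described in this section (send directly on the local network, or hand the datagram to a gateway) can be sketched with Python's standard ipaddress module. The function name and all addresses are illustrative assumptions; the 192.0.2.0/24 and 198.51.100.0/24 ranges are reserved for documentation.

```python
import ipaddress


def next_hop(destination, local_interface, gateway):
    """Decide where IP should send a datagram.

    local_interface is this host's address with its prefix length,
    for example "192.0.2.10/24".
    """
    iface = ipaddress.ip_interface(local_interface)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return ("direct", str(dest))    # same network: resolve the MAC with ARP
    return ("gateway", gateway)         # different network: send to the gateway


print(next_hop("192.0.2.77", "192.0.2.10/24", "192.0.2.1"))     # ('direct', '192.0.2.77')
print(next_hop("198.51.100.5", "192.0.2.10/24", "192.0.2.1"))   # ('gateway', '192.0.2.1')
```

In both cases the frame that leaves the host is addressed to a MAC address on the local network: either the destination's own, or the gateway's.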

Network Access Layer

Protocols in this layer are designed to move packets (IP datagrams) between the internet layer interfaces of two different hosts on the same physical link.
The actual process of moving packets at this level is usually controlled by the device drivers of the network cards, which must know the details of the underlying network in order to format the data appropriately.
At this level, IP addresses are translated to the physical addresses used by the network cards (i.e. Media Access Control (MAC) addresses).
The network access layer (also called link layer) can be represented by different kinds of physical connections: Ethernet, token-ring, fiber-optics, ISDN, RS-232, etc.

Network Interfaces

TCP/IP defines an abstract interface for hardware access.
The interface offers a set of operations that is used to access all types of hardware and hides the implementation details needed to access each particular device. Each vendor is responsible for providing a driver that translates the commands of the TCP/IP interface to those of the particular piece of hardware.
Each networking device has a corresponding interface in the kernel.
When configured, each physical device is assigned an interface name.
Each interface must also be assigned an IP address. Some interface names include:

Ethernet interfaces: eth0, eth1
PPP interfaces: ppp0, ppp1
FDDI interfaces: fddi0, fddi1

A computer with more than one logical or physical network interface is usually called a multihomed host.

An Ethernet network works like a bus system, where a host may send packets (or frames) of up to 1,500 bytes to another host on the same Ethernet.
Each host is identified by a six-byte address hardcoded into the firmware of its Ethernet network interface card (NIC).
Ethernet addresses are usually written as a sequence of two-digit hex numbers separated by colons, as in aa:bb:cc:dd:ee:ff.
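
Some tools print a byte without its leading zero (for example 0 instead of 00). A small Python helper, with a hypothetical name, can normalize such an address to the two-digit form and check that it really has six bytes:

```python
def normalize_mac(text):
    """Return an Ethernet address as six two-digit lowercase hex bytes."""
    parts = text.strip().split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 bytes, got {len(parts)}: {text!r}")
    octets = [int(part, 16) for part in parts]      # raises ValueError on non-hex input
    if any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"byte out of range in {text!r}")
    return ":".join(f"{octet:02x}" for octet in octets)


print(normalize_mac("98:0:bd:bd:8c:d2"))    # 98:00:bd:bd:8c:d2
print(normalize_mac("AA:BB:CC:DD:EE:FF"))   # aa:bb:cc:dd:ee:ff
```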

References:
Bautts, Tony, Terry Dawson and Gregor N. Purdy. 2005. Linux Network Administrator's Guide
Hunt, Craig. 2002. TCP/IP Network Administration
