This chapter covers the UNIX communication facilities for distributed interprocess communication. Since different computers do not share memory, they normally exchange information over networks using communication protocols. Two application program interfaces (APIs) to such protocols are introduced: the socket interface and the Transport Layer Interface (TLI). The TCP/IP protocol suite is used as an example where appropriate.
The socket interface to networking was developed at the University of California at Berkeley and was introduced with 4.2BSD.
The basic idea of the socket interface was to make interprocess
communication similar to file I/O.
However, network protocols are more complex than conventional I/O devices, so the interaction between user processes and network protocols has to be more complex than the interaction between user processes and conventional I/O facilities. Normal file I/O provides no way to write server code that awaits connections passively and client code that establishes connections actively. It also offers no mechanism for sending datagrams to varying destinations: the open() system call always binds a fixed file name to a descriptor.
The basic building block for communication is the socket, an endpoint of communication. A name may be bound to a socket, and a socket may be associated with one or more processes. Sockets come in different types. Every socket exists within a communication domain, an abstraction that bundles common properties of the processes communicating through it. Normally, data can be exchanged only between sockets in the same domain.
The socket interface supports different communication protocols. Described here are the UNIX domain (used for interprocess communication between processes on the same computer) and the Internet domain (used for interprocess communication between processes on the same or on different computers).
The term connection is used in this chapter for a communication
link between two processes.
The term association is used for an n-tuple
that completely specifies the two endpoints of communication that make up
a connection.
For connections using the TCP/IP protocol suite, this n-tuple is the 5-tuple {protocol, local address, local port, foreign address, foreign port}.
The typical client/server relationship, described in Chapter 2.2.2, is not symmetrical. Initiating a network connection requires that both programs know which role, client or server, they are to play. This is reflected in the design of the socket interface: some system calls are only useful for clients, while others are only useful for servers.
A socket has a specific type which influences its communication
properties. Processes are presumed to communicate only between sockets of the
same type. Four different types of sockets are available: stream sockets,
datagram sockets, raw sockets, and sequenced packet sockets.
A stream socket provides a bi-directional, reliable, sequenced, and unduplicated flow of data. Message boundaries are not visible.
A datagram socket supports bi-directional flow of data. Record boundaries are visible, but messages may be duplicated, lost, or arrive in a different order than they were sent.
A raw socket allows user processes to access the underlying communication protocols. Raw sockets are not intended for normal applications and are therefore not described further.
A sequenced packet socket is similar to a stream socket, except that record boundaries are preserved. Sequenced packet sockets are not supported by the Internet domain and are therefore also not discussed further.
A socket is created with the socket() system call. It takes the communication domain, the socket type, and the protocol as parameters and returns a socket descriptor for use in later system calls.
Sockets for the UNIX domain can also be created with the socketpair() system call. It creates two connected sockets simultaneously and places the two socket descriptors in the two elements of sd. This call is the socket equivalent of the pipe() system call. The first three parameters are the same as in the socket() system call.
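As a minimal sketch of both calls (illustrative only, not one of this chapter's example programs):

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int s, sd[2];

    /* a single stream socket in the Internet domain */
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        perror("socket");

    /* a pair of connected stream sockets in the UNIX domain */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sd) < 0)
        perror("socketpair");

    return 0;
}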
The local part of an association (local address, local port number
for the Internet domain, and local pathname for the UNIX domain)
has to be specified before a connection can be established. This is
normally done with the bind() system call, which takes the socket descriptor, a pointer to one of the following protocol-specific address structures, and the structure's length:
struct sockaddr_in {
    short int          sin_family;   /* Address family (AF_INET) */
    unsigned short int sin_port;     /* Port number, network byte order */
    struct in_addr     sin_addr;     /* Internet address, network byte order */
    char               sin_zero[8];  /* unused */
};

struct sockaddr_un {
    unsigned short sun_family;    /* Address family (AF_UNIX) */
    char           sun_path[108]; /* pathname */
};

There are still unspecified elements of the association: the foreign address and foreign port number for the Internet domain, and the foreign pathname for the UNIX domain. These are filled in differently for connection-oriented and connectionless communication.
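As an illustrative sketch, bind() might be used with a sockaddr_in as follows (the port number 5000 is a made-up example):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* a sketch: bind the local part of an association to socket s */
int bind_local(int s)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(5000);       /* port in network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return -1;
    }
    return 0;
}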
When a process finishes using a socket, it calls the normal close() system call. If the process was the last one using this socket, the socket is discarded.
Figure 26 gives an overview of system calls used for client/server interaction using a connection-oriented protocol.
Connection establishment is usually asymmetrical, with one process being the client and the other the server. The server first binds to its well-known port number as described above and then passively ``listens'' on its socket with the listen() system call, which also specifies the maximum number of pending connection requests. Incoming connections are accepted with the accept() system call; it returns a new socket descriptor that refers to the established connection. The client calls connect() to actively establish a connection to the server's address.
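The following condensed sketch shows the two sides of such an interaction, assuming an echo-style exchange and omitting most error handling:

#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* server side: bind, listen, accept, then exchange data */
void server(struct sockaddr_in *local)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    char buf[256];

    bind(s, (struct sockaddr *)local, sizeof(*local));
    listen(s, 5);                            /* queue up to 5 pending requests */
    for (;;) {
        int conn = accept(s, NULL, NULL);    /* blocks until a client connects */
        ssize_t n = read(conn, buf, sizeof(buf));
        if (n > 0)
            write(conn, buf, n);             /* echo the data back */
        close(conn);
    }
}

/* client side: create a socket and connect to the server */
void client(struct sockaddr_in *remote)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    connect(s, (struct sockaddr *)remote, sizeof(*remote));
    write(s, "hello", 5);
    close(s);
}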
Figure 27 gives an overview of system calls used for client/server interaction using a connectionless protocol.
Data can be exchanged in both directions with the sendto() and recvfrom() system calls. For each datagram the destination has to be stated. If several messages are to be sent to the same destination, the connect() system call may be used to register the destination. Data is then sent to the registered destination with the normal write() and writev() system calls, and only datagrams from that address will be delivered to the socket.
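A minimal datagram sketch (the destination address dest is assumed to be filled in as shown for bind() above):

#include <sys/socket.h>
#include <netinet/in.h>

/* a sketch: send a datagram to dest and wait for a reply */
void datagram_example(struct sockaddr_in *dest)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    char buf[512];
    struct sockaddr_in from;
    socklen_t fromlen = sizeof(from);

    /* the destination is given with every datagram */
    sendto(s, "ping", 4, 0, (struct sockaddr *)dest, sizeof(*dest));

    /* recvfrom() also reports the sender's address */
    recvfrom(s, buf, sizeof(buf), 0, (struct sockaddr *)&from, &fromlen);
}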
For many applications the need arises to control the socket in more detail. The application might change a timeout parameter, or change the size of a buffer. The system call getsockopt() allows the application to request information about a specific socket, while the system call setsockopt() allows an application to set a socket option.
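For example, a sketch that sets the common SO_REUSEADDR option and queries the receive buffer size:

#include <sys/socket.h>

/* a sketch: set and query socket options on descriptor s */
int tune_socket(int s)
{
    int on = 1, size;
    socklen_t len = sizeof(size);

    /* allow the local address to be reused immediately */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;

    /* query the current receive buffer size */
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;

    return size;
}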
A process might need to determine the local address of a socket or the destination address to which a socket connects. This can be done with the system calls getsockname() and getpeername().
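A short sketch of both calls for an Internet domain socket s:

#include <sys/socket.h>
#include <netinet/in.h>

/* a sketch: query both endpoint addresses of a connected socket s */
void show_addresses(int s)
{
    struct sockaddr_in local, peer;
    socklen_t len;

    len = sizeof(local);
    getsockname(s, (struct sockaddr *)&local, &len);  /* local part */

    len = sizeof(peer);
    getpeername(s, (struct sockaddr *)&peer, &len);   /* foreign part */
}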
The gethostname() system call allows user processes to obtain the host name; it can be set by a privileged process with sethostname(). The host domain may be obtained with the getdomainname() system call and changed by privileged processes with setdomainname().
A number of library functions are available that map between names and numbers for hosts, networks, protocols, and services:
Library Call | Purpose |
gethostbyname() | obtains network host entry for a host by its name |
gethostbyaddr() | obtains network host entry for a host by its address |
getnetbyname() | obtains network net entry by its name |
getnetbyaddr() | obtains network net entry by its address |
getprotobyname() | obtains protocol entry by its name |
getprotobynumber() | obtains protocol entry by its number |
getservbyname() | obtains service entry by its name |
getservbyport() | obtains service entry by its port number |
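A typical use of these functions is to fill in a sockaddr_in before connecting; a sketch with abbreviated error handling:

#include <string.h>
#include <netdb.h>
#include <netinet/in.h>

/* a sketch: look up a host and a TCP service and fill in an address */
int fill_address(const char *host, const char *service,
                 struct sockaddr_in *addr)
{
    struct hostent *hp = gethostbyname(host);
    struct servent *sp = getservbyname(service, "tcp");

    if (hp == NULL || sp == NULL)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port   = sp->s_port;    /* already in network byte order */
    memcpy(&addr->sin_addr, hp->h_addr_list[0], hp->h_length);
    return 0;
}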
In addition to these general, protocol-independent library functions, Internet-domain-specific functions are available. Table 16 summarizes the functions that convert data between host and network byte order, and Table 17 summarizes the Internet address manipulation functions.
Library Call | Purpose |
htonl(val) | convert 32-bit quantity from host to network byte order |
htons(val) | convert 16-bit quantity from host to network byte order |
ntohl(val) | convert 32-bit quantity from network to host byte order |
ntohs(val) | convert 16-bit quantity from network to host byte order |
Library Call | Purpose |
inet_addr(cp) | converts the Internet host address cp from dotted decimal notation into binary data in network byte order |
inet_network(cp) | extracts the network number in network byte order from the address cp in dotted decimal notation |
inet_ntoa(in) | converts the Internet host address in given in network byte order to a string in dotted decimal notation |
inet_makeaddr(net, host) | makes an Internet host address in network byte order by combining the network number net with the local address host in network net, both in local host byte order. |
inet_lnaof(in) | returns the local host address part of the Internet address in. The local host address is returned in local host byte order |
inet_netof(in) | returns the network number part of the Internet Address in. The network number is returned in local host byte order |
Details about these functions can be found in the corresponding manual pages.
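A short sketch combining both groups of functions (illustrative values only):

#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void address_example(void)
{
    struct in_addr in;

    /* dotted decimal notation to binary address in network byte order */
    in.s_addr = inet_addr("192.168.1.10");

    /* and back to dotted decimal notation */
    printf("address: %s\n", inet_ntoa(in));

    /* 16-bit port number from host to network byte order */
    printf("port in network byte order: 0x%04x\n", htons(5000));
}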
The BSD interprocess communication facilities have become the de facto standard for UNIX, as they are available in nearly all UNIX derivatives. The socket interface is nowadays also used on other operating systems, e.g. Microsoft Windows.
Sockets were created to make interprocess communication look like standard
UNIX file I/O. The socket function calls are based on the client/server model.
Different ``domains'' exist which support local IPC or distributed IPC.
The socket interface allows different types of sockets for different purposes.
Table 18 summarizes which system calls change fields in an association. Figure 28 summarizes how data flows from one process to another over a socket descriptor via TCP, IP, and Ethernet.
The Transport Layer Interface (TLI) was introduced with Release 3 of
System V in 1986. The TLI is designed after the ISO Transport Service
Definition. It is based on System V STREAMS (the principles of System V
STREAMS are summarized in Chapter 9.2.2). Normal UNIX descriptors are used
to refer to network data. As the TLI is similar to
the socket interface, only the more important differences are stated.
The Transport Layer Interface is a collection of library routines,
not system calls. Therefore if the TLI is used, a special
library has to be linked to the program.
The function names and functionality of the TLI are similar to those of the socket interface, with a t_ prefix added to the function names.
The number of options for TLI functions is much higher than in
the socket interface. This allows
more flexibility, but prevents a detailed description in this document.
Figures 29 and 30 give
an overview of the function calls used for an iterative server.
Many of the data structures passed between user code and the TLI functions contain one or more netbuf structures, each of which holds a pointer to a buffer used for a specific purpose. The functions t_alloc() and t_free() simplify the dynamic allocation and deallocation of these structures.
The t_listen() function does not accept an incoming connection. This
has to be done with t_accept().
The t_accept() function call does not create a new transport endpoint (the equivalent of a socket). To obtain the new file descriptor needed by a concurrent server, the functions t_open() and t_bind() have to be called.
The functions t_sndudata() and t_rcvudata() correspond to the socket system calls sendto() and recvfrom().
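A rough sketch of the datagram side of the TLI (the transport device name /dev/udp and the pre-filled destination address are assumptions; t_alloc() could be used instead of the stack-allocated structure):

#include <fcntl.h>
#include <tiuser.h>

/* a sketch: send one datagram over a TLI transport endpoint;
   the destination address in 'dest' is assumed to be filled in */
int tli_send(struct netbuf *dest, char *msg, unsigned int len)
{
    int fd;
    struct t_unitdata ud;

    fd = t_open("/dev/udp", O_RDWR, NULL);   /* create transport endpoint */
    if (fd < 0 || t_bind(fd, NULL, NULL) < 0)
        return -1;

    ud.addr = *dest;          /* destination address */
    ud.opt.len = 0;           /* no protocol options */
    ud.udata.buf = msg;       /* the data to send */
    ud.udata.len = len;

    return t_sndudata(fd, &ud);   /* corresponds to sendto() */
}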
The example programs distTLI/udpclient.c and distTLI/udpserver.c demonstrate how to use TLI with UDP. The UDP server is implemented as an iterative server. The use of TCP with TLI is demonstrated by the example programs distTLI/tcpclient.c and distTLI/tcpserver.c. As TCP connections might exist for a longer time, the TCP server is implemented in a concurrent fashion.
Remote Procedure Calls (RPCs) are a higher level communication
abstraction which make use of the distributed UNIX communication
facilities described above.
As remote procedure calls are becoming more and more important,
and RPCs are a basic functionality in most modern operating systems, the
principles of RPCs are sketched.
The idea behind remote procedure calls is to hide the underlying communication (message passing and related I/O) needed to invoke a service on a different computer.
The call of a function or procedure on a different computer
should be as easy and transparent as a local procedure call;
the programmer should not be concerned where a function is executed.
To achieve that goal, special programs are used to generate the code that handles the necessary message exchange: the client and the server stub code.
The client stub code packs the arguments into
a message (called parameter marshalling),
sends the message,
waits for the result message, unpacks the result and returns the result to
the caller. The server stub code unpacks the parameters, calls the right
function, packs the result into a message and sends a message back.
Figure 31 (from [Tan92]) illustrates this.
RPC should be as transparent as possible. To achieve this goal, a number of issues have to be considered, among them parameter passing, binding to the right server, and the handling of failures.
Remote procedure calls make the development of distributed applications much easier.
Sun RPC
The Sun RPC protocol, defined in [RFC 1057] is common in the
UNIX environment.
The program rpcgen(1) is used to generate the appropriate
client and server C stub code. XDR (eXternal Data Representation, defined in [RFC 1014]) is used for the representation of data. UDP and TCP
can be used as network protocols. The portmapper daemon handles
all binding issues.
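A minimal client-side sketch: PROGNUM, VERSNUM, and the stub proc_1() are hypothetical names standing in for what rpcgen would generate from an interface definition:

#include <stdio.h>
#include <rpc/rpc.h>

/* hypothetical values that would come from an rpcgen-generated header */
#define PROGNUM 0x20000001
#define VERSNUM 1
extern int *proc_1(int *arg, CLIENT *clnt);   /* generated client stub */

int call_remote(const char *host, int value)
{
    CLIENT *clnt;
    int *result;

    /* contact the portmapper on host and bind to the service over UDP */
    clnt = clnt_create(host, PROGNUM, VERSNUM, "udp");
    if (clnt == NULL) {
        clnt_pcreateerror(host);
        return -1;
    }

    result = proc_1(&value, clnt);   /* marshalling is done by the stub */
    clnt_destroy(clnt);
    return result ? *result : -1;
}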
The principle of remote procedure calls is described in more detail in [Tan92, Chapter 10.3], while existing implementations are covered in [Stev90, Chapter 18]. A modified RPC example from [Stev90] is included in Appendix A.7. An even more powerful method for developing distributed applications is the use of the Arjuna system.
Distributed interprocess communication is far more complex than local
interprocess communication. Details about the underlying network
protocols have to be considered. The number of system and library
calls that are needed for distributed IPC is much higher than
for local IPC.
Two different APIs for distributed IPC have been described: the socket interface and the Transport Layer Interface. Sockets are widely used, although TLI provides more flexibility.
A high-level use of networking facilities, remote procedure calls,
can make the development of distributed applications easier.
Distributed IPC is more general than local IPC, as it can be used on one computer, but also between distinct computers. Table 19 compares the function calls for some local and distributed IPC facilities used for connection-oriented communication. Chapter 7 compares the performance of local IPC and distributed IPC facilities.
| | | Sockets | TLI | Messages | FIFOs |
| Server | allocate space | | t_alloc() | | |
| | create endpoint | socket() | t_open() | msgget() | mknod(), open() |
| | bind address | bind() | t_bind() | | |
| | specify queue | listen() | | | |
| | wait for connection | accept() | t_listen() | | |
| | get new fd | | t_open(), t_bind(), t_accept() | | |
| Client | allocate space | | t_alloc() | | |
| | create endpoint | socket() | t_open() | msgget() | open() |
| | bind address | bind() | t_bind() | | |
| | connect to server | connect() | t_connect() | | |
| | transfer data | read(), write(), recv(), send() | read(), write(), t_rcv(), t_snd() | msgrcv(), msgsnd() | read(), write() |
| | datagrams | recvfrom(), sendto() | t_rcvudata(), t_sndudata() | | |
| | terminate | close(), shutdown() | t_close(), t_sndrel(), t_snddis() | msgctl() | close(), unlink() |