OSF's Distributed Computing Environment provides services and tools that support the creation, use, and maintenance of distributed applications in a heterogeneous computing environment. This chapter provides an overview of DCE, beginning with a section describing distributed computing and its benefits. The next section describes three distributed computing models -- client/server, RPC, and data sharing. The final section gives an overview of DCE itself, describing its technology components, the organization of a DCE environment, and the relationship between DCE and the underlying computing system.
By ``distributed computing'' we mean computing that involves the cooperation of two or more machines communicating over a network (see Figure 1-1). The machines participating in the system can range from personal computers to supercomputers; the network can connect machines in one building or on different continents.
Why is enabling this type of cooperative computing important? One reason is historical: computing resources that used to operate independently now need to work together. For example, consider an office that acquired personal workstations for individual use. After a while, there were many workstations in the office building, and the users recognized that it would be desirable to share data and resources among the individual computers. They accomplished this by connecting the workstations over a network.
A second reason is functional: if there is special-function hardware or software available over the network, then that functionality does not have to be duplicated on every computer system (or ``node'') that needs to access the special-purpose resource. For example, an organization could make a typesetting service available over the network, allowing users throughout the organization to submit their jobs to be typeset.
A third reason is economic: it may be more cost-effective to have many small computers working together than one large computer of equivalent power. In addition, having many units connected to a network is the more flexible configuration -- if more resources are needed, another unit can be added in place, rather than bringing the whole system down and replacing it with an upgraded one.
Finally, a distributed system can be more reliable and available than a centralized system. This is a result of the ability to replicate both data and functionality. For example, when copies of a given file are kept on two different machines, then even if one machine is unavailable, the file can still be accessed on the other machine. Likewise, if several printers are attached to a network, then even if an administrator takes one printer offline for maintenance, users can still print their files using an alternate printer.
Distributed computing inherently brings with it not only potential advantages, but also new problems. Examples are keeping multiple copies of data consistent, and keeping the clocks on different machines in the system synchronized. A system that provides distributed computing support must address these new issues.
Suppose that an organization decides, for one of the reasons previously mentioned or for some other reason, that it wants to acquire distributed computing capability. Why is DCE in particular advantageous? Why would an organization with a network such as the one in Figure 1-1 benefit from using DCE to enable distributed computing? DCE's benefits can be grouped into five categories, namely its support of distributed applications, the integration of its components with each other, its relationship to its platforms, its support for data sharing, and its interaction with the world outside of DCE:
DCE provides a high-level, coherent environment for developing and running applications on a distributed system. The DCE components fall into two categories: tools for developing distributed applications, and services for running distributed applications. The tools, such as DCE Remote Procedure Call and DCE Threads, assist in the development of an application. The services, such as the DCE Directory Service, Security Service, and Distributed Time Service, provide the support required in a distributed system that is analogous to the support an operating system provides in a centralized system.
(It is possible to develop distributed applications with much less assistance than what DCE offers. Programmers can write applications that cooperate across machines by explicitly writing the code that performs the network communications between them, but this requires much time and expertise. Programmers can also write distributed applications using a communications tool, such as remote procedure call, while explicitly using other necessary technologies, like standalone name and security services. However, DCE provides a set of components necessary for distributed computing that are already integrated, and that do as much work as possible automatically for the application programmer, system administrator, and end user.)
A second benefit is the integration and comprehensiveness of the DCE components. Not only does DCE provide all the tools and services needed for developing and running distributed applications, but the DCE components themselves are well integrated. They use one another's services whenever possible, since many of the DCE components are themselves distributed applications. In addition to supporting the development of distributed applications, DCE includes services that address some of the new problems inherent in the distributed system itself, such as data consistency and clock synchronization. Finally, DCE includes management tools for administering all of the DCE services and many aspects of the distributed environment itself.
Another benefit of DCE is its orientation toward heterogeneous rather than homogeneous systems. One way to implement a distributed system is to use a single operating system that runs on all nodes participating in the distributed network. The DCE architecture, however, allows for different operating systems and hardware platforms. Using DCE, a process running on one computer can interoperate with a process on a second computer, even when the two computers have different hardware or operating systems. DCE can therefore accommodate a wider range of networks -- especially networks needing distributed computing for the historical reasons previously listed -- than a model that requires the same operating system running on every node. Applications that are built using DCE are portable to other hardware/operating system platforms that run DCE.
Another benefit is DCE's support of data sharing through its directory service and distributed file service. A user anywhere in the distributed system can share data by placing it in the namespace or in a file, whichever is appropriate for the application. The data is then accessible by authorized users throughout the system.
One final benefit of DCE is the way it interacts with the outside world. In addition to supporting cooperation within and between themselves, DCE systems can also interoperate with computing environments outside of DCE. In particular, the DCE Directory Service can interoperate with two standard, global directory services -- X.500 and Domain Name Service -- allowing users from within DCE to access information about the outside world. In this way, DCE participates in a global directory service. One benefit of such participation can be seen in DCE's distributed file system: it looks like one global file system, and users anywhere in the world can address the same file using the same global name.
This section gives some examples of computing environments that can profit from distributed computing capabilities. In general, any computing organization wishing to take advantage of the benefits of a distributed computing environment -- data and resource sharing, extensibility, availability, interoperability -- can benefit from using DCE. For example:
DCE is based on three distributed computing models -- client/server, remote procedure call, and data sharing. The client/server model is a way of organizing a distributed application. The remote procedure call model is a way of communicating between parts of a distributed application. The data sharing model is a way of handling data in a distributed system. The following subsections briefly describe each model.
A useful model for implementing distributed applications is the ``client/server'' model. In this model, the distributed application is divided into two parts, one part residing on each of the two computers that will be communicating during the distributed computation (see Figure 1-2).
The client side of the application is the part that resides on the node that initiates the distributed request and receives the benefit of the service (for example, a workstation that requests that a file be printed). The server side of the application is the part that resides on the node that receives and executes the distributed request (for example, the node with the printer). In this model, two different sets of code are produced -- one that runs as a client, the other as a server.
Figure 1-3 shows a workstation running the client side of a distributed print program, and a print server running the server side of the distributed program.
Note that the terms ``client'' and ``server'' can be seen as relative roles rather than as absolutes. For example, in executing the print request, the print server may in turn become a client in a distributed communication -- it may ask the file server to send it a copy of the file to be printed (see Figure 1-4).
The terms ``client'' and ``server'' are also used to refer to specific nodes. This can be confusing since a given node, or even a given process, can be acting in both the client and server role. Nevertheless, it is often convenient to use the term ``file server'' when referring to the node on which the server side of a distributed file system is running -- probably a machine that contains a lot of disk storage. Likewise, the ``directory server'' is a node that contains a database with names in it, and answers requests for access to those names. When clarification is needed, we use the term ``machine'' to indicate the node rather than the role. For example, in Figure 1-4, the print server, which runs on the print server machine, is acting as a client to the file server.
Note that it is possible for more than one server to run on a given node. For example, both a security server and a time server can run on the same machine. In this case, the given node is both the security server machine and the time server machine (see Figure 1-5).
In general, when clients and servers are considered as nodes, server nodes are specialized: they require software that is found only on that particular server (for example, the directory server). Client nodes, by contrast, are generalized: client machines are typically configured with the capability to be many types of client (for example, a directory, file, and security service client). See Figure 1-6.
The reason client nodes are generalized is that the client code is usually relatively small compared to the code that implements a server, and typically many nodes need to be able to run the client side of an application; whereas only one or two nodes may be equipped to run the server side of an application.
One final distinction between client and server: the server is typically implemented as a continuous process (daemon), whereas the client is usually implemented as a library. In other words, the client side of an application consists of a call to a routine that sends the request over the network, receives the result, and then returns, at which point the caller goes on with whatever else it was doing. The server side of an application is a dedicated process that runs continuously: it waits for a request, executes it, returns the answer, then waits for the next request, and so on. Figure 1-7 illustrates this distinction.
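The library/daemon distinction can be sketched in a few lines of Python. This is only an illustration, not DCE code: the names print_server and print_file are hypothetical, and an in-process queue stands in for the network.

```python
import queue
import threading

# Hypothetical in-process "network": a queue standing in for a real transport.
requests = queue.Queue()

def print_server():
    """Server side: a dedicated loop that waits, executes, replies, repeats."""
    while True:
        job, reply = requests.get()      # wait for the next request
        if job is None:                  # shutdown signal (for this demo only)
            break
        reply.put(f"printed {job}")      # execute the request, return the answer

def print_file(name):
    """Client side: a library routine that sends a request and then returns."""
    reply = queue.Queue(maxsize=1)
    requests.put((name, reply))          # send the request
    return reply.get(timeout=5)          # receive the result, return to the caller

daemon = threading.Thread(target=print_server, daemon=True)
daemon.start()
result = print_file("report.txt")
requests.put((None, None))               # stop the demo server
daemon.join(timeout=5)
```

The caller of print_file sees an ordinary function call, while print_server spends its whole life in the request loop.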
DCE is based on the client/server model. The DCE services are themselves examples of distributed programs with a client and server side. The basic communications mechanism used in DCE, remote procedure call, assumes the presence of a client and a server. Since DCE applications are built using remote procedure call, they are also based on the client/server model of distributed computation.
One way of implementing communications between the client and server sides of a distributed application is to use the remote procedure call (RPC) model. In this model, the client makes what looks like an ordinary procedure call. The underlying RPC mechanism translates the call into network communications. The server receives the request and executes the procedure, returning the results to the client. One of the DCE technology components, DCE RPC, is an implementation of this model. It is used by most of the other DCE technology components for their network communications. (See Section 3.2 of this manual for more information on remote procedure calls and DCE RPC.)
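As a rough illustration of the model, the following Python sketch shows a client-side stub that turns a call into a message and a server-side dispatcher that turns the message back into a call. It is not DCE RPC: the names make_stub, server_dispatch, and PROCEDURES are hypothetical, JSON is used only because it is convenient (it is not the DCE RPC wire format), and a direct function call stands in for the network.

```python
import json

# Server side: the procedures that may be called remotely (hypothetical).
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def server_dispatch(message: bytes) -> bytes:
    """Unpack a request message, run the procedure, pack the result."""
    request = json.loads(message.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def make_stub(proc, transport):
    """Client side: build a function that looks like a local procedure."""
    def stub(*args):
        message = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
        reply = transport(message)       # a network round trip in a real system
        return json.loads(reply.decode("utf-8"))["result"]
    return stub

# The transport here is a direct function call standing in for the network.
remote_add = make_stub("add", server_dispatch)
total = remote_add(2, 3)
```

The application code calls remote_add exactly as it would call a local function; only the stub and the dispatcher know that a message was exchanged.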
Some of the DCE services are based on the ``data sharing'' model, in which data is shared by distributing it throughout the system. Like RPC, data sharing assumes the existence of clients and servers. Data sharing, however, focuses on distributed data rather than distributed execution. In RPC, the procedure that the client calls is executed on the server. In data sharing, the server's data is sent to the client. For example, if a client wants to access a file, a copy of the file is sent from the server to the client. The client then proceeds to access the file locally. Data sharing can be built on top of RPC, using RPC as the communications mechanism between the client and server, and as the means of transferring data.
Data sharing usually entails having multiple copies of the same data; for example, a master copy of a file on a file server, and a copy of the file on one or more client machines. As a result, copies of data may diverge -- a client may make changes to its copy that make the client's copy inconsistent with the copy on the server. Therefore, distributed services based on the data sharing model usually include mechanisms for keeping copies of data consistent.
In addition, services that implement data sharing must be able to synchronize multiple access to data. For example, two clients may each want to modify a given record in a database. The server that manages the database must either prevent them from making conflicting modifications, or decide which modification takes precedence.
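One way a server might prevent conflicting modifications is to attach a version number to each record and reject any update that was based on an out-of-date version. The sketch below assumes that approach; the text does not say which mechanism any DCE service uses, and the RecordStore class is hypothetical.

```python
class RecordStore:
    """Hypothetical server-side store that rejects updates based on stale data."""

    def __init__(self):
        self._records = {}               # key -> (version, value)

    def read(self, key):
        return self._records.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if expected_version != current_version:
            return False                 # another client modified the record first
        self._records[key] = (current_version + 1, value)
        return True

store = RecordStore()
store.write("phone", 0, "555-0100")
version_a, _ = store.read("phone")       # client A reads the record
version_b, _ = store.read("phone")       # client B reads the same version
a_ok = store.write("phone", version_a, "555-0111")
b_ok = store.write("phone", version_b, "555-0122")
```

Here the first writer takes precedence: client B's update is refused, and B must reread the record before trying again.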
Two DCE services are based on the data sharing model. The first is the DCE Directory Service, whose Cell Directory Service (CDS) maintains a cache on the client. This cache contains copies of data that users on the client have recently accessed. Subsequent accesses to the data can be made locally from the cache, rather than over the network to the server.
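The effect of a client-side cache can be sketched as follows. The DirectoryServer and CachingClient classes and the entry name are hypothetical, and the sketch deliberately omits how stale entries would be detected or refreshed.

```python
class DirectoryServer:
    """Hypothetical server holding name -> attributes entries."""

    def __init__(self, entries):
        self.entries = entries
        self.lookups = 0                 # counts requests that reach the server

    def lookup(self, name):
        self.lookups += 1
        return self.entries[name]

class CachingClient:
    """Hypothetical client that remembers answers it has already received."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def lookup(self, name):
        if name not in self.cache:       # miss: ask the server over the "network"
            self.cache[name] = self.server.lookup(name)
        return self.cache[name]          # hit: answered from the local cache

server = DirectoryServer({"printer-1": {"location": "room 12"}})
client = CachingClient(server)
first = client.lookup("printer-1")
second = client.lookup("printer-1")
```

Only the first lookup reaches the server; the second is satisfied locally.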
The DCE Distributed File Service is also based on the data sharing model. A DFS client maintains a cache of files that have recently been accessed by a user on the system. DFS servers distribute and revoke tokens, which represent a client's capability to perform operations on files. Through careful token management, the DFS server can ensure that its clients do not perform conflicting operations on shared files, and that they do not see inconsistent copies of the same file.
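The idea of granting and revoking tokens can be sketched for a single file with two kinds of token, read and write. This is a simplified model built on assumptions: the TokenServer class is hypothetical, and the text does not describe the actual token types or revocation protocol used by DFS.

```python
class TokenServer:
    """Hypothetical server tracking which clients hold tokens for one file."""

    def __init__(self):
        self.readers = set()
        self.writer = None
        self.revoked = []                # log of (client, kind) revocations

    def _revoke_writer(self):
        if self.writer is not None:
            self.revoked.append((self.writer, "write"))
            self.writer = None

    def grant_read(self, client):
        if self.writer != client:
            self._revoke_writer()        # a writer elsewhere must give up its token
        self.readers.add(client)

    def grant_write(self, client):
        if self.writer != client:
            self._revoke_writer()
        for reader in sorted(self.readers - {client}):
            self.revoked.append((reader, "read"))   # other readers lose their tokens
        self.readers &= {client}
        self.writer = client

tokens = TokenServer()
tokens.grant_read("A")
tokens.grant_read("B")
tokens.grant_write("A")                  # B's read token is revoked
tokens.grant_read("B")                   # A's write token is revoked
```

Because at most one client can hold the write token, and no other client holds a read token while it is outstanding, clients in this model never read a copy that another client is in the middle of changing.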
Data sharing, like RPC, enables users and programmers to communicate transparently in a distributed system.
OSF's Distributed Computing Environment is a layer between the operating system and network on the one hand, and the distributed application on the other. DCE provides the services that allow a distributed application to interact with a collection of possibly heterogeneous computers, operating systems, and networks as if they were a single system. Figure 1-8 shows DCE in relation to operating systems, network communications software, and applications software.
Several technology components work together to implement the DCE layer. Many of these components provide in a distributed environment what an operating system provides in a centralized (single-node) environment.
Figure 1-9 shows the DCE architecture and its technology components, along with their relationship to applications, underlying system support, and placeholders for future technologies.
This section gives a short description of each of the DCE technology components. A more in-depth description of each of these components is given in Chapter 3 of this manual.
DCE Threads supports the creation, management, and synchronization of multiple threads of control within a single process. This component is conceptually a part of the operating system layer, the layer below DCE. If the host operating system already supports threads, DCE can use that software and DCE Threads is not necessary. However, not all operating systems provide a threads facility, and DCE components require that threads be present, so this user-level threads package is included in DCE.
The DCE Remote Procedure Call (RPC) facility consists of both a development tool and a runtime service. The development tool consists of a language (and its compiler) that supports the development of distributed applications following the client/server model. It automatically generates code that transforms procedure calls into network messages. The runtime service implements the network protocols by which the client and server sides of an application communicate. DCE RPC also includes software for generating unique identifiers, which are useful in identifying service interfaces and other resources.
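To illustrate how unique identifiers can label service interfaces, the sketch below uses the uuid module from the Python standard library. This is not the DCE identifier generator, which the text does not name; the registry dictionary and the interface name are hypothetical.

```python
import uuid

# A hypothetical registry mapping interface identifiers to interface names.
interface_id = uuid.uuid4()              # random 128-bit identifier
registry = {interface_id: "print_service v1.0"}

# The textual form can be stored in a file and parsed back without loss.
text_form = str(interface_id)
parsed = uuid.UUID(text_form)
```

Because identifiers of this kind are generated without consulting a central authority, two developers can each create an interface identifier independently with a negligible chance of collision.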
The DCE Directory Service is a central repository for information about resources in the distributed system. Typical resources are users, machines, and RPC-based services. The information consists of the name of the resource and its associated attributes. Typical attributes could include a user's home directory, or the location of an RPC-based server.
The DCE Directory Service comprises several parts: the Cell Directory Service (CDS), the Global Directory Agent (GDA), and a directory service programming interface. The Cell Directory Service manages a database of information about the resources in a group of machines called a DCE cell. (Cells are described in the next section.) The Global Directory Agent (GDA) acts as a go-between for cell and global directory services. CDS is accessed using the directory service application programming interface, the X/Open Directory Service (XDS) API.
The DCE Distributed Time Service (DTS) provides synchronized time on the computers participating in a Distributed Computing Environment. DTS synchronizes a DCE host's time with Coordinated Universal Time (UTC), an international time standard.
The DCE Security Service provides secure communications and controlled access to resources in the distributed system. There are four aspects to DCE security: authentication, secure communications, authorization, and auditing. These aspects are implemented by several services and facilities that together make up the DCE Security Service, including the Registry Service, the Authentication Service, the Privilege Service, the Access Control List (ACL) Facility, the Login Facility, and the Audit Service.
The identity of a DCE user or service is verified, or authenticated, by the Authentication Service. Communications are protected by the integration of DCE RPC with the Security Service -- communication over the network can be checked for tampering or encrypted for privacy. Access to resources is controlled by comparing the credentials conferred on a user by the Privilege Service with the rights to the resource, which are specified in the resource's Access Control List. The Login Facility initializes a user's security environment, and the Registry Service manages the information (such as user accounts) in the DCE Security database. Security-relevant events can be monitored through the Audit Service. ``Code points'' can be set in DCE servers to record events that are deemed to be important to the integrity of the system. For example, the Login Facility uses the Audit Service to record logins by DCE users and services.
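The comparison of credentials against an Access Control List can be sketched as below. The Credentials class, the ACL layout, and the permission letters are all hypothetical; the real ACL entry types and evaluation rules are not described in the text and are likely more involved than this simple union of user and group permissions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credentials:
    """Hypothetical stand-in for the identity information issued to a user."""
    user: str
    groups: frozenset = frozenset()

def is_allowed(creds, acl, permission):
    """Grant access if the user's entry or any group entry holds the permission."""
    granted = set(acl.get(("user", creds.user), ""))
    for group in creds.groups:
        granted |= set(acl.get(("group", group), ""))
    return permission in granted

# Hypothetical ACL: each entry maps a user or group to permission letters.
report_acl = {
    ("user", "alice"): "rw",
    ("group", "staff"): "r",
}
alice = Credentials("alice", frozenset({"staff"}))
bob = Credentials("bob", frozenset({"staff"}))
```

With this ACL, alice may read and write the resource, bob may only read it through his membership in the staff group, and a user with no matching entry is refused.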
The DCE Distributed File Service (DFS) allows users to access and share files stored on a File Server anywhere on the network, without having to know the physical location of the file. Files are part of a single, global namespace, so no matter where in the network a user is, the file can be found using the same name. The Distributed File Service achieves high performance, particularly through caching of file system data, so that many users can access files that are located on a given File Server without prohibitive amounts of network traffic and resulting delays.
DCE DFS includes a physical file system, the DCE Local File System (LFS), which supports special features that are useful in a distributed environment. These features include the ability to replicate data; to log file system data, enabling quick recovery after a crash; to simplify administration by dividing the file system into easily managed units called filesets; and to associate ACLs with files and directories.
The Management block shown in Figure 1-9 is actually not a single component, but a cross section of the other components. Each DCE service contains an administrative component so it can be managed over the network. In addition, some of the DCE services themselves provide for management of the distributed system as a whole. For example, users are registered in the Security Service, and servers' network addresses are registered in the Directory Service.
A cell has its own Security Service, Cell Directory Service, and optionally, Distributed File Service; these services are available cell-wide. The Security Service for a cell manages the cell's registry, where user account information is kept. Each cell has its own namespace; the Cell Directory Service for the cell manages that namespace and its hierarchy. If DFS is present in the cell, the Distributed File Service allows remote access to files from anywhere in the cell. Each cell also has its own Distributed Time Service, which keeps the clocks on all of the machines in the cell synchronized.
A cell provides a single security domain. Users log into accounts in a cell. Access Control Lists (ACLs) identify users and groups in the cell (they can also refer to users and groups in other cells). A cell also provides a single naming domain. Each cell has a name, and all objects in the cell share that name.
DCE cells can be connected so that they can communicate with each other. Going back to the example, if the different departments' cells are connected, then a user in one department's cell may be able to access resources in another department's cell, although this access would typically be less frequent and more restricted than access to resources within the user's own cell.
Cells connect to each other by means of a global directory service. A cell's name is registered in a global directory service, and the cell is then able to contact other cells registered in that global service. Note that communication between DCE cells is not automatic. Cells that wish to communicate with each other must first establish a trust relationship between their cells' Security Services; this process is called ``cross-cell authentication'' and is described in more detail in Chapter 3.
A cell can have more than one name. In this case, one of the cell's names is designated its ``primary name'' while the other names are the cell's ``alias'' names. The cell's primary name is the default name for the cell; that is, it is the name that DCE services return. Cell name aliasing permits a cell to be registered in more than one global namespace. It also provides a way to change a cell's name if the need arises, for example, to respond to organizational changes within the company. For more information on how to create cell name aliases for a cell, see the Transarc DCE Administration Guide--Introduction and the Transarc DCE Administration Guide--Core Components.
A DCE cell can be configured in many ways, depending on its users' requirements. A cell consists of a network connecting three kinds of nodes: DCE User Machines, DCE Administrator Machines, and DCE Server Machines. DCE User Machines are general-purpose DCE machines. They contain software that enables them to act as clients to all of the DCE services. DCE Administrator Machines contain software that enables a DCE administrator to manage DCE system services remotely.
The DCE server machines are equipped with special software enabling them to provide one or more of the DCE services. Every cell must have at least one each of the following servers in order to function:
Other DCE servers may be present in a given DCE cell to provide additional functionality -- a Global Directory Agent may be present to enable the cell's directory server to communicate with other cells' directory servers; a Global Directory Server may be present to provide X.500 directory service; and Distributed File Servers may be present to provide storage of files and the special functions of the Local File System. (See Chapter 2 of this manual for more detailed information on DCE cell configuration.)
One of the benefits of OSF's DCE is its coherence: although the components themselves are modular with well-defined interfaces, they are also well integrated; the various DCE components each make use of the services of the other components wherever possible. For example, the RPC facility uses the Directory Service to advertise and look up RPC-based servers and their characteristics; it uses the Security Service to ensure message integrity and privacy; and it uses DCE Threads to handle concurrent execution of multiple RPCs. The Distributed File Service uses Threads, RPC, Directory Service, Distributed Time Service, and Security Service in providing its file service.
In general, the DCE components shown higher in the DCE Architecture (see Figure 1-9) make use of the components shown lower in the architecture. For example, DCE Threads is used by most other DCE components, but does not itself use other components. This ordering is not strictly hierarchical; often two services each depend on the other. For example, the Directory Service uses the Security Service, which in turn uses the Directory Service. The interdependence of DCE components is explained in more detail in Chapter 4.
As shown in Figure 1-8, DCE is layered on top of local operating system and networking software. DCE makes certain assumptions about the services provided by the underlying network and operating systems. DCE's requirements for these services are described in the following subsections.
In general, DCE is layered over a transport level service, such as UDP (User Datagram Protocol), TCP (Transmission Control Protocol), or ISO TP0-TP4 transport protocols, which is accessed through a transport interface, such as sockets or XTI (X/Open Transport Interface). DCE assumes that all nodes participating in the DCE environment are physically connected by a highly available network. The network can be a Local Area Network (LAN), a Wide Area Network (WAN), or a combination of both.
The DCE architecture supports different types of network protocol families. For example, DCE could be ported to run over Open Systems Interconnection (OSI) protocols. (The OSF DCE reference implementation runs over the Internet Protocol (IP) family.) However, in order for DCE systems to communicate with one another they must have at least one set of network protocols in common. For example, DCE is not designed to enable a node running only IP protocols to communicate with a node running only OSI protocols.
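The requirement for a shared protocol family amounts to a set intersection, as the following sketch shows. The function name and the protocol labels are hypothetical.

```python
def common_protocols(node_a, node_b):
    """Return the protocol families that both nodes support."""
    return set(node_a) & set(node_b)

ip_only = {"IP"}
osi_only = {"OSI"}
dual_stack = {"IP", "OSI"}

can_talk = bool(common_protocols(ip_only, dual_stack))
cannot_talk = bool(common_protocols(ip_only, osi_only))
```

Two nodes can communicate only when the intersection is non-empty, so the IP-only node can reach the dual-stack node but not the OSI-only node.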
Finally, DCE assumes the ability to identify a node with a unique network address, and the ability to identify a process with a network endpoint address (for example, a port or T-selector).
DCE assumes that certain services are available through the underlying operating system, namely:
The previous two subsections listed assumptions made by the DCE architecture. The OSF DCE reference implementation contains additional dependencies on the operating system and network, which are specific to the implementation. These include the use of Internet Protocol and socket networking services, and UNIX operating system facilities.
The next sections discuss these aspects in greater detail.
A ``code set'' is a mapping of the members of a character set to specific numeric code values. Examples of code sets include ASCII, EBCDIC, JIS X0208 (Japanese Kanji), and ISO 8859-1 (also known as Latin-1). The DCE RPC communications protocol automatically converts DCE Portable Character Set (PCS) characters between the ASCII and EBCDIC code sets, if necessary. DCE RPC also provides constructs and routines for character and code set interoperability for non-PCS, or ``international,'' characters. These features permit programmers to write DCE RPC applications that guarantee character and code set interoperability between clients and servers in a DCE that supports a variety of languages and encodings for those languages.
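The difference between code sets is easy to see with the codecs built into Python. The sketch below uses cp037, one EBCDIC code page shipped with Python, as a stand-in; which EBCDIC variant a given DCE platform uses is an assumption here, and this is not the conversion routine that DCE RPC itself applies.

```python
text = "DCE RPC"

# The same characters map to different numeric code values in each code set.
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")      # cp037: an EBCDIC code page in Python

# Converting between code sets means decoding with one and encoding with the other.
converted = ascii_bytes.decode("ascii").encode("cp037")
round_trip = converted.decode("cp037")
```

The character content survives the conversion unchanged even though every byte value differs between the two encodings.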