## Text Content: Advanced PHP Programming - P9

Chapter 15: Building a Distributed Environment

[Figure 15.6: Stale cache data resulting in inconsistent cluster behavior. Client X gets a fresh copy of Joe's page from Server A (newly cached), while Client Y gets a stale copy from Server B (older cache).]

### Centralized Caches

One of the easiest and most common techniques for guaranteeing cache consistency is to use a centralized cache solution. If all participants use the same set of cache files, most of the worries regarding distributed caching disappear (basically because the caching is no longer completely distributed; only the machines performing it are).

Network file shares are an ideal tool for implementing a centralized file cache. On Unix systems the standard tool for doing this is NFS. NFS is a good choice for this application for two main reasons:

- NFS servers and client software are bundled with essentially every modern Unix system.
- Newer Unix systems supply reliable file-locking mechanisms over NFS, meaning that the cache libraries can be used without change.
[Figure 15.7: Inconsistent cached session data breaking shopping carts. Joe starts his shopping cart on Server A. When Joe is later served by Server B, he gets a brand-new cart; cart A is not merged into cart B.]

The real beauty of using NFS is that from a user level, it appears no different from any other filesystem, so it provides a very easy path for growing a cache implementation from a single machine to a cluster of machines.

If you have a server that utilizes /cache/www.foo.com as its cache directory, using the Cache_File module developed in Chapter 10, "Data Component Caching," you can extend this caching architecture seamlessly by creating an exportable directory /shares/cache/www.foo.com on your NFS server and then mounting it on any interested machine as follows:
To achieve this, you can use Spread, a group communication toolkit designed at the Johns Hopkins University Center for Networking and Distributed Systems to provide an extremely efficient means of multicast communication between services in a cluster, with robust ordering and reliability semantics. Spread is not a distributed application in itself; it is a toolkit (a messaging bus) that allows the construction of distributed applications.

The basic architecture plan is shown in Figure 15.8. Cache files will be written in a nonversioned fashion locally on every machine. When an update to the cached data occurs, the updating application will send a message to the cache Spread group. On every machine, there is a daemon listening to that group. When a cache invalidation request comes in, the daemon will perform the cache invalidation on that local machine.

[Figure 15.8: A simple Spread ring. Hosts 1, 2, and 3 form the ring, and each host is a member of one or more groups (group 1, group 2).]

This methodology works well as long as there are no network partitions. A network partition event occurs whenever a machine joins or leaves the ring. Say, for example, that a machine crashes and is rebooted. During the time it was down, cache entries may have been updated. It is possible, although complicated, to build a system using Spread whereby changes could be reconciled on network rejoin. Fortunately for you, the nature of most cached information is that it is temporary and not terribly painful to re-create. You can use this assumption and simply destroy the cache on a Web server whenever the cache maintenance daemon is restarted. This measure, although draconian, allows you to easily prevent usage of stale data.
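The "destroy the cache on restart" policy described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `flush_cache_directory()` is hypothetical (it is not part of the Cache_File class or of Spread), and it assumes the cache is a plain directory tree that the daemon is allowed to empty before it joins the Spread group.

```php
<?php
// Hypothetical helper (not from the original text): remove every cached file
// under $cacheDir, keeping the top-level directory itself in place.
// Returns the number of files removed.
function flush_cache_directory(string $cacheDir): int
{
    if (!is_dir($cacheDir)) {
        return 0;
    }
    $removed = 0;
    // CHILD_FIRST visits directory contents before the directory itself,
    // so each subdirectory is already empty when rmdir() is called on it.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($iterator as $entry) {
        if ($entry->isDir() && !$entry->isLink()) {
            rmdir($entry->getPathname());
        } else {
            unlink($entry->getPathname());
            $removed++;
        }
    }
    return $removed;
}
```

A cache maintenance daemon might call such a function once at startup, before it starts listening for purge messages, so that nothing written while it was down can be served.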
To implement this strategy, you need to install some tools. To start with, you need to download and install the Spread toolkit from www.spread.org. Next, you need to install the Spread wrapper from PEAR:

    # pear install spread

The Spread wrapper library is written in C, so you need all the PHP development tools installed to compile it (these are installed when you build from source). So that you can avoid having to write your own protocol, you can use XML-RPC to encapsulate your purge requests. This might seem like overkill, but XML-RPC is actually an ideal choice: It is much lighter-weight than a protocol such as SOAP, yet it still provides a relatively extensible and "canned" format, which ensures that you can easily add clients in other languages if needed (for example, a standalone GUI to survey and purge cache files).

To start, you need to install an XML-RPC library. The PEAR XML-RPC library works well and can be installed with the PEAR installer, as follows:

    # pear install XML_RPC

After you have installed all your tools, you need a client. You can augment the Cache_File class by adding a method that allows for purging data:

    require_once 'XML/RPC.php';

    class Cache_File_Spread extends Cache_File {
      private $spread;

Spread works by having clients attach to a network of servers, usually a single server per machine. If the daemon is running on the local machine, you can simply specify the port that it is running on, and a connection will be made over a Unix domain socket. The default Spread port is 4803:

      private $spreadName = '4803';

Spread clients join groups to send and receive messages on. If you are not joined to a group, you will not see any of the messages for it (although you can send messages to a group you are not joined to). Group names are arbitrary, and a group will be automatically created when the first client joins it. You can call your group xmlrpc:

      private $spreadGroup = 'xmlrpc';
      private $cachedir = '/cache/';

      public function __construct($filename, $expiration = false) {
        parent::__construct($filename, $expiration);

You create a new Spread object in order to have the connect performed for you automatically:

        $this->spread = new Spread($this->spreadName);
      }
Here's the method that does your work. You create an XML-RPC message and then send it to the xmlrpc group with the multicast method:

      function purge() {
        // We don't need to perform this unlink,
        // our local spread daemon will take care of it.
        // unlink("$this->cachedir/$this->filename");
        // Parameters are wrapped in XML_RPC_Value so that the server
        // can read them back with getval().
        $params = array(new XML_RPC_Value($this->filename, 'string'));
        $client = new XML_RPC_Message("purgeCacheEntry", $params);
        $this->spread->multicast($this->spreadGroup, $client->serialize());
      }
    }

Now, whenever you need to poison a cache file, you simply use this:

    $cache->purge();

You also need an RPC server to receive these messages and process them:

    require_once 'XML/RPC/Server.php';
    $CACHEBASE = '/cache/';
    $serverName = '4803';
    $groupName = 'xmlrpc';

The function that performs the cache file removal is quite simple. You decode the name of the file to be purged and then unlink it. Prefixing the cache directory is a half-hearted attempt at security. A more robust solution would be to chroot the server into the cache directory at startup. Because you're using this purely internally, you can let this slide for now. Here is a simple cache removal function:

    function purgeCacheEntry($message) {
      global $CACHEBASE;
      $val = $message->params[0];
      $filename = $val->getval();
      unlink("$CACHEBASE/$filename");
    }

Now you need to do some XML-RPC setup, setting the dispatch array so that your server object knows what functions it should call:

    $dispatches = array('purgeCacheEntry' =>
                        array('function' => 'purgeCacheEntry'));
    $server = new XML_RPC_Server($dispatches, 0);

Now you get to the heart of your server. You connect to your local Spread daemon, join the xmlrpc group, and wait for messages. Whenever you receive a message, you call the server's parseRequest method on it, which in turn calls the appropriate function (in this case, purgeCacheEntry):
    $spread = new Spread($serverName);
    $spread->join($groupName);
    while(1) {
      $message = $spread->receive();
      $server->parseRequest($message->message);
    }

### Scaling Databases

One of the most difficult challenges in building large-scale services is the scaling of databases. This applies not only to RDBMSs but to almost any kind of central data store. The obvious solution to scaling data stores is to approach them as you would any other service: partition and cluster. Unfortunately, RDBMSs are usually much more difficult to make work this way than other services.

Partitioning actually works wonderfully as a database scaling method. There are a number of degrees of partitioning. On the most basic level, you can partition by breaking the data objects for separate services into distinct schemas. Assuming that a complete (or at least mostly complete) separation of the dependent data for the applications can be achieved, the schemas can be moved onto separate physical database instances with no problems.

Sometimes, however, you have a database-intensive application where a single schema sees so much DML (Data Manipulation Language: SQL that causes change in the database) that it needs to be scaled as well. Purchasing more powerful hardware is an easy way out and is not a bad option in this case. However, sometimes simply buying larger hardware is not an option:

- Hardware pricing is not linear with capacity. High-powered machines can be very expensive.
- I/O bottlenecks are hard (read: expensive) to overcome.
- Commercial applications often run on a per-processor licensing scale and, like hardware, scale nonlinearly with the number of processors. (Oracle, for instance, does not allow standard edition licensing on machines that can hold more than four processors.)
### Common Bandwidth Problems

You saw in Chapter 12, "Interacting with Databases," that selecting more rows than you actually need can result in your queries being slow because all that information needs to be pulled over the network from the RDBMS to the requesting host. In high-volume applications, it's very easy for this query load to put a significant strain on your network.

Consider this: If you request 100 rows to generate a page and your average row width is 1KB, then you are pulling 100KB of data across your local network per page. If that page is requested 100 times per second, then just for database data, you need to fetch 100KB × 100 = 10MB of data per second. That's bytes, not bits. In bits, it is 80Mbps. That will effectively saturate a 100Mb Ethernet link.
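The arithmetic in this sidebar can be reproduced with a few lines of PHP. This is a rough sketch rather than a capacity-planning tool: the function name `database_traffic_mbps()` is hypothetical, and it assumes decimal units (1KB = 1,000 bytes, 1Mbps = 1,000,000 bits per second) and ignores protocol overhead.

```php
<?php
// Hypothetical helper (not from the original text).
// Assumes 1KB = 1,000 bytes and 1Mbps = 1,000,000 bits per second,
// and ignores any protocol or framing overhead.
function database_traffic_mbps(int $rowsPerPage, int $bytesPerRow, int $pagesPerSecond): float
{
    $bytesPerSecond = $rowsPerPage * $bytesPerRow * $pagesPerSecond;
    return ($bytesPerSecond * 8) / 1000000;
}

// The sidebar's numbers: 100 rows of 1KB each, 100 page requests per second.
$mbps = database_traffic_mbps(100, 1000, 100);
printf("%.1f Mbps of database traffic\n", $mbps);
```

Plugging in your own row counts and request rates gives a quick sense of how close a page is to the capacity of the link between the Web servers and the database.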
[Figure 15.9: Overview of MySQL master/slave replication. A load balancer distributes requests across the Web servers. Database writes go to the master DB; database reads go through a second load balancer to the read-only (RO) slave DBs.]

### Writing Applications to Use Master/Slave Setups

In MySQL version 4.1 or later, there are built-in functions to magically handle query distribution over a master/slave setup. This is implemented at the level of the MySQL client libraries, which means that it is extremely efficient. To utilize these functions in PHP, you need to be using the new mysqli extension, which breaks backward compatibility with the standard mysql extension and does not support MySQL prior to version 4.1.

If you're feeling lucky, you can turn on completely automagical query dispatching, like this:

    $dbh = mysqli_init();
    mysqli_real_connect($dbh, $host, $user, $password, $dbname);
    mysqli_rpl_parse_enable($dbh);
    // prepare and execute queries as per usual

The mysqli_rpl_parse_enable() function instructs the client libraries to attempt to automatically determine whether a query can be dispatched to a slave or must be serviced by the master.

Reliance on auto-detection is discouraged, though. As the developer, you have a much better idea of where a query should be serviced than auto-detection does. The mysqli interface provides assistance in this case as well. Acting on a single resource, you can also specify a query to be executed either on a slave or on the master:

    $dbh = mysqli_init();
    mysqli_real_connect($dbh, $host, $user, $password, $dbname);
    mysqli_slave_query($dbh, $readonly_query);
    mysqli_master_query($dbh, $write_query);

You can, of course, conceal these routines inside the wrapper classes.
If you are running MySQL prior to 4.1 or another RDBMS system that does not seamlessly support automatic query dispatching, you can emulate this interface inside the wrapper as well:

    class Mysql_Replicated extends DB_Mysql {
      protected $slave_dbhost;
      protected $slave_dbname;
      protected $slave_dbh;

      public function __construct($user, $pass, $dbhost, $dbname,
                                  $slave_dbhost, $slave_dbname) {
        $this->user = $user;
        $this->pass = $pass;
        $this->dbhost = $dbhost;
        $this->dbname = $dbname;
        $this->slave_dbhost = $slave_dbhost;
        $this->slave_dbname = $slave_dbname;
      }
      protected function connect_master() {
        $this->dbh = mysql_connect($this->dbhost, $this->user, $this->pass);
        mysql_select_db($this->dbname, $this->dbh);
      }
      protected function connect_slave() {
        $this->slave_dbh = mysql_connect($this->slave_dbhost,
                                         $this->user, $this->pass);
        mysql_select_db($this->slave_dbname, $this->slave_dbh);
      }
      protected function _execute($dbh, $query) {
        $ret = mysql_query($query, $dbh);
        if(is_resource($ret)) {
          return new DB_MysqlStatement($ret);
        }
        return false;
      }
      public function master_execute($query) {
        if(!is_resource($this->dbh)) {
          $this->connect_master();
        }
        // Return the statement so that callers can read the result.
        return $this->_execute($this->dbh, $query);
      }
      public function slave_execute($query) {
        if(!is_resource($this->slave_dbh)) {
          $this->connect_slave();
        }
        return $this->_execute($this->slave_dbh, $query);
      }
    }

You could even incorporate query auto-dispatching into your API by attempting to detect queries that are read-only or that must be dispatched to the master. In general, though, auto-detection is less desirable than manually determining where a query should be directed. When attempting to port a large code base to use a replicated database, auto-dispatch services can be useful but should not be chosen over manual determination when time and resources permit.

### Alternatives to Replication

As noted earlier in this chapter, master/slave replication is not the answer to everyone's database scalability problems.
For highly write-intensive applications, setting up slave replication may actually detract from performance. In this case, you must look for idiosyncrasies of the application that you can exploit. An example would be data that is easily partitionable. Partitioning data involves breaking a single logical schema across multiple physical databases by a primary key. The critical trick to efficient partitioning of data is that queries that will span multiple databases must be avoided at all costs.

An email system is an ideal candidate for partitioning. Email messages are accessed only by their recipient, so you never need to worry about making joins across multiple recipients. Thus you can split email messages across, say, four databases with ease:

    class Email {
      public $recipient;
      public $sender;
      public $body;
      /* ... */
    }

    class PartionedEmailDB {
      public $databases;

You start out by setting up connections for the four databases. Here you use wrapper classes that you've written to hide all the connection details for each:

      public function __construct() {
        $this->databases[0] = new DB_Mysql_Email0;
        $this->databases[1] = new DB_Mysql_Email1;
        $this->databases[2] = new DB_Mysql_Email2;
        $this->databases[3] = new DB_Mysql_Email3;
      }

On both insertion and retrieval, you hash the recipient to determine which database his or her data belongs in. crc32 is used because it is faster than any of the cryptographic hash functions (md5, sha1, and so on) and because you are only looking for a function to distribute the users over databases and don't need any of the security the stronger one-way hashes provide.
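As a standalone illustration of that hashing step, the following sketch maps a recipient to a database index. The function name `shard_index()` and the sample addresses are hypothetical. Masking off the sign bit is an added assumption made for portability (on 32-bit PHP builds crc32() can return a negative integer); it is a slight variation on taking crc32() modulo the database count directly.

```php
<?php
// Hypothetical helper (not from the original text): pick a database index
// for a recipient by hashing the address with crc32().
function shard_index(string $recipient, int $shardCount): int
{
    if ($shardCount < 1) {
        throw new InvalidArgumentException('shardCount must be at least 1');
    }
    // Assumption: clear the sign bit so the result is never negative,
    // even on platforms where crc32() returns a negative integer.
    return (crc32($recipient) & 0x7fffffff) % $shardCount;
}

// Example: see how a few made-up recipients spread over four databases.
$counts = array_fill(0, 4, 0);
foreach (['joe@example.com', 'ann@example.com', 'bob@example.com'] as $recipient) {
    $counts[shard_index($recipient, 4)]++;
}
print_r($counts);
```

The important property is that the same recipient always hashes to the same index, so an insert and a later lookup for that recipient land on the same database. Note that changing the number of databases changes the mapping for most recipients, so existing data would need to be moved.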
Here are both insertion and retrieval functions, which use a crc32-based hashing scheme to spread load across multiple databases:

      public function insertEmail(Email $email) {
        $query = "INSERT INTO emails (recipient, sender, body)
                  VALUES(:1, :2, :3)";
        $hash = crc32($email->recipient) % count($this->databases);
        $this->databases[$hash]->prepare($query)->execute($email->recipient,
                                                          $email->sender,
                                                          $email->body);
      }
      public function retrieveEmails($recipient) {
        $query = "SELECT * FROM emails WHERE recipient = :1";
        // Hash the recipient passed in; there is no $email object here.
        $hash = crc32($recipient) % count($this->databases);
        $result = $this->databases[$hash]->prepare($query)->execute($recipient);
        $retval = array();
        while($hr = $result->fetch_assoc()) {
          $retval[] = new Email($hr);
        }
        return $retval;
      }
    }

### Alternatives to RDBMS Systems

This chapter focuses on RDBMS-backed systems. This should not leave you with the impression that all applications are backed by RDBMS systems. Many applications are not ideally suited to working in a relational system, and they benefit from interacting with custom-written application servers.

Consider an instant messaging service. Messaging is essentially a queuing system: sending users push messages onto a queue for a receiving user to pop off. Although you can model this in an RDBMS, it is not ideal. A more efficient solution is to have an application server built specifically to handle the task.

Such a server can be implemented in any language and can be communicated with over whatever protocol you build into it. In Chapter 16, "RPC: Interacting with Remote Services," you will see a sample of so-called Web services-oriented protocols. You will also be able to devise your own protocol and talk over low-level network sockets by using the sockets extension in PHP.
Of course you don't have to build and interpret these documents yourself. There are a number of different XML-RPC implementations for PHP. I generally prefer to use the PEAR XML-RPC classes because they are distributed with PHP itself. (They are used by the PEAR installer.) Thus, they have almost 100% deployment. Because of this, there is little reason to look elsewhere.

An XML-RPC dialogue consists of two parts: the client request and the server response. First let's talk about the client code. The client creates a request document, sends it to a server, and parses the response. The following code generates the request document shown earlier in this section and parses the resulting response:

    require_once 'XML/RPC.php';

    $client = new XML_RPC_Client('/xmlrpc.php', 'www.example.com');
    $msg = new XML_RPC_Message('system.load');
    $result = $client->send($msg);
    if ($result->faultCode()) {
      echo "Error\n";
    }
    print XML_RPC_decode($result->value());

You create a new XML_RPC_Client object, passing in the remote service URI and address. Then an XML_RPC_Message is created, containing the name of the method to be called (in this case, system.load). Because no parameters are passed to this method, no additional data needs to be added to the message. Next, the message is sent to the server via the send() method. The result is checked to see whether it is an error. If it is not an error, the value of the result is decoded from its XML format into a native PHP type and printed, using XML_RPC_decode().

You need the supporting functionality on the server side to receive the request, find and execute an appropriate callback, and return the response.
Here is a sample implementation that handles the system.load method you requested in the client code:

    require_once 'XML/RPC/Server.php';

    function system_load() {
      // The backticks run the Unix uptime command and capture its output.
      $uptime = `uptime`;
      if(preg_match("/load average: ([\d.]+)/", $uptime, $matches)) {
        return new XML_RPC_Response(
                 new XML_RPC_Value($matches[1], 'string'));
      }
    }

    $dispatches = array('system.load' =>
                        array('function' => 'system_load'));
    new XML_RPC_Server($dispatches, 1);

The PHP functions required to support the incoming requests are defined. You only need to deal with the system.load request, which is implemented through the function system_load(). system_load() runs the Unix command uptime and extracts the one-minute load average of the machine. Next, it serializes the extracted load into an XML_RPC_Value and wraps that in an XML_RPC_Response for return to the user.

Next, the callback function is registered in a dispatch map that instructs the server how to dispatch incoming requests to particular functions. You create a $dispatches array of functions that will be called. This is an array that maps XML-RPC method names to PHP function names. Finally, an XML_RPC_Server object is created, and the dispatch array $dispatches is passed to it. The second parameter, 1, indicates that it should immediately service a request, using the service() method (which is called internally).

service() looks at the raw HTTP POST data, parses it for an XML-RPC request, and then performs the dispatching. Because it relies on the PHP autoglobal $HTTP_RAW_POST_DATA, you need to make certain that you do not turn off always_populate_raw_post_data in your php.ini file.

Now, if you place the server code at www.example.com/xmlrpc.php and execute the client code from any machine, you should get back this:

    > php system_load.php
    0.34

or whatever your one-minute load average is.
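The extraction step inside system_load() can be tried on its own, without the XML-RPC classes or a running server. The sketch below is illustrative only: the function name `parse_one_minute_load()` and the sample uptime line are made up, and the pattern is loosened slightly to also accept the "load averages:" spelling that some systems print, which is an assumption beyond the original pattern.

```php
<?php
// Hypothetical helper (not from the original text): pull the one-minute
// load average out of a line of uptime output, or return null if absent.
function parse_one_minute_load(string $uptimeOutput): ?string
{
    // Assumption: accept both "load average:" and "load averages:".
    if (preg_match('/load averages?:\s*([\d.]+)/', $uptimeOutput, $matches)) {
        return $matches[1];
    }
    return null;
}

// A made-up sample line in the usual Linux uptime layout.
$sample = ' 10:14:32 up 12 days,  3:04,  2 users,  load average: 0.34, 0.40, 0.45';
var_dump(parse_one_minute_load($sample));
```

Returning null when nothing matches makes the failure case explicit, which the caller could turn into an XML-RPC fault instead of returning no response at all.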
### Building a Server: Implementing the MetaWeblog API

The power of XML-RPC is that it provides a standardized method for communicating between services. This is especially useful when you do not control both ends of a service request. XML-RPC allows you to easily set up a well-defined way of interfacing with a service you provide. One example of this is Web log submission APIs.

There are many Web log systems available, and there are many tools for helping people organize and post entries to them. If there were no standardized procedures, every tool would have to support every Web log in order to be widely usable, or every Web log would need to support every tool. This sort of tangle of relationships would be impossible to scale.

Although the feature sets and implementations of Web logging systems vary considerably, it is possible to define a set of standard operations that are necessary to submit entries to a Web logging system. Then Web logs and tools only need to implement this interface for tools to be cross-compatible with all Web logging systems.

In contrast to the huge number of Web logging systems available, there are only three real Web log submission APIs in wide usage: the Blogger API, the MetaWeblog API, and the MovableType API (which is actually just an extension of the MetaWeblog API). All the Web log posting tools available speak one of these three protocols, so if you implement these APIs, your Web log will be able to interact with any tool out there. This is a tremendous asset for making a new blogging system easily adoptable.

Of course, you first need to have a Web logging system that can be targeted by one of the APIs. Building an entire Web log system is beyond the scope of this chapter, so instead of creating it from scratch, you can add an XML-RPC layer to the Serendipity Web logging system. The APIs in question handle posting, so they will likely interface with the following routines from Serendipity:

    function serendipity_updertEntry($entry) {}
    function serendipity_fetchEntry($key, $match) {}

serendipity_updertEntry() is a function that either updates an existing entry or inserts a new one, depending on whether id is passed into it. Its $entry parameter is an array that is a row gateway (a one-to-one correspondence of array elements to table columns) to the following database table:

    CREATE TABLE serendipity_entries (
      id INT AUTO_INCREMENT PRIMARY KEY,
      title VARCHAR(200) DEFAULT NULL,
      timestamp INT(10) DEFAULT NULL,
      body TEXT,
      author VARCHAR(20) DEFAULT NULL,
      isdraft INT
    );

serendipity_fetchEntry() fetches an entry from that table by matching the specified key/value pair.

The MetaWeblog API provides greater depth of features than the Blogger API, so that is the target of our implementation. The MetaWeblog API implements three main methods:

    metaWeblog.newPost(blogid, username, password, item_struct, publish) returns string
    metaWeblog.editPost(postid, username, password, item_struct, publish) returns true
    metaWeblog.getPost(postid, username, password) returns item_struct

blogid is an identifier for the Web log you are targeting (which is useful if the system supports multiple separate Web logs). username and password are authentication criteria that identify the poster.
publish is a flag that indicates whether the entry is a draft or should be published live. item_struct is an array of data for the post.

Instead of implementing a new data format for entry data, Dave Winer, the author of the MetaWeblog spec, chose to use the item element definition from the Really Simple Syndication (RSS) 2.0 specification, available at http://blogs.law.harvard.edu/tech/rss. RSS is a standardized XML format developed for representing articles and journal entries. Its item node contains the following elements: