Concepts

This chapter explains the architecture of the plugin and related concepts, and describes the impact that MySQL replication and this plugin have on development tasks when using a database cluster. Reading and understanding these concepts is required in order to use the plugin successfully.
Architecture

The mysqlnd replication and load balancing plugin is implemented as a PHP extension. It is written in C and operates under the hood of PHP. During the startup of the PHP interpreter, in the module init phase of the PHP engine, it gets registered as a mysqlnd plugin to replace selected mysqlnd C methods.

At PHP runtime, it inspects queries sent from mysqlnd (PHP) to the MySQL server. If a query is recognized as read-only, it will be sent to one of the configured slave servers. Statements are considered read-only if they either start with SELECT or the SQL hint /*ms=slave*/, or if a slave had been chosen for running the previous query and the query starts with the SQL hint /*ms=last_used*/. In all other cases, the query will be sent to the MySQL replication master server. For better portability, applications should use the MYSQLND_MS_MASTER_SWITCH, MYSQLND_MS_SLAVE_SWITCH, and MYSQLND_MS_LAST_USED_SWITCH predefined mysqlnd_ms constants instead of their literal values, such as /*ms=slave*/.

The plugin handles the opening and closing of database connections to both master and slave servers. From an application point of view, there continues to be only one connection handle. However, internally, this one public connection handle represents a pool of network connections that are managed by the plugin. The plugin proxies queries to the master server, and to the slaves, using multiple connections.

Database connections have a state consisting of, for example, transaction status, transaction settings, character set settings, and temporary tables. The plugin will try to maintain the same state among all internal connections whenever this can be done in an automatic and transparent way. In cases where it is not easily possible to maintain state among all connections, such as when using BEGIN TRANSACTION, the plugin leaves it to the user to handle.
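The read-write split rules above can be sketched in a few lines. The snippet is illustrative only: it assumes a host name "myapp" that refers to a section of the plugin configuration file listing a master and at least one slave.

<?php
/* "myapp" is assumed to be the name of a section in the plugin configuration file */
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Starts with SELECT: load balanced to a slave */
$mysqli->query("SELECT id FROM test");

/* SQL hint enforces use of the master although the statement is a SELECT */
$mysqli->query(sprintf("/*%s*/SELECT id FROM test", MYSQLND_MS_MASTER_SWITCH));

/* Runs on whichever server ran the previous statement */
$mysqli->query(sprintf("/*%s*/SELECT id FROM test", MYSQLND_MS_LAST_USED_SWITCH));

/* No SELECT, no hint: sent to the master */
$mysqli->query("INSERT INTO test(id) VALUES (1)");
$mysqli->close();
?>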
Connection pooling and switching

The replication and load balancing plugin changes the semantics of a PHP MySQL connection handle. The existing API of the PHP MySQL extensions (mysqli, mysql, and PDO_MYSQL) is not changed in the sense that functions are added or removed, but the behaviour of the existing functions changes when using the plugin. Existing applications do not need to be adapted to a new API, but they may need to be modified because of these behaviour changes.

The plugin breaks the one-to-one relationship between a mysqli, mysql, or PDO_MYSQL connection handle and a MySQL network connection. Instead, a mysqli, mysql, or PDO_MYSQL connection handle represents a local pool of connections to the configured MySQL replication master and MySQL replication slave servers. The plugin redirects queries to the master and slave servers. At one point in time the PHP connection handle may point to the MySQL master server; later on, it may point to one of the slave servers, or still to the master.

Manipulating and replacing the network connection referenced by a PHP MySQL connection handle is not a transparent operation. Every MySQL connection has a state, and the states of the connections in the connection pool of the plugin can differ. Whenever the plugin switches from one wire connection to another, the current state of the user connection may change. Applications must be aware of this. The following list shows what the connection state consists of. The list may not be complete.

- Transaction status
- Temporary tables
- Table locks
- Session system variables and session user variables
- The current database set using USE and other state chaining SQL commands
- Prepared statements
- HANDLER variables
- Locks acquired with GET_LOCK()

Connection switches happen right before queries are executed. The plugin does not switch the current connection until the next statement is executed.

Replication issues

See also the MySQL reference manual chapter about replication features and related issues. Some restrictions may not be related to the PHP plugin, but are properties of the MySQL replication system.

Broadcasted messages

The plugin's philosophy is to align the state of connections in the pool only if the state is under full control of the plugin, or if it is necessary for security reasons. Only a few actions that change the state of the connection fall into this category. The following is a list of connection library calls that change state and are broadcasted to all open connections in the connection pool. If any of the listed calls is executed, the plugin loops over all open master and slave connections. The loop continues until all servers have been contacted; it does not break if a server indicates a failure. If possible, the failure is propagated to the called user API function, where it may be detected depending on which underlying library function was triggered.

- change_user(): called by the mysqli_change_user user API call; also triggered upon reuse of a persistent mysqli connection. Since 1.0.0.
- select_db(): called by the following user API calls: mysql_select_db, mysql_list_tables, mysql_db_query, mysql_list_fields, mysqli_select_db. Note that SQL USE is not monitored. Since 1.0.0.
- set_charset(): called by the following user API calls: mysql_set_charset, mysqli_set_charset. Note that SQL SET NAMES is not monitored. Since 1.0.0.
- set_server_option(): called by the following user API calls: mysqli_multi_query, mysqli_real_query, mysqli_query, mysql_query. Since 1.0.0.
- set_client_option(): called by the following user API calls: mysqli_options, mysqli_ssl_set, mysqli_connect, mysql_connect, mysql_pconnect. Since 1.0.0.
- set_autocommit(): called by the following user API calls: mysqli_autocommit, PDO::setAttribute(PDO::ATTR_AUTOCOMMIT). Since 1.0.0. PHP >= 5.4.0.
- ssl_set(): called by the following user API calls: mysqli_ssl_set. Since 1.1.0.

Broadcasting and lazy connections

The plugin does not proxy or remember all settings to apply them on connections opened in the future. This is important to remember if using lazy connections. Lazy connections are connections which are not opened before the client sends the first statement. Use of lazy connections is the default plugin behaviour. The following connection library calls each change state, and their execution is recorded for later use when lazy connections are opened. This helps ensure that the connection state of all connections in the connection pool is comparable.

- change_user(): user, password and database recorded for future use. Since 1.1.0.
- select_db(): database recorded for future use. Since 1.1.0.
- set_charset(): calls set_client_option(MYSQL_SET_CHARSET_NAME, charset) on a lazy connection to ensure the charset will be used upon opening the lazy connection. Since 1.1.0.
- set_autocommit(): adds SET AUTOCOMMIT=0|1 to the list of init commands of a lazy connection using set_client_option(MYSQL_INIT_COMMAND, "SET AUTOCOMMIT=..."). Since 1.1.0. PHP >= 5.4.0.

Connection state

The connection state is not only changed by API calls. Thus, even if PECL mysqlnd_ms monitors all API calls, the application must still be aware of state changes. Ultimately, it is the application's responsibility to maintain the connection state, if needed.

Charsets and string escaping

Due to the use of lazy connections, which are the default, it can happen that an application tries to escape a string for use within an SQL statement before a connection has been established. In this case string escaping is not possible: the string escape function does not know which charset to use before a connection has been established. To overcome the problem, a new configuration setting server_charset has been introduced in version 1.4.0.

Attention has to be paid when escaping strings with a certain charset but using the result on a connection that uses a different charset. Please note that PECL/mysqlnd_ms manipulates connections, and one application-level connection represents a pool of multiple connections that all may have different default charsets. It is recommended to configure the servers involved to use the same default charsets. The configuration setting server_charset helps with this situation as well: if server_charset is used, the plugin will set the given charset on all newly opened connections.
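The escaping pitfall can be sketched as follows. This is an illustrative snippet that assumes lazy connections are enabled (the default) and that "myapp" names a plugin configuration section. With server_charset configured, the escape call below succeeds even before a network connection has been opened.

<?php
$mysqli = new mysqli("myapp", "username", "password", "database");

/* With lazy connections no network connection has been opened yet.
   Without the server_charset setting, the escape function cannot know
   the connection charset and may fail or emit a warning. */
$escaped = $mysqli->real_escape_string("O'Reilly");

/* The first statement opens the underlying connection */
$mysqli->query("INSERT INTO authors(name) VALUES ('" . $escaped . "')");
$mysqli->close();
?>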
Local transaction handling

Transaction handling is fundamentally changed. An SQL transaction is a unit of work that is run on one database server. The unit of work consists of one or more SQL statements. By default, the plugin is not aware of SQL transactions. The plugin may switch connections for load balancing at any point in time. Connection switches may happen in the middle of a transaction, which is against the nature of an SQL transaction. By default, the plugin is not transaction safe.

Any kind of MySQL load balancer must be hinted about the begin and end of a transaction. Hinting can be done either implicitly, by monitoring API calls, or explicitly, by using SQL hints. Both options are supported by the plugin, depending on your PHP version. API monitoring requires PHP 5.4.0 or newer. The plugin, like any other MySQL load balancer, cannot detect transaction boundaries based on the MySQL Client Server Protocol. Thus, entirely transparent transaction aware load balancing is not possible. The least intrusive option is API monitoring, which requires little to no application changes, depending on your application. Please find examples of using SQL hints or API monitoring in the examples section. The details behind the API monitoring, which makes the plugin transaction aware, are described below.

Beginning with PHP 5.4.0, the mysqlnd library allows this plugin to subclass the library C API call set_autocommit() to detect the status of autocommit mode. The PHP MySQL extensions either issue a query (such as SET AUTOCOMMIT=0|1) or use the mysqlnd library call set_autocommit() to control the autocommit setting. If an extension makes use of set_autocommit(), the plugin can be made transaction aware. Transaction awareness cannot be achieved if SQL is used to set the autocommit mode. The library function set_autocommit() is called by the mysqli_autocommit and PDO::setAttribute(PDO::ATTR_AUTOCOMMIT) user API calls.

The plugin configuration option trx_stickiness=master can be used to make the plugin transaction aware. In this mode, the plugin stops load balancing if autocommit becomes disabled, and directs all statements to the master until autocommit is enabled again. An application that does not want to set SQL hints for transactions, but wants to use the transparent API monitoring to avoid application changes, must make sure that the autocommit setting is changed exclusively through the listed API calls. API based transaction boundary detection has been improved with PHP 5.5.0 and PECL/mysqlnd_ms 1.5.0 to cover not only calls to mysqli_autocommit but also mysqli_begin_transaction, mysqli_commit and mysqli_rollback.
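Assuming trx_stickiness=master is set in the plugin configuration and PHP 5.4.0 or newer is used, transaction awareness through API monitoring can be sketched like this. The snippet is illustrative; the configuration section name "myapp" is assumed.

<?php
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Monitored API call: autocommit is turned off, load balancing stops and
   all following statements are directed to the master */
$mysqli->autocommit(false);

$mysqli->query("INSERT INTO orders(item) VALUES ('book')");
$mysqli->query("SELECT COUNT(*) FROM orders");   /* stays on the master */
$mysqli->commit();

/* Monitored API call: autocommit is re-enabled, load balancing resumes */
$mysqli->autocommit(true);
$mysqli->close();
?>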
Error handling

Applications using PECL/mysqlnd_ms should implement proper error handling for all user API calls. Because the plugin changes the semantics of a connection handle, API calls may return unexpected errors. If using the plugin on a connection handle that no longer represents an individual network connection, but a connection pool, an error code and error message will be set on the connection handle whenever an error occurs on any of the network connections behind it.

If using lazy connections, which is the default, connections are not opened until they are needed for query execution. Therefore, an API call for a statement execution may return a connection error. In the example below, an error is provoked when trying to run a statement on a slave. Opening a slave connection fails because the plugin configuration file lists an invalid host name for the slave. The explicit activation of lazy connections is for demonstration purposes only.

Connection error on query execution

<?php
/* "myapp" is assumed to refer to a plugin configuration section that lists
   an invalid slave host name and enables lazy connections */
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Connection 1, no SELECT thus run on the master */
if (!$mysqli->query("SET @myrole='master'")) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}
/* Connection 2, run on slave because SELECT, provoke connection error */
if (!($res = $mysqli->query("SELECT @myrole AS _role"))) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
} else {
  $row = $res->fetch_assoc();
  $res->close();
  printf("@myrole = '%s'\n", $row['_role']);
}
$mysqli->close();
?>

Applications are expected to handle possible connection errors by implementing proper error handling. Depending on the use case, applications may want to handle connection errors differently from other errors. Typical connection errors are 2002 (CR_CONNECTION_ERROR) - Can't connect to local MySQL server through socket '%s' (%d), 2003 (CR_CONN_HOST_ERROR) - Can't connect to MySQL server on '%s' (%d) and 2005 (CR_UNKNOWN_HOST) - Unknown MySQL server host '%s' (%d). For example, the application may test for these error codes and manually perform a failover. The plugin's philosophy is not to offer automatic failover, beyond master failover, because failover is not a transparent operation. In the following example, lazy connections are explicitly activated for demonstration purposes, as is round robin load balancing as opposed to the default random once policy.

Most basic failover

<?php
/* "myapp" is assumed to refer to a plugin configuration section using
   lazy connections, round robin load balancing and an invalid first slave */
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Connection 1, no SELECT thus run on the master */
if (!$mysqli->query("SET @myrole='master'")) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}
/* Connection 2, first slave */
$res = $mysqli->query("SELECT VERSION() AS _version");
/* Hackish manual failover */
if (2002 == $mysqli->errno || 2003 == $mysqli->errno || 2005 == $mysqli->errno) {
  /* Connection 3, first slave connection failed, trying next slave */
  $res = $mysqli->query("SELECT VERSION() AS _version");
}
if (!$res) {
  printf("ERROR, [%d] '%s'\n", $mysqli->errno, $mysqli->error);
} else {
  /* Error messages are taken from connection 3, thus no error */
  printf("SUCCESS, [%d] '%s'\n", $mysqli->errno, $mysqli->error);
  $row = $res->fetch_assoc();
  $res->close();
  printf("version = %s\n", $row['_version']);
}
$mysqli->close();
?>

In some cases, it may not be easily possible to retrieve all errors that occur on all network connections through a connection handle. For example, assume a connection handle represents a pool of three open connections: one connection to a master and two connections to slaves. The application changes the current database using the user API call mysqli_select_db, which then calls the mysqlnd library function to change the schema.
mysqlnd_ms monitors the function and tries to change the current database on all connections to harmonize their state. Now, assume the master succeeds in changing the database, but both slaves fail. Upon the initial error from the first slave, the plugin will set an appropriate error on the connection handle. The same is done when the second slave fails to change the database, so the error message from the first slave is lost. Such cases can be debugged by either checking for errors of the type E_WARNING (see above) or, if no other option exists, by investigating the mysqlnd_ms debug and trace log.
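One way to catch such warnings is a temporary PHP error handler. This is a generic sketch, not a plugin API: it simply records any E_WARNING raised while a monitored call is broadcast to the pool.

<?php
/* assuming $mysqli is a plugin-managed connection as in the previous examples */
$warnings = array();
set_error_handler(function ($errno, $errstr) use (&$warnings) {
    /* collect warnings emitted while the call is broadcast to all connections */
    $warnings[] = $errstr;
    return true;
}, E_WARNING);

$mysqli->select_db("shop");   /* broadcast to master and slave connections */

restore_error_handler();
foreach ($warnings as $warning) {
    printf("Warning caught: %s\n", $warning);
}
?>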
Transient errors

Some distributed database clusters make use of transient errors. A transient error is a temporary error that is likely to disappear soon. By definition, it is safe for a client to ignore a transient error and retry the failed operation on the same database server. The retry is free of side effects. Clients are not forced to abort their work or to fail over to another database server immediately. They may enter a retry loop and wait for the error to disappear before giving up on the database server. Transient errors can be seen, for example, when using MySQL Cluster, but they are not bound to any specific clustering solution per se.

PECL/mysqlnd_ms can perform an automatic retry loop in case of a transient error. This increases distribution transparency and thus makes it easier to migrate an application running on a single database server to a cluster of database servers without having to change the source of the application. The automatic retry loop repeats the requested operation up to a user configurable number of times, and pauses between the attempts for a configurable amount of time. If the error disappears during the loop, the application will never see it. If not, the error is forwarded to the application for handling.

In the example below, a duplicate key error is provoked to make the plugin retry the failing query two times before the error is passed to the application. Between the two attempts the plugin sleeps for 100 milliseconds.

Transient error retry loop

<?php
/* "myapp" is assumed to refer to a plugin configuration section that declares
   the duplicate key error a transient error and configures two retries with
   a 100 millisecond pause */
$mysqli = new mysqli("myapp", "username", "password", "database");

if (!$mysqli->query("DROP TABLE IF EXISTS test") ||
    !$mysqli->query("CREATE TABLE test(id INT PRIMARY KEY)") ||
    !$mysqli->query("INSERT INTO test(id) VALUES (1)")) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}
/* Retry loop is completely transparent. Checking statistics is the
   only way to know about implicit retries */
$stats = mysqlnd_ms_get_stats();
printf("Transient error retries before error: %d\n", $stats['transient_error_retries']);

/* Provoking duplicate key error to see statistics change */
if (!$mysqli->query("INSERT INTO test(id) VALUES (1)")) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}
$stats = mysqlnd_ms_get_stats();
printf("Transient error retries after error: %d\n", $stats['transient_error_retries']);
$mysqli->close();
?>

Because the execution of the retry loop is transparent from a user's point of view, the example checks the statistics provided by the plugin to learn about it. As the example shows, the plugin can be instructed to consider any error transient, regardless of the database server's error semantics. The only error that a stock MySQL server considers temporary has the error code 1297. When configuring error codes other than 1297, make sure your configuration reflects the semantics of your cluster's error codes.

The following mysqlnd C API calls are monitored by the plugin to check for transient errors: query(), change_user(), select_db(), set_charset(), set_server_option(), prepare(), execute(), set_autocommit(), tx_begin(), tx_commit(), tx_rollback(), tx_commit_or_rollback(). The corresponding user API calls have similar names. The maximum time the plugin may sleep during the retry loop depends on the function in question. A retry loop for query(), prepare() or execute() will sleep for up to max_retries * usleep_retry milliseconds. However, functions that control connection state are dispatched to all connections.
The retry loop settings are applied to every connection on which the command is to be run. Thus, such a function may interrupt program execution for longer than a function that is run on one server only. For example, set_autocommit() is dispatched to all connections and may sleep for up to (max_retries * usleep_retry) * number_of_open_connections milliseconds. Please keep this in mind when setting long sleep times and large retry numbers. Using the default settings of max_retries=1, usleep_retry=100 and lazy_connections=1, it is unlikely that you will ever see a delay of more than 1 second.
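A plugin configuration enabling the retry behaviour used in the example above might look roughly like the sketch below. Only max_retries and usleep_retry are named in the text; the section name transient_error and the key mysql_error_codes are assumptions to verify against the plugin configuration reference.

<?php
/* Hypothetical sketch of a mysqlnd_ms configuration file (JSON) that treats
   the duplicate key error 1062 as transient and retries twice, pausing
   100 milliseconds between attempts. */
$config = <<<'JSON'
{
  "myapp": {
    "master": { "master_0": { "host": "localhost" } },
    "slave":  { "slave_0":  { "host": "192.168.2.27", "port": 3306 } },
    "transient_error": {
      "mysql_error_codes": [ 1062 ],
      "max_retries": 2,
      "usleep_retry": 100
    }
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>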
Failover

By default, connection failover handling is left to the user. The application is responsible for checking the return values of the database functions it calls and reacting to possible errors. If, for example, the plugin recognizes a query as a read-only query to be sent to the slave servers, and the slave server selected by the plugin is not available, the plugin raises an error after not executing the statement.

Default: manual failover

It is up to the application to handle the error and, if required, re-issue the query to trigger the selection of another slave server for statement execution. The plugin makes no attempt to fail over automatically, because the plugin cannot ensure that an automatic failover will not change the state of the connection. For example, the application may have issued a query which depends on SQL user variables, which are bound to a specific connection. Such a query might return incorrect results if the plugin switched the connection implicitly as part of an automatic failover. To ensure correct results, the application must take care of the failover and rebuild the required connection state. Therefore, by default, no automatic failover is performed by the plugin.

A user that does not change the connection state after opening a connection may activate automatic failover. Please note that the automatic failover logic is limited to connection attempts. Automatic failover is not used for already established connections. There is no way to instruct the plugin to attempt failover on a connection that has already been connected to MySQL in the past.

Automatic failover

The failover policy is configured in the plugin's configuration file using the failover configuration directive, which enables automatic and silent failover. Automatic failover can either be configured to try exactly one master after a slave failure or, alternatively, to loop over slaves and masters before returning an error to the user. The number of connection attempts can be limited, and failed hosts can be excluded from future load balancing attempts. Limiting the number of retries and remembering failed hosts are considered experimental features, albeit being reasonably stable. Syntax and semantics may change in future versions.

Please note, since version 1.5.0 automatic failover is disabled for the duration of a transaction if transaction stickiness is enabled and transaction boundaries have been detected. The plugin will not switch connections for the duration of a transaction. It will also not perform automatic and silent failover. Instead, an error will be thrown. It is then left to the user to handle the failure of the transaction. Please check the trx_stickiness documentation on how to do this. A basic manual failover example is provided within the error handling section.

Standby servers

Using weighted load balancing, introduced in PECL/mysqlnd_ms 1.4.0, it is possible to configure standby servers that are sparsely used during normal operations. A standby server that is primarily used as a worst-case failover target can be assigned a very low weight/priority in relation to all other servers. As long as all servers are up and running, the majority of the workload is assigned to the servers which have high weight values. Few requests will be directed to the standby system, which has a very low weight value.
Upon failure of the servers with a high priority, you can still fail over to the standby, which has been given a low load balancing priority by assigning a low weight to it. Failover can be done manually or automatically. If done automatically, you may want to combine it with the remember_failed option. At this point, it is not possible to instruct the load balancer to direct no requests at all to a standby. This may not be much of a limitation given that the highest weight you can assign to a server is 65535. Given two slaves, of which one shall act as a standby and has been assigned a weight of 1, the standby will have to handle far less than one percent of the overall workload.

Failover and primary copy

Please note, if using a primary copy cluster, such as MySQL Replication, it is difficult to do connection failover in case of a master failure. At any time there is only one master in the cluster for a given dataset. The master is a single point of failure. If the master fails, clients have no target to fail over write requests to. In case of a master outage the database administrator must take care of the situation and update the client configurations, if need be.
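A configuration enabling automatic failover with a remembered list of failed hosts might be sketched as follows. The object form of the failover directive (strategy, remember_failed, max_retries) is an assumption to verify against the plugin configuration reference; standby weighting is configured through filter weights, as sketched in the load balancing section below.

<?php
/* Hypothetical mysqlnd_ms configuration sketch (JSON): automatic failover
   that loops over slaves and masters, remembers failed hosts and limits
   the number of connection attempts. */
$config = <<<'JSON'
{
  "myapp": {
    "master": { "master_0": { "host": "master.example.com" } },
    "slave": {
      "slave_0":   { "host": "slave0.example.com" },
      "standby_0": { "host": "standby.example.com" }
    },
    "failover": {
      "strategy": "loop_before_master",
      "remember_failed": true,
      "max_retries": 3
    }
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>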
Load balancing

Four load balancing strategies are supported to distribute statements over the configured MySQL slave servers:

- random: chooses a random server whenever a statement is executed.
- random once (default): chooses a random server after the first statement is executed and uses that decision for the rest of the PHP request. It is the default and has the lowest impact on the connection state.
- round robin: iterates over the list of configured servers.
- user-defined via callback: used to implement any other strategy.

The load balancing policy is configured in the plugin's configuration file using the random, roundrobin, and user filters. Servers can be prioritized by assigning a weight. A server that has been given a weight of two will get twice as many requests as a server that has been given the default weight of one. Prioritization can be handy in heterogeneous environments. For example, you may want to assign more requests to a powerful machine than to a less powerful one. Or, you may have configured servers that are close to or far from the client and thus expose different latencies.
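As an illustration, the default random once policy corresponds to the random filter with the sticky flag set, and weights can be attached to the same filter to prioritize servers. This is a sketch only; the sticky parameter is documented with the filter, while the exact spelling of the weights option should be verified against the filter documentation.

<?php
/* Hypothetical mysqlnd_ms configuration sketch (JSON): random once load
   balancing (sticky random) with a weight of 2 for the faster slave. */
$config = <<<'JSON'
{
  "myapp": {
    "master": { "master_0": { "host": "localhost" } },
    "slave": {
      "slave_0": { "host": "192.168.2.27" },
      "slave_1": { "host": "192.168.2.28" }
    },
    "filters": {
      "random": {
        "sticky": "1",
        "weights": { "slave_0": 2, "slave_1": 1 }
      }
    }
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>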
Read-write splitting

The plugin executes read-only statements on the configured MySQL slaves, and all other queries on the MySQL master. Statements are considered read-only if they either start with SELECT or the SQL hint /*ms=slave*/, or if a slave had been chosen for running the previous query and the query starts with the SQL hint /*ms=last_used*/. In all other cases, the query will be sent to the MySQL replication master server. It is recommended to use the constants MYSQLND_MS_SLAVE_SWITCH, MYSQLND_MS_MASTER_SWITCH and MYSQLND_MS_LAST_USED_SWITCH instead of /*ms=slave*/. See also the list of mysqlnd_ms constants.

SQL hints are a special kind of standards compliant SQL comment. The plugin checks every statement for certain SQL hints. The SQL hints are described within the mysqlnd_ms constants documentation, constants that are exported by the extension. Other systems involved with the statement processing, such as the MySQL server, SQL firewalls, and SQL proxies, are unaffected by the SQL hints, because those systems are designed to ignore SQL comments.

The built-in read-write splitter can be replaced by a user-defined filter, see also the user filter documentation. A user-defined read-write splitter can request the built-in decision for a statement by invoking mysqlnd_ms_query_is_select.

The built-in read-write splitter is not aware of multi-statements. Multi-statements are seen as one statement. The splitter will check the beginning of the statement to decide where to run it. If, for example, a multi-statement begins with SELECT 1 FROM DUAL; INSERT INTO test(id) VALUES (1); ... the plugin will run it on a slave although the statement is not read-only.
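The built-in decision logic is exposed through mysqlnd_ms_query_is_select, which can be useful inside a user-defined splitter. A minimal sketch, assuming the MYSQLND_MS_QUERY_USE_* return constants documented with that function:

<?php
/* Classify statements the way the built-in read-write splitter would */
$statements = array(
    "SELECT id FROM test",
    "INSERT INTO test(id) VALUES (1)",
    sprintf("/*%s*/SELECT id FROM test", MYSQLND_MS_MASTER_SWITCH),
);
foreach ($statements as $sql) {
    switch (mysqlnd_ms_query_is_select($sql)) {
        case MYSQLND_MS_QUERY_USE_MASTER:
            printf("master:    %s\n", $sql);
            break;
        case MYSQLND_MS_QUERY_USE_SLAVE:
            printf("slave:     %s\n", $sql);
            break;
        case MYSQLND_MS_QUERY_USE_LAST_USED:
            printf("last used: %s\n", $sql);
            break;
    }
}
?>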
Filter

Version requirement: filters exist as of mysqlnd_ms version 1.1.0-beta.

PHP applications that implement a MySQL replication cluster must first identify a group of servers in the cluster which could execute a statement before the statement is executed by one of the candidates. In other words: a defined list of servers must be filtered until only one server is available. The process of filtering may involve one or more filters, and filters can be chained. They are executed in the order they are defined in the plugin's configuration file.

Explanation: comparing filter chaining to pipes

The concept of chained filters can be compared to using pipes to connect command line utilities on an operating system command shell. For example, an input stream is passed to a processor, filtered, and then transferred to the output. The output is then passed as input to the next command, which is connected to the previous one using the pipe operator.

Available filters:

- Load balancing filters: random and roundrobin.
- Selection filters: user, user_multi, quality_of_service.

The random filter implements the 'random' and 'random once' load balancing policies. The 'round robin' load balancing can be configured through the roundrobin filter. Setting a 'user defined callback' for server selection is possible with the user filter. The quality_of_service filter finds cluster nodes capable of delivering a certain service, for example, read-your-writes, or not lagging more seconds behind the master than allowed.

Filters can accept parameters to change their behaviour. The random filter accepts an optional sticky parameter. If set to true, the filter changes load balancing from random to random once. Random picks a random server every time a statement is to be executed. Random once picks a random server when the first statement is to be executed and uses the same server for the rest of the PHP request.

One of the biggest strengths of the filter concept is the possibility to chain filters. This strength does not become immediately visible because the random, roundrobin and user filters are supposed to output no more than one server. If a filter reduces the list of candidates for running a statement to only one server, it makes little sense to use that one server as input for another filter for further reduction of the list of candidates.

An example filter sequence that will fail:

- Statement to be executed: SELECT 1 FROM DUAL. Passed to all filters.
- All configured nodes are passed as input to the first filter. Master nodes: master_0. Slave nodes: slave_0, slave_1.
- Filter: random, argument sticky=1. Picks a random slave once to be used for the rest of the PHP request. Output: slave_0.
- The output of slave_0 and the statement to be executed is passed as input to the next filter, here roundrobin. The server list passed to the filter is: slave_0.
- Filter: roundrobin. The server list consists of one server only; round robin will always return the same server.

If trying to use such a filter sequence, the plugin may emit a warning like (mysqlnd_ms) Error while creating filter '%s' . Non-multi filter '%s' already created. Stopping in %s on line %d. Furthermore, an appropriate error may be set on the connection handle.

A second type of filter exists: multi filters. A multi filter emits zero, one or multiple servers after processing. The quality_of_service filter is an example.
If the requested service quality sets an upper limit for the slave lag, and more than one slave is lagging behind by less than the allowed number of seconds, the filter returns more than one cluster node. A multi filter must be followed by other filters to further reduce the list of candidates for statement execution until a candidate is found.

A filter sequence with the quality_of_service multi filter followed by a load balancing filter:

- Statement to be executed: SELECT sum(price) FROM orders WHERE order_id = 1. Passed to all filters.
- All configured nodes are passed as input to the first filter. Master nodes: master_0. Slave nodes: slave_0, slave_1, slave_2, slave_3.
- Filter: quality_of_service, rule set: session_consistency (read-your-writes). Output: master_0.
- The output of master_0 and the statement to be executed is passed as input to the next filter, which is roundrobin.
- Filter: roundrobin. The server list consists of one server; round robin selects master_0.

A filter sequence must not end with a multi filter. If trying to use a filter sequence which ends with a multi filter, the plugin may emit a warning like (mysqlnd_ms) Error in configuration. Last filter is multi filter. Needs to be non-multi one. Stopping in %s on line %d. Furthermore, an appropriate error may be set on the connection handle.

Speculation towards the future: MySQL replication filtering

In future versions, there may be additional multi filters. For example, there may be a table filter to support MySQL replication filtering. This would allow you to define rules for which database or table is to be replicated to which node of a replication cluster. Assume your replication cluster consists of four slaves (slave_0, slave_1, slave_2, slave_3), two of which replicate a database named sales (slave_0, slave_1). If the application queries the sales database, the hypothetical table filter reduces the list of possible servers to slave_0 and slave_1. Because the output and list of candidates consists of more than one server, it is necessary and possible to add additional filters to the candidate list, for example, using a load balancing filter to identify a server for statement execution.
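A working chain such as the quality_of_service plus roundrobin sequence above might be configured roughly as sketched below. The nesting of the filters section and the session_consistency rule name follow the description above as far as it goes; treat the exact spelling as an assumption to verify against the filter documentation.

<?php
/* Hypothetical mysqlnd_ms configuration sketch (JSON): a multi filter
   (quality_of_service, session consistency) followed by a load balancing
   filter (roundrobin) that picks one node from the remaining candidates. */
$config = <<<'JSON'
{
  "myapp": {
    "master": { "master_0": { "host": "localhost" } },
    "slave": {
      "slave_0": { "host": "192.168.2.27" },
      "slave_1": { "host": "192.168.2.28" }
    },
    "filters": {
      "quality_of_service": { "session_consistency": 1 },
      "roundrobin": []
    }
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>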
Service level and consistency

Version requirement: service levels have been introduced in mysqlnd_ms version 1.2.0-alpha. mysqlnd_ms_set_qos requires PHP 5.4.0 or newer.

The plugin can be used with different kinds of MySQL database clusters. Different clusters can deliver different levels of service to applications. The service levels can be grouped by the data consistency levels that can be achieved. The plugin knows about:

- eventual consistency
- session consistency
- strong consistency

Depending on how a cluster is used, it may be possible to achieve higher service levels than the default one. For example, a read from an asynchronous MySQL replication slave is eventually consistent. Thus, one may say the default consistency level of a MySQL replication cluster is eventual consistency. However, if only the master is used by a client for reading and writing during a session, session consistency (read your writes) is given. PECL mysqlnd_ms 1.2.0 abstracts the details of choosing an appropriate node for any of the above service levels from the user. Service levels can be set through the quality_of_service filter in the plugin's configuration file and at runtime using the function mysqlnd_ms_set_qos.

The plugin defines the different service levels as follows. Eventual consistency is the default service provided by an asynchronous cluster, such as classical MySQL replication. A read operation executed on an arbitrary node may or may not return stale data. The application's view of the data is eventually consistent. Session consistency is given if a client can always read its own writes. An asynchronous MySQL replication cluster can deliver session consistency if clients always use the master after the first write, or never query a slave which has not yet replicated the client's write operation. The plugin's understanding of strong consistency is that all clients always see the committed writes of all other clients. This is the default when using MySQL Cluster or any other cluster offering synchronous data distribution.

Service level parameters

The eventual consistency and session consistency service levels accept parameters. Eventual consistency is the service provided by classical MySQL replication. By default, all nodes qualify for read requests. An optional age parameter can be given to filter out nodes which lag more than a certain number of seconds behind the master. The plugin uses SHOW SLAVE STATUS to measure the lag. Please see the MySQL reference manual to learn about the accuracy and reliability of the SHOW SLAVE STATUS command.

Session consistency (read your writes) accepts an optional GTID parameter to consider reading not only from the master but also from slaves which have already replicated a certain write, described by its transaction identifier. This way, when using asynchronous MySQL replication, read requests may be load balanced over slaves while still ensuring session consistency. The latter requires the use of client-side global transaction ID injection.

Advantages of the new approach

The new approach supersedes the use of SQL hints and the configuration option master_on_write in some respects. If an application running on top of an asynchronous MySQL replication cluster cannot accept stale data for certain reads, it is easier to tell the plugin to choose appropriate nodes than to prefix all read statements in question with the SQL hint to enforce the use of the master. Furthermore, the plugin may be able to use selected slaves for reading.
The master_on_write configuration option makes the plugin use the master after the first write (session consistency, read your writes). In some cases, session consistency may not be needed for the rest of the session but only for a few read operations. Thus, master_on_write may result in more read load on the master than necessary. In those cases it is better to request a higher than default service level only for those reads that actually need it. Once the reads are done, the application can return to the default service level. Switching between service levels is only possible using mysqlnd_ms_set_qos.

Performance considerations

A MySQL replication cluster cannot tell clients which slaves are capable of delivering which level of service. Thus, in some cases, clients need to query the slaves to check their status. PECL mysqlnd_ms transparently runs the necessary SQL in the background. However, this is an expensive and slow operation. SQL statements are run if eventual consistency is combined with an age (slave lag) limit, and if session consistency is combined with a global transaction ID.

If eventual consistency is combined with a maximum age (slave lag), the plugin selects candidates for statement execution and load balancing for each statement as follows. If the statement is a write, all masters are considered as candidates; slaves are not checked and not considered as candidates. If the statement is a read, the plugin transparently executes SHOW SLAVE STATUS on every slave connection. It will loop over all connections, send the statement, and then start checking for results. Usually, this is slightly faster than a loop over all connections in which a query is sent for every connection and the plugin waits for its result. A slave is considered a candidate if SHOW SLAVE STATUS reports Slave_IO_Running=Yes, Slave_SQL_Running=Yes and Seconds_Behind_Master is less than or equal to the allowed maximum age. In case of an SQL error, the plugin emits a warning but does not set an error on the connection. The error is not set to make it possible to use the plugin as a drop-in.

If session consistency is combined with a global transaction ID, the plugin executes the SQL statement set with the fetch_last_gtid entry of the global_transaction_id_injection section from the plugin's configuration file. Further details are identical to those described above. In version 1.2.0 no additional optimizations are done for executing background queries. Future versions may contain optimizations, depending on user demand. If no parameters and options are set, no SQL is needed. In that case, the plugin considers all nodes of the type shown below:

- eventual consistency, no further options set: all masters, all slaves
- session consistency, no further options set: all masters
- strong consistency (no options allowed): all masters

Throttling

The quality of service filter can be combined with global transaction IDs to throttle clients. Throttling reduces the write load on the master by slowing down clients. If session consistency is requested and global transaction identifiers are used to check the status of a slave, the check can be done in two ways. By default, a slave is checked and skipped immediately if it does not match the criteria for session consistency. Alternatively, the plugin can wait for a slave to catch up with the master until session consistency is possible. To enable throttling, you have to set the wait_for_gtid_timeout configuration option.
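The runtime switch between service levels can be sketched as follows, using mysqlnd_ms_set_qos together with mysqlnd_ms_get_last_gtid. The snippet assumes client-side GTID injection is configured and that "myapp" names a configuration section; otherwise the GTID option is not available.

<?php
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Write on the master */
$mysqli->query("INSERT INTO orders(item) VALUES ('book')");

/* Ask for a GTID that is at least as recent as the write above */
$gtid = mysqlnd_ms_get_last_gtid($mysqli);

/* Request session consistency for the critical read: only the master or
   slaves that have replicated $gtid qualify */
mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_SESSION,
                   MYSQLND_MS_QOS_OPTION_GTID, $gtid);
$res = $mysqli->query("SELECT item FROM orders");

/* Return to the relaxed default afterwards */
mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_EVENTUAL);
$mysqli->close();
?>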
Global transaction IDs

Version requirement: client-side global transaction ID injection exists as of mysqlnd_ms version 1.2.0-alpha. Transaction boundaries are detected by monitoring API calls, which is possible as of PHP 5.4.0. Please see also the transaction handling section.

As of MySQL 5.6.5-m8, the MySQL server features built-in global transaction identifiers. The MySQL built-in global transaction ID feature is supported by PECL/mysqlnd_ms 1.3.0-alpha or later. Neither client-side transaction boundary monitoring nor any setup activities are required if using the server feature. Please note that MySQL 5.6 production versions do not provide clients with enough information to use GTIDs for enforcing session consistency. In the worst case, the plugin will choose the master only.

Idea and client-side emulation

PECL/mysqlnd_ms can do client-side transparent global transaction ID injection. In its most basic form, a global transaction identifier is a counter which is incremented for every transaction executed on the master. The counter is held in a table on the master. Slaves replicate the counter table. In case of a master failure, a database administrator can easily identify the most recent slave for promoting it as the new master: the most recent slave has the highest transaction identifier.

Application developers can ask the plugin for the global transaction identifier (GTID) of their last successful write operation. The plugin will return an identifier that refers to a transaction no older than that of the client's last write operation. The GTID can then be passed as a parameter to the quality of service (QoS) filter as an option for session consistency. Session consistency ensures read your writes. The filter ensures that all reads are either directed to the master or to a slave which has replicated the write referenced by the GTID.

When injection is done

The plugin transparently maintains the GTID table on the master. In autocommit mode, the plugin injects an UPDATE statement before executing the user's statement for every master use. In manual transaction mode, the injection is done before the application calls commit() to close a transaction. The configuration option report_error of the GTID section in the plugin's configuration file is used to control whether a failed injection shall abort the current operation or be ignored silently (the default). Please note the PHP version requirements for transaction boundary monitoring and their limits.

Limitations

Client-side global transaction ID injection has shortcomings. The potential issues are not specific to PECL/mysqlnd_ms but are rather of a general nature.

- Global transaction ID tables must be deployed on all masters and replicas.
- The GTID can have holes. Only PHP clients using the plugin will maintain the table; other clients will not.
- Client-side transaction boundary detection is based on API calls only.
- Client-side transaction boundary detection does not take implicit commits into account. Some MySQL SQL statements cause an implicit commit and cannot be rolled back.

Using server-side global transaction identifiers

Starting with PECL/mysqlnd_ms 1.3.0-alpha, the built-in global transaction identifier feature of MySQL 5.6.5-m8 or newer is supported. Use of the server feature lifts all of the above listed limitations. Please see the MySQL Reference Manual for limitations and preconditions for using server built-in global transaction identifiers.
Whether to use the client-side emulation or the server built-in functionality is a question not directly related to the plugin, thus it is not discussed in depth. There are no plans to remove the client-side emulation, and you can continue to use it if the server-side solution is no option. This may be the case in heterogeneous environments with old MySQL servers, or if any of the server-side solution's limitations is not acceptable. From an application's perspective there is hardly a difference in using one or the other approach. The following properties differ.

Client-side emulation, as shown in the manual, uses an easy to compare sequence number for global transactions. Multi-master is not handled, to keep the manual examples easy. The server-side built-in feature uses a combination of a server identifier and a sequence number as a global transaction identifier. Comparison cannot use numeric algebra; instead an SQL function must be used. Please see the MySQL Reference Manual for details.

The server-side built-in feature of MySQL 5.6 cannot be used to ensure session consistency under all circumstances. Do not use it for the quality-of-service feature. Here is a simple example why it will not give reliable results; there are more edge cases that cannot be covered with the limited functionality exposed by the server. Currently, clients can only ask a MySQL replication master for the list of all executed global transaction IDs. If a slave is configured not to replicate all transactions, for example because replication filters are set, then the slave will never show the same set of executed global transaction IDs. Although the slave may have replicated a client's writes and may be a candidate for a consistent read, it will never be considered by the plugin.

Upon write, the plugin learns from the master that the server's complete transaction history consists of GTID=1..3. There is no way for the plugin to ask for the GTID of the write transaction itself, say GTID=3. Assume that a slave does not replicate the transactions GTID=1..2 but only GTID=3 because of a replication filter. Then the slave's transaction history is GTID=3. However, the plugin tries to find a node which has a transaction history of GTID=1..3. Although the slave has replicated the client's write, and session consistency could be achieved when reading from the slave, it will not be considered by the plugin. This is not a fault of the plugin implementation but a feature gap on the server side. Please note, this is a trivial case to illustrate the issue; there are other issues. In sum, you are asked not to attempt using MySQL 5.6 built-in GTIDs for enforcing session consistency. Sooner or later the load balancing will stop working properly and the plugin will direct all session consistency requests to the master.

Plugin global transaction ID statistics are only available with the client-side emulation because they monitor the emulation.

Global transaction identifiers in distributed systems

Global transaction identifiers can serve multiple purposes in the context of distributed systems, such as a database cluster. Global transaction identifiers can be used, for example, for system wide identification of transactions, global ordering of transactions, heartbeat mechanisms and for checking the replication status of replicas. PECL/mysqlnd_ms, a client-side, driver-based software, focuses on using GTIDs for tasks that can be handled at the client, such as checking the replication status of replicas in asynchronous replication setups.
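For the client-side emulation described above, the plugin configuration needs a global_transaction_id_injection section telling the plugin how to maintain and read the GTID counter table. Only fetch_last_gtid and report_error are named in the surrounding text; the keys on_commit and check_for_gtid and the table layout shown here are assumptions to verify against the configuration reference before use.

<?php
/* Hypothetical mysqlnd_ms configuration sketch (JSON) for client-side GTID
   emulation: a counter table on the master, incremented on commit, plus the
   SQL used to read and check the counter. */
$config = <<<'JSON'
{
  "myapp": {
    "master": { "master_0": { "host": "localhost" } },
    "slave":  { "slave_0":  { "host": "192.168.2.27" } },
    "global_transaction_id_injection": {
      "on_commit": "UPDATE test.trx SET trx_id = trx_id + 1",
      "fetch_last_gtid": "SELECT MAX(trx_id) FROM test.trx",
      "check_for_gtid": "SELECT trx_id FROM test.trx WHERE trx_id >= #GTID",
      "report_error": true
    }
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>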
Cache integration

Version requirement: the feature requires PECL/mysqlnd_ms 1.3.0-beta or later and PECL/mysqlnd_qc 1.1.0-alpha or newer. PECL/mysqlnd_ms must be compiled to support the feature. PHP 5.4.0 or newer is required.

Setup: extension load order. PECL/mysqlnd_ms must be loaded before PECL/mysqlnd_qc when using shared extensions.

Feature stability: the cache integration is of beta quality.

Suitable MySQL clusters: the feature is targeted for use with MySQL Replication (primary copy). Currently, no other kinds of MySQL clusters are supported. Users of such clusters must control PECL/mysqlnd_qc manually if they are interested in client-side query caching.

Support for MySQL replication clusters (asynchronous primary copy) is the main focus of PECL/mysqlnd_ms. The slaves of a MySQL replication cluster may or may not reflect the latest updates from the master. Slaves are asynchronous and can lag behind the master. A read from a slave is eventually consistent from a cluster-wide perspective. The same level of consistency is offered by a local cache using a time-to-live (TTL) invalidation strategy: current data or stale data may be served, and eventually the data searched for in the cache is not available and the source of the cache needs to be accessed. Given that both a MySQL Replication slave (asynchronous secondary) and a local TTL-driven cache deliver the same level of service, it is possible to transparently replace a remote database access with a local cache access to gain better performance.

As of PECL/mysqlnd_ms 1.3.0-beta, the plugin is capable of transparently controlling PECL/mysqlnd_qc 1.1.0-alpha or newer to cache a read-only query if explicitly allowed by setting an appropriate quality of service through mysqlnd_ms_set_qos. Please see the quickstart for a code example. Both plugins must be installed, PECL/mysqlnd_ms must be compiled to support the cache feature, and PHP 5.4.0 or newer has to be used.

Applications have full control of cache usage and can request fresh data at any time, if need be. The cache usage can be enabled and disabled at any time during the execution of a script. The cache will be used if mysqlnd_ms_set_qos sets the quality of service to eventual consistency and enables cache usage. Cache usage is disabled by requesting higher consistency levels, for example, session consistency (read your writes). Once the quality of service has been relaxed to eventual consistency, the cache can be used again.

If caching is enabled for a read-only statement, PECL/mysqlnd_ms may inject SQL hints to control caching by PECL/mysqlnd_qc. It may modify the SQL statement it got from the application. Subsequent SQL processors are supposed to ignore the SQL hints: a SQL hint is a SQL comment, and comments are ignored, for example, by the database server.

The TTL of a cache entry is computed on a per statement basis. Applications set a maximum age for the data they want to retrieve using mysqlnd_ms_set_qos. The age sets an approximate upper limit of how many seconds the data returned may lag behind the master. The following logic is used to compute the actual TTL if caching is enabled. The logic takes the estimated slave lag into account when choosing a TTL. If, for example, there are two slaves lagging 5 and 10 seconds behind, and the maximum age allowed is 60 seconds, the TTL is set to 50 seconds. Please note, the age setting is no more than an estimated guess.

1. Check whether the statement is read-only. If not, don't cache.
2. If caching is enabled, check the slave lag of all configured slaves. Establish slave connections if none exist so far and lazy connections are used.
3. Send SHOW SLAVE STATUS to all slaves. Do not wait for the first slave to reply before sending to the second slave. Clients often wait long for replies, thus all requests are sent out in a burst before fetching in a second stage.
4. Loop over all slaves. For every slave, wait for its reply. Do not start checking another slave before the currently waited for slave has replied. Check for Slave_IO_Running=Yes and Slave_SQL_Running=Yes. If both conditions hold true, fetch the value of Seconds_Behind_Master. In case of any errors, or if the conditions fail, set an error on the slave connection and skip that slave connection for the rest of the connection filtering.
5. Search for the maximum value of Seconds_Behind_Master among all slaves that passed the previous conditions. Subtract the value from the maximum age provided by the user with mysqlnd_ms_set_qos. Use the result as the TTL.
6. The filtering may sort out all slaves. If so, the maximum age is used as TTL, because the maximum lag found equals zero. It is perfectly valid to sort out all slaves. In the following, it is up to subsequent filters to decide what to do. The built-in load balancing filter will pick the master.
7. Inject the appropriate SQL hints to enable caching by PECL/mysqlnd_qc.
8. Proceed with the connection filtering, e.g. apply load balancing rules to pick a slave.

PECL/mysqlnd_qc is loaded after PECL/mysqlnd_ms by PHP. Thus, it will see all query modifications of PECL/mysqlnd_ms and cache the query if instructed to do so. The algorithm may seem expensive, but SHOW SLAVE STATUS is a very fast operation. Given a sufficient number of requests and cache hits per second, the cost of checking the slave lag can easily outweigh the costs of the cache decision. Suggestions for a better algorithm are always welcome.
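Enabling the cache for eventually consistent reads can be sketched as follows, following the pattern from the quickstart. The constant names and the TTL argument follow the mysqlnd_ms_set_qos documentation; the host name "myapp" is an assumed configuration section.

<?php
$mysqli = new mysqli("myapp", "username", "password", "database");

/* Allow eventual consistency, enable caching, accept data up to 60 seconds old */
mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_EVENTUAL,
                   MYSQLND_MS_QOS_OPTION_CACHE, 60);

/* May be served by PECL/mysqlnd_qc from the local cache */
$res = $mysqli->query("SELECT item, price FROM orders WHERE order_id = 1");

/* Requesting session consistency disables the cache again */
mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_SESSION);
$res = $mysqli->query("SELECT item, price FROM orders WHERE order_id = 1");
$mysqli->close();
?>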
Supported clusters

Any application using any kind of MySQL cluster is faced with the same tasks:

- identify the nodes capable of executing a given statement with the required service level,
- load balance requests within the list of candidates,
- fail over automatically within the candidates, if needed.

The plugin is optimized for fulfilling these tasks in the context of a classical asynchronous MySQL replication cluster consisting of a single master and many slaves (primary copy). When using classical, asynchronous MySQL replication, all of the above listed tasks need to be mastered at the client side. Other types of MySQL cluster may have lower requirements on the application side. For example, if all nodes in the cluster can answer read and write requests, no read-write splitting needs to be done (multi-master, update-all). If all nodes in the cluster are synchronous, they automatically provide the highest possible quality of service, which makes choosing a node easier. In this case, the plugin may serve the application after some reconfiguration to disable certain features, such as the built-in read-write splitting.

Documentation focus

The documentation focusses on describing the use of the plugin with classical asynchronous MySQL replication clusters (primary copy). Support for this kind of cluster has been the original development goal. Use of other clusters is briefly described below. Please note that this is still work in progress.

Primary copy (MySQL Replication)

This is the primary use case of the plugin. Follow the hints given in the descriptions of each feature.

- Configure one master and one or more slaves. Server configuration details are given in the setup section.
- Use the random load balancing policy together with the sticky flag.
- If you do not plan to use the service level API calls, add the master on write flag.
- Please make yourself aware of the properties of automatic failover before adding a failover directive.
- Consider the use of trx_stickiness to execute transactions on the primary only. Please read carefully how it works before you rely on it.

See also the examples: enabling the plugin (php.ini) and a basic plugin configuration (mysqlnd_ms_plugin.ini) for MySQL Replication.

Primary copy with multiple primaries (MMM - MySQL Multi Master)

MySQL Replication allows you to create cluster topologies with multiple masters (primaries). Write-write conflicts are not handled by the replication system. This is not an update anywhere setup. Thus, data must be partitioned manually, and clients must be redirected in accordance with the partitioning rules. The recommended setup is equal to the sharding setup below.

Manual sharding, possibly combined with primary copy and multiple primaries

Use SQL hints and the node group filter for clusters that use data partitioning but leave query redirection to the client. The example configuration shows a multi master setup with two shards: multiple primaries - multi master (php.ini).

Primary copy with multiple primaries and partitioning

The plugin can also be used with a loose collection of unrelated shards. For such a cluster, configure masters only and disable read-write splitting. The nodes of such a cluster are called masters in the plugin configuration, as they accept both reads and writes for their partition.

Using synchronous update everywhere clusters such as MySQL Cluster

MySQL Cluster is a synchronous cluster solution. All cluster nodes accept read and write requests. In the context of the plugin, all nodes shall be considered as masters. Use the load balancing and failover features only.
Disable the plugin's built-in read-write splitting and configure masters only. Consider the random once load balancing strategy, which is the plugin's default. If random once is used, only masters are configured, and no SQL hints are used to force the use of a certain node, then no connection switches will happen for the duration of a web request. Thus, no special handling is required for transactions. The plugin will pick one master at the beginning of the PHP script and use it until the script terminates. Do not set the quality of service: all nodes have all the data, which automatically gives you the highest possible service quality (strong consistency). Do not enable client-side global transaction ID injection; it is neither required to help with server-side failover nor to assist the quality of service filter in choosing an appropriate node.

- Disabling built-in read-write splitting: set mysqlnd_ms.disable_rw_split=1 and do not use SQL hints to enforce the use of slaves.
- Configuring masters only: set mysqlnd_ms.multi_master=1 and do not configure any slaves.
- Set failover=loop_before_master in the plugin's configuration file to avoid warnings about the empty slave list and to make the failover logic loop over all configured masters before emitting an error. Please note the warnings about automatic failover given in the previous sections.

See also the example: multiple primaries - multi master (php.ini).

Synchronous update anywhere cluster

If running an update everywhere cluster that has no built-in partitioning to avoid hot spots and high collision rates, consider using the node groups filter to keep updates on a frequently accessed table on one of the nodes. This may help to reduce collision rates and thus improve performance.
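For a synchronous cluster such as MySQL Cluster, the recommendations above translate roughly into the following sketch. The php.ini directives and the failover value are taken from the text above; the JSON layout of the masters-only section is an assumption to verify against the configuration reference.

<?php
/* php.ini (illustrative):
   mysqlnd_ms.enable=1
   mysqlnd_ms.multi_master=1
   mysqlnd_ms.disable_rw_split=1
*/

/* Hypothetical mysqlnd_ms configuration sketch (JSON): masters only,
   no slaves, loop over all masters before reporting a connection error. */
$config = <<<'JSON'
{
  "myapp": {
    "master": {
      "master_0": { "host": "node1.example.com" },
      "master_1": { "host": "node2.example.com" }
    },
    "slave": { },
    "failover": "loop_before_master"
  }
}
JSON;
file_put_contents("/path/to/mysqlnd_ms_plugin.ini", $config);
?>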
XA/Distributed transactions Version requirement XA related functions have been introduced in PECL/mysqlnd_ms version 1.6.0-alpha.

Early adopters wanted The feature is currently under development. There may be issues and/or feature limitations. Do not use it in production environments, although early lab tests indicate reasonable quality. Please contact the development team if you are interested in this feature; we are looking for real life feedback to complement it. Below is a list of some feature restrictions. The feature is not yet compatible with the MySQL Fabric support; this limitation is soon to be lifted. XA transaction identifiers are currently restricted to numbers; this limitation will be lifted upon request, as it is a simplification used during the initial implementation.

MySQL server restrictions The XA support by the MySQL server has some restrictions. Most notably, the server's binary log may lack changes made by XA transactions in case of certain errors. Please see the MySQL manual for details.

XA/Distributed transactions can span multiple MySQL servers. Thus, they may seem like a perfect tool for sharded MySQL clusters, for example, clusters managed with MySQL Fabric. PECL/mysqlnd_ms hides most of the SQL commands to control XA transactions and performs automatic administrative tasks in case of errors, to provide the user with a comprehensive API. Users should set up the plugin carefully and be well aware of the server restrictions prior to using the feature.

General pattern for XA transactions

<?php
/* start a global transaction with the XA transaction id 1 */
mysqlnd_ms_xa_begin($link, 1);
/* run statements as usual */
$link->query("UPDATE some_table SET col_a = 1");
...
/* COMMIT */
mysqlnd_ms_xa_commit($link, 1);
?>

XA transactions use the two-phase commit protocol. The two-phase commit protocol is a blocking protocol. During the first phase the participating servers begin a transaction and the client carries out its work. This phase is followed by a second, voting phase. During voting, the servers first make a firm promise that they are ready to commit the work even in the case of an unexpected failure. Should a server crash in this phase, it will still recall the aborted transaction after recovery and wait for the client to decide whether it shall be committed or rolled back. Should a client that has initiated a global transaction crash after all the participating servers gave their promise to be ready to commit, then the servers must wait for a decision. The servers are not allowed to unilaterally decide on the transaction.

A client crash or disconnect from a participant, a server crash, or a server error during the first phase of the protocol is uncritical. In most cases, the server will forget about the XA transaction and its work is rolled back. Additionally, the plugin tries to reach out to as many participants as it can to instruct them to roll back the work immediately. It is not possible to disable this implicit rollback carried out by PECL/mysqlnd_ms in case of errors during the first phase of the protocol. This design decision has been made to keep the implementation simple.

An error during the second phase of the commit protocol can develop into a more severe situation. The servers will not forget about prepared but unfinished transactions in all cases. The plugin will not attempt to solve these cases immediately but waits for the optional background garbage collection to ensure progress of the commit protocol. It is assumed that a solution will take significant time, as it may include waiting for a participating server to recover from a crash.
This time span may be longer than a developer and end user expect when trying to commit a global transaction with mysqlnd_ms_xa_commit. Thus, the function returns with the unfinished global transaction still requiring attention. Please be warned that at this point it is not yet clear whether the global transaction will be committed or rolled back later on.

Errors during the second phase can be ignored, handled by yourself, or solved by the built-in garbage collection logic. Ignoring them is not recommended, as you may experience unfinished global transactions on your servers that block resources virtually indefinitely. Handling the errors yourself requires knowing the participants, checking their state, and issuing appropriate SQL commands on them. There are no user API calls to expose this information. You will have to configure a state store and make the plugin record its actions in it to obtain the desired facts. Please see the quickstart and the related plugin configuration file settings for an example of how to configure a state store. In addition to configuring a state store, you have to set up some SQL tables. The table definitions are given in the description of the plugin configuration settings. Setting up and configuring a state store is also a precondition for using the built-in garbage collection for XA transactions that fail during the second commit phase.

Recording information about ongoing XA transactions is an unavoidable extra task. The extra task consists of updating the state store after each and every operation that changes the state of the global transaction itself (started, committed, rolled back, errors and aborts), the addition of participants (host, and optionally the user and password required to connect), and any change to a participant's state. Please note that, depending on configuration and your security policies, these recordings may be considered sensitive. It is therefore recommended to restrict access to the state store. Unless the state store itself becomes overloaded, writing the state information may contribute noticeably to the runtime but should overall be only a minor factor.

It is possible that the effort it takes to implement your own routines for handling XA transactions that failed during the second commit phase exceeds the benefits of using the XA feature of PECL/mysqlnd_ms in the first place. Thus, the manual focusses on using the built-in garbage collection only. Garbage collection can be triggered manually or automatically in the background. You may want to call mysqlnd_ms_xa_gc immediately after a commit failure to attempt to solve any failed but still open global transactions as soon as possible. You may also decide to disable the automatic background garbage collection, implement your own rule set for invoking the built-in garbage collection, and trigger it when desired. By default, the plugin will start the garbage collection with a certain probability in the extension's internal RSHUTDOWN method. The request shutdown is called after your script has finished. Whether the garbage collection will be triggered is determined by computing a random value between 1 and 1000 and comparing it with the configuration setting probability (default: 5). If the setting is greater than or equal to the random value, the garbage collection will be triggered.
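A possible configuration sketch for a MySQL backed state store together with the garbage collection settings discussed here is shown below. The key names and nesting follow the plugin configuration file documentation for the 1.6 series as far as known; host, credentials, and database name are placeholders, and the elided master and slave entries stand for your existing server lists. Please double check the structure against the documentation of your plugin version before use.

mysqlnd_ms_plugin.ini:

{
    "myapp": {
        "master": { ... },
        "slave": { ... },
        "xa": {
            "state_store": {
                "mysql": {
                    "host": "statestore.example.com",
                    "user": "xa_user",
                    "password": "xa_password",
                    "db": "xa_state",
                    "port": "3306"
                }
            },
            "garbage_collection": {
                "probability": 5,
                "max_retries": 5,
                "max_transactions_per_run": 100
            }
        }
    }
}

Given the trigger rule described above, setting probability to 0 effectively disables the automatic background runs and leaves garbage collection to explicit mysqlnd_ms_xa_gc calls.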
Once started, the garbage collection acts upon up to max_transactions_per_run (default: 100) recorded global transactions. The records include successfully finished as well as unfinished XA transactions. Records for successful transactions are removed, and unfinished transactions are attempted to be solved. There are no statistics that help you find the right balance between keeping garbage collection runs short, by limiting the number of transactions considered per run, and preventing the garbage collection from falling behind, which results in many records. For each failed XA transaction the garbage collection makes max_retries (default: 5) attempts to finish it. After that, PECL/mysqlnd_ms gives up. There are two possible reasons for this: either a participating server crashed and has not become accessible again within max_retries invocations of the garbage collection, or there is a situation that the built-in garbage collection cannot cope with. The latter would likely be considered a bug. However, you can manually force more garbage collection runs by calling mysqlnd_ms_xa_gc with the appropriate parameter set. Should even those function runs fail to solve the situation, then the problem must be solved by an operator. The function mysqlnd_ms_get_stats provides some statistics on how many XA transactions have been started, committed, failed, or rolled back.
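The following minimal sketch ties the pieces together. It assumes a plugin configuration section named myapp with a state store set up as described above; the credentials and the transaction id are placeholders, and error handling is reduced to the essentials.

<?php
/* "myapp" refers to a section in the plugin configuration file */
$link = new mysqli("myapp", "user", "password", "database");

/* begin a global transaction; XA ids are currently restricted to numbers */
if (!mysqlnd_ms_xa_begin($link, 1)) {
    die(sprintf("XA begin failed: [%d] %s\n", $link->errno, $link->error));
}

$link->query("UPDATE some_table SET col_a = 1");

if (!mysqlnd_ms_xa_commit($link, 1)) {
    /* a failure in the second commit phase may leave the transaction open on some participants */
    printf("XA commit failed: [%d] %s\n", $link->errno, $link->error);
    /* try to resolve it right away instead of waiting for the background garbage collection */
    mysqlnd_ms_xa_gc($link, 1);
}

/* aggregated plugin statistics, including counters for started, committed,
   rolled back and failed XA transactions; requires statistics collection
   to be enabled (mysqlnd_ms.collect_statistics=1) */
print_r(mysqlnd_ms_get_stats());
?>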