Paul DuBois
paul@kitebird.com
Document revision: 1.06
Last update: 2007-05-26
Programs that access MySQL databases can be written in the Ruby
scripting language by using Tomita Masahiro's MySQL module. This
module provides a Ruby client API; it is implemented as a wrapper
around the MySQL C client API. This document describes how to
install the MySQL module and use it to write MySQL-based Ruby
scripts. A related document describes the Ruby DBI module that
provides a higher-level interface that is more abstract and database-independent.
See the "Resources" section for a pointer to that document
and also for information about downloading the example scripts
used here.
Obtaining and Installing the MySQL Module
To use the Ruby MySQL module, first make sure that you have the
MySQL C client API header files and libraries installed. This
is a requirement because the API provided by the Ruby module is
built on top of the C API.
The Ruby MySQL module can be obtained from the following site:
http://www.tmtm.org/en/mysql/ruby/The module is distributed as a compressed tar file, which you should unpack after downloading it. For example, if the current version is 2.7.1, the distribution file can be unpacked using either of the following commands:
% tar zxf mysql-ruby-2.7.1.tar.gz % gunzip < mysql-ruby-2.7.1.tar.gz | tar xf -After unpacking the distribution, change location into its top-level directory and configure it using the extconf.rb script in that directory:
% ruby extconf.rbIf extconf.rb successfully locates your MySQL header file and library directories, you can proceed to build and install the module. Otherwise, it indicates what it could not find, and you'll need to run the command again with additional options that specify the appropriate directory locations. For example, if your header file and library directories are /usr/local/mysql/include/mysql and /usr/local/mysql/include/lib, the configuration command looks like this:
% ruby extconf.rb \ --with-mysql-include=/usr/local/mysql/include/mysql \ --with-mysql-lib=/usr/local/mysql/lib/mysqlAlternatively, tell extconf.rb where to find the mysql_config program. In that case, extconf.rb runs mysql_config to locate the header and library files:
% ruby extconf.rb --with-mysql-config=/usr/local/mysql/bin/mysql_configAfter configuring the distribution, build and install the module:
% make % make installYou might need to run the installation command as root.
If you have problems getting the module installed, see the distribution's
README file for additional configuration and build information.
The preceding instructions apply to Unix systems. It is possible
to install the module on Windows as well, but you need some sort
of Unix-like environment such as Cygwin. For links to pages that
provide Windows instructions, see the "Resources" section.
MySQL Module Overview
The MySQL module defines four classes:
With the Ruby MySQL module installed, you should be able to access
your MySQL server from within Ruby programs. Assume for purposes
of this article that the server is running on the local host and
that you have access to a database named test by connecting
using an account that has a username and password of testuser
and testpass. You can set up this account by using the
mysql program to connect to the server as the MySQL root
user and issuing the following statement:
mysql> GRANT ALL ON test.* TO 'testuser'@'localhost' IDENTIFIED BY 'testpass';If the test database does not exist, create it with this statement:
mysql> CREATE DATABASE test;If you want to use a different server host, username, password, or database name, just substitute the appropriate values in each of the scripts discussed in the remainder of this article.
As a first exercise in MySQL-based Ruby programming, let's create
a script named simple.rb that just connects to the server,
retrieves and displays the server version, and disconnects. Use
a text editor to create simple.rb with the following contents
(or download the script from the link listed in "Resources"):
#!/usr/bin/ruby -w # simple.rb - simple MySQL script using Ruby MySQL module require "mysql" begin # connect to the MySQL server dbh = Mysql.real_connect("localhost", "testuser", "testpass", "test") # get server version string and display it puts "Server version: " + dbh.get_server_info rescue Mysql::Error => e puts "Error code: #{e.errno}" puts "Error message: #{e.error}" puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate") ensure # disconnect from server dbh.close if dbh endThe script works as follows:
% ruby simple.rb Server version: 5.1.14-beta-logIf simple.rb does not execute successfully, an error will occur. Methods in the MySQL module raise a Mysql::Error exception when they fail. Exception objects have read-only error, errno, and (for recent versions of the MySQL module) sqlstate values that contain the error message string, numeric error code, and five-character SQLSTATE value. The rescue clause in simple.rb illustrates how to access the exception values: It places a reference to the Mysql::Error exception object in e, and the body of the block prints the errno, error, and sqlstate values to provide information about the cause of the failure. To see what happens when an exception occurs, change one of the connection parameters in the real_connect call to some invalid value (such as changing the username to an invalid name), then run simple.rb again. It will display error information as follows:
% ruby simple.rb Error code: 1045 Error message: Access denied for user 'nouser'@'localhost' (using password: YES) Error SQLSTATE: 28000
Statements such as CREATE TABLE, INSERT,
DELETE, and UPDATE return no result set and
are quite easy to process. Statements such as SELECT
and SHOW do return rows; it takes a little more work
to process them. The following discussion shows how to handle
both types of statements. (The code is part of the animal.rb
script available for download as described in "Resources.")
Processing Statements that Return No Result Set
To execute a statement that does not return a result set, invoke
the database handle's query method to send the statement
to the server. If you want to know how many rows the statement
affected, invoke affected_rows to get the count. The
following code demonstrates this by initializing a table named
animal that contains two columns, name and category.
It drops any existing version of the table, creates it anew, and
then inserts some sample data into it. Each of these operations
requires only an invocation of the query method to send
the appropriate statement to the server. After issuing the INSERT,
the script also invokes affected_rows to determine how
many rows were added to the table:
dbh.query("DROP TABLE IF EXISTS animal") dbh.query("CREATE TABLE animal ( name CHAR(40), category CHAR(40) ) ") dbh.query("INSERT INTO animal (name, category) VALUES ('snake', 'reptile'), ('frog', 'amphibian'), ('tuna', 'fish'), ('racoon', 'mammal') ") puts "Number of rows inserted: #{dbh.affected_rows}"
To execute a statement that returns a result set, a typical sequence
of events is as follows:
# issue a retrieval query, perform a fetch loop, print # the row count, and free the result set res = dbh.query("SELECT name, category FROM animal") while row = res.fetch_row do printf "%s, %s\n", row[0], row[1] end puts "Number of rows returned: #{res.num_rows}" res.freeThe example fetches the rows using a while loop and the result set's fetch_row method. Another approach is to use the each iterator directly with the result set object:
res = dbh.query("SELECT name, category FROM animal") res.each do |row| printf "%s, %s\n", row[0], row[1] end puts "Number of rows returned: #{res.num_rows}" res.freefetch_row and each return successive rows of the result, each row as an array of column values. There are hashed versions of each of these that return rows as hashes keyed by column name. The hash method, fetch_hash is used like this:
res = dbh.query("SELECT name, category FROM animal") while row = res.fetch_hash do printf "%s, %s\n", row["name"], row["category"] end puts "Number of rows returned: #{res.num_rows}" res.freeThe hash iterator, each_hash, works like this:
res = dbh.query("SELECT name, category FROM animal") res.each_hash do |row| printf "%s, %s\n", row["name"], row["category"] end puts "Number of rows returned: #{res.num_rows}" res.freeBy default, hash keys in rows returned by fetch_hash and each_hash are column names. This can result in loss of values if multiple columns have the same name. For example, the following statement produces two columns named i:
SELECT t1.i, t2.i FROM t1, t2;Only one of the columns will be accessible if you process rows as hashes. To disambiguate hash elements in such cases, you can supply a with_table=true argument to fetch_hash or each_hash. This causes each hash key to be qualified with the appropriate table name, in tbl_name.col_name format. It's still possible to lose values, because if you select the same value from a table multiple times, they'll both have the same qualified name--but since both columns will have the same value anyway, it hardly matters.
When you use with_table=true, remember to access column
values in row hashes with key values that include the table name.
For example:
res = dbh.query("SELECT name, category FROM animal") res.each_hash(with_table = true) do |row| printf "%s, %s\n", row["animal.name"], row["animal.category"] end puts "Number of rows returned: #{res.num_rows}" res.freeIf you use aliases in your statement, either for tables or columns, those aliases are used in the hash keys rather than the original table or column names.
With fetch_row and each, you must know the order
in which column values are present in each row. This makes them
unsuitable for SELECT * statements because no
column order can be assumed. fetch_hash and each_hash
enable column values to be accessed by column name. They're less
efficient than the array versions, but more suited to processing
the results of SELECT * statements because you
need know nothing about order of columns within the result set.
For result sets fetched with with_table=true, the tbl_name
part of the hash key is empty for columns calculated from expressions.
Suppose you issue the following statement:
SELECT i, i+0, VERSION(), 4+2 FROM t;Only the first column comes directly from the table t, so it's the only column for which the hash key contains a table name. The hash keys for rows of the statement are "t.i", ".i+0", ".VERSION()", and ".4+2".
NULL values in result sets are represented by the Ruby
nil value. Beginning with the animal table used
thus far, we can insert a row containing NULL values
like this:
dbh.query("INSERT INTO animal (name, category) VALUES (NULL, NULL)")The following code retrieves and prints the table contents:
res = dbh.query("SELECT name, category FROM animal") res.each do |row| printf "%s, %s\n", row[0], row[1] end res.freeThe output produced by the loop is as follows. Note that NULL values show up as empty values in the last line of the output:
snake, reptile frog, amphibian tuna, fish racoon, mammal ,To detect NULL values and print the word "NULL" instead, the loop can look for nil values in the result:
res.each do |row| row[0] = "NULL" if row[0].nil? row[1] = "NULL" if row[1].nil? printf "%s, %s\n", row[0], row[1] endNow the output becomes:
snake, reptile frog, amphibian tuna, fish racoon, mammal NULL, NULLOf course, individual column value testing quickly becomes ugly as the number of columns increases. A more Ruby-like way to map nil to a printable "NULL" is to use collect, a technique that has the advantage of being a one-liner no matter the number of columns:
res.each do |row| row = row.collect { |v| v.nil? ? "NULL" : v } printf "%s, %s\n", row[0], row[1] endOr, to modify the row in place, use the collect! method:
res.each do |row| row.collect! { |v| v.nil? ? "NULL" : v } printf "%s, %s\n", row[0], row[1] end
Suppose we want to put a new animal into the animal table,
but we don't know its category. We could use "don't know"
as the category value, but a statement written as follows
raises an exception:
dbh.query("INSERT INTO animal (name, category) VALUES ('platypus','don't know')")That statement contains a single quote within a single-quoted string, which is syntactically illegal. To make the statement legal, escape the quote with a backslash:
dbh.query("INSERT INTO animal (name, category) VALUES ('platypus','don\'t know')")However, for an arbitrary data value (such as a value stored in a variable), you might not know whether or not it contains any special characters. To make the value safe for insertion as a data value in a statement, use the escape_string method, or its alias, quote. These methods map onto the C mysql_real_escape_string() function if it is available and to the mysql_escape_string() function otherwise.
Using escape_string, the platypus record might be inserted
as follows:
name = dbh.escape_string("platypus") category = dbh.escape_string("don't know") dbh.query("INSERT INTO animal (name, category) VALUES ('" + name + "','" + category + "')")Strictly speaking, it's unnecessary to process a name value like "platypus" with escape_string, because it contains no special characters. But it's not a bad idea to develop the habit of escaping your data values, especially if you obtain them from an external source such as a web script.
Note that escape_string does not add any surrounding
quotes around data values; you'll need to do that yourself. Also,
take care about using escape_string to handle nil
values; it will throw an exception. If a data value is nil,
you should insert the literal word "NULL" into your
statement without surrounding quotes instead of invoking
escape_string.
Statement Result Metadata
For a statement that doesn't return any rows (such as INSERT),
the only statement metadata available is the number of rows affected.
This value can be obtained by invoking the affected_rows
method of your database handle.
For a statement that does return rows (such as SELECT),
available metadata includes the number of rows and columns in
the result set, as well as information describing the characteristics
of each column, such as its name and type. All this information
is available through the result set object:
If you know whether a statement returns rows, you can tell in
advance which metadata methods are appropriate for obtaining information
about the statement result. If you don't know, you can determine
which methods are applicable using the result from query.
If query returns nil, there is no result set.
Otherwise use the value as a result set object through which the
metadata can be obtained.
The following example shows how to use this technique to display
metadata for any arbitrary statement, assumed here to be stored
as a string in the stmt variable. The script issues the
statement and examines the result set to determine which types
of metadata are available:
res = dbh.query(stmt) puts "Statement: #{stmt}" if res.nil? then puts "Statement has no result set" printf "Number of rows affected: %d\n", dbh.affected_rows else puts "Statement has a result set" printf "Number of rows: %d\n", res.num_rows printf "Number of columns: %d\n", res.num_fields res.fetch_fields.each_with_index do |info, i| printf "--- Column %d (%s) ---\n", i, info.name printf "table: %s\n", info.table printf "def: %s\n", info.def printf "type: %s\n", info.type printf "length: %s\n", info.length printf "max_length: %s\n", info.max_length printf "flags: %s\n", info.flags printf "decimals: %s\n", info.decimals end res.free end
When using the MySQL C client library, you typically process a
statement by calling mysql_query() or mysql_real_query()
to send the statement string to the server, mysql_store_result()
to generate the result set, a row-fetching function to get the
rows of the result set, and mysql_free_result() to release
the result set.
By default, the Ruby query method handles the first two
parts of that process. That is, it sends the statement string
to the server and then automatically invokes store_result
to generate the result set, which it returns as a Mysql::Result
object.
If you want to suppress automatic result set generation by query,
set your database handle's query_with_result variable
to false:
dbh.query_with_result = falseThe effect of this is that after invoking query, you must generate the result set yourself before fetching its rows. To do so, invoke either store_result or use_result explicitly to obtain the result set object. This in fact the approach you must use if you want to retrieve rows with use_result:
dbh.query("SELECT name, category FROM animal") res = dbh.use_result while row = res.fetch_row do printf "%s, %s\n", row[0], row[1] end puts "Number of rows returned: #{res.num_rows}" res.freeNote that if you fetch rows with use_result, the row count will not be correct until after you have fetched all the rows. (With store_result, the row count is correct as soon as you generate the result set.)
As shown earlier, you connect to the server by invoking real_connect
as a class method to obtain a database handle object:
dbh = Mysql.real_connect("localhost", "testuser", "testpass", "test")It's also possible to connect by first invoking the init class method to obtain a database handle object, and then invoking real_connect as a method of that object:
dbh = Mysql.init dbh.real_connect("localhost", "testuser", "testpass", "test")In itself, this approach doesn't gain you anything over invoking real_connect as a class method. Its advantage is that it enables you to specify options that afford more specific control over the connection. To do this, invoke the object's options method one or more times before invoking real_connect. options takes two arguments indicating an option type and its value. The names correspond to the symbolic constants used for the C mysql_options() function. For example, if you want to connect using parameters listed in the [client] group in the standard option files, rather than specifying them in the real_connect call, do this:
dbh = Mysql.init dbh.options(Mysql::READ_DEFAULT_GROUP, "client") dbh.real_connectThe real_connect method takes up to seven parameters. The full invocation syntax is:
real_connect(host,user,password,db,port,socket,flags)The host, user, password, and db parameters have already been discussed.
The port and socket parameters
indicate the port number (for TCP/IP connections) and the Unix
domain socket file pathname (for connections to localhost).
They can be used to override the defaults (which typically are
3306 and /tmp/mysql.sock).
The flags argument can be used to specify additional
connection flags. The allowable flag names correspond to the symbolic
constants used for the C mysql_real_connect() function.
The flag values are bit values and can be OR-ed or added together.
For example, if you want to connect using the compressed client/server
protocol and to tell the server to use the interactive-client
timeout value, specify the flags value like this:
Mysql::CLIENT_COMPRESS | Mysql::CLIENT_INTERACTIVEOr like this:
Mysql::CLIENT_COMPRESS + Mysql::CLIENT_INTERACTIVE
The Ruby MySQL module mirrors the C API fairly closely and provides
bindings to most of the C client functions. Several of the C API
functions are deprecated and to be avoided, which means that you
should also try to avoid the corresponding Ruby methods. You can
do this without any loss of functionality in your Ruby scripts;
the reason functions in the C API become deprecated is that they
are superceded by other ways to achieve the same effect. For example,
the mysql_create_db() function now is deprecated because
you can issue a CREATE DATABASE SQL statement
with mysql_query() or mysql_real_query(). Correspondingly,
instead of using the Ruby create_db method to create
a database, do this:
dbh.query("CREATE DATABASE db_name")
The scripts that are used for examples in this document can be
downloaded from the following location:
http://www.kitebird.com/articles/Another document at that location discusses database programming using the database interface provided by the Ruby DBI module.
The following references might be helpful as sources of information
about Ruby, the Ruby MySQL module, and the C API on which the
module is built:
http://www.ruby-lang.org/en/
http://www.tmtm.org/en/mysql/ruby/Tomita's site also provides links to pages that have instructions for installing the module on Windows.
http://www.mysql.com/The MySQL C API is defined in the MySQL Reference Manual:
http://dev.mysql.com/doc/mysql/
http://www.kitebird.com/mysql-book/The book also contains a reference appendix listing all the data types and functions in the C API.