Technology

Data storage layers in the Zend framework

August 29, 2011
By Pieter de Zwart

You have to hand it to Zend. Their Framework lets developers build solid LAMP applications with all kinds of useful, out-of-the-box support for RESTful servers, plugging into Lucene, handling LDAP interactions, and even interacting with Windows Azure! Yes! Windows Azure! Something I didn’t even know existed until I browsed Zend’s API docs. I wonder how much Microsoft paid them to include that package…

Anyway, back to the topic at hand. Storing things. In places. With stuff.

Your average web application generally starts off with a basic database back-end like MySQL or PostgreSQL. That works for a while, until the database starts sweating under load because you are so awesome and successful that your website pulls in a million hits a minute. Reads begin to far exceed writes at the database level, and most of the time you're looking up objects by their primary keys, like user or blog post IDs. This means you have an opportunity to relieve the load on your database (all those dumb ID lookups) and instead let it do what it does best: store, manage and query complex relational data in an atomic and transactional fashion.

Relieving this load is as simple as putting a caching mechanism in front of the database that can, given an entity type and ID, quickly spit out the corresponding object. A good caching system will return data much faster than a database, since it keeps everything in memory and is built to handle this (and only this) scenario.
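To make that concrete, here is a minimal cache-aside sketch in PHP. It assumes a toy `ArrayCache` class standing in for Memcache and a hypothetical `$loader` callback standing in for the database query; the `User:23` key format is our own convention, not something Zend prescribes.

```php
<?php
// Toy cache standing in for Memcache. Like PECL Memcache's get(), it returns
// false on a miss, so a cached value of false can't be told apart from a miss.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
        return true;
    }
}

// Build a cache key from an entity type and primary key, e.g. "User:23".
function cacheKey($type, $id)
{
    return $type . ':' . (int) $id;
}

// Cache-aside: try the cache first; on a miss, ask the loader (the database
// in real life), store the result in the cache, and return it.
function fetchEntity(ArrayCache $cache, $type, $id, $loader)
{
    $key = cacheKey($type, $id);
    $data = $cache->get($key);
    if ($data !== false) {
        return $data;
    }
    $data = call_user_func($loader, $type, $id);
    if ($data !== false) {
        $cache->set($key, $data);
    }
    return $data;
}

// Hypothetical loader that counts how often the "database" is hit.
$dbQueries = 0;
$loader = function ($type, $id) use (&$dbQueries) {
    $dbQueries++;
    return array('id' => $id, 'type' => $type);
};

$cache = new ArrayCache();
$first  = fetchEntity($cache, 'User', 23, $loader); // miss: calls the loader
$second = fetchEntity($cache, 'User', 23, $loader); // hit: served from cache
```

The second lookup never reaches the loader, which is the whole point: the dumb ID lookups stop landing on the database.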

At least it should be this simple, in theory.

Unfortunately, Zend only comes with the back-end caching basics: wrapper classes for things like Memcache, WinCache, APC and even the file system. (Front-end caching is also pretty cool, and Zend provides some nifty utilities for it. More on this later.) There is, however, no integration between the caching and storage layers, and that integration is exactly what we need to alleviate database load. Being solution-oriented people, we put our thinking caps on and got to work.

Think of data storage as a series of layers, where each successively deeper layer is more versatile, more complex, and therefore slower.

The highest layer is local process memory, which is the fastest possible place to cache things. Highly modular applications are quick to build, but the tradeoff is that they often end up creating multiple instances of the same object within a single page request. An in-memory object cache drastically reduces the cost of recreating those objects, so you can keep developing at high velocity.
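A minimal sketch of this first layer, assuming a hypothetical `ProcessCache` class and a `$factory` callback that builds the object. In a typical PHP setup, static state is thrown away when the request ends, which matches the scope of this layer exactly.

```php
<?php
// Stand-in model class for the example.
class User
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

// Hypothetical per-request object cache backed by a static array.
class ProcessCache
{
    private static $objects = array();

    // Return the object for $type/$id, building it with $factory only the
    // first time it is requested. Note: isset() treats a cached null as a miss.
    public static function get($type, $id, $factory)
    {
        $key = $type . ':' . $id;
        if (!isset(self::$objects[$key])) {
            self::$objects[$key] = call_user_func($factory, $id);
        }
        return self::$objects[$key];
    }

    // Drop an entry, e.g. after the underlying record has changed.
    public static function forget($type, $id)
    {
        unset(self::$objects[$type . ':' . $id]);
    }
}

$built = 0;
$factory = function ($id) use (&$built) {
    $built++;
    return new User($id);
};

$a = ProcessCache::get('User', 23, $factory);
$b = ProcessCache::get('User', 23, $factory); // same instance, no rebuild
ProcessCache::forget('User', 23);
$c = ProcessCache::get('User', 23, $factory); // rebuilt after forget()
```

Two modules asking for user 23 in the same request now share one object instead of each building their own.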

The second-highest layer is local shared memory: memory not owned by the currently running process and therefore accessible to all page requests. Chances are, if you're looking to cache things, you've moved beyond a single front-end server, so let's talk about a multi-server approach: distributed key/value stores like Memcache. Memcache is generally set up as a pool, with cached items distributed evenly across multiple servers using a hash of the key. If one server takes a dive, you don't lose all of your cached user objects and then hammer the database's users table while it rebuilds that cache set. Instead, you lose a slice of the cached data from every table, so the rebuild load is spread across the storage back-end rather than piling locks onto any one table. That's a good thing.
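Here is a rough sketch of how a client can pick a server from a key, assuming simple modulo hashing over `crc32()`. The `pickServer()` helper and the server names are hypothetical; real Memcache clients do this internally.

```php
<?php
// Map a cache key to one server in the pool: hash the key, then take the
// hash modulo the number of servers.
function pickServer($key, array $servers)
{
    // Mask to 31 bits so the hash is non-negative on 32- and 64-bit builds.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

// Hypothetical pool of three Memcache servers.
$servers = array('cache1:11211', 'cache2:11211', 'cache3:11211');

// Count how many of 3000 user keys land on each server. Because the keys are
// spread across the pool, losing one server only loses part of the user cache.
$counts = array_fill_keys($servers, 0);
for ($id = 1; $id <= 3000; $id++) {
    $counts[pickServer('User:' . $id, $servers)]++;
}
```

Plain modulo hashing has a catch: if the server list changes, `count($servers)` changes and most keys remap to a different server. The PECL `memcached` extension can use consistent hashing instead (set `Memcached::OPT_DISTRIBUTION` to `Memcached::DISTRIBUTION_CONSISTENT`), which limits that reshuffling.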

The lowest layer is the database, which could be any piece of software that is ACID compliant. This is the layer that keeps all your data shiny and clean (read: available and not corrupted), and that limits the damage when the unexpected happens, like someone tripping over your power cord.

Now that we know what the data storage layers are, our next step is to understand the interaction between these layers.

Let’s say the application scope requests a user object with an ID of 23. The first step should be to check local process memory (the first layer) because, hey, you never know: this user object may already have been loaded during the current request. If no cached user object exists there, we move down to Memcache and check there. If nothing’s in Memcache, we move down to the database.

In our perfect, idealized example where nothing can possibly go wrong, the database of course contains a user object with ID 23, which it returns. At this point we want to automagically save this object at each data storage level for later use. We move back up the layer hierarchy, first stopping at Memcache (local shared memory) and shoving the object into a predefined key. We move up to local process memory and do the same thing. Finally, we return the requested object to the caller.
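The lookup-and-backfill flow above can be sketched as a plain loop over an ordered list of layers. This is a simplified illustration rather than the chain-based design that follows; `ArrayLayer` and `readThrough()` are hypothetical names, plain arrays stand in for all three layers, and `null` signals a miss.

```php
<?php
// Hypothetical layer backed by a PHP array; counts reads so we can see
// which layers a lookup actually touched.
class ArrayLayer
{
    public $data;
    public $reads = 0;

    public function __construct(array $data = array())
    {
        $this->data = $data;
    }

    public function get($key)
    {
        $this->reads++;
        return isset($this->data[$key]) ? $this->data[$key] : null;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
    }
}

// Walk the layers fastest-first. On a hit at layer $depth, write the value
// back into every faster layer (nearest first), then return it.
function readThrough(array $layers, $key)
{
    foreach ($layers as $depth => $layer) {
        $value = $layer->get($key);
        if ($value !== null) {
            for ($i = $depth - 1; $i >= 0; $i--) {
                $layers[$i]->set($key, $value);
            }
            return $value;
        }
    }
    return null; // not found in any layer
}

$process  = new ArrayLayer();
$memcache = new ArrayLayer();
$database = new ArrayLayer(array('User:23' => array('id' => 23, 'name' => 'Ann')));
$layers   = array($process, $memcache, $database);

$first  = readThrough($layers, 'User:23'); // misses twice, hits the database
$second = readThrough($layers, 'User:23'); // answered by process memory
```

The backfill runs from Memcache up to process memory, matching the order described above.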

The actual implementation is pretty straightforward, and the Chain of Responsibility pattern suits our needs well. We’ll need 1) a dispatcher to initiate the request, as well as 2) a series of layers to act on data as it flows up and down the chain.
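A stripped-down sketch of that shape, assuming hypothetical classes `ChainLayer`, `MemoryLayer`, `FakeDatabaseLayer` and `StorageDispatcher` (these are not the `Framework_Chain` classes used below). Each layer either answers a fetch itself or hands it to the next layer, caching whatever comes back on the way up; the dispatcher only knows the head of the chain.

```php
<?php
abstract class ChainLayer
{
    private $next = null;

    // Link the next (deeper, slower) layer; returns it so calls can chain.
    public function setNext(ChainLayer $next)
    {
        $this->next = $next;
        return $next;
    }

    // Answer from this layer if possible, otherwise delegate down the chain
    // and cache whatever comes back before returning it to the caller.
    public function fetch($type, $id)
    {
        $data = $this->read($type, $id);
        if ($data !== null || $this->next === null) {
            return $data;
        }
        $data = $this->next->fetch($type, $id);
        if ($data !== null) {
            $this->write($type, $id, $data);
        }
        return $data;
    }

    abstract protected function read($type, $id);
    abstract protected function write($type, $id, $data);
}

// Array-backed layer standing in for process memory or Memcache.
class MemoryLayer extends ChainLayer
{
    public $items = array();

    protected function read($type, $id)
    {
        $key = "$type:$id";
        return isset($this->items[$key]) ? $this->items[$key] : null;
    }

    protected function write($type, $id, $data)
    {
        $this->items["$type:$id"] = $data;
    }
}

// Bottom of the chain; a real layer would query the database here.
class FakeDatabaseLayer extends ChainLayer
{
    public $queries = 0;

    protected function read($type, $id)
    {
        $this->queries++;
        return array('type' => $type, 'id' => $id);
    }

    protected function write($type, $id, $data)
    {
        // Never called: the bottom layer has no deeper layer to backfill from.
    }
}

// The dispatcher initiates requests at the head of the chain.
class StorageDispatcher
{
    private $head;

    public function __construct(ChainLayer $head)
    {
        $this->head = $head;
    }

    public function fetch($type, $id)
    {
        return $this->head->fetch($type, $id);
    }
}

$process  = new MemoryLayer();
$memcache = new MemoryLayer();
$db       = new FakeDatabaseLayer();
$process->setNext($memcache)->setNext($db);

$storage = new StorageDispatcher($process);
$first  = $storage->fetch('User', 23); // falls through to the database
$second = $storage->fetch('User', 23); // answered by process memory
```

The appeal of the pattern is that adding or removing a layer only changes how the chain is wired, not the calling code.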

Let’s start with the series of layers. They’ll need a common interface to guarantee that every layer implements the same set of operations, which looks something like this:

interface Framework_Storage_Interface {
    /**
     * fetch
     * get data from storage layer
     * @param   $type   string  Model class name
     * @param   $ids    mixed   single integer id, or array of integer ids, of the requested items
     * @return  array   associative array of data representing the object
     */
    public function fetch($type, $ids, $options = array());

    /**
     * create
     * create new storage item
     * @param   $type   string  Model class name
     * @param   $data   array   data representing the object
     * @return  mixed   boolean false on failure, integer id of the new object on success
     */
    public function create($type, $data, $options = array());

    /**
     * update
     * update storage item
     * @param   $type   string  Model class name
     * @param   $id     mixed   single integer id of item to be updated
     * @param   $data   array   data representing the object
     * @return  boolean true if success, false if failure
     */
    public function update($type, $id, $data, $options = array());

    /**
     * delete
     * delete storage item
     * @param   $type   string  Model class name
     * @param   $id     mixed   single integer id of item to be deleted
     * @return  boolean true if success, false if failure
     */
    public function delete($type, $id, $options = array());
}
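To show what the interface's return contract looks like in practice, here is a hypothetical array-backed `ArrayStorage` class implementing it. The interface is restated without docblocks so the snippet runs on its own. The docblocks don't pin down what `fetch()` returns for an array of IDs, so returning found rows keyed by ID is our assumption.

```php
<?php
// The interface above, restated without docblocks.
interface Framework_Storage_Interface {
    public function fetch($type, $ids, $options = array());
    public function create($type, $data, $options = array());
    public function update($type, $id, $data, $options = array());
    public function delete($type, $id, $options = array());
}

// Hypothetical in-memory store: $rows[$type][$id] = data array.
class ArrayStorage implements Framework_Storage_Interface
{
    private $rows = array();
    private $nextId = 1;

    // Single id: its data array, or false. Array of ids (assumption): the
    // rows that were found, keyed by id.
    public function fetch($type, $ids, $options = array())
    {
        if (is_array($ids)) {
            $found = array();
            foreach ($ids as $id) {
                if (isset($this->rows[$type][$id])) {
                    $found[$id] = $this->rows[$type][$id];
                }
            }
            return $found;
        }
        return isset($this->rows[$type][$ids]) ? $this->rows[$type][$ids] : false;
    }

    // Returns the new integer id, or false on failure.
    public function create($type, $data, $options = array())
    {
        if (!is_array($data)) {
            return false;
        }
        $id = $this->nextId++;
        $this->rows[$type][$id] = array('id' => $id) + $data;
        return $id;
    }

    // Returns true on success, false if the item doesn't exist.
    public function update($type, $id, $data, $options = array())
    {
        if (!isset($this->rows[$type][$id]) || !is_array($data)) {
            return false;
        }
        $this->rows[$type][$id] = array_merge($this->rows[$type][$id], $data);
        return true;
    }

    // Returns true on success, false if the item doesn't exist.
    public function delete($type, $id, $options = array())
    {
        if (!isset($this->rows[$type][$id])) {
            return false;
        }
        unset($this->rows[$type][$id]);
        return true;
    }
}

$store = new ArrayStorage();
$id = $store->create('User', array('name' => 'Ann'));
$store->update('User', $id, array('name' => 'Anne'));
$user = $store->fetch('User', $id);
```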

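We now need a base layer object: a do-nothing default whose methods simply return false, so each concrete layer only has to override the operations it actually supports. It extends Framework_Chain, which isn’t shown in this post. We’ll be extending this base class for each of our caching/storage layers in Part 2, coming soon!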

abstract class Framework_Storage_Layer
    extends Framework_Chain
    implements Framework_Storage_Interface
{
    /**
     * fetch
     * get data from storage layer
     * @param   $type   string  Model class name
     * @param   $ids    mixed   single integer id, or array of integer ids, of the requested items
     * @return  array   associative array of data representing the object
     */
    public function fetch($type, $ids, $options = array())
    {
        return false;
    }

    /**
     * create
     * create new storage item
     * @param   $type   string  Model class name
     * @param   $data   array   data representing the object
     * @return  mixed   boolean false on failure, integer id of the new object on success
     */
    public function create($type, $data, $options = array())
    {
        return false;
    }

    /**
     * update
     * update storage item
     * @param   $type   string  Model class name
     * @param   $id     mixed   single integer id of item to be updated
     * @param   $data   array   data representing the object
     * @return  boolean true if success, false if failure
     */
    public function update($type, $id, $data, $options = array())
    {
        return false;
    }

    /**
     * delete
     * delete storage item
     * @param   $type   string  Model class name
     * @param   $id     mixed   single integer id of item to be deleted
     * @return  boolean true if success, false if failure
     */
    public function delete($type, $id, $options = array())
    {
        return false;
    }

    /**
     * find
     * find item(s) in storage system
     * @param   $type   string  The type of the requested objects for which to search
     * @param   $criteria   Framework_Model_Criteria    object containing criteria for find
     * @return  array   ids of storage items matching criteria
     */
    public function find($type, $criteria, $options = array())
    {
        return false;
    }

    /**
     * fetchByCriteria
     * fetch item(s) matching criteria from storage system
     * @param   $type       string  Model class name
     * @param   $criteria   Framework_Model_Criteria    object containing criteria for find
     * @return  array       multi-dimensional array of items matching criteria
     */
    public function fetchByCriteria($type, $criteria, $options = array())
    {
        return false;
    }

    /**
     * getStructure
     * structure of storage type given by $type
     * @param   $type       string  Model class name
     * @return  array       multi-dimensional array of object structure
     */
    public function getStructure($type, $options = array())
    {
        return false;
    }

    /**
     * checkOption
     * @param   $options    array   The options list to check against
     * @param   $key        string  The option key to check
     * @param   $value      mixed   Check whether this value exists
     * @return  bool                Whether this option is set
     */
    protected function checkOption($options, $key, $value = true)
    {
        // Options must be a valid array of information
        if (!is_array($options)) {
            return false;
        }

        // Check whether given key exists:
        if (!isset($options[$key])) {
            return false;
        }

        // At this point, key exists, and that is all we want, so success
        if ($value === true) {
            return true;
        }

        // If the key value is true, then it is enabled for all:
        if ($options[$key] === true) {
            return true;
        }

        // If we have an array of data, check if value is present
        if (is_array($options[$key]) && in_array($value, $options[$key])) {
            return true;
        }

        // Otherwise compare the option value to $value (loose comparison)
        if ($options[$key] == $value) {
            return true;
        }

        // There were no matches, so option is off
        return false;
    }
}
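Since `checkOption()` defines how every layer will interpret its `$options` array, it helps to see it exercised. Below, the same logic is restated as a standalone function (the method is protected, so it can't be called from outside), with hypothetical option names chosen purely for illustration.

```php
<?php
// Same logic as Framework_Storage_Layer::checkOption(), condensed.
function checkOption($options, $key, $value = true)
{
    if (!is_array($options) || !isset($options[$key])) {
        return false;                      // no options, or key absent
    }
    if ($value === true || $options[$key] === true) {
        return true;                       // key present, or enabled for all
    }
    if (is_array($options[$key]) && in_array($value, $options[$key])) {
        return true;                       // value appears in the list
    }
    return $options[$key] == $value;       // loose comparison with a scalar
}

// Hypothetical option names, for illustration only.
$options = array(
    'refresh'   => true,                         // enabled for everything
    'skipLayer' => array('memcache', 'process'), // enabled for listed values
    'only'      => 'database',                   // enabled for one value
);

$results = array(
    checkOption($options, 'refresh'),               // key present
    checkOption($options, 'refresh', 'User'),       // true means "all"
    checkOption($options, 'skipLayer', 'memcache'), // value in list
    checkOption($options, 'skipLayer', 'database'), // not in list
    checkOption($options, 'only', 'database'),      // exact match
    checkOption($options, 'missing'),               // key absent
);
```

Two quirks worth knowing: with the default `$value`, an option explicitly set to `false` still counts as set (only the key's presence is checked), while one set to `null` counts as absent, because `isset()` returns false for null. Also, `in_array()` and `==` compare loosely, so on PHP versions before 8 an option stored as `0` would match any non-numeric string; passing `true` as the third argument to `in_array()` and using `===` tightens this if callers use consistent types.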

