
Work Ahead

The documentation is still a work in progress. Feel free to email me your comments and suggestions; you can find my contact details in the Contacts section.

What is it

This is a library for running background operations. For example, when a user requests a page, the library can predict which page this user will visit next (based on the statistics it gathers) and start a separate process in which you can perform data caching or other non-critical operations.

Unfortunately, this library may not work on Windows because it relies on PCNTL functions. I have not heard of a PCNTL extension release for Windows, but if you have, you are welcome to try.
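
Since the library depends on PCNTL, it can be useful to check for the extension before relying on background work. The sketch below is not part of WorkAhead; `canFork` is a hypothetical helper name, and it only uses the standard PHP functions `extension_loaded` and `function_exists`.

```php
<?php
// Hypothetical helper (not part of the WorkAhead API).
// Returns true when the PCNTL extension is loaded and pcntl_fork is defined.
function canFork(): bool
{
    return extension_loaded('pcntl') && function_exists('pcntl_fork');
}

if (!canFork()) {
    // Assumption: the application simply skips background work in this case.
    error_log('PCNTL is not available; background work will be skipped.');
}
```

A check like this could guard the call to the library on platforms where forking is not possible.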

How does it work

This library should run at your site's entry point. Each time a user visits a page, a call to the library should be made. The library gathers statistics and stores them wherever you wish. For each page it calculates the most probable page the user will visit next. You can control the minimum probability at which the library will fork a child.
For example: a user visits 'Portfolio' and the library determines that the next page will be 'Contact Us' with 70% probability. If you set the probability limit to 90%, no action is taken in this case. But if you set it to 70%, a separate process is forked and the main process continues. The new process runs independently and can receive some data from the parent. Which data the child gets is up to you: it is configurable.
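
The decision described above can be sketched roughly as follows. This is only an illustration, not the library's actual code: `guessNextPage` and the layout of the counts array are assumptions, and the 'About' and 'Home' entries are made-up filler so that 'Contact Us' comes out at 70%.

```php
<?php
// Hypothetical helper (not part of the WorkAhead API).
// $counts maps a target page id to the number of observed moves to it
// from the current page. Returns the most probable target if its share
// (in percent) reaches $minProbability, otherwise null.
function guessNextPage(array $counts, float $minProbability): ?string
{
    $total = array_sum($counts);
    if ($total <= 0) {
        return null; // no statistics gathered yet
    }

    arsort($counts); // highest count first, keys preserved
    $bestPage = (string) array_key_first($counts);
    $probability = 100 * $counts[$bestPage] / $total;

    return $probability >= $minProbability ? $bestPage : null;
}

// Moves observed from 'Portfolio' (illustrative numbers).
$fromPortfolio = ['Contact Us' => 70, 'About' => 20, 'Home' => 10];

$withLimit90 = guessNextPage($fromPortfolio, 90); // null: 70% is below the limit
$withLimit70 = guessNextPage($fromPortfolio, 70); // 'Contact Us': the limit is reached
```

Only when a page is returned would a child process be forked; with a null result the request continues as usual.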

High customization

The library doesn't need any special setup, database tables, etc. It is up to you where to store your stats. If you have a complicated storage (like MogileFS, or your own), you can get JSON-serialized data from the library and perform the save/load yourself. If you store data via a standard method (MySQL or file), the library can handle it on its own and will not require your application to load or save the stats.

Usually you will not need stats for your whole site, but only for the pages where you can perform background operations. Therefore, the part of the sitemap you want the library to analyze is specified in the config. There is also a tool to generate fresh stats in a MySQL table; for files and external storage you will not need it.

Configuration

Here is the sample config you can find in the library package:


<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
	<!-- Database connection parameters -->
	<MySql>
		<host>localhost</host>
		<username>username</username>
		<password>password</password>
		<database>workAhead</database>
		<table>workAheadStats</table>
	</MySql>
	
	<!-- PHP settings -->
	<Php>
	 	<!--  How to call php interpreter (example: /bin/bash -l /usr/local/bin/php) -->
		<CallName>php</CallName>
	</Php>
	
	<!-- Site map for workAhead lookups -->
	<!-- Page ids are up to you; you will have to pass them to the library from PHP -->
	<LookupMap>
		<Page id="index.php">
			<!-- References tell workAhead where to look for further user move -->
			<Reference targetId="second.php"/>
			<Reference targetId="third.php"/>
		</Page>
		<Page id="second.php">
			<Reference targetId="index.php"/>
			<Reference targetId="third.php"/>
		</Page>		
		<Page id="third.php">
			<Reference targetId="index.php"/>
			<Reference targetId="second.php"/>
		</Page>		
	</LookupMap>
	
	<Stats>
		<!-- Autoload section tells what source should be used for config autoload -->
		<Autoload>
			<!-- You need filename for file storage mode. You can specify absolute file path as well -->
			<Filename>stats.data</Filename>
			<!-- Allowed options are: -->
			<!-- FILE     - load/store data in file which name is specified above -->
			<!-- DATABASE - load/store data in database -->
			<!-- NONE 	 - Disable autosave/autoload -->
			<Storage>DATABASE</Storage>
		</Autoload>		
	
		<!-- Minimum probability (in percent) a predicted user move must reach before actions are performed -->
		<MinProbability>100</MinProbability>

		<!-- Call section determines what script should be called and what parameters should be passed there -->
		<!-- This call will be performed for each guessed user move -->
		<!-- callDir is an initial working directory for the script. -->
		<!-- You can specify a full path to the script in "script" attribute and some different path in callDir -->
		<Call script="/var/www/html/workAhead/worker.php" callDir="/tmp" interpretPageIdAs="pageName">
			<!-- These are parameters specification. You can use three scopes: -->
			<!-- `internal` - function call (in the example below, the result of the session_id function will be passed as the 'sid' parameter) -->
			<!-- `session`  - a value from $_SESSION array -->
			<!-- `cookie`   - a value from $_COOKIE array -->
			<Parameter scope="internal" name="session_id" interpretAs="sid"/>
			<Parameter scope="session" name="userId" interpretAs="uid"/>
			<Parameter scope="cookie" name="myCookie" interpretAs="locale"/>
		</Call>	
		
		<!-- Log level. Logs via PHP error_log function. Be sure to redirect it -->
		<Log>
			<!-- Log level can be in range 0 (no logging) - 3 (most verbose) -->
			<LogLevel>3</LogLevel>
		</Log>
		
		<!-- Specifies how to lower stats values  -->
		<!-- This has influence only on current visited page stats -->
		<Lower>
			<!-- Decrease stats on reaching this limit -->
			<DecreaseOn>10000</DecreaseOn>
			<!-- Divide stats by this value -->
			<DivideBy>1000</DivideBy>
		</Lower>

	</Stats>
	
</Configuration>

As you can see, the library can be easily configured to suit your needs. The most confusing section is probably <Lower>, so let me explain:

Imagine you have 1000 clicks on your site per hour. WorkAhead increments the stats each time a user clicks. After a short time you will have very large numbers in your db (file, etc.), and they will become difficult to work with. So the library lowers the stat values by dividing them by the value specified as "DivideBy".
The "DecreaseOn" parameter specifies the limit: stats are lowered once the highest value reaches it.
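
To make the <Lower> behaviour concrete, here is a minimal sketch using the values from the sample config (DecreaseOn 10000, DivideBy 1000). `lowerStats` is a hypothetical function, and the use of integer division is an assumption; the library may round differently.

```php
<?php
// Hypothetical sketch of the <Lower> behaviour (not the library's real code).
// $counts holds the stats of the currently visited page: target id => count.
// $divideBy must be greater than zero.
function lowerStats(array $counts, int $decreaseOn, int $divideBy): array
{
    if ($counts === [] || max($counts) < $decreaseOn) {
        return $counts; // the highest value has not reached the limit yet
    }

    // Divide every counter; array_map keeps the keys for a single array.
    return array_map(
        static fn (int $count): int => intdiv($count, $divideBy),
        $counts
    );
}

$stats = ['second.php' => 10000, 'third.php' => 4000];
$stats = lowerStats($stats, 10000, 1000);
// $stats is now ['second.php' => 10, 'third.php' => 4]
```

The numbers get smaller while their ratio (10 to 4) stays the same, so the calculated probabilities are unaffected apart from rounding.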

It is highly recommended that you use MySQL storage for the library.

Here is an example of how you can use WorkAhead in your existing code:

require 'lib/classes/WorkAheadManager.class.php';
$manager = WorkAheadManager::getInstance();
$manager->loadConfig('WorkAhead.config.xml');

$thisPage = 'index.php'; // Use the same name you used as the page id in LookupMap (see config)
$referer = 'second.php'; // The page the user visited before, named exactly as in LookupMap (see config)

$manager->pageVisit($thisPage, $referer);

And that's all you have to change in your existing code!
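
The snippet above hard-codes both page ids. In a real application they would usually be derived from the request. The following sketch assumes that page ids are script file names, as in the sample LookupMap; `pageIdFromUrl` is a hypothetical helper, not a library function.

```php
<?php
// Hypothetical helper (not part of the WorkAhead API).
// Turns a URL or request URI into a page id such as 'second.php',
// or returns null when no file name can be determined.
function pageIdFromUrl(?string $url): ?string
{
    if ($url === null || $url === '') {
        return null;
    }

    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    $name = basename($path);
    return $name !== '' ? $name : null;
}

// Fall back to 'index.php' for the current page (an assumption about the site).
$thisPage = pageIdFromUrl($_SERVER['REQUEST_URI'] ?? null) ?? 'index.php';
// The Referer header is optional and set by the client, so it may be null.
$referer = pageIdFromUrl($_SERVER['HTTP_REFERER'] ?? null);
```

How to handle a missing referer is up to the application; one option is simply not to record the visit in that case.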

If you need to save your stats yourself, you can follow this example:

.....
//Some code and library initialization here
.....
	
$myStats = WorkAheadManager::getInstance()->saveStats(WorkAheadManager::STORAGE_JSON);
//Here you can save your stats anywhere
	 
.....
//Some code
.....
	 
WorkAheadManager::getInstance()->loadStats(WorkAheadManager::STORAGE_JSON, $myStats);
	 
...

Worker thread

The worker is your script, which the library forks when it decides to. You can write this script however you like, but it will not run forked unless you include the WorkAheadWorker class. This class performs everything required to let you work in the background and also provides an interface for reading all parameters passed by the main process.

Keep in mind not to perform critical actions in the worker. Do data caching, pregeneration, etc., but nothing that the application itself would not do on a page visit.

Here is an example of a simple worker script:

<?php

//Including worker will automatically fork the worker (this script)
require 'lib/classes/WorkAheadWorker.class.php';
$worker = new WorkAheadWorker();

//Worker instance provides access to properties you specified in config
//This example shows the page id specified in config as interpretPageIdAs="pageName"
error_log('Worker script was initiated for page '. $worker->pageName);

?>

Download

Theory is enough, it's time for you to experiment! Go ahead - Download

The code contains inline documentation as well as generated HTML documentation.