RunUO Community


Compression synchronization and optimization

ZixThree

Wanderer
Compression synchronization and optimization

The current implementation of the Compression class locks a SyncRoot object so that its internal buffer is only used by one thread at a time. The problem is that this internal buffer is returned after the lock is released, and thus could be changed before it is used. Right now, the Compression.Compress method is used in Packet.Compile, which I think is never accessed from two threads at the same time. But since I don't know all the implications of this, I cannot assume that. So the Compression.Compress method should never return its internal buffer.
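
To illustrate the hazard (this is just a made-up sketch, not RunUO code): the lock only protects the buffer while the method is writing into it, but a reference to that same shared buffer escapes the lock, so a second caller can overwrite it while the first caller is still reading its result.

Code:
// Hypothetical sketch of the race described above; names are illustrative.
public class SharedBufferSketch
{
	private static byte[] m_Buffer = new byte[0x40000];
	private static object m_SyncRoot = new object();

	public static byte[] BadCompress( byte[] input, int length )
	{
		lock ( m_SyncRoot )
		{
			// ... compress 'input' into m_Buffer ...
			return m_Buffer; // the shared buffer escapes the lock here
		}
	}

	// Thread A: byte[] a = BadCompress( packetA, packetA.Length );
	// Thread B: byte[] b = BadCompress( packetB, packetB.Length );
	// 'a' and 'b' are the SAME array, so B's call corrupts A's result.
}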

I took the liberty of optimizing some critical code in Compression.Compress. The optimizations have been benchmarked, and the output data has been verified.

Code:
/***************************************************************************
 *                               Compression.cs
 *                            -------------------
 *   begin                : May 1, 2002
 *   copyright            : (C) The RunUO Software Team
 *   email                : [email protected]
 *
 *   $Id: Compression.cs,v 1.3 2005/01/22 04:25:04 krrios Exp $
 *   $Author: krrios $
 *   $Date: 2005/01/22 04:25:04 $
 *
 *
 ***************************************************************************/

/***************************************************************************
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 ***************************************************************************/

// 02-02-2005: ZixThree:
//					Fixed the internal return bug.
//					Optimization

using System;
using System.IO;

namespace Server.Network
{
	/// <summary>
	/// Handles outgoing packet compression for the network.
	/// </summary>
	public class CompressionUnsafeOptimized
	{
		private static int[] m_Table = new int[514]
		{
			0x2, 0x000,	0x5, 0x01F,	0x6, 0x022,	0x7, 0x034,	0x7, 0x075,	0x6, 0x028,	0x6, 0x03B,	0x7, 0x032,
			0x8, 0x0E0,	0x8, 0x062,	0x7, 0x056,	0x8, 0x079,	0x9, 0x19D,	0x8, 0x097,	0x6, 0x02A,	0x7, 0x057,
			0x8, 0x071,	0x8, 0x05B,	0x9, 0x1CC,	0x8, 0x0A7,	0x7, 0x025,	0x7, 0x04F,	0x8, 0x066,	0x8, 0x07D,
			0x9, 0x191,	0x9, 0x1CE,	0x7, 0x03F,	0x9, 0x090,	0x8, 0x059,	0x8, 0x07B,	0x8, 0x091,	0x8, 0x0C6,
			0x6, 0x02D,	0x9, 0x186,	0x8, 0x06F,	0x9, 0x093,	0xA, 0x1CC,	0x8, 0x05A,	0xA, 0x1AE,	0xA, 0x1C0,
			0x9, 0x148,	0x9, 0x14A,	0x9, 0x082,	0xA, 0x19F,	0x9, 0x171,	0x9, 0x120,	0x9, 0x0E7,	0xA, 0x1F3,
			0x9, 0x14B,	0x9, 0x100,	0x9, 0x190,	0x6, 0x013,	0x9, 0x161,	0x9, 0x125,	0x9, 0x133,	0x9, 0x195,
			0x9, 0x173,	0x9, 0x1CA,	0x9, 0x086,	0x9, 0x1E9,	0x9, 0x0DB,	0x9, 0x1EC,	0x9, 0x08B,	0x9, 0x085,
			0x5, 0x00A,	0x8, 0x096,	0x8, 0x09C,	0x9, 0x1C3,	0x9, 0x19C,	0x9, 0x08F,	0x9, 0x18F,	0x9, 0x091,
			0x9, 0x087,	0x9, 0x0C6,	0x9, 0x177,	0x9, 0x089,	0x9, 0x0D6,	0x9, 0x08C,	0x9, 0x1EE,	0x9, 0x1EB,
			0x9, 0x084,	0x9, 0x164,	0x9, 0x175,	0x9, 0x1CD,	0x8, 0x05E,	0x9, 0x088,	0x9, 0x12B,	0x9, 0x172,
			0x9, 0x10A,	0x9, 0x08D,	0x9, 0x13A,	0x9, 0x11C,	0xA, 0x1E1,	0xA, 0x1E0,	0x9, 0x187,	0xA, 0x1DC,
			0xA, 0x1DF,	0x7, 0x074,	0x9, 0x19F,	0x8, 0x08D,	0x8, 0x0E4,	0x7, 0x079,	0x9, 0x0EA,	0x9, 0x0E1,
			0x8, 0x040,	0x7, 0x041,	0x9, 0x10B,	0x9, 0x0B0,	0x8, 0x06A,	0x8, 0x0C1,	0x7, 0x071,	0x7, 0x078,
			0x8, 0x0B1,	0x9, 0x14C,	0x7, 0x043,	0x8, 0x076,	0x7, 0x066,	0x7, 0x04D,	0x9, 0x08A,	0x6, 0x02F,
			0x8, 0x0C9,	0x9, 0x0CE,	0x9, 0x149,	0x9, 0x160,	0xA, 0x1BA,	0xA, 0x19E,	0xA, 0x39F,	0x9, 0x0E5,
			0x9, 0x194,	0x9, 0x184,	0x9, 0x126,	0x7, 0x030,	0x8, 0x06C,	0x9, 0x121,	0x9, 0x1E8,	0xA, 0x1C1,
			0xA, 0x11D,	0xA, 0x163,	0xA, 0x385,	0xA, 0x3DB,	0xA, 0x17D,	0xA, 0x106,	0xA, 0x397,	0xA, 0x24E,
			0x7, 0x02E,	0x8, 0x098,	0xA, 0x33C,	0xA, 0x32E,	0xA, 0x1E9,	0x9, 0x0BF,	0xA, 0x3DF,	0xA, 0x1DD,
			0xA, 0x32D,	0xA, 0x2ED,	0xA, 0x30B,	0xA, 0x107,	0xA, 0x2E8,	0xA, 0x3DE,	0xA, 0x125,	0xA, 0x1E8,
			0x9, 0x0E9,	0xA, 0x1CD,	0xA, 0x1B5,	0x9, 0x165,	0xA, 0x232,	0xA, 0x2E1,	0xB, 0x3AE,	0xB, 0x3C6,
			0xB, 0x3E2,	0xA, 0x205,	0xA, 0x29A,	0xA, 0x248,	0xA, 0x2CD,	0xA, 0x23B,	0xB, 0x3C5,	0xA, 0x251,
			0xA, 0x2E9,	0xA, 0x252,	0x9, 0x1EA,	0xB, 0x3A0,	0xB, 0x391,	0xA, 0x23C,	0xB, 0x392,	0xB, 0x3D5,
			0xA, 0x233,	0xA, 0x2CC,	0xB, 0x390,	0xA, 0x1BB,	0xB, 0x3A1,	0xB, 0x3C4,	0xA, 0x211,	0xA, 0x203,
			0x9, 0x12A,	0xA, 0x231,	0xB, 0x3E0,	0xA, 0x29B,	0xB, 0x3D7,	0xA, 0x202,	0xB, 0x3AD,	0xA, 0x213,
			0xA, 0x253,	0xA, 0x32C,	0xA, 0x23D,	0xA, 0x23F,	0xA, 0x32F,	0xA, 0x11C,	0xA, 0x384,	0xA, 0x31C,
			0xA, 0x17C,	0xA, 0x30A,	0xA, 0x2E0,	0xA, 0x276,	0xA, 0x250,	0xB, 0x3E3,	0xA, 0x396,	0xA, 0x18F,
			0xA, 0x204,	0xA, 0x206,	0xA, 0x230,	0xA, 0x265,	0xA, 0x212,	0xA, 0x23E,	0xB, 0x3AC,	0xB, 0x393,
			0xB, 0x3E1,	0xA, 0x1DE,	0xB, 0x3D6,	0xA, 0x31D,	0xB, 0x3E5,	0xB, 0x3E4,	0xA, 0x207,	0xB, 0x3C7,
			0xA, 0x277,	0xB, 0x3D4,	0x8, 0x0C0,	0xA, 0x162,	0xA, 0x3DA,	0xA, 0x124,	0xA, 0x1B4,	0xA, 0x264,
			0xA, 0x33D,	0xA, 0x1D1,	0xA, 0x1AF,	0xA, 0x39E,	0xA, 0x24F,	0xB, 0x373,	0xA, 0x249,	0xB, 0x372,
			0x9, 0x167,	0xA, 0x210,	0xA, 0x23A,	0xA, 0x1B8,	0xB, 0x3AF,	0xA, 0x18E,	0xA, 0x2EC,	0x7, 0x062,
			0x4, 0x00D
		};

		private static byte[] m_OutputBuffer = new byte[0x40000];
		private static object m_SyncRoot = new object();

		public unsafe static void Compress( byte[] input, int length, out byte[] output, out int outputLength )
		{
			lock ( m_SyncRoot )
			{
				int holdCount = 0;
				int holdValue = 0;

				int packCount = 0;
				int packValue = 0;

				int byteValue = 0;

				int inputLength = length;
				int inputIndex = 0;

				int outputCount = 0;

				fixed ( int *pTable = m_Table )
				{
					fixed ( byte *pOutputBuffer = m_OutputBuffer )
					{
						while ( inputIndex < inputLength )
						{
							byteValue = input[inputIndex++] << 1;

							packCount = pTable[byteValue];
							packValue = pTable[byteValue | 1];

							holdValue = (holdValue << packCount) | packValue; // 02-02-2005: ZixThree: Removed one variable assignation
							holdCount  += packCount;

							if( holdCount >= 8) // 02-02-2005: ZixThree: Unrolled loop considering table data.
							{
								holdCount -= 8;
								pOutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
								
								if( holdCount >= 8 ) // 02-02-2005: ZixThree: Can unroll here because cumulative value will never be greater than 24
								{
									holdCount -= 8;
									pOutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
								}
							}
						} 

						packCount = pTable[0x200];
						packValue = pTable[0x201];

						holdValue = (holdValue << packCount) | packValue; // 02-02-2005: ZixThree: Removed one variable assignation
						holdCount  += packCount;

						if( holdCount >= 8) // 02-02-2005: ZixThree: Unrolled loop considering table data.
						{
							holdCount -= 8;
							pOutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
								
							if( holdCount >= 8 ) // 02-02-2005: ZixThree: Can unroll here because cumulative value will never be greater than 24
							{
								holdCount -= 8;
								pOutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
							}
						}

						if ( holdCount > 0 )
							pOutputBuffer[outputCount++] = (byte)(holdValue << (8 - holdCount));
					}
				}

				output = new byte[outputCount]; // 02-02-2005: ZixThree: Return a new buffer instead of internal buffer.
				outputLength = outputCount;
				Buffer.BlockCopy(m_OutputBuffer, 0, output, 0, outputCount);
			}
		}
	}
}

With these optimizations, the equivalent "safe" code is about the same speed as the original file and in some circumstances a little faster.

Code:
/***************************************************************************
 *                               Compression.cs
 *                            -------------------
 *   begin                : May 1, 2002
 *   copyright            : (C) The RunUO Software Team
 *   email                : [email protected]
 *
 *   $Id: Compression.cs,v 1.3 2005/01/22 04:25:04 krrios Exp $
 *   $Author: krrios $
 *   $Date: 2005/01/22 04:25:04 $
 *
 *
 ***************************************************************************/

/***************************************************************************
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 ***************************************************************************/

// 02-02-2005: ZixThree:
//					Fixed a thread-safety bug.
//					Optimization

using System;
using System.IO;

namespace Server.Network
{
	/// <summary>
	/// Handles outgoing packet compression for the network.
	/// </summary>
	public class Compression
	{
		private static int[] m_Table = new int[514]
		{
			0x2, 0x000,	0x5, 0x01F,	0x6, 0x022,	0x7, 0x034,	0x7, 0x075,	0x6, 0x028,	0x6, 0x03B,	0x7, 0x032,
			0x8, 0x0E0,	0x8, 0x062,	0x7, 0x056,	0x8, 0x079,	0x9, 0x19D,	0x8, 0x097,	0x6, 0x02A,	0x7, 0x057,
			0x8, 0x071,	0x8, 0x05B,	0x9, 0x1CC,	0x8, 0x0A7,	0x7, 0x025,	0x7, 0x04F,	0x8, 0x066,	0x8, 0x07D,
			0x9, 0x191,	0x9, 0x1CE,	0x7, 0x03F,	0x9, 0x090,	0x8, 0x059,	0x8, 0x07B,	0x8, 0x091,	0x8, 0x0C6,
			0x6, 0x02D,	0x9, 0x186,	0x8, 0x06F,	0x9, 0x093,	0xA, 0x1CC,	0x8, 0x05A,	0xA, 0x1AE,	0xA, 0x1C0,
			0x9, 0x148,	0x9, 0x14A,	0x9, 0x082,	0xA, 0x19F,	0x9, 0x171,	0x9, 0x120,	0x9, 0x0E7,	0xA, 0x1F3,
			0x9, 0x14B,	0x9, 0x100,	0x9, 0x190,	0x6, 0x013,	0x9, 0x161,	0x9, 0x125,	0x9, 0x133,	0x9, 0x195,
			0x9, 0x173,	0x9, 0x1CA,	0x9, 0x086,	0x9, 0x1E9,	0x9, 0x0DB,	0x9, 0x1EC,	0x9, 0x08B,	0x9, 0x085,
			0x5, 0x00A,	0x8, 0x096,	0x8, 0x09C,	0x9, 0x1C3,	0x9, 0x19C,	0x9, 0x08F,	0x9, 0x18F,	0x9, 0x091,
			0x9, 0x087,	0x9, 0x0C6,	0x9, 0x177,	0x9, 0x089,	0x9, 0x0D6,	0x9, 0x08C,	0x9, 0x1EE,	0x9, 0x1EB,
			0x9, 0x084,	0x9, 0x164,	0x9, 0x175,	0x9, 0x1CD,	0x8, 0x05E,	0x9, 0x088,	0x9, 0x12B,	0x9, 0x172,
			0x9, 0x10A,	0x9, 0x08D,	0x9, 0x13A,	0x9, 0x11C,	0xA, 0x1E1,	0xA, 0x1E0,	0x9, 0x187,	0xA, 0x1DC,
			0xA, 0x1DF,	0x7, 0x074,	0x9, 0x19F,	0x8, 0x08D,	0x8, 0x0E4,	0x7, 0x079,	0x9, 0x0EA,	0x9, 0x0E1,
			0x8, 0x040,	0x7, 0x041,	0x9, 0x10B,	0x9, 0x0B0,	0x8, 0x06A,	0x8, 0x0C1,	0x7, 0x071,	0x7, 0x078,
			0x8, 0x0B1,	0x9, 0x14C,	0x7, 0x043,	0x8, 0x076,	0x7, 0x066,	0x7, 0x04D,	0x9, 0x08A,	0x6, 0x02F,
			0x8, 0x0C9,	0x9, 0x0CE,	0x9, 0x149,	0x9, 0x160,	0xA, 0x1BA,	0xA, 0x19E,	0xA, 0x39F,	0x9, 0x0E5,
			0x9, 0x194,	0x9, 0x184,	0x9, 0x126,	0x7, 0x030,	0x8, 0x06C,	0x9, 0x121,	0x9, 0x1E8,	0xA, 0x1C1,
			0xA, 0x11D,	0xA, 0x163,	0xA, 0x385,	0xA, 0x3DB,	0xA, 0x17D,	0xA, 0x106,	0xA, 0x397,	0xA, 0x24E,
			0x7, 0x02E,	0x8, 0x098,	0xA, 0x33C,	0xA, 0x32E,	0xA, 0x1E9,	0x9, 0x0BF,	0xA, 0x3DF,	0xA, 0x1DD,
			0xA, 0x32D,	0xA, 0x2ED,	0xA, 0x30B,	0xA, 0x107,	0xA, 0x2E8,	0xA, 0x3DE,	0xA, 0x125,	0xA, 0x1E8,
			0x9, 0x0E9,	0xA, 0x1CD,	0xA, 0x1B5,	0x9, 0x165,	0xA, 0x232,	0xA, 0x2E1,	0xB, 0x3AE,	0xB, 0x3C6,
			0xB, 0x3E2,	0xA, 0x205,	0xA, 0x29A,	0xA, 0x248,	0xA, 0x2CD,	0xA, 0x23B,	0xB, 0x3C5,	0xA, 0x251,
			0xA, 0x2E9,	0xA, 0x252,	0x9, 0x1EA,	0xB, 0x3A0,	0xB, 0x391,	0xA, 0x23C,	0xB, 0x392,	0xB, 0x3D5,
			0xA, 0x233,	0xA, 0x2CC,	0xB, 0x390,	0xA, 0x1BB,	0xB, 0x3A1,	0xB, 0x3C4,	0xA, 0x211,	0xA, 0x203,
			0x9, 0x12A,	0xA, 0x231,	0xB, 0x3E0,	0xA, 0x29B,	0xB, 0x3D7,	0xA, 0x202,	0xB, 0x3AD,	0xA, 0x213,
			0xA, 0x253,	0xA, 0x32C,	0xA, 0x23D,	0xA, 0x23F,	0xA, 0x32F,	0xA, 0x11C,	0xA, 0x384,	0xA, 0x31C,
			0xA, 0x17C,	0xA, 0x30A,	0xA, 0x2E0,	0xA, 0x276,	0xA, 0x250,	0xB, 0x3E3,	0xA, 0x396,	0xA, 0x18F,
			0xA, 0x204,	0xA, 0x206,	0xA, 0x230,	0xA, 0x265,	0xA, 0x212,	0xA, 0x23E,	0xB, 0x3AC,	0xB, 0x393,
			0xB, 0x3E1,	0xA, 0x1DE,	0xB, 0x3D6,	0xA, 0x31D,	0xB, 0x3E5,	0xB, 0x3E4,	0xA, 0x207,	0xB, 0x3C7,
			0xA, 0x277,	0xB, 0x3D4,	0x8, 0x0C0,	0xA, 0x162,	0xA, 0x3DA,	0xA, 0x124,	0xA, 0x1B4,	0xA, 0x264,
			0xA, 0x33D,	0xA, 0x1D1,	0xA, 0x1AF,	0xA, 0x39E,	0xA, 0x24F,	0xB, 0x373,	0xA, 0x249,	0xB, 0x372,
			0x9, 0x167,	0xA, 0x210,	0xA, 0x23A,	0xA, 0x1B8,	0xB, 0x3AF,	0xA, 0x18E,	0xA, 0x2EC,	0x7, 0x062,
			0x4, 0x00D
		};

		private static byte[] m_OutputBuffer = new byte[0x40000];
		private static object m_SyncRoot = new object();

		public static void Compress( byte[] input, int length, out byte[] output, out int outputLength )
		{
			// 01-23-2005: ZixThree: Start: Removed unsafe mode and changed all pointers to Arrays.
			lock ( m_SyncRoot )
			{
				int holdCount = 0;
				int holdValue = 0;

				int packCount = 0;
				int packValue = 0;

				int byteValue = 0;

				int inputLength = length;
				int inputIndex = 0;

				int outputCount = 0;

				while ( inputIndex < inputLength )
				{
					byteValue = input[inputIndex++] << 1;

					packCount = m_Table[byteValue];
					packValue = m_Table[byteValue | 1];

					holdValue = (holdValue << packCount) | packValue; // 02-02-2005: ZixThree: Removed one variable assignation
					holdCount  += packCount;

					if( holdCount >= 8 ) // 02-02-2005: ZixThree: Unrolled loop considering table data.
					{
						holdCount -= 8;
						m_OutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);

						if( holdCount >= 8 ) // 02-02-2005: ZixThree: Can unroll here because cumulative value will never be greater than 24
						{
							holdCount -= 8;
							m_OutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
						}
					}
				}

				packCount = m_Table[0x200];
				packValue = m_Table[0x201];

				holdValue = (holdValue << packCount) | packValue; // 02-02-2005: ZixThree: Removed one variable assignation
				holdCount  += packCount;

				if( holdCount >= 8 ) // 02-02-2005: ZixThree: Unrolled loop considering table data.
				{
					holdCount -= 8;
					m_OutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);

					if( holdCount >= 8 ) // 02-02-2005: ZixThree: Can unroll here because cumulative value will never be greater than 24
					{
						holdCount -= 8;
						m_OutputBuffer[outputCount++] = (byte)(holdValue >> holdCount);
					}
				}

				if ( holdCount > 0 )
					m_OutputBuffer[outputCount++] = (byte)(holdValue << (8 - holdCount));

				output = new byte[outputCount]; // 02-02-2005: ZixThree: Create a buffer directly, provide better thread safety outside Compress method.
				outputLength = outputCount;
				Buffer.BlockCopy( m_OutputBuffer, 0, output, 0, outputLength );
			}
			// 01-23-2005: ZixThree: End: Removed unsafe mode and changed all pointers to Arrays.
		}
	}
}

The last change needed is in the Packet.cs file. The InternalCompile method should look more like this:

Code:
		private void InternalCompile( bool compress )
		{
			if ( m_Length == 0 )
			{
				long streamLen = m_Stream.Length;

				m_Stream.Seek( 1, SeekOrigin.Begin );
				m_Stream.Write( (ushort) streamLen );
			}
			else if ( m_Stream.Length != m_Length )
			{
				int diff = (int)m_Stream.Length - m_Length;

				Console.WriteLine( "Packet: 0x{0:X2}: Bad packet length! ({1}{2} bytes)", m_PacketID, diff >= 0 ? "+" : "", diff );
			}

			MemoryStream ms = m_Stream.UnderlyingStream;

			int length;

			m_CompiledBuffer = ms.GetBuffer();
			length = (int)ms.Length;

			if ( compress )
			{
				try
				{
					Compression.Compress( m_CompiledBuffer, length, out m_CompiledBuffer, out length );
				}
				catch ( IndexOutOfRangeException )
				{
					Console.WriteLine( "Warning: Compression buffer overflowed on packet 0x{0:X2} ('{1}') (length={2})", m_PacketID, GetType().Name, length );

					m_CompiledBuffer = null;
				}
			}
			else // 02-02-2005: ZixThree: Compress now returns a new buffer, so a copy is only needed here for the uncompressed MemoryStream buffer.
			{
				byte[] old = m_CompiledBuffer;

				m_CompiledBuffer = new byte[length];

				Buffer.BlockCopy( old, 0, m_CompiledBuffer, 0, length );
			}

			PacketWriter.ReleaseInstance( m_Stream );
			m_Stream = null;
		}
 

sirens song

Wanderer
Yes, very good work.

ZixThree said:
Right now, the Compression.Compress method is used in Packet.Compile, which I think is never accessed from two threads at the same time.

You think correctly.
I think this should be included with future distro releases.

-Jamie
 
ZixThree said:
With these optimizations, the equivalent "safe" code is about the same speed as the original file and in some circumstances a little faster.

Such as? :rolleyes:

Ran a few tests on this "safe" code, and it's far from "about the same speed".

RunUO - [www.runuo.com] Version 1.0.0, Build 35532
Scripts: Compiling C# scripts...done (0 errors, 0 warnings)
Scripts: Compiling VB.net scripts...no files found.
Scripts: Verifying...done (1658 items, 435 mobiles)
World: Loading...done (16 items, 1 mobiles) (0.5 seconds)
Reports: Loading...done
Regions: Loading...done
Address: 127.0.0.1:2593
Address: 192.168.0.101:2593
Client: 127.0.0.1: Connected. [1 Online]
Login: 127.0.0.1: Valid credentials for 'admin'
Login: 127.0.0.1: Account 'admin' at character list
10000 compression calls on RunUO Distro Compression: 0 ticks, 0ms.
10000 compression calls on ZixThree Compression: 0 ticks, 0ms.
100000 compression calls on RunUO Distro Compression: 156250 ticks, 15.625ms.
100000 compression calls on ZixThree Compression: 312500 ticks, 31.25ms.
500000 compression calls on RunUO Distro Compression: 781250 ticks, 78.125ms.
500000 compression calls on ZixThree Compression: 2968750 ticks, 296.875ms.
1000000 compression calls on RunUO Distro Compression: 1718750 ticks, 171.875ms.
1000000 compression calls on ZixThree Compression: 3593750 ticks, 359.375ms.
5000000 compression calls on RunUO Distro Compression: 8906250 ticks, 890.625ms.
5000000 compression calls on ZixThree Compression: 17968750 ticks, 1796.875ms.
 

ZixThree

Wanderer
[edit]If you compared it with the actual distro Compress, it is indeed slower, since the original distro doesn't allocate a new buffer for each compress request (it is allocated outside the Compress method).[/edit]

Okay, I was perhaps a little enthusiastic, but anyway, here are the figures.

Code:
Configuration =>
        RandomSeed: 0
        Number of test Buffer: 16384
        Max buffer size: 16
        Number of test iteration: 16777216
Generating test data...done.
Testing with unsafe...done in 7187,5ms.
Testing with unsafe (new buffer)...done in 9203,125ms.
Testing with unsafe (optimized)...done in 6953,125ms.
Testing with safe...done in 7718,75ms.
Testing with safe (optimized)...done in 7078,125ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 16384
        Max buffer size: 64
        Number of test iteration: 4194304
Generating test data...done.
Testing with unsafe...done in 4375ms.
Testing with unsafe (new buffer)...done in 5015,625ms.
Testing with unsafe (optimized)...done in 4109,375ms.
Testing with safe...done in 4593,75ms.
Testing with safe (optimized)...done in 4375ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 16384
        Max buffer size: 256
        Number of test iteration: 1048576
Generating test data...done.
Testing with unsafe...done in 3578,125ms.
Testing with unsafe (new buffer)...done in 4125ms.
Testing with unsafe (optimized)...done in 3390,625ms.
Testing with safe...done in 3750ms.
Testing with safe (optimized)...done in 3578,125ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 16384
        Max buffer size: 1024
        Number of test iteration: 262144
Generating test data...done.
Testing with unsafe...done in 3281,25ms.
Testing with unsafe (new buffer)...done in 3546,875ms.
Testing with unsafe (optimized)...done in 3078,125ms.
Testing with safe...done in 3500ms.
Testing with safe (optimized)...done in 3406,25ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 16384
        Max buffer size: 4096
        Number of test iteration: 65536
Generating test data...done.
Testing with unsafe...done in 3328,125ms.
Testing with unsafe (new buffer)...done in 3578,125ms.
Testing with unsafe (optimized)...done in 3109,375ms.
Testing with safe...done in 3500ms.
Testing with safe (optimized)...done in 3390,625ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 4096
        Max buffer size: 16384
        Number of test iteration: 32768
Generating test data...done.
Testing with unsafe...done in 6531,25ms.
Testing with unsafe (new buffer)...done in 7000ms.
Testing with unsafe (optimized)...done in 6109,375ms.
Testing with safe...done in 6921,875ms.
Testing with safe (optimized)...done in 6703,125ms.
Test ended. Press enter key to start next test series.

Configuration =>
        RandomSeed: 0
        Number of test Buffer: 8096
        Max buffer size: 65536
        Number of test iteration: 16384
Generating test data...done.
Testing with unsafe...done in 13140,625ms.
Testing with unsafe (new buffer)...done in 14218,75ms.
Testing with unsafe (optimized)...done in 12265,625ms.
Testing with safe...done in 14015,625ms.
Testing with safe (optimized)...done in 13531,25ms.
Test ended. Press enter key to end test application.

A little description:
unsafe -> the usual compression method.
unsafe (new buffer) -> the usual compression method, plus creation of a new buffer to hold the returned data.
unsafe (optimized) -> unsafe with some optimizations.
safe -> a direct port from unsafe.
safe (optimized) -> the same optimizations as unsafe (optimized).

I have also included the project I used to benchmark these and each implementation (in sub-folder Compression).
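
If you don't want to open the attachment, the timing loop for each variant looks roughly like this (a rough sketch only; the class and method names here are illustrative, and the attached project may differ slightly):

Code:
// Rough sketch of a per-variant timing loop (illustrative, not the attached project).
using System;
using Server.Network;

public class CompressionTimer
{
	public static void Run( byte[][] testBuffers, int iterations )
	{
		byte[] output;
		int outputLength;

		DateTime start = DateTime.Now;

		for ( int i = 0; i < iterations; ++i )
		{
			// Cycle through the pre-generated test data and compress each buffer.
			byte[] input = testBuffers[i % testBuffers.Length];
			Compression.Compress( input, input.Length, out output, out outputLength );
		}

		TimeSpan elapsed = DateTime.Now - start;
		Console.WriteLine( "Testing with safe (optimized)...done in {0}ms.", elapsed.TotalMilliseconds );
	}
}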
 

Attachments

  • TestApplication.zip
    16.6 KB

:rolleyes: That test code is terribly flawed for benchmarking and overly complex. The distro Compress method isn't faster because it skips allocating a new byte[] array; it's faster because it uses pointers.

Code:
using System;
using Server.Network;

namespace Server.Scripts.Custom
{
	/// <summary>
	/// Summary description for PacketTest.
	/// </summary>
	public class PacketTest
	{
		public static void Initialize()
		{
			Server.Commands.Register( "DistroTest", AccessLevel.Administrator, new CommandEventHandler( CompressTest ) );
			Server.Commands.Register( "ChangeTest", AccessLevel.Administrator, new CommandEventHandler( Compress2Test ) );
		}

		private static void CompressTest( CommandEventArgs e )
		{
			int amount = Int32.Parse( e.Arguments[0] );

			Packet p = e.Mobile.OPLPacket;

			byte[] data = p.Compile( false );
			
			byte[] comp;
			int newLen = 0;

			DateTime start = DateTime.Now;

			for( int i  = 0; i < amount; ++i )
			{
				Compression.Compress( data, data.Length, out comp, out newLen );
			}

			DateTime finish = DateTime.Now;
			TimeSpan execute = finish - start;

			Console.WriteLine( "{0} compression calls on RunUO Distro Compression: {1} ticks, {2}ms.", amount, execute.Ticks, execute.TotalMilliseconds );
		}

		private static void Compress2Test( CommandEventArgs e )
		{
			int amount = Int32.Parse( e.Arguments[0] );

			Packet p = e.Mobile.OPLPacket;

			byte[] data = p.Compile( false );
			
			byte[] comp;
			int newLen = 0;

			DateTime start = DateTime.Now;

			for( int i  = 0; i < amount; ++i )
			{
				Compression2.Compress( data, data.Length, out comp, out newLen );
			}

			DateTime finish = DateTime.Now;
			TimeSpan execute = finish - start;

			Console.WriteLine( "{0} compression calls on ZixThree Compression: {1} ticks, {2}ms.", amount, execute.Ticks, execute.TotalMilliseconds );
		}
	}
}
 

ZixThree

Wanderer
Okay, I used your code for the test, and here is the result.

Code:
RunUO - [www.runuo.com] Version 1.0.0, Build 36918
Scripts: Compiling C# scripts...done (0 errors, 0 warnings)
Scripts: Compiling VB.net scripts...no files found.
Scripts: Verifying...done (1658 items, 435 mobiles)
World: Loading...done (87789 items, 2350 mobiles) (4,6 seconds)
This server has no accounts.
Do you want to create an administrator account now? (y/n)
y
Username: User
Password: Password
Account created, continuing
Regions: Loading...done
Address: 127.0.0.1:2593
Address: 142.137.194.136:2593
Client: 127.0.0.1: Connected. [1 Online]
Login: 127.0.0.1: Valid credentials for 'User'
Login: 127.0.0.1: Account 'User' at character list
Login: 127.0.0.1: New character being created (account=User)
 - Character: User (serial=0x0000092F)
 - Started: Skara Brae (618, 2234, 0) in Trammel
10000 compression calls on RunUO Distro Compression: 0 ticks, 0ms.
10000 compression calls on ZixThree Compression: 0 ticks, 0ms.
100000 compression calls on RunUO Distro Compression: 312500 ticks, 31,25ms.
100000 compression calls on ZixThree Compression: 312500 ticks, 31,25ms.
500000 compression calls on RunUO Distro Compression: 2187500 ticks, 218,75ms.
500000 compression calls on ZixThree Compression: 2187500 ticks, 218,75ms.
1000000 compression calls on RunUO Distro Compression: 4375000 ticks, 437,5ms.
1000000 compression calls on ZixThree Compression: 4062500 ticks, 406,25ms.
5000000 compression calls on RunUO Distro Compression: 20781250 ticks, 2078,125ms.
5000000 compression calls on ZixThree Compression: 20312500 ticks, 2031,25ms.

I don't know what I can add to this. Try my safe code again without the buffer allocation, because right now this is the result I get from your test suite.

And buffer allocation costs a lot, as you can see from the benchmarks I made (even if you say they are flawed, probably because varying the input data to more accurately simulate the real world is bad and imprecise).

One last thing... You know, I really like to improve myself in the areas where I am lacking. If you can tell me why you consider that I'm wrong, I would really appreciate it. Thank you.
 
ZixThree said:
And buffer allocation costs a lot, as you can see from the benchmarks I made (even if you say they are flawed, probably because varying the input data to more accurately simulate the real world is bad and imprecise).

It's a benchmark. It isn't supposed to be random. By randomizing the objects you're testing on, you skew the results and bring other factors into play. It's just like a science experiment. Your code fools around with memory allocation and does not specifically test the use of pointers vs. no pointers. Your buffers are not realistic to begin with, as they do not even resemble the UO packets that would be compressed, whereas my test suite uses the Mobile's object property list packet. Either way, your argument is inapplicable.

ZixThree said:
One last thing... You know, I really like to improve myself in the areas where I am lacking. If you can tell me why you consider that I'm wrong, I would really appreciate it. Thank you.

There is no way that "safe" code is faster than "unsafe" code. Direct manipulation of memory is the fastest possible way of manipulating a value type in .NET. Your safe code cannot be faster. It's simply impossible.

Explain to me, please, your phobia of "unsafe" code. Why are you so hell-bent on removing all traces of it from RunUO? Do you mean to tell me YOUR design, which is based on "safe" code, is better than Krrios' design, which uses "unsafe" code?
 

ZixThree

Wanderer
TheOutkastDev said:
It's a benchmark. It isn't supposed to be random. By randomizing the objects you're testing on, you skew the results and bring other factors into play. It's just like a science experiment. Your code fools around with memory allocation and does not specifically test the use of pointers vs. no pointers. Your buffers are not realistic to begin with, as they do not even resemble the UO packets that would be compressed, whereas my test suite uses the Mobile's object property list packet. Either way, your argument is inapplicable.

You should check my code more closely, as I only allocate buffers when generating my test data, not anywhere else (except for the test that specifically does it). I agree that my test data is flawed in that it does not consist of valid packets, but you must agree that OPLPacket is not the only packet out there.

Also, the random data in this case is not really random, since I can run my tests numerous times and the values will always be the same; the randomization seed is not the timer.

Anyway, my test bed was enough for my needs, though nowhere near 100% accurate.

TheOutkastDev said:
There is no way that "safe" code is faster than "unsafe" code. Direct manipulation of memory is the fastest possible way of manipulating a value type in .NET. Your safe code cannot be faster. It's simply impossible.

Explain to me, please, your phobia of "unsafe" code. Why are you so hell-bent on removing all traces of it from RunUO? Do you mean to tell me YOUR design, which is based on "safe" code, is better than Krrios' design, which uses "unsafe" code?

[edit]My design, as you call it, is not mine at all; it is Krrios' design, directly ported from unsafe to safe code with some optimizations. All I did was those specific optimizations.[/edit]

Does "optimization" ring a bell? As I have said, I optimized the safe code; it is not a direct port. The same optimizations applied to the unsafe code make it way faster, outrunning both the usual unsafe Compress method and my safe one.

Also, I have no phobia toward unsafe code, but there is one thing I consider important about safe vs. unsafe: safe code is usually clearer and more readable than unsafe code (for the neophyte, pointer arithmetic is not fun to look at). In the current case, the first code I gave was the unsafe optimized one with the synchronization fix, followed by the safe one (optimized and fixed too). I included the safe one as a demonstration.

Reading my first post back, I perhaps did not say what I meant. What I meant was that the safe optimized code is slower than the unsafe optimized code (with the same optimizations applied), but it is about the same speed as the current unsafe one.

I think you are misreading my intentions. It is probably just that I don't know how to say things the right way, and I apologize for that. The main purpose of this post was to report a potential bug in the core and to add some optimizations to the unsafe code in the process. The rest was only because I found it an interesting case to look at.

I hope you understand my point of view better now. If there is anything else you are unsure about, just ask.
 

darkstorm

Wanderer
TheOutkastDev said:
Your safe code cannot be faster. It's simply impossible.

What do you think your .NET JIT compiler optimizes your bytecode to do?
And by the way, I can definitely come up with something that uses direct pointer manipulation and is slower than normal managed code. (It's all about the algorithm...)

cu,
Storm
 
darkstorm said:
What do you think your .NET JIT compiler optimizes your bytecode to do?
And by the way, I can definitely come up with something that uses direct pointer manipulation and is slower than normal managed code. (It's all about the algorithm...)

cu,
Storm

The JIT's optimizations never change "safe" code into "unsafe" code. It simply eliminates redundant instructions and optimizes where it can, such as removing empty switch cases, etc.

Safe code is still bound by the CLR's type safety checks, security permissions (access modifiers) and exception services. Unsafe code doesn't have this overhead, hence the reason it's called "unsafe". I have yet to find any article that says "safe" code is faster than its equivalent "unsafe" code.
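
To make that concrete, here is a trivial sketch (illustrative only, not from either of our files) of the difference being argued about: the safe indexer is range-checked by the runtime on every access, while a fixed pointer inside an unsafe block is not.

Code:
// Minimal sketch of safe vs. unsafe array access (illustrative only).
public class TableAccessSketch
{
	private static int[] m_Table = new int[514];

	public static int SafeRead( int index )
	{
		// Range-checked by the CLR: an invalid index throws IndexOutOfRangeException.
		return m_Table[index];
	}

	public unsafe static int UnsafeRead( int index )
	{
		fixed ( int* pTable = m_Table )
		{
			// No range check: an invalid index silently reads past the array.
			return pTable[index];
		}
	}
}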
 

ZixThree

Wanderer
TheOutkastDev said:
The JIT's optimizations never change "safe" code into "unsafe" code. It simply eliminates redundant instructions and optimizes where it can, such as removing empty switch cases, etc.

Safe code is still bound by the CLR's type safety checks, security permissions (access modifiers) and exception services. Unsafe code doesn't have this overhead, hence the reason it's called "unsafe". I have yet to find any article that says "safe" code is faster than its equivalent "unsafe" code.

I agree with you. I wish that in the future the JIT will be able to do more in-depth optimizations. In fact, have you looked at some of the articles about the new profiler that will come with the 2005 version of Visual Studio? It has nice features for optimizing safe and, to some extent, unsafe code (cold and hot code is one of them, if I remember well). Sadly, I was not able to find that article again.

While reviewing some old articles I read last year, I found this one: Harness the Features of C# to Power Your Scientific Computing Projects. This article highlights two other issues with unsafe code. It needs full trust to run. Security is not about performance, but I thought it was worth pointing out anyway. So you are right, there is no permissions overhead, because you need to hold them all just to run unsafe code... Also, the algorithms that are effective in C and in unsafe C# are not necessarily the ones that will be most effective in safe C# (yet I don't deny that unsafe code can be way faster).

Here are some fun links about optimization for .Net.
http://msdn.microsoft.com/netframework/programming/performance/default.aspx
 

darkstorm

Wanderer
TheOutkastDev said:
The JIT's optimizations never change "safe" code into "unsafe" code. It simply eliminates redundant instructions and optimizes where it can, such as removing empty switch cases, etc.

Safe code is still bound by the CLR's type safety checks, security permissions (access modifiers) and exception services. Unsafe code doesn't have this overhead, hence the reason it's called "unsafe". I have yet to find any article that says "safe" code is faster than its equivalent "unsafe" code.

The point of a JIT compiler, besides normal compilation, is that it can perform optimizations based on profiling data at runtime. The question, however, is whether the JIT compiler can do that for unsafe code...

cu,
Storm
 

Zippy

Razor Creator
The main problem with ZixThree's method is that allocating a new buffer for every packet sent out would be a huge problem for large shards; not because of speed, but because of memory usage.

But you are right, it doesn't appear Compress will ever be called from another thread, so the lock is unnecessary... and since the lock is unnecessary, the worries about returning the compiled buffer are too.
 

ZixThree

Wanderer
Thanks for the information, Zippy. I already fiddled with removing the synchronized section, but since I did not know for sure whether it was only ever called from a single thread, I could not assume anything.

As for that buffer allocation, isn't one created later anyway, just after the Compress method is used (in Packet.InternalCompile in Packet.cs)?
 

cfaust

Wanderer
Zippy said:
That's been optimized out in 1.0.1 :)

*looks around, ducks under the nearest desk for fear of the "gimmie gimmie" flood*

I just got done reading this thread, and I have to say that I'm impressed with everyone who has posted; from an outsider's view it has been very constructive and informative. Thank you to all who posted.
 

tophyr

Wanderer
TheOutkastDev said:
There is no way that "safe" code is faster than "unsafe" code. Direct manipulation of memory is the fastest possible way of manipulating a value type in .NET. Your safe code cannot be faster. It's simply impossible.

Actually, it is possible for safe code to be faster than unsafe code for pure, straight allocations in certain cases.

The .NET memory manager actually works very simply. It allocates the program a pretty large chunk of contiguous RAM and then basically treats the whole thing as one big stack. When you allocate a new object in managed code, all it does is take the next however-many bytes and then increment its "free memory starts here" pointer appropriately. Two assignments and an addition, basically.

On the other hand, "unsafe" (normal C++, etc.) memory management isn't so nice. It works by internally keeping a linked list of free chunks of memory, and when a block of RAM is requested, it traverses the list looking for a chunk large enough. When it finds one, it splits it and hands out the RAM.

Now, initially these two algorithms perform very similarly, because there is no fragmentation. However, as the program runs and objects that were created get destroyed, the unsafe MM becomes less efficient. The .NET MM takes care of fragmentation through garbage collection: every so often it goes through its stack, clears out the dead objects and compacts everything. This takes a hefty bit of work, but in the long run it is well worth it, as the alternative (memory fragmentation) has a harder impact on memory performance.
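
A toy sketch of the idea (a deliberate simplification I made up to illustrate the point, not the actual CLR or C runtime implementation):

Code:
// Toy bump-pointer allocator, simplified from the description above (illustrative only).
using System;

public class BumpAllocatorSketch
{
	private byte[] m_Heap = new byte[1024 * 1024];
	private int m_Free = 0; // the "free memory starts here" pointer

	// Managed-style allocation: take the next 'size' bytes and bump the pointer.
	public int Allocate( int size )
	{
		if ( m_Free + size > m_Heap.Length )
			throw new OutOfMemoryException(); // a real GC would collect and compact here

		int offset = m_Free;
		m_Free += size;
		return offset; // two assignments and an addition, as described
	}

	// A C/C++-style allocator would instead walk a linked list of free blocks,
	// looking for one large enough, splitting it, and handing it out.
}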
 

Packer898

Knight
Wow, way to necro a 9-month-old post! *cries IRL*

Edit - LOL, nice job on the -Karma. I'm not the one necro'ing 9-month-old posts. Get a clue.
 