Saturday, October 27, 2007

How fast is StringBuilder?

Subject

As we know, StringBuilder builds strings piece by piece faster than the + operator or the String.Concat() method. Also, a StringBuilder with a specified capacity performs faster than one with the default capacity.

But how fast is StringBuilder?

I created a simple application to measure the performance of different approaches to string concatenation. I will use several ways to build a text: String.Concat(), StringBuilder with default capacity (16 characters), and StringBuilder with specified capacity.

 

Theory

 

Using String concatenation

Strings are immutable objects, which means there is no way to change an already existing instance of a string. Each time we perform an action that changes the content of a string, such as Trim(), Substring(), Replace(), etc., a new object is created in the heap.
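A minimal sketch of that immutability, in case it helps to see it in code. The class and method names are made up for this illustration; only Replace() and ReferenceEquals() come from the framework.

```csharp
using System;

// Hypothetical helper, not part of the downloadable project.
public static class ImmutabilityDemo
{
    // Returns true when Replace() handed back a different object
    // and left the original string untouched.
    public static bool ReplaceCreatesNewString()
    {
        string original = "ARMEN";
        string replaced = original.Replace('A', 'a');

        bool differentObject = !ReferenceEquals(original, replaced);
        bool originalUntouched = original == "ARMEN";

        return differentObject && originalUntouched && replaced == "aRMEN";
    }
}
```

Calling ImmutabilityDemo.ReplaceCreatesNewString() should return true: the original instance still holds "ARMEN" and the modified text lives in a second object.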

For example, let's imagine that we have an array of strings and would like to concatenate them.

 

string[] myNameArray = new string[5]{"A", "R", "M", "E", "N"};
string text = null;

foreach (string s in myNameArray)
{
     text = String.Concat(text, s);
}

 

Below you can see an illustration of what exactly happens in memory during each iteration.

 

Iteration 1:

There is a String object ("A") in the heap, and a reference to it is returned to the "text" field.

image

 

Iteration 2:

By concatenating the second member ("R") to the already existing string we actually create a new String object in the heap ("AR"), and a reference to the newly created object is returned to the "text" field. The old value ("A") is no longer referenced by "text"; once nothing references such an intermediate string, the Garbage Collector is free to collect and destroy it.

image

 

Iteration 3:

The same story happens with the next member of the array, the string "M". Again a new String object ("ARM") is created in the heap and the reference to the new object is returned to the "text" field. The previous object loses its connection to the pointer in the stack and will be garbage collected.

image 

 

Iteration 4:

The same behavior as in the previous iteration. A new object "ARME" is created in the heap and the "text" field starts referencing it. As the "ARM" object is no longer connected to any pointer in the stack, it will be destroyed by the Garbage Collector.

image

 

Iteration 5:

And finally the last iteration. Again a new object ("ARMEN") is created and the previous one will be collected by the Garbage Collector.

image

 

So, the conclusion is that to prepare a simple string of 5 characters we went through 5 String objects, with memory allocated for each of them, and 4 of those objects were only intermediate values that immediately became candidates for the Garbage Collector.

 

When I was young my Mom used to say to me "I am afraid to ask you to clean anything in the kitchen, because you will create more mess around you" :D String concatenation works pretty much the same way.

 

StringBuilder with default capacity 

Ok, let's now build a string by using StringBuilder. By default StringBuilder has a capacity of 16 characters. It means that the object in the heap has memory allocated for 16 characters, which is perfect for us as we have to build a string with only 5 characters inside:

StringBuilder textSBuilder = new StringBuilder();

 

image

 

Again, we iterate through the array and add each member to the StringBuilder:

foreach (string s in myNameArray)
{
     textSBuilder.Append(s);
}

 

The illustration below shows that all characters (strings) are placed into the existing StringBuilder object, which "textSBuilder" references. No new object is created in the heap and there is nothing for the Garbage Collector to destroy. So this approach should perform much faster than the usual string concatenation.

 

image

 

Well, this is good, but what happens if the string becomes bigger than the capacity of the StringBuilder? The answer is easy: at the moment the StringBuilder reaches its capacity and needs more space, it allocates a new internal buffer in the heap with doubled capacity (in our case 2 x 16 = 32 characters). All characters of the current buffer are copied to the new one and the StringBuilder starts using the new buffer. The "textSBuilder" variable itself keeps referencing the same StringBuilder object.

So, let's perform the same operation as before but repeat it 4 times, which means we need to get a string of 4 x 5 ("A", "R", "M", "E", "N") = 20 characters.

 

for (int i = 0; i < 4; i++)
{
      foreach (string s in myNameArray)
      {
            textSBuilder.Append(s);
      }
}

As you can see in the illustration below, when the StringBuilder fills its capacity and needs more space for new characters, a new buffer with double the capacity of the previous one (2 x 16 = 32) is allocated in the heap, all chars from the original buffer are copied to the new one, and the StringBuilder referenced by "textSBuilder" switches to the new buffer. The original buffer becomes obsolete and will be destroyed by the Garbage Collector.
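The growth can be observed through the public Capacity property. The sketch below repeats the 20-character loop and records every distinct capacity it sees; the class and method names are hypothetical. Exact growth steps are an implementation detail of the runtime, so the recorded values may differ between .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the downloadable project.
public static class CapacityGrowthDemo
{
    // Appends "A","R","M","E","N" four times (20 characters) and records
    // every distinct Capacity value observed along the way.
    public static List<int> ObserveCapacities()
    {
        string[] myNameArray = { "A", "R", "M", "E", "N" };
        StringBuilder textSBuilder = new StringBuilder();

        List<int> capacities = new List<int>();
        capacities.Add(textSBuilder.Capacity);

        for (int i = 0; i < 4; i++)
        {
            foreach (string s in myNameArray)
            {
                textSBuilder.Append(s);

                // Record the capacity only when it changed.
                if (textSBuilder.Capacity != capacities[capacities.Count - 1])
                {
                    capacities.Add(textSBuilder.Capacity);
                }
            }
        }

        return capacities;
    }
}
```

Printing the returned list shows the starting capacity followed by each larger capacity the builder moved to while the 20 characters were appended.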

 

image

 

 

StringBuilder with Specified Capacity

 

To make StringBuilder perform better, it is recommended to specify the capacity of the StringBuilder object at the moment of instantiation. The point is that if we know how many characters the string will contain, we can specify this number as the capacity and the StringBuilder will allocate just one object that is big enough for our needs.

Let's perform the same operation as before, but this time with a specified capacity.

 

StringBuilder textSBuilder = new StringBuilder(20);

 

The illustration shows that we created just one object. No need to create new objects, no new memory allocation, no work for the Garbage Collector (at least for now).

 

image

So, this is the best way to build strings when the final length is known in advance.
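When the pieces are known up front, the capacity can be computed instead of hard-coded. This is only a sketch under that assumption; the class, method and parameter names are invented for the illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the downloadable project.
public static class PresizedBuilderDemo
{
    // Builds the text with a StringBuilder sized for the final length and
    // reports whether the capacity had to change while appending.
    public static string Build(string[] parts, int repeat, out bool capacityChanged)
    {
        // Final length = sum of the part lengths, times the repeat count.
        int capacity = 0;
        foreach (string s in parts)
        {
            capacity += s.Length;
        }
        capacity *= repeat;

        StringBuilder textSBuilder = new StringBuilder(capacity);
        int initialCapacity = textSBuilder.Capacity;

        for (int i = 0; i < repeat; i++)
        {
            foreach (string s in parts)
            {
                textSBuilder.Append(s);
            }
        }

        capacityChanged = textSBuilder.Capacity != initialCapacity;
        return textSBuilder.ToString();
    }
}
```

For the "A", "R", "M", "E", "N" array repeated 4 times the computed capacity is 20, the same number passed to the constructor above, so the builder should never need to grow.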

 

 

Speedometer

Now it is time for a real test :)

As I mentioned before, I created a small console application that uses all those approaches to see the result in "real numbers".

There are several methods which basically do the same thing: loop a specified number of times and build a string by concatenating the existing string with a new char (or text). The difference is that each of them uses a different approach: String.Concat(), StringBuilder(), StringBuilder(capacity). There is also a checkpoint interval at which the application writes out a report.

 

Here is one of those methods as an example.

/// <summary>
/// Concatenates strings by using the String.Concat method.
/// </summary>
private static void StringConcatenation()
{
     String text = null;

     //Reset the properties related to the checkpoint
     //and write out the start time.
     OperationReportPreset("String.Concat()");

     //Perform the concatenation the specified number of times.
     for(int i=0; i<OPERATION_COUNT; i++)
     {
          //Concatenate two strings by using the
          //String.Concat(string, string) static method.
          text = String.Concat(text, textForConcatenation);

          //Write out the checkpoint result.
          CheckpointReport();

          //Each concatenation creates a new object in the heap.
          numberOfCreatedObjects = i + 1;
     }

     //Write out the operation end time.
     OperationReportFinalizer();
}

The other methods are pretty much the same. The difference is that they use StringBuilder.Append() instead of standard concatenation.
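For readers who only want the timing idea without the reporting code, here is a stripped-down sketch based on System.Diagnostics.Stopwatch. It is not the code from the downloadable project: the class, method and parameter names are assumptions made for this illustration, and there are no checkpoints.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical, simplified benchmark; not the project's actual code.
public static class ConcatBenchmarkSketch
{
    // Builds the text with String.Concat and reports the elapsed milliseconds.
    public static string BuildWithConcat(int operationCount, string piece, out long elapsedMs)
    {
        Stopwatch watch = Stopwatch.StartNew();

        string text = null;
        for (int i = 0; i < operationCount; i++)
        {
            text = String.Concat(text, piece);
        }

        watch.Stop();
        elapsedMs = watch.ElapsedMilliseconds;
        return text;
    }

    // Builds the same text with StringBuilder.Append.
    // A capacity of 0 or less means "use the default capacity".
    public static string BuildWithBuilder(int operationCount, string piece, int capacity, out long elapsedMs)
    {
        Stopwatch watch = Stopwatch.StartNew();

        StringBuilder builder = capacity > 0
            ? new StringBuilder(capacity)
            : new StringBuilder();
        for (int i = 0; i < operationCount; i++)
        {
            builder.Append(piece);
        }
        string text = builder.ToString();

        watch.Stop();
        elapsedMs = watch.ElapsedMilliseconds;
        return text;
    }
}
```

Both methods return the built text so the results can be compared for equality, while the elapsed time comes back through the out parameter. Absolute numbers depend heavily on the machine and the runtime.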

 

Note: You can download the whole project at the end of this blog post. There is a "Source Code" link.

 

Execution

After execution I got this output:

image

Execution with String.Concat() took more than 2 minutes.

It is interesting to observe the dynamics of object creation. The first 20000 concatenations took 1 second, the next 20000 almost 3 seconds, and another 20000 more than 6 seconds. The last 20000 operations took 25 seconds. This means that the bigger the string becomes after each iteration, the more time is needed to allocate memory for the new String object in the heap and to copy the existing characters into it.

BTW, after all iterations there will be 200000 objects created in the heap, of which 199999 are unused and obsolete.
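The slowdown follows from simple arithmetic: every concatenation writes the whole new string, so the total work grows with the square of the number of operations. The sketch below only counts characters and measures nothing. It assumes one character is appended per operation, which is my assumption about the test text, and the names are made up.

```csharp
using System;

// Hypothetical helper, not part of the downloadable project.
public static class CopyCostEstimate
{
    // Total characters written when the string grows by pieceLength per
    // operation and every operation writes the complete new string:
    // pieceLength * (1 + 2 + ... + operations).
    public static long TotalCharsCopied(long operations, long pieceLength)
    {
        return pieceLength * operations * (operations + 1) / 2;
    }

    // Characters written by the operations in the range (from, to].
    public static long CharsCopiedBetween(long from, long to, long pieceLength)
    {
        return TotalCharsCopied(to, pieceLength) - TotalCharsCopied(from, pieceLength);
    }
}
```

Under that assumption TotalCharsCopied(200000, 1) gives 20,000,100,000 characters in total, and the last 20000 operations write roughly 19 times more characters than the first 20000, which is in the same range as the measured 1 second versus 25 seconds.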

 

Let's take a look at how StringBuilder with the default, unspecified capacity performed:

image

The same action was performed in less than 1 second, almost instantly. A new buffer of double the size is allocated each time the StringBuilder reaches its capacity. Only 15 objects were created during the whole run, of which "only" 14 are obsolete.

 

What about StringBuilder with specified capacity?

image

Well, it produced the same result, under 1 second, but only 1 object was created in the heap.

 

 

Report

 

 

Approach                                Number of operations   Total performance duration   Created unused objects for GC
String.Concat()                         200000                 2 min 22 sec.                199999
StringBuilder with default capacity     200000                 less than 1 sec.             14
StringBuilder with specified capacity   200000                 less than 1 sec.             0

 

 

The StringBuilder with default capacity performs at the same speed as the StringBuilder with specified capacity. But of course we didn't count the time the Garbage Collector needs to clean the unused objects out of memory. And of course the size of those objects is also very important. The "StringBuilder with default capacity" left 14 obsolete objects that have to be removed from memory, and all together they take around half a megabyte.
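Both numbers, 15 objects and roughly half a megabyte, can be reproduced with a small doubling model. This is a sketch under two assumptions of mine: the final text is 200000 characters long (one character per operation) and every growth step exactly doubles the capacity, as described earlier in this post. The names are hypothetical, and real StringBuilder implementations may grow differently.

```csharp
using System;

// Hypothetical helper, not part of the downloadable project.
public static class DoublingEstimate
{
    // Counts how many buffers a pure doubling strategy allocates before the
    // capacity reaches finalLength, and sums the capacity (in characters)
    // of all buffers that were discarded on the way.
    public static int BuffersNeeded(int initialCapacity, int finalLength, out long discardedChars)
    {
        int buffers = 1;
        long capacity = initialCapacity;
        discardedChars = 0;

        while (capacity < finalLength)
        {
            discardedChars += capacity; // the old buffer becomes garbage
            capacity *= 2;              // the new buffer is twice as large
            buffers++;
        }

        return buffers;
    }
}
```

BuffersNeeded(16, 200000, out discarded) returns 15 buffers, 14 of them obsolete, with 262,128 discarded characters. At 2 bytes per character that is 524,256 bytes, or about half a megabyte.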

 

My Advice

My advice is to avoid using String.Concat() or the + operator for building strings piece by piece. Use a StringBuilder with specified capacity instead. It is a much faster and cheaper approach.

 

 


Wednesday, August 8, 2007

Rosario - the next generation of Visual Studio available already today

 

"Rosario" is the new version of Visual Studio Team System that follows "Visual Studio 2008" Team System. The next generation of Visual Studio is available already today. Yes yes, there is a CTP version of "Rosario" available for download on MSDN. Follow the link and get an image of "Rosario" (plus some documentation) now :)

Visual Studio Team System Code Name "Rosario"

Friday, August 3, 2007

Commerce Server 2007

Commerce Server 2007 is a huge and powerful application, but it is hard to find documentation or material about it. Given that not many people work with Commerce Server 2007, that it is a new product and that there is no book about it, getting started with an implementation can be a problem.

Nikola Malovic has a couple of awesome posts on his blog site blog.vuscode.com about Commerce Server 2007. Follow the link and check them out. 

Friday, July 27, 2007

Visual Studio 2008 Beta 2

Visual Studio 2008 Beta 2 is now available on the MSDN web site :). Follow the link, pick the edition and download it.

Download Visual Studio 2008 Beta 2

Sunday, July 22, 2007

Family.Show 2.0

Version 2.0 of the Family.Show application, an incredible example of WPF technology, is now available. You can download it from the Vertigo website http://www.vertigo.com/familyshow.aspx

A couple of new features have been added to the application:

· New “Family Data View” with filtering, sorting, and in-place editing

· Family Analytics including last name tag cloud, age distribution histogram and birthday list.

· Enhanced story editing with support for font name, size, alignment, bullets,  and numerical lists

· Filtering and sorting the Family List view, in the main window

· More cowbell!

· Skinnable user interface with two skins: black and silver

· Improved Windsor family sample data file with stories and images.

· Source code migrated to CodePlex

Tuesday, July 3, 2007

WinForm UI interface on WEB applications

If you are used to the WinForm UI and would like to create a web site with a similar design, or maybe your goal is to create two applications, WinForm and Web, with a synchronized UI, the NETIKA TECH company has a solution for that. NETIKA TECH provides rich GOA WinForm controls for Silverlight or Adobe Flash, entirely FREE. You can visit the NETIKA TECH web pages to get full information about the product, see live demos and download it for free. The GOA core library includes more than 40 controls and components:

 

  • Control, ContainerControl, ScrollableControl, Panel
  • Button, CheckBox, RadioButton, GroupBox, Label
  • TextBox, NumericUpDown
  • ImageBox, ImageList
  • ScrollBar, HScrollBar, VScrollBar
  • Form, MessageBox, Cursor
  • ListBox, CheckedListBox
  • ComboBox
  • TreeView
  • MonthCalendar
  • TabControl, Splitter
  • ToolTip, ProgressBar, Timer
  • ToolStrip, StatusStrip, MenuStrip, ToolStripButton, ToolStripComboBox, ToolStripDropDown, ToolStripLabel, ToolStripProgressBar, ToolStripSeparator, ToolStripSplitButton, ToolStripTextBox
  • XamlCanvas (Silverlight specific)
  • Rich DataGrid controls and more

 

Download the GOA WinForm

Tuesday, June 26, 2007

Hidden Development Methodologies

There are many unknown methodologies that we touch every day during development without having a clue about them. Have you heard about ADD, CDD, CYAE, DBD or GMPM? Well, I had not until I visited the Berkun Blog.

Follow the link http://www.scottberkun.com/blog/2007/asshole-driven-development/ and you will be surprised. :)

Very useful information for people who are organizing and leading development.