Sam Allen's profileDOT NET PERLSBlogListsSkyDrive Tools Help

Blog


    Numbers to DateTime, to Database Efficiency

    I spent the afternoon reading the books "Expert SQL Server 2005 Development" and the "SQL Cookbook" by O'Reilly. I got my books and a drawing board and a piece of paper and a pen and did some old fashioned studying. It is nice to think about programming and not be at the computer. I don't like laptops due to (lack of) screen size and I much prefer mice, so I can't take a laptop around and use that. What I wanted to do today was learn the ins and outs of the DateTime data type in SQL Server 2005/SQL Compact Edition databases.

    Now, I had designed a content management system that used a bunch of integers for dates: Month, Day, and Year. This was okay at first but I could do better. SQL Server has a very handy DateTime data type to use—the hard part is knowing exactly how to use it. The syntax, the nitty-gritty details, etc. Well, that's what I tackled and I have gotten a good start on it now.

    Here's the first thing to know: SQL DateTime fields always contain both a date and a time. So if you try to compare a SQL cell to DateTime.Now, you will fail. The trick is to use the greater than, less than, and >= and <= operators. To find a DateTime on a specific date, your WHERE clause can look like this:

    WHERE DateTimeColumn >= @dateTimeToday AND DateTimeColumn < @dateTimeTodayPlusOne

    Define the value of @dateTimeToday to be DateTime.Today and @dateTimeTodayPlusOne to be DateTimeToday.AddDays(1). I will include an example code snippet at the end.

    In my program, I wanted "Go To Next Date" and "Go To Previous Date" functionality. Those can be expressed this way (starting with getting the date in the database that follows the current date):

    SELECT DateTimeColumn FROM JournalTable WHERE DateTimeColumn > @dateTimeCurrent ORDER BY DateTimeColumn ASC

    Define @dateTimeCurrent as the date you are currently viewing or that is selected. The SELECT will return all dates greater than the current date, with the ones that are nearest at the top (ASC means sort by smallest to largest).

    To get the previous date, do this:

    SELECT DateTimeColumn FROM JournalTable WHERE DateTimeColumn < @dateTimeCurrent ORDER BY DateTimeColumn DESC

    That gets the biggest date that is smaller than the current date (the date previous to the current one).

    Unfortunately, I am using SQL Server Compact Edition right now, so TOP 1 doesn't work. If you are using a more powerful database, you can SELECT TOP 5 or whatever and the statement will only return the first 5 results. That is better.

    C Optimization Tip: Testing Against Zero--MOVED

    Another moved post: see the new document: Optimize C Loops Against Zero.

    DeBruijn Bit Positions, FFS: MOVED

    This wasn't a popular article, but I have moved it to a much better place, my new site (Dot Net Perls Dot Com). Visit the article here.

    Segoe/Tahoma Article Moved

    A much better article at Dot Net Perls Dot Com! With screenshot, etc. All about Segoe and Tahoma in C#

    Random Perls: Firefox Personas, Alphanumeric Sorting

    UPDATE: I have a better alphanumeric sorting algorithm, and the source code is open source and easy to use! Check out my newer post, Alphanumeric Sorting Revisted:

    Alphanumeric Sorting Revisted

    Some tidbits (snippets?) from what has been going on in my little programming microcosm:

     

    • Played around with Firefox Personas, which I had thought were rather silly and useless, but surprisingly after actually installing the add-on I was quite impressed and think that once a nasty glitch bug gets fixed I might use the feature often. (Maybe themes are not just for kids in junior high?) I particularly liked the orange circles theme, and the paper one. Wow!
    • I decided to fix the sorting order in C# application #4 (Highway Viewer). I have a combo box with a list of about 300 different highways and when the selected item changes, my program will download new information about the highway condition and display it. I guess it is more of an experiment or proof of concept, but I have had a bit of fun tinkering with it and I like it. So, anyways, there were some interesting posts on Reddit about "alphanumeric" or "natural" sort. I did some looking around and found a nice piece of C# code by Dave Koelle that provides a special Comparer in C# that I can use on an Array. I changed a couple things in my code, added the file, and it works great! Now the highway dropdown list is nicely sorted and easier to use.
      • Some people on Reddit were ridiculing the claims that this algorithm was "invented" because it has been present in operating systems for many years. Well, I have invented some algorithms that are well-known too. And frankly these things are more useful than many new inventions. Just look at the new patents for examples. I realize that the A* Pathfinder algorithm was already known, but I enjoyed making one myself. How else is a programmer supposed to get better at his art?
      • Likewise, I am sick of programming "gurus" complaining about Java U—programmers who were taught Java in school and don't know all the details about machine code, etc. Well, if you look at the industry, knowing Java is a more profitable skill than some obscure assembly language from the 70's or whatever. I don't do Java but I do C# and as far as I know they are quite similar.
    • I was explaining my views on engineering and they are this: an engineer simply makes a change and tests it. Then he or she does that again. And so the process continues. Now, some knowledge and logic is required, but really you can get pretty good just by repeating this process. I don't have many fancy certificates, though, so maybe I am not qualified to say these things.

    This is a picture of my Highway Viewer program display some road conditions and also the sorted drop down items:

    hv1

    Winforms C# .NET: Making It Faster By Making It Slower

    I have two desktop Winforms applications that store 30+ MB of data in databases (originally SQLCE, now SQLite). One thing I like to do is page through large series of documents or images stored in these databases, to gain a clearer view of time and change between them. With one of my programs, ~700 documents are 8+ MB in total, and they are over 1 million words in length. The viewer/editor program I made in C#/.NET works well and I have been adding features to it, but I wanted to solve the problem with the application that would cause it to "starve" the UI thread by flooding it with rendering requests in rapid succession. This would cause the program to generally look bad, unsteady, and unreliable.

    I have a keyboard shortcut to open and display the previous day's document (Control-Left). On my computer, in the .NET runtime, the slowest operation is painting and rendering the controls. Next is reading from the hard disk. Well, the database structure I am using fixes the hard disk slowdown in the best way I know how, but the painting problem was still present. I implemented a paint suppression (throttling) algorithm. The idea is that the program will skip painting certain screens if the screen is expired/not current before it is painted. Here's the algorithm:

    1. The user request the next document in a series.
    2. On the ThreadPool, call a function that prepares for rendering the next document requested.
    3. In the above preparation function, store the current thread ID (Thread.CurrentThread.ManagedThreadId) in a local variable. At the end of this function, store the TickCount of the time. (Environment.TickCount is essentially the elapsed milliseconds since the program started.)
    4. Read the document into memory: access the database (SQLite/SqlCe).
    5. Check the previously stored Environment.TickCount number against the current time.
    6. If the last paint occurred less than 100 milliseconds ago, wait until 100 milliseconds have passed. (Some subtraction helps here to find the time difference.) Call Thread.Sleep() here to make this thread wait.
    7. Now, after Thread.Sleep(), see if we are still the current thread (check again against ManagedThreadId).
    8. Return early without painting or setting the textBox if a new painting request has come in while the thread was sleeping.
    9. Once a thread is at least 100 milliseconds from the last paint, and is the current (latest) thread, call an Invoke on the UI thread using a delegate.
    10. The UI thread can now set the TextBoxes as required. Make sure to lock the text boxes' updates with a member variable that tells you what they really correspond to!
    So this is what happens:
     
    • No layout can occur less than 100 ms after the previous layout. That's (less than) 10 layouts a second, which appears to be pretty fast.
    • The layout that does occur is always the most recently requested one.
    • When moving through hundreds of documents quickly, the program skips painting many documents but the program remains fluid and responsive.
    • CPU time elapsed decreases to 5 seconds from ~20 seconds to page through 700+ documents.
    • Memory usage might be increased a little bit due to more threads being run and more heap being used.
    • If the user requests to move just a single page, no rendering will be delayed or lost. It will occur instantly.

    There are more bugs to work out, but so far I am really impressed with this "paint suppression," or "paint throttling." It really improves the user experience of my programs.

    Update: Okay, I have worked out more of the details of this algorithm. I have to say this approach really makes a Winforms application look professional and it decreases CPU time and enhances responsiveness. Here is the biggest issue I have had with it: You have to use a lock (or similar object) to ensure that the currently displayed form contents are in sync with the "document position" variable (a date, number, etc., that indicates the ID of the currently requested document). Note that the currently requested document (a cursor of sorts) is different from the currently displayed document.

    lock (_threadLock)
    {
        textBox1.Text = "Document text for the date selected as of this moment.";
        _displayedDate = _dateSelected;
    }
    // _dateSelected will be changed elsewhere, before textBox1 changes (it will schedule a thread to change textBox1).
    // _displayedDate will always be synchronized with textBox1's current properties/Text.
    I also implemented this in my Unabridged Dictionary program, and it really does enhance the user experience. Just keep your variables in sync and keep track of the time, and things will work out great.
     
    Possibly the most important thing about this sort of enhancement is that it makes your users feel that your program is steady, reliable, and predictable. The program doesn't stutter and it does everything in specified intervals. It is even, and it feels faster even though it does less and basically puts itself to sleep over and over again.
     
    What applications is this technique useful for? Any program that allows a user to scan over many documents/pieces of data/images quickly. A more sophisticated technique is used in web browsers and I am sure Microsoft Word uses a bunch of algorithms that are probably more complex and refined.
     
    UPDATE: This code is now available in my public folder.

    Developers and Databases All Around the World

    What's going on in the computer world? Well, quite a lot, and most of it I don't know about.

    • I saw an article that discusses the performance of solid state drives (flash memory, etc.) and the author(s) found some amazing performance gains with databases. As I am working on my database skills right now (client based), I was intrigued and think that databases will become even more important in the future. Wikipedia and Google are pretty much forms of databases (well, with a bunch of features on top). Storage is so important and I feel that my PC's performance bottleneck is its hard drive. That is what slows it down—installing programs, etc. I do wonder if SQL CE is as fast as SQLite or other competing products on the client/desktop; WebKit, Mozilla, Adobe AIR, and others use SQLite, so it must be good. I don't see SQL Compact Edition adopted in quite that many different products. I do think it is fairly popular in certain situations, such as the enterprise.
    • Mozilla's developers are on a campaign to reduce memory fragmentation. Why is this important? Well, they think that Firefox's endless memory growth is due to the memory allocated being place in all different locations, so the allocator just keeps putting more on the end. Does it ever stop? I am skeptical about how useful this push will be, as there have been many different approaches to Mozilla's performance problem and a lot of them don't seem to change much. Of course, Firefox is only using 100 MB of RAM right now (I am using Minefield/Firefox 3). WebKit/Safari uses more.
    • Opera (makers of the Opera web browser) have sent in their lawyers to attack Microsoft and its "monopoly" of Windows Internet Explorer. If they have a monopoly, why does 15% of the world's Internet users surf with Firefox? (It might be lower overall because of China's non-adoption of Firefox.) I dislike Opera but like Firefox and Safari quite a lot. I also like Internet Explorer, even more than Opera.
    • This blog is getting visitors from Europe, Asia, Africa, North America—still haven't seen Australia or South America. I have been enjoying it even though it is not terribly popular.
    • I am still looking around at jobs and working on my skills so that I am more employable in the industry. I wouldn't take a job I could do and do well. Databases are an important thing to know about, which is great considering I find them fascinating and almost invented my own.
    • I installed Windows Vista SP1 and it is very nice. In the RTM version of Vista, the Sidebar would cause hundreds of millions of page faults over the days I leave the computer running (it sleeps at night). This is fixed in SP1, which alone makes it worth the upgrade. Office SP1 2007 came out too. I am typing this blog post in Word, which now can publish blogs on its own.
    • I did some database work, optimization, implemented a refresh timer, polished some UI issues, and of course implemented new features like word count. I also installed the SQL Compact Edition program I wrote on the old Dell, and it worked well.

    SQL Compact Express and Compactness

    I have been building things with SQL Compact Edition 3.5 databases, and these "things" are working pretty well. I still have to put forth more effort to really like my creations, but I make enhancements each day. I am going to share with the blog today my discoveries with Microsoft Compact Edition SQL Server. Here is how you set up a connection, and then read in the values from that connection into some objects in C#:

    string SomethingExample(string input) // This function is called thousands of times
         string result = String.Empty;
         SqlCeConnection com = new SqlCeConnection(@"Data Source=|DataDirectory|\data.sdf");
         com.Open();
        SqlCeCommand c = new SqlCeCommand("SELECT Something FROM TableName WHERE ColumnName=@input", com);
        c.Parameters.AddWithValue("@var", input);
        SqlCeReader reader = c.ExecuteReader();
        if (reader.Read())
            result = reader.GetString(0);
        return result;
    }

    So that is the code. However, if you run that code, your nice, efficient C# program could end up taking 1.5 GBs of memory and take your brand new machine to its knees! Microsoft has some problems with memory here. It is extremely important that you call Dispose() on SqlCeCommand and SqlReader, and also Close on reader. Of course SqlCeConnection must also be closed. Let's make a new class with a constructor that sets up the SqlCe stuff:

     

    class Setup // This class is allocated once
    {
        SqlCeConnection com;
        SqlCeCommand c;
        public Setup() // Initialize and allocate the objects once on class creation
        {
            com = new SqlCeConnection(@"Data Source=|DataDirectory|\data.sdf");
            com.Open();
            c = new SqlCeCommand("SELECT Something FROM TableName WHERE ColumnName=@input", com);
        }
        string SomethingExample(string input) // Called thousands of times
        {
            string result = String.Empty;
            c.Parameters.AddWithValue("@var", input);
            SqlCeReader reader = c.ExecuteReader();
            if (reader.Read())
                result = reader.GetString(0);
     reader.Close();
     reader.Dispose();
     c.Parameters.Clear();
     return result;
        }
    }

    Wow! So let's go over what I changed. The SomethingExample() function is now using class-level member variables and not local variables. This means that it doesn't allocate those variables each time it is called and instead uses memory already present. We use the same SELECT SQL string, as that doesn't change. We use the input parameter to modify the SQL SELECT string (so that the lookup is for the input string). The new function also calls Clear() on the Parameters collection, so that when Example() is called next, the parameter list will be empty.

    Also note that I call Dispose() and Clear() on the reader. (I tried making the reader a class-level member variable, but it didn't change behavior or performance and seemed to make the code less clear.) This frees up all the resources. Note that in my first example, the code example I used isn't the exact same one in my program that I tested so it might not waste 1.5 GBs of memory. I am using some poetic license here. However, I can assure you that the second example is several times faster and also uses less memory.

    I have some stuff to say about implementing transactions in SQL Compact Edition, but I will save that for later. The code above wasn't compiled and was just typed manually (no copy/paste, either). Isn't it amazing how two pieces of code can do the exact same thing, but one can be 2x, 4x, 1000x more efficient than the other?

    Unabridged Programming Endeavors and Optimization

    Well, today I have made some worthwhile enhancements to my Unabridged Dictionary program (based on the Project Gutenberg Webster's dictionary). I started in on converting the data storage in the program to an SQL Compact Edition database, and completed that a day ago or so. The improvement was not astounding but worthwhile, and the memory use and CPU usage are both decreased by a bit, the exact amount depending on the specific use case.

     
    Version
    Test 1 (3 Words)
    Memory Used (MB)
    Test 2 (4 Words)
    Memory Used (MB)
    1
    17.3
    25.3
    2
    12.5
    22.8
    3
    10.8
    21.4
    4
    9.3
    12.9
    5 (SQL)
    9.1
    9.2
    5.1 (SQL)
    8.9
    9.0

    On the first 4 versions, I used a Dictionary<string, string> or similar structure to cache the words from a file, and in the last two I use a SQL Compact database to find the words and then cache the results for later accesses. The performance is generally better as memory goes down, although once the files are cached in versions 1-4, the program is faster. The CPU and disk hit is spread out more on versions 5-5.1. Version 5.1's database is about 7 MB smaller, because I eliminated an unneeded index.

    It is interesting that when I first got the Gutenberg.sdf database working, it took about 300 ms for a word lookup. I added an index and changed the file format slightly and then it took about 10-30 ms. So, the lesson here is that adding an index to a SQL database is critical and very beneficial if you do it right. I just use an index on the words themselves, and use that to find the definitions.

    Gutenberg Unabridged Dictionary Changes

    1. Replaced bookmarks file code with code that simply uses Settings; simplifies, removes lines. I have a default string set as a setting and a user-defined string that can be changed. Simply use Split() and use.
    2. Use StringBuilder in GetDefs...() function.
    3. Use quick C-style string scanning for uppercase lines. (Performance was tested in Benchmarker.cs; used best-performing technique.)
    4. Added Close() and Dispose() on SqlCe objects. Possible memory improvement (?).
    5. Removed unused "usings"; simply right-clicked on them and selected the "Organize Usings" option.
    6. Moved database into the local directory. Added back DataDirectory option in ConnectionString.
    7. Changed bookmarks menu to only load its data when it is opened, improving startup speed. Fewer file operations, simpler code. Used an anonymous delegate function in the EventHandler(). Added code to prevent ReadBookmarksMenu() from being run twice.
    8. Modified input box focus code so that the caret is always placed at the end and never at the start.
    9. Changed Application.Exit() to this.Close(). Cleaner and safer I would think.

    Unabridged Dictionary Database Changes

    1. Removed second index that I wasn't using (figured out how to drop indexes easily in the GUI). This saves 7 MB off of the database.

    To Do

    1. Install Unabridged Dictionary.exe on other computer (Dell 3 GHz). Make sure installation works properly.
    2. Eventually improve the Perl dictionary parser to improve data quality.
    3. Add Edit menu items to match context menus.

    Other

    I added a file to my public folder, called Benchmarker.cs. This program is a simple benchmarking program that indicates the various times that an operation might take in C# code. The same code using regular expressions might take 1.3 seconds to execute while more C-style code takes 250 milliseconds. This stuff is useful to know and will help me learn more best practices.

    SQL Compact Express: All My Base Are....

    I am going to suspend my feverish blogging for a bit while I explore and learn a new technology: SQL Express and Compact Express. I have a working database set up and I am going to make a new one and transition two of my personal projects to SQL Compact databases. I think performance, scalability, and simplicity will improve. I am excited. Here's a history of my computer programming activity and learning:
    1. Learned HTML, which was easy.
    2. Learned CSS, which was a little harder but still easy.
    3. Tried to learn C, and grasped some of the simpler concepts, but it was hard.
    4. Learned Perl, and mastered associative arrays and references.
    5. Learned regular expressions, which looked hard but turned out to be relatively simple (if scary-looking).
    6. Learned C for real--this took a while. Relatively complex (definitely complex?) stuff learned; malloc(), pointers, bits.
    7. Learned C#, which wasn't really challenging but I have a lot more to learn.
    8. Now, SQL databases. Still in the beginning stages but it doesn't seem that threatening anymore. In fact it seems I have done things much more complicated and in some ways equivalent. It is harder to write the database algorithm than to administer a database.

    There is more to learn--but I am determined and progress will be made. I am going to post and interesting or confusing things I learn about SQL Express as I deal with them. It seems like databases are about simplicity, not complexity. There is a learning curve but once you are past that it seems a lot easier. It is harder to reinvent the database algorithm than to pass a few SQL queries to something someone else coded!

    Rogues, Minotaurs, and Seeker Algorithms MOVED

    This post is available in an improved version at DotNetPerls.com.

    How To Thread, or How Not To?

    Okay, I started learning C# two months ago, but almost every day I have spent several hours reading, writing, and researching code. So I am not an expert, but I do try to do my best. So I left a comment about an article about Python threads today--the article was good, and I just said something about a possible idea of mine to improve the code. After I did that, I searched for some C# threading articles to satiate my curiousity about optimal practices. I found this article.
     
    No, no, no. Do not read that article. Well, you can read it but don't try to learn from it. I respect that Mr. Kanetkar may be a good programmer in many ways, but this article is bad.
     
    First, he starts three threads with the ThreadStart being display1, display2, and display3. So he is having a function on another thread update the textboxes on the UI thread. Don't do this! Call Invoke() from the threaded function to change the textbox's value (the textboxes being num1, num2, and num3). That is the threadsafe way of doing it. Doing it the way in the article can cause awful bugs because of thread races, etc. It is bad.
     
    Second, when the user clicks the button, he aborts all three threads. Abort() is bad! Visual C#'s IDE complains on Abort() for a reason--it can cause problems in more complex programs. So, basically, here is my take on Mr. Kanetkar's article:
    1. It showcases thread handling in C# at its worst (well, in the worst way that probably, usually, works).
    2. It uses duplication of code that is not graceful (not a big deal, really).
    3. Why not just use a timer and update the textboxes? Well, that wouldn't make a good article about threading, so I won't complain there.

    I am not an expert, but I know this code is bad. I didn't start this blog to complain, but sometimes doing so is reasonable.