Some thoughts on synching data between servers and smart clients
Anyway you slice it, it depends on the specific uses in your application.Here is yet another question on smart client architecture:
My application works as follows:
The desktop application will download the database from the server (when it installed in the user machine). User will not do any changes to the local copy of the database. However, the application will link into a web service (say once a month) to obtain a new copy of the database.
I can reduce the network traffic, if I can retrieve only the changed/new records from the database in the server.
Ok... I send a copy of my local data as a dataset to the web server. Is it possible to use this database to check for changes in the server database and to get only the changed/new rows from the server database? Then I can update the desktop copy with new rows.
And the answer is:
… complicated.
There are three variables that you are working with, and you want to minimize all three:
First, how often your program must communicate from client to server.
Second, how much data must be retrieved when your program communicates from client to server.
Third, the extra logic required to handle out-of-date data at the desktop.
To begin with, a Dataset is just the wrong class for the scenario you’ve outlined above. Tracking changes in a Dataset object will work to update a database at the server with changes from the client. But, going the other way is just not particularly effective. You’d need to have a different dataset object that tracks the differences between the current database contents and the contents as last delivered to each and every desktop workstation. That simply won’t scale, and adds quite a bit of extra processing on the server. Don’t do that.
You suggested sending the entire database copy to the server, and then sending a new copy back down to the client. That will seriously hurt the second point above (how much data should be transferred between the server and the client.)
So, what to do? Well, to some extent, it greatly depends on your application. However, there are some general ideas that you can use to pick the best solution. First, consider the best granularity for updates. A good first pass solution often is to send any and all tables that have changed to the client.
Decide if records will be deleted in your application. (Many scientific applications will mark data as deleted, but not permanently remove it. It provides a better audit trail). Deleted records are often handles in two different ways. You can either completely replace the table at the client side, or write methods that send records marked for deletion to the client. The client can then delete those records. Which you choose depends on both the size of the table, and the frequency of delete operations.
Some final recommendations
This question is just too broad for a blog post. The fact is that the best way to manage data synchronization between client and server will be very application-specific. You’ll need to analyze the user cases for your application, and optimize your design for the most likely scenarios. But, in all cases, the dataset transaction mechanism is not the best method for sending changes from the server to the client.