Windows Mobile Support


Thursday, 30 May 2013

Windows Azure - CRON jobs on Web Roles

Posted on 06:02 by Unknown
In one of our current projects we need to support jobs on SQL Azure. If you have already tried to create a job on SQL Azure, maybe you noticed that there is no support for that - the job concept doesn't exist on SQL Azure yet. There are some rumors that in the near future we will have something like this.
Our client needs this functionality now and cannot wait one more week. Because of this we had to find a solution that makes our client happy.
In the Windows Azure store we can find an application called Scheduler that can help us implement something like this (CRON jobs). You can find more about this app at the following link: http://vunvulearadu.blogspot.ro/2013/02/cron-job-in-windows-azure-scheduler.html
In our case we cannot use this application because we have a custom security configuration on the Azure machines and cannot expose an endpoint to a third party. The solution we ended up with is a temporary one that will be used until we are able to run jobs on SQL Azure.
Having a web application hosted on web roles, we decided to incorporate this functionality into the existing web roles. We didn't want a dedicated worker role for this - it is more expensive (and more reliable), but in our case we can live with this task running on the web roles as well.
Solution
On the web roles we have an action that triggers the stored procedure from SQL Azure. Before calling the stored procedure, our method checks a table from Azure Storage to see when the CRON “job” last ran and whether we need to run it again. Each time the stored procedure is called, the last run time field from the Azure Storage table is updated. This action is triggered in the Application_Start() method in Global.asax and is therefore called each time the application starts.
At the same time we start a timer that is set to do the same thing at a specific time interval - in our case every 2 weeks.
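To make the idea concrete, here is a small sketch of the last-run check, written in Python for brevity (the real implementation lives in C# on the web role); the dictionary stands in for the Azure Storage table and all the names are illustrative:

```python
import datetime as dt

# Stand-in for the Azure Storage table that keeps the last run time.
# In the real solution this is a table entity read and updated through
# the Azure Storage client; the names here are illustrative.
last_run_store = {"cron_cleanup": None}

RUN_INTERVAL = dt.timedelta(weeks=2)

def run_job_if_needed(job_name, run_job, now=None):
    """Run `run_job` only if RUN_INTERVAL has passed since the last run."""
    now = now or dt.datetime.utcnow()
    last_run = last_run_store.get(job_name)
    if last_run is not None and now - last_run < RUN_INTERVAL:
        return False  # too soon - nothing happens
    # Update the last-run field first, then call the stored procedure.
    last_run_store[job_name] = now
    run_job()
    return True
```

The same function is called from both places: once on application start and once from the timer callback.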

Flow
1. Application Start
2. Try to run the job
   a. Check when the job last ran
      i. We should run the job again
         1. Update the field that stores the last run time
         2. Run the stored procedure from SQL Azure
      ii. We don't need to run the job
         1. Nothing happens
3. Start the timer
   a. Run the following action at a specific time interval
      i. Check when the job last ran
         1. We should run the job again
            a. Update the field that stores the last run time
            b. Run the stored procedure from SQL Azure
         2. We don't need to run the job
            a. Nothing happens

Why do we need to run this code every time the application starts?
Because even if we can configure the idle time of the application pool in IIS, we can still end up with our application idle or with IIS resetting. Because of this, we need to try to run the CRON “job” when the application starts.
Why do we need a table from Azure Storage?
We don't want to run the stored procedure over and over again. Because of this we need to know when the stored procedure last ran with success. Of course we could use SQL for this and log the information in SQL, but we didn't want to depend on SQL.
Also, we can have more than one web role. In this case we don't want to trigger the job multiple times - because of this we need to know when the CRON “job” last ran.
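The multi-instance part deserves a sketch of its own. The idea is a conditional update: every instance reads the last run time and then tries to update it, and only the instance whose update succeeds runs the stored procedure. The class below models this in Python, with a lock standing in for the optimistic concurrency (ETag) check that a real table update gives us; all names are illustrative:

```python
import datetime as dt
import threading

class LastRunTable:
    """Toy stand-in for the Azure Storage table shared by all web role
    instances. try_claim() mimics a conditional (compare-and-swap) update,
    so only one instance wins the right to run the job."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_run = None

    def read(self):
        return self._last_run

    def try_claim(self, expected_last_run, now):
        """Update the last run time only if nobody updated it meanwhile."""
        with self._lock:
            if self._last_run != expected_last_run:
                return False  # another instance was faster
            self._last_run = now
            return True
```

If two role instances read the same value and race, exactly one try_claim() returns True; the loser simply does nothing.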
Why don't we store the status of the stored procedure in the table from Azure Storage?
In our case we had to run some cleanup tasks on SQL. Because of this we could afford to run the stored procedure more than once. We know about this risk and we accept it. If we had had a job that should run only once a week, then we would have had to add the status of the CRON "job" to our table from Azure Storage.


Conclusion
This is a pretty simple solution that can be used with success when we need to run CRON “jobs” on Windows Azure. Based on the requirements we can have more complicated solutions. One thing that we should remember is to keep it simple. We don't want to complicate things just for fun.
Posted in Azure, Cloud, Windows Azure

Friday, 24 May 2013

[Post-event] ITCamp 2013

Posted on 07:26 by Unknown
These days I participated in ITCamp 2013. This year I had the opportunity to take part in a premium event with almost 400 attendees. For Romania, this is a big number; it is hard to gather 400 people at a premium conference.
What did we find this year? COOL stuff. We discovered the future of tablets, what is happening in the cloud, the voodoo magic that happens inside team management... and so on.
I really enjoyed the conference, the sessions and especially the location - you could see the whole city from above. Each year one of the most important things at ITCamp is socialization - meeting people from different parts of the world, discovering what they are doing and what the trends of the IT industry are.
This year I had the opportunity to be invited as a speaker at ITCamp. Last year I talked about background tasks on Windows 8, but this year I decided to go to the cloud and talk about messaging patterns and how we can implement those patterns using Windows Azure Service Bus.
My slides from ITCamp:

Messaging patterns in the cloud from Radu Vunvulea
Posted in Azure, Cloud, Cluj, cluj-napoca, design patterns, event, ITCamp, service bus, Windows Azure

Wednesday, 22 May 2013

JavaScript, Unit Test, Visual Studio and Build Machine Integration

Posted on 14:54 by Unknown

Today I will write a short post about JavaScript and unit testing. I have heard a lot of .NET developers say that they don't write unit tests for JavaScript because it is not supported by Visual Studio, because it is complicated to run them on the build machine, or because they don't have time.
Guys, I have news for you: Visual Studio 2012 supports unit tests for JavaScript - and even Visual Studio 2010 does. You can run them almost like a normal C# unit test without needing to install anything. The JavaScript unit tests are integrated in such a way that you don't need to change or install anything on your build machine - you even receive the standard notification when a unit test fails. You don't have time for them? I will not comment on this; the definition of DONE is wrong for those developers.
When I need to write unit tests for JavaScript code I usually prefer qunit. Why? Because in combination with a small NuGet package called NQUnit you can make magic.
qunit gives you the possibility to write and run JavaScript unit tests. It is a simple testing framework. The output of a test run is an XML file that can be parsed and used by the build machine or any other tool. More about qunit:
http://vunvulearadu.blogspot.ro/2012/10/how-to-write-unit-test-in-javascript.html
http://vunvulearadu.blogspot.ro/2012/10/new-unit-tests-features-of-visual.html
NQUnit makes the integration between classic C# unit tests and JavaScript code. Using this package you have the possibility to run unit tests written in JavaScript like normal unit tests. I prefer it because it can be used with success in Visual Studio 2010 as well, not only in Visual Studio 2012. One nice feature of this package is the way it is integrated with the build machine. Because this package runs as normal unit tests, you don't have to change anything on the build machine or install anything on the developers' machines.
The secret of NQUnit is the way it runs the tests. It takes the HTML test files from the output folder and runs them in a browser. For each of them, it captures the XML that is generated after the test run. The XML generated by qunit contains the summary of the test results and can be used to get all the information that is needed. At the end, NQUnit closes the browser.
public static IEnumerable<QUnitTest> GetTests(params string[] filesToTest)
Using this method you can specify all the files you want to test. Don't worry - when you start to use NQUnit you will see that there is a great preconfigured sample in the NQUnit package.
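To give an idea of what the build machine ends up working with, here is a small Python sketch that reads a qunit-style result summary and lists the failing tests. The XML shape used here is only illustrative - it is not the exact schema that qunit emits or NQUnit captures:

```python
import xml.etree.ElementTree as ET

# Illustrative qunit-style result XML; the real schema may differ.
SAMPLE = """
<qunit>
  <test name="adds numbers" failed="0" passed="2" total="2"/>
  <test name="handles null" failed="1" passed="1" total="2"/>
</qunit>
"""

def failed_tests(xml_text):
    """Return the names of tests that reported at least one failed assertion."""
    root = ET.fromstring(xml_text)
    return [t.get("name") for t in root.findall("test")
            if int(t.get("failed")) > 0]
```

A build step only has to check that this list is empty to decide whether the build is green.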
Hints before you start:
  • Don't run the tests from ReSharper; ReSharper doesn't run the tests as you expect, because of the way each test session runs and accesses the rest of the build output.
  • If more than one JavaScript unit test fails, you will only receive information related to one of them - this can be fixed by writing some C# code, but I can live with it.
  • If you have JavaScript code, then write some unit tests.
I started to use this NuGet package one year ago and it works great. For more information about this great package:
http://nuget.org/packages/NQUnit/
http://robdmoore.id.au/blog/2011/03/13/nqunit-javascript-testing-within-net-ci/

Posted in javascript, unit test

Tuesday, 21 May 2013

[Post-Event] Asynchronous Master Class, May 18-19, Cluj-Napoca

Posted on 12:00 by Unknown

This was one of the greatest weekends of this year. Sixty other developers from Cluj-Napoca and I had the opportunity to participate in a master class about asynchronous programming. The master class was organized by Codecamp, lasted two days and was 100% free for all the attendees (a BIG THANK YOU to iQuest - they were the sponsors of this event).
At this event we had the best trainers ever - Andrew Clymer & Richard Blewett. This was the first master class where, on both days, the training room was 110% full. At the end of the master class (Sunday afternoon) all the attendees had learned new stuff. I hope that they are now prepared for the asynchronous world.
Special thanks to Andrew and Richard. They flew in specially from the UK to Cluj-Napoca for this master class.

Posted in async, cluj-napoca, codecamp, event

Monday, 20 May 2013

Testing the limits of Windows Azure Service Bus

Posted on 11:52 by Unknown

Part 1 – What is the maximum number of messages that can be processed in 30 minutes?
Part 2 – What is the optimum size of the worker roles that process messages from Service Bus?
Part 3 – What is the maximum number of worker roles that can process messages from Service Bus?
Part 4 – What are the costs of Service Bus when we need to process millions of messages every hour?
Part 5 – What are the scalability points of Service Bus?
Part 6 – Topic isolation
Posted in Azure, Cloud, service bus, Windows Azure

Thursday, 16 May 2013

Hadoop and HDFS

Posted on 03:49 by Unknown

We have all heard about trends. We have trends in music, fashion and of course in IT. For 2013 some trends were announced that are already part of our lives. Who hasn't heard about cloud, machine to machine (M2M) or NoSQL? All these trends have entered our lives as part of our ordinary day. Big Data is a trend that existed last year and it remains one of the strongest ones.
In the next part of this article I would like to talk about Hadoop. Why? Big Data doesn't exist without Hadoop. Big Data can be an amount of bytes that the client wouldn't even know how to process. Clients started a long time ago to ask for a scalable way of processing data. On ordinary systems, processing 50TB would be a problem. Computer systems for indexing and processing this amount of data are extremely costly, not just financially but also in terms of time.
At this point Hadoop is one of the best (maybe the best) processing solutions for large amounts of data. First of all let's see how it appeared and how it ended up being a system that can run on 40,000 distributed instances without a problem.


A little bit of history
Several years ago (2003-2004), Google published an article about the way it handles huge amounts of data. It explained what solution it uses to process and store large amounts of data. As the number of sites available on the internet was growing so fast, the Apache Software Foundation started creating Apache Hadoop based on the Google article. We could say that the article has become the standard for storing, processing and analyzing data.
Two important features for which many companies currently adopt Hadoop for processing Big Data are scalability and the unique way in which data is processed and stored. We will talk about these features a little later in the article.
During the entire development process, Hadoop has been and will remain an open source project. At the beginning it was supported by Yahoo, which needed an indexing system for their search engine. Because this system was working so well, it also ended up being used by Yahoo for advertising.
A very interesting thing is that Hadoop didn't appear overnight, and at the beginning it wasn't as robust as it is today. In the beginning there was a scalability problem when it had to scale up to 10-20 nodes, and the same problem existed with performance. Companies such as Yahoo, IBM, Cloudera and Hortonworks saw the value that Hadoop was bringing and invested in it. Each of these companies had a similar system that tried to solve the same problems. At this moment Hadoop has become a robust system that can be successfully used. Companies such as Yahoo, Facebook, IBM, eBay, Twitter and Amazon use it without a problem.
Since data can be stored very simply in Hadoop and the processed information occupies very little space, any legacy or big system can store data for a long time very easily and with minimal costs.

Data storage – HDFS
The way Hadoop is built is very interesting. Each part is designed for something big, from file storage to processing and scalability. One of the most important and interesting components that Hadoop has is the file storage system - the Hadoop Distributed File System (HDFS).
Generally when we talk about storage systems with high capacity, our thoughts lead us to custom hardware, which is extremely costly (price and maintenance). HDFS is a system which doesn't need special hardware. It runs smoothly on normal configurations and can be used together with our home or office computers.

1. Sub-system management
Maybe one of the most important properties of this system is the way each hardware problem is seen. From the beginning this system was designed to run on tens or hundreds of instances, therefore any hardware problem that can occur is not seen as an exceptional error but as part of the normal flow. HDFS is a system that knows that not all the registered components will work. Being aware of this, it is always ready to detect any problem that might appear and to start the recovery procedure.
Each component of the system stores a part of the files and each stored bit can be replicated in one or more locations. HDFS is seen as a system that can be used to store files which have several gigabytes and can reach several terabytes. The system is prepared to distribute a file over one or more instances.

2. Data access
Normal file storage systems have data storage as their main purpose, and they send us the data we need for processing. HDFS is totally different. Because it works with large amounts of data, it solves this problem in a very innovative way. Whatever system we use, we will have problems the moment we want to transfer large amounts of data for processing. HDFS allows us to send the processing logic to the components where we keep the data. Through this mechanism the data needed for processing is not transferred; only the final result must be passed on (and only when needed).
In such a system you would expect to have a highly complex versioning mechanism - a system that would allow multiple writers on the same file. In fact HDFS is a storage that allows only one writer and multiple readers. It is designed this way because of the type of data it contains. This data doesn't change very often and that's why it doesn't need modifications. For example, the logs of an application will not change, and the same thing happens with the data obtained from an audit. Very often, data that is stored after processing ends up being erased or never changed.

3. Portability
Before talking about the architecture of the system and how it works, I would like to talk about another property that the system has - portability. For this reason HDFS is used not just together with the Hadoop system but also as a standalone storage system. I think this property helped HDFS become widespread.
From the software point of view, it is written in Java and can run on any system.

4. NameNode and DataNode
If we talk about the architecture of such a system, it is necessary to introduce two terms into our vocabulary: NameNode and DataNode. It is a master-slave system.
The NameNode is „the master” of the storage system. It handles the file names and knows where each file can be found - the file mapping. This component doesn't store the file data; it deals only with the mapping of the files, knowing at every moment the location where the files are stored. Once the name has been resolved by the NameNode, the client is redirected to the DataNode.
The DataNode is „the slave” that stores the actual content of the files. Clients access the DataNode to reach the stored information - reading and writing the data.
As a system that is prepared for the failure of a component, in addition to the NameNode we have a SecondaryNameNode. This component automatically makes checkpoints of the NameNode, and if something happens to the NameNode it is ready to provide the checkpoint in order to restore the state the NameNode had before the failure. Note that the SecondaryNameNode will never take the NameNode's place; it will not resolve the locations where the files are stored. Its only purpose is to create checkpoints for the NameNode.

5. Data storage
All data is stored as files. For the client the file is not divided into several parts, even if this happens internally. Internally the file is divided into blocks that end up being stored on one or more DataNodes. A large file can be stored on 2, 3 or even 20 nodes. The NameNode controls this and may require the blocks to be replicated in several locations.
Figure 1 shows the architecture of the system.
What is interesting about this architecture is how it works with files. The data accessed by the clients never passes through the NameNode. Therefore, even if we have only one NameNode in the whole system, once the location of the files is resolved no client request needs to go through the NameNode.
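The block splitting itself is easy to picture. Here is a small Python sketch, using 64 MB as the block size commonly used by Hadoop at the time; where each block lands on the DataNodes is a separate decision, made by the NameNode:

```python
BLOCK_SIZE = 64 * 1024 * 1024  # a commonly used HDFS block size

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, block_length) pairs for a file of `file_size`
    bytes. Only the cutting is modeled here; placement is the NameNode's job."""
    blocks = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks
```

A 150 MB file, for example, becomes two full 64 MB blocks plus one 22 MB block.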

6. File structure
How the files are stored to be accessed by customers is very simple.  The customer can define a structure of directories and files. All these data are stored by the NameNode. This is the only one who knows how folders and files are defined by the customer. Options such as a hard-link or soft-link are not supported by HDFS.

7. Replication
Because the stored data is very important, HDFS allows us to set the number of copies that we want to have for each file. This can be set when creating the file or anytime thereafter. The NameNode is the one that knows the number of copies that must exist for each file and makes sure that they exist.
For example, when increasing the number of replications that we want to have for a file, the NameNode takes care that all the blocks are replicated again. The NameNode's job doesn't end here. From each DataNode it receives an "I'm alive" signal at specific intervals - the heartbeat. If one of these nodes stops sending the signal, the NameNode automatically starts the recovery procedure.
How the data replicates is extremely complex; HDFS must take many factors into account. When we need to make a new copy we must be careful, since this operation consumes bandwidth. For these reasons there is a load-balancer that handles the data distribution in the cluster and chooses the location where the data is copied, in order to do the load-balancing in the best way possible. There are many options for replication; a common default is for the first replica to be on the same node as the writer, while in terms of distribution across racks, 2/3 of the replicas are on the same rack and the rest on a separate rack.
Data from a DataNode may automatically be moved to another DataNode if the system detects that the data is not evenly distributed.
All the copies that exist for a file are used. Depending on the location from which the data is retrieved, the client will access the closest copy. Therefore, HDFS is a system that knows what the internal network looks like - including every DataNode, rack and the other systems.
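The placement policy described above can be sketched as follows. This is a toy model of the "writer's node first, the other replicas on one remote rack" idea, not the real HDFS algorithm, and all the node and rack names are made up:

```python
import random

def place_replicas(writer_node, nodes_by_rack, replication=3):
    """Toy rack-aware placement: first replica on the writer's node, the
    remaining replicas on nodes of one other rack. Real HDFS also weighs
    bandwidth and node load before choosing."""
    writer_rack = next(rack for rack, nodes in nodes_by_rack.items()
                       if writer_node in nodes)
    other_racks = [rack for rack in nodes_by_rack if rack != writer_rack]
    remote_rack = random.choice(other_racks)
    remote_nodes = [n for n in nodes_by_rack[remote_rack] if n != writer_node]
    return [writer_node] + remote_nodes[:replication - 1]
```

With three replicas this reproduces the 2/3-on-one-rack distribution: the two remote copies share a rack, and the writer's copy sits on another.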

8. Namespace
The namespace that the NameNode manages is stored in RAM so it can be accessed easily. It is copied to the hard disk at precisely defined intervals - the image that is written to disk is called the FsImage. Because the copy on the disk is not always the same as the one in RAM, there is a file in which all the changes made to the file and folder structure are logged - the EditLog. This way, if something happens to the RAM memory or to the NameNode, the recovery is simple and can include the latest changes.
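The FsImage/EditLog recovery idea fits in a few lines. The sketch below is a minimal model, not the real on-disk format - the point is only that the namespace can be rebuilt from the last checkpoint plus the replayed edits:

```python
class ToyNamespace:
    """Minimal model of the FsImage + EditLog recovery idea: the in-memory
    namespace can be rebuilt from the last checkpoint plus the logged edits.
    Names are illustrative, not the real HDFS format."""

    def __init__(self):
        self.files = set()    # in-memory namespace
        self.fsimage = set()  # last checkpoint written to disk
        self.editlog = []     # changes since the checkpoint

    def create(self, path):
        self.files.add(path)
        self.editlog.append(("create", path))

    def checkpoint(self):
        self.fsimage = set(self.files)
        self.editlog = []

    def recover(self):
        """Rebuild the namespace from the FsImage, then replay the EditLog."""
        restored = set(self.fsimage)
        for op, path in self.editlog:
            if op == "create":
                restored.add(path)
        return restored
```

The SecondaryNameNode's checkpointing corresponds to calling checkpoint(): the more recent the checkpoint, the shorter the log that has to be replayed.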

9. Data manipulation
A very interesting thing is how the client creates a file. Initially, data is not stored directly on a DataNode, but in a temporary location. Only when there is enough data to make a write operation worthwhile is the NameNode notified and the data copied to the DataNode.
When a client wants to delete a file, it is not physically deleted from the system. The file is only marked for deletion and moved into the trash directory. Only the latest copy of the file is kept in the trash, and the client can go into the folder to retrieve it. All files in the trash folder are automatically deleted after a certain period of time.
 
Conclusion
In this article we saw how Hadoop appeared, what its main properties are and how the data is stored. We saw that HDFS is a system created to work with large amounts of data, and that it does this extremely well and with minimal costs.
Posted in Hadoop

Tuesday, 14 May 2013

NoSQL in 5 minutes

Posted on 09:57 by Unknown
NoSQL – one of 2013’s trends. If three or four years ago we rarely heard about a project using NoSQL, nowadays the number of projects using non-relational databases is extremely high. In this article we will see the advantages and challenges we could have when we use NoSQL. In the second part of the article we will analyze and emphasize several non-relational solutions and their benefits.

What is NoSQL?
The easiest definition would be: NoSQL is a database that doesn't respect the rules of a relational database (RDBMS). A non-relational database is not based on a relational model. Data is not grouped in tables; therefore there is no mathematical relationship between the tables.
These databases are built to run on a large cluster. Data in such storage does not have a predefined schema, so any new field can be added without any problem. NoSQL appeared and developed around web applications; consequently the vast majority of its functionalities are those that a web application needs.

Benefits and risks
A non-relational database model is a flexible one. Depending on the solution that we use, we could have a very ‘hang loose’ model that can be changed with a minimum cost. There are many NoSQL solutions that are not model-based. For example, even though Cassandra and HBase have a pre-defined model, adding a new field can be easily done. There are various solutions that can store any kind of data structure without defining a model; an example could be those storing key-value pairs or documents. In NoSQL taxonomy, a document is seen as a record from relational databases and collections are seen as tables. The main difference is that in a table we will have records with the same structure, while a collection can have documents with different fields.
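A tiny example makes the difference visible. Plain Python dicts stand in here for the documents of one collection - the two documents below share some fields and differ in others, something a fixed table schema would reject:

```python
# One collection, two documents with different fields (made-up data).
products = [
    {"name": "phone", "price": 300, "warranty_months": 24},
    {"name": "ebook", "price": 5, "download_url": "http://example.com/ebook"},
]

def field_names(collection):
    """Collect every field used across the collection - the 'schema' is
    just whatever the documents happen to contain."""
    names = set()
    for doc in collection:
        names.update(doc)
    return sorted(names)
```

There is no ALTER TABLE step: adding warranty_months to one document did not touch the other.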
Non-relational databases are much more scalable than the classical ones. If we want to scale a relational database we need powerful servers, instead of just adding some machines with a normal configuration to the cluster. This is due to the way in which a relational database works, and adding a new node can be expensive.
The way in which a non-relational database is built easily allows horizontal scaling. Moreover, these databases are well suited for virtualization and the cloud.
Taking into account the databases’ dimensions and the growing number of transactions, a relational database is much more expensive than NoSQL. Solutions like Hadoop can process a lot of data. They are extremely horizontally scalable, which makes them very attractive.
Concerning costs, a non-relational database is a lot cheaper. We do not need custom hardware or special features to create a very powerful cluster. Using some regular servers, we can have an efficient database.
Certainly, NoSQL is not only milk and honey. Most of the solutions are rather new on the market compared to relational databases. For this reason some important functionalities may be missing - data mining and business intelligence, for example. NoSQL has evolved to meet the requirements of web applications, which is the main cause of some missing features that are not necessary on the web. That does not mean they are missing entirely and cannot be found; rather, they are not quite mature enough or they are specific to the problem that the NoSQL solution is trying to solve.
Because they are so new to the market, many NoSQL solutions are pre-production versions, which cannot always be used in the enterprise world. The lack of official support for some products could be a show-stopper for medium and large projects.
The syntax with which we can query a NoSQL database is different from a simple SQL query; we usually need some programming concepts. The number of experts in NoSQL databases is much lower than in SQL, and administration may be a nightmare, because support for administrators is presently weak.
Moreover, ACID and transaction support is not common in NoSQL storage. The queries that can be written are pretty simple, and sometimes the storages do not allow us to „join” collections, therefore we have to write the code that does this ourselves.
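Here is what such a hand-written „join” looks like, sketched with plain Python lists of dicts standing in for two collections (orders and customers are made-up data):

```python
# Application-side "join" of two document collections.
orders = [
    {"id": 1, "customer_id": 10, "total": 25.0},
    {"id": 2, "customer_id": 11, "total": 40.0},
]
customers = [
    {"id": 10, "name": "Alice"},
    {"id": 11, "name": "Bob"},
]

def join_orders_with_customers(orders, customers):
    """What a relational JOIN would do for us, written by hand."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**order, "customer_name": by_id[order["customer_id"]]["name"]}
        for order in orders
    ]
```

The by_id dictionary plays the role of the index that the relational engine would have maintained for us.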
All these issues will be solved in time, and the question we must ask ourselves when we think about an architecture where we believe NoSQL could help is „Why not?”

The most widely used NoSQL solutions
On the market there are countless NoSQL solutions. There is no universal solution that solves all the problems we have. For this reason, when we want to adopt NoSQL we may end up with several types of storage: we may identify within our application several problems which require a NoSQL solution, and we may need a different solution for each of these cases. This adds extra complexity, because we would have two storages that we need to integrate.
MongoDB
This is one of the most used types of storage. In this type of storage all content is stored in the form of documents. Over these collections of documents we can perform any kind of dynamic query to extract different data. In many ways MongoDB is the closest to a relational database. All the data we want to store is kept with a hash, facilitating information retrieval. Basic CRUD operations work quickly on MongoDB.
It is a good solution when you need to store a lot of data that must be accessed in a very short time. If we do not perform many insert, update and delete operations, the information remains unchanged for a period of time, and MongoDB is a storage which can be used successfully. It can be used with success when the stored properties are the target of queries and/or indexes - for example, in a voting system, a CMS or a storage system for comments. Another case in which it can be used is to store the lists of categories and products in an online store. Because it is oriented towards queries and the list of products does not change every two seconds, the queries made on it will be rapid.
Another benefit is auto-sharding: a MongoDB database can very easily be held on two or three servers. The mechanism for distributing the data is simple and well documented.
Cassandra
It is the second on the list of the most used NoSQL solutions. This storage can become our friend when we have data that changes frequently. If the problem we want to solve is dominated by insertions and modifications of the stored data, then Cassandra is our solution. Compared to inserts and changes, any query we run on our data is much slower; this storage is oriented more towards writes than towards queries that retrieve data. If in MongoDB the data we work with was seen as documents with a hash attached to each of them, Cassandra stores all content in the form of columns.
In MongoDB, the data we access may not be the latest version. Instead, Cassandra guarantees that the data we obtain through queries is the latest version. So if we access an email that is stored with the help of Cassandra, we get the latest version of the message. This solution can be installed in multiple data centers in different locations, providing support for failover or back-up - extremely high availability.
For this reason it is the ideal solution when you have an eCommerce application where we need a storage system for the shopping cart. Insert and update operations will be done quickly, and each query will bring the latest version of the shopping cart - this is very important when we perform check-out. Cassandra can also be successfully used as a tool for logging; in such a system we have many writes, and the queries are rare and quite simple.
Cassandra also came to be used in the financial industry, being ideal due to the performance of insert operations. In this environment data changes very often, the values being new at every moment.
CouchDB
If most of the operations we perform are just inserts and reads, with no updates, then CouchDB is a much better solution. This storage is targeted at read and write operations. Besides this, we have efficient support to pre-define queries and to control the different versions that stored data may have. Therefore, update operations are not so fast. Of all the storages presented so far, this is the first that guarantees ACID, through the versioning system it implements.
Another feature of this storage is the support for replication. CouchDB is a good solution when we want to move the database offline, for example on a mobile device that does not have an internet connection. Through this functionality, we have support for a distributed architecture with replication in both directions.
It can be a solution for applications on mobile devices which do not have 24-hour internet connectivity. At the same time, it is very useful in the case of a CMS or CRM, where we need versioning and predefined queries.
HBase
This database is entirely integrated with Hadoop. The aim is for it to be used when we need to perform data analysis. HBase is designed to store large amounts of data that could normally not be stored in a normal database.
It can work in memory without any problem, and the data it stores can be compressed; it is one of the few NoSQL databases that supports this feature. Due to this particularity, HBase is used with Hadoop. In some cases, when working with tens or hundreds of millions of records, HBase is worth being used.
Membase
As the name implies, this non-relational database can stay in memory. It is a perfect solution when very low latency is needed, and content replication becomes an easy process.
It is very common in game backends, especially online ones. Many systems that work with real-time data they need to manipulate or show use Membase as storage. In these cases Membase may not be the only storage level that the application uses.
Redis
This storage is perfect when the number of updates we need to perform on our data is very high; it is optimized for such operations. It is based on a very simple key-value model, so the queries that can be made are very limited. Although we have support for transactions, the support for clustering is still not mature enough. This can become a problem when the data we want to store does not fit in memory - the size of the database is bounded by the amount of internal memory.
Redis is quite interesting when we have real-time systems that need to communicate. In these cases Redis is one of the best solutions. There are several stock-trading applications using this storage.

What does the future hold for us?
We see an increasing number of applications that use NoSQL. This does not mean that relational databases will disappear. The two types of storage will continue to exist and often coexist. Hybrid applications, which use both relational databases and NoSQL, are becoming more common. Also, an application does not need to use only a single database. There are solutions using two or more NoSQL databases. A good example is an eCommerce application that uses MongoDB to store the list of items and categories, and Cassandra to store the shopping cart of each client.

Conclusion
In conclusion, we can say that NoSQL databases must be part of our area of knowledge. Compared to relational databases we have many options, and each of them does one thing very well. In the NoSQL world there is no storage that solves all the problems we may have; each type of storage solves different problems. The future belongs neither to non-relational databases nor to relational ones. The future belongs to applications that use both types of storage, depending on their needs.
Posted in nosql

Saturday, 11 May 2013

How to iterate a Service Bus Queue using Azure SDK 2.0 & What are the concerns of this new feature

Posted on 08:37 by Unknown
The new version of Windows Azure Service Bus came with a lot of new features. One of the features added in Windows Azure SDK 2.0 is support for message browsing.
What is message browsing? It is functionality that gives a client the ability to iterate over and access the messages of a Service Bus Queue without locking or removing them from the actual queue.
There are pretty interesting things that we can do using this feature. For example, we can peek the message at a specific sequence number in the queue:
BrokeredMessage message = queueClient.Peek(3);
Or we can peek the first message that is available:
BrokeredMessage message = queueClient.Peek();
Another option is to peek a specific number of message from the queue:
IEnumerable<BrokeredMessage> messages = queueClient.PeekBatch(20);
PeekBatch also supports two parameters, where the first one specifies the starting sequence number (similar to Peek(3)):
IEnumerable<BrokeredMessage> messages = queueClient.PeekBatch(100, 20);
In the above example we peek the next 20 messages, starting from sequence number 100.
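Putting these calls together, a full pass over a queue can be sketched like this (a sketch, not production code - it assumes an existing `queueClient` and ignores messages that arrive while we browse):

```csharp
// Iterate the whole queue with PeekBatch, without locking or
// removing any message. "queueClient" is an existing QueueClient.
long fromSequenceNumber = 0;
while (true)
{
    List<BrokeredMessage> batch = queueClient
        .PeekBatch(fromSequenceNumber, 20)
        .ToList();
    if (batch.Count == 0)
    {
        break; // no more messages to browse
    }
    foreach (BrokeredMessage message in batch)
    {
        Console.WriteLine(message.SequenceNumber + ": " + message.Label);
    }
    // continue after the last message we have already seen
    fromSequenceNumber = batch.Last().SequenceNumber + 1;
}
```

Peeked messages are not locked, so another consumer may have already received and completed some of them by the time we print them.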
An interesting feature - from now on we can iterate a queue. Great! But I have an open question:
Why would we want to iterate a queue?
If we need functionality like this, then maybe we are not using the proper service. For example, we might say that we need this feature when we have to do monitoring or auditing of a system. Well… true - we could use this feature to build an audit system over a queue, BUT. Yes, there is a big BUT. Why would you use a queue when you need to support auditing over it? You can use a Service Bus Topic and have a dedicated subscription for the audit system. For the same price, you get a system that was created for this scenario.
Another case where people would find this feature useful is when they need to debug the application. They could see the content of the queue - the investigation process would be easier. True, I had the “opportunity” to debug a system that uses queues, and when you cannot inspect the content of the queue, the nightmare begins. When problems like this occur, I would try to use the dead-letter feature: when a message cannot be processed after X attempts, it is sent automatically to the dead-letter queue.
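As a sketch of that approach (the connection string and the queue name "fooQueue" are assumptions), the dead-letter sub-queue can be consumed like a normal queue:

```csharp
// Hypothetical example: reading the dead-letter sub-queue of "fooQueue".
// FormatDeadLetterPath builds the path of the dead-letter sub-queue.
string deadLetterPath = QueueClient.FormatDeadLetterPath("fooQueue");
QueueClient deadLetterClient =
    QueueClient.CreateFromConnectionString(connectionString, deadLetterPath);

BrokeredMessage message = deadLetterClient.Receive(TimeSpan.FromSeconds(5));
if (message != null)
{
    // Service Bus stores the failure reason in the message properties
    Console.WriteLine(message.Properties["DeadLetterReason"]);
    message.Complete();
}
```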
I see the value of this feature when we need to debug or investigate the content of a queue. At the same time, this feature can open Pandora's box. Why? Because developers will use this feature in strange ways, and a queue will end up being used as a list.
Posted in Azure, Cloud, messagequeue, service bus, Windows Azure

HDFS and Hadoop - Today Software Magazine

Posted on 02:55 by Unknown
This week the 11th issue of Today Software Magazine was launched. In this issue of the magazine I wrote about how Hadoop manages to store hundreds of terabytes without any kind of problem. I tried to explain the secret that makes Hadoop a system that can support more than 40,000 nodes.
The article in Romanian can be found here: www.todaysoftmag.com/tsm/ro/11
I hope to have the article in English as well by next week. Until then, I will share the slides from the presentation.

HDFS and Hadoop from Radu Vunvulea

Posted in cluj-napoca, eveniment, event, Hadoop

Thursday, 9 May 2013

Tasks and Thread.Sleep

Posted on 07:50 by Unknown
What is the problem of the following code?
Task.Factory.StartNew(() =>
{
    while (true)
    {
        Console.WriteLine("@ " + DateTime.Now.ToString());
        Thread.Sleep(TimeSpan.FromSeconds(2));
    }
});
var task = Task.Factory.StartNew(() =>
{
    while (true)
    {
        Console.WriteLine("! " + DateTime.Now.ToString());
        Thread.Sleep(TimeSpan.FromSeconds(4));
    }
});

Task.WaitAll(task);
The main problem is related to Thread.Sleep. Using this method in combination with tasks, we can get odd behavior, and in the end the performance of the tasks can be affected. One behavior we could get is related to the sleep period - it will not be the one we expect (a "random" sleep interval).
This happens because of the TPL's behavior, which does not guarantee that each task will have a dedicated thread on which to run.
What should we do to resolve this problem?
We have two possible solutions.
1. One solution is to create a timer for each task. This would work 100%, and for this example it is the way we should have implemented this from the start.
2. In other cases we wouldn't be able to use timers and would still want to put the task to sleep. If we have a long-running task, then we can set TaskCreationOptions to LongRunning. This notifies the TPL that the task is long-running, and a dedicated thread will be created for it.
Task.Factory.StartNew(() =>
{
    while (true)
    {
        Console.WriteLine("@ " + DateTime.Now.ToString());
        Thread.Sleep(TimeSpan.FromSeconds(2));
    }
}, TaskCreationOptions.LongRunning);
var task = Task.Factory.StartNew(() =>
{
    while (true)
    {
        Console.WriteLine("! " + DateTime.Now.ToString());
        Thread.Sleep(TimeSpan.FromSeconds(4));
    }
}, TaskCreationOptions.LongRunning);

Task.WaitAll(task);
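For completeness, the timer-per-task approach from solution 1 could be sketched with two System.Threading.Timer instances - no thread is blocked between ticks (this is a sketch of the idea, not the original code):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // One timer per periodic job; the callbacks run on thread-pool
        // threads only when they fire, so no thread sleeps in between.
        var first = new Timer(
            _ => Console.WriteLine("@ " + DateTime.Now.ToString()),
            null, TimeSpan.Zero, TimeSpan.FromSeconds(2));
        var second = new Timer(
            _ => Console.WriteLine("! " + DateTime.Now.ToString()),
            null, TimeSpan.Zero, TimeSpan.FromSeconds(4));

        Console.ReadLine(); // keep the process alive

        first.Dispose();
        second.Dispose();
    }
}
```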
Posted in multitasking, Windows Task

Friday, 3 May 2013

(Part 5) Testing the limits of Windows Azure Service Bus

Posted on 07:07 by Unknown

In the last post of the series about testing the limits of Windows Azure Service Bus, we saw which configuration is best for our problem: 4 medium instances.
The next question for us is: how can we decrease the processing time?
In our case we saw that, from the cost and benefits perspective, we reached the limit of Service Bus - around 4 medium instances are able to process around 1,000,000 messages in 34 minutes. But what can we do if we have 10,000,000 messages that need to be processed in 1 hour?
We could increase the number of instances, but this will not improve our performance too much. At this moment we use only one topic. This topic, like Service Bus itself, has its own limitations - we cannot make an unlimited number of requests per second and expect low latency.
A solution to decrease the number of requests is to use batches - we already use them.
Another solution is to scale the topics. At this moment we have only one topic. If we used 5 topics, then we could have 4 instances consuming messages from each topic. In this way we could consume 5,000,000 messages in 34 minutes. The downside is the cost: it will increase by a factor of 5.
From the cost perspective this would be acceptable, because all the instances work at maximum capacity. The moment we no longer need these instances, we can stop them.
The only problem is how we distribute the messages between the different topics.
One solution would be to identify different attributes of our messages and try to group the messages based on these attributes. In this way we could distribute the messages over more than one topic. In theory this could be a good solution, but how often can you group items in an equal way? In real life this cannot be accomplished in normal cases.
What we could do instead is create a mechanism that selects which topic to use. For example, each producer, at a specific time interval or after it sends a specific number of messages, checks the load on each topic and decides which topic should be used.
In this way we could have a flat distribution of messages over our topics.
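A minimal sketch of such a selection mechanism, assuming the topic names, the connection string and the availability of MessageCountDetails on the topic description:

```csharp
// Hypothetical producer-side selection: send the next batch of messages
// to the topic that currently has the fewest active messages.
string[] topicPaths = { "topic-1", "topic-2", "topic-3", "topic-4", "topic-5" };

string leastLoadedPath = topicPaths
    .Select(path => namespaceManager.GetTopic(path))
    .OrderBy(topic => topic.MessageCountDetails.ActiveMessageCount)
    .First()
    .Path;

TopicClient topicClient =
    TopicClient.CreateFromConnectionString(connectionString, leastLoadedPath);
```

Re-evaluating the choice only periodically keeps the extra management calls cheap while still spreading the load over the topics.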
In this post we saw how we can scale Windows Azure Service Bus horizontally. With a good design we should be able to support scaling in all the places where we expect a bottleneck.

Posted in Azure, service bus, Windows Azure

Thursday, 2 May 2013

Windows Azure Service Bus - Control what actions can be made

Posted on 14:05 by Unknown
Did you ever need or want to disable the send functionality on a queue or a topic? Everybody keeps sending messages to you, even after you say STOP. How nice would it be to be able to say STOP.
We have good news for people that use Windows Azure Service Bus (Queues, Topics and Subscriptions). With the new version of the SDK (2.0), we can control the operations that can be performed on Service Bus. How nice is that?
People may ask: why would we want to do something like this? Well, there are times, especially when we run tests, when we want to control this. I see value in two situations:
When we want to run performance tests, to see how many messages we can consume from a queue in a specific time interval
When there are a lot of messages in the queue or topic and we don't want to accept new messages anymore.
This feature can be controlled from the description of the queue, topic or subscription. Based on which Service Bus service we use, we can have different statuses. It is important not to forget to call UpdateQueue or UpdateTopic/UpdateSubscription after you change the entity status.
The entity status of a service can have the following values:
  • EntityStatus.Disabled – Sending and receiving messages are disabled
  • EntityStatus.SendDisabled – Sending is disabled
  • EntityStatus.ReceiveDisabled – Receiving is disabled
  • EntityStatus.Active – Sending and receiving are active
Don’t try to use a combination of these statuses; all the possible combinations are covered by these 4 values.
TopicDescription topic = namespaceManager.GetTopic("fooTopic");
topic.Status = EntityStatus.SendDisabled; // consumers can continue to de-queue
namespaceManager.UpdateTopic(topic);
This configuration can also be made at the moment you create the topic:
if (!namespaceManager.TopicExists("fooTopic"))
{
    TopicDescription topic = new TopicDescription("fooTopic");
    topic.Status = EntityStatus.SendDisabled; // consumers can continue to de-queue
    namespaceManager.CreateTopic(topic);
}
Posted in Azure, Cloud, service bus, Windows Azure