Wednesday, April 22, 2009

Good and Bad Public Cloud Candidates

I recently had a good conversation with Ed and Dean of RightScale about the characteristics that make an application a good public cloud citizen.

Good Public Cloud Candidates

Being stateless
  • There are no warm-up or cool-down periods required; newly started instances are immediately ready to work
  • Work dispatching is very simple when any instance can do the work
Compute intensive with small dataset size
  • Cloud computing lets you quickly deploy a lot of CPUs to work on your problem. But if a large amount of data has to be loaded before those CPUs can start their computation, the latency and bandwidth costs go up
Contains only non-sensitive data
  • Although cryptography is sufficient to protect your data, you don't want to pay the CPU overhead of encrypting and decrypting every piece of data that you use in the cloud.
Highly fluctuating workload pattern
  • You no longer need to provision in-house equipment to cater for the peak load, equipment that sits idle most of the time.
  • The "pay as you go" model saves cost because you don't pay for machines when you are not using them (see the back-of-envelope sketch after this list)
New application launch with unknown anticipated workload
  • You don't need to take the risk of over-estimating the popularity of your new application and buying more equipment than you actually need.
  • You don't need to take the risk of under-estimating the popularity and ending up frustrating your customers because you cannot handle their workload.
  • You can defer a big upfront investment and still be able to try out new ideas.
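
As a rough illustration of the "pay as you go" point above, here is a minimal back-of-envelope sketch in Python. The instance price, peak hours, and per-server cost are purely hypothetical placeholders, not quotes from any provider.

# Back-of-envelope comparison: pay-as-you-go cloud vs. provisioning in-house for peak.
# All figures are hypothetical placeholders, not real provider prices.

HOURS_PER_YEAR = 24 * 365

def cloud_cost(baseline_instances, peak_instances, peak_hours, price_per_hour):
    """Pay only for the hours each instance actually runs."""
    baseline = baseline_instances * HOURS_PER_YEAR * price_per_hour
    burst = peak_instances * peak_hours * price_per_hour
    return baseline + burst

def inhouse_cost(baseline_instances, peak_instances, server_cost_per_year):
    """In-house capacity must be sized for the peak, even when it sits idle."""
    return (baseline_instances + peak_instances) * server_cost_per_year

if __name__ == "__main__":
    # e.g. 2 servers year-round plus 20 extra servers for a 200-hour sales peak
    print("cloud  :", cloud_cost(2, 20, 200, price_per_hour=0.10))
    print("inhouse:", inhouse_cost(2, 20, server_cost_per_year=1500))

The exact crossover point depends entirely on your own numbers; the sketch only shows the shape of the trade-off.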


Bad Public Cloud Candidates

Demand special hardware
  • Most cloud equipment is cheap, general-purpose commodity hardware. If your application demands a specific piece of hardware (e.g. a graphics processor for rendering), it will simply not work, or will be awfully slow.
Demand multi-cast communication facilities
  • Most cloud providers disable multicast traffic in their networks
Need to reside in a particular geographic location
  • If there is a legal or business requirement that your servers run in a particular physical location, and that location is not covered by cloud providers, then you are out of luck.
Contain a large dataset
  • Bandwidth cost across the cloud boundary is high, so you may end up with a large bill when loading a large amount of data into the cloud
  • Loading a large amount of data also takes time. You need to compare that with the overall processing time itself to see if it makes sense (see the transfer-time sketch after this list)
Contain highly sensitive data
  • Legal, liability, and auditing practices haven't caught up yet. Companies running their core business apps in the cloud will face a lot of legal challenges
Demand extremely low latency of user response
  • Since you have little control over where the machines reside, latency usually increases when your app runs in a cloud location remote from your users
Run 24 x 7 with an extremely stable, non-fluctuating workload pattern
  • If you run a machine around the clock without ever shutting it down, many cost analysis reports show that running the machine in-house is cheaper (especially for large enterprises that already have a data center and a team of system administrators)
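
To make the "large dataset" point above more concrete, here is a small sketch that estimates how long it takes and how much it costs to push a dataset across the cloud boundary. The link bandwidth and per-GB transfer price are assumptions to be replaced with your own numbers.

# Estimate the time and cost of moving a dataset across the cloud boundary.
# Bandwidth and per-GB price are assumed placeholder values.

def transfer_time_hours(dataset_gb, bandwidth_mbps):
    """Hours needed to move dataset_gb gigabytes over a bandwidth_mbps link."""
    bits = dataset_gb * 8 * 10**9             # decimal GB -> bits
    seconds = bits / (bandwidth_mbps * 10**6)
    return seconds / 3600

def transfer_cost(dataset_gb, price_per_gb):
    """Bandwidth charge for crossing the cloud boundary."""
    return dataset_gb * price_per_gb

if __name__ == "__main__":
    # e.g. a 500 GB dataset over a 45 Mbps link at a hypothetical $0.10/GB
    hours = transfer_time_hours(500, 45)
    print(f"upload time: {hours:.1f} h, cost: ${transfer_cost(500, 0.10):.2f}")
    # Compare this with the running time of the computation itself: if the job
    # finishes in minutes but the upload takes a day, the cloud gains evaporate.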

Hybrid Cloud


I personally believe the real usage pattern of the public cloud for large enterprises is to move fluctuating workloads into the public cloud (e.g. Christmas sales for an e-commerce site, newly launched services) but retain most of the steady workload in-house. In fact, I think enterprises are going to move their applications in and out constantly as traffic patterns change.

It is more appropriate to do this classification at the component level rather than at the application level. Instead of asking whether the app (as a whole) is suitable or not, we should determine which components of the app should run in the public cloud and which should stay in the data center (a toy sketch of such a classification appears at the end of this section).

In other words, the application runs in a hybrid cloud mix that spans public and private clouds.

The ability to move your application “frictionlessly” across cloud boundaries and manage the scattered components in a holistic way is key. Once you have this freedom to move, the price of a particular public cloud provider matters less, because you can easily move to a cheaper provider at any time.
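
To illustrate the component-level classification idea, here is a toy sketch that decides, per component, whether it belongs in the public cloud or in the private data center. The criteria (data sensitivity, workload volatility) and the sample components are hypothetical, for illustration only.

# Toy component-level placement: the component names and rules below are
# hypothetical, for illustration only.

def place(component):
    if component["sensitive_data"]:
        return "private"   # keep sensitive / regulated data in-house
    if component["workload"] == "fluctuating":
        return "public"    # burst capacity is where the public cloud pays off
    return "private"       # steady 24 x 7 load is usually cheaper in-house

components = [
    {"name": "product-catalog", "sensitive_data": False, "workload": "fluctuating"},
    {"name": "payment-service", "sensitive_data": True,  "workload": "steady"},
    {"name": "order-archive",   "sensitive_data": False, "workload": "steady"},
]

for c in components:
    print(f"{c['name']:16s} -> {place(c)}")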

6 comments:

Anonymous said...

Why do you say that large data sets are a poor candidate for cloud computing?

Unknown said...

Large data sets are a poor candidate because you have a finite amount of bandwidth to and from any cloud instance.

Network bottlenecks will kill you if you only have T1 or DSL but you're trying to move files bigger than a few MB to and from the cloud. At maximum speed (heh) it takes an hour to send 675MB over a 1.5Mbps T1.

Anonymous said...

Network bottlenecks will also kill you when you first load a large data set onto your own computers. But once you have it on your own internal file system it's no longer such a problem. Why isn't that the same for the cloud? Once you get your large data set onto S3, you can keep it there, and can run EC2 amok on it.

Ricky Ho said...

You are assuming data is static. What if data is continuously produced? Think about a log file ...

Every time you need to analyze the log file "in the cloud", you need to upload your log file (which resides in your data center) to the cloud first. Whether this extra cost or latency is acceptable depends on how your computation in the cloud takes place.

Well, you may think why not skip the local log file completely and directly write your log to S3. There are two issues here.

First, security can be a concern if your log contains other sensitive data. Second, S3 is not designed for "append" operations.

Anonymous said...

Yes, we can all agree that privacy-protected data sets are a poor candidate for the cloud, whether large or small. As for large non-sensitive data sets, the outcome is less clear. Large slowly changing data sets (that can easily be uploaded to the cloud) are a good candidate for the cloud. And large continuously-produced data sets are best analyzed in situ, which could be inside or outside the cloud.

Anonymous said...

Another bad cloud candidate is large memory jobs. The most you can get on Amazon EC2 is 15GB RAM.